How to use QTextCodec



lni
19th December 2013, 22:01
Hi,

I have the following string:

"&#xb2;&#xb9;&#xd0;&#xc4;&#xba;&#xa3;?#xce;"

What does this mean? How can I use QTextCodec to convert it to readable characters?

Thanks!

ChrisW67
19th December 2013, 22:16
Exactly how many "characters" are in that string? I think the forum is playing with your input. I see literally this:

"&#xb2;&#xb9;&#xd0;&#xc4;&#xba;&#xa3;?#xce;"
But I suspect you pasted some smaller number of "gibberish" characters.

What does this mean?
It is impossible to say out of context. This is part of the reason that Unicode is so useful.

How can I use QTextCodec to convert it to readable characters?
Impossible to know without knowing where the source material originated and what encoding was used.

lni
19th December 2013, 22:37
Sorry, don't know what more information I should give.

It is supposed to be some Chinese characters. I think each block of ";" is a character. Normally they are readable Chinese characters that I can display in QTextEdit, but then I am hit by this unreadable string, and can't display it properly in QTextEdit.

The encoding is supposed to be GB2312...

ChrisW67
19th December 2013, 23:41
Do you literally have "&#xb2;&#xb9;&#xd0;&#xc4;&#xba;&#xa3;?#xce;" in your string? Am I correct in assuming the "?" is a typo and should be "&"?

Edit: Each "&#xhh;" is an HTML/XML numeric character reference with hexadecimal ("x") value hh. Here each one carries a single byte of the encoded text.
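To make the escape format concrete, here is a minimal sketch in plain standard C++ (no Qt) that turns each well-formed "&#xhh;" back into the byte it stands for. The names hexDigitValue and unescapeHexBytes are hypothetical helpers invented for this illustration, and the sketch assumes every escape has exactly two hex digits, as in the string quoted in this thread.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt): value of a single hex digit.
static int hexDigitValue(char c)
{
    assert(std::isxdigit(static_cast<unsigned char>(c)));
    if (c >= '0' && c <= '9')
        return c - '0';
    return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
}

// Hypothetical helper: replaces every well-formed "&#xhh;" (exactly two hex
// digits) with the single byte 0xhh and copies all other input unchanged.
// Damaged escapes such as "?#xce;" are therefore left as they are.
std::string unescapeHexBytes(const std::string &in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const bool isEscape =
            i + 5 < in.size() &&
            in[i] == '&' && in[i + 1] == '#' &&
            (in[i + 2] == 'x' || in[i + 2] == 'X') &&
            std::isxdigit(static_cast<unsigned char>(in[i + 3])) &&
            std::isxdigit(static_cast<unsigned char>(in[i + 4])) &&
            in[i + 5] == ';';
        if (isEscape) {
            const int value = hexDigitValue(in[i + 3]) * 16 + hexDigitValue(in[i + 4]);
            out += static_cast<char>(value);
            i += 6; // skip the whole "&#xhh;" sequence
        } else {
            out += in[i];
            ++i;
        }
    }
    return out;
}
```

The resulting bytes could then be wrapped in a QByteArray (for example via its pointer-and-length constructor) and handed to a codec's toUnicode().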

GB2312 is a character set that can be encoded several ways. The "GB18030" QTextCodec is what you probably want to use:


#include <QtCore>

int main(int argc, char **argv)
{
    QCoreApplication app(argc, argv);
    // Raw bytes taken from the "&#xhh;" escapes in the original string
    QByteArray encodedString = QByteArray::fromHex("b2b9d0c4baa3ce");
    QTextCodec *codec = QTextCodec::codecForName("GB18030");
    if (!codec)
        return 1; // codec not available in this Qt build
    QString string = codec->toUnicode(encodedString);
    qDebug() << string;
    // Outputs three Chinese characters:
    // "补心海"
    return 0;
}

Note that the last byte in the input string is an incomplete character. Does the result look correct?
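A rough way to spot such an incomplete character before decoding is sketched below in plain standard C++. It assumes the common EUC-CN layout of GB2312 (ASCII bytes stand alone; a lead byte in 0xA1..0xF7 must be followed by a trail byte in 0xA1..0xFE). GB18030 accepts wider ranges and four-byte forms, so this is only a check for the GB2312 subset, and isCompleteEucCn is a hypothetical name, not a Qt function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true when every byte is either ASCII or part
// of a complete two-byte EUC-CN (GB2312) pair.
bool isCompleteEucCn(const std::string &bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) {          // plain ASCII byte
            ++i;
            continue;
        }
        if (lead < 0xA1 || lead > 0xF7)
            return false;           // not a valid GB2312 lead byte
        if (i + 1 >= bytes.size())
            return false;           // dangling lead byte at the end
        const unsigned char trail = static_cast<unsigned char>(bytes[i + 1]);
        if (trail < 0xA1 || trail > 0xFE)
            return false;           // lead byte followed by a non-trail byte
        i += 2;
    }
    return true;
}

// Usage sketch with the byte sequences discussed in this thread.
void isCompleteEucCnExamples()
{
    // Seven bytes: three complete pairs plus the lone 0xce.
    assert(!isCompleteEucCn("\xb2\xb9\xd0\xc4\xba\xa3\xce"));
    // Two complete pairs.
    assert(isCompleteEucCn("\xb2\xb9\xd0\xc4"));
}
```

A false result does not say which byte was lost, only that the input cannot be a clean GB2312 byte stream.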

lni
20th December 2013, 10:56
Wow! Thanks! It is "补心海拔".

But now in some places I get a normal string like "补心海拔", and in other places I get that unreadable string. How can I make my program handle both cases correctly?

Edit 1: "?" is not a typo, I copy straight from what I get.

Edit 2: It should be 4 Chinese characters, but I only get 3 with your code...

Edit 3: It is even worse than this. I am getting mixed strings like "X?#xf8;标", which should be "X坐标". I suppose the user has a broken database. But the user doesn't care, they blame me! So I have to find a workaround.

Edit 4: After further investigation, I found that the "?" stands for a missing piece of the string. In this example, "?" should be "&#xb0;&", so the complete string should be (I think) "&#xb2;&#xb9;&#xd0;&#xc4;&#xba;&#xa3;&#xb0;&#xce;". Then it gives "补心海拔".

With all this mess, what should I do? Just tell the user that they have a bad database and ask them to fix it? I think they would not be happy to hear that...


Thank you!
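One possible way to tell the three situations apart (normal text, escaped bytes, and escapes where a "?" has swallowed part of the data) is sketched below in plain standard C++. It is only a heuristic based on the patterns seen in this thread: TextState and classifyText are hypothetical names, and a string that legitimately contains "?#x" would be misreported as damaged.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of an incoming string.
enum class TextState {
    Plain,    // no escapes found, display as-is
    Escaped,  // contains "&#x..." escapes that still need decoding
    Damaged   // contains "?#x", i.e. part of an escape has been lost
};

// Hypothetical helper: "Damaged" takes priority, because a lost byte cannot
// be reconstructed from the string alone.
TextState classifyText(const std::string &s)
{
    if (s.find("?#x") != std::string::npos)
        return TextState::Damaged;
    if (s.find("&#x") != std::string::npos)
        return TextState::Escaped;
    return TextState::Plain;
}

// Usage sketch with strings of the kind quoted in this thread.
void classifyTextExamples()
{
    assert(classifyText("&#xb2;&#xb9;&#xd0;&#xc4;") == TextState::Escaped);
    assert(classifyText("&#xba;&#xa3;?#xce;") == TextState::Damaged);
    assert(classifyText("X coordinate") == TextState::Plain);
}
```

Strings reported as Damaged are probably best logged and shown to the user as a data problem, since the missing byte is not recoverable on the client side.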

lni
21st December 2013, 04:14
OK,

The user agrees it is their database problem. Thank you ChrisW67!