
Help needed to convert Unicode characters



hybrid_snyper
3rd July 2012, 12:25
Hi, it's been a while. New job, new challenge, and I'm getting stuck into Qt goodness for a living.

I am having some trouble with a QString that holds a JSON structure. It contains some French words, and the accented characters in those words are being replaced with an encoding that looks like /00e8. I have tried the obvious, calling toUtf8, and even converting the QString into a const char* and passing it to QString::fromUtf8, but qDebug still prints the character in its encoded Unicode form.

Am I doing something fundamentally wrong here?

Thanks

H

ChrisW67
4th July 2012, 00:26
This works fine on my Linux box.


#include <QCoreApplication>
#include <QDebug>

int main(int argc, char **argv)
{
    QCoreApplication a(argc, argv);
    QString test1 = QString::fromUtf8("\xc3\xa8"); // è encoded in UTF8
    QString test2(QChar(0x00e8)); // or directly as Unicode code point number
    qDebug() << test1 << test2;
    // outputs "è" "è"
    return 0;
}


You need to be very clear about exactly what is in the source data and what encoding it is in, as opposed to what only appears in the output because of something you have done or because of limitations of the output device.

If the JSON string actually contains the six characters '\', 'u', '0', '0', 'e', '8' then that is a very different thing again.
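To make that distinction concrete, here is a minimal sketch in plain C++ (no Qt needed). The function names escapedForm and utf8Form are hypothetical and exist only for illustration. The first returns the six ASCII characters of the escape sequence, the second returns the two UTF-8 bytes that encode è (U+00E8).

```cpp
#include <string>

// Hypothetical helper: the escaped form, i.e. the six ASCII characters
// '\', 'u', '0', '0', 'e', '8'. The doubled backslash in the literal
// produces one real backslash in the string.
std::string escapedForm()
{
    return "\\u00e8";
}

// Hypothetical helper: the encoded form, i.e. the two bytes 0xC3 0xA8,
// which are the UTF-8 encoding of U+00E8.
std::string utf8Form()
{
    return "\xc3\xa8";
}
```

Only the second form is something QString::fromUtf8 can turn into è. The first is ordinary ASCII text and passes through any encoding conversion unchanged.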

hybrid_snyper
4th July 2012, 10:26
Thanks. I thought I'd better give some context on the way the JSON is used. I have tried converting raw back to a const char* and using QString::fromUtf8, but that doesn't work either.

As you can see, the QString does contain \u00e9. I guess what is going on here is as if "\n" were treated as two separate bytes and printed as '\' 'n'.


struct JSONProcessor : public RPCRequest::Processor
{
    virtual QVariant process( const QString& raw )
    {
        // This is what raw looks like
        // "{\"data\":[{\"id\":\"535666\",\"readable\":true,\"title\":\"B\u00e9linda\"}]}";

        // this doesn't work
        //qDebug() << QString::fromUtf8(raw);

        return someProcessing( raw );
    }
};

ChrisW67
5th July 2012, 03:00
OK. Nothing in the standard Qt bag-o-tricks will do this for you. You need to disassemble and un-escape the escaped elements, e.g. \" or \u00e9, yourself using QRegExp, QString etc. (see http://www.ietf.org/rfc/rfc4627.txt) or use something like QJson (http://qjson.sourceforge.net) (LGPL) to do it for you. I would go the second route unless licences prohibit you.
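For the do-it-yourself route, the following is a rough sketch of un-escaping only the \uXXXX sequences, written in plain C++ so it does not depend on any particular Qt version. The names parseHex4, appendUtf8 and unescapeUnicode are hypothetical, not part of Qt or QJson. It assumes the input is a std::string, leaves every other escape (such as \" or \\) untouched, and combines surrogate pairs. It is not a JSON parser.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: parse exactly four hex digits at pos, or return -1.
static int parseHex4(const std::string& s, std::size_t pos)
{
    if (pos + 4 > s.size()) return -1;
    int value = 0;
    for (std::size_t i = pos; i < pos + 4; ++i) {
        const char c = s[i];
        int digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return -1;
        value = value * 16 + digit;
    }
    return value;
}

// Hypothetical helper: append code point cp to out, encoded as UTF-8.
static void appendUtf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Replace each \uXXXX escape in the input with the UTF-8 bytes of that
// code point. Any other backslash escape is copied through as-is, so an
// escaped backslash followed by 'u' is not misread as a Unicode escape.
std::string unescapeUnicode(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != '\\' || i + 1 >= in.size()) {
            out += in[i++];               // ordinary character
            continue;
        }
        const int hi = (in[i + 1] == 'u') ? parseHex4(in, i + 2) : -1;
        if (hi < 0) {                     // some other escape: keep both chars
            out += in[i];
            out += in[i + 1];
            i += 2;
            continue;
        }
        std::uint32_t cp = static_cast<std::uint32_t>(hi);
        i += 6;                           // skip the backslash, 'u', 4 digits
        // A high surrogate followed by an escaped low surrogate forms one
        // code point. A lone surrogate is encoded as-is (not valid UTF-8).
        if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < in.size()
            && in[i] == '\\' && in[i + 1] == 'u') {
            const int lo = parseHex4(in, i + 2);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((static_cast<std::uint32_t>(hi) - 0xD800) << 10)
                     + (static_cast<std::uint32_t>(lo) - 0xDC00);
                i += 6;
            }
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

In a Qt program the result could then be handed to QString::fromUtf8, for example QString::fromUtf8(unescapeUnicode(input).c_str()). A real JSON library is still the safer choice, since it also handles the other escapes and the document structure.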

hybrid_snyper
24th August 2012, 15:20
Just thought I would close this thread off and point to the solution that helped me. It turned out the API I was talking to was returning an escaped 'unicode' string representation of special characters and not actual Unicode characters.

Anyway, this helped: http://www.qtcentre.org/threads/37315-Qt-Unicode-Problems