
Thread: Qt Unicode Problems

  1. #1
    Qt Unicode Problems

    Hello folks,

    I've got a problem and I really don't know how to solve this.

    For instance, I can build a QString containing a German umlaut from a Unicode escape like this:

    QString str = "\u00fc"; // works: str contains "ü", the German umlaut
    But if I read a text file that contains the same Unicode characters, Qt does not convert them to German umlauts. Example:

    Qt Code:
        QFile f("test.txt");
        f.open(QIODevice::ReadOnly);
        QString str = f.readAll(); // implicit QByteArray-to-QString conversion; the file's encoding is never specified


    Can anyone explain why? I have tried the Codecs example with my text file, and again it does not replace the characters. I have also used the QTextStream and QTextEncoder classes with no success. Can anyone help?

  2. #2
    Re: Qt Unicode Problems

    From the QFile documentation: "By default, QFile assumes binary, i.e. it doesn't perform any conversion on the bytes stored in the file."
    So use QTextStream, and read the documentation of QTextStream::setCodec() carefully.
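
    To see why the codec matters, here is a minimal sketch in plain C++ (no Qt; the helper names are hypothetical) that decodes the same two bytes, c3 bc, once as Latin-1 and once as UTF-8. Latin-1 gives two code points, U+00C3 "Ã" and U+00BC "¼", while UTF-8 gives the single code point U+00FC "ü". In Qt the same choice is made by QTextStream::setCodec(), or by calling QString::fromLatin1() versus QString::fromUtf8() on the raw QByteArray.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Latin-1 (ISO 8859-1): every byte is directly a code point U+0000..U+00FF.
std::vector<std::uint32_t> decodeLatin1(const std::string &bytes)
{
    std::vector<std::uint32_t> out;
    for (unsigned char b : bytes)
        out.push_back(b);
    return out;
}

// Minimal UTF-8 decoder; assumes well-formed input and does no validation.
std::vector<std::uint32_t> decodeUtf8(const std::string &bytes)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = bytes[i];
        std::size_t extra;   // number of continuation bytes after the lead byte
        std::uint32_t cp;
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        assert(i + extra < bytes.size()); // a truncated sequence is not handled
        for (std::size_t k = 1; k <= extra; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

    The same bytes "\xD1\x8F" decode as UTF-8 to the Cyrillic letter я (U+044F), but as Latin-1 to the two unrelated characters "Ñ" and a control code.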

  3. #3
    Re: Qt Unicode Problems

    I have already tried this class, and I know that I have to use UTF-8, but it still does not convert the strings for me.

    Even if I run the Codecs example on my text file, it does not convert the string.

  4. #4
    Re: Qt Unicode Problems

    This code:
    Qt Code:
        #include <QtCore>
        #include <QDebug>

        int main(int argc, char *argv[])
        {
            QCoreApplication app(argc, argv);

            QString test = QString::fromStdWString(L"\u00fc");

            // Write U+00FC to the file, encoded as UTF-8
            QFile data("test.txt");
            if (data.open(QIODevice::WriteOnly)) {
                QTextStream out(&data);
                out.setCodec("UTF-8");
                out << test;
            }
            data.close();

            // Read it back, decoding the bytes as UTF-8
            if (data.open(QIODevice::ReadOnly)) {
                QTextStream in(&data);
                in.setCodec("UTF-8");
                QString str;
                in >> str;
                qDebug() << str << test;
            }
        }
    happily prints "ü" "ü". It writes this UTF-8 encoded file containing U+00FC:
    Qt Code:
        $ od -tx1 test.txt
        0000000 c3 bc
        0000002
        $ cat test.txt
        ü

  5. #5
    Re: Qt Unicode Problems

    Yes, that works for sure, but I still have a problem, because in my main application I get a QNetworkReply which contains the data (with \uXXXX).


    Qt Code:
        QString test = QString::fromStdWString(L"\u00fc");

    And this only works because your compiler does the byte conversion for you.
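
    The difference can be shown in plain C++ (variable names hypothetical): a \u escape inside a literal is resolved by the compiler, so the program holds one character, whereas text received at run time holds a backslash, a 'u' and four hex digits, six ASCII characters that nothing decodes automatically.

```cpp
#include <cassert>
#include <string>

// Resolved at compile time: a single wide character with the value 0xFC ("ü").
const std::wstring fromCompiler = L"\u00fc";

// What arrives over the network: the six characters \ u 0 0 f c.
const std::string fromNetwork = "\\u00fc";
```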

    So in my case I get a string which contains those \uXXXX Unicode escapes, and I convert them with a simple function I wrote:

    • Get the hex value
    • Use QByteArray::fromHex()
    • Replace \uXXXX with the byte array


    This method works for German umlauts and a few other characters, but it does not work for the Cyrillic alphabet, and now I really don't know what to do:

    Here is a snippet:

    Qt Code:
        QByteArray strReply = m_pSearchReply->readAll();

        // contains(), not indexOf(): indexOf() returns -1 (which converts to true) when "\u" is absent
        bool buni = strReply.contains("\\u");
        if (buni) {
            do {
                int idx = strReply.indexOf("\\u");

                QByteArray hex = strReply.mid(idx + 2, 4);
                hex.replace("0", "");
                hex = QByteArray::fromHex(hex);
                strReply.replace(idx, 6, hex);

            } while (strReply.contains("\\u"));
        }

        [...]
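
    One likely reason the snippet fails for Cyrillic: hex.replace("0", "") removes every zero digit, not just leading ones, and QByteArray::fromHex() then yields raw bytes that can only match the intended character when the code point is below U+0100 and the bytes are later read as Latin-1 (an assumption about the elided part of the code). This plain C++ sketch (hypothetical helper names) mimics only the zero-stripping step: "00fc" becomes "fc", the single byte 0xFC that Latin-1 reads as ü, but "0430" (Cyrillic а, U+0430) becomes "43", the byte 0x43, which is the ASCII letter C.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Mimics hex.replace("0", "") from the snippet above: removes ALL '0' digits.
std::string stripZeros(std::string hex)
{
    hex.erase(std::remove(hex.begin(), hex.end(), '0'), hex.end());
    return hex;
}

// The code point the escape actually denotes: all four hex digits, zeros included.
unsigned long intendedCodePoint(const std::string &hex4)
{
    assert(hex4.size() == 4);
    return std::stoul(hex4, nullptr, 16);
}
```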

    So, does anyone know what I am doing wrong with the Unicode character sets?

  6. #6
    Re: Qt Unicode Problems

    I don't think you are doing anything wrong. A UTF-encoded character is a sequence of binary bytes (two bytes for ü in UTF-8 or UTF-16). Something like \uXXXX is not a UTF-encoded character at all; it is an ASCII representation of one.
    Are you sure these codes are correct?

  7. #7
    Re: Qt Unicode Problems

    Quote Originally Posted by Sven
    Yes, that works for sure, but I still have a problem, because in my main application I get a QNetworkReply which contains the data (with \uXXXX)
    The literal \u00FC identifies a Unicode code point. Despite being expressed in hex, it does not necessarily dictate the actual bits that are used to represent that character.

    Read http://www.joelonsoftware.com/articles/Unicode.html for more

    The real question is, "How is the Unicode character encoded in the stream you are receiving?" The encoding dictates how you should try to get the character(s) into a QString. So, get your stream into a QByteArray and dump it in hex and see what you have.

    If you are receiving a 0xC3 byte followed by a 0xBC byte then it is UTF-8 encoded.
    If you are receiving a 0x00 byte followed by a 0xFC byte, or in the reverse order, then it is probably UTF-16/UCS-2 encoded.
    If you are receiving three 0x00 bytes followed by a 0xFC byte, in one of a few possible byte orderings, then it is probably UTF-32/UCS-4 encoded.
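
    A minimal plain C++ sketch (helper names hypothetical; BMP code points only, so UTF-16 needs no surrogate pair here) that produces the byte patterns listed above for U+00FC, in the same space-separated hex form you would compare against a dump of the received QByteArray (QByteArray::toHex() gives the hex without spaces):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// UTF-8 for code points below U+0800 (enough for Latin-1 and Cyrillic).
Bytes utf8(std::uint32_t cp)
{
    assert(cp < 0x800);
    if (cp < 0x80)
        return Bytes{static_cast<std::uint8_t>(cp)};
    return Bytes{static_cast<std::uint8_t>(0xC0 | (cp >> 6)),
                 static_cast<std::uint8_t>(0x80 | (cp & 0x3F))};
}

// UTF-16: one 16-bit code unit for a BMP code point, in either byte order.
Bytes utf16(std::uint32_t cp, bool bigEndian)
{
    assert(cp < 0x10000); // U+10000 and above would need a surrogate pair
    const auto hi = static_cast<std::uint8_t>(cp >> 8);
    const auto lo = static_cast<std::uint8_t>(cp & 0xFF);
    return bigEndian ? Bytes{hi, lo} : Bytes{lo, hi};
}

// UTF-32: the code point itself as four bytes, in either byte order.
Bytes utf32(std::uint32_t cp, bool bigEndian)
{
    Bytes be;
    for (int shift = 24; shift >= 0; shift -= 8)
        be.push_back(static_cast<std::uint8_t>(cp >> shift));
    return bigEndian ? be : Bytes(be.rbegin(), be.rend());
}

// Lowercase hex bytes separated by spaces, e.g. "c3 bc".
std::string hexString(const Bytes &bytes)
{
    std::string out;
    char buf[3];
    for (std::uint8_t b : bytes) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(b));
        if (!out.empty())
            out += ' ';
        out += buf;
    }
    return out;
}
```

    If the dump instead shows 5c 75 30 30 66 63, those are the six ASCII characters \u00fc, i.e. an escape sequence rather than an encoded character.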

  8. #8
    Re: Qt Unicode Problems

    I'll check it out later. But what can I do with the QString once I know which encoding the string has?

    Edit: OK, I have checked it, and it seems that there is no encoding, because I cannot find any UTF-x header in that QNetworkReply.
    Last edited by Sven; 27th December 2010 at 16:59.

  9. #9
    Re: Qt Unicode Problems

    OK, I finally found a solution to my problem. I wrote my own string class which does the Unicode conversion needed in my particular case.

    Anyway, I think it's not the fastest way and still not the best solution, so if anyone has a better method, let me know.

    Qt Code:
        class QscGrooveString : public QString
        {
        public:
            QscGrooveString(const QString &aData = QString())
                : QString(aData)
            {
                if (contains("\\u")) {
                    do {
                        int idx = indexOf("\\u");
                        QString strHex = mid(idx, 6);
                        strHex = strHex.replace("\\u", QString());
                        int nHex = strHex.toInt(0, 16);
                        replace(idx, 6, QChar(nHex));
                    } while (indexOf("\\u") != -1);
                }
            }
        };

  10. #10
    Re: Qt Unicode Problems

    By UTF-x header I assume that you mean the optional byte order mark (BOM), which is just another Unicode code point (U+FEFF). The absence of a BOM tells you nothing about the stream of bytes that follows, but you need to know about that stream in order to make sense of it. The presence of a BOM allows you to make a guess at the encoding, and at the byte order where that is relevant.
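
    A BOM check of that kind looks only at the first few bytes. As a minimal plain C++ sketch (hypothetical function name; QTextStream can do this kind of detection itself, see setAutoDetectUnicode()), note that finding no BOM yields "unknown" rather than any particular encoding:

```cpp
#include <cassert>
#include <string>

// Guess an encoding from a leading byte order mark (BOM).
// Returns "unknown" when there is no BOM, which says nothing about the encoding.
std::string encodingFromBom(const std::string &data)
{
    auto startsWith = [&data](const std::string &prefix) {
        return data.compare(0, prefix.size(), prefix) == 0;
    };
    // UTF-32LE (FF FE 00 00) begins with the UTF-16LE BOM, so test it first.
    if (startsWith(std::string("\xFF\xFE\x00\x00", 4))) return "UTF-32LE";
    if (startsWith(std::string("\x00\x00\xFE\xFF", 4))) return "UTF-32BE";
    if (startsWith("\xEF\xBB\xBF")) return "UTF-8";
    if (startsWith("\xFE\xFF")) return "UTF-16BE";
    if (startsWith("\xFF\xFE")) return "UTF-16LE";
    return "unknown";
}
```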

    You are receiving a string "\\u00FC" which is not a Unicode character in some UTF-encoded form, but a series of six characters. No wonder you couldn't make sense of it. What is sending this across the network?

    Your solution will work for the Unicode Basic Multilingual Plane, i.e. code points below U+10000, but not the supplementary code points. This might be adequate depending on purpose (it includes the Cyrillic code points).

    You could save a few string operations:
    Qt Code:
        class QscGrooveString : public QString
        {
        public:
            QscGrooveString(const QString &aData = QString())
                : QString(aData)
            {
                int idx = -1;
                while ( ( idx = indexOf("\\u") ) != -1 ) {
                    int nHex = mid(idx + 2, 4).toInt(0, 16);
                    replace(idx, 6, QChar(nHex));
                }
            }
        };
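
    The loop above still searches from the start of the string after every replacement, and a "\u" that is not followed by four hex digits becomes U+0000, because toInt() returns 0 when parsing fails. As a minimal sketch of a single left-to-right pass (plain C++, hypothetical function names, not Qt API), the decoder below writes UTF-8 directly, copies malformed escapes through unchanged, and combines surrogate-pair escapes such as \ud83d\ude00 into one supplementary code point (U+1F600), which is how JSON-style escaping writes characters above U+FFFF. It assumes a backslash in the input is never itself escaped (no \\ handling). In Qt the result could be turned into a QString with QString::fromUtf8().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Parse exactly four hex digits starting at s[pos]; false if they are not there.
static bool parseHex4(const std::string &s, std::size_t pos, std::uint32_t &value)
{
    if (pos + 4 > s.size())
        return false;
    value = 0;
    for (std::size_t i = pos; i < pos + 4; ++i) {
        const char c = s[i];
        std::uint32_t digit;
        if (c >= '0' && c <= '9')      digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return false;
        value = value * 16 + digit;
    }
    return true;
}

// True if s[pos] starts a well-formed \uXXXX escape; stores XXXX in unit.
static bool escapeAt(const std::string &s, std::size_t pos, std::uint32_t &unit)
{
    return pos + 1 < s.size() && s[pos] == '\\' && s[pos + 1] == 'u'
           && parseHex4(s, pos + 2, unit);
}

// Append one code point (U+0000..U+10FFFF, not a surrogate) as UTF-8.
static void appendUtf8(std::string &out, std::uint32_t cp)
{
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Replace \uXXXX escapes with UTF-8 in a single left-to-right pass.
std::string decodeEscapes(const std::string &in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        std::uint32_t unit = 0;
        if (!escapeAt(in, i, unit)) {
            out += in[i++];            // not an escape: copy the byte unchanged
            continue;
        }
        i += 6;
        std::uint32_t low = 0;
        if (unit >= 0xD800 && unit <= 0xDBFF && escapeAt(in, i, low)
            && low >= 0xDC00 && low <= 0xDFFF) {
            // High + low surrogate escapes: combine into one supplementary code point.
            unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
            i += 6;
        } else if (unit >= 0xD800 && unit <= 0xDFFF) {
            unit = 0xFFFD;             // lone surrogate: use U+FFFD REPLACEMENT CHARACTER
        }
        appendUtf8(out, unit);
    }
    return out;
}
```

    If the reply turns out to be JSON (the \uXXXX notation is JSON's string escape, but the thread never says what the service sends), a JSON parser decodes these escapes for you; Qt 5's QJsonDocument::fromJson() does so.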
