Thread: unicode

  1. #1
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    unicode

    Hello,
    I want to load a file containing Unicode text:

    The file:
    Qt Code:
    Никифорова Сэндэма Сохондо, Читинская обл.,
    сон Москва, Московская обл.,

    I made this function:
    Qt Code:
     1. void loadfromfile(QStringList *list, QString file, bool unicode)
     2. {
     3.     QFile File(file);
     4.     if (File.open(QFile::ReadOnly))
     5.     {
     6.         QTextStream in(&File);
     7.         if (unicode)
     8.             in.setCodec("UTF-16");
     9.         QString line;
    10.         do
    11.         {
    12.             line = in.readLine();
    13.             if (!line.isEmpty())
    14.                 list->append(line);
    15.         }
    16.         while (!line.isNull());
    17.         File.close();
    18.     }
    19. }

    Yet when I load the file I am still not seeing the right characters:

    Qt Code:
    loadfromfile(&list, filename, true);
    foreach (s, list)
    {
        msg.setText(s);
        msg.exec();
    }

    Thanks for help.

  2. #2
    Qt products
    Qt5
    Platforms
    Windows

    Re: unicode

    Run it in the debugger, set a breakpoint on line 12 of loadfromfile(), and inspect what "line" contains when the readLine() call returns. Is it what you expect?

  3. #3
    Qt products
    Qt3 Qt4 Qt5
    Platforms
    Unix/X11 Windows

    Re: unicode

    And make sure the file is indeed UTF-16 and not some other Unicode encoding, e.g. UTF-8 or UTF-32, or not a Unicode encoding at all.

    Cheers,
    _

  4. #4
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Re: unicode

    Thank you both.

    I suspect it does indeed have to do with the encoding. The problem is that I don't know what the encoding is: the file was generated by some software as export data, and I would like to import it.

    If you would like to try, I have attached "test.txt"; it has Russian characters in it.
    test.txt

    PS: simple code to demonstrate:
    Qt Code:
    QFile file("test.txt");
    file.open(QIODevice::ReadOnly | QIODevice::Text);
    QTextStream in(&file);
    QString s = in.readLine();
    msg.setText(s);
    msg.exec();
    file.close();
    Last edited by JeanC; 11th May 2015 at 11:57.

  5. #5
    Qt products
    Qt4 Qt5
    Platforms
    Unix/X11 Windows

    Re: unicode

    First part of the file in hex:
    Qt Code:
    0000000: cd e8 ea e8 f4 ee f0 ee e2 e0 20 d1 fd ed e4 fd  .......... .....
    0000010: ec e0 20 20 20 20 20 4a 41 4e 20 30 39 2c 20 31  ..     JAN 09, 1
    0000020: 39 35 36 30 36 3a 30 32 3a 34 38 20 41 4d 20 20  95606:02:48 AM
    0000030: 20 20 2d 30 39 3a 30 30 31 31 32 45 33 32 27 30    -09:00112E32'0
    0000040: 30 22 35 31 4e 34 39 27 30 30 22 d1 ee f5 ee ed  0"51N49'00".....
    0000050: e4 ee 2c 20 d7 e8 f2 e8 ed f1 ea e0 ff 20 ee e1  .., ......... ..
    0000060: eb 2e 2c 20 0d 0a f1 ee ed 20 20 20 20 20 20 20  .., .....
    0000070: 20 20 20 20 20 20 20 20 20 20 20 20 20 46 45 42               FEB
    The absence of a 16-bit byte-order mark (either 0xfe 0xff or 0xff 0xfe) suggests an 8-bit encoding.
    The space characters (0x20) and digits (0x30 - 0x39) point to either an 8-bit encoding or UTF-8, not UTF-16, which would have a zero byte paired with each of these bytes.
    The line endings, 0d 0a, indicate a Windows origin.
    The first few bytes are not a valid UTF-8 sequence.
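
The byte-level checks listed above can be sketched in plain C++ (no Qt). This is only a rough illustration, not a real encoding detector: the function names `hasUtf16Bom` and `looksLikeUtf8` are made up for this example, and the UTF-8 test is purely structural.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// True if the buffer starts with a UTF-16 byte-order mark (FE FF or FF FE).
bool hasUtf16Bom(const std::vector<std::uint8_t> &b)
{
    return b.size() >= 2 &&
           ((b[0] == 0xFE && b[1] == 0xFF) || (b[0] == 0xFF && b[1] == 0xFE));
}

// Structural UTF-8 check: every lead byte must be followed by the right
// number of continuation bytes (bit pattern 10xxxxxx). Overlong forms and
// surrogates are not rejected, so this is deliberately a loose test.
bool looksLikeUtf8(const std::vector<std::uint8_t> &b)
{
    std::size_t i = 0;
    while (i < b.size()) {
        const std::uint8_t c = b[i];
        std::size_t extra = 0;                       // continuation bytes expected
        if (c < 0x80)                extra = 0;      // plain ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;      // 2-byte sequence
        else if ((c & 0xF0) == 0xE0) extra = 2;      // 3-byte sequence
        else if ((c & 0xF8) == 0xF0) extra = 3;      // 4-byte sequence
        else return false;                           // stray continuation or invalid lead

        if (extra > b.size() - i - 1)
            return false;                            // sequence cut off at end of buffer
        for (std::size_t k = 1; k <= extra; ++k)
            if ((b[i + k] & 0xC0) != 0x80)
                return false;                        // not a continuation byte
        i += extra + 1;
    }
    return true;
}
```

Fed the first 16 bytes of the dump (cd e8 ea e8 ...), both functions return false: 0xcd announces a two-byte sequence, but 0xe8 is not a continuation byte. That is consistent with a legacy 8-bit code page rather than UTF-16 or UTF-8.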

    Looks like Windows CP1251 encoded Cyrillic to me.
    The first three bytes (cd e8 ea) decode to the first three characters: Ник

    The same file re-encoded as UTF-8: test-utf8.txt
    Last edited by ChrisW67; 11th May 2015 at 13:40.

  6. #6
    Qt products
    Qt4
    Platforms
    Unix/X11 Windows

    Re: unicode

    Thanks Chris (the thanks button is not working on the forum).
    With that file and setCodec("UTF-8") I get the correct text.
    The problem is that I have to deal with the file as it is. Is there any way I can read it at all?
    If not, shrug... Unicode can be such a mess sometimes.

  7. #7
    Qt products
    Qt4
    Platforms
    Unix/X11

    Re: unicode

    It is most likely cp1251 (as Chris writes). I was able to open the file in Kate and got sensible text. Checking the encoding of the text, I got cp1251.
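
On the question in post #6 (reading the file as it is), here is a minimal sketch of what decoding involves, assuming the file really is CP1251. It is plain C++ rather than Qt, the names `appendUtf8` and `cp1251ToUtf8` are hypothetical, and only ASCII, the letters А..я and Ё/ё are mapped; every other byte becomes '?'.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Append one code point as UTF-8. Only the 1- to 3-byte forms are handled,
// which covers everything CP1251 can produce.
static void appendUtf8(std::string &out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert CP1251 bytes to a UTF-8 string (partial mapping, see lead-in).
std::string cp1251ToUtf8(const std::vector<std::uint8_t> &bytes)
{
    std::string out;
    for (std::uint8_t b : bytes) {
        if (b < 0x80)
            appendUtf8(out, b);                    // ASCII is unchanged
        else if (b >= 0xC0)
            appendUtf8(out, 0x0410 + (b - 0xC0));  // 0xC0..0xFF -> U+0410..U+044F (А..я)
        else if (b == 0xA8)
            appendUtf8(out, 0x0401);               // Ё
        else if (b == 0xB8)
            appendUtf8(out, 0x0451);               // ё
        else
            out += '?';                            // not mapped in this sketch
    }
    return out;
}
```

With the first three bytes of the file, cd e8 ea, `cp1251ToUtf8` yields the UTF-8 for Ник, matching post #5. In Qt 4 or 5 itself, a likely simpler route (untested here) is to keep the original loader and call `in.setCodec("Windows-1251")` instead of `in.setCodec("UTF-16")`, since `QTextStream::setCodec()` accepts a codec name and Windows-1251 is among the names `QTextCodec` documents.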
