
Thread: Character encoding issues

  1. #1
    Join Date
    Dec 2007
    Location
    London
    Posts
    206
    Thanks
    40
    Qt products
    Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android

    Character encoding issues

    Hello,
    I am writing a Qt console application that parses an RSS source, searches it for a keyword taken from user input, and writes the matching items to a text file.

    I get the word to be searched from the console:
    Qt Code:
    cout << "Authorname ?\n";
    cin >> author;
    searchedAuthor = author; // searchedAuthor is a QString

    Parsing the RSS:
    Qt Code:
    if (currentTag == "dc:creator") {
        authorString += xml.text().toString();
    }

    Searching for my author:
    Qt Code:
    if (authorString.contains(searchedAuthor, Qt::CaseInsensitive)) {
        outputfile << titleString << " " << linkString << " " << descriptionString << authorString << "\n";
    }

    This code works well for English, but I need to handle Turkish, which has some extra characters such as "ğ, ş, ı, ö, ç".
    The problem is that if authorString contains any of those extra characters, QString::contains() does not find the author it should find. cout works without any problem (it displays the characters correctly). I don't know much about character encoding, but I've seen setCodecForTr in the docs:

    Qt Code:
    QTextCodec::setCodecForTr(QTextCodec::codecForName("eucTR"));

    I didn't know whether eucTR is installed, I just tried it, and it didn't work either. What can I do to make QString::contains() work properly for Turkish?

    Thanks in advance...

  2. #2
    Join Date
    Dec 2006
    Posts
    849
    Thanks
    6
    Thanked 163 Times in 151 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11

    Re: Character encoding issues

    You have to make sure that you read the data with the codec set to whatever encoding the data is actually in. If the data is Turkish text, you probably need something like QTextCodec::setCodecForLocale(QTextCodec::codecForName("eucTR"));

    Note that it probably is better not to set that globally, but only for your input file. See QTextStream::setCodec().


    (Note that setCodecForTr sets the codec to be used for translations (the tr("...") calls in Qt code), the "Tr" has nothing to do with Turkish here ;-)

    HTH

  3. #3
    Join Date
    Dec 2007
    Location
    London
    Posts
    206
    Thanks
    40
    Qt products
    Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android

    Re: Character encoding issues

    Hello,
    Do you mean something like this:

    Qt Code:
    cout << "Author name ?\n";
    QTextCodec::setCodecForLocale(QTextCodec::codecForName("eucTR"));
    cin >> author;
    searchedAuthor = author; // searchedAuthor is a QString

    I tried that, but the result didn't change.
    And how can I find out whether "eucTR" exists? (Is there a list of installed codecs?)

    the "Tr" has nothing to do with Turkish here ;-)
    yes you are right ...

  4. #4
    Join Date
    Dec 2006
    Posts
    849
    Thanks
    6
    Thanked 163 Times in 151 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11

    Re: Character encoding issues

    No, I'd drop cin here.
    Try something like
    Qt Code:
    QTextStream in(stdin);
    in.setCodec("eucTR"); // setCodec() selects the input encoding; setLocale() expects a QLocale, not a codec name
    in >> searchedAuthor;

    Alternatively:
    Qt Code:
    cin >> author;
    searchedAuthor = QString::fromAscii(author.c_str()); // after setting the locale
    // ... if (see docs) QTextCodec::setCodecForCStrings() has been set
    HTH

  5. The following user says thank you to caduel for this useful post:

    yagabey (15th December 2008)

  6. #5
    Join Date
    Sep 2006
    Posts
    6
    Thanked 1 Time in 1 Post
    Qt products
    Qt3 Qt4
    Platforms
    Unix/X11 Windows

    Re: Character encoding issues

    First of all:
    Qt always uses Unicode internally, so Turkish, German, Chinese letters etc. can all be represented.

    On Microsoft Windows the console uses a country-specific (OEM) code page,
    e.g.:
    - Codepage 850 in Western Europe: http://de.wikipedia.org/wiki/Codepage_850
    - Codepage 857 for Turkish: http://de.wikipedia.org/wiki/Codepage_857

    => So you have to convert the input from your specific encoding to Unicode.

    Let's have a look at:
    http://doc.trolltech.com/4.4/qtextcodec.html
    There you can read:
    The supported encodings are:
    [...]
    # IBM 850
    # IBM 866
    # IBM 874
    [...]
    # ISO 8859-1 to 10

    So, unfortunately, the encoding you need, "IBM 857", is missing.
    I don't know if it works, but try ISO 8859-9:
    http://de.wikipedia.org/wiki/ISO_8859-9

    A simple test program goes like this (look at my comments in the code!):

    Qt Code:
    #include <QtCore>
    #include <QtGui>

    #include <cstdio>
    #include <iostream>
    using namespace std;

    int main(int argc, char** argv) {
        QApplication app(argc, argv);

        char author[100] = {0};

        cout << "Authorname ?\n";
        cin >> author;

        // Just to get an idea of how the character codes look internally.
        // Please enter some of your special characters. In German I always use "äöü".
        // The loop stops at the terminating '\0' instead of dumping all 100 bytes.
        for (int i = 0; i < 100 && author[i] != '\0'; i++) {
            int c = author[i];

            if (c < 0) c += 256;
            printf("%d ", c);
        }

        QByteArray encodedString = author;
        // QTextCodec *codec = QTextCodec::codecForName("IBM 850"); // western europe
        // QTextCodec *codec = QTextCodec::codecForName("IBM 857"); // turkish but will not work :-(
        QTextCodec *codec = QTextCodec::codecForName("ISO 8859-9"); // try it

        if (!codec) {
            printf("Codec not supported.\n");

            return 0;
        }

        QString searchedAuthor = codec->toUnicode(encodedString);

        // This message box lets you check whether the encoding is interpreted correctly.
        // Output on the console does not tell you anything, because the console does not use Unicode.
        QMessageBox::information(NULL, "Ausgabe", searchedAuthor);

        return 0;
    }

    Have fun, Gérôme
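
    A side note on the byte dump in the test program above: the printed values already hint at the console encoding. For the letter "İ", UTF-8 input shows up as the two bytes 196 176, code page 857 as the single byte 152, and ISO 8859-9 as the single byte 221. Below is a minimal sketch of the same dump in plain C++ without Qt; dumpBytes is a hypothetical helper name, not something from the program above or from Qt.

```cpp
#include <string>

// Hypothetical helper (an assumption for illustration, not a Qt function):
// returns the bytes of `raw` as unsigned decimal values separated by spaces.
std::string dumpBytes(const std::string &raw)
{
    std::string out;
    for (std::string::size_type i = 0; i < raw.size(); ++i) {
        // Go through unsigned char so bytes above 127 do not come out
        // negative on platforms where plain char is signed.
        const unsigned value = static_cast<unsigned char>(raw[i]);
        if (!out.empty())
            out += ' ';
        out += std::to_string(value);
    }
    return out;
}
```

    Calling dumpBytes(author) right after reading from cin and comparing the numbers with the three values above is one way to decide which codec to ask for.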

  7. #6
    Join Date
    Dec 2007
    Location
    London
    Posts
    206
    Thanks
    40
    Qt products
    Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android

    Re: Character encoding issues

    I couldn't get the "c_str()" call to work, although I added the <cstring> and <string> headers:

    Qt Code:
    searchedAuthor = QString::fromAscii(author.c_str());

    The codec check passed (the codec is supported):
    Qt Code:
    if (!codec) {
        printf("Codec not supported.\n");
        return 0;
    }

    I also tried:
    Qt Code:
    q = q->codecForName("ISO-8859-9");
    QTextCodec::setCodecForCStrings(q);

    and

    Qt Code:
    QTextCodec *codec = QTextCodec::codecForName("ISO 8859-9");

    In the message box the characters are still wrong (it shows "Ä°" instead of "İ"):
    Qt Code:
    QMessageBox::information(NULL, "Ausgabe", searchedAuthor);

    What else should I do?

  8. #7
    Join Date
    Dec 2006
    Posts
    849
    Thanks
    6
    Thanked 163 Times in 151 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11

    Re: Character encoding issues

    I assumed that author is a std::string; if it is not (maybe it's just a char[20] or so...), just drop the .c_str() call.

  9. #8
    Join Date
    Dec 2007
    Location
    London
    Posts
    206
    Thanks
    40
    Qt products
    Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android

    Re: Character encoding issues

    Qt Code:
    searchedAuthor = QString::fromAscii(author);

    That didn't fix the problem.

  10. #9
    Join Date
    Dec 2006
    Posts
    849
    Thanks
    6
    Thanked 163 Times in 151 Posts
    Qt products
    Qt4
    Platforms
    Unix/X11

    Re: Character encoding issues

    Please show us the (complete) code you are using.

  11. #10
    Join Date
    Sep 2006
    Posts
    6
    Thanked 1 Time in 1 Post
    Qt products
    Qt3 Qt4
    Platforms
    Unix/X11 Windows

    Re: Character encoding issues

    Quote Originally Posted by yagabey

    Qt Code:
    QTextCodec *codec=QTextCodec::codecForName("ISO 8859-9");

    in the message box, characters are not correct again..(it shows "Ä°" instead of "İ" )
    Qt Code:
    QMessageBox::information(NULL, "Ausgabe", searchedAuthor);

    what else should i do?

    So the right codec is definitely Codepage 857, which is not supported by Qt :-(

    You have to implement the conversion from CP 857 to ISO-8859-9 yourself.

    Just walk through the byte array and convert the chars > 127:

    Qt Code:
    for (int i = 0; i < auth_len; i++) {
        int c = author[i];

        if (c < 0) {
            c += 256;

            switch (c) {
            // e.g. the g with a "bow" on it
            case 167: author[i] = 240; break; // or maybe 240-256=-16, try it, I've got no Turkish Windows to test it!
            // I think you don't have to do it for all 128 chars, only for the few
            // Turkish special chars you need
            }
        }
    }

    G.
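
    The switch above could be filled in along the following lines. This is only a sketch, assuming the console really delivers CP 857 bytes. It covers just the twelve Turkish letters and passes every other byte through unchanged, which is wrong for the remaining non-ASCII characters of CP 857. cp857ToIso88599 and convertCp857InPlace are hypothetical names chosen for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: maps one CP 857 byte to its ISO 8859-9 equivalent.
// Only the Turkish letters are remapped; all other bytes are returned as-is.
unsigned char cp857ToIso88599(unsigned char c)
{
    switch (c) {
    case 0x80: return 0xC7; // Ç
    case 0x81: return 0xFC; // ü
    case 0x87: return 0xE7; // ç
    case 0x8D: return 0xFD; // ı (dotless i)
    case 0x94: return 0xF6; // ö
    case 0x98: return 0xDD; // İ (dotted capital I)
    case 0x99: return 0xD6; // Ö
    case 0x9A: return 0xDC; // Ü
    case 0x9E: return 0xDE; // Ş
    case 0x9F: return 0xFE; // ş
    case 0xA6: return 0xD0; // Ğ
    case 0xA7: return 0xF0; // ğ (167 -> 240, as in the post above)
    default:   return c;
    }
}

// Converts a buffer of `length` bytes in place, one byte at a time.
void convertCp857InPlace(char *text, std::size_t length)
{
    for (std::size_t i = 0; i < length; ++i) {
        const unsigned char in = static_cast<unsigned char>(text[i]);
        text[i] = static_cast<char>(cp857ToIso88599(in));
    }
}
```

    Working on unsigned char avoids the "c < 0, add 256" step, and the converted buffer can then be handed to the ISO 8859-9 codec as before.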

  12. #11
    Join Date
    Dec 2007
    Location
    London
    Posts
    206
    Thanks
    40
    Qt products
    Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android

    Re: Character encoding issues

    Here is a summary of the code:


    Qt Code:
    #include <QWidget>
    #include <QHttp>
    #include <QUrl>
    #include <QBuffer>
    #include <QFile>
    #include <QTextStream>
    #include <QXmlStreamReader>
    #include <QByteArray>


    // Derives from QWidget to match the constructor below; Q_OBJECT needs a QObject base.
    class ColumnListing : public QWidget
    {
        Q_OBJECT
    public:
        ColumnListing(QWidget *parent = 0);

    public slots:
        void fetch();
        void readData(const QHttpResponseHeader &);

    private:
        void parseXml();

        QXmlStreamReader xml;
        QString currentTag;
        QString linkString;
        QString titleString;
        QString descriptionString;
        QString authorString;
        QString urltext;
        QString inputNews;
        QString searchedAuthor;
        QFile file;
        QFile *htmlFile;
        QHttp httpInstance;
        int subconnectionId;
        QUrl urlColumnToGo;

        QHttp http;
        int connectionId;

    };

    Constructor part:
    Qt Code:
    ColumnListing::ColumnListing(QWidget *parent)
        : QWidget(parent)
    {

        connect(&http, SIGNAL(readyRead(const QHttpResponseHeader &)),
                this, SLOT(readData(const QHttpResponseHeader &)));

        // Stack buffers instead of new char[100], so nothing is leaked.
        char input[100];
        char author[100];

        cout << "Newspaper ?\n";
        cin >> input;
        inputNews = input; // inputNews is a QString member

        cout << "Author ?\n";
        cin >> author;
        searchedAuthor = author; // searchedAuthor is a QString member

        if (inputNews == "milliyet") {
            urltext.append("http://www.milliyet.com.tr/D/rss/rss/RssY.xml?ver=51");
        }
        else if (inputNews == "sabah") {
            urltext.append("http://www.sabah.com.tr/rss/yazarlar.xml");
        }
        else if (inputNews == "radikal") {
            urltext.append("http://www.radikal.com.tr/radikal_yazar.xml");
        }
        /*** More News sites here....***/

        file.setFileName("output.txt"); // file is a QFile member
        if (!file.open(QFile::ReadWrite | QFile::Truncate))
            return;

        htmlFile = new QFile("htmloutput.html"); // htmlFile is a QFile* member
        if (!htmlFile->open(QFile::ReadWrite | QFile::Truncate))
            return;
    }

    Fetching:
    Qt Code:
    void ColumnListing::fetch()
    {
        xml.clear();
        QUrl url(urltext);
        http.setHost(url.host());
        connectionId = http.get(url.path());
    }

    Parsing the RSS document:
    Qt Code:
    void ColumnListing::parseXml()
    {
        QTextStream inputText(&file);

        while (!xml.atEnd()) {
            xml.readNext();
            if (xml.isStartElement()) {
                if (xml.name() == "item")
                    linkString = xml.attributes().value("rss:about").toString();
                currentTag = xml.name().toString();
            } else if (xml.isEndElement()) {
                if (xml.name() == "item") {

                    if (authorString.contains(searchedAuthor, Qt::CaseInsensitive)) {
                        inputText << titleString << " " << linkString << " " << descriptionString << authorString << "\n";

                        QUrl url(linkString);

                        httpInstance.setHost(url.host());
                        subconnectionId = http.get(url.path(), htmlFile); // write the author page into html file
                    }

                    titleString.clear();
                    linkString.clear();
                    descriptionString.clear();
                    authorString.clear();

                }

            } else if (xml.isCharacters() && !xml.isWhitespace()) {
                if (currentTag == "title") {
                    titleString += xml.text().toString();
                }
                else if (currentTag == "link") {
                    linkString += xml.text().toString();
                }
                else if (currentTag == "description") {
                    descriptionString += xml.text().toString();
                }
                else if (currentTag == "dc:creator") {
                    authorString += xml.text().toString();
                }
            }
        }
        if (xml.error() && xml.error() != QXmlStreamReader::PrematureEndOfDocumentError) {
            qWarning() << "XML ERROR:" << xml.lineNumber() << ": " << xml.errorString();
            http.abort();
        }
    }

    main.cpp
    Qt Code:
    int main(int argc, char **argv)
    {
        QApplication app(argc, argv);

        // q was not declared in the posted summary; declared here so the snippet compiles
        QTextCodec *q = QTextCodec::codecForName("ISO-8859-9");
        QTextCodec::setCodecForCStrings(q);

        ColumnListing *columnlisting = new ColumnListing;
        columnlisting->fetch();
        return app.exec();
    }

  13. #12
    Join Date
    Dec 2007
    Location
    London
    Posts
    206
    Thanks
    40
    Qt products
    Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android

    Re: Character encoding issues

    OK, at last I made it work.

    Instead of:
    Qt Code:
    searchedAuthor = QString::fromAscii(author);

    I used:
    Qt Code:
    searchedAuthor = QString::fromUtf8(author);

    and now everything works perfectly, thank you all...
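
    A likely explanation of why fromUtf8 helps, and of the "Ä°" seen earlier in the thread: the console was delivering UTF-8. In UTF-8, "İ" (U+0130) is the two bytes 0xC4 0xB0, and a single-byte codec such as ISO 8859-9 turns each of them into a character of its own, "Ä" (0xC4) and "°" (0xB0). The sketch below reproduces this in plain C++ without Qt. decodeUtf8 and decodeLatin1 are hypothetical helpers, the UTF-8 decoder does no validation, and Latin-1 stands in for ISO 8859-9, which has the same characters at these two byte values.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into UTF-32 code points.
// Simplified sketch: assumes well-formed input and performs no validation.
std::u32string decodeUtf8(const std::string &bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t extra; // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low six bits.
        for (; extra > 0 && i < bytes.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        out += cp;
    }
    return out;
}

// Hypothetical helper: decodes bytes as Latin-1, where every byte is the
// code point of the same value. This is what a single-byte codec does.
std::u32string decodeLatin1(const std::string &bytes)
{
    std::u32string out;
    for (char ch : bytes)
        out += static_cast<char32_t>(static_cast<unsigned char>(ch));
    return out;
}
```

    With the bytes 0xC4 0xB0 as input, decodeUtf8 yields the single code point U+0130, while decodeLatin1 yields U+00C4 followed by U+00B0. Neither of those two matches the "İ" in the Unicode text coming from the RSS feed, which would explain why contains() found nothing.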
