View Full Version : output UTF?



mikro
18th May 2006, 22:27
Hi,
I have written an app that can export the content of an SQLite database to an SQL file and also read such a file back in.
All the text in the database is in German, which has umlauts (special characters like ä), and I am running into problems when reading the SQL file. Qt seems to expect the file to be UTF. If I convert the file to UTF with my text editor it works; otherwise I see garbage instead of my umlauts.
I am writing the file with:
-----
QTextStream out(&file);
out.setGenerateByteOrderMark (true);
out << content;
------
(The second line is new: I am trying to make sure I write UTF, but it doesn't work ;( )

I read the file with:
-------
if (!file.open(QIODevice::ReadOnly | QIODevice::Text)) {
QMessageBox::critical(this,tr("Datei konnte nicht geöffnet werden"),tr("Das Öffnen der Datei ist fehlgeschlagen:\n")+fileName);
return;
}
QTextStream in(&file);
--------
From my experiment with setGenerateByteOrderMark it seems that the text files I produce now look like UTF but aren't. So either I need a second QTextStream method to make it really produce UTF, or I need some way to read ANSI text.

jacek
18th May 2006, 23:02
There are at least three UTF encodings --- which one do you want to use?

Try:
QTextStream out( &file );
out.setCodec( "UTF-8" );
out << content;
...
QTextStream in( &file );
in.setCodec( "UTF-8" );
...
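
To make the "which UTF?" question concrete, here is a minimal sketch in plain standard C++ (no Qt) of how one character such as ä (U+00E4) comes out as different bytes in UTF-8, UTF-16LE and UTF-16BE. The function names encodeUtf8 and encodeUtf16 are made up for this illustration, and the sketch assumes code points in the Basic Multilingual Plane only (no surrogate pairs). It is not how QTextStream works internally.

```cpp
#include <string>

// Hypothetical helper: encode one BMP code point (U+0000..U+FFFF,
// surrogates not handled) as UTF-8 bytes.
std::string encodeUtf8(char16_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // ASCII range: one byte, unchanged.
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Hypothetical helper: encode one BMP code point as UTF-16,
// either big-endian or little-endian, without a byte order mark.
std::string encodeUtf16(char16_t cp, bool bigEndian)
{
    const char hi = static_cast<char>(cp >> 8);
    const char lo = static_cast<char>(cp & 0xFF);
    return bigEndian ? std::string{hi, lo} : std::string{lo, hi};
}
```

With these helpers, ä becomes the two bytes C3 A4 in UTF-8, E4 00 in UTF-16LE and 00 E4 in UTF-16BE, which is why the reader and the writer have to agree on the codec.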

wysota
18th May 2006, 23:31
There are two issues here.

First, SQLite3 accepts UTF-8 and UTF-16, but the data has to be inserted through the proper methods (for example the QSQLITE driver, or the sqlite client library that QSQLITE itself uses).

Second, the sqlite console doesn't support UTF. It may sound crazy, but it doesn't convert from what we call a local encoding (like latin1 or latin2) to UTF-8 when it passes data to the database. So if you used the sqlite console (the sqlite3 application) to enter data, you'll get garbage when you read it back using proper methods (that is, using the client library).

Of course the same goes for reading data from the database. So follow what Jacek has written, but also make sure you access the database properly, as it may not contain UTF-8 characters at all.
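
As an illustration of the missing conversion step described above, the following is a minimal sketch in plain standard C++ (no Qt, no sqlite) of turning latin1 bytes into UTF-8 before they are stored. The name latin1ToUtf8 is hypothetical, and the sketch assumes the input really is ISO-8859-1; other local encodings such as latin2 would need a real conversion table or a library.

```cpp
#include <string>

// Hypothetical helper: convert ISO-8859-1 (latin1) bytes to UTF-8.
// Every latin1 byte maps directly to the Unicode code point with
// the same value, so bytes >= 0x80 always become two UTF-8 bytes.
std::string latin1ToUtf8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size() * 2);  // worst case: every byte doubles
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            // ASCII is identical in latin1 and UTF-8.
            utf8.push_back(static_cast<char>(c));
        } else {
            // Two bytes: 110000xx 10xxxxxx
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return utf8;
}
```

If the byte E4 (latin1 ä) is stored without such a conversion and later read as UTF-8, it is not a valid sequence on its own, which is one way the garbage shows up.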

mikro
18th May 2006, 23:38
Thank you both. Meanwhile I realized that setGenerateByteOrderMark on its own seems to have worked: there were some bad characters left in my database, so it wasn't my app's fault that they were exported.
Nevertheless, I am sure it makes sense to use setCodec for output. But what about the input? How can I know which format the file is in? As long as the file has been generated by my app it's easy, but of course I also want to be able to use my import filter for inserting handwritten data.
Shouldn't Qt automatically convert everything to UTF?

jacek
19th May 2006, 00:00
"how can i know which format the file is in"
Let the user specify it.


"shouldn't Qt automatically convert everything to UTF?"
Internally Qt does use Unicode, but when you read something you must tell it which encoding to use. Otherwise it will assume that the input is encoded using QTextCodec::codecForLocale(), unless it detects that the input is UTF-16.
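
For an import filter that has to accept handwritten files, one common heuristic is to look for a byte order mark first, then check whether the bytes are at least structurally valid UTF-8, and otherwise ask the user. The following is a minimal sketch of that idea in plain standard C++ (no Qt). The names Encoding, isValidUtf8 and guessEncoding are made up for this illustration, the UTF-8 check is simplified (it does not reject every overlong or surrogate sequence), and a UTF-32 byte order mark is not handled. It is a guess, not a guarantee: a latin1 file can in rare cases pass the UTF-8 check.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for the sketch below.
enum class Encoding { Utf8, Utf16LE, Utf16BE, Unknown };

// Simplified structural check: every lead byte must be followed by
// the right number of continuation bytes (10xxxxxx).
bool isValidUtf8(const std::string& data)
{
    std::size_t i = 0;
    while (i < data.size()) {
        const unsigned char c = static_cast<unsigned char>(data[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        if (c < 0x80)                     extra = 0;
        else if (c >= 0xC2 && c <= 0xDF)  extra = 1;
        else if (c >= 0xE0 && c <= 0xEF)  extra = 2;
        else if (c >= 0xF0 && c <= 0xF4)  extra = 3;
        else return false;                // not a legal lead byte
        if (i + extra >= data.size()) return false;  // sequence is cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(data[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Guess the encoding from the raw bytes of a file.
// Unknown means: fall back to asking the user.
Encoding guessEncoding(const std::string& data)
{
    if (data.compare(0, 3, "\xEF\xBB\xBF") == 0) return Encoding::Utf8;
    if (data.compare(0, 2, "\xFF\xFE") == 0)     return Encoding::Utf16LE;
    if (data.compare(0, 2, "\xFE\xFF") == 0)     return Encoding::Utf16BE;
    // No byte order mark: pure ASCII also counts as valid UTF-8.
    if (isValidUtf8(data)) return Encoding::Utf8;
    return Encoding::Unknown;
}
```

In an importer, the Unknown case would be the point where a dialog lets the user pick the codec, with the locale codec as a sensible default.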