PaladinKnight
5th April 2010, 17:51
Hi!
There's a program I want to modify that has some problems parsing an XML file that uses UTF-8.
The content of some fields in the XML file is written to a flat file through a QTextStream (on which I tried setting the encoding), but I can see that characters outside normal 7-bit ASCII are not processed correctly.
For example, a UTF-8 character that takes two bytes ends up in the flat file taking 4-5 bytes.
My guess is that when the file is read, Qt (the code in question uses a QDomDocument) assumes the file is in ISO-8859-1 (or something like that) and reads each two-byte UTF-8 character as two separate characters. When it then dumps the text into the flat file, it stores each of those two characters as its own multi-byte UTF-8 sequence.
The end result is that the text strings end up being corrupted.
Is there a way to tell a QDomDocument which character set to use? Is it supposed to detect it by itself from the XML header, or is there something else I need to do? The correct character set (UTF-8) is declared in the XML file header.
Thank you!
Nick