Unicode/ASCII characters in QTextStream



yren
23rd November 2009, 17:31
When I read file contents into a QString like the following:

QFile myFile("test.u");
myFile.open(QIODevice::ReadOnly);

QTextStream ts(&myFile);
QString strContent = ts.readAll();

Is QTextStream smart enough to determine if the file content is Unicode or ASCII? Since QString holds data internally in Unicode format, I would get a totally different string if it cannot distinguish.

Thanks!

squidge
23rd November 2009, 18:37
From the docs:

void QTextStream::setAutoDetectUnicode ( bool enabled )

If enabled is true, QTextStream will attempt to detect Unicode encoding by peeking into the stream data to see if it can find the UTF-16 or UTF-32 BOM (Byte Order Mark). If this mark is found, QTextStream will replace the current codec with the UTF codec.

yren
23rd November 2009, 19:09
Thanks! How does the auto-detect know whether it is a Unicode file? Does a Unicode file have some kind of header?

squidge
23rd November 2009, 19:25
It has an optional header. If the header is present and the file is encoded as UTF-16, for example, the file will start with the bytes 0xFF,0xFE or 0xFE,0xFF depending on byte order. The same happens for UTF-32, although there are 4 bytes instead of 2.
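
To make the byte patterns concrete, here is a minimal sketch of a BOM check in plain C++. It is not Qt's implementation; the function name detectBom and the returned labels are made up for illustration. The UTF-32 marks used are the standard ones (FF FE 00 00 for little-endian, 00 00 FE FF for big-endian).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt): names the encoding implied by a
// leading byte order mark, or returns "unknown" if no BOM is found.
std::string detectBom(const unsigned char* data, std::size_t size)
{
    // Check the 4-byte UTF-32 marks first: the UTF-32 LE mark (FF FE 00 00)
    // begins with the same two bytes as the UTF-16 LE mark (FF FE).
    if (size >= 4) {
        if (data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00)
            return "UTF-32LE";
        if (data[0] == 0x00 && data[1] == 0x00 && data[2] == 0xFE && data[3] == 0xFF)
            return "UTF-32BE";
    }
    if (size >= 2) {
        if (data[0] == 0xFF && data[1] == 0xFE)
            return "UTF-16LE";
        if (data[0] == 0xFE && data[1] == 0xFF)
            return "UTF-16BE";
    }
    return "unknown";
}
```

One caveat with this ordering: a UTF-16 little-endian file whose first character after the BOM is U+0000 would be reported as UTF-32LE. That should be rare in text files, but it shows the check is a guess rather than a proof.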

If the header is missing, I don't see how it could be recognised easily. If you know it's not a binary file, then any byte >127 or certain characters below 32 could be taken as a sign that the file is Unicode rather than plain ASCII.
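
A rough sketch of that heuristic in plain C++ follows. The function name looksLikePlainAscii is hypothetical, and the choice of which characters below 32 to tolerate (tab, line feed, carriage return) is an assumption; adjust it to your files. A false result only says the data is not plain ASCII. It does not say which encoding is actually in use.

```cpp
#include <cstddef>

// Hypothetical helper: returns true if every byte could belong to a
// plain ASCII text file. Assumes tab, LF and CR are the only control
// characters that legitimately appear in such a file.
bool looksLikePlainAscii(const unsigned char* data, std::size_t size)
{
    for (std::size_t i = 0; i < size; ++i) {
        const unsigned char c = data[i];
        if (c > 127)
            return false;  // outside the 7-bit ASCII range
        if (c < 32 && c != '\t' && c != '\n' && c != '\r')
            return false;  // unexpected control character, e.g. a 0x00 byte
    }
    return true;  // note: an empty buffer also counts as ASCII here
}
```

For example, English text saved as UTF-16 without a BOM has a 0x00 byte in every character, so this check would reject it on the first character.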