UTF-16 files



jcr
6th September 2006, 21:35
Hello,
I am processing files that are usually "ASCII text, with CRLF line terminators", but once in a while I get the data (only numbers and simple English characters) in "Little-endian UTF-16 Unicode character data, with CRLF line terminators" files. How can I recognize those UTF-16 files and convert them into ASCII text files?
Many thanks

jacek
6th September 2006, 21:39
ASCII text shouldn't contain 0x00 bytes, while in a UTF-16 file that holds only ASCII-range characters around 50% of the bytes will be 0x00. You can also ask the user.
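
The zero-byte heuristic above could be sketched roughly as follows. This is only an illustration in plain standard C++ (no Qt), not code from the thread: the names `Encoding` and `guessEncoding` are hypothetical, the byte order mark check is an extra step added here, and the 40% threshold is an assumed cut-off chosen to sit a little below the "around 50%" figure.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
enum class Encoding { Ascii, Utf16LE, Utf16BE, Unknown };

// Hypothetical helper: guesses the encoding of raw file bytes.
Encoding guessEncoding(const std::string &bytes)
{
    // A byte order mark, when present, is the strongest hint.
    if (bytes.size() >= 2) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[0]);
        const unsigned char b1 = static_cast<unsigned char>(bytes[1]);
        if (b0 == 0xFF && b1 == 0xFE) return Encoding::Utf16LE;
        if (b0 == 0xFE && b1 == 0xFF) return Encoding::Utf16BE;
    }

    // Count 0x00 bytes at even and odd offsets, and bytes outside ASCII.
    std::size_t zerosEven = 0, zerosOdd = 0, highBytes = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if (c == 0) {
            if (i % 2 == 0) ++zerosEven; else ++zerosOdd;
        } else if (c >= 0x80) {
            ++highBytes;
        }
    }

    // No zero bytes at all: plain ASCII unless bytes >= 0x80 are present.
    if (zerosEven + zerosOdd == 0)
        return highBytes == 0 ? Encoding::Ascii : Encoding::Unknown;

    // ASCII-range text in UTF-16 has its zero bytes at odd offsets
    // (little-endian) or at even offsets (big-endian).
    // Assumed threshold: at least 40% of all bytes are zero.
    if (zerosEven == 0 && zerosOdd * 10 >= bytes.size() * 4)
        return Encoding::Utf16LE;
    if (zerosOdd == 0 && zerosEven * 10 >= bytes.size() * 4)
        return Encoding::Utf16BE;
    return Encoding::Unknown;
}
```

Looking at which offsets hold the zero bytes, rather than only at their total share, also tells little-endian from big-endian data when no byte order mark is present.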

jcr
7th September 2006, 00:12
Hi,
Well, I did it the way shown below. It works fine for my purpose and for the specific files I am dealing with, but I am not sure how general it is:


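// Read the whole file into memory.
QFile in(QString::fromStdString("datazip/temp/" + fileName));
in.open(QIODevice::ReadOnly);
QByteArray bufIn, bufOut;
bufIn = in.readAll();
in.close();
// A little-endian UTF-16 file starts with the bytes 0xFF 0xFE; 0xFF reads as
// a negative value on platforms where char is signed.
if(!bufIn.isEmpty() && (int)bufIn.at(0) < 0)
{
    // Keep only the positive bytes: this drops the two leading bytes
    // and every 0x00 byte.
    for(int i(0); i<bufIn.size(); ++i)
        if((int) bufIn.at(i) > 0)
            bufOut.append(bufIn.at(i));
    // Overwrite the original file with the filtered data.
    QFile out(QString::fromStdString("datazip/temp/" + fileName));
    out.open(QIODevice::WriteOnly);
    out.write(bufOut);
    out.close();
}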

Thanks!

jacek
7th September 2006, 13:43
jcr wrote: "Well, I did it the way below and it works fine for my purpose and for the specific files I am dealing with but I am not sure how general it is:"
If those files contain only ASCII characters and are merely encoded in UTF-16, there shouldn't be any problem.


if((int)bufIn.at(0) > 0)

if( bufIn.at(0) != 0 )

might be safer.
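
For files that are not guaranteed to hold only ASCII characters, a stricter conversion could decode each 16-bit unit and refuse anything above 0x7F instead of filtering bytes by sign. The following is a minimal sketch in plain standard C++ (no Qt), not the code from this thread: the function name `utf16leToAscii` and its signature are hypothetical, and the input is assumed to be little-endian with an optional leading byte order mark.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts little-endian UTF-16 bytes to ASCII.
// Returns false (leaving `out` in an unspecified state) if the input has an
// odd length or contains any character outside the ASCII range.
bool utf16leToAscii(const std::string &bytes, std::string &out)
{
    out.clear();
    if (bytes.size() % 2 != 0)
        return false;  // UTF-16 data always comes in 2-byte units

    // Skip the byte order mark 0xFF 0xFE if it is present.
    std::size_t i = 0;
    if (bytes.size() >= 2 &&
        static_cast<unsigned char>(bytes[0]) == 0xFF &&
        static_cast<unsigned char>(bytes[1]) == 0xFE)
        i = 2;

    out.reserve((bytes.size() - i) / 2);
    for (; i < bytes.size(); i += 2) {
        // Little-endian: low byte first, high byte second.
        const unsigned lo = static_cast<unsigned char>(bytes[i]);
        const unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        const unsigned unit = lo | (hi << 8);
        if (unit > 0x7F)
            return false;  // not representable in ASCII
        out.push_back(static_cast<char>(unit));
    }
    return true;
}
```

Because the bytes are compared as unsigned values, the result does not depend on whether plain `char` is signed on the target platform, and a non-ASCII character makes the conversion fail rather than silently losing data.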