
QT4.6 Converting utf8 to '/uwxyz' and back.



Greebley
3rd November 2016, 22:19
I need to convert UTF-8 text (currently read in with QTextStream::readLine()) to ASCII text:
1) if the character is ASCII (<= 0x7F), just use its ASCII value, so 'a' stays 'a'
2) if it is not ASCII, convert it to '/u' followed by its hex value wxyz.
For example, the euro symbol would become the six ASCII characters '/u20AC'.

I also have to go the other way: given the string '/u20AC', I want to output the euro sign as UTF-8.

I am having trouble determining whether the Qt string functions can help me with this or not.

It looks like if I use toUtf8() I will get a byte array with the euro as the bytes E2 82 AC, which I could parse manually, but that is a lot of work.

Is there a way I can get the Unicode hex values from the utf8 QString?
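For reference, this is roughly the manual work the question hopes to avoid: turning the UTF-8 bytes that toUtf8() returns (E2 82 AC for the euro) back into code points. It is a minimal sketch in plain standard C++ rather than Qt, so it compiles on its own; decodeUtf8 is a hypothetical helper, not a Qt function, and it assumes well-formed input (no validation of lead or continuation bytes). Reading through a QTextStream with a UTF-8 codec, as the replies below do, performs this decoding for you.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a Qt API): decodes UTF-8 bytes into code points.
// Sketch only: assumes well-formed UTF-8 and does no validation.
std::vector<char32_t> decodeUtf8(const std::string &bytes)
{
    std::vector<char32_t> codePoints;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t extra; // how many 10xxxxxx continuation bytes follow
        if (lead < 0x80)      { cp = lead;        extra = 0; } // 0xxxxxxx (ASCII)
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; } // 110xxxxx
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; } // 1110xxxx
        else                  { cp = lead & 0x07; extra = 3; } // 11110xxx
        // Append 6 payload bits from each continuation byte.
        for (std::size_t k = 1; k <= extra && i + k < bytes.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        codePoints.push_back(cp);
        i += extra + 1;
    }
    return codePoints;
}
```

For the euro bytes E2 82 AC this yields the single code point 0x20AC, which is the value the replies read directly from each QChar.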

SirJonas
6th November 2016, 01:27
I tried it too, but couldn't get it to work either.

ChrisW67
7th November 2016, 10:28
There is no such beast as a "utf8 QString". QString is a collection of QChar, essentially 16-bit UTF-16 code units, which for characters in the Unicode Basic Multilingual Plane are simply the code points themselves; they are trivially accessible using QString::at() or other methods. The file or stream you are reading from may be UTF-8 encoded, but QTextStream decodes it into a QString.



#include <QCoreApplication>
#include <QFile>
#include <QTextStream>
#include <QDebug>

int main(int argc, char **argv)
{
    QCoreApplication app(argc, argv);

    QFile file("test.txt");
    if (file.open(QIODevice::ReadOnly)) {
        QTextStream in(&file);
        in.setCodec("UTF-8");          // decode the file's bytes as UTF-8 (Qt 4/5 API)
        QString line = in.readLine();  // line now holds 16-bit QChars, not UTF-8 bytes

        qDebug() << line;

        QString result;
        for (int i = 0; i < line.size(); ++i) {
            const ushort code = line.at(i).unicode();  // 16-bit value of this QChar
            if (code < 0x0080)
                result += line.at(i);                  // plain ASCII: copy unchanged
            else
                result += QString("\\u%1").arg(code, 4, 16, QChar('0')); // lowercase hex, e.g. \u20ac
        }

        qDebug() << result;
    }
    return 0;
}

Output:


"test €1234"
"test \u20ac1234"
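The replies only cover one direction. For the reverse conversion the question also asks about (turning an escape such as \u20ac back into the character), here is a minimal sketch in plain standard C++; unescapeUnicode and hexValue are hypothetical helpers, and the sketch assumes each escape is the escape character, 'u', and exactly four hex digits. It returns UTF-16 code units, the same representation QString uses. In Qt you would instead append QChar(unit) to a QString and write it through a QTextStream set to the UTF-8 codec to get UTF-8 output.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper (not a Qt API): reverses the escaping.
// Each "<escape>uXXXX" sequence becomes one UTF-16 code unit; everything else
// is copied unchanged (assumes ASCII input, as produced by the escaping step).
// Malformed escapes are copied through verbatim.
std::u16string unescapeUnicode(const std::string &text, char escape = '\\')
{
    std::u16string result;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] == escape && i + 6 <= text.size() && text[i + 1] == 'u') {
            char16_t unit = 0;
            bool ok = true;
            for (std::size_t k = i + 2; k < i + 6; ++k) {
                const int d = hexValue(text[k]);
                if (d < 0) { ok = false; break; }
                unit = static_cast<char16_t>((unit << 4) | d);
            }
            if (ok) {
                result.push_back(unit);
                i += 6; // skip escape char, 'u' and four hex digits
                continue;
            }
        }
        result.push_back(static_cast<unsigned char>(text[i])); // plain character
        ++i;
    }
    return result;
}
```

Pass '/' as the second argument for the '/u20AC' form used in the question. Because each escape produces one 16-bit unit, a character above 0xFFFF written as a pair of escapes (\ud83d\ude00) is reassembled correctly; a five-digit form such as \u1f600 would need a different parser.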

Radek
7th November 2016, 18:08
... as long as the line QString contains only Unicode characters up to 0xFFFF (more precisely, no surrogate code units in the range 0xD800 to 0xDFFF). Unicode in Qt is, in fact, UTF-16, and characters above 0xFFFF are encoded as two ushorts (a surrogate pair). If characters above 0xFFFF may occur, use a small improvement:


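#include <QVector>
...
QString line = in.readLine();

qDebug() << line;

QString result;
QVector<uint> ucs4 = line.toUcs4();  // one element per code point; surrogate pairs are combined

for (int i = 0; i < ucs4.size(); ++i)
{
    const uint code = ucs4.at(i);

    if (code < 0x0080)
        result += QChar(static_cast<ushort>(code));  // use code, not line.at(i): indices differ after a surrogate pair
    else
        result += QString("\\u%1").arg(code, 4, 16, QChar('0')); // code points above 0xFFFF get 5 or 6 hex digits
}

qDebug() << result;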
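To make the "two ushorts" concrete: a code point above 0xFFFF is stored as a high surrogate (0xD800 plus the top 10 bits of cp - 0x10000) followed by a low surrogate (0xDC00 plus the bottom 10 bits). The sketch below does that arithmetic in plain standard C++; SurrogatePair, toSurrogates and fromSurrogates are hypothetical names, and Qt offers the combining step as QChar::surrogateToUcs4(). For U+1F600 the pair is 0xD83D 0xDE00, so the per-QChar loop from the first answer writes it as \ud83d\ude00 (the convention JSON and Java use), while the toUcs4() version writes \u1f600.

```cpp
// Hypothetical type: the two UTF-16 code units stored for one code point.
struct SurrogatePair {
    char16_t high;
    char16_t low;
};

// Splits a code point in the range 0x10000..0x10FFFF into a surrogate pair.
SurrogatePair toSurrogates(char32_t cp)
{
    const char32_t v = cp - 0x10000;                     // 20-bit value
    return { static_cast<char16_t>(0xD800 + (v >> 10)),  // top 10 bits
             static_cast<char16_t>(0xDC00 + (v & 0x3FF)) }; // bottom 10 bits
}

// The inverse: recombines a high/low pair into one code point, conceptually
// what QString::toUcs4() does for each surrogate pair it meets.
char32_t fromSurrogates(char16_t high, char16_t low)
{
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}
```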

Greebley
11th November 2016, 18:28
I have gotten it to work using the ideas here:
setting the QTextStream codec to UTF-8 and then working character by character.