How to do QTextCodec conversion?



lni
12th October 2013, 19:35
I have been struggling with character conversion; please help.

I have a file in Chinese (GB2312) that I need to read and then print on screen (on Linux). I get garbage output such as ��?��. I have tried different conversions for hours, and nothing works!

Here is the code:



static QString toUnicode( const QString& str )
{
    //return QTextCodec::codecForName( "GB2312" )->toUnicode( str.toLatin1() ).toLocal8Bit(); // <- not working
    //return QTextCodec::codecForName( "GB2312" )->toUnicode( str.toLatin1() ).toUtf8(); // <- not working
    return QTextCodec::codecForName( "GB2312" )->toUnicode( str.toLatin1() ); // <- not working
}

void my_read_func()
{
    QSettings settings( filename, QSettings::IniFormat );
    settings.setIniCodec( QTextCodec::codecForName( "GB2312" ) );
    settings.beginGroup( "xxxx" );
    {
        foreach( const QString& key, settings.allKeys() ) {
            QStringList strList = settings.value( key ).toStringList();

            qDebug() << key << "=" << strList; // <- garbage out

            foreach( const QString& str, strList ) {
                qDebug() << toUnicode( key ) << "=" << toUnicode( str ); // <- garbage out
            }

        }
    }
    settings.endGroup();
}

anda_skoa
12th October 2013, 22:15
static QString toUnicode( const QString& str )
{
    //return QTextCodec::codecForName( "GB2312" )->toUnicode( str.toLatin1() ).toLocal8Bit(); // <- not working
    //return QTextCodec::codecForName( "GB2312" )->toUnicode( str.toLatin1() ).toUtf8(); // <- not working
    return QTextCodec::codecForName( "GB2312" )->toUnicode( str.toLatin1() ); // <- not working
}



That doesn't make any sense. Your str already contains 16-bit Unicode; what you want is some 8-bit encoding.
If your shell/terminal is also using "GB2312", then you need to call the codec's fromUnicode() function to generate the 8-bit version for that codec.
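
To make the 16-bit versus 8-bit distinction concrete, here is a library-independent sketch in plain C++ (not Qt code) of what an 8-bit encoder does with 16-bit units: it turns UTF-16 code units into UTF-8 bytes. The function name utf16ToUtf8 is made up for illustration; in Qt the corresponding work would be done by QTextCodec::fromUnicode() or QString::toUtf8().

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Illustration only (hypothetical helper, not a Qt API):
// encode UTF-16 code units as UTF-8 bytes.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // stray low surrogate
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the single 16-bit unit U+4E2D (中) becomes the three bytes E4 B8 AD, which is what a UTF-8 terminal expects to receive.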

Cheers,
_

lni
18th October 2013, 11:52
That doesn't make any sense. Your str already contains 16-bit Unicode; what you want is some 8-bit encoding.
If your shell/terminal is also using "GB2312", then you need to call the codec's fromUnicode() function to generate the 8-bit version for that codec.

Cheers,
_

How do I do this? My terminal uses an English locale with UTF-8. I tried the following, and it still doesn't work:

qDebug() << QTextCodec::codecForName( "utf8" )->toUnicode( str.toAscii() ).toLocal8Bit();
qDebug() << QTextCodec::codecForName( "utf8" )->fromUnicode( filename.toLocal8Bit() );
Thanks.

anda_skoa
18th October 2013, 17:21
qDebug() << QTextCodec::codecForName( "utf8" )->toUnicode( str.toAscii() ).toLocal8Bit();

str.toAscii() will obviously do something very wrong if str does not contain ASCII.
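
As a rough illustration of why that goes wrong, here is a plain C++ sketch (not Qt code) of narrowing 16-bit units to a single-byte Latin-1 string. The helper name narrowToLatin1 and the choice of '?' as the replacement are assumptions for the example; what Qt actually does with unrepresentable characters in toAscii()/toLatin1() may differ, but either way the original Chinese characters cannot survive the conversion.

```cpp
#include <string>

// Illustration only (hypothetical helper, not a Qt API):
// narrow UTF-16 code units to Latin-1 bytes.
// Anything above U+00FF has no Latin-1 representation and becomes '?'.
std::string narrowToLatin1(const std::u16string& in)
{
    std::string out;
    out.reserve(in.size());
    for (char16_t unit : in)
        out += (unit <= 0xFF) ? static_cast<char>(unit) : '?';
    return out;
}
```

Once the text has been narrowed this way, no codec can recover the lost characters, because the information is already gone before the codec sees the bytes.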



qDebug() << QTextCodec::codecForName( "utf8" )->fromUnicode( filename.toLocal8Bit() );

So you convert the filename to the local 8-bit encoding, which, according to you, is UTF-8, then you convert it back into 16-bit Unicode and have qDebug() encode it into 8 bits again?

Have you had a look at QString::toUtf8()?
That is a short form of using the UTF-8 text codec and calling its fromUnicode() function.

Cheers,
_

lni
19th October 2013, 05:03
str.toAscii() will obviously do something very wrong if str does not contain ASCII.


So you convert the filename to the local 8-bit encoding, which, according to you, is UTF-8, then you convert it back into 16-bit Unicode and have qDebug() encode it into 8 bits again?

Have you had a look at QString::toUtf8()?
That is a short form of using the UTF-8 text codec and calling its fromUnicode() function.

Cheers,
_

Sorry, I should have stated this more clearly.

I am trying to read an ini file using QSettings; the ini file contains GB2312 text.

I want reading this file to work under all locale settings. The application may or may not have called QTextCodec::setCodecForLocale( QTextCodec::codecForName( "GB2312" ) ); if it has not, the default is the Linux English locale.

The ini looks like

xxx=yyy

where xxx is in GB2312, and yyy is also in GB2312. yyy is a file path, and I need to open that file and read it.

My approach is to decode both to UTF-8 because the actual file name is in UTF-8.

Here is what I have done. I don't like temporarily setting and restoring the locale codec. It looks ugly.




struct MyInfo {
    QString key;
    QString filename;
};

static QString toUnicode( const QString& str ) {
    return QTextCodec::codecForName( "GB2312" )->toUnicode( str.toLatin1() );
}


static MyInfo parseValue( const QString& key, const QStringList& strList ) {
    MyInfo info;
    info.key = key;
    info.filename = strList.first();

    return info;
}

static QMap<QString, MyInfo> info;

static void readIni( const QString& iniFileName ) {
    if( QFile::exists( iniFileName ) ) {

        // save codec
        QTextCodec* codec = QTextCodec::codecForLocale();

        // set codec to utf-8 for this file
        QTextCodec::setCodecForLocale( QTextCodec::codecForName( "utf8" ) );

        QSettings settings( iniFileName, QSettings::IniFormat );
        settings.setIniCodec( QTextCodec::codecForName( "GB2312" ) );
        settings.beginGroup( tag );
        {
            foreach( const QString& key, settings.allKeys() ) {
                QStringList strList = settings.value( key ).toStringList();
                QString localKey = toUnicode( key );
                // local entry renamed so it does not shadow the file-level map "info"
                MyInfo entry = parseValue( localKey, strList );
                info[ entry.key ] = entry;
            }
        }
        settings.endGroup();

        // restore codec (this looks ugly)
        QTextCodec::setCodecForLocale( codec );
    }

    ...

}

// after reading the ini, I have to do this:

static QImage createImage( const QString& key ) {

    // save codec
    QTextCodec* codec = QTextCodec::codecForLocale();
    QTextCodec::setCodecForLocale( QTextCodec::codecForName( "utf8" ) );

    QImage image( info[ key ].filename );

    // restore codec (this looks ugly)
    QTextCodec::setCodecForLocale( codec );

    return image;
}

anda_skoa
19th October 2013, 15:55
I am trying to read an ini file using QSettings; the ini file contains GB2312 text.

The ini looks like

xxx=yyy

where xxx is in GB2312, and yyy is also in GB2312. yyy is a file path, and I need to open that file and read it.

My approach is to decode both to UTF-8 because the actual file name is in UTF-8.


How can yyy be in GB2312 and contain a UTF8 filename?
Is yyy a character sequence with two different encodings?





static QString toUnicode( const QString& str ) {
    return QTextCodec::codecForName( "GB2312" )->toUnicode( str.toLatin1() );
}


That doesn't make any sense. str is already in Unicode; that's how QStrings work internally.

Cheers,
_