Reading Unicode character from file and converting it in to QString of actual text



gunturrohith
21st July 2015, 06:40
Hi,

I have to read Unicode characters of Indian languages (Hindi, Telugu, etc.) from a file,
and then convert them into a QString containing the actual text in that language's script.

When we pass the Unicode escapes manually, it works fine, like this:


QString str = QString::fromStdWString(L"\u0926\u0947\u0935\u0928\u093e\u0970");
qDebug() << str; // This prints the text in the actual script


But when we try to read the Unicode from a file, we cannot get any further.
This is what we need to do:


QString fileText;
QFile file(argv[1]); // argv[1] is the file name passed on the command line
QTextStream out(&file);
if (file.open(QIODevice::ReadOnly | QIODevice::Text))
{
    fileText = out.readAll();
}
file.close();
qDebug() << "data in the file:\n" << fileText; // prints the Unicode characters exactly as they appear in the file

// PROBLEM

std::wstring temp_text = fileText.toStdWString();
QString test = QString::fromStdWString(temp_text); // not working

qDebug() << "text:" << test; // prints the Unicode characters unchanged; no conversion happens


My question is about fromStdWString(). In the first case we pass "L"; what is it?
And how do we pass this "L" in fromStdWString(temp_text)?

Thanks

anda_skoa
21st July 2015, 07:09
Why do you want to convert the QString into a wstring just to reverse the conversion immediately again?

Cheers,
_

gunturrohith
21st July 2015, 07:31
Hi Anda,

Thanks for your quick reply!

At line No. 9,
qDebug()<<"data in the file:\n"<<fileText; // printing unicode character as it is in file;

Here we have to print the data in the actual Hindi script, but it prints the Unicode exactly as it is in the file.
So after reading the Unicode from the file, we have to print it in the Hindi (local) script.

Can you guide me on how to do this?

yeye_olive
21st July 2015, 09:27
In the line

QString str = QString::fromStdWString(L"\u0926\u0947\u0935\u0928\u093e\u0970");
the 'L' marks L"\u0926\u0947\u0935\u0928\u093e\u0970" as a wide string literal, i.e. a NUL-terminated string of wide characters (of type wchar_t). These are 16 bits wide on Windows, where the literal is a native UTF-16 string, and usually 32 bits wide on Linux and other Unix-like systems, where it is a UTF-32 string. The compiler then allocates a temporary std::wstring with this string literal, then builds a QString out of it. This works well because QString::fromStdWString() expects its argument to be encoded in UTF-16 when wchar_t is 2 bytes wide and in UCS-4 when it is 4 bytes wide. By the way, you can eliminate the intermediate std::wstring by calling QString::fromWCharArray() instead.
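
To make the role of the 'L' prefix concrete, here is a small sketch in plain standard C++ (no Qt, so it builds without extra dependencies). The two function names are made up for illustration. It shows that the compiler turns each \uXXXX escape in a literal into a single character at compile time, whereas the same characters arriving at run time, for example from a file, stay as a backslash, a 'u', and four hex digits. That is why there is nothing like "passing the L" to a string that was read at run time.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: length of a wide literal written with \u escapes.
// The compiler replaces each escape with one character, so the literal
// holds two wchar_t elements (the terminating NUL is not counted).
std::size_t literalLength()
{
    const wchar_t *text = L"\u0926\u0947";
    return std::wcslen(text);
}

// Hypothetical helper: length of the same text as it would arrive from a
// file. "\\u" is an escaped backslash followed by 'u', so no conversion
// takes place and all twelve characters remain.
std::size_t runtimeLength()
{
    const std::string text = "\\u0926\\u0947";
    return text.size();
}
```

Here literalLength() yields 2 and runtimeLength() yields 12.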

When reading from a file, things are a bit different. A file is made of bytes, not 16-bit wide characters; you need to tell QTextStream how to decode those bytes into Unicode characters. By default, QTextStream uses QTextCodec::codecForLocale(), i.e. the encoding associated with the system locale. This may or may not be what you want. You should call QTextStream::setCodec() to explicitly set the appropriate codec. You can obtain a codec with QTextCodec::codecForName(); for instance, QTextCodec::codecForName("UTF-8") returns a UTF-8 codec.

Your first task should be to determine the encoding of the text file.
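
One common first check when determining an encoding is to look for a byte order mark (BOM) at the start of the file. The following is only a sketch in plain standard C++; the function name is hypothetical, and a missing BOM proves nothing, since many UTF-8 files have none.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: guess an encoding name from a byte order mark.
// Returns an empty string when no BOM is found; the encoding then has to
// be established some other way (editor settings, origin of the file).
// Limitation: a UTF-32LE BOM (FF FE 00 00) is reported as UTF-16LE.
std::string encodingFromBom(const std::string &bytes)
{
    auto byteAt = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && byteAt(0) == 0xEF && byteAt(1) == 0xBB && byteAt(2) == 0xBF)
        return "UTF-8";
    if (bytes.size() >= 2 && byteAt(0) == 0xFF && byteAt(1) == 0xFE)
        return "UTF-16LE";
    if (bytes.size() >= 2 && byteAt(0) == 0xFE && byteAt(1) == 0xFF)
        return "UTF-16BE";
    return "";
}
```

The bytes passed in would be the first few bytes of the file, read in binary mode.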

anda_skoa
21st July 2015, 09:39
qDebug()<<"data in the file:\n"<<fileText; // printing unicode character as it is in file;

Here we have to print data in actual hindi font, but it's printing unicode as it is in file.
So after reading the unicode from file we have to print it in to Hindi(local) font.


qDebug() is a debug stream; you might have more luck writing to cout or using a QTextStream on top of the stdout file handle.

For cout you would probably need to convert the QString to the local 8-bit encoding, using QString::toLocal8Bit().

Cheers,
_

gunturrohith
21st July 2015, 11:43
Hi yeye_olive,

Thanks for your reply.

This is still not clear to me. Can you provide an example?

Thanks

yeye_olive
21st July 2015, 12:09
1. Find out how the text file is encoded.
2. In your code, right after the line

QTextStream out(&file);
add a line

out.setCodec("replace this with the name of the encoding (see the documentation for QTextCodec)");
to set the encoding you determined in 1.

Read the documentation for QTextCodec and QTextStream for details.

gunturrohith
21st July 2015, 13:29
Hi yeye_olive,

I already tried this, but I didn't get the expected result.

QFile file(argv[1]); // here I passed the Hindi Unicode file
QTextStream out(&file);
out.setCodec("UTF-16");
out.setAutoDetectUnicode(true);
if(file.open(QIODevice::ReadOnly | QIODevice::Text))

{
fileText=out.readAll();
}
file.close();
qDebug()<<"data in the file:\n"<<fileText;

It prints unknown characters like "ç•œæŒ°ã€°ç•œæŒ°ã„°ç•œæ °ãˆ°ç•œæŒ°ã°", but I want the text in the original language.

Thanks.
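
One possible explanation for output like this, which the thread does not confirm, is that the file is plain ASCII text and is being decoded as UTF-16, so every pair of bytes is fused into one unrelated character. The sketch below, in plain standard C++ with a hypothetical function name, shows the arithmetic: the four ASCII characters \u0c become the two 16-bit units 0x755C and 0x6330, both of which fall in the CJK ideograph range.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: interpret raw bytes as little-endian UTF-16 code
// units by pairing every two bytes; a trailing odd byte is dropped.
std::vector<std::uint16_t> bytesAsUtf16LE(const std::string &bytes)
{
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const auto low = static_cast<unsigned char>(bytes[i]);
        const auto high = static_cast<unsigned char>(bytes[i + 1]);
        units.push_back(static_cast<std::uint16_t>(low | (high << 8)));
    }
    return units;
}
```

If that is what happened here, the codec passed to setCodec() did not match the file's real encoding.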

yeye_olive
21st July 2015, 13:40
I have no idea what you mean by "hindi unicode" or "original language of unicode". There is only one Unicode (to rule them all). If you do not understand what a text encoding is, then I am afraid I cannot help you. I cannot do anything for you unless you follow those steps in order:

1. Read some material on text encodings to understand what they are.
2. Determine what the encoding of your file is.
3. Start coding.
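
If it turns out that the file holds the escape sequences themselves as ASCII text (a backslash, a 'u', and four hex digits per character), which is an assumption the thread never confirms, then no codec will help and the escapes have to be parsed explicitly. The following is a minimal sketch in plain standard C++; all three function names are hypothetical, and it only handles four-digit escapes in the Basic Multilingual Plane, without special treatment of surrogate pairs.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: append one code point in U+0000..U+FFFF as UTF-8.
void appendUtf8(std::string &out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: true if s has four hex digits starting at pos.
bool isHex4(const std::string &s, std::size_t pos)
{
    if (pos + 4 > s.size())
        return false;
    for (std::size_t k = pos; k < pos + 4; ++k)
        if (!std::isxdigit(static_cast<unsigned char>(s[k])))
            return false;
    return true;
}

// Hypothetical helper: replace every literal "\uXXXX" in the input with the
// UTF-8 encoding of that code point; everything else is copied unchanged.
std::string decodeUnicodeEscapes(const std::string &in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const bool isEscape = in[i] == '\\' && i + 1 < in.size()
                              && in[i + 1] == 'u' && isHex4(in, i + 2);
        if (isEscape) {
            appendUtf8(out, std::stoul(in.substr(i + 2, 4), nullptr, 16));
            i += 6;
        } else {
            out += in[i];
            ++i;
        }
    }
    return out;
}
```

In a Qt program the resulting UTF-8 bytes could then be handed to QString::fromUtf8().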