View Full Version : convert ampersand encoded HTML into something readable
tetsuoii
16th October 2010, 22:41
I'm extracting strings from a webpage, but they are full of national characters encoded in html ampersand - hash - ascii code digits - semicolon format, looking something like this:
"sekret&#230;r i k&#248;"
This is awful, and I can't find any information on how to decode these special characters, to the point where I'm close to writing a parser just for the purpose. The web page is encoded in utf-8 format, and the browser has no problem displaying it but in Qt all I get is mangled text strings...
Does anyone know how to read html encoded characters right?
Lykurg
16th October 2010, 23:02
Do not double post! I'll close this one.
EDIT: Oh, come on, decide where you want to post before you post! And once you have posted, don't edit the first post to make it unreadable. Ask a moderator to move your post if really needed.
For educational purposes, I'll leave this one closed. Edit your first one and you will get an answer.
ChrisW67
17th October 2010, 04:30
0xFFFF ? A Unicode non-character perhaps? What was the question?
Lykurg
17th October 2010, 10:32
Ok, a little mess here, but now both threads are merged and open again. ChrisW67's answer was referring to the now deleted post, which didn't contain a question...
Lykurg
17th October 2010, 10:37
Ok, and now to prove that we are gentle here:
Have a look at QTextDocument. Set the HTML and get the plain text back. A more lightweight solution would be to search for such notations and replace them by hand.
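For anyone who wants to try the "replace them by hand" route, here is a minimal sketch in plain C++ (standard library only, no Qt). The names decodeNumericEntities and appendUtf8 are made up for this example and are not part of Qt or any library. It assumes the input uses only decimal references of the form &#NNN; and leaves named entities, hex references and anything malformed untouched.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to `out`, encoded as UTF-8.
static void appendUtf8(std::string &out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: replace decimal references such as "&#230;"
// with the corresponding UTF-8 bytes. Everything else is copied as-is.
std::string decodeNumericEntities(const std::string &in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] == '&' && i + 2 < in.size() && in[i + 1] == '#') {
            std::size_t j = i + 2;
            std::uint32_t cp = 0;
            // Read digits; stop early once the value leaves the Unicode range.
            while (j < in.size() && in[j] >= '0' && in[j] <= '9' && cp <= 0x10FFFF) {
                cp = cp * 10 + static_cast<std::uint32_t>(in[j] - '0');
                ++j;
            }
            const bool hasDigits = j > i + 2;
            const bool terminated = j < in.size() && in[j] == ';';
            const bool valid = cp > 0 && cp <= 0x10FFFF
                               && !(cp >= 0xD800 && cp <= 0xDFFF);
            if (hasDigits && terminated && valid) {
                appendUtf8(out, cp);
                i = j + 1;  // skip past the ';'
                continue;
            }
        }
        out += in[i];  // not a decimal reference: copy the character through
        ++i;
    }
    return out;
}
```

Calling decodeNumericEntities("sekret&#230;r i k&#248;") returns the UTF-8 bytes for the decoded string; in a Qt program the result could then be handed to QString::fromUtf8(). The QTextDocument route mentioned above (setHtml() followed by toPlainText()) also covers named entities, which this sketch deliberately skips.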
tetsuoii
24th October 2010, 18:49
sorry 'bout the double posting, i'll try not to heat your helmet next time =)
anyway, setting all labels to label->setTextFormat(Qt::RichText); fixed all my problems: both the one described above and the one where I couldn't use Norwegian characters directly and had to substitute them with æ etc. They no longer display like "h<?>lvetes j<?>vla kr<?>ket<?>r" :confused:
It also improved my mood, which was on a downward slope... So to all Scandinavian, French, German, Polish and other special character users, setTextFormat(Qt::RichText); is highly recommended!
And thanks a lot to you, Helmet-Man, for your valuable advice, which may have saved me days of work!