Special character's HTML entity to string



Hiba
3rd March 2009, 11:27
I have a set of HTML files that are used in two different ways in my program: a QTextBrowser displays the HTML pages normally inside my program, and I manually parse header information from these files in order to display it in a completely separate list.

The problem is that the language in these files contains special characters that are encoded as HTML entities, for example:

<h1>teku&#263;ina</h1>

This is displayed correctly in my QTextBrowser, but in the separate list it comes out wrong: I simply read this line with std::getline() into a std::string, so the string contains the raw entity &#263; instead of ć.

Is there any easy way around this, such as a way to convert these HTML entities to a QString? Or would the better option be to edit the HTML files so that they contain the actual character ć instead of the entity, without breaking the QTextBrowser functionality?
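
For reference, a minimal sketch of doing the conversion by hand in plain C++. It assumes the files only use numeric references such as &#263; or &#x107; and that UTF-8 output is acceptable. The function names decodeNumericEntities and appendUtf8 are made up for illustration, and named entities such as &amp; are deliberately left untouched.

```cpp
#include <cctype>
#include <cstdint>
#include <cstdlib>
#include <string>

// Append the UTF-8 encoding of a Unicode code point to `out`.
static void appendUtf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Replace numeric references ("&#263;", "&#x107;") with UTF-8 bytes.
// Anything that does not parse as a numeric reference is copied unchanged.
std::string decodeNumericEntities(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] == '&' && i + 2 < in.size() && in[i + 1] == '#') {
            std::size_t semi = in.find(';', i + 2);
            if (semi != std::string::npos && semi > i + 2) {
                std::string digits = in.substr(i + 2, semi - (i + 2));
                int base = 10;
                if (digits[0] == 'x' || digits[0] == 'X') {
                    base = 16;
                    digits.erase(0, 1);
                }
                // Accept only digits that are valid for the chosen base.
                bool ok = !digits.empty();
                for (unsigned char c : digits) {
                    ok = ok && (base == 16 ? std::isxdigit(c) : std::isdigit(c));
                }
                if (ok) {
                    unsigned long cp = std::strtoul(digits.c_str(), nullptr, base);
                    // Reject zero, surrogates and values beyond Unicode's range.
                    if (cp > 0 && cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF)) {
                        appendUtf8(out, static_cast<std::uint32_t>(cp));
                        i = semi + 1;
                        continue;
                    }
                }
            }
        }
        out += in[i++];
    }
    return out;
}
```

If a QString is needed afterwards, the UTF-8 result could be passed to QString::fromUtf8().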

talk2amulya
3rd March 2009, 11:34
I simply read this line with std::getline() into a std::string

Where are you reading the line from?

Hiba
3rd March 2009, 11:46
Where are you reading the line from?
From the *.html files. I actually read every line, but I'm only interested in the string inside the h1 tags. This is completely separate from the QTextBrowser functionality I currently have; my point in bringing it up was that the same HTML files must work in both cases.

talk2amulya
3rd March 2009, 12:07
I don't think that would be possible directly. Perhaps you could work around it by loading the content into a QTextEdit using setHtml(), reading it back out with toHtml() into a string, and then parsing that. You would only need the QTextEdit for a short time and can destroy it as soon as you have read the text back, so it shouldn't cause any hassle. But if you come up with a better solution, let us know :)

Hiba
3rd March 2009, 14:05
I was in a bit of a hurry, so I brute-forced it and wrote a conversion function that changes the HTML entities to the correct characters. I mapped them manually, since the language in question does not have too many special characters.
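
A hand-mapped conversion of that kind could be sketched as follows. The table entries are illustrative guesses, not the actual mapping used in the program, and replaceMappedEntities is a hypothetical name. The replacements are written as UTF-8 byte sequences.

```cpp
#include <string>
#include <utility>
#include <vector>

// Replace each known entity with its UTF-8 character.
// The table is an assumed example; extend it with whatever the files contain.
std::string replaceMappedEntities(std::string text) {
    static const std::vector<std::pair<std::string, std::string>> table = {
        {"&#263;", "\xC4\x87"},  // ć (U+0107)
        {"&#269;", "\xC4\x8D"},  // č (U+010D)
        {"&#273;", "\xC4\x91"},  // đ (U+0111)
        {"&#353;", "\xC5\xA1"},  // š (U+0161)
        {"&#382;", "\xC5\xBE"},  // ž (U+017E)
    };
    for (const auto& entry : table) {
        std::size_t pos = 0;
        while ((pos = text.find(entry.first, pos)) != std::string::npos) {
            text.replace(pos, entry.first.size(), entry.second);
            pos += entry.second.size();  // continue after the inserted bytes
        }
    }
    return text;
}
```

Entities that are not in the table pass through unchanged, so a missing mapping shows up as a visible raw entity in the list rather than as silently dropped text.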