XML parsing error with QDom and QStreamReader
Hello,
I'm currently struggling to parse some XML data and have no idea how to solve my problem. I want to extract the text of one of the elements in the XML; everything works, but not as expected.
XML example:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<textData>
<text>
...
<des> Sample data with link <a href = ..> and bb code [bbcode], with entities &trade; </des>
</text>
</textData>
and I expect to get this:
Code:
Sample data with link <a href = ..> and bb code [bbcode], with entities &trade;
QDom works in that I can retrieve the text, but the text comes back parsed, so I don't get the <a href ..> part.
With the QXmlStreamReader approach, things get even more frustrating, because the data stops at the first entity occurrence (I'm using Polish characters, so ... UTF-8 doesn't handle all of them).
So my conclusion is that the error occurs in the text() function and the parsing.
And the question: is there a way to force QDom / QXmlStreamReader to output raw, unformatted text? (I'd rather not use a regexp to parse it out, but it seems that's the only way.)
Best regards.
Re: XML parsing error with QDom and QStreamReader
First of all, UTF-8 handles Polish characters just fine. Second, if your XML file is not well-formed XML, then, well... don't expect an XML parser to parse it. If you use entities (such as &trade;), they need to be declared first. If you want a verbatim copy of the contents of the tag, wrap the contents in a CDATA section.
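Applied to the sample from the first post, a minimal sketch of the CDATA approach might look like this (the `...` from that sample is left out, and `<a href = ..>` is kept as the poster's placeholder). Nothing inside `<![CDATA[ ... ]]>` is interpreted as markup, so the `<a>` tag, the `[bbcode]` and `&trade;` should come back verbatim from QDom's `text()` or QXmlStreamReader's `readElementText()`; the stream reader also reports such text with `isCDATA()` returning true.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<textData>
  <text>
    <!-- everything between <![CDATA[ and ]]> is plain character data, not markup -->
    <des><![CDATA[ Sample data with link <a href = ..> and bb code [bbcode], with entities &trade; ]]></des>
  </text>
</textData>
```

The one sequence a CDATA section cannot contain is `]]>`. If the XML is generated with Qt, `QXmlStreamWriter::writeCDATA()` splits the text into several CDATA sections when it contains that sequence.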
Re: XML parsing error with QDom and QStreamReader
Thanks, CDATA is what I need.
BTW, what do you mean by Polish characters being handled fine? For ó I always get "ó". Well, I guess you mean that this is correct, which I also agree with.
BTW, how do I declare entities?
Re: XML parsing error with QDom and QStreamReader
Quote (originally posted by Talei):
BTW, what do you mean by Polish characters being handled fine? For ó I always get "ó"
I mean that UTF-8 handles Polish characters very well. This site is UTF-8 encoded and these characters are displayed properly: ĄĆĘŁŃÓŚŻŹąćęłńóśżź
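To illustrate: as long as the file really is saved as UTF-8 (matching its encoding declaration), Polish letters can be written as plain characters, and any other character can be written as a numeric character reference such as `&#8482;` (™) or `&#243;` (ó). Neither form needs an entity declaration. A minimal sketch:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<textData>
  <text>
    <!-- literal Polish letters (file saved as UTF-8) plus a numeric reference for the trademark sign; no DTD needed -->
    <des>ąćęłńóśżź &#8482;</des>
  </text>
</textData>
```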
Quote:
BTW, how do I declare entities?
I'm sorry, but this is not a site for teaching XML. Google "XML entity" or something like that. You probably need to reference this document (or similar): http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
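For reference, a named entity can be declared in the document's internal DTD subset. The sketch below declares only `trade`, which is all the sample uses. Non-validating parsers must process declarations in the internal subset but are not required to read an external DTD, so copying the needed declarations inline is the more portable choice than pointing at an external `.ent` file. In the XHTML 1.0 DTDs, `trade` itself lives in the symbol set (xhtml-symbol.ent), while xhtml-lat1.ent covers Latin-1 names such as `oacute`.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE textData [
  <!-- declare each named entity the document uses; &#8482; is U+2122, the trademark sign -->
  <!ENTITY trade "&#8482;">
]>
<textData>
  <text>
    <des>Sample data with entities &trade;</des>
  </text>
</textData>
```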