XML parsing error with QDom and QStreamReader



Talei
5th March 2011, 18:33
Hello,
I'm struggling to parse some XML data and have no idea how to solve my problem. I want to extract the text data from one of the elements in the XML; everything works, but not as expected.

XML example:

<?xml version="1.0" encoding="UTF-8"?>
<textData>
<text>
...
<des> Sample data with link <a href = ..> and bb code [bbcode], with entities &trade; </des>
</text>
</textData>
and I expect to get this:

Sample data with link <a href = ..> and bb code [bbcode], with entities &trade;

QDom works in the sense that I can retrieve the text, but the text is parsed, so I don't get the <a href ..> part.

With the QXmlStreamReader approach things get even more frustrating, because the data stops at the first entity occurrence (I'm using Polish characters, so ... UTF-8 doesn't handle all of them).

So the conclusion I came to is that the error occurs in the text() function and its parsing.

And the question: is there a way to force QDom / QXmlStreamReader to output raw / unformatted text? (For now I don't want to use a regexp to parse it out, but it seems that's the only way.)

Best regards.

wysota
5th March 2011, 19:23
First of all, UTF-8 handles Polish characters just fine. Second, if your XML file is not well-formed XML then, well... don't expect an XML parser to parse it. If you use entities (such as &trade;), they need to be declared first. If you want a verbatim copy of the contents of the tag, wrap the contents in a CDATA section.
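
To illustrate the CDATA suggestion, here is a minimal sketch. It uses Python's standard xml.etree.ElementTree rather than QDom or QXmlStreamReader, on the assumption that any conforming XML parser treats CDATA the same way. The href value, build_doc() and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload modelled on the question; the href value is invented.
raw = ('Sample data with link <a href="http://example.com"> '
       'and bb code [bbcode], with entities &trade;')


def build_doc(payload):
    """Wrap payload in the textData/text/des structure from the question."""
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            '<textData><text><des>' + payload + '</des></text></textData>')


# Without CDATA the parser meets an undeclared entity and an unclosed <a> tag,
# so the document is rejected.
try:
    ET.fromstring(build_doc(raw).encode("utf-8"))
    plain_parses = True
except ET.ParseError:
    plain_parses = False

# Inside a CDATA section the same characters are plain character data.
wrapped = build_doc("<![CDATA[" + raw + "]]>")
des_text = ET.fromstring(wrapped.encode("utf-8")).find("text/des").text

print(plain_parses)
print(des_text == raw)
```

One caveat: the wrapped text must not itself contain the sequence ]]>, because that ends the CDATA section.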

Talei
5th March 2011, 19:31
Thanks, CDATA is what I need.
BTW, what do you mean that Polish characters are handled fine? I always got "&oacute;" for ó. Well, I guess you mean that this is correct, which I also agree with.
BTW, how do I declare entities?

wysota
5th March 2011, 20:52
BTW, what do you mean that Polish characters are handled fine? I always got "&oacute;" for ó.
I mean that UTF-8 can handle Polish characters very well. This site is UTF-8 encoded and these characters are displayed properly: ĄĆĘŁŃÓŚŻŹąćęłńóśżź
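
A quick way to check the UTF-8 claim is a round trip through the encoding. This is only a sketch in Python (not Qt); the variable names are arbitrary, and the cp1252 step is an assumed example of a wrong decoder, not something taken from the poster's setup.

```python
# The Polish letters that fall outside ASCII, upper and lower case.
polish = "ĄĆĘŁŃÓŚŻŹąćęłńóśżź"

# Encode to UTF-8 bytes and decode again; nothing is lost.
encoded = polish.encode("utf-8")
round_trip = encoded.decode("utf-8")

# Each of these letters occupies two bytes in UTF-8.
bytes_per_letter = len(encoded) / len(polish)

# Decoding UTF-8 bytes with a different codec is what produces garbage.
# cp1252 is used here purely as an assumed example of a wrong codec.
misread = "ó".encode("utf-8").decode("cp1252")

print(round_trip == polish)
print(bytes_per_letter)
print(misread)
```

If text comes out garbled, the usual cause is a mismatch between the encoding the bytes were written in and the one used to read them, not a limitation of UTF-8 itself.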


BTW, how do I declare entities?
I'm sorry, this is not a site for teaching XML. Google for XML+entity or something like that. You probably need to reference this document (or similar): http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
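
As a rough sketch of what declaring entities can look like: they may be listed in an internal DTD subset at the top of the document. The example again uses Python's xml.etree.ElementTree rather than Qt. The two declarations (trade, oacute) and the sample text are assumptions chosen for illustration, not taken from the poster's file.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the DOCTYPE block declares the two entities
# before they are used in the <des> element.
doc = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE textData [\n'
    '  <!ENTITY trade "&#8482;">\n'
    '  <!ENTITY oacute "&#243;">\n'
    ']>\n'
    '<textData><text>'
    '<des>Łódź and Krak&oacute;w &trade;</des>'
    '</text></textData>'
)

root = ET.fromstring(doc.encode("utf-8"))
text = root.find("text/des").text

# The declared entities are replaced by the characters they stand for;
# the literal UTF-8 letters pass through unchanged.
print(text)
```

Declaring each entity by hand is only practical for a few names; for a larger set, a shared entity file such as the one linked above is the more common route.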