reading XML node and sub nodes


15th August 2014, 21:04
elcuco


reading XML node and sub nodes
Hi all,

I need to parse an XML file. When a specific node is found, I need to get the exact text inside that tag (including any sub nodes), not only the "XML text".

Example:
Code:

<root> <a>11111</a> <a>test 123</a> </root>
In this XML I need to be able to get "11111" and "test 123", exactly as written.

I am using this code in Qt5:
Code:

void test1()
{
    QString rawXML = "<root>"
                     " <a>11111</a>"
                     " <a>test 123</a>"
                     "</root>";
    QXmlStreamReader xml(rawXML);
    QStringRef s;

    xml.readNextStartElement();
    s = xml.name();
    if (s != "root") {
        return;
    }

    while (!xml.atEnd()) {
        xml.readNextStartElement();
        s = xml.name();
        if (s != "a") {
            break;
        }
        QString ss = xml.readElementText(QXmlStreamReader::IncludeChildElements);
        qDebug("%s", qPrintable(ss));
    }
}
This does not work as I expect: I get only the plain text "test 123", not the exact content of the tag.

What is the best approach to handle this situation? How can I parse the XML and get the desired result?

Attachment 10558

EDIT: similar stackoverflow question: http://stackoverflow.com/questions/5...ing-html-in-qt
15th August 2014, 22:52
ChrisW67

Re: reading XML node and sub nodes

Take a look at the tokenString() function. You may be able to construct the result as you go through the start, text, and end tokens of the element.
16th August 2014, 12:02
anda_skoa

Re: reading XML node and sub nodes

Alternatively use QDomDocument for parsing, QDomDocument::elementsByTagName to get all the <a> tags, then call QDomNode::save() on all their children.

Cheers,
_

Re: reading XML node and sub nodes

Another solution that does not work:

Code:

QByteArray ba(rawXML);
QBuffer bytes;

bytes.setBuffer(&ba);
bytes.open(QIODevice::ReadOnly);
QXmlStreamReader xml(&bytes);
...
        QIODevice *device = xml.device();
        pstart = device->pos();
        QString ss = xml.readElementText(QXmlStreamReader::IncludeChildElements);
        pend = device->pos();

        char line[100];
        int len = pend - pstart;
        device->seek(pstart);
        device->read(line, len);
        device->seek(pend);

This fails because QXmlStreamReader seems to read ahead and buffer my whole input, so pos() always returns the position of the last byte of the raw data.

Re: reading XML node and sub nodes

2nd attempt, using QDomDocument:
Code:

void test2()
{
    const char* rawXML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                         "<root>"
                         " <a>11111</a>"
                         " <a>test 123</a>"
                         "</root>";
    QDomDocument xml("rawXML");
    QByteArray ba(rawXML);
    xml.setContent(ba);

    QDomElement rootElement = xml.documentElement();
    qDebug("root element has %d childs", rootElement.childNodes().count());
    qDebug("root element is %s ", qPrintable(rootElement.nodeName()));

    QDomNodeList a = rootElement.elementsByTagName("a");
    for (int i = 0; i < a.length(); i++) {
        QString aContent;
        QTextStream ts(&aContent);
        a.at(i).save(ts, 0);
        qDebug("- [%s]", qPrintable(aContent));
        QDomNode d = a.at(i);
    }
}
Now, I get not only the content, but the tags as well. This is the output:
Code:

root element has 2 childs
root element is root
- [<a>11111</a>
]
- [<a>
test 123
</a>
]
I assume I can trim the leading "<a>" and the trailing "</a>\n", but that is the same level of ugliness I am trying to avoid.

Any other idea?

EDIT:

I should have tested more before posting. anda_skoa, I re-read your post and changed the code to:
Code:

const char* rawXML = "<root><a>11111</a><a>test 123</a></root>";
// document setup as in test2(); alist holds the <a> elements from elementsByTagName("a")
for (int i = 0; i < alist.length(); i++) {
    QDomNode a = alist.at(i);
    QString aContent;
    QTextStream ts(&aContent);
    a.firstChild().save(ts, 0);
    qDebug("- [%s]", qPrintable(aContent));
}
Which is better:

Quote:

- [11111]
- [test 123
]

Still, I get an extra newline at the end, which is bad for me... but it's much better than before. It seems that QDomDocument is adding extra newlines when parsing. Am I correct? How can I disable this "feature"?

17th August 2014, 23:01
anda_skoa

Re: reading XML node and sub nodes

Does the newline matter? It is still the same XML content, no?

One other thing you could look into is XQuery; Qt has a module for that as well (called Qt XML Patterns).

Cheers,
_
18th August 2014, 06:58
elcuco

Re: reading XML node and sub nodes

I modified the XML source to be on a single line (see the last example). The newline is important, because newlines can be part of the input (although I think they would need to be escaped, so it may be possible to filter the extra ones out). It still bugs me, as it feels like Qt is doing extra work behind my back.

Yes, XmlPatterns is another option... I was looking into it, but as I know nothing about it, I was hoping someone would give me the one-liner I am looking for... :)
18th August 2014, 08:08
anda_skoa

Re: reading XML node and sub nodes

Hmm, I am not sure the newline at this point (after an end tag) is relevant as far as XML goes.

Cheers,
_
18th August 2014, 17:00
elcuco

Re: reading XML node and sub nodes

Quote:

Originally Posted by anda_skoa

Hmm, I am not sure the newline at this point (after an end tag) is relevant as far as XML goes.

Cheers,
_

In theory you are right.
In practice, I am parsing XML files that contain data which (once) was user input, and thus I need to store exactly what the user wrote.

Note to self:
If you ever get into problems ... UUENCODE or base64 that &^%&^% text.... unfortunately this time the format is pre-defined for me.