QXmlQuery XSLT stripping whitespace



simula67
4th June 2009, 06:10
Hello,

I am using QXmlQuery to transform some XML. The problem is that whitespace-only text nodes between transformed elements are discarded. A ready-to-run example is included below. In this example, the spaces between the <b> elements are stripped and a newline is added. If there are non-whitespace characters between the <b> elements, nothing is stripped and no newline is added. Likewise, if two <b> elements are adjacent with no whitespace separating them, a newline is added.

I have looked through the docs for anything related to normalizing whitespace, but my searches have thus far turned up empty.

Thanks for any help,
Mark


#include <QtCore/QCoreApplication>

#include <QXmlQuery>
#include <QXmlFormatter>
#include <QBuffer>

#include <iostream>

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);

    // Source document; some <b> elements are separated only by a space.
    QString content_str = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        "<article>Hey <b>txt</b> <b>txt</b><b>yes</b> <b>no</b> and <b>or</b> normal text</article>";
    QBuffer content_buffer;
    content_buffer.open(QBuffer::ReadWrite);
    content_buffer.write(content_str.toUtf8());
    content_buffer.reset();

    // Stylesheet: wraps <article> in a <div> and turns each <b> into <strong>.
    QString transform_str = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        "<xsl:stylesheet version=\"2.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        " <xsl:template match=\"/\"><xsl:apply-templates/></xsl:template>"
        " <xsl:template match=\"article\"><div id=\"article\"><xsl:apply-templates/></div></xsl:template>"
        " <xsl:template match=\"b\"><strong><xsl:apply-templates/></strong></xsl:template>"
        "</xsl:stylesheet>";
    QBuffer transform_buffer;
    transform_buffer.open(QBuffer::ReadWrite);
    transform_buffer.write(transform_str.toUtf8());
    transform_buffer.reset();

    // Define query.
    QXmlQuery query(QXmlQuery::XSLT20);
    query.setFocus(&content_buffer);
    query.setQuery(&transform_buffer);

    // Evaluate query to an output buffer.
    QBuffer output_buffer;
    output_buffer.open(QBuffer::ReadWrite);
    QXmlFormatter output_formatter(query, &output_buffer);
    output_formatter.setIndentationDepth(0);
    query.evaluateTo(&output_formatter);
    output_buffer.reset();

    // The buffers were written as UTF-8, so decode the result the same way.
    QString output_xml_str = QString::fromUtf8(output_buffer.data());

    std::cout << output_xml_str.toStdString() << std::endl;

    // Output is:
    //<div id="article">Hey <strong>txt</strong>
    //<strong>txt</strong>
    //<strong>yes</strong>
    //<strong>no</strong> and <strong>or</strong> normal text</div>

    // No event loop is needed here; a.exec() would block and never return.
    return 0;
}

faldzip
4th June 2009, 07:33
... I have whitespace text nodes ...
As I see it, there are no such text nodes. You just have some nodes with whitespace between them. What you want is for this XML:


<root>
    <node>some text</node>
    <node>some other text</node>
</root>
to mean something different from this:


<root>
<node>some text</node>
<node>some other text</node>
</root>
or this:


<root>



<node>some text</node>



<node>some other text</node>



</root>
But they all have the same XML structure and mean exactly the same as:


<root><node>some text</node><node>some other text</node></root>
and this is the way they will be processed.
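
Whether these forms really are equivalent depends on whether the whitespace between the tags is kept as text. A minimal sketch in plain standard C++ (not Qt) can make those runs visible. The helpers `text_runs`, `is_whitespace_only`, and `count_whitespace_runs` are hypothetical names introduced only for illustration, and the scanner assumes simple tags with no comments, CDATA sections, or entities:

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a Qt API): collects the character runs that sit
// between tags. Assumes plain tags only; no comments, CDATA, or entities.
std::vector<std::string> text_runs(const std::string &xml)
{
    std::vector<std::string> runs;
    std::string current;
    bool in_tag = false;
    for (char c : xml) {
        if (c == '<') {
            // A tag starts: close off the text run collected so far.
            if (!current.empty())
                runs.push_back(current);
            current.clear();
            in_tag = true;
        } else if (c == '>') {
            in_tag = false;
        } else if (!in_tag) {
            current += c;
        }
    }
    if (!current.empty())
        runs.push_back(current);
    return runs;
}

// True when the run is non-empty and consists solely of whitespace.
bool is_whitespace_only(const std::string &s)
{
    for (unsigned char c : s)
        if (!std::isspace(c))
            return false;
    return !s.empty();
}

// Counts the whitespace-only runs found between tags.
std::size_t count_whitespace_runs(const std::string &xml)
{
    std::size_t n = 0;
    for (const std::string &run : text_runs(xml))
        if (is_whitespace_only(run))
            ++n;
    return n;
}
```

Under these assumptions, `count_whitespace_runs` gives 0 for the single-line form and 3 for the form with one line per tag, so the two are only interchangeable if the consumer chooses to ignore whitespace-only text.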

simula67
4th June 2009, 08:50
Hello faldżip,

You are wrong concerning the importance of whitespace within an XML document. The rules that specify which content must be kept and which can be discarded within an XML document are known as Canonical XML. The Canonical XML specification clearly states that whitespace between elements within the document element must be retained and cannot be discarded.

http://www.w3.org/TR/xml-c14n#Example-WhitespaceInContent

This is especially crucial for mixed content schemas such as XHTML where "<em>hello</em> <em>hello</em>" is a very different beast than "<em>hello</em><em>hello</em>". Whitespace is content and stripping it is the destruction of said content.
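
The difference can be sketched in plain standard C++ by concatenating the character data of each fragment. The helper `text_content` is a hypothetical name used only for this illustration (it is not a Qt or XSLT function), and it assumes simple tags with no comments, CDATA sections, or entities:

```cpp
#include <string>

// Hypothetical helper: returns the character data of a fragment by dropping
// everything between '<' and '>'. Whitespace between tags is kept as-is.
std::string text_content(const std::string &xml)
{
    std::string out;
    bool in_tag = false;
    for (char c : xml) {
        if (c == '<')
            in_tag = true;
        else if (c == '>')
            in_tag = false;
        else if (!in_tag)
            out += c;
    }
    return out;
}
```

With this sketch, `text_content("<em>hello</em> <em>hello</em>")` yields `hello hello`, while `text_content("<em>hello</em><em>hello</em>")` yields `hellohello`, which is the distinction a reader of the rendered page would see.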

Thanks,
Mark