Thread: QtXml/QDomDocument and invalid attribute syntax

  1. #1

    QtXml/QDomDocument and invalid attribute syntax

    Hi,

    I am using QDomDocument to parse our XML files. It works nicely and fast, except for one big problem.

    Some of the XML files use invalid syntax in attribute values, like this:

    Qt Code:
    <xml>
    <node someAttribute="this is <b>text</b>" />
    </xml>

    As you can see, there are "<" and ">" characters in the attribute value. Our old XML parser had no problems with this, since they are inside quotes.

    I can't change all the XML files by hand because we get a ton of them every day from our customers. So, does anyone have any idea how I can solve this problem?

    QDomDocument doesn't work with this: it just stops reading the XML at the first node with a "<" in the attribute value.

    Thanks,
    Aya

  2. #2

    Re: QtXml/QDomDocument and invalid attribute syntax

    Well... your file is not XML, it's "XML-like". Every proper XML parser should bail out on this. What you could do is prescan the whole file with a regular expression and convert all invalid attributes to valid ones before passing the text to an XML parser. Something like:
    Qt Code:
    // text is assumed to be a QString holding the raw file contents
    QMap<QString,QString> replacements;
    replacements.insert("<", "&lt;");
    replacements.insert(">", "&gt;");
    // etc
    QMapIterator<QString,QString> iter(replacements);
    while(iter.hasNext()){
        iter.next(); // advance first; next() returns an item, not a bool
        text.replace(iter.key(), iter.value());
    }

    Of course this is an oversimplification: it replaces all angle brackets, including the ones that delimit tags, and that's certainly not what you want. You have to detect attributes first (using regular expressions) and only operate on their contents.

    Maybe something like this?
    Qt Code:
    QRegExp rx("([A-Za-z]+)\\s*=\\s*\"([^\"]+)\""); // cap(1) contains attr name, cap(2) contains value
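
    The idea of detecting attributes first and escaping only their values could be sketched roughly as follows. This is only an illustration under a few assumptions: it uses std::regex instead of QRegExp so that it builds without Qt, the helper names escapeValue and escapeAttributeValues are made up for this example, and it only handles double-quoted values with "<" and ">" in them.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: escape angle brackets inside one attribute value.
// A raw '&' is deliberately left alone so existing entities are not double-escaped.
static std::string escapeValue(const std::string &value)
{
    std::string out;
    out.reserve(value.size());
    for (char c : value) {
        switch (c) {
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        default:  out += c;      break;
        }
    }
    return out;
}

// Hypothetical helper: rewrite every name="value" pair and copy all other text unchanged.
std::string escapeAttributeValues(const std::string &text)
{
    // Same idea as the pattern above: group 1 is the name plus '=', group 2 is the value.
    static const std::regex attr("([A-Za-z]+\\s*=\\s*)\"([^\"]+)\"");

    std::string out;
    std::size_t last = 0; // end of the previous match within text
    for (auto it = std::sregex_iterator(text.begin(), text.end(), attr);
         it != std::sregex_iterator(); ++it) {
        const std::smatch &m = *it;
        const std::size_t pos = static_cast<std::size_t>(m.position(0));
        out.append(text, last, pos - last);               // text between matches
        out += m.str(1) + "\"" + escapeValue(m.str(2)) + "\"";
        last = pos + static_cast<std::size_t>(m.length(0));
    }
    out.append(text, last, std::string::npos);            // trailing text
    return out;
}
```

    The result of escapeAttributeValues() would then be handed to the XML parser instead of the raw file contents. Note that the pattern is naive: single-quoted attributes are skipped, and any name="value" sequence that appears in element text rather than inside a tag would be rewritten as well.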

