Hello,
I have some broken XML files in which some XML tags contain whitespace (spaces, tabs, or line breaks).
How can I remove them? I tried the code below, but it doesn't work.
Thank you!
Qt Code:
Your regexp is wrong. It matches only tags consisting of the opening and closing brackets with one or more spaces between them, essentially only empty opening tags: "< >". It also will not match closing tags ("</tag>") or self-closing tags for empty elements ("<tag/>").
But I do not think you can do this with a simple regexp match and a string replace.
It is true that XML element names and attribute names cannot contain whitespace. However, other characters are allowed, such as '.', '_', and '-'. Uppercase and lowercase letters and digits are also allowed (although a name may not start with a digit, '-', or '.').
More importantly, though, XML opening tags can contain attributes, and attributes must be separated by whitespace. Attribute values can contain embedded spaces. You do not want to remove either the whitespace between attributes or the spaces inside attribute values. In addition, attribute values can legally contain '>' characters (and in a broken file possibly '<' as well), so you can't rely on those characters to delimit a tag in a regexp match either.
Finally, you also have to distinguish between a tag name with an embedded space and a tag name followed by a space before an attribute name.
So I think you will have to forget about using a regexp and essentially write a mini XML parser that embeds these rules of XML into its matching and replacement logic. You might be able to write a regexp that matches an entire XML opening or closing tag, but then you would have to parse the content of the tag to ensure that the only spaces you replace are those embedded in the tag name.
Google for "recursive descent XML parsing" for some code you might be able to adapt.
Last edited by d_stranz; 7th November 2021 at 17:04.
<=== The Great Pumpkin says ===>
Please use CODE tags when posting source code so it is more readable. Click "Go Advanced" and then the "#" icon to insert the tags. Paste your code between them.
sophvic (23rd November 2021)
If your problem is that some XML elements have a space in their name (which is indeed forbidden), then you can fix it with an algorithm like this:
1. Find all closing tags with space, extract an element name from them and store those names in a QStringList.
2. For each name in the list make a substitution name (by removing spaces, or replacing them with '_' or whatever you prefer).
3. For each name in the list, run a find-and-replace on "<%1". This ensures that every opening tag with that name is corrected without touching its attributes.
4. Do the same for closing tags ("</%1>").
This is a great idea and seems like it will work around all of the difficulties I mentioned with attributes.