QTextEdit <body> tags text only



certqt
22nd January 2011, 12:28
I am storing some simple rich text (HTML) strings in a database. They are entered through a QTextEdit widget and written via a QDataWidgetMapper. I need the formatting attributes (bold, italic, underline, bullets) that rich text provides.

With the default implementation, each record stores the whole QTextDocument as a complete HTML document, which is somewhat wasteful, as it probably only contains about 10 or so words of useful text. Given that all entries in the database will use the same DOCTYPE, CSS etc., I would prefer to store only the data between the <BODY> </BODY> tags.

My idea is to subclass QTextEdit and implement a toHtmlBody() (or just reimplement toHtml()); however, I haven't found a neat way of pulling just the body out of the document.

So far I have tried QXmlStreamReader and various regexes I found on the web.

Any ideas are most welcome!

Lykurg
22nd January 2011, 12:35
A fast way would be to reimplement the toHtml() method and use regular expressions to extract everything between the body tags. But then you will still have a lot of garbage. The best solution would probably be to create a custom method and use QTextCursor to transform the text in the way you want (only b and i tags, for example).
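
The regular-expression route could look roughly like the sketch below. It is only an illustration under stated assumptions: extractBody() is a hypothetical helper name, not part of Qt, and it uses std::regex on a std::string so that it compiles without Qt. It assumes a single, well-formed body element and a short document, which is what a small QTextEdit normally produces.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not a Qt API): returns the markup between
// <body ...> and </body>, or the input unchanged if no body element is found.
std::string extractBody(const std::string &html)
{
    // [\s\S] matches any character including newlines; in ECMAScript
    // syntax '.' does not. The lazy *? stops at the first closing tag,
    // and icase lets the pattern accept <BODY> as well as <body>.
    static const std::regex bodyRe(R"(<body[^>]*>([\s\S]*?)</body\s*>)",
                                   std::regex::icase);

    std::smatch match;
    if (std::regex_search(html, match, bodyRe))
        return match[1].str();  // capture group 1 = content of the body element

    return html;  // no body element: treat the input as already being a fragment
}
```

In a QTextEdit subclass, the input would presumably come from toHtml(), converted for example with toUtf8(), and the result converted back with QString::fromUtf8(). The stored fragment still contains whatever markup Qt generated inside the body, so this only strips the shared header; it does not clean up pasted content.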

certqt
22nd January 2011, 12:49
Thanks!


A fast way would be to reimplement the toHtml() method and use regular expressions to extract everything between the body tags. But then you will still have a lot of garbage.

Yep, this is what I am trying; I will keep looking for a suitable regular expression (if anyone has one, that would be great!).

I am curious, though, what other garbage I might get. So far I haven't seen anything except the tags I am expecting, but that might be because I am only typing and formatting. I guess pasting could be a problem.

Lykurg
22nd January 2011, 14:11
If you allow pasting, then you have to take care of a lot of stuff, so maybe it is an option for you to use e.g. HTML Tidy (http://tidy.sourceforge.net/).