I thought QDomDocument could do the job, but it just can't handle HTML... at least I cannot make it work.
Any ideas?
Thanks
What is the job? QDomDocument is built to work with XML. HTML, in general, is not well-formed XML, and XML parsers will generally choke on it.
Our goal is to load a web page containing a table of data that we must import into a MySQL database.
The web page and the table will never change in their structure.
Will QTextEdit::setHtml or QWebView::setHtml be of some use to you?
I should have pointed out that this extraction process has to be automated... The application will be a daemon that loads the web page every morning at 1 am.
I am currently looking at QWebPage but am having some difficulty using the QtWebKit module... it looks like I always have to go through QWebPage, then QWebFrame, then QWebElement...
Not sure I am on the right track.
Maybe you can use XQuery; see http://doc.qt.nokia.com/4.7-snapshot/qxmlquery.html. If your document is not valid XML, it might be even better to parse it with a regular expression and extract the required information.
XQuery adds some complexity to your project, as you need to understand it first ;-) Here is a tutorial: http://www.w3schools.com/xquery/default.asp.
Good luck!
If the web page will never change, then just slurp the HTML into a string and parse it yourself, perhaps using regular expressions. Or, if the page section containing whatever you're interested in conforms to XML specifications, extract that and hand it off to the XML parser for final processing.
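As a rough illustration of the "extract that section and hand it off" idea, here is a minimal sketch in plain C++ that cuts the first `<table>...</table>` fragment out of a page already loaded into a string. The function name `extractTable` is made up for this example, and it assumes lowercase tags and no nested tables, which may not hold for the real page. In Qt the same slicing could presumably be done with `QString::indexOf` and `QString::mid`.

```cpp
#include <string>

// Hypothetical helper: returns the first <table>...</table> fragment of html,
// or an empty string if no complete table is found.
// Assumptions: tag names are lowercase and tables are not nested.
std::string extractTable(const std::string& html)
{
    const std::string closeTag = "</table>";

    // "<table" without the closing '>' so that attributes are tolerated.
    const std::string::size_type start = html.find("<table");
    if (start == std::string::npos)
        return std::string();

    const std::string::size_type end = html.find(closeTag, start);
    if (end == std::string::npos)
        return std::string();

    // Include the closing tag in the returned fragment.
    return html.substr(start, end + closeTag.size() - start);
}
```

Whether the fragment can then go straight to an XML parser depends on the page: unclosed tags or unquoted attributes inside the table would still make the parser fail.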
One thing is for sure: we cannot use QRegExp, as it does not fully support standard regular expression syntax...
Again, assuming we want to extract "this is a test" from "<TH>this is a test</TD>": QRegExp handles lookahead but not its backward equivalent (lookbehind), so it is possible to say "return the string that is immediately followed by </TD>" but not "return the string that immediately follows <TH>".
So at best I could extract the following string:
"<TH>this is a test"
Does anyone know how to do this?
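One possible way around the missing lookbehind is a capture group: match the opening tag, the text, and the closing tag together, then read only the parenthesised part. The sketch below uses `std::regex` rather than QRegExp so that it is self-contained; the function name `cellTexts` is invented for the example, and it assumes uppercase TH/TD tags and cells that contain plain text only. QRegExp also supports capture groups (via `indexIn()` and `cap(1)`), so the same pattern idea should carry over, although that variant is not shown or tested here.

```cpp
#include <regex>
#include <string>
#vector_placeholder
```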