Qt way of getting/parsing style attributes from html tags? local file batch



toglia3d
10th June 2010, 17:18
I have some old, dirty code I wrote years ago that parses CSS styles from local HTML documents. That code is really, really ugly, and I would like to Qtify it!

I don't need to render the web page at all, just open local files, read them and close them (in batch). How would you guys tackle this?
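For the batch part on its own, here is a minimal standard C++ sketch (no Qt) that opens every `.html` file in one directory, reads it whole and closes it again. The names `loadHtmlBatch` and `readFile` are hypothetical. It assumes a flat directory and a lowercase `.html` extension. In Qt the same loop could be written with QDir and QFile.

```cpp
#include <filesystem>
#include <fstream>
#include <map>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Read an entire file into a string (binary mode keeps the bytes unchanged).
std::string readFile(const fs::path& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

// Open, read and close every *.html file directly inside `dir`.
// Each ifstream is closed automatically when readFile() returns.
// Results are keyed by file name, so std::map keeps them sorted.
std::map<std::string, std::string> loadHtmlBatch(const fs::path& dir) {
    std::map<std::string, std::string> docs;
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.is_regular_file() && entry.path().extension() == ".html") {
            docs[entry.path().filename().string()] = readFile(entry.path());
        }
    }
    return docs;
}
```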

I first thought of using WebKit, something like:

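QWebPage page;
QWebFrame * frame = page.mainFrame();
// Note: "localFile.html" becomes a relative URL with no file:// scheme;
// QUrl::fromLocalFile() is the usual way to build a URL for a local file.
QUrl fileUrl("localFile.html");
// setUrl() starts an asynchronous load, so the document is most likely
// still empty when documentElement() is called right afterwards.
frame->setUrl(fileUrl);
QWebElement document = frame->documentElement();
QWebElementCollection elements = document.findAll("div");
foreach (QWebElement element, elements) {
    std::cout << element.attribute("style").toStdString() << std::endl;
}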
That didn't work... Why?

Can I treat HTML files as well-formed XML? In that case, getting the style attributes is going to be somewhat annoying, since all the styles are declared in the <head>...

Any ideas which Qt classes could help here?
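Without WebKit, the CSS declared in the `<head>` can at least be located by plain string searching. A naive standard C++ sketch follows; `extractStyleBlocks` is a hypothetical name, and it assumes lowercase `<style>` tags and no `</style>` inside comments or strings, so it is not a substitute for a real HTML parser.

```cpp
#include <string>
#include <vector>

// Naive sketch: collect the text between every <style ...> and </style>.
// Plain string searching, not HTML parsing; assumes lowercase tags.
std::vector<std::string> extractStyleBlocks(const std::string& html) {
    std::vector<std::string> blocks;
    std::size_t pos = 0;
    while ((pos = html.find("<style", pos)) != std::string::npos) {
        std::size_t open = html.find('>', pos);          // end of the opening tag
        if (open == std::string::npos) break;
        std::size_t close = html.find("</style>", open); // start of the closing tag
        if (close == std::string::npos) break;
        blocks.push_back(html.substr(open + 1, close - open - 1));
        pos = close + 8;                                  // skip past "</style>"
    }
    return blocks;
}
```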

numbat
10th June 2010, 19:11
This will get you everything (the full computed style of each element), which may not be what you want. I'm not sure of a way to get only the author-specified styles.


#include <QtWebKit>
#include <QApplication>
#include <QtCore>

int main(int argc, char * argv[])
{
    QApplication a(argc, argv);

    QWebPage page;
    QWebFrame * frame = page.mainFrame();
    // setHtml() loads the inline HTML immediately (only external resources
    // load asynchronously), so the DOM can be queried right away.
    frame->setHtml("<html><head><style>div { color:blue; }</style></head>"
                   "<body><div style=\"font-size:120%;\">Hello World!</div></body></html>");
    QWebElement document = frame->documentElement();
    QWebElementCollection elements = document.findAll("div");
    foreach (QWebElement element, elements)
    {
        // Computed style: stylesheet rules, inline style and browser defaults combined.
        QString style = element.evaluateJavaScript("getComputedStyle(this).cssText").toString();
        // cssText has the form "prop: value; prop: value; ..."
        QStringList styles = style.split(';');
        foreach (QString pair, styles)
        {
            QStringList keyvalue = pair.split(':');
            // Skips empty pieces (and any value that itself contains ':').
            if (keyvalue.length() == 2)
                qDebug() << keyvalue.at(0).trimmed() << " = " << keyvalue.at(1).trimmed();
        }
    }
    return 0;
}
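Declaration values can contain colons of their own (for example `url(http://...)`), and a `keyvalue.length() == 2` check drops such declarations silently. A minimal standard C++ sketch that splits each declaration on the first colon only; `parseDeclarations` and `trim` are hypothetical names, and it assumes no `;` appears inside quoted strings.

```cpp
#include <string>
#include <utility>
#include <vector>

// Trim spaces, tabs and line breaks from both ends.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Split "prop: value; prop: value" into (prop, value) pairs.
// Only the FIRST ':' of each declaration separates name from value,
// so values such as url(http://example.com/a.png) stay intact.
std::vector<std::pair<std::string, std::string>> parseDeclarations(const std::string& css) {
    std::vector<std::pair<std::string, std::string>> out;
    std::size_t start = 0;
    while (start <= css.size()) {
        std::size_t end = css.find(';', start);
        if (end == std::string::npos) end = css.size();
        std::string decl = css.substr(start, end - start);
        std::size_t colon = decl.find(':');
        if (colon != std::string::npos) {
            std::string key = trim(decl.substr(0, colon));
            if (!key.empty()) out.emplace_back(key, trim(decl.substr(colon + 1)));
        }
        start = end + 1;
    }
    return out;
}
```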

toglia3d
10th June 2010, 20:08
Thanks numbat. I will check it out when I get home. Seems like you're doing everything I need.

Another thing: is there a more elegant way of reading the file into a QString than the following?


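QFile file("local.html");
file.open(QIODevice::ReadOnly | QIODevice::Text); // the file must be opened before reading
QTextStream in(&file);
QString completeHtml;
while (!in.atEnd()) {
    completeHtml.append(in.readLine()); // note: readLine() strips the line terminator
}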

numbat
10th June 2010, 20:15
You can use readAll() (e.g. QString completeHtml = in.readAll();). Here's a way to get just the rules in the style sheets:


#include <QtWebKit>
#include <QApplication>
#include <QtCore>

int main(int argc, char * argv[])
{
    QApplication a(argc, argv);

    QWebPage page;
    QWebFrame * frame = page.mainFrame();
    frame->setHtml("<html><head><style>div { color:blue; } span {font-weight: bold;}</style></head>"
                   "<body><div style=\"font-size:120%;\">Hello World!</div></body></html>");
    // Walk every stylesheet and collect the declaration text of each rule.
    // push() appends, so rules from later stylesheets don't overwrite earlier ones.
    // The value of the final expression ("out") is what evaluateJavaScript() returns.
    QVariant vt = frame->evaluateJavaScript(
        "var out = []; "
        "for (var j = 0; j < document.styleSheets.length; j++) { "
        "  for (var i = 0; i < document.styleSheets[j].cssRules.length; i++) { "
        "    out.push(document.styleSheets[j].cssRules[i].style.cssText); } } "
        "out;");
    // The JavaScript array arrives as a QVariantList.
    foreach (QVariant v, vt.toList())
    {
        qDebug() << v.toString();
    }
    return 0;
}
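If the stylesheet text is already in hand (for example read from disk) and no browser engine is wanted, the rules can be split out with plain string searching. A naive standard C++ sketch, assuming a flat stylesheet with no comments, no nested `@media` blocks and no braces inside strings; `parseRules` and `trim` are hypothetical names. Unlike the JavaScript version, it keeps each selector next to its declarations.

```cpp
#include <string>
#include <utility>
#include <vector>

// Trim spaces, tabs and line breaks from both ends.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Naive sketch: split a flat stylesheet into (selector, declarations) pairs.
std::vector<std::pair<std::string, std::string>> parseRules(const std::string& css) {
    std::vector<std::pair<std::string, std::string>> rules;
    std::size_t pos = 0;
    for (;;) {
        std::size_t open = css.find('{', pos);   // start of the declaration block
        if (open == std::string::npos) break;
        std::size_t close = css.find('}', open); // end of the declaration block
        if (close == std::string::npos) break;
        rules.emplace_back(trim(css.substr(pos, open - pos)),
                           trim(css.substr(open + 1, close - open - 1)));
        pos = close + 1;
    }
    return rules;
}
```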

toglia3d
10th June 2010, 20:44
Thanks again.

Seriously, Qt has all the nutrients and vitamins a young programmer needs to grow healthy.

wysota
11th June 2010, 10:44
You don't need QtWebKit just to parse an XML file. Use either QXmlStreamReader or QDomDocument, or use the QtXmlPatterns module to query for specific tags in the file (all of these require the HTML to be well-formed XML, e.g. XHTML). If you just want the contents of all "style" attributes in the file, the last approach will be best. Something like this might work:

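#include <QXmlQuery>   // QtXmlPatterns module
#include <QStringList>

QXmlQuery qry;
// The trailing string() turns each attribute node into a string value,
// which is what evaluateTo(QStringList*) expects.
qry.setQuery("doc('file.html')//@style/string()"); // "doc('...')//div/@style/string()" will give you all 'style' attrs from all 'div' tags
// or:
// QFile f("file.html");
// f.open(QFile::ReadOnly|QFile::Text);
// qry.setFocus(&f);
// qry.setQuery("//@style/string()");
QStringList list;
qry.evaluateTo(&list);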
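The XQuery approach needs the file to be well-formed XML. For ordinary HTML that an XML parser rejects, a regex-based fallback (a different technique, not XML parsing) can still pull out quoted `style` attribute values. A rough standard C++ sketch; `extractStyleAttributes` is a hypothetical name, and unquoted values, attributes inside comments or scripts, and names such as `data-font-style` are not handled correctly.

```cpp
#include <regex>
#include <string>
#include <vector>

// Regex fallback, NOT an HTML/XML parser: collect the values of
// style="..." or style='...' attributes, matching the name case-insensitively.
std::vector<std::string> extractStyleAttributes(const std::string& html) {
    static const std::regex attr(R"re(\bstyle\s*=\s*(?:"([^"]*)"|'([^']*)'))re",
                                 std::regex::icase);
    std::vector<std::string> values;
    for (std::sregex_iterator it(html.begin(), html.end(), attr), end; it != end; ++it) {
        const std::smatch& m = *it;
        // Group 1 holds a double-quoted value, group 2 a single-quoted one.
        values.push_back(m[1].matched ? m[1].str() : m[2].str());
    }
    return values;
}
```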

toglia3d
11th June 2010, 10:52
Huh? Please excuse me while I pick my jaw up from the ground.
That's just beautiful.

genomega
14th June 2010, 03:42
Thanks: Found this thread by searching, answered my question.