Parsing HTML document problem.



ayanda83
8th December 2016, 07:30
Here (http://web1.capetown.gov.za/web1/TenderPortal/Tender) is a page I am trying to parse. I am trying to extract the "Tender numbers" from the HTML table on the page, which I can do just fine. The problem is that the table is split into 5 segments, which are navigated using the segment numbers at the bottom of the table. At the moment my program extracts only the "Tender numbers" in segment 1 and none from the other segments. When I open the page in a regular web browser, the entire table appears in a single HTML source file, not segmented, so I was under the impression that if I used QWebEnginePage::runJavaScript() on the page to extract the "Tender numbers", the function would get all of them. How can I solve this problem? Thank you in advance.
void Cape_Town_Page::mine_to_tenders()
{
    QFile tenderRefs_file("C:/Users/C5248134/Desktop/Projects/Ithala/Include/tender_refs.txt");

    if(!tenderRefs_file.open(QFile::WriteOnly | QFile::Text))
    {
        qDebug() << "Tender refs file did not open ---- " << tenderRefs_file.errorString();
    }
    else
        qDebug() << "opened";

    // Declared for writing the results. Both go out of scope when this
    // function returns, which is before the asynchronous callback below runs.
    QTextStream out(&tenderRefs_file);
    QStringList refs_list;

    if(this->url().toString() == "http://web1.capetown.gov.za/web1/TenderPortal/Tender")
    {
        // Collect the text of the first cell of every row with class "gridDetails".
        QString refs_script;

        refs_script.append("function getRefs(){"
                           "var refs = document.body.querySelectorAll(\"tr.gridDetails\");"
                           "var refArr = [];"
                           "var i;"
                           "for(i = 0; i < refs.length; i++){"
                           "refArr[i] = refs[i].firstElementChild.textContent;"
                           "}"
                           "return refArr;"
                           "}"
                           "getRefs();"
                           );

        // runJavaScript() is asynchronous, so the callback must not capture
        // locals of this function by reference; it only needs its argument.
        runJavaScript(refs_script, [](const QVariant &data){
            qDebug() << data.toStringList();
        });
    }
    else
        return;
}
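
Since the script only sees the rows of the segment currently shown, one possible approach is to visit the 5 segments one after another and merge the results. The following is only a sketch of that control flow in plain C++, without Qt: FetchSegment is a hypothetical stand-in for "show segment N, then call runJavaScript() on it", and collectSegments is an illustrative name, not part of any Qt API. How a segment is actually selected on that page (a link click, a reload, or something else) is not known here and is left to the stand-in.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Callback that receives the tender numbers found on one segment.
using RowsCallback = std::function<void(const std::vector<std::string>&)>;

// Hypothetical stand-in for the asynchronous step: show the given segment,
// run the extraction script on it, and pass the rows to `done`.
using FetchSegment = std::function<void(int segment, RowsCallback done)>;

// Visits segments `segment`..`last` one at a time. The next segment is
// requested only from inside the previous segment's callback, so the rows are
// appended in order. Everything the callback needs is captured by value
// (the accumulator through a shared_ptr), so nothing dangles if the callback
// runs after the caller has returned.
void collectSegments(FetchSegment fetch, int segment, int last,
                     std::shared_ptr<std::vector<std::string>> all,
                     RowsCallback finished)
{
    assert(segment >= 1);

    if (segment > last) {
        finished(*all);   // every segment has been visited
        return;
    }

    fetch(segment, [=](const std::vector<std::string>& rows) {
        all->insert(all->end(), rows.begin(), rows.end());
        collectSegments(fetch, segment + 1, last, all, finished);
    });
}
```

In the real program the call would start at segment 1 with last set to 5, and the finished callback would be the place to write the collected numbers to tender_refs.txt, opening the file there rather than in the function that starts the chain.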