Thread: Is QWebEnginePage::setHtml() synchronous or Asynchronous?

  1. #1
    Join Date
    Jul 2012
    Posts
    201
    Thanks
    26
    Thanked 1 Time in 1 Post
    Qt products
    Qt4
    Platforms
    Windows

    Is QWebEnginePage::setHtml() synchronous or Asynchronous?

    Hi there guys, compliments of the season. I'm creating a web crawler that mines data from websites. I use QNAM (QNetworkAccessManager) to pull the HTML from a website, then create a QWebEnginePage object and set the HTML on it so that I can use QWebEnginePage::runJavaScript() to extract the data I need. The problem is that QWebEnginePage::runJavaScript() doesn't work if I call it immediately after QWebEnginePage::setHtml(). So I resorted to calling it from a lambda connected to the QWebEnginePage::loadFinished() signal. With that change, runJavaScript() retrieves data that is typically at the top of the page (e.g. "document.title;" works), but it returns an empty string for data from the middle to the end of the HTML document. This gives me the impression that I am trying to access elements the page has not loaded yet, which is confusing because the lambda runs on the loadFinished() signal. Please see the code below.
    Qt Code:
    void CPT_Page::getTenderSite()
    {
        page_1 = new QWebEnginePage(this);

        // Connect before calling setHtml() so the loadFinished() signal cannot be missed.
        QObject::connect(page_1, &QWebEnginePage::loadFinished, this, [this]() {
            page_1->runJavaScript("document.getElementById(\"rfqsTable\").innerHTML;",
                                  [](const QVariant &data) {
                qDebug() << data.toString();
            });
        });

        // Decode the reply explicitly; UTF-8 is assumed here.
        page_1->setHtml(QString::fromUtf8(tenderPage_reply->readAll()));

        QWebEngineView *view_1 = new QWebEngineView();
        view_1->setPage(page_1);
        view_1->show();
    }

  2. #2
    Join Date
    Jul 2012
    Posts
    201
    Thanks
    26
    Thanked 1 Time in 1 Post
    Qt products
    Qt4
    Platforms
    Windows

    Anybody...

  3. #3
    Join Date
    Jan 2006
    Location
    Graz, Austria
    Posts
    8,416
    Thanks
    37
    Thanked 1,544 Times in 1,494 Posts
    Qt products
    Qt3 Qt4 Qt5
    Platforms
    Unix/X11 Windows

    setHtml() is definitely asynchronous, because Qt WebEngine is built on Chromium (Blink), which handles web content in a separate process.

    loadFinished() should, however, indicate when the web content has been loaded.

    Maybe the web content also executes scripts that change the DOM tree after loading?

    Cheers,
    _

  4. #4
    Join Date
    Jul 2012
    Posts
    201
    Thanks
    26
    Thanked 1 Time in 1 Post
    Qt products
    Qt4
    Platforms
    Windows

    Quote: Originally posted by anda_skoa

    Maybe the web content also executes scripts that change the DOM tree after loading?

    _
    Thank you for your reply. I suspected that too, because the data I am trying to retrieve is in a jQuery data table. Is there a way to disable the page's scripts and still be able to run my own script on the page? I've read about QWebEngineScript::ScriptWorldId in the documentation, but I am not sure whether that would be my solution.
    Last edited by ayanda83; 10th January 2017 at 14:03.

  5. #5
    Join Date
    Jul 2012
    Posts
    201
    Thanks
    26
    Thanked 1 Time in 1 Post
    Qt products
    Qt4
    Platforms
    Windows

    The line of code below disables the JavaScript that comes with the page, but unfortunately that also means I cannot run my own JavaScript on the same page.
    Qt Code:
    page_1->settings()->setAttribute(QWebEngineSettings::JavascriptEnabled, false);

  6. #6
    Join Date
    Jan 2006
    Location
    Graz, Austria
    Posts
    8,416
    Thanks
    37
    Thanked 1,544 Times in 1,494 Posts
    Qt products
    Qt3 Qt4 Qt5
    Platforms
    Unix/X11 Windows

    Are you sure the table is even there if you disable the script?
    I.e. isn't jQuery creating it?

    Maybe there is some form of script event in the web domain that you can use to trigger your script.

    Cheers,
    _

  7. #7
    Join Date
    Jul 2012
    Posts
    201
    Thanks
    26
    Thanked 1 Time in 1 Post
    Qt products
    Qt4
    Platforms
    Windows

    Quote: Originally posted by anda_skoa
    Are you sure the table is even there if you disable the script?
    I.e. isn't jQuery creating it?

    _
    Because I am using QNAM to get the HTML, I get all the table data and all the page scripts. When I disable scripts, the table is still there as plain HTML, but I am then unable to run my own script. If I enable scripts, something in jQuery interferes with my own script, so I cannot retrieve the data.

    The URL below is the site I am trying to get the data from (i.e. the data inside the table).
    http://web1.capetown.gov.za/web1/TenderPortal/Tender/
    I've been exploring a different approach: stripping all in-page scripts from the HTML with a regular expression before I set the HTML on the QWebEnginePage. The idea is to match everything between script tags and remove it with QString::remove(), but I am still struggling to get the regular expression right. See the code below.
    Qt Code:
    // Decode the reply explicitly; UTF-8 is assumed here.
    QString html = QString::fromUtf8(tenderPage_reply->readAll());
    // Work-in-progress pattern: it does not yet strip the scripts correctly.
    QString temp_html = html.remove(QRegExp("\\b<script.*</script>\b"));

    qDebug() << temp_html;

    QFile linksFile(QDir::currentPath().append("/Include/Program_Files/tenderMainPage2.txt"));

    if (!linksFile.open(QFile::WriteOnly | QFile::Text))
    {
        QMessageBox msgBox_2;
        msgBox_2.setText("Tender Main File did not open...");
        msgBox_2.exec();
        return;
    }

    QTextStream out(&linksFile);
    out << temp_html << endl;

    page_1 = new QWebEnginePage(this);
    //page_1->settings()->setAttribute(QWebEngineSettings::JavascriptEnabled, true);

    // Connect before calling setHtml() so the loadFinished() signal cannot be missed.
    QObject::connect(page_1, &QWebEnginePage::loadFinished, this, [this]() {
        page_1->runJavaScript("document.getElementById(\"rfqsTable\").innerHTML;",
                              QWebEngineScript::MainWorld, [](const QVariant &data) {
            qDebug() << data.toString();
        });
    });

    page_1->setHtml(temp_html);

    QWebEngineView *view_1 = new QWebEngineView();
    view_1->setPage(page_1);
    view_1->show();
    }
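
The pattern in the post above likely fails for three reasons: a single `\b` inside a C++ string literal is a backspace character rather than a word boundary, a word boundary placed before `<` only matches directly after a word character, and `.*` is greedy, so one match can swallow everything from the first opening tag to the last closing tag. Below is a minimal sketch of the stripping step using only the C++ standard library, so that it can be tried outside Qt. The helper name `stripScripts` is hypothetical, and a regular expression is not a real HTML parser, so treat this as an approximation that assumes reasonably well-formed script tags.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns a copy of `html` with every
// <script ...>...</script> element removed.
std::string stripScripts(const std::string &html)
{
    // [\s\S] matches any character, including line breaks.
    // *? is non-greedy, so each match ends at the nearest closing tag.
    // icase lets the pattern match <SCRIPT> as well as <script>.
    static const std::regex scriptElement(
        R"(<script\b[^>]*>[\s\S]*?</script\s*>)",
        std::regex::ECMAScript | std::regex::icase);
    return std::regex_replace(html, scriptElement, "");
}
```

In Qt 5 the same idea should carry over to QRegularExpression with a non-greedy `.*?` plus the CaseInsensitiveOption and DotMatchesEverythingOption pattern options, and QString::remove() accepts a QRegularExpression. Some std::regex implementations may be slow or run out of stack on very large pages, so the Qt class is probably the better choice in the crawler itself.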

  8. #8
    Join Date
    Jan 2006
    Location
    Graz, Austria
    Posts
    8,416
    Thanks
    37
    Thanked 1,544 Times in 1,494 Posts
    Qt products
    Qt3 Qt4 Qt5
    Platforms
    Unix/X11 Windows

    If the content you are passing to setHtml() already contains all the data, one thing you could try is to replace all "http/https" occurrences with a URL scheme that the web engine simply can't load.

    Cheers,
    _
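
The scheme-replacement suggestion above could be sketched roughly as follows, again with plain standard C++ so it can be tested without Qt. The helper names `replaceAll` and `neutraliseUrls` and the scheme `blocked://` are all assumptions made for illustration; any scheme the engine has no handler for should serve the same purpose. Note that this is a blunt text substitution: it also rewrites ordinary links, and it does not touch protocol-relative (`//host/path`) or relative URLs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces every occurrence of `from` in `text` with `to`.
std::string replaceAll(std::string text, const std::string &from, const std::string &to)
{
    if (from.empty())
        return text; // nothing to search for
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size(); // continue after the inserted text
    }
    return text;
}

// Hypothetical helper: rewrites http:// and https:// URLs to an unloadable
// scheme so that external scripts referenced by the page cannot be fetched.
std::string neutraliseUrls(const std::string &html)
{
    // "blocked://" is a made-up scheme chosen for this sketch.
    const std::string out = replaceAll(html, "https://", "blocked://");
    return replaceAll(out, "http://", "blocked://");
}
```

In the crawler, the result would be converted back to a QString (for example with QString::fromStdString()) before being passed to setHtml(). Inline scripts are not affected by this approach, so it only helps when the interfering code is loaded from an external URL.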

  9. The following user says thank you to anda_skoa for this useful post:

    ayanda83 (16th January 2017)
