Thread: Parsing html via QT after update

  1. #1
    Join Date
    Jul 2014
    Posts
    16
    Qt products
    Qt5
    Platforms
    Windows

    Default Parsing html via QT after update

    Hello. I was using this code with Qt 5.3.1 (32-bit),
    but now I have Qt 5.3.2 (64-bit).
    Qt Code:
        std::string html = std::move(output.buffer); // html from curl - all ok
        QWebPage * tmp_webpage = new QWebPage();
        tmp_webpage->mainFrame()->setHtml(QString::fromStdString(html));
        std::fstream test_stream;
        test_stream.open("example14.html", std::ios::out | std::ios::in);
        test_stream << tmp_webpage->mainFrame()->toHtml().toStdString(); // html was cut in about 50%
        test_stream.close();

        QWebFrame * tmp_frame = tmp_webpage->mainFrame();
        QWebElement mainTable_site = tmp_frame->findFirstElement(QString::fromStdString(mainTable_selector)); // not found because qt cut my correct html

    If necessary, I can share my HTML (but trust me, it is fine). After updating Qt to version 5.3.2 (64-bit), something is going wrong.

    Best regards

  2. #2
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,359
    Thanks
    3
    Thanked 5,015 Times in 4,792 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Default Re: Parsing html via QT after update

    What if you convert to something that can accept non-ASCII input instead of std::string, which will cut the text at the first occurrence of a null character? E.g. stay with QString?
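A minimal sketch of the embedded-null concern, with one nuance: `std::string` itself can store `'\0'` bytes, so a cut at the first null usually comes from constructing the string from a bare `const char*` rather than from the type. The thread does not show how `output.buffer` is filled, so whether this applies here is an assumption. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Construct from a bare pointer: copying stops at the first '\0'.
std::string fromPointerOnly(const char* data) {
    return std::string(data);
}

// Construct from pointer plus explicit length: all bytes are copied,
// including any embedded '\0'.
std::string fromPointerAndLength(const char* data, std::size_t length) {
    return std::string(data, length);
}

// Usage: a 17-byte buffer with a '\0' after the first 8 bytes.
void demonstrateEmbeddedNull() {
    const char raw[] = "<p>a</p>\0<p>b</p>";
    const std::size_t length = sizeof(raw) - 1;  // exclude the literal's terminator
    assert(fromPointerOnly(raw).size() == 8);
    assert(fromPointerAndLength(raw, length).size() == length);
}
```

If a download callback appends data with an explicit length, the buffer keeps every byte and the string type is unlikely to be the cause.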
    Your biological and technological distinctiveness will be added to our own. Resistance is futile.

    Please ask Qt related questions on the forum and not using private messages or visitor messages.


  3. #3
    dram (Join Date: Jul 2014)

    Originally posted by wysota:
        What if you convert to something that can accept non-ASCII input instead of std::string, which will cut the text at the first occurrence of a null character? E.g. stay with QString?
    Hmm, like this?
    Qt Code:
        QString html = std::move(output.buffer);
        QWebPage * tmp_webpage = new QWebPage();
        tmp_webpage->mainFrame()->setHtml(html);
        std::fstream test_stream;
        test_stream.open("example14.html", std::ios::out | std::ios::in);
        test_stream << tmp_webpage->mainFrame()->toHtml().toStdString(); //only toHtml is not enough
        test_stream.close();

    This gives the same result ^^

    But this:

    Qt Code:
        QString html = std::move(output.buffer);
        QWebPage * tmp_webpage = new QWebPage();
        tmp_webpage->mainFrame()->setHtml(html);
        std::fstream test_stream;
        test_stream.open("example14.html", std::ios::out | std::ios::in);
        test_stream << html.toStdString();// tmp_webpage->mainFrame()->toHtml().toStdString();

    writes the correct HTML to test_stream.

    The HTML is cut after
    Qt Code:
        </script>
    and the code appended in its place is
    Qt Code:
        </div></div></body></html>
    whereas the original HTML continues after </script> with
    Qt Code:
        <table "class">....


    And at the end:

    Qt Code:
        std::cout << html.size() << " vs " << tmp_webpage->mainFrame()->toHtml().length();

    Prints: 13876 vs 8509 ... where is the rest of my HTML?
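One way to look at what sits at the cut point is to dump the raw bytes that follow `</script>` in the downloaded buffer, with anything non-printable shown as a hex escape. This is only a diagnostic sketch: the helper names are hypothetical, and it assumes the cut follows the last `</script>` in the buffer, which the thread does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Render bytes so that anything outside printable ASCII becomes \xNN.
std::string escapeNonPrintable(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        if (c >= 0x20 && c < 0x7f) {
            out += static_cast<char>(c);
        } else {
            char buf[5];  // backslash, 'x', two hex digits, terminator
            std::snprintf(buf, sizeof(buf), "\\x%02x", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Return up to `count` bytes (escaped) that follow the last occurrence of
// `marker`, or an empty string if the marker is absent.
std::string bytesAfterLast(const std::string& data, const std::string& marker,
                           std::size_t count) {
    const std::size_t pos = data.rfind(marker);
    if (pos == std::string::npos) {
        return std::string();
    }
    return escapeNonPrintable(data.substr(pos + marker.size(), count));
}

// Usage: 27 bytes with '\0' and '\x01' hidden between two tags.
void demonstrateBytesAfterLast() {
    const std::string data("<script>x</script>\0\x01<table>", 27);
    assert(bytesAfterLast(data, "</script>", 4) == "\\x00\\x01<t");
}
```

Control bytes or malformed markup showing up right after the marker would be worth including in a bug report or a minimal test case.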
    Last edited by dram; 14th October 2014 at 00:08.

  4. #4
    wysota (Join Date: Jan 2006)

    Start by replacing fstream with QFile (and possibly QTextStream if you want to stream the text in) and do not convert via std::string.
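A side note on the `std::ios::out | std::ios::in` mode used in the earlier snippets, sketched with the standard library only: that mode fails to open a file that does not exist yet and does not truncate one that does, so old bytes can survive past the end of a shorter write. This would not by itself explain output that is too short, and the helper names and scratch file name below are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Same open mode as in the thread: read/write, no truncation.
// Returns false if the file could not be opened or written.
bool writeWithInOut(const std::string& path, const std::string& text) {
    std::fstream stream(path.c_str(), std::ios::in | std::ios::out);
    if (!stream.is_open()) {
        return false;
    }
    stream << text;
    return stream.good();
}

// Write-only with truncation: creates the file and discards old content.
bool writeWithTrunc(const std::string& path, const std::string& text) {
    std::ofstream stream(path.c_str(), std::ios::out | std::ios::trunc);
    if (!stream.is_open()) {
        return false;
    }
    stream << text;
    return stream.good();
}

// Read the whole file back as raw bytes.
std::string readAll(const std::string& path) {
    std::ifstream stream(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(stream),
                       std::istreambuf_iterator<char>());
}

void demonstrateOpenModes() {
    const std::string path = "fstream_mode_demo.txt";  // hypothetical scratch file
    std::remove(path.c_str());
    const bool openedMissingFile = writeWithInOut(path, "new");
    const bool wroteFresh = writeWithTrunc(path, "AAAAAAAAAA");
    const bool overwroteStart = writeWithInOut(path, "BB");
    const std::string result = readAll(path);
    std::remove(path.c_str());
    assert(!openedMissingFile);            // in|out needs an existing file
    assert(wroteFresh && overwroteStart);
    assert(result == "BBAAAAAAAA");        // only the first two bytes changed
}
```

Checking `is_open()` after opening, whichever file API is used, at least rules out the case where nothing was written at all.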

    However, the real question here is why you are using curl to download an HTML document and then setting it on a browser object that could download the document by itself.


  5. #5
    dram (Join Date: Jul 2014)

    wysota, thanks for the reply.

    I'm using curl because I log in to the site and do many other things. I thought curl was the best way.
    I set the HTML on a browser object only to parse it.

    OK, let's see. Here is the code with QFile:

    Qt Code:
        QByteArray html = std::move(output.buffer);
        QByteArray html_second;

        QWebPage tmp_webpage;// = new QWebPage();
        //tmp_webpage->mainFrame()->setHtml(html); // result was the same in setcontent / sethtml
        tmp_webpage.mainFrame()->setContent(html);

        html_second.append(tmp_webpage.mainFrame()->toHtml());
        QFile test_stream("example14.html");
        test_stream.open(QIODevice::ReadWrite | QIODevice::Truncate);
        test_stream.write(html_second);
        test_stream.close();

    Still not solved.

    Something is wrong with setHtml/setContent:
    Qt Code:
        QByteArray html = std::move(output.buffer);
        test_stream.write(html);
    Here the variable "html" holds the correct HTML; the content changes only after setContent and toHtml.
    Last edited by dram; 14th October 2014 at 09:57.

  6. #6
    wysota (Join Date: Jan 2006)

    Does the page contain the proper content when you view it in QWebView? Or is it already truncated?


  7. #7
    dram (Join Date: Jul 2014)

    I don't think so, because the plain text is cut as well.

    I don't display a QWebView in my program.

    All of these operations are done in the background.

  8. #8
    wysota (Join Date: Jan 2006)

    Originally posted by dram:
        I don't display a QWebView in my program.
    So do it just for the test.


  9. #9
    dram (Join Date: Jul 2014)

    I do not understand why I have to do that.

    Look: in the original HTML I have

    Qt Code:
        <table id="production_table" >

    But after setHtml, this element and everything after it is cut.

    So

    Qt Code:
        std::string mainTable_selector = "table[id=\"production_table\"]";
        QWebElement mainTable_site = tmp_frame->findFirstElement(QString::fromStdString(mainTable_selector));
    returns a null element, so the element is not there.

    But remember, my code was working on 32-bit with version 5.3.1 ...

  10. #10
    wysota (Join Date: Jan 2006)

    I doubt the architecture has anything to do with this. What happens if you save the downloaded content in a file and then point QWebPage directly to that file?

    Does this work?

    Qt Code:
        QByteArray html = output.buffer;
        QFile file("file.html");
        file.open(QIODevice::WriteOnly|QIODevice::Truncate);
        file.write(html);
        file.close();
        QWebPage page;
        connect(page.mainFrame(), &QWebFrame::loadFinished, [&]() { qDebug() << page.mainFrame()->findAll("#production_table").count(); }); // CONFIG+=c++11 for lambda to work
        page.mainFrame()->setUrl(QUrl::fromLocalFile("file.html"));
        loop.exec();


  11. #11
    dram (Join Date: Jul 2014)

    But everything broke when I changed the Qt version (5.3.1 -> 5.3.2).

    I changed

    Qt Code:
        connect(page.mainFrame(), &QWebFrame::loadFinished, [&]() { qDebug() << page.mainFrame()->findAll("#production_table").count(); }); // CONFIG+=c++11 for lambda to work
    to
    Qt Code:
        connect(page.mainFrame(), &QWebFrame::loadFinished, [&]() { qDebug() << page.mainFrame()->findAllElements("#production_table").count(); } ); // CONFIG+=c++11 for lambda to work

    because QWebFrame has no findAll method.

    In the console I get '0'.

    But the thread hangs on loop.exec();
    Last edited by dram; 14th October 2014 at 16:33.

  12. #12
    wysota (Join Date: Jan 2006)

    Does file.html contain the expected content or is it truncated?


  13. #13
    dram (Join Date: Jul 2014)

    It contains the expected content, wysota.

    Maybe QWebFrame from 5.3.1 (but on 64-bit) would fix my problem?

    Could you tell me where I can find the 5.3.1 (64-bit) version?

    Or should I take only QWebFrame from version 5.3.1?

    Did the Qt team change some code in QWebFrame in the update from 5.3.1 to 5.3.2?
    Last edited by dram; 14th October 2014 at 17:27.

  14. #14
    wysota (Join Date: Jan 2006)

    Could you attach the file here? Just use the exact same file saved with QFile, do not copy and paste the content.
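The point of sending the exact file is that copy and paste can silently drop or alter bytes. As a hedged sketch, a saved file can be checked byte for byte against the in-memory buffer as shown below. The helper names and scratch file name are hypothetical, and the buffer is modelled as a plain `std::string`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Read a file as raw bytes (binary mode, no newline translation).
std::string readBinary(const std::string& path) {
    std::ifstream stream(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(stream),
                       std::istreambuf_iterator<char>());
}

// Offset of the first differing byte, or std::string::npos if both
// sequences are identical. A length difference counts as a mismatch
// at the end of the shorter one.
std::size_t firstMismatch(const std::string& a, const std::string& b) {
    const std::size_t common = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < common; ++i) {
        if (a[i] != b[i]) {
            return i;
        }
    }
    return a.size() == b.size() ? std::string::npos : common;
}

void demonstrateByteComparison() {
    const std::string path = "byte_compare_demo.bin";  // hypothetical scratch file
    const std::string original("<html>\0</html>", 14);  // includes an embedded '\0'
    {
        std::ofstream out(path.c_str(), std::ios::binary | std::ios::trunc);
        out.write(original.data(), static_cast<std::streamsize>(original.size()));
    }
    const std::string onDisk = readBinary(path);
    std::remove(path.c_str());
    assert(firstMismatch(original, onDisk) == std::string::npos);
    assert(firstMismatch(original, "<html></html>") == 6);  // pasted copy lost the '\0'
}
```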


  15. #15
    dram (Join Date: Jul 2014)

    https://www.dropbox.com/s/1ivmgv8fyq...file.html?dl=0

    (By the way, where can I find Qt 5.3.1 for 64-bit Windows, MSVC 2013 x64? I found "Windows 32-bit, MSVC 2013 64-bit", but where is the Windows 64-bit one?)
    Last edited by dram; 14th October 2014 at 22:44.

  16. #16
    dram (Join Date: Jul 2014)

    (I can't edit my previous post.) I think setContent/setHtml is failing. After the update to 5.3.2 (64-bit) my code crashed...

    Remember, everything was working before the update.

    Now I have this strange problem. Maybe someone from Qt support can answer: why is the HTML truncated after the setHtml call?

  17. #17
    wysota (Join Date: Jan 2006)

    Originally posted by dram:
        (I can't edit my previous post.) I think setContent/setHtml is failing. After the update to 5.3.2 (64-bit) my code crashed...
    I really doubt this is caused by setHtml.

    This is a test app where your page seems to work fine:

    Qt Code:
        #include <QtWidgets>
        #include <QWebView>
        #include <QWebPage>
        #include <QWebInspector>
        #include <QWebSettings>

        int main(int argc, char **argv) {
            QApplication app(argc, argv);
            QWebView view;
            QWebInspector inspector;
            QWebPage page;
            page.settings()->setAttribute(QWebSettings::DeveloperExtrasEnabled, true);
            view.setPage(&page);
            view.setUrl(QUrl::fromLocalFile("/path/to/file.html"));
            // view.setUrl(QUrl("http://www.google.com"));
            inspector.setPage(&page);
            inspector.show();
            view.show();
            return app.exec();
        }

    It requires /path/to/file.html to point to the file downloaded with curl.

    I tested it with Qt 5.3.2 on 64 bit system (Linux though, but it shouldn't matter).


  18. #18
    dram (Join Date: Jul 2014)

    But how do I do this lookup
    Qt Code:
        QWebElement mainTable_site = tmp_frame->findFirstElement(QString::fromStdString(mainTable_selector));

    in our code?

    Is there a meaningful answer as to why my code doesn't work after the update?

  19. #19
    wysota (Join Date: Jan 2006)

    Originally posted by dram:
        But how do I do this lookup
        Qt Code:
            QWebElement mainTable_site = tmp_frame->findFirstElement(QString::fromStdString(mainTable_selector));

        in our code?

        Is there a meaningful answer as to why my code doesn't work after the update?
    Right now I was trying to verify whether setHtml is broken or not. It seems it is not, so you have to look for the problem in your code and not in Qt. So far I have not seen any complete piece of your code, so it is hard for me to test it. It might help if you prepared a minimal compilable example reproducing the problem, similar to what I did. I can only give you a hint: I would wait until the page is fully loaded before trying to access its contents.
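The ordering behind that last hint can be shown with a Qt-free toy model: register the callback first, start the load, and query the content only from inside the callback. `FakePage` and its methods are hypothetical stand-ins, not Qt classes; in Qt WebKit the corresponding notification is the `QWebFrame::loadFinished` signal used earlier in the thread.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a page that loads asynchronously. NOT a Qt class.
class FakePage {
public:
    // Register a callback to run once loading has completed.
    void onLoadFinished(std::function<void()> callback) {
        callback_ = std::move(callback);
    }

    // Start a load: the content is queued but not yet available to queries.
    void startLoad(const std::string& html) {
        pending_ = html;
        loaded_ = false;
    }

    // Invoked later by the (simulated) event loop when loading is done.
    void finishLoad() {
        content_ = pending_;
        loaded_ = true;
        if (callback_) {
            callback_();
        }
    }

    // True only if loading has finished and the needle is present.
    bool contains(const std::string& needle) const {
        return loaded_ && content_.find(needle) != std::string::npos;
    }

private:
    std::function<void()> callback_;
    std::string pending_;
    std::string content_;
    bool loaded_ = false;
};

void demonstrateDeferredQuery() {
    FakePage page;
    bool foundInCallback = false;
    page.onLoadFinished([&]() { foundInCallback = page.contains("production_table"); });
    page.startLoad("<table id=\"production_table\"></table>");
    const bool foundTooEarly = page.contains("production_table");  // queried before load finished
    page.finishLoad();
    assert(!foundTooEarly);
    assert(foundInCallback);
}
```

In this model, the query issued straight after starting the load finds nothing, while the same query made from the callback succeeds.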

