Thread: Best way to load and parse an HTML file ??

  1. #1
    Join Date: Jul 2008
    Posts: 12
    Qt products: Qt4
    Platforms: Windows

    Re: Best way to load and parse an HTML file ??

    Quote (originally posted by aamer4yu):
    Why do you want to do that???
    If you want to display HTML sites, you may have a look at QWebView from Qt 4.4 onwards.

    Because I am creating a crawler, a robot application that loads DHTML documents, extracts some links, and recurses into each of those links...

    So far, I've tested QHttp, but it does not always work. Sometimes the pages load perfectly (e.g. http://www.google.ca),
    and sometimes it loads a "302 Found" dummy page or worse (e.g. any URL that represents a Google query).
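The crawl loop described above (load a page, extract its links, follow each link) can be sketched without literal recursion as a breadth-first traversal with a visited set, so the same page is never fetched twice and link cycles cannot loop forever. This is a minimal sketch in plain standard C++17 rather than Qt, under stated assumptions: `crawl` and the `LinkFetcher` callback are hypothetical names, and the callback stands in for whatever does the real download and link extraction (for example QHttp plus a parser). A depth limit keeps the crawl bounded.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical callback: downloads `url` and returns the links found on it.
// In a Qt4 program this would wrap the network request plus link extraction.
using LinkFetcher = std::function<std::vector<std::string>(const std::string&)>;

// Breadth-first crawl starting at `seed`. Each URL is visited at most once,
// and links are only followed up to `maxDepth` hops away from the seed.
// Returns the URLs in the order they were visited.
std::vector<std::string> crawl(const std::string& seed, int maxDepth,
                               const LinkFetcher& fetchLinks) {
    assert(maxDepth >= 0);
    std::vector<std::string> visitedOrder;
    std::set<std::string> seen{seed};                  // URLs already queued
    std::queue<std::pair<std::string, int>> frontier;  // (url, depth)
    frontier.push({seed, 0});

    while (!frontier.empty()) {
        auto [url, depth] = frontier.front();
        frontier.pop();
        visitedOrder.push_back(url);
        if (depth == maxDepth) continue;  // do not expand beyond the depth limit
        for (const std::string& link : fetchLinks(url)) {
            // insert().second is true only the first time a URL is seen
            if (seen.insert(link).second) frontier.push({link, depth + 1});
        }
    }
    return visitedOrder;
}
```

Passing a fake fetcher that returns canned link lists is an easy way to test the traversal logic without touching the network; the real fetcher can then be swapped in unchanged.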

  2. #2
    Join Date: Oct 2006
    Location: New Delhi, India
    Posts: 2,467
    Qt products: Qt4
    Platforms: Unix/X11, Windows
    Thanks: 8
    Thanked 334 Times in 317 Posts

    Re: Best way to load and parse an HTML file ??

    Well, in that case I guess you will have to parse the HTML file yourself. I am not aware of such a class in Qt.
    Maybe regular expressions can help you with the parsing...
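As a rough illustration of the regular-expression approach, the sketch below pulls the `href` values out of `<a>` tags. It uses `std::regex` from standard C++11 instead of Qt's `QRegExp`, and `extractLinks` is a hypothetical name. It is a heuristic, not a real HTML parser: it only handles quoted `href` attributes, a `>` inside an attribute value will confuse it, it also matches tags inside HTML comments, and it cannot see links generated by JavaScript (which matters for DHTML pages).

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Heuristic link extraction with a regular expression (not a full HTML parser).
// Matches <a ...href="..."> or <a ...href='...'>, case-insensitively, and
// returns the raw href values in document order. Unquoted values are skipped.
std::vector<std::string> extractLinks(const std::string& html) {
    static const std::regex hrefRe(
        R"(<a\s[^>]*?href\s*=\s*["']([^"']*)["'])",
        std::regex::ECMAScript | std::regex::icase);

    std::vector<std::string> links;
    for (std::sregex_iterator it(html.begin(), html.end(), hrefRe), end;
         it != end; ++it) {
        links.push_back((*it)[1].str());  // capture group 1 = the href value
    }
    return links;
}
```

The returned values are exactly what the page contains, so relative links such as `/b?q=1` still need to be resolved against the page URL before they are crawled (in Qt4, `QUrl::resolved()` does this).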

  3. #3
    Join Date: Feb 2008
    Posts: 50
    Qt products: Qt4
    Platforms: Unix/X11, Windows
    Thanks: 1
    Thanked 2 Times in 2 Posts

    Re: Best way to load and parse an HTML file ??

    Quote (originally posted by tuthmosis):
    Because I am creating a crawler, a robot application that loads DHTML documents, extracts some links, and recurses into each of those links...

    So far, I've tested QHttp, but it does not always work. Sometimes the pages load perfectly (e.g. http://www.google.ca),
    and sometimes it loads a "302 Found" dummy page or worse (e.g. any URL that represents a Google query).

    Sometimes Google displays a CAPTCHA because it suspects a bot is searching. That's probably your 302 problem: a 302 response code means the request was redirected.
    If you want to parse the HTML and fetch the links, use regular expressions (RegExp) to do it.
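To deal with the 302 case, the crawler can check the status code and, for a redirect, request the URL given in the `Location` header instead of using the body of the 302 page. The sketch below shows that check on a raw response header in plain C++17; `redirectTarget` is a hypothetical helper, not a Qt API. With QHttp the same information should be available from `QHttpResponseHeader::statusCode()` and `value("Location")`. The `Location` value may be relative, so it should be resolved against the current URL, and the number of redirects followed should be capped to avoid loops.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: given the raw header block of an HTTP response
// (status line plus "Name: value" lines), return the Location target if the
// status is a redirect (301, 302, 303, 307, 308), or std::nullopt otherwise.
std::optional<std::string> redirectTarget(const std::string& rawHeader) {
    std::istringstream in(rawHeader);
    std::string version;
    int status = 0;
    if (!(in >> version >> status)) return std::nullopt;  // malformed status line
    const bool isRedirect = status == 301 || status == 302 || status == 303 ||
                            status == 307 || status == 308;
    if (!isRedirect) return std::nullopt;

    std::string line;
    std::getline(in, line);  // discard the rest of the status line ("Found")
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // HTTP uses CRLF
        const auto colon = line.find(':');
        if (colon == std::string::npos) continue;  // blank or malformed line
        std::string name = line.substr(0, colon);
        // Header names are case-insensitive, so compare in lower case.
        for (char& c : name) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        if (name != "location") continue;
        std::string value = line.substr(colon + 1);
        const auto first = value.find_first_not_of(" \t");  // skip leading spaces
        return first == std::string::npos ? std::string{} : value.substr(first);
    }
    return std::nullopt;  // redirect status but no Location header
}
```

When this returns a target, the crawler would issue a new request for it (after resolving it against the current URL) rather than parsing the "302 Found" page.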

  4. #4
    Join Date: Jan 2006
    Posts: 371
    Qt products: Qt4
    Platforms: Unix/X11, Windows
    Thanks: 14
    Thanked 18 Times in 17 Posts

    Re: Best way to load and parse an HTML file ??

    ...or use Perl, which has dedicated modules for this. Maybe Qt is not the best solution for your problem.

  5. #5
    Join Date: May 2008
    Posts: 4
    Qt products: Qt4
    Platforms: Unix/X11, Windows

    Re: Best way to load and parse an HTML file ??

    This probably won't be read by the original thread author, but I'll post anyway for the record.

    You can use Mozilla's engine, Gecko, to parse HTML or XML. Go here and read: http://developer.mozilla.org/en/Gecko

    Hope this helps someone.

  6. #6
    Join Date: Jul 2008
    Posts: 12
    Qt products: Qt4
    Platforms: Windows

    Re: Best way to load and parse an HTML file ??

    Quote (originally posted by mave-rick):
    This probably won't be read by the original thread author, but I'll post anyway for the record.

    You can use Mozilla's engine, Gecko, to parse HTML or XML. Go here and read: http://developer.mozilla.org/en/Gecko

    Hope this helps someone.

    Wow... Thanks, mave-rick!
    I hope it does what it claims for parsing.

    I'll try to find a wrapper class to ease its use with C++, Eclipse and Qt.

    Thanks again

Qt is a trademark of The Qt Company.