Download Page Source of Website

10th March 2010, 21:38
AjRomano

Download Page Source of Website

Hello. I took a class last semester that used Qt, and I have wanted to keep exploring and using it since. I have an idea for a program that would check a website for certain words or phrases each time it runs. I have looked through the Qt documentation and come close to what I need to make it happen, but I've hit a snag.

The QHttp class has a "get" function that can retrieve a web page's source code and store it in a local file on the computer. Qt even has an example program that does just that. However, the get function only accepts what it calls an "absolute path," which ends with .com, .html, etc. How would I go about doing the same thing with a relative path that has more characters after those endings, such as this?

I am using Linux Ubuntu 9.04 and Qt version 4.5.0. I would also like to be able to compile the program on Windows XP.

Thank you for your time and help.
10th March 2010, 22:34
squidge

Re: Download Page Source of Website

You would have to convert your relative path to an absolute path before passing it to the function.

However, the example you posted ("such as this") is an absolute path, not a relative one.
11th March 2010, 02:17
AjRomano

Re: Download Page Source of Website

Thank you, that is true. The problem I am experiencing is that when I use the get function on the example URL, it only "gets" (downloads and stores the page source code of) the first part of the URL: http://foodpro.studentprograms.vt.ed...3/pickMenu.asp
and ignores the rest:
?locationNum=15&locationName=D2+%26+DXPRESS&dtdate=3%2F15%2F2010&mealName=Lunch&sName=Virginia+Tech+Culinary+Services

It seems that the second part is some kind of "variables" being passed to the website (I don't know the web development lingo), set by a couple of clicks on the site. Is there some way to pass them to the get function? Or is there some other function that accomplishes the same task? I don't necessarily need to use the get function in the QHttp class; it just seemed like the closest thing to what I needed.
11th March 2010, 06:22
ChrisW67

Re: Download Page Source of Website
The part of the URL from the '?' onward is the query string, which is one way of passing parameters to a script that generates a web page.

The demo program deliberately ignores that portion in this code:
Code:

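void HttpWindow::downloadFile()
{
    // ... (rest of the function omitted)
    // Only the path component of url is encoded and sent; its query string is dropped.
    QByteArray path = QUrl::toPercentEncoding(url.path(), "!$&'()*+,;=:@/");
    if (path.isEmpty())
        path = "/";
    httpGetId = http->get(path, file);
    // ...
}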
It uses only the path component of the URL rather than the whole URL. Have a look at the docs for QUrl for more about the components of the URL and how to access them.