please explain QUrl::isValid() to me



momesana
14th April 2008, 23:14
Hi,
Before resorting to regular expressions to validate a link, I tried QUrl::isValid() and expected it to tell me when a URL is not valid, just as a browser like Konqueror would throw an error message like "malformed url" with a big red error icon at me, leaving no doubt that I shouldn't try this again. QUrl::isValid(), however, kindly waves through anything I pass to it except an empty URL.

The documentation says this:


The URL is run through a conformance test. Every part of the URL must conform to the standard encoding rules of the URI standard for the URL to be reported as valid.


The conformance encoding rules seem to be pretty lax.
Why is that?

Here is an example:



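#include <QDebug>
#include <QList>
#include <QUrl>

int main()
{
    // Test cases: a default-constructed URL, a single space,
    // an ordinary URL, and a string of garbage.
    QList<QUrl> urls;
    urls << QUrl()
         << QUrl(" ")
         << QUrl("http://google.de")
         << QUrl("++ !! 3w27 WHY AM I VALID? 64a.sdk).d.d.d:/77(8*euiafds");

    // Iterate by const reference to avoid copying each QUrl.
    foreach (const QUrl &url, urls) {
        qDebug() << "is " << url << "valid? Answer: " << url.isValid();
    }
    return 0;
}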



Thanx in advance

wysota
15th April 2008, 00:01
Apart from the first two, the other URLs should be valid... Maybe the second one as well, I'm not sure.

IMHO QUrl should really be called QUri...

momesana
15th April 2008, 00:37
The second one also validates successfully. Only the first one returns false. I just read the German Wikipedia article regarding URIs. Seems like almost anything is valid ...

According to the article, this regexp can be used to validate a URI:


^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
The scheme -- if present at all -- may not start with : / ? or #. QUrl::isValid() catches the colon and #, but fails to report an invalid URL upon encountering the other two characters. Then, if // follows, it may not be immediately followed by / ? or #. This time QUrl::isValid() catches one of them, namely the slash, but misses the other two. Maybe I misunderstood the regexp, though; my experience in this regard is pretty limited. The string ?http://#qtcenter.org passes the test, though it should be invalid as far as I understand the above regular expression.
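
To see what the expression actually does with a given string, it helps to print the captured groups. The same expression appears in RFC 3986, Appendix B, as a way to split a URI reference into its components; because every group is optional, it accepts practically any string. Below is a minimal sketch using std::regex from the C++ standard library rather than QUrl. The names UriParts and splitUri are made up for illustration and are not part of Qt.

```cpp
#include <regex>
#include <string>

// Hypothetical holder for the five generic URI components.
struct UriParts {
    bool matched = false;
    std::string scheme, authority, path, query, fragment;
};

// Hypothetical helper: splits `input` with the expression quoted in the thread.
// Groups that did not participate in the match come back as empty strings.
UriParts splitUri(const std::string &input)
{
    static const std::regex re(
        R"re(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)re");

    UriParts parts;
    std::smatch m;
    if (std::regex_match(input, m, re)) {
        parts.matched   = true;
        parts.scheme    = m[2].str();
        parts.authority = m[4].str();
        parts.path      = m[5].str();
        parts.query     = m[7].str();
        parts.fragment  = m[9].str();
    }
    return parts;
}
```

For ?http://#qtcenter.org this yields no scheme, no authority, an empty path, the query "http://" and the fragment "qtcenter.org", so the expression on its own does not reject that string.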

Thanx

wysota
15th April 2008, 08:23
QUrl::isValid() catches the colon and #, but fails to report an invalid URL upon encountering the other two characters.
I think it prepends "file://" in case a protocol is not explicitly given.


The string ?http://#qtcenter.org passes the test, though it should be invalid as far as I understand the above regular expression.

What if you interpret it as file://?http://#qtcenter.org ? In my opinion it is valid then: it consists of an empty path, a query containing "http://" and a fragment containing "qtcenter.org", or even just a file with a strange path.

momesana
15th April 2008, 12:01
Ok,
the URL is valid. Since the other parts are optional, something like this:
?http://#qtcenter.org
would actually be just a query followed by a fragment, thus being valid.
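
Since syntactic validity accepts almost anything, checking a "link" in the everyday sense usually means adding requirements of your own on top. The following is only a sketch under an assumed policy: the scheme must be http or https and the authority (host part) must be non-empty. The function name looksLikeWebLink is made up for illustration, and it uses std::regex from the standard library rather than QUrl.

```cpp
#include <regex>
#include <string>

// Hypothetical policy check: accepts only http/https links with a non-empty
// authority. This is an assumed policy, not what QUrl::isValid() does.
bool looksLikeWebLink(const std::string &input)
{
    // Generic URI expression quoted earlier in the thread.
    static const std::regex re(
        R"re(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)re");

    std::smatch m;
    if (!std::regex_match(input, m, re))
        return false;

    const std::string scheme    = m[2].str();
    const std::string authority = m[4].str();

    // Assumption: only lowercase "http" and "https" count as links here.
    const bool schemeOk = (scheme == "http" || scheme == "https");
    return schemeOk && !authority.empty();
}
```

With this policy, http://google.de is accepted, while ?http://#qtcenter.org and a single space are rejected because they carry no scheme.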

Thanx

Netheril
19th May 2010, 02:19
There is another option to check if a URL is valid. The problem is that it works only on Windows systems, but I'll show it here anyway.

Here is the code:



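#include <windows.h>
#include <urlmon.h>   // declares LPBC
#include <QDebug>

// Windows only: asks urlmon.dll whether the given URL is valid.
bool isValidUrlWin( const wchar_t *url )
{
    typedef HRESULT (__stdcall *isValidURL_ptr)( LPBC pBC, LPCWSTR szURL, DWORD dwReserved );

    HMODULE hUrlMon = LoadLibraryW( L"urlmon.dll" );
    if( !hUrlMon )
        return false;

    bool valid = false;
    isValidURL_ptr isValidURL_fn = ( isValidURL_ptr )GetProcAddress( hUrlMon, "IsValidURL" );
    if( isValidURL_fn )   // the lookup can fail, so check before calling
    {
        HRESULT hr = isValidURL_fn( NULL, url, 0 );
        valid = ( S_OK == hr );
        if( valid )
        {
            qDebug() << "URL Valid: " << hr;
        }else{
            qDebug() << "URL Invalid: " << hr;
        }
    }
    FreeLibrary( hUrlMon );   // release the DLL loaded above
    return valid;
}

// Example call with the URL to test:
// isValidUrlWin( L"http://www.google.com" );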