QRegularExpressionValidator: am I missing something?!



agarny
1st April 2014, 23:07
Hi,

I am trying to validate a URL using QRegularExpressionValidator. I got the regular expression from https://gist.github.com/dperini/729294 and, from what I can tell, it looks good (it certainly seemed to work when I tried it in some online regular expression testers). However, my tests using Qt are anything but conclusive. So, could it be that I wrongly 'converted' the regular expression to C++?

Worse, if I enter something like "http://uuuuuuuuuuuuuuuuuuuu", my test application more or less hangs. If I run it in debug mode and pause it, I can see that QRegularExpressionValidator::validate() makes an indirect call to pcre16_exec, which calls match, which in turn calls match, again and again. Could this be a bug in QRegularExpressionValidator or even pcre16_exec?

QRegularExpressionValidatorTest.pro:


QT += core gui

greaterThan(QT_MAJOR_VERSION, 4): QT += widgets

TARGET = QRegularExpressionValidatorTest
TEMPLATE = app

SOURCES += main.cpp

main.cpp:


#include <QApplication>
#include <QDesktopWidget>
#include <QDialog>
#include <QLineEdit>
#include <QRegularExpressionValidator>
#include <QVBoxLayout>

int main(int pArgC, char *pArgV[])
{
    QApplication application(pArgC, pArgV);

    QDialog *dialog = new QDialog();
    QVBoxLayout *dialogLayout = new QVBoxLayout(dialog);

    dialog->setLayout(dialogLayout);
    dialog->setWindowTitle("URL Tester");

    // We want to validate a URL, the regular expression of which comes from
    // https://gist.github.com/dperini/729294
    // It is released under the MIT license and is therefore fine for us to use

    QString urlRegExp = QString() +
        "^"
        // protocol identifier
        "(?:(?:https?|ftp)://)" +
        // user:pass authentication
        "(?:\\S+(?::\\S*)?@)?" +
        "(?:" +
        // IP address exclusion
        // private & local networks
        "(?!(?:10|127)(?:\\.\\d{1,3}){3})" +
        "(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})" +
        "(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})" +
        // IP address dotted notation octets
        // excludes loopback network 0.0.0.0
        // excludes reserved space >= 224.0.0.0
        // excludes network & broadcast addresses
        // (first & last IP address of each class)
        "(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])" +
        "(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}" +
        "(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))" +
        "|" +
        // host name
        "(?:(?:[a-z\\x{00a1}-\\x{ffff}0-9]+-?)*[a-z\\x{00a1}-\\x{ffff}0-9]+)" +
        // domain name
        "(?:\\.(?:[a-z\\x{00a1}-\\x{ffff}0-9]+-?)*[a-z\\x{00a1}-\\x{ffff}0-9]+)*" +
        // TLD identifier
        "(?:\\.(?:[a-z\\x{00a1}-\\x{ffff}]{2,}))" +
        ")" +
        // port number
        "(?::\\d{2,5})?" +
        // resource path
        "(?:/[^\\s]*)?" +
        "$";

    QLineEdit *dialogValue = new QLineEdit(dialog);

    // Validate the text as it is typed, ignoring case
    dialogValue->setValidator(new QRegularExpressionValidator(QRegularExpression(urlRegExp, QRegularExpression::CaseInsensitiveOption), dialog));
    dialogValue->setMinimumWidth(qApp->desktop()->availableGeometry().width()/5);

    dialogLayout->addWidget(dialogValue);

    dialogLayout->setSizeConstraint(QLayout::SetFixedSize);

    dialog->show();

    return application.exec();
}

ChrisW67
2nd April 2014, 00:26
This code runs fine for me on Linux:


#include <QCoreApplication>
#include <QRegularExpression>
#include <QRegularExpressionValidator>
#include <QDebug>
#include <QUrl>

int main(int argc, char **argv)
{
    QCoreApplication application(argc, argv);

    // We want to validate a URL, the regular expression of which comes from
    // https://gist.github.com/dperini/729294
    // It is released under the MIT license and is therefore fine for us to use

    const QString urlRegExp = "^"
        // protocol identifier
        "(?:(?:https?|ftp)://)"
        // user:pass authentication
        "(?:\\S+(?::\\S*)?@)?"
        "(?:"
        // IP address exclusion
        // private & local networks
        "(?!(?:10|127)(?:\\.\\d{1,3}){3})"
        "(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})"
        "(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})"
        // IP address dotted notation octets
        // excludes loopback network 0.0.0.0
        // excludes reserved space >= 224.0.0.0
        // excludes network & broadcast addresses
        // (first & last IP address of each class)
        "(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])"
        "(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}"
        "(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))"
        "|"
        // host name
        "(?:(?:[a-z\\x{00a1}-\\x{ffff}0-9]+-?)*[a-z\\x{00a1}-\\x{ffff}0-9]+)"
        // domain name
        "(?:\\.(?:[a-z\\x{00a1}-\\x{ffff}0-9]+-?)*[a-z\\x{00a1}-\\x{ffff}0-9]+)*"
        // TLD identifier
        "(?:\\.(?:[a-z\\x{00a1}-\\x{ffff}]{2,}))"
        ")"
        // port number
        "(?::\\d{2,5})?"
        // resource path
        "(?:/[^\\s]*)?"
        "$";


    QRegularExpression re(urlRegExp, QRegularExpression::CaseInsensitiveOption);
    qDebug() << re.isValid(); // true

    QString testString("http://uuuuuuuuuuuuuuuuuuuuuuuuu");
    QRegularExpressionMatch match = re.match(testString);
    qDebug() << match; // QRegularExpressionMatch(Valid, no match)

    QRegularExpressionValidator validator(re);
    int pos = 0;
    qDebug() << validator.validate(testString, pos); // 0 == QValidator::Invalid


    // Just for kicks:
    QUrl url(testString, QUrl::StrictMode);
    qDebug() << url << url.isValid(); // QUrl( "http://uuuuuuuuuuuuuuuuuuuuuuuuu" ) true

    return 0;
}


Note that I constructed the regular expression string using C++ string literal concatenation rather than your QString operators, not that I expect that to be of any consequence. I say "runs fine" rather than "works fine" because the test string is a valid URL (at least according to QUrl), yet the regular expression does not match it.

Is there a reason you cannot use QUrl to do the validation work for you?

agarny
2nd April 2014, 01:07
> Note that I constructed the regular expression string using C++ string literal concatenation rather than your QString operators, not that I expect that to be of any consequence.
Indeed, I have just replaced 'my' version with yours and, as expected, it didn't make any difference.


> I say "runs fine" not "works fine" because the test string is a valid URL.
Yes, everything runs fine indeed.


> Is there a reason you cannot use QUrl to do the validation work for you?
Maybe, I am not sure. The fact is that I need the user to enter the URL. For this, I am using QLineEdit and I thought I would validate the user's entry as it is typed. Hence my use of QRegularExpressionValidator as a validator for QLineEdit. Now, are you implying that there is a way to achieve the same using QUrl as some kind of a validator? If so, I imagine that I would need to create my own validator and make it use QUrl to do the validation?

ChrisW67
2nd April 2014, 04:56
The behaviour I see with your original code is that the line edit stops accepting more characters after the 20th 'u' is typed (making the total string length 27, counting "http://"). It doesn't hang; backspace works, for example. Once the validator stops returning Intermediate and starts returning Invalid, the characters get dropped. A quick tweak of my code shows the validator changing its mind:


QString testString("http://");
for (int i = 0; i < 32; ++i) {
    testString += "a";
    QRegularExpressionValidator validator(re);
    int pos = 0;
    qDebug() << (i+1) << testString << validator.validate(testString, pos);
}



1 "http://a" 1
2 "http://aa" 1
3 "http://aaa" 1
4 "http://aaaa" 1
5 "http://aaaaa" 1
6 "http://aaaaaa" 1
7 "http://aaaaaaa" 1
8 "http://aaaaaaaa" 1
9 "http://aaaaaaaaa" 1
10 "http://aaaaaaaaaa" 1
11 "http://aaaaaaaaaaa" 1
12 "http://aaaaaaaaaaaa" 1
13 "http://aaaaaaaaaaaaa" 1
14 "http://aaaaaaaaaaaaaa" 1
15 "http://aaaaaaaaaaaaaaa" 1
16 "http://aaaaaaaaaaaaaaaa" 1
17 "http://aaaaaaaaaaaaaaaaa" 1
18 "http://aaaaaaaaaaaaaaaaaa" 1
19 "http://aaaaaaaaaaaaaaaaaaa" 1
20 "http://aaaaaaaaaaaaaaaaaaaa" 1
21 "http://aaaaaaaaaaaaaaaaaaaaa" 0
22 "http://aaaaaaaaaaaaaaaaaaaaaa" 0
23 "http://aaaaaaaaaaaaaaaaaaaaaaa" 0
24 "http://aaaaaaaaaaaaaaaaaaaaaaaa" 0
25 "http://aaaaaaaaaaaaaaaaaaaaaaaaa" 0
26 "http://aaaaaaaaaaaaaaaaaaaaaaaaaa" 0
27 "http://aaaaaaaaaaaaaaaaaaaaaaaaaaa" 0
28 "http://aaaaaaaaaaaaaaaaaaaaaaaaaaaa" 0
29 "http://aaaaaaaaaaaaaaaaaaaaaaaaaaaaa" 0
30 "http://aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" 0
31 "http://aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" 0
32 "http://aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" 0

It also gets progressively slower as the string gets longer. Even with 32 'a' chars, adding ".com" will validate (you cannot do it through the line edit though). I can see no reason for this behaviour. It might be some sort of recursion limit in the regex library.
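One possible explanation for the slowdown, offered as an assumption rather than something confirmed in this thread: in the host-name part of the expression, (?:[...]+-?)*[...]+, a run of n letters with no hyphen can be divided between the repeated group and the trailing character class in many different ways, and a backtracking matcher may end up trying all of them when the TLD that must follow is missing. The sketch below is plain C++ without Qt, and countSplits is a made-up helper; it only counts those divisions to show how quickly they grow.

```cpp
#include <cstdint>
#include <vector>

// Counts the ways a run of n letters can be cut into consecutive non-empty
// chunks. Every chunk but the last stands for one iteration of the repeated
// group (?:[...]+-?)*, and the last chunk stands for the trailing [...]+.
// This models the pattern's ambiguity only; it does not run any regex engine.
std::uint64_t countSplits(int n)
{
    if (n <= 0)
        return 0;

    std::vector<std::uint64_t> ways(n + 1, 0);
    ways[0] = 1; // one way to cut an empty prefix: no chunks at all

    for (int len = 1; len <= n; ++len) {
        // The final chunk of a prefix of length len starts at position cut.
        for (int cut = 0; cut < len; ++cut)
            ways[len] += ways[cut];
    }

    return ways[n];
}
```

countSplits(20) gives 524288 and countSplits(21) gives 1048576, so each extra character doubles the count. Whether this doubling is what makes the validator give up at the 21st character is a guess on my part; it would at least be consistent with a limit inside the regex library being hit.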


> If so, I imagine that I would need to create my own validator and make it use QUrl to do the validation?
Yes. Using QUrl will only give an Invalid/Valid indication, whereas a QRegularExpression-based approach can also give an Intermediate indication via hasPartialMatch().
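To illustrate why the Intermediate state matters for a line edit, here is a small Qt-free simulation. Everything in it is invented for illustration (State, twoState, threeState, typeInto), and the "complete URL" test is a deliberately crude stand-in, not QUrl and not the regular expression above. It assumes the behaviour described earlier in this thread: a typed character is dropped when the resulting text is reported as Invalid.

```cpp
#include <functional>
#include <string>

enum class State { Invalid, Intermediate, Acceptable };

using Validator = std::function<State(const std::string &)>;

// Crude stand-in for "is this a complete URL?": the scheme plus at least
// one more character. A real check would be far stricter.
bool looksComplete(const std::string &text)
{
    const std::string scheme = "http://";
    return text.size() > scheme.size()
        && text.compare(0, scheme.size(), scheme) == 0;
}

// Two-state validator: anything that is not complete is Invalid.
State twoState(const std::string &text)
{
    return looksComplete(text) ? State::Acceptable : State::Invalid;
}

// Three-state validator: a prefix of the scheme is Intermediate, meaning
// "not acceptable yet, but could become so with more typing".
State threeState(const std::string &text)
{
    const std::string scheme = "http://";
    if (looksComplete(text))
        return State::Acceptable;
    if (text.size() <= scheme.size()
        && scheme.compare(0, text.size(), text) == 0)
        return State::Intermediate;
    return State::Invalid;
}

// Simulates typing into a field: a key is kept unless the text it would
// produce is reported as Invalid.
std::string typeInto(const Validator &validate, const std::string &keys)
{
    std::string text;
    for (char key : keys) {
        const std::string candidate = text + key;
        if (validate(candidate) != State::Invalid)
            text = candidate;
    }
    return text;
}
```

With these definitions, typeInto(twoState, "http://a") leaves the field empty, because the very first character is already rejected, while typeInto(threeState, "http://a") ends up with "http://a". A validator built on a plain valid/invalid check therefore needs some rule for reporting partially typed input as Intermediate.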

agarny
2nd April 2014, 05:35
I tried the validation using QUrl and it's not sufficient for what I need. As you wrote, a QRegularExpression-based approach allows for an intermediate state, which is useful. Also, I noticed that QUrl is too clever when it comes to validation: it will try to correct a user's mistake if it can. Anyway, QUrl validation is not an option.

Regarding QRegularExpressionValidator, there is clearly something 'funny' going on, which is the reason I decided to create a bug report for it: https://bugreports.qt-project.org/browse/QTBUG-38034.