QRegExp Help



ToddAtWSU
15th November 2010, 19:48
I am trying to split a string using QRegExp but am having no luck. Examples of the strings I am looking for are:

AB-123456 or CD-123456.78. As you can see, each one begins with 2 letters, followed by a -, followed by 6 digits. A "." followed by 1-4 digits is optional. Currently, I have a QString that looks like this: "AB-123456CD-123456.78EF-123456". I need to split it into its 3 parts: AB-123456, CD-123456.78, EF-123456. Here is my code. The QStringList comes back as a 1-item list containing the whole string. Any ideas?


QString text = "AB-123456CD-123456.78EF-123456";
QRegExp rx( "^[A-Z]{2,2}-[0-9]{6,6}(.[0-9]{1,4})?$" );
rx.setPatternSyntax( QRegExp::Wildcard );
QStringList strList = text.split( rx, QString::SkipEmptyParts );

I thought that with the Wildcard syntax the "." would just be a literal ".". Is this correct? I have tried it with and without the ^ and the $, but I get the same results. Any ideas why my QString isn't being split into three pieces?

Thanks!

bred
15th November 2010, 19:53
Try this:

QRegExp rx( "^([A-Z]{2,2}-[0-9]{6,6})(\\.[0-9]{1,4})?$" );

Note:
the dot is a regexp metacharacter, so to match a literal dot you must write \\.


Edit
Try this:
^(\\w{2,2}\\-\\d{6,6})(\\w{2,2}\\-\\d{6,6})\\.(\\w{2,2}\\-\\d{6,6})$

The parentheses are the capturing operator.
So you must use the member function:
http://doc.qt.nokia.com/4.7/qregexp.html#capturedTexts

to capture your text ...

Lykurg
15th November 2010, 21:51
Although we normally don't hand out complete code, here we go:
QString text = "AB-123456CD-123456.78EF-123456";
QRegExp rx( "([A-Z]{2}-[0-9]{6}(?:\\.[0-9]{1,4})?)" );

QStringList list;
int pos = 0;
while ((pos = rx.indexIn(text, pos)) != -1) {
    list << rx.cap(0);
    pos += rx.matchedLength();
}
First, if you use wildcard syntax, only "*", "?" and "[]" are allowed, but you also used "()", so it didn't work. Also, with regular expression syntax, ^ and $ aren't what you want, because then the whole line may contain only one occurrence of your pattern, not more. For a general solution, use the code above, where it does not matter how often your pattern occurs. Note also that (?: foo bar ) groups without capturing: the characters it matches are still part of the overall match, but they don't get their own entry in the captured texts.

So try to understand what the code actually does, and please do not blindly copy and paste...

wysota
16th November 2010, 00:56
I think the expression can be even simpler - without the non-capturing group. I didn't test it but it should work:

QRegExp rx( "[A-Z]{2}-\\d{6}(\\.\\d{1,4})?" );

Then a while loop is employed as already suggested. But I would change the condition of the loop: every match should start exactly where the previous one ended, i.e. the index returned by indexIn should equal the offset you passed in. Otherwise you'll be silently skipping characters that do not match instead of returning a parsing error.
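
To illustrate the stricter loop, here is a minimal sketch. It assumes std::regex as a Qt-free stand-in for QRegExp, so it is not the code from this thread; splitCodesStrict is a hypothetical name, and the match_continuous flag plays the role of checking that the match starts at the current offset.

```cpp
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): splits `text` into consecutive
// codes and throws if any character is not part of a code.
std::vector<std::string> splitCodesStrict(const std::string& text) {
    static const std::regex rx("[A-Z]{2}-\\d{6}(\\.\\d{1,4})?");
    std::vector<std::string> parts;
    auto it = text.cbegin();
    while (it != text.cend()) {
        std::smatch m;
        // match_continuous: the match must begin exactly at `it`,
        // so unmatched characters are reported instead of skipped.
        if (!std::regex_search(it, text.cend(), m, rx,
                               std::regex_constants::match_continuous)) {
            throw std::invalid_argument(
                "unexpected input at offset " +
                std::to_string(it - text.cbegin()));
        }
        parts.push_back(m.str());
        it = m[0].second;  // continue right after this match
    }
    return parts;
}
```

Called on "AB-123456CD-123456.78EF-123456" this should give the three codes, while a string with stray characters between two codes raises an exception rather than quietly dropping them.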

ToddAtWSU
16th November 2010, 13:23
QString text = "AB-123456CD-123456.78EF-123456";
QRegExp rx( "([A-Z]{2}-[0-9]{6}(?:\\.[0-9]{1,4})?)" );

QStringList list;
int pos = 0;
while ((pos = rx.indexIn(text, pos)) != -1) {
list << rx.cap(0);
pos += rx.matchedLength();
}

Can you please explain what the ?:\\. part is trying to say?

Also, why do you loop with the QRegExp instead of just using the existing QString::split function?

Lykurg
16th November 2010, 15:33
?: makes the (...) group non-capturing.
\\. escapes the . so that it is taken literally and not in the meaning "any character".
As to the split function: try to use it, and see why we use a loop.

MarekR22
16th November 2010, 15:35
QString::split treats the matches of the regular expression as separators, not as the strings to return!
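
A small sketch of that difference, assuming std::regex as a Qt-free stand-in for QRegExp and QString::split (splitOn and findAll are hypothetical helper names, not Qt API):

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the pieces of `text` that lie BETWEEN matches
// of `rx`, i.e. what a split with `rx` as the separator produces.
std::vector<std::string> splitOn(const std::string& text, const std::regex& rx) {
    std::vector<std::string> pieces;
    // Submatch index -1 selects the text between matches, not the matches.
    std::sregex_token_iterator it(text.begin(), text.end(), rx, -1), end;
    for (; it != end; ++it) {
        if (it->length() > 0) {
            pieces.push_back(it->str());  // drop empty parts
        }
    }
    return pieces;
}

// Hypothetical helper: returns the matches themselves.
std::vector<std::string> findAll(const std::string& text, const std::regex& rx) {
    std::vector<std::string> found;
    std::sregex_iterator it(text.begin(), text.end(), rx), end;
    for (; it != end; ++it) {
        found.push_back(it->str());
    }
    return found;
}
```

With the code pattern as separator, nothing is left between the matches of "AB-123456CD-123456.78EF-123456", so the split result is empty once empty parts are dropped; collecting the matches is what yields the three codes.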
About "\\.": the first backslash escapes the second one inside the C++ string literal, so the two backslashes together become a single backslash in the final string. That single backslash is then the escape character of the regular expression: a dot in a regexp means "any character", and in this case we want a real dot.
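
The two levels of escaping can be checked with a short sketch, again assuming std::regex instead of QRegExp; the names escapedDot, matchesAnyCharDot and matchesLiteralDot are made up for this illustration.

```cpp
#include <regex>
#include <string>

// The C++ literal "\\." holds two characters: a backslash and a dot.
const std::string escapedDot = "\\.";

// Hypothetical helper: whole-string match where the dot is NOT escaped,
// so it stands for "any character".
bool matchesAnyCharDot(const std::string& s) {
    static const std::regex rx("\\d{6}.\\d{2}");
    return std::regex_match(s, rx);
}

// Hypothetical helper: whole-string match where the dot IS escaped,
// so only a real dot is accepted.
bool matchesLiteralDot(const std::string& s) {
    static const std::regex rx("\\d{6}\\.\\d{2}");
    return std::regex_match(s, rx);
}
```

Both helpers accept "123456.78", but only the unescaped version also accepts "123456X78", which is exactly why the dot has to be escaped.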