QRegExp matchedLength seems incorrect



Arjan
15th July 2009, 07:22
Hi all,

I was working on a project to parse C++ code, when I encountered strange regex behaviour.

I created a regex to match an if statement and its following 'true' section.


"\b(if\s*\(([^\(\)]*|\([^\(\)]*\))*\)\s*)(\{([^\{\}]*|\{[^\{\}]*\})*\}|[^;\{\}]*;)"


Let me explain what's going on here: the regex searches for an occurrence of 'if' followed by zero or more whitespace characters, then a '(' and its matching ')' (one level of nested parentheses is allowed), then zero or more whitespace characters. This part is stored as the first captured text.
It then looks for a '{' and its matching '}' (again with one level of nesting); if these cannot be found, it searches for the first ';' instead.

So when using this regexp, one should be able to find the if condition by checking the first captured text, cap(1).
In practice, however, I sometimes get a positive match, meaning QRegExp::indexIn returns >= 0, while matchedLength() is in fact 0 and the captured texts are empty. On other occasions everything works as expected.

As regular expressions can be hard to get right, I thought I'd post this here first before filing a bug report @ Qt :)

Please let me know if I have overlooked something!

Thanks in advance!

- Arjan

Ginsengelf
15th July 2009, 07:48
Hi, there is a hint in Qt docs:


The C++ compiler transforms backslashes in strings, so to include a \ in a regexp, you will need to enter it twice, i.e. \\. To match the backslash character itself, you will need four: \\\\.


Or did you just omit the double backslashes for better readability?

Ginsengelf
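
To illustrate the hint from the Qt docs, here is a minimal sketch in plain C++ of how the compiler treats backslashes in a pattern string. The function names are made up for this example, and std::string is used instead of QString so the snippet has no Qt dependency. The raw string literal form assumes a C++11 or newer compiler.

```cpp
#include <string>

// The pattern fragment as the regex engine should receive it: \bif\s*\(

// Ordinary literal: every backslash has to be doubled for the compiler.
inline std::string escapedLiteral()
{
    return "\\bif\\s*\\(";
}

// Raw string literal (C++11): backslashes are passed through unchanged.
inline std::string rawLiteral()
{
    return R"(\bif\s*\()";
}

// Doubling forgotten for \b: the compiler turns "\b" into a single
// backspace character (0x08) before any regex engine sees the string.
inline std::string forgottenEscape()
{
    return "\bif";
}
```

escapedLiteral() and rawLiteral() produce the same nine characters, while forgottenEscape() yields a three-character string that starts with a backspace, so a pattern written that way would silently look for the wrong thing.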

Arjan
15th July 2009, 08:44
Ginsengelf wrote: Or did you just omit the double backslashes for better readability?

Yup, I'll spare you the 'real' string :D

As posted, the regexp does work; it's just that in some instances it does not, and then it behaves as described in the first post. Any ideas on that?

Ginsengelf
15th July 2009, 10:22
Could you post an example string where the expression does not work properly?

Arjan
15th July 2009, 10:47
I've been testing using the string "a a() { a; if( b ) b; if( c ) { c; } d; }".

In a normal test, this works as expected (even though valgrind complains about invalid reads in QRegExp::matchedLength and QRegExp::cap).

In my program, however, the match sometimes fails. I suspect it has something to do with the fact that the string is being processed recursively, combined with the fact that the QRegExp objects are copied into a QVector.

I have not been able to create a 'bare' test application in which this fails, so I suspect there is something wrong in the regex engine, which only fails in specific conditions.
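
For reference, the expected behaviour of the pattern on the test string can be sketched without Qt. Note the assumptions: this uses std::regex (ECMAScript grammar) as a stand-in for QRegExp, so it says nothing about the QRegExp problem itself, and the names IfMatch and findIfStatements are invented for this example. It only shows what a correct match and first captured text look like.

```cpp
#include <regex>
#include <string>
#include <vector>

struct IfMatch {
    std::string whole;      // full match, comparable to QRegExp::cap(0)
    std::string condition;  // first captured text, comparable to cap(1)
};

// Collect every non-overlapping match of the posted pattern in `text`.
inline std::vector<IfMatch> findIfStatements(const std::string& text)
{
    // Raw string literal, so the pattern reads exactly as posted.
    static const std::regex pattern(
        R"(\b(if\s*\(([^\(\)]*|\([^\(\)]*\))*\)\s*)(\{([^\{\}]*|\{[^\{\}]*\})*\}|[^;\{\}]*;))");

    std::vector<IfMatch> result;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it) {
        result.push_back({it->str(0), it->str(1)});
    }
    return result;
}
```

Called with "a a() { a; if( b ) b; if( c ) { c; } d; }", this sketch should yield two matches, "if( b ) b;" and "if( c ) { c; }", with first captured texts "if( b ) " and "if( c ) " (trailing space included, because the group ends with \s*). A positive match with a length of 0 and empty captures would therefore not be expected from the pattern alone.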

Arjan
16th July 2009, 18:03
I have been able to create a testcase!

If anyone sees what exactly is going on here, I would like to know :)

You can find the source code here (http://arjanhouben.nl/RegExpTest.zip).

Here's the valgrind log:

==3771== Memcheck, a memory error detector.
==3771== Copyright (C) 2002-2008, and GNU GPL'd, by Julian Seward et al.
==3771== Using LibVEX rev 1884, a library for dynamic binary translation.
==3771== Copyright (C) 2004-2008, and GNU GPL'd, by OpenWorks LLP.
==3771== Using valgrind-3.4.1-Debian, a dynamic binary instrumentation framework.
==3771== Copyright (C) 2000-2008, and GNU GPL'd, by Julian Seward et al.
==3771== For more details, rerun with: -v
==3771==
==3771== My PID = 3771, parent PID = 3639. Prog and args are:
==3771== ./RegExpTest
==3771==
==3771== Invalid read of size 4
==3771== at 0x40CB52C: QRegExp::matchedLength() const (in /usr/lib/libQtCore.so.4.5.0)
==3771== by 0x804BF51: RegExpTest::RegExpTest(QString const&, int const&, int const&, RegExpTest::Type const&) (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771== by 0x8049253: main (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771== Address 0x4667048 is 1,728 bytes inside a block of size 1,764 free'd
==3771== at 0x4025DFA: free (vg_replace_malloc.c:323)
==3771== by 0x40D08CC: (within /usr/lib/libQtCore.so.4.5.0)
==3771== by 0x40D0A6B: QRegExp::setPattern(QString const&) (in /usr/lib/libQtCore.so.4.5.0)
==3771== by 0x804A14E: RegExpTest::findBlocks(int) (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771== by 0x804BF51: RegExpTest::RegExpTest(QString const&, int const&, int const&, RegExpTest::Type const&) (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771== by 0x8049253: main (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771==
==3771== Invalid read of size 4
==3771== at 0x40D4BA2: QRegExp::capturedTexts() const (in /usr/lib/libQtCore.so.4.5.0)
==3771== by 0x40D4DD7: QRegExp::cap(int) const (in /usr/lib/libQtCore.so.4.5.0)
==3771== by 0x40D4E7F: QRegExp::cap(int) (in /usr/lib/libQtCore.so.4.5.0)
==3771== by 0x804A6F1: RegExpTest::findBlocks(int) (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771== by 0x804BF51: RegExpTest::RegExpTest(QString const&, int const&, int const&, RegExpTest::Type const&) (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771== by 0x8049253: main (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771== Address 0x4667048 is 1,728 bytes inside a block of size 1,764 free'd
==3771== at 0x4025DFA: free (vg_replace_malloc.c:323)
==3771== by 0x40D08CC: (within /usr/lib/libQtCore.so.4.5.0)
==3771== by 0x40D0A6B: QRegExp::setPattern(QString const&) (in /usr/lib/libQtCore.so.4.5.0)
==3771== by 0x804A14E: RegExpTest::findBlocks(int) (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771== by 0x804BF51: RegExpTest::RegExpTest(QString const&, int const&, int const&, RegExpTest::Type const&) (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771== by 0x8049253: main (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771==
==3771== Invalid read of size 4
==3771== at 0x40D4BB4: QRegExp::capturedTexts() const (in /usr/lib/libQtCore.so.4.5.0)
==3771== by 0x40D4DD7: QRegExp::cap(int) const (in /usr/lib/libQtCore.so.4.5.0)
==3771== by 0x40D4E7F: QRegExp::cap(int) (in /usr/lib/libQtCore.so.4.5.0)
==3771== by 0x804A6F1: RegExpTest::findBlocks(int) (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771== by 0x804BF51: RegExpTest::RegExpTest(QString const&, int const&, int const&, RegExpTest::Type const&) (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771== by 0x8049253: main (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771== Address 0x4667044 is 1,724 bytes inside a block of size 1,764 free'd
==3771== at 0x4025DFA: free (vg_replace_malloc.c:323)
==3771== by 0x40D08CC: (within /usr/lib/libQtCore.so.4.5.0)
==3771== by 0x40D0A6B: QRegExp::setPattern(QString const&) (in /usr/lib/libQtCore.so.4.5.0)
==3771== by 0x804A14E: RegExpTest::findBlocks(int) (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771== by 0x804BF51: RegExpTest::RegExpTest(QString const&, int const&, int const&, RegExpTest::Type const&) (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771== by 0x8049253: main (in /home/arjan/C++/RegExpTest/RegExpTest)
==3771==
==3771== ERROR SUMMARY: 37 errors from 3 contexts (suppressed: 33 from 1)
==3771== malloc/free: in use at exit: 0 bytes in 0 blocks.
==3771== malloc/free: 4,856 allocs, 4,856 frees, 849,164 bytes allocated.
==3771== For counts of detected errors, rerun with: -v
==3771== All heap blocks were freed -- no leaks are possible.