QRegExp parsing matched quoted strings



jhndnn
26th January 2010, 17:18
I'm trying to use QRegExp to parse a string into tokens. Tokens are separated by spaces; a token that itself contains a space is enclosed in double quotes, and a double quote inside such a token is escaped as \".

So from this string

test "a string" "string \" escaped" 1 2
I'd like to extract

[test], [a string], [string " escaped], [1], [2]
I was previously using a technique borrowed from this blog post (http://blog.stevenlevithan.com/archives/match-quoted-string), which appears to be down at the moment, with the following regex in some Flex code:

/((")(?:\\?.)*?\2) | \S+/gsx
Translating that directly to Qt didn't work, I assume because it uses features not available in QRegExp. Here's my little test app:


#include <QtCore/QCoreApplication>
#include <QRegExp>
#include <QStringList>
#include <QDebug>

// Print every match of pattern in text: all captures, then capture group 1.
void test(const QString& text, const QString& pattern)
{
    qDebug() << "testing " << text << " against " << pattern;
    QRegExp rx(pattern);
    int pos = 0;
    while ((pos = rx.indexIn(text, pos)) != -1) {
        qDebug() << rx.capturedTexts();
        qDebug() << rx.cap(1);
        pos += rx.matchedLength();
    }
}

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);
    // Attempt 1: anything between two double quotes.
    test( "test \"a string\" \"string \\\" escaped\" 1 2", "([\"])(?:.)*([\"])");
    // Attempt 2: direct translation of the Flex regex.
    test( "test \"a string\" \"string \\\" escaped\" 1 2", "((\")(?:\\\\?.)*?\\2)");
    // No event loop is needed for this test, so exit instead of calling a.exec().
    return 0;
}

Any ideas on how to accomplish this? Or do I need to switch to a character-based parser?

mattc
26th January 2010, 19:00
The following pattern works for me with the input you provided and a few others I tested:


test(
"test \"a string\" \"string \\\" escaped\" 1 2",
"((?:[^\\s\"]+)|(?:\"(?:\\\\\"|[^\"])*\"))");

Output of qDebug() << rx.cap(1):


testing "test "a string" "string \" escaped" 1 2" against "((?:[^\s"]+)|(?:"(?:\\"|[^"])*"))"
"test"
""a string""
""string \" escaped""
"1"
"2"

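Note that the matches above still carry the surrounding quotes and the backslash, while the original question asks for [a string] and [string " escaped]. A minimal sketch of that post-processing step follows. It uses std::regex in place of QRegExp so that it compiles without Qt (an assumption made for illustration; the pattern is the same one as above), and the helper names unquote and tokenize are hypothetical. With QRegExp, the same unquoting could presumably be applied to each rx.cap(1).

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: strips the surrounding double quotes from a quoted
// token and turns each \" inside it into a plain ". Bare tokens pass through.
std::string unquote(const std::string& token)
{
    if (token.size() < 2 || token.front() != '"' || token.back() != '"')
        return token;
    std::string out;
    for (std::size_t i = 1; i + 1 < token.size(); ++i) {
        // A backslash followed by an inner quote: drop the backslash.
        if (token[i] == '\\' && i + 2 < token.size() && token[i + 1] == '"')
            ++i;
        out += token[i];
    }
    return out;
}

// Hypothetical helper: finds bare or quoted tokens with the pattern from
// this thread (std::regex, ECMAScript syntax) and unquotes each match.
std::vector<std::string> tokenize(const std::string& text)
{
    static const std::regex rx(R"rx([^\s"]+|"(?:\\"|[^"])*")rx");
    std::vector<std::string> tokens;
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it)
        tokens.push_back(unquote(it->str()));
    return tokens;
}
```

Calling tokenize on the test string from the first post should give the five tokens test, a string, string " escaped, 1 and 2.
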
jhndnn
26th January 2010, 23:05
Thanks, that works perfectly.
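
For comparison, here is a rough sketch of the character-based alternative raised in the first post. It is written against std::string so it builds without Qt; the function name splitQuoted is hypothetical, and the handling of edge cases (an unterminated quote swallows the rest of the input, and only space and tab count as separators) is an assumption rather than something specified in this thread.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits text on spaces/tabs. A double-quoted run forms
// part of one token, and \" inside quotes yields a literal double quote.
std::vector<std::string> splitQuoted(const std::string& text)
{
    std::vector<std::string> tokens;
    std::string current;
    bool inQuotes = false;
    bool haveToken = false;  // lets "" count as an empty token

    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (inQuotes) {
            if (c == '\\' && i + 1 < text.size() && text[i + 1] == '"') {
                current += '"';   // escaped quote: keep the quote only
                ++i;
            } else if (c == '"') {
                inQuotes = false; // closing quote
            } else {
                current += c;
            }
        } else if (c == '"') {
            inQuotes = true;      // opening quote
            haveToken = true;
        } else if (c == ' ' || c == '\t') {
            if (haveToken) {      // separator ends the current token
                tokens.push_back(current);
                current.clear();
                haveToken = false;
            }
        } else {
            current += c;
            haveToken = true;
        }
    }
    if (haveToken)
        tokens.push_back(current);
    return tokens;
}
```

Unlike the regex approach, this removes the quotes and the escaping backslash in the same pass, at the cost of a little more code.
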