QRegExp for extracting the string between two HTML tags...



tuthmosis
26th May 2010, 22:23
Greetings!

Say I have <td>this is a test</td>. What would be the regexp pattern to get the string "this is a test"?

Thanks!

Lykurg
26th May 2010, 22:28
Since that is such a basic question, please use our Newbie section next time, and please read the documentation on regular expressions. It doesn't have to be the Qt one; any basic introduction to regular expressions will do (see http://www.regular-expressions.info/quickstart.html, and look up grouping).

tuthmosis
26th May 2010, 23:47
Sorry, but I had my reasons to ask. I know regexps; there is no need to insult me.
If I ask, it's because normal regexps don't seem to be compatible with QRegExp.
For my needs, the following should work:
(?<=<TH>)([a-zA-Z0-9 ])+(?=</TD>)

Applied to <TH>test</TD>, this pattern returns <TH>test, so lookbehind doesn't seem to be implemented.

Lykurg
27th May 2010, 06:55
> Sorry, but I had my reasons to ask. I know regexps; there is no need to insult me.
> If I ask, it's because normal regexps don't seem to be compatible with QRegExp.
Well, you didn't ask that the first time!

> For my needs, the following should work:
> (?<=<TH>)([a-zA-Z0-9 ])+(?=</TD>)
>
> Applied to <TH>test</TD>, this pattern returns <TH>test, so lookbehind doesn't seem to be implemented.
Yes, not everything is implemented (QRegExp supports lookahead but not lookbehind), but you can get the same result with a capturing group and a little extra work:
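#include <QDebug>
#include <QRegExp>
#include <QStringList>

int main()
{
    QString str = "<TH>test</TD> foo <TH>bar</TD>";
    QRegExp rx("<TH>([a-zA-Z0-9 ]+)</TD>"); // group 1 captures the text between the tags
    QStringList list;
    int pos = 0;
    // indexIn() returns the offset of the next match, or -1 when none is left.
    while ((pos = rx.indexIn(str, pos)) != -1)
    {
        list << rx.cap(1);
        pos += rx.matchedLength(); // resume the search after this match
    }
    qWarning() << list; // ("test", "bar")
    return 0;
}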
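
The capture-group approach above is not specific to Qt. As a rough sketch, here is the same idea with the C++ standard library's std::regex, whose default ECMAScript grammar also lacks lookbehind. The helper name extractBetweenTags is made up for illustration and is not part of this thread or of Qt; the pattern and the sample input are the ones from the post above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): collect the text found between
// <TH> and </TD>, using a capture group instead of lookbehind/lookahead.
std::vector<std::string> extractBetweenTags(const std::string &input)
{
    // Same pattern as the QRegExp version: group 1 holds the inner text.
    static const std::regex rx("<TH>([a-zA-Z0-9 ]+)</TD>");

    std::vector<std::string> results;
    // Walk over every non-overlapping match, much like the indexIn() loop.
    for (std::sregex_iterator it(input.begin(), input.end(), rx), end; it != end; ++it) {
        results.push_back((*it)[1].str()); // keep only the captured group
    }
    return results;
}
```

Called on "<TH>test</TD> foo <TH>bar</TD>", this should yield the two strings test and bar, the same result as the QRegExp loop. The tags stay in the pattern so the match is anchored on them, but only group 1 is kept, which is what the lookbehind/lookahead version was trying to achieve.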