QRegExp question for JSON



marcvanriet
12th January 2011, 00:44
Hi,

I'm trying to use a QRegExp as a quick way to find a value in a simple JSON string. Unfortunately I'm not very familiar with regular expressions, and the examples I find on the internet don't seem to work with QRegExp.

I have a message like the one below and I want the 'dev' field:
{"dev":"RSNGSM32", "cmd":"setVoltage", "voltage":7.23}

I tried this:

QRegExp reFindName( "\"dev\":\"(.+?)\"" );
if( reFindName.indexIn( sMessage ) >= 0 )
{
    QString sEqName = reFindName.cap(0);
    qDebug() << "device name is" << sEqName;
}


But it doesn't work. I read the documentation on QRegExp and fooled around with the expression, but got nothing that works.

Any advice would be appreciated.
Best regards,
Marc

norobro
12th January 2011, 02:28
A couple of minor changes should produce what you want:

// QRegExp does not support the non-greedy quantifier "+?"; use setMinimal(true) instead.
//QRegExp reFindName( "\"dev\":\"(.+?)\"" );
QRegExp reFindName( "\"dev\":\"(.*)\"" );
reFindName.setMinimal(true);
if( reFindName.indexIn( sMessage ) >= 0 )
{
    // cap(0) is the whole match; cap(1) is the first parenthesized group.
    //QString sEqName = reFindName.cap(0);
    QString sEqName = reFindName.cap(1);
    qDebug() << "device name is" << sEqName;
}

BTW, there is a good regex tester under ../examples/tools/regex

HTH
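
For readers without a Qt build at hand, the two points made above (greedy versus minimal matching, and cap(1) versus cap(0)) can be tried with std::regex from the C++ standard library. This is only a sketch under stated assumptions: std::regex is a swapped-in stand-in for QRegExp, its default ECMAScript syntax accepts the lazy quantifier `.*?` directly where QRegExp needs setMinimal(true), and the helper name findDev is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of Qt or of the original posts):
// returns the requested capture group of the first match,
// or an empty string if the pattern does not match at all.
std::string findDev(const std::string& message,
                    const std::string& pattern,
                    int group)
{
    const std::regex re(pattern);   // ECMAScript syntax by default
    std::smatch m;
    if (std::regex_search(message, m, re))
        return m[group].str();      // group 0 = whole match, 1 = first (...)
    return std::string();
}
```

With the message from the first post, the greedy pattern `"dev":"(.*)"` runs on to the last quote in the string, the lazy pattern `"dev":"(.*?)"` stops at the first closing quote and yields RSNGSM32 in group 1, and group 0 returns the whole matched text including the "dev" key.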

MarekR22
12th January 2011, 11:47
Hi, I think the solution above captures too much. Try this instead (group 1 will capture only RSNGSM32):

QRegExp reFindName( "\"dev\": *\"(\\w*)\"" );

If you need spaces in device name then:

QRegExp reFindName( "\"dev\": *\"([\\w ]*)\"" );

Lykurg
12th January 2011, 12:57
Hi, I think the solution above captures too much.

Then you might want to check it, because the solution works perfectly (if you set minimal matching to true). As for your solution: what do you do when the device name also contains a :, or a !, or a $, or a =, or a...? This will be more flexible (even an escaped " will be allowed):
QRegExp reFindName( "\"dev\":\"(((\\\\\")|[^\"])*)\"" );
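
The difference between the \w pattern and the escaped-quote pattern can be checked outside Qt with a small sketch. It assumes std::regex (ECMAScript syntax) as a stand-in for QRegExp; the helper captureDev and the constant names are hypothetical, and the patterns are written as raw strings so the regex escapes are not doubled by C++ string escaping.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: capture group 1 of the first match, or "" if none.
std::string captureDev(const std::string& message, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(message, m, re) ? m[1].str() : std::string();
}

// The two patterns from the posts above, as raw strings.
// kWordPattern accepts only word characters (letters, digits, underscore).
const std::string kWordPattern    = R"re("dev": *"(\w*)")re";
// kEscapedPattern accepts a backslash-escaped quote or any non-quote character.
const std::string kEscapedPattern = R"re("dev":"(((\\")|[^"])*)")re";
```

For a device name such as RS:NG$32 the \w pattern finds no match, while the escaped-quote pattern captures the full name. A name containing \" is captured with the backslash still in place, so it would need unescaping afterwards.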

marcvanriet
12th January 2011, 22:08
Hi All,

All your solutions work for my application (the device names are rather simple, but nice to know that something exotic won't give unexpected results).

"\"dev\": *\"(\\w*)\""
"\"dev\": *\"([\\w ]*)\""
"\"dev\":\"(((\\\\\")|[^\"])*)\""


Jeeezus.... looks more like ASCII art than code :-)

Regards,
Marc

Urthas
13th January 2011, 01:03
I am not at all familiar with JSON but I am quite familiar with tasks of this general nature. Purely for the sake of discussion, are non-regex solutions feasible? For example, if the example string you gave can be taken as representative, then could you get so-called key-value pairs by splitting on "," and then parse each pair in turn by splitting on ":" and testing the key for equality with the string "dev"? The associated value yields the answer.
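
A minimal sketch of the split idea, in plain standard C++ rather than Qt so it compiles on its own. The function names trimToken and valueBySplit are hypothetical, and the code assumes a flat message whose keys and values contain no "," or ":" characters.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: strips spaces, tabs, braces and quotes from both ends.
std::string trimToken(const std::string& s)
{
    const std::string junk = " \t{}\"";
    const auto first = s.find_first_not_of(junk);
    if (first == std::string::npos)
        return std::string();
    const auto last = s.find_last_not_of(junk);
    return s.substr(first, last - first + 1);
}

// Naive lookup: split the message on ',' and each pair on its first ':'.
// Returns the value for the given key, or "" if the key is not found.
std::string valueBySplit(const std::string& message, const std::string& key)
{
    std::istringstream pairs(message);
    std::string pair;
    while (std::getline(pairs, pair, ',')) {
        const auto colon = pair.find(':');
        if (colon == std::string::npos)
            continue;                       // not a key:value pair
        if (trimToken(pair.substr(0, colon)) == key)
            return trimToken(pair.substr(colon + 1));
    }
    return std::string();
}
```

On the example message this returns RSNGSM32 for "dev". A value that itself contains a comma is cut at that comma, which is the limitation of splitting on delimiter characters.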

Lykurg
13th January 2011, 06:44
And what do you want to do if a value or a key of that string contains a ","? That's the benefit of enclosing values in quotes. Also, there is not really a speed difference since the regular expressions are perfectly optimized. And using regexp, you have more power, you are more flexible and you have to maintain less/less "complex" code.

marcvanriet
13th January 2011, 11:50
Hi,
Enclosed special characters are indeed something that prevents using a simple split(). But it would be quite easy to write a simple parser that processes the JSON string, handles the special characters, and adds everything it encounters to a key+value list of some sort. RegEx magic requires less coding though. And others will promote awk or similar tools.

I'm using QJSON by the way for further processing of the message. But I first needed to know which device the message is intended for, in order to call the corresponding message handler.

Regards,
Marc
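
The simple parser idea mentioned above could look roughly like the following sketch. It is not QJSON and not a full JSON parser: it assumes a single flat object (no nested objects or arrays), keeps escaped characters literally without decoding sequences such as \n or \uXXXX, and the names KeyValueList, readToken and parseFlatObject are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using KeyValueList = std::vector<std::pair<std::string, std::string>>;

// Hypothetical helper: reads one token starting at pos and leaves pos just
// past it. A quoted token ends at the next unescaped quote; a bare token
// (number, true, false, null) ends at ',', '}' or a space.
std::string readToken(const std::string& s, std::size_t& pos)
{
    std::string out;
    if (pos < s.size() && s[pos] == '"') {
        ++pos;                                  // skip the opening quote
        while (pos < s.size() && s[pos] != '"') {
            if (s[pos] == '\\' && pos + 1 < s.size())
                ++pos;                          // drop the backslash, keep the next char
            out += s[pos++];
        }
        if (pos < s.size())
            ++pos;                              // skip the closing quote
    } else {
        while (pos < s.size() && s[pos] != ',' && s[pos] != '}' && s[pos] != ' ')
            out += s[pos++];
    }
    return out;
}

// Scans a flat object like {"a":"b", "c":1.5} into a key+value list.
KeyValueList parseFlatObject(const std::string& s)
{
    KeyValueList result;
    std::size_t pos = 0;
    // Advances pos over any run of the given characters.
    const auto skip = [&](const std::string& chars) {
        while (pos < s.size() && chars.find(s[pos]) != std::string::npos)
            ++pos;
    };
    skip(" \t{");
    while (pos < s.size() && s[pos] != '}') {
        std::string key = readToken(s, pos);
        skip(" \t:");
        std::string value = readToken(s, pos);
        result.emplace_back(key, value);
        skip(" \t,");
    }
    return result;
}
```

Because quoted tokens are read up to their closing quote, commas and colons inside a value no longer break the lookup; finding 'dev' is then a linear search over the returned list.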

Urthas
14th January 2011, 21:03
And what do you want to do if a value or a key of that string contains a ","?

Well, it's generally best to "know your data" when formulating a regex too. If the data fields of interest are expected to contain meta-characters that are elsewhere used as delimiters, then of course you can't split() on those characters. On the other hand (and this is something only the OP could answer) will your data ever contain delimiter-characters? If not, then this isn't a problem.


And using regexp...you have to maintain less/less "complex" code.

Fewer lines, sure. Lower complexity? I suppose that's a matter of opinion. Be kind to your maintenance programmer who may or may not be yourself.