Parsing Text File --> Guide Needed



Ali_Talib
23rd December 2008, 04:40
Hi All,

It is my pleasure to join the Qt Center. I really enjoy programming with Qt.
But I have some problems reading a text file and processing its data. I need your guidance to help with my issue and to improve my programming skills.

My issue is that I have to read a file with a standard structure, which is:


Date Aug-01-1962
START
&ITEM Name='A' MAX= 4322.00 /
&ITEM Name='BB' MAX= 2323.00 /
&ITEM Name='BBB' MAX= 6183.00 /
&ITEM Name='BBA' MAX= 1383.00 /
&ITEM Name='ABB' MAX= 1407.00 /
&ITEM Name='DDA' MAX= 1371.00 /
&ITEM Name='AFF' MAX= 5785.00 /
END


I have to get each item's name and its MAX value (or other values), process them, and be able to change these values in the original text file.
Could you please help me with how to tokenize each line, check how many values it contains, and get them separately in an efficient way?

Your help is appreciated,
Ali

caduel
23rd December 2008, 07:02
Hard to say without seeing the grammar for the input...

for simple stuff:
read the file line by line;
match the lines against regexes;
add what you gathered from the parsed/matched line to your parse state;
perhaps you might need a state machine if the input is more complex
(to detect things like: here should have been the "START" token, but I got a "&ITEM" instead)

for more complicated stuff:
write a real parser (e.g. recursive descent)

HTH

fullmetalcoder
27th December 2008, 18:25
A sample regexp that could help you to extract data from files of the specified format (provided there are no extra features of the format you did not mention... as caduel said, a more precise grammar would help) :


QRegExp pattern("(\\w+)=\\s*('[^']+'|\\S+)");

You can apply this regexp repeatedly to every line of your input that starts with &ITEM. Each match yields one attribute: cap(1) is the attribute name (e.g. Name or MAX in your example) and cap(2) is the corresponding value, as a quoted string or a sequence of non-whitespace characters. Matching one attribute at a time is needed because a capturing group inside a repetition such as (?:...)+ only keeps the text of its last iteration, so a single match cannot return every name/value pair:

// supposing line holds one &ITEM line of the input, you can iterate as follows:
int pos = 0;
while ( (pos = pattern.indexIn(line, pos)) != -1 )
{
    QString attributeName = pattern.cap(1);
    QString attributeValue = pattern.cap(2);

    // do something with that data now

    // continue searching after the attribute just found
    pos += pattern.matchedLength();
}
Note : I have not tested this code so there might be a compilation error left but the principle is there.
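
For reference, here is a self-contained sketch of the same idea in plain standard C++, using std::regex instead of QRegExp so that it can be compiled without Qt. The function names parseItemLine and setAttribute are made up for this example, and the pattern assumes attribute names are plain words and values are either single-quoted or free of whitespace and '/'. The second function shows one possible way to change a value in place, which the original question also asked about.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

using Attribute = std::pair<std::string, std::string>;  // (name, value)

// Returns every name/value pair found on an "&ITEM ..." line,
// or an empty list if the line is not an &ITEM line.
std::vector<Attribute> parseItemLine(const std::string& line) {
    static const std::regex header(R"(^\s*&ITEM\b)");
    static const std::regex attribute(R"((\w+)=\s*('[^']*'|[^\s/]+))");
    std::vector<Attribute> result;
    if (!std::regex_search(line, header))
        return result;
    // Step through the line one attribute at a time.
    for (std::sregex_iterator it(line.begin(), line.end(), attribute), end; it != end; ++it) {
        std::string value = (*it)[2].str();
        // Strip the surrounding quotes from quoted values.
        if (value.size() >= 2 && value.front() == '\'' && value.back() == '\'')
            value = value.substr(1, value.size() - 2);
        result.emplace_back((*it)[1].str(), value);
    }
    return result;
}

// Returns a copy of the line with the value of attribute `name` replaced.
// Assumes `name` contains no regex metacharacters; the line is returned
// unchanged if the attribute is not present.
std::string setAttribute(const std::string& line, const std::string& name,
                         const std::string& newValue) {
    const std::regex pattern("\\b" + name + R"(=\s*('[^']*'|[^\s/]+))");
    std::smatch match;
    if (!std::regex_search(line, match, pattern))
        return line;
    std::string result = line;
    result.replace(match.position(1), match.length(1), newValue);
    return result;
}
```

To update the file itself, one option is to read all lines, pass the &ITEM lines through something like setAttribute, and write every line back out in the original order.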

Ali_Talib
31st December 2008, 08:04
Thanks caduel

It is a great idea, but too advanced for my application right now; it may help me later.

Thanks fullmetalcoder

That is what I tried, and it works the way I want.


Thanks for your response.
Ali

sigger
3rd January 2009, 02:54
If you're not locked in to the file format, changing it to XML may make parsing easier, since you could use Qt's built-in XML and DOM classes.

fullmetalcoder
4th January 2009, 14:49
Not such good advice, really. XML is a very "trendy" format lately. Many people use it everywhere, mostly to do things it was not designed for in the first place, just because it saves them the effort of writing proper loading/saving routines themselves. XML is great, no doubt, but it is a design error (though it might be a decent fallback choice for time reasons) to use it for, say, storing settings or any kind of data that does not match the following criteria:


REQUIRES a flexible hierarchy
keeps the XML signal/noise ratio high, i.e. the space taken by tags is WAY smaller than that taken by the actual data (typically, such data falls in the "documents" category)

Also, it is worth keeping in mind that most XML parsers, while easy to use, are not lightning fast, and that even the fastest will be significantly slower than a parser for a simpler language (be it homemade or something like JSON (http://json.org) or another XML alternative (http://web.archive.org/web/20060325012720/www.pault.com/xmlalternatives.html)...)

wysota
4th January 2009, 22:24
I sense a flamework coming ;)

I tend to disagree about XML. STN ratio doesn't matter, as storage and bandwidth are becoming cheaper and cheaper, and besides, XML yields an excellent compression ratio. As for settings, it definitely IS something that requires flexible hierarchy if you plan future enhancements - back-and-forth compatibility is then trivial.

On the other hand I do agree that this notation is used much too often nowadays.

sigger
5th January 2009, 02:27
I sense a flamework coming ;)

Not from here. I agree completely on speed and STN.

Based on my own usage pattern, I suspect it's overused because it's easy to get up and running quickly and easy to modify later.

I was just pointing out the existence of the classes in case it fit the bill.

Probably all moot since he does refer to the data files being in a standard format, with no suggestion he can alter the file format.

fullmetalcoder
5th January 2009, 11:46
As for settings, it definitely IS something that requires flexible hierarchy if you plan future enhancements - back-and-forth compatibility is then trivial.
Well, of course it depends what kind of settings you need to store. Quite often, flat text files can actually be enough and if hierarchy is needed it can still be achieved using indentation or a similar method.
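
As a rough illustration of hierarchy through indentation, here is a sketch in plain standard C++. It assumes a made-up format where a line without '=' opens a group, indented "key=value" lines belong to the nearest less-indented group, and indentation uses spaces only; the name flattenIndented is invented for the example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Flattens indentation-based text into ("group.key", value) pairs.
std::vector<std::pair<std::string, std::string>> flattenIndented(const std::string& text) {
    std::vector<std::pair<std::string, std::string>> result;
    std::vector<std::pair<std::size_t, std::string>> parents;  // (indent, name) of open groups
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t indent = line.find_first_not_of(' ');
        if (indent == std::string::npos) continue;  // blank line
        // Close every group indented at least as far as this line.
        while (!parents.empty() && parents.back().first >= indent)
            parents.pop_back();
        std::string path;
        for (const auto& parent : parents)
            path += parent.second + ".";
        const std::size_t eq = line.find('=', indent);
        if (eq == std::string::npos) {
            parents.emplace_back(indent, line.substr(indent));  // group header
        } else {
            result.emplace_back(path + line.substr(indent, eq - indent),
                                line.substr(eq + 1));
        }
    }
    return result;
}
```

For example, a "window" line followed by an indented "width=800" line would come out as the pair ("window.width", "800").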

Anyway, we're getting off topic...

wysota
5th January 2009, 21:21
Quite often, flat text files can actually be enough and if hierarchy is needed it can still be achieved using indentation or a similar method.
I more meant that version A introduces parameter X which version A-1 can't handle so when you move settings from A to A-1, it will or will not preserve parameter X and when you run version A back, the setting will or will not be there. It works the other way round with the default values as well. Hmm... I don't know if what I have written is possible to understand, sorry :)


Anyway, we're getting off topic...
Right :)

fullmetalcoder
6th January 2009, 17:03
I more meant that version A introduces parameter X which version A-1 can't handle so when you move settings from A to A-1, it will or will not preserve parameter X and when you run version A back, the setting will or will not be there. It works the other way round with the default values as well. Hmm... I don't know if what I have written is possible to understand, sorry
What you say is clear (at least to me) but wrong. The forward/backward compatibility is NOT specific to the backend used (XML, JSON, "flat" text, ...), as long as it is structured, but to the way the app handles it.

Whatever the format used, some applications read through the data and apply the settings on start, and then overwrite the file when saving settings (that would be using Qt's XML stream reading/writing, for instance).

Some, however, whatever the format used, will convert it to an internal data structure in a lossless way, and only apply the settings they recognize (that would be using QDomDocument, for instance).
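
The lossless approach does not depend on XML. A minimal sketch in plain standard C++, assuming a flat "key=value" settings file: every entry is kept in an ordered list, the application changes only the keys it knows, and everything (including keys it does not recognize) is written back. The names Settings, loadSettings, setValue and saveSettings are invented for the example.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Ordered list of (key, value) entries, including keys this version does not know.
using Settings = std::vector<std::pair<std::string, std::string>>;

// Loads every "key=value" line, whether or not the application recognizes the key.
Settings loadSettings(const std::string& text) {
    Settings settings;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;  // this sketch drops lines without '='
        settings.emplace_back(line.substr(0, eq), line.substr(eq + 1));
    }
    return settings;
}

// Updates an existing key in place, or appends it if it is new.
void setValue(Settings& settings, const std::string& key, const std::string& value) {
    for (auto& entry : settings) {
        if (entry.first == key) {
            entry.second = value;
            return;
        }
    }
    settings.emplace_back(key, value);
}

// Writes all entries back in their original order, unknown keys included.
std::string saveSettings(const Settings& settings) {
    std::string out;
    for (const auto& entry : settings)
        out += entry.first + "=" + entry.second + "\n";
    return out;
}
```

An older version that only knows "width" can load a file written by a newer one, change "width", and save without losing the newer version's entries. Lines that are not "key=value", such as comments, are dropped by this sketch.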

wysota
6th January 2009, 17:19
What you say is clear (at least to me) but wrong. The forward/backward compatibility is NOT specific to the backend used (XML, JSON, "flat" text, ...), as long as it is structured, but to the way the app handles it.
Yes, that's true. But what I mean is that you don't need to do anything in your application to support it when using XML, as long as you don't regenerate the tree from scratch. With other backends this is also possible but not that straightforward.


Some, however, whatever the format used, will convert it to an internal data structure in a lossless way, and only apply the settings they recognize (that would be using QDomDocument, for instance).

There is a good chance things like comments in the settings file will be lost when converting back and forth. But as I said - I mean this is very easy with XML and a little harder with "other" backends.