
Thread: best way to parsing this line

  1. #1
    Join Date
    Jan 2006
    Posts
    976
    Thanks
    53
    Qt products
    Qt3
    Platforms
    Windows

    best way to parsing this line

    Hello,
    I have to parse (and do something with) lines read from a file, like 10,50,30,4. A line like 10,22,3,^,22 must be rejected: the ^ means the line contains an error...
    My code seems OK; here it is:
    Qt Code:
    #include <cstdlib>   // atof
    #include <string>
    #include <vector>
    using namespace std;

    const char delim = ',';

    void insertInOtherStructure(const vector<double>& line); // defined elsewhere

    // Splits every line on delim; a "^" field means the line contains an error and is rejected.
    void myFunction(const vector<string>& lineOfFile, char delim) {
        for (vector<string>::const_iterator it = lineOfFile.begin(); it != lineOfFile.end(); ++it) {
            vector<double> line;
            bool lineIsOk = true;
            size_t pStart = 0;                            // start of the current field
            while (lineIsOk) {
                size_t pEnd = it->find(delim, pStart);    // string::npos for the last field
                // substr() clamps the length, so pEnd - pStart is safe even when pEnd == npos
                string subStr = it->substr(pStart, pEnd - pStart);
                if (subStr == "^") lineIsOk = false;
                else line.push_back(atof(subStr.c_str()));
                if (pEnd == string::npos) break;          // that was the last field
                pStart = pEnd + 1;                        // skip the delimiter
            }
            if (lineIsOk) insertInOtherStructure(line);
        }
    }
    The problem now is that this only works with delim=','. I'd like to parse lines in a way similar to mine (it seems simple, doesn't it?) but with other delimiters. For example, I'd like to cover a line like
    10 2 22 200 1
    or
    10 space space space 22 space 2 33
    in case whoever creates the file doesn't write it correctly and puts in extra spaces (I'd still like to parse it as it is).



    Could you help me, please? Thanks a lot.
    Regards

  2. #2
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,359
    Thanks
    3
    Thanked 5,015 Times in 4,792 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: best way to parsing this line

    Your case is very simple. You can have three types of tokens in your data stream:
    - digits
    - a separator
    - an "error" token

    Everything else is a parse error (you can even drop the last token type and treat the error character as a parse error as well).

    First you'll need a lexer function that reads the data stream and returns tokens. Then you need a parsing function that fetches tokens using the lexer function and interprets them according to a grammar of your choice. The parsing function knows which symbols to expect and can interpret them or generate a parse error.

    Here is a primitive lexer:

    Qt Code:
    #include <istream>
    using namespace std;

    // Non-negative return values are numbers; the negative ones are special tokens.
    enum Token { EndOfStream=-1, EndOfLine=-2, Separator=-3, Error=-4 };

    int lexer(istream &stream){
        int value = 0;
        bool hasValue = false;
        for(;;){
            int ch = stream.peek(); // peek a char without consuming it
            if(ch>='0' && ch<='9'){
                // a digit, calculate value
                value = value*10 + (ch-'0');
                hasValue = true;
                stream.get(); // discard the character
                continue;
            }
            if(hasValue) return value; // not a digit but a number was read: return it, the char stays in the stream
            if(ch == istream::traits_type::eof()) return EndOfStream; // checked after peek(), which is what detects the end
            stream.get(); // discard the character which was already peeked
            switch(ch){
            case ' ': case '\t': break; // white space - ignore
            case '\n': return EndOfLine;
            case ',': return Separator;
            default: return Error;
            }
        }
    }

    The parser is really a finite state machine:
    Qt Code:
    #include <istream>
    using namespace std;

    void processNumber(int value); // whatever you want to do with each number

    enum State { NumberOrSeparatorOrEnd, Number, SeparatorOrEnd, NumberOrEnd };

    // Parses one line using lexer() and the Token values above; returns false on a parse error.
    bool parser(istream &stream){
        State state = NumberOrEnd;
        for(;;){
            int token = lexer(stream);
            switch(state){
            case NumberOrEnd:
                if(token==EndOfLine || token==EndOfStream) return true;
                else if(token>=0){
                    processNumber(token);
                    state = SeparatorOrEnd; // expect a separator or end
                } else {
                    return false; // parse error
                }
                break;
            case SeparatorOrEnd:
                if(token==Separator){
                    state = Number;
                } else if(token==EndOfLine || token==EndOfStream){
                    return true;
                } else {
                    return false; // parse error
                }
                break;
            case Number: // after a separator a number must follow
                if(token>=0){
                    processNumber(token);
                    state = SeparatorOrEnd;
                } else {
                    return false; // parse error
                }
                break;
            default: // NumberOrSeparatorOrEnd: not used yet, left for extensions
                return false;
            }
        }
    }

    And there you have a complete, extensible parser.
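    To see the two pieces working together, here is a minimal self-contained sketch that repeats the lexer above and drives the same states over a whole file. Assumptions that are not from the post: the driver name parseAll, collecting each valid line into a vector<int>, skipping empty lines, and rejecting a bad line (such as one containing ^) instead of aborting the whole file.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Same token convention as the lexer above: >= 0 is a number, < 0 a special token.
enum Token { EndOfStream = -1, EndOfLine = -2, Separator = -3, Error = -4 };

int lexer(std::istream &stream) {
    int value = 0;
    bool hasValue = false;
    for (;;) {
        int ch = stream.peek();
        if (ch >= '0' && ch <= '9') {                 // accumulate a multi-digit number
            value = value * 10 + (ch - '0');
            hasValue = true;
            stream.get();
            continue;
        }
        if (hasValue) return value;
        if (ch == std::istream::traits_type::eof()) return EndOfStream;
        stream.get();
        switch (ch) {
        case ' ': case '\t': break;                   // ignore blanks
        case '\n': return EndOfLine;
        case ',':  return Separator;
        default:   return Error;                      // e.g. the '^' marker
        }
    }
}

// Hypothetical driver: valid lines become rows; a line with a parse error
// is ignored up to its end-of-line token and then dropped.
std::vector<std::vector<int> > parseAll(std::istream &stream) {
    enum State { NumberOrEnd, Number, SeparatorOrEnd };
    std::vector<std::vector<int> > rows;
    std::vector<int> row;
    State state = NumberOrEnd;
    bool lineOk = true;
    for (;;) {
        int token = lexer(stream);
        if (token == EndOfLine || token == EndOfStream) {
            // a line must not end right after a separator; empty lines are dropped
            if (lineOk && state != Number && !row.empty()) rows.push_back(row);
            if (token == EndOfStream) return rows;
            row.clear();
            state = NumberOrEnd;
            lineOk = true;
        } else if (!lineOk) {
            // already rejected: ignore the rest of this line
        } else if ((state == NumberOrEnd || state == Number) && token >= 0) {
            row.push_back(token);
            state = SeparatorOrEnd;
        } else if (state == SeparatorOrEnd && token == Separator) {
            state = Number;
        } else {
            lineOk = false;                           // parse error: reject the line
        }
    }
}

// Convenience wrapper for parsing in-memory text.
std::vector<std::vector<int> > parseAll(const std::string &text) {
    std::istringstream in(text);
    return parseAll(in);
}
```

    For example, parseAll(std::string("10,50\n10,^,22\n")) yields a single row {10, 50}: the second line is rejected at the ^ and the rest of it is skipped.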

  3. The following user says thank you to wysota for this useful post:

    mickey (1st March 2008)

  4. #3
    Join Date
    Jan 2006
    Posts
    976
    Thanks
    53
    Qt products
    Qt3
    Platforms
    Windows

    Re: best way to parsing this line

    OK, but isn't there a faster way (if one exists)? For example, putting each entire line of the file (delimiters included) into a vector<string> and then using string methods to extract the numbers?
    Regards

  5. #4
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,359
    Thanks
    3
    Thanked 5,015 Times in 4,792 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: best way to parsing this line

    Any approach has to look at every character at least once, so nothing can do better than O(n), where n is the number of characters in the input. The method I provided processes each character only once, so its complexity is O(n) and it is as fast as it gets asymptotically.

  6. #5
    Join Date
    Jan 2006
    Posts
    976
    Thanks
    53
    Qt products
    Qt3
    Platforms
    Windows

    Re: best way to parsing this line

    Hello,

    could you suggest a change to cover the case where the file contains double values instead of integers (e.g. 2.3 2.55 3 4 2.3 0 1)? Or better: what should I change if I don't know which type it will be (float, double, int)?
    Regards

  7. #6
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,359
    Thanks
    3
    Thanked 5,015 Times in 4,792 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: best way to parsing this line

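    The only difference is that after a digit you may also expect a decimal comma (or dot), and after the comma (or dot) at least one digit (and no more commas/dots). Of course, a comma can only serve as the decimal mark if it isn't also your field separator.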
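    A minimal sketch of that digit rule, assuming '.' as the decimal mark (because ',' is already the field separator) and reading every value as a double, which also covers integers, so you don't need to know the type in advance. lexNumber is a hypothetical name; in a real lexer this would replace the integer-only digit branch.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Reads a number of the form  digit+ [ '.' digit+ ]  and stores it as a double.
// Returns false (parse error) if it doesn't start with a digit or if no digit
// follows the '.'. A second '.' is left in the stream for the caller to reject.
bool lexNumber(std::istream &stream, double &out) {
    double value = 0.0;
    int intDigits = 0;
    while (std::isdigit(stream.peek())) {          // integer part
        value = value * 10 + (stream.get() - '0');
        ++intDigits;
    }
    if (intDigits == 0) return false;
    if (stream.peek() == '.') {
        stream.get();                              // consume the decimal point
        double scale = 0.1;
        int fracDigits = 0;
        while (std::isdigit(stream.peek())) {      // fractional part
            value += (stream.get() - '0') * scale;
            scale /= 10;
            ++fracDigits;
        }
        if (fracDigits == 0) return false;         // "2." is not a number here
    }
    out = value;
    return true;
}

// Convenience overload for trying it on a string.
bool lexNumber(const std::string &text, double &out) {
    std::istringstream in(text);
    return lexNumber(in, out);
}
```

    Accumulating the fraction digit by digit can differ from strtod() in the last bit; if that matters, collect the characters of the number into a string and convert them with strtod() instead.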

  8. The following user says thank you to wysota for this useful post:

    mickey (1st March 2008)

  9. #7
    Join Date
    Jan 2006
    Posts
    976
    Thanks
    53
    Qt products
    Qt3
    Platforms
    Windows

    Re: best way to parsing this line

    One more question, please. In my first post the separator is a parameter (delim), so I can read files that use ',' or another separator of my choice. But I can also choose ' ' as delim; in that case, what happens inside your switch (delim can't go in a case label, so it won't compile)? In general, ' ' and '\t' can appear in the file because whoever writes the file (the user) may be a bit careless and insert extra spaces anywhere (and my program should accept that even if it isn't fully correct). But what should I do if the user decides (and they can) to use ' ' as the delimiter (e.g. 200 3 4 5 3 instead of 200,3,4,5,3)? I have no idea how to handle this case.

    Thanks in advance.
    Regards

  10. #8
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,359
    Thanks
    3
    Thanked 5,015 Times in 4,792 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: best way to parsing this line

    Quote Originally Posted by mickey
    But I can choose as delim ' '; in that case inside your switch (that won't compile) what'll happen?
    If you want tokens to be configurable, either don't use a switch and compare with the equality operator instead (case labels must be compile-time constants, which a runtime delim isn't), or in each case check whether the particular symbol is a token you expect for the current configuration.
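    A sketch of the first option, assuming the separator is passed to the lexer as a char (the signature and the tokenize helper are hypothetical). The configurable separator is tested with ==; the fixed characters are tested the same way for uniformity. When delim is ' ', no Separator token is produced at all, so a space-separated grammar needs none.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

enum Token { EndOfStream = -1, EndOfLine = -2, Separator = -3, Error = -4 };

// Lexer with a runtime separator: >= 0 is a number, < 0 a special token.
int lexer(std::istream &stream, char delim) {
    int value = 0;
    bool hasValue = false;
    for (;;) {
        int ch = stream.peek();
        if (ch >= '0' && ch <= '9') {              // accumulate a multi-digit number
            value = value * 10 + (ch - '0');
            hasValue = true;
            stream.get();
            continue;
        }
        if (hasValue) return value;
        if (ch == std::istream::traits_type::eof()) return EndOfStream;
        stream.get();
        if (ch == '\n') return EndOfLine;
        if (ch == delim && delim != ' ') return Separator;
        if (ch == ' ' || ch == '\t') continue;     // blanks are always ignored
        return Error;                              // anything else, including a "wrong" delimiter
    }
}

// Helper for trying it out: all tokens of a text, up to (not including) EndOfStream.
std::vector<int> tokenize(const std::string &text, char delim) {
    std::istringstream in(text);
    std::vector<int> tokens;
    for (int t = lexer(in, delim); t != EndOfStream; t = lexer(in, delim))
        tokens.push_back(t);
    return tokens;
}
```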

    Quote Originally Posted by mickey
    In general ' ' and '\t' can be in the file because the file constructor (the user) can be a bit distracted and insert space at choice (and my program will accept if even if not full correct);
    The lexer ignores all spaces and tabs: it treats a match of the regular expression [\t ]+ as a token separator (just like, for example, C syntax does). \n should probably also be in that group, but I guess it depends on your needs.

    Quote Originally Posted by mickey
    but how can I do if user decide (and it can) choose ' ' as delimiters (eg. 200 3 4 5 3 instead of 200,3,4,5,3) ? (i have no idea how to treat this case).
    See above.

    BTW, have you looked at bison/flex and QLALR?

  11. #9
    Join Date
    Jan 2006
    Posts
    976
    Thanks
    53
    Qt products
    Qt3
    Platforms
    Windows

    Re: best way to parsing this line

    Hello,
    I'm not sure whether I'm misunderstanding you or vice versa (I mean the case of ' ' as the delimiter). Anyway, I modified your initial switch to cope with this case. Is it right, or is there a cleaner way to do it? The problem with the previous switch and delimiter ' ' was that as soon as the lexer saw a blank it immediately returned Separator, which may be wrong, because the blank could be followed by more blanks, a '\n' or a '\t'. Maybe the code explains better what I mean:
    Qt Code:
    switch(ch){
    case ' ': case '\t':
        if (delim == ' ') {
            // delim is only known at run time, so it can't be a case label: compare instead.
            // A run of blanks is one separator: report it only at the last blank of the run.
            int next = fileStream.peek();
            if (next != ' ' && next != '\t' && next != '\n' && next != EOF)
                return Separator;
        }
        break; // otherwise white space is ignored (this break also prevents falling into '\n')
    case '\n': line++; return EndOfLine;
    case ',':
        return (delim == ',') ? Separator : Error;
    case EOF: return EndOfStream;
    default: return Error;
    }
    Second question:
    enum State { NumberOrEnd, Number };
    State token = Number;
    cout << token; // I'd like to print the string "Number", not 1; is that possible?
    Last edited by mickey; 1st March 2008 at 16:17.
    Regards

  12. #10
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,359
    Thanks
    3
    Thanked 5,015 Times in 4,792 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: best way to parsing this line

    This code of yours doesn't look good... If you want whitespace as the separator, simply change your grammar. You don't have to change the lexer, except maybe to remove the part that handles commas. You don't need to report a separator at all; just return numbers.

    Here is the EBNF for comma-separated lines:

    EBNF Code:
    CommaSeparatedLine = Number, { Separator, Number } ;

    And here is one for space-separated lines:

    EBNF Code:
    SpaceSeparatedLine = Number, { Number } ;

    In the second case you simply don't handle commas. Since the lexer returns the value of a whole Number rather than single digits, it should be easy.
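    A minimal sketch of checking one line's tokens against either grammar, assuming the lexer has already turned the line into a vector of tokens (numbers >= 0, specials < 0, without the trailing EndOfLine). The function name and the commaSeparated flag are hypothetical; the point is that only the { ... } repetition differs between the two grammars.

```cpp
#include <cstddef>
#include <vector>

enum Token { EndOfStream = -1, EndOfLine = -2, Separator = -3, Error = -4 };

// Checks one line against
//   CommaSeparatedLine = Number, { Separator, Number } ;   (commaSeparated == true)
//   SpaceSeparatedLine = Number, { Number } ;               (commaSeparated == false)
// and collects the numbers. Returns false on a parse error.
bool parseLine(const std::vector<int> &tokens, bool commaSeparated, std::vector<int> &numbers) {
    numbers.clear();
    std::size_t i = 0;
    if (i == tokens.size() || tokens[i] < 0) return false;   // the first Number is mandatory
    numbers.push_back(tokens[i++]);
    while (i < tokens.size()) {                              // the { ... } repetition
        if (commaSeparated) {
            if (tokens[i] != Separator) return false;
            ++i;
            if (i == tokens.size()) return false;            // a Separator must be followed by a Number
        }
        if (tokens[i] < 0) return false;                     // Error, or a Separator where none belongs
        numbers.push_back(tokens[i++]);
    }
    return true;
}
```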

    Oh, and regarding your question about printing enum names: it's possible in Qt if you register the enum with the meta-object system using Q_ENUMS.
    Last edited by wysota; 1st March 2008 at 16:23. Reason: Corrected EBNF notation
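    Without Qt's meta-object system, a common plain C++ alternative (not Q_ENUMS) is to map the values to names by hand, for example with a switch, and overload operator<< so that cout << token prints the name. A minimal sketch for the two-value State enum from the question (stateName is a hypothetical helper):

```cpp
#include <iostream>
#include <string>

enum State { NumberOrEnd, Number };

// Plain C++ has no built-in way to get an enumerator's name, so map it by hand.
std::string stateName(State s) {
    switch (s) {
    case NumberOrEnd: return "NumberOrEnd";
    case Number:      return "Number";
    }
    return "?";   // only reached for values outside the enum
}

// Chosen over ostream's operator<<(int) because State is an exact match.
std::ostream &operator<<(std::ostream &os, State s) { return os << stateName(s); }
```

    With this overload, State token = Number; cout << token; prints Number instead of 1. The catch is that the switch has to be kept in sync with the enum by hand.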

  13. #11
    Join Date
    Jan 2006
    Posts
    976
    Thanks
    53
    Qt products
    Qt3
    Platforms
    Windows

    Re: best way to parsing this line

    My parser has to accept files that use only "delim" as the separator. delim is a command-line parameter, so sometimes it could be ',' and at other times a blank. If the user doesn't specify anything on the command line, the default (and only) accepted separator is ','.
    Regards

  14. #12
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,359
    Thanks
    3
    Thanked 5,015 Times in 4,792 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: best way to parsing this line

    How about you just use strchr() and atof() and forget about the parser?
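    A minimal sketch of that suggestion, assuming each line is available as a C string without its trailing newline. The rules for empty fields and for collapsing runs of blanks when delim is ' ' are assumptions chosen to match the requirements described earlier in the thread; parseLine is a hypothetical name.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Splits one line on `delim` using strchr() and converts each field with atof().
// Assumed rules: with a blank delimiter, runs of blanks count as one separator;
// otherwise an empty field is an error; a "^" field rejects the whole line.
bool parseLine(const char *line, char delim, std::vector<double> &out) {
    out.clear();
    const char *start = line;
    for (;;) {
        const char *end = std::strchr(start, delim);          // next separator or null
        std::string field = end ? std::string(start, end) : std::string(start);
        if (field == "^") return false;                        // error marker
        if (field.empty()) {
            if (delim != ' ') return false;                    // e.g. "10,,3" or "10,"
        } else {
            out.push_back(std::atof(field.c_str()));
        }
        if (!end) break;                                       // last field done
        start = end + 1;
    }
    return true;
}
```

    Note that atof() cannot report a malformed field (atof("abc") returns 0); if that matters, strtod() with its end pointer tells you where the conversion stopped.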

  15. #13
    Join Date
    Jan 2006
    Posts
    976
    Thanks
    53
    Qt products
    Qt3
    Platforms
    Windows

    Re: best way to parsing this line

    For now I like it. What do you think about this mixed if/switch? Is it acceptable, or is a pure if chain better? Thanks.
    Qt Code:
    if (token >= 0) { // a number; it may have more than one digit
        // ...
    }
    else {
        switch (token) {
        case Special:
        case EndOfLine:
        case EndOfStream:
        // ...
        }
    }
    Last edited by mickey; 3rd March 2008 at 00:44.
    Regards

  16. #14
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,359
    Thanks
    3
    Thanked 5,015 Times in 4,792 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: best way to parsing this line

    A switch and an if chain are equivalent, so that doesn't really matter...

  17. #15
    Join Date
    Jan 2006
    Posts
    976
    Thanks
    53
    Qt products
    Qt3
    Platforms
    Windows

    Re: best way to parsing this line

    But with an if chain, in the worst case I have to do n comparisons, where n is the number of ifs in the chain. A switch is implemented in a totally different way: with it I only have to do one comparison, don't I? It should be quicker...
    Regards

  18. #16
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,359
    Thanks
    3
    Thanked 5,015 Times in 4,792 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: best way to parsing this line

    Quote Originally Posted by mickey
    switch is implemented in a totally different way; with this last I have to do only one compare. Isn't it?
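    Not really. There is no machine instruction that performs an arbitrary number of comparisons at once. Depending on the case values, the compiler may emit a chain of comparisons, a binary search, or (for dense values) a jump table, which still needs a range check. Compile to assembly (gcc -S) and see for yourself.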

  19. #17
    Join Date
    Jan 2006
    Posts
    976
    Thanks
    53
    Qt products
    Qt3
    Platforms
    Windows

    Re: best way to parsing this line

    Hello,
    I'm still using the parser and, assuming it's a good approach, I have a question. In many parts of my program I need to know how many lines the file has and how many elements each line has (10, 20, 20, 40 -> this line has 4 elements). My program has to check that all lines have the same number of elements, so I thought of doing this check inside the parser: if they don't, the parser generates an Error. I also thought of counting the lines of the file in the parser. All of this clutters the parser with additional ifs and variables; isn't it better to count the lines outside the parser (in the class constructor, for example)? I don't like the parser getting so large... any suggestions, please?
    Regards

  20. #18
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,359
    Thanks
    3
    Thanked 5,015 Times in 4,792 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: best way to parsing this line

    Let me put it this way: the parser is something that deals with grammar (syntax), not logic (semantics). I suggest you parse your file into some kind of list or tree or whatever else you want and then perform semantic checks or actions on the resulting structure. Of course you can combine the two operations to save time when you can deduce while parsing that the result will be useless (like when the number of items in each line differs). If you wrap it all in a class with different methods performing different tasks, the code should be simple and easily readable.
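    A minimal sketch of that layout, with hypothetical names: parse() only checks syntax and fills the rows, while the line count, the same-length rule and min/max are separate methods working on the parsed result. To keep it short, the sketch splits lines with std::getline and reads numbers with >> instead of the hand-written lexer, accepts only ',' as the separator, and treats any malformed field (including ^) as an error for the whole input.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

class DataFile {
public:
    // Syntax only: every line is a comma-separated list of numbers.
    bool parse(const std::string &text) {
        rows.clear();
        std::istringstream lines(text);
        std::string line;
        while (std::getline(lines, line)) {
            std::vector<double> row;
            std::istringstream fields(line);
            std::string field;
            while (std::getline(fields, field, ',')) {
                std::istringstream num(field);
                double v;
                char extra;
                if (!(num >> v) || (num >> extra)) return false;   // not exactly one number
                row.push_back(v);
            }
            rows.push_back(row);
        }
        return true;
    }

    // Semantics, computed on the parsed rows.
    std::size_t lineCount() const { return rows.size(); }
    std::size_t elementCount(std::size_t line) const { return rows[line].size(); }

    bool sameLength() const {
        for (std::size_t i = 1; i < rows.size(); ++i)
            if (rows[i].size() != rows[0].size()) return false;
        return true;
    }

    // Smallest and largest value of a non-empty line.
    void minMax(std::size_t line, double &lo, double &hi) const {
        lo = hi = rows[line][0];
        for (std::size_t j = 1; j < rows[line].size(); ++j) {
            if (rows[line][j] < lo) lo = rows[line][j];
            if (rows[line][j] > hi) hi = rows[line][j];
        }
    }

private:
    std::vector<std::vector<double> > rows;
};
```

    If the same-length check should abort early, as suggested above, parse() could compare each row's size with the first one before pushing it; the other methods stay unchanged.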

  21. #19
    Join Date
    Jan 2006
    Posts
    976
    Thanks
    53
    Qt products
    Qt3
    Platforms
    Windows

    Re: best way to parsing this line

    OK, I could put it in a vector<vector<double> > as you saw before and then check everything (e.g. the number of lines, the number of elements in a line, the min and max values of each line). All of those need a scan of the vector of vectors from beginning to end, so I thought of doing it while parsing (but that clutters the parser). So, after reading what you wrote above, I'm thinking of writing one method that, while parsing, does the push_back into the vector, increments the element count, and checks for min and max. Is that what you mean? This way the parser looks lighter... thanks
    Regards

  22. #20
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,359
    Thanks
    3
    Thanked 5,015 Times in 4,792 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: best way to parsing this line

    There is no single correct way to complete the task. You may do as you see fit depending on what's most important for you.
