
Thread: Parsing CSV data

  1. #1
    Join Date
    Oct 2010
    Posts
    1
    Thanks
    1
    Qt products
    Qt3 Qt4

    Parsing CSV data

    Can anyone assist me with writing a program to parse CSV files? I've searched for a good way to parse data using Qt, but most of what I found suggests using third-party software or learning QLALR, which is not an option.

  2. #2
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,359
    Thanks
    3
    Thanked 5,015 Times in 4,792 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: Parsing CSV data

    You can parse CSV data using a regular expression. Suitable expressions can be found on the web.
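    To illustrate the regular-expression approach, here is a minimal sketch in standard C++ using std::regex. The pattern and the helper name splitCsvRecord are assumptions made for this example, not an expression taken from the thread. It handles one record at a time, with a comma as the separator and "" as an escaped quote inside a quoted field.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): splits ONE record into fields.
// Returns false if the record is malformed, e.g. an unterminated quote.
bool splitCsvRecord(const std::string& line, std::vector<std::string>& fields) {
    // Group 1: body of a quoted field, where "" stands for a literal quote.
    // Group 2: an unquoted field (contains no comma and no quote).
    static const std::regex fieldRe(R"re("((?:[^"]|"")*)"|([^",]*))re");
    static const std::regex doubledQuote("\"\"");

    fields.clear();
    auto pos = line.cbegin();
    const auto end = line.cend();
    while (true) {
        std::smatch m;
        // match_continuous anchors the match at pos instead of searching ahead.
        if (!std::regex_search(pos, end, m, fieldRe,
                               std::regex_constants::match_continuous)) {
            return false;
        }
        if (m[1].matched) {
            // Collapse each "" inside the quoted body into a single quote.
            fields.push_back(std::regex_replace(m[1].str(), doubledQuote, "\""));
        } else {
            fields.push_back(m[2].str());
        }
        pos = m[0].second;
        if (pos == end) return true;    // consumed the whole record
        if (*pos != ',') return false;  // unexpected text after a field
        ++pos;                          // skip the separator
    }
}
```

    For the input a,"b ""c"", d", this should yield three fields: a, then b "c", d, then an empty field. With Qt the same idea would be written with Qt's own regular expression classes.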
    Your biological and technological distinctiveness will be added to our own. Resistance is futile.

    Please ask Qt related questions on the forum and not using private messages or visitor messages.


  3. #3
    Join Date
    Mar 2009
    Location
    Brisbane, Australia
    Posts
    7,729
    Thanks
    13
    Thanked 1,610 Times in 1,537 Posts
    Qt products
    Qt4 Qt5
    Platforms
    Unix/X11 Windows
    Wiki edits
    17

    Re: Parsing CSV data

    Quote: Originally posted by cmre123
    Can anyone assist me with writing a program to parse CSV files? I've searched for a good way to parse data using Qt, but most of what I found suggests using third-party software or learning QLALR, which is not an option.
    "CSV" covers a huge range of pseudo-standard formats with all sorts of nasty peculiarities: quotes or no quotes, header line(s), embedded commas in fields, variable escape characters, fields that contain embedded newline characters, and so on. See RFC 4180 for example. The reason many suggest using a third-party library is that this variability is taken care of by someone else.

    As Wysota wrote, you can use a regular expression, but certain CSV variations will give you trouble. If the format is suitably simple and reliable, then QString::split() might be adequate.
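    As a rough illustration of the "simple and reliable" case, here is a sketch of a plain split on the delimiter in standard C++. The helper name splitSimple is made up for this example. With Qt, calling split(',') on a QString does the same job and also keeps empty fields by default.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: naive split on a single delimiter character.
// It knows nothing about quoting or escaping; empty fields are kept.
std::vector<std::string> splitSimple(const std::string& line, char delimiter = ',') {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type end = line.find(delimiter, start);
        if (end == std::string::npos) {
            fields.push_back(line.substr(start));  // last field
            return fields;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1;
    }
}
```

    This is only adequate when no field can contain the delimiter: the line "a,b",c comes back as three pieces instead of two, because the quotes are not interpreted.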

  4. The following user says thank you to ChrisW67 for this useful post:

    zeFree (3rd December 2011)

  5. #4
    Join Date
    Mar 2010
    Location
    Gdynia, Poland
    Posts
    12
    Thanked 3 Times in 1 Post
    Qt products
    Qt3 Qt4
    Platforms
    Unix/X11 Windows

    Re: Parsing CSV data

    Hi.
    Here is a fast way to split a line of text into tokens. It is faster than using a regexp, and it is designed for CSV as described in the RFC.
    Qt Code:
    #include <string>
    #include <vector>

    // Splits one CSV record into fields. A doubled quote ("") inside a quoted
    // field yields a literal quote; delimiters and newlines inside quotes are data.
    std::vector<std::string> tokenize(const std::string& str, char delimiter) {
        std::vector<std::string> tokens;
        std::string field;
        bool quotes = false;  // true while inside a quoted section

        // Check the bound first, then stop at an embedded NUL.
        for (std::string::size_type pos = 0;
             pos < str.length() && str[pos] != 0x00; ++pos) {
            const char c = str[pos];
            if (!quotes && c == '"') {
                quotes = true;                  // opening quote
            } else if (quotes && c == '"') {
                if (pos + 1 < str.length() && str[pos + 1] == '"') {
                    field.push_back(c);         // escaped quote
                    ++pos;
                } else {
                    quotes = false;             // closing quote
                }
            } else if (!quotes && c == delimiter) {
                tokens.push_back(field);
                field.clear();
            } else if (!quotes && (c == 0x0A || c == 0x0D)) {
                break;                          // unquoted CR or LF ends the record
            } else {
                field.push_back(c);
            }
        }
        tokens.push_back(field);  // the last field has no trailing delimiter
        return tokens;
    }

  6. The following 3 users say thank you to adzajac for this useful post:

    cmre123 (29th October 2010), Lis (18th December 2010), wendelmaques (14th January 2011)

  7. #5
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,359
    Thanks
    3
    Thanked 5,015 Times in 4,792 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: Parsing CSV data

    I don't think it parses CSV correctly. It doesn't handle embedded newlines, I think it will also fail on embedded quote characters, and it will certainly fail on Unicode strings. Also, I don't think it is faster than a regular expression, since you are appending characters to the string one by one, which is inherently slow. You can easily improve it by storing the start and end positions of each token and then extracting the whole token in one go.
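    The "store the positions and extract in one go" idea could look roughly like the sketch below. The names tokenizeByRange and unquote are made up for this example, and it assumes the caller passes a single record with the trailing line terminator already removed. Fields without any quote are copied with a single substr() call, and only fields that contain quotes take the slower character-by-character path.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: removes the enclosing quotes of a raw field and
// turns each doubled quote ("") into a single literal quote.
std::string unquote(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    bool quotes = false;
    for (std::string::size_type i = 0; i < raw.size(); ++i) {
        const char c = raw[i];
        if (c != '"') {
            out.push_back(c);
        } else if (!quotes) {
            quotes = true;                       // opening quote
        } else if (i + 1 < raw.size() && raw[i + 1] == '"') {
            out.push_back('"');                  // escaped quote
            ++i;
        } else {
            quotes = false;                      // closing quote
        }
    }
    return out;
}

// Hypothetical variant of the tokenizer: remembers where each field starts
// and copies it out as one range instead of appending character by character.
std::vector<std::string> tokenizeByRange(const std::string& line, char delimiter = ',') {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;  // first character of the current field
    bool quotes = false;               // inside a quoted section?
    bool sawQuote = false;             // does the current field contain a quote?

    for (std::string::size_type pos = 0; pos <= line.size(); ++pos) {
        const bool atEnd = (pos == line.size());
        if (!atEnd && line[pos] == '"') {
            quotes = !quotes;          // a doubled quote toggles twice, so the state stays right
            sawQuote = true;
            continue;
        }
        if (atEnd || (!quotes && line[pos] == delimiter)) {
            std::string field = line.substr(start, pos - start);  // one copy per field
            if (sawQuote) field = unquote(field);
            tokens.push_back(std::move(field));
            start = pos + 1;
            sawQuote = false;
        }
    }
    return tokens;
}
```

    Whether this is actually faster than per-character appends would have to be measured on real data; the sketch only shows the structure of the idea.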


  8. #6
    Join Date
    Mar 2010
    Location
    Gdynia, Poland
    Posts
    12
    Thanked 3 Times in 1 Post
    Qt products
    Qt3 Qt4
    Platforms
    Unix/X11 Windows

    Re: Parsing CSV data

    Quotes, even embedded ones, are taken into account, and so is '\n' when it appears inside quotes, as in the RFC. Unicode is in fact not supported; the input must be ASCII. As for performance, it's faster than boost::tokenizer (I've measured that, because the CSV files in my project are huge), and I think it could be faster than a regexp. A regexp could give you the ability to parse files with variable-length characters, i.e. Unicode.

  9. #7
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,359
    Thanks
    3
    Thanked 5,015 Times in 4,792 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: Parsing CSV data

    I think the efficiency of boost::tokenizer depends entirely on how the parsing function is implemented. From what I understand, boost::tokenizer is just a wrapper over the parsing function, so I believe your solution can be put into boost::tokenizer too if you strip the outer while loop. But I still claim that either a regexp or a dedicated parser based on automata would be faster. Not that this really matters unless you are parsing megabytes of CSV per second.
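    For reference, a dedicated automaton-based parser could be sketched as below. The state names and the function parseCsv are assumptions made for this example. It walks a whole buffer once, treats CR outside quotes as ignorable so that both LF and CRLF line ends work, and is lenient about malformed input instead of reporting errors.

```cpp
#include <string>
#include <vector>

using Record = std::vector<std::string>;

// Hypothetical sketch: explicit state machine over a whole CSV buffer.
std::vector<Record> parseCsv(const std::string& text, char delimiter = ',') {
    enum class State { FieldStart, Unquoted, Quoted, QuoteInQuoted };

    std::vector<Record> records;
    Record record;
    std::string field;
    State state = State::FieldStart;

    auto endField = [&] {
        record.push_back(field);
        field.clear();
        state = State::FieldStart;
    };
    auto endRecord = [&] {
        endField();
        records.push_back(record);
        record.clear();
    };

    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (state == State::Quoted) {
            if (c == '"') state = State::QuoteInQuoted;  // closing quote or first half of ""
            else field.push_back(c);                     // delimiters and newlines are data here
            continue;
        }
        if (state == State::QuoteInQuoted && c == '"') {
            field.push_back('"');                        // "" inside quotes is a literal quote
            state = State::Quoted;
            continue;
        }
        // From here on we are outside quotes.
        if (c == '"' && state == State::FieldStart) {
            state = State::Quoted;                       // opening quote
        } else if (c == delimiter) {
            endField();
        } else if (c == '\n') {
            endRecord();
        } else if (c == '\r') {
            // ignored, so that CRLF ends a record exactly once
        } else {
            field.push_back(c);
            state = State::Unquoted;
        }
    }
    // Flush a final record that is not terminated by a newline.
    if (state != State::FieldStart || !record.empty()) endRecord();
    return records;
}
```

    Because the quoted state is carried across characters rather than across lines, a newline inside a quoted field stays part of that field, which is the case a line-by-line reader gets wrong.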

