Thread: fastest way to read large files (with mixed content)

  1. #1

    I have large tabular text files that contain both numbers and letters, and I want to use only the numbers. What is the fastest way to read this data?
    Currently I'm doing this:

    Qt Code:
    #include <QFile>
    #include <QStringList>
    #include <QTextStream>
    #include <cstdlib>

    QFile genofile(Genoname);
    if (!genofile.open(QIODevice::ReadOnly | QIODevice::Text)) { exit(1); }
    QTextStream geno_stream(&genofile);

    while (!geno_stream.atEnd())
    {
        // Read one line and split it into tab-separated fields.
        QString genoRow = geno_stream.readLine();
        QStringList row = genoRow.split("\t");

        for (int i = 0; i < row.size(); i++)
        {
            double value;
            if (row.at(i) != "NA") value = row.at(i).toDouble();
            else value = -9.0;  // sentinel for missing ("NA") entries
        }
    }

    I need the process to be maximally fast. Is this a fast way?

    What would be the best way when the data don't contain letters (all are numbers)?

  2. #2

    Why not just use QTextStream?
    ==========================signature=============== ==================
    S.O.L.I.D principles (use them!):
    https://en.wikipedia.org/wiki/SOLID_...iented_design)

    Do you write clean code? - if you are TDD'ing then maybe, if not, you're not writing clean code.

  3. #3

    Is QTextStream faster?

    Basically, I'd like to know how fast (in relative terms) the QString-to-double conversion by .toDouble() is.
    Or is this conversion something that should be avoided for the sake of speed?

  4. #4

    String-to-double conversions are expensive; avoid them when you need performance. The fastest way might be to store the values as binary data and use QFile::map(). Then you don't need to read the file at all (see mmap()).

    HTH,
    Uwe

  5. #5

    Quote: Is QTextStream faster?
    Since this is just two or so lines of code, why not just test and see?

    Quote: Basically, I'd like to know how fast (in relative terms) is QString to double conversion by .toDouble()?
    The cost is not only in the string conversion, but also in iterating through the text and extracting the substrings that are to be converted to double.
    So you will need a more complex test case to get measurements of any practical value.
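
    A minimal sketch of such a test case, in plain C++ rather than Qt so that it stands alone. The names makeRow, sumViaSplit, sumInPlace and millisecondsFor are hypothetical, the synthetic row is made up, and std::strtod stands in for QString::toDouble(), so absolute timings will not match a Qt build. It separates the two costs: splitting into substrings and then converting, versus converting in place.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Build one tab-separated row of n numeric fields (synthetic test data).
std::string makeRow(int n) {
    std::ostringstream out;
    for (int i = 0; i < n; ++i) {
        if (i > 0) out << '\t';
        out << (i * 0.5);
    }
    return out.str();
}

// Variant A: split into substrings first, then convert each one.
double sumViaSplit(const std::string& row) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (true) {
        std::size_t tab = row.find('\t', start);
        if (tab == std::string::npos) {
            fields.push_back(row.substr(start));
            break;
        }
        fields.push_back(row.substr(start, tab - start));
        start = tab + 1;
    }
    double sum = 0.0;
    for (const std::string& f : fields) sum += std::strtod(f.c_str(), nullptr);
    return sum;
}

// Variant B: convert in place; strtod reports where each number ends.
// Assumes every field is numeric and stops at the first one that is not.
double sumInPlace(const std::string& row) {
    const char* p = row.c_str();
    double sum = 0.0;
    while (*p != '\0') {
        char* end = nullptr;
        double v = std::strtod(p, &end);
        if (end == p) break;  // nothing numeric at this position
        sum += v;
        p = end;
        if (*p == '\t') ++p;  // step over the field separator
    }
    return sum;
}

// Run work() `repeats` times and return the elapsed wall time in milliseconds.
template <typename F>
double millisecondsFor(F&& work, int repeats) {
    volatile double sink = 0.0;  // keeps the compiler from discarding the calls
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) sink = sink + work();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

    Timing each variant on a row shaped like the real data, for example millisecondsFor([&] { return sumViaSplit(row); }, 1000) against the same call with sumInPlace, gives a first relative figure. No result is claimed here; it depends on compiler, library and data.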

    @Uwe: but if you have binary data, how do you get to the correct fields?
    You will have to implement what QTextStream is doing (more or less).
    Perhaps I just don't follow what you mean...

  6. #6

    Quote, originally posted by high_flyer:
    @Uwe: but if you have binary data - how do you get to the correct fields?
    You will have to implement what QTextStream is doing (more or less).

    No. When the data in the file is laid out as it is in memory (usually dumped from a C/C++ array), you can simply map it to an address (a pointer to the same data structure that was dumped) and access it like regular memory using C/C++ classes/structs. No reading or parsing of the file is necessary. For example, when the file is simply an array of doubles, you can cast the address the file was mapped to into a double * (or QPointF *) pointer and access it like a regular array.

    See http://en.wikipedia.org/wiki/Memory-mapped_file
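
    A minimal sketch of the idea, assuming a POSIX system and calling mmap() directly instead of QFile::map(), so that it needs no Qt. dumpDoubles and sumMappedDoubles are hypothetical helper names. The file is assumed to have been written on a machine with the same endianness and the same double representation as the one reading it.

```cpp
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <vector>

// Dump an array of doubles to disk exactly as it sits in memory.
bool dumpDoubles(const char* path, const std::vector<double>& values) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    std::size_t written =
        std::fwrite(values.data(), sizeof(double), values.size(), f);
    bool closed = (std::fclose(f) == 0);
    return closed && written == values.size();
}

// Map the file read-only and sum it through a plain const double*.
// Returns false if the file is empty or cannot be opened or mapped.
bool sumMappedDoubles(const char* path, double& sum) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return false;
    }
    std::size_t bytes = static_cast<std::size_t>(st.st_size);

    void* addr = mmap(nullptr, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return false;

    // No parsing: the mapped bytes are used directly as an array of doubles.
    const double* data = static_cast<const double*>(addr);
    std::size_t count = bytes / sizeof(double);
    sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) sum += data[i];

    munmap(addr, bytes);
    return true;
}
```

    The operating system pages the file in on demand, so only the parts that are actually touched get read from disk.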

    Of course such an implementation is not cross-platform (e.g. big-endian vs. little-endian problems, compiler incompatibilities). But storing doubles as strings introduces rounding errors, and the conversion is expensive.
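
    How large the rounding error of text storage is depends on how many digits are written. A small sketch, assuming IEEE 754 doubles and plain C++ (roundTripThroughText is a hypothetical helper): with the common default of 6 significant digits a value usually does not survive the trip through text, while std::numeric_limits<double>::max_digits10 digits (17 on such platforms) are enough to get the same double back, at the price of longer strings to parse.

```cpp
#include <cstdio>
#include <cstdlib>
#include <limits>

// Format value with `digits` significant digits, then parse the text back.
double roundTripThroughText(double value, int digits) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", digits, value);
    return std::strtod(buf, nullptr);
}

// True if the value read back from text is bit-for-bit the same number.
bool survivesText(double value, int digits) {
    return roundTripThroughText(value, digits) == value;
}

// Number of significant digits that guarantees a lossless round trip.
constexpr int kLosslessDigits = std::numeric_limits<double>::max_digits10;
```

    For example, survivesText(0.1 + 0.2, 6) is false because the text is just "0.3", whereas survivesText(0.1 + 0.2, kLosslessDigits) is true.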

    Uwe

  7. The following user says thank you to Uwe for this useful post:

    timmu (9th September 2012)

  8. #7

    Yes, all of that is clear, but it means you need to know the format of the data you are looking for, and that's not what I understood to be the case:
    Quote: I have large tabular text files that contain both numbers and letters.
    If it is (that is, you know where to look for numbers and where not), then OK, this is a good solution.
    But if all you know is that you have fields which *might* be what you are looking for, then you need to implement some logic to parse them.
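
    One way such parsing logic could look, as a sketch in plain C++ rather than Qt. parseNumber and numbersInRow are hypothetical names, and tab-separated fields are assumed as in the first post. A field counts as a number only if std::strtod consumes all of it. Note that strtod also accepts spellings such as "nan", "inf" and hexadecimal floats, so those would need an extra check if they can occur as words in the data.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Try to read field as a double. Succeeds only if the whole field is a number.
bool parseNumber(const std::string& field, double& value) {
    if (field.empty()) return false;
    const char* begin = field.c_str();
    char* end = nullptr;
    double v = std::strtod(begin, &end);
    if (end == begin || *end != '\0') return false;  // nothing or only part parsed
    value = v;
    return true;
}

// Collect the numeric entries of one tab-separated row, skipping everything else.
std::vector<double> numbersInRow(const std::string& row) {
    std::vector<double> out;
    std::size_t start = 0;
    while (start <= row.size()) {
        std::size_t tab = row.find('\t', start);
        if (tab == std::string::npos) tab = row.size();
        double v = 0.0;
        if (parseNumber(row.substr(start, tab - start), v)) out.push_back(v);
        start = tab + 1;
    }
    return out;
}
```

    With this, a row such as "rs123\t0.5\tNA\t2" yields only 0.5 and 2; "NA" and "rs123" are skipped without needing a sentinel value.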

  9. The following user says thank you to high_flyer for this useful post:

    timmu (9th September 2012)

  10. #8

    Thank you, high_flyer and Uwe, for this useful discussion. Basically I have two situations:
    1. In one file I don't know whether an entry is a word or a number. I only need to use an entry (and convert it to double) if it is a number. In this situation QTextStream is perhaps not a bad choice.
    2. In the other I know all values are doubles. In this case I care only that the process is as fast as possible. What is the best method to use then, among what Qt offers?

    My third question is this: is string->float conversion faster than string->double?

    Thanks!

  11. #9

    1. I would use QTextStream, yes.
    2. Mapping as Uwe suggested.

    I would be surprised if there were any major difference between string-to-float and string-to-double conversion, but I haven't looked into that.
