More appropriate, section or split?



ucntcme
12th October 2007, 04:18
I need to work with some rather large tab-separated files (up to 2GB or more). I can read each line in and then either split it with QString::split or pull out fields with QString::section. But which would perform better? The two are used differently, so one is not a drop-in replacement for the other, and I'd like some idea of which is better for million+ line counts. The docs are silent on it.

So, does anyone have experience comparing the two? I'll likely need to put the data into a model for display, searching, sorting, etc. later on. I know I won't be needing all of the fields (19).

spud
12th October 2007, 10:21
If you're really concerned about performance you could use QString::indexOf() and the lightweight wrapper QStringRef.



QStringRef avoids the memory allocation and reference counting overhead of a standard QString by simply referencing a part of the original string. This can prove to be advantageous in low level code, such as that used in a parser, at the expense of potentially more complex code.

The code would look something like this:


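// Reuse one vector of references for every line instead of reallocating.
QVector<QStringRef> fields(MAX_FIELDS);
foreach(const QString &line, ...)
{
    int start = 0, end = 0, validFields = 0;
    // Record each field that ends at a separator (use '\t' for tab-separated input).
    while(validFields < MAX_FIELDS && (end = line.indexOf(',', start)) != -1)
    {
        fields[validFields++] = QStringRef(&line, start, end - start);
        start = end + 1;
    }
    // The last field has no trailing separator, so it runs to the end of the line.
    if(validFields < MAX_FIELDS && start <= line.size())
        fields[validFields++] = QStringRef(&line, start, line.size() - start);

    // fields now contains 'validFields' valid entries (some of which might be empty)
}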
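
For anyone who wants to try the idea outside of Qt, here is a rough sketch of the same non-copying scan in standard C++17, with std::string_view standing in for QStringRef. It is not Qt code, and the function name splitFields and its parameters are made up for illustration. It assumes tab as the separator, since that is what the original question describes.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `line` on `sep` without copying any characters.
// Each returned view points into the buffer behind `line`.
// At most `maxFields` fields are returned; anything after that is ignored.
std::vector<std::string_view> splitFields(std::string_view line,
                                          std::size_t maxFields,
                                          char sep = '\t')
{
    std::vector<std::string_view> fields;
    fields.reserve(maxFields);
    std::size_t start = 0;
    while (fields.size() < maxFields) {
        const std::size_t end = line.find(sep, start);
        if (end == std::string_view::npos) {
            // Last field: no trailing separator, so take the rest of the line.
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1;
    }
    return fields;
}
```

As with QStringRef, the views are only valid while the string they were cut from is still alive, so copy out whichever fields you want to keep before reading the next line. Passing a small maxFields stops the scan early, which may help if only the first few of the 19 columns are needed.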

ucntcme
12th October 2007, 12:10
Thanks, that did make it significantly faster, a drop of approximately 33%.