
[solved] Alternatives to the slow QTextStream pos() method?



nekkro-kvlt
19th August 2012, 17:19
Hi All,

I'm trying to build a component based on QPlainTextEdit to handle large text files (around 50 MB).
My approach is to load only the first lines of the file and run a thread that reads the whole file to do two things:
- Count the total number of lines, to set the scrollbar's maximum value
- Build an index by storing the file offset every x lines (currently 1000), so I can seek quickly into the file
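A minimal sketch of how such an index gives a fast seek. It assumes each entry stores the byte offset at which a given 0-based line starts and that entries are spaced exactly indexEvery lines apart (entries[0] being line 0 at offset 0). It uses std::istream instead of QFile/QTextStream so it compiles on its own; LineOffset and lineAt are hypothetical names. With INDEX_EVERY = 1000, any line is reached with one seek plus at most 1000 line reads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <optional>
#include <sstream>  // std::istringstream, handy for in-memory testing
#include <string>
#include <vector>

// Hypothetical index entry, modelled on the post's lineOffset struct:
// `offset` is the byte position at which 0-based line `lineNumber` starts.
struct LineOffset {
    std::int64_t lineNumber;
    std::int64_t offset;
};

// Returns 0-based line `target`, assuming entries[k] describes line
// k * indexEvery. It seeks to the nearest indexed line at or before `target`
// and reads forward from there instead of from the start of the file.
std::optional<std::string> lineAt(std::istream& in,
                                  const std::vector<LineOffset>& entries,
                                  std::int64_t indexEvery, std::int64_t target) {
    assert(indexEvery > 0 && target >= 0 && !entries.empty());
    std::size_t k = static_cast<std::size_t>(target / indexEvery);
    if (k >= entries.size()) k = entries.size() - 1;  // clamp to the last entry
    const LineOffset& start = entries[k];

    in.clear();  // drop eof/fail flags left by a previous lookup
    in.seekg(static_cast<std::streamoff>(start.offset), std::ios::beg);
    std::string text;
    for (std::int64_t line = start.lineNumber; line <= target; ++line) {
        if (!std::getline(in, text)) return std::nullopt;  // past the last line
    }
    if (!text.empty() && text.back() == '\r') text.pop_back();  // "\r\n" ending
    return text;
}
```

In the Qt version the same two steps would be a QFile::seek() to the stored offset followed by reading lines from there.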

The problem is that pos() is very slow to return. For example, reading a 60 MB text file takes almost 1 s, but if I also fetch the offset, it takes 1 minute...

Here is my code:


void FileSpecUpdater::run()
{
    /* QFile file(_FileName);
    // Note: QMessageBox is a widget and must only be used from the GUI thread.
    if (!file.open(QIODevice::ReadOnly)) // read/write later
    {
        QMessageBox::critical(0, "Error opening File in Thread", "Couldn't open file: " + _FileName);
        return;
    } */
    QTextStream stream(_FileHandle);

    int lc = 0;
    QString text;
    while (!stream.atEnd()) {
        text = stream.readLine();
        lc++;
        // currTime is re-created on every iteration, so the value printed
        // below is only the time spent inside this if block (i.e. in pos()).
        QTime currTime = QTime::currentTime();
        if (lc % INDEX_EVERY == 0)
        {
            lineOffset lo;
            lo.lineNumber = lc;
            lo.Offset = 0;
            // lo.Offset = stream.pos();
            _FileSpecsToUpdate->lineMaps.append(lo);
            qDebug() << currTime.msecsTo(QTime::currentTime()) << " : " << lc;
        }
    }
    _FileSpecsToUpdate->setLineNumber(lc);
    // emit finished();
}

This is the output I get:


0 : 1000
0 : 2000
0 : 3000
0 : 4000
0 : 5000
0 : 6000
0 : 7000
0 : 8000
0 : 9000
0 : 10000
0 : 11000
0 : 12000
0 : 13000
0 : 14000

Now if I uncomment "lo.Offset=stream.pos()", I get:



61 : 2000
95 : 3000
120 : 4000
151 : 5000
180 : 6000
214 : 7000
242 : 8000
275 : 9000
302 : 10000
328 : 11000
363 : 12000
401 : 13000
440 : 14000
450 : 15000
486 : 16000
517 : 17000
539 : 18000
573 : 19000
606 : 20000
632 : 21000


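And each call takes longer than the previous one: the time per call grows roughly linearly with the line number (about 30 ms more per 1000 lines), so the total time grows quadratically with the number of lines...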
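A rough worked estimate of why the total blows up. Assume the k-th pos() call (at line 1000k) costs about t_k ≈ c·k with c ≈ 30 ms, which is read off the output above (61 ms at line 2000, 632 ms at line 21000). Summing over m index points then gives a quadratic total:

```latex
T(m) = \sum_{k=1}^{m} t_k \;\approx\; c \sum_{k=1}^{m} k \;=\; c\,\frac{m(m+1)}{2}
```

For the twenty printed points (k = 2 to 21) this predicts 30 · (21·22/2 − 1) = 6900 ms, against 6975 ms for the sum of the printed timings. Under this model, doubling the number of lines roughly quadruples the total time.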

Since I'm reading the file from the start anyway, maybe I could compute the file offset manually, but I don't know whether my lines will be terminated by \n or \r\n, and readLine() strips the terminator, which makes the result unpredictable AFAIK.
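One way around the \n vs \r\n uncertainty is to count bytes instead of characters, sketched here in standard C++ (FileIndex, LineOffset and buildIndex are hypothetical names). std::getline() removes the '\n' but leaves a preceding '\r' in the string, so the string length plus one is the exact byte length of the line for both endings. This assumes the stream is opened in binary mode, so no "\r\n" to "\n" translation happens. In Qt, a comparable route is QIODevice::readLine() on a QFile opened without QIODevice::Text: the returned QByteArray keeps the '\n', so its size() is the line's byte length.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>  // std::istringstream, handy for in-memory testing
#include <string>
#include <vector>

// Hypothetical index entry: `offset` is the byte position at which
// 0-based line `lineNumber` starts.
struct LineOffset {
    std::int64_t lineNumber;
    std::int64_t offset;
};

struct FileIndex {
    std::int64_t lineCount = 0;       // what the scrollbar maximum needs
    std::vector<LineOffset> entries;  // one entry every indexEvery lines
};

// Counts lines and records byte offsets without ever asking the stream for
// its position. Open the stream with std::ios::binary so byte counts match.
FileIndex buildIndex(std::istream& in, std::int64_t indexEvery) {
    assert(indexEvery > 0);
    FileIndex idx;
    idx.entries.push_back({0, 0});
    std::int64_t offset = 0;
    std::string text;
    while (std::getline(in, text)) {
        // text still contains the '\r' of a "\r\n" ending; add 1 for the
        // consumed '\n' unless the last line had no terminator (eof was hit).
        const bool terminated = !in.eof();
        offset += static_cast<std::int64_t>(text.size()) + (terminated ? 1 : 0);
        ++idx.lineCount;
        if (terminated && idx.lineCount % indexEvery == 0)
            idx.entries.push_back({idx.lineCount, offset});
    }
    return idx;
}
```

This still reads line by line, but it never forces a buffered stream to reconstruct its device position.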

The documentation does state: "Because QTextStream is buffered, this function may have to seek the device to reconstruct a valid device position. This operation can be expensive, so you may want to avoid calling this function in a tight loop." So does anyone have an idea of how I could proceed?

Added after 48 minutes:

I'm doing it like this now. It works much better, even if it was more painful to write:


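void FileSpecUpdater::run()
{
    _FileHandle->seek(0);
    qint64 lc = 0;         // number of '\n' seen so far (complete lines)
    qint64 offset = 0;     // file offset of buffer[0]; qint64 so files > 2 GB work
    char lastByte = '\n';  // last byte read; '\n' means no unterminated line
    while (!_FileHandle->atEnd())
    {
        const QByteArray buffer = _FileHandle->read(FILE_READ_BUFFER_SIZE);
        if (buffer.isEmpty())
            break; // read error
        // Visit every '\n' in the chunk, so a '\n' at byte 0 and a chunk that
        // contains several index points are both handled. "\r\n" endings also
        // end in '\n', so they need no special case.
        int pos = buffer.indexOf('\n');
        while (pos != -1)
        {
            ++lc;
            if (lc % INDEX_EVERY == 0)
            {
                // Same convention as the QTextStream version: Offset is the
                // first byte of the line that follows line number lc.
                lineOffset lo;
                lo.lineNumber = lc;
                lo.Offset = offset + pos + 1;
                _FileSpecsToUpdate->lineMaps.append(lo);
            }
            pos = buffer.indexOf('\n', pos + 1);
        }
        lastByte = buffer.at(buffer.size() - 1);
        offset += buffer.size(); // the last chunk may be shorter than FILE_READ_BUFFER_SIZE
    }
    if (lastByte != '\n')
        ++lc; // readLine() also counted a final line without a trailing '\n'
    _FileSpecsToUpdate->setLineNumber(lc);
}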
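The same chunked scan can be sketched in standard C++ to check its edge cases without Qt: std::istream::read and std::memchr stand in for QFile::read and QByteArray::indexOf, and scanChunks, ScanResult and LineOffset are hypothetical names. Making the chunk size a parameter lets a test use tiny chunks to exercise the cases a chunked scan must get right: a '\n' at the first byte of a chunk, several index points inside one chunk, and a last line without a terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>  // std::istringstream, handy for in-memory testing
#include <vector>

// Hypothetical types mirroring the post's lineOffset and line count.
struct LineOffset {
    std::int64_t lineNumber;  // lines that precede `offset`
    std::int64_t offset;      // first byte of the next line
};

struct ScanResult {
    std::int64_t lineCount = 0;
    std::vector<LineOffset> entries;
};

// Reads `in` in chunks of `chunkSize` bytes and visits every '\n' in each
// chunk, recording an entry each time the line count reaches a multiple of
// `indexEvery`. The stream must be in binary mode so offsets are byte-exact.
ScanResult scanChunks(std::istream& in, std::int64_t indexEvery, std::size_t chunkSize) {
    assert(indexEvery > 0 && chunkSize > 0);
    ScanResult r;
    std::vector<char> buf(chunkSize);
    std::int64_t offset = 0;  // file offset of buf[0]
    char last = '\n';         // last byte read; '\n' means no unterminated line
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::size_t n = static_cast<std::size_t>(in.gcount());
        if (n == 0) break;
        const char* p = buf.data();
        const char* const end = buf.data() + n;
        while (const void* hit = std::memchr(p, '\n', static_cast<std::size_t>(end - p))) {
            const char* nl = static_cast<const char*>(hit);
            ++r.lineCount;
            if (r.lineCount % indexEvery == 0)
                r.entries.push_back(
                    {r.lineCount, offset + static_cast<std::int64_t>(nl - buf.data()) + 1});
            p = nl + 1;
        }
        last = buf[n - 1];
        offset += static_cast<std::int64_t>(n);
    }
    if (last != '\n') ++r.lineCount;  // final line without a trailing '\n'
    return r;
}
```

A larger chunk (tens of kilobytes or more) is the usual choice in practice; the scan is linear in the file size whatever the chunk size is.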