
How to list a large number of files & folders?



rawfool
23rd July 2014, 13:27
I'm writing a utility function that lists the files & folders in a given directory, either recursively or non-recursively, and returns the output as a QStringList.


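#include <QDir>
#include <QDirIterator>
#include <QStringList>

QStringList Utility::dirListing(const QString & path, bool recursive)
{
    QDir dir(path);
    if(!dir.exists())
        return QStringList();

    // Descend into subdirectories only when a recursive listing is requested.
    const QDirIterator::IteratorFlags flags = recursive
            ? QDirIterator::Subdirectories
            : QDirIterator::NoIteratorFlags;

    // Note: with the default filter, the "." and ".." entries of each
    // directory are included in the result as well.
    QStringList list;
    QDirIterator iterator(path, flags);
    while(iterator.hasNext())
    {
        list << iterator.next();   // every path is kept in memory at once
    }
    return list;
}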


But will this be fine for a large data set?

How can I do this batch-wise?

Kindly give me some hints on how to handle this case. Thank you.
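One way to read "batch-wise" is a pull model: keep a single open iterator and let the caller fetch at most N paths per call, so only one batch is ever held in memory. This is a minimal sketch assuming C++17; it uses std::filesystem::recursive_directory_iterator as a stand-in for QDirIterator so that it compiles without Qt, and the helper name nextBatch is hypothetical.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: pulls at most maxCount entries from an already-open
// iterator, so the caller never holds more than one batch in memory.
// (With Qt, the same idea is calling QDirIterator::next() at most maxCount
// times while QDirIterator::hasNext() is true.)
// Note: ++it may throw std::filesystem::filesystem_error on unreadable entries.
std::vector<std::string> nextBatch(fs::recursive_directory_iterator& it,
                                   std::size_t maxCount)
{
    std::vector<std::string> batch;
    const fs::recursive_directory_iterator end;  // default-constructed = end
    while (it != end && batch.size() < maxCount) {
        batch.push_back(it->path().string());
        ++it;
    }
    return batch;  // empty once the whole tree has been walked
}
```

A caller would loop with `for (auto b = nextBatch(it, 1000); !b.empty(); b = nextBatch(it, 1000)) { ... }`, processing and then discarding each batch before asking for the next one.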

wysota
23rd July 2014, 15:40
It depends what you mean by "fine".

rawfool
23rd July 2014, 16:09
If my directory contains more than 10 million subdirectories and each subdirectory contains 10 million files, how do I do this processing without creating overhead? Using the current function will prove costly, so I'm looking for a better approach to handle such scenarios.

anda_skoa
23rd July 2014, 17:49
What do you want to do with the file paths once you have them?

Cheers,

rawfool
24th July 2014, 07:04
I'm creating the dirListing(const QString & path, bool recursive) function in a FileUtil class that is built as a dynamic library; users will call the various APIs of this FileUtil class.
Now the requirement is to write a function that does something similar to ls -lR. This function is expected to handle large data sets and produce output without much overhead or CPU usage.
So my question is how to design this function so that it delivers output to the caller under the above conditions.
I have one idea, but I'm not sure how to implement it:
- Can I use the signals & slots mechanism to send a finite amount of data in batches? If so, how should I design the function?

Please give suggestions for this scenario.

wysota
24th July 2014, 08:49
> If my directory contains more than 10 million subdirectories and each subdirectory contains 10 million files,

Then you will run out of memory :D


> how do I do this processing without creating overhead? Using the current function will prove costly, so I'm looking for a better approach to handle such scenarios.

You could have a thread (or a couple of them) iterate over the directory and signal new paths to the main thread, where they are received and appended to a dynamically growing list.
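The producer/consumer shape described here can be sketched as follows, assuming C++17. The hand-off that a queued signal/slot connection from a worker QThread would perform is replaced by a mutex-guarded queue and std::thread, so the example compiles without Qt; the names BatchQueue, produce and countWithWorker are hypothetical, and the consumer only counts entries rather than storing them.

```cpp
#include <condition_variable>
#include <cstddef>
#include <filesystem>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

// Thread-safe queue of path batches (stand-in for a queued signal/slot hand-off).
class BatchQueue {
public:
    void push(std::vector<std::string> batch) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(batch)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until a batch is available; returns nullopt once the producer is done.
    std::optional<std::vector<std::string>> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        auto batch = std::move(q_.front());
        q_.pop();
        return batch;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::vector<std::string>> q_;
    bool closed_ = false;
};

// Worker thread: walks the tree and hands off paths in batches of batchSize.
void produce(const fs::path& root, std::size_t batchSize, BatchQueue& out) {
    std::vector<std::string> batch;
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        batch.push_back(entry.path().string());
        if (batch.size() == batchSize) {
            out.push(std::move(batch));
            batch.clear();  // make the moved-from vector usable again
        }
    }
    if (!batch.empty()) out.push(std::move(batch));
    out.close();
}

// Consumer ("main thread"): handles each batch as it arrives, then drops it.
std::size_t countWithWorker(const fs::path& root, std::size_t batchSize) {
    BatchQueue batches;
    std::thread worker(produce, root, batchSize, std::ref(batches));
    std::size_t total = 0;
    while (auto batch = batches.pop())
        total += batch->size();
    worker.join();
    return total;
}
```

Handing off batches rather than single paths keeps the number of cross-thread notifications small. In this sketch an exception thrown by the iterator inside the worker would terminate the program; real code would catch it in produce().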

ChrisW67
24th July 2014, 09:02
> If my directory contains more than 10 million subdirectories and each subdirectory contains 10 million files,
You are expecting to hold 10^14 QStrings in memory on a machine that likely has RAM measured in small multiples of 10^9 bytes. You need TARDIS RAM for that.
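A back-of-envelope version of this estimate, taking the 10 million × 10 million figures from the question and assuming (hypothetically) about 100 bytes per stored path once QString's UTF-16 data and per-string overhead are counted, on a machine with 8 GB of RAM:

```latex
\begin{aligned}
N &= 10^{7} \times 10^{7} = 10^{14}\ \text{paths} \\
M &\approx 10^{14} \times 100\ \text{B} = 10^{16}\ \text{B} \approx 10\ \text{PB} \\
\frac{M}{8 \times 10^{9}\ \text{B}} &\approx 1.25 \times 10^{6}
\end{aligned}
```

Under these assumptions the list would need roughly a million times more memory than the machine has.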

> how do I do this processing without creating overhead?
What you have asked for is all overhead. You don't try to store such a large structure in memory. Does such a list need to exist at all, given that the files on disk already represent it?

> Using the current function will prove costly, so I'm looking for a better approach to handle such scenarios.
Yes, indeed. Process each file name as it comes off the iterator, for example.
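Processing each entry as it comes off the iterator can be packaged as a callback-style API instead of a function that returns a list. This is a minimal sketch assuming C++17, with std::filesystem standing in for QDirIterator (in Qt the loop body would call next() and then fileInfo()); forEachEntry is a hypothetical name, and returning false from the callback stops the walk.

```cpp
#include <filesystem>
#include <functional>

namespace fs = std::filesystem;

// Hypothetical streaming API: walks the tree and hands each entry to a
// caller-supplied callback instead of collecting paths into a list, so
// memory use does not grow with the number of entries.
// Returning false from the callback stops the walk early.
void forEachEntry(const fs::path& root, bool recursive,
                  const std::function<bool(const fs::directory_entry&)>& visit)
{
    if (recursive) {
        for (const auto& entry : fs::recursive_directory_iterator(
                 root, fs::directory_options::skip_permission_denied))
            if (!visit(entry)) return;
    } else {
        for (const auto& entry : fs::directory_iterator(
                 root, fs::directory_options::skip_permission_denied))
            if (!visit(entry)) return;
    }
}
```

For an ls -lR-style listing, the callback could print `e.is_directory()` and `e.path()` (plus `e.file_size()` for regular files) and return true, so nothing accumulates in memory.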

anda_skoa
24th July 2014, 11:17
> I'm creating the dirListing(const QString & path, bool recursive) function in a FileUtil class that is built as a dynamic library; users will call the various APIs of this FileUtil class.
> Now the requirement is to write a function that does something similar to ls -lR. This function is expected to handle large data sets and produce output without much overhead or CPU usage.

That doesn't answer my question.
What is a use case for that function?
What is the calling code expected to do for example?

Cheers,

rawfool
25th July 2014, 12:56
Actually, I was told by my boss to create a function that gives a list of dirs & files recursively, so I'm not sure of the use case either.
Thank you.

anda_skoa
25th July 2014, 13:11
Ok.

Then the best option is to just create a QDirIterator.
Or, if the listing should first have all files and then recurse, an object with a similar interface (hasNext/next) that internally uses QDirIterator.
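A sketch of such an object, assuming C++17: it exposes the same hasNext()/next() shape as QDirIterator, but internally uses std::filesystem::directory_iterator as a stand-in for QDirIterator so the example compiles without Qt. Every entry of a directory is reported before any of its subdirectories is entered; LevelOrderIterator is a hypothetical name.

```cpp
#include <deque>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical class with a QDirIterator-like hasNext()/next() interface.
// It lists one directory completely, queuing its subdirectories, and only
// then opens the next queued directory (breadth-first order).
class LevelOrderIterator {
public:
    explicit LevelOrderIterator(const fs::path& root) {
        pending_.push_back(root);
        advance();
    }

    bool hasNext() const { return it_ != fs::directory_iterator(); }

    // Returns the next path; call only when hasNext() is true.
    std::string next() {
        const fs::directory_entry entry = *it_;
        // Queue real subdirectories for later; symlinks are not followed,
        // which avoids cycles.
        if (entry.is_directory() && !entry.is_symlink())
            pending_.push_back(entry.path());
        ++it_;
        advance();
        return entry.path().string();
    }

private:
    // Opens queued directories until one has entries or none are left.
    void advance() {
        while (it_ == fs::directory_iterator() && !pending_.empty()) {
            it_ = fs::directory_iterator(pending_.front());
            pending_.pop_front();
        }
    }

    std::deque<fs::path> pending_;  // directories not yet listed
    fs::directory_iterator it_;     // current directory (end when exhausted)
};
```

Memory use is bounded by the queue of directories not yet listed rather than by the total number of entries, and the caller drives the walk one next() at a time, exactly as with QDirIterator.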

Cheers,