Delete huge number of files in subdirs



madrich
19th August 2019, 07:00
Hi,

I'd like to efficiently remove a large number of files (10T-1M) spread over a subdirectory structure. I already have a recursive implementation using QFile and QDir. Unfortunately, it is not very efficient on Windows NTFS, and maybe not on other platforms either.

Of course I'd like to be as platform independent as possible.
Is there any way to speed this up?

Best regards,
Richard

Fareanor
19th August 2019, 09:20
The problem with the recursive approach is that the number of nested recursive calls is limited (depending on the platform, hardware, ...); beyond that limit you will run out of stack memory. I don't know how deep your structure is, but if you exceed about 100 nested calls then I think the recursive approach is not a good idea.
In this case, I would suggest looking at graph traversal algorithms such as Depth-first search (https://en.wikipedia.org/wiki/Depth-first_search) or Breadth-first search (https://en.wikipedia.org/wiki/Breadth-first_search), which can be implemented with an explicit stack or queue to explore your structure without recursion.

That being said, I don't know if you can speed it up since, as far as I know, you still have to iterate over each file you want to delete. With such a huge number of files, the overhead is probably dominated by the read/write access time of the hard disk.

But if there is a solution (without changing to another hard disk with better performance), I would be glad to hear it too :)

d_stranz
19th August 2019, 18:07
madrich wrote: "unfortunately, this is not very efficient on Windows NTFS"

Windoze does a lot of bookkeeping when files and directories are changed (moved, deleted, created, modified). It could be this overhead that is slow, not the actual file deletion. You might also want to make sure that Windoze isn't putting your files into the Recycle Bin. I'm not sure what actually happens when you delete a file from within Qt.

If your files are organized such that you can simply delete the top-level directory and everything below it, then you might try firing off an external process to do that (like "rmdir /s /q dir"). You could try the experiment - make a copy of your file tree, then delete one from within Windows Explorer, and the other using a command window and the rmdir command and time the different approaches.
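The timing experiment could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names (timeDeletionMs, deleteInProcess, deleteViaShell) are made up for the example, std::filesystem::remove_all stands in for the existing in-process Qt code, and std::system is used instead of a Qt process class to launch the external command. On non-Windows platforms the sketch falls back to "rm -rf".

```cpp
#include <chrono>
#include <cstdlib>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: runs `deleter` once and returns the elapsed
// wall-clock time in milliseconds.
template <typename Deleter>
double timeDeletionMs(Deleter&& deleter)
{
    const auto start = std::chrono::steady_clock::now();
    deleter();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Variant 1: delete the tree in-process through the standard library.
bool deleteInProcess(const fs::path& dir)
{
    std::error_code ec;
    fs::remove_all(dir, ec);
    return !ec;
}

// Variant 2: hand the whole tree to an external shell command.
// Assumption: the path contains no double quotes; no further escaping is done.
bool deleteViaShell(const fs::path& dir)
{
#ifdef _WIN32
    const std::string cmd = "rmdir /s /q \"" + dir.string() + "\"";
#else
    const std::string cmd = "rm -rf \"" + dir.string() + "\"";
#endif
    return std::system(cmd.c_str()) == 0;
}
```

To compare, make two identical copies of the tree and time one variant on each, for example timeDeletionMs([&] { deleteInProcess(copy1); }) against timeDeletionMs([&] { deleteViaShell(copy2); }). File system caching can skew a single run, so repeating the measurement a few times gives a more trustworthy picture.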

anda_skoa
24th August 2019, 09:30
This is always going to be a bit slow due to all the directory lookups.

A friend of mine ran into this with a large cache file directory structure on a web server, when even the "rm" tool would take minutes, sometimes even hours, to remove the tree.

In such a setup the most efficient option is not to delete at all, but to put the cache on a separate file system and reformat it when needed.

Cheers,
_