QDirIterator internals?



zaphod.b
4th December 2013, 14:48
Hi all,

I have a thin wrapper around QDirIterator. This wrapper should be "filterable" by an internal regular expression.

Out of the box, QDirIterator supports globbing but not regexes. Also, according to the docs,

"QDirIterator is uni-directional (i.e., you cannot iterate directories in reverse order) and does not allow random access"
so I cannot simply set a savepoint, iterate and skip, and finally revert to the savepoint.
Finally, QDirIterator provides neither a copy ctor nor a clone method.

Question:
Is a QDirIterator's order guaranteed? In other words: Is it safe to simulate a copy ctor by creating a new QDirIterator from the original one's path(), then advance it until its filePath() matches the original one's?

Or is there another solution to my problem? Thx for your consideration.


PS: I'm on Qt 5.1. Note that the documentation link above points to 4.8, though at first sight there are no differences.

anda_skoa
4th December 2013, 16:15
I haven't had a look at the code but my understanding is that it uses the platform's directory traversing API.

So creating a new QDirIterator will call into that API again. If anything has changed the new iterator might have a different iteration result sequence than the first one.

However I don't see why you would need to go back for your use case of filtering using regular expression.
Your iterator will call into the encapsulated iterator until your criterion is met and then return this data until it is moved forward, no?

Cheers,
_

zaphod.b
4th December 2013, 17:25
Hi and thx,


I haven't had a look at the code but my understanding is that it uses the platform's directory traversing API.
I had a brief look but did not dig into the code because of its complexity.


So creating a new QDirIterator will call into that API again. If anything has changed the new iterator might have a different iteration result sequence than the first one.
If your understanding is correct, then these changes would apply to the "original" iterator as well, wouldn't they? In other words: Unbuffered iterators are always invalidated if the underlying collection changes, aren't they?
Edit:
As long as the order of paths QDirIterator returns is invariant, proceeding to the same filePath() should "sync" the iterators. Of course I'd miss changes up to this path, but the same would be true for the original iterator. Or so I think.


However I don't see why you would need to go back for your use case of filtering using regular expression.
Your iterator will call into the encapsulated iterator until your criterion is met and then return this data until it is moved forward, no?
The encapsulated iterator doesn't support regexes. I'd have to do this myself and query hasNext() multiple times to skip unmatched iterator states. This would require calling next() in between, thus altering the iterator's state, which hasNext() shouldn't do.

Do I overlook anything?

anda_skoa
5th December 2013, 10:06
The encapsulated iterator doesn't support regexes. I'd have to do this myself and query hasNext() multiple times to skip unmatched iterator states. This would require calling next() in between, thus altering the iterator's state, which hasNext() shouldn't do.

Do I overlook anything?

Maybe I misunderstood what you are trying to do then. My interpretation was that you wanted to iterate over a directory and only get files which match a certain regular expression.
Basically iterating over a "view" on the directory based on the regex.

Cheers,
_

zaphod.b
5th December 2013, 12:56
Maybe I misunderstood what you are trying to do then. My interpretation was that you wanted to iterate over a directory and only get files which match a certain regular expression.
Basically iterating over a "view" on the directory based on the regex.
You've got me right. I still don't see how this is trivial however...

Could you please elaborate on how you would solve this task? In particular, I don't see how to implement hasNext() such that it skips items returned by QDirIterator::next() that are unmatched by the regex. Unless there's a way to restore QDirIterator's state somehow, that is...

Thx for taking the trouble!

anda_skoa
5th December 2013, 14:06
Basically you have the next match already retrieved from the QDirIterator at all times
Pseudo code:


class RegExIterator
{
public:
    RegExIterator(Dir, RegEx);

    bool hasNext();
    QFileInfo next();

private:
    RegEx regex;
    QDirIterator dirIt;
    QFileInfo nextMatch;

    void findNext();
};




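RegExIterator(...)
// store params in members
// init QDirIterator member
{
    findNext(); // look ahead once so hasNext() works before the first next()
}

bool hasNext()
{
    // comparing two empty QFileInfo objects is documented as undefined,
    // so test the stored path instead of nextMatch != QFileInfo()
    return !nextMatch.filePath().isEmpty();
}

QFileInfo next()
{
    QFileInfo result = nextMatch;

    findNext();

    return result;
}

void findNext()
{
    // iterate over dirIt until match found
    // store in nextMatch
}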


Cheers,
_

zaphod.b
5th December 2013, 16:09
Thx, but sorry, I don't think this solves the problem.


Basically you have the next match already retrieved from the QDirIterator at all times
Is that so? I think not.

Edit:
Ah, I think I can follow your train of thoughts at last!
Are you saying you are OK with hasNext() altering dirIt's state, with next() only returning this altered state rather than advancing dirIt itself?
This will break if next() is called without hasNext(). I would not want to do this.

Unfortunately your pseudo code omits the relevant part:




void findNext()
{
    // iterate over dirIt until match found
    // store in nextMatch
}


How exactly do you do this?
"Iterate over dirIt" generally implies calling QDirIterator::hasNext() and QDirIterator::next() (for the regex match) alternately, the latter changing the iterator's state.
"Until match found" generally implies running these cycles more than once, unless the very first item returned by QDirIterator matches.

Also in your pseudo code hasNext() would have to call findNext(), too.

Let me try to clarify.
The general pseudo code for the caller using your syntax is


RegExIterator rxIt(dir, rx);
while ( rxIt.hasNext() ) {
    QFileInfo fi = rxIt.next();
    // do something with fi
}

Now consider this directory structure


dir/
    a
    x
    aa
    y

and assume a regex of "a+". The expected matches are {a, aa}, while {x, y} shall be skipped by the iterator.
Further assume the iterator's position (= state) to be aa. Expected result of hasNext() is false, and next() shall return an invalid result.
The hasNext() algorithm steps would be (loop serialized for clarity)


call dirIt.hasNext() //true
call dirIt.next() //necessary to (1)match regex, (2)prepare for 2nd hasNext(). This alters dirIt's position to y!
call dirIt.hasNext() again //false
revert dirIt position to aa //hasNext() must not alter state!

Now, given QDirIterator's limitations I listed in this thread's first post, if I could perform the above algorithm on a dirIt copy, it would not change the original dirIt's state and I'd be safe.
In the above example, inside hasNext() I'd create another QDirIterator based on dir/, advance it to aa, then perform the algorithm on the "copy". Obviously this would only work if dirIt and copyDirIt return the items in the same order.

anda_skoa
6th December 2013, 10:03
Are you saying you are OK with hasNext() altering dirIt's state, with next() only returning this altered state rather than advancing dirIt itself?
This will break if next() is called without hasNext(). I would not want to do this.


No. Only next() alters the state, hasNext() does not. It only returns the value of a comparison, it does not call findNext().



Unfortunately your pseudo code omits the relevant part:

How exactly do you do this?
"Iterate over dirIt" generally implies calling QDirIterator::hasNext() and QDirIterator::next() (for the regex match) alternately, the latter changing the iterator's state.
"Until match found" generally implies running these cycles more than once, unless the very first item returned by QDirIterator matches.

Exactly.



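void findNext()
{
    while ( dirIt.hasNext() ) {
        dirIt.next(); // advances dirIt; the return value is the full file path
        // assuming regex is a QRegularExpression; match against the file name only
        if ( regex.match( dirIt.fileName() ).hasMatch() ) {
            nextMatch = dirIt.fileInfo();
            return;
        }
    }

    nextMatch = QFileInfo(); // no further match
}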

You could also just keep the string instead of the file info.
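
For readers without Qt at hand, the look-ahead idea can be tried in plain C++17. This is only a rough sketch under stated assumptions: `ListingIterator` is a hypothetical stand-in for QDirIterator that walks a fixed list of names, `RegexFilterIterator` and `collectMatches` are made-up names, `std::regex` replaces the Qt regex class, and an empty `std::optional` plays the role of the invalid QFileInfo. It keeps the string rather than a file info.

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for QDirIterator: forward-only, no copy, no rewind.
class ListingIterator {
public:
    explicit ListingIterator(std::vector<std::string> names)
        : names_(std::move(names)) {}
    ListingIterator(const ListingIterator &) = delete;
    ListingIterator &operator=(const ListingIterator &) = delete;

    bool hasNext() const { return pos_ < names_.size(); }
    std::string next() { return names_[pos_++]; }

private:
    std::vector<std::string> names_;
    std::size_t pos_ = 0;
};

// Wrapper that always holds the next match, fetched ahead of time.
class RegexFilterIterator {
public:
    RegexFilterIterator(std::vector<std::string> names, const std::string &pattern)
        : source_(std::move(names)), regex_(pattern) {
        findNext();  // look ahead once so hasNext() works before any next()
    }

    // Pure query: never touches source_.
    bool hasNext() const { return nextMatch_.has_value(); }

    // Returns the stored match (empty string if exhausted), then looks ahead again.
    std::string next() {
        std::string result = nextMatch_.value_or(std::string());
        findNext();
        return result;
    }

private:
    void findNext() {
        while (source_.hasNext()) {
            std::string name = source_.next();
            if (std::regex_match(name, regex_)) {  // the whole name must match
                nextMatch_ = std::move(name);
                return;
            }
        }
        nextMatch_.reset();  // nothing left
    }

    ListingIterator source_;
    std::regex regex_;
    std::optional<std::string> nextMatch_;
};

// Helper for trying it out: drain the filtered iterator into a vector.
std::vector<std::string> collectMatches(std::vector<std::string> names,
                                        const std::string &pattern) {
    RegexFilterIterator it(std::move(names), pattern);
    std::vector<std::string> out;
    while (it.hasNext()) out.push_back(it.next());
    return out;
}
```

Calling `collectMatches({"a", "x", "aa", "y"}, "a+")` yields the two names `a` and `aa`. The price of the look-ahead is that the wrapper is always one match ahead of what the caller has seen.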



Also in your pseudo code hasNext() would have to call findNext(), too.

No, only next() and the constructor do.



Let me try to clarify.
The general pseudo code for the caller using your syntax is


RegExIterator rxIt(dir, rx);
while ( rxIt.hasNext() ) {
    QFileInfo fi = rxIt.next();
    // do something with fi
}

Now consider this directory structure


dir/
    a
    x
    aa
    y

and assume a regex of "a+". The expected matches are {a, aa}, while {x, y} shall be skipped by the iterator.


The call sequence for that would be like this:


RegExIterator constructor
    QDirIterator constructor
    findNext
        dirIt.hasNext -> true
        dirIt.next -> "a"
        nextMatch -> "a"

RegExIterator.hasNext
    nextMatch valid -> true

RegExIterator.next
    result = "a"
    findNext
        dirIt.hasNext -> true
        dirIt.next -> "x" (no match, loop continues)
        dirIt.hasNext -> true
        dirIt.next -> "aa"
        nextMatch -> "aa"
    return result

RegExIterator.hasNext
    nextMatch valid -> true

RegExIterator.next
    result = "aa"
    findNext
        dirIt.hasNext -> true
        dirIt.next -> "y" (no match, loop continues)
        dirIt.hasNext -> false
        nextMatch -> reset/clear
    return result

RegExIterator.hasNext
    nextMatch invalid -> false
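
The call sequence above can be checked mechanically with a source that records every call made on it. The following is a minimal C++17 sketch, not Qt code: `LoggingSource` and `TracedFilter` are hypothetical names, the source walks a fixed list of names instead of a real directory, and `std::regex` stands in for the Qt regex class.

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical forward-only source that records every call made on it.
struct LoggingSource {
    std::vector<std::string> names;
    std::vector<std::string> log;
    std::size_t pos = 0;

    bool hasNext() {
        const bool more = pos < names.size();
        log.push_back(more ? "hasNext -> true" : "hasNext -> false");
        return more;
    }
    std::string next() {
        log.push_back("next -> " + names[pos]);
        return names[pos++];
    }
};

// Look-ahead wrapper: only the constructor and next() advance the source.
class TracedFilter {
public:
    TracedFilter(LoggingSource &source, const std::string &pattern)
        : source_(source), regex_(pattern) {
        findNext();
    }

    // Only inspects the stored match; adds nothing to the source's log.
    bool hasNext() const { return nextMatch_.has_value(); }

    std::string next() {
        std::string result = nextMatch_.value_or(std::string());
        findNext();
        return result;
    }

private:
    void findNext() {
        nextMatch_.reset();
        while (source_.hasNext()) {
            std::string name = source_.next();
            if (std::regex_match(name, regex_)) {
                nextMatch_ = std::move(name);
                return;
            }
        }
    }

    LoggingSource &source_;
    std::regex regex_;
    std::optional<std::string> nextMatch_;
};
```

With the names a, x, aa, y and the pattern "a+", the log holds two entries after construction, six after the first next(), and nine after the second. Calls to the wrapper's hasNext() in between add nothing to the log.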


Cheers,
_

zaphod.b
6th December 2013, 10:42
Oh my... This is trivial indeed. :o
Sometimes you just don't see the wood for the trees. I was afraid that was the case here, and it was...


only next() and the constructor do [ed.: look ahead].
I can't say for sure anymore, but looking ahead in the ctor before any hasNext() or next() call may have been my blind spot.

Thank you so much for bearing with me. :)

--
I'd like to change the thread title from "QDirIterator internals?" to "[SOLVED] QDirIterator & regex filter" but can't edit the first post anymore. Admin?