Filtering iterators

**high_flyer** · 31st August 2011, 16:49

you deduce your conclusion from *your* imaginary implementation, which is not *mine*.

True, I can only work with what I have, and that is the information I have (or rather don't)
So I just took what I know from experience to be the "usual" case.

This takes O(n). Which is pretty fast. Actually, I think, is the best result when it comes to random order.

Which was my point all along.

Since hasNext always advance the hidden _iter, the iterator does EXACTLY the same number of iteration over the sequence.

I'll go even one step toward you and say, probably, less, as it only searches the next item, and not all.
As I said in my previous post:

Using the filtering iterator approach, you will be looping through at least parts of your set for every iterator operation.

Which is the core of my argument.
If use your iterator in such a "sterile" example like you posted, the difference is only semantical, as you point out.
But I take it, that if you take the trouble to create an iterator, you will want to use it in real use cases.
This usually involves creating local iterators, and each call to next() on them is going to do a comparison, or many, depending on how far is the desired item from current position of the iterator.

But when you use a filtered list, you perform your filtering only once, and use it in all the places you need to instead of creating a new iterator (which is forced to filter again when you use it).

Of course, it's OBVIOUS that the simpler implementation above requires less steps, less calls, and thus is more efficient, but both solution has the same time complexity.

I know what you meat by that so I want go in length to split hairs.
But more function calls, do lower the efficiency as you stated, and in large sets, that can be a major performance hit.

Hope I was clearer this time.

I guess there might be some use cases where filtering iterators make sense, maybe on rather small sets, which are highly diverse (that is, with large amount of types), and in such cases, instead of filtering the set many times (per type) you can just use an filtering iterator.
But for large sets which are not very diverse, I think they are not practical, except for the ease of use (from a developer perspective)

1. the first is calculated once ONLY if you don't have to change the filtering function often (as I do)
2. it's easy to cache the second example using a QList when you access the data in the second while loop. But in my program, since data and filtering functions are mutable, cache is a minor advantage. Data is asked almost once when it's needed, and the next call is almost sure to access to new data, hence requiring a caching refresh.

I hope this answers your points.

Yes it does, and I agree, that in you case, probably, this will make sense.

**akiross** · 31st August 2011, 17:08

Originally Posted by high_flyer

If use your iterator in such a "sterile" example like you posted, the difference is only semantical, as you point out.
But I take it, that if you take the trouble to create an iterator, you will want to use it in real use cases.

You may not believe it, but this is exactly how I intend to use the iterators

That is my *real* use case.

Originally Posted by high_flyer

But more function calls, do lower the efficiency as you stated, and in large sets, that can be a major performance hit.

Very true, but this depends on implementation: raw pointers implementation for iterators may yield a very efficient result, and the iterator pattern (or paradigm or idiom or whatever we want to call it) lets you to separate concerns in the architecture, yet allowing a very fast implementation. As you could notice, my choice of using java-style iterator means that I'm not concerned about performances, as it's clearly stated in qt documentation "they are a bit slower".

Originally Posted by high_flyer

I guess there might be some use cases where filtering iterators make sense, maybe on rather small sets, which are highly diverse (that is, with large amount of types), and in such cases, instead of filtering the set many times (per type) you can just use an filtering iterator.
But for large sets which are not very diverse, I think they are not practical, except for the ease of use (from a developer perspective)

You're absolutely right, but this is not my case

Large collections may happen, but since filtering tests are not small and quick, but they involve dynamic features, RTTI, string comparisons, etc, function calling is a minor issue.

**wysota** · 31st August 2011, 17:44

Originally Posted by akiross

About speed: of course a cache could be made, but would be made at user's place, instead of the container providing the iterators.

If you ask me, I'd prefer to have the container cache things than forcing the user to do it.

@wysota: I think I'm missing something, because your container looks like what I'm willing to do, but I don't see how you're filtering the content once: you return the iterator and intermediate result is not saved anywhere (it seems, but I don't know how the DOM works in Qt). From your code, it seems that every time you call the begin_const(), an iterator is built from the first element, and the filtering is done every time. In fact, you're retrieving the next element using

Qt Code:

Switch view

m_e = m_e.nextSiblingElement(m_t);

m_e = m_e.nextSiblingElement(m_t);

To copy to clipboard, switch view to plain text mode

which is the actual filtering.

That's correct. My point was to have a container instead of an iterator. The container can do the filtering while it is created, that's my point. However in my exact code, no such thing is done, the filtering is done on the fly.

Caching of the result (to do the filtering once), would require saving the items you find in the iterator and use the saved copy until the original data is not changed.

No, you cache filtering results, not the data. Say you have 100 elements in a container. When you iterate it for the first time, you can fill a bitmap with zeroes and ones marking whether a particular item meets the conditions or not. Then when you iterate again or when you go back, you can use the bitmap to avoid checking the data again.

**akiross** · 31st August 2011, 22:22

Originally Posted by wysota

No, you cache filtering results, not the data. Say you have 100 elements in a container. When you iterate it for the first time, you can fill a bitmap with zeroes and ones marking whether a particular item meets the conditions or not. Then when you iterate again or when you go back, you can use the bitmap to avoid checking the data again.

Yes, but filtering may depend from the data and the data may change. What if you filter your XML tags using the value of an attribute (say "class=foo") and you change that value later on?

Anyway, you're right about filtering containers: I didn't think of that, but it's kinda obvious now that you noticed it. But I think that in case of "infinite iterators", for example when data is generated from external devices, network or pipelined algorithm, containers would be pointless. Not that's my case, just pointing it out

Anyway, I'm branched my source and I'm trying filtered containers now. I think it's more practical in some points of my code, because I'm working with non-linear set ordering and iterators aren't clean to use... So, knowing about copy-on-write on qt containers, it's worthy to try.

Thanks for all your advises, guys.
~Aki

**wysota** · 31st August 2011, 22:29

Originally Posted by akiross

Yes, but filtering may depend from the data and the data may change.

In that case your iterator is pretty much useless anyway.

What if you filter your XML tags using the value of an attribute (say "class=foo") and you change that value later on?

Then you are violating the basic rule of not cutting the branch you're currently sitting on. Regular iterators are not mutable by default. Special considerations need to be taken to make an iterator mutable.

Anyway, you're right about filtering containers: I didn't think of that, but it's kinda obvious now that you noticed it. But I think that in case of "infinite iterators", for example when data is generated from external devices, network or pipelined algorithm, containers would be pointless. Not that's my case, just pointing it out

Iterators would be as pointless. The point of iterator is that you can iterate over all elements. If the data is infinite, it is hard to iterate through it. Then you have a generator and not an interator.

**akiross** · 31st August 2011, 23:16

I'll be quick

1. Why?

2. I wasn't talking about mutable iterators. Probably there is a misunderstanding here, because caching works the way I say: when you cache something, you must ensure it's validity when you're using it later. If data has changed (and it may change also when you finished to iterate it), you must refresh. That's my point.

3. Generators generates. Iterators iterates. I think they are quite different things. You generate data with generators and process it; you CAN use iterators to process such data. If you want to call "iterators" the things that are associated with a finite container, fine, but you've to find another word for such things that work on infinite data sequences, because generator is wrong in this case. By the way, STL calls these things iterators, or stream iterators. For example, istream_iterator is not bounded by any number, and it's a perfectly legal iterator.

You may be interested in this: http://stackoverflow.com/questions/2...tor-bad-design

~Aki

**wysota** · 1st September 2011, 00:15

Originally Posted by akiross

I'll be quick

Likewise.

1. Why?

Deleting the pointed item invalidates the iterator. So does addition in certain cases.

2. I wasn't talking about mutable iterators. Probably there is a misunderstanding here, because caching works the way I say: when you cache something, you must ensure it's validity when you're using it later. If data has changed (and it may change also when you finished to iterate it), you must refresh. That's my point.

My point is that the iterator/container/whatever can only work on const objects. And you can't modify those.

but you've to find another word for such things that work on infinite data sequences

Yeah, I call them generators

If something generates something then it is a generator. If you want to iterate over a generator, that's fine but the iterator iterates, not generates, if you ask me. I know it's just a matter of taxonomy but still.

istream_iterator is not bounded by any number, and it's a perfectly legal iterator.

If it has end() defined, then it is finite. Otherwise I hate to be in a skin of a person trying to process every element of such collection.

**akiross** · 1st September 2011, 18:00

Originally Posted by wysota

Deleting the pointed item invalidates the iterator. So does addition in certain cases.

My point is that the iterator/container/whatever can only work on const objects. And you can't modify those.

Modification doesn't regard only creation and deletion. There is also modification which doesn't invalidate iterators. Anyway, *this was not the point* and you went on another topic. The point was: if you use iterators for filtering, and data is highly mutable (NOT while iterating), then caching is useless and iterators are Ok to use.

Originally Posted by wysota

Yeah, I call them generators

If something generates something then it is a generator. If you want to iterate over a generator, that's fine but the iterator iterates, not generates, if you ask me. I know it's just a matter of taxonomy but still.

Probably you didn't read well what I wrote OR I expressed badly. I said:

I think that in case of "infinite iterators", for example when data is generated from external devices, network or pipelined algorithm, containers would be pointless.

I didn't say that iterators were used to generate data (and I thought that was obvious). A stream is not an iterator, but is an infinite source of data. I was talking about using iterators OVER an infinite collection of data, and as you say: "if you want to iterate over a generator, that's fine". And that was the point of my statement in first place.

Originally Posted by wysota

If it has end() defined, then it is finite. Otherwise I hate to be in a skin of a person trying to process every element of such collection.

I don't agree. You're assuming that we are talking about STL-style iterators, which uses begin() and end(). Java-style iterators, for example, checks using "hasNext()" and not testing if they reached a defined end. That method could return true to give an infinite iterator.
In addition, I think that the semantic of the "end" method in STL iterators is NOT do define a length of a container, but to define a case in which has no longer sense to go on (which is very very different from saying that the iterator will find an end).
It's easy to build a circular list with >0 elements which will iterate forever, even if the end() is defined.
~Aki

**wysota** · 1st September 2011, 19:09

Originally Posted by akiross

Modification doesn't regard only creation and deletion.

No, but if there exist a situation when modifying a container is unsafe then modifying a container is in general unsafe.

The point was: if you use iterators for filtering, and data is highly mutable (NOT while iterating), then caching is useless and iterators are Ok to use.

I don't see your point. If you modify an element in a way that it changes the result of filtering then from the "filtering container's" point of view you "add" or "remove" an element from the container. I don't see how you intend to cope with that in a general case. At some point you will wake up in "nowhereland" and you won't even notice it (e.g. you modify the current element in such a way that changes the filtering result of the element hence your iterator should immediately become invalid, however you have no way of rechecking the filtering condition for the current element and also calling next() and prev() will not get you to the original element as during "prev()" the condition will be rechecked and the element filtered out). Of course in many cases it will work out just fine but I'm not really fond of software that "usually works".

**akiross** · 1st September 2011, 23:57

Originally Posted by wysota

I don't see your point. If you modify an element in a way that it changes the result of filtering then from the "filtering container's" point of view you "add" or "remove" an element from the container.

Sure. Didn't say no

The point was: if you have to change 90% of the container data every time you have to access it, then, probably, caching is not worth the effort. The point was: containers with caching are surely useful, but there exist at least one case when caching is not and iterators alone will do. Got it?

~Aki

**wysota** · 2nd September 2011, 00:08

If it doesn't cost you anything to cache then I don't see why not cache. Let's end this discussion, it probably doesn't lead to any useful conclusions

Thread: Filtering iterators

Thread Tools

Search Thread

Display

Hybrid View

Re: Filtering iterators

Re: Filtering iterators

Re: Filtering iterators

Re: Filtering iterators

Re: Filtering iterators

Re: Filtering iterators

Re: Filtering iterators

Re: Filtering iterators

Re: Filtering iterators

Re: Filtering iterators

Re: Filtering iterators

Similar Threads

Qt Creator How to watch STL containers/iterators during debugging?

Iterators v.s. index based for loops

Storing QList QLinkedList Iterators for automatic deletion

qDeleteAll() and linked list iterators

QLinkedList and iterators

Tags for this Thread

Bookmarks

Bookmarks

Posting Permissions