PDA

View Full Version : Item position in QMap?



lni
13th March 2009, 16:21
Hi,

I am currently using std::map to look up item position by key as (the key is either std::string or int, so it has operator==, operator>, and operator<):



std::map::const_iterator it = map.find( key );
if ( it != map.end() ) {
return std::distance( map.begin(), it );
}


I have 500,000 items in the map, and need to iterate through all 500,000 items and get each item's position. The loop takes 1 hours!!! So it is horribly wrong.

Can someone give suggestion to speed it up, preferably still using std::map because the codes were like that originally, but if QMap can give much better performace, then I would have a reason to change to QMap...

Thanks

gjshannon
13th March 2009, 17:43
My first guess is that you are iterating through most of the items.
Hence, an index is required in order to avoid iterating through most items before finding what you need.
I'm not real familiar with STL components that might do this. In Java there is TreeMap, which provides a red/black tree index to significantly reduce lookup times.
A natural order is required for each object before such a tree will work.
The help text for QMap indicates that it stores (key,value) pairs that provides a fast lookup based upon the key (i.e. via the find() iterator). In Qt this may be an option to consider in lieu of std::map. I assume that the key type must provide a natural ordering before the map lookup will work efficiently.

fullmetalcoder
14th March 2009, 10:56
Huh, I am not familiar with the internals of std::map but I suppose the results are similar to that of other map classes (such as QMap) i.e O(log(n)) lookup so something strikes me in your design : if I am not mistaken you are getting an iterator to an item using find() in a loop (which leads to O(n log(n)) asymptotic behavior

I am not sure what you are trying to do but I suppose actually iterating over would be much better, e.g (pseudo code):


int rank = 0;
iterator it = container.begin();
while ( it != container.end() )
{
doSomething(it.key(), it.value(), rank);
++it;
++rank;
}


By the way, are you sure the map is the only bottleneck in your code? I have used QMaps with more than 100k items as well and iterating over the contents (and modifying them) never took more than a couple hundred of ms so one hour of map iteration looks very unrealistic to me, even with a very unoptimal lookup strategy.

wysota
14th March 2009, 13:17
Is using QHash (or an std:: equivalent) an option? It would drastically reduce lookup times.

lni
14th March 2009, 21:38
I end up writing my own binary search function. My array is in increasing order, I just need the position of an item in the array....Not sure how to use QHash...

I was misled by QList::indexOf, which uses sequentially search and is killing me...



template <typename Type>
int indexOf( const QList<Type>& lists, const Type& value, int pos_from, int pos_to )
{
if ( pos_to < pos_from ) {
return -1; // not found
}

int mid = pos_from + ( ( pos_to - pos_from ) / 2 );
if ( lists[ mid ] > value ) {
return indexOf( lists, value, pos_from, mid - 1 );
} else if ( lists[ mid ] < value ) {
return indexOf( lists, value, mid+1, pos_to );
} else {
return mid; // found
}
}

wysota
14th March 2009, 22:48
I end up writing my own binary search function.
I'm sure you do realize Qt already has one, right?


My array is in increasing order, I just need the position of an item in the array....
So... why are you using a map?


Not sure how to use QHash...
It's like QMap but with logarithmic complexity instead of QMap's linear when dealing with lookups and inserts. It doesn't guarantee the key order though.




template <typename Type>
int indexOf( const QList<Type>& lists, const Type& value, int pos_from, int pos_to )
{
if ( pos_to < pos_from ) {
return -1; // not found
}

int mid = pos_from + ( ( pos_to - pos_from ) / 2 );
if ( lists[ mid ] > value ) {
return indexOf( lists, value, pos_from, mid - 1 );
} else if ( lists[ mid ] < value ) {
return indexOf( lists, value, mid+1, pos_to );
} else {
return mid; // found
}
}


Hooray! We're reinventing the wheel again! ;)

How about:
QList<Type>::const_iterator it = qBinaryFind(lists, value);?

fullmetalcoder
15th March 2009, 12:35
It's like QMap but with logarithmic complexity instead of QMap's linear when dealing with lookups and inserts. It doesn't guarantee the key order though.
Just for the sake of correctness : QHash offers (amortized) O(1) lookups whereas QMap offer O(log(n)) lookups. This comes from the fact that the data is sorted by key (ascending order) in a QMap and searching is done via binary search whereas data is unsorted in a QHash but associated to an internal key computed from the actual key using a hashing function. The "amortized" means that O(1) will only be obtained if the has function is O(1) itself and if the hashing function associate (as much as possible) different hash to different keys.

The docs (http://doc.trolltech.com/4.4/containers.html#algorithmic-complexity)explain that (though they of course do not cover the theory of algorithmic complexity...)

wysota
15th March 2009, 13:40
For sake of completeness here is something to read as well:
http://doc.trolltech.com/qq/qq19-containers.html

lni
15th March 2009, 17:02
QList<Type>::const_iterator it = qBinaryFind(lists, value);

Wow!!! There is a qBinaryFind in Qt already!!! Thanks for the hint!! Qt has grown so big, it is good and bad :)

lni
15th March 2009, 17:06
Wow!!! There is a qBinaryFind in Qt already!!! Thanks for the hint!! Qt has grown so big, it is good and bad :)

Wait a sec, I need the index of the item in the container, not the iterator...

QStringList list;
list << "one" << "two" << "three";

I need
int indexOf( list, "two" )

which return 1 in this example.

wysota
15th March 2009, 17:54
Wow!!! There is a qBinaryFind in Qt already!!! Thanks for the hint!! Qt has grown so big, it is good and bad :)
It's been there for ages.


Wait a sec, I need the index of the item in the container, not the iterator...


QList<...>::const_iterator it = qBinaryFind(container, ...);
int index = (it==container.end()) ? -1 : it-container.begin();

lni
15th March 2009, 19:11
QList<...>::const_iterator it = qBinaryFind(container, ...);
int index = (it==container.end()) ? -1 : it-container.begin();

That is exactly what I did using stl


std::map::const_iterator it = map.find( key );
if ( it != map.end() ) {
return std::distance( map.begin(), it );
}

return -1;



This code took 1 hour to run through a container that contains 500,000 items. Unless "it - container.begin()" is different from std::distance( ... ).

lni
15th March 2009, 19:31
This is weird, could it be stl bug in std::distance?

I compare stl and Qt container, when totalItems is more than a certain number, say 10000, the stl becomes extremely slow but QList is fine...



#include <QList>
#include <set>
#include <iostream>
#include <math.h>

static int findIndex( const std::set<int>& stlSet, int value )
{
std::set<int>::const_iterator it = stlSet.find( value );
if ( it != stlSet.end() ) {
return std::distance( stlSet.begin(), it );
}

return -1;
}

static void testStl( int totalItems )
{
std::set<int> stlSet;

for ( int idx = 0; idx < totalItems; idx++ ) {
stlSet.insert( idx );
}

for ( int idx = 0; idx < totalItems; idx++ ) {
//std::cout << findIndex( stlSet, idx ) << std::endl;
findIndex( stlSet, idx );
}

}

static int findIndex( const QList<int>& qlists, int value )
{
QList<int>::const_iterator it = qBinaryFind( qlists, value );
if ( it != qlists.end() ) {
return it - qlists.begin();
}

return -1;
}

static void testQList( int totalItems )
{
QList<int> qlists;

for ( int idx = 0; idx < totalItems; idx++ ) {
qlists << idx;
}

for ( int idx = 0; idx < totalItems; idx++ ) {
//std::cout << findIndex( qlists, idx ) << std::endl;
findIndex( qlists, idx );
}

}

int main( int argc, char** argv )
{
int totalItems = atoi( argv[1] );

std::cout << "Start testQList" << std::endl;
testQList( totalItems );
std::cout << "End testQList\n" << std::endl;

std::cout << "Start testStl" << std::endl;
testStl( totalItems );
std::cout << "End testStl" << std::endl;

}

wysota
15th March 2009, 20:54
Qt containers are in general much faster than their stl counterparts. std::map has linear lookup complexity as far as I remember so you need to do up to 500000 comparisons to find an item in it. It's simply not scalable. Use a hash array of some sort (like QHash) instead. Maybe you should tell us what you are holding in the map and why you need to find the index of some item there? Maybe we can suggest a better solution then...

lni
16th March 2009, 02:01
Qt containers are in general much faster than their stl counterparts. std::map has linear lookup complexity as far as I remember so you need to do up to 500000 comparisons to find an item in it. It's simply not scalable. Use a hash array of some sort (like QHash) instead. Maybe you should tell us what you are holding in the map and why you need to find the index of some item there? Maybe we can suggest a better solution then...

I have an array of pointers, each having an unique key (type of std::string and int). So I made a map to hold them (std::map<std::string, MyClass*>, and std::map<int, MyOtherClass*>).

I use std::map because it is sorted and is fast to look them up by the key. Now, I have a key and I want to find the location (or index) of the item within the container.

std::map<..>::const_iterator it = map.find( key ) is fast as expected, but I need to convert the iterator to its position inside the container, which I used std::distance( map.begin(), it ), and it is extremely slow (take hours to look up 500,000 items)...

wysota
16th March 2009, 11:11
I use std::map because it is sorted and is fast to look them up by the key.
Consider the fact that there are different sort orders than just alphabetical. There are faster solutions than this, QHash is one of them.


Now, I have a key and I want to find the location (or index) of the item within the container.
Ok, but why? :) What do you intend to do with it?


std::map<..>::const_iterator it = map.find( key ) is fast as expected, but I need to convert the iterator to its position inside the container, which I used std::distance( map.begin(), it ), and it is extremely slow (take hours to look up 500,000 items)...

I'm not sure what std::distance() does, but you should be able to substract iterators to receive an index. In some cases, like vectors this should be instantaneous. Maybe you should switch to a sorted vector? Use lower_bound (or qLowerBound) to insert items in the vector and abuse the fact that vectors stores data in adjacent memory cells - the index of the item is a difference of pointers representing both items divided by the size of the pointer - this is O(1) op.