PDA

View Full Version : Nested QHash in range for loop



Scope
2nd December 2015, 20:51
Hi,

It is possible to use this data structure:



struct Data{
QString scope;
QString name;
QVariant value;
};

QHash<QString, QHash<QString, Data>> data;


in range for loop like that:



for( const Data & d : data )
qDebug()<< d.scope << d.name << d.value;

Scope
3rd December 2015, 08:10
I know that I must implement begin() and end() inside class which contains these nested QHash.
But how connect all ranges from inner QHash?

yeye_olive
3rd December 2015, 10:25
The nested QHashes must be wrapped in a class for which you need to implement your own iterator and const_iterator. Such an iterator could consist of a triple ((const)iterator on the outer QHash, (const)iterator on the current inner QHash, (c)end() of the current inner QHash).

Do you really need to go through all this trouble? Consider theses options:

Replace your nested QHash with a flat QHash whose key type is a pair of QStrings.
Keep the nested QHash, and iterate over it with two nested for loops.

I recommend option 1. Option 2 (and your nested structure in general) only makes sense if you need to collectively manage all bindings sharing the same outer key.

Scope
3rd December 2015, 11:50
QPair can not be a key in my case.
Every Data structure is identified by scope ( first QString ) and name (second QString ) and now when i want set some Data i have function:



class Container
{
public:
void setData(const Data &data);

private:
QHash<QString, QHash<QString, Data>> d;

};


void Container::setData(const Data &data)
{
const QString scope = data.scope;
const QString name = data.name;

if( d.contains(scope) ) {
if( d.value(scope).contains(name) )
d[scope][name] = data;
else
d[scope].insert(name, data);
}
else {
d.insert(scope, QHash<QString, Data>{{name, data}});
}
}


and now I want allow to use this class in range for loop like that



Container container;
for(const Data &data : container )
qDebug()<< data.scope << data.name << data.value;


but if I understand you correctly it will be very difficult.

yeye_olive
3rd December 2015, 16:05
QPair can not be a key in my case.
Really? What about this:

class Container
{
public:
void setData(const Data &data);

private:
QHash<QPair<QString, QString>, Data> d;

};

void Container::setData(const Data &data)
{
d.insert(qMakePair(data.scope, data.name), data);
}

And here is how you iterate on the container:


class Container {
// ...
public:
using const_iterator = QHash<QPair<QString, QString>, Data>::const_iterator;
const_iterator cbegin() const {return d.cbegin();}
const_iterator cend() const {return d.cend();}
const_iterator begin() const {return d.begin();}
const_iterator end() const {return d.end();}
// I omit the non-const iterators on purpose, because they would be more complex than const_iterator, as we have an invariant to maintain on d
// ...
};

// ...
Container container;
// ...
for(const Data &data : static_cast<const Container &>(container))
qDebug()<< data.scope << data.name << data.value;


Disclaimer: I have not tried to compile these snippets.

Scope
3rd December 2015, 17:01
Really? What about this:

Yes, because in my case, scope( first string ) must be unique and name of data ( second string ) also must be unique in their scope. Then your example with QPair as key has horrible bad performance and complexity when we compare this to nested hash.

yeye_olive
3rd December 2015, 18:00
Yes, because in my case, scope( first string ) must be unique and name of data ( second string ) also must be unique in their scope.
Which is equivalent to saying that there must be only one Data for each unique (scope, name) combination. Which is why I suggested the pair as key in the first place.

Then your example with QPair as key has horrible bad performance and complexity when we compare this to nested hash.
How do you know? Have you run benchmarks? I would like to hear your argument, because I actually believe the opposite to be true in general. You do realize that a hash table has amortized constant average complexity, right? Besides, a flat QHash wastes less memory than nested ones.

If nesting was so much better, then we would never see things like QHash<QString, Foo>. Instead, people would use crazy structures like QHash<QString, QHash<QString, Foo> > in which you split the original string into two substrings to index into the two levels. But guess what, they don't.

Seriously though, there is nothing wrong with a hash table indexed by a pair of QStrings. It is about as efficient a structure as you can get with generic containers. As I already pointed out, nesting is useful if you have operations like slicing (e.g. quickly get all bindings for a given scope) or a very specific distribution of (scope, name) pairs. But all we know is that you are interested in iterating over the whole (flat) collection. Other people seem to agree with me:
http://stackoverflow.com/questions/22673370/advantages-and-disadvantages-of-using-a-concatenated-key-rather-than-nested-map
http://programmers.stackexchange.com/questions/250124/nested-maps-vs-combined-keys

But hey, if you want to waste your time with overly complex data structures with no gain in performance or memory usage (and most likely a loss), be my guest.

Scope
3rd December 2015, 20:24
How do you know? Have you run benchmarks? I would like to hear your argument

As you said when we want take all values from specific scope



// with QPair as key
QList<Data> Container::data(const QString &scope)
{
QList<Data> data;
for( const QPair<QString, QString> &key : d.keys() )
if( key.first == scope )
data.append( d.value(key));

return data;
}

// with nested QHash
QList<Data> Container::data(const QString &scope)
{
if( d2.contains(scope) )
return d2.value(scope).values();

return QList<Data>();
}


Maybe first version can be better using std::find_if or std::binary_search but second version is faster.
But maybe you have right that QPair as key is not worse as I said...

I am just trying find the best structure, for properties like scope, name and value.
Main goal is high performance for insert and lookup, and possibility use in range for loop.

yeye_olive
3rd December 2015, 22:30
OK, if you need such slicing, then the nested QHashes are the reasonable solution; in that case your setData method would be more efficiently implemented by:

void Container::setData(const Data &data) {
d[data.scope].insert(data.name, data);
}
You could implement a flat iterator by a triple (see my first post).

Scope
3rd December 2015, 23:17
Thank you very much,
I found in boost iterator_facade template and example



# include "node.hpp"
# include <boost/iterator/iterator_facade.hpp>

class node_iterator
: public boost::iterator_facade<
node_iterator
, node_base
, boost::forward_traversal_tag
>
{
public:
node_iterator()
: m_node(0) {}

explicit node_iterator(node_base* p)
: m_node(p) {}

private:
friend class boost::iterator_core_access;

void increment() { m_node = m_node->next(); }

bool equal(node_iterator const& other) const
{
return this->m_node == other.m_node;
}

node_base& dereference() const { return *m_node; }

node_base* m_node;
};


but i did not know how jump from end of one QHash to begin another?

yeye_olive
4th December 2015, 09:53
Here you go, a proof of concept in pseudocode of the flat const iterator. It only supports dereferencing and incrementation. It is in fact a quadruple (current and end iterator for outer and inner maps):

Data members:
// Iterator on the outer QHash
QHash<QString, QHash<QString, Data> >::const_iterator m_outer;
// End iterator on the outer QHash
QHash<QString, QHash<QString, Data> >::const_iterator m_outerEnd;
// Iterator on the inner QHash. Has an indeterminate value if m_outer == m_outerEnd; otherwise, points to the current binding in the current inner map
QHash<QString, Data>::const_iterator m_inner;
// End iterator on the inner QHash. Has an indeterminate value if m_outer == m_outerEnd; otherwise, is the end() iterator of the current inner map
QHash<QString, Data>::const_iterator m_innerEnd;

// Dereferences the iterator; assumes m_outer != m_outerEnd
Dereference:
Return *m_inner

// Tests for equality with another iterator on the same container
Equals(const const_iterator &other):
If m_outer == m_outerEnd
Return other.m_outer == other.m_outerEnd
Else
Return m_outer == other.m_outer && m_inner == other.m_inner

// A private operation that updates the state right after m_outer has received a new value
NewOuter:
While m_outer != m_outerEnd
m_inner = m_outer->begin()
m_innerEnd = m_outer->end()
If m_inner != m_innerEnd
Break
Increment m_outer

// Sets up this iterator at the beginning of the collection c
Begin(const QHash<QString, QHash<QString, Data> > &c):
m_outer = c.begin()
m_outerEnd = c.end()
Call NewOuter

// Sets up this iterator at the end of the collection c
End(const QHash<QString, QHash<QString, Data> > &c):
m_outer = m_outerEnd = c.end()

// Increments this iterator; assumes m_outer != m_outerEnd
Increment:
Increment m_inner
If m_inner == m_innerEnd
// Done iterating on the current inner map; move to the next outer binding
Increment m_outer
Call NewOuter

anda_skoa
4th December 2015, 15:46
And be aware that using a ranged for on a Qt container will require the Qt container to do a deep copy if it is used in more than one variable at the time of iteration.

Cheers,
_

Scope
4th December 2015, 18:25
Thanks yeye_olive
Can you look on my implementation? I tested this code and think that it works fine.



class Container
{
public:
Container();

void setData(const Data &data);


using value_type = Data;
using inner_range = QHash<QString, value_type>;
using outer_range = QHash<QString, inner_range>;

class iterator : std::iterator_traits<inner_range::iterator>
{
public:
using iterator_category = std::forward_iterator_tag;

iterator(outer_range::iterator const & outer_iterator,
outer_range::iterator const & outer_end)
: outer_iterator(outer_iterator), outer_end(outer_end)
{
update();
}

iterator & operator++()
{
++inner_iterator;
if(inner_iterator == inner_end)
{
++outer_iterator;
update();
}
return *this;
}

value_type operator*() const
{
return *inner_iterator;
}

bool operator==(iterator const & rhs) const
{
bool lhs_end = outer_iterator == outer_end;
bool rhs_end = rhs.outer_iterator == rhs.outer_end;
if(lhs_end && rhs_end)
return true;
if(lhs_end != rhs_end)
return false;
return outer_iterator == rhs.outer_iterator
&& inner_iterator == rhs.inner_iterator;
}

bool operator!=(iterator const &other) const
{
return !(*this == other);
}


private:
outer_range::iterator outer_iterator, outer_end;
inner_range::iterator inner_iterator, inner_end;

void update()
{
while(outer_iterator != outer_end)
{
inner_iterator = (*outer_iterator).begin();
inner_end = (*outer_iterator).end();
if(inner_iterator == inner_end)
++outer_iterator;
else
break;
}
}
};


Container::iterator begin();
Container::iterator end();

private:
QHash<QString, QHash<QString, Data>> d;
};


anda_skoa


And be aware that using a ranged for on a Qt container will require the Qt container to do a deep copy if it is used in more than one variable at the time of iteration.

What exactly you mean?
When I use this container like that:



for( const Data &x : d )
qDebug()<< x.scope << "::" << x.name << ", " << x.value;


then it should not create a copy of each elemenet.

And importan question, whether i should stay with QHash whether I can obtain better performance using QMap?

Especially when keys are type of QString which can have between one to over a dozen characters and when each Container should have average 1 <= x <= 30 properties as Data.

anda_skoa
5th December 2015, 11:28
What exactly you mean?
When I use this container like that:



for( const Data &x : d )
qDebug()<< x.scope << "::" << x.name << ", " << x.value;


then it should not create a copy of each elemenet.


Qt containers are implicitly shared, they use reference counting to avoid deep copying on assignment/copying.



QHash<QString, QString> h1;
h1.insert("a", "1");
h1.insert("b", "2");

QHash<QString, QString> h2 = h1; // h2 and h1 shared the same data. reference count is 2

for (const QString &value, h1) { // h1 detaches, creates a deep copy, h2 and h1 now point to different data, each having a ref count of 1
}

Detaching is triggered by calling non-const methods. begin()/end() are non-const unless the instance they are called on is const.

So make sure that either
- the container is not having a ref count > 1
- the container is constant
or use foreach()

Cheers,
_

Scope
5th December 2015, 19:13
Ok, I think that this should not be a problem - if Qt containers inside range for loop create a deep copy then my class can have a similar behavior.



And importan question, whether i should stay with QHash whether I can obtain better performance using QMap?

Especially when keys are type of QString which can have between one to over a dozen characters and when each Container should have average 1 <= x <= 30 properties as Data.


Thanks,

anda_skoa
6th December 2015, 08:50
Ok, I think that this should not be a problem - if Qt containers inside range for loop create a deep copy then my class can have a similar behavior.

Well, usually one doesn't use ranged-for with a Qt container unless it is ensured that there won't be the penalty of the deep copy.

Cheers,
_

Scope
6th December 2015, 15:16
I found on Qt blog that this problem can be solve using simple template



template<class T> const T &const_(const T &t) { return t; }
for(auto it : const_(vector)) { ... }


Thanks and regards

yeye_olive
7th December 2015, 11:17
Can you look on my implementation? I tested this code and think that it works fine.
Here are my comments on your implementation:

Your iterator advertizes itself a forward iterator, therefore it should provide default construction and post-incrementation.

Line 36: the return type should ve value_type &, not value_type (see more about this below).

Operator ==: your implementation works perfectly, but you could make it faster thanks to the precondition that *this and rhs are on the same container. For instance:

return outer_iterator == rhs.outer_iterator && (outer_iterator == outer_end || inner_iterator == rhs.inner_iterator);
About the wrong return type at line 36: you chose to implement an iterator (as opposed to a const_iterator), which is expected to let the user modify the elements of the container; that is why operator*() should return a non-const reference to the element. Naturally you should also add a comment warning the user against modifying Data::scope and Data::name through this iterator, as this would break the invariant of your Container. (std::map and std::unordered_map have exactly the same problem.)

You should also provide a const_iterator in addition to the iterator, especially if all the iterations you care about are read-only (and the fact that your current operator*() returns a copy of the element is a clear indication that it is the case). Not only will you enjoy the added safety from the type system, but you will also gain a performance advantage since the underlying QHash::const_iterators will not force the QHash to perform a deep copy in case of implicit sharing.


And importan question, whether i should stay with QHash whether I can obtain better performance using QMap?

Especially when keys are type of QString which can have between one to over a dozen characters and when each Container should have average 1 <= x <= 30 properties as Data.
You will most likely obtain better performance with a QHash than with a QMap. The main advantage of QMap over QHash is that the collection is sorted. Unless you care about that, you should stick with QHash.


I found on Qt blog that this problem can be solve using simple template



template<class T> const T &const_(const T &t) { return t; }
for(auto it : const_(vector)) { ... }

For that to work, you must provide a const_iterator and const begin() and end() methods. Also, your syntax is friendlier than the static_cast beast I wrote in a previous post, but requires the additional helper definition of const_; I still cannot understand why the standard library does not provide it.