PDA

View Full Version : QSortFilterProxy with large table



rknowles
31st July 2009, 19:40
I'm trying to filter a table with over 50000 rows using QSortFilterProxy. It works fine on much smaller tables and works well with some filter choices. The worst case is filtering on a boolean column with about 1/2 true, 1/2 false. This filtering sometimes takes over 8 minutes! Is QSortFilterProxy the wrong way to go with this size table? Any alternatives? One thing I notice is that it appears to filter the entire model before redrawing the view. Is there anything like a proxyView, that just filters enough to fill up the view and doesn't bother with the entire model?

Thanks,
Dick

franz
31st July 2009, 20:00
Isn't that in conflict with the whole idea of Model/View programming?

rknowles
31st July 2009, 21:28
Isn't that in conflict with the whole idea of Model/View programming?

Do you mean the "proxyView" idea? It might be, but what I'd like to happen with this big file is when the view needs 20 rows, it gets them from somewhere (the proxyModel?), and the proxy filters at that time and only until it gets 20 rows. I don't need the entire dataset filtered, only a small subset that is displayed. I just thought this was one approach to the slowdown I see.

numbat
1st August 2009, 12:17
I tested the following code. It takes about five seconds to sort the true/false column and ten seconds for the text column. My computer is definitely not a super computer so eight minutes is rediculous. Are you using a custom model? Have you used model test with it?


MainWindow::MainWindow(QWidget *parent)
: QMainWindow(parent)
{
QStandardItemModel * model = new QStandardItemModel(0, 2);
for (int i = 0; i < 100000; i++)
{
model->insertRow(model->rowCount());
model->setData(model->index(model->rowCount() - 1, 0), QString("Testing %1").arg(i), Qt::DisplayRole);
model->setData(model->index(model->rowCount() - 1, 1), i < 50000 ? QVariant(true) : QVariant(false), Qt::DisplayRole);
}

QSortFilterProxyModel * proxy = new QSortFilterProxyModel;
proxy->setSourceModel(model);
proxy->sort(1, Qt::AscendingOrder);

QTreeView * tv = new QTreeView;
tv->setSortingEnabled(true);
tv->setModel(proxy);

setCentralWidget(tv);
}

rknowles
3rd August 2009, 16:36
I tested the following code. It takes about five seconds to sort the true/false column and ten seconds for the text column. My computer is definitely not a super computer so eight minutes is rediculous. Are you using a custom model? Have you used model test with it?

Indeed, my test with your code sorts quickly- under 2 seconds for either column. But what I want to do is filter: display just the true, just the false, or all rows. I added my own proxy to your test with this method:

bool MyProxy::filterAcceptsRow( int source_row, const QModelIndex & source_parent ) const
{
static bool res;

switch (filterType)
{
case ALL_FILTER:
res = true;
break;

case TRUE_FILTER:
res = this->sourceModel()->data(this->sourceModel()->index(source_row,1,source_parent)).toBool() == true;
break;

case FALSE_FILTER:
res = this->sourceModel()->data(this->sourceModel()->index(source_row,1,source_parent)).toBool() == false;
break;

default:
res = true;
break;
}
return res;
}

with an enum switched by radio buttons to the three values. When I filter times are

ALL to (TRUE or FALSE) ~20 seconds
(TRUE to FALSE) or (FALSE to TRUE) ~2 seconds
(TRUE or FALSE) to ALL~5 seconds

However, if I rearrange the data from your setting of first half true, second half false:


model->setData(model->index(model->rowCount() - 1, 1), i < 50000? QVariant(true) : QVariant(false), Qt::DisplayRole);

to alternating true-false:


model->setData(model->index(model->rowCount() - 1, 1), i % 2 == 0 ? QVariant(true) : QVariant(false), Qt::DisplayRole);

the times now become

ALL to (TRUE or FALSE) ~11 minutes
(TRUE to FALSE) or (FALSE to TRUE) ~7 seconds
(TRUE or FALSE) to ALL~13 minutes

It seems the algorithm is sensitive to the data order, and my actual data is much closer to the alternating case than to the half and half case.

Thanks,
Dick

rknowles
3rd August 2009, 16:52
Goofed on my previous posting. The first list of results:



ALL to (TRUE or FALSE) ~20 seconds
(TRUE to FALSE) or (FALSE to TRUE) ~2 seconds
(TRUE or FALSE) to ALL~5 seconds


should be

ALL to (TRUE or FALSE) ~2 minutes
(TRUE to FALSE) or (FALSE to TRUE) ~7 seconds
(TRUE or FALSE) to ALL~2 minutes

numbat
3rd August 2009, 19:50
Sorry, I mistakenly read your original post as sorting for some reason. Anyway, with the below code, it takes about one second on my machine. Could you post a minimal compilable example that takes several minutes?


class MyProxy : public QSortFilterProxyModel
{
bool filterAcceptsRow( int source_row, const QModelIndex & source_parent ) const;
};

static int state;
bool MyProxy::filterAcceptsRow( int source_row, const QModelIndex & source_parent ) const
{
bool res;
switch(state)
{
case 0:
res = true;
break;

case 1:

res = this->sourceModel()->data(this->sourceModel()->index(source_row,1,source_parent)).toBool() == true;
break;


case 2:
res = this->sourceModel()->data(this->sourceModel()->index(source_row,1,source_parent)).toBool() == false;
break;
}

return res;
}


MyProxy * proxy;
void MainWindow::onclick()
{
state++;
if (state > 2) state = 0;

proxy->invalidate();
}

MainWindow::MainWindow(QWidget *parent)
: QMainWindow(parent)
{
QStandardItemModel * model = new QStandardItemModel(0, 2);
for (int i = 0; i < 100000; i++)
{
model->insertRow(model->rowCount());
model->setData(model->index(model->rowCount() - 1, 0), QString("Testing %1").arg(i), Qt::DisplayRole);
model->setData(model->index(model->rowCount() - 1, 1), i % 2 ? QVariant(true) : QVariant(false), Qt::DisplayRole);
}

proxy = new MyProxy;

proxy->setSourceModel(model);

QSplitter * splitter = new QSplitter;
QTreeView * tv = new QTreeView;
QPushButton * button = new QPushButton;
connect(button, SIGNAL(clicked()), this, SLOT(onclick()));
tv->setModel(proxy);
splitter->addWidget(tv);
splitter->addWidget(button);

setCentralWidget(splitter);
}

rknowles
3rd August 2009, 21:27
Numbat,

I used your recent filter code and it works nice and fast on my PC. The slow version, which I based on your earlier post, is attached. One difference I noted was that the earlier sort version had sort turned on and the later filter version did not. So I tried turning off sorting in the attached code. It seemed to help a bit but didn't speed it up like your newer version. So- what silly mistake have I introduced in the attached code?

Dick

rknowles
3rd August 2009, 21:49
I believe I just found the mistake. My slow code did this when one of the buttons was clicked:


...
dynamic_cast<MyProxy *>(ui->treeView->model())->filterChanged(type);
...

void MyProxy::filterChanged(FilterType newFilterType) {
filterType = newFilterType;
invalidateFilter();
}
I made a change to the code like what I saw in Numbat's fast example:


...
dynamic_cast<MyProxy *>(ui->treeView->model())->filterChanged(type);
...

void MyProxy::filterChanged(FilterType newFilterType) {
filterType = newFilterType;
// invalidateFilter();
invalidate();
}
Now it flies! Any idea what's different about these two calls? I used 'invalidateFilter' because the documentation says

This function should be called if you are implementing custom filtering (e.g. filterAcceptsRow (http://doc.trolltech.com/4.3/qsortfilterproxymodel.html#filterAcceptsRow)()), and your filter parameters have changed.That seemed to fit what I was doing.

(Off to test this in production code ...)

numbat
4th August 2009, 10:20
invalidateFilter is definitely the problem. After wading through the source code it appears that invalidate just saves persistent indexes (which saves the selection) and clears the model before reloading it. On the other hand, invalidateFilter tries to keep the current data and add and subtract data from it, as well as emitting rows-inserted signals, etc, in the correct order. The upside is that while invalidate is O(n), invalidateFilter appears to be at least O(n^2).

The documentation should at least state the difference between invalidate and invalidateFilter and Qt should investigate whether invalidateFilter is too slow.

Someone should probably file a bug report. At least you've documented the problem on the web.

rknowles
5th August 2009, 14:55
I submitted a bug report to Nokia.

garyob
23rd September 2009, 12:15
Is this problem now fixed?

I am having what I think is the same problem with Qt 4.5.2.

The problem is not with how the filter works but how Qt handles changes to the amount of items passing the filter.

If I have say 600 items which alternate between true and false (e.g. even rows true and odd rows false) and I have a QSortFilterProxyModel that will filter out all false values.

When the view is first populated, everything works fine and it is fast. However when I change all the values to true it takes a few minutes to complete.

After that if I change all the values to false it completes immediately and if I then set all the values to true, it still completes immediately.

The problem seems to be when the rows being added to the list of rows that pass the filter are disjoint.

I traced into the Qt code and found QSortFilterProxyModelPrivate::insert_source_items
which calculates a vector of intervals to be added to the list of rows that pass the filter.

Qt then iterates through the vector and emits beginInsertRows(), adds the rows in the interval to the list of rows that pass the filter, recreates the mappings for all of the indexes and emits endInsertRows().

If there is only one interval (i.e. all data passes/fails or all of the new data is consectutive) there is only one iteration and the beginInsertRows() and endInsertRows() are only emitted once. This takes very little time.

BUT if there are many intervals (i.e. all the data is disjoint with no two values being consectutive) there are x iterations (for x items being added), the mappings get recalculated and the signals are emitted for each iteration. This takes a LONG time (minutes) and nothing else happens while the loop ios happening. Then the emitted signals get processed and the attached views then get updated x times when once is actually necessary.

It seems that the Qt code cannot handle the addition of a disjoint set very efficiently.

rknowles
23rd September 2009, 13:52
Here's part of the reply when i sent the bug report:


On Thursday, 06. Aug 2009 13:18 Richard J Knowles wrote:
> I've attached a test. This uses "invalidateFilter" in "myproxy.cpp"
> and
> is deadly slow to filter. If you edit that call to "invalidate" and
> rebuild filtering is quite fast.
> ...
> Thanks for checking this out.

I just tried the exemple with both Qt 4.5 and Qt master (the development branch) and indeed, with Qt master the example starts in around 1s. But with Qt 4.5 it is so slow that I have to kill it.

As I said, I suspect it is this change: http://qt.gitorious.org/qt/qt/commit/f328a68

So I consider the bug fixed :-)

Thanks for your bug report
--
Olivier Goffart
Qt Software http://qt.nokia.com

I don't know how this fix is related to what's released. Maybe someone more knowledgable of qt.gitorious can tell us. I had found a workaround for my code (replacing "invalidateFilter" with "invalidate") and stopped worrying about when I'd see the fix propogated. I still don't know why I should choose one of those methods over the other.