
Convert QLinkedList to QList



Phlucious
4th June 2012, 22:59
I have a dataset with literally millions of entries (often 1 million to 1 billion, depending... it's LIDAR point data, if you care). Currently, my container is a QList of pointers because I need the best performance I can possibly achieve and I like being able to sort it conveniently using multiple parameters (using qStableSort), but insertions and deletions with array-based lists that large are incredibly slow.

Therefore, I am considering QLinkedList as an alternative for faster insertions/deletions, but the transition means redoing a significant amount of code since it's not indexed. I'm comfortable with using QMap and QHash, but I'm unfamiliar with QLinkedList and would like some feedback from you before I take the dive. This article (http://doc.qt.nokia.com/qq/qq19-containers.html) has been very helpful as I learn the nuances of Qt containers.

I do not need easy random access at this time, but I do need the ability to sort my data using multiple user-specified parameters, for example sorting my point data by X, Y, then Z, or by Day, Hour, then Second. As far as I can tell, the only way to accomplish this would be to generate sorted QLists (or QMaps?) on demand, but I cannot find a simpler way to generate a QList from a QLinkedList than manually iterating through the QLinkedList and appending to the QList one item at a time. Is there a more efficient way to accomplish this?
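Multi-key ordering with qStableSort comes down to chaining the comparisons in one strict weak ordering. A minimal sketch, with a hypothetical Point struct standing in for the real LIDAR record:

#include <QList>
#include <QtAlgorithms>

// Hypothetical stand-in for the real LIDAR point record.
struct Point {
    double x, y, z;
};

// Strict weak ordering: compare X first, then Y, then Z as tie-breakers.
static bool lessByXYZ(const Point *a, const Point *b)
{
    if (a->x != b->x) return a->x < b->x;
    if (a->y != b->y) return a->y < b->y;
    return a->z < b->z;
}

void sortPoints(QList<Point *> &points)
{
    qStableSort(points.begin(), points.end(), lessByXYZ);
}

Swapping in a different comparator function at runtime (a plain function pointer, or a functor holding the user's chosen sort keys) covers the "sort method can change during runtime" requirement.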

The sorting method can change during runtime. Thanks!
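As for the conversion itself, there is no shortcut: every node must be visited once, so a single linear pass is as good as it gets. Qt 4 also offers QList<T>::fromStdList(linked.toStdList()), but that copies twice; a direct std::copy makes one pass. A minimal sketch:

#include <QLinkedList>
#include <QList>
#include <algorithm>
#include <iterator>

template <typename T>
QList<T> toQList(const QLinkedList<T> &linked)
{
    QList<T> list;
    list.reserve(linked.size());    // one allocation instead of many
    std::copy(linked.begin(), linked.end(), std::back_inserter(list));
    return list;
}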

amleto
4th June 2012, 23:03
Are you sure this is worth it? Have you checked the difference in performance in release mode? You might be surprised. I will try to find a vid that shows why.

http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style 44:44
Also, you may like to investigate this: http://www.codeproject.com/Articles/340797/Number-crunching-Why-you-should-never-ever-EVER-us


IMO, you should not be using a linked list.

ChrisW67
5th June 2012, 00:12
You have a billion pointers to entries in a QList: that's 8GB just for the pointers (it would have to be a 64-bit environment, at 8 bytes per pointer) before you even have the data they actually point at. If you switch to a QLinkedList you'll instantly double or triple that: every node carries forward and backward pointers on top of the pointer to the data.

Do you really need to keep all this data in memory? Have you considered using a relational database with well-chosen indexes purely for its sorting and grouping ability? If you use SQLite you can even keep the database in memory (if you have enough) and still have the speed advantage.
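As a hedged sketch of that suggestion, using Qt's SQL module; the table and column names are invented for illustration:

#include <QSqlDatabase>
#include <QSqlQuery>

bool buildPointDb()
{
    QSqlDatabase db = QSqlDatabase::addDatabase("QSQLITE");
    db.setDatabaseName(":memory:");    // keep the whole database in RAM
    if (!db.open())
        return false;

    QSqlQuery q;
    q.exec("CREATE TABLE points (x REAL, y REAL, z REAL, gpstime REAL)");
    q.exec("CREATE INDEX idx_xyz ON points (x, y, z)");

    // Bulk inserts belong in one transaction; SQLite otherwise commits per row.
    db.transaction();
    QSqlQuery ins;
    ins.prepare("INSERT INTO points VALUES (?, ?, ?, ?)");
    // ... addBindValue() the four fields and exec() once per point ...
    db.commit();

    // Multi-key ordering then becomes a plain query.
    q.exec("SELECT x, y, z, gpstime FROM points ORDER BY x, y, z");
    return true;
}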

wysota
5th June 2012, 00:17
Just my five cents...

There are data structures that make the data representation much more compact than regular lists. Furthermore, some data structures allow you to dump rarely used data to storage and resurrect it back into memory when needed. Consider using tree-based structures: they can be much, much more efficient than lists, and their hierarchical nature will help limit memory usage.

Phlucious
5th June 2012, 00:37
Are you sure this is worth it? Have you checked the difference in performance in release mode? You might be surprised. I will try to find a vid that shows why.
http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style 44:44


Fascinating video! I'm going to have to watch the whole thing sometime soon. Thanks for that link. I wasn't convinced that a QLinkedList was a good idea (which is why I posted here before overhauling everything!), and upon viewing those links I'm led to believe that the cost of overhauling my code exceeds any benefit it might provide.


Do you really need to keep all this data in memory? Have you considered using a relational database with well-chosen indexes purely for its sorting and grouping ability? If you use SQLite you can even keep the database in memory (if you have enough) and still have the speed advantage.

Memory can be a concern, which is why I compiled a 64-bit version of Qt for use with MSVC2010. I'm interested in learning more about cached processing, but I'm not sure it works here; my application basically follows this workflow:
1. Open a LAS file.
2. Do some stuff with ALL of the points.
3. Save a new LAS file.
4. Repeat 1-3 through a directory.

I haven't considered SQL an option yet since I'm unfamiliar with it, but I'll look into it.


There are data structures that make the data representation much more compact than regular lists. Furthermore, some data structures allow you to dump rarely used data to storage and resurrect it back into memory when needed. Consider using tree-based structures: they can be much, much more efficient than lists, and their hierarchical nature will help limit memory usage.
Frankly, I don't even know where to begin when it comes to designing a tree-based structure for this dataset. I'm interested in learning more.



Based on everyone's posts, I've accepted that a QLinkedList probably isn't the solution I'm looking for. Since deletions and insertions occur within a single function, it made sense to me to instead regenerate the QList after marking points for deletion/insertion.
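A minimal sketch of that mark-and-regenerate idea, assuming a hypothetical markedForDeletion flag on the point record: one linear pass compacts the list, which is far cheaper than removing elements one at a time from an array-based list.

#include <QList>
#include <algorithm>

struct Point {
    bool markedForDeletion;    // hypothetical flag; the real record has more fields
};

static bool isMarked(const Point *p) { return p->markedForDeletion; }

void compactList(QList<Point *> &points)
{
    // std::remove_if moves the survivors to the front in one pass and
    // erase() trims the tail, versus removeAt() shifting every later
    // element on each call. If the list owns the points, delete the
    // marked objects before erasing the pointers.
    points.erase(std::remove_if(points.begin(), points.end(), isMarked),
                 points.end());
}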

ChrisW67
5th June 2012, 01:36
The "best" solution could very much depend on exactly what stuff you are doing to all the points in the LAS file. Your options might also be different if the LAS file has fixed record sizes. For example, there are methods for sorting huge files without loading the entire file into memory. See http://en.wikipedia.org/wiki/External_sorting
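A rough sketch of external merge sorting for fixed-size binary records, along the lines of the linked article: sort chunks that fit in RAM, spill each to a temporary run file, then merge. RECORD_SIZE, CHUNK_RECORDS, and keyOf() are hypothetical stand-ins for a real LAS point format.

#include <QFile>
#include <QTemporaryFile>
#include <QList>
#include <QVector>
#include <QByteArray>
#include <QtAlgorithms>

const int RECORD_SIZE = 28;          // bytes per point record (format-dependent)
const int CHUNK_RECORDS = 1000000;   // how many records fit comfortably in RAM

double keyOf(const QByteArray &rec); // hypothetical: decode the sort key

static bool lessByKey(const QByteArray &a, const QByteArray &b)
{
    return keyOf(a) < keyOf(b);
}

// Pass 1: sort RAM-sized chunks and spill each one to a temporary run file.
static QList<QTemporaryFile *> makeSortedRuns(QFile &input)
{
    QList<QTemporaryFile *> runs;
    while (!input.atEnd()) {
        QVector<QByteArray> chunk;
        for (int i = 0; i < CHUNK_RECORDS && !input.atEnd(); ++i)
            chunk.append(input.read(RECORD_SIZE));
        qStableSort(chunk.begin(), chunk.end(), lessByKey);

        QTemporaryFile *run = new QTemporaryFile;
        run->open();
        for (int i = 0; i < chunk.size(); ++i)
            run->write(chunk.at(i));
        run->seek(0);
        runs.append(run);
    }
    return runs;
}

// Pass 2: merge the runs; only one record per run is in memory at a time.
// (A heap would find the minimum faster; a linear scan keeps the sketch short.)
static void mergeRuns(QList<QTemporaryFile *> &runs, QFile &output)
{
    forever {
        int best = -1;
        QByteArray bestRec;
        for (int i = 0; i < runs.size(); ++i) {
            if (runs.at(i)->atEnd())
                continue;
            QByteArray rec = runs.at(i)->peek(RECORD_SIZE);
            if (best < 0 || keyOf(rec) < keyOf(bestRec)) {
                best = i;
                bestRec = rec;
            }
        }
        if (best < 0)
            break;                        // every run exhausted
        runs[best]->read(RECORD_SIZE);    // consume the winning record
        output.write(bestRec);
    }
}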

wysota
5th June 2012, 09:28
Memory can be a concern, which is why I compiled a 64-bit version of Qt for use with MSVC2010. I'm interested in learning more about cached processing, but I'm not sure it works here; my application basically follows this workflow:
1. Open a LAS file.
2. Do some stuff with ALL of the points.
3. Save a new LAS file.
4. Repeat 1-3 through a directory.
The question is whether processing the last point in the set requires you to know the first point in the set. If not, then you don't need to pre-read every point into memory; you can load just the ones you need and forget them before you load the next portion of points.
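A minimal sketch of that streaming approach, with a hypothetical processPoint() and an assumed fixed record size: memory use stays constant no matter how large the file is.

#include <QFile>
#include <QByteArray>
#include <QString>

void processPoint(const char *record);    // hypothetical per-point work

void processFileInChunks(const QString &path)
{
    const int RECORD_SIZE = 28;     // bytes per LAS point record (assumed)
    const int BATCH = 65536;        // points held in memory at once

    QFile in(path);
    if (!in.open(QIODevice::ReadOnly))
        return;
    // ... read/skip the LAS header here ...
    while (!in.atEnd()) {
        QByteArray batch = in.read(qint64(RECORD_SIZE) * BATCH);
        for (int off = 0; off + RECORD_SIZE <= batch.size(); off += RECORD_SIZE)
            processPoint(batch.constData() + off);
        // batch goes out of scope here, so memory use stays constant
    }
}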


Frankly, I don't even know where to begin when it comes to designing a tree-based structure for this dataset. I'm interested in learning more.

For example, since your data are points (x, y records), a simple optimisation would be to store the x value in a node and all y values sharing that x in children of that node. The more often x values repeat, the more benefit it brings you.
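As a tiny sketch of that idea, assuming integer-quantised coordinates:

#include <QMap>
#include <QVector>

typedef QMap<qint32, QVector<qint32> > XTree;    // x -> every y seen at that x

void insertPoint(XTree &tree, qint32 x, qint32 y)
{
    tree[x].append(y);    // x is stored once, however often it repeats
}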

Another approach would be to calculate groups of points that are close to each other, store the mean position in a node, and store deltas to each point in child nodes (since the deltas contain much smaller values, they can be stored using less memory). This also lets you dump all the child nodes to disk, keeping just their parent; when the parent is asked to give back its child nodes, you can load them back into memory, potentially swapping some other group of points out to storage.
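A hedged sketch of such a node; the field types and the 16-bit delta width are assumptions for illustration:

#include <QVector>
#include <QString>

struct PointGroup {
    double meanX, meanY, meanZ;    // group centroid, kept at full precision
    QVector<qint16> dx, dy, dz;    // per-point offsets from the centroid
    QString swapFilePath;          // where the deltas live when paged out
    bool childrenInMemory;

    // loadChildren()/unloadChildren() would stream dx/dy/dz to and from
    // swapFilePath with QDataStream; omitted to keep the sketch short.
};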

These are just ideas, the optimal approach depends on HOW you use your data.

Phlucious
13th June 2012, 19:30
The "best" solution could very much depend on exactly what stuff you are doing to all the points in the LAS file. Your options might also be different if the LAS file has fixed record sizes. For example, there are methods for sorting huge files without loading the entire file into memory. See http://en.wikipedia.org/wiki/External_sorting

Another approach would be to calculate groups of points that are close to each other, store the mean position in a node, and store deltas to each point in child nodes (since the deltas contain much smaller values, they can be stored using less memory). This also lets you dump all the child nodes to disk, keeping just their parent; when the parent is asked to give back its child nodes, you can load them back into memory, potentially swapping some other group of points out to storage.
Fascinating ideas! I'm definitely going to look into some sort of external cache system soon, so I love both ideas conceptually. Thanks for giving me a direction to go with this!