View Full Version : Fast serialization/deserialization



fullmetalcoder
2nd May 2007, 15:50
For some reason, my application must create a big hierarchical model on launch and maintain it until it finishes execution. The problem I'm facing is that this model is generated from a set of files which are parsed, resulting in a loading time of more than 10 seconds, which is not really acceptable... So I thought about serializing the data into a formatted plain-text file and reading it on subsequent launches instead of parsing the files again. This improves performance, but not enough IMO (gain : 2-3 seconds). My current approach is to load the full file into memory and parse it line by line (each line "generating" a tree node). I thought of two ways to improve performance but I don't know how to apply these concepts :

Change the loading/parsing approach (is sequential reading faster than reading the whole file at once?)
Allocate memory for all nodes up front to reduce the time wasted on allocations/reallocations inside the deserialization loop, the main problem being that these nodes are not all of the same type...

Any hints on how to do this? BTW any other idea that would lead to a significant performance improvement is welcome...

mr.costa
3rd May 2007, 14:38
Well, serialization and persistence are not trivial. Take a look at:

http://www.s11n.net/

wysota
3rd May 2007, 15:35
What about using QDataStream?

fullmetalcoder
3rd May 2007, 18:41
What about using QDataStream?
I once tried it and also examined its output (Qt Assistant db's if anyone cares...) and two things struck me :

The output seems to be twice as big (reading them in a plain text editor you can see a space between every character... maybe because strings are saved as Unicode instead of local 8 bit...)
Is it really faster than QTextStream ???

Besides, I don't ATM use QTextStream for deserialization but only for serialization (which is quite fast). As far as I understand, most of the CPU time is wasted on memory allocation for nodes and temporary parsing variables (mostly QByteArray and QList<QByteArray>). Thus my question is : how can I reserve an amount of memory to speed up loading? (the tree typically takes around 60Mb...)
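If the temporary QByteArray / QList<QByteArray> objects really are where the time goes, one option is to avoid creating them at all and describe each field as a pointer and a length into the buffer that is already in memory. The following is only a sketch in plain C++, not the poster's parser: the `Field` struct, the `split_fields` name and the separator handling are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// A field is a view into the line buffer; nothing is copied or allocated.
// (Hypothetical type, introduced for this sketch only.)
struct Field {
    const char* data;
    std::size_t size;
};

// Splits [line, line + len) on `sep` and writes at most `max_fields` views
// into `out`. Returns the number of fields written; any text after the last
// field that fits is ignored.
std::size_t split_fields(const char* line, std::size_t len, char sep,
                         Field* out, std::size_t max_fields)
{
    assert(line != 0 && (out != 0 || max_fields == 0));
    std::size_t count = 0;
    const char* cur = line;
    const char* end = line + len;
    while (count < max_fields) {
        const void* hit = std::memchr(cur, sep, static_cast<std::size_t>(end - cur));
        const char* stop = hit ? static_cast<const char*>(hit) : end;
        out[count].data = cur;
        out[count].size = static_cast<std::size_t>(stop - cur);
        ++count;
        if (stop == end)
            break;          // last field of the line
        cur = stop + 1;     // skip the separator
    }
    return count;
}
```

The caller keeps one small `Field` array on the stack and reuses it for every line, so the parsing loop itself performs no heap allocation; only the creation of the tree nodes does.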


Well, serialization and persistence are not trivial. Take a look at:
http://www.s11n.net/
Looks good but I don't feel like adding dependencies... Moreover my problem is not really in I/O but rather in memory management and the speed-ups around it... Ease of use is not my primary focus here. See, I'm ready to use a dirty hack if it does not break portability and brings loading time under 3-4 seconds...

wysota
3rd May 2007, 19:36
The output seems to be twice as big (reading them in a plain text editor you can see a space between every character... maybe because strings are saved as Unicode instead of local 8 bit...)
Then compress all strings and store them as byte arrays.


Is it really faster than QTextStream ???
No, but it should occupy less space and is platform independent.


Thus my question is : how can I reserve an amount of memory to speed up loading? (the tree typically takes around 60Mb...)
You can use something which is called "placement new" - first you reserve a pool in memory and then when you call new, the memory doesn't have to be allocated so the constructor is called immediately. And when you free (delete) an object, its memory goes back to the pool.

fullmetalcoder
4th May 2007, 17:38
You can use something which is called "placement new" - first you reserve a pool in memory and then when you call new, the memory doesn't have to be allocated so the constructor is called immediately. And when you free (delete) an object, its memory goes back to the pool.
Sounds great but how do I do that???:eek:
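For what it's worth, here is a minimal sketch of the idea in plain C++. Note that placement new by itself only constructs an object at an address you give it; reserving the block and keeping track of what is used is separate bookkeeping, and objects built this way are destroyed by calling their destructor explicitly rather than with delete. The `Arena`, `FolderNode`, `LeafNode`, `create` and `destroy` names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>      // placement new

// Two unrelated node types, to show that one block can hold both.
struct FolderNode { int childCount; explicit FolderNode(int n) : childCount(n) {} };
struct LeafNode   { double value;   explicit LeafNode(double v) : value(v) {} };

// A block reserved once up front; objects are constructed inside it.
class Arena {
public:
    Arena(void* buffer, std::size_t capacity)
        : base_(static_cast<char*>(buffer)), capacity_(capacity), used_(0) {}

    // Returns `size` bytes aligned to `align`, or a null pointer if full.
    void* reserve(std::size_t size, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(base_);
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        const std::uintptr_t aligned = (start + used_ + mask) & ~mask;
        const std::size_t offset = static_cast<std::size_t>(aligned - start);
        if (offset > capacity_ || size > capacity_ - offset)
            return nullptr;
        used_ = offset + size;
        return base_ + offset;
    }

private:
    char* base_;
    std::size_t capacity_;
    std::size_t used_;
};

// Constructs a T inside the arena: no heap allocation, only a constructor call.
template <typename T, typename Arg>
T* create(Arena& arena, const Arg& arg) {
    void* mem = arena.reserve(sizeof(T), alignof(T));
    return mem ? new (mem) T(arg) : nullptr;
}

// Objects built with placement new are destroyed explicitly, never with delete.
template <typename T>
void destroy(T* obj) {
    if (obj)
        obj->~T();
}
```

A loader would reserve the arena once (say, the ~60Mb mentioned above), call `create<FolderNode>(arena, ...)` or `create<LeafNode>(arena, ...)` for each parsed line, and release the whole block in one go when the model is no longer needed.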

wysota
4th May 2007, 18:30
Google is your friend...

http://www.parashift.com/c++-faq-lite/dtors.html#faq-11.10
http://www.parashift.com/c++-faq-lite/dtors.html#faq-11.14
http://www.google.com/search?q=placement+new

marcel
4th May 2007, 18:40
This is pretty clear and proved helpful to me (see the Placement Syntax section).
http://publib.boulder.ibm.com/infocenter/macxhelp/v6v81/index.jsp?topic=/com.ibm.vacpp6m.doc/language/ref/clrc05cplr199.htm

regards

fullmetalcoder
7th May 2007, 18:10
Thanks for your links, they helped me understand this topic a little better. Yet, my problem isn't solved...


I really can't afford placement new syntax
I want my pool to be used by the whole app, including possible plugins...

I have crafted a decent memory pool (maybe the Trolls could consider adding one to Qt to save us such efforts and potential leaks...) and did some tests using overloaded global new/delete operators. It seems to work quite well, but how can I make sure that they will be used by plugins? And is there a way to make them replace those used by external libraries such as Qt (I do quite a lot of string manipulation, which is far too allocation-heavy...)?

wysota
7th May 2007, 19:59
I don't think it would be very smart to do that (I mean to replace all new calls with placement new). Maybe you could explain what you're trying to achieve and we'll find a solution together?

fullmetalcoder
7th May 2007, 20:05
I don't think it would be very smart to do that (I mean to replace all new calls with placement new). Maybe you could explain what you're trying to achieve and we'll find a solution together?
I don't want to use placement new at all... What I want is to override new/delete calls at application level so that everything goes to a pre-allocated memory pool. If I manage to do this I'll have a significant speed up (pool-managed allocation is 5-10 times faster than "on the fly" allocation). My question was more about the possibility of effectively overriding global memory allocators and a few tests showed that I can achieve what I want (I'm running under Linux). Thus, the next question is on the portability of this method...

wysota
7th May 2007, 21:19
I don't want to use placement new at all... What I want is to override new/delete calls at application level so that everything goes to a pre-allocated memory pool.
Hmm... isn't it what placement new does? :)


If I manage to do this I'll have a significant speed up (pool-managed allocation is 5-10 times faster than "on the fly" allocation). My question was more about the possibility of effectively overriding global memory allocators and a few tests showed that I can achieve what I want (I'm running under Linux). Thus, the next question is on the portability of this method...

Yes, you can achieve that, but the question is - is it worth the effort? You could use the stack as the "allocator" instead of the heap as it is much faster (there is no real allocation, the stack is already allocated for the process). Also, trying to optimize the algorithm itself might prove simpler to achieve.

gfunk
8th May 2007, 00:28
Sounds like you just want to override global new?



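#include <cstddef>  // std::size_t
#include <new>      // std::bad_alloc

void* my_pool_alloc(std::size_t size);  // pool allocator, defined elsewhere

void* operator new(std::size_t size)
{
    void* p = my_pool_alloc(size);
    if (p == 0)                  // did the pool allocation succeed?
        throw std::bad_alloc();  // ANSI/ISO compliant behavior
    return p;
}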


I'm not sure plugins would go to this new though, I imagine operator overrides are resolved at compile time?
I definitely agree that thousands of small new calls are very bad - has to bug the OS every time for memory, that's awful/adds up.
But of course, you'll probably run into all sorts of trouble if you ever need to dynamically increase your pool size. I personally would prefer more restricted overrides of new (class-limited) rather than global. Then again, your current needs are probably different.
Contest deadline? :P

wysota
8th May 2007, 01:15
I have a question... What happens if you want to allocate your pool using "new"? :) What happens when you delete an object? I don't think overriding operator new is enough. You'd have to override delete as well and I don't think you can do that in a reliable way.

fullmetalcoder
8th May 2007, 10:04
I have a question... What happens if you want to allocate your pool using "new"? :) What happens when you delete an object? I don't think overriding operator new is enough. You'd have to override delete as well and I don't think you can do that in a reliable way.

As long as the global pool is not set, the global overridden new/delete operators call malloc/free
I've of course overridden the global delete operator as well and it does work

wysota
8th May 2007, 10:57
I've of course overridden the global delete operator as well and it does work

Are you sure this is safe? Did you override delete[] and new[] as well? There are many things that seem to work but fail under special conditions.

fullmetalcoder
8th May 2007, 11:14
Are you sure this is safe? Did you override delete[] and new[] as well? There are many things that seem to work but fail under special conditions.
I'm trying something and so far it seems quite safe. However I know there might be issues, especially with some platforms/compilers which would lead to different behaviour of the operators, but I think it is worth trying... We'll see. :)

And no, I did not override new[] and delete[] because :

I don't use them anywhere
the default implementation calls operator new(size_t), passing it n * sizeof(Type_X), so it does not really matter

Did I do something wrong here?

wysota
8th May 2007, 12:07
I don't use them anywhere
But you want to force alien code to use your operators as well, so it might be important to override them.


Did I do something wrong here?
I think you're fine, but it might be worth reimplementing [] operators as well. You never know what different compilers will do. For example allocating using new [] and deleting using delete (without []) crashes Windows applications but not Linux ones, so I suspect there might be some differences here.
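To make the discussion concrete, here is a rough sketch of all four global operators forwarding to a pool, with the malloc/free fallback described earlier in the thread for when the pool is not set. It is not the code from the poster's qpool archive: the `demo::Pool` bump pool, `g_pool`, `owns` and `pool_alloc` are invented for the example, and the sketch is deliberately not thread safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

namespace demo {

// Hypothetical bump pool: hands out slices of one big block and never
// reuses them. `begin` must be aligned for std::max_align_t.
struct Pool {
    char* begin;
    char* end;
    char* next;
};

Pool* g_pool = nullptr;  // null means "pool not set": use malloc/free

// True if `ptr` lies inside the pool's block.
bool owns(const Pool* pool, const void* ptr) {
    const std::uintptr_t p = reinterpret_cast<std::uintptr_t>(ptr);
    return p >= reinterpret_cast<std::uintptr_t>(pool->begin) &&
           p <  reinterpret_cast<std::uintptr_t>(pool->end);
}

// Returns an aligned slice of the pool, or null if it does not fit.
void* pool_alloc(Pool* pool, std::size_t size) {
    assert(pool->begin <= pool->next && pool->next <= pool->end);
    const std::size_t a = alignof(std::max_align_t);
    const std::size_t room = static_cast<std::size_t>(pool->end - pool->next);
    if (size > room)
        return nullptr;
    const std::size_t rounded = (size + a - 1) / a * a;  // keeps `next` aligned
    if (rounded > room)
        return nullptr;
    void* out = pool->next;
    pool->next += rounded;
    return out;
}

}  // namespace demo

void* operator new(std::size_t size) {
    if (size == 0)
        size = 1;  // malloc(0) is allowed to return null
    void* p = demo::g_pool ? demo::pool_alloc(demo::g_pool, size) : nullptr;
    if (!p)
        p = std::malloc(size);  // pool not set, or pool exhausted
    if (!p)
        throw std::bad_alloc();
    return p;
}

void operator delete(void* ptr) noexcept {
    if (!ptr)
        return;
    if (demo::g_pool && demo::owns(demo::g_pool, ptr))
        return;  // bump pool: memory is released all at once with the pool
    std::free(ptr);
}

// Array forms forward explicitly instead of relying on the defaults.
void* operator new[](std::size_t size) { return ::operator new(size); }
void operator delete[](void* ptr) noexcept { ::operator delete(ptr); }
```

The ownership test in `operator delete` is what makes the fallback safe: a block obtained from malloc before the pool was set is still handed back to free. Whether a plugin or a library such as Qt ends up calling these definitions depends on how symbols are resolved on each platform, which this sketch does not address.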

fullmetalcoder
8th May 2007, 16:51
But you want to force alien code to use your operators as well, so it might be important to override them.


I think you're fine, but it might be worth reimplementing [] operators as well. You never know what different compilers will do. For example allocating using new [] and deleting using delete (without []) crashes Windows applications but not Linux ones, so I suspect there might be some differences here.
Good point.

Everything looked fine until now but unfortunately I'm facing kinda big trouble : my pool is not thread safe and any time threading appears (even in QLibrary for example) I get a segfault in QMutex...

As the docs say, I've tried using a QMutex and a QMutexLocker in the alloc() and dealloc() functions but I end up with this :

QMutex::lock: Deadlock detected in thread -1208572208
and the app hangs forever (without consuming CPU however)...

Any hint? Or will I be forced to live with poor performance? :(
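One possible cause, offered as a guess and not checked against the linked code: if anything executed while the lock is held calls new again (creating the mutex itself, logging the allocation, building a string), the same thread re-enters alloc() and tries to take a non-recursive mutex it already owns. A sketch of a locked pool that avoids this by doing nothing but pointer arithmetic inside the critical section follows. It uses std::mutex instead of QMutex purely so the example is self-contained, and the `LockedPool` class is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Bump pool guarded by a mutex. `storage` must be aligned for
// std::max_align_t and must outlive the pool.
class LockedPool {
public:
    LockedPool(char* storage, std::size_t capacity)
        : next_(storage), end_(storage + capacity) {}

    // Returns an aligned block, or a null pointer when the pool is full.
    void* alloc(std::size_t size) {
        const std::size_t a = alignof(std::max_align_t);
        assert(size <= static_cast<std::size_t>(-1) - a);  // rounding must not overflow
        const std::size_t rounded = (size + a - 1) / a * a;

        std::lock_guard<std::mutex> guard(lock_);
        // Nothing in this critical section may allocate, log or build
        // strings: any of those could re-enter alloc() on the same thread
        // and block on the lock that is already held.
        if (rounded > static_cast<std::size_t>(end_ - next_))
            return nullptr;
        void* out = next_;
        next_ += rounded;
        return out;
    }

private:
    std::mutex lock_;
    char* next_;
    char* end_;
};
```

The same rule would apply with QMutex: the lock object has to exist before the overridden operators start using the pool, and nothing between lock and unlock may end up in new or delete.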

gfunk
8th May 2007, 17:47
Can you dump out which thread/function holds which mutex?
I would make sure that the alloc/dealloc functions work fine from multiple threads in simple test cases first, before tackling the entire application, in case there are problems with them. Multithreading problems are some of the hardest things to debug.

Then again, if internal Qt things like QLibrary are actually now using your memory allocator, I would not expect everything to work perfectly...

fullmetalcoder
8th May 2007, 18:08
Can you dump out which thread/function holds which mutex?
I don't quite get what you mean... :confused:


I would make sure that the alloc/dealloc functions work fine from multiple threads from simple test cases first, before tackling the entire application, in case there are problems with it.
That's actually what I'm trying to do... Threads, or more precisely mutexes (which I didn't expect to see BTW), crash it...


Multithreading problems are some of the hardest things to debug.
True enough but I didn't even suspect that using classes like QLibrary or QTextStream in combination with my custom new/delete would cause some QMutex to segfault...


Then again, if internal Qt things like QLibrary are actually now using your memory allocator, I would not expect everything to work perfectly...
Why so? If they use new/delete correctly they won't see any change and if they use malloc/free nothing will change...

Anyway the code I'm working with is available here : http://edyuk.tuxfamily.org/misc/qpool.tar.gz
As you may see there are some (commented) pieces of code dealing with mutex/locks. Unfortunately they all made the app hang...

gfunk
8th May 2007, 19:37
I don't quite get what you mean... :confused:

What's the call stack look like when it hangs? What is thread -1208572208 doing with the mutex?



Why so? If they use new/delete correctly they won't see any change and if they use malloc/free nothing will change...

I don't know if it is bad for sure, I just assumed that it would be risky.
Alternatively, maybe you can just move the QPOOL_INIT to after you use QLibrary, and code it so that QLibrary uses the regular new (when the pool is not initialized). I doubt the performance would be hindered much. I don't know about the other places (QTextStream, etc) though.

fullmetalcoder
9th May 2007, 12:58
What's the call stack look like when it hangs? What is thread -1208572208 doing with the mutex?
I'm not trying to play with threads but one is created by default... What happens is that some classes use QMutex to ensure thread-safety and this QMutex segfaults in QMutex::lock() or QMutex::unlock(). When I place a QMutex in the pool class to make it thread safe I end up with a deadlock (according to QMutex) and the app just hangs...


[fullmetalcoder@localhost qpool]$ gdb qpool
GNU gdb Red Hat Linux (6.3.0.0-1.122rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) run 5000
Starting program: /home/prog/edyuk/trunk/3rdparty/qpool/qpool 5000
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0xf55000
[Thread debugging using libthread_db enabled]
[New Thread -1208736048 (LWP 3236)]
QPool::QPool(5000) [0x84e9130]
Current thread : 0x84ed3c8 [-1208736048]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208736048 (LWP 3236)]
0x009a06fc in QMutex::unlock (this=0x84edd0c) at thread/qmutex.cpp:250
250 thread/qmutex.cpp: No such file or directory.
in thread/qmutex.cpp
(gdb) bt
#0 0x009a06fc in QMutex::unlock (this=0x84edd0c) at thread/qmutex.cpp:250
#1 0x00a02937 in qRegisterResourceData (version=1, tree=0x151b620 "", name=0x151bb20 "", data=0x151ca60 "")
at ../../include/QtCore/../../src/corelib/thread/qmutex.h:84
#2 0x014dbe4c in qInitResources_qstyle () at .rcc/release-shared/qrc_qstyle.cpp:15609
#3 0x014dbe8d in __static_initialization_and_destruction_0 (__initialize_p=Variable "__initialize_p" is not available.
) at .rcc/release-shared/qrc_qstyle.cpp:15612
#4 0x014dbef5 in __do_global_ctors_aux () from /usr/local/Trolltech/Qt-4.2.2/lib/libQtGui.so.4
#5 0x01045ad5 in _init () from /usr/local/Trolltech/Qt-4.2.2/lib/libQtGui.so.4
#6 0x00f63b78 in call_init () from /lib/ld-linux.so.2
#7 0x00f63c76 in _dl_init_internal () from /lib/ld-linux.so.2
#8 0x00f674c3 in dl_open_worker () from /lib/ld-linux.so.2
#9 0x00f637b9 in _dl_catch_error () from /lib/ld-linux.so.2
#10 0x00f66d0a in _dl_open () from /lib/ld-linux.so.2
#11 0x00ddae04 in dlopen_doit () from /lib/libdl.so.2
#12 0x00f637b9 in _dl_catch_error () from /lib/ld-linux.so.2
#13 0x00ddb400 in _dlerror_run () from /lib/libdl.so.2
#14 0x00ddad49 in dlopen@@GLIBC_2.1 () from /lib/libdl.so.2
#15 0x00a3edfb in QLibraryPrivate::load_sys (this=0x84ed5a4) at plugin/qlibrary_unix.cpp:177
#16 0x00a3a1ac in QLibraryPrivate::load (this=0x84ed5a4) at plugin/qlibrary.cpp:445
#17 0x00a3a1ee in QLibrary::load (this=0xbf843590) at plugin/qlibrary.cpp:668
#18 0x080491e9 in main (argc=Cannot access memory at address 0x17529
) at main.cpp:58
(gdb) print d
$1 = (QMutexPrivate *) 0x4edd1500
(gdb)
I don't know if it is bad for sure, I just assumed that it would be risky.
Alternatively, maybe you can just move the QPOOL_INIT to after you use QLibrary, and code it so that QLibrary uses the regular new (when the pool is not initialized). I doubt the performance would be hindered much. I don't know about the other places (QTextStream, etc) though.
My problem has nothing to do with the example... I've crafted a quick test case to check that custom allocation works with files that don't know a custom new has been defined (ok), shared libs (ok), and plugins (crash...).

Edit : New exciting development! QMutex going crazy.... (or is it the owner?)

allocating 152 bytes at 0x81c0cb1...
allocating 36 bytes at 0x81c0d49...
allocating 72 bytes at 0x81c0d6d...
allocating 8 bytes at 0x81c0db5...
allocating 92 bytes at 0x81c0dbd...
allocating 8 bytes at 0x81c0e19...
allocating 24 bytes at 0x81c0e21...

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1208551728 (LWP 4141)]
0x006346fc in QMutex::unlock (this=0x81c0db4) at thread/qmutex.cpp:250
250 thread/qmutex.cpp: No such file or directory.
in thread/qmutex.cpp
(gdb) bt
#0 0x006346fc in QMutex::unlock (this=0x81c0db4) at thread/qmutex.cpp:250
#1 0x00696937 in qRegisterResourceData (version=1, tree=0x1314620 "", name=0x1314b20 "", data=0x1315a60 "")
at ../../include/QtCore/../../src/corelib/thread/qmutex.h:84
#2 0x012d4e4c in qInitResources_qstyle () at .rcc/release-shared/qrc_qstyle.cpp:15609
#3 0x012d4e8d in __static_initialization_and_destruction_0 (__initialize_p=Variable "__initialize_p" is not available.
) at .rcc/release-shared/qrc_qstyle.cpp:15612
#4 0x012d4ef5 in __do_global_ctors_aux () from /usr/local/Trolltech/Qt-4.2.2/lib/libQtGui.so.4
#5 0x00e3ead5 in _init () from /usr/local/Trolltech/Qt-4.2.2/lib/libQtGui.so.4
#6 0x008f6b78 in call_init () from /lib/ld-linux.so.2
#7 0x008f6c76 in _dl_init_internal () from /lib/ld-linux.so.2
#8 0x008fa4c3 in dl_open_worker () from /lib/ld-linux.so.2
#9 0x008f67b9 in _dl_catch_error () from /lib/ld-linux.so.2
#10 0x008f9d0a in _dl_open () from /lib/ld-linux.so.2
#11 0x00aa6e04 in dlopen_doit () from /lib/libdl.so.2
#12 0x008f67b9 in _dl_catch_error () from /lib/ld-linux.so.2
#13 0x00aa7400 in _dlerror_run () from /lib/libdl.so.2
#14 0x00aa6d49 in dlopen@@GLIBC_2.1 () from /lib/libdl.so.2
#15 0x006d2dfb in QLibraryPrivate::load_sys (this=0x81c05a4) at plugin/qlibrary_unix.cpp:177
#16 0x006ce1ac in QLibraryPrivate::load (this=0x81c05a4) at plugin/qlibrary.cpp:445
#17 0x006ce1ee in QLibrary::load (this=0xbfb15060) at plugin/qlibrary.cpp:668
#18 0x08049139 in main (argc=Cannot access memory at address 0x174c1
) at main.cpp:58
As you can see in this log, a QMutex object receives an unlock call from some Qt function. So what's wrong? Well, the address of the mutex lies just between two blocks of memory allocated previously, which means that its internal variables (especially the private component) have not been initialized properly, causing a segfault... The next question is to figure out how the address used as the QMutex object was obtained. And why does it work perfectly when I reduce alloc() to malloc() and dealloc() to free()???

I'm really puzzled here...:crying:
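One thing that stands out in the log, offered as an observation rather than a confirmed diagnosis: every address the pool returns is odd (0x81c0cb1, 0x81c0d49, 0x81c0db5, ...), so the blocks are not aligned, whereas malloc() always returns suitably aligned memory. The faulting mutex address 0x81c0db4 is exactly the 8-byte block at 0x81c0db5 with its lowest bit cleared, which is what you would see if some code kept a flag in the low bit of the pointer on the assumption that real objects never live at odd addresses. Whether Qt does this here is an assumption. A sketch of rounding addresses up before handing them out, with hypothetical `align_up` and `is_aligned` helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rounds `addr` up to the next multiple of `alignment` (a power of two).
inline std::uintptr_t align_up(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    return (addr + mask) & ~mask;
}

// True if `addr` is a multiple of `alignment` (a power of two).
inline bool is_aligned(std::uintptr_t addr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (static_cast<std::uintptr_t>(alignment) - 1)) == 0;
}
```

With 8-byte alignment, the block logged at 0x81c0db5 would start at 0x81c0db8 instead. A general-purpose operator new has to return memory aligned for any object type, so the pool needs to round every returned address (or every block size) this way.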

Edit 2 : I'm about to go mad... When I recompiled my sample plugin WITHOUT linking it to Qt everything went fine. This is :

REALLY REALLY WEIRD
not a proper solution

but I'll keep it anyway...

So next problem is to disable Qt garbage collection or to manage to destroy the pool after it... Otherwise I get a segfault on exit because the memory owned by the pool is freed before the calls to destructors... Any idea?
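Assuming the "garbage collection" here means the destructors of static objects that run at exit, one common way around the ordering problem is to never destroy the pool at all: construct it on first use inside static storage and let the OS reclaim the memory when the process ends. A sketch, with a hypothetical `Pool` type standing in for the real one and 5000 used only because it is the value from the gdb session above:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Stand-in for the real pool class; only the lifetime matters here.
struct Pool {
    explicit Pool(std::size_t cap) : capacity(cap) {}
    ~Pool() { /* would hand the pool's memory back */ }
    std::size_t capacity;
};

// Raw, correctly aligned storage for exactly one Pool.
struct PoolStorage {
    alignas(Pool) unsigned char bytes[sizeof(Pool)];
};

// Constructed on first use and intentionally never destroyed, so the pool's
// memory stays valid for every destructor that runs during program exit.
Pool& global_pool() {
    static PoolStorage storage;
    static Pool* instance = new (storage.bytes) Pool(5000);
    return *instance;
}
```

Since `~Pool()` is never called, anything the pool would normally do in its destructor (statistics, leak reports) has to be triggered explicitly if it is wanted.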