Thread: Problem in converting Large QDomDocument to QByteArray

  1. #1
    Join Date
    Sep 2010
    Location
    Bangalore
    Posts
    169
    Thanks
    59
    Qt products
    Qt4 Qt/Embedded
    Platforms
    Unix/X11 Windows

    Hi everyone,


    I am using the QDomDocument class to build XML from a database with more than 5 lakh records. This is the code I am using:

    Qt Code:
        QDomDocument XMLdoc;
        // Put all the records inside XMLdoc
        QByteArray XmlByte = XMLdoc.toByteArray(); // This is where I am facing the problem

    If the database has fewer than 5 lakh records (say 3 lakh), XMLdoc.toByteArray() works fine.
    But with 5 lakh records or more, XMLdoc.toByteArray() hangs. I cannot figure out how to solve this. Is it a memory issue?

  2. #2
    Join Date
    Aug 2008
    Location
    Ukraine, Krivoy Rog
    Posts
    1,963
    Thanked 370 Times in 336 Posts
    Qt products
    Qt3 Qt4 Qt5
    Platforms
    MacOS X Unix/X11 Windows

    What does "lakh" mean?

  3. #3

    Quote Originally Posted by spirit View Post
    What does "lakh" mean?
    1 lakh = 100,000

  4. #4
    Join Date
    Oct 2009
    Posts
    483
    Thanked 97 Times in 94 Posts
    Qt products
    Qt4 Qt5
    Platforms
    Unix/X11 Windows

    If I am not mistaken, QByteArray always stores its contents in a single contiguous memory block, which probably leads to many reallocations as XMLdoc is serialized. Clearly this approach does not scale. What do you need the QByteArray for? If its only use is to be written to a file (or any QIODevice), I suggest an alternative: set up a QTextStream on top of that device and call QDomNode::save() for XMLdoc. That way the serialized representation will be progressively computed and written to the device and never be stored completely in memory.

    Note that your approach still stores the whole database in memory (QDomDocument), but at least QDomDocument does not use a single contiguous memory block. If this is not the primary representation of your data, it may even be better to use QXmlStreamWriter directly, without the need to build a QDomDocument.

  5. #5

    Quote Originally Posted by yeye_olive View Post
    If I am not mistaken, QByteArray always stores its contents in a single contiguous memory block, which probably leads to many reallocations as XMLdoc is serialized. Clearly this approach does not scale. What do you need the QByteArray for? If its only use is to be written to a file (or any QIODevice), I suggest an alternative: set up a QTextStream on top of that device and call QDomNode::save() for XMLdoc. That way the serialized representation will be progressively computed and written to the device and never be stored completely in memory.
    Well, you are correct about the memory point. But I still need the QByteArray, because I have to send the XML over a TCP socket.

    Quote Originally Posted by yeye_olive View Post
    Note that your approach still stores the whole database in memory (QDomDocument), but at least QDomDocument does not use a single contiguous memory block. If this is not the primary representation of your data, it may even be better to use QXmlStreamWriter directly, without the need to build a QDomDocument.
    Yes, but I found QDomDocument much easier and simpler to use than QXmlStreamWriter. But tell me one thing: my program is going to run on an embedded system. So, as you pointed out, from a memory point of view I should be using QXmlStreamWriter rather than QDomDocument, right? In embedded systems, memory and speed are critical factors.

  6. #6

    Quote Originally Posted by sattu View Post
    Well, you are correct regarding the memory point. But I still need QByteArray, because I need to send the XML over a TCP socket.
    This is debatable. The approach I outlined works for any QIODevice, and QTcpSocket derives from QIODevice (and even if for some reason you did not use QTcpSocket but another interface to TCP sockets, you could still write a QIODevice wrapper for it). But you have a point. Since QTcpSocket works asynchronously, data is only sent when control returns to the event loop. Calling QDomNode::save() would therefore serialize the whole QDomDocument into the socket's in-memory write buffer, which is essentially the same as your current approach.

    Therefore you should not serialize the whole document in one go, but progressively as needed. For example you could write some data and wait for the socket to emit its bytesWritten() signal before sending more. You may want to check that the socket's bytesToWrite() is below a certain threshold before deciding to send more data. I think that QXmlStreamWriter will be well adapted to this approach (read below).

    Quote Originally Posted by sattu View Post
    Yaa, but I found QDomDocument much easy and simple to use rather than QXmlStreamWriter. But tell me one thing, my program is going to run in an embedded system. So as you have pointed, from memory point of view I should be using QXmlStreamWriter rather than QDomDocument, right? Because in embedded systems, memory and speed are very critical factors.
    Right, QXmlStreamWriter has the advantage that you do not need to build the whole XML representation in memory. So, to expand on my comment above, here is what I suggest:
    - Define a class responsible for reading the database, serializing its XML representation and sending it over TCP.
    - You will be reading the database progressively. Add member fields allowing you to track what you have already read. For example, if your database is a container such as QVector, you could use a const_iterator referring to the next element to read.
    - Set up a QXmlStreamWriter on top of the QTcpSocket.
    - Write a method sendSomeData() that -- well -- sends some data. In the QVector example, the function could read an element, advance the iterator, and serialize that element by calling the appropriate methods of the QXmlStreamWriter. It may be useful to check the socket's bytesToWrite() to decide when you think you have written enough data.
    - Begin sending some data. For example you can call sendSomeData() once.
    - Use bytesWritten() and bytesToWrite() for the socket as explained above to decide when to call sendSomeData() again.
    - When sendSomeData() has nothing left to do, it can e.g. emit a signal sendingDone() to let the rest of the application know that everything has been sent.

    Using this approach you can send the whole data over TCP without ever allocating too much memory.
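
The steps above can be sketched roughly as follows. This is plain C++ so that it runs without Qt: MockSocket is a hypothetical stand-in for QTcpSocket that only counts queued bytes, RecordSender and the threshold parameter are illustrative names, and the string concatenation stands in for the QXmlStreamWriter calls. In a real Qt program, sendSomeData() would be called once to start and then from a slot connected to the socket's bytesWritten() signal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for QTcpSocket: it only keeps track of queued bytes.
struct MockSocket {
    std::string pending;  // bytes queued but not yet "sent"
    std::size_t bytesToWrite() const { return pending.size(); }
    void write(const std::string& data) { pending += data; }
    // Simulates the network draining up to n bytes (what would trigger bytesWritten()).
    void drain(std::size_t n) { pending.erase(0, n); }
};

class RecordSender {
public:
    // The records container must outlive the sender, which keeps iterators into it.
    RecordSender(const std::vector<std::string>& records, MockSocket& socket,
                 std::size_t threshold)
        : next_(records.begin()), end_(records.end()),
          socket_(socket), threshold_(threshold) {
        assert(threshold > 0);  // a zero threshold would never send anything
    }

    // Serializes records until the socket's queue reaches the threshold
    // or no records are left. Meant to be called again after each drain.
    void sendSomeData() {
        while (next_ != end_ && socket_.bytesToWrite() < threshold_) {
            socket_.write("<record>" + *next_ + "</record>");
            ++next_;  // remember which record comes next
        }
    }

    bool done() const { return next_ == end_; }

private:
    std::vector<std::string>::const_iterator next_;
    std::vector<std::string>::const_iterator end_;
    MockSocket& socket_;
    std::size_t threshold_;
};
```

With this shape, at most roughly one threshold's worth of serialized data (plus one record) sits in the queue at any time, whatever the size of the database.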

  7. #7

    Thanks a lot, yeye_olive. Your approach would certainly save a lot of memory. And yes, taking your streaming hint, I put a QTextStream on top of my QByteArray (just to test for now) and then called XMLdoc.save() on that QTextStream. Now it works perfectly for any length of data.

    Quote Originally Posted by yeye_olive View Post
    - You will be reading the database progressively. Add member fields allowing you to track what you have already read. For example, if your database is a container such as QVector, you could use a const_iterator referring to the next element to read.
    Well, I can't do this, because I need to put all the records inside a single parent node. My XML structure looks like this:

    Qt Code:
        <MyParent>
            <record>
                <UserID> 1 </UserID>
                <UserName> XYZ1 </UserName>
                <UserType> Admin </UserType>
            </record>
            <record>
                <UserID> 2 </UserID>
                <UserName> XYZ2 </UserName>
                <UserType> User </UserType>
            </record>
            <record>
                <UserID> 3 </UserID>
                <UserName> XYZ3 </UserName>
                <UserType> User </UserType>
            </record>
            <!-- ... and so on: each record element represents a single record in the table -->
        </MyParent>

    So if I keep reading the database progressively and sending over the socket, I think I would need many "MyParent" nodes instead of just one. I am not sure, though, because I have never used the QXmlStreamWriter class.

    What I was thinking was:-
    1) First to form the entire XML from all the records available.
    2) Then send packet by packet from the formed XML over the socket.
    What do you say?

  8. #8

    Quote Originally Posted by sattu View Post
    So, if i keep reading the database progressively and then send over the socket, then I think i would need to have many "MyParent" nodes instead of just one. I am not sure though because I have never used "QXmlStreamWriter" class.
    You can achieve the result you want with QXmlStreamWriter:
    1. Initialize the document with
    Qt Code:
        // QString::fromAscii() is Qt 4 API; with Qt 5 or later, use QString::fromLatin1() or QStringLiteral().
        streamWriter.writeStartDocument();
        streamWriter.writeStartElement(QString::fromAscii("MyParent"));
    2. Each time you want to add a record, do
    Qt Code:
        streamWriter.writeStartElement(QString::fromAscii("record"));
        streamWriter.writeTextElement(QString::fromAscii("UserID"), QString::number(theUserID));
        streamWriter.writeTextElement(QString::fromAscii("UserName"), theUserName);
        streamWriter.writeTextElement(QString::fromAscii("UserType"), theUserType);
        streamWriter.writeEndElement(); // closes <record>
    3. When you have written all the records, do
    Qt Code:
        streamWriter.writeEndElement(); // closes <MyParent>
        streamWriter.writeEndDocument();
    Have a look at QXmlStreamWriter's API to see how you can choose the codec and auto-formatting options (line breaks, indentation). What you need to do is monitor bytesWritten() and bytesToWrite() on the QTcpSocket to gradually process the records. Use an iterator (or the equivalent for the data structure you are using) to keep track of the next record to process.
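
To see why a single "MyParent" element is enough, here is a small sketch in plain C++ (no Qt needed to run it). TinyXmlWriter is a hypothetical, much-reduced stand-in for QXmlStreamWriter that writes to a std::ostream; escapeXml and writeRecord are illustrative helpers, and the XML declaration and formatting options are left out. The point is only that the root element is opened once, records are written one at a time, and the root is closed at the very end.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Escapes the characters that must not appear verbatim in XML text content.
std::string escapeXml(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Hypothetical, much-reduced stand-in for QXmlStreamWriter.
class TinyXmlWriter {
public:
    explicit TinyXmlWriter(std::ostream& out) : out_(out) {}

    void writeStartElement(const std::string& name) {
        out_ << '<' << name << '>';
        open_.push_back(name);
    }
    void writeTextElement(const std::string& name, const std::string& text) {
        out_ << '<' << name << '>' << escapeXml(text) << "</" << name << '>';
    }
    void writeEndElement() {
        assert(!open_.empty());  // must match an earlier writeStartElement()
        out_ << "</" << open_.back() << '>';
        open_.pop_back();
    }

private:
    std::ostream& out_;
    std::vector<std::string> open_;  // stack of currently open element names
};

// Writes one <record> element; meant to be called once per database row.
void writeRecord(TinyXmlWriter& writer, int userId,
                 const std::string& userName, const std::string& userType) {
    writer.writeStartElement("record");
    writer.writeTextElement("UserID", std::to_string(userId));
    writer.writeTextElement("UserName", userName);
    writer.writeTextElement("UserType", userType);
    writer.writeEndElement();
}
```

A caller would open "MyParent" with writeStartElement(), call writeRecord() whenever the socket is ready for more data, and call writeEndElement() once after the last record.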

    Quote Originally Posted by sattu View Post
    What do you say?
    I would not do that as it builds a huge data structure in memory when you can easily generate it on-the-fly with QXmlStreamWriter as explained above.

  9. The following user says thank you to yeye_olive for this useful post:

    sattu (22nd August 2012)

  10. #9

    OK, I got your idea. I tried using the QXmlStreamWriter class the way you described, and it is working great. But so far I haven't told you about the protocol that we follow in our socket programming.
    What we ultimately send through the socket is (HEADER + ACTUAL_DATA).
    1) HEADER contains the length of ACTUAL_DATA.
    2) ACTUAL_DATA is the XML data that we need to form and send.

    So at the receiver end:
    1) We first check the HEADER value to get the length that we actually need to receive.
    2) Then we keep looping for ACTUAL_DATA until we have received the total number of bytes specified in HEADER.
    So, if we use the QXmlStreamWriter class, how do we know in advance the length of the total XML data that will be formed?
    The ultimate requirement is that the receiver loops until it has received the entire XML data. Earlier we had thought of using an 'EOF' character to mark the end of the data. But we had problems with that, so we switched to adding a HEADER before ACTUAL_DATA so that we know in advance how much data we are going to receive.
    So, can you suggest any change to our protocol that lets us use the QXmlStreamWriter class and still tell the receiver when it has received the complete data?

  11. #10
    Join Date
    Oct 2009
    Posts
    483
    Thanked 97 Times in 94 Posts
    Qt products
    Qt4 Qt5
    Platforms
    Unix/X11 Windows

    Default Re: Problem in converting Large QDomDocument to QByteArray

    Well, the two usual ways to deal with the "message framing" problem of TCP are indeed using a delimiter or prefixing with the length of the message.

    With your current approach (prefixing with the length of the message), I am afraid that you need to serialize the whole data before sending any of it, because you need to determine and send the size first. You can still improve over your original "XMLdoc.toByteArray()" solution with any of these options:
    1. Use QXmlStreamWriter on top of a QByteArray (see the QXmlStreamWriter::QXmlStreamWriter(QByteArray *) constructor). This will allocate a huge QByteArray and serialize to it but at least you will not build a huge QDomDocument. There is still the problem of the periodic reallocations of the QByteArray as it grows. Once the QByteArray is ready, send its size through the QTcpSocket, then send its contents progressively using bytesWritten() and bytesToWrite() to avoid duplicating the whole QByteArray in the socket's internal buffer.
    2. Same as 1, but call QByteArray::reserve() with an overestimate of the final size to avoid reallocations. It may be feasible in your case, it all depends on whether the size of a serialized record and the number of records are predictable.
    3. (More sophisticated.) Make QXmlStreamWriter operate on a custom QIODevice that stores the data in a linked list of QByteArrays. When more storage space is needed, a new QByteArray is allocated (and QByteArray::reserve() is called on it with a suitable value, e.g. twice the size of the previous QByteArray) and simply appended to the linked list, which does not require any reallocation. Keep track of the total number of bytes written since the beginning and of the number of remaining bytes pre-reserved on the last QByteArray to know when to allocate a new one. When everything is serialized, send the total length, then the data progressively as usual. You will need a const iterator to know which QByteArray of the list is currently being sent, and an integer to know the position of the next byte to send in that QByteArray.
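
The storage part of option 3 might look something like the sketch below. It is plain C++ rather than a QIODevice subclass: ChunkedBuffer is a hypothetical name, std::string stands in for QByteArray and std::list for the linked list. The doubling policy follows the suggestion above; the QIODevice wrapper and the sending loop are not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical append-only buffer made of separately allocated chunks.
// Existing chunks are never reallocated or copied when the buffer grows.
class ChunkedBuffer {
public:
    explicit ChunkedBuffer(std::size_t firstChunkCapacity)
        : nextCapacity_(firstChunkCapacity) {
        assert(firstChunkCapacity > 0);
    }

    void append(const char* data, std::size_t size) {
        while (size > 0) {
            if (chunks_.empty() || chunks_.back().size() == lastCapacity_) {
                // The last chunk is full: start a new one, twice as large.
                chunks_.emplace_back();
                chunks_.back().reserve(nextCapacity_);
                lastCapacity_ = nextCapacity_;
                nextCapacity_ *= 2;
            }
            std::string& last = chunks_.back();
            // Copy only as much as still fits in the space reserved for this chunk.
            const std::size_t n = std::min(size, lastCapacity_ - last.size());
            last.append(data, n);
            data += n;
            size -= n;
            total_ += n;
        }
    }

    // Total number of bytes stored; this is the value to send in the header.
    std::size_t totalSize() const { return total_; }
    const std::list<std::string>& chunks() const { return chunks_; }

private:
    std::list<std::string> chunks_;
    std::size_t lastCapacity_ = 0;  // capacity reserved for the last chunk
    std::size_t nextCapacity_;      // capacity to reserve for the next chunk
    std::size_t total_ = 0;
};
```

Once serialization is finished, totalSize() gives the length for the header, and the chunks can be sent one after another.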

    The other approach (using a delimiter) is much easier here. If changing the protocol is still an option, I would suggest you do that. It is well-adapted to cases like this one in which you cannot compute the size of a message in advance, which is typical with text-based serializations like XML. Since XML does not use the NUL ('\0') character, why don't you use that as a delimiter? Then all you have to do on the writing end is use the QXmlStreamWriter as you currently do and, when everything is serialized, write an additional byte '\0' to the socket. On the receiving end, you can read the incoming data blocks and scan them to stop just before the '\0' byte. You can send all these blocks to a QXmlStreamReader (using QXmlStreamReader::addData()) and pull the XML tokens to gradually rebuild the structure.
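
A rough sketch of the receiving side of the delimiter approach, in plain C++ without Qt. DelimitedReceiver is a hypothetical name; where it appends to a string, a Qt program would instead pass the same bytes to QXmlStreamReader::addData(), so the whole message would not have to be kept in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical receiver-side helper for NUL-delimited messages.
class DelimitedReceiver {
public:
    // Feeds one block read from the socket. Returns the number of bytes
    // consumed; any bytes after that belong to the next message.
    std::size_t addData(const char* block, std::size_t size) {
        assert(!complete_);  // stop feeding once the message is complete
        for (std::size_t i = 0; i < size; ++i) {
            if (block[i] == '\0') {
                message_.append(block, i);  // everything before the delimiter
                complete_ = true;
                return i + 1;  // the delimiter is consumed but not stored
            }
        }
        message_.append(block, size);  // no delimiter yet: keep the whole block
        return size;
    }

    bool complete() const { return complete_; }
    const std::string& message() const { return message_; }

private:
    std::string message_;
    bool complete_ = false;
};
```

The return value matters when a block contains the end of one message and the start of the next: the caller keeps the unconsumed tail for the following message.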

  12. #11

    Actually, olive, using a delimiter isn't feasible because we often need to transfer binary data, and the delimiter could then be present in the ACTUAL_DATA.
    So, your first option looks the best, but then again it's not possible to serialize the whole data before sending because there are chances of memory segmentation happening when the number of records is huge. So we are planning to add the following modifications to our protocol:

    1) Reading from the database, forming the XML and sending over the socket will all happen frame by frame, as you described.
    2) Each frame will have the usual HEADER containing the length of the ACTUAL_DATA present in that particular frame.
    3) Modification: the first frame will additionally carry the total number of records that we will send gradually. At the receiver end we will keep looping until the number of parsed records equals the value present in the HEADER of the first frame.
    4) I am planning to skip the following step suggested by you. I hope this isn't an issue:
    Qt Code:
        streamWriter.writeStartDocument();
        streamWriter.writeEndDocument();
    Reason: I don't want the following header in my XML: <?xml version="1.0" encoding="UTF-8"?>

  13. #12

    Quote Originally Posted by sattu View Post
    Actually olive, using a delimiter isn't feasible as many times we need to transfer binary data. There is a possibility of the delimiter being present in the ACTUAL_DATA.
    You could use a hybrid protocol using length-prefixing for some messages and delimiters for others. Just prefix each message with a byte indicating which one of the two conventions is used for that message. This allows you to choose length-prefixing for binary data with predictable length, and delimiters for XML. This is not difficult to do, and may be the best approach in your situation.

    Quote Originally Posted by sattu View Post
    So, your first option looks the best, but then again it's not possible to serialize the whole data before sending because there are chances of memory segmentation happening when the number of records is huge. So we are planning to add the following modifications to our protocol:

    1) Reading from the database, forming the XML and sending over the socket will all happen frame by frame, as you described.
    2) Each frame will have the usual HEADER containing the length of the ACTUAL_DATA present in that particular frame.
    3) Modification: the first frame will additionally carry the total number of records that we will send gradually. At the receiver end we will keep looping until the number of parsed records equals the value present in the HEADER of the first frame.
    OK, that works too. Besides the receiving end can use the total number of records to optimize the allocation of the internal structures for storing the database.
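
For illustration, the framing described in the quote might be encoded as in the sketch below (plain C++, no Qt). The thread does not specify the HEADER layout, so everything about the wire format here is an assumption: a 4-byte big-endian payload length, with the first frame carrying an extra 4-byte record count right after it. The function names are hypothetical too.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends a 32-bit unsigned integer in big-endian byte order.
void appendUint32(std::string& out, std::uint32_t value) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((value >> shift) & 0xFF));
    }
}

// Reads a big-endian 32-bit unsigned integer starting at byte offset pos.
std::uint32_t readUint32(const std::string& in, std::size_t pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        value = (value << 8) | static_cast<unsigned char>(in[pos + i]);
    }
    return value;
}

// Assumed layout of the first frame: [payload length][total record count][payload]
std::string makeFirstFrame(std::uint32_t totalRecords, const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);  // length must fit in the 32-bit header
    std::string frame;
    appendUint32(frame, static_cast<std::uint32_t>(payload.size()));
    appendUint32(frame, totalRecords);
    frame += payload;
    return frame;
}

// Assumed layout of every later frame: [payload length][payload]
std::string makeFrame(const std::string& payload) {
    assert(payload.size() <= 0xFFFFFFFFu);  // length must fit in the 32-bit header
    std::string frame;
    appendUint32(frame, static_cast<std::uint32_t>(payload.size()));
    frame += payload;
    return frame;
}
```

The receiver would read the record count from the first frame, then keep reading frames and counting parsed records until the two numbers match.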

    Quote Originally Posted by sattu View Post
    4) I am planning to skip the following step suggested by you. I hope this isn't an issue:
    Qt Code:
        streamWriter.writeStartDocument();
        streamWriter.writeEndDocument();
    Reason: I don't want the following header in my XML: <?xml version="1.0" encoding="UTF-8"?>
    Is there a good reason for not including this small header? Frankly it does not weigh much compared to the huge database. The documentation for QXmlStreamWriter does not explicitly state that what you suggest is forbidden, but it does not state that it works as you expect either. I am more concerned about the receiving end. By removing this header (called the XML declaration), you remove the information about the text encoding. It seems that QXmlStreamReader relies on the XML declaration since it does not offer a way to set a codec manually. If I were you I would keep the XML declaration and save myself some trouble.

    Finally, if you can completely change the protocol, why do you use XML in the first place? You could encode the database to binary data. For example:
    Database = [number of records (integer)][record 1][record 2]...
    Record = [UserID (integer)][UserName (string)][UserType (integer)]
    integer = big-endian 32-bit unsigned integer
    string = UTF-8 encoded string followed by NUL byte

    This format would also be suited to progressive serialization and deserialization.
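
A minimal encoder for the format above, as a sketch in plain C++. The Record struct and the function names are illustrative, and the mapping from user types such as Admin and User to integers is an assumption, since the thread does not define one.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Record {
    std::uint32_t userId;
    std::string userName;    // assumed to be UTF-8 without embedded NUL bytes
    std::uint32_t userType;  // hypothetical mapping, e.g. 0 = Admin, 1 = User
};

// integer = big-endian 32-bit unsigned integer
void appendUint32(std::string& out, std::uint32_t value) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((value >> shift) & 0xFF));
    }
}

// Record = [UserID (integer)][UserName (string)][UserType (integer)]
void appendRecord(std::string& out, const Record& record) {
    // A NUL inside the name would be mistaken for the string terminator.
    assert(record.userName.find('\0') == std::string::npos);
    appendUint32(out, record.userId);
    out += record.userName;
    out.push_back('\0');  // string = UTF-8 bytes followed by a NUL byte
    appendUint32(out, record.userType);
}

// Database = [number of records (integer)][record 1][record 2]...
std::string encodeDatabase(const std::vector<Record>& records) {
    std::string out;
    appendUint32(out, static_cast<std::uint32_t>(records.size()));
    for (const Record& record : records) {
        appendRecord(out, record);
    }
    return out;
}
```

For progressive sending, the record count would be written first and appendRecord() called for a few records at a time, instead of encoding the whole database with encodeDatabase().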

  14. The following user says thank you to yeye_olive for this useful post:

    sattu (22nd August 2012)

  15. #13

    Quote Originally Posted by yeye_olive View Post
    It seems that QXmlStreamReader relies on the XML declaration since it does not offer a way to set a codec manually.
    Actually, at the receiver end we use VB.NET, and it works fine without the XML declaration.

    Quote Originally Posted by yeye_olive View Post
    Finally, if you can completely change the protocol, why do you use XML in the first place? You could encode the database to binary data.
    Yes, we have kept this as an option. Rather than fetching the records manually, putting the field names and values into a container and then forming the XML out of it, we could directly send the entire binary database file over the socket. At the receiver end we could save it to another database file and extract the field values there. This could save a lot of overhead for huge transfers.
