View Full Version : tricks when designing project file formats?



pir
19th July 2006, 10:03
Hi!

My program consists of a 3d scene with a lot of objects and settings. I want to write the contents of the whole scene into a project file, like in Maya or Photoshop. So I need to write down all information etc...

I suppose it can be done in a fairly straightforward way. But I wonder if someone knows of any good sites about this? Maybe there are some nice techniques that can make this process less painful...


Painful stuff I would like to solve:
If I create this project file and then decide to change the program by removing or adding information, the project file format changes as well. That would make project files in the old format unreadable... but there must be some nice workaround for this...



thanks for reading
pir

Cesar
19th July 2006, 11:16
You might consider using xml for project file. This way you will avoid problems with different format versions: older program versions won't recognise new tags/properties and will omit them. To reduce the size of your project file you might qCompress() it before writing to disc.

gfunk
19th July 2006, 23:18
Yep, get on board the XML train! Everyone's doing it. It's cool. :cool:
Alternatively, you could try a table/database (as with QtSql's SQLite support). But that could be overkill.

wysota
19th July 2006, 23:35
Most file formats consist of two parts.

The first part is the header, which identifies the file format -- it often contains some "magic number", version information, author data, etc., and it should have a constant length (or its length should be stored somewhere near the beginning of the file).
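A minimal sketch of such a fixed-length header in plain C++. The magic value "SCN1", the 8-byte layout and the names FileHeader, writeHeader and readHeader are all made up for illustration; nothing in this thread prescribes them.

```cpp
#include <cassert>   // assert(), handy when trying the functions out
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>   // std::stringstream, for in-memory experiments
#include <string>

// Hypothetical layout: 4-byte magic, 2-byte version, 2 reserved bytes.
const char kMagic[4] = {'S', 'C', 'N', '1'};  // made-up magic number
const std::size_t kHeaderSize = 8;            // constant header length

struct FileHeader {
    std::uint16_t version;
};

// Writes the header byte by byte (little-endian) so the layout does not
// depend on struct padding or on the byte order of the machine.
void writeHeader(std::ostream& out, const FileHeader& h) {
    const char bytes[kHeaderSize] = {
        kMagic[0], kMagic[1], kMagic[2], kMagic[3],
        static_cast<char>(h.version & 0xFF),
        static_cast<char>((h.version >> 8) & 0xFF),
        0, 0  // reserved for later use
    };
    out.write(bytes, kHeaderSize);
}

// Returns false if the stream is too short or the magic does not match.
bool readHeader(std::istream& in, FileHeader& h) {
    char bytes[kHeaderSize];
    if (!in.read(bytes, kHeaderSize)) return false;
    for (std::size_t i = 0; i < 4; ++i)
        if (bytes[i] != kMagic[i]) return false;
    h.version = static_cast<std::uint16_t>(
        static_cast<unsigned char>(bytes[4]) |
        (static_cast<unsigned char>(bytes[5]) << 8));
    return true;
}
```

A loader would call readHeader first and refuse the file (or pick a different reading routine) depending on the result and on the version it finds.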

The second part is responsible for keeping the data itself. The way you implement it depends only on you. This data storage part is often organised as a vector of chunks, where each chunk has its own internal structure (a chunk header and a body, which can differ depending on the chunk type). Each chunk holds some part of the data, and its meaning and behaviour depend on the type of data it represents.

Chunks are especially nice if you want your file format to be incremental -- then you can add a new chunk of data at the end of the file without the need to rewrite the already existing part. It is convenient to have a constant-sized header in such a situation (so that you can be sure you won't have to rewrite the whole file just to replace one byte in the header).
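A rough sketch of the chunk idea in plain C++, under assumed conventions: each chunk is a 4-character type tag, a 4-byte little-endian payload length and then the payload. The tags "MESH" and "XTRA" and the function names are hypothetical. For brevity the reader loads every payload and does not sanity-check the stored length.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>   // std::stringstream, for in-memory experiments
#include <string>
#include <vector>

// Hypothetical chunk: type tag plus raw payload bytes.
struct Chunk {
    std::string type;     // exactly 4 characters, e.g. "MESH"
    std::string payload;  // raw bytes; their meaning depends on the type
};

// Appends one chunk to the stream: tag, little-endian length, payload.
void writeChunk(std::ostream& out, const Chunk& c) {
    assert(c.type.size() == 4);
    assert(c.payload.size() <= 0xFFFFFFFFu);
    const std::uint32_t n = static_cast<std::uint32_t>(c.payload.size());
    const char len[4] = {
        static_cast<char>(n & 0xFF),
        static_cast<char>((n >> 8) & 0xFF),
        static_cast<char>((n >> 16) & 0xFF),
        static_cast<char>((n >> 24) & 0xFF)
    };
    out.write(c.type.data(), 4);
    out.write(len, 4);
    out.write(c.payload.data(), static_cast<std::streamsize>(n));
}

// Reads the next chunk; returns false at end of stream or on a truncated chunk.
bool readChunk(std::istream& in, Chunk& c) {
    char head[8];
    if (!in.read(head, 8)) return false;
    c.type.assign(head, 4);
    std::uint32_t n = 0;
    for (int i = 3; i >= 0; --i)
        n = (n << 8) | static_cast<unsigned char>(head[4 + i]);
    c.payload.resize(n);
    if (n == 0) return true;
    return static_cast<bool>(in.read(&c.payload[0], static_cast<std::streamsize>(n)));
}

// Collects the payloads of all chunks with the wanted type and ignores every
// other chunk, so chunk types added by a newer program version do no harm.
std::vector<std::string> payloadsOfType(std::istream& in, const std::string& wanted) {
    std::vector<std::string> found;
    Chunk c;
    while (readChunk(in, c))
        if (c.type == wanted) found.push_back(c.payload);
    return found;
}
```

Because every chunk carries its own length, a reader can step over chunk types it does not know, and a writer can append new chunks without touching the ones already in the file.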

Other types of file formats (like BMP for example) are a plain dump of held data -- in the case of BMP it is the header containing the picture size, then the colour lookup table for non-truecolour images and then a dump of pixel data (upside down and in BGR format, AFAIR).

An additional thing to remember is that files often contain data in some compressed format, and it is important to choose the compression algorithm according to the characteristics of the data (for example, the qCompress mentioned above can give much worse results than some other compression).

pir
20th July 2006, 11:13
I suppose I can wait a while with the compression issue?

I have looked around a bit now, and what you're saying makes sense. What about using serialization to create the project file? It seems like this technique uses all the tricks you've mentioned.

I've found a library called Boost, which seems to be quite common (I could install it from my Suse installation disc). But I don't know if it's platform independent... or what to call it (it should be able to compile on at least the most common Linux distros, Mac OS and Windows).

Have you used this library or something similar for serialization? Are there any drawbacks to using serialization? Is it, for example, possible to compress a serialized file?

thanks for reading and answering
pir

wysota
21st July 2006, 09:25
If you use Qt (and I think you do), you can do the serialisation through QDataStream without any additional dependencies, although Boost is ok. Serialisation is only part of the job that needs to be done, though. You'll have to provide some headers anyway, and it would be smart to implement the file format so that it allows some flexibility. Plain serialisation won't be enough for that -- you'll have to do a bit of coding yourself.

pir
21st July 2006, 09:44
I'm not so good at this file business, so I would be grateful if you could explain what you mean in a bit more detail... why do I need a more specialized format? I've looked at the Boost documentation and there is some kind of functionality for handling different versions of the saved and loaded classes. Is there a problem with using Boost serialization when you have large chunks of polygon data?

My plan so far is to just serialize the data and write it to a file... and maybe compress it in some way if that is a good idea... have I missed something?

I use Qt, but I'm a bit fanatical about not using Qt for operations that can be handled without it. The reason is that my program has evolved from a mess of C code using Motif into C++ using Qt. It was a real pain to find the real program in all the Motif code. I thought that if someone needs to replace Qt with something else in the future, it would be nice to be able to do this as cleanly as possible. But of course, there are limits to how much you can separate the code...



thanks
pir

wysota
21st July 2006, 11:06
> I'm not so good at this file business, so I would be grateful if you could explain what you mean in a bit more detail... why do I need a more specialized format?
Because serialisation will only allow you to read/write the object data, nothing more. If you just want to dump your data structures to disk, then go ahead, but please don't call it a file format :)


> I've looked at the Boost documentation and there is some kind of functionality for handling different versions of the saved and loaded classes. Is there a problem with using Boost serialization when you have large chunks of polygon data?
It's not a problem; as I said, Boost is fine. But serialisation of objects is only a part of the job which needs to be done (and at least to me a plain dump is a bit controversial; I doubt you really need all the data from the structures dumped into the file). The least you should do is to define a proper header -- the file type.



> My plan so far is to just serialize the data and write it to a file... and maybe compress it in some way if that is a good idea... have I missed something?
"It depends". Everything that works is ok, but many things could be done better, right?



> I use Qt, but I'm a bit fanatical about not using Qt for operations that can be handled without it. The reason is that my program has evolved from a mess of C code using Motif into C++ using Qt. It was a real pain to find the real program in all the Motif code. I thought that if someone needs to replace Qt with something else in the future, it would be nice to be able to do this as cleanly as possible. But of course, there are limits to how much you can separate the code...

QDataStream serialises objects in a well-documented way, and for your own objects it is you who writes the serialisation routines, so you can control the process. If you use Boost, the very same thing you wrote above applies, just substitute occurrences of "Qt" with "Boost" -- you still get a dependency. I don't know about you, but I tend to avoid dependencies which are only there because I need a single function call from a library...

pir
21st July 2006, 11:38
What you're saying about using Boost or Qt makes sense. It's also a good point about whether it is smart to use an additional library when only using one function from it. One thing is that I find Qt a bit hard to compile because of qmake and its project files... I don't want to dive into that more than I need to right now. Also, Boost seems to have a lot of other nice libraries, so I will maybe use more of it later... plus it seems like it's more available... I could simply install it from my installation disc. But, point taken. I actually thought of that myself.

Sorry about using the word "format". As I said before, I'm new to files. I've always felt a bit uncomfortable when working with files. What defines a format? Is the expression "file format" equivalent to "file type"?

You have to excuse my slow mind... I still don't get what you're after when talking about defining the structure, or what to call it, of the file instead of making a dump... Ok, by dumping I don't mean pushing absolutely everything into the saved file... and about the headers you're talking about... what would they contain? The version is included by Boost... have I forgotten something?

sorry about the large body of text, I didn't insert any quotes because I think it would make it worse...
thanks
pir

wysota
21st July 2006, 12:44
> Sorry about using the word "format". As I said before, I'm new to files. I've always felt a bit uncomfortable when working with files. What defines a format? Is the expression "file format" equivalent to "file type"?
It's not about the word. All I meant was that a "file format" has to be something more than a plain dump of data structures.



> I still don't get what you're after when talking about defining the structure, or what to call it, of the file instead of making a dump... Ok, by dumping I don't mean pushing absolutely everything into the saved file... and about the headers you're talking about... what would they contain? The version is included by Boost... have I forgotten something?

The header should contain some information about the file type itself, so that its properties can be easily extracted by external applications and so that the file type can be recognized. Furthermore, if you just make a plain dump, you are forcing yourself to change the file format every time the structure of your data changes. But it is your choice; I'm not saying it's wrong, only that you deprive yourself of flexibility.
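One way to keep old files readable when the data structures change is to let the reader branch on the version found in the header. The sketch below assumes a made-up Light object whose radius field is imagined to have been added in format version 2; the names, the default value and the text encoding are illustrative only.

```cpp
#include <cassert>   // assert(), handy when trying the function out
#include <cstdint>
#include <istream>
#include <sstream>   // std::istringstream, for in-memory experiments

// Hypothetical scene object; "radius" is imagined to be new in version 2.
struct Light {
    double intensity;
    double radius;
};

const std::uint16_t kCurrentVersion = 2;

// Reads one Light as written by the given format version. Values are stored
// as whitespace-separated text here only to keep the sketch short.
bool readLight(std::istream& in, std::uint16_t fileVersion, Light& light) {
    // Refuse files from a newer program version that we cannot interpret.
    if (fileVersion == 0 || fileVersion > kCurrentVersion) return false;
    if (!(in >> light.intensity)) return false;
    if (fileVersion >= 2) {
        if (!(in >> light.radius)) return false;
    } else {
        light.radius = 1.0;  // default for files written before the field existed
    }
    return true;
}
```

With this approach the program always writes the current version, while the reading code keeps one small branch per past change instead of breaking every old project file.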

pir
21st July 2006, 13:08
Well, it's my choice at present... while I'm trying to get things working... actually I'm thinking about what is needed and what information mustn't be saved. Some stuff would screw up the program if I saved it. So saving the complete state would not be that good.

Thanks, I've got some grasp of what you're saying.
pir