Hey Guys,
I'm looking for a professional way to parse text files.
How do I start with Qt? Is there a "Qt way", or must I implement this with regular expressions?
My goal is to parse C++ header files ...
I want to extract class, function and member information from a header file.
Handmade means some orgies with QRegExp?!
I don't think you have to use QRegExp that much. I'd say, first separate the header file by its whitespace (and the places between words and special characters) to get a list of tokens. Then go through the list with some sort of finite state machine next to it.
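To make that idea concrete, here is a minimal sketch, assuming plain standard C++ containers instead of Qt types so that it compiles on its own. The names `tokenize` and `findClassNames` are made up for illustration. The state machine is deliberately naive: it also picks up forward declarations, `enum class` and `template <class T>`, so treat it as a starting point rather than a finished parser.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Split source text into tokens: a run of identifier characters becomes one
// token, every other non-whitespace character becomes a token of its own.
std::vector<std::string> tokenize(const std::string &text)
{
    std::vector<std::string> tokens;
    std::string word;
    for (unsigned char c : text) {
        if (std::isalnum(c) || c == '_') {
            word += static_cast<char>(c);
            continue;
        }
        if (!word.empty()) {
            tokens.push_back(word);
            word.clear();
        }
        if (!std::isspace(c))
            tokens.push_back(std::string(1, static_cast<char>(c)));
    }
    if (!word.empty())
        tokens.push_back(word);
    return tokens;
}

// Two-state machine: after "class" or "struct", the next identifier-like
// token is recorded as a type name, then the machine goes back to Idle.
std::vector<std::string> findClassNames(const std::vector<std::string> &tokens)
{
    enum class State { Idle, ExpectName };
    State state = State::Idle;
    std::vector<std::string> names;
    for (const std::string &tok : tokens) {
        switch (state) {
        case State::Idle:
            if (tok == "class" || tok == "struct")
                state = State::ExpectName;
            break;
        case State::ExpectName:
            if (std::isalpha(static_cast<unsigned char>(tok[0])) || tok[0] == '_')
                names.push_back(tok);
            state = State::Idle;
            break;
        }
    }
    return names;
}
```

Calling `findClassNames(tokenize(headerText))` on the contents of a header gives the list of names. With Qt you would probably swap `std::string` and `std::vector` for `QString` and `QStringList`.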
"The strength of a civilization is not measured by its ability to wage wars, but rather by its ability to prevent them." - Gene Roddenberry
If you want to store some settings, the easiest way is to use QSettings.
And what do you want to do with the extracted data?
Not at all! Handmade means crafting a lexer and a parser in C++. The lexer reads a character stream and turns it into a sequence of tokens (a token stream). Then the parser analyzes the tokens and builds a tree, a tag list, or whatever structure you want.
If you don't want to spend time writing a parser from scratch, you can have a look at the one used by the Qt tools (lupdate or qt3to4, or both), or generate one using yacc, ANTLR or something similar.
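As a rough illustration of the parser half, here is a sketch that turns an already tokenized header into a tree of scopes. It assumes the lexer delivers plain strings, and the names `ScopeNode` and `parseScope` are invented for this example. It only recognises the exact pattern "keyword name {", so classes with base classes or templates are not picked up.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// One node per class/struct/namespace scope; children are nested scopes.
struct ScopeNode {
    std::string keyword; // "class", "struct" or "namespace"
    std::string name;
    std::vector<std::unique_ptr<ScopeNode>> children;
};

// Recursive-descent walk over the token list: "<keyword> <name> {" opens a
// child scope, any other "{" is parsed and thrown away up to its "}".
void parseScope(const std::vector<std::string> &tokens, std::size_t &pos,
                ScopeNode &parent)
{
    while (pos < tokens.size()) {
        const std::string &tok = tokens[pos];
        if (tok == "}") { // end of the current scope
            ++pos;
            return;
        }
        const bool opensScope =
            (tok == "class" || tok == "struct" || tok == "namespace")
            && pos + 2 < tokens.size() && tokens[pos + 2] == "{";
        if (opensScope) {
            auto child = std::make_unique<ScopeNode>();
            child->keyword = tok;
            child->name = tokens[pos + 1];
            pos += 3; // skip keyword, name and "{"
            parseScope(tokens, pos, *child);
            parent.children.push_back(std::move(child));
        } else if (tok == "{") {
            // Function body or initializer: parse it, discard the result.
            ScopeNode ignored;
            ++pos;
            parseScope(tokens, pos, ignored);
        } else {
            ++pos;
        }
    }
}
```

Start with an empty root `ScopeNode` and `pos = 0`; after the call, the root's children hold the top-level scopes. Members and functions could be collected the same way by adding more cases inside the loop.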
Current Qt projects : QCodeEdit, RotiDeCode
Storing isn't the problem. An algorithm for parsing these files is the problem ...
After cutting out all the whitespace, I think I have to go through it with QRegExp ... That seems like the best way to me ...
A lexer can be implemented in any number of ways. Using regular expressions is one of the hardest/slowest.
I'd suggest looking in the opposite direction, towards finite automata, especially if you are interested in only one language.
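For what it's worth, a hand-written lexer in that style can be quite small. The following is only a sketch under my own assumptions: three token kinds, standard C++ instead of Qt classes, and hypothetical names (`TokenKind`, `Token`, `lex`). The explicit state variable plays the role of the automaton; every character either extends the current token or ends it.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class TokenKind { Identifier, Number, Punctuation };

struct Token {
    TokenKind kind;
    std::string text;
};

std::vector<Token> lex(const std::string &input)
{
    enum class State { Start, InIdentifier, InNumber };
    State state = State::Start;
    std::vector<Token> tokens;
    std::string current;

    // Emit the token collected so far and return to the start state.
    auto flush = [&](TokenKind kind) {
        tokens.push_back({kind, current});
        current.clear();
        state = State::Start;
    };

    for (std::size_t i = 0; i <= input.size(); ++i) {
        // One extra iteration with '\0' as end marker flushes the last token.
        const unsigned char c = i < input.size() ? input[i] : '\0';
        const bool identStart = std::isalpha(c) || c == '_';
        const bool identPart = identStart || std::isdigit(c);

        if (state == State::InIdentifier) {
            if (identPart) { current += static_cast<char>(c); continue; }
            flush(TokenKind::Identifier);
        } else if (state == State::InNumber) {
            if (std::isdigit(c)) { current += static_cast<char>(c); continue; }
            flush(TokenKind::Number);
        }

        // Start state: decide which state this character opens.
        if (identStart) {
            current += static_cast<char>(c);
            state = State::InIdentifier;
        } else if (std::isdigit(c)) {
            current += static_cast<char>(c);
            state = State::InNumber;
        } else if (c != '\0' && !std::isspace(c)) {
            tokens.push_back({TokenKind::Punctuation, std::string(1, static_cast<char>(c))});
        }
    }
    return tokens;
}
```

Each character is looked at once and no pattern matching is involved. Multi-character operators, string literals and comments would each need a state of their own.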
Regards
Bad idea: you are not aware of the context in which the string is found. Some bad examples are below (FullMetalCoder will be able to explain NFAs and DFAs; I suggest you read about Turing machines as well).
Qt Code:
    // class a { int foo; };

    #if 0
    class a { int foo; };
    #endif

    /* class a { int foo; }; */
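One way to cope with these three cases is to clean the text before any tokenizing happens. The sketch below is just an assumption about how that could look, with made-up function names (`stripComments`, `stripIfZero`). It ignores string literals, `#else` branches and line continuations, and it only recognises a directive written exactly as `#if 0`.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Pass 1: drop // and /* */ comments with a character-level state machine.
std::string stripComments(const std::string &src)
{
    enum class State { Code, LineComment, BlockComment };
    State state = State::Code;
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        const char next = i + 1 < src.size() ? src[i + 1] : '\0';
        switch (state) {
        case State::Code:
            if (c == '/' && next == '/') { state = State::LineComment; ++i; }
            else if (c == '/' && next == '*') { state = State::BlockComment; ++i; }
            else out += c;
            break;
        case State::LineComment:
            if (c == '\n') { out += c; state = State::Code; }
            break;
        case State::BlockComment:
            // A block comment is replaced by one space so tokens stay apart.
            if (c == '*' && next == '/') { out += ' '; state = State::Code; ++i; }
            break;
        }
    }
    return out;
}

// Pass 2: drop everything from "#if 0" to its matching "#endif", counting
// nested #if/#ifdef/#ifndef lines so the right "#endif" is found.
std::string stripIfZero(const std::string &src)
{
    std::istringstream in(src);
    std::string line, out;
    int skipDepth = 0; // > 0 while inside an "#if 0" region
    while (std::getline(in, line)) {
        const std::size_t first = line.find_first_not_of(" \t");
        const std::string trimmed =
            first == std::string::npos ? std::string() : line.substr(first);
        if (skipDepth > 0) {
            if (trimmed.compare(0, 3, "#if") == 0) ++skipDepth;
            else if (trimmed.compare(0, 6, "#endif") == 0) --skipDepth;
            continue;
        }
        if (trimmed.compare(0, 5, "#if 0") == 0) { skipDepth = 1; continue; }
        out += line + '\n';
    }
    return out;
}
```

Running `stripIfZero(stripComments(text))` on the example above leaves no trace of `class a`, so a later search for class declarations no longer trips over it.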
Note to JayJay: That's (Non)Deterministic Finite Automata. And while Turing Machines are very interesting, I don't think they're relevant for this at all.
Anyway, I think everyone here is pointing you in the right direction. You need to create a token-stream/list and go through them with a finite automaton. But how formal and elaborate you want to make your parser really depends on what kind of stuff you expect to find in these header files. Are they more predictable than elcuco's example?
Hehe... That's just so true! Well, I've already suggested looking into the Qt sources. The lexer and parser used by the porting tool (just checked), which are also used by the HEAD version of KDevelop BTW, are ready to use, though I've never actually tried to use them. If all you need is "lesser" parsing (only symbols/tags) put into a tree, then you can consider using QCodeModel 2. It's a small module I made for exactly that purpose (for Edyuk), and it works pretty well, turning the full Qt headers into a tree in about 6 seconds... http://edyuk.svn.sf.net/svnroot/edyu...ty/qcodemodel2 (do a checkout... you can't download the sources from there...)