I am currently writing an engine for generic text extraction and transformation. I got the generic lexer and the generic parser to work but I'm a bit unsure about the design to use for the transformation layer.
When the parser finds a match, it notifies a handler with a match identifier and a list of matched tokens (more data could be passed, but at the moment I see no use for that).
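To make the notification concrete, here is a minimal sketch of what such a handler interface could look like. The names `Token`, `MatchId`, `MatchHandler` and `HandlerTable` are hypothetical and only illustrate the "match identifier plus token list" contract described above; the actual engine's types may well differ.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical token type: assumed to carry at least the matched text.
struct Token {
    std::string text;
};

// Hypothetical identifier type for a grammar match.
using MatchId = int;

// An action receives the match identifier and the matched tokens.
using MatchHandler = std::function<void(MatchId, const std::vector<Token>&)>;

// Maps each match identifier to the action to run when the parser reports it.
class HandlerTable {
public:
    // Register (or replace) the action for a given match identifier.
    void on(MatchId id, MatchHandler handler) {
        handlers_[id] = std::move(handler);
    }

    // Called by the parser on each match; returns false if no action is registered.
    bool notify(MatchId id, const std::vector<Token>& tokens) const {
        auto it = handlers_.find(id);
        if (it == handlers_.end()) {
            return false;
        }
        it->second(id, tokens);
        return true;
    }

private:
    std::unordered_map<MatchId, MatchHandler> handlers_;
};
```

With something like this, the open question becomes how the actions stored in the table get described without having to write them in C++.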
Now the question is: how can the actions to perform upon a given match (taking the matched tokens into account, of course) be specified?
Ideally I'd like to avoid resorting to a full-blown scripting language, as I want the engine to be as fast as possible. Currently the lexer can handle 1.4 MB of C++ in less than 100 ms on my laptop, 40% of that time being spent loading the data from disk into RAM, and the parser can extract the whole structure of qwidget.h in 5 ms (a smaller file, but much more complex to handle).
Is there an existing spec for such text transformations, or do I have to create my own? If I do, do you have any ideas for creating something that would be relatively easy to implement, and easy to understand and use for people not necessarily versed in programming?
As a side note, here are two things I would expect that spec to be able to describe:
- creation of a class tree from code
- autogeneration of SIP bindings files from C++ sources
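To illustrate the first item, here is a rough sketch of the kind of action a "class declaration" match would have to trigger if it were written by hand in C++. `ClassTree`, `addClass` and `render` are hypothetical names, and the sketch assumes the match hands over the class name and one base class name as tokens; a transformation spec would need to express the equivalent in a simpler form.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical target structure: a class tree keyed by base class name.
class ClassTree {
public:
    // Record that `derived` inherits from `base`; an empty base means a root class.
    void addClass(const std::string& derived, const std::string& base) {
        children_[base].push_back(derived);
        children_[derived];  // make sure the derived class has a node of its own
    }

    // Render the subtree below `root` as indented text, one class per line.
    std::string render(const std::string& root = "", int depth = 0) const {
        std::string out;
        auto it = children_.find(root);
        if (it == children_.end()) {
            return out;
        }
        for (const auto& child : it->second) {
            out += std::string(depth * 2, ' ') + child + "\n";
            out += render(child, depth + 1);
        }
        return out;
    }

private:
    std::map<std::string, std::vector<std::string>> children_;
};
```

A class with several bases would simply be listed under each of them in this sketch, which may or may not be what a real class tree should do.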