Parse Text File with Qt



rhf417
13th May 2010, 21:46
Hi All,

I need to parse the text file below and process the data using Qt. However, I have no experience with this and do not know the best way to handle the job in Qt.

Here is an example of my text file:

Frame(BoardList(Board(ID("Board1"),
TestPointList((NO(1),
X(1m),
Y(1m),
Z(0)),
(No(2),
X(2m),
Y(2m),
Z(0)),
(No(3),
X(2.5m),
Y(2.5m),
Z(0))),
NetList((ID("Net1"),
PinList((ID("Pin1")),
(ID("Pin2")))),
(ID("Net2"),
PinList((ID("Pin3")),
(ID("Pin4")),
(ID("Pin5")))),
(ID("Net3")))),
(Board(ID("Board2"),
...
)))

This text file describes the data of PCB boards. Basically, each file contains a frame, which contains a list of boards. Each board contains a list of test points and a list of nets. Each net contains a list of pins.

In the example above, there are two boards. The first board, "Board1", contains three test points ( (1, 1m, 1m, 0), (2, 2m, 2m, 0) and (3, 2.5m, 2.5m, 0) ) and three nets. The first net, "Net1", contains two pins ("Pin1" and "Pin2"). The second net, "Net2", contains three pins ("Pin3", "Pin4" and "Pin5"). The third net, "Net3", contains no pins.
The content of the second board is omitted here.
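
The hierarchy described above (frame, boards, test points, nets, pins) could be held in memory with a few plain structs. This is only a minimal sketch in standard C++11, so that it compiles without Qt; QString and QList could be substituted. All type, field and function names are made up for illustration, and the coordinates are kept as text because the meaning of the "m" suffix is not stated.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory model of the file; none of these names come from Qt.
struct TestPoint {
    int no;              // the NO(...) value
    std::string x, y, z; // kept as text: the "m" suffix is not specified
};

struct Net {
    std::string id;
    std::vector<std::string> pins; // pin IDs; may be empty, as for "Net3"
};

struct Board {
    std::string id;
    std::vector<TestPoint> testPoints;
    std::vector<Net> nets;
};

struct Frame {
    std::vector<Board> boards;
};

// Builds the first board of the example file by hand.
Frame makeExampleFrame()
{
    Board board;
    board.id = "Board1";
    board.testPoints = {
        {1, "1m", "1m", "0"},
        {2, "2m", "2m", "0"},
        {3, "2.5m", "2.5m", "0"},
    };
    board.nets = {
        {"Net1", {"Pin1", "Pin2"}},
        {"Net2", {"Pin3", "Pin4", "Pin5"}},
        {"Net3", {}},
    };

    Frame frame;
    frame.boards.push_back(board);
    return frame;
}
```

Holding the elements by value keeps ownership simple: copying or destroying a Frame takes care of everything below it.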

My job is to
1) Parse a given text file and get the data (e.g. boards, test points and nets) out of it.
2) Process the data in the memory. For example, add/remove/modify a net or a test point.
3) Save the updated data back into the text file.
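
Steps 2 and 3 could look roughly like this once the data is in memory. The sketch below is standard C++11 with hypothetical names (`Net`, `removeNet`, `writeNetList`, `netListToString`); it only covers the NetList part and assumes the output should have the same shape as the example file.

```cpp
#include <algorithm>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical net record: an ID plus the IDs of its pins.
struct Net {
    std::string id;
    std::vector<std::string> pins;
};

// Step 2: removes every net whose ID equals `id`.
// Returns true if at least one net was removed.
bool removeNet(std::vector<Net>& nets, const std::string& id)
{
    const std::size_t before = nets.size();
    nets.erase(std::remove_if(nets.begin(), nets.end(),
                              [&id](const Net& n) { return n.id == id; }),
               nets.end());
    return nets.size() != before;
}

// Step 3: writes a NetList(...) block shaped like the one in the example file.
void writeNetList(std::ostream& out, const std::vector<Net>& nets)
{
    out << "NetList(";
    for (std::size_t i = 0; i < nets.size(); ++i) {
        if (i != 0)
            out << ",\n";
        out << "(ID(\"" << nets[i].id << "\")";
        if (!nets[i].pins.empty()) {
            out << ",\nPinList(";
            for (std::size_t j = 0; j < nets[i].pins.size(); ++j) {
                if (j != 0)
                    out << ",\n";
                out << "(ID(\"" << nets[i].pins[j] << "\"))";
            }
            out << ")";
        }
        out << ")";
    }
    out << ")";
}

// Convenience wrapper that returns the text instead of writing to a stream.
std::string netListToString(const std::vector<Net>& nets)
{
    std::ostringstream out;
    writeNetList(out, nets);
    return out.str();
}
```

Because `writeNetList` takes any `std::ostream`, the same function can write to a `std::ofstream` when saving the file.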

Any guidance will be appreciated. Thanks.


Rice

squidge
13th May 2010, 23:11
Although you could probably use QFile and QString for crude parsing, a proper lexical analyser would be better:

http://dinosaur.compilertools.net/lex/index.html
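
For comparison, a hand-written lexical analyser for this format is quite small. The sketch below is not generated by lex; it is plain standard C++11 with made-up names (`TokenKind`, `Token`, `tokenize`), and it assumes the only tokens are words, double-quoted strings, parentheses and commas.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Token kinds for the bracketed format in the question (illustrative names).
enum class TokenKind { Word, String, LParen, RParen, Comma };

struct Token {
    TokenKind kind;
    std::string text; // for String tokens: the text without the quotes
};

// True for characters that end a Word token.
static bool endsWord(char c)
{
    return std::isspace(static_cast<unsigned char>(c)) ||
           c == '(' || c == ')' || c == ',' || c == '"';
}

// Splits the input into tokens. Whitespace is dropped,
// except inside double-quoted strings, where it is preserved.
std::vector<Token> tokenize(const std::string& in)
{
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const char c = in[i];
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;
        } else if (c == '(') {
            out.push_back({TokenKind::LParen, "("});
            ++i;
        } else if (c == ')') {
            out.push_back({TokenKind::RParen, ")"});
            ++i;
        } else if (c == ',') {
            out.push_back({TokenKind::Comma, ","});
            ++i;
        } else if (c == '"') {
            // An unterminated string simply runs to the end of the input.
            const std::size_t close = in.find('"', i + 1);
            const std::size_t stop = (close == std::string::npos) ? in.size() : close;
            out.push_back({TokenKind::String, in.substr(i + 1, stop - i - 1)});
            i = (close == std::string::npos) ? in.size() : close + 1;
        } else {
            // Names (Frame, ID) and values (1m, 2.5m, 0) are all Word tokens.
            std::size_t j = i;
            while (j < in.size() && !endsWord(in[j]))
                ++j;
            out.push_back({TokenKind::Word, in.substr(i, j - i)});
            i = j;
        }
    }
    return out;
}
```

The token list is then a much easier input for a parser than the raw characters.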

ChrisW67
14th May 2010, 03:14
Given that the source file has some grammatical structure you could use the traditional lex/yacc approach to build a parser. You could also try QLALR for the Qt experience.

http://labs.trolltech.com/page/Projects/Compilers/QLALR
http://qt.nokia.com/developer/qtquarterly/qlalr-adventures

rhf417
14th May 2010, 18:49
Is there an easier way? For example, will QRegExp be capable of doing my job?
I do not want to parse the text file from scratch (e.g. using QFile and QString). I am interested in QLALR, but it is hard to get started with because of the poor documentation.
Traditional lex/yacc is powerful. However, it requires an additional learning process for a person (like me) who has never used it before. I am not sure whether it is worth learning lex/yacc for this simple job.

BTW: My working environment is Visual C++ 2008 and Qt4.4.

squidge
14th May 2010, 19:09
I don't know, from the file it seems like a flexible language, so you can't really use a regexp IMO. I'm guessing the above file isn't everything you need to support, and other files may be written out slightly differently.

What would be better than the above file is the file format specification document; then you know what you need to support.

tbscope
14th May 2010, 21:29
Here's an example.
Of course, I made this in just a couple of minutes to demonstrate how you can parse this file.
There are tons of places that can benefit from improvements, but that is up to you.
I also only handled a few elements of your file, not all of them; the rest is up to you as well.

MainWindow.h


#ifndef MAINWINDOW_H
#define MAINWINDOW_H

#include <QList>
#include <QMainWindow>
#include <QString>

namespace Ui {
    class MainWindow;
}

// A board; only its ID is parsed in this example.
struct BoardElement {
    QString id;
};

// The top-level frame, which holds pointers to the boards found in the file.
struct FrameElement {
    QList<BoardElement *> boardElements;
};

class MainWindow : public QMainWindow {
    Q_OBJECT
public:
    MainWindow(QWidget *parent = 0);
    ~MainWindow();

protected:
    void changeEvent(QEvent *e);

private slots:
    void startParsing();

private:
    Ui::MainWindow *ui;
};

#endif // MAINWINDOW_H


MainWindow.cpp

#include "mainwindow.h"
#include "ui_mainwindow.h"

#include <QFile>
#include <QTextStream>
#include <QDebug>

MainWindow::MainWindow(QWidget *parent) :
    QMainWindow(parent),
    ui(new Ui::MainWindow)
{
    ui->setupUi(this);

    connect(ui->pushButton, SIGNAL(clicked()), this, SLOT(startParsing()));
}

MainWindow::~MainWindow()
{
    delete ui;
}

void MainWindow::changeEvent(QEvent *e)
{
    QMainWindow::changeEvent(e);
    switch (e->type()) {
    case QEvent::LanguageChange:
        ui->retranslateUi(this);
        break;
    default:
        break;
    }
}

void MainWindow::startParsing()
{
    QFile file("./file.txt");

    if (!file.open(QFile::ReadOnly))
        return;

    QTextStream ts(&file);

    // Initialise the pointers so that the null checks below are meaningful.
    // The elements are never deleted in this example; real code must own them.
    FrameElement *frameElement = 0;
    BoardElement *boardElement = 0;

    QChar ch;
    int depth = 0;
    // States tell the parser which element it is parsing.
    // 0 = any element this example does not handle
    // 1 = Frame element
    // 2 = BoardList element
    // 3 = Board element
    // 4 = Id element
    // 5 = Name element (between ""); never pushed here, names are detected by their leading quote

    QList<int> stateList;

    QString element = "";
    while (!ts.atEnd()) {
        ts >> ch;

        if (ch == '(') {
            ++depth;

            if (element == "Frame" || element == "frame") {
                stateList.append(1);
                frameElement = new FrameElement;
                qDebug() << "Found a Frame element";
            }
            else if (element == "BoardList" || element == "boardlist") {
                stateList.append(2);
                qDebug() << "Found a BoardList element";
            }
            else if (element == "Board" || element == "board") {
                if (!frameElement) {
                    qDebug() << "Parsing error: no frame element found!";
                    return;
                }
                stateList.append(3);
                boardElement = new BoardElement;
                frameElement->boardElements.append(boardElement);
                qDebug() << "Found a Board element";
            }
            else if (element == "Id" || element == "id" || element == "ID") {
                stateList.append(4);
                qDebug() << "Found an Id element";
            }
            else {
                // Push a placeholder for every other element so that each ')'
                // pops exactly the state pushed by its matching '('.
                stateList.append(0);
            }

            element = "";
        }
        else if (ch == ')') {
            --depth;

            if (element.startsWith("\"")) {
                if (stateList.count() < 2) {
                    // parsing error, the list should have at least 2 items
                    return;
                }
                // Go back two states: the last one is the id, the one before it
                // tells which object receives the name.
                int currentState = stateList.at(stateList.count() - 2);
                qDebug() << "Found a name element in state " << currentState;

                element = element.section("\"",1,1);
                if (currentState == 3 && boardElement) {
                    boardElement->id = element;
                    qDebug() << "Set the board element id to:" << boardElement->id;
                }
            }

            if (!stateList.isEmpty())
                stateList.removeLast();
            element = "";
        }
        else if (ch == '\r' || ch == '\n' || ch == ' ' || ch == '\t') {
            // do nothing
            // If you are able to use spaces in names, then create a boolean value to check if you're parsing a name.
            // If you do (boolean = true), do not swallow spaces but add them to the parsed string.
            // If you don't (boolean = false), swallow the spaces
        }
        else {
            element += ch;
        }
    }

    if (depth != 0)
        qDebug() << "Parsing error: unbalanced parentheses";

    file.close();
}
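
The comment in the whitespace branch above suggests a boolean flag for names that may contain spaces. A minimal sketch of that idea, in standard C++11 rather than Qt so it stands alone; the function name `dropWhitespaceOutsideQuotes` is made up, and it assumes names never contain an escaped double quote.

```cpp
#include <string>

// Removes whitespace from the input, except between a pair of double quotes,
// so that a name such as "Board 1" keeps its space.
std::string dropWhitespaceOutsideQuotes(const std::string& in)
{
    std::string out;
    bool inName = false; // true while we are between two double quotes

    for (char ch : in) {
        if (ch == '"')
            inName = !inName; // a quote opens or closes a name

        const bool isSpace = (ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n');
        if (isSpace && !inName)
            continue; // swallow whitespace outside names

        out += ch;
    }
    return out;
}
```

In the parser above, the same flag would decide whether the whitespace branch ignores the character or appends it to `element`.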

wysota
15th May 2010, 12:37
The problem with this parser is that you expect the file to have all those elements: it is fitted to this exact file (or, to be precise, to the exact keywords used here). That is not sufficient in the general case.

The language used here is predicate-based. It looks like Prolog, Lisp or Lua (and probably a dozen others that I am not familiar with). The predicates themselves can surely be matched by a regexp, but lexical analysis is one thing (it could be done with QRegExp or even with the code suggested by tbscope) and semantic analysis is a completely different one. In the end you need a proper parser: either your own (hand-crafted or auto-generated) or one from a library that supports this actual language (whatever it is).
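
As a rough illustration of a hand-crafted parser that is not tied to particular keywords, the sketch below reads any `name(arg, arg, ...)` structure into a generic tree. It is standard C++ (C++11 or later) with hypothetical names (`Node`, `Parser`, `parseText`), and the grammar in the comment is only guessed from the example file, not taken from a format specification.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Generic tree node: a name (empty for anonymous "(a, b)" groups) and the
// comma-separated arguments found between its parentheses.
struct Node {
    std::string name;
    bool quoted = false; // true if the name was written between double quotes
    std::vector<Node> children;
};

class Parser {
public:
    explicit Parser(const std::string& text) : s(text), pos(0) {}

    // Parses one complete node and rejects anything left over after it.
    Node parseAll()
    {
        Node n = parseNode();
        skipSpace();
        if (pos != s.size())
            throw std::runtime_error("unexpected trailing input");
        return n;
    }

private:
    // Assumed grammar: node := [atom] [ '(' [ node { ',' node } ] ')' ]
    Node parseNode()
    {
        Node n;
        skipSpace();
        n.name = parseAtom(n.quoted);
        skipSpace();
        if (pos < s.size() && s[pos] == '(') {
            ++pos;
            skipSpace();
            while (pos < s.size() && s[pos] != ')') {
                n.children.push_back(parseNode());
                skipSpace();
                if (pos < s.size() && s[pos] == ',')
                    ++pos; // another argument follows
                else
                    break;
            }
            if (pos >= s.size() || s[pos] != ')')
                throw std::runtime_error("expected ')'");
            ++pos;
        }
        return n;
    }

    // Reads a quoted string (returned without its quotes) or a bare word.
    // Returns an empty string if neither is present at the current position.
    std::string parseAtom(bool& quoted)
    {
        quoted = false;
        if (pos < s.size() && s[pos] == '"') {
            const std::size_t close = s.find('"', pos + 1);
            if (close == std::string::npos)
                throw std::runtime_error("unterminated string");
            const std::string text = s.substr(pos + 1, close - pos - 1);
            pos = close + 1;
            quoted = true;
            return text;
        }
        const std::size_t start = pos;
        while (pos < s.size() && !std::isspace(static_cast<unsigned char>(s[pos])) &&
               s[pos] != '(' && s[pos] != ')' && s[pos] != ',' && s[pos] != '"')
            ++pos;
        return s.substr(start, pos - start);
    }

    void skipSpace()
    {
        while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos])))
            ++pos;
    }

    std::string s;
    std::size_t pos;
};

// Convenience wrapper: parses `text` into a tree or throws std::runtime_error.
Node parseText(const std::string& text)
{
    return Parser(text).parseAll();
}
```

Interpreting the tree (which node is a board, which is a net) then becomes a separate pass, so new keywords in the file do not require changes to the parser itself.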

ChrisW67
16th May 2010, 08:26
Is there an easier way? For example, will QRegExp be capable of doing my job?
Bits of the task, certainly. However, matching whole, nested compound structures where delimiters must be balanced rapidly becomes very difficult.
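
To make the balancing problem concrete: finding the parenthesis that closes a given one needs a depth counter, which is easy in a loop. The sketch below is standard C++ with a made-up function name (`findMatchingParen`); it assumes double-quoted strings contain no escaped quotes.

```cpp
#include <cstddef>
#include <string>

// Returns the index of the ')' that closes the '(' at index `open`, or
// std::string::npos if `open` is not a '(' or the text is unbalanced.
// Parentheses inside double-quoted strings are ignored.
std::size_t findMatchingParen(const std::string& s, std::size_t open)
{
    if (open >= s.size() || s[open] != '(')
        return std::string::npos;

    int depth = 0;
    bool inString = false;
    for (std::size_t i = open; i < s.size(); ++i) {
        const char c = s[i];
        if (c == '"')
            inString = !inString;     // entering or leaving a quoted string
        else if (inString)
            continue;                 // ignore everything inside quotes
        else if (c == '(')
            ++depth;
        else if (c == ')' && --depth == 0)
            return i;                 // depth is back to zero: this is the match
    }
    return std::string::npos;
}
```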

I do not want to parse the text file from scratch (e.g. using QFile and QString). I am interested in QLALR, but it is hard to get started with because of the poor documentation.
Traditional lex/yacc is powerful. However, it requires an additional learning process for a person (like me) who has never used it before. I am not sure whether it is worth learning lex/yacc for this simple job.
Fair call on the QLALR documentation... there's a high level of assumed knowledge in the examples that are around. You might like the better documented GOLD Parsing System (http://www.devincook.com/).

I think the effort of learning lex/yacc, or the theory of operation, is well worth it for the potential to help in many future projects. However, the time and effort required to learn it must be weighed against your circumstances. Are you parsing one file, or thousands? Are they very variable (numbers of net-lists etc.)? Is the grammar we can see only a subset of a larger possible grammar?

rhf417
17th May 2010, 16:04
I did some research on parser generators over the last few days and found a good tool, "Visual Parse++", which is easy to learn. I am working under Windows and C++, and I find Visual Parse++ very convenient to use. I have parsed my text file with it successfully. The only drawback is that there will be no support, since Sandstone does not exist anymore.

Basically, the text file shown above is the only file I need to handle. I only need to parse one file at a time, not thousands of them. The grammar you can see here is not the whole of it, but the rest is very similar. I would like to spend more time learning Lex/Yacc and QLALR later, but not now, I guess.

For me, tbscope's method is the one I would like to apply to my current project. It looks quite straightforward and does not require any additional tools. Thank you very much. Thanks to all for the valuable advice.

wysota
17th May 2010, 16:30
Basically, the text file shown above is the only file I need to handle. I only need to parse one file at a time, not thousands of them.
It's not the number of files you wish to parse at a time that may be a problem. The problem is you may want to parse different files (with different contents) in general.

rhf417
17th May 2010, 18:00
It's not the number of files you wish to parse at a time that may be a problem. The problem is you may want to parse different files (with different contents) in general.


I understand. If there are many different types of files to parse, a parser generator would be the better choice.

squidge
17th May 2010, 19:34
Be careful and do your research. I've fallen into this trap before: I built a parser for the files I had, and it worked brilliantly. Then, a month later, I had some more files and it only parsed one of them correctly. I didn't have the time to fix the program properly to parse them all, so I hacked it up to make it work. Do that a few times and you have a nightmare of an application to maintain, or a range of applications that parse different variations.

It's so much easier to do it right from the beginning.

markanth
26th June 2013, 03:47
Does anyone know where to find an old copy of Visual Parse++? It is a great tool to teach parsing. We have been searching for it, and all we can find are URLs that point to the no longer existing Sandstone site.