QRegExp for octal escape sequence in QString

bmn

6th January 2016, 08:30

Hello,

I have a QString that contains octal escape sequences that I have to parse.

A string can for example look like this:

"test123/032test/032/0352"

which should result in

"test123 test #2"

after I am done replacing the octal escape sequences.

I tried to use QRegExp for this, but I am not able to filter the sequences.

Would really appreciate any help.

Thank you in advance.

yeye_olive

6th January 2016, 08:55

bmn wrote: "I tried to use QRegExp for this, but I am not able to filter the sequences."
What have you tried? Please show some code and we will help you towards a solution.

You do not have to use QRegExp (or QRegularExpression if you use Qt 5) for this. A hand-written parser with a basic state machine does the trick.

bmn

6th January 2016, 09:25

Thank you for your response.

I tried the following pattern, but since the escape sequence is treated as a single character, it obviously does not work.

[\\\d\d\d]{4,4}

After I convert the given escape sequence to a char, '\032' yields 26 decimal, which is not what I want: it should be treated as 32 decimal to represent a ' ' (space) in ASCII.
But I can overcome this by simply adding 6 to whatever result I get.

char tmp = myStr.at(i).toLatin1(); // '\032' -> 26 dec
char ascii = tmp + 6; //convert to ascii encoding 26 dec -> 32 dec (' ')
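
To see why adding 6 only happens to work for this particular value, it may help to compare the two readings of the same digits. A minimal sketch in standard C++ (the helper names asOctal and asDecimal are made up for illustration and are not part of Qt):

```cpp
#include <string>

// Hypothetical helpers: read the same digit string in base 8 and in base 10.
int asOctal(const std::string &digits)   { return std::stoi(digits, nullptr, 8); }
int asDecimal(const std::string &digits) { return std::stoi(digits, nullptr, 10); }
```

For "032" the two readings are 26 and 32, so the gap is 6. For "040" they are 32 and 40, so the gap is 8. A fixed offset therefore cannot convert one reading into the other in general.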

yeye_olive

6th January 2016, 10:41

This is confusing.

You mentioned "octal escape sequences" in your first post, although you apparently want to interpret them as decimal numbers. What do you mean by "octal", then?

Your first post suggests that your escape sequences consist of a slash followed by three decimal digits, but your second post suggests that they begin with a backslash.

Your second post apparently confuses C octal escape sequences interpreted by the compiler in your source code with the escape sequences that your program shall interpret at runtime. These are two completely unrelated things.

So, could you please:

- precisely describe the format and meaning of your escape sequences;
- provide the complete code that you wrote, e.g. in the form of a function that takes a QString and returns the QString obtained by interpreting the escape sequences?
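
The difference between an escape handled by the compiler and escape text seen at runtime can be illustrated with two string literals. This is only a sketch in standard C++, and the variable names are made up for the example:

```cpp
#include <string>

// In source code the compiler turns the octal escape \032 into ONE char (value 26).
const std::string compiledEscape = "\032";

// Text that arrives at runtime holds the backslash and the digits as FOUR chars.
// The backslash is doubled here only so that the compiler leaves it alone.
const std::string runtimeText = "\\032";
```

Only the second form is something the program can parse itself. The first has already been collapsed before the program runs.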

bmn

8th January 2016, 09:43

Hello,

The purpose is to decode a DNS-SD API full service name string. I ended up writing a small state machine, as you suggested. In case anybody is interested, here is the code.
The 'tmp' QString holds the escaped name of the service.

/*
 * All strings used in the DNS-SD APIs are UTF-8 strings. Apart from the exceptions noted below,
 * the APIs expect the strings to be properly escaped, using the conventional DNS escaping rules:
 *
 * '\\' represents a single literal '\' in the name
 * '\.' represents a single literal '.' in the name
 * '\ddd', where ddd is a three-digit decimal value from 000 to 255,
 * represents a single literal byte with that value.
 * A bare unescaped '.' is a label separator, marking a boundary between domain and subdomain.
 */
// Declarations that the original post left out:
enum State { startChar, digit1, digit2, digit3 };
State state = startChar;
int signPos = 0;   // index of the backslash that opened the current escape
QString nrStr;     // digits collected so far for a \ddd escape

for (int i = 0; i < tmp.count(); i++) {
    QChar ch = tmp.at(i);
    switch (state) {
    case startChar:
        if (ch == '\\') {
            signPos = i;
            nrStr.clear();
            state = digit1;
        }
        break;
    case digit1:
        if ((ch == '\\') || (ch == '.')) {
            tmp.remove(i - 1, 1);   // drop the escaping backslash
            i--;                    // string is shortened by one, so step back to stay aligned
            state = startChar;
        } else if ((ch >= '0') && (ch <= '2')) {
            nrStr.append(ch);
            state = digit2;
        } else {
            state = startChar;      // not a recognized escape: leave the text untouched
        }
        break;
    case digit2:
        if ((ch >= '0') && (ch <= '9')) {
            nrStr.append(ch);
            state = digit3;
        } else {
            state = startChar;      // incomplete \ddd: give up on it
            i--;                    // and rescan this character as ordinary input
        }
        break;
    case digit3:
        if ((ch >= '0') && (ch <= '9')) {
            nrStr.append(ch);
            int nr = nrStr.toUInt();   // note: values 256..299 are not rejected here
            tmp.replace(signPos, 4, QChar(nr));
            i = i - 3;              // string is shortened by 3, so step back by 3
        } else {
            i--;                    // incomplete \ddd: rescan this character
        }
        state = startChar;
        break;
    default:
        state = startChar;
        break;
    }
}

yeye_olive

8th January 2016, 10:43

Your state machine code looks good. It is unusual in that it unescapes the QString in place, but that works. It will not scale well to long strings, though: each call to remove() or replace() presumably shifts the whole remainder of the string, which gives quadratic time complexity in the length of the string. If this becomes a performance bottleneck in your program, consider changing the interface of your decoder so that it can be fed the encoded string in chunks and outputs chunks of the decoded string (see QTextDecoder to get an idea).
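
A chunk-fed decoder along those lines might look roughly like the sketch below. It uses std::string and raw bytes instead of QString so that it stays independent of Qt. The class name DnsUnescaper and its feed() method are hypothetical, and the handling of malformed input (dropping incomplete escapes and values above 255) is an assumption that the thread does not specify.

```cpp
#include <string>

// Hypothetical incremental decoder for the DNS-style escapes \\, \. and \ddd.
// It works on raw bytes and keeps its state between calls to feed().
class DnsUnescaper {
public:
    // Decode one chunk; returns the bytes completed by this chunk.
    std::string feed(const std::string &chunk) {
        std::string out;
        out.reserve(chunk.size());
        for (char c : chunk) {
            const bool isDigit = (c >= '0' && c <= '9');
            if (state_ == InNumber && !isDigit) {
                state_ = Plain;          // assumption: drop an incomplete \d or \dd
            }
            switch (state_) {
            case Plain:
                if (c == '\\') state_ = AfterBackslash;
                else out.push_back(c);
                break;
            case AfterBackslash:
                if (isDigit) {
                    value_ = c - '0';
                    digits_ = 1;
                    state_ = InNumber;
                } else {
                    out.push_back(c);    // \\ and \. yield the escaped character itself
                    state_ = Plain;
                }
                break;
            case InNumber:
                value_ = value_ * 10 + (c - '0');
                if (++digits_ == 3) {
                    if (value_ <= 255) out.push_back(static_cast<char>(value_));
                    state_ = Plain;      // assumption: values above 255 are dropped
                }
                break;
            }
        }
        return out;
    }

private:
    enum State { Plain, AfterBackslash, InNumber };
    State state_ = Plain;
    int value_ = 0;   // number accumulated so far
    int digits_ = 0;  // digits consumed for the current \ddd
};
```

Each input byte is appended to the output at most once, so the work is linear in the input length, and an escape may be split across two chunks. Since the DNS-SD strings are UTF-8, the collected bytes could afterwards be turned into a QString with QString::fromUtf8().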

Here is an optimization for your current code: instead of accumulating digits in nrStr before converting them to a uint, you could progressively compute the number nr: replace nrStr.clear() with nr = 0, and replace nrStr.append(ch) with nr = nr * 10 + ch.toLatin1() - '0'.
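
The progressive computation can be tried in isolation with a small sketch. The function name accumulateDigits is made up for this example, and plain char is used in place of QChar, so ch - '0' stands in for ch.toLatin1() - '0'.

```cpp
#include <string>

// Hypothetical helper: fold decimal digits into a number one character at a time.
// Assumes every character in 'digits' is in the range '0'..'9'.
int accumulateDigits(const std::string &digits) {
    int nr = 0;                       // takes the place of nrStr.clear()
    for (char ch : digits) {
        nr = nr * 10 + (ch - '0');    // takes the place of nrStr.append(ch)
    }
    return nr;
}
```

With this approach no temporary string is built, and the final toUInt() conversion is no longer needed.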