separate numbers of texts
I need a regular expression that will separate the numbers from the text.
Code:
QString text = "5001 1001 5002 1002.5 Observation reason: 10 river and pond.";
int pos = 0;
while ((pos = rx.indexIn(text,pos)) != -1){
linesInt << rx.cap(1);
pos += rx.matchedLength();
}
//linesInt = (5001, 1001, 5002, 1002, 5, 10) --- I want linesInt = (5001, 1001, 5002, 1002.5)
//or linesInt = (5001, 1001, 5002, 1002.5, Observation reason: 10 river and pond.)
The code below gives me the result I expect, but I would still have to test whether the first 4 fields are numbers.
Code:
QString text = "5001\t1001\t5002\t1002.5\tObservation reason: 10 river and pond.";
linesInt = text.split(rx, QString::SkipEmptyParts);
//linesInt = (5001, 1001, 5002, 1002.5, Observation reason: 10 river and pond.)
Could someone suggest a suitable QRegExp?
Re: separate numbers of texts
A simple improvement of your first regexp:
but it will catch the last "10" as well (and numbers like "1000." too).
Re: separate numbers of texts
Thanks stampede!
Indeed, it will catch the last "10" and numbers like "1000." as well. I have been trying but have not found a solution.
Isolating the text part of the string will also be a problem.
Re: separate numbers of texts
I think catching strings like "1000." is actually a good thing. For example, in C, 100. is as good as 100.0
You can fix this by replacing the '*' with '+' if you don't like it.
Re: separate numbers of texts
Or
Code:
QRegExp rx("(\\d+(\\.\\d+)?)");
will capture the dot in a number if it is followed by other digits but not otherwise. It really depends on what you need.
You hint that the input string is not representative of all your possible inputs (for example, a variable-length list of numbers). You could do something like:
Code:
QRegExp rx("^((?:(?:\\d+(?:\\.\\d+)?)\\s+)*)(.*)");
and then split group 1 on white space to get only the leading numbers. Group 2 contains the trailing remainder.
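A minimal sketch of that two-step split, written with standard C++ `std::regex` rather than QRegExp so it builds without Qt. The names `SplitLine` and `splitLeadingNumbers` are made up for illustration; the pattern is the one above, minus the doubled backslashes thanks to the raw string literal.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: the leading numeric tokens and the rest of the line.
struct SplitLine {
    std::vector<std::string> numbers;
    std::string remainder;
};

SplitLine splitLeadingNumbers(const std::string &line)
{
    // Group 1: a run of "number followed by whitespace".
    // Group 2: whatever is left once the run ends.
    static const std::regex rx(R"(((?:\d+(?:\.\d+)?\s+)*)(.*))");

    SplitLine result;
    std::smatch m;
    if (std::regex_match(line, m, rx)) {
        // Split group 1 on whitespace to get the individual numbers.
        std::istringstream iss(m[1].str());
        std::string token;
        while (iss >> token)
            result.numbers.push_back(token);
        result.remainder = m[2].str();
    }
    return result;
}
```

The expression is applied once per line; the first token that is not a number (including something like "500.") ends group 1 and everything from there on lands in the remainder.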
Re: separate numbers of texts
Thanks ChrisW67,
I used the code below and it worked well. But the problem of capturing "1000." remains: the dot is not stripped and the token is not treated as a real value.
Code:
QRegExp rxIn("^((?:(?:\\d+(?:\\.\\d+)?)\\s+)*)(.*)");
int pos = 0;
while ((pos = rxIn.indexIn(lines[i], pos)) != -1) {
    linesInt << rxIn.cap(1).split("\t", QString::SkipEmptyParts);
    linesStr = rxIn.cap(2);
    pos += rxIn.matchedLength();
}
Re: separate numbers of texts
Unless there is a good reason not to, I would suggest parsing the text manually instead of using regular expressions. You seem to be focused on one class of characters (digits plus dot), so a regular expression will not be much faster (if at all) than parsing the string manually, and you will save a lot of time otherwise spent getting the expression right. You can even use some kind of parser generator if you want.
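One way the manual approach could look, as a rough sketch in plain standard C++ (no Qt, no regex). `ParsedLine`, `isPlainNumber` and `parseLine` are hypothetical names, and the rule "digits, optionally followed by a dot and more digits" is an assumption taken from the expressions discussed in this thread.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: leading numeric tokens plus the untouched rest.
struct ParsedLine {
    std::vector<std::string> numbers;
    std::string remainder;
};

static bool isDigitChar(char c)
{
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

static bool isSpaceChar(char c)
{
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// True for "30" and "1002.5"; false for "500.", ".5" and "abc".
bool isPlainNumber(const std::string &token)
{
    std::size_t i = 0;
    while (i < token.size() && isDigitChar(token[i])) ++i;
    if (i == 0) return false;             // no leading digits
    if (i == token.size()) return true;   // integer
    if (token[i] != '.') return false;    // unexpected character
    const std::size_t fracStart = ++i;
    while (i < token.size() && isDigitChar(token[i])) ++i;
    return i > fracStart && i == token.size();
}

ParsedLine parseLine(const std::string &line)
{
    ParsedLine result;
    std::size_t pos = 0;
    while (pos < line.size()) {
        // Find the next whitespace-delimited token.
        std::size_t start = pos;
        while (start < line.size() && isSpaceChar(line[start])) ++start;
        std::size_t end = start;
        while (end < line.size() && !isSpaceChar(line[end])) ++end;

        const std::string token = line.substr(start, end - start);
        if (!isPlainNumber(token)) {
            pos = start;                  // the remainder begins at this token
            break;
        }
        result.numbers.push_back(token);
        pos = end;
    }
    result.remainder = line.substr(pos);
    return result;
}
```

Changing what counts as a number (for example, accepting "500.") then means editing `isPlainNumber` only, rather than reworking an expression.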
Re: separate numbers of texts
Quote:
Originally Posted by
jaca
Thanks ChrisW67,
I used the code below and it worked well.
You have misused the anchored regular expression. You should match it against each line once and then split the first capture group. There is no while() loop required.
Code:
QStringList lines;
lines
<< "5001 1001 5002 1002.5 Observation reason: 10 river and pond."
<< "500. 1001 5002 1002.7 Observation reason: 20 lake or dam"
<< "5001 100. 5002 1002.7 Observation reason: 20 lake or dam"
<< "30.1 30.1 30 Observation reason: 10 river and pond.";
QRegExp rx("^((?:(?:\\d+(?:\\.\\d+)?)\\s+)*)(.*)");
// QRegExp rx("^((?:(?:\\d+\\.?\\d*)\\s+)*)(.*)");
// Match each line once; no inner while() loop is needed.
foreach (const QString &line, lines) {
    if (rx.indexIn(line) != -1) {
        qDebug() << "Numbers:" << rx.cap(1).split(' ', QString::SkipEmptyParts)
                 << "Remainder" << rx.cap(2);
    }
    else
        qDebug() << "No match:" << line;
}
The expression treats "500." as not-a-number and stops capturing at the first non-number. The commented-out expression accepts these as numbers, but the captured token keeps the trailing dot, which is fine in most circumstances.
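To see the difference between the two number patterns in isolation, here is a small sketch that tests a single token against each. It uses standard C++ `std::regex` instead of QRegExp, and `matchesStrict` / `matchesLenient` are hypothetical helper names.

```cpp
#include <regex>
#include <string>

// Strict: a dot must be followed by at least one digit, so "500." is rejected.
bool matchesStrict(const std::string &token)
{
    static const std::regex rx(R"(\d+(?:\.\d+)?)");
    return std::regex_match(token, rx);
}

// Lenient: the dot and the fractional digits are each optional, so "500." is accepted.
bool matchesLenient(const std::string &token)
{
    static const std::regex rx(R"(\d+\.?\d*)");
    return std::regex_match(token, rx);
}
```

Both accept "30" and "1002.7" and both reject "Observation"; they differ only on tokens that end in a bare dot.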
I agree, by the way, with wysota that the problem is probably easier to handle manually.