Using Regular Expression for simple searching



sparticus_37
14th September 2011, 15:46
I was recently given an assignment for a C++ class I'm currently taking: search the given string

BRSLPMSLPRESLPSLPSLPABSLPENDRPCFPSLPABXDSLPRMNPLSL RSLPBESLPASLPRTWPSLPXBTSLPSLPM

using a character array approach, for occurrences of the substring SLP, and output the number of times SLP occurs. An occurrence is not counted if it meets any of these error conditions:
1. SLP cannot be preceded by E, e.g. ESLP
2. SLP cannot be followed by A, e.g. SLPA
3. SLP cannot be preceded by B followed by any single character, e.g. BASLP, BBSLP, ..., BZSLP

But using a char array approach is too simple and boring. For me the solution was


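int count = 0;
QString data = "BRSLPMSLPRESLPSLPSLPABSLPENDRPCFPSLPABXDSLPRMNPLSL RSLPBESLPASLPRTWPSLPXBTSLPSLPM";
// Stop while i + 2 is still a valid index, so the "SLP" test never reads past the end.
for (int i = 0; i + 2 < data.length(); ++i) {
    if (data[i] == QChar('S') && data[i+1] == QChar('L') && data[i+2] == QChar('P')) {
        // Guard each neighbour lookup so the index stays inside the string.
        bool afterB = i >= 2 && data[i-2] == QChar('B');
        bool afterE = i >= 1 && data[i-1] == QChar('E');
        bool beforeA = i + 3 < data.length() && data[i+3] == QChar('A');
        if (!afterB && !afterE && !beforeA)
            ++count;
    }
}
// count = 8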



So I decided to see if I could accomplish the same thing using regular expressions, which I just recently started learning on my own time.




#include <iostream>
#include <QRegExp>
#include <QString>
#include <QStringList>

// Declared before main() so the calls below compile; defined after main().
int countOccurances(QRegExp &regx, QString &data, bool debug = false);

int main() {
    QString data = "BRSLPMSLPRESLPSLPSLPABSLPENDRPCFPSLPABXDSLPRMNPLSL RSLPBESLPASLPRTWPSLPXBTSLPSLPM";
    QRegExp regxTotal("SLP+");  // "SL" followed by one or more "P", i.e. every "SLP" here - output is 14
    QRegExp aCount("SLPA+");    // "SLP" followed by "A" - output is 3
    QRegExp eCount("ESLP+");    // "SLP" preceded by "E" - output is 2
    QRegExp bCount("B.SLP+");   // "B", then any character, then "SLP" - output is 3
    QRegExp woErrors("([^E]SLP[^A])"); // attempt to match only the SLP that do not meet the error conditions
    /*

    Attempted variations
    ((?!E)SLP+|(?!B.)SLP+|SLP(?!A)+) output is 14
    ((?!ESLP)+|(?!B.SLP)+|(?!ASLP)+) no output, causes infinite loop
    ((?!ESLP)SLP+|(?!B.SLP)SLP+|(?!ASLP)SLP+) output is 14
    */

    std::cout << countOccurances(regxTotal, data) << std::endl;      // output is 14
    std::cout << countOccurances(aCount, data) << std::endl;         // output is 3
    std::cout << countOccurances(eCount, data) << std::endl;         // output is 2
    std::cout << countOccurances(bCount, data) << std::endl;         // output is 3
    std::cout << countOccurances(woErrors, data, true) << std::endl; // output should be 8

    return 0;
} // end main

// Counts the non-overlapping matches of regx in data.
int countOccurances(QRegExp &regx, QString &data, bool debug) {
    int count = 0; // occurrences
    int pos = 0;   // current position inside 'data'
    while ((pos = regx.indexIn(data, pos)) != -1) {
        ++count;
        // Note: if the pattern matches an empty string, matchedLength() is 0
        // and pos does not advance.
        pos += regx.matchedLength();

        if (debug) { // print captured texts to console for debugging
            const QStringList tmp = regx.capturedTexts();
            for (int i = 0; i < tmp.size(); ++i)
                std::cout << tmp.at(i).toStdString() << std::endl;
        } // end if
    } // end while
    if (debug) // print debug output separator
        std::cout << "+++++++++++++++++" << data.toStdString() << std::endl;

    return count;
} // end function





Output of captured texts


((?!E)SLP+|(?!B.)SLP+|SLP(?!A)+)
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP

((?!ESLP)+|(?!B.SLP)+|(?!SLPA)+)
expression unmatched- somehow causes infinite loop
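
A possible explanation for the loop, offered as an assumption rather than something confirmed in this thread: a pattern built only from lookaheads consumes no characters, so a successful match has length zero and `pos += regx.matchedLength()` never moves forward. The sketch below shows the zero-length match with `std::regex` instead of QRegExp, so it compiles without Qt; `firstMatchLength` is a hypothetical helper, and QRegExp is only assumed to behave the same way.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: length of the first match of `pattern` in `text`,
// or -1 if the pattern does not match anywhere.
int firstMatchLength(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re))
        return -1;
    // A lookahead-only pattern succeeds here without consuming anything,
    // so the reported length is 0.
    return static_cast<int>(m.length(0));
}
```

With this helper, `firstMatchLength("(?!ESLP)", "BRSLPM")` gives 0 while `firstMatchLength("SLP", "BRSLPM")` gives 3. A counting loop that advances by the match length would need to step at least one character whenever the length is 0.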

((?!ESLP)SLP+|(?!B.SLP)SLP+|(?!ASLP)SLP+) count is 14

SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP
SLP






Any tips for how I can get QRegExp to meet my error conditions? If any more info is needed, please ask. Thanks for the help.
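
One way the three conditions could be combined is sketched below, under stated assumptions. In `[^E]SLP[^A]` the two character classes each consume a character, so an occurrence at the very start or end of the string cannot match, and adjacent occurrences such as `SLPSLP` compete for the same neighbouring characters. A lookahead handles the "followed by A" rule without consuming anything. The "preceded by" rules are checked by hand here, because neither QRegExp nor `std::regex` offers lookbehind. The sketch uses `std::regex` so it builds without Qt; `countValidSlp` is a hypothetical name, and the same `SLP(?!A)` pattern is only assumed to carry over to QRegExp.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: counts "SLP" occurrences that are not followed by 'A',
// not directly preceded by 'E', and not preceded by 'B' plus one character.
int countValidSlp(const std::string& data) {
    // (?!A) rejects "SLPA" without consuming the next character, so
    // back-to-back occurrences such as "SLPSLP" are both still found.
    static const std::regex re("SLP(?!A)");
    int count = 0;
    for (std::sregex_iterator it(data.begin(), data.end(), re), end; it != end; ++it) {
        const std::size_t pos = static_cast<std::size_t>(it->position(0));
        // Bounds checks come first so the index never goes below zero.
        const bool afterE = pos >= 1 && data[pos - 1] == 'E';
        const bool afterB = pos >= 2 && data[pos - 2] == 'B';
        if (!afterE && !afterB)
            ++count;
    }
    return count;
}
```

Called on the string from the assignment, this returns 8, the same count as the character-by-character loop.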

wysota
14th September 2011, 16:10
Your search approach is a bit naive. If your array has n characters, you have to do n iterations over the array. That's too much; it is possible to do the search more efficiently (without using regular expressions).
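
The post does not say which technique is meant. As one illustration only, the sketch below lets `std::string::find` jump from one occurrence of `SLP` to the next instead of testing every index by hand, then checks the neighbouring characters. Whether this is actually faster depends on the standard library's `find` implementation; `countValidSlpFind` is a hypothetical name, and `std::string` stands in for QString so the code builds without Qt.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts valid "SLP" occurrences using std::string::find.
int countValidSlpFind(const std::string& data) {
    const std::string needle = "SLP";
    int count = 0;
    // "SLP" cannot overlap itself, so the search resumes right after each hit.
    for (std::size_t pos = data.find(needle); pos != std::string::npos;
         pos = data.find(needle, pos + needle.size())) {
        const bool afterE = pos >= 1 && data[pos - 1] == 'E';
        const bool afterB = pos >= 2 && data[pos - 2] == 'B';
        const std::size_t next = pos + needle.size();
        const bool beforeA = next < data.size() && data[next] == 'A';
        if (!afterE && !afterB && !beforeA)
            ++count;
    }
    return count;
}
```

For the assignment string this also yields 8.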

sparticus_37
14th September 2011, 17:37
"it is possible to do the search more efficiently (without using regular expressions)."

I am aware that there are more efficient ways to approach searching a string (most of which I don't know how to implement :( ). I just wanted to try it using regular expressions to see if I could get the same results. The program itself is for a class assignment, so I am not really concerned about efficiency, just getting the correct result.