PDA

View Full Version : Regular expression



QFreeCamellia
28th December 2011, 23:46
Hi all,
What is the regular expression that represents all of strings constructed with only characters of an other string.
For example:
input string: "abc"
output strings: "bca", "abc", "bac" ...

kunashir
29th December 2011, 06:58
Hello.
I may be wrong, but it seems so:
[abc]{3,}

ChrisW67
29th December 2011, 07:09
You seem to want a good permutation algorithm. What are you trying to achieve? Regular expressions are used for matching patterns in text, and occasionally modifying matched segments, not generating whole new streams of data. Are you looking for anagrams of a given word in another list of words?

QFreeCamellia
29th December 2011, 08:28
Are you looking for anagrams of a given word in another list of words?

Yes exactly, like anagrams.

ChrisW67
29th December 2011, 20:31
Kunashir's option will match strings like "aaa", "abb" etc. which are not anagrams of the original string.

I don't know that you can do this in the general case with a single regular expression without literally listing every permutation. In cases of repeated characters you are able to collapse the list:


"abc" => abc|acb|bac|bca|cab|cba // all the permutations
"abc" => a(bc|cb)|b(ac|ca)|c(ab|ba) // a different arrangement of all permutations

"bob" => bob|bbo|obb|obb|bbo|bob // all the permutations, note the repeats
"bob" => b(ob|bo)|obb // a simpler RE

The number of permutations rises rapidly with string length also. You may quickly find the limits of the regular expression engine.

stampede
29th December 2011, 21:39
To check if string is anagram of given input you can always use brute-force algorithm that removes each character of input from string. In the end, if you are left with empty string, then it was a permutation of input characters. In pseudo-code:


bool isAnagram( baseWord, toTest ){
if( baseWord.length() != toTest.length() ) return false; // check obvious condition - cannot be a permutation if numbers of elements are not equal
foreach( char c, baseWord ){
toTest.removeExactlyOne(c);
}
return toTest.isEmpty();
}

Not very good complexity ( O(N^2), N - number of chars in string ), but its better than nothing.

kunashir
30th December 2011, 05:35
ChirsW67, let me disagree with you.
[abc] - It is any symbol from range ('a','b','c'), and repeated three times.
I check it in "RegExp Planner".

stampede
30th December 2011, 05:53
@kunashir:
I think ChrisW67 is right:


#include <QDebug>
#include <QRegExp>

int main(int argc, char **argv) {
QRegExp ex("[abc]{3,}");
qDebug() << ex.exactMatch("aaa");
return 0;
}

prints 'true'.

Another algorithm I can think of, is to use O( n log(n) ) sorting method, eg. merge sort, something like:


bool isAnagram(baseWord, toTest){
if( baseWord.length() == toTest.length() ){
return sort(baseWord) == sort(toTest);
}
return false;
}

Complexity of this algorithm is the same as sorting method (string equality test should be linear), so its better than my previous suggestion.

ChrisW67
30th December 2011, 22:34
@kunashir: Your regular expression is perfectly fine but it doesn't quite answer the question posed. An anagram uses all the letters of the original word in another order.


There might be something of use here (http://stackoverflow.com/questions/7458319/is-it-possible-to-generate-a-compact-regular-expression-for-an-anagram-of-an-a). I haven't tested that code though.