QString - Where is find_first_not_of? [Archive]

lukass

18th May 2011, 19:46

Hello,

1) Is there any QString function equivalent to

std::string::find_first_of(const std::string & str, 0);
?
I want only 'true' or 'false' return value. I don't want position.

2) Yes, I know there are QRegExp, but I want fastest solution.

3) This is my code:

bool find_first_not_of(const QString & chars, const QString & text)
{
//return true if text not contains any of char in chars
QString::const_iterator it = text.constBegin();
QString::const_iterator end = text.constEnd();
while(it != end)
{
if(chars.contains(*it) == false)
return true;
++it;
}

return false;
}
This is code with QRegExp:

bool reg_find_first_not_of(const QString & chars, const QString & text)
{
const QRegExp r("[" + QRegExp::escape(chars) + "]+");
return !r.exactMatch(text);
}
Which code is faster?

Lykurg

18th May 2011, 19:54

I am not aware of such a function in QString, but why do you don't use find_fist_of? Have you seen QString::toStdString()?

I would bet, that QRegExp is faster, but why do you not simply run a small benchmark to see what function is faster?

lukass

18th May 2011, 20:02

I am not aware of such a function in QString, but why do you don't use find_fist_of? Have you seen QString::toStdString()?
Yes, I have seen QString::toStdString(), but I want avoid conversion because:
1) speed,
2) unicode characters.

I would bet, that QRegExp is faster, but why do you not simply run a small benchmark to see what function is faster?
This is another redesign of the wheel, but OK, I do it! :)

Santosh Reddy

18th May 2011, 20:02

Try this, you can get the std::string from QString, and then operate on it

QString text;
std::string str;

text.toStdString().find_first_of(str, 0);

lukass

18th May 2011, 21:14

Try this, you can get the std::string from QString, and then operate on it

QString text;
std::string str;

text.toStdString().find_first_of(str, 0);

See my #3 post (http://www.qtcentre.org/threads/41725-QString-Where-is-find_first_not_of?p=190983#post190983).

Added after 15 minutes:

OK folks, I do little benchmark, and winner is:
-first function.

Fist function:

real 0m0.100s
user 0m0.060s
sys 0m0.013s
Second function (with QRegExp):

real 0m0.146s
user 0m0.100s
sys 0m0.020s

wysota

18th May 2011, 22:21

Hello,

1) Is there any QString function equivalent to

std::string::find_first_of(const std::string & str, 0);
?
I want only 'true' or 'false' return value. I don't want position.
So compare the result you get to -1 and you'll have your true or false.

2) Yes, I know there are QRegExp, but I want fastest solution.
What do you consider "fastest"?

3) This is my code:

bool find_first_not_of(const QString & chars, const QString & text)
{
//return true if text not contains any of char in chars
QString::const_iterator it = text.constBegin();
QString::const_iterator end = text.constEnd();
while(it != end)
{
if(chars.contains(*it) == false)
return true;
++it;
}

return false;
}
This operation is O(n*m) regardless of how you implement it.

This is code with QRegExp:

bool reg_find_first_not_of(const QString & chars, const QString & text)
{
const QRegExp r("[" + QRegExp::escape(chars) + "]+");
return !r.exactMatch(text);
}
Which code is faster?
They are both O(n*m) with the additional overhead in case of the regexp that you need to compile the expression.

If you run the same code twice or more, the regular expression will be faster (if it is compiled only once). Testing it once (as you did) is not enough. For a single call the difference is neglectible anyway.

lukass

18th May 2011, 22:43

If you run the same code twice or more, the regular expression will be faster (if it is compiled only once). Testing it once (as you did) is not enough. For a single call the difference is neglectible anyway.

I run four times both functions:
-first:

//[...]
const QString chars =
",abcdefghijklmnopqrstuwvxyzABCDEFGH IJKLMNOPRSTUWQVXY.Z";

bool b = find_first_not_of(chars, txt);
b = find_first_not_of(chars, txt);
b = find_first_not_of(chars, txt);
b = find_first_not_of(chars, txt);

-second:

//[...]
const QString chars =
",abcdefghijklmnopqrstuwvxyzABCDEFGH IJKLMNOPRSTUWQVXY.Z";

bool b = reg_find_first_not_of(chars, txt);
b = reg_find_first_not_of(chars, txt);
b = reg_find_first_not_of(chars, txt);
b = reg_find_first_not_of(chars, txt);

Time results:
-first function:

real 0m0.185s
user 0m0.147s
sys 0m0.010s
-second function:

real 0m0.376s
user 0m0.330s
sys 0m0.013s

SixDegrees

18th May 2011, 23:07

You probably shouldn't be using the 'time' function for this; it measures the execution time of the entire application, not just the function in question, and there is overhead associated with program startup, particularly loading of dynamic libraries, that will vary a good deal from one to the other but only incurs a one-time penalty. Loading the system's regex library or the appropriate C++ string handling library, for example, would be one of these. This has nothing to do with the actual function execution time.

If you want accurate timing, you'll need to either insert system calls within your program, or run it with a profiler that does such things automatically.

-------------------

Also, I'm not convinced that looking for the first occurrence of a character not in the given set is faster than looking for the first occurrence of a character in the complementary set. Note, too, that a successful search can probably be coerced to terminate faster if the character set being tested against is ordered so more likely characters occur earlier; a strict alphabetical ordering is almost certainly not optimal. There are character frequency tables scattered around the Web for English and other languages that would probably yield better results, and would be no worse than the sub-optimal ordering you're using now in the worst case.

On your current code: you're compiling the regex every time you call your function. This consumes a huge amount of time - much more than the actual comparison operation - and it is only necessary to do it once as long as the expression doesn't change.

wysota

18th May 2011, 23:34

I run four times both functions
But you are recompiling the regular expression with each run.

Oh... SixDegrees already said that :)

Anyway, have a look at QStringMatcher and QByteArrayMatcher. And if you're not interested in Unicode, then use QByteArray, it will be much faster.

ChrisW67

18th May 2011, 23:40

My first observation is that the repetition '+' is not required in your regular expression. You are looking only for the first single character:

const QRegExp r("[" + QRegExp::escape(chars) + "]");

Are you always testing for characters that are not in the range A-Z, a-z, a comma, space or period? If so, you can dramatically speed up the regular expression:

const QRegExp r("[^A-Za-z,. ]");

In a test on my machine over a million calls, timed using QTime::elapsed(), I get
4457 ms, reg exp built with chars as is
1741 ms, reg exp built with char ranges; and
338 ms, reg exp built with ranges and made static.

QString test("This is the test string and contains a 1 and 2 that are not in the chars");
QString chars = ",abcdefghijklmnopqrstuwvxyzABCDEFGH IJKLMNOPRSTUWQVXY.Z";

Performance is very strongly affected by variability of the inputs and depends on your exact requirements.

wysota

19th May 2011, 00:11

You can further increase the speed if you notice that A-Za-z forms a single range and use QByteArray so that you only work with ASCII. I'd also like to see a speed comparison with strchr() and SIMD CPU instructions.

ChrisW67

19th May 2011, 09:05

You can further increase the speed if you notice that A-Za-z forms a single range
Not in my ASCII table it doesn't.

wysota

19th May 2011, 09:35

Not in my ASCII table it doesn't.

Aah... true :)

lukass

19th May 2011, 10:47

You probably shouldn't be using the 'time' function for this; it measures the execution time of the entire application, not just the function in question, and there is overhead associated with program startup, particularly loading of dynamic libraries, that will vary a good deal from one to the other but only incurs a one-time penalty. Loading the system's regex library or the appropriate C++ string handling library, for example, would be one of these. This has nothing to do with the actual function execution time.

If you want accurate timing, you'll need to either insert system calls within your program, or run it with a profiler that does such things automatically.

-------------------

Also, I'm not convinced that looking for the first occurrence of a character not in the given set is faster than looking for the first occurrence of a character in the complementary set. Note, too, that a successful search can probably be coerced to terminate faster if the character set being tested against is ordered so more likely characters occur earlier; a strict alphabetical ordering is almost certainly not optimal. There are character frequency tables scattered around the Web for English and other languages that would probably yield better results, and would be no worse than the sub-optimal ordering you're using now in the worst case.

On your current code: you're compiling the regex every time you call your function. This consumes a huge amount of time - much more than the actual comparison operation - and it is only necessary to do it once as long as the expression doesn't change.
@SixDegrees: Thanks for the tips!

Anyway, have a look at QStringMatcher and QByteArrayMatcher. And if you're not interested in Unicode, then use QByteArray, it will be much faster.
QStringMatcher? QByteArrayMatcher? But I match for characters not for strings.

My first observation is that the repetition '+' is not required in your regular expression. You are looking only for the first single character:

const QRegExp r("[" + QRegExp::escape(chars) + "]");

Are you always testing for characters that are not in the range A-Z, a-z, a comma, space or period? If so, you can dramatically speed up the regular expression:

const QRegExp r("[^A-Za-z,. ]");

No, Your terms:

const QRegExp r("[" + QRegExp::escape(chars) + "]");
const QRegExp r("[^A-Za-z,. ]");
give me wrong results.

The correct terms are:

const QRegExp r("[" + QRegExp::escape(chars) + "]+");
const QRegExp r("[A-Za-z\\,\\.\\ ]+");

This is my new benchmark:

#include <QApplication>
#include <QRegExp>
#include <QTime>
#include <iostream>

const QString txt =
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
;

bool find_first_not_of(const QString & chars, const QString & text)
{
//return true if text not contains any of char in chars
QString::const_iterator it = text.constBegin();
QString::const_iterator end = text.constEnd();
while(it != end)
{
if(chars.contains(*it) == false)
return true;
++it;
}

return false;
}

QRegExp regexp;
bool reg_find_first_not_of(const QString & /*chars*/, const QString & text)
{
return !regexp.exactMatch(text);
}

int main(int argc, char *argv[])
{
QApplication a(argc, argv);

QTime t;

const QString chars =
",abcdefghijklmnopqrstuwvxyzABCDEFGH IJKLMNOPRSTUWQVXY.Z";

//regexp.setPattern("[" + QRegExp::escape(chars) + "]+");
regexp.setPattern("[A-Za-z\\,\\.\\ ]+");

bool f1;
bool f2;

f1 = find_first_not_of(chars, txt);
t.start();
for(int i = 0; i < 5000; ++i)
f1 = find_first_not_of(chars, txt);
const int e1 = t.elapsed();

f2 = reg_find_first_not_of(chars, txt);
t.start();
for(int i = 0; i < 5000; ++i)
f2 = reg_find_first_not_of(chars, txt);
const int e2 = t.elapsed();

std::cout << "elapsed for f1 : " << e1 << "ms" << std::endl;
std::cout << "elapsed for f2 QRegExp: " << e2 << "ms" << std::endl;
std::cout << "f1 returns: " << f1 << std::endl;
std::cout << "f2 returns: " << f2 << std::endl;

return 0;
}
And my results:

elapsed for f1 : 683ms
elapsed for f2 QRegExp: 1360ms
f1 returns: 0
f2 returns: 0
As you can see: f2 - QRegExp is slow turtle again.

wysota

19th May 2011, 12:11

QStringMatcher? QByteArrayMatcher? But I match for characters not for strings.
I said "look at" and not "use". Qt is open source, have a look at the source code and implement something that fits your needs.

No, Your terms:

const QRegExp r("[" + QRegExp::escape(chars) + "]");
const QRegExp r("[^A-Za-z,. ]");
give me wrong results.

The correct terms are:

const QRegExp r("[" + QRegExp::escape(chars) + "]+");
const QRegExp r("[A-Za-z\\,\\.\\ ]+");
These are two totally different expressions. One is an "accepting" expression the other is a "rejecting" one so the function logic has to differ, you can't just replace the expresions.

This is my new benchmark:
(...)
As you can see: f2 - QRegExp is slow turtle again.
Using QTime::elapsed() for a benchmark is a silly thing to do.

To me it seems the best solution would be some sort of radix search using MMX or similar. For multicore/manycore a solution using multiple threads would probably be fastest. OpenCL would also handle it nicely.

lukass

19th May 2011, 12:42

To me it seems the best solution would be some sort of radix search using MMX or similar. For multicore/manycore a solution using multiple threads would probably be fastest. OpenCL would also handle it nicely.
No, I don't need such low level solution, but thanks for tips.

I changed my benchmark to use QSet for characters:

#include <QApplication>
#include <QRegExp>
#include <QTime>
#include <iostream>
#include <QSet>

const QString txt =
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
;

bool find_first_not_of(const QString & chars, const QString & text)
{
//return true if text not contains any of char in chars
QString::const_iterator it = text.constBegin();
QString::const_iterator end = text.constEnd();
while(it != end)
{
if(chars.contains(*it) == false)
return true;
++it;
}

return false;
}

bool find_first_not_of(const QSet<QChar> & chars, const QString & text)
{
//return true if text not contains any of char in chars
QString::const_iterator it = text.constBegin();
QString::const_iterator end = text.constEnd();
while(it != end)
{
if(chars.contains(*it) == false)
return true;
++it;
}

return false;
}

QRegExp regexp;
bool reg_find_first_not_of(const QString & /*chars*/, const QString & text)
{
return !regexp.exactMatch(text);
}

int main(int argc, char *argv[])
{
QApplication a(argc, argv);

QTime t;

const QString chars =
",abcdefghijklmnopqrstuwvxyzABCDEFGH IJKLMNOPRSTUWQVXY.Z";

QSet<QChar> chars_set;
for(int i = 0; i < chars.size(); ++i)
chars_set.insert(chars.at(i));

//regexp.setPattern("[" + QRegExp::escape(chars) + "]+");
regexp.setPattern("[A-Za-z\\,\\.\\ ]+");

bool f1;
bool f2;

f1 = find_first_not_of(chars_set, txt);
t.start();
for(int i = 0; i < 5000; ++i)
f1 = find_first_not_of(chars_set, txt);
const int e1 = t.elapsed();

f2 = reg_find_first_not_of(chars, txt);
t.start();
for(int i = 0; i < 5000; ++i)
f2 = reg_find_first_not_of(chars, txt);
const int e2 = t.elapsed();

std::cout << "elapsed for f1 : " << e1 << "ms" << std::endl;
std::cout << "elapsed for f2 QRegExp: " << e2 << "ms" << std::endl;
std::cout << "f1 returns: " << f1 << std::endl;
std::cout << "f2 returns: " << f2 << std::endl;

return 0;
}

and results are better:

elapsed for f1 : 412ms
elapsed for f2 QRegExp: 1373ms
f1 returns: 0
f2 returns: 0

wysota

19th May 2011, 12:48

No, I don't need such low level solution, but thanks for tips.
Why do you think those solutions are low-level?

lukass

19th May 2011, 13:04

Why do you think those solutions are low-level?
Because you wrote 'MMX'. ;)

I need such function as 'find_first_not_of' for validating substrings of IRC messages in real time.

wysota

19th May 2011, 13:30

Here is a test using radix search vs naive search, I think the results are quite impressive.

#include <QtDebug>
#include <QByteArray>
#include <qtest.h>

struct TrieNode {
TrieNode *left;
TrieNode *right;
TrieNode(){ left = right = 0; }
~TrieNode() { delete left; delete right; }
};

TrieNode* buildTrie(const QByteArray& chars) {
TrieNode *root = new TrieNode;
const int s = chars.size();
for(int i=0;i<s;++i){
TrieNode *current = root;
register int level = 8;
register int ch = chars.at(i);
while(level-->0) {
int bit = ch & 0x1;
ch = ch >> 1;
if(bit==0){
if(!current->left)
current->left = new TrieNode;
current = current->left;
} else {
if(!current->right)
current->right = new TrieNode;
current = current->right;
}
}

}
return root;
}

bool inTrie(TrieNode *trie, char ch) {
int level = 8;
while(trie && level-->0) {
int bit = ch & 0x1;
ch >>=1;
trie = bit ? trie->right : trie->left;
}
if(!trie) return false;
return true;
}

int findFirstNotOf(TrieNode *trie, const QByteArray &haystack){
const int s = haystack.size();
for(int i=0;i<s;++i){
char ch = haystack.at(i);
if(!inTrie(trie, ch)) return i;
}
return -1;
}

int findFirstNotOf(const QByteArray &needle, const QByteArray &haystack) {
const int hs = haystack.size();
for(int i=0;i<hs;++i){
if(!needle.contains(haystack.at(i)))
return i;
}
return -1;
}

class Benchmark : public QObject {
Q_OBJECT
public:
Benchmark() {
ba="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwx yz,. ";
haystack = "ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam";

}
private slots:
void radixTest() {
TrieNode *trie = buildTrie(ba);
QBENCHMARK {
findFirstNotOf(trie, haystack);
}
delete trie;
}
void naiveTest() {
QBENCHMARK {
findFirstNotOf(ba, haystack);
}
}
private:
QByteArray ba, haystack;
};

QTEST_MAIN(Benchmark)

#include "main.moc"

And results:

********* Start testing of Benchmark *********
Config: Using QTest library 4.7.2, Qt 4.7.2
PASS : Benchmark::initTestCase()
RESULT : Benchmark::radixTest():
0.0000045 msecs per iteration (total: 76, iterations: 16777216)
PASS : Benchmark::radixTest()
RESULT : Benchmark::naiveTest():
0.0076 msecs per iteration (total: 63, iterations: 8192)
PASS : Benchmark::naiveTest()
PASS : Benchmark::cleanupTestCase()
Totals: 4 passed, 0 failed, 0 skipped
********* Finished testing of Benchmark *********

Anyone wants to beat that?

The larger the needle, the better the radix trie will perform against other algorithms. The performance would be even better with SIMD approach.

Edit: Actually it's more a binary search rather than radix search since the nodes are not compacted when having only one child.

lukass

19th May 2011, 15:37

Anyone wants to beat that?
Not sure, but I have probably faster code:
- I added unsigned to 'int' variables where are bitwise operators - right shift, because operation right shift on signed variable has portability issues,
- changed haystack to QString,
- added QSet instead of QByteArray needle to naiveTest.

Look at this:

#include <QtDebug>
#include <QByteArray>
#include <QTest>

struct TrieNode {
TrieNode *left;
TrieNode *right;
TrieNode(){ left = right = 0; }
~TrieNode() { delete left; delete right; }
};

TrieNode* buildTrie(const QByteArray& chars) {
TrieNode *root = new TrieNode;
const int s = chars.size();
for(int i=0;i<s;++i){
TrieNode *current = root;
register int level = 8;
register unsigned int ch = chars.at(i);
while(level-->0) {
int bit = ch & 0x1;
ch = ch >> 1;
if(bit==0){
if(!current->left)
current->left = new TrieNode;
current = current->left;
} else {
if(!current->right)
current->right = new TrieNode;
current = current->right;
}
}

}
return root;
}

bool inTrie(TrieNode *trie, unsigned char ch) {
int level = 8;
while(trie && level-->0) {
int bit = ch & 0x1;
ch >>=1;
trie = bit ? trie->right : trie->left;
}
if(!trie) return false;
return true;
}

int findFirstNotOf(TrieNode *trie, const QString & hs){
const QByteArray haystack = hs.toAscii();
const int s = haystack.size();
for(int i=0;i<s;++i){
char ch = haystack.at(i);
if(!inTrie(trie, ch)) return i;
}
return -1;
}

int findFirstNotOf(const QSet<char> & needle, const QString & hs) {
const QByteArray haystack = hs.toAscii();
const int s = haystack.size();
for(int i=0;i<s;++i){
if(!needle.contains(haystack.at(i)))
return i;
}
return -1;
}

class Benchmark : public QObject {
Q_OBJECT
public:
Benchmark() {
ba="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwx yz,. ";

haystack =
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
"ac, hendrerit nec justo. Nullam tristique consequat sem, id placerat eros egestas vitae. Donec imperdiet, metus id feugiat rhoncus, sem risus condimentum diam, eget volutpat diam dolor in dui. Proin mi quam,"
;
}
private slots:
void radixTest() {
TrieNode *trie = buildTrie(ba);
QBENCHMARK {
findFirstNotOf(trie, haystack);
}
delete trie;
}
void naiveTest() {
QSet<char> chars_set;
for(int i = 0; i < ba.size(); ++i)
chars_set.insert(ba.at(i));

QBENCHMARK {
findFirstNotOf(chars_set, haystack);
}
}
private:
QByteArray ba;
QString haystack;
};

QTEST_MAIN(Benchmark)

#include "main.moc"

And results:

********* Start testing of Benchmark *********
Config: Using QTest library 4.7.3, Qt 4.7.3
PASS : Benchmark::initTestCase()
RESULT : Benchmark::radixTest():
0.12 msecs per iteration (total: 63, iterations: 512)
PASS : Benchmark::radixTest()
RESULT : Benchmark::naiveTest():
0.088 msecs per iteration (total: 91, iterations: 1024)
PASS : Benchmark::naiveTest()
PASS : Benchmark::cleanupTestCase()
Totals: 4 passed, 0 failed, 0 skipped
********* Finished testing of Benchmark *********

wysota

19th May 2011, 16:17

QSet also uses a tree, the results should be (and are) comparable. The exact numbers may differ depending on the contents of needle and haystack. QSet efficiency will depend on the speed of qHash(char), my implementation depends on the distribution of lower bit values of symbols in the needle.

lukass

19th May 2011, 16:36

QSet also uses a tree, the results should be (and are) comparable. The exact numbers may differ depending on the contents of needle and haystack. QSet efficiency will depend on the speed of qHash(char), my implementation depends on the distribution of lower bit values of symbols in the needle.
Anyway my small implementation with QSet is better. ;)
Anyone wants to beat that?

wysota

19th May 2011, 21:10

It's not better, it's comparable. I can easily find a set of variables that will behave better with the other implementation.