PDA

View Full Version : Load in memory the content of a database



franco.amato
29th December 2010, 22:22
Good morning,
I'm developing a biometric system based on iris recognition and the templates of every user are stored in a database. Unfortunately I have no experience with databases.
I would like to know whether there is a way to read the content of the database into memory in one shot (it's possible in VB, IIRC) instead of looping over every database entry.
For example in my case every entry has 4 fields:


user id - QString
iris code 1 - uchar array[1560]
iris code 2 - uchar array[1560]
iris code 3 - uchar array[1560]

This is because if there are too many entries in the database, it can take a long time to loop over every one.

Best Regards,
Franco

javimoya
29th December 2010, 23:14
your request is annoying...

> I'm developing a biometric system based on iris recognition
impressive !
>I have no experience with databases.
just read 10 minutes of documentation... you don't need more!

you should use QSQLITE (sqlite)
(search in qt assistant for "SQL Database drivers")
You can have that kind of database in memory:

SQLite also supports in-memory databases, simply pass ":memory:" as the database name.

squidge
29th December 2010, 23:35
You really should read some documentation before continuing.

Using a database to store all the information and then ignoring all the benefits of that database and doing all the work yourself (such as searching for specific data) is just nuts. As is looping over every database entry. Design your database schema first before writing a single line of code. Decide how it will all work. If you don't, it'll only come back to bite you later on.

ChrisW67
30th December 2010, 00:06
How do you need to query the database? Are you looking for an entry by userid or are you trying to find the entry that matches an iris scan (i.e. searching in the binary blobs)? The first option is trivial and any RDBMS will do this with little to no effort on your part (certainly no programmatic iteration over the whole data set). The second option may be more difficult.

franco.amato
30th December 2010, 16:58
How do you need to query the database? Are you looking for an entry by userid or are you trying to find the entry that matches an iris scan (i.e. searching in the binary blobs)? The first option is trivial and any RDBMS will do this with little to no effort on your part (certainly no programmatic iteration over the whole data set). The second option may be more difficult.

Hi Chris,
It's the second option. I need to scan every iris_code (3 per user) and perform a matching. I thought of loading all the iris codes into memory and doing the processing there, but maybe I'm wrong.
I have zero experience with databases.
Best Regards,
Franco

pkohut
30th December 2010, 17:38
Good morning,
I'm developing a biometric system based on iris recognition and the templates of every user are stored in a database.

Is there an upper limit to the number of stored templates? If all the templates are in memory, how long do you guess a query would take?

franco.amato
30th December 2010, 18:33
Is there an upper limit to the number of stored templates? If all the templates are in memory, how long do you guess a query would take?

Hi, no,
there is no upper limit.

pkohut
30th December 2010, 19:24
Get a database management tool for your database system. Using it you can form queries to the database in an interactive way. Once you discover what queries need to be made to get the results you want, translate those queries to your application. If your database is SQL Server based, then MS has the free "Microsoft SQL Server Management Studio Express"; get the version that matches your server (2005, 2008).

What database system do you use? Someone here can probably point you to a good management tool for your particular system.
http://en.wikipedia.org/wiki/Comparison_of_database_tools

franco.amato
30th December 2010, 19:28
Get a database management tool for your database system. Using it you can form queries to the database in an interactive way. Once you discover what queries need to be made to get the results you want, translate those queries to your application. If your database is SQL Server based, then MS has the free "Microsoft SQL Server Management Studio Express"; get the version that matches your server (2005, 2008).

What database system do you use? Someone here can probably point you to a good management tool for your particular system.
http://en.wikipedia.org/wiki/Comparison_of_database_tools

Hi,
I already know which query I have to do and this was not my initial question.
Regards,
Franco

pkohut
30th December 2010, 20:25
Hi,
I already know which query I have to do and this was not my initial question.
Regards,
Franco

If there is no upper limit to the number of database entries, then "to read in memory the content of the database in 1 shot" is not a solution. It's possible that the current query can be reformed to work more efficiently with the db. Playing what-if queries in an interactive way can help with the discovery.

I don't know what your query is or the db structure. From the description it sounds like the query is doing a brute-force search. If so, then maybe this simple, far-from-optimized approach could be better than the current one. There are iris codes 1 - 3; let's call them x, y, z.
1) Get the collection of everything matching x.
2) From the previous result set, get the collection of everything matching y.
3) From the previous result set, get the collection of everything matching z.
4) You should have one result left in the collection.
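The narrowing steps above might be sketched like this in C++ (the `Record` struct and the exact-equality comparison are illustrative assumptions; as later replies point out, real iris codes need a distance threshold rather than equality):

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

struct Record {
    std::string userId;
    std::vector<unsigned char> x, y, z; // iris codes 1-3
};

// Keep only the records whose selected code matches the candidate exactly.
template <typename Getter>
std::vector<Record> narrow(const std::vector<Record> &in,
                           const std::vector<unsigned char> &candidate,
                           Getter get)
{
    std::vector<Record> out;
    std::copy_if(in.begin(), in.end(), std::back_inserter(out),
                 [&](const Record &r) { return get(r) == candidate; });
    return out;
}

// Steps 1-4: narrow on x, then y, then z; ideally one record remains.
std::vector<Record> matchAll(std::vector<Record> records,
                             const std::vector<unsigned char> &cx,
                             const std::vector<unsigned char> &cy,
                             const std::vector<unsigned char> &cz)
{
    records = narrow(records, cx, [](const Record &r) { return r.x; });
    records = narrow(records, cy, [](const Record &r) { return r.y; });
    records = narrow(records, cz, [](const Record &r) { return r.z; });
    return records;
}
```

In a real database each narrowing step would be a WHERE clause rather than an in-memory filter, but the shape of the computation is the same.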

wysota
30th December 2010, 20:32
Hi Chris,
Is the second option. I need to scan every iris_code ( 3 per user ) and perform a matching. I thought to load all iris codes in memory and doing the process in memory but maybe I'm wrong.
I have zero experience with databases.

To be honest such things are not done like that. This is a special case of image classification, and if you want to do it fast you should use an algorithm dedicated to it. While searching three scans for ten people is not a big effort, when your database grows, the complexity of the problem grows in a non-linear manner. So basically you should either "pre-teach" your system with a static set of images (for example using neural networks), which reduces the classification complexity to practically O(1), or use some other kind of clustering & classification heuristics to match the scans. Especially since you will never get two identical scans of the same iris due to changing environmental conditions (such as ambient light), you will have to work on "features" and not pixel-by-pixel comparison anyway. I don't see how a database is helpful in any of this unless the matching algorithm is implemented as a stored procedure for it (then it's just a matter of performing a single SELECT query to find the id of the person matching the scan).

As for what you wrote about VB - the fact that you need one line of code in your program to load all records from the database into memory doesn't mean that internally the engine is not looping over the records. Whether you write one line of code or four, the same operation is performed internally, so the one-line version is not "faster" than the four-line version.

ChrisW67
30th December 2010, 20:49
Here is an example that creates an Sqlite database with 100 test rows and a series of queries against it.

#include <QtCore>
#include <QtSql>
#include <QDebug>

void createTestData()
{
    QSqlDatabase db = QSqlDatabase::addDatabase("QSQLITE");
    db.setDatabaseName("test.db");
    if (db.open()) {
        QSqlQuery query;
        query.exec("create table irisdata ("
                   "userid varchar(20), "
                   "iriscode1 blob, "
                   "iriscode2 blob, "
                   "iriscode3 blob, "
                   "primary key(userid) )");
        if (query.prepare("insert into irisdata ("
                          "userid, iriscode1, iriscode2, iriscode3) "
                          "values (?, ?, ?, ?)")) {
            for (int i = 0; i < 100; ++i) {
                QByteArray blobData1(i+1, 'a');
                QByteArray blobData2(i+1, 'b');
                QByteArray blobData3(i+1, 0x83);
                query.bindValue(0, QString("User %1").arg(i));
                query.bindValue(1, blobData1);
                query.bindValue(2, blobData2);
                query.bindValue(3, blobData3);
                query.exec();
            }
        }
    }
}

void queryTestData(const QByteArray &ba)
{
    QSqlDatabase db = QSqlDatabase::database();
    QSqlQuery query;
    if (query.prepare("select userid from irisdata "
                      "where iriscode1 = ? OR iriscode2 = ? OR iriscode3 = ?")) {
        query.bindValue(0, ba);
        query.bindValue(1, ba);
        query.bindValue(2, ba);
        if (query.exec()) {
            while (query.next())
                qDebug() << "Found" << query.value(0).toString();
        }
    }
}

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    createTestData();
    queryTestData(QByteArray(3, 'a'));   // user 2
    queryTestData(QByteArray(5, 'b'));   // user 4
    queryTestData(QByteArray(10, 0x83)); // user 9
    queryTestData(QByteArray(10, 0x00)); // does not exist
}

Should work against a Mysql or other database also.

SixDegrees
30th December 2010, 21:06
This is because if there are too many entries in the database, it can take a long time to loop over every one.


I have no experience with databases

If you don't know anything about databases, how do you know if the one you're using is too slow in this case?

More to the point: why are you trying to code a particular solution if you know nothing about how to implement the solution you've chosen?

franco.amato
31st December 2010, 06:47
To be honest such things are not done like that. This is a special case of image classification and if you want to do it fast you should use an algorithm dedicated for it. [...]

Hi Wysota.
I don't know how a neural network can help me. In the database I don't store iris images, only the extracted, encoded iris code.
Regards,


If you don't know anything about databases, how do you know if the one you're using is too slow in this case?

More to the point: why are you trying to code a particular solution if you know nothing about how to implement the solution you've chosen?

SixDegrees, I was part of the development of the iris recognition kernel, and now I'd like to implement a demo showing the results of the matching.
So my 'problem' is looping through the iris codes stored in the database.
Regards

Added after 10 minutes:


Here is an example that creates an Sqlite database with 100 test rows and a series of queries against it. [...] Should work against a Mysql or other database also.

Hi Chris,
your code is very interesting but the 'queryTestData' will not work because an iris code will never be exactly the same. A match is determined by calculating the Hamming distance between two iris codes. Something like this:

if ( hamming_distance( iris_code1, iris_code2 ) < threshold )
    qDebug() << " MATCH ";
else
    qDebug() << " NO MATCH ";


So I think it's mandatory to scan every code stored in the database.

Best Regards
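For reference, a byte-wise Hamming distance over two equal-length codes (such as the uchar[1560] arrays mentioned at the start of the thread) might be sketched like this; the pointer-and-length signature is an illustrative assumption, not the poster's actual routine:

```cpp
#include <cstddef>

// Number of differing bits between two equal-length byte arrays.
std::size_t hamming_distance(const unsigned char *a,
                             const unsigned char *b,
                             std::size_t len)
{
    std::size_t dist = 0;
    for (std::size_t i = 0; i < len; ++i) {
        unsigned char diff = a[i] ^ b[i];
        while (diff) {          // Kernighan's trick: clear lowest set bit
            diff &= diff - 1;
            ++dist;
        }
    }
    return dist;
}
```

The match test then compares the returned distance against a tuning threshold, as in the pseudocode above.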

ChrisW67
31st December 2010, 08:55
Are you calculating the hamming distance between the values in two columns of the same record, or between a candidate iris code and each of three columns in each record? If it is the former then you only ever need to do this once and store the result.

Have you profiled your code to make sure that it is not the hamming distance algorithm that is chewing the time? Don't waste time optimising something that is not the biggest problem.

It is possible to provide an extension to Sqlite to give it the hamming_distance function it does not currently have. This will be more efficient than dragging the data through Qt's SQL layers, but is obviously another bunch of APIs to learn.

pkohut
31st December 2010, 09:41
Hi Chris, your code is very interesting but the 'queryTestData' will not work because an iris code will never be the same. [...] So I think it's mandatory to scan every code stored in the database.

Based on the information you've provided, I'd look into storing a spatial index in the database. Queries could then be very fast. This link might be of use (refer to the accepted answer).
http://stackoverflow.com/questions/3505674/comparing-sift-features-stored-in-a-mysql-database

Given the luxury of having the whole spatial index in memory, look into adapting the kd-tree algorithm to the data set. Iris codes 1 - 3 are treated as X, Y, Z coordinates, and distance calculations would use the Hamming metric. Then nearest-neighbour and range queries could be performed.
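As a baseline for comparison, the brute-force nearest-neighbour lookup that a kd-tree or other spatial index would accelerate might be sketched like this (the byte-vector representation of a code is an illustrative assumption):

```cpp
#include <cstddef>
#include <vector>

// Bits differing between two equal-length byte vectors.
static std::size_t hamming(const std::vector<unsigned char> &a,
                           const std::vector<unsigned char> &b)
{
    std::size_t d = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (unsigned char x = a[i] ^ b[i]; x; x &= x - 1)
            ++d;
    return d;
}

// Index of the stored code closest to the candidate, or -1 if the set is empty.
// This is the O(n) scan an index structure would aim to avoid.
int nearest(const std::vector<std::vector<unsigned char>> &codes,
            const std::vector<unsigned char> &candidate)
{
    int best = -1;
    std::size_t bestDist = ~static_cast<std::size_t>(0);
    for (std::size_t i = 0; i < codes.size(); ++i) {
        std::size_t d = hamming(codes[i], candidate);
        if (d < bestDist) {
            bestDist = d;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Note that classic kd-trees assume coordinate-wise ordering, so adapting one to a Hamming metric is non-trivial; structures designed for metric spaces (e.g. BK-trees or vantage-point trees) may be a better fit.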

wysota
31st December 2010, 10:43
I don't know how a neural network can help me. IN the database I don't store iris images but only the extracted encoded iris code.
Based on the features (the "iris code", as you call it), it immediately classifies an unknown feature vector into a pre-learned set of feature sets (iris scans) without the need to loop over each iris record and perform the comparison. Then you don't even have to load the scans from the database, or you load just one or two to confirm the results of the algorithm.

franco.amato
31st December 2010, 18:17
Are you calculating the hamming distance between the values in two columns of the same record, or between a candidate iris code and each of three columns in each record?
It's between a candidate and each of the three columns in each record.


If it is the former then you only ever need do this once and store the result.
I didn't get you.


Have you profiled your code to make sure that it is not the hamming distance algorithm that is chewing the time? Don't waste time optimising something that is not the biggest problem.
The matching routine is really fast.
My initial doubt was whether I should copy all database entries into memory and perform the matching there instead of doing n queries against the database.


It is possible to provide an extension to Sqlite to give it the hamming_distance function it does not currently have. This is will be more efficient than dragging the data through Qt's Sql layers but is obviously another bunch of APIs to learn.
Here we're using Microsoft SQL Server Express 2005.
Regards

wysota
31st December 2010, 18:35
Here we're using microsoft sql express 2005.
I'm sure it can be extended with custom functions as well.

The first three links from Google:
http://www.sqlteam.com/article/user-defined-functions
http://www.sqlteam.com/article/intro-to-user-defined-functions-updated
http://www.dreamincode.net/forums/topic/52772-user-defined-functions-in-mssql/

franco.amato
31st December 2010, 18:39
I'm sure it can be extended with custom functions as well.

The first three links from Google:
http://www.sqlteam.com/article/user-defined-functions
http://www.sqlteam.com/article/intro-to-user-defined-functions-updated
http://www.dreamincode.net/forums/topic/52772-user-defined-functions-in-mssql/

Thank you very much Wysota, my head is exploding these days trying to find a solution.
Best Regards

franco.amato
31st December 2010, 21:56
I'm sure it can be extended with custom functions as well.

The first three links from Google:
http://www.sqlteam.com/article/user-defined-functions
http://www.sqlteam.com/article/intro-to-user-defined-functions-updated
http://www.dreamincode.net/forums/topic/52772-user-defined-functions-in-mssql/

Wysota hi,
is it possible to run a routine contained in a DLL directly in the database? I hope you understood what I mean.
Best Regards

pkohut
31st December 2010, 22:51
Wysota hi,
is it possible to run a routine contained in a DLL directly in the database? I hope you understood what I mean.
Best Regards

Extended Stored Procedure

wysota
1st January 2011, 23:13
Just a side note - if the database runs on another host than the device performing the scans, security is always a concern; as I understand it, this iris scanning is meant to provide security. If you do the matching on a machine different from the one that actually gives the user "access", then you have to consider whether you are making yourself vulnerable to things such as a man-in-the-middle attack, a situation where an intruder substitutes the database server host with his own version (e.g. by rerouting you to a different machine), or simply performs a successful attack against the host that contains the database. Since you have no control over the other host, protection against such attacks is not a trivial task.

pheonixstorm
2nd January 2011, 20:59
Just a side note - if the database runs on another host than the device performing the scans, security is always a concern. [...]

To this I have to add my 2 pence. If you have to have a connection to an outside source, then the best way to get around the security issues would be to keep a local copy and only update from the master list nightly. To further confound an attacker, add an additional table to your data set to hold hashes specific to each terminal. If you pull an update and the hashes don't match, and a random sampling doesn't match either, then something is amiss.

Local data should be sqlite, while the external master list can be whatever fits best, though from a safety standpoint I would bet against MS SQL. If you want security, don't run the OS and DBMS from the same company. If someone wanted to break in and they knew you used Windows Server, the safe bet would be to assume that said server is also running MS SQL or Oracle.
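The terminal-specific hash check described above could be sketched roughly like this, using FNV-1a purely as a stand-in (a real deployment would want a cryptographic hash such as SHA-256 and a per-terminal secret mixed in):

```cpp
#include <cstdint>
#include <vector>

// FNV-1a over a byte buffer (illustrative only; not cryptographically secure).
std::uint64_t fnv1a(const std::vector<unsigned char> &data)
{
    std::uint64_t h = 0xcbf29ce484222325ULL;   // FNV offset basis
    for (unsigned char b : data) {
        h ^= b;
        h *= 0x100000001b3ULL;                 // FNV prime
    }
    return h;
}

// True if a record's stored hash still matches its recomputed hash,
// i.e. the record has not been tampered with since the hash was taken.
bool recordIntact(const std::vector<unsigned char> &record,
                  std::uint64_t storedHash)
{
    return fnv1a(record) == storedHash;
}
```

The update procedure would recompute the hash for a random sample of pulled records and compare against the terminal's stored values.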

For the scan. Is this a straight scan no additional input or will there be a more traditional card swipe/id key entry along with the scan?

franco.amato
3rd January 2011, 17:21
Just a side note - if the database runs on another host than the device performing the scans, security is always a concern. [...]

Thank you very much Wysota, I'll consider it.
Regards