My software does the following:
1. reads in several large txt data files
2. computes statistics on the data
I do this by parsing each large txt file and inserting its contents into a SQLite db (one db per txt file). On subsequent runs, I check whether the db file already exists; if it does, I read the db file instead of re-parsing the txt.
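Roughly, a minimal sketch of that load-or-parse step in Python (sqlite3); the table layout and the `key value` line format are placeholders, since the real schema isn't important here:

```python
import os
import sqlite3

def load_or_build(txt_path):
    """Return a connection to the per-file db, parsing the txt only once."""
    db_path = txt_path + ".db"
    if os.path.exists(db_path):
        # The db was built on a previous run; reuse it instead of re-parsing.
        return sqlite3.connect(db_path)

    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE data (key TEXT, value REAL)")
    with conn:  # one transaction for the whole insert pass
        with open(txt_path) as f:
            for line in f:
                # Placeholder parse; the real txt format differs.
                key, value = line.split()
                conn.execute("INSERT INTO data VALUES (?, ?)", (key, float(value)))
    return conn
```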
I have a main in-memory db. After reading each txt file (and creating its db), I execute ATTACH to add each db to the main db. By doing all of this in a couple of transactions, I've been able to get this part to go fairly quickly.
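The attach step looks roughly like this (the schema names `src0`, `src1`, ... and the file names are illustrative):

```python
import sqlite3

main = sqlite3.connect(":memory:")  # the main in-memory db

# Attach each per-file db under its own schema name.
db_files = ["file1.txt.db", "file2.txt.db"]
for i, path in enumerate(db_files):
    # The filename can be bound as a parameter; the schema name cannot.
    main.execute("ATTACH DATABASE ? AS src{}".format(i), (path,))

# Tables in the attached dbs are then reachable as src0.data, src1.data, ...
```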
I set a very large cache size on each db to try to improve performance.
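For reference, this is the kind of pragma I mean; the size here is an illustrative guess, not a tuned value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE 'file1.txt.db' AS src0")

# cache_size takes a page count; a negative value means a size in KiB,
# so -200000 asks for roughly 200 MB of page cache.
conn.execute("PRAGMA cache_size = -200000")       # main db's cache
conn.execute("PRAGMA src0.cache_size = -200000")  # each attached db has its own cache
```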
The statistics I compute are mainly various SUM queries over different selections, followed by some averages. This part goes extremely slowly, and I'm trying to figure out how to speed it up.
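The queries are along these lines (table and column names are placeholders for my real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE 'file1.txt.db' AS src0")

# A representative SUM over one selection...
total, = conn.execute(
    "SELECT SUM(value) FROM src0.data WHERE key = ?", ("some_key",)
).fetchone()

# ...followed by an average over the same selection.
avg, = conn.execute(
    "SELECT AVG(value) FROM src0.data WHERE key = ?", ("some_key",)
).fetchone()
print(total, avg)
```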
Questions
1. Do I need to "close" each db at some point to ensure the cache is "flushed"?
2. If I just created the DB, I would think most if not all of it should be in cache. When I do an "ATTACH" and access the DB via the "main in-memory db", does it use the cache?
3. When I run my program again (and the dbs are already created), the entire db needs to be read from the HD, since my sums will touch basically every element at least once. Is there any way to avoid or mitigate that?
4. Is using a SQLite database even the best approach here, or should I just store everything in data structures instead?