I have a small site about recreation on the Black Sea. Before I knew SQLite I was afraid of MySQL (messing with a server is a serious test of my nerves). I prefer file-based databases and flat log files, and I was quite happy digging through the Apache log files to find the hot pages and the top IP addresses visiting my small site. So I set up a cron job that parses my logs into an SQLite database file.
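For the record, the setup looks roughly like this. This is only a sketch, not my exact script: the table name crimeax is real, but the paths, the column names (ip, ts, request) and the nightly schedule here are illustrative.

$ crontab -l
# parse the Apache log into the SQLite file every night at 04:00
0 4 * * * /home/me/bin/logs2sqlite.sh

$ cat /home/me/bin/logs2sqlite.sh
#!/bin/sh
# Turn each combined-log line into an INSERT and feed the whole batch to sqlite3.
# BEGIN/COMMIT wraps it in one transaction so the bulk load stays fast.
( echo "BEGIN;"
  awk '{ gsub(/[][]/, "", $4); gsub(/"/, "");
         printf "INSERT INTO crimeax (ip, ts, request) VALUES (\047%s\047, \047%s\047, \047%s %s\047);\n", \
                $1, $4, $6, $7 }' /var/log/apache2/access.log
  echo "COMMIT;"
) | sqlite3 /home/me/data/crimeax.db
# (Real logs need proper escaping of stray apostrophes; this only shows the idea.)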
$ time echo "select count(*) from crimeax;" | sql
305866 rows in the database (the logs parsed into the SQLite DB file). All indexes are removed.
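By the way, sql here is nothing fancy, just a little wrapper around sqlite3 pointed at the database file. Something along these lines (the path is whatever yours is):

$ cat /home/me/bin/sql
#!/bin/sh
# pipe whatever arrives on stdin straight into sqlite3 against the log database
exec sqlite3 /home/me/data/crimeax.db "$@"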
$ time echo "select count(distinct ip) from crimeax;" | sql
$ time echo "select count(*) from (select ip from crimeax group by ip);" | sql
First it pulls all the IPs from the table, then groups them and counts the groups. That turned out to be 3.12 times slower than COUNT(DISTINCT ip). Time to create an index on ip (see the aside below first, if you are curious why the plans differ).
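SQLite can show you the plan it picks for each query; the exact output depends on your SQLite version, so I am not pasting it here:

sqlite> explain query plan select count(distinct ip) from crimeax;
sqlite> explain query plan select count(*) from (select ip from crimeax group by ip);

Either way, on to the index: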
sqlite> create index idx_ip on crimeax(ip);
It took about 20 seconds. Please note that inserting a large chunk of data into a table that already has indexes slows the load down, so it's advised to pour the data in first and THEN apply the indexing.
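In practice that means: when re-importing a big batch of logs, drop (or skip) the index, load, and only then build it. Schematically:

sqlite> drop index if exists idx_ip;
sqlite> -- ... bulk INSERTs from the parsed logs go here ...
sqlite> create index idx_ip on crimeax(ip);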
$ time echo "select count(*) from (select ip from crimeax group by ip);" | sql
A tenfold improvement. And the GROUP BY variant now beats COUNT(DISTINCT ip), running about 3.4 times faster. Why not MySQL? Because for me it was about 3x slower and it is not friendly with nested SELECTs.