arbtt: use of database like sqlite3?

Joachim Breitner mail at joachim-breitner.de
Mon Dec 15 09:35:06 CET 2014


Hi,


On Sunday, 14.12.2014, at 19:40 -0500, Gwern Branwen wrote:

> That is, in more Haskelly terms, each arbtt-capture sample is a
> (timestamp, [String]); each string is assigned a unique ID and stored
> in a hashmap if not already present, and the IDs are stored with the
> timestamp. So a few seconds of samples would look something like
> 
> (1418603655,[1,54,20,333])
> (1418603678,[1,53,333])
> (1418603693,[1,53,333])
> (1418603702,[1,53,333])
> (1418603712,[53,333,801])
> 
> avoiding the worst of between-row redundancy; and another table would
> store the definition of string #1, #53, #801, etc whenever one needed
> them. This would probably compress even better than a log format that
> looks back only one entry for redundancy, since it extends the lookback
> to the entire database history, and indexes presumably mean the queries
> remain as fast (since sqlite3 knows where the indices point to in the
> other table).
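
The quoted encoding can be sketched in Haskell as below. This is a minimal illustration, assuming plain `String`s and IDs assigned in order of first appearance; `Sample`, `Dict`, `intern`, `encodeSample`, `encodeAll` and `idTable` are hypothetical names, not part of arbtt, and the sketch says nothing about how the two tables would actually be laid out in sqlite3.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as Map

-- Hypothetical types for illustration only; not arbtt's own data model.
-- A raw sample: a Unix timestamp plus the strings seen at that moment.
type Sample = (Integer, [String])

-- The string table: each distinct string maps to a small integer ID.
data Dict = Dict { nextId :: !Int, ids :: !(Map.Map String Int) }

emptyDict :: Dict
emptyDict = Dict 0 Map.empty

-- Return the ID of a string, assigning the next free ID on first sight.
intern :: Dict -> String -> (Dict, Int)
intern d@(Dict n m) s = case Map.lookup s m of
  Just i  -> (d, i)
  Nothing -> (Dict (n + 1) (Map.insert s n m), n)

-- Replace every string of one sample by its ID, threading the table along.
encodeSample :: Dict -> Sample -> (Dict, (Integer, [Int]))
encodeSample d (t, strs) = (d', (t, is))
  where (d', is) = mapAccumL intern d strs

-- Encode a run of samples, starting from an empty table.
encodeAll :: [Sample] -> (Dict, [(Integer, [Int])])
encodeAll = mapAccumL encodeSample emptyDict

-- The "other table": ID -> string, for decoding rows when needed.
idTable :: Dict -> Map.Map Int String
idTable d = Map.fromList [ (i, s) | (s, i) <- Map.toList (ids d) ]
```

For example, encoding `[(1418603655, ["a","b"]), (1418603678, ["b","c"])]` yields the rows `[(1418603655,[0,1]),(1418603678,[1,2])]`, with `idTable` mapping 0, 1 and 2 back to "a", "b" and "c".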

Yes, an interned-string format would also work quite well and, if
used correctly on the Haskell side, could avoid duplicate strings in
memory as well.
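
One way the in-memory side could look is a pool that hands back a canonical copy of each string, so equal titles share one heap object and the duplicates can be garbage-collected once nothing else refers to them. This is a minimal sketch under that assumption; `canonical` and `shareAll` are hypothetical helpers, not arbtt functions, and a real implementation would more likely use `Text` or `ByteString` than `String`.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as Map

-- Hypothetical helper: return the pooled copy of s, adding s if it is new.
-- Every later occurrence of an equal string is replaced by the first copy.
canonical :: Map.Map String String -> String -> (Map.Map String String, String)
canonical pool s = case Map.lookup s pool of
  Just c  -> (pool, c)
  Nothing -> (Map.insert s s pool, s)

-- Hypothetical helper: canonicalise the strings of many samples at once,
-- threading the pool through every sample in order.
shareAll :: [[String]] -> (Map.Map String String, [[String]])
shareAll = mapAccumL (mapAccumL canonical) Map.empty
```

The values are unchanged; only their sharing differs, which is why this saves memory without affecting anything that compares strings by value.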

But now insertion is even more expensive: on every sample, for every
open window, SQLite has to search an index of over a million¹ entries
to see whether this particular window title has occurred before. That’s
quite an increase in both computation time _and_ memory consumption
for the long-running process.
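
For a rough sense of the cost of one such lookup, assume (as a simplification) a balanced binary search over the 1,116,948 distinct entries from the footnote. SQLite's B-tree indexes touch far fewer pages thanks to their larger fan-out, but the number of key comparisons stays roughly logarithmic:

```latex
2^{20} = 1\,048\,576 < 1\,116\,948 < 2\,097\,152 = 2^{21}
\quad\Longrightarrow\quad
\lceil \log_2 1\,116\,948 \rceil = 21
```

So with $w$ open windows (a symbol introduced here for illustration) each sample costs on the order of $21w$ string comparisons, plus keeping the touched index pages in memory, whereas appending to a log needs no index search at all.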

I think this variant is also only viable if the data is first collected
in a log and then occasionally sorted into the database.
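
The log-then-batch variant could be sketched as a periodic job that takes the samples logged since the last run, collects the strings not yet in the table (sorted and deduplicated, much like `sort -u`), gives them consecutive new IDs, and only then encodes the batch. This is a minimal sketch, assuming the table fits in a `Map`; `mergeBatch` is a hypothetical name, and nothing here is arbtt's actual storage code.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical type for illustration: timestamp plus the logged strings.
type Sample = (Integer, [String])

-- Merge one batch of logged samples into an existing string table and
-- return the updated table together with the ID-encoded rows.
mergeBatch :: Map.Map String Int -> [Sample] -> (Map.Map String Int, [(Integer, [Int])])
mergeBatch dict batch = (dict', rows)
  where
    -- All distinct strings in this batch.
    seen  = Set.fromList (concatMap snd batch)
    -- Only those not already in the table, in ascending order.
    fresh = Set.toAscList (seen `Set.difference` Map.keysSet dict)
    -- New IDs continue after the largest existing one.
    start = if Map.null dict then 0 else maximum (Map.elems dict) + 1
    dict' = Map.union dict (Map.fromList (zip fresh [start ..]))
    -- Every string is now guaranteed to be in dict'.
    rows  = [ (t, map (dict' Map.!) strs) | (t, strs) <- batch ]
```

The capture process itself then only appends to the log; the index lookups move into this occasional job.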

Greetings,
Joachim


¹ $ arbtt-dump | sort -u | wc -l
  1116948
  # and without deduplication:
  $ arbtt-dump | wc -l
  5767660
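
For reference, the two counts above give the average repetition per distinct dump line. Note that they count lines of `arbtt-dump` output, which only approximates the number of distinct window titles:

```latex
\frac{5\,767\,660}{1\,116\,948} \approx 5.16,
\qquad
\frac{1\,116\,948}{5\,767\,660} \approx 0.194
```

That is, each distinct line occurs about five times on average, and only about 19% of all lines are distinct.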



-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de  • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org



More information about the arbtt mailing list