arbtt feedback

Joachim Breitner mail at joachim-breitner.de
Sat Sep 15 12:16:07 CEST 2012


Hi Gwern,

thanks for your feedback. Do you mind moving this to
arbtt at lists.nomeata.de?

Am Freitag, den 14.09.2012, 19:19 -0400 schrieb Gwern Branwen:
> So it occurred to me I should give some feedback. I currently have 2
> major issues with arbtt:
> 
> 1) the performance is terrible. Parsing my 81M of logs takes gigabytes
> of memory; it's so bad that it kills running programs like Firefox.  I
> have to run with +RTS options just to get a result.

Do you have it compiled with ghc 7.4? Performance has improved a lot
since then – I analyze my 40MB of logs in 20 seconds using 1616MB of
memory. But surely it can do better; when I wrote arbtt I knew much
less about Haskell performance than I do now. Also, the internal data
structure keeps references to the future and the past (the plan was that
the config language can also query that), but that prevents the GC from
throwing out a lot of things.

> And it doesn't
> help to specify '$sampleage < 24:00', that's just as bad - even though
> one would expect it to since it should be able to only parse a few kb
> at the end of the file! What's laziness for if you're going to analyze
> the entire database...

This is a valid request. Currently, the system makes no assumption about
the ordering of entries in the file. Some code that checks for
$sampleage relations and fast-forwards the log file to the right
position might help (although it still needs to be read linearly, as
there is no seek information and the records are of varying length).
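
A minimal sketch of the fast-forward idea, written in Python rather than
arbtt's Haskell, and assuming the entries are stored in chronological
order (which arbtt does not currently assume). The Entry type and the
skip_older_than helper are hypothetical and not part of arbtt. The scan is
still linear, but entries older than the cutoff are dropped as they are
read instead of being kept around:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import dropwhile
from typing import Iterable, Iterator


@dataclass
class Entry:
    """Hypothetical stand-in for one log record (not arbtt's real type)."""
    time: datetime
    title: str


def skip_older_than(entries: Iterable[Entry], now: datetime,
                    max_age: timedelta) -> Iterator[Entry]:
    """Lazily drop the leading entries whose age is max_age or more.

    Assumes chronological order. Every skipped record is still read once,
    but it is not retained.
    """
    return dropwhile(lambda e: now - e.time >= max_age, entries)


now = datetime(2012, 9, 15, 12, 0)
log = [
    Entry(now - timedelta(days=3), "old"),
    Entry(now - timedelta(hours=30), "yesterday"),
    Entry(now - timedelta(hours=5), "recent"),
    Entry(now - timedelta(minutes=1), "current"),
]

# Roughly what '$sampleage < 24:00' asks for.
recent = list(skip_older_than(iter(log), now, timedelta(hours=24)))
print([e.title for e in recent])
```

If the file is not actually ordered, this shortcut would keep any older
entries that appear after the first recent one, so an ordering check
would be needed first.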

> 2) the config language is highly repetitious. When I look at my config
> (which cribs heavily from the example config in the docs), I see tons
> of verbosity. Consider these basic rules:
> 
>      current window $program == "Navigator" ||
>      current window $program == "chromium" ||
>      current window $program == "epiphany-browser" ||
>      current window $title =~ /elinks.*/ ==> tag WWW,
>      current window $program == "evince" ||
>      current window $program == "FBReader" ||
>      current window $program == "calibre" ||
>      current window $program == "gscan2pdf" ||
>      current window $title =~ /.*pager.*/ ||
>      current window $title =~ m!/doc.*! ==> tag PDF,
>      current window $title =~ /.*mplayer.*/ ||
>      current window $program == "Audacity" ||
>      current window $program == "clementine" ||
>      current window $program == "puddletag" ||
>      current window $title =~ /.*YouTube.*/ ||
>      current window $title =~ /.*Vocaloid.*/ ||
>      current window $title =~ m!.*/r/vocaloid/.*! ==> tag Music,

Ok, that should be easily fixable by allowing a list of values on the RHS
of a == or =~. Is this what you would want to use?

About the performance issues... I’d like to improve the situation, but I
don’t know when I can do that.

Ok, I am just looking for low-hanging fruit, and a carefully placed
deepseq gets memory consumption down to 874M :-)

Removing the future/past feature strangely does not help. I guess that
is because Data.Binary.Get is strict.


Oh, and while I am at it I implemented "... == [ "x", "y", "z"]" and
"... =~ [ m!regex1!, m!regex2!]" support. Do you want to test it from
http://darcs.nomeata.de/arbtt/ or should I just release it?
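
As a rough, untested sketch, the first two rules quoted above might look
like this with the list syntax, assuming it works exactly as written here
(the layout and line breaks are only for illustration):

```
current window $program == [ "Navigator", "chromium", "epiphany-browser" ] ||
current window $title =~ [ m!elinks.*! ] ==> tag WWW,
current window $program == [ "evince", "FBReader", "calibre", "gscan2pdf" ] ||
current window $title =~ [ m!.*pager.*!, m!/doc.*! ] ==> tag PDF,
```

Each list stands for the chain of || comparisons against the same
variable in the original rules.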

Greetings from ICFP,
Joachim


-- 
Joachim Breitner
  e-Mail: mail at joachim-breitner.de
  Homepage: http://www.joachim-breitner.de
  Jabber-ID: nomeata at joachim-breitner.de