arbtt feedback

Joachim Breitner mail at joachim-breitner.de
Sun Sep 16 12:20:01 CEST 2012


Hi Gwern,

Am Samstag, den 15.09.2012, 14:22 -0400 schrieb Gwern Branwen:
> On Sat, Sep 15, 2012 at 6:16 AM, Joachim Breitner
> <mail at joachim-breitner.de> wrote:
> > thanks for your feedback. Do you mind moving this to
> > arbtt at lists.nomeata.de?
> 
> I'm not subscribed, so...

that’s ok, but I’d like to have such discussions archived. Let me
rephrase the question: Do you mind if I forward (i.e. redirect) this
thread to the list as well? You do not have to subscribe.

> > Do you have it compiled with ghc 7.4? Performance has improved a lot
> > since then – I analyze my 40MB of logs in 20 seconds using 1616MB of
> > memory. But surely it can do better; when I wrote arbtt I did know much
> > less about Haskell performance than I do now. Also the internal data
> > structure keeps references to the future and the past (the plan was that
> > the config language can also query that), but that prevents the GC from
> > throwing out a lot of things.
> 
> I'm using whatever Debian testing has: 0.6.2-1 & ghc 7.0.4.

GHC in testing is at 7.4, so 0.6.2-1 should have been compiled with
that.

> > This is a valid request. Currently, the system makes no assumption of
> > the ordering of entries in the file. Some code that checks for
> > $sampleage relations and fast-forwards the log file to the right
> > position might help (although it still needs to be read linearly, as
> > there is no seek information and the records are of varying length).
> 
> I'd expect just some sort of filter would give the right behavior -
> 'filter (userInput) $ parse $ read file' if you follow me. with
> $sampleage, this ought to throw out every entry until it hits <24:00
> ago and then the filter starts returning some entries.

I looked at the code again and the problem is that it currently keeps
track of the whole list for several reasons:
 * There can be multiple reports being processed.
 * Some global data (e.g. the total number of records) is calculated
   first and then shared between possibly multiple report passes.
 * Some reports also refer to the non-selected time (“% of total time
   selected”).
So making arbtt-stats run in O(1) memory is a non-trivial refactoring.
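
To illustrate the tension, here is a minimal sketch, not arbtt's actual
code. The Entry type and all function names are hypothetical
simplifications. A plain filter selects recent entries, but a report
that also needs the total count uses the list twice, which keeps it
alive in memory; a single strict fold is one possible way around that.

```haskell
import Data.List (foldl')
import Data.Time.Calendar (fromGregorian)
import Data.Time.Clock (NominalDiffTime, UTCTime (..), addUTCTime, diffUTCTime)

-- Hypothetical, simplified log entry; arbtt's real records carry more data.
data Entry = Entry { entryTime :: UTCTime, entryTitle :: String }
  deriving (Show, Eq)

-- True if the entry is younger than maxAge relative to now.
isRecent :: UTCTime -> NominalDiffTime -> Entry -> Bool
isRecent now maxAge e = diffUTCTime now (entryTime e) < maxAge

-- The "just filter" idea: makes no assumption about entry ordering.
recent :: UTCTime -> NominalDiffTime -> [Entry] -> [Entry]
recent now maxAge = filter (isRecent now maxAge)

-- A "% of total selected" style figure. The list es is traversed twice,
-- so it cannot be garbage-collected while the first traversal runs.
selectedFraction :: UTCTime -> NominalDiffTime -> [Entry] -> Double
selectedFraction now maxAge es
  | null es   = 0
  | otherwise = fromIntegral (length (recent now maxAge es))
              / fromIntegral (length es)

-- One pass with strict accumulators: (selected, total).
counts :: UTCTime -> NominalDiffTime -> [Entry] -> (Int, Int)
counts now maxAge = foldl' step (0, 0)
  where
    step (s, t) e =
      let s' = if isRecent now maxAge e then s + 1 else s
          t' = t + 1
      in s' `seq` t' `seq` (s', t')

-- Made-up sample data for trying the functions out.
sampleNow :: UTCTime
sampleNow = UTCTime (fromGregorian 2012 9 16) 0

sampleEntries :: [Entry]
sampleEntries =
  [ Entry (addUTCTime (-3600) sampleNow) "editor"         -- 1 hour ago
  , Entry (addUTCTime (-2 * 86400) sampleNow) "browser"   -- 2 days ago
  , Entry (addUTCTime (-100) sampleNow) "terminal"        -- 100 s ago
  ]
```

With a 24-hour window (86400 seconds), recent keeps the two entries from
the last day, and counts reports both the selected and the total number
without holding on to the list.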

> > Ok, that should be fixable easy by allowing a list of values on the RHS
> > of a == or =~. Is this what you would want to use?
> 
> I *think* that should work.
> 
> > Oh, and while I am at it I implemented "... == [ "x", "y", "z"]" and
> > "... =~ [ m!regex1!, m!regex2!]" support. Do you want to test it from
> > http://darcs.nomeata.de/arbtt/ or should I just release it?
> 
> I can't compile it - src/Data.hs spits an error:
> 
> src/Data.hs:31:34:
>     Could not deduce (NFData UTCTime) arising from a use of `deepseq'
>     from the context (NFData a)
>       bound by the instance declaration at src/Data.hs:30:10-44
>     Possible fix:
>       add (NFData UTCTime) to the context of the instance declaration
>       or add an instance declaration for (NFData UTCTime)
>     In the first argument of `deepseq', namely `a `deepseq` b'
>     In the first argument of `deepseq', namely
>       `a `deepseq` b `deepseq` c'
>     In the expression: a `deepseq` b `deepseq` c `deepseq` ()
> cabal: Error: some packages failed to install:
> arbtt-0.6.3 failed during the building phase. The exception was:
> ExitFailure 1

You need time >= 1.4 (which should come with GHC 7.4); I fixed the cabal
file accordingly.
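
For context, a minimal sketch of the kind of instance the error message
points at. LogEntry is a hypothetical stand-in, not the actual type in
src/Data.hs. Forcing a record with a UTCTime field requires an
NFData UTCTime instance, which the time package provides from 1.4 on;
with an older time the instance below fails to compile in the way shown
above.

```haskell
import Control.DeepSeq (NFData (..), deepseq)
import Data.Time.Calendar (fromGregorian)
import Data.Time.Clock (UTCTime (..))

-- Hypothetical stand-in for a log record with a timestamp,
-- a sample rate and a polymorphic payload.
data LogEntry a = LogEntry UTCTime Integer a

-- Needs NFData UTCTime (assumed here to come from time >= 1.4)
-- in addition to the NFData a constraint from the context.
instance NFData a => NFData (LogEntry a) where
  rnf (LogEntry a b c) = a `deepseq` b `deepseq` c `deepseq` ()

-- Made-up sample value.
sample :: LogEntry String
sample = LogEntry (UTCTime (fromGregorian 2012 9 16) 0) 60 "xterm"
```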

Greetings,
Joachim

-- 
Joachim Breitner
  e-Mail: mail at joachim-breitner.de
  Homepage: http://www.joachim-breitner.de
  Jabber-ID: nomeata at joachim-breitner.de