Listing samples which are not matched by any tags?

Gwern Branwen gwern at gwern.net
Sun Jun 29 23:35:49 CEST 2014


On Sun, Nov 3, 2013 at 6:36 PM, Gwern Branwen <gwern at gwern.net> wrote:
> On Sun, Nov 3, 2013 at 5:31 PM, Joachim Breitner
> <mail at joachim-breitner.de> wrote:
>> But now you surely want to know what these selected samples look like,
>> right? That leads us to the discussion we had on the list with Waldir
>> in June: What should the tool look like that combines the dumping of
>> arbtt-dump with the sample selection of arbtt-stats... I’m unsure about
>> the proper design here.
>
> To me, it seems pretty simple. Keeping the same interface, apply a
> categorize.cfg's set of rules to each sample and then print or not
> based on what tags matched or didn't match.

Has there been any more thought on this issue? After repairing my logs
& working out how to use the CSV, I wondered how much data I was
missing due to a lack of matching tags. This is apparently reported by
the -i flag. Even after adding some more tagging, this is what I get:

    $ arbtt-stats -i -m 0 -f '$sampleage <100:00'
    General Information
    ===================
                           FirstRecord | 2014-06-26 01:33:16.291076 UTC
                            LastRecord | 2014-06-29 21:28:30.625435 UTC
                     Number of records |                           7485
                   Total time recorded |                    3d19h31m00s
                   Total time selected |                    1d12h41m10s
       Fraction of total time recorded |                           100%
       Fraction of total time selected |                            40%
    Fraction of recorded time selected |                            40%

Given the existence of the flag '--also-inactive' (described as
'include samples with the tag "inactive"'), I infer that all of the
recorded time reported is active time. But that means fully *60%* of my
activity is not being classified in any way! That's a heck of a lot of
lost data.
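
As a sanity check on those percentages, the two durations in the output
above can be converted to seconds and divided. This is a minimal Python
sketch; to_seconds is a hypothetical helper of my own, not part of arbtt,
and it only assumes the 'NdNNhNNmNNs' duration format shown above:

```python
import re

def to_seconds(duration):
    """Parse a duration such as '3d19h31m00s' into a number of seconds."""
    units = {"d": 86400, "h": 3600, "m": 60, "s": 1}
    # Each (number, unit) pair contributes number * seconds-per-unit.
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([dhms])", duration))

recorded = to_seconds("3d19h31m00s")  # "Total time recorded"
selected = to_seconds("1d12h41m10s")  # "Total time selected"
fraction = selected / recorded

print(f"selected: {fraction:.1%}, not selected: {1 - fraction:.1%}")
```

The ratio comes out at roughly 40%, matching the reported fraction, so
the remaining share of about 60% is whatever arbtt-stats did not select.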

And I don't know what the lost data is: I already classified
everything I could think of. What am I missing? I have no way of
knowing unless arbtt tells me, by showing samples of active time
which match no tag, so I can go 'aha, I need to classify $X/Y/Z as tag
A! Much better.'
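
To make the request concrete, here is a rough Python sketch of the kind
of report I have in mind: run every rule over every sample and list the
window titles that no rule tags, most frequent first. Everything in it
is an assumption for illustration. RULES is a hypothetical stand-in and
not categorize.cfg syntax, the sample dictionaries and their 'title' and
'active' fields are made up, and none of this is arbtt's actual API or
log format:

```python
import re
from collections import Counter

# Hypothetical stand-ins for categorize.cfg rules: (tag, regex on window title).
# This is NOT arbtt's rule language, only an illustration of the idea.
RULES = [
    ("Web", re.compile(r"Firefox|Chromium")),
    ("Mail", re.compile(r"mutt|Gmail")),
]

def tags_for(title, rules=RULES):
    """Return every tag whose pattern matches the window title."""
    return [tag for tag, pattern in rules if pattern.search(title)]

def unmatched_titles(samples, rules=RULES):
    """Count titles of active samples that no rule tags, most frequent first."""
    counts = Counter(
        s["title"]
        for s in samples
        if s["active"] and not tags_for(s["title"], rules)
    )
    return counts.most_common()

# Made-up samples; 'active' is a hypothetical field meaning "user was not idle".
SAMPLES = [
    {"title": "gwern.net - Firefox", "active": True},
    {"title": "emacs: notes.txt", "active": True},
    {"title": "emacs: notes.txt", "active": True},
    {"title": "xterm", "active": False},
]

for title, count in unmatched_titles(SAMPLES):
    print(count, title)
```

Sorting by frequency means the titles that cost the most untagged time
come first, which is where a new rule would pay off soonest.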

-- 
gwern
http://www.gwern.net



