Listing samples which are not matched by any tags?

Sat Jul 5 13:42:34 CEST 2014

Dear Gwern,

(sorry for the late reply)

Am Sonntag, den 29.06.2014, 17:35 -0400 schrieb Gwern Branwen:
> On Sun, Nov 3, 2013 at 6:36 PM, Gwern Branwen <gwern at gwern.net> wrote:
> > On Sun, Nov 3, 2013 at 5:31 PM, Joachim Breitner
> > <mail at joachim-breitner.de> wrote:
> >> But now you surely want to know what these selected samples look like,
> >> right? That leads us to the discussion we had on the list with Waldir
> >> in June: What should the tool look like that combines the dumping of
> >> arbtt-dump with the sample selection of arbtt-stats... I’m unsure about
> >> the proper design here.
> >
> > To me, it seems pretty simple. Keeping the same interface, apply a
> > categorize.cfg's set of rules to each sample and then print or not
> > based on what tags matched or didn't match.
> 
> Has there been any more thought on this issue?

Have you had a look at the features in 0.8? I believe they (partly)
address the issue:

 * arbtt-stats can print the actual samples selected, with 
   --dump-samples.
 (http://darcs.nomeata.de/arbtt/doc/users_guide/release-notes.html)

> After repairing my logs
> & working out how to use the CSV, I wondered how much data I was
> missing due to a lack of a matching tag. This apparently is reported by
> the -i flag. Even after adding some more tagging, this is what I get:
> 
>     $ arbtt-stats -i -m 0 -f '$sampleage <100:00'
>     General Information
>     ===================
>                            FirstRecord | 2014-06-26 01:33:16.291076 UTC
>                             LastRecord | 2014-06-29 21:28:30.625435 UTC
>                      Number of records |                           7485
>                    Total time recorded |                    3d19h31m00s
>                    Total time selected |                    1d12h41m10s
>        Fraction of total time recorded |                           100%
>        Fraction of total time selected |                            40%
>     Fraction of recorded time selected |                            40%
> 
> Given the existence of the flag '--also-inactive         include
> samples with the tag "inactive"', I infer all this recorded time
> reported is active time. But that means fully *60%* of my activity is
> not being classified in any way! That's a heck of a lot of lost data.

I believe you understood the flag the wrong way around: Without
--also-inactive, inactive times are _not_ counted as selected. So the
40% in your report should go up when you use "--also-inactive".
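The percentages in that report are plain ratios of the two durations. As a quick sanity check, here is a small Python sketch that parses the duration strings from the quoted output and recomputes the fraction. The `NdNhNmNs` format is inferred from the report above, not taken from arbtt's documentation, and `parse_duration` is a hypothetical helper.

```python
import re

# Hypothetical helper: parse arbtt-style durations such as "3d19h31m00s"
# into seconds. The format is inferred from the report quoted above.
def parse_duration(text: str) -> int:
    units = {"d": 86400, "h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"(\d+)([dhms])", text)
    # Reject strings that contain anything besides number/unit pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(n) * units[u] for n, u in parts)

recorded = parse_duration("3d19h31m00s")   # "Total time recorded"
selected = parse_duration("1d12h41m10s")   # "Total time selected"
fraction = selected / recorded
print(f"{fraction:.1%}")  # 40.1%, which the report rounds to 40%
```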

Also, the --filter in your command above causes everything older than
100h (if there is any) to be counted as not selected.
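To make the interplay of the two rules concrete, here is a hypothetical Python model of sample selection. The `Sample` type, `is_selected` and `selected_fraction` are illustrative names, not arbtt internals; the sketch only encodes the two behaviours described above: the `$sampleage < 100:00` filter, and the default exclusion of samples tagged "inactive" unless --also-inactive is given.

```python
from dataclasses import dataclass

# Hypothetical model of a sample: hours since it was recorded, its
# duration in seconds, and the tags assigned by categorize.cfg.
@dataclass
class Sample:
    age_hours: float
    seconds: int
    tags: frozenset

def is_selected(s: Sample, max_age_hours: float, also_inactive: bool) -> bool:
    # A filter like '$sampleage < 100:00' rejects samples 100h old or older.
    if not s.age_hours < max_age_hours:
        return False
    # Without --also-inactive, samples tagged "inactive" are not selected.
    if "inactive" in s.tags and not also_inactive:
        return False
    return True

def selected_fraction(samples, max_age_hours, also_inactive):
    # Share of recorded time (in seconds) that ends up selected.
    total = sum(s.seconds for s in samples)
    chosen = sum(s.seconds for s in samples
                 if is_selected(s, max_age_hours, also_inactive))
    return chosen / total if total else 0.0

samples = [
    Sample(age_hours=2, seconds=60, tags=frozenset({"Web"})),
    Sample(age_hours=3, seconds=60, tags=frozenset({"inactive"})),
    Sample(age_hours=150, seconds=60, tags=frozenset({"Web"})),
]
print(selected_fraction(samples, 100, also_inactive=False))  # 1 of 3 samples
print(selected_fraction(samples, 100, also_inactive=True))   # 2 of 3 samples
```

In this model, turning on `also_inactive` can only add selected time, never remove it, which matches the expectation that the 40% goes up with --also-inactive; the 150h-old sample stays unselected either way because of the age filter.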

> And I don't know what the lost data is: I already classified
> everything I could think of. What am I missing? I have no way of
> knowing unless arbtt will tell me and give me samples of active time
> which don't match so I can go 'aha, I need to classify $X/Y/Z as tag
> A! Much better.'

What if you have a tag "current-program" that will always be present?
With such a tag, the feature you describe is useless. I guess you mean
“show me the data from samples that are not categorized into one of
these tags:....”. But that is already possible:

$ arbtt-stats --dump-samples --filter '$sampleage < 1:00' -x Web -x Project: -x ...
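The idea behind the -x approach can be modelled in a few lines. This is a hypothetical Python sketch, not arbtt's code: samples are plain dicts with made-up titles and tags, and treating a trailing colon (as in `Project:`) as "any tag in this category" is an assumption based on the example command. It also illustrates the point about an always-present tag: asking for "samples with no tags at all" finds nothing, while excluding the tags you care about leaves the uncategorized samples.

```python
def excluded(tags, exclude):
    # True if the sample carries any of the excluded tags.
    for pattern in exclude:
        if pattern.endswith(":"):
            # Assumption: "Project:" matches every tag in the Project category.
            if any(t.startswith(pattern) for t in tags):
                return True
        elif pattern in tags:
            return True
    return False

def uncategorized(samples, exclude):
    # Keep only samples that none of the excluded tags/categories match.
    return [s for s in samples if not excluded(s["tags"], exclude)]

# Hypothetical samples; "current-program" plays the role of a tag
# that is present on every sample.
samples = [
    {"title": "Firefox - arbtt docs", "tags": {"Web", "current-program"}},
    {"title": "vim Main.hs", "tags": {"Project:arbtt", "current-program"}},
    {"title": "xterm", "tags": {"current-program"}},
]

# "Samples with no tags at all": empty, because every sample is tagged.
print([s["title"] for s in samples if not s["tags"]])
# Excluding the interesting tags leaves the uncategorized sample.
print([s["title"] for s in uncategorized(samples, ["Web", "Project:"])])
```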

Greetings,
Joachim

-- 
Joachim Breitner
  e-Mail: mail at joachim-breitner.de
  Homepage: http://www.joachim-breitner.de
  Jabber-ID: nomeata at joachim-breitner.de
