Listing samples which are not matched by any tags?

Sat Sep 6 04:43:07 CEST 2014

On Wed, Sep 3, 2014 at 3:32 PM, Gwern Branwen <gwern at gwern.net> wrote:
> 3. don't try to classify everything
>
>     You will never classify 100% of samples because sometimes programs
> do not include useful X properties & cannot be fixed, you have samples
> from before you fixed them, or they are too transient (like popups and
> dialogues) to be worth fixing. It is not necessary to classify 100% of
> your time: as long as the most common programs and, say,
> [80%](https://en.wikipedia.org/wiki/Pareto_principle) of your time are
> classified, you have most of the value. It is easy to waste more
> time tweaking arbtt than one gains from increased accuracy or more
> finely-grained tags.

A fourth guideline just occurred to me.

4. avoid large and microscopic tags

    If a tag takes up more than a third or so of your time, it is
probably too large: it masks variation and can be broken down into more
meaningful tags. Conversely, a tag that rarely shows up in reports
(because it falls below the default 1% filter) is probably too narrow
to be helpful, and can be combined with the most similar tag to yield
more compact and easily interpreted reports.
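
As a rough illustration of this guideline, here is a minimal Python
sketch. It is not part of arbtt: `review_tags`, the two thresholds, and
the sample numbers are all hypothetical, and it assumes you have already
extracted per-tag totals (in seconds) from your reports by some means.

```python
# Hypothetical helper, not part of arbtt: flag tags whose share of
# recorded time is too large or too small to be informative.
LARGE = 1 / 3   # "more than a third or so" of total time
SMALL = 0.01    # assumed to mirror the default 1% report filter


def review_tags(seconds_by_tag):
    """Return (too_large, too_small) lists of tag names, each sorted."""
    total = sum(seconds_by_tag.values())
    if total <= 0:
        return [], []
    too_large = sorted(
        tag for tag, secs in seconds_by_tag.items() if secs / total > LARGE
    )
    too_small = sorted(
        tag for tag, secs in seconds_by_tag.items() if secs / total < SMALL
    )
    return too_large, too_small


# Made-up data (seconds per tag), for illustration only.
example = {"Web": 5000, "Emacs": 3000, "Mail": 1950, "Popup": 50}
large, small = review_tags(example)
print("candidates to split up:", large)
print("candidates to merge:", small)
```

Tags in the first list are candidates for breaking down further; tags
in the second are candidates for merging into their nearest neighbour.
Both cutoffs are judgment calls and should be adjusted to taste.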

-- 
gwern
http://www.gwern.net