Domino + Fine Grained CLIP


Created: 02 Feb 2023, 05:29 PM | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a") Tags: knowledge,


Goal: check whether CLIP is fine-grained enough to pick up differences within a single ImageNet class, i.e. to find subclasses inside that one class

Ethical issues - https://oatml.cs.ox.ac.uk/blog/2021/06/27/web-scraped-harmful.html

  • going back to classification with ImageNet, but restricted to a single class / a single group of classes

tried a group of classes that make sense together under “car”, but the output doesn't seem very descriptive

tried a single class, to see whether it can find clusters within that class and describe each one in enough detail

  • e.g. a photo of a car in winter conditions?
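One way to check whether a cluster inside a single class matches a phrase like the one above is to compare the cluster's mean image embedding against the text embeddings of a few candidate phrases using cosine similarity. The sketch below assumes the embeddings were already computed with CLIP elsewhere; the 3-d vectors, the second phrase, and the function names are hypothetical stand-ins, not real CLIP outputs or any library's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def rank_descriptions(cluster_embeddings, text_embeddings):
    """Rank candidate phrases by similarity to the cluster centroid, best first."""
    c = centroid(cluster_embeddings)
    scored = [(cosine(c, emb), phrase) for phrase, emb in text_embeddings.items()]
    return sorted(scored, reverse=True)

# Toy stand-ins for image embeddings of one cluster (assumption: real ones come from CLIP).
cluster = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]

# Toy stand-ins for text embeddings of candidate phrases.
texts = {
    "a photo of a car in winter conditions": [1.0, 0.0, 0.0],
    "a photo of a car on a race track": [0.0, 1.0, 0.0],
}

ranking = rank_descriptions(cluster, texts)
for score, phrase in ranking:
    print(f"{score:.3f}  {phrase}")
```

If the top scores for all candidate phrases are close together, that would be consistent with the model not separating the class at that level of detail.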

intuition from testing so far: it can’t describe images at that level of detail

need to update the phrase templates / mask somehow, or use a version of CLIP pretrained on automotive image-text pairs

but mask is already “a photo of a [xxxx] {} [xxxx]”
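To make the mask concrete, here is a minimal sketch of how candidate phrases could be generated from it. It assumes the two `[xxxx]` positions are free slots to fill and `{}` is the class name; the slot names `prefix` / `suffix` and the word lists are hypothetical, chosen only for illustration.

```python
from itertools import product

# Assumed reading of the mask "a photo of a [xxxx] {} [xxxx]":
# one slot before the class name, one slot after it.
TEMPLATE = "a photo of a {prefix} {cls} {suffix}"

def build_prompts(cls, prefixes, suffixes):
    """Fill both slots with every prefix/suffix combination."""
    prompts = []
    for prefix, suffix in product(prefixes, suffixes):
        text = TEMPLATE.format(prefix=prefix, cls=cls, suffix=suffix)
        # Collapse extra spaces left behind by empty slots.
        prompts.append(" ".join(text.split()))
    return prompts

# Hypothetical slot fillers; an empty string leaves the slot unused.
prompts = build_prompts("car", ["", "red"], ["", "in winter conditions"])
for p in prompts:
    print(p)
```

Widening the slot vocabularies is one cheap way to test whether the template itself is the bottleneck before switching to a differently pretrained CLIP.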

results not great: it does not seem to produce many distinct descriptions for the single class