Distinctive Image Captioning via CLIP Guided Group Optimization


Created: =dateformat(this.file.ctime,"dd MMM yyyy, hh:mm a") | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a") Tags: knowledge,

Annotations


what if use a group of images to compare against the rest of the images in the evaluation set, then get a more distinctive caption for that group of images?

how does it generate candidate phrases?

  • constructthe similar image group within the same split. The K similar images are retrievedaccording to the semantic similarity between their ground truth captions and thetarget image I0 by image-to-text retrieval. * show annotation

  • The training process uses similar image groups in both forward andbackward propagation. However, even though the evaluation of distinctiveness ofcaptions involves their similar image groups, the model is unaware of the similarimage groups and only takes target images as input in the test set at inferencetime. Therefore, we claim that similar image groups supervise the model to learndistinctive features of images in a contrastive way. * show annotation