Pic2Word Composed Image Retrieval
Pic2Word is a state-of-the-art composed image retrieval model. It was produced by a collaboration between Google Cloud AI and Boston University researchers and released as Composed Image Retrieval on GitHub.
Pic2Word was first described in the paper "Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval" by Saito et al. (2023).
The paper proposes a method for zero-shot composed image retrieval that can be trained using only image-caption pairs and unlabeled images, whereas existing training methods require labeled triplets consisting of a query image, a text specification, and the target image. Pic2Word leverages a pretrained vision-language model (CLIP): a mapping network transforms the input image's embedding into a pseudo language token, which is spliced into the text prompt so that the image and text are composed into a single query. Despite using no triplet supervision, this approach outperforms several existing supervised methods on standard benchmarks.
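A minimal sketch of this composition step in PyTorch, assuming a CLIP-style joint embedding of width 512. The stand-in encoders, vocabulary size, prompt slot position, and the `Pic2WordMapper` MLP below are illustrative placeholders, not the released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 512  # joint embedding width of CLIP ViT-B/32 (assumption)

class Pic2WordMapper(nn.Module):
    """Hypothetical mapping network: turns a frozen CLIP image embedding
    into a single pseudo word-token embedding."""
    def __init__(self, embed_dim: int = EMBED_DIM):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, image_embedding: torch.Tensor) -> torch.Tensor:
        return self.mlp(image_embedding)

# Placeholders for the frozen CLIP towers; in practice these are the
# pretrained vision and text encoders, which Pic2Word keeps frozen.
image_encoder = nn.Linear(3 * 224 * 224, EMBED_DIM)
token_embedder = nn.Embedding(49408, EMBED_DIM)  # CLIP BPE vocab size
text_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(EMBED_DIM, nhead=8, batch_first=True),
    num_layers=2,
)
mapper = Pic2WordMapper()

# 1. Encode the query image and map it to one pseudo token.
image = torch.randn(1, 3 * 224 * 224)           # stand-in for a real image
img_emb = image_encoder(image)                  # (1, EMBED_DIM)
pseudo_token = mapper(img_emb).unsqueeze(1)     # (1, 1, EMBED_DIM)

# 2. Embed a text modifier such as "a photo of [*] that is red" and
#    splice the pseudo token into the [*] slot.
prompt_ids = torch.randint(0, 49408, (1, 7))    # stand-in token ids
prompt_emb = token_embedder(prompt_ids)         # (1, 7, EMBED_DIM)
slot = 3                                        # position of the [*] slot
composed = torch.cat(
    [prompt_emb[:, :slot], pseudo_token, prompt_emb[:, slot + 1:]], dim=1
)

# 3. Run the composed sequence through the text tower and pool it into
#    the single query embedding used for retrieval.
query_emb = F.normalize(text_encoder(composed).mean(dim=1), dim=-1)
print(query_emb.shape)  # torch.Size([1, 512])
```

In the paper, only the mapping network is trained: a contrastive loss pushes the text embedding of a generic prompt such as "a photo of [*]" back toward the original image embedding, while the CLIP towers stay frozen.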
This model can be used in a notebook. Click Open notebook to use the model in Colab.
The model was pretrained on image URLs from the Conceptual Captions dataset.
Given an image and a text prompt as input, the model returns the set of images that most closely match the combined image-text query.
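The final step is a nearest-neighbor search in the shared embedding space. A minimal sketch, assuming candidate images have been pre-encoded with the same frozen CLIP image tower; `retrieve_top_k` and the random embeddings are illustrative:

```python
import torch
import torch.nn.functional as F

def retrieve_top_k(query_emb: torch.Tensor,
                   gallery_embs: torch.Tensor,
                   k: int = 5) -> torch.Tensor:
    """Return indices of the k gallery images whose precomputed
    embeddings are most similar to the composed query."""
    query_emb = F.normalize(query_emb, dim=-1)
    gallery_embs = F.normalize(gallery_embs, dim=-1)
    scores = gallery_embs @ query_emb.squeeze(0)  # cosine similarities
    return scores.topk(k).indices

# Stand-in embeddings: 1,000 gallery images in a 512-dim CLIP space.
gallery = torch.randn(1000, 512)
query = torch.randn(1, 512)
print(retrieve_top_k(query, gallery, k=5))
```

At production scale the same dot-product search is typically served from an approximate nearest-neighbor index rather than a dense matrix multiply.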
Resource ID | Release date | Release stage | Description
---|---|---|---
google/pic2word | 2024-04-01 | General Availability | Serving