Embeddings for Multimodal
Embeddings for Multimodal generates 1408-dimension vectors from the input you provide, which can include any combination of image, text, and video. The embedding vectors can then be used for downstream tasks such as image classification, image search, and content moderation.
The video, image, and text embedding vectors share the same semantic space and the same dimensionality. Therefore, these vectors can be used interchangeably for use cases like searching for a video by image, an image by text, or text by image.
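Because all three modalities land in one shared 1408-dimension space, cross-modal search reduces to a nearest-neighbor lookup over cosine similarity. The sketch below illustrates the idea with short stand-in vectors; the file names and values are invented for the example:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative stand-ins for real 1408-dimension embeddings:
text_embedding = [0.1, 0.8, 0.05]
image_embeddings = {
    "cat.jpg": [0.12, 0.79, 0.02],
    "car.jpg": [0.9, 0.1, 0.3],
}

# Rank candidate images against the text query; highest similarity wins.
best = max(image_embeddings,
           key=lambda k: cosine_similarity(text_embedding, image_embeddings[k]))
print(best)  # → cat.jpg
```

At production scale you would hand the vectors to an approximate nearest-neighbor index (such as ScaNN or Vertex AI Matching Engine, mentioned below) rather than scanning linearly.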
Download the predict_request_gapic.py and requirements.txt files from the Cloud Storage bucket (gs://vertex-ai/generative-ai/vision/multimodal embedding/) and install the dependencies listed in requirements.txt.

Quota exceeded error
If this is the first time you have received this error, email cloud-ai-gen-ai-vision-feedback@google.com with the subject "Quota requested for MultiModelEmbedding", and include your project number and project ID in the request. Otherwise, wait before sending another request. If you still need a quota increase afterwards, send an email with a justification for a sustained quota request.
The following code samples show you how to submit a text embedding request.
Submit request (Python)
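As a minimal sketch of what a text embedding request contains, the code below builds the `:predict` REST URL and JSON body for `multimodalembedding@001`. The project ID and input text are placeholders; actually sending the request (for example with predict_request_gapic.py or an HTTP client) additionally requires an access token.

```python
import json

API_HOST = "us-central1-aiplatform.googleapis.com"
MODEL = "multimodalembedding@001"

def build_text_request(project_id, text):
    """Build the :predict URL and JSON body for a text-only embedding request."""
    url = (f"https://{API_HOST}/v1/projects/{project_id}/locations/us-central1/"
           f"publishers/google/models/{MODEL}:predict")
    body = {"instances": [{"text": text}]}
    return url, body

url, body = build_text_request("my-project", "a cat sitting on a couch")
print(url)
print(json.dumps(body))
```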
Sample output
Submit request (curl)
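For a text-only request, the JSON body posted to the `:predict` endpoint follows this shape (a sketch based on the multimodalembedding request schema; the text value is a placeholder):

```json
{
  "instances": [
    {
      "text": "a cat sitting on a couch"
    }
  ]
}
```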
Sample output
The embedding is a 1408-dimensional float vector. The sample response is abbreviated for space.
The following code samples show you how to submit an image embedding request.
Submit request (Python)
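An image can be supplied either inline as base64-encoded bytes or by reference as a Cloud Storage URI. The sketch below builds the corresponding request instance; the bucket path is a placeholder, and sending the request still requires authentication:

```python
import json

def build_image_instance(image_b64=None, gcs_uri=None):
    """Build an image instance: inline base64 bytes or a Cloud Storage URI."""
    if image_b64 is not None:
        return {"image": {"bytesBase64Encoded": image_b64}}
    return {"image": {"gcsUri": gcs_uri}}

body = {"instances": [build_image_instance(gcs_uri="gs://my-bucket/cat.jpg")]}
print(json.dumps(body))
```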
Sample output
Submit request (curl)
Read the image file into a base64-encoded string:
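One way to do this in Python (the stand-in file below only exists to make the snippet self-contained; in practice you would pass the path to your real image):

```python
import base64

def image_to_base64(path):
    """Read an image file and return its contents as a base64-encoded string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Self-contained demo: write a small stand-in file, then encode it.
with open("demo.jpg", "wb") as f:
    f.write(b"\xff\xd8\xff\xe0 not a real JPEG")
encoded = image_to_base64("demo.jpg")
print(encoded)
```

On Linux, `base64 -w 0 image.jpg` produces the same single-line encoding from the shell.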
Then enter the following:
Image payloads can be large, so a best practice is to put the request body in a file and reference that file in the curl command.
Alternative curl command
An alternative method is to replace ${BASE64_ENCODED_IMG} with the base64-encoded image and add the following to the request_with_image.json file:
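Based on the request schema, request_with_image.json would look roughly like the following sketch, with the placeholder replaced by the real base64 string:

```json
{
  "instances": [
    {
      "image": {
        "bytesBase64Encoded": "${BASE64_ENCODED_IMG}"
      }
    }
  ]
}
```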
Then enter the following:
Sample output
The embedding is a 1408-dimensional float vector. The sample response is abbreviated for space.
The following code samples show you how to submit a video embedding request.
Submit request (Python)
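A video instance references a Cloud Storage URI and can carry a segment configuration that controls which span of the video is embedded and how often an embedding is produced. The sketch below builds such an instance; the bucket path and default offsets are placeholders:

```python
import json

def build_video_instance(gcs_uri, start_sec=0, end_sec=120, interval_sec=16):
    """Build a video instance with a videoSegmentConfig that selects the
    [start_sec, end_sec] span and yields one embedding per interval_sec."""
    return {
        "video": {
            "gcsUri": gcs_uri,
            "videoSegmentConfig": {
                "startOffsetSec": start_sec,
                "endOffsetSec": end_sec,
                "intervalSec": interval_sec,
            },
        }
    }

body = {"instances": [build_video_instance("gs://my-bucket/clip.mp4")]}
print(json.dumps(body))
```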
Sample output
Submit request (curl)
Sample output
The embedding is a 1408-dimensional float vector. The sample response is abbreviated for space.
The following code samples show you how to submit a combined text and image embedding request.
Submit request (Python)
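A single instance may carry both text and image fields, in which case the response contains an embedding for each modality. The sketch below builds such a combined instance; the text and base64 string are placeholders:

```python
import json

def build_text_image_instance(text, image_b64):
    """Build one instance carrying both a text field and an inline image;
    the model returns one embedding per supplied modality."""
    return {"text": text, "image": {"bytesBase64Encoded": image_b64}}

body = {"instances": [build_text_image_instance("a cat", "aGVsbG8=")]}
print(json.dumps(body))
```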
Sample output
Submit request (curl)
Replace ${BASE64_ENCODED_IMG} with the actual base64-encoded image and add the following to a request_with_image_and_text.json file.
Then enter the following:
Sample output
The embedding is a 1408-dimensional float vector. The sample response is abbreviated for space.
Sample Generated API Client (GAPIC) code:
This Colab demonstrates CoCa image embedding end-to-end, including getting embeddings from both images and text, indexing the embeddings, and searching using either ScaNN or Vertex AI Matching Engine.
| Resource ID | Release date | Release stage | Description |
|---|---|---|---|
| multimodalembedding@001 | 2024-02-07 | General Availability | Added video modality |
| multimodalembedding@001 | 2023-08-07 | General Availability | |
| multimodalembedding@001 | 2023-07-17 | Public Preview | |
| multimodalembedding@001 | 2023-06-09 | Private Preview refresh | Migrated to Vertex AI infrastructure |
| multimodalembedding@001 | 2023-05-10 | Private Preview | Initial release |