Gemini 1.0 Pro Vision
Gemini 1.0 Pro Vision is a Gemini large language vision model that understands input from both text and visual modalities (image and video) and generates relevant text responses.
Gemini 1.0 Pro Vision is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots.
Model name | Input data | Output data | Description |
---|---|---|---|
Gemini 1.5 Flash | Text, image, video, documents and audio | Text | A lightweight model optimized for speed and efficiency. Good for multimodal, high-volume tasks and latency-sensitive applications. |
Gemini 1.5 Pro | Text, image, video, documents and audio | Text | Created to be multimodal (text, images, audio, documents, code, videos) and to scale across a wide range of tasks with up to 1M input tokens |
Gemini 1.0 Pro | Text | Text | Designed to balance quality, performance, and cost for tasks such as content generation, editing, summarization, and classification |
Gemini 1.0 Pro Vision | Image and text | Text | Created to be multimodal (text, images, code) and to scale across a wide range of tasks |
You can use Vertex AI Studio to experiment with Gemini 1.0 Pro Vision in the Google Cloud console. You can also use the command line or integrate it in your application using Python.
Enable the Vertex AI API. For more information on getting set up on Google Cloud, see Get set up on Google Cloud.
To use Gemini 1.0 Pro Vision in Vertex AI Studio, click Open Vertex AI Studio. In Vertex AI Studio, you can enter a sample prompt, then click Submit to view the output generated by Gemini 1.0 Pro Vision.
To use Gemini 1.0 Pro Vision with the command line interface (CLI), replace `YOUR_PROJECT_ID` with the ID of your Google Cloud project. You can also replace the `streamGenerateContent` method with `generateContent` to receive non-streaming responses; streaming means receiving responses as they are generated. For more information, see the Gemini API reference.
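The shape of such a request can be sketched in Python with only the standard library. The region, bucket path, and payload fields below are illustrative assumptions; see the Gemini API reference for the authoritative request schema.

```python
import json

# Hypothetical values -- substitute your own project, region, and image URI.
PROJECT_ID = "YOUR_PROJECT_ID"
LOCATION = "us-central1"
MODEL_ID = "gemini-1.0-pro-vision"

def build_request(method):
    """Build the endpoint URL and JSON body for a generateContent-style call."""
    url = (
        f"https://{LOCATION}-aiplatform.googleapis.com/v1/"
        f"projects/{PROJECT_ID}/locations/{LOCATION}/"
        f"publishers/google/models/{MODEL_ID}:{method}"
    )
    body = json.dumps({
        "contents": {
            "role": "user",
            "parts": [
                # One image part plus one text part, as in a typical multimodal prompt.
                {"fileData": {"mimeType": "image/jpeg",
                              "fileUri": "gs://YOUR_BUCKET/image.jpg"}},
                {"text": "Describe this image."},
            ],
        }
    })
    return url, body

# Streaming call; swap in "generateContent" for a unary response.
url, body = build_request("streamGenerateContent")
```

The returned URL and body would then be sent with an authenticated POST (for example, via curl with a gcloud access token).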
Before trying this sample, follow the Python setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Python API reference documentation. To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Replace `YOUR_PROJECT_ID` with your Google Cloud project ID. For more information, see the Gemini SDK reference.
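A minimal sketch of such a call with the Vertex AI Python SDK might look like the following; the function name, prompt, and region are illustrative assumptions, not part of the original sample.

```python
MODEL_ID = "gemini-1.0-pro-vision"

def describe_image(project_id, image_uri):
    """Send an image plus a text prompt to Gemini 1.0 Pro Vision and return the text."""
    # Imported lazily so the sketch can be read without the SDK installed
    # (requires `pip install google-cloud-aiplatform`).
    import vertexai
    from vertexai.generative_models import GenerativeModel, Part

    # Region is an assumption for this sketch.
    vertexai.init(project=project_id, location="us-central1")
    model = GenerativeModel(MODEL_ID)
    response = model.generate_content([
        Part.from_uri(image_uri, mime_type="image/jpeg"),
        "Describe this image.",  # illustrative prompt
    ])
    return response.text
```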
The Gemini Chat Completions API lets you send requests to the Vertex AI Gemini API by using the OpenAI libraries for Python and REST. If you are already using the OpenAI libraries, you can use this API to switch between calling OpenAI models and Gemini models to compare output, cost, and scalability, without changing your existing code. If you are not already using the OpenAI libraries, we recommend that you call the Gemini API directly. To learn more, view the documentation.
Start by installing the OpenAI SDK with `pip install openai`.
Next, you can either modify your client setup or change your environment configuration to use Google authentication and a Vertex AI endpoint.
To programmatically get Google credentials in Python, you can use the `google-auth` Python SDK:
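A sketch of that pattern, using Application Default Credentials (the helper name is an assumption):

```python
def fetch_access_token():
    """Obtain a short-lived OAuth access token via Application Default Credentials."""
    # Imported lazily; requires `pip install google-auth`.
    import google.auth
    import google.auth.transport.requests

    creds, _project = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    # Populate creds.token by performing the refresh flow.
    creds.refresh(google.auth.transport.requests.Request())
    return creds.token
```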
Change the OpenAI SDK to point to the Vertex AI chat completions endpoint:
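One way to do this is to point the client's `base_url` at the Vertex AI OpenAI-compatible endpoint and pass the access token as the API key. The URL layout below reflects the documented `endpoints/openapi` path, but the helper names are assumptions for this sketch.

```python
def vertex_chat_base_url(project_id, location):
    """Build the Vertex AI OpenAI-compatible chat completions base URL."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project_id}/locations/{location}/endpoints/openapi"
    )

def make_client(project_id, location, token):
    """Return an OpenAI client configured for Vertex AI."""
    # Imported lazily; requires `pip install openai`.
    from openai import OpenAI
    return OpenAI(
        base_url=vertex_chat_base_url(project_id, location),
        api_key=token,  # short-lived OAuth access token, not an OpenAI key
    )
```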
By default, access tokens last for 1 hour. You can extend the life of your access token, or periodically refresh your token and update the `openai.api_key` variable.
The OpenAI SDK can read the `OPENAI_API_KEY` and `OPENAI_BASE_URL` environment variables to change the authentication and endpoint in its default client. After you have installed gcloud, set the following variables, replacing `YOUR_PROJECT_ID` and `YOUR_LOCATION`:
Next, initialize the client:
OpenAI uses an API key to authenticate their requests. When you use the API with Google Cloud, you use an OAuth credential, such as a service account token, which is a short-lived access token. By default, access tokens last for 1 hour. You can extend the life of your access token, or periodically refresh your token and update the `OPENAI_API_KEY` environment variable.
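One simple refresh pattern, sketched with a caller-supplied token fetcher (the helper name is hypothetical; any zero-argument callable that returns a fresh token, such as one built on `google-auth`, would work):

```python
import os

def refresh_openai_key(fetch_token):
    """Fetch a fresh access token and expose it through OPENAI_API_KEY.

    `fetch_token` is any zero-argument callable returning a new
    short-lived token. Call this before the previous token expires.
    """
    token = fetch_token()
    os.environ["OPENAI_API_KEY"] = token
    return token
```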
The sample below is for a unary (non-streaming) request:
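A minimal sketch of such a unary call, assuming an OpenAI client already configured for the Vertex AI endpoint (the function name and prompt are illustrative; the `google/` model prefix follows the Chat Completions naming convention):

```python
def ask_gemini(client, prompt):
    """Unary (non-streaming) chat completion against a Gemini model."""
    completion = client.chat.completions.create(
        model="google/gemini-1.0-pro-vision",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content
```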
You should receive a response similar to the following:
Resource ID | Release date | Release stage | Description |
---|---|---|---|
gemini-1.0-pro-vision-001 | 2024-02-15 | General Availability | Adds non-streaming (unary) API and additional languages |
gemini-1.0-pro-vision | 2024-01-04 | General Availability | |