Gemma
Open checkpoint variants of Google Deepmind's Gemini model suited for a variety of text generation tasksGemma is a family of lightweight, state-of-the-art open models built from research and technology used to create Google Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants.
Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as a laptop, desktop or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.
This model card includes the 2B and 7B model variants.
Open Large Language Models (LLMs) have a wide range of applications across various industries and domains. The following list of potential uses is not comprehensive. The purpose of this list is to provide contextual information about the possible use-cases that the model creators considered as part of model training and development.
You can deploy Gemma to Vertex AI or Google Kubernetes Engine (GKE).
These models were trained on a large dataset of text data that includes a wide variety of sources, totaling 8 trillion tokens. Here are the key components:
The combination of these diverse data sources is crucial for training a powerful language model that can handle a wide variety of different tasks and text formats.
Here are the key data cleaning and filtering methods applied to the training data:
Preview
This feature is a preview offering, subject to the "Pre-GA Offerings Terms" of the Service Specific Terms. Pre-GA products and features may have limited support, and changes to pre-GA products and features may not be compatible with other pre-GA versions. For more information, see the launch stage descriptions.The Fast Deployment feature prioritizes speed for model exploration, making it ideal for initial testing and experimentation. For sensitive data or production workloads, use the Standard environment for enhanced security and stability.
Gemma was trained using the latest generation of Tensor Processing Unit (TPU) hardware (TPUv5e).
Training large language models requires significant computational power. TPUs, designed specifically for matrix operations common in machine learning, offer several advantages in this domain:
These advantages are aligned with Google's commitments to operate sustainably.
Training was done using JAX and ML Pathways.
JAX allows researchers to leverage the latest generation of hardware, including TPUs, for faster and more efficient training of large models.
ML Pathways is Google's latest effort to build artificially intelligent systems capable of generalizing across multiple tasks. This is specially suitable for foundation models, including large language models like these ones.
Together, JAX and ML Pathways are used as described in the paper about the Gemini family of models; "the 'single controller' programming model of Jax and Pathways allows a single Python process to orchestrate the entire training run, dramatically simplifying the development workflow."
These models were evaluated against a large collection of different datasets and metrics to cover different aspects of text generation:
Benchmark | Metric | 2B Params | 7B Params |
---|---|---|---|
MMLU | 5-shot, top-1 | 42.3 | 64.3 |
HellaSwag | 0-shot | 71.4 | 81.2 |
PIQA | 0-shot | 77.3 | 81.2 |
SocialIQA | 0-shot | 49.7 | 51.8 |
BoolQ | 0-shot | 69.4 | 83.2 |
WinoGrande | partial score | 65.4 | 72.3 |
CommonsenseQA | 7-shot | 65.3 | 71.3 |
OpenBookQA | 47.8 | 52.8 | |
ARC-e | 73.2 | 81.5 | |
ARC-c | 42.1 | 53.2 | |
TriviaQA | 5-shot | 53.2 | 63.4 |
Natural Questions | 5-shot | 12.5 | 23.0 |
HumanEval | pass@1 | 22.0 | 32.3 |
MBPP | 3-shot | 29.2 | 44.4 |
GSM8K | maj@1 | 17.7 | 46.4 |
MATH | 4-shot | 11.8 | 24.3 |
AGIEval | 24.2 | 41.7 | |
BIG-Bench | 35.2 | 55.1 | |
Average | 44.9 | 56.4 |
Our evaluation methods include structured evaluations and internal red-teaming testing of relevant content policies. Red-teaming was conducted by a number of different teams, each with different goals and human evaluation metrics. These models were evaluated against a number of different categories relevant to ethics and safety, including:
The results of ethics and safety evaluations are within acceptable thresholds for meeting internal policies for categories such as child safety, content safety, representational harms, memorization, large-scale harms.
On top of robust internal evaluation, we report numbers on safety benchmarks like BBQ, BOLD, Winogender, Winoboas, RealToxicity, and TruthfulQA are shown here.
Benchmark | Metric | 2B Params | 7B Params |
---|---|---|---|
RealToxicity | average | 6.86 | 7.90 |
BOLD | 45.57 | 49.08 | |
CrowS-Pairs | top-1 | 45.82 | 51.33 |
BBQ Ambig | 1-shot, top-1 | 62.58 | 92.54 |
BBQ Disambig | top-1 | 54.62 | 71.99 |
Winogender | top-1 | 51.25 | 54.17 |
TruthfulQA | 44.84 | 31.81 | |
Winobias 1_2 | 56.12 | 59.09 | |
Winobias 2_2 | 91.10 | 92.23 | |
Toxigen | 29.77 | 39.59 |
Like any large language model, these models have certain limitations that users should be aware of.
The development of large language models (LLMs) raises several ethical concerns. In creating an open model, we have carefully considered the following:
Resource ID | Release date | Release stage | Description |
---|---|---|---|
gemma | 2024-02-21 | GA |
By using, reproducing, modifying, distributing, performing or displaying any portion or element of Gemma, Model Derivatives including via any Hosted Service, (each as defined below) (collectively, the "Gemma Services") or otherwise accepting the terms of this Agreement, you agree to be bound by this Agreement.
Google reserves the right to update this Gemma Prohibited Use Policy from time to time.
You may not use nor allow others to use Gemma or Model Derivatives to:
Google Cloud Console has failed to load JavaScript sources from www.gstatic.com.
Possible reasons are: