NLLB
NLLB (No Language Left Behind) is a machine translation model developed and open-sourced by Meta AI. It aims to tackle language barriers across the globe and can translate content between 200 languages.
This model version boasts a 44% improvement over previous iterations, and it adds support for languages such as Kamba and Lao that were previously inaccessible to machine translation. To evaluate NLLB-200, Meta created a new dataset, FLORES-200, which covers all 200 languages. The evaluation spanned 40,000 distinct translation directions, and the model can translate directly between any pair of supported languages without relying on an intermediary language.
The variants listed in the table at the end of this section are available for serving on Google Cloud.
You can also create this model in a notebook. Click Open notebook to use and fine-tune the model in Colab.
The training process for NLLB-200 consists of three core steps. First, automatic dataset construction leverages the earlier LASER model to mine a vast amount of parallel training data. Second, a single model covering all 200 languages is trained using a mixture-of-experts architecture so that different experts can specialize in diverse language groups and data, with regularization techniques applied to prevent overfitting. Finally, performance is evaluated on the extensive, human-translated FLORES-200 dataset, where NLLB-200 outperforms Meta's earlier machine translation models by 44%.
NLLB-200 is a machine translation model developed by Meta AI with a primary focus on research, particularly for low-resource languages. It offers the ability to translate single sentences across an impressive range of 200 languages. Researchers and the machine translation research community can refer to the Fairseq code repository for instructions on utilizing the model, along with valuable information on training, evaluation, and relevant data references.
In summary, NLLB-200 serves as a valuable research tool, specifically for low-resource languages, enabling single sentence translation capabilities across a vast language spectrum. Researchers should adhere to guidelines provided in the Fairseq code repository for optimal usage. It is important to acknowledge the limitations, as NLLB-200 is not suitable for production, domain-specific texts, or document translation, and longer sequences may impact translation quality. Translation outputs should not be regarded as certified translations.
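For quick local experimentation with the single-sentence translation described above, the NLLB-200 checkpoints are also published on the Hugging Face Hub. The following is a minimal sketch using the Hugging Face transformers library (an alternative to the Fairseq workflow referenced above, requiring `pip install transformers torch`); it uses the distilled 600M variant listed in the table below.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

inputs = tokenizer("NLLB translates between 200 languages.", return_tensors="pt")
# Force the decoder to start in the target language (FLORES-200 code).
outputs = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
    max_length=64,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```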
This section demonstrates deploying the model and running inference for a translation task.
Example deployment (Python)
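Below is a minimal deployment sketch using the Vertex AI Python SDK. The serving container image URI and the `MODEL_ID` environment variable are placeholders, not values from this page; use the exact values shown in the Model Garden Open notebook example for your project. The machine type and accelerator choices are likewise illustrative.

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# Upload the NLLB model with a serving container (placeholder image URI).
model = aiplatform.Model.upload(
    display_name="nllb-200-distilled-600M",
    serving_container_image_uri="us-docker.pkg.dev/your-project/your-repo/nllb-serve:latest",  # placeholder
    serving_container_environment_variables={
        "MODEL_ID": "facebook/nllb-200-distilled-600M",  # assumption: container loads this checkpoint
    },
)

# Deploy the uploaded model to an endpoint backed by a GPU.
endpoint = model.deploy(
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
print(endpoint.resource_name)
```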
Example online inference (Python)
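The following sketch sends an online prediction request to the deployed endpoint. The instance schema (the `text`, `src_lang`, and `tgt_lang` keys) is an assumption that depends on the serving container's prediction handler; the language codes follow the FLORES-200 convention.

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# Reference the endpoint created during deployment (replace ENDPOINT_ID).
endpoint = aiplatform.Endpoint(
    "projects/your-project-id/locations/us-central1/endpoints/ENDPOINT_ID"
)

response = endpoint.predict(
    instances=[{
        "text": "No language left behind.",  # assumption: handler expects this schema
        "src_lang": "eng_Latn",
        "tgt_lang": "fra_Latn",
    }]
)
print(response.predictions)
```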
| Resource ID | Release date | Release stage | Description |
|---|---|---|---|
| facebook/nllb-200-3.3B | 2023-10-08 | Public GA | NLLB model serving on Vertex AI |
| facebook/nllb-200-1.3B | 2023-10-08 | Public GA | NLLB model serving on Vertex AI |
| facebook/nllb-200-distilled-1.3B | 2023-10-08 | Public GA | NLLB model serving on Vertex AI |
| facebook/nllb-200-distilled-600M | 2023-10-08 | Public GA | NLLB model serving on Vertex AI |