Gemini Embedding 2
About This Model
Overview
Gemini Embedding 2 Preview is a multimodal embedding model developed by Google and exposed through the Gemini API. The model produces dense vector representations for semantic similarity, clustering, and retrieval across multiple input modalities. It supports text, images, audio, video, and documents, mapping them into a shared embedding space for cross-modal retrieval tasks.
Architecture
Google documents the model's supported modalities and output dimensionality but has not published a detailed architecture description. By default, the API returns 3,072-dimensional embeddings; lower output dimensionalities such as 768 and 1,536 are also supported.
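The choice of output dimensionality mainly affects index size. The following is a rough sketch of that arithmetic, assuming vectors are stored as uncompressed float32 values (4 bytes per component). The function name `index_size_bytes` is hypothetical, and real vector stores add overhead for metadata and index structures.

```python
# Assumption: each embedding component is stored as a float32 (4 bytes),
# with no compression, quantization, or index overhead.
BYTES_PER_FLOAT32 = 4


def index_size_bytes(num_vectors: int, dims: int) -> int:
    """Raw storage needed for num_vectors embeddings of the given dimensionality."""
    return num_vectors * dims * BYTES_PER_FLOAT32


# Compare the dimensionalities mentioned above for one million vectors.
for dims in (768, 1536, 3072):
    gib = index_size_bytes(1_000_000, dims) / 2**30
    print(f"{dims:>5} dims: {gib:.2f} GiB per million vectors")
```

Under these assumptions, a single 3,072-dimensional vector occupies 12,288 bytes, four times the footprint of a 768-dimensional one.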
Capabilities
The model supports cross-modal search, classification, and clustering across text, images, audio, video, and documents in a unified embedding space. It enables semantic similarity comparison and retrieval across more than 100 languages, making it suitable for multilingual and multimodal applications.
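Because all modalities share one embedding space, cross-modal retrieval reduces to ranking stored vectors by their similarity to a query vector. The sketch below illustrates this with cosine similarity, a common choice for embedding comparison. The three-dimensional vectors and item names are hypothetical stand-ins, not real model output, and the helper names are illustrative rather than part of any API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, corpus):
    """Return (item_id, score) pairs sorted from most to least similar."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for embeddings of different modalities.
corpus = {
    "image:red_sneaker": [0.9, 0.1, 0.0],
    "text:running shoe": [0.8, 0.2, 0.1],
    "audio:engine_noise": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # stand-in for an embedded text query

ranking = rank_by_similarity(query, corpus)
for item_id, score in ranking:
    print(f"{score:.3f}  {item_id}")
```

In a real application the vectors would come from the embedding API, and a vector database would typically replace the linear scan.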
Performance Across Datasets
| Dataset | Category | R@1 | R@5 | mAP |
|---|---|---|---|---|
| Stanford Online Products | E-commerce | 75.63% | 86.69% | 55.13% |
| Products-10K | E-commerce | 62.02% | 80.29% | 42.24% |
| DIY v1 | Hardware/DIY | 24.50% | 45.59% | 32.66% |
| Automotive v1 | Automotive | 21.92% | 46.12% | 27.70% |
| Clips-and-Connectors v1 | Industrial | 10.29% | 23.12% | 1.85% |
| Average | | 38.87% | 56.36% | 31.92% |
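
The page does not spell out how the metrics in the table are defined. The sketch below shows one common formulation for retrieval benchmarks, offered as an assumption rather than the exact evaluation protocol: R@k counts a query as a hit if any of its top-k results shares the query's label, and average precision averages the precision at each relevant rank within the top k. It also checks that the Average row is the unweighted mean of the five datasets.

```python
def recall_at_k(ranked_labels, query_label, k):
    """1.0 if any of the top-k results shares the query's label, else 0.0."""
    return 1.0 if query_label in ranked_labels[:k] else 0.0


def average_precision_at_k(ranked_labels, query_label, k):
    """Mean of precision values at each relevant rank within the top k.

    Assumption: normalizes by the number of hits found in the top k,
    which is one of several variants in use.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical ranked result labels for a single query with label "a".
ranked = ["a", "b", "a", "c"]
print(recall_at_k(ranked, "a", k=1))
print(average_precision_at_k(ranked, "a", k=4))

# R@1 values from the table; the Average row is their unweighted mean.
r_at_1 = [75.63, 62.02, 24.50, 21.92, 10.29]
macro_r_at_1 = sum(r_at_1) / len(r_at_1)
print(f"{macro_r_at_1:.2f}%")
```

Averaging these per-query scores over all queries in a dataset gives the dataset-level R@k and mAP figures.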