Google

Gemini Embedding 2

Google's Gemini Embedding 2 model for multimodal embeddings.

Google Proprietary Multi-Modal
Model Type: Generic
Overall Rank: #5
Avg. R@1: 38.87%
Avg. mAP@20: 31.92%
Embedding Size: 3072
Input Size: N/A
Datasets: 5

About This Model

Overview

Gemini Embedding 2 Preview is a multimodal embedding model developed by Google and exposed through the Gemini API. The model produces dense vector representations for semantic similarity, clustering, and retrieval across multiple input modalities. It supports text, images, audio, video, and documents, mapping them into a shared embedding space for cross-modal retrieval tasks.

Architecture

Google documents the model's supported modalities and output dimensionality but has not published a detailed description of its architecture. By default, the API returns 3,072-dimensional embeddings; lower output dimensionalities such as 768 and 1,536 can also be requested.
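As a rough illustration of why the lower output dimensionalities matter, the sketch below estimates the raw storage needed for a flat index at each size mentioned above. It assumes embeddings are stored as 32-bit floats and ignores index overhead and metadata; the function name and the catalogue size are hypothetical and do not come from Google's documentation.

```python
# Assumption: each embedding dimension is stored as a 32-bit float (4 bytes).
BYTES_PER_FLOAT32 = 4


def index_size_bytes(num_vectors: int, dims: int) -> int:
    """Raw size of a flat, uncompressed index, with no metadata or overhead."""
    return num_vectors * dims * BYTES_PER_FLOAT32


if __name__ == "__main__":
    num_vectors = 1_000_000  # hypothetical catalogue size
    for dims in (3072, 1536, 768):
        mib = index_size_bytes(num_vectors, dims) / 2**20
        print(f"{dims:>5} dims: {mib:,.2f} MiB")
```

Under these assumptions storage scales linearly with dimensionality, so 768-dimensional vectors take a quarter of the space of the 3,072-dimensional default. Any effect on retrieval quality would need to be measured separately.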

Capabilities

The model supports cross-modal search, classification, and clustering across text, images, audio, video, and documents in a unified embedding space. It enables semantic similarity comparison and retrieval across more than 100 languages, making it suitable for multilingual and multimodal applications.
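Retrieval in a shared embedding space usually reduces to a nearest-neighbour search over vectors, whatever modality each vector came from. The following is a minimal sketch of that idea using cosine similarity. The vectors are toy 3-dimensional values made up for illustration (real embeddings would come from the API), and the item ids and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, gallery, k=1):
    """Return the ids of the k gallery items most similar to the query, best first."""
    ranked = sorted(
        gallery.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in ranked[:k]]


# Toy stand-ins for embeddings of items from different modalities.
gallery = {
    "image:red_sneaker": [0.9, 0.1, 0.0],
    "image:blue_kettle": [0.0, 0.2, 0.9],
    "doc:sneaker_manual": [0.7, 0.6, 0.1],
}
# Toy stand-in for the embedding of a text query.
query = [1.0, 0.2, 0.0]

if __name__ == "__main__":
    print(top_k(query, gallery, k=2))
```

Because images, documents, and text queries all live in the same space, the same ranking function serves every modality pair. At larger scale an approximate nearest-neighbour index would typically replace the exhaustive sort.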

Performance Across Datasets

Dataset                    Category       R@1      R@5      mAP@20
Stanford Online Products   E-commerce     75.63%   86.69%   55.13%
Products-10K               E-commerce     62.02%   80.29%   42.24%
DIY v1                     Hardware/DIY   24.50%   45.59%   32.66%
Automotive v1              Automotive     21.92%   46.12%   27.70%
Clips-and-Connectors v1    Industrial     10.29%   23.12%    1.85%
Average                                   38.87%   56.36%   31.92%
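The page does not say exactly how the metrics are computed, so the sketch below uses one common set of definitions as an assumption: R@k is 1 for a query if any relevant item appears in its top k results, and AP@k averages the precision at each relevant hit within the top k. The function names are hypothetical. The last part checks that the Average row is the unweighted mean of the five per-dataset values listed above.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """1.0 if at least one relevant item is in the top k results, else 0.0."""
    return 1.0 if any(i in relevant_ids for i in ranked_ids[:k]) else 0.0


def average_precision_at_k(ranked_ids, relevant_ids, k):
    """AP@k, normalised by min(number of relevant items, k). Assumed definition."""
    hits = 0
    precision_sum = 0.0
    for rank, item_id in enumerate(ranked_ids[:k], start=1):
        if item_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    denom = min(len(relevant_ids), k)
    return precision_sum / denom if denom else 0.0


def macro_average(per_dataset):
    """Unweighted mean across datasets, rounded to two decimals."""
    return round(sum(per_dataset.values()) / len(per_dataset), 2)


# Per-dataset percentages copied from the results table.
R_AT_1 = {"Stanford Online Products": 75.63, "Products-10K": 62.02,
          "DIY v1": 24.50, "Automotive v1": 21.92, "Clips-and-Connectors v1": 10.29}
R_AT_5 = {"Stanford Online Products": 86.69, "Products-10K": 80.29,
          "DIY v1": 45.59, "Automotive v1": 46.12, "Clips-and-Connectors v1": 23.12}
MAP_AT_20 = {"Stanford Online Products": 55.13, "Products-10K": 42.24,
             "DIY v1": 32.66, "Automotive v1": 27.70, "Clips-and-Connectors v1": 1.85}

if __name__ == "__main__":
    print(macro_average(R_AT_1), macro_average(R_AT_5), macro_average(MAP_AT_20))
```

Since the mean is unweighted, each dataset counts equally regardless of its size, which is why the low Clips-and-Connectors v1 scores pull the averages well below the e-commerce results.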