Gemini Embedding 2

Overview

Gemini Embedding 2 Preview is a multimodal embedding model developed by Google and exposed through the Gemini API. The model produces dense vector representations for semantic similarity, clustering, and retrieval across multiple input modalities. It supports text, images, audio, video, and documents, mapping them into a shared embedding space for cross-modal retrieval tasks.

Architecture

Google documents the model's supported modalities and output dimensionality, but has not published a detailed description of its architecture. By default, the API returns 3,072-dimensional embeddings; lower output dimensionalities such as 1,536 and 768 are also supported.
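
The choice of output dimensionality mainly affects storage and search cost. The sketch below estimates raw storage for the three dimensionalities mentioned above. It assumes each vector component is stored as a 4-byte float32 and ignores any index overhead; the function names are illustrative and not part of the Gemini API.

```python
# Rough storage estimate for embeddings at different output dimensionalities.
# Assumption: each component is stored as a 4-byte float32 (no quantization).
BYTES_PER_FLOAT32 = 4


def index_size_bytes(num_vectors: int, dims: int) -> int:
    """Raw bytes for num_vectors embeddings of length dims (no index overhead)."""
    return num_vectors * dims * BYTES_PER_FLOAT32


def footprint_table(num_vectors: int, dims_options=(3072, 1536, 768)) -> dict:
    """Map each dimensionality to the raw storage it would need."""
    return {dims: index_size_bytes(num_vectors, dims) for dims in dims_options}


if __name__ == "__main__":
    for dims, size in footprint_table(1_000_000).items():
        print(f"{dims:>5} dims: {size / 1e9:.2f} GB per million vectors")
```

Under these assumptions, one million vectors take roughly 12.3 GB at 3,072 dimensions, 6.1 GB at 1,536, and 3.1 GB at 768. Whether the smaller sizes preserve enough retrieval quality for a given workload is something to measure rather than assume.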

Capabilities

The model supports cross-modal search, classification, and clustering. Because text, images, audio, video, and documents are embedded in one shared space, a query in one modality (for example, text) can retrieve items in another (for example, images). Semantic similarity comparison and retrieval work across more than 100 languages.
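
Cross-modal search over a shared embedding space can be sketched as a nearest-neighbor lookup. The example below is a minimal illustration, assuming cosine similarity as the comparison measure (the text above does not specify one) and using hand-written 3-dimensional vectors in place of real model output. The item identifiers and helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=3):
    """Return the k (item_id, score) pairs most similar to the query vector."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for embeddings of items from different modalities.
corpus = {
    "image:red_shoe.jpg": [0.9, 0.1, 0.0],
    "text:running shoes": [0.8, 0.2, 0.1],
    "audio:engine.wav": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # stand-in for an embedded text query

results = top_k(query, corpus, k=2)
for item_id, score in results:
    print(f"{item_id}: {score:.3f}")
```

Since every item lives in the same vector space, the lookup does not need to know which modality an item came from. A production system would typically replace the linear scan with an approximate nearest-neighbor index.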

Performance Across Datasets

Dataset	Category	R@1	R@5	mAP
Stanford Online Products	E-commerce	75.63%	86.69%	55.13%
Products-10K	E-commerce	62.02%	80.29%	42.24%
DIY v1	Hardware/DIY	24.50%	45.59%	32.66%
Automotive v1	Automotive	21.92%	46.12%	27.70%
Clips-and-Connectors v1	Industrial	10.29%	23.12%	1.85%
Average		38.87%	56.36%	31.92%
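
For readers unfamiliar with the column headers, the sketch below shows one common way to compute R@1 and R@5 (recall at k) and mAP (mean average precision) from ranked retrieval results. The exact definitions behind the table are not stated here, so treat this as an assumption: recall at k counts a query as a hit if any relevant item appears in the top k, and mAP is the mean of per-query average precision. The query data is made up for illustration.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any relevant item appears in the top k results, else 0.0."""
    return 1.0 if any(item in relevant_ids for item in ranked_ids[:k]) else 0.0


def average_precision(ranked_ids, relevant_ids):
    """Mean of precision values at each rank where a relevant item is found."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def evaluate(queries, k_values=(1, 5)):
    """Average the per-query metrics over (ranked_ids, relevant_ids) pairs."""
    n = len(queries)
    report = {
        f"R@{k}": sum(recall_at_k(ranked, rel, k) for ranked, rel in queries) / n
        for k in k_values
    }
    report["mAP"] = sum(average_precision(ranked, rel) for ranked, rel in queries) / n
    return report


# Two illustrative queries: a ranked result list and the set of relevant items.
queries = [
    (["a", "b", "c", "d", "e"], {"a", "c"}),
    (["x", "y", "z", "w", "v"], {"z"}),
]
report = evaluate(queries)
print(report)
```

The Average row in the table is consistent with an unweighted mean of the five dataset rows, so datasets with few queries weigh as much as large ones.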
