Benchmark Results

Model Performance

Comprehensive evaluation of 11 embedding models across 6 diverse product datasets.

11 models evaluated · 6 datasets · 86.87% R@1 on Stanford Online Products · 60 total evaluations

Generic Models Leaderboard

About this leaderboard: each model must retrieve the exact correct product from a catalog of thousands or millions of images using only visual embeddings. There is no re-ranking, no text search, no filtering, and no hybrid search, which makes this the most difficult single-stage retrieval test. Higher is better on every metric, and small differences here translate into large differences in production search quality.
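To make the protocol concrete, the sketch below shows what single-stage retrieval with only visual embeddings looks like in practice. This is not the benchmark's code: `embed_image` is a hypothetical placeholder standing in for any of the evaluated models, and the catalog is assumed to fit in memory as one matrix of L2-normalized vectors searched exhaustively by cosine similarity.

```python
# Minimal sketch of the single-stage protocol described above, assuming an
# in-memory catalog and a hypothetical embed_image() standing in for any of
# the evaluated models. Not the benchmark's actual code.
import numpy as np

def embed_image(image) -> np.ndarray:
    """Hypothetical placeholder: returns a (D,) visual embedding."""
    raise NotImplementedError

def build_catalog(catalog_images) -> np.ndarray:
    """Embed and L2-normalize every catalog image into an (N, D) matrix."""
    embs = np.stack([embed_image(img) for img in catalog_images]).astype(np.float32)
    return embs / np.linalg.norm(embs, axis=1, keepdims=True)

def search(query_image, catalog: np.ndarray, k: int = 5) -> np.ndarray:
    """Top-k catalog indices ranked by the visual embedding alone.

    No re-ranking, text search, filters, or hybrid search is applied:
    whatever the embedding ranks first is the answer scored as R@1.
    """
    q = embed_image(query_image).astype(np.float32)
    q /= np.linalg.norm(q)
    scores = catalog @ q                # cosine similarity (unit-norm vectors)
    return np.argsort(-scores)[:k]      # best match first
```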

Rank | Model | Provider | Embedding Size (dims) | Input Size (px) | Avg. R@1 | Avg. R@5 | Avg. mAP@20
1 | GEM v5.1 (ours) | nyris | 768 | 336 | 56.84% | 72.35% | 48.88%
2 | DINOv3 ViT-L/16 | Meta | 1024 | 224 | 40.58% | 58.35% | 29.73%
3 | SigLIP2 SO400M | Google | 1152 | 384 | 40.54% | 56.81% | 32.98%
4 | PE-Core L/14 | Meta | 1024 | 336 | 40.39% | 57.42% | 32.40%
5 | Gemini Embedding 2 | Google | 3072 | N/A | 38.87% | 56.36% | 31.92%
6 | Vertex AI Multi-Modal | Google | 1408 | N/A | 38.67% | 56.51% | 32.14%
7 | Cohere Embed v4 | Cohere | 1536 | N/A | 33.67% | 46.76% | 27.07%
8 | DINOv2 Large | Meta | 1024 | 224 | 31.53% | 45.50% | 21.47%
9 | Jina Embeddings v4 | Jina AI | 2048 | Dynamic | 27.60% | 38.70% | 19.86%
10 | Nomic Embed MM 3B | Nomic AI | 2048 | Dynamic | 27.17% | 38.48% | 18.73%
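For reference, the reported metrics can be computed roughly as follows. This is a sketch under stated assumptions rather than the benchmark's evaluation code: `ranked` holds, per query, the relevance (True/False) of the retrieved catalog items in ranked order, and mAP@20 uses one common truncated average-precision definition whose normalization may differ from the benchmark's. The "Avg." columns are then the per-dataset scores macro-averaged over the 6 datasets.

```python
# Rough sketch of the reported metrics under stated assumptions; not the
# benchmark's evaluation code. `ranked` holds, per query, True/False relevance
# flags for the retrieved items in ranked order.
import numpy as np

def recall_at_k(ranked: list[list[bool]], k: int) -> float:
    """Share of queries with at least one correct product in the top k."""
    return float(np.mean([any(r[:k]) for r in ranked]))

def average_precision_at_k(rels: list[bool], k: int = 20) -> float:
    """Truncated average precision; one common definition, normalization may vary."""
    rels = rels[:k]
    hits, score = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            score += hits / rank        # precision at this rank
    return score / hits if hits else 0.0

def map_at_k(ranked: list[list[bool]], k: int = 20) -> float:
    return float(np.mean([average_precision_at_k(r, k) for r in ranked]))

# The "Avg." columns macro-average per-dataset scores, e.g.:
# avg_r1 = np.mean([recall_at_k(ranked_d, 1) for ranked_d in per_dataset_results])
```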

Domain-Specific Models Leaderboard

Specialized models trained for specific product domains, evaluated only on their target datasets.

Model | Provider | Target Domain | Embedding Size (dims) | Input Size (px) | R@1 | R@5 | mAP@20
AEM v1 (ours) | nyris | Specialized | 768 | 336 | 32.49% | 51.60% | 35.68%

Retrieval Results Comparison Examples

Click a query image to see how each model ranked the top-5 results.

Legend: query image · correct match (same product) · incorrect match (different product)