Benchmark Results

Model Performance

Comprehensive evaluation of 11 embedding models across 6 diverse product datasets.

11 models evaluated · 6 datasets · 86.87% R@1 on Stanford Online Products · 60 total evaluations

Generic Models Leaderboard

About this leaderboard: each model must retrieve the exact correct product from a catalog of thousands or millions of images using only visual embeddings. There is no re-ranking, no text search, no filtering, and no hybrid search, which makes this the most difficult single-stage retrieval test. Higher is better on every metric, and small differences here translate into large differences in production search quality.
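To make the protocol concrete, the sketch below shows what single-stage retrieval with only visual embeddings looks like in practice. This is not the benchmark's code: `embed_image` is a hypothetical placeholder standing in for any of the evaluated models, and the catalog is assumed to fit in memory as one matrix of L2-normalized vectors searched exhaustively by cosine similarity.

```python
# Minimal sketch of the single-stage protocol described above, assuming an
# in-memory catalog and a hypothetical embed_image() standing in for any of
# the evaluated models. Not the benchmark's actual code.
import numpy as np

def embed_image(image) -> np.ndarray:
    """Hypothetical placeholder: returns a (D,) visual embedding."""
    raise NotImplementedError

def build_catalog(catalog_images) -> np.ndarray:
    """Embed and L2-normalize every catalog image into an (N, D) matrix."""
    embs = np.stack([embed_image(img) for img in catalog_images]).astype(np.float32)
    return embs / np.linalg.norm(embs, axis=1, keepdims=True)

def search(query_image, catalog: np.ndarray, k: int = 5) -> np.ndarray:
    """Top-k catalog indices ranked by the visual embedding alone.

    No re-ranking, text search, filters, or hybrid search is applied:
    whatever the embedding ranks first is the answer scored as R@1.
    """
    q = embed_image(query_image).astype(np.float32)
    q /= np.linalg.norm(q)
    scores = catalog @ q                # cosine similarity (unit-norm vectors)
    return np.argsort(-scores)[:k]      # best match first
```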

Rank | Model | Provider | Embedding Size (dims) | Input Size (px) | Avg. R@1 | Avg. R@5 | Avg. mAP@20
1 | GEM v5.1 (ours) | nyris | 768 | 336 | 56.84% | 72.35% | 48.88%
2 | DINOv3 ViT-L/16 | Meta | 1024 | 224 | 40.58% | 58.35% | 29.73%
3 | SigLIP2 SO400M | Google | 1152 | 384 | 40.54% | 56.81% | 32.98%
4 | PE-Core L/14 | Meta | 1024 | 336 | 40.39% | 57.42% | 32.40%
5 | Gemini Embedding 2 | Google | 3072 | N/A | 38.87% | 56.36% | 31.92%
6 | Vertex AI Multi-Modal | Google | 1408 | N/A | 38.67% | 56.51% | 32.14%
7 | Cohere Embed v4 | Cohere | 1536 | N/A | 33.67% | 46.76% | 27.07%
8 | DINOv2 Large | Meta | 1024 | 224 | 31.53% | 45.50% | 21.47%
9 | Jina Embeddings v4 | Jina AI | 2048 | Dynamic | 27.60% | 38.70% | 19.86%
10 | Nomic Embed MM 3B | Nomic AI | 2048 | Dynamic | 27.17% | 38.48% | 18.73%
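For reference, the reported metrics can be computed roughly as follows. This is a sketch under stated assumptions rather than the benchmark's evaluation code: `ranked` holds, per query, the relevance (True/False) of the retrieved catalog items in ranked order, and mAP@20 uses one common truncated average-precision definition whose normalization may differ from the benchmark's. The "Avg." columns are then the per-dataset scores macro-averaged over the 6 datasets.

```python
# Rough sketch of the reported metrics under stated assumptions; not the
# benchmark's evaluation code. `ranked` holds, per query, True/False relevance
# flags for the retrieved items in ranked order.
import numpy as np

def recall_at_k(ranked: list[list[bool]], k: int) -> float:
    """Share of queries with at least one correct product in the top k."""
    return float(np.mean([any(r[:k]) for r in ranked]))

def average_precision_at_k(rels: list[bool], k: int = 20) -> float:
    """Truncated average precision; one common definition, normalization may vary."""
    rels = rels[:k]
    hits, score = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            score += hits / rank        # precision at this rank
    return score / hits if hits else 0.0

def map_at_k(ranked: list[list[bool]], k: int = 20) -> float:
    return float(np.mean([average_precision_at_k(r, k) for r in ranked]))

# The "Avg." columns macro-average per-dataset scores, e.g.:
# avg_r1 = np.mean([recall_at_k(ranked_d, 1) for ranked_d in per_dataset_results])
```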

Domain-Specific Models Leaderboard

Specialized models trained for specific product domains, evaluated only on their target datasets.

Model | Provider | Target Domain | Embedding Size (dims) | Input Size (px) | R@1 | R@5 | mAP@20
AEM v1 (ours) | nyris | Specialized | 768 | 336 | 32.49% | 51.60% | 35.68%

Retrieval Results Comparison Examples

Click a query image to see how each model ranked the top-5 results.

Legend: query image · correct match (same product) · incorrect match (different product)