Embedding Models
Generic Models
General-purpose embedding models evaluated across all benchmark datasets.
| Rank | Model | Accessibility | Modality | Embedding Size | Input Size | Avg. R@1 | Avg. R@5 | Avg. mAP@20 |
|---|---|---|---|---|---|---|---|---|
| 1 | GEM v5.1 (ours) | In-House | Vision | 768 | 336 | 56.84% | 72.35% | 48.88% |
| 2 | DINOv3 ViT-L/16 | Open Source | Vision | 1024 | 224 | 40.58% | 58.35% | 29.73% |
| 3 | SigLIP2 SO400M | Open Source | Vision | 1152 | 384 | 40.54% | 56.81% | 32.98% |
| 4 | PE-Core L/14 | Open Source | Vision | 1024 | 336 | 40.39% | 57.42% | 32.40% |
| 5 | Gemini Embedding 2 | Commercial | Multi-Modal | 3072 | N/A | 38.87% | 56.36% | 31.92% |
| 6 | Vertex AI Multi-Modal | Commercial | Multi-Modal | 1408 | N/A | 38.67% | 56.51% | 32.14% |
| 7 | Cohere Embed v4 | Commercial | Multi-Modal | 1536 | N/A | 33.67% | 46.76% | 27.07% |
| 8 | DINOv2 Large | Open Source | Vision | 1024 | 224 | 31.53% | 45.50% | 21.47% |
| 9 | Jina Embeddings v4 | Open Source | Multi-Modal | 2048 | Dynamic | 27.60% | 38.70% | 19.86% |
| 10 | Nomic Embed MM 3B | Open Source | Multi-Modal | 2048 | Dynamic | 27.17% | 38.48% | 18.73% |
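
The page does not spell out how R@1, R@5, and mAP@20 are computed, so the following is only a minimal sketch of one common convention for retrieval metrics. It assumes each query has a single ground-truth label and that a retrieved item counts as relevant when its label matches. The function names and the AP normalization (dividing by the number of hits found in the top k) are illustrative assumptions, not the benchmark's actual evaluation code.

```python
from typing import Sequence


def recall_at_k(ranked_labels: Sequence[str], query_label: str, k: int) -> float:
    """Return 1.0 if any of the top-k results matches the query label, else 0.0."""
    return float(query_label in ranked_labels[:k])


def average_precision_at_k(ranked_labels: Sequence[str], query_label: str, k: int) -> float:
    """Average the precision values at each rank (within the top k) that holds a match.

    Assumption: normalize by the number of matches found in the top k.
    Other conventions divide by min(k, total relevant items) instead.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean(scores: Sequence[float]) -> float:
    """Average per-query scores to get a dataset-level R@k or mAP@k."""
    return sum(scores) / len(scores) if scores else 0.0


# Two toy queries, each with the ranked labels of its retrieved items.
queries = [
    ("a", ["a", "b", "a", "c"]),
    ("a", ["b", "a"]),
]
r_at_1 = mean([recall_at_k(ranked, label, 1) for label, ranked in queries])
map_at_20 = mean([average_precision_at_k(ranked, label, 20) for label, ranked in queries])
```

Under these assumptions, the "Avg." columns would presumably be the same per-dataset scores averaged once more across all benchmark datasets.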
Domain-Specific Models
Specialized models trained for specific product domains, evaluated only on their target datasets.
| Model | Accessibility | Target Domain | Embedding Size | Input Size | R@1 | R@5 | mAP@20 |
|---|---|---|---|---|---|---|---|
| AEM v1 (ours) | In-House | Automotive | 768 | 336 | 32.49% | 51.60% | 35.68% |
Model Accessibility
Open Source: Model weights are freely available and can be deployed on your own infrastructure, which gives flexibility and control over the inference pipeline.
Commercial: Third-party models accessed via API or commercial license (e.g. Vertex AI, Cohere), typically hosted and maintained by the provider.
In-House: Models developed and operated in-house (e.g. GEM, AEM) and optimized for specific product search and domain workflows.
Input Resolution
N/A: The input resolution is not stated in the model's official documentation or specifications.
Dynamic: The input size is adjusted per image according to the model's resizing scheme and the image's resolution.
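
To make the idea of a dynamic input size concrete, here is a hypothetical sketch of one possible resizing scheme: shrink the image to fit a pixel budget while keeping its aspect ratio, then snap each side down to a multiple of the patch size. The function name, the `patch` value, and the `max_pixels` budget are all assumptions chosen for illustration. The listed models' actual resizing rules are not described on this page and may differ.

```python
import math


def dynamic_input_size(width: int, height: int,
                       patch: int = 14,
                       max_pixels: int = 448 * 448) -> tuple[int, int]:
    """Hypothetical scheme: fit a pixel budget, then round sides down to patch multiples."""
    # Only shrink (never upscale) so the area stays within the assumed budget.
    scale = min(1.0, math.sqrt(max_pixels / (width * height)))
    # Snap each side down to a multiple of the patch size, keeping at least one patch.
    new_w = max(patch, math.floor(width * scale / patch) * patch)
    new_h = max(patch, math.floor(height * scale / patch) * patch)
    return new_w, new_h


small = dynamic_input_size(224, 224)    # already within budget and patch-aligned
large = dynamic_input_size(1000, 500)   # shrunk to fit the budget, aspect ratio roughly kept
```

With a scheme like this, two images of different shapes produce different input sizes, which is why a single fixed number cannot be reported in the Input Size column.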