Model Performance

- Models Evaluated: 11
- Datasets: 6
- 86.87% R@1 on Stanford Online Products
- Total Evaluations: 60
Generic Models Leaderboard
About this leaderboard: Each model must retrieve the exact correct product from a catalog of thousands or millions of images using only visual embeddings. No re-ranking, no text search, no filters, no hybrid search. This is the most difficult single-stage retrieval test. Higher is better, and small differences in this metric translate into large differences in production search quality.
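Single-stage retrieval as described above can be sketched as a pure nearest-neighbor search over embeddings: every catalog image is embedded once, and a query is answered by ranking the catalog by cosine similarity, with no re-ranking stage. A minimal sketch in NumPy (the function name and the toy embeddings are illustrative, not from the benchmark):

```python
import numpy as np

def retrieve_top_k(query_emb, catalog_embs, k=5):
    """Rank catalog images by cosine similarity to the query embedding."""
    # L2-normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    c = catalog_embs / np.linalg.norm(catalog_embs, axis=1, keepdims=True)
    sims = c @ q
    # Indices of the k most similar catalog images, best first.
    return np.argsort(-sims)[:k]

# Toy example: a 4-image catalog with 3-dim embeddings (hypothetical values).
catalog = np.array([[1.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
print(retrieve_top_k(query, catalog, k=2))  # → [0 1]
```

At production scale the exhaustive dot product would be replaced by an approximate nearest-neighbor index, but the ranking principle is the same.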
| Rank | Model | Provider | Embedding Size | Input Size | Avg. R@1 | Avg. R@5 | Avg. mAP@20 |
|---|---|---|---|---|---|---|---|
| 1 | | nyris | 768 | 336 | 56.84% | 72.35% | 48.88% |
| 2 | | Meta | 1024 | 224 | 40.58% | 58.35% | 29.73% |
| 3 | | | 1152 | 384 | 40.54% | 56.81% | 32.98% |
| 4 | | Meta | 1024 | 336 | 40.39% | 57.42% | 32.40% |
| 5 | | | 3072 | N/A | 38.87% | 56.36% | 31.92% |
| 6 | | | 1408 | N/A | 38.67% | 56.51% | 32.14% |
| 7 | | Cohere | 1536 | N/A | 33.67% | 46.76% | 27.07% |
| 8 | | Meta | 1024 | 224 | 31.53% | 45.50% | 21.47% |
| 9 | | Jina AI | 2048 | Dynamic | 27.60% | 38.70% | 19.86% |
| 10 | | Nomic AI | 2048 | Dynamic | 27.17% | 38.48% | 18.73% |
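The three columns can be computed per query from the ranked result list. R@k is 1 if any correct product appears in the top k; mAP@20 averages the precision at each rank where a correct product is found, over the top 20 results. A small sketch of these metrics (function names are mine, and the ranked IDs below are a hypothetical example, not benchmark data):

```python
def recall_at_k(ranked_ids, correct_ids, k):
    """1.0 if any correct catalog item appears in the top-k results, else 0.0."""
    return float(any(r in correct_ids for r in ranked_ids[:k]))

def average_precision_at_k(ranked_ids, correct_ids, k=20):
    """Average precision over the top-k ranked results for one query."""
    hits, score = 0, 0.0
    for i, r in enumerate(ranked_ids[:k]):
        if r in correct_ids:
            hits += 1
            score += hits / (i + 1)  # precision at this rank
    return score / min(len(correct_ids), k) if correct_ids else 0.0

# Hypothetical query: the correct product (id 7) is ranked 2nd of 5.
ranked = [3, 7, 9, 1, 4]
print(recall_at_k(ranked, {7}, 1))          # 0.0 — miss at rank 1
print(recall_at_k(ranked, {7}, 5))          # 1.0 — hit within top 5
print(average_precision_at_k(ranked, {7}))  # 0.5
```

The leaderboard values are these per-query scores averaged over all queries (and, for the "Avg." columns, over all datasets).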
Domain-Specific Models Leaderboard
Specialized models trained for specific product domains, evaluated only on their target datasets.
| Model | Provider | Target Domain | Embedding Size | Input Size | R@1 | R@5 | mAP@20 |
|---|---|---|---|---|---|---|---|
| | nyris | Specialized | 768 | 336 | 32.49% | 51.60% | 35.68% |
Retrieval Results Comparison Examples
[Interactive viewer: selecting a query image shows how each model ranked its top-5 results. Legend: query image; correct match (same product); incorrect match (different product).]