
Stanford Online Products

Classic academic benchmark with product images from eBay across 12 categories.

E-commerce · Intra-Retrieval · Closed-Set Gallery · Public

Query Images: 60,502
Gallery Images: 60,502
Products (Query): 11,316
Products (Gallery): 11,316
Source: Academic

About This Dataset

Overview

The Stanford Online Products (SOP) dataset is a widely used benchmark for deep metric learning and instance-level image retrieval. It contains 120,053 images of 22,634 products across 12 broad categories, collected from eBay listings. The categories cover furniture and household goods (bicycle, cabinet, chair, coffee maker, fan, kettle, lamp, mug, sofa, stapler, table, and toaster), and the seller-provided photos vary substantially in viewpoint, background, and lighting.

SOP is particularly challenging because each product is represented by only a few images (about 5.3 on average), so models must learn fine-grained distinctions between visually similar products rather than rely on broad category differences.

Dataset Composition

The dataset is split roughly evenly by product into a training and a test set. The first 11,318 products (59,551 images) are designated for training, and the remaining 11,316 products (60,502 images) are reserved for testing.

In this study, we focus exclusively on the test split and evaluate models using a self-retrieval protocol: the same 60,502 test images serve as both queries and gallery, and each query image must retrieve the other images of the same product instance.
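The self-retrieval protocol can be sketched as follows. This is a minimal illustration, not the evaluation code behind the reported numbers: the function name `recall_at_k`, the use of cosine similarity, the exclusion of each query from its own result list, and the toy embeddings are all assumptions made for the example.

```python
import numpy as np

def recall_at_k(embeddings, labels, ks=(1, 5)):
    """Self-retrieval Recall@K: every image queries all other images."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    # L2-normalise so the dot product equals cosine similarity (assumed metric).
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    # Assumption: a query may not retrieve itself, so mask the diagonal.
    np.fill_diagonal(sims, -np.inf)
    max_k = max(ks)
    # Indices of the max_k most similar gallery images per query, best first.
    top = np.argsort(-sims, axis=1)[:, :max_k]
    # hits[i, j] is True when the j-th result for query i is the same product.
    hits = labels[top] == labels[:, None]
    return {k: float(hits[:, :k].any(axis=1).mean()) for k in ks}

# Toy data: five 2-D embeddings and their (hypothetical) product labels.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.6]]
toy_labels = [0, 0, 1, 1, 1]
scores = recall_at_k(toy_embeddings, toy_labels, ks=(1, 3))
print(scores)
```

In the toy data, four of the five queries find a same-product image at rank 1, while the last one only does so at rank 3. For the full test split, the dense 60,502 x 60,502 similarity matrix would be large, so a real evaluation would likely process queries in batches or use a nearest-neighbour index.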

Dataset Statistics

Number of    | Train  | Test
-------------|--------|-------
Images       | 59,551 | 60,502
Categories   | 12     | 12
Products     | 11,318 | 11,316
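As a quick arithmetic check, the split statistics can be combined to recover the dataset totals and the average number of images per product. The snippet below uses only the counts listed above; the variable names are illustrative.

```python
# Counts taken from the dataset statistics above.
train = {"images": 59_551, "products": 11_318}
test = {"images": 60_502, "products": 11_316}

total_images = train["images"] + test["images"]
total_products = train["products"] + test["products"]

# Average number of images available per product instance.
avg_overall = total_images / total_products
avg_train = train["images"] / train["products"]
avg_test = test["images"] / test["products"]

print(f"{total_images:,} images, {total_products:,} products")
print(f"images per product: overall {avg_overall:.2f}, "
      f"train {avg_train:.2f}, test {avg_test:.2f}")
```

The totals match the 120,053 images and 22,634 products stated in the overview, and the averages of roughly five images per product show how little per-instance data a model has to work with.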

References

  1. Oh Song, Hyun, et al. "Deep metric learning via lifted structured feature embedding." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

Model Performance on Stanford Online Products

Rank | Model                 | Provider | Embedding Size | Input Size (px) | R@1    | R@5    | mAP@20
-----|-----------------------|----------|----------------|-----------------|--------|--------|-------
1    | GEM v5.1 (ours)       | nyris    | 768            | 336             | 86.87% | 94.17% | 72.45%
2    | SigLIP2 SO400M        | Google   | 1152           | 384             | 80.28% | 90.01% | 60.79%
3    | PE-Core L/14          | Meta     | 1024           | 336             | 80.09% | 89.83% | 59.46%
4    | Vertex AI Multi-Modal | Google   | 1408           | N/A             | 76.88% | 87.73% | 56.32%
5    | Gemini Embedding 2    | Google   | 3072           | N/A             | 75.63% | 86.69% | 55.13%
6    | Cohere Embed v4       | Cohere   | 1536           | N/A             | 68.00% | 79.76% | 45.09%
7    | DINOv3 ViT-L/16       | Meta     | 1024           | 224             | 66.61% | 77.92% | 42.47%
8    | Jina Embeddings v4    | Jina AI  | 2048           | Dynamic         | 59.48% | 72.24% | 35.15%
9    | Nomic Embed MM 3B     | Nomic AI | 2048           | Dynamic         | 56.92% | 69.34% | 32.60%
10   | DINOv2 Large          | Meta     | 1024           | 224             | 56.34% | 67.74% | 31.91%
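The mAP@20 column can be read with the following sketch in mind. The page does not state which mAP@K variant was used, so the normalisation here (dividing by the smaller of K and the number of relevant gallery images) is an assumption, as are the function names.

```python
import numpy as np

def average_precision_at_k(relevant, k=20, num_relevant=None):
    """AP@k for one ranked list of 0/1 relevance flags (best match first)."""
    rel = np.asarray(relevant, dtype=np.float64)[:k]
    if num_relevant is None:
        # Assumption: the list covers the whole gallery, so it holds every match.
        num_relevant = int(np.sum(relevant))
    denom = min(k, num_relevant)
    if denom == 0:
        return 0.0
    # Precision at each rank, counted only at ranks that hold a correct match.
    precision_at_i = np.cumsum(rel) / np.arange(1, rel.size + 1)
    return float(np.sum(precision_at_i * rel) / denom)

def mean_average_precision_at_k(ranked_relevance, k=20):
    """Mean of AP@k over all queries."""
    return float(np.mean([average_precision_at_k(r, k) for r in ranked_relevance]))

# One query whose 1st and 3rd results are correct, one whose 2nd result is.
ap = average_precision_at_k([1, 0, 1, 0], k=20)
map20 = mean_average_precision_at_k([[1, 0, 1, 0], [0, 1]], k=20)
print(ap, map20)
```

Unlike R@1 and R@5, which only ask whether any correct match appears near the top, this metric rewards ranking all of a product's images early, which is consistent with the mAP@20 values in the table being lower than the recall values.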

Sample Images

Curated query-reference pairs from this dataset. Each row shows a query image and its matching reference images.
