Clips-and-Connectors v1
About This Dataset
Overview
The Fasteners Dataset is one of the most challenging industrial benchmarks in our evaluation, designed specifically to test instance-level retrieval performance under extreme intra-class similarity and significant synthetic-real domain shift. It contains 12,531 unique fasteners, including screws, bolts, nuts, washers, pins, and similar components, organized into broad categories based on their type, mechanical function, and material.
Because many fasteners differ only in minute geometric or dimensional variations, this dataset represents a realistic and highly demanding use case for maintenance workflows.
Dataset Composition
The dataset is composed of two complementary subsets that form the reference and query splits used in our study.
Reference Split
The reference split consists of 200,496 synthetic CAD-rendered images, generated from the 3D models of all 12,531 fasteners. For each fastener, 16 viewpoints were rendered using a controlled multi-camera setup.
These synthetic renders provide clean, canonical representations of the parts, free of background clutter and lighting variation, so the reference imagery is consistent and noise-free.
Query Split
The query split comprises 3,624 real-world images captured for a curated subset of 453 fasteners. The number of query images per fastener varies, reflecting natural availability and usage frequency.
These images were taken under uncontrolled conditions, differing in:

- Lighting
- Perspective
- Background
- Object placement

This introduces substantial domain shift relative to the synthetic reference renders. Real-world capture conditions often include reflections, shadows, partial occlusions, or industrial workbench environments, all of which increase retrieval difficulty.
Dataset Statistics
| No. of | Reference | Query |
|---|---|---|
| Images | 200,496 | 3,624 |
| Products | 12,531 | 453 |
| Views per Product | 16 | — |
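
The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published numbers; the constant and function names are illustrative and not part of any dataset tooling. The mean number of query images per product is a derived value, since the per-product count varies.

```python
# Figures copied from the statistics table; all names here are illustrative.
REFERENCE_IMAGES = 200_496
REFERENCE_PRODUCTS = 12_531
VIEWS_PER_PRODUCT = 16
QUERY_IMAGES = 3_624
QUERY_PRODUCTS = 453


def reference_image_count(products: int, views: int) -> int:
    """Expected number of renders when every product has the same view count."""
    return products * views


def mean_queries_per_product(images: int, products: int) -> float:
    """Average number of real-world query images per queried product."""
    return images / products


def query_coverage(query_products: int, reference_products: int) -> float:
    """Fraction of reference products that have at least one query image."""
    return query_products / reference_products


if __name__ == "__main__":
    expected = reference_image_count(REFERENCE_PRODUCTS, VIEWS_PER_PRODUCT)
    print("reference images match:", expected == REFERENCE_IMAGES)
    print("mean queries per product:",
          mean_queries_per_product(QUERY_IMAGES, QUERY_PRODUCTS))
    print("query coverage:",
          round(query_coverage(QUERY_PRODUCTS, REFERENCE_PRODUCTS), 4))
```

Only about 3.6% of the reference products appear in the query split, so the remaining products act purely as distractors during retrieval.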
Model Performance on Clips-and-Connectors v1
| Rank | Model | Provider | Embedding Size | Input Size | R@1 | R@5 | mAP@20 |
|---|---|---|---|---|---|---|---|
| 1 | GEM v5.1 (ours) | nyris | 768 | 336 | 63.36% | 80.16% | 38.31% |
| 2 | DINOv3 ViT-L/16 | Meta | 1024 | 224 | 26.38% | 45.03% | 5.94% |
| 3 | DINOv2 Large | Meta | 1024 | 224 | 13.66% | 26.43% | 2.71% |
| 4 | PE-Core L/14 | Meta | 1024 | 336 | 12.14% | 26.85% | 2.37% |
| 5 | SigLIP2 SO400M | Google | 1152 | 384 | 10.57% | 23.12% | 2.16% |
| 6 | Gemini Embedding 2 | Google | 3072 | N/A | 10.29% | 23.12% | 1.85% |
| 7 | Vertex AI Multi-Modal | Google | 1408 | N/A | 8.77% | 21.80% | 1.83% |
| 8 | Nomic Embed MM 3B | Nomic AI | 2048 | Dynamic | 2.46% | 6.18% | 0.43% |
| 9 | Cohere Embed v4 | Cohere | 1536 | N/A | 2.40% | 6.68% | 0.40% |
| 10 | Jina Embeddings v4 | Jina AI | 2048 | Dynamic | 1.71% | 4.69% | 0.30% |
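
For readers unfamiliar with the metrics, the following is a minimal sketch of how R@k and mAP@20 are commonly computed for instance-level retrieval. It assumes each query yields a ranked list of reference product IDs and that a hit means the retrieved reference belongs to the same product as the query. The exact definitions used for this benchmark are not stated here; in particular, normalizing average precision by `min(k, num_relevant)` is one common convention and is an assumption, as are all function names.

```python
from typing import Dict, Sequence


def recall_at_k(ranked_labels: Sequence[str], query_label: str, k: int) -> float:
    """1.0 if any of the top-k retrieved references matches the query product."""
    return 1.0 if query_label in ranked_labels[:k] else 0.0


def average_precision_at_k(ranked_labels: Sequence[str], query_label: str,
                           k: int, num_relevant: int) -> float:
    """Average precision over the top-k results.

    Assumption: the sum of precisions at each hit is normalized by
    min(k, num_relevant); other normalizations exist.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label == query_label:
            hits += 1
            precision_sum += hits / rank
    denom = min(k, num_relevant)
    return precision_sum / denom if denom else 0.0


def evaluate(rankings: Sequence[Sequence[str]], query_labels: Sequence[str],
             num_relevant: int = 16) -> Dict[str, float]:
    """Mean R@1, R@5, and mAP@20 over all queries.

    num_relevant defaults to 16, the number of reference views per product.
    """
    n = len(query_labels)
    pairs = list(zip(rankings, query_labels))
    return {
        "R@1": sum(recall_at_k(r, q, 1) for r, q in pairs) / n,
        "R@5": sum(recall_at_k(r, q, 5) for r, q in pairs) / n,
        "mAP@20": sum(average_precision_at_k(r, q, 20, num_relevant)
                      for r, q in pairs) / n,
    }


if __name__ == "__main__":
    # Toy example: two queries, each with two relevant references.
    toy_rankings = [["a", "b", "a"], ["c", "b", "b"]]
    toy_labels = ["a", "b"]
    print(evaluate(toy_rankings, toy_labels, num_relevant=2))
```

Under this convention, a model can reach a high R@1 while mAP@20 stays low: R@1 needs only one correct view at the top, whereas mAP@20 rewards ranking many of the 16 reference views of the correct product ahead of near-identical distractors.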
Sample Images
Curated query-reference pairs from this dataset. Each row shows a query image and its matching reference images.