Clips-and-Connectors v1

Overview

The Fasteners Dataset is one of the most challenging industrial benchmarks in our evaluation, designed specifically to test instance-level retrieval performance under extreme intra-class similarity and significant synthetic-to-real domain shift. It contains 12,531 unique fasteners, including screws, bolts, nuts, washers, pins, and similar components, organized into broad categories by type, mechanical function, and material.

Because many fasteners differ only in minute geometric or dimensional variations, this dataset represents a realistic and highly demanding use case for maintenance workflows.

Dataset Composition

The dataset is composed of two complementary subsets that form the reference and query splits used in our study.

Reference Split

The reference split consists of 200,496 synthetic CAD-rendered images, generated from the 3D models of all 12,531 fasteners. For each fastener, 16 viewpoints were rendered using a controlled multi-camera setup.

These synthetic renders provide clean, canonical representations of the parts, free of background clutter and lighting variation, so the reference imagery is consistent and noise-free.

Query Split

The query split comprises 3,624 real-world images captured for a curated subset of 453 fasteners. The number of query images per fastener varies, reflecting natural availability and usage frequency.

These images were taken under uncontrolled conditions, differing in:

- Lighting
- Perspective
- Background
- Object placement

This introduces substantial domain shift relative to the synthetic reference renders. Real-world capture conditions often include reflections, shadows, partial occlusions, or industrial workbench environments, all of which increase retrieval difficulty.
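The retrieval task described above can be pictured with a minimal sketch: each real-world query image is embedded, compared against the embeddings of the synthetic reference renders, and the product IDs of the closest references are returned. The source does not state which similarity measure or search procedure is used, so cosine similarity with brute-force search is an assumption here, and the helper names (`l2_normalize`, `retrieve_top_k`) and the toy 2-D embeddings are hypothetical.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

def retrieve_top_k(query_emb, ref_emb, ref_product_ids, k=5):
    """Return the product IDs of the k most similar reference images per query.

    Assumption: cosine similarity with exhaustive search; the benchmark's
    actual search setup is not described in the text.
    """
    sims = l2_normalize(query_emb) @ l2_normalize(ref_emb).T  # (queries, refs)
    order = np.argsort(-sims, axis=1, kind="stable")[:, :k]   # best match first
    return ref_product_ids[order]

# Toy data: 3 products with 2 rendered views each (the dataset uses 16 views).
ref_emb = np.array([[1.0, 0.0], [0.9, 0.1],      # product 0
                    [0.0, 1.0], [0.1, 0.9],      # product 1
                    [-1.0, 0.0], [-0.9, -0.1]])  # product 2
ref_product_ids = np.array([0, 0, 1, 1, 2, 2])
query_emb = np.array([[0.8, 0.2], [0.2, 0.7]])   # two "real-world" queries

top = retrieve_top_k(query_emb, ref_emb, ref_product_ids, k=3)
print(top)
```

Because several reference views map to the same product, the same product ID can appear more than once in a ranked list; how duplicates are treated affects the metrics and is a design choice the text leaves open.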

Dataset Statistics

No. of	Reference	Query
Images	200,496	3,624
Products	12,531	453
Views per Product	16	—
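As a quick consistency check, the figures in the table can be recomputed from one another. The snippet below is only an arithmetic sketch using the numbers reported above; the variable names are illustrative.

```python
# Figures taken from the Dataset Statistics table.
reference_images = 200_496
reference_products = 12_531
views_per_product = 16
query_images = 3_624
query_products = 453

# Every fastener contributes the same number of rendered views.
assert reference_products * views_per_product == reference_images

# Average number of real-world query images per covered fastener.
mean_queries_per_product = query_images / query_products

# Share of the catalogue that has at least one real-world query image.
query_coverage = query_products / reference_products

print(f"mean queries per product: {mean_queries_per_product:.1f}")
print(f"query coverage: {query_coverage:.1%}")
```

The per-fastener query count varies in practice, so the mean is only a summary; the coverage figure shows that most reference products act purely as distractors during retrieval.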

Model Performance on Clips-and-Connectors v1

Rank	Model	Provider	Embedding Size	Input Size	R@1	R@5	mAP@20
1	GEM v5.1 (ours)	nyris	768	336	63.36%	80.16%	38.31%
2	DINOv3 ViT-L/16	Meta	1024	224	26.38%	45.03%	5.94%
3	DINOv2 Large	Meta	1024	224	13.66%	26.43%	2.71%
4	PE-Core L/14	Meta	1024	336	12.14%	26.85%	2.37%
5	SigLIP2 SO400M	Google	1152	384	10.57%	23.12%	2.16%
6	Gemini Embedding 2	Google	3072	N/A	10.29%	23.12%	1.85%
7	Vertex AI Multi-Modal	Google	1408	N/A	8.77%	21.80%	1.83%
8	Nomic Embed MM 3B	Nomic AI	2048	Dynamic	2.46%	6.18%	0.43%
9	Cohere Embed v4	Cohere	1536	N/A	2.40%	6.68%	0.40%
10	Jina Embeddings v4	Jina AI	2048	Dynamic	1.71%	4.69%	0.30%
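The R@1, R@5, and mAP@20 columns are standard retrieval metrics, but the exact definitions used for this table are not stated. The sketch below shows one common convention, as an assumption: R@k is the fraction of queries with at least one correct product among the top k results, and AP@k is normalized by min(k, number of relevant references). The function names and the toy rankings are hypothetical.

```python
import numpy as np

def recall_at_k(ranked_ids: np.ndarray, true_ids: np.ndarray, k: int) -> float:
    """Fraction of queries whose correct product appears in the top k results."""
    hits = (ranked_ids[:, :k] == true_ids[:, None]).any(axis=1)
    return float(hits.mean())

def map_at_k(ranked_ids: np.ndarray, true_ids: np.ndarray,
             k: int, n_relevant: int) -> float:
    """Mean average precision over the top k results.

    Assumption: each query has n_relevant matching references and AP is
    normalized by min(k, n_relevant); other normalizations exist.
    """
    rel = (ranked_ids[:, :k] == true_ids[:, None]).astype(float)  # (Q, k)
    ranks = np.arange(1, rel.shape[1] + 1)
    precision = np.cumsum(rel, axis=1) / ranks  # precision at each rank
    ap = (precision * rel).sum(axis=1) / min(k, n_relevant)
    return float(ap.mean())

# Toy rankings: product IDs of the retrieved references, best match first.
ranked_ids = np.array([[0, 1, 0, 2],
                       [2, 1, 1, 0]])
true_ids = np.array([0, 1])  # ground-truth product for each query

r_at_1 = recall_at_k(ranked_ids, true_ids, k=1)
r_at_2 = recall_at_k(ranked_ids, true_ids, k=2)
map_at_4 = map_at_k(ranked_ids, true_ids, k=4, n_relevant=2)
print(r_at_1, r_at_2, round(map_at_4, 4))
```

Under this convention, a dataset with 16 reference views per product would normalize AP@20 by 16, which rewards models that rank many views of the correct fastener near the top rather than just one.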
