This is an automated archive made by the Lemmit Bot.
The original was posted on /r/machinelearning by /u/Jesse_marqo on 2024-10-20 21:51:41+00:00.
We have finally released the Marqo Google Shopping 10 million dataset (Marqo-GS-10M) on Hugging Face. It is one of the largest and richest datasets for multimodal product retrieval!
- 10M rows of query, product title, image and rank (1-100)
- ~100k unique queries
- ~5M unique products across fashion and home
- Reflects real-world data and use cases and serves as a good benchmark for method development
- Proper data splits: in-domain, novel query, novel document, and novel query with novel document
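To make the four split names concrete, here is a minimal sketch of how a held-out (query, product) pair could be labelled by whether its query and product appeared in training. This is only an illustration of what the names mean, not the procedure used to build Marqo-GS-10M; `assign_split`, `train_queries` and `train_products` are hypothetical names.

```python
def assign_split(query: str, product: str,
                 train_queries: set[str], train_products: set[str]) -> str:
    """Label a held-out pair by what the model saw during training (illustrative only)."""
    seen_query = query in train_queries
    seen_product = product in train_products
    if seen_query and seen_product:
        return "in-domain"          # both sides seen, only the pairing is new
    if seen_product:
        return "novel query"        # unseen query against seen products
    if seen_query:
        return "novel document"     # seen query against unseen products
    return "novel query and novel document"  # nothing seen in training


# Toy data, made up for this example.
train_queries = {"red dress"}
train_products = {"product-1"}
print(assign_split("blue sofa", "product-1", train_queries, train_products))
```

The last case is the hardest one, since neither the query nor the product was available at training time.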
The dataset features detailed relevance scores for each query-document pair to facilitate future research and evaluation.
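As a rough idea of how ranked pairs like these can feed an evaluation, the sketch below turns a rank (1-100) into a graded relevance value and computes NDCG@k for one query. The linear mapping in `rank_to_relevance` is an assumption made for this example, not the scoring defined by the dataset or by GCL, and both function names are hypothetical.

```python
import math


def rank_to_relevance(rank: int, max_rank: int = 100) -> float:
    """Assumed linear mapping: rank 1 -> max_rank, rank max_rank -> 1."""
    return float(max_rank - rank + 1)


def ndcg_at_k(relevances: list[float], k: int) -> float:
    """NDCG@k for one query; `relevances` is ordered as the model ranked the products."""
    def dcg(rels: list[float]) -> float:
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Toy example: the model returned products whose ground-truth ranks were 3, 1 and 2.
predicted = [rank_to_relevance(r) for r in (3, 1, 2)]
print(ndcg_at_k(predicted, k=3))
```

A model that returns products in exactly the ground-truth order scores 1.0; any other ordering of distinct relevance values scores lower.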
!pip install datasets
from datasets import load_dataset
ds = load_dataset("Marqo/marqo-GS-10M")
We curated this large-scale dataset as part of the publication of our training framework: Generalized Contrastive Learning (GCL).
Dataset:
GCL:
Paper: