Lemmit.Online bot

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Jesse_marqo on 2024-10-20 21:51:41+00:00.

We have finally released the Marqo Google Shopping 10 million dataset (Marqo-GS-10M) on Hugging Face. It is one of the largest and richest datasets for multimodal product retrieval!

- 10M rows, each with a query, product title, product image and rank (1-100)
- ~100k unique queries
- ~5M unique products across fashion and home
- Reflects real-world data and use cases, and serves as a good benchmark for method development
- Proper data splits: in-domain, novel query, novel document, and novel query plus novel document
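One plausible reading of the split names is that they depend on whether an evaluation pair's query and document were seen during training. The sketch below labels a pair under that assumption. The split label strings and the helper `assign_split` are hypothetical and are not the dataset's actual split identifiers.

```python
from typing import Hashable, Set


def assign_split(query: Hashable, doc: Hashable,
                 train_queries: Set[Hashable], train_docs: Set[Hashable]) -> str:
    """Label an evaluation pair by whether its query/document appeared in training.

    Hypothetical helper: the returned labels are illustrative names,
    not the split identifiers used in Marqo-GS-10M.
    """
    seen_q = query in train_queries
    seen_d = doc in train_docs
    if seen_q and seen_d:
        return "in_domain"
    if seen_d:  # query unseen, document seen
        return "novel_query"
    if seen_q:  # query seen, document unseen
        return "novel_document"
    return "novel_query_and_document"


train_q = {"red dress", "oak table"}
train_d = {"p1", "p2"}
print(assign_split("red dress", "p1", train_q, train_d))  # in_domain
print(assign_split("blue sofa", "p2", train_q, train_d))  # novel_query
print(assign_split("oak table", "p9", train_q, train_d))  # novel_document
print(assign_split("blue sofa", "p9", train_q, train_d))  # novel_query_and_document
```

The novel query plus novel document case is the hardest setting, because the model has seen neither side of the pair.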

The dataset features detailed relevance scores for each query-document pair to facilitate future research and evaluation.
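Graded relevance like this is usually scored with a rank-aware metric such as nDCG. The sketch below assumes a hypothetical linear mapping from the 1-100 rank to a relevance value, where rank 1 gives 1.0 and rank 100 gives 0.01. The dataset's own relevance scores may be defined differently. It then computes nDCG@k for a model's retrieved list, using hypothetical product IDs.

```python
import math
from typing import Dict, List


def rank_to_relevance(rank: int) -> float:
    """Hypothetical linear mapping: rank 1 -> 1.0, rank 100 -> 0.01."""
    if not 1 <= rank <= 100:
        raise ValueError("rank must be in 1..100")
    return (101 - rank) / 100


def ndcg_at_k(retrieved: List[str], relevance: Dict[str, float], k: int) -> float:
    """nDCG@k with linear gains; documents without a judgment get relevance 0."""
    def dcg(gains: List[float]) -> float:
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    actual = dcg([relevance.get(d, 0.0) for d in retrieved])
    ideal = dcg(sorted(relevance.values(), reverse=True))
    return actual / ideal if ideal > 0 else 0.0


# Ground-truth ranks for one query (hypothetical product IDs).
ranks = {"p1": 1, "p2": 2, "p3": 3}
rel = {pid: rank_to_relevance(r) for pid, r in ranks.items()}
print(ndcg_at_k(["p1", "p2", "p3"], rel, k=3))  # 1.0 (ideal order)
print(ndcg_at_k(["p3", "p2", "p1"], rel, k=3))  # below 1.0 (reversed order)
```

The linear mapping keeps the gains in a narrow range. An exponential gain such as `2**rel - 1` is a common alternative when top positions should dominate the score.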

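# pip install datasets   (in a notebook: !pip install datasets)
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (a large download)
ds = load_dataset("Marqo/marqo-GS-10M")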
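Each of the ~100k queries contributes up to 100 ranked products (10M rows / 100k queries ≈ 100 rows per query). A common first step is therefore to regroup the rows into one ranked list per query. The sketch below does this on in-memory rows. Iterating a loaded `datasets.Dataset` split yields dicts, so its rows could be passed in the same way. The field names `"query"`, `"title"` and `"rank"` are hypothetical, so check the dataset card for the real column names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def group_by_query(rows: Iterable[Mapping]) -> Dict[str, List[str]]:
    """Collect product titles per query, ordered by rank (1 = most relevant).

    Assumes hypothetical field names "query", "title" and "rank".
    """
    buckets: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for row in rows:
        buckets[row["query"]].append((row["rank"], row["title"]))
    # Sort each bucket by rank, then keep only the titles.
    return {q: [title for _, title in sorted(items)] for q, items in buckets.items()}


rows = [
    {"query": "linen sheets", "title": "Queen linen sheet set", "rank": 2},
    {"query": "linen sheets", "title": "Washed linen duvet", "rank": 1},
    {"query": "wool coat", "title": "Camel wool overcoat", "rank": 1},
]
print(group_by_query(rows))
# {'linen sheets': ['Washed linen duvet', 'Queen linen sheet set'], 'wool coat': ['Camel wool overcoat']}
```

For the full 10M rows, grouping with the library's own batched `map` or streaming mode avoids holding every row in a Python dict at once.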

We curated this large-scale dataset as part of the publication of our training framework: Generalized Contrastive Learning (GCL).

Dataset:

GCL:

Paper:

[R] Google Shopping 10M dataset for large scale multimodal product retrieval and ranking