This is an automated archive made by the Lemmit Bot.
The original was posted on /r/machinelearning by /u/datashri on 2025-06-21 05:46:46+00:00.
I’m reading through the Qwen2 paper.
Something escapes my limited comprehension. From Section 3.1:
… the pre-training data was expanded from 3 trillion tokens in Qwen1.5 (Qwen Team, 2024a) to 7 trillion tokens. An attempt to further relax the quality threshold resulted in a 12 trillion token dataset. However, the model trained on this dataset did not show a significant performance improvement over the 7 trillion token model. It is suspected that increasing the volume of data does not necessarily benefit model pre-training.
So a smaller, higher-quality dataset is better. Got it.
But then:
All Qwen2 dense models, excluding Qwen2-0.5B, were pre-trained on this large-scale dataset of over 7 trillion tokens. Qwen2-0.5B were pre-trained using the 12 trillion token dataset.
How is it conceivable to train that tiny model on the humongous but lower-quality dataset? My modest intellect feels borderline abused.
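To put my confusion in numbers, here is a rough back-of-the-envelope sketch (the dense model sizes are the ones from the Qwen2 release; the ~20 tokens/parameter "compute-optimal" figure is the Chinchilla rule of thumb, not something stated in the Qwen2 paper):

```python
# Rough tokens-per-parameter ratios for the dataset sizes mentioned in Section 3.1.
# Model sizes are the Qwen2 dense variants; the ~20 tokens/param "compute-optimal"
# reference point comes from the Chinchilla paper, not from Qwen2 itself.

DATASETS = {"7T (higher quality)": 7e12, "12T (relaxed quality)": 12e12}
MODELS = {"Qwen2-0.5B": 0.5e9, "Qwen2-1.5B": 1.5e9, "Qwen2-7B": 7e9, "Qwen2-72B": 72e9}

for model, params in MODELS.items():
    for name, tokens in DATASETS.items():
        ratio = tokens / params
        print(f"{model} on {name}: {ratio:,.0f} tokens/parameter")

# Qwen2-0.5B on the 12T set works out to ~24,000 tokens/parameter,
# i.e. over 1000x the ~20 tokens/param Chinchilla rule of thumb.
```

Even on the 7T set, the 0.5B model is already thousands of tokens per parameter, so I suspect the quality/quantity trade-off looks different at that scale, but I'd like to understand the actual reasoning.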
Appreciate any tips to guide my understanding.