This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/lambda-research on 2024-10-16 18:57:34+00:00.


Hey All,

We’ve been writing a technical guide on how to scale training code from a single GPU all the way up to multiple nodes.

It’s centered on training LLMs and covers DDP (DistributedDataParallel), FSDP (FullyShardedDataParallel), diagnosing errors, logging, and more.

We tried to make the code and explanations as clear and simple as possible. Let us know if you find it helpful!

Contributions are welcome, and feel free to open issues with feature requests or bug reports.