The original was posted on /r/stablediffusion by /u/tom83_be on 2024-09-17 18:21:14+00:00.


Update: Now runs with about 7 GB VRAM; see the updated settings below!

I posted a guide (basically working settings) for OneTrainer LoRA/DoRA training here. There was a question concerning support for 8 GB VRAM cards. I tried a few settings, and it seems to run at just below 8 GB VRAM. Since I do not own such a card, I need people with these cards to validate it (there may be spikes that I do not see).
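Since short allocation spikes are the main worry, here is a minimal monitoring sketch you could run in a second terminal while training (assumptions: an NVIDIA card with nvidia-smi on the PATH, training GPU at index 0; note that one-second polling can still miss very brief spikes):

```python
import subprocess
import time

# Poll nvidia-smi once per second and keep track of the highest VRAM reading.
# Assumption: the training card is GPU index 0; adjust --id otherwise.
peak = 0
try:
    while True:
        out = subprocess.check_output([
            "nvidia-smi", "--id=0",
            "--query-gpu=memory.used",
            "--format=csv,noheader,nounits",
        ])
        used = int(out.decode().strip())  # MiB
        peak = max(peak, used)
        print(f"current: {used} MiB | peak: {peak} MiB", end="\r")
        time.sleep(1.0)
except KeyboardInterrupt:
    print(f"\npeak VRAM observed: {peak} MiB")
```

If the reported peak stays below your card's 8192 MiB (minus whatever the system reserves), the run should fit.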

Please do the following:

  • Use the settings provided here, with the following changes (summarized in the sketch after this list):
  • Set EMA to OFF (training tab) => maybe not needed, see update below
  • Rank = 16, Alpha = 16 (LoRA tab)
  • Activate “fused back pass” in the optimizer settings (training tab); this seems to yield another ~100 MB of VRAM savings => maybe not needed, see update below
  • Set “LoRA weight data type” (LoRA tab) to bfloat16; this again saves some VRAM => maybe not needed, see update below
  • Update: You can also set “gradient checkpointing” to “CPU_OFFLOADED” in the “training” tab. After that it runs with less than 7 GB VRAM, but a bit slower for me (3.7 s/it vs. 3.4 s/it). Thanks to u/setothegreat for that idea! If you keep EMA enabled, still use float32 as the “LoRA weight data type” and also do not activate “fused back pass”, it still runs at 7.2 GB VRAM and 3.9 s/it for me. So it might be enough to just set gradient checkpointing to CPU_OFFLOADED.
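To keep the changes in one place, here is the same list as a plain Python dict. The key names are made up for readability and are not the actual OneTrainer config schema; the values are set in the GUI tabs named above:

```python
# Illustrative summary of the low-VRAM settings above. Key names are
# hypothetical (NOT the real OneTrainer config schema); set these values
# in the OneTrainer GUI tabs mentioned in the list.
low_vram_settings = {
    "training_tab": {
        "ema": False,                               # maybe not needed, see update
        "fused_back_pass": True,                    # ~100 MB saving; maybe not needed
        "gradient_checkpointing": "CPU_OFFLOADED",  # update: < 7 GB VRAM, a bit slower
    },
    "lora_tab": {
        "rank": 16,
        "alpha": 16,
        "weight_data_type": "bfloat16",             # saves some VRAM; maybe not needed
    },
}
```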

It now trains at about 7.8–7.9 GB of VRAM, just below 8 GB. I would like to get feedback from 8 GB VRAM users on whether this works.

I can also give no guarantee regarding the quality or success of the training! Let’s find out together!

PS: I am using my card for training/AI only; the operating system uses the internal GPU, so all of my VRAM is free. For 8 GB VRAM users, this might be crucial to getting it to work…
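If you want to check how much VRAM your desktop already occupies before starting a run, here is a quick sketch using the pynvml bindings (from the nvidia-ml-py package; GPU index 0 assumed):

```python
import pynvml

# Report baseline VRAM usage before training starts. With the display
# on an internal GPU (as in my setup), "used" should be close to zero.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"used:  {mem.used  / 2**20:.0f} MiB")
print(f"free:  {mem.free  / 2**20:.0f} MiB")
print(f"total: {mem.total / 2**20:.0f} MiB")
pynvml.nvmlShutdown()
```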