This is an automated archive made by the Lemmit Bot.
The original was posted on /r/stablediffusion by /u/Corleone11 on 2024-11-20 13:03:41+00:00.
Hi all,
Over the past year I've created a lot of (character) LoRAs with OneTrainer, so this guide covers training realistic LoRAs of humans - a concept probably already known to all SD base models. It's a quick tutorial on how I go about it to get very good results. I don't have a programming background, and I don't always know the ins and outs of why a certain setting works. But through a lot of testing I found out what works and what doesn't - at least for me. :)
I also won't go over every single UI feature of OneTrainer - it should be self-explanatory. Also check out YouTube, where you can find a few videos about the basic setup and layout.
1. Prepare Your Dataset (This Is Critical!)
- Curate High-Quality Images: Aim for about 50 images, ensuring a mix of close-ups, upper-body shots, and full-body photos. Only use high-quality images; discard blurry or poorly detailed ones. If an image is slightly blurry, try enhancing it with tools like SUPIR before including it in your dataset. The minimum resolution should be 1024x1024.
- Avoid images with strange poses and too much clutter. Think of it this way: it's easier to describe an image to someone where "a man is standing and has his arm to the side". It gets more complicated if you describe a picture of "a man, standing on one leg, knees bent, one leg sticking out behind, head turned to the right, doing two peace signs with one hand…". I found that too many "crazy" images quickly bias the data and decrease the flexibility of your LoRA.
- Aspect Ratio Buckets: To avoid losing data during training, edit images so they conform to just 2–3 aspect ratios (e.g., 4:3 and 16:9). Ensure the number of images in each bucket is divisible by your batch size (e.g., 2, 4, etc.). If you have an uneven number of images, either modify an image from another bucket to match the desired ratio or remove the weakest image.
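If you want to sanity-check the buckets before training, a small script can do it. Below is a minimal sketch, assuming your images live in a `dataset` folder and you use the 4:3 and 16:9 buckets with a batch size of 2 from the examples above - adjust those constants to your own setup.

```python
# Minimal sketch: report which bucket each image falls into and whether
# every bucket's image count is divisible by the batch size.
from collections import Counter
from pathlib import Path

from PIL import Image  # pip install pillow

ALLOWED_RATIOS = {"4:3": 4 / 3, "16:9": 16 / 9}  # example buckets from above
BATCH_SIZE = 2
TOLERANCE = 0.02  # allow tiny deviations from the exact ratio

counts = Counter()
for path in sorted(Path("dataset").iterdir()):
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
        continue
    with Image.open(path) as img:
        ratio = img.width / img.height
    bucket = next((name for name, r in ALLOWED_RATIOS.items()
                   if abs(ratio - r) < TOLERANCE), None)
    if bucket is None:
        print(f"{path.name}: ratio {ratio:.2f} fits no bucket -> crop or drop it")
    else:
        counts[bucket] += 1

for bucket, n in counts.items():
    note = "OK" if n % BATCH_SIZE == 0 else f"NOT divisible by {BATCH_SIZE}"
    print(f"{bucket}: {n} images ({note})")
```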
2. Caption the Dataset
- Use JoyCaption for Automation: Generate natural-language captions for your images but manually edit each text file for clarity. Keep descriptions simple and factual, removing ambiguous or atmospheric details. For example, replace: “A man standing in a serene setting with a blurred background.” with: “A man standing with a blurred background.”
- Be mindful of the words you use when describing the image, because they will also impact other aspects of the image when prompting. For example, "hair up" can also affect the person's legs, because the word "up" is used in many different contexts.
- Unique Tokens: Avoid using real-world names that the base model might associate with existing people or concepts. Instead, use unique tokens like “Photo of a df4gf man.” This helps prevent the model from bleeding unrelated features into your LoRA. Experiment to find what works best for your use case.
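For the manual cleanup pass, a small script can at least strip filler words and flag missing trigger tokens before you read through each file. A minimal sketch, assuming the captions are .txt files next to the images in a `dataset` folder; the `df4gf` token comes from the example above, while the filler-word list is just a hypothetical starting point:

```python
# Minimal sketch: crudely strip hypothetical filler words from JoyCaption
# .txt files and print the result so every caption still gets a manual review.
from pathlib import Path

FILLERS = ["serene", "atmospheric", "majestic"]  # hypothetical examples
TRIGGER = "df4gf"  # the unique token from the example above

for txt in sorted(Path("dataset").glob("*.txt")):
    caption = txt.read_text(encoding="utf-8").strip()
    for word in FILLERS:
        # crude word removal; always re-read the caption afterwards
        caption = caption.replace(f"{word} ", "").replace(f" {word}", "")
    if TRIGGER not in caption:
        print(f"{txt.name}: missing trigger token '{TRIGGER}'")
    txt.write_text(caption + "\n", encoding="utf-8")
    print(f"{txt.name}: {caption}")
```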
3. Configure OneTrainer
Once your dataset is ready, open OneTrainer and follow these steps:
- Load the Template: Select the SDXL LoRA template from the dropdown menu.
- Choose the Checkpoint: Train using the base SDXL model for maximum flexibility when combining the LoRA with other checkpoints. This approach has worked well in my experience. Other photorealistic checkpoints can be used as well, but results vary from checkpoint to checkpoint.
4. Add Your Training Concept
- Input Training Data: Add your folder containing the images and caption files as your “concept.”
- Set Repeats: Leave repeats at 1. We’ll adjust training steps later by setting epochs instead.
- Disable Augmentations: Turn off all image augmentation options in the second tab of your concept.
5. Adjust Training Parameters
- Scheduler and Optimizer: Use the "Prodigy" optimizer with the "Cosine" scheduler for automatic learning rate adjustment (see the first sketch after this list). Refer to the OneTrainer wiki for the specific Prodigy settings.
- Epochs: Train for about 100 epochs (adjust based on the size of your dataset). I usually aim for roughly 1500-2600 total steps, depending on the dataset (the second sketch after this list shows the arithmetic).
- Batch Size: Set the batch size to 2. This trains two images per step and ensures the steps per epoch align with your bucket sizes. For example, if you have 20 images, training with a batch size of 2 results in 10 steps per epoch.
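OneTrainer wires the optimizer and scheduler up for you, but if you're curious what the combination looks like, here is a minimal plain-PyTorch sketch, assuming the `prodigyopt` package (`pip install prodigyopt`). It also shows why the learning rate is set to 1 in the next section:

```python
# Minimal sketch of Prodigy + cosine schedule in plain PyTorch. Prodigy
# estimates the step size itself, so lr=1.0 acts as a neutral multiplier.
import torch
from prodigyopt import Prodigy

model = torch.nn.Linear(8, 8)  # stand-in for the LoRA weights
optimizer = Prodigy(model.parameters(), lr=1.0)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

for step in range(1000):
    loss = model(torch.randn(2, 8)).pow(2).mean()  # dummy loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```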
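And the step arithmetic from the epoch and batch-size items above, as a quick sanity check - total steps = (images / batch size) * epochs, which should land somewhere in the ~1500-2600 range:

```python
# Quick sanity check of the numbers used in this guide.
def total_steps(num_images: int, batch_size: int, epochs: int) -> int:
    steps_per_epoch = num_images // batch_size  # batch_size images per step
    return steps_per_epoch * epochs

print(total_steps(20, 2, 100))  # 1000 -> consider training more epochs
print(total_steps(50, 2, 100))  # 2500 -> inside the target range
```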
6. Set the UNet Configuration
- Train UNet Only: Disable all settings under “Text Encoder 1” and “Text Encoder 2.” Focus exclusively on the UNet.
- Learning Rate: Set the UNet learning rate to 1. Prodigy adapts the effective rate on its own, so 1 acts as a neutral multiplier.
- EMA: Turn off EMA (Exponential Moving Average).
7. Additional Settings
- Sampling: Generate samples every 10 epochs to monitor progress.
- Checkpoints: Save checkpoints every 10 epochs instead of relying on backups.
- LoRA Settings: Set both "Rank" and "Alpha" to 32 (the sketch after this list shows how the two interact).
- Optionally, toggle on Decompose Weights (DoRA) to enhance smaller details. More testing might be necessary, but so far I've definitely seen improved results with it.
- Sample prompts: I specifically use prompts that describe details that don't appear in my training data, for example different backgrounds, different clothing, etc.
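To see why rank and alpha are set to the same value, here is a minimal sketch of the common LoRA formulation W' = W + (alpha / rank) * (B @ A) - with rank = alpha = 32 the scaling factor is exactly 1, so the learned update is applied unscaled. The 320 dimension below is just an illustrative SDXL UNet channel width:

```python
# Minimal sketch of a LoRA update, assuming the common formulation
# W' = W + (alpha / rank) * (B @ A).
import torch

rank, alpha, dim = 32, 32, 320  # 320 is just an example channel width
A = torch.randn(rank, dim) * 0.01  # down-projection
B = torch.zeros(dim, rank)         # up-projection, starts at zero
scale = alpha / rank               # 32 / 32 = 1.0, i.e. no extra scaling

delta_W = scale * (B @ A)  # the low-rank update added to the frozen weight
print(delta_W.shape, scale)  # torch.Size([320, 320]) 1.0
```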
8. Start Training
- Begin the training process and monitor the sample images. If they don't start resembling your subject after about 20 epochs, revisit your dataset or settings for potential issues. If the samples are grey, weird, and distorted right from the start, something is definitely off.
Final Tips:
- Dataset Curation Matters: Invest time upfront to ensure your dataset is clean and well-prepared. This saves troubleshooting later.
- Stay Consistent: Keep the number of images in each bucket divisible by your batch size to maximize training efficiency. If this isn't possible, balance the buckets by editing or discarding images strategically.
- Overfitting: I noticed that it isn't always obvious that a LoRA has overfitted during training. The most obvious indication is distorted faces, but in other cases the faces look good while the model fails to adhere to prompts that require poses outside what your training pictures show. Don't hesitate to try earlier-epoch checkpoints to see whether the flexibility is as desired.
Happy training!