This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/YentaMagenta on 2024-11-19 02:07:57+00:00.


TLDR: Reduce CFG/Guidance and/or increase sampling flux. Avoid using “woman” or “man.” Don’t use cliche incantations. Specify a nationality/ethnicity and/or major city of origin. Specify a body type. Try a variety of descriptors for face shape and expression. Add an age. All of this is non-scientific, anec-datal advice based on personal experience and limited testing. Here is a zipped set of somewhat random images that illustrate these techniques and their level of effectiveness, with embedded ComfyUI workflows.

Background (skip to My Recommendations, if you like)

Many people in this sub have concluded that same-face is uniquely bad for Flux; but it is a problem to some degree for most models. And some very popular quasi-base models, like Pony, are extremely prone to it (depending on the exact checkpoint). This is, of course, because of how they are trained, the biases that get baked in, the limits of captioning, etc.

It can take work to avoid same face in many situations because models have to train on huge amounts of images from across the internet, and those data typically reflect biases toward people who are white, “conventionally attractive,” fit, etc. There’s also the related issue that an image of a white male doctor will often just be labeled “doctor” while an image of a Black, woman doctor will often carry those additional qualifiers. Another issue is that facial averages tend to converge in a way that most people regard as attractive.

But—and here’s the part where I’m most likely to get downvoted—it is also a problem because people tend to use the same words and incantations over and over again. If you prompt “beautiful young woman,” you shouldn’t be surprised when you get something that looks like the facial average of white supermodels and movie stars, because those tend to be the dominant images tagged “beautiful,” “young,” and “woman” on the interwebs. Without being very intentional about addressing these biases, AI captioning systems and humans remain conditioned to describe people who have that supermodel look in similar fashion.

My Recommendations

Reduce Guidance Strength—this is the most important tip (which is also good for photorealism). As most of us know, higher CFG is a little bit like saying to the model “follow my words even more exactly.” But this also means that the output is going to stick even more precisely to whatever concept the model has of the terms you included. In the case of “woman,” this means even closer to the model’s internal Platonic ideal of that big-eyed, butt-chinned, high-cheeked woman. Similarly, increasing the sampling flux can give the model additional freedom to deviate from the dreaded same face—though it may also stray even further from your prompt. But seriously, lowering your CFG will pretty much automatically increase facial and other forms of diversity.

It’s actually odd to me that Flux generations/workflows often seem to default to a Flux Guidance of 3.5. Unless you’re trying to capture a lot of very specific details, this is quite excessive and often counterproductive because it will degrade image quality. And even with a lot of prompt details, lower CFGs will usually do just fine. Flux also achieves much greater photorealism at lower guidance. I find that 1.6-2.6 is the sweet spot, and even dropping as low as 1.4 can sometimes still yield decent photographic results. For artistic styles (especially abstract) you can try even lower guidance. (And as always, try different schedulers/samplers, too.)
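If you want to test guidance values a bit more systematically, here is a minimal Python sketch that lays out a seed-by-guidance grid across the 1.6-2.6 range. It only builds a list of settings. The name `guidance_sweep`, the 0.2 step, and the seed values are my own placeholders, not part of any Flux or ComfyUI API; you would feed each entry into whatever workflow you actually use.

```python
from itertools import product


def guidance_sweep(low=1.6, high=2.6, step=0.2, seeds=(1, 2, 3)):
    """Return one settings dict per (seed, guidance) pair in [low, high]."""
    # Count the steps up front so float drift cannot drop the last value.
    count = int(round((high - low) / step)) + 1
    values = [round(low + i * step, 2) for i in range(count)]
    return [{"seed": s, "guidance": g} for s, g in product(seeds, values)]


runs = guidance_sweep()
for run in runs:
    # Hypothetical hand-off point: replace print with your own generation call.
    print(run)
```

Holding the seed fixed while the guidance changes makes it easier to see what the guidance value alone does to the face.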

Avoid “man”/“woman.” It appears people are correct about Flux’s limitations insofar as it seems to have especially rigid understandings of the exact terms “woman” and “man,” absent other qualifiers. This is where we find always-bearded men and butt chins for people of all genders. So, instead of woman, try female, gal, or lady. Instead of man, try male, guy, or dude. Or just skip those words entirely and let the gender be purely implied by the he/his or she/her pronouns you use in the rest of the description. Words like father, uncle, brother, aunt, nephew, niece can also help avoid same-face and jazz things up. Using a profession like librarian, plumber, consultant, realtor, etc. can also introduce variety.
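If you batch-generate from existing prompts, the word swap can be automated. This is a rough sketch, assuming plain-text prompts; `swap_gender_terms` and the `ALTERNATIVES` table are hypothetical names of mine, and the replacement words are just the ones listed above.

```python
import random
import re

# Replacement pools taken from the suggestions above.
ALTERNATIVES = {
    "woman": ["female", "gal", "lady"],
    "man": ["male", "guy", "dude"],
}

# Whole-word match only, so "Batman" or "human" are left alone.
PATTERN = re.compile(r"\b(woman|man)\b", re.IGNORECASE)


def swap_gender_terms(prompt, rng=None):
    """Replace standalone 'woman'/'man' with a randomly chosen alternative."""
    rng = rng or random.Random()

    def replace(match):
        word = match.group(0)
        choice = rng.choice(ALTERNATIVES[word.lower()])
        # Keep the capitalisation of the word being replaced.
        return choice.capitalize() if word[0].isupper() else choice

    return PATTERN.sub(replace, prompt)


print(swap_gender_terms("Photo of a woman and a man at a bus stop"))
```

Pass a seeded `random.Random(...)` as `rng` if you want the same substitutions on every run.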

Avoid incantations. The issues above are exacerbated by people adding the cliche (and largely unhelpful) “masterpiece, absurdres, high quality, best quality, professional” tags to everything they generate. A disproportionate amount of the highest quality and professional photos are going to be of conventionally attractive people modeling. So using these terms only further pushes your results to a particular kind of person/face. It also means you’re even more likely to get a lot of bokeh and traditional compositions since professional photos tend to feature those things.
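For old prompts that already carry these tags, a small cleanup pass is easy to write. The sketch below assumes comma-separated prompts; `strip_incantations` is a made-up helper name, and the tag list is only the five examples quoted above, so extend it to taste.

```python
# The cliche quality tags quoted above, lowercased for comparison.
INCANTATIONS = (
    "masterpiece",
    "absurdres",
    "high quality",
    "best quality",
    "professional",
)


def strip_incantations(prompt):
    """Drop comma-separated tags that are nothing but a quality incantation."""
    parts = [part.strip() for part in prompt.split(",")]
    # Only exact tag matches are removed, so "a professional chef" survives.
    kept = [part for part in parts if part and part.lower() not in INCANTATIONS]
    return ", ".join(kept)


print(strip_incantations("masterpiece, best quality, photo of a librarian, professional"))
```

It deliberately removes only whole tags, so a description that merely contains one of these words is left untouched.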

Specify ethnicity/nationality or a city of origin, provided the city/country is “famous” enough to be represented in the model. Northern/western European origins are more likely to give you the “same face” because the “same face” looks more-or-less northern/western European. Specifying southern/eastern European origins or elsewhere is more likely to give you a different face. For the US, you can try different states and cities; specifying New York City, Brooklyn, and especially Chicago can introduce some diversity. One word of warning though: trying to specify traits in extremely uncommon combinations, like an East Indian woman with red hair and green eyes, will be resisted by the model.

Specify a body type. This can be a little hard to do subtly as there is a tendency in the model to make people very thin, very fit, or pretty big. Getting in-between types is a bit harder. Try a wide variety of words: dad bod, average, fat, chubby, chunky, husky, thick, overweight, rangy, hefty, heavyset, medium, etc. You can also specify body parts like “a small belly,” “narrow shoulders,” or “wide hips.” These don’t always work, but experimentation with subtle changes and multiple seeds is important. Sometimes certain aspects of your prompt will work against body type specifications, so just experiment. (e.g., I’ve found that sometimes even a hairstyle specification can drag the model away from other aspects of the prompt.)

Describe the face. To a certain extent you can also specify particular aspects of the face to get different faces. Flux seems to at least somewhat understand chubby cheeks, hatchet faced, horse faced (maybe), and certain aspects of nose size/shape, though this last one is by far the most tenuous. On expressions, Flux is a bit of a mixed bag. It seems less prone to exaggerated, cartoonish expressions than SD3.5, but it is also sometimes much harder to convince it to do an expression at all, especially if the emotion is subtle. Flux can be finicky in this regard. For example, “closed-mouth smile” did not work for me, but “His mouth is closed with a slight smile. His lips are closed” worked better.

Include an age. Various methods of specifying age will work to a degree, especially if you follow the other advice above. Middle-aged, grandma, 30-something, 64 years old, age 27, 48 year-old, 21yo are all examples of things that have worked for me at least some of the time. If you find it is not taking age guidance, try reinforcing it with a different way of saying it. E.g., “Middle aged dad sitting on a sofa. He is 54 years old.”
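The descriptor tips above (place of origin, body type, face, age) can be combined into a tiny prompt randomizer. This is only a sketch under my own assumptions: the pools reuse words mentioned in this post, the sentence template is arbitrary, and `build_prompt` / `random_prompt` are hypothetical helpers, not anything Flux-specific. The age is stated twice in different forms, following the reinforcement trick above.

```python
import random

# Small pools built from words mentioned in the post; swap in your own.
ROLES = ["librarian", "plumber", "consultant", "realtor"]
PRONOUNS = ["He", "She"]
PLACES = ["Chicago", "Brooklyn", "New York City"]
BODIES = ["average", "chubby", "husky", "rangy", "heavyset", "overweight"]
FACES = ["chubby cheeks", "a hatchet face"]
AGES = [21, 27, 48, 54, 64]


def article(word):
    """Pick 'a' or 'an' with a simple first-letter heuristic."""
    return "an" if word[0].lower() in "aeiou" else "a"


def build_prompt(role, pronoun, place, body, face, age):
    """Assemble one prompt, stating the age in two different ways."""
    return (
        f"Photo of {article(body)} {body} {role}, age {age}, "
        f"from {place} with {face}. {pronoun} is {age} years old."
    )


def random_prompt(rng=None):
    """Draw one descriptor from each pool and build a prompt from them."""
    rng = rng or random.Random()
    return build_prompt(
        rng.choice(ROLES), rng.choice(PRONOUNS), rng.choice(PLACES),
        rng.choice(BODIES), rng.choice(FACES), rng.choice(AGES),
    )


print(random_prompt())
```

Note that the role never uses “man” or “woman”; the gender comes only from the pronoun, as suggested earlier.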

So there you go. These recommendations are far from a guarantee, but they can often help you avoid same face enough to reduce the need for LoRAs and inpainting when using Flux.

In case I later delete the example images, here is an example prompt:

Photo of a sharp faced Lebanese woman trying to figure out a recipe. She has a long pointy nose and a large pointy chin that juts out from her face. She is standing in a kitchen glaring angrily at a cookbook on a counter. She has her index finger touching a line on the page. She is covered in flour and there are bits of egg shells stuck in her black wavy hair. The kitchen is a mess with a fire on the stovetop in the background.