This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/Ok-Application-2261 on 2024-08-26 20:50:04+00:00.


Consider this basic prompt: a dystopian street in the heart of a grunge city with a woman in the far distance leaning with her back against the wall,

Now let's say I want to change specific characteristics of the woman, so I change the prompt to: a dystopian street in the heart of a grunge city with a woman in the far distance leaning with her back against the wall, the woman has dark thick black hair,

Two things happen. She comes closer to the camera and the background gets the bokeh treatment. This effect becomes more pronounced the more we describe the woman.

a dystopian street in the heart of a grunge city with a woman in the far distance leaning with her back against the wall, the woman has dark thick black hair, her curly hair is tied in a ponytail,

And the concept of a dystopian street seems to melt away with the bokeh (no graffiti). Here’s another example uninterrupted by text.

Same prompt, but in the last image I prompted for combat fatigues instead of curly hair.

The effect is the same on SDXL Juggernaut:

So what's the solution? Prompting individual U-Net blocks. We just don't have it for Flux (that I'm aware of). But here's a demonstration on SDXL:
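The core idea can be sketched in a few lines. This is a toy illustration only: the block names mimic SDXL's U-Net layout ("input_8", "output_1", ...), but the class and methods here are assumptions for demonstration, not ComfyUI's actual API.

```python
# Toy sketch of per-block conditioning overrides: each U-Net
# cross-attention block can be handed its own text conditioning
# instead of one global prompt. Illustrative names, not real API.

class CrossAttnBlock:
    """Stand-in for one U-Net cross-attention block."""

    def __init__(self, name: str):
        self.name = name
        self.override = None  # optional per-block conditioning

    def forward(self, latent: str, global_cond: str) -> str:
        # Use the injected conditioning if present, else the global prompt.
        cond = self.override if self.override is not None else global_cond
        return f"{self.name} attends to: {cond!r}"


blocks = {name: CrossAttnBlock(name)
          for name in ["input_4", "input_8", "middle_0", "output_1"]}

# Inject a richer prompt into one attention layer only.
blocks["output_1"].override = "scene prompt + subject descriptors"

for b in blocks.values():
    print(b.forward("latent", "scene prompt"))
```

Everything except "output_1" keeps attending to the plain scene prompt, which is the whole trick: the subject description never reaches the blocks that would reframe the composition around her.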

And for the next one we send all the descriptors plus the original prompt to attention layer "output_1", and we get this:

The problem is pretty much solved. There is some bleeding of red into the walls, but it can be managed.

Anyway, overall I feel like Matteo's U-Net layer prompting node for ComfyUI is the single most significant advancement since ControlNets were introduced, and I wonder if it's possible for Flux. Here's the source of my information/workflow (Latent Vision):

This seems to work by keeping descriptors like "red dress" and "long black hair" away from input_8, which is a subject-related input that overpowers the output.
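That routing rule can be written down as a tiny sketch. The block names follow SDXL's U-Net naming and the rule paraphrases the workflow above, but the function itself is a hypothetical illustration, not any real node's code:

```python
# Hypothetical prompt router: the scene prompt goes to every block, but
# the subject descriptors only reach attention layer "output_1" and are
# kept away from the subject-dominant "input_8". Illustration only.

SCENE = ("a dystopian street in the heart of a grunge city with a woman "
         "in the far distance leaning with her back against the wall")
DESCRIPTORS = "red dress, long black hair"

def conditioning_for(block: str) -> str:
    if block == "output_1":
        # Full prompt: scene plus subject descriptors.
        return f"{SCENE}, {DESCRIPTORS}"
    # Every other block, including input_8, sees only the scene.
    return SCENE
```

The point is that input_8 never sees "red dress" at all, so the subject can be described in detail without the composition collapsing around her.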