This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/AdQuirky7106 on 2024-09-28 02:14:31+00:00.


I already liked Steve Mould, a guy who's appeared on Numberphile many times. But just now, while I was watching his video on a certain kind of dumb little visual illusion, he unexpectedly launched into the most thorough and understandable explanation of how CLIP-conditioned diffusion models work that I've ever seen. By far. It's just incredible. For those who haven't seen it, enjoy the little epiphanies from connecting diffusion-based image models, LLMs, and CLIP, and seeing how they all work together through cross-attention!

The explanation starts about 2 minutes in.