

FIFO Diffusion.

The official Wan 2.2 paper mentions developing a model called “Streamer” to enable infinite-length videos. Similarly, a recent HiDream paper describes “Ouroboros-Diffusion” for the same purpose. Both techniques build on work published by Kim et al. called FIFO-Diffusion.

When you use a video model normally, you denoise all the latents simultaneously (parallel denoising), so every latent has a similar noise level at each step. In the FIFO technique, instead of denoising latents with similar noise levels, you keep a queue of latents with increasing noise from beginning to end. At every inference step, one latent becomes fully denoised and is removed from the head of the queue (diagonal denoising), while a new latent of pure noise is added at the tail. This continues indefinitely, producing arbitrarily long video.
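Here is a minimal sketch of that diagonal denoising loop. The `denoise` and `scheduler_step` functions are hypothetical stand-ins for a real video diffusion model and sampler, and the queue length and latent shapes are illustrative assumptions, not values from the papers:

```python
import torch

F = 16                                   # queue length in latent frames (assumed)
C, H, W = 4, 60, 104                     # per-frame latent shape (assumed)
levels = torch.linspace(0, 1000, F + 1)  # F+1 noise levels; level 0 = clean

def denoise(latents, ts):
    """Placeholder for one model forward pass over all F latents at once,
    each latent conditioned on its own timestep (the 'diagonal')."""
    return torch.randn_like(latents)     # a real model predicts noise here

def scheduler_step(noise_pred, t_from, t_to, latent):
    """Placeholder: move one latent from noise level t_from down to t_to."""
    return latent - (t_from - t_to) / 1000.0 * noise_pred

# Index 0 holds the most-denoised latent, index F-1 holds pure noise.
queue = [torch.randn(C, H, W) for _ in range(F)]
video = []

for _ in range(64):                      # emit 64 frames; loop as long as you like
    latents = torch.stack(queue)         # (F, C, H, W)
    ts = levels[1:]                      # slot i currently sits at level i+1
    eps = denoise(latents, ts)
    for i in range(F):                   # every latent advances exactly one level
        queue[i] = scheduler_step(eps[i], levels[i + 1], levels[i], queue[i])
    video.append(queue.pop(0))           # head reached level 0: a finished frame
    queue.append(torch.randn(C, H, W))   # push fresh pure noise at the tail
```

Note how one model call advances the whole queue: the head frame finishes, everything shifts one noise level down, and fresh noise enters at the tail, so generation never has to stop.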

The diagonal denoising of the FIFO technique propagates context sequentially to later frames, which makes it better than simply using the last frame of one clip as the first frame of the next.

There are more juicy details in the HiDream paper. They use a technique called Coherent Tail Sampling to add the new latent frame at the end of the queue. A naive approach would be to add random noise to the previous latent and enqueue that. Instead, they apply a low-pass filter to the previous latent, capturing the overall composition, and then add high-frequency random noise to induce dynamics. This induces better motion while maintaining overall consistency. They also use Subject-Aware Cross-Frame Attention and Self-Recurrent Guidance to keep the main subject consistent during infinite generation.
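A minimal sketch of that low-pass/high-pass split, assuming a simple circular frequency mask in FFT space; the cutoff value and mixing are illustrative assumptions, not parameters from the paper:

```python
import torch

def coherent_tail(prev_latent, cutoff=0.25):
    """Keep the coarse composition of the previous tail latent (low frequencies)
    and replace its fine detail with fresh high-frequency noise."""
    C, H, W = prev_latent.shape
    freq = torch.fft.fftshift(torch.fft.fft2(prev_latent), dim=(-2, -1))
    # Circular low-pass mask: True near the centre (lowest spatial frequencies).
    yy, xx = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
    lowpass = ((xx**2 + yy**2).sqrt() <= cutoff).to(freq.dtype)
    low = torch.fft.ifft2(torch.fft.ifftshift(freq * lowpass, dim=(-2, -1))).real
    # High-frequency noise: random noise with its own low frequencies removed.
    noise = torch.randn_like(prev_latent)
    nf = torch.fft.fftshift(torch.fft.fft2(noise), dim=(-2, -1))
    high = torch.fft.ifft2(torch.fft.ifftshift(nf * (1 - lowpass), dim=(-2, -1))).real
    return low + high   # coarse layout from the past, fresh dynamics on top

new_tail = coherent_tail(torch.randn(4, 60, 104))  # enqueue as the new tail latent
```

The idea is that the scene layout survives the filter while the injected high frequencies give the sampler room to create motion, instead of either freezing the scene (pure copy) or losing it (pure noise).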

The future looks exciting for video generation. Hopefully we all get to play with these models soon!