This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/OkSpot3819 on 2024-09-13 09:22:22+00:00.


  • Qwen2-VL (VLM) open-source release coming soon (GITHUB) via NielsRogge on X
  • FineVideo: 43K videos spanning 3.4K hours with 66M words of annotations - a CC-BY-licensed video understanding dataset focused on mood analysis, storytelling, and media editing in multimodal settings (HUGGING FACE)
  • Fluxgym update: now automatically generates sample images during training and supports ANY resolution, not just 512 or 1024 (e.g. 712), via cocktailpeanut (the creator) on X
  • Fish Speech 1.4: multilingual (8 languages) text-to-speech model trained on 700K hours of speech; voice cloning; low latency; ~1GB model weights (OPEN WEIGHTS) (HUGGING FACE SPACES)
  • Out of Focus v1.0: prompt-based image manipulation via diffusion inversion, with a Gradio UI; requires a high-end GPU for optimal performance (GITHUB)
  • Google NotebookLM launches “Audio Overview” feature: turns any document into a podcast-style conversation. Once you upload the document and hit generate, two AI hosts kick off a discussion diving into its main takeaways (LINK)
  • A video model is coming to Adobe Firefly via icreatelife on X
  • Midjourney is pioneering a new 3D exploration format for images, led by Alex Evans, innovator behind Dreams’ graphics via MartinNebelong on X
  • FBRC & AWS present Culver Cup GenAI film competition at LA Tech Week via me :) on X
  • UVR5 UI: Ultimate Vocal Remover with Gradio UI (GITHUB)
  • Vidu AI update: new “Reference to Video” feature lets you apply consistency to anything, whether real or fictional (LINK)
  • Vchitect 2.0: new image2video/text2video model soon (LINK)
  • and slightly unrelated, but special mention: 🍓!

Wednesday’s updates - link

Last week’s updates - link