This is an automated archive made by the Lemmit Bot.

The original was posted on /r/stablediffusion by /u/jmellin on 2024-09-01 12:18:26+00:00.


I love the new CogVideoX-5b model and think it’s great that we finally have a strong competitor in the open-source space, rivaling Kling, Runway, and others. However, I believe the community’s demand for an image-to-video (img2vid) feature is evident.

A fine-tuned image-to-video version of the current text-to-video model exists but has not been released

After doing some research on GitHub, I found that the authors have stated they have no plans to open-source their current Image-to-Video model, which I find disappointing. I hope they reconsider in the future.

I believe that the first person or team to fine-tune the current model to handle image-to-video (which I know is no small task) and open-source it will gain a lot while also becoming a community legend. Alternatively, if someone develops a software solution, similar to inpainting I guess, that allows setting the first latent image, they would also be eligible for that recognition.

Keeping my fingers crossed for any of the above.
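
To make the "set the first latent image" idea a bit more concrete, here is a rough sketch of how inpainting-style pinning could work in principle. This is a toy under stated assumptions, not CogVideoX code: `toy_denoise_step` is a hypothetical placeholder for the real model, the noise schedule is made up, and each latent frame is just a flat list of floats. It also assumes one latent frame maps to one image, which may not hold for a video VAE that compresses along time.

```python
import math
import random

def add_noise(clean, noise, alpha_bar):
    """Noise `clean` to the level alpha_bar (DDPM-style forward process)."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * c + b * n for c, n in zip(clean, noise)]

def pin_first_frame(latents, image_latent, alpha_bar, rng):
    """Return a copy of `latents` whose frame 0 is the reference latent,
    noised to match the current step's noise level."""
    noise = [rng.gauss(0.0, 1.0) for _ in image_latent]
    pinned = add_noise(image_latent, noise, alpha_bar)
    return [pinned] + [list(frame) for frame in latents[1:]]

def toy_denoise_step(latents):
    """Hypothetical placeholder for one denoising step; not a real model."""
    return [[0.9 * v for v in frame] for frame in latents]

def sample_with_first_frame(image_latent, num_frames=4, steps=5, seed=0):
    """Run the toy sampling loop, re-pinning frame 0 after every step."""
    rng = random.Random(seed)
    # Start every frame from pure Gaussian noise.
    latents = [[rng.gauss(0.0, 1.0) for _ in image_latent]
               for _ in range(num_frames)]
    for i in range(1, steps + 1):
        alpha_bar = i / steps  # toy schedule: 1.0 (fully clean) at the last step
        latents = toy_denoise_step(latents)
        latents = pin_first_frame(latents, image_latent, alpha_bar, rng)
    return latents
```

Calling `sample_with_first_frame([0.5, -1.0, 2.0])` returns four frames, and the first one ends up equal to the reference latent because the last step pins it with no noise. In a real model the other frames would only follow the pinned frame if the network's temporal layers propagate it, which is exactly the part a fine-tune would have to guarantee.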

Links:

The authors' response to the image-to-video request on their GitHub

kijai mentions it in a reply on his ComfyUI wrapper node