This is an automated archive made by the Lemmit Bot.
The original was posted on /r/stablediffusion by /u/woct0rdho on 2025-10-15 09:05:21+00:00.
I’ve merged the patch that lets torch.compile work with fp8 on Ampere GPUs; let’s see how it rolls out: https://github.com/woct0rdho/triton-windows/pull/140
I hoped this would be superseded by GGUF + better torch.compile, or by Nunchaku. But as of PyTorch 2.9, on my machine, fp8 + the block swap in ComfyUI-WanVideoWrapper (or ComfyUI-wanBlockswap for native workflows) runs faster and causes fewer recompilations than GGUF + the block swap in ComfyUI-GGUF.
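For readers unfamiliar with block swap: the idea is to keep most transformer blocks in CPU RAM and move each one onto the GPU only while it runs, trading transfer time for VRAM. This is a hypothetical, simplified sketch of that pattern (the `BlockSwapRunner` name and structure are my own illustration, not the actual ComfyUI-WanVideoWrapper code), runnable on CPU as written:

```python
import torch
import torch.nn as nn

class BlockSwapRunner:
    """Illustrative block swap: blocks live on CPU and are moved to the
    compute device one at a time during the forward pass."""

    def __init__(self, blocks: nn.ModuleList, device: str = "cpu"):
        self.blocks = blocks    # resident in CPU RAM
        self.device = device    # "cuda" on a real GPU machine

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.to(self.device)
        for block in self.blocks:
            block.to(self.device)   # swap in
            x = block(x)
            block.to("cpu")         # swap out, freeing device memory
        return x

blocks = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])
runner = BlockSwapRunner(blocks)
out = runner.forward(torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 16])
```

The per-block `.to()` calls are what interact badly with torch.compile guards in some setups, which is why the number of recompilations matters in the comparison above.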
This is the first feature in the ‘core’ part (rather than the Windows support code) that’s deliberately different from the official Triton. It should also work on Linux, but I’m not sure of the best way to publish Linux wheels.
I’m not an expert on PTX, so help with optimizing that PTX code is welcome.
triton-windows 3.2.0.post21 has also been released, which supports fp8 on RTX 20xx.