This is an automated archive made by the Lemmit Bot.
The original was posted on /r/machinelearning by /u/jacobfa on 2025-02-19 14:51:24+00:00.
I show that diffusion kernels capture global dependencies, and that a simple diffusion kernel with a recurrent structure outperforms transformers while using fewer parameters and FLOPs.
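For readers unfamiliar with the idea, here is a minimal sketch of why a purely local diffusion step, applied recurrently, can end up mixing information across a whole sequence. This is a generic 1D heat-equation-style toy and not the post's actual architecture: the function names `diffusion_step` and `diffuse`, the step size `alpha`, and the reflecting boundaries are all assumptions made for illustration.

```python
def diffusion_step(x, alpha=0.25):
    """One local diffusion update on a 1D sequence.

    Each position moves toward the average of its two neighbours.
    Boundaries are reflecting (an assumption of this sketch), so the
    total "mass" of the signal is conserved.
    """
    n = len(x)
    out = []
    for i in range(n):
        left = x[i - 1] if i > 0 else x[i]
        right = x[i + 1] if i < n - 1 else x[i]
        out.append(x[i] + alpha * (left - 2 * x[i] + right))
    return out


def diffuse(x, steps, alpha=0.25):
    """Apply the same local step recurrently, `steps` times."""
    for _ in range(steps):
        x = diffusion_step(x, alpha)
    return x


# All of the signal starts at position 0 of an 8-element sequence.
impulse = [1.0] + [0.0] * 7
after_1 = diffuse(impulse, 1)  # only positions 0 and 1 are non-zero
after_7 = diffuse(impulse, 7)  # every position has received some signal
```

Each step only touches immediate neighbours, yet after enough recurrent applications the first position influences the last one. The same small update rule is reused at every step, so the parameter count does not grow with the range of the dependency.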