This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/papaswamp91 on 2024-04-07 18:21:14.


Do we know how Gemini 1.5 achieved its 1M-token context window (reportedly up to 10M in research)? Wouldn't compute go up quadratically as the attention window expands?
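To make the quadratic-scaling concern concrete: in standard attention, the QK^T score matrix is seq_len x seq_len, so both its memory footprint and the FLOPs to compute it grow with the square of the sequence length. A minimal sketch (the `d_model` value and the FLOP-counting convention here are illustrative assumptions, not anything Google has published about Gemini):

```python
def attention_flops(seq_len: int, d_model: int) -> int:
    """Rough multiply count for one vanilla attention layer:
    QK^T (seq_len^2 * d_model) plus scores @ V (seq_len^2 * d_model)."""
    return 2 * seq_len * seq_len * d_model

# Doubling the context quadruples the attention cost:
base = attention_flops(4096, 128)
doubled = attention_flops(8192, 128)
print(doubled / base)  # -> 4.0
```

This is why naive attention at million-token scale is considered impractical, and why answers here tend to point at techniques like ring/blockwise attention, sparse or sliding-window attention, and aggressive KV-cache sharding rather than dense quadratic attention.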