This is an automated archive made by the Lemmit Bot.
The original was posted on /r/machinelearning by /u/papaswamp91 on 2024-04-07 18:21:14.
Do we know how Gemini 1.5 achieved its 1.5M context window? Wouldn’t compute go up quadratically as the attention window expands?
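For context on the quadratic cost the question refers to: vanilla scaled dot-product attention materializes an n x n score matrix over a length-n context, so compute and memory grow quadratically in n. Here is a minimal NumPy sketch of that (illustrative only — Google has not published how Gemini 1.5 actually handles long context, and this is not its method):

```python
import numpy as np

def attention(q, k, v):
    """Vanilla scaled dot-product attention over n tokens of dim d."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)        # shape (n, n): the quadratic term
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v, scores.shape

rng = np.random.default_rng(0)
n, d = 512, 64                           # toy sizes for illustration
q = rng.standard_normal((n, d))
out, score_shape = attention(q, q, q)
print(score_shape)                       # (512, 512) — doubling n quadruples it
```

At n = 1.5M tokens, that score matrix alone would have ~2.25 trillion entries per head per layer, which is why long-context models are generally assumed to use some approximation (sparse/sliding-window attention, blockwise methods like Ring Attention, etc.) rather than this naive form.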