This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/delorean-88 on 2024-06-12 19:55:40+00:00.


The recent Grokfast paper found a way to accelerate grokking by a factor of 50 on an algorithmic dataset. The earlier Omnigrok paper established that, for their algorithmic dataset, “constrained optimization at constant weight norm largely eliminates grokking”.
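
For reference, my rough understanding of the two tricks: Grokfast low-pass filters the gradients (keep an EMA of each parameter's gradient and add an amplified copy of that slow component back before the optimizer step), while Omnigrok rescales the weights to a fixed norm after each update. Here's a minimal PyTorch sketch of both ideas — the function names, `alpha`, `lamb`, and `target_norm` are my own placeholders, not the papers' exact APIs:

```python
import torch

def gradfilter_ema(model, grads=None, alpha=0.98, lamb=2.0):
    # Grokfast-style idea: keep an EMA of each parameter's gradient
    # (the slow / low-frequency component) and add an amplified copy
    # of it back onto the raw gradient before the optimizer step.
    if grads is None:
        grads = {n: torch.zeros_like(p) for n, p in model.named_parameters()
                 if p.requires_grad}
    for n, p in model.named_parameters():
        if p.grad is not None:
            grads[n] = alpha * grads[n] + (1 - alpha) * p.grad
            p.grad = p.grad + lamb * grads[n]
    return grads

def project_to_norm(model, target_norm):
    # Omnigrok-style idea: after each update, rescale all weights so the
    # global L2 norm stays constant.
    with torch.no_grad():
        total = torch.sqrt(sum((p ** 2).sum() for p in model.parameters()))
        for p in model.parameters():
            p.mul_(target_norm / total)

# Usage inside a standard training loop:
# grads = None
# for x, y in loader:
#     optimizer.zero_grad()
#     loss = criterion(model(x), y)
#     loss.backward()
#     grads = gradfilter_ema(model, grads)      # filter gradients before stepping
#     optimizer.step()
#     project_to_norm(model, target_norm=10.0)  # keep the weight norm fixed
```

(Combining both in one loop is just for illustration; the two papers evaluate their methods separately.)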

Do these improvements mean we no longer have to worry about delayed generalization/grokking when training a model (setting aside that its underlying mechanism is still poorly understood)?