This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Primary-Wasabi292 on 2024-02-12 10:35:53.


I am wondering whether it is worth going through extensive hyperparameter tuning of model architecture. Learning-rate tuning often pays off, since it has a big impact on convergence and overall performance, but when tuning architecture (num_layers, num_heads, dropout, etc.), I have found that as long as you stay within a certain sweet-spot range, the actual performance differences are marginal. Am I doing something wrong? What are your experiences with this? (A rough sketch of the kind of sweep I mean is below.)
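
For context, here is a minimal sketch of the kind of sweep I have in mind. The search space values and the train_and_eval() function are placeholders, not my actual setup; you would swap in your own training loop and validation metric.

```python
# Minimal random-search sketch over learning rate and architecture
# hyperparameters. train_and_eval() is a hypothetical placeholder that
# should train a model with the given config and return a validation loss.
import random

SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "num_layers": [4, 6, 8],
    "num_heads": [4, 8],
    "dropout": [0.0, 0.1, 0.2],
}

def train_and_eval(config):
    # Placeholder: replace with a real training run that returns val loss.
    return random.random()

def random_search(n_trials=20, seed=0):
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        # Sample one value per hyperparameter for this trial.
        config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        loss = train_and_eval(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

if __name__ == "__main__":
    config, loss = random_search()
    print(f"best config: {config}, val loss: {loss:.4f}")
```

In my runs, the learning-rate dimension of a sweep like this moves the needle far more than the architecture dimensions, which is what prompted the question.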