This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/masonw32 on 2024-11-10 03:37:47+00:00.


In machine learning we work with log probabilities a lot, typically maximizing the log probability of the data. This makes sense numerically, since summing log probabilities avoids the floating-point underflow you get from multiplying many small probabilities, but I'm also wondering whether there is a more fundamental meaning behind "log probability."
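To make the numerical point concrete, here is a minimal sketch (toy numbers chosen for illustration): the product of 200 probabilities of 0.01 is 1e-400, which is below the smallest representable double and underflows to 0.0, while the equivalent sum of logs is perfectly stable.

```python
import math

probs = [0.01] * 200

# Direct product: 0.01**200 == 1e-400, far below the smallest
# positive double (~5e-324), so this underflows to exactly 0.0.
prod = 1.0
for p in probs:
    prod *= p

# Log-space equivalent: a sum of moderate negative numbers,
# no underflow. 200 * ln(0.01) ≈ -921.03.
log_sum = sum(math.log(p) for p in probs)

print(prod)     # 0.0 (underflow)
print(log_sum)  # ≈ -921.034
```

This is why likelihoods over many i.i.d. samples are essentially always accumulated in log space.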

For instance, log probability appears throughout information theory: the negative log probability of an event is its "information" (surprisal). Can minimizing the negative log likelihood be viewed in information-theoretic terms? Is it maximizing or minimizing some measure of information?
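One connection worth checking numerically: the average negative log likelihood of a dataset under a model q equals the cross-entropy between the empirical distribution of the data and q. A minimal sketch with a hypothetical two-symbol model (the data and probabilities below are made up for illustration):

```python
import math
from collections import Counter

# Hypothetical observed data and model probabilities.
data = ["a", "b", "a", "a"]
model = {"a": 0.7, "b": 0.3}

# Average negative log likelihood: mean surprisal of each observation.
nll = -sum(math.log(model[x]) for x in data) / len(data)

# Cross-entropy H(p_emp, q) = -sum_x p_emp(x) * log q(x),
# where p_emp is the empirical frequency of each symbol.
counts = Counter(data)
ce = -sum((c / len(data)) * math.log(model[x]) for x, c in counts.items())

# The two quantities are identical up to floating-point rounding.
print(nll, ce)
```

Since cross-entropy decomposes as H(p_emp, q) = H(p_emp) + KL(p_emp || q), and H(p_emp) does not depend on the model, minimizing NLL is the same as minimizing the KL divergence from the empirical distribution to the model.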