This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/f14-bertolotti on 2024-09-02 17:07:49+00:00.


Hey Redditors!

The last time I shared something here, your encouragement was incredible. Thanks to that support, I’ve finally decided to start a little blog about my notes and research. The blog is super minimalist: it’s written in pure HTML with a bit of JavaScript, so there are no cookies or tracking involved. It’s inspired by Lilian Weng’s “lil’log,” though mine is still a bit rough around the edges.

I’ve just published my first post, which delves into Categorical Cross Entropy. Did you know that the population minimizer of the conditional risk associated with Categorical Cross Entropy is actually the true conditional label distribution? While this might seem intuitive, I hadn’t come across a proof of it, especially in the context of neural networks. Luckily, I found this paper that provides practically the same proof for AdaBoost, just with a different label encoding. I’ve adapted it to fit the modern deep learning setting.
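To make the claim concrete, here is a minimal numerical sketch of my own (not taken from the post or the paper): for a single input with true conditional label distribution `p`, minimizing the expected cross-entropy over the probability simplex recovers `p` itself. The simplex is parameterized through a softmax of unconstrained logits, and the dimension `K = 5` and the random `p` are just illustrative choices.

```python
# Numerical check: the minimizer of the expected categorical cross-entropy
#   R(q) = -sum_k p_k * log(q_k)
# over the probability simplex is q = p (the true conditional label distribution).

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

K = 5
p = rng.dirichlet(np.ones(K))  # "true" conditional label distribution P(y | x)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def conditional_risk(z):
    # expected cross-entropy of predicting q = softmax(z) when labels follow p
    q = softmax(z)
    return -(p * np.log(q + 1e-12)).sum()

# minimize over unconstrained logits z; the softmax keeps q on the simplex
res = minimize(conditional_risk, x0=np.zeros(K), method="BFGS")
q_star = softmax(res.x)

print("true p   :", np.round(p, 4))
print("argmin q :", np.round(q_star, 4))  # matches p up to optimization tolerance
```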

This also explains why people sometimes interpret the output of a neural network (after the softmax layer) as the actual probabilities of the labels.
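As a quick illustration (a toy snippet of mine, not from the post): the softmax layer always yields a valid probability vector, and by the population-minimizer result above, those values approximate P(y | x) only insofar as the network has been trained to minimize cross-entropy on enough data.

```python
# Softmax outputs form a valid probability vector: non-negative and summing to 1.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(10, 3))  # toy 3-class classifier
x = torch.randn(1, 10)

logits = model(x)
probs = torch.softmax(logits, dim=-1)

print(probs)          # values in [0, 1]
print(probs.sum(-1))  # tensor([1.])
```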

I needed this proof for another research project, but for a long time, I either avoided searching for it or just couldn’t find anything. Eventually, I did track down what I was looking for. Unfortunately, Reddit doesn’t handle mathematical notation very well, so I couldn’t post it directly here. Instead, you can check out my blog post at:

I would love your feedback. Please let me know if you spot any errors or have any additional insights!