Looking at the code for current mixture-of-experts models, they seem to route with an argmax at k=1, i.e. each token is sent only to the top-scoring expert. Since argmax is non-differentiable, no gradient flows to the experts that were not selected. It therefore seems that only the selected expert's weights get updated when it performs poorly. But a different expert might actually have been the better choice for that input, and the router can't find this out, because the gradient never reaches the other experts.
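For concreteness, here is a minimal sketch of the routing pattern described above, in the Switch Transformer style (the class name `Top1MoE` and the linear experts are illustrative, not from any particular codebase). One relevant detail: in such implementations the selected expert's output is multiplied by its differentiable softmax gate probability, which is the one path through which the router itself receives a gradient despite the hard argmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top1MoE(nn.Module):
    """Top-1 (switch-style) routing: each input goes to a single expert,
    and that expert's output is scaled by the router's gate probability."""

    def __init__(self, dim: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim)
        probs = F.softmax(self.router(x), dim=-1)           # (batch, num_experts)
        top_idx = probs.argmax(dim=-1)                      # hard, non-differentiable pick
        top_prob = probs.gather(-1, top_idx.unsqueeze(-1))  # (batch, 1), differentiable
        # Dispatch each input to its chosen expert (a slow loop, for clarity only).
        out = torch.stack([self.experts[i](xi) for i, xi in zip(top_idx.tolist(), x)])
        # The multiplication below is the gradient path into the router: if the
        # chosen expert does badly, d(loss)/d(top_prob) pushes its router logit
        # down, and the softmax coupling raises the other experts' probabilities.
        return top_prob * out


moe = Top1MoE(dim=16, num_experts=4)
x = torch.randn(8, 16)
loss = moe(x).pow(2).mean()
loss.backward()
print(moe.router.weight.grad.abs().sum())  # nonzero: the router does get gradient
```

So the router is not entirely blind, but it still only gets feedback about the expert it actually picked, which is exactly what the question below is getting at.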

How can the router learn that it has made a wrong choice and use a different expert next time?