The original was posted on /r/machinelearning by /u/huopak on 2024-09-03 19:05:17+00:00.


I’m looking for model recommendations for fine-tuning on a translation task.

The input sequence pairs are pretty long, up to 1MB each, although the dataset can be truncated to contain only ~200kB sequences. The sequences are program code (the task is basically transpiling), but my intuition is that I would still benefit from a base model trained on natural language, since it captures some basic general knowledge that should improve performance.
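
To put those sizes in tokens rather than bytes, here's the kind of quick check I'd run (the file path is just a placeholder, and any tokenizer with a code-friendly vocabulary would do):

```python
from transformers import AutoTokenizer

# Rough bytes-to-tokens conversion: code often lands somewhere around 3-4
# bytes per token, so ~200kB is already tens of thousands of tokens and 1MB
# is in the hundreds of thousands, which drives the context-length requirement below.
tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")

with open("example_pair.src") as f:  # placeholder path to one input sequence
    source = f.read()

token_count = len(tokenizer(source, add_special_tokens=False)["input_ids"])
print(f"{len(source.encode('utf-8'))} bytes -> {token_count} tokens")
```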

I would also like to train the same model architecture from scratch and compare its performance with the fine-tuned version to support this point.
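
Concretely, with Hugging Face transformers I'd expect the comparison to be set up roughly like this, assuming the recommended model has an AutoModel mapping (the LED checkpoint below is only a stand-in, being the encoder-decoder variant of Longformer):

```python
from transformers import AutoConfig, AutoModelForSeq2SeqLM

checkpoint = "allenai/led-base-16384"  # placeholder seq2seq checkpoint

# Fine-tuning setup: start from the pretrained weights
pretrained_model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# From-scratch baseline: identical architecture, randomly initialized weights
config = AutoConfig.from_pretrained(checkpoint)
scratch_model = AutoModelForSeq2SeqLM.from_config(config)
```

Both models would then go through the same training loop, so the only difference is the initialization.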

Criteria for the model:

  • open license for research (commercial use not required, but it’s a plus)
  • transformer-based, with an encoder-decoder architecture
  • long context length in the hundreds of thousands of tokens
  • ideally inference can run on a newer Mx chip MacBook (not a must-have)
  • ideally a newer, more state-of-the-art model (not a must-have)
  • ideally available on Hugging Face (not a must-have)

Regrettably, anything based on BERT (e.g. DistilBERT) would not have a large enough context window. I’ve been looking at XLNet and Longformer; both seem to fit the bill more or less, but I’d like to explore all the options.
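
For anyone weighing in: a loose way I'd compare advertised context windows is reading them off the configs, though the number doesn't mean much for XLNet, since it uses relative position encodings rather than a fixed limit:

```python
from transformers import AutoConfig

for name in ["allenai/longformer-base-4096", "xlnet-base-cased"]:
    cfg = AutoConfig.from_pretrained(name)
    # max_position_embeddings is the usual hard cap on absolute positions;
    # architectures without one (e.g. XLNet) may report a sentinel or nothing.
    print(name, getattr(cfg, "max_position_embeddings", "not defined"))
```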

Thank you so much!