The original was posted on /r/machinelearning by /u/huopak on 2024-09-03 19:05:17+00:00.


I’m looking for model recommendations for fine-tuning on a translation task.

The input sequence pairs are pretty long, up to 1MB each, although the dataset can be truncated to contain only ~200kB sequences. The sequences are program code (the task is basically transpiling), but my intuition is that I would still benefit from a base model trained on natural language, since it captures some basic general knowledge that should improve performance.
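
To put those sizes in tokens rather than bytes, here's the kind of quick check I'd run (the file path is just a placeholder, and any tokenizer with a code-friendly vocabulary would do):

```python
from transformers import AutoTokenizer

# Rough bytes-to-tokens conversion: code often lands somewhere around 3-4
# bytes per token, so ~200kB is already tens of thousands of tokens and 1MB
# is in the hundreds of thousands, which drives the context-length requirement below.
tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")

with open("example_pair.src") as f:  # placeholder path to one input sequence
    source = f.read()

token_count = len(tokenizer(source, add_special_tokens=False)["input_ids"])
print(f"{len(source.encode('utf-8'))} bytes -> {token_count} tokens")
```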

I would also like to train the same model architecture from scratch and compare its performance with the fine-tuned version to support this point.
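
Concretely, with Hugging Face transformers I'd expect the comparison to be set up roughly like this, assuming the recommended model has an AutoModel mapping (the LED checkpoint below is only a stand-in, being the encoder-decoder variant of Longformer):

```python
from transformers import AutoConfig, AutoModelForSeq2SeqLM

checkpoint = "allenai/led-base-16384"  # placeholder seq2seq checkpoint

# Fine-tuning setup: start from the pretrained weights
pretrained_model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# From-scratch baseline: identical architecture, randomly initialized weights
config = AutoConfig.from_pretrained(checkpoint)
scratch_model = AutoModelForSeq2SeqLM.from_config(config)
```

Both models would then go through the same training loop, so the only difference is the initialization.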

Criteria for the model:

  • open license for research (commercial use not required, but it’s a plus)
  • transformer-based, with an encoder-decoder architecture
  • long context length in the hundreds of thousands of tokens
  • ideally inference can run on a newer Mx chip MacBook (not a must-have)
  • ideally a newer, more state-of-the-art model (not a must-have)
  • ideally available on Hugging Face (not a must-have)

Regrettably, anything based on BERT (e.g. DistilBERT) would not have a large enough context window. I’ve been looking at XLNet and Longformer; both seem to fit the bill more or less, but I’d like to explore all the options.
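
For anyone weighing in: a loose way I'd compare advertised context windows is reading them off the configs, though the number doesn't mean much for XLNet, since it uses relative position encodings rather than a fixed limit:

```python
from transformers import AutoConfig

for name in ["allenai/longformer-base-4096", "xlnet-base-cased"]:
    cfg = AutoConfig.from_pretrained(name)
    # max_position_embeddings is the usual hard cap on absolute positions;
    # architectures without one (e.g. XLNet) may report a sentinel or nothing.
    print(name, getattr(cfg, "max_position_embeddings", "not defined"))
```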

Thank you so much!