This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Solaris1712 on 2025-02-18 01:22:30+00:00.


So additional details…

I’m using a Paperspace Gradient instance with an A6000 (48 GB VRAM), 8 vCPUs, and 45 GB RAM.

My dataset is ~9k samples of news article text and labels.

The model I’m using is “answerdotai/ModernBERT-base” with a context length of 8192.

Initially, I was constantly getting OOM errors when trying to fine-tune with a batch size of 32 or 16. After experimenting, I found that a batch size of 4 or less was the only way training would start at all.
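In case it helps, here’s roughly what my setup looks like (a simplified sketch assuming the standard Hugging Face Trainer; the file name, label count, and column names are placeholders rather than my exact script):

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "answerdotai/ModernBERT-base"

# Placeholder: my real data is a ~45 MB CSV of news article text + labels
dataset = load_dataset("csv", data_files={"train": "train.csv"})["train"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def tokenize(batch):
    # Pad/truncate everything to the full 8192-token context
    return tokenizer(
        batch["text"],
        truncation=True,
        padding="max_length",
        max_length=8192,
    )

dataset = dataset.map(tokenize, batched=True)

# num_labels is a placeholder for however many classes are in the CSV
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=3)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,  # anything higher than ~4 OOMs on the A6000
    num_train_epochs=1,
)

trainer = Trainer(model=model, args=args, train_dataset=dataset)
trainer.train()
```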

Even training for a single epoch takes around 1h 31min.

Is this normal?

This is my first time fine-tuning a model, so I have no reference or past experience to go on. I wasn’t expecting a 45 MB CSV file to fill up the entire VRAM when I set the batch size to 32 or 16.

Is it a PyTorch bug, or something else?

Edit: the dataset I’m using is a truncated version of “valurank/PoliticalBias_AllSides_Txt”, which has about 19k samples. I’m using a subset of that, about 9k samples.
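For reference, grabbing the subset looks something like this (a sketch; I’m not certain of the exact config/split names for this dataset, so treat the loading call as an assumption):

```python
from datasets import load_dataset

# Assumes the dataset loads with its default config and a "train" split
ds = load_dataset("valurank/PoliticalBias_AllSides_Txt", split="train")

# Take ~9k of the ~19k samples
subset = ds.shuffle(seed=42).select(range(9000))
```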