This is an automated archive made by the Lemmit Bot.
The original was posted on /r/machinelearning by /u/Amgadoz on 2024-03-30 21:23:22.
Hey everyone!
I recently compared all the open-source Whisper-based packages that support long-form transcription.
Long-form transcription basically means transcribing audio files that are longer than 30 seconds, which is the length of the window Whisper processes at a time.
This can be useful if you want to chat with a YouTube video, a podcast, etc.
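
To make the 30-second limit concrete, here is a rough sketch of how a long file could be split into overlapping windows before transcription. This is a naive illustration only, not how any of the packages below actually do it: the function name `chunk_bounds` and the 5-second overlap are assumptions picked for the example.

```python
def chunk_bounds(duration_s, window_s=30.0, overlap_s=5.0):
    """Return (start, end) times in seconds covering an audio file.

    Hypothetical helper: windows are `window_s` long and consecutive
    windows share `overlap_s` seconds so words at a boundary are not cut.
    """
    if window_s <= 0 or not 0 <= overlap_s < window_s:
        raise ValueError("need window_s > 0 and 0 <= overlap_s < window_s")

    step = window_s - overlap_s  # how far each new window advances
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + window_s, duration_s)
        bounds.append((start, end))
        if end >= duration_s:  # last window reached the end of the file
            break
        start += step
    return bounds


# A 70-second file needs three windows with these settings.
print(chunk_bounds(70.0))
```

Each window would then be transcribed on its own and the overlapping text merged, which is where the packages differ the most.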
I compared the following packages:
- OpenAI’s official whisper package
- Hugging Face Transformers
- Hugging Face BetterTransformer
- FasterWhisper
- WhisperX
- Whisper.cpp
I compared them in the following areas:
- Accuracy - using word error rate (WER) and character error rate (CER)
- Efficiency - using VRAM usage and latency
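
For anyone not familiar with the accuracy metrics: WER and CER are both the edit distance between the reference transcript and the model output, divided by the reference length, counted in words for WER and in characters for CER. Below is a minimal pure-Python sketch of that definition. It is not the code used for the comparison, the function names are my own, and it skips the text normalization (casing, punctuation) that is usually applied first.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted word out of six reference words.
print(wer("the cat sat on the mat", "the cat sat on a mat"))
```

Lower is better for both, and both can go above 1.0 when the output contains many inserted words or characters.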
I’ve written a detailed blog post about this. If you just want the results, here they are:
I hope you find it useful!