Description
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.
Home page
https://github.com/openai/whisper
Documentation
https://github.com/openai/whisper
License
Usage
Whisper audio transcription should be run on the GPU nodes and requires uploading the trained model file. Example scripts are available in "/tsd/shared/software/whisper". See here for a basic tutorial of its use. Please note that the scripts will require modification to fit the needs of your analysis and data (e.g. a different runtime, transcription language, or model arguments). Testing has shown that the Slurm job walltime can be set to approximately one-half to one-third of the audio file's duration.
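A job script along these lines could be used as a starting point. This is a hedged sketch, not one of the provided example scripts: the project account, partition name, module version, and Whisper arguments are all assumptions that you must adapt; check the scripts in "/tsd/shared/software/whisper" for the site's actual settings.

```shell
#!/bin/bash
# Hypothetical Slurm job script for Whisper transcription (a sketch only).
# Account, partition, version, and model arguments below are placeholders.
#SBATCH --job-name=whisper-transcribe
#SBATCH --account=pXX          # your project account (placeholder)
#SBATCH --partition=accel      # GPU partition name is an assumption
#SBATCH --gres=gpu:1           # one GPU (the default)
#SBATCH --time=00:30:00        # roughly 1/2 to 1/3 of the audio duration
#SBATCH --mem=16G

module load Whisper/version    # replace "version" with an available version

# Transcribe an uploaded audio file; model and language are example arguments
whisper input.mp3 --model large --language no --output_dir results
```

The walltime above follows the rule of thumb in the text: for a one-hour recording, 30 minutes is a reasonable starting estimate on one GPU.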
Use
module avail Whisper
to see which versions of Whisper are available. Use
module load Whisper/version
to get access to Whisper.
Accounting
The job runs on a GPU and therefore the billing units are given the GPU cost factor, as explained in detail here. In general, the cost of a transcription job = (number of GPUs) x (GPU cost factor) x (job runtime in hours) x (core hour price). So, assuming transcription of a 1-hour audio/video file takes 30 minutes to complete on 1 GPU (the default) and using UH pricing, the cost will be 1 x 24 x 0.5 x 0.15 = 1.8 NOK.
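The cost formula above can be checked with a quick back-of-envelope calculation. The values below are the example figures from the text (1 GPU, cost factor 24, 30-minute runtime, UH core hour price 0.15 NOK); substitute your own job's numbers.

```shell
# Back-of-envelope billing estimate for a GPU transcription job, using
# the formula: cost = GPUs x GPU cost factor x runtime (hours) x price.
num_gpus=1            # one GPU (the default)
gpu_cost_factor=24    # GPU cost factor from the billing documentation
runtime_hours=0.5     # 30 minutes for a 1-hour audio file
core_hour_price=0.15  # UH pricing, NOK per core hour

cost=$(awk -v g="$num_gpus" -v f="$gpu_cost_factor" \
           -v t="$runtime_hours" -v p="$core_hour_price" \
           'BEGIN { printf "%.2f", g * f * t * p }')
echo "Estimated cost: $cost NOK"   # prints: Estimated cost: 1.80 NOK
```

Note that a longer recording scales the estimate linearly: a 4-hour file transcribed in about 2 hours would cost roughly four times as much.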