Description
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.
Home page
https://github.com/openai/whisper
Documentation
https://github.com/openai/whisper
License
Usage
Whisper audio transcription should be run on the GPU nodes and requires uploading the trained model file. Example scripts are available in "/tsd/shared/software/whisper". See here for a basic tutorial of its use. Please note that the scripts will require modification to fit the needs of your analysis and data (e.g. a different runtime, transcription language, or model arguments). Testing has shown that the Slurm job walltime can be set to approximately one-half to one-third of the audio file's duration.
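A job script along these lines could be used as a starting point. This is a hedged sketch, not one of the provided example scripts: the project account, partition name, module version, and Whisper arguments are all assumptions that you must adapt; check the scripts in "/tsd/shared/software/whisper" for the site's actual settings.

```shell
#!/bin/bash
# Hypothetical Slurm job script for Whisper transcription (a sketch only).
# Account, partition, version, and model arguments below are placeholders.
#SBATCH --job-name=whisper-transcribe
#SBATCH --account=pXX          # your project account (placeholder)
#SBATCH --partition=accel      # GPU partition name is an assumption
#SBATCH --gres=gpu:1           # one GPU (the default)
#SBATCH --time=00:30:00        # roughly 1/2 to 1/3 of the audio duration
#SBATCH --mem=16G

module load Whisper/version    # replace "version" with an available version

# Transcribe an uploaded audio file; model and language are example arguments
whisper input.mp3 --model large --language no --output_dir results
```

The walltime above follows the rule of thumb in the text: for a one-hour recording, 30 minutes is a reasonable starting estimate on one GPU.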
Use
module avail Whisper
to see which versions of Whisper are available. Use
module load Whisper/version
to get access to Whisper.
Accounting
The job runs on a GPU and therefore the billing units are given the GPU cost factor, as explained in detail here. In general, the cost of a transcription job = (number of GPUs) x (GPU cost factor) x (job runtime in hours) x (core hour price). So, assuming transcription of a 1-hour audio/video file takes 30 minutes to complete on 1 GPU (the default) and using UH pricing, the cost will be 1 x 24 x 0.5 x 0.15 = 1.8 NOK.
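The cost formula above can be checked with a quick back-of-envelope calculation. The values below are the example figures from the text (1 GPU, cost factor 24, 30-minute runtime, UH core hour price 0.15 NOK); substitute your own job's numbers.

```shell
# Back-of-envelope billing estimate for a GPU transcription job, using
# the formula: cost = GPUs x GPU cost factor x runtime (hours) x price.
num_gpus=1            # one GPU (the default)
gpu_cost_factor=24    # GPU cost factor from the billing documentation
runtime_hours=0.5     # 30 minutes for a 1-hour audio file
core_hour_price=0.15  # UH pricing, NOK per core hour

cost=$(awk -v g="$num_gpus" -v f="$gpu_cost_factor" \
           -v t="$runtime_hours" -v p="$core_hour_price" \
           'BEGIN { printf "%.2f", g * f * t * p }')
echo "Estimated cost: $cost NOK"   # prints: Estimated cost: 1.80 NOK
```

Note that a longer recording scales the estimate linearly: a 4-hour file transcribed in about 2 hours would cost roughly four times as much.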