
Whisper in TSD

This guide explains how to transcribe recordings using Whisper in TSD.

 

Prerequisites

The TSD project in question must already have access to the HPC cluster Colossus. The PI of the project can request access to Colossus by emailing tsd-drift@usit.uio.no with the relevant project number.

Installation

Because projects may have both Windows and Linux VMs, there are several methods of installing Whisper. Common to all of them is that the Whisper software must be copied from a shared volume into the project. Below are guides for doing this using File Explorer (Windows) and using a terminal (Windows and Linux).

Note that you must log in to your preferred VM before following the instructions below. If you are unsure how to do this, please see instructions here.

File Explorer (Windows)

  1. Open File Explorer
  2. Click on the file path area and enter the following file path: \\ess01\shared\software\whisper. Press enter to access the file path directory.
  3. Select all contents of this folder, right click one of the selected files, and choose Copy.
  4. Click on the file path area and enter the following file path, where pXXXX is replaced with your project number: \\ess01\pXXXX\data\durable. Press enter to access the directory.
  5. Create a new folder inside this directory by right-clicking on a blank area inside the folder and choosing New and then Folder. We recommend that you name this folder whisper, but you are free to choose any name, as long as you are able to remember the function of this folder. Finally, enter the folder.
  6. Paste the content that you previously copied by right-clicking and choosing Paste. Whisper is now copied to your project and can be used by all project members.

Terminal (Linux + Windows)

  1. Open a terminal window and connect to Colossus.
    1. On Windows you must do this using the app PuTTY. Instructions for how to do this can be found here.
    2. On Linux, open "Terminal", type "ssh pxxxx-hpc-01" (where pxxxx is your project number) and press Enter. Enter your TSD password (NB! Nothing is shown on screen while you type) and press Enter.
  2. Navigate to the project's durable folder using the following command, where pxxxx is replaced with your project number. Press Enter after each command.
cd /tsd/pxxxx/data/durable

  3. Copy the Whisper software to a folder under /durable, where everyone in the project can access it. NB! THE FOLLOWING COMMAND SHOULD ONLY BE RUN ONCE! IF THE FOLDER "whisper" ALREADY EXISTS, THIS STEP IS UNNECESSARY.

cp -r /shared/software/whisper/ .
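
The "only once" rule for the copy step can be enforced with a small guard. The following is a minimal sketch, not part of the TSD instructions: copy_once is a hypothetical helper name, and the source and destination paths passed to it are whatever applies to your project.

```shell
# Sketch: copy a folder only if the destination does not exist yet.
# copy_once is a hypothetical helper name, not a TSD or Whisper command.
copy_once() {
    src="$1"    # folder to copy from, e.g. /shared/software/whisper
    dest="$2"   # folder to create, e.g. /tsd/pxxxx/data/durable/whisper
    if [ -e "$dest" ]; then
        # Someone in the project has already copied the software.
        echo "$dest already exists, nothing copied"
    else
        cp -r "$src" "$dest" && echo "copied $src to $dest"
    fi
}
```

After defining the function in your terminal, it could be called as copy_once /shared/software/whisper /tsd/pxxxx/data/durable/whisper, with pxxxx replaced with your project number. Running it a second time leaves the existing folder untouched.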

Using Whisper

Before you send a transcription job to Whisper, you must connect to Colossus. Instructions for doing this can be found in the Installation section of this article, under Terminal (Linux + Windows).

  1. Make sure the audio files you want to transcribe are located in the folder at the following location: "pxxxx/data/durable/whisper/data/". The easiest way to move files here is through "File Explorer" (Windows) or "Files" (Linux).

    NB! File names may only contain legal characters and no spaces.
  2. Set current directory to the folder containing the Whisper software (NB! if you gave a custom name for the "whisper" folder, substitute this).
cd /tsd/pxxxx/data/durable/whisper
  3. Run the script that starts the transcription job from inside the whisper folder:
./transcribe_data

  4. You should receive a confirmation that a job was started, with the relevant job ID. You will find the transcribed files in the same folder as your audio/video files (pxxxx/data/durable/whisper/data/) as soon as the transcription is finished.
Remember to empty the data folder before transcribing other files!
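
Step 1 requires file names without spaces. If many files need renaming, this can be done from the terminal. The following is a minimal sketch under stated assumptions: fix_spaces is a hypothetical helper name, it only replaces spaces with underscores, and it does not check for any other characters that may be disallowed.

```shell
# Sketch: replace spaces with underscores in file names inside one folder.
# fix_spaces is a hypothetical helper name; it handles spaces only.
fix_spaces() {
    dir="$1"   # folder with audio files, e.g. /tsd/pxxxx/data/durable/whisper/data
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip subfolders and empty matches
        name=$(basename "$f")
        new=$(printf '%s' "$name" | tr ' ' '_')
        if [ "$name" != "$new" ]; then
            # -n: never overwrite an existing file that already has the new name
            mv -n "$f" "$dir/$new"
        fi
    done
}
```

After defining the function, it could be called as fix_spaces /tsd/pxxxx/data/durable/whisper/data before starting the transcription job. A file named "my interview.wav" would then become "my_interview.wav".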

See instructional video

Warning: The video mentions logging into the submit host (pxxx-submit), but after changes in Colossus the submit hosts are now named pxxx-hpc-nn.

Note: The language model used by Whisper on Colossus is now a software module loaded by the scripts inside the whisper folder, not a separate file as in previous versions. See the 'Advanced settings' section below for how to choose your language model.

This video was subtitled with Whisper using both NOR and EN as parameters, which produced the following files (exported from TSD). Note that the audio in this video is in Norwegian.

 

You can change the subtitles yourself or turn them off. The subtitles in this video were not edited afterwards. The translation was produced with Whisper by changing whisper.sm.

Advanced settings

You can make changes yourself in the file 'whisper.sm'. This can be relevant when the recordings are longer than 20 minutes or if you want the transcription to be translated.

Here are 2 things you can change:

  • LANGUAGE=en
    - If you change this from "no" to "en", the transcription will automatically be translated (!).
  • #SBATCH --time=00:20:00
    - If you have large files, you must increase this from 20 minutes, otherwise the job will time out.

Remember to save the file before running the script again.
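
The two settings above can also be changed from the terminal instead of a text editor. The following is a minimal sketch, assuming the file contains one line starting with "LANGUAGE=" and one line starting with "#SBATCH --time=", as shown in the list above. set_whisper_options is a hypothetical helper name, and the actual layout of whisper.sm in your project may differ, so check the result before starting a job.

```shell
# Sketch: set the language and the time limit in the job script.
# set_whisper_options is a hypothetical helper name. It assumes the script
# has lines starting with "LANGUAGE=" and "#SBATCH --time=".
set_whisper_options() {
    file="$1"    # job script, e.g. whisper.sm
    lang="$2"    # language value, e.g. en
    limit="$3"   # time limit as hours:minutes:seconds, e.g. 01:00:00
    cp "$file" "$file.bak"    # keep a backup of the original script
    sed -e "s/^LANGUAGE=.*/LANGUAGE=$lang/" \
        -e "s/^#SBATCH --time=.*/#SBATCH --time=$limit/" \
        "$file.bak" > "$file"
}
```

After defining the function, it could be called from inside the whisper folder as set_whisper_options whisper.sm en 01:00:00. The original script is kept as whisper.sm.bak, so the change can be undone by copying the backup over the edited file.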

Feel free to open with Notepad++
  • Whisper uses the model "large-v3" by default, but all OpenAI models are included in the module and can be selected through the environment variables set by the module. To see all available models in the module, run the following command after loading the module:
printenv | grep EBWHISPERMODEL


To use another model, update "whisper.sm" with the corresponding environment variable (e.g. MODEL=$EBWHISPERMODELLARGEV2).

 

By Dagfinn Bergsager
Published Feb. 1, 2023 1:43 PM - Last modified Sep. 6, 2024 9:07 AM