Whisper by OpenAI
Whisper is an incredible bit of code from OpenAI that lets you easily convert an audio file (or audio stream, I think) to text, with tons of languages supported. You can run it completely offline on your own hardware. Because of that, it’s also completely free: there are no subscription fees or tokens to spend on converting minutes or hours of audio to text. The only costs are a good-enough computer, if you haven’t got one already, and power.
This guide will show you how to install it, as well as basic command line usage on Windows. The installation steps for Linux should be almost exactly the same: once you have Python installed and set up, the rest are just packages built on top of it, all installed with Python’s pip.
The Whisper project can be found on OpenAI’s GitHub as Whisper.
Installing Whisper prerequisites
To install Whisper, all you need is Python. Preferably you should also have a CUDA-supporting Nvidia graphics card, which is most of them nowadays.
Check if Python is installed
Open a Command Prompt window and enter python -V to see if Python is already installed. We are looking for Python v3.9.9, as suggested on the GitHub page.
I tried setting it up with Python 3.11, and it failed later on due to incompatibilities with Torch (one of the libraries) and with another required package. This may be an issue with other packages on my computer, but if you have nothing installed yet, start by downloading the suggested Python version.
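If you'd rather check from inside Python than from the command prompt, here's a minimal sketch. The function name describe_python and the RECOMMENDED constant are just illustrative names I've picked, and it only compares the major and minor version against the 3.9 series suggested above.

```python
import sys

# The Python series this guide recommends (3.9.x, per the Whisper GitHub page).
RECOMMENDED = (3, 9)


def describe_python(version_info=sys.version_info):
    """Return a short message comparing a Python version with the recommended one."""
    major, minor = version_info[0], version_info[1]
    found = f"{major}.{minor}"
    if (major, minor) == RECOMMENDED:
        return f"Python {found}: matches the recommended 3.9 series"
    return f"Python {found}: differs from the recommended 3.9 series"


if __name__ == "__main__":
    # sys.executable shows which python.exe is actually running this script.
    print(sys.executable)
    print(describe_python())
```

Printing sys.executable is handy if you end up with more than one Python on your machine, since it tells you exactly which install answered.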
Download the Windows Installer (64-bit if you have a 64-bit CPU) from the Python website.
If this is your first time installing Python on your computer: ABSOLUTELY MAKE SURE that “Add to PATH” is checked on the first page of the installer. This will allow you to just open a terminal and run python. Without this option ticked, that will not work.
Multiple Python versions
If you already have a Python version installed, don’t tick that option. This is important. If you are installing this alongside an existing Python install, head to the install folder after installation (shown on the first page of the installer), and rename python.exe to python39.exe, or something similar. When we use commands starting with python in the terminal, use python39, or whatever you renamed it to, in its place.
Next, hit Start and search for ‘Environment’ or ‘PATH’. We will then open ‘Edit the System Environment Variables’. If “System Properties” opens up, click “Environment Variables…” in the bottom right.
Under “User Environment Variables”, select the entry starting with “Path” and click “Edit” (double-clicking it also works).
Click New. A new empty cell should appear at the end of the list. Enter the path where Python 3.9.9 is installed, for example C:\python399. Then click “New” once more and enter C:\python399\Scripts to add the Scripts folder as well. This will allow us to run whisper from the command line.
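To double-check that the PATH edit took, something like the sketch below can help. path_contains is a helper name I made up for this, and the two folders are just the example paths from this guide. Keep in mind that a terminal only picks up PATH changes when it is opened after the change, so run this from a fresh window.

```python
import os
import shutil


def path_contains(folder, path_value=None):
    """Return True if `folder` is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Normalise case and slashes so "C:\\Python399\\" and "c:\\python399" compare equal on Windows.
    wanted = os.path.normcase(os.path.normpath(folder))
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(os.path.normcase(os.path.normpath(entry)) == wanted for entry in entries)


if __name__ == "__main__":
    # Example folders from this guide; swap in your own install location.
    for folder in (r"C:\python399", r"C:\python399\Scripts"):
        print(folder, "on PATH:", path_contains(folder))
    # shutil.which reports which executable a bare "python" command would run.
    print("python resolves to:", shutil.which("python"))
```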
Whisper is simple to install, but does require a few things prepared as well. If you’ve already used something AI-based like Stable Diffusion, you’ve probably already got things like Nvidia CUDA installed. CUDA is only required if you have a compatible Nvidia GPU and want to use GPU acceleration - It makes things a LOT faster.
PyTorch supports CUDA 11.6 and 11.7 at the time of writing this.
Download CUDA 11.6 from Nvidia and install it. If you’re a gamer you will likely want to untick Graphics Driver in the customize installation screen.
I don’t think cuDNN is required, and the steps for installing it are a lot longer. If it does turn out to be needed, see the official Nvidia guide for installing cuDNN.
Head to the PyTorch Website and choose Stable, Windows, Pip, Python, CUDA 11.6 (or CPU if you don’t have CUDA, or don’t want CUDA support).
Finally, you can copy the command and paste it into a terminal: pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116. If you have multiple Python versions, swap pip3 for python39 -m pip (python39 being what you renamed it to earlier).
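Once that finishes, a quick way to see whether PyTorch can actually reach your GPU is a check along these lines. This is a rough sketch, and cuda_report is just a name I chose; it relies on torch.cuda.is_available() and torch.cuda.get_device_name(), and falls back to a plain message if PyTorch isn't importable at all.

```python
def cuda_report():
    """Describe whether PyTorch is importable and whether it can see a CUDA GPU."""
    try:
        import torch  # installed by the pip command above
    except ImportError:
        return "PyTorch is not installed for this Python interpreter"
    if torch.cuda.is_available():
        # Device 0 is the first CUDA GPU PyTorch found.
        return f"PyTorch {torch.__version__}: CUDA available on {torch.cuda.get_device_name(0)}"
    return f"PyTorch {torch.__version__}: no CUDA device found, running on CPU"


if __name__ == "__main__":
    print(cuda_report())
```

If this reports CPU even though you have an Nvidia card, the usual suspects are a CPU-only PyTorch build or a CUDA version that doesn't match the one you picked on the PyTorch website.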
FFMPEG is a useful tool for processing and interacting with audio. This isn’t something with a simple installer. You download a binary file, or build it from the source and tell your computer where it is (Using PATH).
Download FFMPEG from an official source and extract the .zip you download to a folder of its own (this tutorial uses C:\PATH). If the .zip contains ffmpeg.exe directly, extract everything; otherwise extract the contents of the bin folder, assuming ffmpeg.exe is located in there.
You will need to add this folder to your system’s PATH using the steps mentioned above under Multiple Python versions. You’ll add C:\PATH if you’re following along with this tutorial.
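To confirm that ffmpeg can be found from a new terminal, you could run a small check like this one. It's only a sketch: ffmpeg_version_line is a hypothetical helper name, and all it does is look ffmpeg up on PATH and read the first line printed by ffmpeg -version.

```python
import shutil
import subprocess


def ffmpeg_version_line(executable="ffmpeg"):
    """Return the first line of `<executable> -version`, or None if it is not on PATH."""
    location = shutil.which(executable)
    if location is None:
        return None
    completed = subprocess.run(
        [location, "-version"], capture_output=True, text=True, check=False
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    line = ffmpeg_version_line()
    print(line if line else "ffmpeg was not found on PATH")
```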
Finally, we can install whisper.
Open a command prompt or terminal and enter the following (as per GitHub):
or if you have multiple versions, it will look something like
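If you're juggling several Python installs, one way to be sure pip targets the right one is to build the command from the interpreter that is currently running. The sketch below assumes the PyPI package name openai-whisper, which is what the Whisper README lists; it may not match the exact command used when this guide was written. Both function names are ones I've made up for illustration.

```python
import importlib.util
import sys


def pip_install_command(package="openai-whisper"):
    """Build a pip command tied to the interpreter running this script."""
    # Assumption: "openai-whisper" is the package name from the Whisper README.
    return [sys.executable, "-m", "pip", "install", "-U", package]


def whisper_installed():
    """Return True if the `whisper` module can be found by this interpreter."""
    return importlib.util.find_spec("whisper") is not None


if __name__ == "__main__":
    print("Install with:", " ".join(pip_install_command()))
    print("whisper importable:", whisper_installed())
```

Running this with python39 (or whatever you renamed it to) prints a command that installs into that specific Python rather than whichever one happens to be first on PATH.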
Finally. The part you’ve been waiting for. The most rewarding, but simple part. Running commands.
Open a terminal or command prompt in the folder containing the audio file you want to process.
whisper --model base.en "XYZ.mp3" will use the base English model to transcribe the mp3 file. It will create 3 files: an srt, vtt and txt file.
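If you have a whole folder of recordings, the same command can be looped from Python. This is a minimal sketch assuming whisper is on your PATH as set up earlier; whisper_command and transcribe_folder are names I've invented, and the only CLI option used is the --model flag shown above.

```python
import subprocess
from pathlib import Path


def whisper_command(audio_path, model="base.en"):
    """Build the same command line as shown above, as an argument list."""
    return ["whisper", "--model", model, str(audio_path)]


def transcribe_folder(folder=".", pattern="*.mp3", model="base.en"):
    """Run the whisper CLI once for every matching file in `folder`."""
    for audio_file in sorted(Path(folder).glob(pattern)):
        # Run inside the folder so the output files land next to the audio.
        subprocess.run(whisper_command(audio_file.name, model), cwd=folder, check=True)
```

Calling transcribe_folder() with no arguments processes every .mp3 in the current folder, one after another, and stops at the first file that fails because of check=True.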
Simple as that. You can also see Python usage on the GitHub page to integrate it with your own code projects.
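For reference, the Python usage on the GitHub page boils down to loading a model and calling transcribe on it. Here's a small sketch wrapped in a function; the wrapper name transcribe and its defaults are my own, while whisper.load_model and the "text" key of the result come from the README example.

```python
def transcribe(audio_path, model_name="base.en"):
    """Transcribe one audio file with Whisper and return the recognised text."""
    # Imported here so this file can still be loaded when whisper is not installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(audio_path)
    return result["text"]
```

For example, print(transcribe("XYZ.mp3")) would print the transcript of the same file used in the command line example.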
That’s it. Enjoy the new power you have at your fingertips!