Whisper

Whisper is an automatic speech recognition (ASR) system developed by OpenAI, trained on 680,000 hours of multilingual and multitask supervised data collected from the web.

Whisper was open-sourced in September 2022 and provides automated speech-to-text transcription in nearly 100 languages.

Whisper is also accessible via OpenAI's API, which gives developers two main endpoints, transcriptions and translations, both based on the large-v2 Whisper model.


Accessing Whisper through the OpenAI API

OpenAI offers the Whisper large-v2 model through its API, priced at $0.006 per minute. The API accepts various file formats, including m4a, mp3, mp4, mpeg, mpga, wav, and webm.

Here's an example of how to use the Whisper API:

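import openai

# Uses the openai package's pre-1.0 interface (openai.Audio).
# Open the audio in binary mode; the with block closes it afterwards.
with open("/path/to/file/openai.mp3", "rb") as file:
    transcription = openai.Audio.transcribe("whisper-1", file)

print(transcription)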
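
The format list and per-minute price above can be checked locally before a file is uploaded. The following is a minimal sketch, not part of the OpenAI library: the helper names `is_supported` and `estimate_cost` are hypothetical, the format set and price are copied from this article and may change, and the cost estimate assumes billing is simply proportional to audio duration.

```python
from pathlib import Path

# Formats and price as listed in this article; both may change over time.
SUPPORTED_FORMATS = {"m4a", "mp3", "mp4", "mpeg", "mpga", "wav", "webm"}
PRICE_PER_MINUTE_USD = 0.006


def is_supported(path: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_FORMATS


def estimate_cost(duration_seconds: float) -> float:
    """Estimate the cost in USD, assuming cost scales linearly with duration."""
    return round(duration_seconds / 60 * PRICE_PER_MINUTE_USD, 4)


print(is_supported("/path/to/file/openai.mp3"))
print(estimate_cost(90 * 60))  # estimate for a 90-minute recording
```

The extension check only looks at the file name, so it cannot tell whether the file's contents are actually valid audio.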

Speech to Text with Whisper

The Whisper API provides two main functionalities via its transcriptions and translations endpoints:

  • Transcriptions: Transcribes the audio in its source language.
import openai

with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
  • Translations: Translates the audio into English text, regardless of the source language.
import openai

with open("/path/to/file/german.mp3", "rb") as audio_file:
    transcript = openai.Audio.translate("whisper-1", audio_file)
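
The two endpoints differ only in which call is made, so they can sit behind a single helper. The sketch below is an illustration under stated assumptions: it targets the client object of the 1.x `openai` Python package (`client.audio.transcriptions.create` and `client.audio.translations.create`) rather than the `openai.Audio` calls used in this article, and the function name `speech_to_text` and its `to_english` flag are hypothetical.

```python
def speech_to_text(client, path, to_english=False, prompt=None):
    """Send an audio file to the transcriptions or translations endpoint.

    `client` is assumed to be an openai.OpenAI() instance (openai >= 1.0).
    """
    # Pick the endpoint: translations always returns English text.
    endpoint = client.audio.translations if to_english else client.audio.transcriptions

    kwargs = {"model": "whisper-1"}
    if prompt is not None:
        kwargs["prompt"] = prompt  # optional style hint for the model

    # Open in binary mode and close the file once the request returns.
    with open(path, "rb") as audio_file:
        return endpoint.create(file=audio_file, **kwargs)


# Example usage (requires the openai package and an API key):
#   from openai import OpenAI
#   client = OpenAI()
#   result = speech_to_text(client, "/path/to/file/german.mp3", to_english=True)
#   print(result.text)
```

Passing the client in as an argument keeps the helper free of global state and makes it easy to exercise with a stub object instead of a live API call.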

Prompting with Whisper

Prompts can also be used to improve the quality of the transcripts generated by the Whisper API. The model will try to match the style of the prompt, making it more likely to use capitalization and punctuation if the prompt does too.

Here's an example of prompting with Whisper:

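import openai
prompt = "Hello, welcome to my lecture."  # punctuated, capitalized style for the model to match
with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file, prompt=prompt)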

In summary, Whisper is a leading open-source speech-to-text system that handles both transcription and translation into English, making it a valuable resource for developers working with audio in many languages.


Disclaimer

Please note that this page serves informational purposes only and does not constitute an endorsement of any AI tool. Some company descriptions are assisted by our GPT-4 research assistant and are provided without any express or implied warranties.
