python · whisper · 30 seconds
Transcribe audio in Python.
Drop a file below, get the transcript, then copy the eight-line Python that produced it. OpenAI-compatible. $0.20/hr in production.
The same call, in your code
Same SDK you already use for OpenAI. Swap two lines, you're done.
from openai import OpenAI
# Same SDK you already use. Just swap the base_url.
client = OpenAI(
    api_key="sk-se-YOUR_KEY_HERE",
    base_url="https://www.tryspeakeasy.io/api/v1",
)
with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
    )
print(transcript.text)
Don't want the SDK? Eight lines of requests:
import requests
with open("audio.mp3", "rb") as f:
    r = requests.post(
        "https://www.tryspeakeasy.io/api/v1/audio/transcriptions",
        headers={"Authorization": "Bearer sk-se-YOUR_KEY_HERE"},
        files={"file": f},
        data={"model": "whisper-1", "response_format": "verbose_json"},
    )
print(r.json()["text"])
Get an API key → First 50 hours every month are included.
// why this is here
Honestly, most Python transcription tutorials are 2,000 words of preamble before any working code. This page is the opposite: the playground above proves it works, the snippets above are what you paste into your editor, and the cost is shown on every transcript so you never have to guess what production looks like. Looking for a cheaper Whisper API?
FAQ
Is the API really OpenAI-compatible?
Yes. Same request shape as OpenAI's audio.transcriptions endpoint. Point the OpenAI Python SDK's base_url at https://www.tryspeakeasy.io/api/v1 and your existing code keeps working. No new SDK to install, no new client to learn.
What model runs under the hood?
OpenAI's Whisper model family, the same checkpoint OpenAI ships through their whisper-1 endpoint. The transcript JSON you get back is shape-compatible with OpenAI's response (same text field, same segment timestamps, same language code), so swapping the base_url doesn't break any downstream parsing you've already written.
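If the response really does mirror OpenAI's verbose_json shape, segment timestamps can be read straight from the parsed JSON. The sketch below assumes each entry in `segments` carries `start`, `end`, and `text` keys, as OpenAI's format does. The `format_segments` helper and the sample dict are hypothetical illustrations, not output captured from this API.

```python
def format_segments(transcript: dict) -> list[str]:
    """Render a verbose_json-style transcript as '[start - end] text' lines."""
    lines = []
    # Assumption: segments follow OpenAI's verbose_json layout.
    for seg in transcript.get("segments", []):
        start = float(seg["start"])
        end = float(seg["end"])
        lines.append(f"[{start:.2f}s - {end:.2f}s] {seg['text'].strip()}")
    return lines


# Hand-written sample in the assumed shape (not real API output).
sample = {
    "text": "Hello there. Thanks for joining.",
    "language": "english",
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello there."},
        {"start": 1.5, "end": 3.0, "text": " Thanks for joining."},
    ],
}

for line in format_segments(sample):
    print(line)
```

With the requests snippet above, `r.json()` is already a dict that could be passed in as `transcript`.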
What audio formats work?
Anything ffmpeg can decode — mp3, wav, flac, m4a, mp4, webm, ogg, opus, mov, mpeg, aac. Pass the file as a binary stream; no transcoding step needed. The free playground caps at 5MB. On the paid API the practical ceiling is the file size you're willing to wait for — a 1-hour audio file usually returns in 30 to 60 seconds.
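A quick local check before uploading can catch an obviously wrong file early. This is only a sketch: the extension set is copied from the list above (the service may well accept more), the `preflight` helper is hypothetical, and the 5MB playground cap is assumed to mean 5 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Extensions copied from the answer above; not an exhaustive list.
LISTED_EXTENSIONS = {
    "mp3", "wav", "flac", "m4a", "mp4", "webm",
    "ogg", "opus", "mov", "mpeg", "aac",
}
# Assumption: "5MB" is interpreted as 5 MiB.
PLAYGROUND_LIMIT_BYTES = 5 * 1024 * 1024


def preflight(filename: str, size_bytes: int, playground: bool = False) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in LISTED_EXTENSIONS:
        problems.append(f"extension '{ext}' is not in the listed formats")
    if playground and size_bytes > PLAYGROUND_LIMIT_BYTES:
        problems.append("file is larger than the playground cap")
    return problems


print(preflight("standup.MP3", 2_000_000, playground=True))
print(preflight("notes.txt", 1_000))
```

In practice `size_bytes` could come from `Path(filename).stat().st_size`. The check looks only at the file name, so a mislabeled file would still pass.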
Do I need to install anything new?
No. If you already have openai installed (pip install openai), you're done — just change the base_url. If you'd rather skip the SDK entirely, the requests snippet above is 8 lines.
What does it actually cost in production?
$0.20 per audio-hour, billed by the second. A 5-minute meeting recording costs $0.0167. The first 50 hours each month are included on the entry plan.
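The arithmetic behind that figure is short enough to check yourself. This is a minimal sketch assuming a flat $0.20 per audio-hour prorated by the second; it ignores the included monthly hours, and `transcription_cost` is a hypothetical helper, not part of any SDK.

```python
RATE_PER_HOUR = 0.20  # USD per audio-hour, from the pricing above


def transcription_cost(duration_seconds: float, rate_per_hour: float = RATE_PER_HOUR) -> float:
    """Per-second proration of the hourly rate, in USD."""
    return duration_seconds * rate_per_hour / 3600


# 5-minute meeting recording: 300 s * $0.20 / 3600 s
print(f"${transcription_cost(5 * 60):.4f}")  # prints $0.0167
```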
How is this different from running Whisper locally?
Local Whisper means GPU rental, model loading, batching, and timeouts you have to babysit. The API gives you the same Whisper accuracy with a one-line call, and it is cheaper than most GPU instances if you process fewer than ~200 hours/month.
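To see where a figure like ~200 hours/month could come from, compare the monthly API bill with whatever a GPU would cost you. The sketch below uses the $0.20/hr rate from this page; the GPU price passed in is a placeholder you supply, not a quoted figure, and both helper names are hypothetical.

```python
API_RATE_PER_HOUR = 0.20  # USD per audio-hour, from the pricing above


def monthly_api_cost(audio_hours: float) -> float:
    """API bill for a month of audio, ignoring any included hours."""
    return audio_hours * API_RATE_PER_HOUR


def break_even_hours(gpu_monthly_cost: float) -> float:
    """Audio-hours per month at which the API bill equals the GPU bill."""
    return gpu_monthly_cost / API_RATE_PER_HOUR


# At 200 audio-hours the API bill is 200 * $0.20.
print(f"${monthly_api_cost(200):.2f}")  # prints $40.00

# Hypothetical GPU budget of $40/month; substitute your own number.
print(f"{break_even_hours(40.0):.0f} hours")  # prints 200 hours
```

This compares raw rental cost only. Engineering time spent on model loading, batching, and timeouts is not counted.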