Question 1

Is the API really OpenAI-compatible?

Accepted Answer

Yes. The request shape matches OpenAI's audio.transcriptions endpoint. Point the OpenAI Python SDK's base_url at https://www.tryspeakeasy.io/api/v1 and your existing code keeps working. There is no new SDK to install and no new client to learn.
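As a rough sketch of what that looks like with the OpenAI Python SDK: the base URL comes from the answer above, while the environment variable name SPEAKEASY_API_KEY, the helper name transcribe, and the assumption that the service accepts the model name "whisper-1" are illustrative and not confirmed here.

```python
import os

BASE_URL = "https://www.tryspeakeasy.io/api/v1"  # from the answer above


def transcribe(path: str, model: str = "whisper-1") -> str:
    """Send one audio file to the OpenAI-compatible endpoint and return its text.

    Assumptions: the key is stored in SPEAKEASY_API_KEY (hypothetical name)
    and the service accepts the model name "whisper-1".
    """
    # Imported lazily so this module loads even if the SDK is not installed.
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["SPEAKEASY_API_KEY"], base_url=BASE_URL)
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

Calling transcribe("meeting.mp3") would then return the transcript as a plain string. The only line that differs from a stock OpenAI setup is the base_url argument.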

Question 2

What model runs under the hood?

Accepted Answer

OpenAI's Whisper model family, the same checkpoint OpenAI ships through their whisper-1 endpoint. The transcript JSON you get back is shape-compatible with OpenAI's response (same text field, same segment timestamps, same language code), so swapping providers doesn't break any downstream parsing you've already written.
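To illustrate why downstream parsing carries over, here is a small sketch that reads the three fields named above. The payload is hypothetical, written by hand in the shape of OpenAI's verbose JSON response; the values are made up and the helper name format_segments is an illustration.

```python
import json

# Hypothetical response body; only the field names mirror OpenAI's format.
SAMPLE_RESPONSE = json.loads("""
{
  "text": "Hello there. How are you?",
  "language": "en",
  "segments": [
    {"id": 0, "start": 0.0, "end": 1.5, "text": " Hello there."},
    {"id": 1, "start": 1.5, "end": 3.0, "text": " How are you?"}
  ]
}
""")


def format_segments(payload: dict) -> list[str]:
    """Turn each timestamped segment into a '[start-end] text' line."""
    lines = []
    for segment in payload.get("segments", []):
        start, end = segment["start"], segment["end"]
        lines.append(f"[{start:.2f}-{end:.2f}] {segment['text'].strip()}")
    return lines


for line in format_segments(SAMPLE_RESPONSE):
    print(line)
```

A parser like this depends only on the text, segments, and language keys, so it should not need changes when the response comes from a different provider with the same shape.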

Question 3

What audio formats work?

Accepted Answer

Anything ffmpeg can decode, including mp3, wav, flac, m4a, mp4, webm, ogg, opus, mov, mpeg, and aac. Pass the file as a binary stream; no transcoding step is needed. The free playground caps uploads at 5MB. On the paid API the practical ceiling is however long you're willing to wait: a 1-hour audio file usually returns in 30 to 60 seconds.
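A quick pre-upload check for the playground limit might look like the sketch below. It assumes "5MB" means 5 MiB (the page does not say), treats the extension list above as a convenience filter rather than an exhaustive one, and uses a hypothetical helper name, check_upload.

```python
from pathlib import Path

# Assumption: the playground's "5MB" cap is 5 * 1024 * 1024 bytes.
PLAYGROUND_LIMIT_BYTES = 5 * 1024 * 1024

# Extensions named in the answer above; other ffmpeg-decodable formats may also work.
LISTED_EXTENSIONS = {
    "mp3", "wav", "flac", "m4a", "mp4", "webm", "ogg", "opus", "mov", "mpeg", "aac",
}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of warnings for a file headed to the free playground."""
    warnings = []
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in LISTED_EXTENSIONS:
        warnings.append(f"extension '{extension}' is not in the listed formats")
    if size_bytes > PLAYGROUND_LIMIT_BYTES:
        warnings.append("file exceeds the playground size cap")
    return warnings
```

For a file on disk, the size can be read with Path("meeting.mp3").stat().st_size before calling check_upload.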

Question 4

Do I need to install anything new?

Accepted Answer

No. If you already have openai installed (pip install openai), you're done — just change the base_url. If you'd rather skip the SDK entirely, the requests snippet above is 8 lines.

Question 5

What does it actually cost in production?

Accepted Answer

$0.20 per audio-hour, billed by the second. A 5-minute meeting recording costs $0.0167. The first 50 hours each month are included on the entry plan.
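The 5-minute figure can be reproduced with a few lines of arithmetic. This sketch uses only the rate quoted above and ignores the included 50 hours, since how that allowance is applied is not spelled out here; the function name is an illustration.

```python
RATE_PER_AUDIO_HOUR = 0.20  # USD, from the answer above


def transcription_cost(duration_seconds: float) -> float:
    """Cost in USD for one recording, billed by the second."""
    return duration_seconds / 3600 * RATE_PER_AUDIO_HOUR


print(f"${transcription_cost(5 * 60):.4f}")  # prints $0.0167
```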

Question 6

How is this different from running Whisper locally?

Accepted Answer

Local Whisper means GPU rental, model loading, batching, and timeouts you have to babysit. The API gives you the same Whisper accuracy with a one-line call, and it is cheaper than most GPU instances if you process fewer than ~200 hours/month.
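To see where a break-even point like that comes from, the sketch below compares API spend at $0.20 per audio-hour against a flat monthly GPU bill. The GPU cost is a hypothetical input you would supply yourself; the comparison also ignores engineering time and any included hours.

```python
API_RATE_PER_AUDIO_HOUR = 0.20  # USD, the per-hour rate quoted in this FAQ


def monthly_api_cost(audio_hours: float) -> float:
    """API spend in USD for a month's worth of audio."""
    return audio_hours * API_RATE_PER_AUDIO_HOUR


def break_even_hours(gpu_monthly_cost: float) -> float:
    """Audio-hours per month at which the API bill equals a flat GPU bill."""
    return gpu_monthly_cost / API_RATE_PER_AUDIO_HOUR


# 200 audio-hours through the API costs about $40, so a GPU setup only wins
# on price at that volume if it runs for less than that per month.
print(f"${monthly_api_cost(200):.2f}")
```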
