Fast mode
Fast mode provides synchronous transcription for individual audio files. The response is returned as soon as transcription completes.
Key features
- Single file processing: Processes one audio file at a time.
- Immediate results: Returns results as soon as transcription completes.
- Short recordings: Works best for short recordings.
Fast mode can handle one (mono) or two (stereo) audio channels. The API returns either a single combined transcript or separate transcripts for each channel.
Input
We support two audio input channel formats.
- Mono channel
- Stereo channel
Output
- Single transcript: Single aggregated transcript when channel_separation=false
- Per-channel transcripts: Per-channel aggregate transcripts with channel tags when channel_separation=true
Audio formats
WAV, MP3, M4A, and MP4
Transcription options
These options control the transcription language, segmentation mode, word-level timestamps, and channel separation.
| Parameter | Type | Default | Description |
|---|---|---|---|
| language | string | — | Language code (e.g., en-US). See language abbreviations. |
| segmentation_mode | string | "auto" | Voice Activity Detection (VAD) mode. Set "none" to disable. |
| word_time_offsets | boolean | false | Include word-level timestamps. |
| channel_separation | boolean | false | Transcribe stereo channels separately. |
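As an illustration of how the options in the table fit together, here is a minimal Python sketch that assembles a `config` object from the documented defaults. The helper name `build_config` is hypothetical, and the sketch assumes that `language` may be omitted and that sending default values explicitly is accepted; neither is confirmed by this page.

```python
from typing import Optional

# Defaults copied from the options table; `language` has no documented default.
DEFAULTS = {
    "segmentation_mode": "auto",
    "word_time_offsets": False,
    "channel_separation": False,
}


def build_config(language: Optional[str] = None, **overrides) -> dict:
    """Hypothetical helper: merge caller overrides onto the documented defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")

    config = {**DEFAULTS, **overrides}

    # The table lists these two options as booleans.
    for key in ("word_time_offsets", "channel_separation"):
        if not isinstance(config[key], bool):
            raise TypeError(f"{key} must be a boolean")

    # Assumption: `language` is optional, so it is only set when provided.
    if language is not None:
        config["language"] = language
    return config
```

For example, `build_config("en-US", word_time_offsets=True)` yields a config that requests word-level timestamps and leaves the other options at their defaults.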
Audio transcription
Use this endpoint to send an audio file for synchronous transcription:
POST /aiservices/scribe/transcribe
Example request (cURL)
This cURL example sends the URL of an audio file to the Scribe API, which responds with the transcription:
curl -X POST https://api.zoom.us/v2/aiservices/scribe/transcribe \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "file": "https://example.com/path/clip.mp3",
    "config": {
      "language": "en-US",
      "word_time_offsets": true,
      "channel_separation": false
    }
  }'
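The same request can be sketched in Python using only the standard library. The function names `build_transcribe_request` and `transcribe` are hypothetical, and the 60-second timeout is an arbitrary assumption, not a documented limit. The URL, headers, and body fields mirror the cURL example.

```python
import json
import urllib.request

# Endpoint URL as used in the cURL example.
API_URL = "https://api.zoom.us/v2/aiservices/scribe/transcribe"


def build_transcribe_request(token: str, file_url: str, config: dict) -> urllib.request.Request:
    """Hypothetical helper: build the POST request without sending it."""
    body = json.dumps({"file": file_url, "config": config}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def transcribe(token: str, file_url: str, config: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body.

    The call blocks until the service responds, which matches the
    synchronous behavior described for fast mode. The timeout value
    is an assumption chosen for this sketch.
    """
    request = build_transcribe_request(token, file_url, config)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test without network access.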
Response (200)
{
  "request_id": "req_123",
  "duration_sec": 27.4,
  "result": {
    "text_display": "Human-readable with punctuation.",
    "text_lexical": "lowercase lexical form",
    "segments": [
      {
        "start": 0.0,
        "end": 5.2,
        "text": "Welcome everyone ...",
        "words": [
          { "word": "Welcome", "start": 0.0, "end": 0.4 },
          { "word": "everyone", "start": 0.41, "end": 0.9 }
        ]
      }
    ]
  }
}
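A response shaped like the example above could be consumed as in the following Python sketch. The field names are taken from the example response; the helper names `format_segments` and `word_timings` are hypothetical. The sketch assumes that `words` may be absent from a segment (for instance when `word_time_offsets` is false), so it treats that key as optional.

```python
import json
from typing import List, Tuple

# The example response from this page, used as sample input.
SAMPLE = json.loads("""
{
  "request_id": "req_123",
  "duration_sec": 27.4,
  "result": {
    "text_display": "Human-readable with punctuation.",
    "text_lexical": "lowercase lexical form",
    "segments": [
      {
        "start": 0.0,
        "end": 5.2,
        "text": "Welcome everyone ...",
        "words": [
          { "word": "Welcome", "start": 0.0, "end": 0.4 },
          { "word": "everyone", "start": 0.41, "end": 0.9 }
        ]
      }
    ]
  }
}
""")


def format_segments(response: dict) -> List[str]:
    """Render each segment as '[start-end] text'."""
    segments = response.get("result", {}).get("segments", [])
    return [f"[{s['start']:.2f}-{s['end']:.2f}] {s['text']}" for s in segments]


def word_timings(response: dict) -> List[Tuple[str, float, float]]:
    """Flatten word-level timestamps across all segments."""
    segments = response.get("result", {}).get("segments", [])
    return [
        (w["word"], w["start"], w["end"])
        for s in segments
        for w in s.get("words", [])  # assumption: "words" can be missing
    ]


print(format_segments(SAMPLE))  # ['[0.00-5.20] Welcome everyone ...']
print(word_timings(SAMPLE))     # [('Welcome', 0.0, 0.4), ('everyone', 0.41, 0.9)]
```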