Fast mode

Fast mode provides synchronous transcription for individual audio files. The API returns the response as soon as transcription completes.

Key features

  • Single file processing: Processes one audio file at a time
  • Immediate results: Returns results after the transcription completes
  • Short recordings: Works best for short recordings

Fast mode can handle one (mono) or two (stereo) audio channels. The API returns either a single combined transcript or separate transcripts for each channel.

Input

We support two audio input channel formats.

  • Mono channel
  • Stereo channel

Output

  • Single transcript: Single aggregated transcript when channel_separation=false
  • Per-channel transcripts: Per-channel aggregated transcripts with channel tags when channel_separation=true

Audio formats

Supported formats: WAV, MP3, M4A, and MP4.

Transcription options

These options control transcription language, segmentation mode, word-level timestamps, and channel separation.

Parameter           Type     Default  Description
language            string   -        Language code (e.g., en-US). See language abbreviations.
segmentation_mode   string   "auto"   Voice Activity Detection (VAD) mode. Set "none" to disable.
word_time_offsets   boolean  false    Include word-level timestamps.
channel_separation  boolean  false    Transcribe stereo channels separately.
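
As a minimal sketch of how these options fit together, the Python helper below assembles a config object using the defaults listed above. The function name build_config is hypothetical and not part of the API. Omitting language when it is unset is an assumption, since no default is listed for it.

```python
from typing import Any, Dict, Optional


def build_config(
    language: Optional[str] = None,
    segmentation_mode: str = "auto",
    word_time_offsets: bool = False,
    channel_separation: bool = False,
) -> Dict[str, Any]:
    """Build a transcription config dict from the documented options.

    Hypothetical helper: parameter names and defaults follow the options
    listed above; leaving out `language` when it is None is an assumption.
    """
    config: Dict[str, Any] = {
        "segmentation_mode": segmentation_mode,
        "word_time_offsets": word_time_offsets,
        "channel_separation": channel_separation,
    }
    # Only send a language code when the caller supplied one.
    if language is not None:
        config["language"] = language
    return config


# Example: English transcription with word-level timestamps.
print(build_config(language="en-US", word_time_offsets=True))
```

Keeping the defaults in one place makes it easier to see which options a request actually overrides.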

Audio transcription

Use this endpoint to send an audio file for synchronous transcription:

POST /aiservices/scribe/transcribe

Example request (cURL)

This cURL example sends the URL of an audio file to the Scribe API, which returns a transcription:

curl -X POST https://api.zoom.us/v2/aiservices/scribe/transcribe \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "file": "https://example.com/path/clip.mp3",
    "config": {
      "language": "en-US",
      "word_time_offsets": true,
      "channel_separation": false
    }
  }'
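
The same request can be sketched in Python with only the standard library. The URL, headers, and body fields mirror the cURL example above. The function names build_request and transcribe, the placeholder token, and the 60-second timeout are assumptions made for illustration.

```python
import json
import urllib.request

# Endpoint taken from the cURL example above.
API_URL = "https://api.zoom.us/v2/aiservices/scribe/transcribe"


def build_request(token: str, file_url: str, config: dict) -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    body = json.dumps({"file": file_url, "config": config}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def transcribe(token: str, file_url: str, config: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response.

    The timeout value is an assumption; the call blocks until the
    transcription completes. HTTP errors raise urllib.error.HTTPError.
    """
    request = build_request(token, file_url, config)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Build (but do not send) the request from the cURL example.
request = build_request(
    "example-token",  # placeholder, not a real credential
    "https://example.com/path/clip.mp3",
    {"language": "en-US", "word_time_offsets": True, "channel_separation": False},
)
print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the payload easy to inspect before any network call is made. To actually submit the file, call transcribe with a valid token.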

Response (200)

{
    "request_id": "req_123",
    "duration_sec": 27.4,
    "result": {
        "text_display": "Human-readable with punctuation.",
        "text_lexical": "lowercase lexical form",
        "segments": [
            {
                "start": 0.0,
                "end": 5.2,
                "text": "Welcome everyone ...",
                "words": [
                    { "word": "Welcome", "start": 0.0, "end": 0.4 },
                    { "word": "everyone", "start": 0.41, "end": 0.9 }
                ]
            }
        ]
    }
}
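
A response shaped like the one above can be walked segment by segment to recover word timings. The Python sketch below uses the sample response as its input. The helper name iter_words is hypothetical, and treating segments and words as optional is an assumption (for example, words may be absent when word_time_offsets is false).

```python
import json

# Sample response copied from the example above.
sample_response = json.loads("""
{
    "request_id": "req_123",
    "duration_sec": 27.4,
    "result": {
        "text_display": "Human-readable with punctuation.",
        "text_lexical": "lowercase lexical form",
        "segments": [
            {
                "start": 0.0,
                "end": 5.2,
                "text": "Welcome everyone ...",
                "words": [
                    { "word": "Welcome", "start": 0.0, "end": 0.4 },
                    { "word": "everyone", "start": 0.41, "end": 0.9 }
                ]
            }
        ]
    }
}
""")


def iter_words(response: dict):
    """Yield (word, start, end) tuples from every segment.

    Hypothetical helper: missing `segments` or `words` keys are treated
    as empty lists, which is an assumption about the response shape.
    """
    for segment in response.get("result", {}).get("segments", []):
        for word in segment.get("words", []):
            yield word["word"], word["start"], word["end"]


# Print one line per word with its start and end time in seconds.
for text, start, end in iter_words(sample_response):
    print(f"{start:6.2f} - {end:6.2f}  {text}")
```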