Scribe API

The Scribe API delivers scalable, high-performance transcription across a broad range of media formats and use cases. It enables organizations to convert audio and video into accurate text, whether they are processing large archives in bulk or handling near real-time interactions at scale.

Key features

High‑volume transcription directly from your existing cloud storage
Multiple input formats, including common audio and video formats
Ideal for call conversation transcription, podcasts, and long‑form media processing
Timestamps that show when each segment occurs
Punctuation and formatting for human-readable output
Speaker separation when more than one person speaks
Profanity filtering when enabled
Fast mode for short command‑and‑control scenarios

Processing modes

Fast mode

Fast mode provides synchronous, low-latency transcription for individual files.

Processes one audio file at a time
Returns the response as soon as transcription completes
Works best for short recordings

Note: Fast mode can handle one (mono) or two (stereo) audio channels. The API returns either a single combined transcript or separate transcripts for each channel.

Example workflow

To convert an audio recording into searchable text on demand:

Use fast mode for near real-time transcription.
Your app uploads the audio file.
The backend generates a JWT with Build platform credentials.
The backend sends a transcription request.
The API returns a JSON transcript with timestamps.
Your app displays the transcript to the user.
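
The backend steps above (generate a JWT, then send a transcription request) can be sketched as follows. This is a minimal illustration rather than the documented client: the HS256 signing algorithm, the claim names, the `key_id` and `secret` credential shape, and the endpoint URL are all assumptions, so check the Fast mode reference for the actual values.

```python
import base64
import hashlib
import hmac
import json
import time
import urllib.request
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_jwt(key_id: str, secret: bytes, ttl_seconds: int = 300,
             now: Optional[int] = None) -> str:
    """Build a short-lived HS256 JWT. Claim names are assumptions."""
    issued_at = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"iss": key_id, "iat": issued_at, "exp": issued_at + ttl_seconds}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def build_fast_request(audio: bytes, token: str,
                       url: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST carrying the audio bytes.

    The URL and content type are placeholders, not documented values.
    """
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/octet-stream",
        },
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and parse the response body with `json.loads`. Keeping token generation on the backend means the credentials never reach the client app.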

For details, see Fast mode.

Batch mode

Batch mode provides asynchronous transcription for large or complex jobs.

Processes many files in a single request
Runs in the background: submit jobs and retrieve results when processing is complete

Batch mode is best for:

Long recordings
Large collections of files
Multi-speaker audio

Each audio file generates its own transcript in the corresponding output location.

Example workflow

To transcribe stored call recordings in S3:

Submit a batch job and specify the input folder in your bucket.
The job runs asynchronously.
The service writes transcripts to the specified output location.
Use batch job status endpoints or webhooks to monitor progress.
Retrieve per-file results when processing completes.
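
The submit-then-monitor pattern above can be sketched as a small polling loop. The job field names (`input_location`, `output_location`) and the status values (`completed`, `failed`) are hypothetical, and `fetch_status` stands in for whatever call reads the batch job status endpoint. If you use webhooks instead, no polling is needed.

```python
import time
from typing import Callable, Dict


def build_batch_job(input_prefix: str, output_prefix: str) -> Dict[str, str]:
    """Assemble a job description. Field names are hypothetical."""
    return {"input_location": input_prefix, "output_location": output_prefix}


def wait_for_job(fetch_status: Callable[[], str],
                 poll_seconds: float = 30.0,
                 max_polls: int = 120,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job reaches a terminal state or the polls run out."""
    terminal = {"completed", "failed"}  # assumed status values
    for attempt in range(max_polls):
        status = fetch_status()
        if status in terminal:
            return status
        # Wait between polls, but not after the final attempt.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError("batch job did not finish within the polling budget")
```

Passing `sleep` as a parameter keeps the loop testable without real delays. In production, the default `time.sleep` applies.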

You can use these transcripts for record retention or media archives.

For details, see Batch mode.