Processing local device media with Zoom Video SDK

When developing an audio/video application, you may want to access the local media stream (that is, the media streaming from the user’s local device, such as a camera or microphone) and process it elsewhere.

Though Zoom's Video SDK does not currently expose the local media track sent to Zoom [1], you can emulate this by starting the local device with both the browser's mediaDevices API and Video SDK at the same time. The browser-side track captures input from the device but does not send any data to Zoom, so you can read from it, but changes you make to it will not affect the video sent to Zoom.

This blog post assumes you’re capturing the video input stream locally. However, you can apply the same concepts to other devices, such as the microphone.

Getting primary video device

Before starting any local media, ensure you’re calling the browser and Video SDK methods with the same video device ID. To do this, create a function that uses Video SDK’s getDevices() static method, which resolves to an array of MediaDeviceInfo objects.

MediaDeviceInfo includes a kind property that identifies the type of each connected device. For this post, we’ll keep only the devices whose kind is videoinput and return the first one.

async function getPrimaryVideoInput() {
    const localDevices = await VideoSDK.getDevices();
    const videoInputDevices = localDevices.filter(
        ({ kind }) => kind === "videoinput",
    );
    return videoInputDevices[0];
}
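
Note that the function above returns undefined when no camera is connected. As a rough sketch of one way to handle that case, and to prefer a specific camera when the user has several, the selection logic can be pulled into a small pure function. The name selectVideoInput and its preferredDeviceId parameter are hypothetical and are not part of Video SDK; the sample objects only stand in for real MediaDeviceInfo entries.

```javascript
// Hypothetical helper (not part of Video SDK): choose a camera from a list
// of MediaDeviceInfo-like objects.
function selectVideoInput(devices, preferredDeviceId) {
    const videoInputs = devices.filter(({ kind }) => kind === "videoinput");
    if (videoInputs.length === 0) {
        throw new Error("No video input devices found");
    }
    // Fall back to the first camera when the preferred one is not connected
    return (
        videoInputs.find(({ deviceId }) => deviceId === preferredDeviceId) ??
        videoInputs[0]
    );
}

// Plain objects standing in for MediaDeviceInfo entries (assumed values)
const sampleDevices = [
    { kind: "audioinput", deviceId: "mic-1", label: "Built-in Microphone" },
    { kind: "videoinput", deviceId: "cam-1", label: "Built-in Camera" },
    { kind: "videoinput", deviceId: "cam-2", label: "USB Camera" },
];

console.log(selectVideoInput(sampleDevices, "cam-2").label); // prints "USB Camera"
console.log(selectVideoInput(sampleDevices).label); // prints "Built-in Camera"
```

In an application, you could pass the result of VideoSDK.getDevices() as the first argument instead of the sample array.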

Starting video with local media stream

Now that you have the primary video input device, create a new function that serves two purposes:

  1. Begin streaming the video to Zoom via the Video SDK startVideo() method.

  2. Create and return a local MediaStream for accessing and processing local content.

async function startVideoStream(deviceId) {
    if (await zmStream.startVideo({ deviceId })) {
        return await window.navigator.mediaDevices.getUserMedia({
            video: {
                deviceId,
            },
        });
    } else {
        console.error("Zoom failed to start video, exiting...");
    }
}

In the previous function, deviceId is the deviceId property of the device returned by the getPrimaryVideoInput() function, while zmStream is the Video SDK Stream namespace, retrieved from the getMediaStream() method in the SDK. For more information on creating and accessing a Zoom media stream, check out our documentation on starting and joining sessions.
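
One detail worth knowing: passing a bare deviceId to getUserMedia() is treated as a preference, so the browser may open a different camera if that device is unavailable. If the local stream must come from the same device that Zoom is using, the exact constraint is one option. The sketch below assumes a hypothetical helper named buildVideoConstraints; it is not part of Video SDK or the browser API.

```javascript
// Hypothetical helper: build getUserMedia constraints pinned to one camera.
function buildVideoConstraints(deviceId) {
    if (typeof deviceId !== "string" || deviceId === "") {
        throw new TypeError("A non-empty deviceId string is required");
    }
    // With "exact", getUserMedia rejects with an OverconstrainedError
    // instead of silently falling back to a different camera
    return { video: { deviceId: { exact: deviceId } } };
}

console.log(JSON.stringify(buildVideoConstraints("cam-1")));
// prints {"video":{"deviceId":{"exact":"cam-1"}}}
```

The returned object can be passed directly to window.navigator.mediaDevices.getUserMedia() in place of the inline constraints used in startVideoStream().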

Using local media stream

Now that you have access to the local MediaStream, you can use it in your application, for example by sending a captured frame to an AI processing model or, in our case, printing the frame to the console as a base64-encoded data URL.

// getPrimaryVideoInput() returns a MediaDeviceInfo object, so pass its deviceId
const primaryVideoInput = await getPrimaryVideoInput();
const localVideoStream = await startVideoStream(primaryVideoInput?.deviceId);
if (localVideoStream && localVideoStream.active) {
    // Grab the first MediaStreamTrack from our local MediaStream
    const firstTrack = localVideoStream.getTracks()[0];
    // Capture single frame from the MediaStreamTrack
    const imageCapture = new ImageCapture(firstTrack);
    const imageBlob = await imageCapture.takePhoto();
    // Read the image blob as a data URL, and output to console
    const fileReader = new FileReader();
    fileReader.addEventListener("load", ({ target: { result } }) =>
        console.log(result),
    );
    fileReader.readAsDataURL(imageBlob);
} else {
    console.error("localVideoStream not defined or active");
}

Note that the previous code snippet uses the experimental ImageCapture browser API. Since this API is only available in a minority of browsers, it is used for demonstration purposes only, and should not be used in a production environment.
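
If you need broader browser support, a common alternative is to draw the stream onto a canvas through a video element. The following is a minimal sketch of that approach rather than a production implementation. The function name captureFrameAsDataUrl and its doc parameter are hypothetical; doc defaults to the page's document and is only injectable so that the function can be exercised outside a browser.

```javascript
// Sketch: capture one frame by drawing a <video> element onto a <canvas>.
// `doc` is a hypothetical injection point that defaults to the page document.
async function captureFrameAsDataUrl(mediaStream, doc = document) {
    const video = doc.createElement("video");
    video.srcObject = mediaStream;
    video.muted = true; // muted playback is less likely to be blocked by autoplay rules
    // Wait for playback to start so the video dimensions and a frame are available
    await video.play();

    const canvas = doc.createElement("canvas");
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    canvas.getContext("2d").drawImage(video, 0, 0);

    // Detach the stream from the temporary element; this does not stop the tracks
    video.pause();
    video.srcObject = null;
    return canvas.toDataURL("image/png");
}
```

In the browser, calling captureFrameAsDataUrl(localVideoStream) and logging the resolved value would take the place of the ImageCapture and FileReader steps in the earlier snippet.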

Destroying media stream

Finally, once your application is done with the MediaStream and its MediaStreamTrack object(s), create a function that stops the video in Zoom and then stops each local media track, so that nothing continues to capture data from the device.

async function stopVideoStream(mediaStream) {
    await zmStream.stopVideo();
    mediaStream.getTracks().forEach((track) => track.stop());
}
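
In the function above, the local tracks keep running if stopVideo() rejects, which can leave the camera in use. One possible variation, sketched below, wraps the call in try/finally so the tracks are always stopped. The name stopVideoStreamSafely is hypothetical, and the Zoom stream is passed in as a parameter here only to keep the example self-contained.

```javascript
// Sketch: stop local tracks even when stopping the Zoom video fails.
// `zoomStream` stands in for the zmStream object used elsewhere in this post.
async function stopVideoStreamSafely(zoomStream, mediaStream) {
    try {
        await zoomStream.stopVideo();
    } finally {
        // Runs whether or not stopVideo() rejected, so the camera is released
        mediaStream.getTracks().forEach((track) => track.stop());
    }
}
```

Any error from stopVideo() still propagates to the caller, so it can be logged or handled there.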

Conclusion

As demonstrated in this post, you can stream video (or another device) to Zoom while still capturing input directly from that device for external processing. Combining native browser APIs with Zoom's Video SDK, this takes just a few lines of code.

Footnotes

  1. Raw audio/video access for local and remote participants is currently on the Zoom Video SDK roadmap and slated for release by the end of Q1 2024.