Measuring latency on incoming video streams

Photo of a clock.
Photo by Sonja Langford on Unsplash

How often do you measure the latency in your real-time communications? And if there are more than two participants in the call, can you expect a similar latency from everyone?

We started the new year with a little experiment to determine how to precisely measure the latency of video streams in a Broadcast Bridge room. Latency measurement is a common way to quickly check that a WebRTC connection is working as expected. This is usually done by sharing a screen showing the current time and manually calculating the difference on the receiving end (or reading some WebRTC statistics, but those are dull).

Our aim was to improve on this practice by making it more objective and reliable. To do this, we decided that the measurement should be:

  • Automatic
  • Continuously updated
  • Precise
  • Simple

Who wants to read a number from an image and do the manual maths to work out the frame-to-frame latency? No one.

What we needed was a way to encode, in each video frame, the precise time at which the frame was sent, so that it could be compared with the time at which it was received on the other side. The frame timestamp property was not an option: typical RTP timestamps are just incremental counters, not wall-clock timestamps at all. There are various solutions to this problem, one of which is to enable the Absolute Capture Time header extension on RTP packets. However, our SFU doesn't currently support negotiating it.
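To illustrate why raw RTP timestamps don't help here: for video they tick at a 90 kHz media clock starting from a random offset, so a single timestamp carries no wall-clock meaning; only the difference between two timestamps does. A small sketch (`rtpDeltaToMs` is our own helper name, not a browser API):

```javascript
// RTP video timestamps tick at a 90 kHz media clock from a random
// origin, so only differences between them are meaningful: they tell
// you inter-frame spacing, never the wall-clock send time.
function rtpDeltaToMs(earlier, later, clockRate = 90000) {
    return ((later - earlier) / clockRate) * 1000;
}
```

For example, two video frames whose RTP timestamps differ by 9000 ticks are 100 ms apart, but neither timestamp tells you when either frame was actually captured.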

We decided to use QR-encoded timestamps, as they are a simple way to carry the information in the frame image itself. This has the advantage of making it easy for the sender to just show a QR code to the camera, without any specific setup. On the receiving end, the timestamp can be decoded automatically and used in the latency calculation; no human manual maths required!

Our aim was to analyse each video track continuously, frame by frame, so this was the perfect chance for us to experiment with Insertable Streams. This browser API exposes the content of a MediaStreamTrack and lets us pipe it through a transformer function of our own. Inside the transformer you can modify each frame – for example, by highlighting a QR code after it's been detected. Since we were not interested in modifying frames for this experiment, we simply used the transformer to decode the timestamp and calculate the latency for each frame. We should also mention that support for Insertable Streams is partial: the feature is not currently available in Firefox or Safari – a tale as old as time when it comes to WebRTC support among browsers.

Here's how we piped the track through the transformer:

const videoTracks = stream.getVideoTracks();

// The processor exposes the track's frames as a ReadableStream
const trackProcessor = new MediaStreamTrackProcessor({
    track: videoTracks[0],
});

// The generator turns the (possibly transformed) frames back into a
// regular video MediaStreamTrack
const trackGenerator = new MediaStreamTrackGenerator({ kind: 'video' });

trackProcessor.readable
    .pipeThrough(transformer)
    .pipeTo(trackGenerator.writable);

We used the qr-scanner package to detect and decode QR codes in the transformer function:

const transformer = new TransformStream({
    transform: async (videoFrame, controller) => {
        // Record the arrival time as soon as the frame reaches us
        const arrived = Date.now();
        const bitmap = await createImageBitmap(videoFrame);

        try {
            const result = await QrScanner.scanImage(bitmap, {
                returnDetailedScanResult: true,
            });
            computeLatency(result.data, arrived);
        } catch (err) {
            // scanImage rejects when no QR code is found, so this
            // fires for every frame without a visible code
            console.error(err);
        } finally {
            bitmap.close();
        }
        // Pass the frame downstream unmodified
        controller.enqueue(videoFrame);
    },
    flush(controller) {
        controller.terminate();
    },
});

The computeLatency function converts the timestamp string extracted from the QR code to a date, and then subtracts it from the arrived date to calculate the latency.
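A minimal version of computeLatency might look like the sketch below. For simplicity it assumes the decoded QR payload is an epoch-milliseconds string; the real GoPro Labs time code uses its own format and would need dedicated parsing first.

```javascript
// Sketch of computeLatency: assumes the decoded QR payload is an
// epoch-milliseconds string (the GoPro Labs format differs and needs
// its own parsing). Returns the latency in milliseconds, or null if
// the payload isn't a timestamp we understand.
function computeLatency(qrData, arrived) {
    const sent = Number(qrData);
    if (!Number.isFinite(sent)) {
        return null;
    }
    return arrived - sent;
}
```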

Technically, Chrome now has QR code reading baked into the browser via the BarcodeDetector API – so why the qr-scanner package, when the lack of Insertable Streams elsewhere means we only need to support Chrome? Unfortunately, the built-in API is broken on certain platforms... and so didn't work for us.

As a result, we are now technically able to calculate the latency of each video frame for every single remote stream. This allows us to see how latency differs for each connection and how it changes over time.

Of course, since we want to be able to measure latencies on the order of 100-200 ms, we need the measurement to be precise. This means that the timestamp QR code must be generated at a high frequency, to capture as closely as possible the moment when the frame was sent. We found that GoPro Labs' precision date and time page could serve this purpose for now.

The image below shows the result. In this example there are two browsers. One of them, the 'receiver', joined the room without sharing a video. The 'sender' is the browser that shares a window open on GoPro's precision date and time page; this page displays a QR code updated about every 10 ms. The video element on the receiver's side shows the current latency, which is calculated for the latest frame, and an average latency, calculated on the last 120 frames.
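The per-frame and 120-frame average figures can be kept with a small rolling window. A sketch of what we mean (the class name and window size are our own choices, not part of any API):

```javascript
// Rolling average over the last N latency samples, as used for the
// 120-frame average shown in the receiver's overlay. add() records a
// new per-frame latency and returns the current average.
class RollingAverage {
    constructor(size = 120) {
        this.size = size;
        this.samples = [];
        this.sum = 0;
    }

    add(value) {
        this.samples.push(value);
        this.sum += value;
        // Drop the oldest sample once the window is full
        if (this.samples.length > this.size) {
            this.sum -= this.samples.shift();
        }
        return this.sum / this.samples.length;
    }
}
```

Calling add() with each frame's latency keeps the average cheap to update: one push, at most one shift, and no re-summing of the whole window.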

Screenshot showing two browser windows: a sender and a receiver. Current and average latency of the video stream is displayed on the receiver's end.
Sender (left) and receiver (right). Current and average latency of the video stream is displayed on the receiver's end. Eagle-eyed readers will have noticed that the receiver's window is displaying an older frame than the sender.

The following image shows the latency being calculated separately for each incoming video stream. In this case, the sender also joined the room with a mobile device, using the camera to film an iPhone showing the GoPro time code. In both cases, the calculated latency is very similar.

Screenshot of a Broadcast Bridge room where we're measuring latency on each incoming video stream. Left: laptop sharing a window; right: mobile device filming a QR code on a screen.
Measuring latency on each incoming video stream. Left: laptop sharing a window; right: mobile device filming a QR code on a screen.

Job done, then? Not really. During tests, it soon became apparent that the lack of an external, synchronised time reference can introduce errors in latency calculation. In fact, the GoPro time code is generated from the device clock, which is not, in itself, a perfectly reliable source. While it's very easy to calculate latency when both browsers run on the same machine, we noticed a discrepancy when comparing different devices, especially if they were from different brands.

In the screenshot below, the QR code generated on an Android phone produced an overestimated latency value that was not consistent with the small delay we perceived visually during the test.

Screenshot of a Broadcast Bridge room showing the latency calculation issues caused by device times not being in sync. The latency calculated from the QR time code generated on an Android device was overestimated.
Latency calculation issues caused by device times not being in sync. The latency calculated from the QR time code generated on an Android device was overestimated.

The issue with device time highlights the need for synchronised timestamps, if we want to achieve a truly reliable latency measurement. This was beyond the scope of this experiment, but we're looking at exploring the use of a timing object for this purpose in a future post.
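For reference, the classic NTP-style calculation shows what a synchronised time exchange would give us. Given the client's send and receive times (t0, t3) and the server's receive and transmit times (t1, t2), the offset between the two clocks can be estimated; this is a textbook formula, not the timing-object approach we plan to explore.

```javascript
// NTP-style clock offset estimate. t0/t3 are the client's send and
// receive times; t1/t2 are the server's receive and transmit times.
// Assumes roughly symmetric network delay. A positive result means
// the server's clock is ahead of the client's.
function clockOffset(t0, t1, t2, t3) {
    return ((t1 - t0) + (t2 - t3)) / 2;
}
```

Without an estimate like this, the "latency" we compute is really latency plus the unknown offset between the sender's and receiver's clocks, which is exactly the error we saw with the Android device.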

Our experiment was a successful proof of concept showing how it may be possible to easily measure the latency of any incoming video stream during a call. For example, at the beginning of an event, the organiser could run a quick check by asking all guest speakers to briefly show a QR time code to their webcam – much as filmmakers use a clapperboard at the start of a scene to help synchronise cameras and microphones, among other things.

Clapperboard
Photo by Martin Lopez from Pexels

Our tests also highlighted the limitations of this approach. The way we chose to analyse each video frame using Insertable Streams is not widely supported by browsers. Latency calculation issues showed the importance of how sender timestamps are generated, and the need for a synchronised clock to measure the real latency reliably.

There is still some work to be done before this proof of concept can be turned into a feature, but there's an exciting road ahead!
