Monday, April 06, 2020

Livestreaming done three ways

Like almost everyone else on the planet, my church has recently and rapidly become acquainted with a host of new technologies designed to let us keep in touch despite the necessary partial lockdown the UK finds itself under. During this time, people are being asked to remain in their homes unless necessary (for definitions of "necessary" that vary in strictness); this includes church staff and production team on Sundays! In the hope that someone, somewhere might find it useful - or at the very least, mildly interesting - I'm sharing the approaches we've taken as a church to continue being the body of Christ even in our social isolation.

It's fair to say that there's been a large amount of figuring things out as we go, and our processes have been changing almost as much as the UK government's advice; by no means are the things I write about here definitive - there are many, many ways to skin this particular bantha, and I'm sure we'll continue to make changes as time goes on, until eventually we can get back to "normal" - whatever that means!

Week 1: Same same, but different

Initially, UK government advice was to avoid gatherings of more than 20 people. This meant we couldn't meet in church as normal - we get in the region of 1,000 people a Sunday, spread over three services - but we could still bring a small team to the church building and produce a live-streamed service from there (while each maintaining a suitable physical distance from the others, of course).

This was useful for a number of reasons, mostly because it meant we could make use of our existing production infrastructure: our sound desk and video mixer, and all the equipment that comes with that setup; we could have used our regular PTZ cameras, but our media and communications manager set up a couple of manually-operated cameras instead for more precise control. The increased latency of the image from those cameras would have been a problem in an IMAG setting, but there are no such concerns for the live stream; I eyeballed an approximate 100ms delay and added that to the audio capture.

The output of the video mixer was then fed into a Blackmagic Design Web Presenter, and from there into a laptop running OBS Studio. The OBS setup was pretty minimal; one scene with a "service starting soon" static image, and one with the Web Presenter's output full screen. Song lyrics were overlaid on the camera image in the video mixer, following our normal process, so we could literally stream the unaltered output to YouTube.

The Web Presenter is only capable of 720p, but on a fundamental level - who really cares that it's not 1080p? Our video system (run over old cabling) is only capable of 1080p25 anyway, and the limiting factor would almost certainly have been the church network, which was already being taxed by the lower-resolution stream, leading to a couple of stutters during the service.

Lots of thought went into staging and framing of camera angles to avoid the "big empty building" feel; and, well, the rest of the room wasn't at its most presentable, owing to a storage room being deep-cleaned that week!

Sunday morning live stream: the "when we could still get in the building" edition.

As well as the morning live-stream, church set up a Sunday evening meeting on Zoom, intended as a less formal and more interactive "call to prayer" event, with clergy hosting from home. Zoom was also adopted to facilitate the day-to-day operations of the church with its staff spread out across the city.

I had no formal involvement with running this evening meeting, but I did run an "unofficial" lyrics stream during the sung worship: capturing the output of JustWords into OBS and rendering to a "virtual webcam", set as my camera for Zoom. (Actually, it would have been better to just use Zoom's screen-sharing features, since that would have allowed viewing the lyrics side-by-side with the worship leader.)

Week 2: When all you have is a hammer

Within a few days of that first stream, Government advice changed to more closely match the rest of the world: stay at home, unless necessary (subsequent days would see various Government ministers disagree on what "necessary" meant, but this isn't supposed to be a politics blog!). In the face of contradictory advice from Government, the Archbishop of Canterbury (in charge of the Church of England of which my church is a part) ordered church buildings to be closed entirely, which meant that our previous solution was no longer viable.

With little time to come up with a more comprehensive solution, and having some experience with it already, church decided to use Zoom for the morning service too, with our rector presenting from his study at home, and our media manager inserting pre-recorded segments via Zoom screen-share. Our worship pastor had recorded a selection of songs, and someone overlaid captions, so they were ready to be rolled out in whichever order was chosen for a service. Other contributors were also incorporated live via Zoom; and, exceptionally, the wider church were able to view proceedings either via YouTube or the Zoom call itself, much more like the Sunday evening meetings.

Zoom has the facility built-in to stream to YouTube (hint: if you want to stream to a pre-scheduled YouTube event, you need to use the "custom live streaming service" option and not the "YouTube" one!). And, you know what? It works, and I'll take a working system over a broken one any day. It did leave some room for improvement, though:
  • The massive Zoom watermark in the bottom of the screen that can't be removed, only changed (if you have special permission to do so) - as someone said, "I was surprised to see the watermark considering what we are paying/month!"
  • Screen-sharing for videos was a little ropey, with quality loss and stuttering (much like a standard Zoom meeting); and it also appeared as a split-screen view with whoever spoke last appearing in the corner of the screen. Not only was that a bit distracting, but it also meant we could see them talking or texting during the songs! (Almost certainly on the WhatsApp group set up to orchestrate these live meetings - but there's no way for the congregation to know that!)
  • There was no control over which participants appeared in the stream output - it was stuck on Zoom's slightly unpredictable "most recent speaker" view, and someone therefore had to be vigilant about muting and unmuting participants throughout. There was no option to remove the name tags from the corner of people's screens, either.
Playback of a prerecorded video clip during a Zoom meeting streamed to YouTube.

Could we improve on this setup? Well, I like to think so, and it'd be a much shorter blog post otherwise...

Week 3: The kitchen (virtual audio) sink

I was asked for suggestions on how to take the same basic service "elements" and accomplish a similar stream "in a way that makes it easier or increases quality". Whenever coming up with a technical solution, it's important to bear in mind the specific goals: here, we're not aiming to produce a radically different output, but instead make iterative process and technical improvements to achieve those two core goals, ease of operation and quality of output. This process of continuous evaluation and refinement is common in the software industry, where a "minimum viable" product with a bare-bones set of features can slowly, incrementally gain more functionality over subsequent releases.

I'm going to go into more detail here than for the previous two weeks, partly because it's necessary to encapsulate some of the complexity, but also because it's the current marker of "we are here" and might serve as a useful milestone.


At a glance

The fundamentals of the current solution are:
  • Live participants join a private Zoom meeting
  • Screen- and audio-capture from Zoom into OBS
  • Pre-recorded content streamed from OBS
  • Slides run through screen-capture into OBS
  • Audio and video foldback routed from OBS to Zoom input via virtual webcam output
I'll look at each point in turn, saving the most complex until last.

During the live stream. Left-hand screen: OBS advanced audio settings. Right-hand screen: OBS in Studio Mode. The mug of tea is an essential component. This is after I had tidied my desk.

Live participants join a private Zoom meeting

I mentioned that Week 2's stream allowed the congregation (indeed, anybody with the meeting number) to join the call alongside the service participants. Even before Zoom's now-well-documented security problems were brought into sharper focus, this had been identified as less than ideal; if nothing else, it was splitting the congregation across two different media, so those on the Zoom call couldn't see the YouTube community chat and vice versa. It's also harder to curate the experience for someone on the Zoom call, since what they see and hear will depend on their settings, their choices within the meeting, and their platform (I'm reasonably sure that Zoom's clients behave differently on iOS and Android than on desktop).

The wise decision was made to limit the Sunday morning Zoom meeting to active participants in the service only, plus one person acting as administrator, and the AV operator - that was me! 


Capture Zoom output in OBS

OBS has built-in tools to capture the output of a specific window, and to capture desktop audio, adding them into scenes as required. This was the mechanism used to send video and audio from Zoom participants to the YouTube live stream.

There are some settings in Zoom that made this easier. 
  • Unchecking "Always display participant names on the video" cleaned up that part of the output. 
  • Tucked away under "Share Screen" is a setting to "Use dual monitors" whose behaviour is counter-intuitive but ended up providing us exactly what we needed. 
With the "dual monitors" option selected, joining a meeting will open up two video windows: the first, as normal, with Zoom's meeting controls and the option to switch between gallery and speaker views; and the second, whose behaviour seems to change depending on how many participants there are in the meeting - which really threw me at first! With two people, it will show you your own video - not particularly useful. But with three or more participants, it will show the current speaker's video, stripped of all the UI elements Zoom normally renders over the top. It was this second window that was captured by OBS.

A few more things to consider. Zoom doesn't restrict the video window to the aspect ratio of the source content, so if the window is anything other than a perfect 16:9 it will either add black bars or crop the video (perhaps both). I needed a couple of rounds of tweaking the window size until it perfectly lined up with OBS's output canvas. Making the window full-screen wouldn't have helped me as my monitors are 16:10!
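As a rough illustration of the sizing problem, here is a minimal sketch that works out the largest 16:9 area that fits on a given screen. The 1920x1200 resolution is an assumption for the example (the only thing stated above is that the monitors are 16:10), and the function name is made up. It only covers the video area itself, so any window borders or title bar would need allowing for separately.

```python
from fractions import Fraction
from typing import Tuple


def fit_16_9(max_width: int, max_height: int) -> Tuple[int, int]:
    """Largest 16:9 size (width, height) that fits inside the given area."""
    target = Fraction(16, 9)
    if Fraction(max_width, max_height) > target:
        # Area is wider than 16:9, so the height is the limiting dimension.
        height = max_height
        width = int(height * target)
    else:
        # Area is 16:9 or narrower (16:10, for instance), so width limits.
        width = max_width
        height = int(width / target)
    return width, height


# Hypothetical 16:10 monitor: the video area can be at most 1920x1080,
# leaving 120 pixels of height that would otherwise become black bars.
print(fit_16_9(1920, 1200))
```

Using exact fractions rather than floats avoids off-by-one pixel sizes, which matters here because a single stray pixel is enough to bring back a black bar or a crop.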

Unlike the "main" Zoom window, the secondary window won't display a toolbar when the mouse moves over it or someone speaks in chat. Nevertheless, since OBS can still capture the window from a different virtual desktop, I set it up on its own and left it there during the stream.

In this configuration, Zoom also gives you the ability to "pin" participants' videos to either window. We could have used this to override the "current speaker" detection, but I didn't really have the cognitive bandwidth to want to do that; plus it starts adding additional UI chrome to the otherwise clean video, so I chose to avoid it. "Current speaker" was good enough. (I could have cropped the video in OBS to exclude these elements if I'd really wanted to.)

I set up a couple of scenes in OBS with both the Zoom window video and desktop audio: one full-screen, which we used most of the time for the live content, and one with a shrunk-down Zoom window and space for slide content. Each of these scenes needed both the audio and video sources added. (I could probably have made those two an OBS group and copied them over, but there wasn't really a need.)


Pre-recorded content streamed from OBS

This was quite straightforward, but did highlight a shortcoming of OBS: the lack of any sort of media controls when playing back videos.

I set up one scene per video, with the video full-screen and set to restart playback when it becomes visible. I needed to adjust the audio level for most of them (and in one case add a compressor) because the levels were a bit uneven and in many cases too close to clipping; a normalisation to -5dB or so when rendering would have been useful. One of the videos had been shot vertically, so I composited it over a generic background image.
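For anyone wanting to work out the level adjustment by numbers rather than by ear, here is a small sketch of the arithmetic. It assumes "normalisation to -5dB" means peak normalisation measured in dBFS, which is my reading rather than anything stated above; the function names are made up for illustration.

```python
import math


def peak_dbfs(peak_amplitude: float) -> float:
    """Peak level in dBFS for a linear peak amplitude in the range (0, 1]."""
    return 20.0 * math.log10(peak_amplitude)


def gain_to_target_db(peak_amplitude: float, target_dbfs: float = -5.0) -> float:
    """Gain in dB that moves the measured peak onto the target level."""
    return target_dbfs - peak_dbfs(peak_amplitude)


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)


# A clip peaking at full scale (1.0) needs -5 dB of gain to peak at -5 dBFS.
gain = gain_to_target_db(1.0)
print(gain, db_to_linear(gain))
```

The result of `gain_to_target_db` is the sort of figure that could be applied when rendering the video, or dialled in as a per-source adjustment afterwards.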

I numbered the scenes and videos sequentially - at least, they were sequential until I got a final copy of the script with an additional video inserted halfway through! - and annotated the running order with these identifiers, along with video duration and, for the ones with spoken content, the final sentence as a cue to move to the next scene. It would have been really useful if OBS were able to display a progress bar or, ideally, a timer against a playing video.
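In the absence of a timer in OBS, a stopgap would be a tiny script fed from the running order. The sketch below is only that: the `Cue` fields mirror what I annotated (identifier, duration, out cue), but the class, the function and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    scene: str         # scene identifier from the running order
    duration_s: int    # video length in seconds
    out_cue: str = ""  # final spoken sentence, if the video has one


def remaining(cue: Cue, elapsed_s: float) -> str:
    """Time left on a playing video as M:SS, clamped at zero."""
    left = max(0, cue.duration_s - int(elapsed_s))
    return f"{left // 60}:{left % 60:02d}"


# Hypothetical entry: a 3 minute 45 second video, one minute into playback.
song = Cue(scene="03", duration_s=225)
print(remaining(song, 60))
```

Calling `remaining` once a second with the time since the scene went live would give a simple countdown to keep an eye on alongside the out cue.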

Production notes for the livestream, with video identifiers, timecodes and in/out cues.

It would also have been useful - in rehearsal at least - to be able to seek within the video as it played. That way, we would have discovered that the video that was supposed to contain two worship songs back-to-back in fact contained three! A quick exchange of messages with the rector, and a (hopefully) well-timed crossfade, cut the interloper short without it standing out too much...

I could have run an external media player and captured its window in OBS, in the same way I was already doing for Zoom, which would have given me greater control; but I wasn't sure if that would reduce the quality of the stream output (remember our goals), and it would have made a complicated audio setup even more so, since I was already capturing desktop audio from Zoom and didn't want to be broadcasting sound from the Zoom call during the pre-recorded segments.


Slides run through screen-capture into OBS

Another straightforward component once the initial content preparation work had been done: I'd been provided with some content slides as a PDF document, but out of preference I converted those into a series of images and dropped them into a presentation (LibreOffice Impress, as it happens, but other slide-based presentational media packages are available).

I set up the usual two-screen presentation options, with "presenter view" on one and the slide show on the other, then captured the slide show window in OBS. This sat on another virtual desktop which I switched to only to change slides - and there was little enough slide content that I didn't spend much time here at all during the service.


Audio and video foldback into Zoom

All of the above was relatively straightforward, but left one significant hole in the setup: there was no way for the participants of the Zoom call to be able to see and hear the prerecorded media as it played out to the stream. (Watching the stream wasn't an option, since there was about a 25-second delay on it.) It's also useful to have a way to communicate between the "control room" (err, me) and the participants, to cue in/out of video and as a reassuring familiar face running the show.

The video was the easier of the two parts. I set up a "virtual webcam" clone of OBS's output, and set that as the video source in Zoom's settings. Since I run Linux, that meant compiling the v4l2loopback kernel module (Ubuntu's package is an out-of-date version that doesn't work properly) and installing the OBS v4l2sink plugin. Windows users can make use of obs-VirtualCam by the same author. (Sorry, Mac users, you're on your own in the walled garden; there's an alternative later that might work for you.)

Audio was more complicated. The solution ended up as:
  • Set the main audio mix of the computer as OBS's monitor device
  • Set the monitor of that mix as OBS's input
  • Make sure the audio for all prerecorded content was set to "monitor and output" in OBS' advanced audio settings window
  • Ensure that the "desktop audio capture" device was NOT set to monitor, or else it would have created a feedback loop
  • Rely on the fact that Zoom never plays you your own audio, so audio sent via monitor wouldn't end up on the stream output
There are almost certainly better ways of doing this, but nobody wants to research PulseAudio's virtual sink and loopback devices on a Saturday evening, and that wouldn't help Windows users anyway.
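To see why the "NOT set to monitor" point matters, it can help to draw the routing as a graph and look for loops. The sketch below models only the connections spelled out in the list above, with node names I've made up for the purpose; it leaves Zoom's side of things out entirely, and it is a thinking aid rather than anything OBS or PulseAudio provides.

```python
from typing import Dict, List


def has_cycle(graph: Dict[str, List[str]]) -> bool:
    """Depth-first search for a cycle in a directed graph."""
    visiting, done = set(), set()

    def visit(node: str) -> bool:
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node already on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)


# Hypothetical labels for the routing described in the list above.
routing = {
    "prerecorded": ["stream_output", "obs_monitor"],  # "monitor and output"
    "obs_monitor": ["main_mix"],                      # monitor device
    "main_mix": ["desktop_capture"],                  # monitor of the mix
    "desktop_capture": ["stream_output"],             # output only
}

# The mistake to avoid: desktop audio capture also sent to the monitor.
with_mistake = dict(routing, desktop_capture=["stream_output", "obs_monitor"])

print(has_cycle(routing))       # False
print(has_cycle(with_mistake))  # True
```

The second graph loops from the desktop capture to the monitor, into the main mix and back into the desktop capture, which is exactly the feedback path the list warns about.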

It also doesn't help that a bug in OBS means the first point is impossible (at least on Linux) - it lists only audio sources as potential monitor devices, when you need to be able to select an audio sink! So I needed to leave it on "default" and set it using the PulseAudio volume control instead. 

Likewise, Zoom's device selection menus weren't up to the task, so I ran it wrapped by padsp (not sure that was actually needed), set its own settings to "default", then changed what the default was from PulseAudio.

What would have been really useful is a second monitor output from OBS. That would have let me set up one virtual audio sink receiving Zoom audio, which I could monitor locally as well as send to OBS' main output, plus a second one to receive OBS' other monitored audio to feed into Zoom.

Before you ask: I don't use Windows, so I have no experience of setting this sort of thing up there; Virtual Audio Cable looks like a good place to start.

One thing to make participants aware of is that the foldback video they see will have been through Zoom's compression at least once, if not twice: in their words, it will look "fuzzy" - and it will also be delayed, which might throw them if they're not used to seeing themselves on screen while presenting. Reassure them that the real live stream will look much better than that, and it's a technical limitation of the system being used to show them a copy of it.

I had an extra scene in OBS with a local webcam/mic so that I could participate in conversations during the rehearsal and setup, with the audio from the mic configured to only be sent to OBS's monitor output - even if I'd accidentally sent the scene live, nobody would have been able to hear me swear as I realised and changed back!


Alternatives?

Ostensibly, an alternative would have been to say to all participants, "watch the videos in advance so you know what you're linking to/from". It's not a good alternative, but it is the most straightforward!

A feature of OBS I've not played around with too much is "projectors". It's possible to take a copy of OBS's rendered output and display it in a window. It might be worth trying to use Zoom's screen-sharing feature with that window to provide the foldback channel, rather than a virtual camera; I suspect that might have a detrimental effect on the dual-screen Zoom setup we introduced earlier, but it's an option where a virtual webcam isn't a possibility, and it's worth investigating. (I think it might replace the video output of the second screen with the shared screen content; but I could be wrong, and it could do that in the primary window, in which case it's all good.)

If your computer has a line-in (not a microphone in) you could try setting up a second computer on the same Zoom call to act as an audio source, and see if the jitter is low enough that a constant delay one way or another is enough to match it up to the video from the first computer, but I wouldn't hold out much hope. You'd also need to feed OBS's monitor output to that computer so that the foldback audio was from the same Zoom participant as the video - or else it'd cut to the wrong user in its "current speaker" view.


Actually running the stream

All of the above discussion focuses on the system design and setup, which as always takes the lion's share of the time. With all of that sorted, the stream itself ran fairly smoothly; it was mostly a case of following the script and clicking the appropriate buttons at the right time. Since my audio bus was in use for exchanging audio with Zoom, I wasn't able to monitor the YouTube stream preview for audio levels (or at least, I wasn't supposed to - as I realised a second after unmuting it!) but fortunately my wife was watching the stream downstairs, so I could pop down there quickly to check that output levels were sensible. (It would be really nice for OBS to have audio meters for its main output mix.)

In rehearsal, I'd noticed that sometimes switching to Zoom would cause frame rates to tank and system load to shoot up, so I avoided switching to it during the stream, just leaving it on a virtual desktop on its own.

There were a couple of mid-service instructions from our rector, which I wasn't always able to accommodate. In particular, I'd set up the video for the offering song to include text detailing how to give; I was asked to take that text down halfway through the song, but I didn't want to transition to/from the scene since I wasn't sure if that would restart the video. Turns out, I could quite straightforwardly have done so - but wasn't brave or foolish enough to try and find out in the middle of the livestream! 

(For the record: a source set to "restart playback when it becomes visible" doesn't restart if it was already visible - even if that's on a different scene. So I could have had the same video in two scenes, one with text and one without, and transitioned between them without affecting the playback.)

Back to the goals

In system design, as in software design, it's important to keep going back to your original goals and checking how closely you're tracking to them. To save you scrolling back, here was the original goal statement, said in comparison with the "week 2" solution:
[accomplish a similar stream] in a way that makes it easier or increases quality
Let's break that into two parts.


Makes it easier

On the surface, you might consider that anything requiring as much exposition as the above can't possibly have succeeded in achieving an "easier" goal.

But much of the effort and difficulty was in coming up with a working solution in the first place. That done, it'll be straightforward enough to replicate in future weeks - I'll just load up OBS and Zoom, check the audio routing is still working, make sure I'm capturing the right window, and set up that week's videos as needed. (Again, the setup takes time, but none of the individual steps involved were particularly difficult.)

Even then, I think there's still a big win in terms of in-stream ease of use. Here's the procedure our media manager sent describing what he did to show a video in week 2:

...making the transition smooth requires two screens and adhering carefully to the below simple steps (I practised this for about 15 min)

Getting ready:
  1. Open video in video software (don't play yet) move it to a screen were you don't have anything else, and go into fullscreen mode
  2. in zoom chose 'share screen' and select the video software, also tick both tick boxes (these will be ticked by default from now on)  don't click share button yet! 
When service leaders gives the cue:
  1. Play video in video software, (i've left a few seconds of quiet at the beginning of each song)
  2. click share screen  
When video ends
  1. click 'stop sharing'
  2. close video player window
  3. get next clip ready asap using above steps
Transitioning between scenes in OBS is considerably easier than that!

But the production team aren't the only people (or person) whom this process needs to serve. How easy things are for the non-production team and participant "talents" is also important to consider. From the point of view of most of the participants, very little will have changed - they still just need to sign into Zoom and work out when they need to talk (and when they need to not talk).

From the point of view of the person orchestrating the call, things are much easier as they don't need to worry about constantly muting and unmuting participants during video segments (and indeed it's advantageous not to, since then the next speaker can count themselves - and me - in when starting the event or ending a video early, since I wasn't routing their audio to the stream output at those times).

None of which is to say that it's "easy"! I most definitely needed the Saturday rehearsal to straighten out all the kinks in my own mind about the process, and even then it wasn't until partway through that I started to relax into it. New things can be stressful!

Increases quality

The facets that make a "quality" livestream can be pretty subjective. Here's one subjective opinion:

From the WhatsApp group used to orchestrate the stream: a screenshot accompanied by the text "Just so you all know, this is how everyone can see it on YouTube. It's high quality and sounds [some emoji I'm assuming is positive]"

From my perspective, the main "quality" wins were:

  • Greater control over who appeared on screen, when and where (no forced split screen)
  • Videos no longer subjected to Zoom's compression
  • Lack of Zoom UI clutter on the stream

In summary

Well done for making it this far!

Hopefully this post has given an insight, not just into the process we've developed for putting together our livestreams during this unusual time, but also into the evolution of that process and the key decision points along the way.

With all of that said, it's worth making the point that this isn't the final version of that process. I'm not even sure there is, or ever will be, a final version! As our Worship and Creative Director posted this morning:
It's important to pay attention to the process and not just the project...every song, project or completion is a step in the right direction but never the destination. None of us have arrived or made it.
Now, perhaps more so than ever, we're pretty much making it up as we go along, hoping that each tweak, each change, each new piece of technology is solving a part of the puzzle, a step in the right direction, helping us "be" church while we're apart a little more easily.

Eventually this time shall pass, and something resembling normality will return - and when it does, we should remember to ask ourselves, "what have we learned over this time that we should hold on to going forward?" Answering that might be a little easier if we've paid heed to not just the end results we can catch up on via YouTube, but the process we took to get there, too.