The art (and AI) of producing video captions

5th August 2020

By Dom Bourne, Take 1 Founder

Video captions are everywhere these days. We're all used to seeing captions at the bottom of our screens. On TV. On YouTube. On OTT platforms. Turning them on and off can be as simple as clicking the little CC button on your phone or remote, but the process of producing captions can vary enormously depending on the content they're created for. In this blog, we're going to demystify video captions and explore the various approaches that go into making them.

What is closed captioning?

Put simply, captions are a text version of all the audio elements in a video. They’re super useful if you are watching with the sound off, or if your hearing isn’t so good.  Closed captions can be switched on and off while open captions are always in view and can’t be deselected.  Either way, captions make video more accessible to viewers and to search engines.

It’s easy to confuse captions and subtitles because they look almost identical, but they’re designed for very different purposes. Captions describe all the audio elements of a production, including the dialogue and descriptions of sound effects, music and other audio cues and clues. They’re designed to provide all the audio information in a programme for viewers who (for whatever reason) can’t hear any of the original soundtrack. Subtitles, on the other hand, only provide a written version of the video’s dialogue and are most often used for translated content, where viewers can hear the other audio elements and just need an alternative version of the spoken words.

How to create closed captions

Image courtesy of Ooonatools

The basic process of creating closed captions can be summed up in three steps:

  1. First you need a transcript: a word-for-word written representation of all the dialogue, as well as descriptions of any other audio elements included in the production. This can be created professionally by a transcriber, generated automatically by ASR (automatic speech recognition), or produced by a human editor polishing an ASR transcript.
  2. Next you need to make sure that the captioning content flows well and is timed correctly – so you need some captioning software, and someone experienced to operate it.
  3. Lastly, the captions need to be exported in the correct format: .stl, .srt, .scc, .ttml… The list of formats is constantly growing as more standards are released to keep up with new SVOD platforms and playout systems. (A minimal .srt example follows this list.)
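
To make the export step concrete, here is a minimal Python sketch that serialises a timed transcript into SubRip (.srt), one of the simplest formats on that list. The cue data is invented for illustration, and real workflows rely on dedicated captioning software rather than a script like this.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption cue: start/end times in seconds and the on-screen text."""
    start: float
    end: float
    text: str


def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"


def cues_to_srt(cues: list[Cue]) -> str:
    """Serialise timed cues into the SubRip (.srt) format."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{to_srt_timestamp(cue.start)} --> {to_srt_timestamp(cue.end)}\n"
            f"{cue.text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical timed transcript, including a non-speech audio cue.
    cues = [
        Cue(0.0, 2.4, "[upbeat music]"),
        Cue(2.4, 5.1, "Welcome back to the show."),
        Cue(5.1, 8.0, "Today we're talking about captions."),
    ]
    with open("captions.srt", "w", encoding="utf-8") as handle:
        handle.write(cues_to_srt(cues))
```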

Choosing a captioning service provider

The tools you use and the level of human input at each of the above steps depend on whether you need high-quality, broadcast-grade captions or quick captions for an online video. Each project should be driven by budget, deadline and quality requirements. For example, when the Take 1 team create broadcast captions, our focus is on quality, as the captions need to meet FCC and Ofcom regulations and style guides. Broadcast-grade captions will be originated by a team of captioners experienced in delivering against different technical requirements, and will go through several rounds of QC to ensure they’re completely error-free. Conversely, if the captions are for web video, we can switch to a lower-cost workflow that makes use of ASR and fewer rounds of quality control.

If you’re looking for a broadcast captioning service provider, check for these three critical capabilities:

  • a strong understanding of the FCC closed captioning rules and requirements or Ofcom regulations,
  • a team of trained captioners capable of handling volume, and
  • a secure platform for uploading and delivering media and files.

These points are especially important if you’re handling high volumes and tight turnaround times to accommodate the ever-shrinking content distribution window!

The future of caption creation

Innovation and R&D have a big part to play in shaping the future of caption creation. One of the areas Take 1 is exploring is using as-broadcast scripts (also known as post-production scripts) as the blueprint for captioning. When as-broadcast scripts are created as XML files (as Take 1 does), the transcripts contained in these files can be repurposed into captioning data rather than being recreated from scratch.
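
To illustrate the idea, here is a hedged sketch of how a timed script stored as XML could be repurposed into caption cues. The element and attribute names (asBroadcastScript, event, timecodeIn, timecodeOut) are invented for this example and are not Take 1’s actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical as-broadcast script: the element and attribute names below are
# invented for illustration and do not reflect any real production schema.
SAMPLE_SCRIPT = """
<asBroadcastScript>
  <event type="dialogue" timecodeIn="00:00:02:00" timecodeOut="00:00:05:00"
         speaker="PRESENTER">Welcome back to the show.</event>
  <event type="music" timecodeIn="00:00:05:00" timecodeOut="00:00:08:00">
    Upbeat theme music
  </event>
</asBroadcastScript>
"""


def timecode_to_seconds(timecode: str, fps: int = 25) -> float:
    """Convert an HH:MM:SS:FF timecode to seconds at the given frame rate."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds + frames / fps


def script_to_cues(xml_text: str) -> list[tuple[float, float, str]]:
    """Repurpose timed script events as (start, end, text) caption cues."""
    root = ET.fromstring(xml_text)
    cues = []
    for event in root.iter("event"):
        text = " ".join(event.text.split())   # collapse stray whitespace
        if event.get("type") != "dialogue":
            text = f"[{text.lower()}]"        # non-speech cues go in brackets
        cues.append((
            timecode_to_seconds(event.get("timecodeIn")),
            timecode_to_seconds(event.get("timecodeOut")),
            text,
        ))
    return cues


print(script_to_cues(SAMPLE_SCRIPT))
```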

The second area, of course, is artificial intelligence. In the world of speech and dialogue, ASR is the first AI that comes to mind, but it’s not the only relevant technology in the creation of captions. Various emerging AI segmentation solutions now automate caption timings, and MT (machine translation) technology can automate translation for foreign-language subtitles. Lastly, there have been big strides recently in the automation of live captioning. None of these processes is perfect, however, and they still require eyeballs and eardrums to polish the output, but early research indicates potential operational efficiencies that allow faster captions at lower cost. All good news for customers.
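
As an illustration of what automated segmentation involves, the sketch below groups hypothetical word-level ASR timestamps into caption blocks using two simple rules: a line-length limit and a maximum cue duration. Commercial AI segmenters weigh far more signals, such as grammar, shot changes and reading speed, so this is only meant to show the shape of the problem.

```python
def segment_words(words, max_chars=37, max_duration=6.0):
    """Group (word, start, end) tuples into (start, end, text) caption cues.

    A new cue begins whenever adding the next word would exceed the character
    limit or stretch the cue beyond the maximum duration.
    """
    cues, current, cue_start = [], [], None
    for word, start, end in words:
        candidate = " ".join(w for w, _, _ in current + [(word, start, end)])
        too_long = len(candidate) > max_chars
        too_slow = cue_start is not None and end - cue_start > max_duration
        if current and (too_long or too_slow):
            cues.append((cue_start, current[-1][2], " ".join(w for w, _, _ in current)))
            current, cue_start = [], None
        if cue_start is None:
            cue_start = start
        current.append((word, start, end))
    if current:
        cues.append((cue_start, current[-1][2], " ".join(w for w, _, _ in current)))
    return cues


# Hypothetical word-level ASR output: (word, start_seconds, end_seconds).
asr_words = [
    ("Welcome", 2.4, 2.8), ("back", 2.8, 3.0), ("to", 3.0, 3.1),
    ("the", 3.1, 3.2), ("show.", 3.2, 3.6), ("Today", 4.0, 4.4),
    ("we're", 4.4, 4.6), ("talking", 4.6, 5.0), ("about", 5.0, 5.3),
    ("captions.", 5.3, 5.9),
]
for start, end, text in segment_words(asr_words):
    print(f"{start:.1f}-{end:.1f}  {text}")
```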

For more information, or to talk to one of our captioning experts, get in touch.