Transcripts, reality TV & A.I. – Q&A with post-production expert Mark Raudonis
Mark Raudonis is the former senior vice president of post-production at Bunim-Murray Productions and a long-time client of Take 1’s. Over a career spanning decades he’s earned two Emmy nominations for editing and has worked on some of the most iconic, genre-defining reality TV shows of the era. Hit shows like MTV’s “The Real World”, “Road Rules”, “Keeping Up with the Kardashians”, and “Project Runway” all passed through post on Mark’s watch. Keeping BMP on the leading edge of technological advancements in post was a big part of his job, and that included using transcriptions to speed up the editorial process.
We recently invited Mark to participate in our virtual roundtable at the HPA Tech Retreat to share his experience, predictions and opinions around AI in media workflows with Take 1 founder, Dom Bourne. Here are some of the highlights from that session.
HOW DOES BUNIM MURRAY WORK WITH TAKE 1?
I met Dom about ten or twelve years ago and, at the time, we were using a wide variety of Mom-and-Pop transcribers. We’d send them a VHS tape of an interview and they would transcribe it and send us back a Word document. It was very slow, it was scattered and, when I met Dom, he promised that Take 1 would bring more professionalism and automation to the process.
We primarily just transcribe interviews (not live action footage) with a 24-48-hour turnaround, and the transcriptions are turned over to our editorial and story teams for them to peruse, select bites and use in the edit.
Take 1 provided us with a server which basically acts like a watch folder on our network. We placed the interviews that we wanted to be transcribed in the watch folder and they were magically taken out of the watch folder, sent to their headquarters and were returned as a Word file or whatever format we needed at the time. So, that simple bit of automation – having a local place for us to put all the work to be done, really changed our workflow and how we did things. It made it faster, made it more efficient, more accurate and really went a long way towards making the whole process a lot more user-friendly.
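As an illustration of the watch-folder idea Mark describes, here is a minimal sketch in Python. It is not Take 1's actual system, which the interview does not detail. The function name `collect_new_files`, the folder names, and the list of media extensions are all assumptions made for the example, and moving a file into an outbox folder simply stands in for sending it off for transcription.

```python
import shutil
from pathlib import Path

# Assumed set of media file types; a real facility would choose its own.
MEDIA_EXTENSIONS = {".mov", ".mp4", ".mxf", ".wav"}


def collect_new_files(watch_dir: Path, outbox_dir: Path) -> list[Path]:
    """Move every media file found in watch_dir into outbox_dir.

    Moving into outbox_dir is a stand-in for "sent to the transcription
    service". Non-media files are left where they are.
    Returns the new paths of the moved files, sorted by name.
    """
    outbox_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(watch_dir.iterdir()):
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS:
            target = outbox_dir / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Run on a schedule (for example, once a minute), a function like this means the editorial team only has to drop an interview into one local folder; everything after that happens without anyone visiting a website or starting an upload by hand.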
HAVE YOU TRIED USING AI TOOLS FOR TRANSCRIPTION?
Yes, we have. And you know, it works. It’s fast. But it ain’t even close to being accurate in a lot of cases. Sometimes that’s okay. However, AI transcription is even less accurate on things like speaker diarization – identifying who’s saying what, when – than the actual words. Simple things like punctuation and how it’s displayed on a page could use some improvement. Often, it’s hard to tell whether what someone said was a question or an answer, and that’s actually very important.
In my understanding of the process, if you have a series with consistent cast, you can front load samples of their speech patterns, vocabulary, slang, or whatever else, into the AI engine, training it to optimize for that voice. That’s great if you have a long season of 23 episodes, but if you have a documentary, where you’re talking to one person and you’re never going to talk to them again, it’s not practical to go to that effort of training the AI for just that one interview. In that case, it’s more efficient to pay for the human accuracy.
As I mentioned earlier, Take 1 has their server with a live watch folder in our server room. That ease of operation is not to be underestimated – it speeds up the process so much. If you have to go to an external website and upload a file and then have it transcode in the cloud and all that – that’s okay for an individual shot, but we work on a pretty large scale, dealing with multiple shows at once – that’s hundreds of hours of interviews that have to be processed and uploaded. It suddenly becomes a really onerous task, and the whole reason for going with the computer-based thing is to do it faster. If it takes more time to get the content up there, bring it back, and you lose stuff, you’re losing the advantage of the speed and you’re left with the disadvantage of bad accuracy.
ARE ACCURATE TRANSCRIPTS REALLY THAT IMPORTANT?
It depends on who you talk to. For some people it’s critical, for others not so much. Along with accuracy comes timeliness. So, some people will say, I need it yesterday and I’m willing to accept less accuracy for that speed. Others say, I need it to be perfect and I’m willing to wait a couple of days for that. It really depends on the nature of the production, what kind of delivery deadlines they’re under, and the specific needs of a show or a story producer.
Where accuracy really comes into play is when you start doing word searches. So, if my name is Tom and I want to search for every time Tom shows up in an hour’s worth of transcripts – if Tom isn’t spelled correctly each time, you’re going to miss references. And, if you don’t have the confidence that the words are accurate, then the whole search process breaks down and you go back to manually looking through things.
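The search problem is easy to reproduce. The sketch below uses a made-up three-line transcript (not real show material) in which the transcriber has spelled the name "Thom" once; the helper name `find_mentions` is hypothetical.

```python
import re


def find_mentions(transcript_lines, name):
    """Return the 1-based line numbers where `name` appears as a whole word.

    The match is case-insensitive but otherwise exact, so a misspelled
    name is silently skipped.
    """
    pattern = re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
    return [
        number
        for number, line in enumerate(transcript_lines, start=1)
        if pattern.search(line)
    ]


# Invented sample transcript for illustration only.
transcript = [
    "Tom said he would be late.",
    "I told Thom to call me.",  # name misspelled by the transcriber
    "Nobody has seen tom since Tuesday.",
]

print(find_mentions(transcript, "Tom"))  # [1, 3]
```

Line 2 is a genuine reference to Tom, but the search never finds it. Fuzzy matching could catch some of these cases, at the cost of false hits, which is why an accurate transcript is the more dependable fix.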
Bunim Murray has a very high shoot-to-edit ratio. Occasionally we’ll hit 400-1 or higher. Some of that is multiple cameras shooting the same scene, but the end result is literally thousands of hours of acquired footage for 43 minutes of final programming. That’s a lot of material to manually search through!
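To put rough numbers on that scale, here is a small back-of-the-envelope calculation. It assumes the 400-1 ratio applies to a single 43-minute episode, and the 23-episode season length is borrowed from Mark's earlier example rather than from any specific show; both are assumptions, and the function name `acquired_hours` is hypothetical.

```python
def acquired_hours(ratio: float, final_minutes: float, episodes: int = 1) -> float:
    """Hours of raw footage implied by a shoot-to-edit ratio.

    ratio         -- minutes shot per minute of finished programme (e.g. 400)
    final_minutes -- running time of one finished episode, in minutes
    episodes      -- number of episodes (assumed; defaults to one)
    """
    return ratio * final_minutes * episodes / 60


per_episode = acquired_hours(400, 43)      # about 287 hours
per_season = acquired_hours(400, 43, 23)   # about 6,593 hours

print(f"{per_episode:.0f} hours per episode, {per_season:.0f} hours per season")
```

Under these assumptions a single episode already draws on nearly 300 hours of footage, and the total climbs into the thousands of hours over a season.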
HOW CLOSE DO YOU THINK WE ARE TO ACCURATE AI-GENERATED TRANSCRIPTS?
I draw the analogy to fully automated self-driving, which is perhaps a much more difficult problem to solve. Autopilot now works well under certain conditions. For example, on a freeway with painted lines on the road I will trust my life to autopilot. It’s the edge cases that are difficult to figure out and to solve, and that’s where the work needs to be done. In speech-to-text terms, that means accents, slang, and unusual vocabulary usually just come out wrong.
I’m optimistic that AI is going to advance to the point where the errors created by unusual words or speaker diarization are minimal and can be overlooked. I think that’s coming sooner than you know. It’s just remarkable what’s being done with AI in multiple fields. The pace of improvement has been exponentially faster than probably most people can even imagine.
IN YOUR EXPERIENCE, ARE ACCURATE TRANSCRIPTS WORTH THE COST?
What we found is that, when the price comes down, we find more uses for transcripts. Dom mentioned that inherently having words digitized enables the search method and enables more automation of the post process. And we’re finding that to be true. So, you just thought you were getting a cheaper, machine-generated transcript but what you’re really doing is opening up the post process to a whole different way of thinking and way of doing things.
A lot of people overlook the importance of transcriptions and how they really make the editorial process much more organised, much more efficient, and much more accurate. So, if people look at transcription as a cost, I think that’s the wrong approach. They should look at it as a helping hand to the editorial staff, from both a story perspective and an editorial perspective. Over the last 20-some years, every show that we’ve ever done has started with the transcript – and it brings massive value to the process.
Take 1’s video transcription service combines the best aspects of human intelligence and speech-to-text technology to provide fast, accurate and affordable transcripts.