Is deepfake audio a godsend or a curse for the broadcast industry?

23rd January 2020


Artificial intelligence is probably the biggest buzzword in the broadcast industry at the moment.  Whether you’re commissioning productions, producing or distributing them, there’s little doubt that the tools you use to do so either already include some AI capability or will in the near future.  At the moment, AI mostly helps reduce the amount of repetitive, tedious work we need to do, but it has the potential to completely change our industry – as we’re already starting to see with the emergence of deepfake content.

Deepfakes are the result of using AI-powered deep learning to create fake video and audio that is almost indistinguishable from the real thing. We’ve progressed quickly in the last few years from a face-swapped video of Barack Obama saying some pretty unbelievable things to being able to generate completely fake audio from the smallest samples of anyone’s voice. But even the creators of this generative content technology describe it as an experience of “wunderschrecken”: a simultaneous feeling of wonder and dread. Like all powerful technology, the tools used to create deepfake content could be a godsend or a curse depending on whose hands they’re in and how they’re used.

We thought it would be interesting to explore how the widespread use of deepfake audio might impact the broadcast industry specifically.

 

The pros of deepfake audio in the broadcast industry

Anyone who’s worked in production understands the frustration of last-minute changes. When these changes come after you’ve completed your recording, they’re not only frustrating but can also be time-consuming and expensive. In the future, however, deepfake audio tools (or ‘Photoshop for Audio’, as some have described them) might let you revise an existing voice-over track simply by editing the text of the script, instead of calling the artist back into the recording booth.

Less time in the audio booth could save both time and money across a production schedule.

 

But why limit the use of deepfake audio to making small revisions to existing audio tracks?  Why not use this technology to create entire narratives from scratch?  Creating original content using deepfake audio tools would mean that renowned and loved voices like Sir David Attenborough’s would be available for centuries to come and that producers would have access to an almost limitless catalogue of voices to suit any creative brief.

One thing that differentiates the software used to produce deepfake audio content from most industry technology is the price tag. Whereas other ground-breaking industry developments (like 8K and VR, for example) are generally introduced at high price points and take a number of years to become democratised, creating deepfake audio is set to be more cost-effective, from the outset, than producing original audio tracks. Replacing expensive artist fees and recording studio time with cost-effective deepfake tools would not only reduce production costs but would also make it more affordable for producers to dub content into foreign languages and to produce audio descriptions for audiences with disabilities, making video content more accessible to everyone. Greta Thunberg may also approve, as it could help reduce the carbon footprint of television and film productions.

 

The dangers of using deepfake audio in broadcast

The term “fake news” was rarely heard before 2016, yet these fictitious reports shared over social media are now blamed for influencing elections, inciting violence and even threatening democracy. And that’s just the written content. While sceptical audiences might question the accuracy of a text-based piece, most of us wouldn’t doubt the legitimacy of a video or audio recording that looks and sounds like the real thing. We may laugh at the spoof videos of Boris Johnson and Jeremy Corbyn endorsing each other to be the next PM and at the fake clip of Joe Rogan talking about his chimpanzee hockey team, but would we even recognise them as fake if the content wasn’t so preposterous? Moreover, should a politician, business leader or other high-profile ‘influencer’ make a genuine but ill-judged comment, in a world of deepfakes they can easily claim that the recording is fake. The world will soon not know what to believe.

If even those of us who work in the media sector struggle to recognise deepfakes, how can we ensure that our industry doesn’t fall victim to AI-based cybercrime and fraud? And how effective will all our advanced content security practices be if we’re fooled into handing platform log-in credentials to pirates who use deepfake audio messages to pose as part of the production team? The hope, of course, is that if AI is used to create deepfakes, then the same technology might be used to detect them.

Rights management is another industry pain point that will become more complicated. It’s highly likely that, in a future industry dominated by deepfake content, voice artists would license the use of their voice at a premium to make up for the loss in performance fees. Not only will the industry need to develop tools that identify this deepfake content, but we’ll also need to devise systems to ensure that our sources are authorised suppliers and that the appropriate usage fees make their way back to the original artist.

Finally, the creative ramifications of using deepfake audio shouldn’t be underestimated. There’s a reason we use words like “artist” and “talent” when we talk about the people who voice our programmes. Beyond speaking in tones that are easy on the ears, they are craftsmen and women, able to create different moods with the smallest change in inflection and a well-timed pause. These are effects which are unlikely to be matched by anything generated by a machine. By using deepfake audio we risk reducing an art form to a functional service.

Actors playing AI and robots is the norm, but AI and robots being actors is still a bit of a stretch.

Our take on deepfake audio in the broadcast industry

Adopting new technology is always both exciting and challenging, but change is inevitable, and our industry would stagnate without new tools to drive our creativity. The challenge we face is to capitalise on the potential of this new tech while reducing the associated risks. Luckily, our industry is supported by organisations like the IABM, MESA and the DPP to help us do just that. These organisations and their members have already led the way in setting standards for content security, and now is the time for new standards for the ‘good’ use of AI to be discussed and adopted.

 

Get in touch to find out how Take 1 uses AI for good to increase efficiency and reduce costs.