Ever heard a voiceover on TikTok or YouTube and thought “Hmmmm… there’s something not quite right with that”? Chances are the creator was using voice synthesis.
With the rise of artificial intelligence (AI), voice synthesis is one of the many ways content is now being created by computers.
But this technology has actually been around for a while.
As far back as the 18th century, inventors were trying to create machines that could mimic real voices, using pipes and bellows to work their veritable magic.
And while the dream of artificial speech evolved over the centuries, it’s only in the last few decades that voice synthesis, or text-to-speech (TTS), has truly started to produce more human-like voiceovers.
In recent years, speech-to-speech translation (SST) has also joined the stable of voice synthesis technology, using digital software to ape an actual person’s voice, including their accent, vocal inflections, and speech patterns.
However, as artificial voiceovers become more common in marketing and content creation, concerns about their negative impact on voice actors’ livelihoods, as well as on brands and businesses, need to be addressed.
But before digging into these concerns, let’s first get the lowdown on the mechanisms behind voice synthesis.
Often referred to as voice cloning as well as text-to-speech (TTS), voice synthesis is a method that converts written language into speech using artificial intelligence (AI) and computer technology.
It falls under the category of synthetic media, which is a catch-all term for any kind of artificially generated, manipulated, or modified media, be it text, video, or voice.
Audio-based synthetic media is developed by using AI algorithms, or systems, to convert written text into human-sounding spoken audio.
The systems analyze a huge database of recorded voice samples, learning how to predict different tones, pitch, and speech patterns. From there, a vocoder transforms these features into an audio waveform, and—voilà!—speech is created.
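To make the two stages described above a little more concrete, here is a deliberately tiny Python sketch. It is not how any real TTS product works: the function names (`predict_features`, `toy_vocoder`, `synthesize`), the pitch rule, and the sample rate are all made up for illustration. A real system learns its features from recorded voice samples, whereas this toy uses a fixed rule and a plain sine wave in place of a trained vocoder.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed value for this sketch)


def predict_features(text):
    """Stand-in for the learned model: one (pitch_hz, duration_s) pair per character.

    A real system predicts these from training data. Here they come from a
    fixed, hypothetical rule so the example stays deterministic.
    """
    features = []
    for ch in text.lower():
        if "a" <= ch <= "z":
            # Made-up rule: letters later in the alphabet get a higher pitch.
            pitch_hz = 120.0 + 4.0 * (ord(ch) - ord("a"))
            features.append((pitch_hz, 0.08))
        else:
            # Spaces, digits, and punctuation become short silences (pitch 0).
            features.append((0.0, 0.05))
    return features


def toy_vocoder(features, sample_rate=SAMPLE_RATE):
    """Turn (pitch, duration) pairs into waveform samples between -1.0 and 1.0."""
    samples = []
    phase = 0.0
    for pitch_hz, duration_s in features:
        frame_length = round(duration_s * sample_rate)
        for _ in range(frame_length):
            samples.append(0.5 * math.sin(phase) if pitch_hz > 0 else 0.0)
            # Advance the phase so consecutive tones join without clicks.
            phase += 2.0 * math.pi * pitch_hz / sample_rate
    return samples


def synthesize(text):
    """Text in, list of audio samples out: features first, then the vocoder."""
    return toy_vocoder(predict_features(text))
```

Calling `synthesize("hi there")` returns a list of numbers that could be written to a sound file. The result is a string of beeps rather than speech, which is the point: the hard part of voice synthesis lies in learning realistic features and waveforms, not in the plumbing between the two stages.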
Modern voice synthesis was initially developed to help people with speech disorders, sensory impairments, or reading difficulties.
For folk with conditions like ALS or cerebral palsy, these early TTS systems enabled them to communicate more effectively. As such, the primary goal was to improve quality of life and foster independence.
However, nowadays, you’d be just as likely to find an artificial voice on an audiobook or TikTok as you would on an assistive communication device.
Of course, using this kind of technology to create content, implement marketing strategies, and develop smart home assistants (we’re looking at you, Alexa and Siri) is not inherently wrong.
But it can come with ethical challenges, not to mention a downslide in quality.
For instance, voice actors who license their voices for fixed fees, which is how most TTS contracts are structured, may unwittingly contribute to being cut out of future projects.
Once their voice has been cloned, companies can use it repeatedly in advertisements, audiobooks, video games, or other media, elbowing out the need for any further work from the original talent.
Not only does this undermine the intellectual property rights of professional voice actors, but flooding the market with regurgitated audio-based synthetic media will undoubtedly lead to a slump in industry standards.
Even if voiceover artists refuse to license their voices, companies may still try to get around this.
Recently, Hollywood actor Scarlett Johansson threatened legal action against OpenAI for allegedly mimicking her voice on their latest AI system after she refused to license it to the company.
For many, the incident showed the critical need for regulated control over how synthesized voices are used.
In a media interview, a spokesperson for the merged entertainment unions, the Screen Actors Guild and the American Federation of Television and Radio Artists (SAG-AFTRA), claimed:
It’s not just voiceover artists who are bearing the brunt of the AI fallout.
The use of voice synthesis in ads, videos, and other content creation can also have a negative impact on businesses.
For starters, synthetic voices often lack the subtle emotional kick that a human voice actor can bring to a recording.
This can make content, whether commercials or video game voiceovers, sound robotic and unrelatable, which in turn will create an emotional disconnect with audiences.
If brands persist in using AI voices for their marketing content, they may find themselves struggling to build genuine trust or true connections with their target audience.
That’s not to say that TTS technology hasn’t improved a lot over the decades. It has.
Certainly, the artificial voices used by global brand giants like Google and Amazon, though somewhat generic, are a far cry from the electronic-sounding output that many of us typically associate with synthetic speech.
However, smaller brands and businesses are typically not on the same playing field as Google when choosing voice synthesis, and opting for cheaper options can leave them dealing with quality issues.
In fact, common-or-garden AI voices still struggle with unnatural intonation and mispronunciations, and with conveying certain forms of wit like quips and sarcasm.
This can leave content falling flat or sounding too unnatural for audiences to put in the extra effort to absorb the information being relayed.
Truth is, with so much great, authentic content up for grabs, listeners will simply disengage and click away with a “Thank you! Next!”
As modern audiences are generally clued in to what’s real and what’s not, a poorly rendered synthetic voice is pretty simple to spot.
Even if you’ve put in the sweat trying to make an AI voice work in your video, the truth is it has the stigma of being cheap and “low effort.”
People want content that’s engaging and enjoyable, with a human touch.
A voice that lacks personality, emphasis, or emotion can leave listeners believing the content itself is generic and sub-par—even if it’s not.
This perception won’t just damage a business’s reputation, but will also make it harder for the brand to stand out in a crowded marketplace where authenticity is respected and originality rewarded.
However, it’s not all doom and gloom for the voice actors, brands, and businesses contending with a new era of AI-generated media.
For sure, synthetic media is here to stay and voice synthesis is part of the posse. But there are ways that it could be developed in the future so that voice artists and brands benefit too.
This, in turn, will make the industry more ethical and sustainable.
The first order of business must be for governments or industry bodies to put legal protections in place for voice artists, ensuring fair compensation and ethical use of voice cloning technologies.
These steps won’t just reduce the risk of exploitation but will also create a safer environment for artists and brands alike.
Following this, standard contracts for voice actors licensing their money-makers must be created whereby the specific terms of how and where the individual’s voice can be used are clearly defined.
Not only will this allow voice talent to retain control over their intellectual property, but it will also help brands sidestep any legal minefields associated with licensed VO misuse.
Additionally, a royalties-based system, similar to how musicians collect royalties from streams or plays on platforms such as Spotify and Apple Music, could be implemented for voice talent.
In this way, voiceover artists can be sure of ongoing income each time their voices are used.
Brands, on the other hand, would only need to fork out the funds for actual use, making the whole system fair and scalable.
Finally, new approaches need to be found to combine the strengths of both TTS technology and human voiceover work.
For instance, a business could use AI to generate early drafts of content and then engage real human voice actors to apply the finishing polish. This collaboration would give the work an emotional punch and an authentic sound.
In doing this, businesses would bring together the efficiency of virtual tech with the quality of real human creativity.
Putting clear industry standards and safeguards in place, covering fair compensation, transparency, and explicit consent from voice artists about how their voices are used, can only enhance the use and effectiveness of TTS technology.
VO talent will trust they won’t be tricked out of fair compensation, and big brands and small businesses will have equal access to high-quality voiceovers in a way that encourages confidence, creativity, and ethical practices.
AI is part of our future. That’s a given.
But unfair and improper practices when it comes to implementing voice synthesis (and other synthetic media) in business and creative endeavors don’t need to be.
* * * *
If you’re looking for high-quality, authentic human voices right now, why not explore our top range of award-winning, professional voice actors?
Contact us and we’ll help manage your project, from sourcing experienced voice talent to providing transcription, translation, subtitling, video editing, and all other post-editing services you might need.