
AI voice actors sound more human than ever, and they're ready to hire


The company’s blog post brims with the enthusiasm of a ’90s US infomercial. WellSaid Labs describes what customers can expect from its “eight new digital voice actors.” Tobin is “energetic and insightful.” Paige is “balanced and expressive.” Ava is “polished, confident, and professional.”

Each one is based on a real voice actor, whose likeness (with consent) has been preserved using AI. Companies can now license these voices to say whatever they need. They simply feed some text into the voice engine, and out comes a crisp audio clip of a natural-sounding performance.

WellSaid Labs, a Seattle-based startup that spun out of research at the nonprofit Allen Institute for Artificial Intelligence, is the latest firm to offer AI voices to clients. For now, it specializes in voices for corporate e-learning videos. Other startups are making voices for digital assistants, call-center operators, and even video-game characters.

Not so long ago, such deepfake voices had a bad reputation for their use in scam calls and internet trickery. But their improving quality has since attracted a growing number of companies. Recent breakthroughs in deep learning have made it possible to replicate many of the subtleties of human speech. These voices pause and breathe in all the right places. They can change their style or emotion. You can spot the trick if they speak for too long, but in short audio clips, some have become indistinguishable from humans.

AI voices are also cheap, scalable, and easy to work with. Unlike a recording of a human voice actor, synthetic voices can update their script in real time, opening up new opportunities to personalize advertising.

But the rise of hyper-realistic fake voices is not without consequences. Voice actors, in particular, are left to wonder what this means for their livelihoods.

How to fake a voice

Synthetic voices have been around for a while. But the old ones, including the voices of the original Siri and Alexa, simply glued words and sounds together to achieve a clunky, robotic effect. Making them sound any more natural was painstaking manual work.

Deep learning changed that. Voice developers no longer need to dictate the exact pacing, pronunciation, or intonation of the generated speech. Instead, they can feed several hours of audio into an algorithm and have the algorithm learn those patterns on its own.

“If I’m a Pizza Hut, I definitely can’t talk like Domino’s, and I definitely can’t talk like Papa John’s.”

Rupal Patel, Founder and CEO of VocaliD

Over the years, researchers have used this basic idea to build increasingly sophisticated voice engines. The one WellSaid Labs built, for example, uses two primary deep-learning models. The first predicts, from a passage of text, the broad strokes of what a speaker will sound like, including accent, pitch, and timbre. The second fills in the details, including breaths and the way the voice resonates in its environment.
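To make that two-model split more concrete, below is a minimal sketch in PyTorch. It is not WellSaid Labs’ actual system; it only illustrates the generic division of labor found in many neural text-to-speech stacks: one network maps text to a mel spectrogram (the “broad strokes” of pitch and timbre), and a second network, a vocoder, turns those frames into a waveform. The class names, layer sizes, and shapes are illustrative assumptions.

# Minimal two-stage neural TTS sketch, assuming PyTorch.
# Not WellSaid Labs' architecture; a generic text -> spectrogram -> waveform split.
import torch
import torch.nn as nn

class SpectrogramPredictor(nn.Module):
    """Stage 1: map character IDs to a mel spectrogram (rough pitch, timbre, pacing)."""
    def __init__(self, vocab_size=64, embed_dim=128, mel_bins=80):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, 256, batch_first=True, bidirectional=True)
        self.to_mel = nn.Linear(512, mel_bins)

    def forward(self, char_ids):                 # (batch, text_len)
        x = self.embed(char_ids)                 # (batch, text_len, embed_dim)
        x, _ = self.encoder(x)                   # (batch, text_len, 512)
        return self.to_mel(x)                    # (batch, text_len, mel_bins)

class Vocoder(nn.Module):
    """Stage 2: fill in waveform-level detail from the spectrogram frames."""
    def __init__(self, mel_bins=80, upsample=256):
        super().__init__()
        # A transposed convolution upsamples each mel frame to `upsample` audio samples.
        self.net = nn.ConvTranspose1d(mel_bins, 1, kernel_size=upsample, stride=upsample)

    def forward(self, mel):                      # (batch, frames, mel_bins)
        return self.net(mel.transpose(1, 2))     # (batch, 1, frames * upsample)

if __name__ == "__main__":
    text = torch.randint(0, 64, (1, 20))         # stand-in for encoded text
    mel = SpectrogramPredictor()(text)
    audio = Vocoder()(mel)
    print(audio.shape)                           # torch.Size([1, 1, 5120])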

Making a convincing synthetic voice takes more than just pressing a button, however. Part of what makes a human voice so human is its inconsistency, expressiveness, and ability to deliver the same lines in completely different styles depending on the context.

Capturing these nuances involves finding the right voice actors to supply the appropriate training data and fine-tuning the deep-learning models. WellSaid says the process requires at least an hour or two of audio and a few weeks of labor to develop a realistic-sounding synthetic replica.

AI voices have become particularly popular with brands looking to maintain a consistent voice across millions of interactions with customers. With the ubiquity of smart speakers, the rise of automated customer-service agents, and digital assistants embedded in cars and smart devices, brands may need to produce upwards of a hundred hours of audio a month. But they no longer want to use the generic voices offered by traditional text-to-speech technology, a trend that accelerated during the pandemic as more and more customers skipped in-store interactions to engage with companies virtually.

“If I’m a Pizza Hut, I definitely can’t talk like Domino’s, and I definitely can’t talk like Papa John’s,” says Rupal Patel, a professor at Northeastern University and the founder and CEO of VocaliD, which promises to build custom voices that match a company’s brand identity. “These brands have thought about their colors. They’ve thought about their fonts. Now they have to start thinking about how their voices sound, too.”

Whereas companies once had to hire different voice actors for different markets (the Northeast versus the Southern US, or France versus Mexico), some AI firms can instead shift the accent or switch the language of a single voice. This opens up the possibility of adapting ads on streaming platforms depending on who is listening, changing not just the characteristics of the voice but also the words being spoken. A beer ad, for example, can tell the listener to stop by a different pub depending on whether it’s playing in New York or Toronto. Resemble.ai, which makes voices for ads and smart assistants, says it is already working with clients to launch such personalized audio ads on Spotify and Pandora.
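As a rough illustration of how that kind of word-level personalization could be wired up (this is not Resemble.ai’s actual API), the short Python sketch below picks a city-specific line and hands it to a placeholder synthesis call. The function name, voice ID, and bar names are all hypothetical, invented for the example.

# Hypothetical sketch of a location-personalized audio ad.
# `synthesize` is a placeholder for whatever TTS engine a vendor exposes.
def synthesize(text: str, voice: str) -> bytes:
    """Placeholder for a text-to-speech call that returns audio bytes."""
    raise NotImplementedError("swap in a real TTS engine here")

AD_TEMPLATE = "Grab a cold one tonight at {bar}, just around the corner."
BARS_BY_CITY = {"New York": "The Gramercy Tap", "Toronto": "Queen West Alehouse"}

def personalized_ad(city: str, voice: str = "brand_voice_v1") -> bytes:
    # Fall back to a generic line for cities without a partner bar.
    bar = BARS_BY_CITY.get(city, "your local bar")
    return synthesize(AD_TEMPLATE.format(bar=bar), voice=voice)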

The gaming and entertainment industries are also seeing the benefits. Sonantic, a firm that specializes in emotive voices that can laugh and cry or whisper and shout, works with video-game makers and animation studios to supply voice-overs for their characters. Many of its clients use the synthesized voices only in pre-production and switch to real voice actors for the final production, but Sonantic says a few have started using them throughout the process, perhaps for characters with fewer lines. Resemble.ai and others have also worked with film and TV productions to patch up actors’ performances when words are garbled or mispronounced.


