
Voice Artificial Intelligence: What Is It?

Here’s an up-to-date definition of voice artificial intelligence, along with a few ways brands are using this technology today to drive growth.

May 29, 2023 by Jean-Rémi Larcelet-Prost

At the end of 2019, Voicebot.ai published a list of the year’s “top 20 brand innovators in voice.” It called out companies in a surprising range of industries—health and beauty, financial services, media, entertainment, fast food, packaged food and drink, health care, education, automotive, and even consumer packaged goods—that had produced a voice app of some kind, mostly for smart speakers, although there were a few unique branded voice assistants and at least one car-based voice control system, too.

All the brands on the list are leaders in their fields. What can we take away from such a list?

Voice technology isn’t stuck in the tech sector.

Tomorrow’s leading brands—including those in your industry—are working on voice strategies today. To do the same, you need to know a little bit about the technology behind today’s digital voice systems: voice artificial intelligence.

ReadSpeaker uses voice artificial intelligence to develop lifelike custom voices for brands.

Voice Artificial Intelligence: What Is It?

Voice artificial intelligence is an emerging technology, and even industry insiders haven’t yet settled on a clear-cut definition of the term. Tech bloggers use it to refer to any intersection of artificial intelligence with automatic speech recognition (how computers understand spoken language) and/or text-to-speech generation (how computers speak).

Some writers call smart speaker personas like Alexa “voice AI.” Others use the term to describe synthetic voice production that uses machine learning. This split in usage suggests two definitions. With that in mind, here’s a proposed entry for some future dictionary:

Voice artificial intelligence

Abbreviated: Voice AI

Definition of voice artificial intelligence

1. Software that’s capable of machine learning and employs a voice user interface (VUI) to accept commands and return results, as in voice assistants like Alexa, Siri, and Google Assistant

E.g., “Fixing appointments or reordering things, and more, … your Voice AI will connect all the data from your devices and do it for you … ”

2. The process and result of synthetic voice generation using deep neural networks, including AI voice cloning and deep voice software

E.g., “Voice AI technology involves understanding what comprises a human voice and then reproducing it after recording those elements.”

The first definition refers to an artificial intelligence-driven persona that interacts with users via voice. The second refers to the use of AI to generate a synthetic voice, like an AI voice clone.

Forward-thinking brands are using voice AI in both senses to drive recognition, boost customer loyalty, and differentiate themselves from competitors in an increasingly screen-free media environment.

How Companies Are Using Voice AI Assistants

Most brands will use voice artificial intelligence in the second sense of the above definition. That is, they’ll develop a literal brand voice with the help of a deep neural network from a provider like ReadSpeaker (we’ll discuss that application shortly).

But as the Voicebot list illustrates, leading brands may end up producing their own virtual assistants and/or smart products that host those personas. Here are a few examples of branded voice AI assistants:

  • In 2018, Bank of America rolled out an AI-powered virtual financial assistant called Erica. This voice-enabled persona lives in the Bank of America mobile app. By March 2019, Erica had completed more than 35 million “client requests” from 6 million users, all through a voice user interface. Those requests ranged from reading aloud a customer’s routing number to tracking down specific transactions to warning that a recurring charge amount had changed.
  • In 2017, a year before Erica, Capital One released an AI assistant called Eno. Eno was one of the first branded voice bots outside of the major smart speaker personas. The virtual assistant is available through Capital One’s mobile app and on its website.
  • Drivers of new Mercedes models can wake up the native MBUX virtual assistant by saying, “Hey Mercedes.” This system uses natural language understanding, a form of artificial intelligence, to recognize diverse commands. Drivers can ask for directions, turn down the air conditioning, change the radio station, and more through this on-board voice assistant—all while speaking naturally.

Few brands have the resources to develop custom voice AI products like these. The more common way to take advantage of branding in voice-only environments is to produce a custom branded voice—a process that, at its highest level, also uses artificial intelligence.

Artificial Intelligence in Synthetic Voice Generation

Artificial intelligence allows the creation of lifelike synthetic voices, including AI voice clones that closely imitate the sound of a particular speaker. To create an AI voice clone, engineers use deep neural networks (DNNs), layered computing architectures loosely modeled on the synaptic connections within the human brain. These systems recognize patterns within data sets. That means you can train them; they “learn.” Training a DNN on data is called deep learning.
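For readers who want to see what “learning” means in practice, here is a minimal sketch in Python. It is not a deep neural network and has nothing to do with any vendor’s voice engine: it trains a single artificial neuron with two parameters, using made-up sample data, to discover the pattern y = 2x + 1. The function name `train` and the sample values are assumptions chosen for illustration. A real voice model repeats the same basic idea, adjusting parameters to shrink the error, across millions of parameters and many layers.

```python
# Toy illustration of "learning": one artificial neuron fits y = 2x + 1.
# Assumption: the data and names here are invented for illustration only.

def train(samples, epochs=2000, learning_rate=0.05):
    """Fit pred = w * x + b to (x, y) pairs by stochastic gradient descent."""
    w, b = 0.0, 0.0  # the neuron starts out knowing nothing about the pattern
    for _ in range(epochs):
        for x, y in samples:
            error = (w * x + b) - y
            # Nudge each parameter in the direction that reduces the squared error.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(samples)
print(round(w, 2), round(b, 2))  # close to 2.0 and 1.0
```

Nobody told the program that the slope was 2 or the offset was 1. It recovered both from examples alone, which is the sense in which a trained model “learns.”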

To clone a voice, technicians feed audio recordings of the source speaker into deep voice software, a specialized type of neural network. The DNN identifies the minute patterns in that voice—tone, pronunciation, speed, stress, rhythm—and creates a model that can imitate those subtleties while performing entirely new scripts. This AI voice technology creates powerful new branding opportunities. For instance:

1. AI Voice Clones for Celebrity Spokespeople

Back in the early-to-mid 2000s, actor James Earl Jones was “the voice of Verizon.” He appeared in the company’s commercials. He did live branding events. But back then, there were relatively few voice-based touchpoints between brands and their customers—Jones’ recording schedule was manageable.

If Verizon and Jones had the same relationship today, the company would bankrupt itself paying the actor to record scripts for all the new voice-based channels: ads, smart speaker apps, interactive voice response (IVR) systems, etc. A licensed James Earl Jones voice clone would allow Verizon to maintain its branding across all voice channels without the expense or the scheduling challenges of countless recording sessions.

2. Consistent Brand Mascot Voices

Real-life celebrities aren’t the only ones whose voices can be cloned. Characters—Ronald McDonald, Mickey Mouse, Chester Cheetah—also create a consistent brand experience across audio channels. Voice cloning allows a character’s voice to remain the same across generations, without the subtle variations that come from switching voice actors.

3. All-New Custom Branded Voices

Companies don’t need to have an existing brand voice to take advantage of neural text-to-speech technology. ReadSpeaker’s proprietary voice engine uses a deep neural network to generate a unique text-to-speech voice, exclusive to your brand. We work with you to identify ideal sources—voice actors whose speech we’ll use to train our AI models. We further customize models by developing a brand lexicon, complete with individualized pronunciation for industry jargon. We can even add emotional inflection. Soon, responsive expression technology will allow voice-based systems to adjust emotional tone based on the customer’s speech patterns.
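To make the idea of a brand lexicon concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of ReadSpeaker’s engine: the lexicon entries, the product name “Xylo,” the respellings, and the function `apply_lexicon` are all hypothetical. The sketch simply swaps brand terms for respellings before the text would be handed to a text-to-speech system. Production systems typically rely on phonetic alphabets and standardized lexicon formats rather than plain respellings.

```python
import re

# Hypothetical brand lexicon: written term -> respelling for the voice to read.
BRAND_LEXICON = {
    "IVR": "eye vee are",
    "Xylo": "ZY-loh",  # invented product name, for illustration only
}


def apply_lexicon(text, lexicon):
    """Replace whole-word, case-sensitive lexicon terms with their respellings."""
    if not lexicon:
        return text
    # Try longer terms first so a short term never shadows a longer one.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda match: lexicon[match.group(1)], text)


print(apply_lexicon("Ask Xylo about our IVR.", BRAND_LEXICON))
# prints: Ask ZY-loh about our eye vee are.
```

Matching on whole words keeps the substitution from firing inside unrelated words, and keeping the lexicon as plain data means brand staff could update pronunciations without touching the code.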

When brands develop custom voices, they can deploy this unique identifier across the growing range of voice-first devices and media: voice AI assistants, IVR systems, in-car infotainment, interactive in-store displays, e-learning materials, TV, radio, online ads, instruction videos, accessibility tools, conversational robots, and more. This creates a consistent experience that follows the customer throughout their day, improving recognition, trust, and loyalty—without the costs of ongoing talent contracts.

This is the sort of voice artificial intelligence that will drive brand innovation going forward. Voicebot.ai’s 2019 list of top voice-branding efforts includes few AI-generated branded voices. That year, developing a smart speaker app was enough to make the list. That will change on future lists.

In 2020, for instance, Amazon announced that it would enable brand voices on Alexa skills. Other smart device manufacturers are sure to follow. Next year’s brand innovators in voice will be the ones that can most effectively incorporate voice artificial intelligence into their digital strategies.
