Custom TTS Voices: 5 Things Brand Managers Should Know

Confused by all the hype surrounding custom AI voices? These five facts cut through the noise to help you get the TTS voice you need.

May 1, 2024 by Gaea Vilage

Your brand has a logo. It has a color palette. It even has implied human characteristics, carefully chosen to forge meaningful connections with the public.

Why not a voice, too?

An actor may provide your literal brand voice, but no single person can be everywhere at once. A voice actor couldn’t possibly record all the statements you need to run automated call centers, AI assistants, online content and newspaper readers, and more. Voice communication at scale requires digital speech.

That’s why many brand managers turn to custom text-to-speech (TTS) voices: unique brand assets that provide consistency across all your audio channels.

A decade ago, TTS technology might not have been capable of pulling this off successfully. Synthetic voices simply didn’t sound that great. Today, however, AI has unlocked new levels of lifelike voice quality. Neural networks and machine learning create much more realistic speech, which is why we also call these AI voices neural TTS.

With this technology, you can get a custom synthetic voice that expresses your brand personality as warmly and naturally as your favorite actor.

But there are limits to what an AI-powered custom voice can do. There are good and not-so-good ways to go about creating an AI voice. In short, there’s a lot of uncertainty around this new technology. Let us clear a few things up.

Here are five important facts about custom AI voices that every brand manager should know.

5 Considerations for Creating a Custom TTS Voice

5 facts about custom AI voices

There’s a lot of hype around AI. Neural TTS is no different. Lots of AI voice providers promise the moon, but if it sounds too good to be true, it probably is.

Here’s the truth about custom AI voices.

1. Computing resources determine the audio quality of an AI voice.

Neural voices sound amazingly lifelike. But the high-definition, information-rich versions of these voices also require a lot of computing power, so they may not work for every use case.

Producing a static speech file, like an audiobook? A top-quality AI voice is ideal. It will run perfectly and sound terrific.

Live-streaming dynamically generated speech, as in an AI voice assistant? You probably need a quicker, lighter TTS voice. The same is true for embedding TTS into a device.

At ReadSpeaker, we use multiple technologies to deliver the best TTS quality possible—not just in general, but for the specific system you’re using. That includes full, high-definition AI voices. It also includes small-footprint neural voices for on-device speech. And it includes unit selection synthesis (USS), a technique that leads to ultra-light TTS voices.

The point is, comparing small-footprint TTS with the heaviest, most resource-intensive AI voice is apples and oranges.

Our advice? Always ask your TTS provider what sort of computing resources you need to deploy a voice you like. Don’t assume that the highest-definition AI voice will work for your use case.
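As a rough illustration of that trade-off, here is a minimal sketch of a decision rule that maps a deployment type to a voice tier. The names `Deployment` and `suggest_voice_tier`, and the `very_limited_hardware` flag, are hypothetical and exist only for this example; the tiers mirror the ones described above, and a real choice depends on the resource figures your provider gives you.

```python
from enum import Enum


class Deployment(Enum):
    """Hypothetical deployment types, taken from the examples in this section."""
    STATIC_FILE = "static_file"   # e.g. an audiobook rendered ahead of time
    LIVE_STREAM = "live_stream"   # e.g. a voice assistant replying in real time
    ON_DEVICE = "on_device"       # e.g. TTS embedded in hardware


def suggest_voice_tier(deployment: Deployment,
                       very_limited_hardware: bool = False) -> str:
    """Return a starting-point voice tier for a deployment type.

    This is an assumption-laden rule of thumb, not a provider's actual
    selection logic.
    """
    if deployment is Deployment.STATIC_FILE:
        # Speech is rendered offline, so computing cost is less of a constraint.
        return "high-definition neural"
    if deployment is Deployment.LIVE_STREAM:
        # Speech is generated on the fly, so a quicker voice is the safer start.
        return "lighter neural"
    # On-device: pick the lightest option when hardware is very constrained.
    if very_limited_hardware:
        return "unit selection synthesis"
    return "small-footprint neural"


if __name__ == "__main__":
    for deployment in Deployment:
        print(deployment.value, "->", suggest_voice_tier(deployment))
```

Treat the result as a first question to raise with your provider, not as a final answer.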

The technical reality is that today’s best-sounding voices won’t work in every situation. What you can do, however, is partner with ReadSpeaker to bring a custom TTS voice to any use case, with the ideal quality-to-footprint balance for every deployment.

2. Collaboration is the key to a great branded synthetic voice.

Some AI voice generators lead with the promise of simplicity: Just upload some recordings and get a TTS voice! These AI voice generator platforms generally do what they promise, but that doesn’t mean you’ll be happy with the results.

A self-service AI voice generator is a tool, just like a movie camera. You can own the best camera in the world, but if you don’t know how to make a movie, you won’t be winning any Oscars.

Creating a custom TTS voice is the same. Doing it right takes experts from a variety of disciplines: computational linguists, AI engineers, voice coaches, recording technicians, actors, and more.

Most importantly, you need to engage with the project yourself. You know your brand best. Is it honest and cheerful, tough and outdoorsy, gentle and reassuring? The voice has to express these traits.

In other words, AI voice technology alone isn’t enough. You also need real-world expertise. That brings us to our next point.

3. A TTS voice may not have perfect pronunciation right out of the box.

Real talk: It’s virtually impossible for a TTS voice to pronounce everything perfectly the first time.

Imagine a music-streaming service. No set of training data will include the name of every artist and song on offer. If a name isn’t in the training data, the AI voice will try to predict the appropriate pronunciation. The prediction algorithms and models perform well, but they’re not 100% accurate.

Now think about the jargon in your industry. Think of the acronyms, the proper nouns, the loanwords. The only way to get your custom voice to pronounce all these edge cases correctly is to monitor and continually improve the system’s pronunciation dictionary.
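To make the idea of a pronunciation dictionary concrete, here is a minimal sketch, assuming the simplest possible approach: swapping tricky words for plain-text respellings before the text reaches the TTS engine. The function names, the lexicon entries, and the respellings are all hypothetical illustrations, not ReadSpeaker's implementation; production systems typically work with phonetic transcriptions rather than respellings.

```python
import re

# Hypothetical lexicon: written form -> plain-text respelling.
# Entries are illustrative only.
LEXICON = {
    "SQL": "sequel",
    "Sade": "shah-day",
    "CHVRCHES": "churches",
}


def apply_lexicon(text: str, lexicon: dict[str, str]) -> str:
    """Replace whole-word, case-sensitive matches of lexicon keys."""
    if not lexicon:
        return text
    # Try longer keys first so a long entry wins over a shorter one it contains.
    keys = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )
    return pattern.sub(lambda m: lexicon[m.group(1)], text)


def find_unreviewed(text: str, reviewed: set[str]) -> list[str]:
    """List words not yet in a (lowercased) reviewed vocabulary.

    A crude stand-in for the monitoring step: these are the words a person
    would listen to and, if needed, add to the lexicon.
    """
    words = set(re.findall(r"[A-Za-z][A-Za-z'-]*", text))
    return sorted(w for w in words if w.lower() not in reviewed)


if __name__ == "__main__":
    print(apply_lexicon("Play Sade and CHVRCHES", LEXICON))
    print(find_unreviewed("Play Sade now", {"play", "now"}))
```

The second function reflects the ongoing part of the work: new names and jargon keep arriving, so someone has to keep checking what the voice has not been tested on.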

At ReadSpeaker, we invest heavily in correct pronunciation. We offer pronunciation tuning as part of any support-and-maintenance deal. We’ll even check your content and test your voice for proactive corrections.

You won’t get that sort of ongoing quality assurance from a self-service AI voice generator, or from the tech giants.

4. Every TTS voice starts with a human voice actor, and those actors have rights that need protecting.

The neural networks that generate AI voices require training data, and that data can only come from recordings of human voices. We’ve said it before and we’ll say it again: Ask your AI voice provider where they get their data.

There’s a voice actor behind every custom TTS voice. Unethical AI voice providers may use recordings without permission, violating the speaker’s rights and potentially subjecting you to later legal liability.

Make sure your branded voice is safe and responsible. At ReadSpeaker, we ensure ethical AI by generating our own training data with contracts in place. That means we record voice actors ourselves; we pay contributors fairly; and all parties agree on approved uses for the resulting voice.

Voice actors trust ReadSpeaker. That gives us a lot of options for your custom AI voice, since we have access to more actors than an untrustworthy provider. Our reputation for ethical AI has also opened up exciting opportunities, like our work with Giancarlo Esposito to produce the custom AI voice for Sonos Voice Control.

5. You’ll need TTS support after deployment, too.

Producing your custom AI voice is one thing. Making sure it runs correctly across all your channels and on a variety of technological platforms is quite another.

The truth is that AI technology can produce unexpected results. You need a TTS partner who’s there to correct problems if they pop up. You need ongoing pronunciation support. And you need technical assistance as you bring your branded voice to new channels.

ReadSpeaker won’t disappear after delivering your custom voice. We’ll be there to make sure your voice performs just as you need it to—and we’ll keep your TTS up to date, regardless of how the technology develops.

You simply won’t get the same support from one of the many startup AI voice generator companies, or from big tech firms that offer TTS as just one of many products. The truth is, ReadSpeaker doesn’t just build custom AI voices; we offer AI voice consultancy services.

Custom Voice Consultancy With ReadSpeaker

What does custom voice consultancy look like? It starts with choosing the ideal speaker to form the basis of your brand’s speech identity.

That choice is more complicated than you might think. We evaluate suitability for TTS by listening to each possible speaker—whether that’s one of our voice actors or your CEO—looking for factors that might disqualify a potential voice talent.

Regardless of how nice a speaker sounds in person, any of the following qualities may not translate well into an AI voice:

  • Raspiness or breathiness
  • Inconsistent pace, intonation, or other speech qualities
  • Speech that’s too quick (or too slow!)
  • Nasality
  • Unclear enunciation

We evaluate dozens of prospects to find the ideal vocal qualities. Most importantly, we’ll find a speaker that expresses your brand persona—and we can make your final TTS voice present as male, female, or gender-neutral, at any age, and with any combination of vocal traits.

But we can’t reduce the process to a technical checklist. Casting for TTS is as much an art as it is a science.

Once we line up a few speakers, we work with you to choose the right one. Then we record hours of special TTS scripts, designed to produce the speaking style that works for your brand. After you approve these recordings, they become training data.

We feed the data into our proprietary deep neural networks (DNNs) to produce a draft voice. Again, you sign off on the progress before we proceed to fine-tuning. Finally, once we’re all happy with your AI brand voice, we help you deploy it in all your audio channels.

It’s a full-service custom-voice partnership, and it produces wonderful results.
