Q&A with the ReadSpeaker Development Team

The ReadSpeaker team is known for being passionate about driving innovation in text to speech. Today we are pleased to share insights from our brilliant Development team. We hope you enjoy this Q&A with Marcel de Korte, Jaebok Kim, and Wonsuk Jun on multilingual neural text to speech, which is based on ReadSpeaker’s cross-lingual modeling technology.

Why is this Capability Important?

Organizations in today’s global economy are competing in an increasingly complex landscape. Of course, with international presence comes the challenge of distributing information accurately and in an engaging way to audiences who speak a variety of languages.

Thanks to high-quality speech solutions, communicating with consumers and colleagues through voice-enabled applications has become easier in recent years. However, most synthetic voices (and, in fact, most human speakers) are fluent in only one or two languages, and rarely more than three. As a result, organizations typically need a different text-to-speech voice for each language and cannot speak to consumers in a single voice across languages.

How is ReadSpeaker Active in this Context?

At ReadSpeaker, we’re working on making it easy for our customers to overcome the issues we just mentioned. We are developing a cross-lingual capability built on our very own multilingual model technology. Its purpose is to enable organizations to be represented by a single voice that sounds native in a multitude of languages.

We’d like our customers to be able to browse our entire portfolio of existing voice personas and select the one that has the vocal characteristics that represent their company or organization best. Choosing a voice is such a subjective matter. So we want to make sure we offer the broadest choice.

How Can Brands and Organizations Make the Most of Multilingual Text to Speech?

Our cross-lingual capability enables us to customize voices so they can speak the other languages offered by ReadSpeaker. The feedback we have received from the market indicates that multilingual voices are especially relevant to customers who want to be instantly recognized in the voice interface.

Brands and organizations benefit even more from a custom voice that is exclusively tailored to their brand or corporate identity – and is also multilingual.

What Underlying Technology Does ReadSpeaker Use Here?

To understand how this works, let us take a look at how a neural text-to-speech voice is usually created. In a typical situation, we use deep neural networks (DNNs) to create digital voices that are hard to distinguish from human voices. We start with a language-specific script and create a collection of text and speech pairs for a specific speaker. The text is then converted into linguistic representations of the language. Finally, we train a DNN model to learn how the neural text-to-speech engine needs to translate text into a specific voice persona’s speech.

So, in the case of multilingual modeling, we extend this DNN model to include linguistic representations for each language. Once we’ve done that, we task our model with learning how to convert these representations into speech for every language.
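As a rough illustration of what "including linguistic representations for each language" could look like, here is a minimal Python sketch. It is not ReadSpeaker's implementation: the toy symbol inventories, the language codes "en" and "nl", and the names build_shared_table, ModelInput, and encode are all assumptions made for this example. It only shows the general idea of giving one model a single input table that covers several languages.

```python
from dataclasses import dataclass

# Hypothetical toy symbol inventories. A real system would use far richer
# linguistic features (full phoneme sets, stress, prosodic boundaries, ...).
INVENTORIES = {
    "en": ["h", "ə", "l", "oʊ"],
    "nl": ["h", "ɑ", "l", "oː"],
}


def build_shared_table(inventories):
    """Map every (language, symbol) pair to an index in one shared table."""
    table = {}
    for lang in sorted(inventories):
        for symbol in inventories[lang]:
            table.setdefault((lang, symbol), len(table))
    return table


@dataclass
class ModelInput:
    """What a multilingual model would receive for one utterance."""
    language: str
    symbol_ids: list


def encode(language, symbols, table):
    """Convert a symbol sequence into indices of the shared table."""
    return ModelInput(language, [table[(language, s)] for s in symbols])


TABLE = build_shared_table(INVENTORIES)
english = encode("en", ["h", "ə", "l", "oʊ"], TABLE)
dutch = encode("nl", ["h", "ɑ", "l", "oː"], TABLE)
print(english)
print(dutch)
```

In this sketch each language keeps its own entries in the shared table, so the model can learn language-specific pronunciations while still being a single model. Whether symbols are shared across languages or kept separate is a design choice the source does not specify.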

Moreover, as our target is to maintain the speaker’s identity in other languages, we also try to disentangle the speaker’s identity (meaning their vocal characteristics) from the linguistic aspects. Once we have separated these two aspects, we can synthesize any combination of a speaker and a language in our training data by ‘pasting’ the speaker’s voice characteristics onto the linguistic specifications of the other languages.
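The ‘pasting’ idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual model: the vectors are random placeholders standing in for learned representations, and the names SPEAKERS, LANGUAGES, and conditioning are hypothetical. The point is only that, once speaker and language information live in separate vectors, any speaker can be combined with any language.

```python
import random

random.seed(0)
DIM = 4  # toy vector size, chosen arbitrarily for this example


def random_vector(dim=DIM):
    """Placeholder for a learned embedding vector."""
    return [random.uniform(-1.0, 1.0) for _ in range(dim)]


# Hypothetical, separately stored representations (random stand-ins here).
SPEAKERS = {name: random_vector() for name in ["Sophie", "James"]}
LANGUAGES = {code: random_vector() for code in ["AME", "DUT", "GER"]}


def conditioning(speaker, language):
    """Concatenate the language part and the speaker part into one vector.

    In a real system a vector like this would condition the synthesis
    network; here it is simply returned.
    """
    return LANGUAGES[language] + SPEAKERS[speaker]


# Every speaker can be paired with every language.
pairs = {(s, l): conditioning(s, l) for s in SPEAKERS for l in LANGUAGES}

sophie_dutch = pairs[("Sophie", "DUT")]
sophie_german = pairs[("Sophie", "GER")]
james_dutch = pairs[("James", "DUT")]

# The speaker part stays the same when only the language changes.
print(sophie_dutch[DIM:] == sophie_german[DIM:])
# The language part stays the same when only the speaker changes.
print(sophie_dutch[:DIM] == james_dutch[:DIM])
```

Plain concatenation is used here only because it is the simplest way to keep the two parts visibly separate. The source does not say how the two representations are actually combined.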

An Example of Multilingual Neural Text to Speech 

The following examples come from a prototype version* of a multilingual voice powered by ReadSpeaker’s cross-lingual capability. As you can hear, the voice characteristics remain similar across the different languages.

Language       | AME    | BRE    | DUT    | ESP    | FRE    | GER
Native Speaker |        | Alice  | Ilse   | Lola   | Elise  | Lena
Target Speaker | Sophie | Sophie | Sophie | Sophie | Sophie | Sophie

Female voice Sophie speaking in different languages.

Language       | AME   | BRE   | DUT   | ESP    | FRE    | GER
Native Speaker |       | Hugh  | Alex  | Manuel | Benoît | Max
Target Speaker | James | James | James | James  | James  | James

Male voice James speaking in different languages.

We hope you enjoyed this introduction to our multilingual neural text-to-speech voices. If you’re interested in finding out how ReadSpeaker DNN voices can boost your business, please contact us to start a conversation with ReadSpeaker today.

*Special thanks to Wonsuk Jun for developing the prototype samples.