In our previous post, Making a computer talk, we briefly explained how Text to Speech (TTS) is made. It is quite remarkable that a synthetic voice manages to read most text in an intelligible and near-natural way. However, the better the voice, the more listeners tend to notice when a pronunciation error or glitch occurs. While most text is read well, a word can still be mispronounced or a passage misinterpreted. There are three basic causes for this:

  1. If a word is not in the dictionary, the TTS voice usually has a method to guess the pronunciation, and if the guess fails, the word will be mispronounced. For instance, the stress may fall on the wrong syllable, or a sequence of letters may be guessed incorrectly for the current context. “Ghoti” is a classic example of how spelling and pronunciation do not always correspond: it is a constructed respelling of “fish” (gh as in “tough”, o as in “women”, ti as in “nation”). This example is a bit exaggerated, but it does illustrate the complexity of guessing pronunciations.
  2. The recorded sound units that are used to build the utterances might be incorrectly marked up, or some sound units may not match each other as well as expected. Units that should match in theory do not always join cleanly in actual speech. This usually leads to a “glitch”: an unexpected sound in the word or a phoneme that doesn’t sound as expected.
  3. If the text normalization (the process where abbreviations, symbols, digits, etc. are transformed into text) fails, there might also be an unexpected reading of a phrase. A TTS voice might, for instance, only support time formats with a colon (e.g. 11:30); it will then fail to correctly read a time stamp if it is written with a dot (e.g. 11.30).
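
Causes 1 and 3 can be illustrated with a minimal Python sketch. It is not how any particular TTS engine works: the dictionary entry, the letter-to-sound table, and the names `pronounce` and `normalize_times` are all hypothetical, chosen only to show a dictionary lookup with a guessing fallback and a time normalizer that may or may not accept a dot as separator.

```python
import re

# Hypothetical pronunciation dictionary (illustrative entry, not a real lexicon).
LEXICON = {"fish": "F IH SH"}

# Toy letter-to-sound table used only when a word is missing from the dictionary.
LETTER_SOUNDS = {"g": "G", "h": "HH", "o": "AA", "t": "T", "i": "IH"}


def pronounce(word, lexicon=LEXICON):
    """Return (phonemes, source): dictionary entry if present, else a naive guess."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key], "dictionary"
    # Naive fallback: one sound per letter, which is exactly how guesses go wrong.
    guess = " ".join(LETTER_SOUNDS.get(ch, ch.upper()) for ch in key)
    return guess, "guessed"


ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty"}


def number_to_words(n):
    """Spell out an integer from 0 to 59."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


# One or two digits, a colon or dot, then exactly two digits.
TIME_PATTERN = re.compile(r"\b(\d{1,2})([:.])(\d{2})\b")


def normalize_times(text, separators=":"):
    """Expand times to words, but only for the separators this voice supports."""
    def expand(match):
        hour, sep, minute = int(match[1]), match[2], int(match[3])
        if sep not in separators or hour > 23 or minute > 59:
            return match[0]  # unsupported format: left as raw digits
        if minute == 0:
            return f"{number_to_words(hour)} o'clock"
        if minute < 10:
            return f"{number_to_words(hour)} oh {number_to_words(minute)}"
        return f"{number_to_words(hour)} {number_to_words(minute)}"
    return TIME_PATTERN.sub(expand, text)


print(pronounce("fish"))                                  # found in the dictionary
print(pronounce("ghoti"))                                 # falls back to guessing
print(normalize_times("Meet at 11:30"))                   # colon is supported
print(normalize_times("Meet at 11.30"))                   # dot is not: digits remain
print(normalize_times("Meet at 11.30", separators=":."))  # dot support added
```

With the default `separators=":"`, the dotted time is left as digits for a later stage to read however it can, which is the kind of unexpected reading described in cause 3. Accepting the dot fixes that case but makes an input such as "1.50" ambiguous between a time and a decimal number, which is one reason such rules tend to be adjusted per language or per customer.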

What can ReadSpeaker do if such errors occur? Most of these issues can be quickly solved by our linguists using our main and customer-specific dictionaries. Here are a few examples of mispronunciations that have been corrected using our dictionaries and reading rules.