Audio in video games has come a long way from the bleeps, bloops, and four-channel themes of the Nintendo Entertainment System’s “Super Mario Bros.” Like other consoles of the 1980s, the original NES used 8-bit sound chips to generate music and in-game sound effects. That audio quality remains charming; it even spawned an electronic music genre, chiptune, that’s thriving to this day. But compared to the immersive soundscapes of film and television, early audio design for games was…a bit limited.

In the 1990s, console makers made the leap to optical media. The extra storage available on CD-ROMs left space for high-fidelity audio recordings, and background music began to rival symphonic film scores. Between the ’90s and today, the development of 3D audio design led to increasingly immersive soundscapes, alerting players to off-screen events and helping to define in-game space.

As technology has unlocked more possibilities for video games, however, it’s easy to lose track of the fundamentals. What’s the purpose of audio in a traditionally visual medium? How do designers use it today, and what’s next for audio in video games?

The most exciting development has ramifications that go far beyond the player’s ears, creating deeper, more realistic, and more dynamic virtual experiences, particularly for open-world games and RPGs—and it’s all made possible through advances in artificial intelligence (AI) and text-to-speech (TTS) integrations.

Before we get into the future of audio in video games, however, let’s take a look at how it functions today. Here’s why audio is an important tool for drawing players into your game’s digital world.

Looking for dynamic, runtime TTS for your next game development project? Try our free full trial of the TTS game engine plugin today.

Audio in Video Games: What It Does and Why It Matters

A game’s soundtrack isn’t just a kind of audio set dressing. It’s essential to the player’s experience. In fact, it’s no exaggeration to say that, for video games, audio is just as important as video. More specifically, audio serves at least five powerful functions for any video game:

1. Audio contributes to a more immersive gaming experience.

We experience the world through our five senses, so it makes sense that engaging more senses will make a virtual experience feel more real. We know that multi-sensory content improves learning outcomes, probably through deeper student engagement. The same dynamic can help make the digital experiences you create feel more immersive.

By creating a lush, believable soundtrack for your game, you can deepen the sense of space; trigger emotional responses; and make your digital world feel more real, no matter how outlandish the setting (remember Oddworld: Abe’s Oddysee?).

2. Audio cues give players information about in-game events.

Most video game sound designers use one of two audio development applications: Wwise or FMOD. Both offer spatial audio programming, which allows designers to precisely locate both sound sources and in-game listeners within a virtual environment.

Spatial audio gives players a sense of where sounds are coming from, so they can use audio to locate enemies and other objects of interaction. There are even audio-only games that omit video content altogether; see Blind Drive, a nominee for the IGF 2021 Excellence in Audio award, for example.
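To make the idea of locating a sound source relative to a listener concrete, here is a minimal sketch in Python. It is not the Wwise or FMOD API; the function name `spatialize`, the linear distance falloff, and the `max_distance` default are all assumptions chosen for illustration. It turns a 2D listener position, a facing direction, and a source position into left and right speaker gains.

```python
import math


def spatialize(listener_xy, facing_deg, source_xy, max_distance=50.0):
    """Return (left_gain, right_gain) for a source heard by a listener.

    Hypothetical helper for illustration only. Angles are in degrees,
    measured counter-clockwise from the +x axis.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)

    # Assumed linear falloff: full volume at the listener, silent at max_distance.
    volume = max(0.0, 1.0 - distance / max_distance)

    # Angle of the source relative to where the listener is facing.
    relative = math.atan2(dy, dx) - math.radians(facing_deg)

    # pan is -1.0 for hard left, 0.0 for center, +1.0 for hard right.
    pan = -math.sin(relative)

    # Constant-power panning keeps perceived loudness steady across the arc.
    angle = (pan + 1.0) * math.pi / 4.0
    return volume * math.cos(angle), volume * math.sin(angle)


# A listener at the origin facing "north" (+y) hears a source to the east
# almost entirely in the right speaker.
left, right = spatialize((0.0, 0.0), 90.0, (10.0, 0.0))
```

Audio middleware uses far richer models (attenuation curves, occlusion, head-related filtering), but the core inputs are the same: where the source is, where the listener is, and which way the listener faces.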

3. Music cues enhance the player’s emotional engagement.

Music taps directly into a player’s emotions, creating a more powerful and purposeful experience. With adaptive scoring technology, designers can program rules-based triggers that play musical cues based on the player’s behavior. Anyone who’s played 2010’s Red Dead Redemption can tell you how emotionally powerful that can be.
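As a rough illustration of rules-based triggers, the sketch below picks a musical cue from the current game state. Everything in it is hypothetical: the `GameState` fields, the cue names, and the rule order are assumptions for the example, not how any particular game or audio tool implements adaptive scoring.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GameState:
    # Hypothetical fields chosen for the example.
    enemies_nearby: int
    player_health: float  # 0.0 (defeated) to 1.0 (full health)
    in_town: bool


@dataclass
class MusicRule:
    cue: str
    condition: Callable[[GameState], bool]


# Rules are checked in order, so earlier rules take priority.
RULES: List[MusicRule] = [
    MusicRule("low_health_tension", lambda s: s.player_health < 0.25),
    MusicRule("combat", lambda s: s.enemies_nearby > 0),
    MusicRule("town_theme", lambda s: s.in_town),
]


def select_cue(state: GameState, rules: List[MusicRule] = RULES,
               default: str = "exploration") -> str:
    """Return the first cue whose rule matches, or the default cue."""
    for rule in rules:
        if rule.condition(state):
            return rule.cue
    return default


# Enemies appear while the player is healthy: the score shifts to combat.
cue = select_cue(GameState(enemies_nearby=2, player_health=0.9, in_town=False))
```

Ordering the rules by priority keeps the behavior predictable when several conditions are true at once, such as a wounded player fighting inside a town.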

4. Audio increases video game accessibility.

Many gamers experience games primarily or only through sound. Neglecting your soundtrack leaves these gamers behind. In a 2022 survey by the Royal National Institute of Blind People (RNIB), 70% of ex-gamers with blindness or partial sight said they quit playing video games because of poor accessibility.

Audio accessibility features include:

  • Additional sound cues
  • Support for screen readers
  • Glossaries that explain each sound cue through audible speech
  • Audio descriptions of events

Over the past few years, game studios have invested a lot more effort into accessibility. Both Naughty Dog’s The Last of Us Part II and Santa Monica Studio’s God of War Ragnarök offered historic suites of accessibility features, including TTS audio descriptions powered by ReadSpeaker.

5. Thanks to natural language AI, audio unlocks new forms of conversational gaming.

Conversations with video game characters are usually pretty restrictive. You may be able to choose from a list of comments, but you’re typically not free to say whatever comes to mind. That’s changing, thanks to natural language understanding (NLU) technology.

Take the alternate reality game (ARG) Acolyte from game studio Superstring, for instance. Gameplay revolves around conversations with a fictional digital assistant who guides you through the mystery story. Acolyte’s AI accepts natural language input, which allows for free-form conversation. With this technology on the game’s back end, you don’t need pre-defined dialogue choices. Just say what comes to mind and watch the story unfold, with you at the center of the action.

Acolyte’s designers originally built a text-only game but found the lack of spoken dialogue too restrictive. So they used ReadSpeaker’s TTS game engine plugins to give the robotic main character the perfect voice. That’s just one example of how TTS is changing the landscape of audio in video games. The future of audio in gaming, it seems, will involve AI NPCs like Acolyte’s digital assistant.

Neural TTS for AI Audio in Video Games

Developments in AI are leading to the creation of in-game non-player characters (NPCs) free from the constraints of pre-scripted conversation trees. The audio component? Getting those characters to speak their dynamic responses, out loud and in real time. That’s where TTS comes into play, but not all TTS engines are ready for the technical requirements of video game developers.

This isn’t speculative; AI NPCs are already in development. Based on the player’s questions or statements, they use natural language generation (NLG) software to come up with a fresh, relevant response. Essentially, these characters are AI chatbots—and, unlike characters who stay on-script, only text to speech can give these bots a voice.

Voice actors remain the gold standard of video game character speech, but when characters themselves (or at least the AI models behind those characters) come up with new lines on the spot, pre-recorded speech isn’t an option. To create the next generation of immersive AI NPCs for open-world games, developers need to leverage TTS. They also need an embedded TTS game engine plugin so that speech is generated at runtime, without the delay of a network round trip.

Generating Dynamic TTS Audio Without Latency

The latency issue is a major barrier to deployment of AI NPCs in today’s games. In the video below, you can see there’s a multi-second delay between the player’s question and the AI NPC’s response. That’s because both the NLG and the TTS services are cloud-based, integrated into the game engine via API. The game has to send a request out to the NLG and TTS services and wait for each reply before it can play the audio, so the player hears the response late.

Developers can clear the audio hurdle by using a TTS game engine plugin from ReadSpeaker. This TTS software integrates directly into game engines and generates audio on the user’s device, so speech synthesis doesn’t wait on a network round trip. It’s dynamic TTS at runtime: on-demand video game audio for dynamically generated dialogue.
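The request path for a spoken AI NPC reply can be sketched as three steps: generate the text, synthesize the speech, and play the audio. The Python below is an illustrative outline only. The function `respond_to_player` and the stand-ins `fake_nlg` and `fake_tts` are hypothetical names, not part of any ReadSpeaker plugin or game engine API. Timing each stage separately shows where a delay comes from, whether a stage runs in the cloud or on the device.

```python
import time
from typing import Callable, Dict


def respond_to_player(
    player_text: str,
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
) -> Dict[str, object]:
    """Run the NLG -> TTS -> playback chain and time the first two stages."""
    start = time.perf_counter()
    reply = generate_reply(player_text)   # NLG: text in, reply text out
    after_nlg = time.perf_counter()
    audio = synthesize(reply)             # TTS: reply text in, audio out
    after_tts = time.perf_counter()
    play(audio)                           # hand the audio to the game's mixer
    return {
        "reply": reply,
        "nlg_seconds": after_nlg - start,
        "tts_seconds": after_tts - after_nlg,
    }


# Hypothetical stand-ins so the sketch runs without any real service.
def fake_nlg(text: str) -> str:
    return f"You asked: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


played = []
timings = respond_to_player(
    "Where is the blacksmith?", fake_nlg, fake_tts, played.append
)
```

Because each stage is passed in as a function, the same harness can compare a cloud-backed synthesizer with an on-device one by swapping the `synthesize` argument and reading `tts_seconds`.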

Of course, TTS is already familiar to game developers. Many use it to prototype game dialogue, faster and at a lower cost than re-recording voice actor dialogue again and again. As we mentioned, TTS is also a key accessibility tool. But with the emerging generation of runtime TTS game engine plugins, we’re looking at something new. Voicebot NPCs are poised to take audio in video games—and the experiences it creates—to a higher level than ever before.

Contact the experts at ReadSpeaker to learn more about TTS for AI NPCs and other game development applications.