Audio in Video Games: A Guide for Developers

Audio in video games is more immersive than ever, and the latest developments will take gameplay to a whole new level.

December 6, 2022 by Gaea Vilage

Audio in video games has come a long way since the Nintendo Entertainment System’s heyday in the 1980s. The “Super Mario Bros.” soundtrack may remain iconic, but it doesn’t compare with the immersive, cinematic soundscapes of today’s video games.

As technology has unlocked more possibilities for video games, however, it’s easy to lose track of the fundamentals. What’s the purpose of audio in a traditionally visual medium? How should developers approach audio to construct a believable digital world? This guide will help you explore answers to these questions.

We’ll start with the assumption that video game audio breaks down into two general categories. The first is speech, whether that’s a narrative voice-over or in-game dialog. The second category includes everything else: music, sound effects, cues, accessibility features, and more.

We address both pillars of video game sound in this comprehensive guide to gaming audio. First, we’ll provide an overview of the topic, focusing on everything but speech. Then we’ll zoom in on spoken language, including both recorded human speech and the latest text-to-speech (TTS) integrations.

Here’s a detailed introduction to audio in the game development process.

Audio in Video Games: What It Does and Why It Matters

A game’s soundtrack isn’t just a kind of audio set dressing. It’s essential to the player’s experience. In fact, it’s no exaggeration to say that, for video games, audio is just as important as video. More specifically, audio serves at least six purposes in any video game:

1. Audio contributes to a more immersive gaming experience.

We experience the world through our five senses, so it makes sense that engaging more of them makes a virtual experience feel more real. We know that multi-sensory content improves learning outcomes, probably through deeper student engagement. The same dynamic can help make the digital experiences you create feel more immersive.

By creating a lush, believable soundtrack for your game, you can deepen the sense of space, trigger emotional responses, and make your digital world feel more real, no matter how outlandish the setting (remember Oddworld: Abe’s Oddysee?).

2. Audio cues give players information about in-game events.

Most video game sound designers use one of two audio development applications: Wwise or FMOD. Both offer spatial audio programming, which allows designers to precisely locate both sound sources and in-game listeners within a virtual environment.

Spatial audio gives players a sense of where sounds are coming from, so they can locate enemies and other interactive objects by ear. There are even audio-only games that omit video content altogether; see Blind Drive, a nominee for the IGF 2021 Excellence in Audio award, for example.
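
For a sense of what that spatial setup looks like in code, here’s a minimal sketch against FMOD’s Core API (Wwise exposes equivalent concepts through its own SDK). It’s illustrative only: error handling is omitted, “footsteps.wav” is a placeholder asset, and exact call signatures can vary between FMOD versions.

```cpp
// Minimal 3D positioning sketch using FMOD's Core API (C++).
#include <fmod.hpp>

int main() {
    FMOD::System* system = nullptr;
    FMOD::System_Create(&system);
    system->init(512, FMOD_INIT_NORMAL, nullptr);

    // Load a sound flagged for 3D spatialization and start it.
    FMOD::Sound* sound = nullptr;
    system->createSound("footsteps.wav", FMOD_3D, nullptr, &sound);
    FMOD::Channel* channel = nullptr;
    system->playSound(sound, nullptr, false, &channel);

    // Place the sound source in the virtual world...
    FMOD_VECTOR sourcePos = { 5.0f, 0.0f, 2.0f };
    FMOD_VECTOR still     = { 0.0f, 0.0f, 0.0f };
    channel->set3DAttributes(&sourcePos, &still);

    // ...and tell FMOD where the listener (the player) is and which way they face.
    FMOD_VECTOR listenerPos = { 0.0f, 0.0f, 0.0f };
    FMOD_VECTOR forward     = { 0.0f, 0.0f, 1.0f };
    FMOD_VECTOR up          = { 0.0f, 1.0f, 0.0f };
    system->set3DListenerAttributes(0, &listenerPos, &still, &forward, &up);

    system->update();  // call once per game frame so panning and attenuation stay current
    return 0;
}
```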

3. Music cues enhance the player’s emotional engagement.

Music taps directly into a player’s emotions, creating a more powerful and purposeful experience. With adaptive scoring technology, designers can program rules-based triggers that play musical cues based on the player’s behavior. Anyone who’s played 2022’s God of War Ragnarök can tell you how emotionally powerful that can be.
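
Under the hood, adaptive scoring often boils down to designer-defined rules evaluated against the game state. The sketch below is a hypothetical illustration (the state fields and cue names are invented); a real implementation would hand the chosen cue to middleware such as Wwise or FMOD to crossfade.

```cpp
#include <string>

// Hypothetical snapshot of game state used to drive music selection.
struct GameState {
    int   nearbyEnemies = 0;
    float playerHealth  = 1.0f;   // 1.0 = full health, 0.0 = dead
    bool  bossActive    = false;
};

// Rules-based cue selection, checked whenever the state changes:
// returns the name of the music cue the audio middleware should crossfade to.
std::string SelectMusicCue(const GameState& state) {
    if (state.bossActive)           return "music_boss";
    if (state.playerHealth < 0.25f) return "music_desperate";
    if (state.nearbyEnemies > 0)    return "music_combat";
    return "music_explore";
}
```

In practice, you’d also add hysteresis or minimum cue durations so the score doesn’t flip back and forth every time an enemy wanders in and out of range.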

4. Audio increases video game accessibility.

Many gamers experience games primarily or only through sound. Neglecting your soundtrack leaves these gamers behind. In a 2022 survey by the Royal National Institute of Blind People (RNIB), 70% of ex-gamers with blindness or partial sight said they quit playing video games because of poor accessibility.

Audio accessibility features include:

  • Additional sound cues
  • Support for screen readers
  • Glossaries that explain each sound cue through audible speech
  • Audio descriptions of events

Game studios are investing far more effort in accessibility. Both Naughty Dog’s The Last of Us Part II and Santa Monica Studio’s God of War Ragnarök offered historic suites of accessibility features, including TTS audio descriptions powered by ReadSpeaker.

5. Thanks to natural language AI, audio unlocks new forms of conversational gaming.

Conversations with video game characters are usually pretty restrictive. You may be able to choose from a list of comments, but you’re typically not free to say whatever comes to mind. That’s changing, thanks to natural language understanding (NLU) technology, which allows software to extract relevant data from normal human writing and speech.

Take the alternate reality game (ARG) Acolyte from game studio Superstring, for instance. Gameplay revolves around conversations with a fictional digital assistant who guides you through the mystery story. Acolyte’s AI accepts natural language input, which makes for natural conversation. With this technology on the game’s back end, you don’t need pre-defined dialog choices. Just say what comes to mind and watch the story unfold, with you at the center of the action.

Acolyte’s designers originally built a text-only game but found the lack of spoken dialog too restrictive. So the team used ReadSpeaker’s TTS game engine plugins to give the robotic main character the perfect voice. That’s just one example of how TTS is changing the landscape of audio in video games. (See the text-to-speech section later in this guide to learn more about how TTS supports video game development.)

6. In-game dialog keeps the story moving.

It’s hard to tell a story without characters—and characters express themselves most fully through speech. Dialog is an essential audio element for any narrative video game. Non-player characters (NPCs) assign quests, provide backstory, or develop your game’s theme. Player characters respond, creating a more interactive experience. All that depends on in-game dialog.

So how do you go about producing speech files for your video game characters? These days, you have two choices: You can record voice actors, or you can use neural text to speech (TTS) to generate lifelike dialog on your own. For reasons we’ll describe shortly, many developers do both.

Let’s discuss each method of producing dialog in turn, starting with video game voice actors.

Video Game Voice Actors: An Introduction

The hardest part of relying on video game voice acting doesn’t have anything to do with microphones. In fact, most voice actors have their own home studios these days; all you have to do is send the script. Instead, the toughest part about recording human dialog is hiring voice actors in the first place.

In this chapter, we’ll cover the basics of working with video game voice actors, from starting the search to budgeting your project.

Where To Find Video Game Voice Actors

Major game development studios often cast Hollywood actors for their tentpole titles. For instance, you’ll find Josh Duhamel in Call of Duty, Elijah Wood in Legend of the Spiral, and Simon Pegg in Hogwarts Legacy.

Of course, independent game studios are unlikely to hire established voice stars or the Hollywood elite right out of the gate. Instead, up-and-comers should develop their own bullpen of talent, says voice-over artist and teacher Philip Galinsky, who appeared in Grand Theft Auto V and trains the next generation of video game voice actors through his online studio.

“Develop your own voice team that sets the bar and the tone for all your productions,” says Galinsky.

“Start with a really good small team of voice artists who are committed and understand the vision. Then branch out.”

That’s good advice for the developing game studio, but where do you find your initial core of voice talent? Good news: There’s a process.

Your best bet is to reach out to an established voice agency. Search the internet for “video game voice agency” and you’ll find dozens. Voice agents know the talent, and they have the connections to provide experienced voice actors who are ideal for all sorts of roles.

“Agents are really good sounding boards,” Galinsky says. “They know their actors. You can say, ‘Is this guy reliable?’ and the agent’s going to know more than anybody, like, ‘Yeah, I worked with him for 10 years and it’s always been good.’”

Eventually, your studio may benefit from setting up an internal voice department. First, though, it’s best to work with the voice agents who’ve spent years building a vetted network of professional video game voice actors. “You can learn about voice casting that way,” Galinsky says.

“Maybe you create a voice department, but learn from experts first.”

Of course, voice agents do charge a fee for their services. This brings us to an important question beginning developers ask: How much should you budget for video game voice acting?

How Much Do Video Game Voice Actors Charge?

Your first decision is whether to go with union actors or independent freelancers. You’ll pay more for members of SAG-AFTRA, the actors’ union that advocates for voice-over artists, but you’ll also reduce risk; union actors are vetted and experienced in the industry.

“Ultimately, it’s up to the person who’s putting the budget together if they’re going to use union talent,” Galinsky says.

“[Developers will] pay a lot of money for union actors, considering they could find someone off the street to do it for $300. However, you don’t know the quality. That’s the risk of trying to save the money upfront.”

One industry rate card for non-union voice actors recommends a pay rate of $200 to $350 for a standard four-hour recording session (limited to two hours for vocally stressful performances). By contrast, the SAG-AFTRA pay schedule for interactive media places the rate at $929 per four-hour session through most of 2021. That wage climbs to $956.75 between November 14, 2021 and November 7, 2022, and will likely rise by 3% per year from there.

Experienced voice actors, union or independent, may charge more than these base rates. Agents may negotiate for royalties or buyouts. Fees will mount for motion-capture work, of course. In short, paying for video game voice actors is complicated, and must be approached on a case-by-case basis—but the base rates listed above should help you put together a preliminary casting budget.

The next step, of course, is choosing the right talent. Below, we’ll share a few of Galinsky’s casting tips for game developers.

5 Tips for Casting Video Game Voice Actors

As a game developer, it’s not always easy to find experienced video game voice-over actors for hire, let alone recognize the perfect performer for your role at first glance. Here are five tips, informed by Galinsky’s experience in the industry, that will help you choose an incredible cast for your next project.

1. Look for voice actors who use a lot of movement in the recording booth.

Given the decentralized nature of today’s video game work, this may be something you have to ask about directly, since you might not get to watch your voice-over actors record. But even though the final product is audio-only, actors who use their bodies in the recording booth are more likely to create compelling characters.

“In my practice and teaching, it’s a very physical process,” Galinsky says.

“As a voice actor, you have to embody the character. You have to make analogies, like ‘Is he snakey?’ so that you have a very unique quality to that character.”

2. Choose actors with improv experience and let them experiment with the script.

Don’t feel that you have to stick to the script religiously. When you allow actors to riff a little (varying the timing, feel, and even the dialog as they play off each other, virtually or not), you may end up with creative gold that pushes your story and characterization along better than a script alone ever could. That’s why it helps to work with actors who have a background in improv.

“Take some improv classes,” Galinsky tells his students. “Casting directors will want you to riff off the script.”

3. Advertise recording schedules that protect your actors’ vocal limits.

When actors perform, they place some level of strain on their vocal cords. That can actually change the quality of the actor’s voice from the start of the session to the end, which isn’t good for performance consistency—and can even cause costly delays if your cast needs a few days off for vocal rest. Let your pool of potential actors know that you’ll schedule sessions to respect their limits; they’ll appreciate it, and you could attract more accomplished applicants.

During his sessions for Grand Theft Auto V, for instance, Galinsky recorded for 45 minutes, took a half-hour break, and repeated that cycle until the script was complete. Other scripts and other actors will have different needs. Either way, it’s far more efficient to protect an actor’s voice with frequent breaks than to blow out that voice and have to start again from scratch a week later.

4. Make access to Source Connect (or your team’s real-time HD audio collaboration software) a prerequisite for auditioning.

Given that so much video game voice-over work is done remotely these days, it’s essential that your actors have the technology they’ll need to collaborate effectively. That starts with a state-of-the-art home studio—but just as important, they’ll need the remote recording software that allows them to perform with other actors, from a distance and in real time.

Source Connect is the industry standard for remote audio collaboration. In this age of remote work, your cast probably needs it if they’re going to appear in scenes together, Galinsky says.

“The engineer could be in Seattle. The producer could be in Chicago. I’m in New York and another actor’s in L.A. It doesn’t help if you don’t have Source Connect.”

5. Pay attention to when actors submit their audition recordings.

Game development schedules are infamous for their delays; don’t let the audio department be the culprit. A voice actor who returns audition recordings promptly will probably make, or even beat, all your deadlines. That kind of reliability helps keep your project ahead of schedule, and promptness pays off for the actors, too.

“The earlier you send a recording to the producer, the better chance you have to book the job,” Galinsky says. “Or, if you already booked the job, the earlier you send them your stuff, the sooner they can say, ‘This is perfect!’ or ‘You’ve got to redo it.’”

Following these tips will help you cast the ideal voice actors for your interactive media project. But what if you didn’t have to go to the trouble? Increasingly, video game developers are turning to AI voice technology for dialog, prototyping, and more.

Here’s what developers need to know about cutting-edge TTS for video games.

Using Text to Speech in Video Games

Given the cost of skilled voice performances for video games, thrifty developers may be tempted to rely on text to speech (TTS) for character voices. Thanks to advances in machine learning, AI-driven TTS sounds increasingly natural—but it’s not the best choice for most final-cut performances. Human artistry remains the gold standard in narrative art, including interactive media.

That said, every game development studio can benefit from access to high-quality, runtime TTS with ReadSpeaker’s game engine plug-ins. These game engine-native TTS solutions expand the creative possibilities in many ways, including:

  • More accessible video games. ReadSpeaker began developing TTS for digital accessibility more than 20 years ago, and that mission continues in our work with the game development industry. Introduce user interface (UI) narration to remove barriers for players with vision impairments and other disabilities, or provide an audio stream of in-game chat messages for second-language learners and people with reading disorders. Game engine plug-ins let you integrate these crucial accessibility features without cumbersome audio file management (see the sketch after this list).
  • Proof of concept for prototyping and fundraising. Onboard TTS allows you to test your dialog in scenes throughout the development process. That can help raise funds by providing a compelling rough draft to investors. Even better, it allows you to tweak pacing and language as you work, saving time and money when you finally reach the recording studio.
  • Vocal performances that would strain a human actor. Sometimes you just need an evil, screaming robot. Human voices are fragile, and while you can and should limit recording session length for strenuous performances, some roles may be safer left to a top-quality TTS voice. The human voice is best at what it does, but your game may call for an inhuman voice; TTS could be the solution.
  • Dynamic, responsive non-player characters with conversational AI. Imagine a procedurally generated game world populated by NPCs who can respond meaningfully to anything the player says. This is possible today, and runtime TTS from ReadSpeaker allows those characters to speak out loud without perceptible latency.
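
To make the UI-narration idea from the first bullet concrete, here’s a hypothetical sketch of runtime TTS driving menu narration. The ITextToSpeech interface and function names are invented for illustration and don’t represent ReadSpeaker’s actual plug-in API; the point is that narration is synthesized from the UI strings you already have, with no per-string audio files to manage.

```cpp
#include <string>

// Hypothetical runtime TTS interface; a real plug-in's API will differ.
class ITextToSpeech {
public:
    virtual ~ITextToSpeech() = default;
    // Synthesize the text and play it through the game's audio mixer.
    virtual void Speak(const std::string& text) = 0;
};

// UI narration: announce the focused menu item from the same string the UI renders,
// so the accessibility layer stays in sync with the interface automatically.
void OnMenuItemFocused(ITextToSpeech& tts, const std::string& label,
                       int index, int count) {
    tts.Speak(label + ", item " + std::to_string(index + 1) +
              " of " + std::to_string(count));
}
```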

In fact, AI NPCs will probably be a big part of gaming’s future. To conclude this introduction to audio in video games, let’s take a closer look at the possibilities AI and TTS are bringing to interactive media.

Neural TTS for AI-Generated Content in Video Games

Developments in AI are leading to in-game NPCs that are free from the constraints of pre-scripted conversation trees. The audio component? Getting those characters to speak their dynamic responses, out loud and in real time. That’s where TTS comes into play, but not all TTS engines are ready for the technical requirements of video game developers.

This isn’t speculative; AI NPCs are already in development. Based on the player’s questions or statements, they use large language models (LLMs) like the one behind ChatGPT to come up with a fresh, relevant response. Essentially, these characters are AI chatbots, and unlike characters who stay on-script, they can only be given a voice by text to speech.

Voice actors remain the gold standard of video game character speech, but when characters themselves—or at least the AI models behind those characters—come up with new lines on the spot, pre-recorded speech isn’t an option. To create the next generation of immersive AI NPCs for open-world games, developers need to leverage TTS. They also need an embedded TTS game engine plugin to ensure runtime response, free of latency.

Generating Dynamic TTS Audio Without Latency

The latency issue is a major barrier to deploying AI NPCs in today’s games. In current demos, there’s typically a multi-second delay between the player’s question and the AI NPC’s response. That’s because both the LLM and the TTS service are cloud-based, integrated into the game engine via API. The game has to send a request out to the LLM and TTS modules and wait for the reply before it can play the audio, leaving the player waiting.

Developers can clear the audio hurdle by using a TTS game engine plugin from ReadSpeaker. This TTS software integrates directly into the game engine and generates audio on the user’s device, so there’s no network round trip to wait for. It’s dynamic TTS at runtime: instant video game audio for dynamically generated dialog.
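
As a rough architectural sketch, here’s what a single conversational turn might look like with a cloud-hosted LLM and an embedded, on-device TTS plug-in. Every interface and name below is hypothetical; the point is that only the LLM call requires a network round trip, while speech is synthesized locally at runtime.

```cpp
#include <string>

// Hypothetical interfaces; real LLM clients and TTS plug-ins will look different.
struct ILanguageModel {            // cloud-hosted LLM, reached over the network
    virtual ~ILanguageModel() = default;
    virtual std::string GenerateReply(const std::string& playerUtterance) = 0;
};

struct IEmbeddedTts {              // TTS running inside the game engine, on-device
    virtual ~IEmbeddedTts() = default;
    virtual void SpeakNow(const std::string& text) = 0;
};

// One conversational turn with an AI NPC.
void HandlePlayerLine(ILanguageModel& llm, IEmbeddedTts& tts,
                      const std::string& playerLine) {
    // The only network round trip: ask the cloud LLM for a reply.
    // (In a real game, run this off the game thread so the frame loop never stalls.)
    std::string reply = llm.GenerateReply(playerLine);

    // No second round trip: the embedded plug-in synthesizes speech on-device,
    // so the character starts talking as soon as the text arrives.
    tts.SpeakNow(reply);
}
```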

As we mentioned, many game developers already use TTS to prototype game dialog, faster and at a lower cost than re-recording voice actor lines again and again. Lots of developers also use TTS as a key accessibility tool.

But with the emerging generation of runtime TTS game engine plugins, we’re looking at something new. Voicebot NPCs are poised to take audio in video games—and the experiences it creates—to a higher level than ever before.
