3D Sound for a Virtual World
Image Source: Transfuchsian/Shutterstock.com
By Jon Gabay for Mouser Electronics
Edited September 28, 2021 (Originally Published February 2, 2021)
Introduction
The world is constantly changing. This evolving state means the way we interact with the world, equipment, and
people is also changing. Other than profit motives, two forces are dictating this change. One is that technology
is pushing us forward and enticing us with must-have devices and apps. The other driving force is
pandemic-driven concern for health.
With people spending more time at home looking for entertainment, immersive technology has had an opportunity to
make inroads. Virtual- and augmented-reality headsets are now readily available from several manufacturers, and
development tools let hobbyists and professionals render worlds and scenescapes. While the visuals get all the press,
audio is moving forward as well.
Of the five acknowledged senses, sound is perhaps the most appreciated. Whether from the richness of nature, the
fullness of an orchestra, or the stirring of spiritual music, sound as much as any other sense profoundly
impacts us emotionally and physiologically.
Getting Here
We exist today in large part because of our focused ability to listen. Our ancestors who hunted, for example, used
sound to track and find food for a family or village. Locating potentially lethal sources of sound in three
dimensions made it possible to avoid predators: we could sense where something was, how far away it was, and how
fast it was approaching.
Immersive audio is the most advanced audio-processing and -delivery system yet. Getting it right requires audio
processing and an array of well-characterized, calibrated woofers, midrange speakers, and tweeters strategically
placed to provide a dynamic, full-spectrum listening sensation.
Although implementations like this have been beyond the reach of the average audiophile in the past, movie theaters
and performance venues have taken advantage of this technology for years. Like most pioneering technology, it
eventually finds its way to the rest of the world.
Modern entertainment systems use many specialized filters and dynamic processing to create affordable
implementations that fit more budgets. More home theaters exist today than ever before, especially in a pandemic
world, and immersive audio is sure to appear in game rooms and home theaters everywhere.
Although stereo allows a basic surround-sound capability, the most popular surround-sound and 3D audio format in use
today is 5.1, implemented by technologies such as Dolby Digital, Dolby Pro Logic II, DTS, SDDS, and THX. They all
support a six-speaker configuration (five full-bandwidth speakers and one subwoofer) surrounding the listener(s)
(Figure 1). These surround-sound technologies were first used in movie theaters, which helped advance these systems
and make them more cost-effective and available to the masses.
Figure 1: Surround-sound 5.1 uses five full-range speakers plus a subwoofer placed at specific locations so that the
audio process engineer can mix down audio that seems to move spatially around the listener. The subwoofer is not
shown here because it can typically be placed anywhere. (Source: Zern Liew/Shutterstock.com)
Multiple speakers are driven with unique individual audio streams so that the perceived location of virtual sound
surrounds the listener. Here, the rear left and right channels provide front-to-back depth, the front left, right,
and center channels spread sound laterally across the front, and a single subwoofer distributes the low-frequency
bass for the entire room.
Although this arrangement is ideal for a single listener centrally located in the listening zone (or on the couch),
listeners elsewhere will experience slight differences. Still, the sound is homogeneous enough that everyone in the
listening zone experiences audio in motion. What's more, recording artists are advertising their latest albums as
immersive by providing 5.1 surround-sound tracks.
Interestingly, the center-front channel is optimized for speech-range signals, which helps listeners discern
conversations while immersed in 3D sound. As full-bandwidth, full-bodied sound became popular, distinguishing speech
became more challenging, so center-channel filtering and amplification are used to make conversations easier to
understand.
The addition of one more rear-center channel ups the specification to 6.1 surround-sound (Figure
2), and 7.1 standard systems eliminate the rear-center channel but add left and right mid-channels
(Figure 3).
Figure 2: Surround-sound 6.1 provides lateral speakers to enhance the audio
in motion as an audio object moves from front to side and back. Again, woofer placement is arbitrary. Here,
it’s not about the bass. (Source: Zern Liew/Shutterstock.com)
The 7.1 surround-sound technology adds more speakers and unique channels. The resulting 2.5-D polygon of speakers can
be extended with more speakers, tweeters, and woofers at strategic locations, immersing the listener in 2D and
limited 3D audio (Figure 3). Sounds directly above or below the listener can be approximated through signal
processing, but the effect will never be perfect unless speakers are actually placed above and below.
Figure 3: Placing more speakers at smaller angular intervals helps eliminate audio hotspots, which can occur
especially if the tracks aren’t mixed or processed correctly or if the audio converters don’t process the surround
sound properly. (Source: Zern Liew/Shutterstock.com)
We should note that upmix converters can process a captured stereo signal to synthesize multi-speaker surround-sound
signals. This demonstrates how digital signal processing can largely separate source locations from a stereo
source. The best solution would be to capture sound with a 3D microphone configuration and then play it back through
the same 3D speaker configuration. However, this is cumbersome and difficult, and most will not go to these lengths
when signal processing makes a good approximation.
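The simplest way to derive extra channels from stereo is a fixed passive matrix: content common to both channels is sent to the center, and content that differs between them is sent to the surrounds. The sketch below is only an illustration, assuming plain lists of float samples; the function name passive_upmix is hypothetical, and commercial upmixers use adaptive, frequency-dependent steering rather than these fixed gains.

```python
import math


def passive_upmix(left, right):
    """Derive L, R, C, Ls, Rs feeds from a stereo pair with a fixed passive matrix.

    Hypothetical sketch: real upmixers steer these gains adaptively over time.
    """
    # -3 dB gain: the combined power of the sum and difference signals
    # then equals the combined power of L and R.
    g = 1.0 / math.sqrt(2.0)
    center = [g * (l + r) for l, r in zip(left, right)]    # in-phase (common) content
    surround = [g * (l - r) for l, r in zip(left, right)]  # out-of-phase (difference) content
    return {
        "L": list(left),
        "R": list(right),
        "C": center,
        # Early matrix formats carried a single mono surround channel,
        # so both surround speakers get the same difference signal here.
        "Ls": surround,
        "Rs": list(surround),
    }


# A voice panned dead center (identical in both channels) lands in the
# center feed and cancels completely in the surround feeds.
voice = [0.5, -0.25, 1.0]
feeds = passive_upmix(voice, voice)
print(feeds["C"])   # roughly [0.707, -0.354, 1.414]
print(feeds["Ls"])  # [0.0, 0.0, 0.0]
```

Sounds recorded out of phase between the two channels (often room ambience) do the opposite: they vanish from the center and appear in the surrounds, which is why even this crude matrix produces some sense of envelopment.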
Is this always the best approach? Can signal processing fool our keenly developed sense of hearing using fewer
speakers, or will we continue to create walls and ceilings of sound?
Figure 4: More is better. Performers are used to exorbitant numbers of
speakers and amplifiers. In large outdoor settings, it might be necessary. But do you really want walls of
sound? Or at some point, do you realize that better sound is better than louder sound? (Source:
tommistock/Shutterstock.com)
Object-Oriented Audio
The most up-to-date implementation of immersive audio comes from Dolby Atmos, which is designed for theater
applications. So far, almost 5,000 theaters have been retrofitted to use up to 64 speakers to take advantage of this
latest listening experience. The format supports an extensive array of up to 128 channels and can drive
full-bandwidth speakers, low-frequency woofers and subwoofers, and high-frequency tweeters.
Unlike regular channel-based audio, Atmos (and the competing Sony 360 Reality Audio format) uses the concept of audio
objects. An Audio-Visual Receiver (AVR) knows the number of speakers, their type, and their location, and it
processes each audio object according to its spectral makeup, amplitude, location, speed, and direction. However, an
object is not just audio. Each object carries metadata that helps an Object Audio Renderer (OAR) put the object in
motion. Of the 128 channels, ten are used for ambient stems, and the other 118 are available for audio objects.
Not every channel is a speaker. Channel information corresponds to objects, and each object's audio can be processed
and combined with other objects' audio and directed to each speaker at the appropriate level. It is up to the AVR to
use the metadata to mix and distribute the sound in real time.
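Dolby's actual renderer is not described here, but the basic job of turning object positions into speaker levels can be sketched with a simple, well-known substitute: pairwise constant-power panning, in the spirit of vector-base amplitude panning. The sketch below assumes objects are described only by a horizontal azimuth in degrees (0 = straight ahead, positive = toward the left) with static positions; the function names are hypothetical, and the speaker angles are typical 5.1 placements (front left and right at ±30°, surrounds at ±110°).

```python
import math


def pan_object(obj_az, speaker_az):
    """Per-speaker gains for one object using pairwise constant-power panning.

    Azimuths are in degrees, 0 = straight ahead, positive = toward the left.
    The object is placed between the two adjacent speakers that bracket it.
    """
    if len(speaker_az) < 2:
        raise ValueError("need at least two speakers")
    order = sorted(range(len(speaker_az)), key=lambda i: speaker_az[i] % 360)
    gains = [0.0] * len(speaker_az)
    target = obj_az % 360
    for k, i in enumerate(order):
        j = order[(k + 1) % len(order)]  # next speaker around the circle (wraps)
        span = (speaker_az[j] - speaker_az[i]) % 360
        offset = (target - speaker_az[i] % 360) % 360
        if span and offset <= span:
            frac = offset / span  # 0 at speaker i, 1 at speaker j
            gains[i] = math.cos(frac * math.pi / 2)
            gains[j] = math.sin(frac * math.pi / 2)
            break
    return gains


def render(objects, speaker_az):
    """Mix (samples, azimuth) objects into one feed per speaker."""
    length = max(len(samples) for samples, _ in objects)
    feeds = [[0.0] * length for _ in speaker_az]
    for samples, az in objects:
        for ch, gain in enumerate(pan_object(az, speaker_az)):
            for t, x in enumerate(samples):
                feeds[ch][t] += gain * x
    return feeds


# Typical 5.1 bed angles (subwoofer omitted), in channel order L, R, C, Ls, Rs.
LAYOUT_5_1 = [30, -30, 0, 110, -110]
gains = pan_object(15, LAYOUT_5_1)  # halfway between C and L
print([round(g, 3) for g in gains])
```

Because the two active gains always satisfy cos² + sin² = 1, an object keeps the same loudness as it moves between speakers. A real object renderer would also interpolate time-varying position metadata and handle height, which this sketch leaves out.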
As you can imagine, this is not like stereo, where you simply place a couple of speakers and are ready to listen.
With Atmos and many other surround-sound and 3D sound systems, speakers must be placed and then calibrated to be an
accurate part of the soundscape. The average home will not use all 128 channels; home theater implementations support
arrangements of up to 34 speakers.
Atmos is not brand-new. It was first used in 2012 in a theater in Los Angeles for a Disney movie premiere.
Since then, large theaters, IMAX, planetariums, musicals, plays, and other sound applications have propelled it
into the de facto standard for mixing audio for new movies and events. Atmos also uses ceiling speakers to create a
full hemisphere of sound, providing sound from above without having to simulate it through signal processing.
At one time, Atmos was much too elaborate and expensive for the average audiophile, but it is now moving into the
realm of must-have for enthusiasts who have the space and budget to wow their friends. It is also rather high
on the gee-whiz index.
If you have already bitten the bullet for other surround-sound technologies, you can get a Dolby Atmos converter
and still use your existing speakers and amplifiers. However, you will want more, including ceiling speakers.
Converters can take Dolby 5.1 content and upmix it for a larger layout with ceiling speakers, such as 7.1.4 (seven
ear-level speakers, one subwoofer, and four ceiling speakers).
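Surround layouts in this article are written in dotted notation (5.1, 6.1, 7.1), where the first number counts full-range ear-level speakers and the second counts subwoofer (LFE) channels. Atmos-era layouts commonly add a third number for overhead speakers, as in 7.1.4. The hypothetical helper below turns that notation into speaker counts, assuming exactly this ear-level.LFE[.overhead] convention, which is a quick way to sanity-check how many speakers a layout implies.

```python
def parse_layout(notation):
    """Split dotted speaker-layout notation ("5.1", "7.1.4") into channel counts.

    Assumed convention: ear-level speakers . LFE (subwoofer) channels [. overhead speakers].
    """
    parts = notation.split(".")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts):
        raise ValueError(f"unrecognized layout notation: {notation!r}")
    counts = [int(p) for p in parts] + [0] * (3 - len(parts))  # no third number means no overhead speakers
    ear_level, lfe, overhead = counts
    return {
        "ear_level": ear_level,
        "lfe": lfe,
        "overhead": overhead,
        "total": ear_level + lfe + overhead,
    }


for layout in ("5.1", "6.1", "7.1", "7.1.4"):
    print(layout, parse_layout(layout)["total"])
# 5.1 6
# 6.1 7
# 7.1 8
# 7.1.4 12
```

Under this convention, a 7.1.4 layout needs 12 speakers in total, four of them overhead.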
It is worth noting that an alternative approach to surround speakers is the soundbar. Soundbar technology in
various forms is gaining popularity. The obvious benefits of cost reduction, setup simplicity, lower power,
fewer cables, and smaller size drive this technology forward, even as we drive forward.
Phased-array vertical soundbars have demonstrated their ability to emulate a full audio spectrum with good
clarity and separation. Musicians who use them will tell you that soundbar columns with six-inch speakers
produce an 18-inch speaker's sound clarity for subwoofer applications. That should turn a few heads. As a
result, horizontal soundbars and soundbar-based hybrid systems (including remote speakers) are popular for many
home theaters and studios.
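Phased-array soundbars and column speakers commonly aim their output with delay-and-sum beam steering: each driver in the line is delayed slightly so that the wavefronts add up in the chosen direction. The sketch below computes those delays for an evenly spaced line array, assuming a speed of sound of 343 m/s (air at about 20 °C); the function name, driver count, and spacing are illustrative and not taken from any particular product.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, dry air at roughly 20 °C


def steering_delays(num_drivers, spacing_m, angle_deg, c=SPEED_OF_SOUND):
    """Per-driver delays (seconds) that tilt a line array's beam by angle_deg.

    angle_deg is measured from broadside (straight out of the array).
    Driver n is delayed by n * d * sin(angle) / c, then all delays are
    shifted so the smallest is zero (a real-time system can't use negative delays).
    """
    step = spacing_m * math.sin(math.radians(angle_deg)) / c
    delays = [n * step for n in range(num_drivers)]
    earliest = min(delays)
    return [d - earliest for d in delays]


# Hypothetical 8-driver column with 7.5 cm spacing, aimed 20 degrees off axis,
# with the delays expressed in whole samples at 48 kHz.
delays = steering_delays(8, 0.075, 20.0)
print([round(d * 48_000) for d in delays])
```

A line array can only steer within the plane that contains it, so a vertical column tilts its beam up or down, while a horizontal soundbar steers left or right, which is how some soundbars aim beams at side walls.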
Upward-firing and sideways-pointing speakers enhance this effect by reflecting sound off wall and ceiling surfaces so
that it appears to come from above or behind the listener. The modern-day Tesla Model 3 uses front soundbar
technology as part of its 15-speaker audio system to tout surround and immersive audio capability. To show off its
capabilities, turn off a Model 3's rear speakers and engage the immersive audio mode, which uses signal processing
and reverb. Those who've tried this swear sound is coming from behind, yet feedback is mixed, and many don't like the
effect. Reviewers have both praised and criticized the technology, and many have mentioned that some types of music
work with soundbar-style immersive implementations while others don't. This makes sense because the reproduced
quality depends on the recording engineers' mix-down techniques. Advances here mean that accurate above-and-below
immersive sound is nearly achievable without floor and ceiling speakers.
Capturing vs. Rendering
Immersive video experiences such as gaming, for the most part, use created environments. These are 3D structures
with surface renderings and assigned physical properties. Real video footage can also be captured and digitally
stitched together to make a panoramic view that includes the imagery above and below.
An immersive experience such as a walk through a national park can integrate rich visuals, and the audio can be
synthesized or built from a composite of prerecorded clips. Those clips can be captured with a 3D sound system and
used as part of the immersive experience. Just as the video is controlled by head tracking, the audio must be too.
For example, facing a babbling brook sounds much different than facing away from it, and if the sound didn’t track
head movement, the immersive experience would be lacking.
Fortunately, you don’t have to invent your own 3D audio capture for immersive purposes. Audio leaders like
Sennheiser make specialty microphones that combine capsules pointing along different axes with digital tools to
capture sound from every direction while preserving its directionality (Figure 5). The AMBEO VR Mic contains several
highly sensitive wideband microphone capsules in a surround-sound configuration, and the dearVR processing software
can render directional audio to feed a standard surround-sound configuration.
Figure 5: Immersive Audio Capture technologies like the Sennheiser AMBEO VR
Mic allow digital audio engines to render soundscapes based on polar magnitude and direction orientations.
Digital summing can create composite audio combining multiple sources of sound at different distances.
(Source: Sennheiser)
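The digital summing mentioned in the caption can be sketched under two simple, commonly used assumptions: each source's level falls off with the inverse of its distance (about 6 dB per doubling), and its sound arrives later by its distance divided by the speed of sound. The function below is a hypothetical illustration; real rendering engines also model air absorption, reflections, and source directivity.

```python
SPEED_OF_SOUND = 343.0  # m/s


def mix_sources(sources, sample_rate, ref_distance=1.0):
    """Sum (samples, distance_m) sources into one composite signal.

    Each source is scaled by ref_distance / distance (inverse-distance law)
    and delayed by its propagation time, rounded to whole samples.
    Distances closer than ref_distance are clamped to avoid huge gains.
    """
    placed = []
    for samples, distance in sources:
        d = max(distance, ref_distance)
        gain = ref_distance / d
        delay = round(d / SPEED_OF_SOUND * sample_rate)
        placed.append((delay, gain, samples))
    length = max(delay + len(samples) for delay, _, samples in placed)
    out = [0.0] * length
    for delay, gain, samples in placed:
        for i, x in enumerate(samples):
            out[delay + i] += gain * x
    return out


# A click 1 m away and an identical click 2 m away, at a toy sample rate of
# 343 Hz so that one sample corresponds to one meter of travel.
print(mix_sources([([1.0], 1.0), ([1.0], 2.0)], sample_rate=343))
# [0.0, 1.0, 0.5]
```

The farther click arrives one sample later at half the amplitude, which is the distance cue the summing stage preserves.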
For this to work, the audio engine needs to know your head's orientation and motion. With a headset, that's easy by
today's standards because head tracking is already built in for video rendering. But how do you create an immersive,
head-position-based audio system using headphones, which deliver sound to only two ears? Tiny speakers can be placed
in the headphones around the ears to mimic the surround-sound experience. For most applications, stereo will
suffice, but it will not reach the level of authentic surround sound.
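When a captured scene is stored in Ambisonic B-format, the format that tetrahedral VR microphones are typically converted to, head tracking reduces to rotating the sound field opposite to the head's movement before it is decoded for headphones or speakers. The sketch below handles first-order B-format and yaw only, assuming X points forward, Y to the left, and Z up, with the traditional W weighting of 1/√2. Channel ordering and normalization differ between Ambisonic conventions, and the function names are hypothetical, so treat this as an illustration rather than a drop-in implementation.

```python
import math


def encode_first_order(sample, az_deg, el_deg=0.0):
    """Encode one mono sample arriving from (az, el) into first-order B-format (W, X, Y, Z)."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (
        sample / math.sqrt(2.0),               # W: omnidirectional component
        sample * math.cos(az) * math.cos(el),  # X: front/back figure-eight
        sample * math.sin(az) * math.cos(el),  # Y: left/right figure-eight
        sample * math.sin(el),                 # Z: up/down figure-eight
    )


def compensate_head_yaw(bformat, head_yaw_deg):
    """Rotate the sound field by -head_yaw so sources stay fixed in the world.

    Turning the head left by psi moves every source psi to the right
    relative to the listener; W and Z are unaffected by a yaw rotation.
    """
    w, x, y, z = bformat
    psi = math.radians(head_yaw_deg)
    return (
        w,
        x * math.cos(psi) + y * math.sin(psi),
        -x * math.sin(psi) + y * math.cos(psi),
        z,
    )


# A source straight ahead, heard by a listener who has turned 90 degrees left,
# should end up where a source at -90 degrees (hard right) would be encoded.
rotated = compensate_head_yaw(encode_first_order(1.0, 0.0), 90.0)
expected = encode_first_order(1.0, -90.0)
print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(rotated, expected)))  # True
```

Rotating the whole field costs a few multiplications per sample regardless of how many sources the recording contains, which is one reason Ambisonics is popular for head-tracked playback.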
Non-Entertainment Applications
While most immersive audio and visual technology will be used for entertainment, there are also professional
uses. Product design engineering, for example, can benefit from immersive technology, both video and audio. From
a video perspective, the mechanical design of complex assemblies can be virtually constructed, rendered, and
examined. An immersively generated assembly like a jet engine can be constructed, entered, and examined to
see whether gears and turbines align. A repair technician on the other side of the world can be shown what to do by a
factory expert immersed in a fabricated environment.
Immersive audio can be helpful in engineering applications, too. An engineering team designing a car can listen
to a rendered simulation of engine and transmission noise. Interior environmental effects such as airflow noise,
vibration, and oscillation can be extracted from a virtual design. Windows can be designed and tested to
eliminate the thumping oscillations that still occur on new cars when a window is rolled down to just the right
position while driving at just the right speed.
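The window thumping described above is commonly explained as a Helmholtz resonance: the cabin acts like a bottle, the open window like its neck, and air flowing past the opening excites the resonance. A first estimate of the resonant frequency comes from the textbook Helmholtz formula f = (c / 2π)·√(A / (V·L)). In the sketch below, the cabin volume, opening area, and effective neck length are hypothetical placeholder values, not measurements of any real car.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s


def helmholtz_frequency(opening_area_m2, cavity_volume_m3, neck_length_m, c=SPEED_OF_SOUND):
    """Resonant frequency (Hz) of a Helmholtz resonator: f = c / (2*pi) * sqrt(A / (V * L))."""
    return c / (2.0 * math.pi) * math.sqrt(opening_area_m2 / (cavity_volume_m3 * neck_length_m))


# Hypothetical cabin: 3 m^3 of air, a partly open window of 0.1 m^2, and an
# assumed effective neck length of 0.3 m (for a thin opening, this is
# dominated by end corrections rather than the thickness of the door).
f = helmholtz_frequency(0.1, 3.0, 0.3)
print(f"estimated resonance: {f:.1f} Hz")
```

Because the area term changes as the window moves, the resonance shifts with window position, which fits the observation that the thumping appears only at particular window positions and speeds.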
In all cases, an immersive experience includes audio. However, not every case requires surround sound, and
simulated surround may be adequate, at least until someone solves the problem of creating true surround sound in
a binaural headset.
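One of the main cues a binaural (two-ear) renderer must reproduce is the interaural time difference (ITD), the tiny delay between a sound reaching one ear and then the other. The sketch below uses Woodworth's classic spherical-head approximation, ITD = (a / c)(θ + sin θ), assuming an average head radius of about 8.75 cm; the helper names are hypothetical. It also exposes part of the problem mentioned above: a spherical head gives the same ITD for a source at 30° and at 150°, so timing cues alone can't tell front from back, and practical renderers add spectral cues from measured head-related transfer functions (HRTFs).

```python
import math

SPEED_OF_SOUND = 343.0  # m/s
HEAD_RADIUS = 0.0875    # m, a commonly used average value


def woodworth_itd(az_deg, a=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Interaural time difference in seconds; positive means the left ear leads.

    Azimuth is in degrees, 0 = straight ahead, positive = toward the left.
    Rear azimuths are mirrored onto the front (150 deg behaves like 30 deg),
    which is exactly the front/back ambiguity of a spherical head.
    """
    theta = math.radians(az_deg)
    theta = math.atan2(math.sin(theta), abs(math.cos(theta)))  # fold into [-90, 90] deg
    return a / c * (theta + math.sin(theta))


def apply_itd(samples, az_deg, sample_rate):
    """Return (left, right) signals with the far ear delayed by the ITD in whole samples."""
    itd = woodworth_itd(az_deg)
    lag = round(abs(itd) * sample_rate)
    near = list(samples) + [0.0] * lag
    far = [0.0] * lag + list(samples)
    return (near, far) if itd >= 0 else (far, near)


print(f"max ITD: {woodworth_itd(90.0) * 1e6:.0f} microseconds")  # max ITD: 656 microseconds
left, right = apply_itd([1.0], 90.0, 48_000)
print(left.index(1.0), right.index(1.0))  # 0 31
```

A delay of well under a millisecond is enough for the brain to place a sound to one side, which is why even two-ear playback can convey left/right position convincingly while front/back and above/below remain the hard cases.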
Author Bio
Jon Gabay is a contributing writer for Mouser Electronics. Jon Gabay is a mad scientist with no hostility. He
doesn't want to rule or blow up the world. He
wants to make it a better place. Studying electrical engineering, he has worked with defense, commercial,
industrial, consumer, energy, and medical companies as a design engineer, firmware coder, system designer,
research scientist, and product developer. As an alternative energy researcher and inventor, he has been
involved with automation technology since he founded and ran Dedicated Devices Corp. up until 2004. Since then,
he has been doing research and development, writing articles, and developing "Gizmo Blocks" for next-generation
engineers and students.