Intel 3D Realistic Sound Experience
A technical overview of Intel's
state-of-the-art software library for incorporating
interactive, positional 3D sound into PC-based
applications
The 3D Realistic Sound
Experience
Intel's new 3D Realistic Sound Experience (3D RSX) is
a software library that enables developers to create a
PC-based sound experience so realistic you have to hear
it to believe it, all without expensive add-in cards or
custom speakers.
What is 3D RSX and how does it work? What are the key
technologies that comprise the realistic 3D sound
experience? How can you use 3D RSX to enhance the sound
quality of everything from computer games and 3D virtual
worlds to user interfaces and chat spaces? This white
paper, which provides a technical overview of 3D RSX,
answers these questions and more.
Enriching the Experience with Realistic,
Three-Dimensional Sound
Intel's 3D Realistic Sound Experience, 3D RSX, is a
software library that delivers realistic sound for
PC-based applications. Just as the term realistic
graphics describes the graphical rendering of a scene
that's good enough to simulate visual reality, so
realistic sound is the audio rendering of a scene that
simulates auditory reality.
Four Elements of the Realistic Sound Experience
3D RSX encompasses four key elements that together
make up the 3D Realistic Sound Experience:
- True 3D sound clearly positions sounds in
three-dimensional space, and works on speakers as
well as headphones
- Reverberation simulates the acoustical properties
of enclosed areas
- Doppler effect simulates sounds in motion
relative to the listener
- Pitch-shifting allows for variation in the
frequency of sound waves and hence in the
highness or lowness of a sound
To get an idea of how realistic sound can enrich your
PC-based applications, consider a simulated trip to the
State Fair.
You sit
at your PC and take a virtual walk through the State
Fair. You hear screams from the roller coaster high
above you and the noise of the roller coaster on its
track (3D sound, Doppler effect). To your
right, calliope music emanates from the
merry-go-round (3D sound). Moments later, a
carnival barker shouts about how easy it is to win a
stuffed tiger; it sounds like he's standing just
ahead of you, off to the left (3D sound). As
you walk along, an unusually high-pitched voice off
to the right makes you turn your head (3D sound,
pitch shifting). It's another barker beckoning one
and all to try their hand at her game of chance. You
decide to enter the Haunted House. Inside, you hear
people ahead of you screaming, and their voices seem
to echo off the walls (3D sound,
reverberation). Winding your way through the dark
halls, you hear the rumbling, low-pitched voice of a
monster below you (pitch shifting, 3D sound,
reverberation). Outside the Haunted House, you see
a helicopter pulling a long advertising banner; it
sounds like the chopper is moving to and fro right
over your head (3D sound, Doppler). You have a
sudden urge for a corn dog and cotton candy and
decide to jump up from your computer to see what's in
your refrigerator...
With realistic 3D sound, the listener is no longer
just an observer, but a part of the overall scene.
True 3D Sound
True 3D sound provides an experience far beyond
traditional stereo technology. Stereo technology, at
best, provides a left-right panning effect. Sound may be
louder or softer in one ear, but the movement of the
sound is generally restricted to a line between the ears.
Some stereo expanders may move sounds out beyond the
physical location of the speakers, but the sound
essentially lies in a one- or two-dimensional plane.
With true 3D sound, on the other hand, sounds are
clearly positioned in 3D space. In addition to the
left-right panning of traditional stereo, true 3D sound
seems to originate in the three-dimensional space outside
a listener's head. It can seem to be above or below you,
in front of or behind you, to the right or left:
nearly anywhere in the 3D space that surrounds you.
For example, say you create a piece of computer music
that features a five-piece band. With true 3D sound, you
can place your listener in the middle of the band, so one
guitar sounds as if it is behind you, the drums sound as
if they are in front of you, the piano is above you, and
so forth. This effect could be preprocessed into the
music, but for an even better experience it can be used
interactively, allowing your listeners to "move
around" and hear the music from nearly any angle,
direction or position: to fly over the band, crawl
below them, sit in the front row or in the balcony.
Reverberation
Reverberation simulates the acoustical properties of
enclosed areas, from small chambers and concert halls to
wide-open canyons. In simple terms, reverberation is the
slight echo effect heard when sound is generated in an
enclosed area. For instance, if you have a conversation
with a friend in a racquetball court, the sound waves
from your friend's mouth not only travel directly to your
ears, but also bounce off of the walls and return to your
ears a second, third or fourth time. The reflected sound
waves that indirectly reach your ears produce the effect
known as reverberation.
3D RSX allows developers to modify both the
reverberation delay and intensity of an enclosed area.
The reverberation delay coefficient models the size of
the room. For example, as the volume of a particular
environment (room) increases, the perceived acoustical
delay also increases. This corresponds to an increase in
the delay coefficient value in the reverberation
algorithm. The intensity value approximates the
"reflectivity" of the surfaces in the acoustic
environment. By adjusting these coefficients, developers
can use 3D RSX to simulate a variety of acoustical
environments. In addition, a minimal level of
reverberation further improves the realism of 3D sound.
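The delay and intensity coefficients described above can be modeled with a simple feedback delay line. The sketch below is a minimal illustration of the principle only, not the 3D RSX reverberation algorithm itself:

```python
def reverberate(samples, delay, intensity):
    """Apply a naive feedback-delay reverberation.

    delay     -- echo delay in samples (models room size)
    intensity -- reflection gain, 0..1 (models surface reflectivity)
    """
    out = list(samples)
    for n in range(delay, len(out)):
        # Each output sample picks up an attenuated copy of the
        # sample that "bounced" off a wall `delay` samples earlier.
        out[n] += intensity * out[n - delay]
    return out

# A single impulse produces a decaying train of echoes:
echoes = reverberate([1.0] + [0.0] * 9, delay=3, intensity=0.5)
# echoes[3] = 0.5, echoes[6] = 0.25, echoes[9] = 0.125
```

Increasing `delay` models a larger room; increasing `intensity` models more reflective surfaces, exactly as the two 3D RSX coefficients do.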
Software tools have been available for some time to
add reverberation to sound files by preprocessing the
audio files. 3D RSX can be used to preprocess files with
reverberation, but in addition, it also provides the
ability to vary reverberation in real-time, while an
application is executing.
You're
at your PC playing a game that transports you to a
magical, mythical world where knights and kings rule
the kingdom. You, the game's hero or heroine, are
walking through an open field (reverberation is
off). You spot an opening in a nearby mountain and
enter a dark cave (reverberation is turned on to
model the acoustics of the cave). You hear the
snarl of a dragon, and your spine tingles as its
deep-throated moan permeates the cave (sound waves
reflecting off of the cave walls). Scared beyond
belief, you run out of the cave to a nearby castle
(reverberation is switched off as you run through
the countryside). Just inside the castle, you hear
the voice of the castle sentry resonating
(reverberation is adjusted to approximate the
acoustics of a seventh-century castle) as he
maniacally screams that you are trespassing...
Doppler Effect
The Doppler effect simulates sounds in motion. A
well-known experiment with Doppler involves blowing a
horn on a moving train. As the train travels toward the
listener, the sound waves are compressed, effectively
increasing the horn's pitch. As the train moves away from
the listener, the sound waves are rarefied,
correspondingly decreasing the pitch. The Doppler effect
can thus be viewed as an automatic pitch shifter that
translates the relative velocity between a sound source
and the listener into a change in the sound's pitch.
Doppler works well for applications and games with moving
objects. A game with airplanes and helicopters whizzing
overhead as you scramble for cover in a battlefield, or
cars zipping by as you sit in the audience of a virtual
Grand Prix race these are effective uses of Doppler.
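The classical Doppler formula behind this effect is easy to state: the perceived frequency is the emitted frequency scaled by c / (c - v), where c is the speed of sound and v is the source's speed toward the listener. A small sketch of the physics (not 3D RSX's implementation):

```python
def doppler_pitch(freq_hz, source_speed, speed_of_sound=343.0):
    """Classical Doppler shift for a source moving toward (positive
    speed) or away from (negative speed) a stationary listener.
    Speeds are in m/s; 343 m/s is the speed of sound in air."""
    return freq_hz * speed_of_sound / (speed_of_sound - source_speed)

approaching = doppler_pitch(440.0, 30.0)   # pitch rises above 440 Hz
receding    = doppler_pitch(440.0, -30.0)  # pitch falls below 440 Hz
```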
Pitch Shifting
Pitch shifting varies a sound's pitch, its
distinctive quality, which depends primarily on the
frequency of the sound waves produced by its source. The
most common example of pitch shifting is
"chipmunk" voice effects and other voice
distortions, but pitch shifting is also useful for sound
effects like the revving of an engine. It is used
internally by 3D RSX to perform the Doppler effect.
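A minimal way to illustrate pitch shifting is resampling with linear interpolation: reading through the samples faster raises the pitch, reading slower lowers it. The function below is a conceptual sketch, not the algorithm 3D RSX uses internally:

```python
def pitch_shift(samples, factor):
    """Shift pitch by resampling with linear interpolation.
    factor > 1 raises the pitch ("chipmunk"); factor < 1 lowers it."""
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        # Interpolate between adjacent samples at the fractional position.
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        pos += factor
    return out

tone = [0.0, 1.0, 0.0, -1.0] * 4   # a short square-ish wave
higher = pitch_shift(tone, 2.0)    # half the samples, doubled pitch
```

Note that this simple approach also changes the sound's duration; more sophisticated pitch shifters preserve duration.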
Understanding True 3D Sound
In the past, three-dimensional sound required multiple
speakers positioned around you, plus expensive add-in
cards in the computer. 3D RSX requires just a stereo
sound card, a set of headphones or two standard PC
speakers, and the 3D RSX software library. With these
simple requirements met, any 3D RSX-enabled application
can provide true 3D sound.
How does 3D RSX manage to produce realistic,
positional sound with so little hardware?
It does so by using software, and the power of Intel's
Pentium® and Pentium® Pro processors, to simulate the
way humans hear sound in the real world. Specifically,
3D RSX uses Head-Related Transfer Functions (HRTF), an
application of fundamental research that's been occurring
at major universities for years.
Sounds in the real world are not stereo but monaural:
single channel. However, because you have two ears,
sounds appear as two-channel stereo; the two channels
correspond to your two ears. As a sound reaches your ear,
your brain and ears work together to distinguish the
origin of the sound. The position and shape of your head,
ears, shoulders and torso interact with incoming sound
waves in such a way that your brain-ears combination can
pinpoint the origin of nearly any sound that reaches your
ears. So, how does this work?
Determining Left vs. Right
Consider an example. If you're standing in a long
hallway and a door slams shut on your right, the sound
reaches your right ear earlier than it reaches your left
ear. (This is referred to as the Interaural Time Delay, or
ITD.) The sound is also slightly louder in your right ear
than in your left. (This is known as the Interaural
Intensity Difference, or IID.) Using these auditory cues,
your brain determines that the slammed door is located to
your right and not your left. In fact, your ears and
brain do this so well that you can probably determine
where the door is located even with your eyes closed.
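These two cues can be quantified. Woodworth's classical approximation from the psychoacoustics literature estimates the ITD from the azimuth of the sound source and the radius of the listener's head. The sketch below, with an assumed 8.75 cm head radius, is purely illustrative and is not drawn from the 3D RSX implementation:

```python
import math

def interaural_time_delay(azimuth_deg, head_radius=0.0875, c=343.0):
    """Woodworth's approximation of the ITD (in seconds) for a sound
    at the given azimuth (0 = straight ahead, 90 = hard right).
    head_radius is in meters; c is the speed of sound in m/s."""
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (math.sin(theta) + theta)

front = interaural_time_delay(0)    # 0 s: both ears hear it together
right = interaural_time_delay(90)   # roughly 0.65 ms head start for
                                    # the right ear
```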
Determining Front vs. Back
Your brain working in concert with your ears also
determines if a sound is located in front of or behind
you. The outer part of the ear (the pinna) acts as a
filter to provide the audio cues the brain needs to
determine if a sound is in front of or behind you. You
may also turn your head and/or obtain visual confirmation
that the sound source is where your ears tell you it is.
Determining Up vs. Down
The cues generated by the pinna also help establish
the elevation of a sound source. The height of a sound
source, relative to your ears, is determined from the
unique frequency patterns (notches) present in the signal
arriving at the ear drums. The reflections off the
pinna's asymmetric ridges produce an elevation-dependent
interference pattern. Sound sources from a higher
elevation generate a different interference pattern from
those originating at a low elevation. The brain
determines the height of a sound source by interpreting
these different frequency patterns.
HRTF: From the Lab to the Computer
Head-Related Transfer Functions are the key to
simulating all the information that is being passed
between our ears and our brain so that a computer can
produce the same effect. HRTFs are based on sound
measurements made on volunteers in an anechoic (no
echoes) chamber. Researchers placed clinical probe
microphones inside volunteers' ear canals and recorded
measurements at approximately 463 positions around their
heads.
Figure 1 A pair of HRTF impulse
responses and their respective spectra at 60 degrees
azimuth.
This measured set of impulse responses represents the
collection of directionally dependent characteristics
impressed upon the audio signal by the human torso,
shoulder, head, and outer ear. A pair of these impulse
responses (representing a single spatial position) and
their respective transfer functions are shown in
Figure 1. This set of data is collectively referred to as
Head-Related Transfer Functions. HRTFs are stored as a
data table in the 3D RSX software library, which uses
them to position sounds in three-dimensional space.
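Applying an HRTF pair amounts to convolving the mono source signal with the left-ear and right-ear impulse responses measured for the desired position. The example below uses made-up two-sample impulse responses purely to illustrate the operation:

```python
def convolve(signal, impulse):
    """Direct-form convolution of a mono signal with one impulse response."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out

def spatialize(mono, hrtf_left, hrtf_right):
    """Produce a (left, right) channel pair by filtering the mono
    source with the HRTF pair measured for one spatial position."""
    return convolve(mono, hrtf_left), convolve(mono, hrtf_right)

# Toy impulse responses: the right ear hears earlier and louder,
# as it would for a source off to the listener's right.
left, right = spatialize([1.0, 0.5], [0.0, 0.3], [0.6, 0.2])
```

In practice a library interpolates between the stored HRTF table entries as the source moves, so the filtering stays continuous.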
True 3D Sound over Headphones and Speakers
Because HRTF-based sound is measured via microphones
placed in volunteers' ears, the resulting sound is meant
to be funneled directly into the ears via headphones. 3D
RSX uses cross-talk cancellation technology to deliver the
same high-quality 3D sound using only two speakers.
Imagine your speakers positioned with one on the right
side of your monitor and the other on the left, as shown
in Figure 2.
Figure 2 Listening to 3D sound with
speakers
To ideally simulate headphones, all the sound from the
right speaker should go only to your right ear and all
the sound from the left speaker should go only to your
left ear. In reality, cross-talk occurs; that is, some
sound from the right speaker travels to your left ear,
and vice versa. 3D RSX uses cross-talk cancellation to
cancel the unwanted cross-talk signal and provide true 3D
sound, previously available only with headphones, now on
speakers as well.
Cross-talk cancellation generates an audio signal that
is mixed with the signal from the left speaker. The new,
mixed signal cancels the cross-talk signal that travels
from the right speaker to the left ear. Similarly, a
second signal cancels out the cross-talk from the left
speaker to the right ear. By employing cross-talk
cancellation, 3D RSX delivers the appropriate signal to
each ear and delivers true 3D sound over speakers.
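With a frequency-independent cross-talk gain, the cancellation idea reduces to inverting a 2x2 mixing matrix: solve for the two speaker signals so that, after the leakage is added back in, each ear receives exactly its intended signal. The sketch below is a deliberately simplified static model (real cross-talk is frequency- and delay-dependent), not the 3D RSX algorithm:

```python
def crosstalk_cancel(desired_left, desired_right, g):
    """Solve for speaker signals so each ear receives only its intended
    binaural signal, assuming each speaker leaks a fixed fraction g of
    its output to the opposite ear."""
    det = 1.0 - g * g
    speaker_left = (desired_left - g * desired_right) / det
    speaker_right = (desired_right - g * desired_left) / det
    return speaker_left, speaker_right

# Verify: feed the speaker signals back through the cross-talk model
# and recover exactly the desired ear signals.
sl, sr = crosstalk_cancel(1.0, 0.2, g=0.4)
ear_left = sl + 0.4 * sr    # direct sound + leakage from the right
ear_right = sr + 0.4 * sl   # direct sound + leakage from the left
```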
Figure 3 3D RSX Tray Tool icon (the
little headphones) shown next to the clock
3D RSX provides a tool to dynamically specify whether
headphones or speakers are being used. When a 3D
RSX-enabled application is running, the 3D RSX Tray Tool
icon appears in the lower right hand corner of the
Windows* tray (see Figure 3). When the user clicks the
3D RSX Tray Tool icon, the tool opens (Figure
4) to allow easy selection of speakers or headphones.
Using 3D RSX: Listeners, Emitters and More
3D RSX provides an intuitive interface modeled after
the interfaces found in 3D graphics libraries. The two
basic constructs in 3D RSX are a listener (the person
hearing the sound), and one or more emitters or sound
sources. Just as the position and orientation of a
graphical "camera" determines what is seen in a
3D graphics application, so the position and orientation
of the listener relative to the emitters determines what
is heard.
Figure 4 3D RSX Tray Tool for
selecting an audio peripheral
To use 3D RSX, an application creates a listener and
as many emitters as needed. Then the application simply
updates the position and orientation of all of the
objects to generate a realistic 3D sound experience.
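The flow of such an application can be sketched in a few lines. The classes and the simple linear attenuation below are hypothetical stand-ins for illustration only, not the actual 3D RSX COM interfaces:

```python
import math

class Emitter:
    """Hypothetical stand-in for a 3D RSX emitter: a positioned
    sound source producing sample values."""
    def __init__(self, position, sample):
        self.position = position
        self.sample = sample

def render(listener_pos, emitters, max_range=10.0):
    """Mix all audible emitters using simple linear distance
    attenuation. (The real library would apply HRTFs, Doppler and
    reverberation at this step.)"""
    mixed = 0.0
    for e in emitters:
        d = math.dist(listener_pos, e.position)
        gain = max(0.0, 1.0 - d / max_range)  # silent beyond max_range
        mixed += gain * e.sample
    return mixed

scene = [Emitter((0, 0, 0), 1.0), Emitter((0, 0, 20), 1.0)]
level = render((0, 0, 0), scene)  # only the near emitter is audible
```

An application would call something like `render` every frame, after updating the listener and emitter positions from user input or scene animation.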
Listeners: The 3D Sound "Camera"
3D RSX allows one listener per instance of 3D RSX, and
provides two types of listeners: direct and streaming.
The choice between direct and streaming listeners depends
on the needs of the application. Both types of listeners
allow developers to specify the listener's audio format,
position and orientation. All PCM multimedia formats are
supported: 8 kHz, 11 kHz, 22 kHz, 44 kHz, 48 kHz,
8/16-bit, monaural/stereo.
Direct Listeners
A direct listener writes 3D RSX's processed audio
output directly to the system audio device.
3D RSX works transparently on Microsoft's Windows* 95
and Windows NT* with wave* and/or DirectSound* and will
choose between the two for optimum operation depending on
the user's computer configuration. Application developers
need not concern themselves with dependencies specific to
wave or DirectSound. When using DirectSound, 3D RSX
utilizes secondary buffers to allow audio sharing between
applications and distinct instances of 3D RSX.
Streaming Listeners
A streaming listener tells 3D RSX to send its
processed output audio buffers back to the application
instead of writing to the audio device. Allowing the
application to receive these audio buffers provides
significant flexibility, including:
- The ability to modify 3D RSX output before it is
written to the audio device
- The ability to write 3D RSX processed output to
files, effectively providing a way to create 3D
RSX preprocessed files
- The ability to mix 3D RSX processed output with
audio from other sources
- Control over the pacing of 3D RSX rather than
using the pacing mechanisms built into the direct
listener
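The streaming-listener pattern, reduced to its essentials, hands each processed buffer back to application code. The class and method names below are hypothetical sketches, not the 3D RSX API:

```python
class StreamingListener:
    """Hypothetical sketch of the streaming-listener pattern:
    processed audio buffers come back to the application instead
    of going straight to the audio device."""
    def __init__(self, sink):
        self.sink = sink  # any callable that consumes one buffer

    def deliver(self, buffer):
        # The library would invoke this with each processed buffer;
        # the application decides what to do with it (modify it,
        # write it to a file, mix it, or pace it itself).
        self.sink(buffer)

captured = []
listener = StreamingListener(captured.append)  # e.g. record to memory
for chunk in ([0.1, 0.2], [0.3, 0.4]):
    listener.deliver(chunk)
```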
Emitters: Where Sound Begins
Emitters are akin to jukeboxes that generate sound and
have position and orientation in 3D space. The emitter's
position and orientation relative to the listener
determines what is heard. The farther away the listener
is from an emitter, the softer the sound is, and if an
emitter is far enough away, its sound is inaudible.
Likewise, if several emitters are within the range of
audibility, 3D RSX mixes their sounds together and plays
the result.
Specifying an Emitter's Sound Range
3D RSX uses the sound model adopted by the Virtual
Reality Modeling Language (VRML) 2.0 specification, a
platform independent way of representing 3D worlds on the
Internet, to specify the range of an emitter's sound. As
Figure 5 shows, this model defines two concentric
ellipsoids that together delimit a constant intensity
region, where the sound is at a constant volume in both
of the listener's ears, and an attenuation region, where
the sound grows louder or softer as the listener moves
toward or away from the source of the sound.
The inner ellipsoid, defined by a minimum front range
and minimum back range, identifies the ambient constant
intensity region. Being in this region is roughly
analogous to being right next to (or even inside) the
sound source. The region between the edge of the inner
ellipsoid and the edge of the outer ellipsoid, defined by
a maximum front range and maximum back range, is a
diffuse position dependent region where the volume
attenuates.
Figure 5 3D RSX sound source model
Directionality, the orientation of the emitter and
listener, is also a factor in the attenuation region.
If the listener moves away from the sound and leaves the
source behind, the sound seems to originate behind him.
Or, if he slides away from the sound source by always
keeping it to his left, the sound seems louder in his
left ear than in his right.
Beyond the outer ellipsoid, the sound intensity is set
to zero, which results in silence.
Flexibility in Using Emitters
The number of emitters that can be created in a 3D
RSX environment is limited only by the available memory
and CPU on the user's computer. 3D RSX also provides
considerable flexibility in using emitters. For example,
developers can:
- Turn off 3D sound effects for individual emitters
and use 3D RSX as a high-quality,
high-performance audio mixer. The advantage here
is that 3D RSX's file-based interface is far
simpler to use for mixing audio than the
buffer-based interfaces available elsewhere
- Simultaneously play multiple emitters of varying
audio formats
- Start, stop, pause and mute emitters individually
- Dynamically adjust the pitch of individual
emitters
- Cluster different emitters together as
"synchronization groups" so that one
operation can affect multiple emitters
3D RSX supports all PCM wave formats (8 kHz, 11 kHz, 22 kHz,
44 kHz, 48 kHz, 8/16-bit, mono/stereo). Non-PCM wave formats are
supported through Audio Compression Manager (ACM)
filters.
Cached and Streaming Emitters
3D RSX offers two types of emitters: cached emitters
and streaming emitters.
Cached emitters provide a simple way to specify a wave
file to play. Some of the features specific to cached
emitters are:
- VCR-like controls: Play, Pause, Resume and Stop
- Support for a variety of sources:
- Local file
- Network file
- Audio file embedded in an executable as a
resource
- URL address
- Support for wave data and MIDI data
- Ability to play segments of files
- Ability to receive playback status
- Complete interoperability with streaming emitters
Streaming emitters are responsible for continuously
feeding buffers into 3D RSX. Streaming emitters are ideal
for:
- Streaming audio data from the Internet and other
networks into 3D RSX. Instead of waiting for an
entire file to download, users can start playing
audio almost instantly
- Integrating 3D RSX with multipoint conferencing
and chat applications
- Providing audio from other sources like video
files
- Dynamically creating audio data
- Modifying audio data before sending it to 3D RSX
What happens when you combine 3D RSX's realistic sound
features with cached and streaming emitters in a
connected PC application? Consider the possibilities for
an Internet-based 3D world/chat application.
You receive a DVD or CD-ROM disk in the mail that
contains hundreds of megabytes of graphics, videos
and high-quality audio files. The data contained on
the disk produces a media-rich, virtual world that
responds instantly to your actions. As you begin to
explore this world, you hear a bird above your head
(a cached emitter based on a wave file) and a
dog barking (another cached emitter) behind
you. You enter a concert hall where a beautiful
symphony is being played (cached emitter based on
a MIDI file with the appropriate reverberation
settings).
You begin to realize that you are all alone in
this virtual world, and you decide to click on the
"Connect" button. Your modem links you to a
server on the Internet, and almost immediately you
notice other people (creatures and avatars)
"walking" around.
A grandmother avatar calls to you. You hear her
sweet voice (a streaming emitter) behind you
and turn around to talk with her. With your
full-duplex audio card, you can speak into your
microphone (another streaming emitter) and
still hear her voice. As she paces back and forth,
the true 3D sound helps you follow her movement.
Suddenly you hear a deep voice behind you:
"Hand over your wallet." (another
streaming emitter) You turn to see the avatar of
your good friend, Mark. As a prank, he is trying a
"virtual mugging." You laugh while
adjusting the pitch of your voice, so Mark hears what
seems to be a laughing "chipmunk" (pitch
shifting). You and Mark then "walk" to
the virtual stadium to "watch" a football
game. You're so close to the action you can hear the
thud of leather connecting with pigskin at the
kickoff (true 3D sound, cached emitter) as well
as the roar of the crowd surrounding you (true 3D
sound, reverberation, another cached emitter).
In addition, this application could also utilize a
streaming listener to save an audio log to record on disk
any of the conversations that were held in the 3D world.
3D RSX Scalability
3D RSX is scalable. It contains several different
algorithms for reverberation, Doppler and 3D sound, and
selects the appropriate ones for the computer on which it
is executing. For developers, this means 3D RSX
applications can run, and sound great, on a wide
range of Pentium and Pentium Pro processor-based
computers. For end users, it means that as you migrate to
newer and more powerful computers, your realistic 3D
sound experience will sound even better. Because 3D RSX
is highly optimized for Pentium and Pentium Pro
processor-based PCs, with and without MMX™
technology, it runs well on today's machines and even
better on tomorrow's machines.
3D RSX and the Internet
Any application that works on and with the Internet
can use 3D RSX and will find URL-based cached emitters
and streaming emitters to be especially useful. 3D RSX
provides a choice of ways to incorporate 3D sound into
Internet computing:
- VRML 2.0. From the beginning 3D RSX was designed
to implement all the features of the Sound Node
in the VRML 2.0 specification. In fact, 3D RSX is
the world's first VRML 2.0 audio solution
available for the PC. Developers of VRML 2.0
browsers and authoring tools can use 3D RSX to
implement the VRML 2.0 sound specification.
- ActiveX*. Intel provides an ActiveX control that
enables scripting languages such as Visual
Basic*, JavaScript* and VBScript* to access 3D
RSX.
- Java*. Intel offers two solutions to enable Java
applications to use 3D RSX. One solution takes
advantage of the Microsoft mechanism for allowing
COM and Java to cooperate. This Java solution
currently works only with the Microsoft Java
Virtual Machine and supports only cached emitters
and direct listeners. A second Java
implementation is the Intel Spatial Audio for
Java package. This package supports all features
of 3D RSX and works with browsers from both
Microsoft and Netscape.
For More Information
3D RSX is a breakthrough in sound for personal
computers. It blends the four major ingredients of
realistic sound into an easy-to-use, flexible package
that can heighten the realism and impact of your
applications on Pentium or Pentium Pro processor-based
computers.
To obtain a copy of 3D RSX or to get more information,
visit the Intel 3D RSX site on the World Wide Web at:
http://developer.intel.com/ial/rsx
Features Available in 3D RSX
- True 3D Sound: dynamic, interactive, real-time
positional audio
- Open Speaker Cross-Cancellation
- Scalability
- Support for streaming sound sources and streaming
output
- Dynamic, real-time reverberation
- Dynamic, real-time Doppler effect
- Dynamic pitch adjustment
- MIDI-based sound sources
- Works with both wave and DirectSound output
- Accurate, intuitive sound source definition
compatible with VRML 2.0
- Support for both left- and right-handed 3D
coordinate systems
- Support for synchronizing sound sources
- Support for all PCM audio formats
- Sound sources support non-PCM audio formats
through ACM
- High-performance, high-quality sample rate
conversion
- Support for sharing audio device with other
applications
- Works well with the Internet
- Easy-to-use COM interfaces
- VRML 2.0 compliance
- High-performance, high-quality mixing of multiple
sample rates
- 3D RSX Tray Tool to dynamically adjust audio
peripheral
- Enhanced for Intel Pentium and Pentium Pro
processors, with and without MMX™ technology
- Configuration/diagnostic tool for troubleshooting
and fine-tuning 3D RSX
- Backward compatibility with RSX 1.0 and RSX 2.0
3D RSX SDK
The 3D RSX Software Development Kit provides the
following features in addition to those in the runtime
version:
- Sample code (in C and C++)
- Extensive documentation in HTML format
- Sample code on integrating 3D RSX with a typical
3D graphics library
- Runtime installation for Installshield* and
non-Installshield users
- Debug 3D RSX .dlls for easier development
- Java support available through the Intel
Spatial Audio for Java package
- ActiveX support available separately
Where Can Intel's 3D Realistic Sound Experience Add
Value?
1) Games
2) VRML browsing environments
3) Chat spaces
4) 3D Virtual Worlds/Socialization applications
5) Authoring tools for VRML
6) Audio file authoring/editing tools
7) Authoring tools for 3D graphics
8) A generic filter (such as in the context of ACM or
ActiveMovie*)
9) As a high-quality, high-performance, easy-to-use audio
mixer
10) To provide audio cues in 3D user interfaces and data
visualization applications
11) And much more!