Intel 3D Realistic Sound Experience
A technical overview of Intel's
state-of-the-art software library for incorporating
interactive, positional 3D sound into PC-based
applications
The 3D Realistic Sound
Experience
Intel's new 3D Realistic Sound Experience (3D RSX) is
a software library that enables developers to create a
PC-based sound experience so realistic you have to hear
it to believe it, all without expensive add-in cards or
custom speakers.
What is 3D RSX and how does it work? What are the key
technologies that comprise the realistic 3D sound
experience? How can you use 3D RSX to enhance the sound
quality of everything from computer games and 3D virtual
worlds to user interfaces and chat spaces? This white
paper, which provides a technical overview of 3D RSX,
answers these questions and more.
Enriching the Experience with Realistic,
Three-Dimensional Sound
Intel's 3D Realistic Sound Experience, 3D RSX, is a
software library that delivers realistic sound for
PC-based applications. Just as the term realistic
graphics describes the graphical rendering of a scene
that's good enough to simulate visual reality, so
realistic sound is the audio rendering of a scene that
simulates auditory reality.
Four Elements of the Realistic Sound Experience
3D RSX encompasses four key elements that together
make up the 3D Realistic Sound Experience:
- True 3D sound clearly positions sounds in
three-dimensional space, and works on speakers as
well as headphones
- Reverberation simulates the acoustical properties
of enclosed areas
- Doppler effect simulates sounds in motion
relative to the listener
- Pitch-shifting allows for variation in the
frequency of sound waves and hence in the
highness or lowness of a sound
To get an idea of how realistic sound can enrich your
PC-based applications, consider a simulated trip to the
State Fair.
You sit
at your PC and take a virtual walk through the State
Fair. You hear screams from the roller coaster high
above you and the noise of the roller coaster on its
track (3D sound, Doppler effect). To your
right, calliope music emanates from the
merry-go-round (3D sound). Moments later, a
carnival barker shouts about how easy it is to win a
stuffed tiger; it sounds like he's standing just
ahead of you, off to the left (3D sound). As
you walk along, an unusually high-pitched voice off
to the right makes you turn your head (3D sound,
pitch shifting). It's another barker beckoning one
and all to try their hand at her game of chance. You
decide to enter the Haunted House. Inside, you hear
people ahead of you screaming, and their voices seem
to echo off the walls (3D sound,
reverberation). Winding your way through the dark
halls, you hear the rumbling, low-pitched voice of a
monster below you (pitch shifting, 3D sound,
reverberation). Outside the Haunted House, you see
a helicopter pulling a long advertising banner; it
sounds like the chopper is moving to and fro right
over your head (3D sound, Doppler). You have a
sudden urge for a corn dog and cotton candy and
decide to jump up from your computer to see what's in
your refrigerator...
With realistic 3D sound, the listener is no longer
just an observer, but a part of the overall scene.
True 3D Sound
True 3D sound provides an experience far beyond
traditional stereo technology. Stereo technology, at
best, provides a left-right panning effect. Sound may be
louder or softer in one ear, but the movement of the
sound is generally restricted to a line between the ears.
Some stereo expanders may move sounds out beyond the
physical location of the speakers, but the sound
essentially lies in a one- or two-dimensional plane.
With true 3D sound, on the other hand, sounds are
clearly positioned in 3D space. In addition to the
left-right panning of traditional stereo, true 3D sound
seems to originate in the three-dimensional space outside
a listener's head. It can seem to be above or below you,
in front of or behind you, to the right or left:
nearly anywhere in the 3D space that surrounds you.
For example, say you create a piece of computer music
that features a five-piece band. With true 3D sound, you
can place your listener in the middle of the band, so one
guitar sounds as if it is behind you, the drums sound as
if they are in front of you, the piano is above you, and
so forth. This effect could be preprocessed into the
music, but for an even better experience it can be used
interactively, allowing your listeners to "move
around" and hear the music from nearly any angle,
direction or position: to fly over the band, crawl
below them, sit in the front row or in the balcony.
Reverberation
Reverberation simulates the acoustical properties of
enclosed areas, from small chambers and concert halls to
wide-open canyons. In simple terms, reverberation is the
slight echo effect heard when sound is generated in an
enclosed area. For instance, if you have a conversation
with a friend in a racquetball court, the sound waves
from your friend's mouth not only travel directly to your
ears, but also bounce off of the walls and return to your
ears a second, third or fourth time. The reflected sound
waves that indirectly reach your ears produce the effect
known as reverberation.
3D RSX allows developers to modify both the
reverberation delay and intensity of an enclosed area.
The reverberation delay coefficient models the size of
the room. For example, as the volume of a particular
environment (room) increases, the perceived acoustical
delay also increases. This corresponds to an increase in
the delay coefficient value in the reverberation
algorithm. The intensity value approximates the
"reflectivity" of the surfaces in the acoustic
environment. By adjusting these coefficients, developers
can use 3D RSX to simulate a variety of acoustical
environments. In addition, a minimal level of
reverberation further improves the realism of 3D sound.
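The delay and intensity coefficients described above can be modeled with a simple feedback delay line. The sketch below is a minimal illustration of the principle only, not the 3D RSX reverberation algorithm itself:

```python
def reverberate(samples, delay, intensity):
    """Apply a naive feedback-delay reverberation.

    delay     -- echo delay in samples (models room size)
    intensity -- reflection gain, 0..1 (models surface reflectivity)
    """
    out = list(samples)
    for n in range(delay, len(out)):
        # Each output sample picks up an attenuated copy of the
        # sample that "bounced" off a wall `delay` samples earlier.
        out[n] += intensity * out[n - delay]
    return out

# A single impulse produces a decaying train of echoes:
echoes = reverberate([1.0] + [0.0] * 9, delay=3, intensity=0.5)
# echoes[3] = 0.5, echoes[6] = 0.25, echoes[9] = 0.125
```

Increasing `delay` models a larger room; increasing `intensity` models more reflective surfaces, exactly as the two 3D RSX coefficients do.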
Software tools have been available for some time to
add reverberation to sound files by preprocessing the
audio files. 3D RSX can be used to preprocess files with
reverberation, but in addition, it also provides the
ability to vary reverberation in real-time, while an
application is executing.
You're
at your PC playing a game that transports you to a
magical, mythical world where knights and kings rule
the kingdom. You, the game's hero or heroine, are
walking through an open field (reverberation is
off). You spot an opening in a nearby mountain and
enter a dark cave (reverberation is turned on to
model the acoustics of the cave). You hear the
snarl of a dragon, and your spine tingles as its
deep-throated moan permeates the cave (sound waves
reflecting off of the cave walls). Scared beyond
belief, you run out of the cave to a nearby castle
(reverberation is switched off as you run through
the countryside). Just inside the castle, you hear
the voice of the castle sentry resonating
(reverberation is adjusted to approximate the
acoustics of a seventh-century castle) as he
maniacally screams that you are trespassing...
Doppler Effect
The Doppler effect simulates sounds in motion. A
well-known experiment with Doppler involves blowing a
horn on a moving train. As the train travels toward the
listener, the sound waves are compressed, effectively
increasing the horn's pitch. As the train moves away from
the listener, the sound waves are rarefied,
correspondingly decreasing the pitch. The Doppler effect
can thus be viewed as an automatic pitch shifter that
translates the relative velocity between a sound source
and the listener into a change in the sound's pitch.
Doppler works well for applications and games with moving
objects. A game with airplanes and helicopters whizzing
overhead as you scramble for cover in a battlefield, or
cars zipping by as you sit in the audience of a virtual
Grand Prix race these are effective uses of Doppler.
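The classical Doppler formula behind this effect is easy to state: the perceived frequency is the emitted frequency scaled by c / (c - v), where c is the speed of sound and v is the source's speed toward the listener. A small sketch of the physics (not 3D RSX's implementation):

```python
def doppler_pitch(freq_hz, source_speed, speed_of_sound=343.0):
    """Classical Doppler shift for a source moving toward (positive
    speed) or away from (negative speed) a stationary listener.
    Speeds are in m/s; 343 m/s is the speed of sound in air."""
    return freq_hz * speed_of_sound / (speed_of_sound - source_speed)

approaching = doppler_pitch(440.0, 30.0)   # pitch rises above 440 Hz
receding    = doppler_pitch(440.0, -30.0)  # pitch falls below 440 Hz
```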
Pitch Shifting
Pitch shifting varies a sound's pitch, its
distinctive quality, which depends primarily on the
frequency of the sound waves produced by its source. The
most common example of pitch shifting is
"chipmunk" voice effects and other voice
distortions, but pitch shifting is also useful for sound
effects like the revving of an engine. It is used
internally by 3D RSX to perform the Doppler effect.
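A minimal way to illustrate pitch shifting is resampling with linear interpolation: reading through the samples faster raises the pitch, reading slower lowers it. The function below is a conceptual sketch, not the algorithm 3D RSX uses internally:

```python
def pitch_shift(samples, factor):
    """Shift pitch by resampling with linear interpolation.
    factor > 1 raises the pitch ("chipmunk"); factor < 1 lowers it."""
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        # Interpolate between adjacent samples at the fractional position.
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        pos += factor
    return out

tone = [0.0, 1.0, 0.0, -1.0] * 4   # a short square-ish wave
higher = pitch_shift(tone, 2.0)    # half the samples, doubled pitch
```

Note that this simple approach also changes the sound's duration; more sophisticated pitch shifters preserve duration.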
Understanding True 3D Sound
In the past, three-dimensional sound required multiple
speakers positioned around you, plus expensive add-in
cards in the computer. 3D RSX requires just a stereo
sound card, a set of headphones or two standard PC
speakers, and the 3D RSX software library. With these
simple requirements met, any 3D RSX-enabled application
can provide true 3D sound.
How does 3D RSX manage to produce realistic,
positional sound with so little hardware?
It does so by using software, and the power of Intel's
Pentium® and Pentium® Pro processors, to simulate the
way humans hear sound in the real world. Specifically,
3D RSX uses Head-Related Transfer Functions (HRTF), an
application of fundamental research that's been occurring
at major universities for years.
Sounds in the real world are not stereo but monaural:
single channel. However, because you have two ears,
sounds appear as two-channel stereo; the two channels
correspond to your two ears. As a sound reaches your ear,
your brain and ears work together to distinguish the
origin of the sound. The position and shape of your head,
ears, shoulders and torso interact with incoming sound
waves in such a way that your brain-ears combination can
pinpoint the origin of nearly any sound that reaches your
ears. So, how does this work?
Determining Left vs. Right
Consider an example. If you're standing in a long
hallway and a door slams shut on your right, the sound
reaches your right ear earlier than it reaches your left
ear. (This is referred to as the Interaural Time Delay, or
ITD.) The sound is also slightly louder in your right ear
than in your left. (This is known as the Interaural
Intensity Difference, or IID.) Using these auditory cues,
your brain determines that the slammed door is located to
your right and not your left. In fact, your ears and
brain do this so well that you can probably determine
where the door is located even with your eyes closed.
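These two cues can be quantified. Woodworth's classical approximation from the psychoacoustics literature estimates the ITD from the azimuth of the sound source and the radius of the listener's head. The sketch below, with an assumed 8.75 cm head radius, is purely illustrative and is not drawn from the 3D RSX implementation:

```python
import math

def interaural_time_delay(azimuth_deg, head_radius=0.0875, c=343.0):
    """Woodworth's approximation of the ITD (in seconds) for a sound
    at the given azimuth (0 = straight ahead, 90 = hard right).
    head_radius is in meters; c is the speed of sound in m/s."""
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (math.sin(theta) + theta)

front = interaural_time_delay(0)    # 0 s: both ears hear it together
right = interaural_time_delay(90)   # roughly 0.65 ms head start for
                                    # the right ear
```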
Determining Front vs. Back
Your brain working in concert with your ears also
determines if a sound is located in front of or behind
you. The outer part of the ear (the pinna) acts as a
filter to provide the audio cues the brain needs to
determine if a sound is in front of or behind you. You
may also turn your head and/or obtain visual confirmation
that the sound source is where your ears tell you it is.
Determining Up vs. Down
The cues generated by the pinna also help establish
the elevation of a sound source. The height of a sound
source, relative to your ears, is determined from the
unique frequency patterns (notches) present in the signal
arriving at the ear drums. The reflections off the
pinna's asymmetric ridges produce an elevation-dependent
interference pattern. Sound sources from a higher
elevation generate a different interference pattern from
those originating at a low elevation. The brain
determines the height of a sound source by interpreting
these different frequency patterns.
HRTF: From the Lab to the Computer
Head-Related Transfer Functions are the key to
simulating all the information that is being passed
between our ears and our brain so that a computer can
produce the same effect. HRTFs are based on sound
measurements made on volunteers in an anechoic (no
echoes) chamber. Researchers placed clinical probe
microphones inside volunteers' ear canals and recorded
measurements at approximately 463 positions around their
heads.
Figure 1 A pair of HRTF impulse
responses and their respective spectra at 60 degrees
azimuth.
This measured set of impulse responses represents the
collection of directionally dependent characteristics
impressed upon the audio signal by the human torso,
shoulder, head, and outer ear. A pair of these impulse
responses (representing a single spatial position) and
their respective transfer functions are shown in
Figure 1. This set of data is collectively referred to as
Head-Related Transfer Functions. HRTFs are stored as a
data table in the 3D RSX software library, which uses
them to position sounds in three-dimensional space.
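Applying an HRTF pair amounts to convolving the mono source signal with the left-ear and right-ear impulse responses measured for the desired position. The example below uses made-up two-sample impulse responses purely to illustrate the operation:

```python
def convolve(signal, impulse):
    """Direct-form convolution of a mono signal with one impulse response."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out

def spatialize(mono, hrtf_left, hrtf_right):
    """Produce a (left, right) channel pair by filtering the mono
    source with the HRTF pair measured for one spatial position."""
    return convolve(mono, hrtf_left), convolve(mono, hrtf_right)

# Toy impulse responses: the right ear hears earlier and louder,
# as it would for a source off to the listener's right.
left, right = spatialize([1.0, 0.5], [0.0, 0.3], [0.6, 0.2])
```

In practice a library interpolates between the stored HRTF table entries as the source moves, so the filtering stays continuous.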
True 3D Sound over Headphones and Speakers
Because HRTF-based sound is measured via microphones
placed in volunteers' ears, the resulting sound is meant
to be funneled directly into the ears via headphones. 3D
RSX uses cross-talk cancellation technology to deliver the
same high-quality 3D sound using only two speakers.
Imagine your speakers positioned with one on the right
side of your monitor and the other on the left, as shown
in Figure 2.
Figure 2 Listening to 3D sound with
speakers
To ideally simulate headphones, all the sound from the
right speaker should go only to your right ear and all
the sound from the left speaker should go only to your
left ear. In reality, cross-talk occurs; that is, some
sound from the right speaker travels to your left ear,
and vice versa. 3D RSX uses cross-talk cancellation to
cancel the unwanted cross-talk signal and provide true 3D
sound, previously available only with headphones, now on
speakers as well.
Cross-talk cancellation generates an audio signal that
is mixed with the signal from the left speaker. The new,
mixed signal cancels the cross-talk signal that travels
from the right speaker to the left ear. Similarly, a
second signal cancels out the cross-talk from the left
speaker to the right ear. By employing cross-talk
cancellation, 3D RSX delivers the appropriate signal to
each ear and delivers true 3D sound over speakers.
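With a frequency-independent cross-talk gain, the cancellation idea reduces to inverting a 2x2 mixing matrix: solve for the two speaker signals so that, after the leakage is added back in, each ear receives exactly its intended signal. The sketch below is a deliberately simplified static model (real cross-talk is frequency- and delay-dependent), not the 3D RSX algorithm:

```python
def crosstalk_cancel(desired_left, desired_right, g):
    """Solve for speaker signals so each ear receives only its intended
    binaural signal, assuming each speaker leaks a fixed fraction g of
    its output to the opposite ear."""
    det = 1.0 - g * g
    speaker_left = (desired_left - g * desired_right) / det
    speaker_right = (desired_right - g * desired_left) / det
    return speaker_left, speaker_right

# Verify: feed the speaker signals back through the cross-talk model
# and recover exactly the desired ear signals.
sl, sr = crosstalk_cancel(1.0, 0.2, g=0.4)
ear_left = sl + 0.4 * sr    # direct sound + leakage from the right
ear_right = sr + 0.4 * sl   # direct sound + leakage from the left
```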
Figure 3 3D RSX Tray Tool icon (the
little headphones) shown next to the clock
3D RSX provides a tool to dynamically specify whether
headphones or speakers are being used. When a 3D
RSX-enabled application is running, the 3D RSX Tray Tool
icon appears in the lower right hand corner of the
Windows* tray (see Figure 3). When the user clicks the
3D RSX Tray Tool icon, the tool opens (Figure
4) to allow easy selection of speakers or headphones.
Using 3D RSX: Listeners, Emitters and More
3D RSX provides an intuitive interface modeled after
the interfaces found in 3D graphics libraries. The two
basic constructs in 3D RSX are a listener (the person
hearing the sound), and one or more emitters or sound
sources. Just as the position and orientation of a
graphical "camera" determines what is seen in a
3D graphics application, so the position and orientation
of the listener relative to the emitters determines what
is heard.
Figure 4 3D RSX Tray Tool for
selecting an audio peripheral
To use 3D RSX, an application creates a listener and
as many emitters as needed. Then the application simply
updates the position and orientation of all of the
objects to generate a realistic 3D sound experience.
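The flow of such an application can be sketched in a few lines. The classes and the simple linear attenuation below are hypothetical stand-ins for illustration only, not the actual 3D RSX COM interfaces:

```python
import math

class Emitter:
    """Hypothetical stand-in for a 3D RSX emitter: a positioned
    sound source producing sample values."""
    def __init__(self, position, sample):
        self.position = position
        self.sample = sample

def render(listener_pos, emitters, max_range=10.0):
    """Mix all audible emitters using simple linear distance
    attenuation. (The real library would apply HRTFs, Doppler and
    reverberation at this step.)"""
    mixed = 0.0
    for e in emitters:
        d = math.dist(listener_pos, e.position)
        gain = max(0.0, 1.0 - d / max_range)  # silent beyond max_range
        mixed += gain * e.sample
    return mixed

scene = [Emitter((0, 0, 0), 1.0), Emitter((0, 0, 20), 1.0)]
level = render((0, 0, 0), scene)  # only the near emitter is audible
```

An application would call something like `render` every frame, after updating the listener and emitter positions from user input or scene animation.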
Listeners: The 3D Sound "Camera"
3D RSX allows one listener per instance of 3D RSX, and
provides two types of listeners: direct and streaming.
The choice between direct and streaming listeners depends
on the needs of the application. Both types of listeners
allow developers to specify the listener's audio format,
position and orientation. All PCM multimedia formats are
supported: 8 kHz, 11 kHz, 22 kHz, 44 kHz, 48 kHz,
8/16-bit, monaural/stereo.
Direct Listeners
A direct listener writes 3D RSX's processed audio
output directly to the system audio device.
3D RSX works transparently on Microsoft's Windows* 95
and Windows NT* with wave* and/or DirectSound* and will
choose between the two for optimum operation depending on
the user's computer configuration. Application developers
need not concern themselves with dependencies specific to
wave or DirectSound. When using DirectSound, 3D RSX
utilizes secondary buffers to allow audio sharing between
applications and distinct instances of 3D RSX.
Streaming Listeners
A streaming listener tells 3D RSX to send its
processed output audio buffers back to the application
instead of writing to the audio device. Allowing the
application to receive these audio buffers provides
significant flexibility, including:
- The ability to modify 3D RSX output before it is
written to the audio device
- The ability to write 3D RSX processed output to
files, effectively providing a way to create 3D
RSX preprocessed files
- The ability to mix 3D RSX processed output with
audio from other sources
- Control over the pacing of 3D RSX rather than
using the pacing mechanisms built into the direct
listener
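The streaming-listener pattern, reduced to its essentials, hands each processed buffer back to application code. The class and method names below are hypothetical sketches, not the 3D RSX API:

```python
class StreamingListener:
    """Hypothetical sketch of the streaming-listener pattern:
    processed audio buffers come back to the application instead
    of going straight to the audio device."""
    def __init__(self, sink):
        self.sink = sink  # any callable that consumes one buffer

    def deliver(self, buffer):
        # The library would invoke this with each processed buffer;
        # the application decides what to do with it (modify it,
        # write it to a file, mix it, or pace it itself).
        self.sink(buffer)

captured = []
listener = StreamingListener(captured.append)  # e.g. record to memory
for chunk in ([0.1, 0.2], [0.3, 0.4]):
    listener.deliver(chunk)
```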
Emitters: Where Sound Begins
Emitters are akin to jukeboxes that generate sound and
have position and orientation in 3D space. The emitter's
position and orientation relative to the listener
determines what is heard. The farther away the listener
is from an emitter, the softer the sound is, and if an
emitter is far enough away, its sound is inaudible.
Likewise, if several emitters are within the range of
audibility, 3D RSX mixes their sounds together and plays
the result.
Specifying an Emitter's Sound Range
3D RSX uses the sound model adopted by the Virtual
Reality Modeling Language (VRML) 2.0 specification, a
platform independent way of representing 3D worlds on the
Internet, to specify the range of an emitter's sound. As
Figure 5 shows, this model defines two concentric
ellipsoids that together delimit a constant intensity
region, where the sound is at a constant volume in both
of the listener's ears, and an attenuation region, where
the sound grows louder or softer as the listener moves
toward or away from the source of the sound.
The inner ellipsoid, defined by a minimum front range
and minimum back range, identifies the ambient constant
intensity region. Being in this region is roughly
analogous to being right next to (or even inside) the
sound source. The region between the edge of the inner
ellipsoid and the edge of the outer ellipsoid, defined by
a maximum front range and maximum back range, is a
diffuse position dependent region where the volume
attenuates.
Figure 5 3D RSX sound source model
Directionality, the orientation of the emitter and
listener, is also a factor in the attenuation region.
If the listener moves away from the sound and leaves the
source behind, the sound seems to originate behind him.
Or, if he slides away from the sound source by always
keeping it to his left, the sound seems louder in his
left ear than in his right.
Beyond the outer ellipsoid, the sound intensity is set
to zero, which results in silence.
Flexibility in Using Emitters
The number of emitters that can be created in a 3D
RSX environment is limited only by the available memory
and CPU on the user's computer. 3D RSX also provides
considerable flexibility in using emitters. For example,
developers can:
- Turn off 3D sound effects for individual emitters
and use 3D RSX as a high-quality,
high-performance audio mixer. The advantage here
is that 3D RSX's file-based interface is far
simpler to use for mixing audio than the
buffer-based interfaces available elsewhere
- Simultaneously play multiple emitters of varying
audio formats
- Start, stop, pause and mute emitters individually
- Dynamically adjust the pitch of individual
emitters
- Cluster different emitters together as
"synchronization groups" so that one
operation can affect multiple emitters
3D RSX supports all PCM wave formats (8 kHz, 11 kHz, 22 kHz,
44 kHz, 48 kHz, 8/16-bit, mono/stereo). Non-PCM wave formats are
supported through Audio Compression Manager (ACM)
filters.
Cached and Streaming Emitters
3D RSX offers two types of emitters: cached emitters
and streaming emitters.
Cached emitters provide a simple way to specify a wave
file to play. Some of the features specific to cached
emitters are:
- VCR-like controls: Play, Pause, Resume and Stop
- Support for a variety of sources:
- Local file
- Network file
- Audio file embedded in an executable as a
resource
- URL address
- Support for wave data and MIDI data
- Ability to play segments of files
- Ability to receive playback status
- Complete interoperability with streaming emitters
Streaming emitters are responsible for continuously
feeding buffers into 3D RSX. Streaming emitters are ideal
for:
- Streaming audio data from the Internet and other
networks into 3D RSX. Instead of waiting for an
entire file to download, users can start playing
audio almost instantly
- Integrating 3D RSX with multipoint conferencing
and chat applications
- Providing audio from other sources like video
files
- Dynamically creating audio data
- Modifying audio data before sending it to 3D RSX
What happens when you combine 3D RSX's realistic sound
features with cached and streaming emitters in a
connected PC application? Consider the possibilities for
an Internet-based 3D world/chat application.
You receive a DVD or CD-ROM disk in the mail that
contains hundreds of megabytes of graphics, videos
and high-quality audio files. The data contained on
the disk produces a media-rich, virtual world that
responds instantly to your actions. As you begin to
explore this world, you hear a bird above your head
(a cached emitter based on a wave file) and a
dog barking (another cached emitter) behind
you. You enter a concert hall where a beautiful
symphony is being played (cached emitter based on
a MIDI file with the appropriate reverberation
settings).
You begin to realize that you are all alone in
this virtual world, and you decide to click on the
"Connect" button. Your modem links you to a
server on the Internet, and almost immediately you
notice other people (creatures and avatars)
"walking" around.
A grandmother avatar calls to you. You hear her
sweet voice (a streaming emitter) behind you
and turn around to talk with her. With your
full-duplex audio card, you can speak into your
microphone (another streaming emitter) and
still hear her voice. As she paces back and forth,
the true 3D sound helps you follow her movement.
Suddenly you hear a deep voice behind you:
"Hand over your wallet." (another
streaming emitter) You turn to see the avatar of
your good friend, Mark. As a prank, he is trying a
"virtual mugging." You laugh while
adjusting the pitch of your voice, so Mark hears what
seems to be a laughing "chipmunk" (pitch
shifting). You and Mark then "walk" to
the virtual stadium to "watch" a football
game. You're so close to the action you can hear the
thud of leather connecting with pigskin at the
kickoff (true 3D sound, cached emitter) as well
as the roar of the crowd surrounding you (true 3D
sound, reverberation, another cached emitter).
In addition, this application could also utilize a
streaming listener to save an audio log to record on disk
any of the conversations that were held in the 3D world.
3D RSX Scalability
3D RSX is scalable. It contains several different
algorithms for reverberation, Doppler and 3D sound, and
selects the appropriate ones for the computer on which it
is executing. For developers, this means 3D RSX
applications can run, and sound great, on a wide
range of Pentium and Pentium Pro processor-based
computers. For end users, it means that as you migrate to
newer and more powerful computers, your realistic 3D
sound experience will sound even better. Because 3D RSX
is highly optimized for Pentium and Pentium Pro
processor-based PCs, with and without MMX™
technology, it runs well on today's machines and even
better on tomorrow's machines.
3D RSX and the Internet
Any application that works on and with the Internet
can use 3D RSX and will find URL-based cached emitters
and streaming emitters to be especially useful. 3D RSX
provides a choice of ways to incorporate 3D sound into
Internet computing:
- VRML 2.0. From the beginning 3D RSX was designed
to implement all the features of the Sound Node
in the VRML 2.0 specification. In fact, 3D RSX is
the world's first VRML 2.0 audio solution
available for the PC. Developers of VRML 2.0
browsers and authoring tools can use 3D RSX to
implement the VRML 2.0 sound specification.
- ActiveX*. Intel provides an ActiveX control that
enables scripting languages such as Visual
Basic*, JavaScript* and VBScript* to access 3D
RSX.
- Java*. Intel offers two solutions to enable Java
applications to use 3D RSX. One solution takes
advantage of the Microsoft mechanism for allowing
COM and Java to cooperate. This Java solution
currently works only with the Microsoft Java
Virtual Machine and supports only cached emitters
and direct listeners. A second Java
implementation is the Intel Spatial Audio for
Java package. This package supports all features
of 3D RSX and works with browsers from both
Microsoft and Netscape.
For More Information
3D RSX is a breakthrough in sound for personal
computers. It blends the four major ingredients of
realistic sound into an easy-to-use, flexible package
that can heighten the realism and impact of your
applications on Pentium or Pentium Pro processor-based
computers.
To obtain a copy of 3D RSX or to get more information,
visit the Intel 3D RSX site on the World Wide Web at:
http://developer.intel.com/ial/rsx
Features Available in 3D RSX
- True 3D Sound: dynamic, interactive, real-time
positional audio
- Open Speaker Cross-Cancellation
- Scalability
- Support for streaming sound sources and streaming
output
- Dynamic, real-time reverberation
- Dynamic, real-time Doppler effect
- Dynamic pitch adjustment
- MIDI-based sound sources
- Works with both wave and DirectSound output
- Accurate, intuitive sound source definition
compatible with VRML 2.0
- Support for both left- and right-handed 3D
coordinate systems
- Support for synchronizing sound sources
- Support for all PCM audio formats
- Sound sources support non-PCM audio formats
through ACM
- High-performance, high-quality sample rate
conversion
- Support for sharing audio device with other
applications
- Works well with the Internet
- Easy-to-use COM interfaces
- VRML 2.0 compliance
- High-performance, high-quality mixing of multiple
sample rates
- 3D RSX Tray Tool to dynamically adjust audio
peripheral
- Enhanced for Intel Pentium and Pentium Pro
processors, with and without MMX™ technology
- Configuration/diagnostic tool for troubleshooting
and fine-tuning 3D RSX
- Backward compatibility with RSX 1.0 and RSX 2.0
3D RSX SDK
The 3D RSX Software Development Kit provides the
following features in addition to those in the runtime
version:
- Sample code (in C and C++)
- Extensive documentation in HTML format
- Sample code on integrating 3D RSX with a typical
3D graphics library
- Runtime installation for Installshield* and
non-Installshield users
- Debug 3D RSX .dlls for easier development
- Java support available through the Intel
Spatial Audio for Java package
- ActiveX support available separately
Where Can Intel's 3D Realistic Sound Experience Add
Value?
1) Games
2) VRML browsing environments
3) Chat spaces
4) 3D Virtual Worlds/Socialization applications
5) Authoring tools for VRML
6) Audio file authoring/editing tools
7) Authoring tools for 3D graphics
8) A generic filter (such as in the context of ACM or
ActiveMovie*)
9) As a high-quality, high-performance, easy-to-use audio
mixer
10) To provide audio cues in 3D user interfaces and data
visualization applications
11) And much more!