

Chapter 5: The Transformation of Sound by Computer

Section 5.3: Localization/Spatialization

Applet 5.5
Filter-based localization

This applet lets you pan, fade, and simulate binaural listening (best if listened to with headphones).

Soundfile 5.8
Binaural location

Using the program SoundHack by Tom Erbe, we can place a sound anywhere in the binaural (two ears) space. SoundHack does this by convolving the sound with known filter functions that simulate the ITD (interaural time delay) between the two ears.


Close your eyes and listen to the sounds around you. How well can you tell where they’re coming from? Pretty well, hopefully! How do we do that? And how could we use a computer to simulate moving sound so that, for example, we can make a car go screaming across a movie screen or a bass player seem to walk over our heads?

Humans have a pretty complicated system for perceptually locating sounds, involving, among other factors, the relative loudness of the sound in each ear, the time difference between the sound’s arrival in each ear, and the difference in frequency content of the sound as heard by each ear. How would a "cyclaural" (the equivalent of a "cyclops") hear? Most attempts at spatializing, or localizing, recorded sounds make use of some combination of factors involving the two ears on either side of the head.

Simulating Sound Placement

Simulating a loudness difference is pretty simple—if someone standing to your right says your name, their voice is going to sound louder in your right ear than in your left. The simplest way to simulate this volume difference is to increase the volume of the signal in one channel while lowering it in the other—you’ve probably used the pan or balance knob on a car stereo or boombox, which does exactly this. Panning is a fast, cheap, and fairly effective means of localizing a signal, although it can often sound artificial.
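Here is a minimal sketch of that kind of pan control, assuming a mono signal stored as a plain list of samples. The function name `pan` and the convention that the position runs from -1.0 (hard left) to +1.0 (hard right) are our own choices for illustration, and the linear gain law is just the simplest option, not the only one.

```python
def pan(samples, position):
    """Split a mono signal into (left, right) channels.

    position: -1.0 is hard left, 0.0 is center, +1.0 is hard right
    (a hypothetical convention chosen for this sketch).
    """
    # Keep the position inside the allowed range.
    position = max(-1.0, min(1.0, position))
    # Linear law: as one channel's gain goes up, the other's goes down.
    left_gain = (1.0 - position) / 2.0
    right_gain = (1.0 + position) / 2.0
    left = [s * left_gain for s in samples]
    right = [s * right_gain for s in samples]
    return left, right


# A voice "standing to your right": louder in the right channel.
left, right = pan([1.0, -1.0, 0.5], 0.6)
```

Real mixers often use a constant-power curve instead of a straight line so that the overall loudness stays steadier as the sound moves, but the idea is the same.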

Interaural Time Delay (ITD)

Simulating a time difference is a little trickier, but it adds a lot to the realism of the localization. Why would a sound reach your ears at different times? After all, aren’t our ears pretty close together? We’re generally not even aware of any difference: snap your fingers on one side of your head, and you’ll think that you hear the sound in both ears at exactly the same time.

But you don’t. Sound moves at a specific speed, and it’s not all that fast (compared to light, anyway): about 345 meters/second. Since your fingers are closer to one ear than the other, the sound waves will arrive at your ears at different times, if only by a small fraction of a second. Since most of us have ears that are quite close together, the time difference is very slight—too small for us to consciously "perceive."

Let’s say your head is a bit wide: roughly 25 cm, or a quarter of a meter. It takes sound around 1/345 of a second to go 1 meter, which is approximately 0.003 second (3 thousandths of a second). It takes about a quarter of that time to get from one ear of your wide head to the other, which is about 0.0007 second (0.7 thousandths of a second). That’s a pretty small amount of time! Do you believe that our brains perceive that tiny interval and use the difference to help us localize the sound? We hope so, because if there’s a frisbee coming at you, it would be nice to know which direction it’s coming from! In fact, though, the delay is even smaller because your head’s smaller than 0.25 meter (we just rounded it off for simplicity). The technical name for this delay is interaural time delay (ITD).
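The arithmetic above is easy to check for yourself. This short sketch uses the speed of sound and the quarter-meter head width from the text. The 44,100 samples-per-second rate is our own assumption (it is the usual CD rate), included only to show how tiny the delay is in digital terms.

```python
SPEED_OF_SOUND = 345.0  # meters per second, the figure used in the text
SAMPLE_RATE = 44100     # samples per second (an assumption: the CD rate)


def travel_time(distance_m, speed=SPEED_OF_SOUND):
    """Seconds it takes sound to cover distance_m meters."""
    return distance_m / speed


one_meter = travel_time(1.0)      # about 0.0029 second
across_head = travel_time(0.25)   # about 0.0007 second
in_samples = across_head * SAMPLE_RATE  # about 32 samples

print(f"1 meter: {one_meter:.4f} s")
print(f"across a 0.25 m head: {across_head:.5f} s")
print(f"that is roughly {in_samples:.0f} samples at {SAMPLE_RATE} Hz")
```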

To simulate ITD by computer, we simply need to add a delay to one channel of the sound. The longer the delay, the more the sound will seem to be panned to one side or the other (depending on which channel is delayed). The delays must be kept very short so that, as in nature, we don’t consciously perceive them as delays, just as location cues. Our brains take over and use them to calculate the position of the sound. Wow!
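As a rough sketch of that idea, the function below delays one channel by padding it with silence. The name `apply_itd`, its parameters, and the default sample rate are hypothetical, chosen just for this example. Rounding the delay to a whole number of samples is a simplification; a more careful version would interpolate between samples.

```python
def apply_itd(samples, delay_seconds, sample_rate=44100, delay_right=True):
    """Turn a mono signal into (left, right) with one channel delayed.

    Delaying the right channel makes the sound seem to come from the left,
    and vice versa. Both channels come back the same length.
    """
    # Convert the delay from seconds to a whole number of samples.
    delay_samples = round(delay_seconds * sample_rate)
    silence = [0.0] * delay_samples
    delayed = silence + list(samples)   # starts late
    direct = list(samples) + silence    # starts on time, padded at the end
    if delay_right:
        return direct, delayed
    return delayed, direct


# Delay the right channel by 0.0005 second (about 22 samples at 44,100 Hz).
left, right = apply_itd([1.0, 0.5, 0.25], 0.0005)
```

Keeping `delay_seconds` below roughly 0.0007 second matches the largest delay worked out above for a quarter-meter head.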

Modeling Our Ears and Our Heads

That the ears perceive and respond to a difference in volume and arrival time of a sound seems pretty straightforward, albeit amazing. But what’s this about a difference in the frequency content of the sound? How could the position of a bird change the spectral makeup of its song? The answer: your head!

Imagine someone speaking to you from another room. What does the voice sound like? It’s probably a bit muffled or hard to understand. That’s because the wall through which the sound is traveling—besides simply cutting down the loudness of the sound—acts like a low-pass filter. It lets the low frequencies in the voice pass through while attenuating or muffling the higher ones.

Your head does the same thing. When a sound comes from your right, it must first pass through, or go around, your head in order to reach your left ear. In the process, your head absorbs, or blocks, some of the high-frequency energy in the sound. Since the sound didn’t have to pass through your head to get to your right ear, there is a difference in the spectral makeup of the sound that each ear hears. As with ITD, this is a subtle effect, although if you’re in a quiet room and you turn your head from side to side while listening to a steady sound, you may start to perceive it.
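A very crude way to imitate this "head shadow" is to run the far ear's signal through a simple low-pass filter and leave the near ear alone. The sketch below uses a one-pole low-pass filter as a stand-in. The function names and the 1,500 Hz cutoff are our own assumptions for illustration, not measured properties of anybody's head.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate=44100):
    """Simple one-pole low-pass filter: passes lows, attenuates highs."""
    # Smoothing coefficient derived from the cutoff frequency.
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out = []
    y = 0.0
    for x in samples:
        y += a * (x - y)  # move part of the way toward the new input
        out.append(y)
    return out


def head_shadow(samples, cutoff_hz=1500.0, sample_rate=44100):
    """Sound coming from the right: right ear hears it directly,
    left ear hears a duller, low-pass-filtered copy.
    The cutoff is a made-up value, not a measurement."""
    right = list(samples)
    left = one_pole_lowpass(samples, cutoff_hz, sample_rate)
    return left, right
```

A real head does not behave like a single fixed filter (the amount of shadowing depends on the direction of the sound), which is where measured data comes in.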

Modeling this by computer is easy, provided you know something about how the head filters sounds (what frequencies are attenuated and by how much). If you’re interested in the frequency response of the human head, there are a number of published sources available for the data, since they are used by, among others, the government for all sorts of things (like flight simulators, for example). Researcher and author Durand Begault has been a pioneer in the design and implementation of what are called head-related transfer functions (HRTFs): frequency response curves for different locations of sound.
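Once you have filter data for a given direction, one common way to apply it is to convolve the sound with a pair of impulse responses, one per ear. The sketch below shows only the mechanics. The function names are hypothetical, and the two "toy" impulse responses are invented for illustration; they are not measured data from any published source.

```python
def convolve(signal, impulse_response):
    """Direct (slow but simple) convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binaural_place(mono, left_ir, right_ir):
    """Filter a mono signal separately for each ear."""
    return convolve(mono, left_ir), convolve(mono, right_ir)


# Invented impulse responses for a sound on the right (NOT measured data):
# the right ear gets the sound unchanged, while the left ear gets it
# two samples later, quieter, and slightly smeared.
toy_right_ir = [1.0]
toy_left_ir = [0.0, 0.0, 0.3, 0.2]

left, right = binaural_place([1.0, 0.5], toy_left_ir, toy_right_ir)
```

Notice that a single pair of impulse responses can carry all three cues at once: the loudness difference, the time delay, and the change in frequency content.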

What Are Head-Related Transfer Functions (HRTFs)?

Figure 5.10  This illustration shows how the spectral contents of a sound change depending on which direction the sound is coming from. The body (head and shoulders) and the time-of-arrival difference that occurs between the left and right ears create a filtering effect.

Figure 5.11  The binaural dummy head recording system includes an acoustic baffle with the approximate size, shape, and weight of a human head. Small microphones are mounted where our ears are located.

This recording system is designed to emulate the acoustic effects of the human head (just as our ears might hear sounds) and then capture the information on recording media.

A number of recording equipment manufacturers make these "heads," and they often have funny names (Sven, etc.).

Thanks to Sonic Studios for this photo.

Not surprisingly, humans are extremely adept at locating sounds in two dimensions, that is, in the horizontal plane. We’re great at figuring out the source direction of a sound, but not the height. When a lion is coming at us, it’s nice of evolution to have provided us with the ability to know, quickly and without much thought, which way to run. It’s perhaps more of a surprise that we’re less adept at locating sounds in the third dimension, or more accurately, in the "up/down" axis. But we don’t really need this ability. We can’t jump high enough for that perception to do us much good. Barn owls, on the other hand, have little filters on their cheeks, making them extraordinarily good at sensing how far above or below them a sound is. You would be good at sensing sonic altitude, too, if you had to catch and eat, from the air, rapidly running field mice. So if it’s not a frisbee heading at you more or less in the two-dimensional plane, but a softball headed straight down toward your head, we’d suggest a helmet!
