
   

Chapter 2: The Digital Representation of Sound,
Part Two: Playing by the Numbers

Section 2.8: Compression

 

When we take 44,100 samples per second, each of them stored as a 16-bit value, we're building up a whole heck of a lot of bits. In fact, it's too many bits for most purposes. While it's not too wasteful if we want an hour of high-quality sound on a CD, it's kind of unwieldy if we need to download or send it over the Internet, or store a bunch of it on our home hard drive. Even though high-quality sound data aren't anywhere near as large as image or video data, they're still too big to be practical. What can we do to reduce the data explosion?
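To get a feel for just how many bits we're talking about, here's a quick back-of-the-envelope calculation as a Python sketch (the two-channel figure is an assumption based on standard stereo CD audio):

# How many bits does uncompressed CD-quality audio take up?
sample_rate = 44100      # samples per second, per channel
bits_per_sample = 16
channels = 2             # standard CD audio is stereo

bits_per_second = sample_rate * bits_per_sample * channels
megabytes_per_minute = bits_per_second * 60 / 8 / 1e6

print(bits_per_second)                  # 1411200 bits every second
print(round(megabytes_per_minute, 1))   # about 10.6 MB per minute

That's on the order of 10 megabytes for every minute of sound, which makes the appeal of compression obvious.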

If we keep in mind that we’re representing sound as a kind of list of symbols, we just need to find ways to express the same information in a shorter string of symbols. That’s called data compression, and it’s a rapidly growing field dealing with the problems involved in moving around large quantities of bits quickly and accurately.

The goal is to store the most information in the smallest amount of space, without compromising the quality of the signal (or at least, compromising it as little as possible). Compression techniques and research are not limited to digital sound—data compression plays an essential part in the storage and transmission of all types of digital information, from word-processing documents to digital photographs to full-screen, full-motion videos. As the amount of information in a medium increases, so does the importance of data compression.

What is compression exactly, and how does it work? There is no one thing that is "data compression." Instead, there are many different approaches, each addressing a different aspect of the problem. We’ll take a look at just a couple of ways to compress digital audio information. What’s important about these different ways of compressing data is that they tend to illustrate some basic ideas in the representation of information, particularly sound, in the digital world.

Eliminating Redundancy

There are a number of classic approaches to data compression. The first, and most straightforward, is to try to figure out what’s redundant in a signal, leave it out, and put it back in when needed later. Something that is redundant could be as simple as something we already know. For example, examine the following messages:

YNK DDL WNT T TWN, RDNG N PNY

or

DNT CNT YR CHCKNS BFR THY HTCH

It’s pretty clear that leaving out the vowels makes the phrases shorter yet still fairly easy to reconstruct. Other phrases may not be as clear and may need a vowel or two. However, the messages stay unambiguous only because, in these particular cases, we already know what they say, and we’re simply storing something to jog our memory. That’s not too common.
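For the curious, here's the vowel-dropping scheme as a tiny Python sketch (working word by word lets us drop the lone "A" entirely, just as the compressed phrase above does):

# Drop the vowels from a message: a crude, lossy "compression" scheme.
message = "YANKEE DOODLE WENT TO TOWN, RIDING ON A PONY"
words = (w.translate(str.maketrans("", "", "AEIOU")) for w in message.split())
print(" ".join(w for w in words if w))   # YNK DDL WNT T TWN, RDNG N PNY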

Now say we need to store an arbitrary series of colors:

blue blue blue blue green green green red blue red blue yellow

This is easy to shorten to:

4 blue 3 green red blue red blue yellow

In fact, we can shorten that even more by saying:

4 blue 3 green 2 (red blue) yellow

We could shorten it even more, if we know we’re only talking about colors, by:

4b3g2(rb)y

We can reasonably guess that "y" means yellow. The "b" is more problematic, since it might mean "brown" or "black," so we might have to use more letters to resolve its ambiguity. This simple example shows that a reduced set of symbols will suffice in many cases, especially if we know roughly what the message is "supposed" to be. Many complex compression and encoding schemes work in this way.
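This counting trick is known as run-length encoding, and the first shortening step above takes only a few lines of Python. (Spotting repeated patterns like "2 (red blue)" requires considerably more machinery, so this sketch stops at simple runs.)

# Run-length encoding: replace each run of repeated symbols with a count.
from itertools import groupby

colors = "blue blue blue blue green green green red blue red blue yellow".split()

encoded = []
for color, run in groupby(colors):   # groupby collects consecutive repeats
    count = len(list(run))
    encoded.append(f"{count} {color}" if count > 1 else color)

print(" ".join(encoded))   # 4 blue 3 green red blue red blue yellow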

Perceptual Encoding

A second approach to data compression is similar. It also tries to get rid of data that do not "buy us much," but this time we measure the value of a piece of data in terms of how much it contributes to our overall perception of the sound.

Here’s a visual analogy: if we want to compress a picture for people or creatures who are color-blind, then instead of having to represent all colors, we could just send black-and-white pictures, which as you can well imagine would require less information than a full-color picture. However, now we are attempting to represent data based on our perception of it. Notice here that we’re not using numbers at all: we’re simply trying to compress all the relevant data into a kind of summary of what’s most important (to the receiver). The tricky part of this is that in order to understand what’s important, we need to analyze the sound into its component features, something that we didn’t have to worry about when simply shortening lists of numbers.

temperature: 76°F

humidity: 35%

wind: north-east at 5 MPH

barometer: falling

clouds: none

 =  It’s a nice day out!

Figure 2.24  We humans use perception-based encoding all the time. If we didn't, we’d have very tedious conversations.


Xtra bit 2.6
MP3


Soundfile 2.8
128 Kbps


Soundfile 2.9
64 Kbps


Soundfile 2.10
32 Kbps

MP3 is the current standard for data compression of sound on the web. But keep in mind that these compression standards change frequently as people invent newer and better methods.

Soundfiles 2.8, 2.9, and 2.10 were all compressed into the MP3 format but at different bit rates. The lower the bit rate, the more degradation. (Kbps means kilobits per second.)

Perceptually based sound compression algorithms usually work by eliminating numerical information that is not perceptually significant and just keeping what’s important.

µ-law ("mu-law") encoding is a simple, common, and important perception-based compression technique for sound data. It’s an older technique, but it’s far easier to explain here than a more sophisticated algorithm like MP3, so we’ll go into it in a bit of detail. Understanding it is a useful step toward understanding compression in general.

µ-law is based on the principle that our ears are far more sensitive to amplitude changes at low amplitudes than at high ones. That is, we easily notice a small change in the loudness of a soft sound, but the same small change between two loud sounds goes largely unheard. µ-law compression takes advantage of this phenomenon by mapping 16-bit sample values onto an 8-bit µ-law table like Table 2.6.

0 8 16 24 32 40 48 56
64 72 80 88 96 104 112 120
132 148 164 180 196 212 228 244
260 276 292 308 324 340 356 372

Table 2.6  Entries from a typical µ-law table. The complete table consists of 256 entries spanning the 16-bit numerical range from –32,124 to 32,124. Half the range is positive, and half is negative. This is often the way sound values are stored.

Notice how the range of numbers is divided logarithmically rather than linearly, giving more precision at lower amplitudes and less at higher ones. In other words, to our ears, loud sounds are just loud sounds.

To encode a µ-law sample, we start with a 16-bit sample value, say 330. We then find the entry in the table that is closest to our sample value. In this case, it would be 324, which is the 28th entry (starting with entry 0), so we store 28 as our µ-law sample value. Later, when we want to decode the µ-law sample, we simply read 28 as an index into the table, and output the value stored there: 324.

You might be thinking, "Wait a minute! Our original sample value was 330, but now we have a value of 324. What good is that?" While it’s true that we lose some accuracy when we encode µ-law samples, we still get much better sound quality than if we had just used regular 8-bit samples.

Here’s why: in the low-amplitude range of the µ-law table, our encoded values are only going to be off by a small margin, since the entries are close together. For example, if our sample value is 3 and it’s mapped to 0, we’re only off by 3. But since we’re dealing with 16-bit samples, which have a total range of 65,536, being off by 3 isn’t so bad. As amplitude increases we can miss the mark by much greater amounts (since the entries get farther and farther apart), but that’s OK too—the whole point of µ-law encoding is to exploit the fact that at higher amplitudes our ears are not very sensitive to amplitude changes. Using that fact, µ-law compression offers near-16-bit sound quality in an 8-bit storage format.
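To make the table lookup concrete, here's a small Python sketch that encodes and decodes using just the 32 entries printed from Table 2.6. (A real µ-law codec covers the full 256-entry table, handles negative values, and uses a formula rather than a search, so treat this purely as an illustration.)

# Table-lookup encoding/decoding with the 32 entries shown in Table 2.6.
TABLE = [0, 8, 16, 24, 32, 40, 48, 56,
         64, 72, 80, 88, 96, 104, 112, 120,
         132, 148, 164, 180, 196, 212, 228, 244,
         260, 276, 292, 308, 324, 340, 356, 372]

def encode(sample):
    # Store the index of the table entry closest to the sample value.
    return min(range(len(TABLE)), key=lambda i: abs(TABLE[i] - sample))

def decode(index):
    # Decoding is just a lookup.
    return TABLE[index]

print(encode(330))           # 28
print(decode(encode(330)))   # 324 -- close to, but not exactly, 330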

Prediction Algorithms

A third type of compression technique involves attempting to predict what a signal is going to do (usually in the frequency domain, not in the time domain) and only storing the difference between the prediction and the actual value. When a prediction algorithm is well tuned for the data on which it’s used, it’s usually possible to stay pretty close to the actual values. That means that the difference between your prediction and the real value is very small and can be stored with just a few bits.

Let’s say you have a sample value range of 0 to 65,535 (a 16-bit range, in all positive integers) and you invent a magical prediction algorithm that is never more than 128 units above or below the actual value. You now only need 8 bits (enough for 256 different values, say –128 to 127) to store the difference between your predicted value and the actual value. You might even keep a running average of the actual differences between sample values, and use that adaptively as the range of numbers you need to represent at any given time. Pretty neat stuff! In actual practice, coming up with such a good prediction algorithm is tricky, and what we’ve given here is an extremely simplified picture of how prediction-based compression techniques really work.
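As a concrete (if extreme) simplification, here's a Python sketch of the crudest possible predictor: guess that each sample will simply repeat the previous one, and store only the prediction errors. This special case is a close relative of the delta modulation mentioned in Xtra bit 2.7 below.

# The crudest predictor: assume each sample repeats the previous one,
# and store only the (hopefully small) prediction errors.
def delta_encode(samples):
    predicted, residuals = 0, []
    for s in samples:
        residuals.append(s - predicted)   # small whenever the signal moves slowly
        predicted = s
    return residuals

def delta_decode(residuals):
    predicted, samples = 0, []
    for r in residuals:
        predicted += r
        samples.append(predicted)
    return samples

waveform = [1000, 1004, 1011, 1012, 1008, 1003]
print(delta_encode(waveform))                      # [1000, 4, 7, 1, -4, -5]
print(delta_decode(delta_encode(waveform)) == waveform)  # True

Notice that after the first sample, every residual fits comfortably in far fewer bits than the original values.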


Xtra bit 2.7
Delta modulation




The Pros and Cons of Compression Techniques

Each of the techniques we’ve talked about has advantages and disadvantages. Some are time-consuming to compute but accurate; others are simple to compute (and understand) but less powerful. Each tends to be most effective on certain kinds of data. Because of this, many of the actual compression implementations are adaptive—they employ some variable combination of all three techniques, based on qualities of the data to be encoded.

A good example of a currently widespread adaptive compression technique is the MPEG (Moving Picture Experts Group) standard now used on the Internet for the transmission of both sound and video data. MPEG audio (whose most popular incarnation, MPEG-1 Audio Layer III, is better known as MP3) is now the standard for high-quality sound on the Internet and is rapidly becoming an audio standard for general use. A description of how MPEG audio really works is well beyond the scope of this book, but it might be an interesting exercise for the reader to investigate further.
