Mediascape: JPEG, Standard Compression for Still Images

Last month, Mediascape detailed the technical foundation for the entire digital world: digital sampling of analog signals for sound, still pictures and video. In this two-part article, we will examine techniques from two standards groups: the Joint Photographic Experts Group, or JPEG, for compressing still images, and the Moving Picture Experts Group, or MPEG, for motion video. MPEG will be next month’s topic.

There is no shortage of techniques for compressing digital information. Computer bulletin boards routinely compress files before making them available to the public, using programs such as Stuffit and PKZip. Utilities such as Disk Doubler and its many brethren get more mileage out of a computer’s hard disk by compressing information on the fly as it is being written, then decompressing it whenever it is read. As a class, all of these products can usually squeeze a file down to somewhere between 60 and 40 percent of its original size.

But those products labor under an awful constraint: they must be able to reconstruct the data perfectly. If you don’t need perfect data, you can compress the data a lot harder. Sounds and images are in this category; the difference between a perfect reproduction and a close facsimile is usually unnoticeable. This is fortunate, because these data types consume prodigious amounts of storage and need all the compression they can get.

“Tanstaafl.” Information theory reiterates what economics and thermodynamics have long known: “there ain’t no such thing as a free lunch.” If you want to have a snappy picture with good color range (especially in the highlights and shadows) and well-defined details, you’re going to pay for it. Fortunately, you get a choice of how to pay: by using lots of storage (a brute-force approach that requires no special computation), or by using lots of computing power to compress and reconstruct the data. With today’s technology, the choice is a toss-up as long as the picture stays on your computer’s hard disk; but the minute you want to transmit images to other machines, the scales tip toward compression.

Compression works because there is usually a lot of redundant information in the original data. Where there is no redundancy, nearly all compression algorithms make things worse rather than better. There are only two tricks you can use to compress still images. (For moving images, there are a few more.) All of the products in use today are variations on these themes.

Trick one: Exploit the statistics. Image data is rarely random. If you know what kind of regularities to expect, you can encode the data very efficiently. A vast number of coding schemes have been developed, each suitable for a different class of regularities. Examples that are relevant here include:

• Color-space transform. Although RGB (red-green-blue) data is what cameras generate and monitors require, it really isn’t a very compact way to describe color. The trouble is that the RGB color values bunch up at the ends of the scale and are spread thin in the middle. Other color spaces, such as YUV (see the “gory details” sidebar), make more uniform use of all the numerical values. For good-quality color, you need 24 bits of RGB per pixel. By recoding the image in YUV, you can get the same quality with 16 bits per pixel.

• Run-length coding. If your data has a pattern that repeats over and over, you can send the pattern once, along with a count of the number of repetitions.

• Huffman coding. If some patterns are quite common while other patterns are relatively rare, you can assign the shortest codes to the patterns that occur most often. This is not a new idea; International Morse Code uses it. The most common letter in Western languages is E, which is sent as a single dit; the next most common is T, which is a single dah; and so forth down to Q = dah dah dit dah. (Incidentally, there’s a nice story on Huffman coding in the September 1991 issue of Scientific American.)
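
To make the run-length idea concrete, here is a minimal sketch in Python. The function names and the sample scan line are ours, chosen for illustration; real products pack the counts into a compact binary format rather than a list of pairs.

```python
from itertools import groupby

def rle_encode(data):
    """Collapse each run of identical values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(data)]

def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out

# One hypothetical scan line of a black-and-white image.
row = [255] * 6 + [0] * 3 + [255] * 2
packed = rle_encode(row)
print(packed)                      # [(255, 6), (0, 3), (255, 2)]
assert rle_decode(packed) == row   # the round trip is lossless
```

Note that a line with no repeats would come out longer than it went in, which is the "no redundancy" case mentioned earlier.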

All the above methods allow lossless compression: when you decompress the signal, you get the original back undamaged. (Well, more or less: color-space transforms suffer from round-off errors if you don’t maintain enough precision in your calculations.) You can get somewhere between 2:1 and 10:1 compression ratios with various combinations of these techniques.
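
Huffman coding is just as easy to sketch. The Python below is an illustration under our own assumptions (function names, and a text string standing in for image data); it builds a code table in which frequent symbols get short bit strings, then checks that decoding returns the original.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return {symbol: bit string}; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    # Heap entries are (weight, tie_breaker, partial code table).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate one-symbol input
        return {sym: "0" for sym in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # the two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    reverse = {bits_: sym for sym, bits_ in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:              # prefix code: first match is right
            out.append(reverse[current])
            current = ""
    return "".join(out)

text = "eeeeeeeettttaq"
code = huffman_code(text)
bits = encode(text, code)
assert decode(bits, code) == text           # lossless
print(len(bits), "bits instead of", 8 * len(text))
```

As with Morse code, the common "e" ends up with the shortest code and the rare "q" with one of the longest.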

Trick two: Suppress information. As it happens, the human eye doesn’t perceive all signals equally; it is tuned to pay more attention to some features than others. The trick, then, is to ditch the features that won’t be missed anyhow. For example, the eye is sensitive to fine-edge detail and to smooth color transitions, but not to both in the same place.

One such scheme, which was devised by the Joint Photographic Experts Group (JPEG) and subsequently adopted as an ISO standard, achieves much higher compression factors than the lossless techniques by selectively ignoring one kind of picture information or the other. The more information you are willing to sacrifice, the higher the compression factor you can get.

The redeeming grace is that all of the quality loss takes place in the first cycle of compression and decompression. Additional compression cycles will do no further damage unless you request a higher degree of compression. The only proviso to that last statement is that if you call up a compressed image and edit it (by electronic airbrushing, despeckling, sharpening and so on), you may suffer new damage to the edited portions when you recompress. This should be no surprise; such editing is actually creating new information, which now is being compressed for the first time.

How good is JPEG? Of course, if you go throwing information away, you won’t be able to reconstruct the image with perfect fidelity. How much information can you throw away before the loss in quality becomes objectionable? As it turns out, for images there’s no single answer to this question. For one thing, it depends on the application; an art museum catalog, an engine repair manual and a daily newspaper have very different requirements.

Based on the results of commercial JPEG products we’ve seen, we think you will find that, overall, they give a lot of compression for relatively little ugliness. The size of the output file depends greatly on the content of the original image (some images just don’t compress very well) and on the amount of data you throw away. Most JPEG compression products let you choose from a range of quality/file-size tradeoffs. As a rule of thumb, for good-quality printed reproduction, you want to keep the compression under 20:1, though some images are hardly disturbed by even 30:1 or more. As you approach 80:1, even the untutored viewer will be annoyed by the degradation in typical images.
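
To put those ratios in terms of file size, here is a small worked calculation in Python. The 640-by-480 image is an assumption picked for illustration, not a figure from any product we tested, and the function names are ours.

```python
def raw_size_bytes(width, height, bits_per_pixel=24):
    """Storage for an uncompressed image, in bytes."""
    return width * height * bits_per_pixel // 8

def compressed_size_bytes(raw_bytes, ratio):
    """Approximate size after compressing at ratio:1."""
    return raw_bytes / ratio

raw = raw_size_bytes(640, 480)      # hypothetical 24-bit screen-size image
print(f"uncompressed: {raw} bytes")
for ratio in (20, 30, 80):          # the rule-of-thumb ratios quoted above
    size = compressed_size_bytes(raw, ratio)
    print(f"{ratio}:1 -> about {size / 1024:.0f} KB")
```

An image that needs the better part of a megabyte uncompressed fits in a few tens of kilobytes at 20:1.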

Room for differences. It is important to note that the JPEG standard does not specify how much information to discard, nor how to decide where to make the cuts. That is left to each implementation. Rather, the standard specifies how to decode a compressed data stream. Thus, there is room for substantial product differentiation, both in selection of algorithms and in the selection of execution platform.

JPEG++. It so happens that JPEG compression is very bad for text within an image; the edges are often blurred or jaggy. To combat this, Storm Technology developed a proprietary extension of JPEG, which it calls JPEG++. It allows an operator to select a rectangular portion of an image in which to preserve maximum quality. Within that rectangle, Storm uses only lossless compression techniques and throws away no data at all, so the compression ratio there is fairly low. However, in the rest of the image normal JPEG compression is applied.

We see no reason why there could not be other improvements on the compression side. For example, all current products focus on fast compression as well as fast decompression. This symmetry is important in low-volume or do-it-yourself applications such as desktop publishing and office image archiving. But in systems that prepare images for mass distribution on CD-ROMs, it might pay to spend a long time on compression to fully optimize the quality/size tradeoff.

JPEG movies. We’ve used all the space in this article to talk about a technique for still images, when most of the interest in the digital world centers on compressing digital video. The method in our madness is that a moving picture is just a sequence of still images.

As we’ll discuss next month, the proposed MPEG standard uses JPEG-style compression as a major building block. However, it goes further and tries to squeeze out the frame-to-frame redundancy (i.e., parts of the image where nothing is moving) and thus gains much higher compression factors.

That’s great — unless you need to edit the individual frames of a movie, in which case it’s terrible. A producer of video sequences may therefore wish to keep every frame as a JPEG still image during the editing process. As a last step before packaging the product, you can re-encode the movie using MPEG algorithms; the conversion is tedious, but you only do it once. Tune in next month for details.

Peter Dyson

JPEG: THE GORY DETAILS

The JPEG compression process begins by converting a 24-bit RGB image (from a desktop scanner, a video frame grabber, etc.) into the YUV color space. The Y value for each pixel expresses the luminance: how the pixel would look on a black-and-white TV. It therefore carries most of the detail information. The U and V values express the hue and saturation of the color.
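
A sketch of the color conversion follows, in Python. One caution: the coefficients are an assumption on our part. They are the widely published ITU-R BT.601 luminance weights in the "YCbCr" form commonly used in JPEG files, a close relative of the YUV described here; a given product may scale U and V differently.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 0-255 RGB to luminance (Y) and two color-difference values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b            # brightness, as on a B&W TV
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue-difference ("U")
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red-difference ("V")
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Inverse conversion, used when decompressing."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b

# A neutral gray carries no color information: both differences sit at 128.
print(rgb_to_ycbcr(128, 128, 128))

# Round trip on an arbitrary color; tiny errors come from the rounded constants.
restored = ycbcr_to_rgb(*rgb_to_ycbcr(200, 30, 90))
assert all(abs(a - b) < 0.01 for a, b in zip(restored, (200, 30, 90)))
```

Rounding the results to whole numbers is where the round-off error in this step comes from.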

Tiles. The next step is to break the image into 8×8-pixel tiles and analyze each tile separately. In some regions of the picture, the edges and details are important, while in other regions the key to perceived quality is having smooth shading transitions. Since the eye won’t pay attention to both factors in the same place, the algorithm can partially suppress one or the other on a tile-by-tile basis.
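
Cutting a picture into tiles might look like the Python sketch below. It assumes the image is one plane (Y, U or V) stored as a list of rows, and that its height and width are exact multiples of eight; how a real product pads the ragged edges is not covered here. The function name is ours.

```python
TILE = 8  # tile edge length in pixels

def split_into_tiles(plane):
    """Yield (top, left, tile) for every 8x8 tile of a 2-D list of samples."""
    height, width = len(plane), len(plane[0])
    if height % TILE or width % TILE:
        raise ValueError("this sketch needs dimensions that are multiples of 8")
    for top in range(0, height, TILE):
        for left in range(0, width, TILE):
            # Slice eight rows, then eight columns out of each row.
            tile = [row[left:left + TILE] for row in plane[top:top + TILE]]
            yield top, left, tile

# A hypothetical 16x16 plane yields four tiles.
plane = [[y * 16 + x for x in range(16)] for y in range(16)]
tiles = list(split_into_tiles(plane))
print(len(tiles), "tiles; first starts at", tiles[0][:2])
```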

Cosines. The mechanism for separating these factors is to apply what’s known as the Discrete Cosine Transform (DCT) to each of the Y, U and V signals. This number-crunching process turns a two-dimensional array of image samples into a collection of “spatial frequencies.” The lowest-frequency component reflects the average value of all pixels in the tile, while high-frequency components are generated by the sudden shifts due to sharp edges. A tile with a lot of busy detail will have large numbers in the high-frequency components, while a smooth surface will have tiny values there.
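
For the curious, the transform can be written out directly. The Python below is a slow, textbook form of the two-dimensional DCT on one 8×8 tile, using the customary scaling in which the first coefficient comes out to eight times the tile’s average. It is a sketch for study, not the fast method a commercial product would use, and the names are ours.

```python
import math

N = 8  # tile edge length

def _c(k):
    """Scale factor that keeps the transform orthonormal."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct_2d(tile):
    """Return the 8x8 grid of spatial-frequency coefficients for one tile."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency
        for v in range(N):      # horizontal frequency
            total = 0.0
            for x in range(N):          # row
                for y in range(N):      # column
                    total += (tile[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * _c(u) * _c(v) * total
    return out

# A perfectly smooth tile: everything lands in the first coefficient.
flat = dct_2d([[100] * N for _ in range(N)])
print(round(flat[0][0]))                       # 800, i.e. 8 x the average

# A tile with a sharp vertical edge: large horizontal-frequency terms appear.
edge = dct_2d([[0] * 4 + [255] * 4 for _ in range(N)])
print(round(edge[0][1]))
```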

When in doubt, throw it out. With this knowledge, an algorithm can decide what data to keep and what to discard by looking at the distribution of components. The way it throws data away is by using fewer bits to represent a given component. Small numbers can simply be zeroed out. That, in turn, means less accuracy when you reverse the transformation and reconstruct the image at the receiving end.
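
In code, "using fewer bits" amounts to dividing each component by a step size and rounding. The Python sketch below uses a step table we invented for illustration (coarser steps at higher frequencies); it is not taken from the standard or from any product, and as noted elsewhere, the choice is left to each implementation.

```python
N = 8

def make_step_table(scale):
    """Hypothetical step sizes: 1 for the average, growing with frequency."""
    return [[1 + scale * (u + v) for v in range(N)] for u in range(N)]

def quantize(coeffs, steps):
    """Divide by the step and round; small components become zero."""
    return [[round(coeffs[u][v] / steps[u][v]) for v in range(N)]
            for u in range(N)]

def dequantize(levels, steps):
    """Multiply back; the rounding error is gone for good."""
    return [[levels[u][v] * steps[u][v] for v in range(N)] for u in range(N)]

# Made-up components: a big average, one medium term, one faint detail.
coeffs = [[0.0] * N for _ in range(N)]
coeffs[0][0], coeffs[0][1], coeffs[7][7] = 800.0, -30.0, 5.0

steps = make_step_table(4)
levels = quantize(coeffs, steps)
print(levels[0][0], levels[0][1], levels[7][7])   # 800 -6 0
```

Raising the scale makes the steps coarser, zeroes out more components and shrinks the file, at the cost of a rougher reconstruction.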

Exploit statistics. Finally, you apply run-length and Huffman coding to the data stream. Most of the reduction in file size actually happens here; one goal of the previous steps is to regularize the data so that these coding techniques will be efficient.
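
Here is one way the run-length step can be arranged, sketched in Python. Reading the tile in a zigzag from low frequencies to high, so that the zeros bunch together at the end, is a detail of the standard that we have not otherwise described; the pair format and the "EOB" end-of-block marker below are simplified stand-ins of our own, and the Huffman coding of the pairs is left out.

```python
N = 8

def zigzag_order(n=N):
    """Tile positions ordered along anti-diagonals, alternating direction."""
    def key(pos):
        row, col = pos
        diagonal = row + col
        # Even diagonals run bottom-left to top-right, odd ones the reverse.
        return (diagonal, col if diagonal % 2 == 0 else row)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def run_length_zeros(sequence):
    """Emit (zeros_skipped, value) pairs, then a marker for the all-zero tail."""
    pairs, run = [], 0
    for value in sequence:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")   # trailing zeros are implied, not stored
    return pairs

# A made-up tile after data has been thrown away: three survivors, 61 zeros.
levels = [[0] * N for _ in range(N)]
levels[0][0], levels[0][1], levels[2][0] = 100, -6, 3

scan = [levels[r][c] for r, c in zigzag_order()]
print(run_length_zeros(scan))   # [(0, 100), (0, -6), (1, 3), 'EOB']
```

Sixty-four numbers have collapsed to three pairs and a marker, which is why the earlier steps work so hard to produce zeros.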

Unraveling. To decompress the image, just reverse the steps. Expand the Huffman and run-length codes (in that order) to build an 8×8 tile of spatial frequencies. Apply the Inverse Discrete Cosine Transform to get a tile of pixels in the YUV color coding. Convert the YUV values into RGB (usually by lookup table) and voila! it’s decompressed. Then the program can just copy the tile’s pixels into position on the screen.
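
The middle of that chain, from spatial frequencies back to pixels, can be sketched in Python as follows. To keep the example self-contained it includes a slow textbook forward transform as well, and it stands in for the data-discarding step with a single uniform step size, which is our simplification; the code expansion and color conversion are omitted.

```python
import math

N = 8

def _basis(x, u):
    """One scaled cosine term; the 1/sqrt(2) keeps the transform orthonormal."""
    scale = 1 / math.sqrt(2) if u == 0 else 1.0
    return scale * math.cos((2 * x + 1) * u * math.pi / (2 * N))

def dct_2d(tile):
    """Pixels to spatial frequencies (slow textbook form)."""
    return [[0.25 * sum(tile[x][y] * _basis(x, u) * _basis(y, v)
                        for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct_2d(coeffs):
    """Spatial frequencies back to pixels."""
    return [[0.25 * sum(coeffs[u][v] * _basis(x, u) * _basis(y, v)
                        for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def round_trip(tile, step):
    """Compress-and-restore one tile with a uniform (hypothetical) step size."""
    levels = [[round(c / step) for c in row] for row in dct_2d(tile)]
    return idct_2d([[lv * step for lv in row] for row in levels])

def max_error(a, b):
    return max(abs(a[x][y] - b[x][y]) for x in range(N) for y in range(N))

tile = [[x * 8 + y for y in range(N)] for x in range(N)]   # a smooth ramp
print("no data discarded:", max_error(tile, idct_2d(dct_2d(tile))))
for step in (4, 16, 64):
    print("step", step, "worst pixel error:", max_error(tile, round_trip(tile, step)))
```

With nothing discarded, the pixels come back exact to within floating-point noise; larger step sizes trade accuracy for smaller files.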

JPEG++ decompression. In JPEG++, each tile can be compressed to a different degree. This requires more smarts in the compression program, but on the decompression side, there is no difference between JPEG and JPEG++. An image compressed with JPEG++ can be decompressed by a JPEG program.

Peter Dyson