< PREV | NEXT > | INDEX | SITEMAP | SEARCH | LINKS | UPDATES | BLOG | EMAIL | HOME

[2.0] JPEG Image Compression

v1.1.2 / chapter 2 of 3 / 01 mar 07 / greg goebel / public domain

* While compressing text files is useful, compressing still image files is almost a necessity. A typical image takes up much more storage than a typical text message, and without compression images would be extremely clumsy to store and distribute. This chapter discusses ways to compress images, focusing on the "JPEG (Joint Photographic Experts Group)" specification for image compression.


[2.1] LOSSLESS & LOSSY IMAGE COMPRESSION
[2.2] JPEG (1): LUMINANCE & CHROMINANCE IMAGE ENCODING / BLOCKS
[2.3] JPEG (1): SPECTRAL CONVERSION VIA DCT
[2.4] JPEG (3): QUANTIZATION / ZIG-ZAG SCAN / LOSSLESS OUTPUT ENCODING
[2.5] JPEG SUMMARY & STREAM FORMAT
[2.6] JPEG VARIANTS / JPEG-2000 / JBIG
[2.7] VECTOR & FRACTAL COMPRESSION

[2.1] LOSSLESS & LOSSY IMAGE COMPRESSION

* The increasing use of digitized images, particularly on the Internet, has led to the need to compress such imagery to allow economical storage and fast data transfer.

A digital image, or "bitmap", consists of a grid of dots, or "pixels", with each pixel defined by a numeric value that gives its color. Images are defined as either "realistic", meaning photographs or realistic artwork with many shades and few straight lines, or "nonrealistic", meaning graphs, block diagrams, graphics banners, and cartoons with block colors and many straight lines.

The number of different colors available in a digital image is given by the number of bits assigned to each pixel. Common "color resolutions" are 1 bit per pixel, for solid black-and-white nonrealistic images; 8 bits per pixel, for grayscale images, nonrealistic color images, and coarse realistic images; and 24 bits per pixel, for "photographic quality" realistic images. 48 bits per pixel is in increasing use for ultrahigh quality images.

An image stored in an uncompressed file format, such as the popular BMP format, can be huge. An image with a pixel resolution of 640 by 480 pixels and 24-bit color resolution will take up 640 * 480 * 24/8 = 921,600 bytes in an uncompressed format.

Lossless compression algorithms can be used to squeeze down images and then restore them again for viewing completely unchanged. The traditional lossless compressed image file scheme is Compuserve "Graphics Interchange Format (GIF)", which uses a variation of the lossless LZW compression algorithm as outlined in the previous chapter to compress images.

GIF has a few limitations. The first is that it cannot compress images with color resolutions greater than 8 bits. This is not a great problem with monochrome images or with simple nonrealistic block color images like diagrams, but it is troublesome with color photographs, which are normally 24 bits and have to be reduced in color. This is done either by selecting the closest matching colors available in 8 bits, which can give "off" results on occasion; or by "dithering", selecting a speckle pattern of two more colors that approximates an intermediate color, which tends to make the image appear grainy.

The second problem is that the LZW algorithm has strings attached in the form of copyright protection that lead to periodic fears of legal obstructions to its use in GIF; Compuserve made noises about asserting proprietary rights to the file format, which came to nothing except for alarming a large number of users. For these and other reasons, a group of enthusiasts created a lossless compressed image file scheme known as "Portable Network Graphics (PNG, pronounced 'ping')" that can compress images with 24-bit or 48-bit color resolutions, and provides many other handy features.

PNG is an inspiring labor of love by a group of benign software guerrillas. PNG is based on freeware ZIP compression scheme to ensure that there will be no legal obstacles to its use. PNG has caught on, and gradually should replace GIF. In general, PNG provides about 15% better compression ratios than GIF, since PNG performs compression along both the horizontal and vertical axes while GIF only compresses along the horizontal axis.

* While lossless compression is a necessity for text files, it is not for image files. Compression algorithms that involve a loss of image detail can provide much higher compression ratios than lossless algorithms.

In fact, in a sense, there's no strong limit on how much compression can be obtained using a lossy compression scheme. It depends on how much degradation in image quality a user can tolerate. In the absurd case, a bitmap image could be reduced to a single bit that specifies an all-black or all-white image, but nobody would really want to do this. There are practical limits to the compression ratios that can be obtained by lossy compression.

The "Joint Photographic Experts Group (JPEG)" specification has become the practical standard for storing realistic still images using lossy compression. JPEG can generally give much higher compression ratios than GIF while retaining good image quality. Useful JPEG compression ratios are typically in the range of about 10:1 to 20:1. JPEG can obtain higher compression ratios on images with high pixel resolutions, simply because the degradation isn't as noticeable on a big image.

Compression ratios can be increased if higher degradation is tolerable. Most software that can create JPEG images, such as paint programs, allows the user to set a "quality factor" to specify the percentage level of degradation. This quality factor is a more or less arbitrary scale and may not give exactly the same results in different software packages, but as a rule a 75% percent quality factor gives a perfectly acceptable image, while quality factors above 95% buy very little improvement in the image, and quality factors below 10% produce a completely unacceptable image.

As will be explained in the following section, JPEG compresses the color information, or "chrominance", in an image separately from the actual details of shapes, or "luminance". Luminance amounts to a grayscale image, while the chrominance amounts to a wash of colors painted on top of that grayscale image. The eye is much more sensitive to the details of shapes than color information, so the chrominance information can be compressed to a greater level than the luminance information. This means that JPEG will usually have higher compression ratios for color images than for grayscale images.

JPEG does not work very well for nonrealistic images, such as line drawings and graphs, where a single wrong pixel is immediately obvious. In fact, JPEG does poorly at handling images that have straight lines or abrupt edges, and does not do a good job on text embedded in images either. GIF and PNG are not only more appropriate for nonrealistic images such as line drawings, but can also achieve higher compression ratios if the images are relatively simple, with large block lettering, solid blocks of color, and so on.

JPEG is also not appropriate for storing images in a master archive as it causes an immediate degradation of the image relative to its uncompressed format. In addition, if the image is retrieved, modified, and then restored as a JPEG, the degradation is likely to get worse.

BACK_TO_TOP

[2.2] JPEG (1): LUMINANCE & CHROMINANCE IMAGE ENCODING / BLOCKS

* The JPEG committee was formed under the umbrella of the International Standards Organization (ISO) in 1986, leading to the initial release of a specification in 1991. The JPEG specification describes in general terms how to compress a digital image. It does not specify a particular implementation.

In practice, JPEG is most often used to compress 24-bit color or 8-bit grayscale images. The following discussion assumes that the JPEG coder is compressing a 24-bit color bitmap image. Comments on how grayscale or low color resolution images are handled are added at the end of the discussion.

This section and those following discuss the details of JPEG image compression. Since it is very complicated, a summary is provided following these sections to review the main concepts.

* Each numeric value that describes the color of a pixel in a 24-bit color image actually breaks down into three values that define the exact color. There are two ways to define this set of three color values. Most of the computer-literate are familiar with the "RGB" color description scheme, where each pixel value is a set of by three numbers giving the red, green, and blue color value. For example, in RGB, each 24-bit value breaks down into three 8-bit values, each giving the intensity of red, green, and blue in a scale from 0 to 255.

In the "luminance-chrominance" scheme, used in traditional US analog color TV, a pixel value is given by its grayscale brightness level, or "luminance", and by a color value, or "chrominance".

Chrominance actually amounts to two values, one that describes the "hue", or specific color within a linear range of colors, and the other that describes the "saturation", or intensity of the color.

By the way, although chrominance is specified by two values, the actual values used aren't precisely the hue and the saturation. They're components of a two-dimensional vector, where the phase or angle of the vector gives the hue, and the magnitude of the vector gives the saturation. This indirect scheme is an artifact of the way the color signal is modulated in analog color TV broadcast. If this detail seems confusing, forget it, don't worry about it. You can just take it for granted that luminance is described by one value for each pixel, and chrominance is described by a pair of values for each pixel.

Following TV usage, the luminance component is designated "Y", and the two chrominance components are designated either as "I" and "Q", or "U" and "V", or "Cr" and "Cb". For all I know these may actually represent different formulas for chrominance, but if so it doesn't matter in this discussion.

The RGB and luminance-chrominance schemes are equivalent, and given one the other can be derived by some simple calculations. In principle, JPEG compression can use either the RGB or luminance-chrominance scheme, but the luminance-chrominance scheme seems to be universal in practice, and this discussion will assume its use.

* JPEG compression begins by converting the original two-dimensional image data array into three two-dimensional arrays, consisting of one array of luminance data and two arrays of chrominance data. Each of these three arrays is then further divided up into a grid of two-dimensional arrays of 8 rows of 8 pixels, giving 64 pixel values per block. These 8x8 "blocks" are the target of JPEG encoding.

While computers display images using the RGB scheme, the pixel data is converted to the luminance-chrominance scheme because it offers greater possibilities for compression. As mentioned earlier, this is because the luminance information contains most of the detail perceived by the human eye, while the overlying chrominance color information can be fuzzy without causing any serious image degradation. The chrominance information really just defines a tint overlaid on a grayscale image.

This means that JPEG doesn't necessarily need to keep all the chrominance information, and the specification allows for a feature known as "decimation" in which part of the chrominance information is simply thrown away. For example, compression can be increased by only sampling every other horizontal pixel in a chrominance block, which cuts the number of chrominance bits in half. This is known as "horizontal decimation", using a decimation factor of 2, and results in one decimated 8x8 chrominance block for every two luminance blocks.

Even more compression can be achieved by only sampling every other horizontal and vertical pixel in a chrominance block, which cuts the number of chrominance bits to a fourth. This is known as "horizontal and vertical decimation" using a factor of 2, and results in one decimated 8x8 chrominance block for every four luminance blocks.

More sophisticated decimation algorithms, such as averaging, can be used, and it is possible to decimate by a factor of four. There is an odd nomenclature for designating decimated data. A block with no decimation is referred to as "4:4:4"; one with horizontal decimation only is referred to as "4:2:2"; and one with horizontal and vertical decimation is referred to as "4:2:0".

BACK_TO_TOP

[2.3] JPEG (1): SPECTRAL CONVERSION VIA DCT

* The first thing done to the 8x8 blocks in compression is to convert their values into a "spectrum". To understand what this means and how it is done, for the moment arbitrarily assume that the block is from the luminance component of the image. Actually, all three components are compressed in the same way, but the luminance data provides the easiest example to visualize.

As you should know by now, the luminance data corresponds to a grayscale version of the original image. In more mathematical terms, the luminance data defines a three-dimensional waveform or "contour map" of the brightness of the image in the block, with "peaks" where the image is bright and "valleys" where it is dark.

Obviously, the human eye can detect coarse details in the luminance image better than it can detect fine details. This means that if the fine details are mathematically sorted out from a luminance block and thrown away to save storage space, the image will (hopefully) still have an acceptable appearance.

This is the basic approach of the "lossy" compression algorithm used by JPEG. The trick is figuring out how to sort out the fine details so they can be discarded. This is done through "Fourier analysis". According to Fourier analysis, any waveform can be broken down into a set of pure sinusoidal waveforms of varying frequency and amplitude. This is done by a mathematical "frequency domain transform" operation, which converts the input waveform into a list of amplitude values, or "coefficients", for each pure sinusoidal value that makes up the original waveform. The list of coefficients is known as a "spectrum".

In the current example, running the 8x8 block of luminance values through a frequency domain transform results in an 8x8 block of frequency coefficients. The low frequency coefficients correspond to the coarse details of the block, while the high frequency coefficients correspond to the fine details of the block. In principle, high compression ratios can be obtained, at the expense of fine details, by throwing away high frequency coefficients, with increased compression obtained by throwing away more coefficients.

In practice, JPEG doesn't actually throw the high frequency coefficients away. As will be explained momentarily, it just coarsens them, but the basic idea is the same.

* By the way, the usual way to think of a frequency domain transform is that it converts a waveform that is a function of time into a set of amplitudes for frequency components that can, in principle, be expressed in hertz. For example, in the frequency analysis of musical sounds, a complicated musical signal is broken down into a set of coefficients describing the simple sound waves that make it up in terms of their amplitudes and frequencies.

However, in JPEG image compression the luminance values are just a set of pixels that could represent anything. This means that the sinusoidal values in the spectrum obtained from the luminance data are not expressed in hertz. They only have "wavelengths" measured in terms of number of pixels and corresponding array elements in the original luminance data, not time, and the spectrum is only defined as multiples of a starting (or "fundamental", or "base") frequency. These multiples are known as "harmonics".

Furthermore, the spectrum is simply a set of coefficient values giving the amplitudes of the harmonic components. The actual harmonics the coefficients refer to are defined by the array indexes.

If this sounds baffling, a simple example should help make it clear. Let's consider of converting a single row of a luminance block into a spectrum stored in an array with indexes from 0 to 7. The resulting data looks like this:

The element with index 0, referred to as the "zeroth harmonic", doesn't represent a sinusoidal waveform. It represents a constant value, which is a "degenerate case" of a sinusoidal waveform. The zeroth harmonic is the average value of the waveform, and so is also called the "constant component".

Of course, in the actual case of converting a luminance block to a spectrum, we actually end up with a two dimensional spectrum stored in an array with 64 elements and indexes from (0,0) to (7,7). The element with the indexes (0,0) is, once more, the zeroth harmonic or constant component, and is a constant value that gives the average brightness of the original luminance block.

All the other array elements give the coefficients of harmonics. The array element with the index (0,1) gives the coefficient of the base harmonic that only varies in the horizontal direction, the element with (1,0) gives the coefficient of the base harmonic that only varies in the vertical direction, and the element with index (1,1) gives the coefficient of the base harmonic that varies in both directions.

Similarly, other indexes give the coefficients of appropriate combinations of higher harmonics in either direction, finally ending with the array element with index (7,7). This element gives the coefficient of the seventh harmonic in both directions.

* The best known frequency domain transform is the Fourier transform and its various digital implementations, particularly the efficient "fast Fourier transform (FFT)". However, JPEG compression uses a different but similar transform known as the "discrete cosine transform (DCT)".

The FFT and the DCT give similar results, except for the way they handle endpoints. For a simplified example, consider an one-dimensional luminance waveform made up of eight points that looks like this:

   :
   :                           *
   :                        *
   :                     *
   :                  *
   :               *
   :            *
   :         *
   :      *
   :
   :..................................
Running this waveform through an FFT, eliminating some of the high frequency components, and converting it back to a waveform again with an "inverse FFT (IFFT)" gives something like:
   :
   :                            
   :                        *
   :                      *    *
   :                  *
   :              *
   :      *     *
   :         *
   :       
   :
   :..................................

In comparison, running the waveform through a DCT, trimming off some of the high frequency components, and converting it back to a waveform again with an "inverse DCT (IDCT)" gives something closer to the original straight line:
   :
   :                           *
   :                         *
   :                     *
   :                  *
   :               *
   :           *
   :         *
   :      *
   :
   :..................................

This example uses an eight-point one-dimensional waveform for simplicity, but the same principle applies to the 8x8 two-dimensional DCT used in JPEG compression.

* The precise details of the DCT are beyond the scope of this article, would make the discussion far more complicated, and are more appropriate to a document on signal processing. However, as mentioned earlier, the end result of performing a DCT on an 8x8 block of luminance values is an 8x8 block of harmonic coefficients for a set of implied two-dimensional sinusoidal waveforms.

Incidentally, if you do a "zoom" with a paint program to inspect the pixels of a JPEG image stored with a high compression ratio, you can actually see "JPEG artifacts" that reflect these harmonics, such as checkerboard or staggered-bar patterns. At high compression ratios, the high frequency harmonic information has been degraded and no longer conceals the lower harmonic patterns. What is particularly fascinating is that for normal viewing the actual image can often look just fine even with all these artifacts, though they become increasingly obvious as the quality factor is lowered and the compression ratio increased.

BACK_TO_TOP

[2.4] JPEG (3): QUANTIZATION / ZIG-ZAG SCAN / LOSSLESS OUTPUT ENCODING

* By the way, the coefficient block that results from the DCT is equivalent to the original luminance block. No information has been thrown away. If you run the coefficient block back through an IDCT, aside from normal small computer calculation roundoff errors you get the original luminance block again.

The next step in the compression process, known as "quantization", is the one that throws away information. Since the eye is less sensitive to high harmonics (fine details) in an image than low harmonics (coarse details), it is also less sensitive to alterations in the high harmonics, and the coefficients of the high harmonics can be represented with fewer bits than the lower harmonics.

For example, suppose the values in the coefficient block are all 8 bit integers, representing values from 0 to 255. For the lowest harmonics, we can keep the values at 8 bits. For the highest harmonics, we could, for example, convert the values to 4 bits, representing values from 0 to 15, saving half the storage for these coefficients. This is the "coarsening" referred to earlier.

The conversion can be done by dividing the original 8-bit value by 16 and throwing away the remainder, which a computer can do with a trivial shift operation. For example, if a high harmonic coefficient has a value of 137, dividing by 16 gives a value of 8. This will be restored to a value of 8 x 16 = 128 by the uncompression process, which is a reasonable approximation.

Given 8-bit coefficient values, the "quantization factor" can range from a value of 1, meaning no loss, to 128, which compresses the coefficient value to a single bit. The quantization factor is different for each of the 64 coefficients in the block. JPEG uses a single 64-element table of quantization factors, or "Q table", to store the values for quantization, with the same table applied to all the blocks.

The quantization factors are not fixed. Adjusting the quality value for storing a JPEG adjusts the quantization factors for the coefficient block, trading off compression ratio for image quality.

* Once a block has been converted to a spectrum and quantized, the JPEG compression algorithm then takes the result and converts it into a one dimensional linear array, or "vector", of 64 values, performing a "zig-zag" scan by selecting the elements in the numerical order indicated by the numbers in the grid below:

        0   1   2   3   4   5   6   7
       ______________________________ 

   0:   0   1   5   6  14  15  27  28
   1:   2   4   7  13  16  26  29  42
   2:   3   8  12  17  25  30  41  43
   3:   9  11  18  24  31  40  44  53
   4:  10  19  23  32  39  45  52  54
   5:  20  22  33  38  46  51  55  60
   6:  21  34  37  47  50  56  59  61
   7:  35  36  48  49  57  58  62  63
       ______________________________

This places the elements of the coefficient block in a reasonable order of increasing frequency. Since the higher frequencies are more likely to be zero after quantization, this tends to group zero values in the high end of the vector. The vector is then compressed using traditional lossless compression algorithms.

Actually, just to make things confusing again, the first element of the vector, which stores the quantized coefficient for the constant component of the luminance block, is compressed differently from the other 63 quantized coefficients, and after that they both go through a second stage of compression. JPEG seems to leave no stone unturned in its search for high compression ratios.

The idea of compressing the first element separately is particularly baffling at first sight, since there's only one value and that wouldn't seem to offer much room for compression. The trick is that the compression is performed between the DC terms of consecutive blocks, using "delta modulation".

Since the DC term doesn't change much from luminance block to neighboring luminance block, then the difference between the values of the coefficients of the DC term for the two blocks is usually small. In delta modulation, only the difference between the two coefficients is stored. Of course the DC term coefficient for the very first block is compared to a value of 0.

The other 63 values in the vector are compressed as a group, not compressed relative to vectors derived from neighboring blocks. The compression is performed with RLE, which takes advantage of the clustering of zero values by the zig-zag scan. Once the quantized coefficients have gone through their first stage of lossless encoding, they go through the second stage of compression using Huffman coding.

JPEG permits the use of a default Huffman table, or custom Huffman tables derived specifically from the data in the image being compressed. Use of a default Huffman table will very likely give poorer compression than use of custom Huffman tables, but the default tables don't need to be stored with the rest of the data, so there's a trade-off.

* Although this discussion has focused on compressing the array of luminance values, the same process is used to compress the two arrays of chrominance values as well. The only differences are that chrominance data may be decimated, as discussed earlier, and the chrominance values may have their own Q-tables.

In any case, decompressing the encoded image uses exactly the reverse steps of the process described above: Huffman decoding; RLE and delta-modulation decoding; unpacking from a vector format to a block format; dequantizing; IDCT; reassembly of blocks into a full image component; and reassembly of the image components into the image.

* Of course, compressing a grayscale image is equivalent to compressing just the luminance component. However, it appears that JPEG cannot compress a color image with 8-bit color resolution, at least not as such. As best as I can figure out, this is because JPEG can only compress a color bitmap that is a "true color" image, while a color bitmap with 8-bit color resolution is "color mapped".

In a true color image, such as a bitmap with 24-bit color resolution, the value for each pixel actually gives the pixel color as an RGB value. In a color-mapped image, the value for each pixel is an index into a table of 256 RGB values, known unsurprisingly as a "color map".

The RGB values in the color map may be 12 or more bits of color resolution. This allows an 8 bit color bitmap to have access to a wide range of colors, but the trick is that it can only use 256 different colors from that range at one time. The selection of 256 colors is known as a "palette". An image file for a color mapped image consists of a grid of indexes, one for each pixel, plus a table of color map values defining the palette.

This is why strange things happen when cutting and pasting 8-bit color images in paint program. If you have two different 8-bit color images, each will usually have a different palette. If you cut an item from one and paste it into the other, the colors in the pasted item will often look very strange, since the image into which it is pasted has a different palette and the indexes for the pasted element are now pointing to the wrong colors. Naturally, this doesn't happen with 24-bit color or other true color images, since they don't use a color map. You can cut and paste between two images and the results look fine. Since 24-bit color is effectively standard these days, few will ever see this problem, but in the old days it was a common difficulty.

Anyway, JPEG compression requires that the bitmap image provide RGB values so it can convert them into luminance and chrominance components. The pixel values for an 8-bit color bitmap aren't RGB values, they're indexes. The JPEG coder will use the indexes and the color map data provided with the image to produce a true color image, and then compress it. While reducing color image resolution from 24 bits to 8 bits will reduce file size by a factor of three for uncompressed images and will have at least that much effect on images compressed with typical lossless methods, it has no effect with JPEG since the 8 bits get converted back into 24 and all you've done is lose colors in your image.

BACK_TO_TOP

[2.5] JPEG SUMMARY & STREAM FORMAT

* JPEG compression is confusing, as even JPEG advocates will admit, so a summary may help make matters clearer. JPEG compression follows these steps:

The result of the compression is a stream of compressed data, which could be stored as a file or sent over a communications link. The structure of the JPEG stream follows directly from the compression algorithm.

The stream starts with a "start of image" flag and ends with an "end of image" flag. The contents between these flags define a "frame", which gives the entire image, which is structured as follows:

BACK_TO_TOP

[2.6] JPEG VARIANTS / JPEG-2000 / JBIG

* There are a number of variations on the basic JPEG compression scheme. The most important is "progressive JPEG", which is basically the same as standard JPEG except that it allows an image to be progressively displayed in greater detail as it is downloaded. There are a number of approaches to progressive encoding, but one of the most straightforward is to send the constant components of the blocks first, then keep sending higher harmonics, making the image increasingly clear.

There are "lossless JPEG" specifications as well, which as their name indicates, compress without throwing any detail away. Unsurprisingly, they don't use the same compression techniques as lossy JPEG, and since they are not in widespread use, if they're used at all, they will not be discussed further here.

There is also a "hierarchical" JPEG mode, which is used for encoding images for presentation on a smaller display, and "Motion JPEG", which is used for encoding video by using JPEG compression on individual frames. Neither of these appear to be in much use. Motion JPEG is clearly inferior to the popular MPEG video compression spec in terms of compression ratios. MPEG, discussed in the next chapter, has some similarities to JPEG but involves heavy use of delta modulation, generally sending changes in images from video frame to video frame instead of full frames, allowing it to obtain much greater compression.

Various different implementations of Motion JPEG appear to be in some use in video editing systems, since the delta modulation scheme used by MPEG tends to make video splicing a little more troublesome.

* JPEG compression is based on the DCT, a form of sinusoidal transform. There is another class of transforms, based on functions known as "wavelets", that instead of sinusoidal functions use sets of functions that resemble jagged pulses. They can in principle represent a time domain waveform with fewer coefficients than sinusoidal functions.

A improved version of JPEG, named "JPEG 2000", that replaces the DCT with a wavelet transform, has been introduced. The ISO JPEG 2000 committee officially released the specification in 2001 under the designation of "ISO 15444". Software supporting the new image format is now gradually becoming available, though at last notice it wasn't supported by Web browsers, which would clearly signal its coming-of-age. Apparently there are fears that some parts of JPEG 2000 may conflict with patents, and so there is some reluctance of big organizations to adopt it.

As with traditional JPEG, JPEG 2000 begins by converting an RGB image into a luminance-chrominance image, and the chrominance information can be decimated if desired. The resulting data is then split into subarrays or "tiles", which are wavelet-transformed to the desired number of wavelet terms. This data in turn is "sliced and diced" in a complicated fashion and put through three levels of arithmetic coding to produce "packets" that make up JPEG 2000 stream. The packets are organized in such a way that a substream could be selected to give a smaller "thumbnail" image, or a reduced-quality image, or a monochrome image, or whatever.

JPEG 2000 is estimated to give about 20% better compression than traditional JPEG. Comparisons of the same image stored in JPEG and JPEG 2000 format with quality settings to give the same file size show a definite improvement in image quality, with most of the notorious ugly JPEG artifacts simply disappearing in JPEG 2000.

An enhancement known as "mixed-raster content" is under consideration for JPEG-2000, in which a coder will switch compression algorithms to, for example, allow crisper compression of blocks of text, which tend to have a blurry appearance in the current JPEG implementation. Another enhancement is built-in support for Motion JPEG, allowing a JPEG image to incorporate "metadata" that describes the order and speed of frame presentation.

* JPEG is a grossly inefficient scheme for compressing black and white, or "bilevel", images, since they have to be converted to 24 bits, and so a different specification, known as "JBIG", has been devised to address that issue. It has little do to with JPEG in terms of implementation, and is really a complementary specification, not a variation on JPEG. PNG seems like a better tool for compressing such images.

BACK_TO_TOP

[2.7] VECTOR & FRACTAL COMPRESSION

* Some other compression schemes, such as vector quantization and fractal compression, have some interest value, though they are not in common use.

For a conceptual explanation of vector quantization, suppose a black and white image is broken down into 8x8 blocks as in JPEG quantization. Suppose also that an encoder, and its corresponding decoder, have a "library" of 256 or more blocks that approximate the actual pattern in any arbitrary image block. It would be possible to obtain an 8:1 compression ratio by matching each block in the image with a block in the library, and just sending a code byte designating which block in the library matches the image block. Compression could be increased by using some type of lossless compression on the coded bytes.

* Fractal compression is based on transform mathematics, but the approach is very different from that of conventional frequency analysis.

Fractals are geometric figures based on recursive expansion of a basic geometric figure. For example, the "Sierpinski triangle" is a fractal pattern that consists of a small triangle built up hierarchically as follows:


           1       2             3                   4

           *       *             *                   *
          * *     * *           * *                 * *
                 *   *         *   *               *   *
                * * * *       * * * *             * * * *
                             *       *           *       *   
                            * *     * *         * *     * *  
                           *   *   *   *       *   *   *   * 
                          * * * * * * * *     * * * * * * * *
                                             *               *
                                            * *             * *
                                           *   *           *   *
                                          * * * *         * * * *
                                         *       *       *       *   
                                        * *     * *     * *     * *  
                                       *   *   *   *   *   *   *   * 
                                      * * * * * * * * * * * * * * * *

Figure 1 on the left consists of a basic triangle. Figure 2 is a triangle built from three basic triangles. Figure 3 is built from three of the second figures, or nine basic triangles, while figure 4 is built from three of the third figure, or nine of the second figure, or 27 basic triangles.

This is a very simple fractal and it is hard to see what it has to do with image compression, but a more sophisticated fractal can be used to build a realistic image of a fern leaf, where the fronds of the leaf look like the leaf itself, and on down to the limit of graphics resolution. Fractals are said to be "self similar", meaning that if you look at a small part of the image, such as one of the fronds of the fern leaf, it looks like the entire image of the fern leaf.

Fractal patterns are well suited for representation of hierarchical objects like trees, or objects like clouds, mountains, or coastlines that reveal a lower level of "self similar" detail at closer focus. Fractals have often been used in computer graphics videos, as they tend to provide pleasing and, in some cases, highly lifelike images. The well-known "Genesis Torpedo" computer video segment from the movie STAR TREK 2: THE WRATH OF KAHN is one of the best known examples of fractal computer graphics, with a wave of fire sweeping over a barren planet that then becomes covered with life.

Given that fractal patterns can be used to generate images of objects, it's not too big of a conceptual jump to think that an arbitrary image could be broken down into a series of fractal patterns. This opens the possibility of extremely high image compression ratios. The image would be analyzed to determine the fractal patterns that make it up and transformed into relatively small and compact sets of parameters that define the proper fractal algorithms, The parameters could then be further compressed with some lossless compression algorithm, in principle resulting in a very compact graphics file.

* Fractal compression is not a new idea. It goes back to the late 1980s and an organization named Iterated Systems, which was founded by several mathematicians from the Georgia Institute of Technology. The scheme is confronted by two obstacles: first, figuring out transform operations that can convert an arbitrary image element into a fractal, and second, performing these operations in a reasonable amount of time, as they tend to be very computation-intensive.

The first problem, converting an image into a fractal algorithm, turned out to be particularly difficult, and in fact Iterated Systems was accused of fraud by some critics. Using a fractal algorithm to produce a pretty fern leaf is one thing, but taking a real-life fern leaf with its many imperfections and converting it to a fractal algorithm is another thing entirely, and although there are many objects in the real world that have fractal properties, many of them don't.

Critics referred to the original scheme for transforming an image into a fractal as the "graduate student algorithm". It involved locking a graduate student into a room with a graphics workstation until he or she had come up with the appropriate fractal algorithm to recreate the image. Automated methods have been devised, but the critics are still suspicious. It has to be noted that data compression hacks tend to be critical and quarrelsome in general and it is difficult to know how seriously to take their arguments sometimes.

Fractal compression advocates claim the technique can provide "resolution enhancement", meaning that if an image compressed with fractal encoding is enlarged, it's still detailed. This is a bit slippery because this only really applies to elements that have a fractal appearance where the concept of self-similarity applies. That is, if you have an image of a cloud, which is a fractal form, and enlarge it tremendously, it still looks like a pretty good image of a cloud. However, all that additional detail is a fabrication, created by the fractal algorithm. If there were an original image available at the same scale, the fine details would be different. The resolution enhancement breaks down seriously for elements that do not have a fractal appearance. As the saying goes, you really can't get something for nothing.

All the controversy aside, fractal compression is typically very time-consuming, though decompression is relatively straightforward since it simply requires chugging through the fractal algorithms, not determining what they are in the first place. In practice, so far fractal compression has not provided significantly higher compression ratios than, say, JPEG.

Fractal compression seems to be in limbo. Fractals are not as sexy a notion now as they were in the 1980s, when everyone overdosed on psychedelic Mandelbrot set pictures, and nobody seems to be in any hurry to jump on board fractal compression. However, it seems like an interesting enough idea to continue investigation; it might come in handy one of these years.

BACK_TO_TOP


< PREV | NEXT > | INDEX | SITEMAP | SEARCH | LINKS | UPDATES | BLOG | EMAIL | HOME