v1.1.2 / chapter 3 of 3 / 01 mar 07 / greg goebel / public domain
* The increasing use of digital video has led to the use of video data compression to allow compact storage and fast video transfer rates. A number of video compression standards have been devised, such as the "ITU (International Telecommunication Union) H.261" standard for teleconferencing, and the "MPEG-1 (Moving Picture Experts Group)" and "MPEG-2" specifications for video and audio storage and transmission. This chapter discusses H.261, MPEG-1, and MPEG-2, and provides a short survey of other audio compression technologies and MPEG specs.
* Digital video is a logical extension of digital imagery, with a sequence of images, or "video frames", displayed in a rapid sequence to give the impression of motion. Typical video frame rates from conventional analog television are 30 frames per second for the US, and 25 frames per second for Europe.
Conventional analog television also uses "interlace", which means that video frames are scanned twice, with alternating lines of the image sent on each scan, reducing flickering between frames while keeping bandwidth requirements low. Interlace gives the effect of viewing 60 or 50 frames per second with a true video frame rate of 30 or 25 frames per second. New digital TV schemes actually allow displaying 60 or 50 frames per second in full detail.
Images take up a lot of memory; video obviously implies a multiplication of the storage or bandwidth requirements by the video frame rate, and so video data compression is a necessity. Video compression can be performed by compressing each frame as though it were a stand-alone image and this is in fact what Motion JPEG does, but even JPEG by itself can't give enough compression to be really useful.
However, video does not change very much from frame to frame, and in particular background imagery changes very slowly. This means that much greater compression ratios are possible if only the differences between frames are stored, using some form of delta modulation. The H.261, MPEG-1, and MPEG-2 schemes, which have an evolutionary relationship, use JPEG-like schemes to compress individual frames, but use delta modulation between frames to achieve worthwhile compression ratios. Schemes like Motion JPEG that only compress frame-by-frame are generally used in special environments, such as video editing systems.
* While MPEG is by far the predominant form of video compression, it owes much to the earlier ITU H.261 specification, and so it is worthwhile to understand H.261.
H.261 was designed to support videotelephony and videoconferencing. It supports two video resolutions, 144 by 176 pixels, and 288 by 352 pixels. It does not support "interlaced" video. H.261 is sometimes called "p*64", since it is intended to be used to support data rates in multiples of 64 kilobits per second (KBPS), with the multiplier "p" ranging from 1 to 30, depending on the quality desired for the video transmission. H.261 borrows much from JPEG in using decimation, DCT, quantization, and Huffman coding to compress individual frames, though H.261 uses fixed quantization tables.
H.261 performs horizontal and vertical (4:2:0) decimation on the chrominance data. This requires that H.261 organize video into groups of four 8x8 blocks at a time for processing, defining a 16x16 "macroblock". The luminance data for a macroblock is sent as four 8x8 luminance blocks, while the decimated chrominance data for the entire macroblock is sent as one 8x8 block for each of the two chrominance components.
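The chrominance decimation described above can be sketched in a few lines of Python. This is only an illustration: it assumes the decimation is done by averaging each 2x2 group of samples, which is one simple choice and not something the H.261 spec dictates, and the function name "decimate_420" and the test data are made up for the example.

```python
def decimate_420(plane):
    """Halve a chroma plane horizontally and vertically.

    Each 2x2 group of samples is replaced by its integer average.
    Averaging is an assumption for this sketch; real coders may filter
    differently.
    """
    rows, cols = len(plane), len(plane[0])
    return [
        [
            (plane[r][c] + plane[r][c + 1]
             + plane[r + 1][c] + plane[r + 1][c + 1]) // 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# One chrominance component for a 16x16 macroblock (made-up sample values).
chroma_16x16 = [[(r + c) % 256 for c in range(16)] for r in range(16)]
chroma_8x8 = decimate_420(chroma_16x16)
print(len(chroma_8x8), len(chroma_8x8[0]))
```

The 16x16 plane of chrominance samples shrinks to a single 8x8 block, which is why a macroblock needs four luminance blocks but far less chrominance data.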
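H.261 uses a form of delta modulation between frames known as "predictive coding". H.261 uses two types of image frames: "intra (I)" frames, which are compressed as stand-alone images, and "predicted (P)" frames, which are encoded as differences from the previous frame.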
The I frames are inserted in a stream of P frames to provide references and an access point into the sequence of images. The H.261 decoder stores the last decoded frame in a buffer and uses it as a reference to reconstruct, or "predict", a new frame.
Along with delta modulation, H.261 incorporates a "motion compensation" algorithm. Without motion compensation, if an entire scene shifts slightly, for example if the camera pans from one side to another, all the pixels in the image more or less change, even though most of the information in the image remains the same. Motion compensation determines that the image has shifted and simply sends along X and Y shift factors, or "motion vectors", with a macroblock, reducing the amount of information that has to be sent to reconstruct the delta-modulated image.
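How a coder finds a motion vector is not defined by the spec, but one common textbook approach is an exhaustive "block matching" search. The following Python sketch assumes that approach, using the sum of absolute differences as the match measure. The function names, the tiny 4x4 block size, the search range, and the sign convention for the vector are all assumptions made for illustration.

```python
def sad(ref, cur, rx, ry, cx, cy, size):
    """Sum of absolute differences between a block in ref and one in cur."""
    return sum(
        abs(ref[ry + dy][rx + dx] - cur[cy + dy][cx + dx])
        for dy in range(size)
        for dx in range(size)
    )


def find_motion_vector(ref, cur, cx, cy, size=4, search=2):
    """Find the (x, y) shift into ref that best matches the block at
    (cx, cy) in cur, trying every shift within +/- search pixels."""
    height, width = len(ref), len(ref[0])
    best_vector, best_cost = (0, 0), None
    for my in range(-search, search + 1):
        for mx in range(-search, search + 1):
            rx, ry = cx + mx, cy + my
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(ref, cur, rx, ry, cx, cy, size)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (mx, my), cost
    return best_vector, best_cost


# Made-up reference frame, and a current frame panned right by one pixel.
reference = [[(7 * x + 13 * y) % 31 for x in range(12)] for y in range(12)]
current = [[row[max(x - 1, 0)] for x in range(12)] for row in reference]
vector, cost = find_motion_vector(reference, current, cx=4, cy=4)
print(vector, cost)
```

For this test data the search finds a shift of one pixel to the left with a cost of zero, meaning the block can be described by the motion vector alone with no delta information at all.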
A P frame will not always be completely made up of macroblocks that contain DCT delta information and motion vectors. If a macroblock of a video frame encoded in a P frame is too different from that in the previous video frame to be described by deltas or motion vectors, the macroblock will not use them, and in fact will be the same as it would be in an I frame.
H.261 also defines an error correction scheme for transmission of the encoded video information, to ensure more reliable delivery. In addition, there is a follow-on H.263 specification that extends H.261 to low-bandwidth systems and adds some other bells and whistles.
* In the late 1980s, industry interest in compressing video and audio for multimedia led to the formation of a standards committee under the umbrella of the ISO. The committee took the informal name of the "Moving Picture Experts Group (MPEG)", as mentioned earlier.
The original objective of the MPEG committee was to specify a scheme for delivery of audio and video at a data rate of 1.86 megabits per second (MBPS). This resulted in a specification now called "MPEG-1", which was followed by an enhanced specification, "MPEG-2". The main focus of these two specifications is video compression, but audio compression is included as well.
As with JPEG, the MPEG specifications do not describe specific implementations. They describe the format of the compressed data, or "bitstream", and how it is to be decompressed. How it is compressed in the first place is not specifically addressed. This section discusses MPEG-1 video compression, the following sections discuss MPEG-1 audio compression, and then the MPEG-2 extensions are described.
* MPEG-1 builds on H.261 in something like the way that H.261 builds on JPEG. For US video applications, the MPEG-1 video stream generally involves video imagery with a resolution of 352 x 240 pixels, with 24-bit color and a rate of 30 video frames per second. Higher resolutions, up to 4,096 by 4,096 pixels, can be accommodated as long as the total bandwidth remains under 1.86 MBPS.
For European video applications, the usual resolution is 352 by 288 pixels, with a rate of 25 frames per second. MPEG-1 uses horizontal and vertical (4:2:0) decimation for chrominance information.
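To put these numbers in perspective, the following Python sketch works out the uncompressed bit rate for the US and European formats and compares it to the 1.86 MBPS target. It assumes the full 24 bits per pixel quoted above, before any chrominance decimation. The function and constant names are made up for the example.

```python
def raw_bit_rate(width, height, frames_per_second, bits_per_pixel=24):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


MPEG1_TARGET_BPS = 1.86e6  # the 1.86 MBPS figure quoted in the text

us_rate = raw_bit_rate(352, 240, 30)
eu_rate = raw_bit_rate(352, 288, 25)
print(us_rate, eu_rate)
print(round(us_rate / MPEG1_TARGET_BPS, 1))
```

Both formats come out to the same raw rate, about 60.8 MBPS, so fitting either into 1.86 MBPS implies an overall compression ratio of roughly 33:1.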
One significant difference between H.261 and MPEG-1 is that while MPEG-1 supports the I and P frames, it also supports a "bidirectional prediction (B)" frame that is derived from the previous and following frames. The B frame was added to support "rewind" functionality. Interestingly, the frames are not necessarily sent in consecutive order; they can be collected by the decoder and then displayed in the proper sequence.
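The reason for the out-of-order delivery is that a B frame can't be decoded until both of the frames it depends on have arrived. A minimal Python sketch of the reordering follows, assuming the simple rule that each run of B frames is sent right after the I or P frame that follows it in display order. The frame labels and the function name are hypothetical.

```python
def display_to_coded_order(frames):
    """Reorder frames so every B frame follows both of its references.

    frames is a list of labels such as "I0", "B1", "P3", where the first
    character gives the frame type (a made-up labeling for this sketch).
    """
    coded = []
    pending_b = []
    for frame in frames:
        if frame[0] == "B":
            pending_b.append(frame)   # hold back until the next reference
        else:
            coded.append(frame)       # send the I or P frame first
            coded.extend(pending_b)   # then the B frames that needed it
            pending_b = []
    coded.extend(pending_b)           # trailing B frames, if any
    return coded


display_order = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(display_to_coded_order(display_order))
```

For the display sequence shown, the frames go out as I0, P3, B1, B2, P6, B4, B5, and the decoder puts them back in display order.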
* As mentioned above, MPEG-1 was intended for relatively low-bandwidth video data streams, such as would be obtained from a multimedia CD-ROM, and so its features for audio compression proved well suited for use on the Internet. The MPEG-1 specification described three levels of audio compression, designated "Layer 1", "Layer 2", and "Layer 3", the last of which went into widespread use as "MP3".
* The general specifications for MPEG-1 audio compression state that MPEG-1 supports sample rates of 48 kHz, 44.1 kHz, and 32 kHz, and can provide audio at a rate of up to 384 KBPS. Audio data compression ratios range from 2.7:1 to 24:1. MPEG-1 advocates claim that expert listeners cannot tell the difference between uncompressed music and music that has been compressed under MPEG to a compression ratio of 6:1 or less, though some audiophiles strongly disagree.
Two audio channels are supported in one of four modes:
Compression techniques for each of the three MPEG-1 audio layers are similar, but coder complexity and typical compression ratio increase with higher layers. Layer 3, for example, gives higher compression than Layer 2, which in turn gives higher compression than Layer 1. Tests show that Layer 1 gives a typical 4:1 compression ratio for an audio signal with a sampling frequency of 48 kHz, while Layer 2 gives typical compression of 6:1 and Layer 3 gives 12:1.
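The typical compression ratios quoted above can be turned into bit rates with a little arithmetic. The Python sketch below assumes a 48 kHz stereo source with 16-bit samples. The 16-bit stereo format is my assumption for the example, not something the ratios above specify.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample=16, channels=2):
    """Bit rate of uncompressed audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


source_bps = pcm_bit_rate(48000)
typical_ratio = {"Layer 1": 4, "Layer 2": 6, "Layer 3": 12}
layer_kbps = {
    layer: source_bps / ratio / 1000 for layer, ratio in typical_ratio.items()
}
print(source_bps, layer_kbps)
```

Under these assumptions the source runs at 1,536 KBPS, and the three layers come out to 384, 256, and 128 KBPS respectively. The Layer 1 figure matches the 384 KBPS maximum rate mentioned earlier.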
There is a degree of compatibility in the layers, in that a Layer 3 decoder can decompress audio compressed by Layer 1 or Layer 2 coders, and a Layer 2 decoder can decompress audio compressed by a Layer 1 coder. However, it doesn't work the other way: a Layer 1 decoder can't handle audio compressed by Layer 2 or Layer 3 coders, and a Layer 2 decoder can't handle audio compressed by a Layer 3 coder.
MPEG-1 audio compression leverages off two earlier audio-compression specifications named "ASPEC" and "MUSICAM". In fact, MPEG Layer 2 is essentially identical to MUSICAM.
* MPEG-1 compression is based on a concept called "perceptual" coding, in contrast to "waveform" coding. Waveform coding is the brute-force approach to coding audio, involving taking a series of digital samples at 8-bit, 16-bit, 32-bit, or whatever resolution, with the samples taken at a rate at least twice that of the highest frequency that is to be retained.
By the way, the rule that the samples must be taken at twice the highest audio frequency is known as the "Nyquist-Shannon criterion", though it's usually called the "Nyquist criterion". For example, the audio data on a standard music CD is stored as 16-bit samples taken at a rate of 44.1 kilohertz (kHz), and so in principle the highest frequency in the stored audio is 22.05 kHz.
Obviously, music stored using waveform coding takes up a lot of space, particularly for stereo, since two audio streams must be recorded, doubling the storage and bandwidth requirements. It is possible to achieve some compression using delta modulation.
Delta modulation can be performed by giving the full deltas between samples, using a lossless encoding scheme like Huffman coding to improve compression, or by sending one bit at a time. With the one bit approach, the 1 or 0 indicates if signal level has increased by one increment or decreased by one increment. This implies that the sampling rate has to be well above the maximum rate of change of the signal, since if the signal changes abruptly, its coded digital equivalent can only change at one step per sample clock until it catches up with analog reality.
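The one-bit approach is simple enough to sketch in Python. This is a bare-bones illustration with a fixed step size, and the function names and sample values are made up.

```python
def delta_encode(samples, step=1, start=0):
    """One-bit delta modulation: 1 means step up, 0 means step down."""
    bits = []
    estimate = start
    for sample in samples:
        if sample >= estimate:
            bits.append(1)
            estimate += step
        else:
            bits.append(0)
            estimate -= step
    return bits


def delta_decode(bits, step=1, start=0):
    """Rebuild the staircase approximation from the bit stream."""
    output = []
    estimate = start
    for bit in bits:
        estimate += step if bit else -step
        output.append(estimate)
    return output


# A signal that sits at 0 and then jumps abruptly to 5.
samples = [0, 0, 5, 5, 5, 5, 5, 5]
bits = delta_encode(samples)
print(bits)
print(delta_decode(bits))
```

The decoded output climbs one step per sample after the jump, taking five samples to reach the new level. This is the catch-up problem described above.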
Compression in delta modulation can also be improved by a process known as "companding", or "compressing-expanding". The human ear doesn't pick up changes in loud signals with the same resolution that it does for changes in weak signals. This means that the signal can be distorted to record loud signals at coarse resolution and weak signals at fine resolution. The signals can be restored to their actual range on output.
Companding can be performed with analog electronic circuits outside the
digital subsystem, or by digital computations within it. Companding is often
used in telephone systems to make best use of available telephone line
bandwidth. In the US and Japan, the "mu-law" companding scheme is used for
telephone communications, and is defined by the formula:
Vout = Vmax * LN( 1 + mu * Vin / Vmax ) / LN( 1 + mu )
Current digital telephone systems use a value for mu of 255. Europe uses a
similar "A-law" companding scheme:
IF (( Vin / Vmax ) <= ( 1 / A )) THEN
Vout = Vmax * ( A * Vin / Vmax ) / ( 1 + LN(A) )
ELSE
Vout = Vmax * ( 1 + LN( A * Vin / Vmax )) / ( 1 + LN(A) )
END IF
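The two companding formulas translate directly into Python. In the sketch below, the handling of negative inputs (compress the magnitude, then restore the sign) is my assumption, since the formulas above are written for positive signals. The A-law default of 87.6 is also an assumption, being the value commonly quoted for European telephony; the text above doesn't give a value for A.

```python
import math


def mu_law(vin, vmax=1.0, mu=255.0):
    """Mu-law compression of vin, following the formula in the text."""
    x = abs(vin) / vmax
    vout = vmax * math.log(1 + mu * x) / math.log(1 + mu)
    return math.copysign(vout, vin)


def a_law(vin, vmax=1.0, a=87.6):
    """A-law compression of vin; a=87.6 is an assumed typical value."""
    x = abs(vin) / vmax
    if x <= 1 / a:
        y = a * x / (1 + math.log(a))
    else:
        y = (1 + math.log(a * x)) / (1 + math.log(a))
    return math.copysign(vmax * y, vin)


for level in (0.01, 0.1, 0.5, 1.0):
    print(level, round(mu_law(level), 3), round(a_law(level), 3))
```

Both curves boost weak signals strongly while leaving full-scale signals at full scale. For example, a mu-law input at a tenth of full scale comes out at about 0.59 of full scale.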
As mentioned previously, if an analog signal that is being delta-modulated
changes too abruptly, the delta modulator will have to count up or down one
analog increment at a time until it catches up with reality, resulting in a
string of bits like "11111..." or "00000...". The game of catch-up can be
simplified if the analog-to-digital conversion system can recognize a long
string of 1s or 0s, and then change the analog increment size accordingly.
This approach is known as "adaptive differential pulse-code modulation (ADPCM)". A typical scheme is to change the increment size to 1.5 times normal if three 1s or 0s happen in a row, and then change it back to normal when the bit value changes. Companding and ADPCM schemes are said to eliminate "redundant" data.
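The adaptive rule described above, where the increment grows to 1.5 times normal after three identical bits in a row, can be sketched in Python as follows. This is my own reading of the rule, not a reference implementation of any standard, and all of the names are hypothetical.

```python
def _step(bits, base_step, boost, run):
    """Use the boosted step while the last `run` bits are all the same."""
    recent = bits[-run:]
    boosted = len(recent) == run and len(set(recent)) == 1
    return base_step * boost if boosted else base_step


def adaptive_delta_encode(samples, base_step=1.0, boost=1.5, run=3, start=0.0):
    bits, estimate = [], start
    for sample in samples:
        bits.append(1 if sample >= estimate else 0)
        step = _step(bits, base_step, boost, run)
        estimate += step if bits[-1] else -step
    return bits


def adaptive_delta_decode(bits, base_step=1.0, boost=1.5, run=3, start=0.0):
    output, estimate, seen = [], start, []
    for bit in bits:
        seen.append(bit)
        step = _step(seen, base_step, boost, run)
        estimate += step if bit else -step
        output.append(estimate)
    return output


samples = [10.0] * 8  # an abrupt jump from 0 to 10
bits = adaptive_delta_encode(samples)
print(bits)
print(adaptive_delta_decode(bits))
```

With the fixed increment of 1, eight samples would only get the output up to 8. With the adaptive increment the output passes 10 on the eighth sample. The decoder applies the same rule to the bits it receives, so no extra information needs to be sent.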
* Perceptual coding, in contrast to waveform coding, seeks to compress audio data by understanding the way people hear music, and only storing the parts of it that people really hear. This is what companding does in a simple sort of way, but there are more sophisticated approaches available, and these approaches are used in perceptual coding.
Waveform coding is simple in terms of hardware and software requirements. Perceptual coding requires much more "smarts" and is more computationally intensive. Perceptual coding deals with a sound's "psychoacoustic" properties, which define how a person hears the sound. The most important psychoacoustic effect is known as "auditory masking", in which faint elements of a sound are simply not heard if similar but louder elements are present in the sound.
The unheard elements are "irrelevant". If the irrelevant elements can be identified when the sound is recorded, they can be thrown away and do not need to be stored. In MPEG-1 audio compression, this trick is performed by a "perceptual sub-band coder".
* To begin MPEG-1 audio compression, the analog audio signal is first digitized at a typical sampling rate, such as 32 kHz, 44.1 kHz, or 48 kHz. The sampled audio data is then divided into audio "frames" of a fixed number of samples. The MPEG-1 audio frame size is 384 samples, which for 44.1 kHz sampling gives an interval of 8.71 milliseconds. However, while Layer 1 deals with one audio frame at a time, Layer 2 and 3 work with three frames at a time, including a current, previous, and next frame, for a total of 1,152 samples.
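The frame intervals follow directly from the frame size and the sampling rate, as this small Python calculation shows. The function name is made up for the example.

```python
def frame_duration_ms(samples_per_frame, sample_rate_hz):
    """Time spanned by one audio frame, in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz


for rate in (32000, 44100, 48000):
    print(rate,
          round(frame_duration_ms(384, rate), 2),
          round(frame_duration_ms(1152, rate), 2))
```

At 44.1 kHz sampling, the 1,152 samples used by Layer 2 and Layer 3 span about 26.12 milliseconds.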
In MPEG-1 audio compression, each individual frame is converted from a time-domain signal to a frequency-domain spectrum by an FFT, or more often a DCT; the spectrum is modified according to rules of perceptual coding; and then the spectrum is converted back into a time-domain signal by an inverse FFT or DCT.
The modification of spectral signals can be performed simply by multiplying, or "masking", the harmonic values to selectively eliminate, reduce, or enhance them. For example, if we wanted to eliminate one spectral term, we would simply multiply it by zero and "mask it out". In general, the range of masking values defines a spectrum of its own with the same number of terms as the data spectrum under consideration.
This masking or "filter" spectrum is multiplied against the data spectrum on
an individual element by element basis, which is a simple operation. For
example, given the data spectrum:
[plot: data spectrum, harmonic amplitude versus frequency f]
-- and a filter spectrum defining a low-pass filter:
[plot: low-pass filter spectrum, amplitude versus frequency f]
-- then multiplying the two together gives the filtered spectrum:
[plot: filtered spectrum, amplitude versus frequency f]
When the filtered spectrum is converted back into its time domain equivalent
by the reverse transform operation, the high frequencies will be gone.
This illustration assumes that the filter spectrum is a fixed pattern, but as will be explained, in MPEG-1 audio compression the filter spectrum is determined by an algorithm, where a set of harmonics in the frequency-domain data spectrum is kept or masked out, based on a consideration of neighboring frequencies in the data spectrum.
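The fixed low-pass case from the illustration can be sketched in Python. To keep the example self-contained it uses a slow, direct discrete Fourier transform. That is a substitute for the FFT or DCT mentioned above, chosen only for brevity, and the function names and test signal are made up.

```python
import cmath
import math


def dft(x):
    """Direct (slow) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)).real / n
        for t in range(n)
    ]


def low_pass(x, keep):
    """Multiply the spectrum by a mask that zeroes harmonics above keep."""
    n = len(x)
    # Bins k and n - k hold the same harmonic, so mask them together.
    mask = [1.0 if min(k, n - k) <= keep else 0.0 for k in range(n)]
    return idft([s * m for s, m in zip(dft(x), mask)])


n = 16
low_tone = [math.sin(2 * math.pi * t / n) for t in range(n)]
high_tone = [0.5 * math.sin(2 * math.pi * 6 * t / n) for t in range(n)]
signal = [a + b for a, b in zip(low_tone, high_tone)]
filtered = low_pass(signal, keep=2)
print(max(abs(f - a) for f, a in zip(filtered, low_tone)))
```

The filtered output matches the low tone to within rounding error, with the high tone masked out. In MPEG-1 audio the mask values would come from the psychoacoustic model and not from a fixed pattern like this one.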
* Now we can discuss the operation of the MPEG perceptual subband coder. After converting the audio frame to a spectrum, the coder then divides the spectrum into 32 parts, or "subbands", each with 12 samples. Since the top harmonic is half the frequency of the sample rate, then for 44.1 kHz sampling, each subband has a bandwidth of 689 Hz. This bandwidth is an approximation of what is known as a "critical bandwidth", which represents a frequency range that people tend to hear as a single frequency, and so the subbands are referred to as "critical bands".
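The subband figures above come from simple division, shown here as a short Python calculation. The function name is made up for the example.

```python
def subband_layout(sample_rate_hz, frame_samples=384, subbands=32):
    """Return (bandwidth of each subband in Hz, samples per subband)."""
    top_frequency_hz = sample_rate_hz / 2  # half the sample rate
    return top_frequency_hz / subbands, frame_samples // subbands


print(subband_layout(44100))
print(subband_layout(48000))
```

For 44.1 kHz sampling this gives 12 samples per subband and a bandwidth of a shade over 689 Hz, while 48 kHz sampling gives subbands 750 Hz wide.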
There are two primary forms of auditory perceptual masking, known as "frequency masking" and "temporal masking". In frequency masking, a faint sound cannot be heard if it exists at the same time as a loud sound at a nearby frequency, much in the same way that a whistle can't be heard over the sound of a siren at the same frequency. In temporal masking, a faint sound cannot be heard if it occurs after a louder signal, much in the same way that a faint beep won't be heard if it occurs after the blast of a foghorn.
In MPEG Layer 1 audio coding, the psychoacoustic model only deals with frequency masking. This is because it only operates on a single 384 sample audio frame, and so has no "temporal" knowledge of what happened before or after the time interval of that sample frame (8.71 milliseconds for 44.1 kHz sampling).
The model determines the sound energy in each of the 32 frequency subbands. If one subband is below a certain level, or "masking threshold", relative to its neighbor, then there is no need to store it, as frequency masking in principle ensures that it won't be heard. That subband is marked as not present. A "frame packer" then takes the data and produces it in a tidy output bitstream.
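The idea of marking subbands as not present can be illustrated with a toy Python sketch. It is drastically simpler than a real MPEG psychoacoustic model. It assumes a subband is dropped if it is more than a fixed number of decibels quieter than the louder of its immediate neighbors, and the 20 dB figure, the function name, and the energy values are all made up for the example.

```python
import math


def mark_subbands(energies, threshold_db=20.0):
    """Return a list of flags, True where a subband should be kept.

    A subband is dropped when its louder immediate neighbor exceeds it
    by more than threshold_db (an arbitrary value for this sketch).
    """
    keep = []
    for i, energy in enumerate(energies):
        neighbors = energies[max(0, i - 1):i] + energies[i + 1:i + 2]
        loudest = max(neighbors, default=0.0)
        if energy <= 0:
            keep.append(False)  # nothing there to store
        elif loudest > 0 and 10 * math.log10(loudest / energy) > threshold_db:
            keep.append(False)  # masked by a much louder neighbor
        else:
            keep.append(True)
    return keep


subband_energies = [1000.0, 1.0, 50.0, 40.0]
print(mark_subbands(subband_energies))
```

Here the second subband sits 30 dB below its loud neighbor and is marked as not present, while the other three are kept.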
MPEG Layer 2 audio compression provides a better psychoacoustic model that features both frequency and some temporal masking. Layer 2 can perform temporal masking because it deals with a current, previous, and next audio frame, allowing it to perform temporal comparisons.
MPEG Layer 3 uses three audio frames as well, and provides better temporal masking. It also includes more efficient critical subband filtering that uses variable numbers of samples for different subbands. This feature is based on the fact that the number of samples required in subbands tends to change with frequency, and so allocating an equal number of samples for each subband is inefficient. Layer 3 can also identify redundancies between stereo sound channels, and performs Huffman coding on the output to enhance compression.
Each higher layer, as pointed out earlier, provides higher compression ratios. However, they also involve higher "codec (coder-decoder)" delays, or in other words a higher delay between sound coding and decoding. This may be hundreds of milliseconds for Layer 3 compression, above and beyond any delays in data storage and retrieval or data transmission. For recorded audio, this is of no real importance, but it is a concern for realtime audio transmission.
* After introduction of the MPEG-1 specification, the MPEG committee was confronted with new requirements, and began work on follow-on specifications designated MPEG-2 and MPEG-3. These specifications focused on the coding, storage, and transmission requirements for digital television (DTV), with data streams operating at up to 20 MBPS.
MPEG-3 was originally intended to focus on high-definition digital TV (HDTV) requirements, but MPEG-2 specs turned out to logically encompass MPEG-3 as well, and so MPEG-3 was abandoned as a distinct effort. MPEG-2 has gone on to become the video standard for US DTV and DVD (Digital Video/Versatile Disk) technology. Incidentally, neither US DTV nor DVD uses an MPEG audio scheme, both preferring the "Dolby Digital" audio standard, discussed later in this chapter.
The 20 MBPS data rate that underlies the MPEG-2 specification is far faster than most users will have available on the Internet for years, or even decades, and so MPEG-2 did not replace MPEG-1. MPEG-1's 1.86 MBPS data rate is still much higher than most Web surfers have access to, but the lower data rate implies more compact files that are easier to download than those that would result from MPEG-2 compression.
MPEG-2 is conceptually identical to MPEG-1 but offers additional features. It adds support for video resolutions of up to 16,383 by 16,383 pixels and for 4:2:2 horizontal-only decimation for chroma information, and it has specific support for 24 frame per second movie video, better handling of interlaced video, and an enhanced audio specification.
MPEG, like JPEG, allows a tradeoff of compression ratio for image quality, but MPEG enthusiasts state that the optimal compression ratio ranges from 8:1 to 30:1. US broadcast high definition TV (HDTV) requires a compression ratio of 51:1 to fit through a 6 MHz TV broadcast channel, but it appears that compression ratios tend to improve for higher-resolution imagery.
* MPEG-2 supports two audio compression schemes, one which is "backward compatible" and so known as "MPEG-2 BC", and another which is "not backward compatible" and so was originally referred to as "MPEG-2 NBC". However, the NBC format is now generally referred to as "Advanced Audio Coding (AAC)".
MPEG-2 BC is essentially a superset of MP3 with a number of extensions, particularly for multichannel audio, and is also known as "MPEG-2 Multichannel" audio. The extensions include:
MP3 audio can be decoded by an MPEG-2 BC audio decoder, and if the extensions aren't used, MPEG-2 BC audio can be decoded by an MP3 decoder.
* The MPEG-2 AAC scheme is derived from the MP3 scheme, but features improvements in implementation to provide a better tradeoff between compression and audio quality. These improvements completely break compatibility with MP3.
AAC has a revised filter bank scheme, based on a "modified" DCT, that works on 2,048 rather than 1,152 samples at a time, and also makes use of "predictive" algorithms to improve audio compression. A predictive algorithm is essentially an extension of delta modulation. Delta modulation compresses data by only encoding the changes or "deltas" between the current value and the next value. Predictive algorithms actually try to predict the next value from earlier values, compressing data by encoding it as "error deltas" between the predicted next value and the actual next value. Unsurprisingly, predictive algorithms tend to provide greater compression than delta modulation if the data being compressed tends to be predictable, and this condition is true for audio waveforms.
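The difference between plain deltas and "error deltas" can be shown with a small Python sketch. The predictor here is a simple straight-line extrapolation from the last two values, which I've chosen only for illustration. It is not the predictor AAC actually uses, and the function names and sample data are made up.

```python
def predict(history):
    """Guess the next value by extending a line through the last two."""
    if not history:
        return 0
    if len(history) == 1:
        return history[0]
    return 2 * history[-1] - history[-2]


def delta_residuals(samples):
    """Plain delta coding: difference from the previous value."""
    return [samples[0]] + [
        samples[i] - samples[i - 1] for i in range(1, len(samples))
    ]


def predictive_residuals(samples):
    """Error deltas: difference from the predicted value."""
    return [s - predict(samples[:i]) for i, s in enumerate(samples)]


def predictive_reconstruct(residuals):
    """Undo predictive_residuals by running the same predictor."""
    output = []
    for residual in residuals:
        output.append(predict(output) + residual)
    return output


samples = [0, 3, 6, 9, 12, 15]  # a steadily rising signal
print(delta_residuals(samples))
print(predictive_residuals(samples))
```

For the steadily rising signal, plain deltas give a string of 3s, while the error deltas drop to zero once the predictor has locked on. Long runs of zeroes and small values compress well with a scheme like Huffman coding, and the reconstruction is exact because the decoder runs the same predictor.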
AAC features a number of extensions in capability:
AAC provides much superior compression to MP3 or MPEG-2 BC. It has been slow to catch on, however, and some have dismissed it as a dead issue, but the new MPEG-4 scheme does incorporate AAC, as described below.
* One of the other important audio compression schemes is "Dolby Digital", created by Dolby Labs and used as the audio format for movie theaters, as well as US DTV and DVDs. When used with film projectors, the digital data for the Dolby Digital stream is "printed" directly onto the film strip, between the sprocket holes, and is read off by a CCD sensor. Dolby Digital was introduced in 1992, with the theater release of BATMAN RETURNS.
Dolby Digital is based on "AC-3", a perceptual coding scheme, and can support a wide range of audio streams, from a single mono channel to "5.1" surround sound. 5.1 surround sound includes left, center, and right front channels; left and right rear surround channels; and a bass boost channel. Dolby Digital tends to be a specification for high-end applications, while MP3 is better suited to low-end applications.
* Microsoft has been pushing their own audio compression format, designated "Windows Media Audio (WMA)", as an alternative to MP3. It is supported by default on Microsoft Windows, though other users have to obtain a license. Microsoft claims that WMA provides higher compression ratios than MP3 with comparable quality, but to no surprise this claim is hotly disputed by the company's critics, who see its virtues as inflated and its rationale mostly being to push a Microsoft proprietary scheme onto the public. The fact that WMA provides digital-rights protection also does not endear the scheme to users, though Microsoft promotes that feature as a virtue to music providers. Whether anyone likes it or not, the predominance of Microsoft has ensured that WMA is a popular format, supported by most or all digital music players.
* Dolby Digital has become the standard for theater-class audio, while handheld digital music players and digital jukeboxes seem to be based mostly on MP3 and WMA for the moment. However, there is a bewildering range of other audio-compression standards in use or development, and it is hard to know how seriously to take them all.
Some have established themselves in niches. For example, RealNetworks has a scheme called "RealAudio" that is often used in Internet online streaming music stations. Implementation details are not clear, but RealAudio is said to be optimized for low bit rates, as might be found on the Internet. It is also said to be somewhat computation-intensive, and does not seem likely to catch on outside of its niche. Sony has a scheme named "Adaptive Transform Audio Coding (ATRAC)", currently up to the ATRAC3 spec, but it is almost strictly proprietary, essentially only supported on Sony gear.
There are a number of audio-compression schemes that are competing to succeed MP3 and compete with WMA. One of the most interesting is being implemented by the "Xiph.org Foundation", a group with an effort named "Ogg" to create public-domain, non-proprietary, free and open compression and multimedia specifications. They have created an audio compression scheme known as "Vorbis", after a character in a novel by the fantasy humorist Terry Pratchett, as part of the Ogg effort. Incidentally, "Ogg" is not an acronym, it's an obscure slang word taken from a network computer game. It's also the file format for Vorbis, and Vorbis files end with ".ogg".
The actual impetus for creating Vorbis was the looming threat of patent actions from various organizations with their fingers in MP3, and so a group of developers decided to come up with a solution that would not be trapped on such a leash. The first formal Vorbis specification was released in mid-2002. It is promoted as superior in fidelity and compression ratios to MP3 and the fact that it is free gives it a good selling point. Many new video game titles have already adopted Vorbis. Vorbis can compress a standard 44.1 kHz CD-sound stream into output ranging from 64 to 400 KBPS, depending on the quality setting.
The group is also working on a speech compression scheme named "Ogg Speex"; a lossless audio compression scheme under the designation of "Free Lossless Audio Codec (FLAC)"; and a video compression scheme named "Ogg Theora". None are available at this time. Some compression enthusiasts speak highly of the Vorbis group's effort, while others have accused them of over-promotion and under-delivery. It seems likely that Ogg's efforts and other audio compression standards will shake themselves out over the next few years.
* There are also compression schemes that are strictly defined for voice communications. One such scheme, often used in cellphones, is known as "Vector Sum Excited Linear Prediction (VSELP)".
In VSELP, speech is modeled by a mathematical "engine" that simulates the operation of the human vocal apparatus, and is controlled by various parameter values. A codec chip in the cellphone uses VSELP to convert a voice signal into these parameters, which are then transmitted to a receiver codec that feeds them to the VSELP engine to reconstruct the speech.
One of the features of VSELP that helps achieve high compression ratios is that the voice engine's behavior can be predicted. Its present behavior can be estimated from its past behavior, and so only corrections to its predicted behavior need to be sent.
* The MPEG committee didn't stop its work at MPEG-2. The group released another specification, MPEG-4, in early 1999. MPEG-4 is not a superset of or replacement for MPEG-2. In fact, MPEG-4 isn't really even a data compression specification as such, it's essentially an interactive multimedia specification, though it does have compression schemes as elements. MPEG-4 is "object oriented", and its objects not only include objects generated by computer graphics, but web pages, video imagery, and soundtracks, that can be placed together in a three-dimensional space.
The details of MPEG-4's schemes for constructing a virtual environment are outside of the domain of a document on data compression, but the spec does include among its many elements AAC audio compression and a set of improved video compression schemes. The MPEG-4 video compression scheme uses perceptual coding to provide substantially improved compression ratios by degrading or discarding scene elements not strongly noticed by the viewer.
In 1999, Jerome Rota, a French video hacker, built a codec that incorporated an improved video compression scheme outlined in part 2 of the MPEG-4 spec, along with MP3 audio. He named the codec "DivX;-)", with the ";-)" meaning "winking with tongue in cheek". The name was a jibe at a video scheme of the name "DivX" that had been promoted by the Circuit City company. The idea behind the Circuit City DivX scheme was that a consumer would buy a video disk at a minimum cost, and then watch it on a "pay per view" basis using online authorization. The scheme was widely panned, critics saying that it looked like it had been invented by lawyers, and in any case it sank like a brick and disappeared. Since the Circuit City DivX scheme is dead, buried, and discredited beyond usefulness, Rota's "DivX;-)" has become just "DivX". Circuit City doesn't seem inclined to object, possibly on the basis that they don't want to remind anyone of their association with the concept. Apparently Mr. Rota has a very oblique sense of humor.
In any case, DivX can take a DVD, compressed in MPEG-2 and Dolby Digital, and cram it down by another factor of ten. This not only allows downloading of movies over the Internet, at least for users with a high-speed connection, but allows complete movies to be stored on a single CD-ROM, if with some degradation in quality. Since DivX is basically a tool for pirating movies, Hollywood has taken alarm, if not effective action just yet, while tens of millions of Internet users have downloaded the DivX codec for their own use.
Part 10 of the MPEG-4 spec defines another video codec, referred to as "Advanced Video Coding (AVC)" or, in an ITU context, "H.264". AVC provides better data compression than DivX, and effectively doubles the compression ratio of MPEG-2. It is being planned for use in new "high density" DVD formats and high definition TV broadcasting.
* For reasons so obscure that explanations of them don't make any sense, there never was or will be an MPEG-5 or MPEG-6 specification, but there is an MPEG-7 specification. MPEG-7 takes yet another tangent on digital data, providing a comprehensive scheme for labeling and describing multimedia, such as still images, audio, or video. Such descriptive information is known as "metadata", and MPEG-7 will simplify searches for materials on the Internet and in databases. MPEG-7 is complementary to MPEG-4, not an enhancement of it. The MPEG group is now defining yet another standard, designated "MPEG-21", which is described as a "multimedia framework" that will apparently take a more comprehensive approach to multimedia specification.
* I have no professional interest in data compression. I ran into the concepts while investigating digital TV and Internet audio schemes, and decided to write up a set of notes on the subject. It turned out to be a frustrating job, since finding materials turned out to be a major Internet scavenger hunt.
I do believe I have a reasonable grasp of the basic concepts of data compression now, but this document still leaves much to be desired, and I am certain it has blatant errors in it. I hope to refine it as energy, time, and access to information allow, but for now I've had enough of it. Any reasonable suggestions for improvements are welcome, but please realize that this site is not written by a data compression professional, nor is it intended to be read by data compression professionals.
* Most of this document was written from sketchy sources found on the Web. I literally went through hundreds of web pages and pulled bits and pieces out from various places. Documenting it all would be a nightmare, all the more so because the links break on a random basis. I also found a few sources in print, including:
* Revision history:
v1.0 / 01 may 00 / gvg
v1.1.0 / 01 may 03 / gvg / Minor cosmetic update.
v1.1.1 / 01 apr 05 / gvg / Minor cleanup, tossed out MPEG-4 details
v1.1.2 / 01 mar 07 / gvg / General cleanup.