Compression Techniques
for Great-Looking Indeo® Video
Introduction
Intel's Indeo® video is software that allows you to capture, compress, decompress and play digital video files on a desktop PC. It is available free of charge to PC users, video producers, and multimedia software developers, who can distribute it royalty-free with their applications. This document describes how to most effectively compress video using Indeo video interactive or Indeo video Release 3.2. Read this document if you plan to use Indeo video to produce multimedia applications on a PC. This paper assumes that you have installed on your PC the latest Indeo video drivers and software for video capture, editing, and playback. This document is divided into the following sections:
The Application Compression Dialog
The Application Compression Dialog
Setting the Target Data Rate
If you enter a nonzero value into the Data Rate box, then the Quality slider is ignored (whether or not it is actually grayed out). Setting the Data Rate instead of the Quality control results in files of the highest possible visual quality, at the lowest possible data rates, and with the most precise target data rate control.
If the Data Rate box is deselected, or if you enter a value of zero, then the Quality slider is enabled. The higher you set it, the better the visual quality of the resulting file. However, the data rate is also higher, because the Quality slider does not use all the codec's compression capabilities. Rather, it uses a subset of the possible compression techniques. The benefit of this approach is speed: using the Quality slider allows the Indeo codec to run faster. The tradeoff is that the resulting files will have neither the high visual quality nor the low data rate of files created using the Data Rate control.
We therefore encourage you to use the Data Rate control instead of the Quality slider. Select the box and enter a target data rate, expressed in KB per second. The codec attempts to output compressed data at an average data rate as close to the target value as possible. It usually won't be exact, because compression is very content-dependent. For example, computer-generated animation sequences consisting of simple backgrounds, low detail, and little motion typically compress extremely well; to compress such sequences, the codec might be able to achieve a data rate of 50 to 100 KB per second. However, live video sequences (of sporting events, for example) featuring complex backgrounds and lots of motion require a higher data rate. Audio data also requires its share of the data rate. Most of the time, the codec can achieve an average data rate within ten percent of the desired target.
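As a rough check on a compressed file, the average data rate can be estimated from the file size and the clip duration and then compared with the target. The sketch below is only an illustration: the function names are hypothetical, 1 KB is assumed to mean 1024 bytes, and the file size includes audio as well as video.

```python
def average_data_rate_kb(file_size_bytes, duration_seconds):
    """Average data rate in KB per second (assumes 1 KB = 1024 bytes)."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return file_size_bytes / 1024 / duration_seconds


def within_tolerance(actual_kb_per_s, target_kb_per_s, tolerance=0.10):
    """True if the actual rate is within the given fraction of the target."""
    return abs(actual_kb_per_s - target_kb_per_s) <= tolerance * target_kb_per_s


# Hypothetical 60-second clip of 12,902,400 bytes, compressed for 200 KB per second.
rate = average_data_rate_kb(12_902_400, 60.0)
print(rate, within_tolerance(rate, 200))  # 210.0 True
```

A file that falls well outside the ten percent band is a candidate for recompression with a different target or different source preparation.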
Selecting an appropriate data rate is an art that depends on a combination of factors relating to the system performance of the target playback environment. The most important of these is CD-ROM performance. Single-spin (1X) drives are theoretically capable of sustaining a data transfer rate of 150 KB per second; for double-spin (2X) drives, this rate is 300 KB per second. 3X and 4X drives are, of course, even faster.
But theoretical data rates are not ordinarily achieved in real users' systems, because the CPU must not only transfer every byte of data from the CD-ROM drive over the system bus and into system memory, but must also decode the compressed video after it gets there. Slower CPUs might use more than 50% of their available cycles just decoding and displaying the video; they'll have fewer cycles left over to transfer the data from the CD-ROM and are therefore limited to lower average data rates. Faster CPUs might use only 20-30% of their available cycles for decode and display; they'll have more cycles left over with which to transfer the data and can therefore support higher sustained data rates. Target data rates must therefore be determined through testing and experience.
Some older applications use video compressed at only 90 KB per second. Video at this data rate can play back on virtually any low-end system with even the oldest 1X CD-ROM drive. In order to get acceptable video quality at this fairly low data rate, however, the video files must be limited to a resolution of only 240 by 180 pixels, at a mere 10 frames per second (fps).
For 2X playback, many developers seem to have settled on 200 KB per second. This rate is high enough for even 320 by 240 resolution video at 15 fps to be compressed with high visual quality. It's also low enough that even low-end CPUs can play the files from a 2X CD-ROM drive without dropping frames.
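To get a feel for what these target rates demand of the codec, it can help to work out the average per-frame budget and the compression ratio relative to uncompressed video. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the source is assumed to be 24-bit RGB (3 bytes per pixel), 1 KB is taken as 1024 bytes, and the audio share defaults to zero, which is optimistic.

```python
def frame_budget_kb(target_kb_per_s, fps, audio_kb_per_s=0.0):
    """Average KB available per video frame after reserving the audio share."""
    return (target_kb_per_s - audio_kb_per_s) / fps


def raw_rate_kb_per_s(width, height, fps, bytes_per_pixel=3):
    """Data rate of uncompressed video (assumes 24-bit RGB by default)."""
    return width * height * bytes_per_pixel * fps / 1024


# The two configurations mentioned in the text.
for width, height, fps, target in [(240, 180, 10, 90), (320, 240, 15, 200)]:
    raw = raw_rate_kb_per_s(width, height, fps)
    print(f"{width}x{height} at {fps} fps, target {target} KB/s: "
          f"{frame_budget_kb(target, fps):.1f} KB per frame, "
          f"about {raw / target:.0f}:1 compression")
```

Under these assumptions, 320 by 240 video at 15 fps and 200 KB per second leaves roughly 13 KB per frame on average, before any audio is accounted for.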
Applications targeted for Pentium® processor-based systems and 4X CD-ROM drives can probably use data rates as high as 400-450 KB per second.
To determine the average data rate of an existing file in Windows* 3.x:
A dialog appears, displaying the average data rate in KB per second.
To determine the average data rate of an existing file in Windows 95*:
Selecting the Compression Method
Interleaving Audio and Video
Most video capture applications do not interleave audio and video during capture, so the frames are interleaved at an interval higher than 1:1.
To determine whether or not an existing file was interleaved at 1:1 in Windows 3.x:
A dialog box appears, displaying the File Type. If the file type is AVI Interleaved, the file is interleaved at 1:1. If the file type is AVI Default File Handler, then the file is probably not interleaved at 1:1, and therefore cannot play from a CD-ROM drive.
NOTE: This functionality has been removed from Media Player in Windows 95. Use your video editing application instead.
You can reinterleave any existing file:
Setting the Key Frame Interval
To maximize image quality while still maintaining low data rates, Indeo video uses a combination of intraframe and interframe encoding. Key frames are encoded only with respect to themselves; they are essentially still images, containing all of the visual information needed to display them. This is intraframe encoding. Delta frames do not contain all of the visual information necessary to display them, but only information representing differences from other frames. This is interframe encoding. Both types of frames are useful, and both have limitations:
The first frame of every video file must be a key frame. After the first frame, the codec intersperses key frames periodically between strings of delta frames to refresh image quality. The key frame interval controls the frequency with which key frames occur. (Indeo video interactive also allows developers to place key frames explicitly wherever they wish: at scene changes, for example, or where random user access is required.) For example, a file using the Indeo video interactive default key frame interval of 15 appears as shown in Figure 1. (Key frames are labeled with a K; delta frames with a D.)
Figure 1. Key Frames and Delta Frames in the Video File
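The periodic layout of key and delta frames can be sketched in a few lines. This is only an illustration of the labeling used above, not the codec's actual placement logic: the function name is hypothetical, an interval of 15 is assumed to mean one key frame followed by fourteen delta frames, and explicitly placed key frames are assumed not to restart the periodic count.

```python
def frame_types(n_frames, key_interval=15, extra_key_frames=()):
    """Return a string of 'K' (key) and 'D' (delta) labels, one per frame.

    extra_key_frames holds hypothetical frame indices where a key frame is
    placed explicitly, for example at a scene change.
    """
    extra = set(extra_key_frames)
    return "".join(
        "K" if i % key_interval == 0 or i in extra else "D"
        for i in range(n_frames)
    )


print(frame_types(32))
# KDDDDDDDDDDDDDDKDDDDDDDDDDDDDDKD
print(frame_types(32, extra_key_frames=[20]))
# KDDDDDDDDDDDDDDKDDDDKDDDDDDDDDKD
```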
In addition to affecting video quality, data rate control, and user access, the key frame interval also affects playback performance. When a video is being played on a system with a slow CPU or CD-ROM drive, video decode and display can lag behind audio playback. When this occurs, Video for Windows tries to speed up the video by either decoding some video frames but not displaying them, or by not decoding them at all. If a frame is neither decoded nor displayed, it is dropped. The effect of dropping a frame depends on:
If a key frame is dropped, then every subsequent delta frame will be dropped until the next key frame. Files with higher key frame intervals drop more frames. If a delta frame is dropped, the effect varies. Some delta frames can be dropped without impacting the decode of any subsequent delta frames. However, some delta frames contain information about other delta frames, and if one such delta frame is dropped then the other delta frames will be dropped as well.
The default key frame interval usually provides a good balance between video quality, data rate control, and CD-ROM playback performance. Because different types of video material present special challenges and work well at different key frame intervals, experiment with other settings to maximize video quality and playback performance of your video clip.
To determine the key frame interval of an existing file in Windows 3.x:
A dialog appears, displaying the key frame interval.
NOTE: This functionality has been removed from Media Player in Windows 95. Use your video editing application instead. You can also use the application Video Compression Sampler* to learn this and other information about a video file.
Unlike the audio and video interleave, the key frame interval of a compressed file cannot be changed without recompressing some or all of the file. You must set the key frame interval when a file is originally compressed. If you must change the key frame interval of a compressed file, it is best to recompress the file from the original source file.
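The effect of a dropped key frame, described earlier in this section, can be illustrated with a small model. It is a sketch only: the function name is hypothetical, frames are labeled K and D as in Figure 1, and only the key frame case is modeled, because the effect of dropping a delta frame varies.

```python
from typing import List


def lost_if_key_dropped(types: str, dropped: int) -> List[int]:
    """Indices of frames lost when the key frame at index `dropped` is dropped.

    The dropped key frame and every delta frame after it are lost, up to
    (but not including) the next key frame.
    """
    if types[dropped] != "K":
        raise ValueError("this model covers dropped key frames only")
    lost = [dropped]
    i = dropped + 1
    while i < len(types) and types[i] == "D":
        lost.append(i)
        i += 1
    return lost


interval_15 = ("K" + "D" * 14) * 4   # 60 frames, key frame every 15
interval_30 = ("K" + "D" * 29) * 2   # 60 frames, key frame every 30
print(len(lost_if_key_dropped(interval_15, 15)))  # 15
print(len(lost_if_key_dropped(interval_30, 30)))  # 30
```

The comparison shows why a larger key frame interval makes each dropped key frame more costly.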
Selecting CD-ROM Padding
CD-ROM padding is null data added to the end of a compressed frame of video in order to adjust its frame size. After padding, the size of each frame in the file is an exact multiple of two KB. In other words, padding rounds up the size of each frame to the next higher multiple of two KB. (It doesn't make every frame the same size; after padding, frames are still of various sizes, but all are an exact multiple of two KB.)
Even though padding makes the file slightly larger, it allows the video to play more efficiently from CD-ROM, because the data on a CD-ROM is laid out in 2 KB data sectors. When an application requests a video frame from the disc, the CD-ROM driver software commands the CD-ROM drive hardware to seek the requested data. Seek commands search for data by first locating the start of the 2 KB sector in which the data resides. If the data begins precisely at the start of the 2 KB sector, search time is minimized. Therefore, a video file is read from a CD-ROM drive with optimum efficiency if all of its frames begin exactly at the start of 2 KB sectors, which is accomplished by CD-ROM padding. The faster seek time increases the data rate you can achieve and decreases the probability of dropped frames.
Indeo video interactive, however, packs the video data in other ways and uses its own internal techniques to achieve efficient playback.
NOTE: If you are using Indeo video interactive, do not enable CD-ROM padding. CD-ROM padding adversely affects data rate with Indeo video interactive and increases the likelihood of dropped frames.
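The rounding performed by CD-ROM padding is simple arithmetic, and a short sketch makes the size overhead easy to estimate. The function name and the sample frame sizes below are hypothetical, and a 2 KB sector is taken to be 2048 bytes.

```python
SECTOR_BYTES = 2048  # one 2 KB CD-ROM data sector


def padded_size(frame_bytes):
    """Round a frame size up to the next multiple of the sector size."""
    return ((frame_bytes + SECTOR_BYTES - 1) // SECTOR_BYTES) * SECTOR_BYTES


frames = [13000, 4096, 9000]              # hypothetical compressed frame sizes
padded = [padded_size(n) for n in frames]
print(padded)                             # [14336, 4096, 10240]
print(sum(padded) - sum(frames))          # 2576 bytes of padding in total
```

A frame that is already an exact multiple of 2 KB is left unchanged, which is why padded frames still vary in size.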
The Indeo® Video Interactive Compression Dialog
Figure 2. The Indeo Video Interactive Compression Dialog
This dialog offers you a variety of special Indeo video interactive features:
These are discussed in detail below.
The Quick Compressor
If you choose Quick Compress, then several options are automatically disabled as shown in Figure 3: bidirectional prediction, transparency, quality, and local decode. The only available Quality setting is Good. In the interest of speed, the quick compressor uses a subset of possible compression techniques; you trade off speed of encoding against both image quality (which will probably be somewhat less) and the possibility of fluctuations in the data rate. This makes the quick compressor ideal for previewing your video. Your final application, however, will almost certainly require the offline encoder, both for the best image quality and for a consistent data rate.
Figure 3. The Indeo Video Interactive Dialog With the Quick Compressor Chosen
Use the quick compressor to test, prototype, or preview the appearance of your video after encoding with Indeo video interactive. It is also suitable for hard disk applications in which data rate control is less critical. However, after you've previewed the video and made any required adjustments, we recommend using the offline encoder for your final product. For the best quality at lower data rates, or to achieve a steady data rate without spikes, do not use the quick compressor. The Indeo video interactive codec can play any video clip compressed with the quick compressor. To enable quick compression, check the Quick Compress box in the Encoder Controls area.
Scalability
To enable scalability, check the Scalability box in the Encoder Controls area.
Bidirectional Prediction
Indeo video interactive uses such interframe encoding techniques, but it can also use a more sophisticated interframe encoding technique called bidirectional prediction: the contents of some frames are predicted based on both previous and future frames. Bidirectional prediction helps avoid large spikes in data rate caused by scene changes or fast movement, significantly improving image quality, particularly in video sequences involving high motion.
Bidirectional prediction adds a certain amount of playback overhead: when the current frame has been encoded based on both past and future frames, it is necessary to first decode a future frame in order to decode and display the current frame. The Indeo video interactive codec therefore decodes frames in a different order from that in which they are actually displayed. Such out-of-order decode imposes a certain amount of overhead; for a clip without much movement, such as a talking head, bidirectional prediction may not improve image quality enough to make it worth the overhead.
To enable bidirectional prediction, check the Bidirectional Prediction box in the Encoder Controls area. This option is disabled when using Quick compression.
Transparency
The encoder box offers three choices for handling transparency: first-frame analysis, alpha channel, or none.
First-Frame Analysis
Alpha Channel
None
If either method of transparency is enabled, Indeo video interactive analyzes each frame, separates the background pixels from the foreground pixels, and encodes only the foreground objects. This option is disabled when using Quick compression.
Quality
Select the appropriate choice from the pull-down menu. The only quality setting available with the quick compressor is Good.
Access Key
To use this feature:
This value is the access key that an application must supply in order to play the video clip.
Local Decode
If you plan to make use of this feature, you must specify the minimum possible size of the local decode viewport during compression. The viewport can then be enlarged and repositioned dynamically during playback.
To enable the local decode feature during video playback, specify the minimum viewport size by entering the width and height in pixels in the Width and Height fields in the Minimum Viewport Size area. To disable local decode for a given video clip, enter 0 in both of these fields. This option is disabled when using Quick compression.
Defaults
Editing Compressed Video
For example, suppose you wish to extract a sequence in which the first frame is a delta frame. The application must decompress the first frame and recompress it as a key frame, using the previous key frame from the source file as a reference. All subsequent delta frames until the next key frame must also be decompressed and recompressed as new delta frames, using the newly created key frame as a reference. But as soon as the next key frame is encountered, no further frames need to be recompressed, and applications that support limited recompression avoid doing so.
NOTE: Certain editing applications, such as Premiere and Digital Video Producer, do not show an explicit No Recompression option in the Compression dialog box. To avoid recompression of (for example) Indeo video source files, select the same Indeo video codec when saving those files back to disk. The application compares the format of the compressed input files to the selected output format and automatically refrains from recompressing the data, or (as described above) recompresses only when necessary.
Finally, it's a good idea to avoid editing files that were compressed in different formats into one clip. For example, if you join a file compressed using Indeo video with a file compressed using Microsoft's RLE compressor, and select Indeo video as the save format, the RLE data must be decompressed and recompressed as Indeo video. Avoiding recompression is impossible under these circumstances.
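The limited recompression rule from the extraction example above can be sketched as a small function that reports which frames of an extracted range need recompression. This illustrates the rule as described, not how any particular editor implements it; the function name is hypothetical and frames are labeled K (key) and D (delta).

```python
from typing import List


def frames_to_recompress(types: str, start: int, end: int) -> List[int]:
    """Indices in [start, end) that must be recompressed when extracted.

    If the range starts on a key frame, nothing needs recompression.
    Otherwise the first frame becomes a new key frame and each following
    delta frame is recompressed until the next original key frame.
    """
    if start >= end or types[start] == "K":
        return []
    changed = [start]
    i = start + 1
    while i < end and types[i] == "D":
        changed.append(i)
        i += 1
    return changed


source = ("K" + "D" * 14) * 3                    # 45 frames, key frame every 15
print(frames_to_recompress(source, 20, 45))      # frames 20 through 29
print(frames_to_recompress(source, 15, 45))      # [] (starts on a key frame)
```

In the first case only ten of the twenty-five extracted frames are touched; the rest can be copied as compressed data.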
Using Transparency or Bidirectional Prediction without Application Support
As of this writing, Adobe Premiere 4.2 supports all the features of Indeo video interactive, including transparency, bidirectional prediction, and aperiodic key frames. Ulead Media Studio Pro* 2.5 supports bidirectional prediction. You may wish to consider using one of these products. Later versions of other video editors will doubtless be enhanced to support the special features of Indeo video interactive.
Transparency without Application Support
Assume an example video file containing one hundred frames of video data, as shown in Figure 3:
Figure 3. Example Video File Without Transparency Frame
You must now create a frame that contains only the background color or range of colors which are to be rendered transparent, and manually paste it onto the beginning of the file. You can create the transparency frame in various ways: you might, for example, place the video camera on a tripod, light your background as carefully as possible to produce the smallest possible range of colors, and shoot one frame of video. After you paste in the transparency frame, your file now appears as shown in Figure 4:
Figure 4. Example Video File With Transparency Frame
Ideally, the Indeo video interactive codec would simply not return the transparency frame to the editing application. However, Video for Windows codecs are required to return a compressed frame immediately upon receiving a source frame. Therefore, the Indeo video interactive codec must return the transparency frame, even though you don't want the file to play showing this as the first frame. Editing software enhanced to support the Indeo video interactive codec ignores the transparency frame; however, an existing editor not enhanced to support the Indeo video interactive codec simply pairs the transparency frame with the first frame of the audio, and writes the audio and video data as an .AVI frame. After this initial mispairing, the first frame of video is paired with the second frame of audio, the second frame of video with the third frame of audio, and so on. The resulting file has 101 frames of video but only one hundred frames of audio, and the audio and video are out of sync by one frame. To correct this:
Bidirectional Prediction without Application Support
Assuming a 100-frame example file, the result is as shown in Figure 5:
Figure 5. Compression With Bidirectional Prediction
An editing application that doesn't support bidirectional prediction is unable to compensate for this three-frame latency when pairing video frames with audio data. Instead, it simply pairs each video frame returned by the codec with the next available audio chunk, and writes the audio and video data as an .AVI frame. Because the first frame was returned three extra times, however, the audio and video in the resulting .AVI file are out of sync by three frames. The resulting compressed file displays three unusual characteristics:
In order to avoid this situation and ensure that the editor receives the last three compressed frames, you must prepare a source file that appends three dummy frames. When the last three dummy frames are sent to the codec, the codec returns the last three frames of compressed video from the source file, and the three dummy frames are the ones that the editor inadvertently omits. In this case, the files appear as shown in Figure 6, with the dummy frames marked with a D:
Figure 6. Example File With Dummy Frames After Bidirectional Compression
The resulting compressed file contains all of its audio and video data. However, extra video data remains in the beginning of the file. Correct this by the same method described in the "Transparency" section:
Combining Transparency and Bidirectional Prediction
As before, prepare the source file with dummy frames and then edit the compressed file, deleting the first four frames of video only and saving without recompression.
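The frame bookkeeping behind these workarounds can be checked with a small simulation. It is a sketch under a simplifying assumption drawn from the description above: because the codec must return a frame for every frame submitted, it is modeled as repeating the first frame of the source for each frame of latency. The function name and the frame labels ("T" for the transparency frame, "dummy" for the appended frames) are hypothetical.

```python
def codec_output(source, latency):
    """Frames written by an editor that is unaware of the codec's latency.

    Assumed model: for each submitted frame the codec returns the frame
    submitted `latency` calls earlier, repeating the first frame until
    enough frames have been submitted.
    """
    return [source[max(0, i - latency)] for i in range(len(source))]


video = list(range(1, 101))                      # frames 1 through 100

# Bidirectional prediction alone, no dummy frames: the last three are lost.
plain = codec_output(video, latency=3)
print(plain[:5], plain[-1])                      # [1, 1, 1, 1, 2] 97

# Transparency frame plus bidirectional prediction, with three dummy frames.
prepared = ["T"] + video + ["dummy"] * 3
written = codec_output(prepared, latency=3)
print(written[:5])                               # ['T', 'T', 'T', 'T', 1]
print(written[4:] == video)                      # True
```

Under this model, deleting the first four video frames of the combined file leaves exactly the one hundred original frames, which matches the procedure given above.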
Conclusion
For help and advice on capturing and compressing Indeo video, and to obtain the latest software updates and technical information, contact the Indeo Video Developer Support group.