Data Compression for Remote Imaging Systems

Timothy S. Wilkinson and Hsieh S. Hou

Remote imaging platforms can generate a huge amount of data. Research at Aerospace has yielded fast and efficient techniques for reducing image sizes for more efficient processing and transmission.

Digital cameras for the consumer market can easily generate several megabytes of data per photo. Even with a fast computer, files of this size can be difficult to work with. Various compression techniques (such as the familiar JPEG standard) have been developed to reduce these file sizes for easier manipulation, transmission, and storage. Still, requirements for image compression in the home pale in comparison to those in remote sensing, where single images can be hundreds of times larger.

The IKONOS commercial imaging satellite, for example, can collect data simultaneously in red, green, blue, and near-infrared wavelengths. With a swath width of 7000 pixels and a bit depth of 12 bits per pixel, a relatively modest 7000-line scan generates 294 megabytes of data. Moreover, the data can be collected in less than a minute, so many such images can be generated fairly quickly. Even with powerful computers on the ground, data sets of this size would be of limited use without effective compression techniques.


variable-length code

A variable-length code to reduce the number of bits required to represent a set of symbols. In this case, the symbols to be coded are the letters in the words, "digital imaging." In the first case, because there are eight different letters, a fixed-length code requiring three bits per letter is used. With fourteen total letters, the code requires 42 bits for the representation. When it is recognized that some of the symbols occur more frequently than others ("i" occurs four times and "g" occurs three times, for example), the code length can be varied to shorten the overall number of bits in the representation. In this case, a Huffman code was used to shorten that length from 42 to 39 bits.

Image compression techniques fall into two broad classes: lossless and lossy. Lossless algorithms reduce file size but maintain absolute data integrity; when reconstructed, a losslessly compressed image is identical, bit for bit, to the original. Lossy techniques, on the other hand, allow some distortion into the image data; in exchange, they typically achieve much greater compression than a lossless approach.

As part of its research into advanced remote imaging systems, Aerospace has helped develop more powerful methods for compressing and manipulating vast amounts of digital data (see sidebar, Aerospace Contributions to Image Data Compression Techniques). While different compression strategies exist for different sources of imagery, most can be analyzed in terms of just a few building blocks. The first step is typically to transform the image to eliminate the redundancy that is inherent in any digital image. The transformed representation can then be quantized to better organize and prioritize the data. The quantized data can then be coded to reduce the overall length of the representation. These steps are perhaps easiest to explain in reverse, beginning with the coding process.

Coding

Coding is the process of converting one set of symbols into another. Morse code, for example, is a lossless mapping of the English alphabet into dashes and dots. A well designed code will usually consider the probabilities of occurrence of the source symbols before assigning the output symbols. The most probable symbol is ascribed the shortest code, while less probable source symbols are assigned a longer code. So for example, in converting a photo of Mars into binary code, a red pixel might be rendered as 01 and a green pixel as 100101; a photo of a golf course might use just the opposite conversion. The Huffman code is the most common example of this type of lossless scheme.

In the late '80s, a technique known as arithmetic coding developed as an alternative to the Huffman code. The basic principle is to represent a series of source symbols as a single symbol of much shorter length. The process is more complicated because it requires computation by both the encoder and the decoder (as opposed to simple substitutions, as in Huffman coding); however, arithmetic coders typically achieve better results.

Both Huffman and arithmetic codes generally handle only one source symbol at a time. Better results can be obtained by considering repetition of symbols as well. To characterize sequential occurrences of a single symbol, a technique known as run-length encoding is especially useful. Run-length encoding replaces a series of consecutive identical symbols by a code for that symbol along with a code representing the number of occurrences (e.g., something like "b20" for a series of 20 blue pixels). To characterize repeated occurrences of multisymbol patterns, substitution and dictionary codes are helpful. Such codes begin by compiling a table of repeated symbol sequences and assigning one or more codewords to represent each one.

Lossless codes can be applied directly to an input image and will achieve some (modest) amount of compression. Most often, though, their role is to shorten the amount of information that must be carried to represent the series of symbols produced through quantization, the previous step in the image-compression chain.

A quantizer that implements the floor function

A quantizer that implements the "floor" function with clipping. For inputs from zero to four, the output of the quantizer is the largest integer not greater than the input. Inputs less than zero are assigned a value of zero, while inputs greater than or equal to four are assigned a value of three. Thus, a possibly infinite range of input values is restricted to one of just four possibilities.

Quantization

Quantization is the process of limiting the number of possible values that a variable may assume. In its most general form, a quantizer can be thought of as a curve that specifies for every possible input value one of a smaller set of output values. For example, ten similar shades of green might be restricted to just one.

Quantization can fulfill several useful roles in an image-compression algorithm. Many coders require a finite range of possible source-symbol values for efficient operation. This is especially important where values passed to the quantizer are more or less continuous, as they might be when floating-point arithmetic is used in the transform. In lossless data compression, input to the coder must be in the form of integers, because there is no way to assign a finite-length binary code to a real or floating-point number. Quantization also allows some measure of importance (or weight) to be assigned to different types of symbols. For example, suppose that the data to be sent to the coder consists not of a raw image but its Fourier transform. The human eye is relatively insensitive to changes in content at high spatial frequencies (so a grassy lawn appears relatively uniform to the casual eye). An efficient quantizer might therefore try to represent the low-frequency coefficients of the transform (e.g., the gradual color variations in the lawn) with greater fidelity than the high-frequency coefficients, possibly with different quantizers. Thus, the less important data at high spatial frequencies might be permitted a smaller set of possible output values than the more important data at low spatial frequencies.

Quantization involves some information loss (except in the case of a "null" quantizer, which maps every input value to that same output value); therefore, a quantizer applied directly to an image can achieve some compression. For example, suppose that an image with a 12-bit range of possible values, 0–4095, is divided by 16—that is, the quantizer takes the integer portion of the input when divided by 16. The effect is to reduce the range of possible output values to 0–255, which can be represented with just 8 bits. Compression has been achieved, but at the expense of some distortion. Most commonly, however, quantizers are not used directly for image compression. Instead, they are used to restrict or weight the range of values produced by a transform.

Different quantizers

Different quantizers can be applied selectively to emphasize or ignore certain types of data. In this case, the quantizer at the left has relatively small input increments and a relatively large number of possible output values. Such a quantizer might be appropriate for important data, such as low-spatial-frequency information derived from an image transform. The quantizer at right has fewer possible output values corresponding to larger input ranges. Relatively unimportant image data can be retained at low fidelity with such a quantizer.

Transformation

Transformation is the process of decomposing an image to identify redundant and irrelevant information (e.g., pattern repetition, contextual information). The primary source of redundant information is the local correlation or similarity of neighboring pixel values. Several phenomena give rise to such correlation. First, the world around us is correlated. Over short distances, colors tend to be uniform and textures appear regular. Abrupt changes in intensity and pattern are typically perceived as sharp edges. Second, the process of collecting an image produces correlation. Even diffraction-limited optics impose a blur on the scene being imaged; such a blur blends neighboring scene values and increases pixel-to-pixel similarity.

Several different transforms take advantage of spatial correlation for image compression. Some of the more common include differential pulse code modulation, discrete cosine transformation, and wavelet-based transformation.

Differential Pulse Code Modulation

The spatial correlation in digital images implies a high degree of pixel-to-pixel predictability. A tool known as differential pulse code modulation or predictive coding takes advantage of this phenomenon to compress image data.

In its simplest form, the predictive coding transform processes pixels sequentially and uses the value of one pixel to predict the value of the next. The difference between the predicted and actual value—the residual—is then quantized and coded. The advantage of operating on the residuals, as opposed to the original image samples, is that small residuals occur more often than large ones. Thus, a quantizer/coder combination can take advantage of this distribution of information to achieve significant rate reduction.

differential pulse code modulation compression

A simple differential pulse code modulation compression scheme. The current input image sample and the previous sample are compared to produce a residual error. The residual error is then passed on to the quantization and coding functions to achieve image compression.

In more complicated systems, the predictor involves several surrounding pixels instead of a single pixel. Even greater efficiency can be obtained by allowing the predictor to vary as a function of local content. In this way, the prediction residual can be consistently minimized but with some expense incurred to communicate the state of the local predictor.

The prediction residual can be coded losslessly or quantized prior to coding, in which case some loss is possible. The Rice algorithm is probably the best known of the lossless predictive coding schemes. In its basic form, it uses a single pixel difference predictor along with a null quantizer and a set of adaptive Huffman codes to achieve compression.

Discrete Cosine Transform

An important characteristic of an imaging system is its frequency response, essentially the input-to-output ratio for signals across a frequency range. The frequency response often functions as a blurring mechanism or low-pass filter. When this response is superimposed on an image whose power spectral density falls off exponentially, the resulting digital image content is dominated by low and middle spatial frequencies. Compression based on the discrete cosine transform, as exemplified by the JPEG algorithm, takes advantage of this phenomenon by decomposing the image into a series of basis patterns of increasing spatial frequency.

For example, a 4 X 4 pixel region of an image can be represented by 16 basis patterns, ranging from a completely uniform block (the lowest spatial frequency) to a checkerboard pattern (the highest spatial frequency). An appropriately weighted sum of these patterns can be made to represent any selected region. Moreover, the weights—that is, the transform coefficients—can be efficiently computed. The coefficients provide a measure of the energy present at different spatial frequencies in the region. As the coefficient indices increase, so do the spatial frequencies represented by the associated basis patterns.

basis patterns for discrete cosine transform

The basis patterns for a 4 X 4 pixel discrete cosine transform can be weighted to represent any possible input block of 4 X 4 pixels. The transform of an image or image block can be computed efficiently. The resulting coefficients are loosely ordered by spatial frequency, with low-spatial-frequency patterns at the upper left and high-spatial-frequency patterns at the lower right.

When the input region contains image data, the low-frequency coefficients will generally be of higher amplitude than the high-frequency coefficients. In discrete cosine transform compression, the image is first divided into regular blocks—8 X 8 pixels is a typical choice. Within each block, the transform coefficients are typically arranged according to frequency. A quantizer is then applied to the transform coefficients to limit their range of possible value prior to lossless coding. For low spatial frequencies, which represent most of the energy in an image block, the quantization is rather fine so that minimal distortions are introduced. At higher spatial frequencies, where there is less image energy and less visual sensitivity to distortions, quantization may be rather coarse.

Wavelets

One of the drawbacks of discrete cosine transformation stems from the use of transform blocks in which to compute the frequency content of the image. When large blocks are used, the transform coefficients fail to adapt to local variations in the image. When small blocks are used (as in JPEG), the image can exhibit blocking artifacts, which impart a mosaic appearance. Compression researchers and mathematicians therefore sought a transform that would provide both reasonable spatial frequency isolation and reasonable spatial localization. The result was the so-called "wavelet transform."

single stage of a wavelet transform

The basic flow of the single stage of a wavelet transform requires a low-pass and high-pass filter pair. Operations are performed separably in both horizontal and vertical directions. All possible combinations of filter and orientation are represented, with a subsampling by a factor of 2 in each direction. The result is four sub-images, each of which is ¼ the size of the input image. The sub-images passing through one or more high-pass filters retain edge content at the particular spatial frequency represented at this stage of the transform. The sub-image resulting from exclusive application of low-pass filters is a low-resolution version of the starting image and retains much of the pixel-to-pixel spatial correlation that was originally present.

Wavelets are functions that obey certain orthogonality, smoothness, and self-similarity criteria that mathematically are rather esoteric. Those properties are significant for image compression, however, because when wavelets are used as the basis of an image transform, the resulting function is highly localized both in spatial and frequency content.

The wavelet transform can be thought of as a filter bank through which the original image is passed. The filters in the bank, which are computed from the selected wavelet function, are either high-pass or low-pass filters. The image is first filtered in the horizontal direction by both the low-pass and high-pass filters. Each of the resulting images is then downsampled by a factor of two—that is, alternate samples are deleted—in the horizontal direction. These images are then filtered vertically by both the low-pass and high-pass filters. The resulting four images are then downsampled in the vertical direction by a factor of two. Thus, the transform produces four images: one that has been high-pass filtered in both directions, one that has been high-pass filtered horizontally and low-pass filtered vertically, one that has been low-pass filtered horizontally and high-pass filtered vertically, and one that has been low-pass filtered in both directions.

The high-pass filtered images appear much like edge maps. They typically have strong responses only where significant image variation exists in the direction of the high-pass filter. The image that has been low-pass filtered in both directions appears much like the original—in fact, it's simply a reduced-resolution version. As such, it shares many of the essential statistical properties of the original. In a typical wavelet transform, the double-lowpass image at each stage is decomposed one more time, giving rise to a multiresolution pyramid representation of the original image. At each level of the pyramid, a different spatial frequency is represented.

The transform result can then be quantized and coded to reduce the overall amount of information. Low-spatial-frequency data are retained with greatest fidelity, and higher spatial frequencies can be selectively deemphasized or discarded to minimize the overall visual distortion introduced in the compression process.

Aerospace Research

Aerospace researchers were among the first to examine wavelet image compression. One early goal was a tool for progressive image transmission. Developers envisioned a server with a compressed image in its database. A client could gain access to a thumbnail or low-resolution version of the image by requesting an appropriate level of the wavelet pyramid from the server. The client could use the spatial localization properties of the transform to isolate a particular region of interest and incrementally improve its quality. The tool never advanced beyond the prototype stage, but did provide a vision for the level of image interactivity that wavelets might provide.

In the mid-1990s, the popularity of wavelets expanded into government circles. Encouraged by the increasing availability of commercial JPEG products and excited by the prospect of increased compression efficiency, the government sponsored various efforts geared toward establishing a standard based on wavelets. Aerospace began participating in the U.S. committee that worked under the aegis of ISO (the International Organization for Standardization) to develop a standard that would improve JPEG, primarily by offering superior image quality at low bit rates.

JPEG2000

In 1997, ISO issued a call for proposals for the standard that would become known as JPEG2000. The algorithm that was ultimately selected involves applying a spatial wavelet transform to an image, quantizing the resulting transform coefficients, and grouping them by resolution level and spatial location for arithmetic coding. During JPEG2000 development, Aerospace made several contributions with an eye toward maximizing the standard's utility for its government customer. For example, to enable future expansion of the standard while maintaining backward compatibility, Aerospace helped develop an extensible code stream syntax. This syntax also allowed the standard to be split into two parts. Part I contains the basic JPEG2000 algorithm, of interest to the vast majority of users. Part II contains extensions to Part I that in most cases are significantly more complex and may be tailored to specific groups of users. Of particular interest to the remote-sensing community is the ability to handle multispectral and hyperspectral images that may contain tens or hundreds of correlated bands. In fact, Aerospace led development of a Part II extension that adds the ability to implement a spectral transform in addition to the spatial wavelet transform to provide increased flexibility and compression efficiency for these potentially huge images (see comparison of JPEG and JPEG2000).

The major advantage of JPEG2000 over other image coding systems is its flexibility in handling compressed data. When JPEG2000 is used to organize and code a wavelet-transformed image, the resolution levels within the wavelet pyramid, the spatial frequency directions within a level, and the spatial regions themselves are split into code blocks that are all coded independently. Within any code block, data are encoded from most significant bit to least significant bit. A construct known as layering allows a certain number of significant bits from all of the code blocks to be indexed together. These groupings provide great flexibility in accessing and ordering data for specific purposes.

For example, one common implementation, known as "progressive by SNR," orders the coded data within a file so that the signal-to-noise ratio of the resulting image is improved as rapidly as possible. High-amplitude features, such as sharp edges and high-contrast regions, stand out earliest—that is, at the lowest bit rates. Such a representation might be useful to someone who wants to identify major features in an image, regardless of their size or extent, as quickly as possible. Another configuration, known as "progressive by resolution," organizes coded data by resolution level, with lowest resolution first and highest resolution last. In this case, the ordering is most useful for someone who needs "the big picture" first. It can be thought of as zooming in on the image instead of building it up from its prominent components. Either of these configurations can be produced without expanding and recompressing the data; once the data are coded, a parsing application can reorder the data by moving coded pieces and altering a small number of header values appearing in the code stream. Final quality can be controlled by a simple truncation of the resulting code stream (see compression scheme comparison).

Such flexibility brings home the promise of JPEG2000's future—compress only once, but support a wide range of users. Granted, JPEG2000 compression is more complicated than other methods—it requires 3–10 times more operations than JPEG, for example. But many useful data orderings become possible using relatively simple parsing applications. As a result, highly interactive client/server sessions involving imagery are enabled. In fact, an additional part of the JPEG2000 standard, known as the JPEG2000 Interactive Protocol (JPIP), will provide a general syntax for client/server interactions involving JPEG2000 images.

The Fast Lifting Scheme

The process of computing a wavelet transform by passing the input pixels through filters followed by subsampling is generally inefficient for large data sets. Aerospace has been working with a process known as "lifting," which can reduce the computation to an algorithm involving simple multiplication and addition.

The conventional lifting scheme involves factorizing the wavelet transform matrix into several elementary matrices. Each elementary matrix results in a lifting step. The process involves several iterations of two basic operations. The first is called the prediction step, which considers a predicted pixel in relation to the weighted average of its neighboring pixels and calculates the prediction residual. The second is called the update step, which uses the prediction residual to update the current pixel so that the prediction residual becomes smaller in the next iteration. This lifting factorization reduces the computational complexity of the wavelet transform almost by half. It also allows in-place calculation of the wavelet transform, which means that no auxiliary memory is needed for computations. On the other hand, the number of lifting steps can affect performance. The fidelity of integer-to-integer transforms used in lossless data compression depends entirely on how well they approximate their original wavelet transforms. A wavelet transform with a large number of lifting steps would have a greater approximation error, mostly from rounding off the intermediate real result to an integer at each lifting step.

Using a different factorization of the wavelet transform matrix, Aerospace developed a new lifting method that substantially reduces the number of lifting steps in lossless data compression. Consequently, it significantly improves the overall rounding errors incurred in converting from real numbers to integers at each lifting step. In addition, the new lifting method can be made to adapt to local characteristics of the image with less memory usage and signal delay. For lossless data compression, it's almost twice as fast as the conventional wavelet transform method and is quite suitable for fast processing of multidimensional hyperspectral data.

The Modulated Lapped Transform

Although wavelet transforms do not exhibit the same blocking artifacts as discrete cosine transforms, they can blur image edges in lossy compression. A newer approach, the lapped transform, combines the efficiency of the discrete cosine transform with the overlapping properties of wavelets. In recent studies, they have been shown to outperform both techniques in preserving image quality. One particular type of lapped transform, the modulated lapped transform, is being investigated at Aerospace.

image compressed using JPEG2000

Comparison of image quality derived from modulated lapped transform (MLT) and JPEG using discrete cosine transform (DCT). With the modulated lapped transform, more high-frequency terms are saved from quantization. Consequently, the quality of the reconstructed image is superior for the same compression ratio, as is evident when the images are enlarged.

The modulated lapped transform employs both a discrete cosine transform and a bell-shaped weighting (window) function, which resembles the low-pass filter function in a wavelet transform. This window function and the discrete cosine transform operate on two adjacent blocks of pixels successively. The window function modulates the input to the discrete cosine transform, allowing the blocks to overlap prior to actual transformation. Thus, the modulated lapped transform achieves the high speed of the discrete cosine transform without the blocking artifacts. The weighting of the input data by the bell-shaped window function causes the high-frequency terms to roll off much faster than they would using a discrete cosine transform alone. Thus, more of them are saved from quantization (a source of loss). Consequently, the quality of the reconstructed image is better than that achieved through a discrete cosine transform for the same compression ratio (see terrain categorization comparison).

Another important point is that the modulated lapped transform has more than two channels at the onset of transformation, whereas the wavelet transform has only two and needs to progressively split the low-pass channel as it goes to lower resolution levels. Thus, the modulated lapped transform operates faster than a wavelet transform.

Most applications of modulated lapped transform, including the Aerospace-patented version, are for lossy image compression; however, using a different formulation and the fast lifting method, Aerospace researchers have derived lossless and near-lossless modulated lapped transform algorithms. In addition, the lossless compressed data have excellent resistance to error propagation because of the block structure. Given its error resistance and fast processing properties, the lossless and near-lossless modulated lapped transform would be suitable for compressing multidimensional hyperspectral data in remote-sensing applications.

Conclusion

As both the spatial and spectral resolutions of remote image sensors increase, the amount of data they collect continues to grow. On the other hand, the available communication channels for faithfully transmitting the data to ground are becoming scarce. There is therefore a real need to develop new data-compression techniques that can support storage and transmission of images of varying resolution and quality. In addition, the compression operations must be fast and require little power for instantaneous processing.

The trend is toward more interactive manipulation of imagery. The wavelet transform that underlies the JPEG2000 standard for still images and the transform that underlies the MPEG4 standard for video compression share many common properties. With their successful integration in the future, new data-compression techniques should be able to process huge datasets with high fidelity and speed.

JPIP will be suitable for use on the Internet, and interactive and customizable Web-based applications could begin appearing soon. Such a development will hold special interest for the remote-sensing community as a tool to minimize dissemination delays.

An important military standard, the National Imagery Transmission Format (NITF), provides a file structure to support not only image data but also associated support data and other graphical information. The integration of JPEG into NITF provided government users with increased image-compression capabilities and made it possible to use commercially available products. JPEG2000 is being incorporated into NITF version 2.1 and will significantly enhance these capabilities.

Ultimately, each imagery source and provider presents a unique set of requirements and challenges. Through continued involvement with the JPEG committee, Aerospace will provide valuable insight into the effective integration of JPEG2000 into government and military systems. Similarly, further investigation into the fast lifting scheme and the modulated lapped transform will ensure the usability of more comprehensive remote-imaging systems.


To Summer 2004 Table of Contents




Home   Contact Us   FAQ  |   (options)
Copyright and Terms of Use, © 1995-2010 The Aerospace Corporation. All rights reserved. Send any questions or comments regarding this service to .

This page was last modified on 05/10/07