QccWAVspihtEncode.3

NAME

QccWAVspihtEncode, QccWAVspihtDecode - encode/decode an image using the SPIHT algorithm

int QccWAVspihtEncode(const QccIMGImageComponent *image, const QccIMGImageComponent *mask, QccBitBuffer *buffer, int num_levels, const QccWAVWavelet *wavelet, const QccWAVPerceptualWeights *perceptual_weights, int target_bit_cnt, int arithmetic_coded, FILE *rate_distortion_file);

int QccWAVspihtEncode2(QccWAVSubbandPyramid *image_subband_pyramid, QccWAVSubbandPyramid *mask_subband_pyramid, double image_mean, QccBitBuffer *buffer, int target_bit_cnt, int arithmetic_coded, FILE *rate_distortion_file);

int QccWAVspihtDecodeHeader(QccBitBuffer *buffer, int *num_levels, int *num_rows, int *num_cols, double *image_mean, int *max_coefficient_bits, int *arithmetic_coded);

int QccWAVspihtDecode(QccBitBuffer *buffer, QccIMGImageComponent *image, const QccIMGImageComponent *mask, int num_levels, const QccWAVWavelet *wavelet, const QccWAVPerceptualWeights *perceptual_weights, double image_mean, int max_coefficient_bits, int target_bit_cnt, int arithmetic_coded);

int QccWAVspihtDecode2(QccBitBuffer *buffer, QccWAVSubbandPyramid *image_subband_pyramid, QccWAVSubbandPyramid *mask_subband_pyramid, int max_coefficient_bits, int target_bit_cnt, int arithmetic_coded);

DESCRIPTION

Encoding

QccWAVspihtEncode() encodes an image component, image, using the Set Partitioning In Hierarchical Trees (SPIHT) algorithm by Said and Pearlman. The SPIHT algorithm involves a 2D DWT followed by a progressive "bitplane" coding of the wavelet coefficients using a zerotree-like quantization structure.

image is the image component to be coded and buffer is the output bitstream. buffer must be of QCCBITBUFFER_OUTPUT type and opened via a prior call to QccBitBufferStart(3) .

num_levels gives the number of levels of dyadic wavelet decomposition to perform, and wavelet is the wavelet to use for decomposition. perceptual_weights is the optional perceptual weights to employ (see QccWAVPerceptualWeights(3) ) or use NULL for no perceptual weighting. Note: the SPIHT algorithm as described originally by Said and Pearlman in their ITCSVT paper does not employ perceptual weighting.

The bitstream output from the SPIHT encoder is embedded, meaning that any prefix of the bitstream can be decoded to give a valid representation of the image. The SPIHT encoder essentially produces output bits until the number of bits output reaches target_bit_cnt, the desired (target) total length of the output bitstream in bits, and then it stops. Note that this is the bitstream length in bits, not the rate of the bitstream (which would be expressed in bits per pixel).

As originally described by Said and Pearlman, the SPIHT algorithm uses arithmetic coding of symbols as a final output step to improve coding efficiency. Alternatively, arithmetic coding can be suppressed, producing what Said and Pearlman call "binary-uncoded" output. The QccPack SPIHT implementation supports both arithmetic-coded and binary-uncoded output modes. arithmetic_coded is a flag passed to QccWAVspihtEncode() that indicates whether arithmetic coding should be performed (1 = arithmetic coding, 0 = binary-uncoded).

QccWAVspihtEncode() optionally supports the use of a shape-adaptive DWT (SA-DWT) rather than the usual DWT. That is, QccWAVspihtEncode() can call QccWAVSubbandPyramidShapeAdaptiveDWT(3) as the wavelet transform rather than the usual QccWAVSubbandPyramidDWT(3) . The use of a SA-DWT is indicated by a non-NULL mask; if mask is NULL, then the usual DWT is used. In the case of a SA-DWT, mask gives the transparency mask which indicates which pixels of the image are non-transparent and thus have data that is to be transformed. Refer to QccWAVSubbandPyramidShapeAdaptiveDWT(3) for more details on the calculation of this SA-DWT. See "Shape-Adaptive Coding" below for details on how the SPIHT algorithm handles shape-adaptive coding.

rate_distortion_file is a pointer to an output file. If this pointer is not NULL, then a rate-distortion profile is written to the file while the image is being compressed. Essentially, this rate-distortion profile gives numerous points on the operational rate-distortion curve of the compressed bitstream produced by QccWAVspihtEncode(); see "RATE-DISTORTION PROFILE" below for more details. The file pointed to by rate_distortion_file must be opened for writing prior to calling QccWAVspihtEncode(), and must be closed after QccWAVspihtEncode() returns. If rate_distortion_file is NULL, no rate-distortion profile is written.

The routine QccWAVspihtEncode2() provides an alternative interface to SPIHT encoding. Specifically, QccWAVspihtEncode2() functions indentically to QccWAVspihtEncode() described above, except that both the image and optional mask are assumed to have had a DWT applied to them prior to calling QccWAVspihtEncode2(). As a consequence, the image and mask are passed to QccWAVspihtEncode2() in the wavelet domain as image_subband_pyramid and mask_subband_pyramid. We note that most applications should opt for QccWAVspihtEncode() rather than QccWAVspihtEncode2(); however, QccWAVspihtEncode() is implemented essentially as a call to an appropriate DWT followed by a call to QccWAVspihtEncode2(). If QccWAVspihtEncode2() is used, it is the responsibility of the calling routine to perform the appropriate DWT prior to calling QccWAVspihtEncode2().

Decoding

QccWAVspihtDecodeHeader() decodes the header information in a bitstream previously produced by QccWAVspihtEncode(). The input bitstream is buffer which must be of QCCBITBUFFER_INPUT type and opened via a prior call to QccBitBufferStart(3) .

The header information is returned in num_levels (number of levels of wavelet decomposition), num_rows (vertical size of image), num_cols (horizontal size of image), image_mean (the mean value of the original image), max_coefficient_bits (indicates the precision, in number of bits, of the wavelet coefficient with the largest magnitude), and arithmetic_coded (indicates whether the to data stream to follow is arithmetic-coded or not).

QccWAVspihtDecode() decodes the bitstream buffer, producing the reconstructed image component, image. The bitstream must already have had its header read by a prior call to QccWAVspihtDecodeHeader() (i.e., you call QccWAVspihtDecodeHeader() first and then QccWAVspihtDecode()). If target_bit_cnt is QCCENT_ANYNUMBITS, then decoding stops when the end of the input bitstream is reached; otherwise, decoding stops when target_num_bits from the input bitstream have been decoded.

If a SA-DWT was used in SPIHT encoding, then the original transparency mask should be passed to QccWAVspihtDecode() as mask. That is, mask should be the same transparency mask (in the spatial domain) that was passed to QccWAVspihtEncode(). Note that QccWAVspihtDecode() will transform this mask via a Lazy wavelet transform, and then pass the transformed mask to QccWAVSubbandPyramidInverseShapeAdaptiveDWT(3) . If the usual DWT was used in encoding, then mask should be a NULL pointer.

QccWAVspihtDecode2() provides the appropriate alternative interface to SPIHT decoding required if encoding was done via QccWAVspihtEncode2(). Essentially, QccWAVspihtDecode() is implemented by a call to QccWAVspihtDecode2() followed by an appropriate inverse DWT. If QccWAVspihtDecode2() is used, it is the responsibility of the calling to routine to perform the appropriate inverse DWT subsequent to the call to QccWAVspihtDecode2(). As noted above, most applications should use QccWAVspihtDecode() rather than QccWAVspihtDecode2().

PERFORMANCE

The QccPack implementation of the SPIHT algorithm was coded "from scratch," with no consultation of the original SPIHT source code produced at RPI by Said and Pearlman. The QccPack implementation follows as description of the SPIHT algorithm as described by Said and Pearlman in their original ITCSVT paper as closely as can be extrapolated from that description. (Note that the description of the algorithm given in their patent documents differs slightly from that of the ITCSVT description). However, since no algorithm of complexity can ever be described completely in a journal article, it is inevitable that certain subtle details were implemented in the QccPack SPIHT coder differently from in the coder used by Said and Pearlman for their results. As a consequence, the performance of the QccPack implementation differs slightly from that reported by Said and Pearlman in their ITCSVT paper and from that obtained by the binary executable SPIHT programs available from RPI (see http://www.cipr.rpi.edu/research/SPIHT -- with arithmetic_coded = 1, the QccPack code is similar to RPI's codetree/decdtree programs, and with arithmetic_coded = 0, the QccPack code is similar to RPI's fastcode/fastdecd programs). However, the difference has been observed to be rather small, with the QccPack implementation achieving PSNR typically 0.15 dB below that achieved with the RPI binary executables (although look at the Barbara arithmetic-coded results below for an unusually strong QccPack performance!)

A quantitative comparison follows using the three "standard" images (Lenna, Barbara, and Goldhill) provided with QccPack. In all of the results, the wavelet transform employed is a 5-level dyadic decomposition using the 9/7 Cohen-Daubechies-Feauveau biorthogonal wavelet with symmetric extension at the image boundaries (this is the same transform used in Said and Pearlman's paper and in their executable code). The RPI results refer to the results obtained using the SPIHT executable code available from RPI. The "average difference" is the average of the RPI PSNR minus the QccPack PSNR over the range 0.05 dB to 1 dB calculated at every 0.025 dB.

Lenna - Arithmetic Coding
------------------------------------
| Rate | PSNR (db) |
| (bpp) | RPI QccPack |
------------------------------------
| 0.2 | 33.15 | 32.94 |
| 0.5 | 37.21 | 37.11 |
| 1.0 | 40.41 | 40.29 |
------------------------------------
Average difference = 0.16 dB

Lenna - No Arithmetic Coding
------------------------------------
| Rate | PSNR (db) |
| (bpp) | RPI QccPack |
------------------------------------
| 0.2 | 32.73 | 32.60 |
| 0.5 | 36.84 | 36.72 |
| 1.0 | 39.98 | 39.89 |
------------------------------------
Average difference = 0.13 dB

Barbara - Arithmetic Coding
------------------------------------
| Rate | PSNR (db) |
| (bpp) | RPI QccPack |
------------------------------------
| 0.2 | 26.66 | 26.47 |
| 0.5 | 31.40 | 31.33 |
| 1.0 | 36.41 | 36.43 |
------------------------------------
Average difference = 0.05 dB

Barbara - No Arithmetic Coding
------------------------------------
| Rate | PSNR (db) |
| (bpp) | RPI QccPack |
------------------------------------
| 0.2 | 26.29 | 26.11 |
| 0.5 | 30.94 | 30.80 |
| 1.0 | 35.94 | 35.79 |
------------------------------------
Average difference = 0.16 dB

Goldhill - Arithmetic Coding
------------------------------------
| Rate | PSNR (db) |
| (bpp) | RPI QccPack |
------------------------------------
| 0.2 | 29.85 | 29.68 |
| 0.5 | 33.13 | 32.96 |
| 1.0 | 36.55 | 36.37 |
------------------------------------
Average difference = 0.15 dB

Goldhill - No Arithmetic Coding
------------------------------------
| Rate | PSNR (db) |
| (bpp) | RPI QccPack |
------------------------------------
| 0.2 | 29.53 | 29.33 |
| 0.5 | 32.71 | 32.54 |
| 1.0 | 36.00 | 35.84 |
------------------------------------
Average difference = 0.14 dB

SHAPE-ADAPTIVE CODING

The usual way to handle arbitrarily shaped objects within SPIHT is to permanently set transparent regions in the image to "insignificant" during the SA-DWT so that the SPIHT algorithm processes these transparent regions in a manner identical to that of other insignificant coefficients. This approach has been taken for a number of zerotree-based coding algorithms; see Li and Li for an example of such. Minami et al. go one step further on this basic approach by discarding all sets of coefficients that lie entirely within a transparent region from the lists maintained by SPIHT. Although this refinement typically offers a small gain in performance, the size of the gain is dependent on how much of the overall image is transparent. The QccPack implementation of SPIHT follows the approach by Minami et al. for shape-adaptive coding.

Finally, note that the concept of shape-adaptive coding arose in the work surrounding the MPEG-4 standard and was not considered in the original work by Said and Pearlman.

NUMBER OF DECOMPOSITION LEVELS AND IMAGES OF ARBITRARY SIZE

The QccPack implementation of SPIHT has the constraint that the dimensions of the image must be an integer multiple of two raised to the power (num_levels + 1). The reason for this limitation is that the zerotree structure in SPIHT, as originally described by Said and Pearlman, is built upon blocks of coefficients of size 2x2, and these blocks are coded together. Consequently, each subband at each decomposition scale is required to divide into an integer number of these 2x2 blocks, a requirement that translates into the power-of-two constraint on the size of the image.

There are a number of ways to circumvent this limitation of SPIHT. Although it would possible to make slight modifications to the zerotree structure to accommodate blocks of sizes different from 2x2 near the borders of subbands as needed to accommodate arbitrary sized images, the easiest way around the limitation is to maintain the original 2x2 block size and let these blocks overlap the subband boundary as needed. One would then treat all portions of the block beyond the subband boundary as permanently insignificant and proceed with the original SPIHT algorithm. This solution is actually a special case of shape-adaptive coding as discussed above. That is, one could easily "pad" an M1xM2 image to size N1xN2, where N1 and N2 are the closest integer multiples of 2^(num_levels + 1) greater than or equal to M1 and M2, respectively. The padded pixels are then considered to be "transparent," and the SPIHT coder is passed a transparency mask of size N1xN2 consisting of simply a single rectangular opaque region of size M1xM2. It would be possible to have QccWAVspihtEncode() automatically pad the image in this manner when needed, but this has not been implemented (yet).

RATE-DISTORTION PROFILE

For an embedded coding, such as that produced by the SPIHT algorithm, any prefix of the final compressed bitstream may be decoded to produce a reconstruction of the original image. This prefix will have an associated distortion which is necessarily greater than or equal to the distortion of the full-length bitstream, and a rate that is necessarily less than or equal to the rate of the full-length bitstream. These rate and distortion values give a single point on the so-called operational rate-distortion curve of the embedded coding; evaluating rate and distortion for a variety of prefixes allows one to trace out an approximation to the entire operational rate-distortion curve.

It is computationally expensive to perform multiple truncatations and reconstructions of a bitstream to generate this opertional rate-distortion curve as the full decoding algorithm (including inverse transform) must be run numerous times. Instead, it is fairly straightforward to calculate rate and distortion values while encoding is taking place. To do so, the encoder keeps a running estimate of the current mean squared error (MSE) as calculated in the wavelet domain. Each time a wavelet coefficient is modified while encoding, the encoder adjusts the running MSE estimate using the value of the wavelet coefficient as it would be reconstructed in the decoder.

In theory, the MSE as calculated by this procedure would be exactly the same as the MSE calculated in the original spatial domain, providing that an orthonormal wavelet transform is used (a direct consequence of Parseval's theorem). However, in practice, the MSE as calculated in the wavelet domain using the approach outlined above will differ somewhat from the spatial-domain MSE, which is really what is of interest. The first discrepancy is due to the fact that, in practice, usually biorthogonal, rather than orthonormal, transforms are used for image compression. Using "near orthogonal" transforms, such as the popular Cohen-Daubechies-Feauveau biorthogonal family, help mitigate this effect, however. Additionally, the wavelet-domain MSE calculation ignores the fact that spatial-domain pixels are "rounded" to integer values after the inverse wavelet transform in the decoder. This, too, should produced only a small deviation, so that the wavelet-domain MSE should usually be quite close to the desired spatial-domain MSE. However, by calculating MSE in the wavelet-domain, one avoids numerous and costly inverse-transform calculations.

In QccWAVspihtEncode(), each time a coefficient is modified (in the refinement pass, or when the coefficient initially becomes significant in the significance pass), the current wavelet-domain MSE estimate, along with the current rate as calculated from the number of bits written to the output bitstream thus far, is written to the output rate-distortion file, in ASCII. This is a fairly fine sampling of the operational rate-distortion curve, typically producing several thousands, or tens of thousands, of points on the curve, depending on the final rate of the compressed bitstream.

AUTHOR

Table of Contents

NAME
SYNOPSIS
DESCRIPTION

Encoding
Decoding

PERFORMANCE
SHAPE-ADAPTIVE CODING
NUMBER OF DECOMPOSITION LEVELS AND IMAGES OF ARBITRARY SIZE
RATE-DISTORTION PROFILE
SEE ALSO
AUTHOR