int QccWAVspihtEncode(const QccIMGImageComponent *image, const QccIMGImageComponent *mask, QccBitBuffer *buffer, int num_levels, const QccWAVWavelet *wavelet, const QccWAVPerceptualWeights *perceptual_weights, int target_bit_cnt, int arithmetic_coded, FILE *rate_distortion_file);
int QccWAVspihtEncode2(QccWAVSubbandPyramid *image_subband_pyramid, QccWAVSubbandPyramid *mask_subband_pyramid, double image_mean, QccBitBuffer *buffer, int target_bit_cnt, int arithmetic_coded, FILE *rate_distortion_file);
int QccWAVspihtDecodeHeader(QccBitBuffer *buffer, int *num_levels, int *num_rows, int *num_cols, double *image_mean, int *max_coefficient_bits, int *arithmetic_coded);
int QccWAVspihtDecode(QccBitBuffer *buffer, QccIMGImageComponent *image, const QccIMGImageComponent *mask, int num_levels, const QccWAVWavelet *wavelet, const QccWAVPerceptualWeights *perceptual_weights, double image_mean, int max_coefficient_bits, int target_bit_cnt, int arithmetic_coded);
int QccWAVspihtDecode2(QccBitBuffer *buffer, QccWAVSubbandPyramid *image_subband_pyramid, QccWAVSubbandPyramid *mask_subband_pyramid, int max_coefficient_bits, int target_bit_cnt, int arithmetic_coded);
QccWAVspihtEncode() encodes an image component, image, using the Set Partitioning In Hierarchical Trees (SPIHT) algorithm by Said and Pearlman. The SPIHT algorithm involves a 2D DWT followed by a progressive "bitplane" coding of the wavelet coefficients using a zerotree-like quantization structure.
image is the image component to be coded and buffer is the output bitstream. buffer must be of QCCBITBUFFER_OUTPUT type and opened via a prior call to QccBitBufferStart(3).
num_levels gives the number of levels of dyadic wavelet decomposition to perform, and wavelet is the wavelet to use for decomposition. perceptual_weights gives the optional perceptual weights to employ (see QccWAVPerceptualWeights(3)); pass NULL for no perceptual weighting. Note: the SPIHT algorithm as originally described by Said and Pearlman in their IEEE Transactions on Circuits and Systems for Video Technology paper does not employ perceptual weighting.
The bitstream output from the SPIHT encoder is embedded, meaning that any prefix of the bitstream can be decoded to give a valid representation of the image. The SPIHT encoder essentially produces output bits until the number of bits output reaches target_bit_cnt, the desired (target) total length of the output bitstream in bits, and then it stops. Note that this is the bitstream length in bits, not the rate of the bitstream (which would be expressed in bits per pixel).
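Because target_bit_cnt is a length in bits rather than a rate, a caller that thinks in bits per pixel has to convert first. The following is a minimal sketch of that conversion; spiht_rate_to_bit_cnt() is a hypothetical helper and not part of QccPack, and rounding up to the next whole bit is an assumption made here for illustration.

```c
#include <assert.h>

/* Hypothetical helper (not a QccPack routine): convert a target rate in
   bits per pixel into a total bitstream length in bits, which is the
   quantity target_bit_cnt expects. */
int spiht_rate_to_bit_cnt(double rate_bpp, int num_rows, int num_cols)
{
  double exact;
  int bits;

  assert(rate_bpp >= 0.0 && num_rows > 0 && num_cols > 0);

  /* total bits = rate (bits/pixel) * number of pixels */
  exact = rate_bpp * (double)num_rows * (double)num_cols;
  assert(exact < 2147483647.0);

  /* Round up to a whole number of bits (assumed rounding policy). */
  bits = (int)exact;
  if ((double)bits < exact)
    bits++;

  return bits;
}
```

For example, a 512x512 image coded at 0.5 bpp corresponds to a target_bit_cnt of 131072 bits.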
As originally described by Said and Pearlman, the SPIHT algorithm uses arithmetic coding of symbols as a final output step to improve coding efficiency. Alternatively, arithmetic coding can be suppressed, producing what Said and Pearlman call "binary-uncoded" output. The QccPack SPIHT implementation supports both arithmetic-coded and binary-uncoded output modes. arithmetic_coded is a flag passed to QccWAVspihtEncode() that indicates whether arithmetic coding should be performed (1 = arithmetic coding, 0 = binary-uncoded).
QccWAVspihtEncode() optionally supports the use of a shape-adaptive DWT (SA-DWT) rather than the usual DWT. That is, QccWAVspihtEncode() can call QccWAVSubbandPyramidShapeAdaptiveDWT(3) as the wavelet transform rather than the usual QccWAVSubbandPyramidDWT(3). The use of a SA-DWT is indicated by a non-NULL mask; if mask is NULL, then the usual DWT is used. In the case of a SA-DWT, mask gives the transparency mask which indicates which pixels of the image are non-transparent and thus have data that is to be transformed. Refer to QccWAVSubbandPyramidShapeAdaptiveDWT(3) for more details on the calculation of this SA-DWT. See "Shape-Adaptive Coding" below for details on how the SPIHT algorithm handles shape-adaptive coding.
rate_distortion_file is a pointer to an output file. If this pointer is not NULL, then a rate-distortion profile is written to the file while the image is being compressed. Essentially, this rate-distortion profile gives numerous points on the operational rate-distortion curve of the compressed bitstream produced by QccWAVspihtEncode(); see "RATE-DISTORTION PROFILE" below for more details. The file pointed to by rate_distortion_file must be opened for writing prior to calling QccWAVspihtEncode(), and must be closed after QccWAVspihtEncode() returns. If rate_distortion_file is NULL, no rate-distortion profile is written.
The routine QccWAVspihtEncode2() provides an alternative interface to SPIHT encoding. Specifically, QccWAVspihtEncode2() functions identically to QccWAVspihtEncode() described above, except that both the image and optional mask are assumed to have had a DWT applied to them prior to calling QccWAVspihtEncode2(). As a consequence, the image and mask are passed to QccWAVspihtEncode2() in the wavelet domain as image_subband_pyramid and mask_subband_pyramid. Most applications should opt for QccWAVspihtEncode() rather than QccWAVspihtEncode2(); note, however, that QccWAVspihtEncode() is implemented essentially as a call to an appropriate DWT followed by a call to QccWAVspihtEncode2(). If QccWAVspihtEncode2() is used, it is the responsibility of the calling routine to perform the appropriate DWT prior to calling QccWAVspihtEncode2().
QccWAVspihtDecodeHeader() decodes the header information in a bitstream previously produced by QccWAVspihtEncode(). The input bitstream is buffer, which must be of QCCBITBUFFER_INPUT type and opened via a prior call to QccBitBufferStart(3).
The header information is returned in num_levels (number of levels of wavelet decomposition), num_rows (vertical size of image), num_cols (horizontal size of image), image_mean (the mean value of the original image), max_coefficient_bits (indicates the precision, in number of bits, of the wavelet coefficient with the largest magnitude), and arithmetic_coded (indicates whether the data stream to follow is arithmetic-coded or not).
QccWAVspihtDecode() decodes the bitstream buffer, producing the reconstructed image component, image. The bitstream must already have had its header read by a prior call to QccWAVspihtDecodeHeader() (i.e., you call QccWAVspihtDecodeHeader() first and then QccWAVspihtDecode()). If target_bit_cnt is QCCENT_ANYNUMBITS, then decoding stops when the end of the input bitstream is reached; otherwise, decoding stops when target_bit_cnt bits from the input bitstream have been decoded.
If a SA-DWT was used in SPIHT encoding, then the original transparency mask should be passed to QccWAVspihtDecode() as mask. That is, mask should be the same transparency mask (in the spatial domain) that was passed to QccWAVspihtEncode(). Note that QccWAVspihtDecode() will transform this mask via a Lazy wavelet transform, and then pass the transformed mask to QccWAVSubbandPyramidInverseShapeAdaptiveDWT(3). If the usual DWT was used in encoding, then mask should be a NULL pointer.
QccWAVspihtDecode2() provides the appropriate alternative interface to SPIHT decoding required if encoding was done via QccWAVspihtEncode2(). Essentially, QccWAVspihtDecode() is implemented by a call to QccWAVspihtDecode2() followed by an appropriate inverse DWT. If QccWAVspihtDecode2() is used, it is the responsibility of the calling routine to perform the appropriate inverse DWT subsequent to the call to QccWAVspihtDecode2(). As noted above, most applications should use QccWAVspihtDecode() rather than QccWAVspihtDecode2().
A quantitative comparison follows using the three "standard" images (Lenna, Barbara, and Goldhill) provided with QccPack. In all of the results, the wavelet transform employed is a 5-level dyadic decomposition using the 9/7 Cohen-Daubechies-Feauveau biorthogonal wavelet with symmetric extension at the image boundaries (this is the same transform used in Said and Pearlman's paper and in their executable code). The RPI results refer to the results obtained using the SPIHT executable code available from RPI. The "average difference" is the average of the RPI PSNR minus the QccPack PSNR over the rate range 0.05 bpp to 1 bpp, calculated at every 0.025 bpp.
Lenna - Arithmetic Coding
------------------------------------
| Rate  |        PSNR (dB)         |
| (bpp) |    RPI     |   QccPack   |
------------------------------------
|  0.2  |   33.15    |    32.94    |
|  0.5  |   37.21    |    37.11    |
|  1.0  |   40.41    |    40.29    |
------------------------------------
Average difference = 0.16 dB
Lenna - No Arithmetic Coding
------------------------------------
| Rate  |        PSNR (dB)         |
| (bpp) |    RPI     |   QccPack   |
------------------------------------
|  0.2  |   32.73    |    32.60    |
|  0.5  |   36.84    |    36.72    |
|  1.0  |   39.98    |    39.89    |
------------------------------------
Average difference = 0.13 dB
Barbara - Arithmetic Coding
------------------------------------
| Rate  |        PSNR (dB)         |
| (bpp) |    RPI     |   QccPack   |
------------------------------------
|  0.2  |   26.66    |    26.47    |
|  0.5  |   31.40    |    31.33    |
|  1.0  |   36.41    |    36.43    |
------------------------------------
Average difference = 0.05 dB
Barbara - No Arithmetic Coding
------------------------------------
| Rate  |        PSNR (dB)         |
| (bpp) |    RPI     |   QccPack   |
------------------------------------
|  0.2  |   26.29    |    26.11    |
|  0.5  |   30.94    |    30.80    |
|  1.0  |   35.94    |    35.79    |
------------------------------------
Average difference = 0.16 dB
Goldhill - Arithmetic Coding
------------------------------------
| Rate  |        PSNR (dB)         |
| (bpp) |    RPI     |   QccPack   |
------------------------------------
|  0.2  |   29.85    |    29.68    |
|  0.5  |   33.13    |    32.96    |
|  1.0  |   36.55    |    36.37    |
------------------------------------
Average difference = 0.15 dB
Goldhill - No Arithmetic Coding
------------------------------------
| Rate  |        PSNR (dB)         |
| (bpp) |    RPI     |   QccPack   |
------------------------------------
|  0.2  |   29.53    |    29.33    |
|  0.5  |   32.71    |    32.54    |
|  1.0  |   36.00    |    35.84    |
------------------------------------
Average difference = 0.14 dB
Finally, note that the concept of shape-adaptive coding arose in the work surrounding the MPEG-4 standard and was not considered in the original work by Said and Pearlman.
There are a number of ways to circumvent this limitation of SPIHT. Although it would be possible to make slight modifications to the zerotree structure to accommodate blocks of sizes different from 2x2 near the borders of subbands as needed to accommodate arbitrarily sized images, the easiest way around the limitation is to maintain the original 2x2 block size and let these blocks overlap the subband boundary as needed. One would then treat all portions of the block beyond the subband boundary as permanently insignificant and proceed with the original SPIHT algorithm. This solution is actually a special case of shape-adaptive coding as discussed above. That is, one could easily "pad" an M1xM2 image to size N1xN2, where N1 and N2 are the closest integer multiples of 2^(num_levels + 1) greater than or equal to M1 and M2, respectively. The padded pixels are then considered to be "transparent," and the SPIHT coder is passed a transparency mask of size N1xN2 consisting of simply a single rectangular opaque region of size M1xM2. It would be possible to have QccWAVspihtEncode() automatically pad the image in this manner when needed, but this has not been implemented (yet).
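The padding workaround described above can be sketched as follows. Both functions are hypothetical helpers, not QccPack routines. The mask is held in a plain row-major array rather than a QccIMGImageComponent, and the conventions used here (nonzero marks an opaque pixel, and the original image sits in the top-left corner of the padded area) are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical helper: smallest integer multiple of 2^(num_levels + 1)
   that is greater than or equal to size. */
int spiht_padded_size(int size, int num_levels)
{
  int block;

  assert(size > 0 && num_levels >= 0 && num_levels < 30);

  block = 1 << (num_levels + 1);
  return ((size + block - 1) / block) * block;
}

/* Hypothetical helper: fill an n1 x n2 row-major mask with a single
   opaque m1 x m2 rectangle at the top-left; everything else is
   transparent padding. Assumed convention: 1 = opaque, 0 = transparent. */
void spiht_fill_pad_mask(unsigned char *mask, int n1, int n2, int m1, int m2)
{
  int row, col;

  assert(m1 > 0 && m2 > 0 && m1 <= n1 && m2 <= n2);

  for (row = 0; row < n1; row++)
    for (col = 0; col < n2; col++)
      mask[row * n2 + col] = (row < m1 && col < m2) ? 1 : 0;
}
```

With num_levels = 5 the block size is 2^6 = 64, so a 500x513 image would be padded to 512x576, with the mask marking only the original 500x513 region as opaque.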
It is computationally expensive to perform multiple truncations and reconstructions of a bitstream to generate this operational rate-distortion curve, as the full decoding algorithm (including inverse transform) must be run numerous times. Instead, it is fairly straightforward to calculate rate and distortion values while encoding is taking place. To do so, the encoder keeps a running estimate of the current mean squared error (MSE) as calculated in the wavelet domain. Each time a wavelet coefficient is modified while encoding, the encoder adjusts the running MSE estimate using the value of the wavelet coefficient as it would be reconstructed in the decoder.
In theory, the MSE as calculated by this procedure would be exactly the same as the MSE calculated in the original spatial domain, provided that an orthonormal wavelet transform is used (a direct consequence of Parseval's theorem). However, in practice, the MSE as calculated in the wavelet domain using the approach outlined above will differ somewhat from the spatial-domain MSE, which is really what is of interest. The first discrepancy is due to the fact that, in practice, biorthogonal, rather than orthonormal, transforms are usually used for image compression. Using "near orthogonal" transforms, such as the popular Cohen-Daubechies-Feauveau biorthogonal family, helps mitigate this effect, however. Additionally, the wavelet-domain MSE calculation ignores the fact that spatial-domain pixels are "rounded" to integer values after the inverse wavelet transform in the decoder. This, too, should produce only a small deviation, so that the wavelet-domain MSE should usually be quite close to the desired spatial-domain MSE. However, by calculating MSE in the wavelet domain, one avoids numerous and costly inverse-transform calculations.
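The running-estimate idea can be illustrated with a small sketch. The rd_tracker type and its functions are hypothetical names chosen for this example and do not reflect how QccPack stores its estimate internally. The sketch also assumes that every coefficient is reconstructed as zero before any bits are coded.

```c
#include <assert.h>

/* Hypothetical running-distortion tracker (illustrative only). */
typedef struct
{
  double sum_squared_error; /* total squared error over all coefficients */
  long num_coefficients;
} rd_tracker;

/* Initialize assuming every coefficient is currently reconstructed as 0,
   so the initial squared error is the energy of the coefficients. */
void rd_tracker_start(rd_tracker *tracker, const double *coefficients, long n)
{
  long i;

  assert(n > 0);

  tracker->sum_squared_error = 0.0;
  tracker->num_coefficients = n;
  for (i = 0; i < n; i++)
    tracker->sum_squared_error += coefficients[i] * coefficients[i];
}

/* Call whenever the decoder-side reconstruction of one coefficient
   changes: remove its old error contribution and add the new one. */
void rd_tracker_update(rd_tracker *tracker, double original,
                       double old_reconstruction, double new_reconstruction)
{
  double old_error = original - old_reconstruction;
  double new_error = original - new_reconstruction;

  tracker->sum_squared_error += new_error * new_error - old_error * old_error;
}

/* Current wavelet-domain MSE estimate. */
double rd_tracker_mse(const rd_tracker *tracker)
{
  return tracker->sum_squared_error / (double)tracker->num_coefficients;
}
```

Each update costs a constant number of operations, which is why the estimate can be refreshed every time a coefficient changes without running an inverse transform.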
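The Parseval argument can be checked numerically on a toy case. The sketch below uses one level of the orthonormal Haar transform purely because it is the simplest orthonormal wavelet; it is not the 9/7 biorthogonal transform used in the comparisons in this page, and the function names are hypothetical. For an orthonormal transform, the sum of squared differences between two signals is the same before and after the transform.

```c
#include <assert.h>

#define INV_SQRT2 0.70710678118654752440 /* 1/sqrt(2) */

/* One level of the orthonormal Haar transform on an even-length signal:
   lowpass outputs fill the first half, highpass outputs the second. */
void haar_forward(const double *in, double *out, int n)
{
  int i;
  int half = n / 2;

  assert(n > 0 && n % 2 == 0);

  for (i = 0; i < half; i++)
    {
      out[i] = (in[2 * i] + in[2 * i + 1]) * INV_SQRT2;
      out[half + i] = (in[2 * i] - in[2 * i + 1]) * INV_SQRT2;
    }
}

/* Sum of squared differences between two length-n signals. */
double sum_squared_diff(const double *a, const double *b, int n)
{
  int i;
  double sum = 0.0;

  for (i = 0; i < n; i++)
    sum += (a[i] - b[i]) * (a[i] - b[i]);

  return sum;
}
```

Applying haar_forward() to an original signal and to a distorted copy, then calling sum_squared_diff() in both domains, gives the same value up to floating-point rounding. With a biorthogonal transform the two values would agree only approximately.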
In QccWAVspihtEncode(), each time a coefficient is modified (in the refinement pass, or when the coefficient initially becomes significant in the significance pass), the current wavelet-domain MSE estimate, along with the current rate as calculated from the number of bits written to the output bitstream thus far, is written to the output rate-distortion file, in ASCII. This is a fairly fine sampling of the operational rate-distortion curve, typically producing several thousands, or tens of thousands, of points on the curve, depending on the final rate of the compressed bitstream.
A. Said and W. A. Pearlman, "A New, Fast, and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees," IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243-250, June 1996.
S. Li and W. Li, "Shape-Adaptive Discrete Wavelet Transforms for Arbitrarily Shaped Visual Object Coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, pp. 725-743, August 2000.
G. Minami, Z. Xiong, A. Wang, and S. Mehrotra, "3-D Wavelet Coding of Video With Arbitrary Regions of Support," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 9, pp. 1063-1068, September 2001.