int QccWAVbiskEncode(const
QccIMGImageComponent *image, const QccIMGImageComponent *mask, int num_levels,
int target_bit_cnt, const QccWAVWavelet *wavelet, QccBitBuffer *output_buffer,
FILE *rate_distortion_file);
int QccWAVbiskEncode2(QccWAVSubbandPyramid
*image_subband_pyramid, QccWAVSubbandPyramid *mask_subband_pyramid, double
image_mean, int target_bit_cnt, QccBitBuffer *output_buffer, FILE *rate_distortion_file);
int QccWAVbiskDecodeHeader(QccBitBuffer *input_buffer, int *num_levels,
int *num_rows, int *num_cols, double *image_mean, int *max_coefficient_bits);
int QccWAVbiskDecode(QccBitBuffer *input_buffer, QccIMGImageComponent
*image, const QccIMGImageComponent *mask, int num_levels, const QccWAVWavelet
*wavelet, double image_mean, int max_coefficient_bits, int target_bit_cnt);
int QccWAVbiskDecode2(QccBitBuffer *input_buffer, QccWAVSubbandPyramid *image_subband_pyramid, QccWAVSubbandPyramid *mask_subband_pyramid, int max_coefficient_bits, int target_bit_cnt);
QccWAVbiskEncode() encodes an image component, image, using the BISK algorithm by Fowler. An embedded wavelet-based image coder based on the popular bitplane-coding paradigm, BISK is designed specifically for the coding of image objects with arbitrary shape. While other similar algorithms (e.g., SPECK) employ quadtree-based set partitioning to code significance-map information, BISK uses a simpler, and more flexible, binary decomposition via k-d trees. Additionally, aggressive discarding of transparent regions is implemented by shrinking sets to the bounding box of their constituent opaque coefficients before further partitioning. See "ALGORITHM" below for more detail.
image is the image component to be coded and buffer is the output bitstream. buffer must be of QCCBITBUFFER_OUTPUT type and opened via a prior call to QccBitBufferStart(3) .
num_levels gives the number of levels of dyadic wavelet decomposition to perform, and wavelet is the wavelet to use for decomposition.
The bitstream output from the BISK encoder is embedded, meaning that any prefix of the bitstream can be decoded to give a valid representation of the image. The BISK encoder essentially produces output bits until the number of bits output reaches target_bit_cnt, the desired (target) total length of the output bitstream in bits, and then it stops. Note that this is the bitstream length in bits, not the rate of the bitstream (which would be expressed in bits per pixel).
QccbiskEncode() optionally supports the use of a shape-adaptive DWT (SA-DWT) rather than the usual DWT. That is, QccbiskEncode() can call QccWAVSubbandPyramidShapeAdaptiveDWT(3) as the wavelet transform rather than the usual QccWAVSubbandPyramidDWT(3) . The use of a SA-DWT is indicated by a non-NULL mask; if mask is NULL, then the usual DWT is used. In the case of a SA-DWT, mask gives the transparency mask which indicates which pixels of the image are non-transparent and thus have data that is to be transformed. Refer to QccWAVSubbandPyramidShapeAdaptiveDWT(3) for more details on the calculation of this SA-DWT, and see "ALGORITHM" below for how the shape-adaptive mask is employed in the BISK algorithm.
rate_distortion_file is a pointer to an output file. If this pointer is not NULL, then a rate-distortion profile is written to the file while the image is being compressed. Essentially, this rate-distortion profile gives numerous points on the operational rate-distortion curve of the compressed bitstream produced by QccWAVbiskEncode(); see "RATE-DISTORTION PROFILE" below for more details. The file pointed to by rate_distortion_file must be opened for writing prior to calling QccWAVbiskEncode(), and must be closed after QccWAVbiskEncode() returns. If rate_distortion_file is NULL, no rate-distortion profile is written.
The routine QccWAVbiskEncode2() provides an alternative interface to BISK encoding. Specifically, QccWAVbiskEncode2() functions indentically to QccWAVbiskEncode() described above, except that both the image and optional mask are assumed to have had a DWT applied to them prior to calling QccWAVbiskEncode2(). As a consequence, the image and mask are passed to QccWAVbiskEncode2() in the wavelet domain as image_subband_pyramid and mask_subband_pyramid. We note that most applications should opt for QccWAVbiskEncode() rather than QccWAVbiskEncode2(); however, QccWAVbiskEncode() is implemented essentially as a call to an appropriate DWT followed by a call to QccWAVbiskEncode2(). If QccWAVbiskEncode2() is used, it is the responsibility of the calling routine to perform the appropriate DWT prior to calling QccWAVbiskEncode2().
QccWAVbiskDecodeHeader() decodes the header information in the bitstream in the input buffer (which must be of QCCBITBUFFER_INPUT type and opened via a prior call to QccBitBufferStart(3) ). The header information is returned in num_levels (number of levels of wavelet decomposition), num_rows (vertical size of image), num_cols (horizontal size of image), image_mean (the mean value of the original image), and max_coefficient_bits (the number of bits used to represent the magnitude of the coefficient with largest absolute value).
QccWAVbiskDecode() decodes the bitstream buffer, producing the reconstructed image component, image. The bitstream must already have had its header read by a prior call to QccWAVbiskDecodeHeader() (i.e., you call QccWAVbiskDecodeHeader() first and then QccWAVbiskDecode()). If target_bit_cnt is QCCENT_ANYNUMBITS, then decoding stops when the end of the input bitstream is reached; otherwise, decoding stops when target_num_bits from the input bitstream have been decoded.
If a SA-DWT was used in BISK encoding, then the original transparency mask should be passed to QccWAVbiskDecode() as mask. That is, mask should be the same transparency mask (in the spatial domain) that was passed to QccWAVbiskEncode(). Note that QccWAVbiskDecode() will transform this mask via a Lazy wavelet transform, and then pass the transformed mask to QccWAVSubbandPyramidInverseShapeAdaptiveDWT(3) . If the usual DWT was used in encoding, then mask should be a NULL pointer.
QccWAVbiskDecode2() provides the appropriate alternative interface to BISK decoding required if encoding was done via QccWAVbiskEncode2(). Essentially, QccWAVbiskDecode() is implemented by a call to QccWAVbiskDecode2() followed by an appropriate inverse DWT. If QccWAVbiskDecode2() is used, it is the responsibility of the calling to routine to perform the appropriate inverse DWT subsequent to the call to QccWAVbiskDecode2(). As noted above, most applications should use QccWAVbiskDecode() rather than QccWAVbiskDecode2().
The unique aspect of the BISK algorithm is in the significance pass. In many ways, the BISK algorithm can be considered to be a variant of SPECK (Pearlman et al. 2003) designed specifically for improved shape-adaptive coding. The BISK coder employs two tactics to this end---aggressive discarding of transparent regions from sets after partitioning, and a spatial-partitioning structure more flexible than the quadtrees used by SPECK. In quadtree partitioning, the significance state of an entire block of coefficients is tested and coded, the block is subdivided into four subblocks of approximately equal size, and the significance-coding process is repeated recursively on each of the subblocks. On the other hand, when processing a given set, the BISK coder employs a different partitioning scheme, k-d trees (Bentley 1975) rather than quadtrees, to recursively code the significance map. Like quadtrees, k-d trees are a recursive spatial partitioning data structure. Unlike quadtrees, which subdivide a block into four equally sized subblocks, k-d trees effectuate a binary partitioning; that is, blocks are divided in two, rather than quarters. Although there are several approaches for selecting the location and orientation of the split, we use a simple approach in which blocks are divided into approximately equally sized halves, and we alternate between splitting horizontally and vertically.
Additionally, to improve shape-adaptive coding, the BISK coder "shrinks" each set to the bounding box surrounding the opaque coefficients before subdividing the set into smaller blocks.
The BISK coder uses only one type of set, rather than having S and I sets as in SPECK. Consequently, each subband in the DWT decomposition is added to an LIS at the start of the BISK algorithm, whereas the SPECK algorithm initializes with only the baseband subband in an LIS.
It is computationally expensive to perform multiple truncatations and reconstructions of a bitstream to generate this opertional rate-distortion curve as the full decoding algorithm (including inverse transform) must be run numerous times. Instead, it is fairly straightforward to calculate rate and distortion values while encoding is taking place. To do so, the encoder keeps a running estimate of the current mean squared error (MSE) as calculated in the wavelet domain. Each time a wavelet coefficient is modified while encoding, the encoder adjusts the running MSE estimate using the value of the wavelet coefficient as it would be reconstructed in the decoder.
In theory, the MSE as calculated by this procedure would be exactly the same as the MSE calculated in the original spatial domain, providing that an orthonormal wavelet transform is used (a direct consequence of Parseval's theorem). However, in practice, the MSE as calculated in the wavelet domain using the approach outlined above will differ somewhat from the spatial-domain MSE, which is really what is of interest. The first discrepancy is due to the fact that, in practice, usually biorthogonal, rather than orthonormal, transforms are used for image compression. Using "near orthogonal" transforms, such as the popular Cohen-Daubechies-Feauveau biorthogonal family, help mitigate this effect, however. Additionally, the wavelet-domain MSE calculation ignores the fact that spatial-domain pixels are "rounded" to integer values after the inverse wavelet transform in the decoder. This, too, should produced only a small deviation, so that the wavelet-domain MSE should usually be quite close to the desired spatial-domain MSE. However, by calculating MSE in the wavelet-domain, one avoids numerous and costly inverse-transform calculations.
In QccWAVbiskEncode(), each time a coefficient is modified (in the refinement pass, or when the coefficient initially becomes significant in the significance pass), the current wavelet-domain MSE estimate, along with the current rate as calculated from the number of bits written to the output bitstream thus far, is written to the output rate-distortion file, in ASCII. This is a fairly fine sampling of the operational rate-distortion curve, typically producing several thousands, or tens of thousands, of points on the curve, depending on the final rate of the compressed bitstream.
J. E. Fowler, "Shape-Adaptive Coding Using Binary Set Splitting with k-d Trees," in Proceedings of the International Conference on Image Processing, Singapore, October 2004, to appear.
W. A. Pearlman, A. Islam, N. Nagaraj, and A. Said, "Efficient, Low-Complexity Image Coding With a Set-Partitioning Embedded Block Coder," IEEE Transactions on Circuits and Systems for Video Technology, to appear, 2003.
J. L. Bentley, "Multidimensional Binary Search Trees Used for Associative Searching," Communications of the ACM, vol. 18, no. 9, pp. 509-517, September 1975.