NAME

QccWAVspiht3DEncode, QccWAVspiht3DDecode - encode/decode an image cube using the 3D-SPIHT algorithm

SYNOPSIS

#include "libQccPack.h"

int QccWAVspiht3DEncode(const QccIMGImageCube *image, const QccIMGImageCube *mask, QccBitBuffer *buffer, int transform_type, int zerotree_type, int temporal_num_levels, int spatial_num_levels, const QccWAVWavelet *wavelet, int target_bit_cnt, int arithmetic_coded);

int QccWAVspiht3DEncode2(QccWAVSubbandPyramid3D *image_subband_pyramid, QccWAVSubbandPyramid3D *mask_subband_pyramid, int transform_type, int zerotree_type, double image_mean, QccBitBuffer *buffer, int target_bit_cnt, int arithmetic_coded);

int QccWAVspiht3DDecodeHeader(QccBitBuffer *buffer, int *transform_type, int *zerotree_type, int *temporal_num_levels, int *spatial_num_levels, int *num_frames, int *num_rows, int *num_cols, double *image_mean, int *max_coefficient_bits, int *arithmetic_coded);

int QccWAVspiht3DDecode(QccBitBuffer *buffer, QccIMGImageCube *image, const QccIMGImageCube *mask, int transform_type, int zerotree_type, int temporal_num_levels, int spatial_num_levels, const QccWAVWavelet *wavelet, double image_mean, int max_coefficient_bits, int target_bit_cnt, int arithmetic_coded);

int QccWAVspiht3DDecode2(QccBitBuffer *buffer, QccWAVSubbandPyramid3D *image_subband_pyramid, QccWAVSubbandPyramid3D *mask_subband_pyramid, int transform_type, int zerotree_type, int temporal_num_levels, int spatial_num_levels, int max_coefficient_bits, int target_bit_cnt, int arithmetic_coded);

DESCRIPTION

Encoding

QccWAVspiht3DEncode() encodes an image cube, image, using a 3D generalization of the Set Partitioning In Hierarchical Trees (SPIHT) algorithm. The original SPIHT algorithm was developed for 2D images by Said and Pearlman; it was later extended to 3D by Kim, Pearlman, and Xiong. In essence, the 3D-SPIHT algorithm involves a 3D DWT followed by a progressive "bitplane" coding of the wavelet coefficients using a zerotree-like quantization structure.

image is the image cube to be coded, and buffer is the output bitstream. buffer must be of QCCBITBUFFER_OUTPUT type and opened via a prior call to QccBitBufferStart(3).

QccWAVspiht3DEncode() supports the use of both wavelet-packet and dyadic wavelet-transform decompositions. Furthermore, a variety of zerotree structures are available. If transform_type is QCCWAVSUBBANDPYRAMID3D_DYADIC, a dyadic DWT is used; if transform_type is QCCWAVSUBBANDPYRAMID3D_PACKET, a wavelet-packet DWT is used. If zerotree_type is QCCSPIHT3D_ZEROTREE_DYADIC, a dyadic zerotree structure is used; if zerotree_type is QCCSPIHT3D_ZEROTREE_PACKET, a symmetric wavelet-packet zerotree structure is used; and if zerotree_type is QCCSPIHT3D_ZEROTREE_ASPACKET, an asymmetric wavelet-packet zerotree structure is used. See "DYADIC VS. WAVELET-PACKET TRANSFORMS AND ZEROTREES" below for more detail. temporal_num_levels and spatial_num_levels give the number of levels of wavelet decomposition to perform for both transform types; for a dyadic transform, temporal_num_levels must equal spatial_num_levels. Furthermore, if zerotree_type is QCCSPIHT3D_ZEROTREE_DYADIC or QCCSPIHT3D_ZEROTREE_PACKET, temporal_num_levels must equal spatial_num_levels. wavelet is the wavelet to use for the decomposition.

The bitstream output from the 3D-SPIHT encoder is embedded, meaning that any prefix of the bitstream can be decoded to give a valid representation of the image. The 3D-SPIHT encoder essentially produces output bits until the number of bits output reaches target_bit_cnt, the desired (target) total length of the output bitstream in bits, and then it stops. Note that this is the bitstream length in bits, not the rate of the bitstream (which would be expressed in bits per voxel).
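For example, a target rate expressed in bits per voxel can be converted into the corresponding target_bit_cnt simply by multiplying the rate by the number of voxels in the image cube, as in the following sketch (the function name is illustrative, not part of the libQccPack API):

    #include <math.h>
    #include "libQccPack.h"

    /* Convert a rate in bits per voxel into the total bitstream length in
       bits expected by QccWAVspiht3DEncode() as target_bit_cnt */
    int RateToTargetBitCnt(double rate, const QccIMGImageCube *image)
    {
      long num_voxels =
        (long)image->num_frames * image->num_rows * image->num_cols;

      return ((int)ceil(rate * num_voxels));
    }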

As originally described by Said and Pearlman, the 2D-SPIHT algorithm uses arithmetic coding of symbols as a final output step to improve coding efficiency. Alternatively, arithmetic coding can be suppressed, producing what Said and Pearlman call "binary-uncoded" output. The QccPack 3D-SPIHT implementation supports both arithmetic-coded and binary-uncoded output modes. arithmetic_coded is a flag passed to QccWAVspiht3DEncode() that indicates whether arithmetic coding should be performed (1 = arithmetic coding, 0 = binary-uncoded).

QccWAVspiht3DEncode() optionally supports the use of a shape-adaptive DWT (SA-DWT) rather than the usual DWT. That is, QccWAVspiht3DEncode() can call QccWAVSubbandPyramid3DShapeAdaptiveDWT(3) as the wavelet transform rather than the usual QccWAVSubbandPyramid3DDWT(3). The use of a SA-DWT is indicated by a non-NULL mask; if mask is NULL, then the usual DWT is used. In the case of a SA-DWT, mask gives the transparency mask which indicates which voxels of the image are non-transparent and thus have data that is to be transformed. Refer to QccWAVSubbandPyramid3DShapeAdaptiveDWT(3) for more details on the calculation of this SA-DWT. See "SHAPE-ADAPTIVE CODING" below for details on how the 3D-SPIHT algorithm handles shape-adaptive coding.
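Putting the above together, a minimal encoding sketch might look like the following (error checking is omitted; the filenames, target rate, and choice of wavelet are illustrative assumptions rather than requirements of libQccPack):

    #include "libQccPack.h"
    #include <math.h>
    #include <string.h>

    int main(void)
    {
      QccIMGImageCube image;
      QccWAVWavelet wavelet;
      QccBitBuffer buffer;
      double rate = 0.5;                 /* illustrative rate in bits/voxel */
      int target_bit_cnt;

      QccIMGImageCubeInitialize(&image);
      QccWAVWaveletInitialize(&wavelet);
      QccBitBufferInitialize(&buffer);

      /* Hypothetical input image cube (QccString fields are character arrays) */
      strcpy(image.filename, "input.icb");
      QccIMGImageCubeRead(&image);

      /* A typical wavelet choice with symmetric boundary extension */
      QccWAVWaveletCreate(&wavelet, "CohenDaubechiesFeauveau.9-7.lft",
                          "symmetric");

      /* Output bitstream */
      strcpy(buffer.filename, "output.bits");
      buffer.type = QCCBITBUFFER_OUTPUT;
      QccBitBufferStart(&buffer);

      /* Bitstream length in bits, not a rate (cf. the discussion above) */
      target_bit_cnt =
        (int)ceil(rate * image.num_frames * image.num_rows * image.num_cols);

      /* Dyadic transform and zerotree, 3 levels of decomposition in each
         direction, NULL mask (full-volume DWT), arithmetic coding enabled */
      QccWAVspiht3DEncode(&image, NULL, &buffer,
                          QCCWAVSUBBANDPYRAMID3D_DYADIC,
                          QCCSPIHT3D_ZEROTREE_DYADIC,
                          3, 3, &wavelet, target_bit_cnt, 1);

      QccBitBufferEnd(&buffer);
      return (0);
    }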

The routine QccWAVspiht3DEncode2() provides an alternative interface to 3D-SPIHT encoding. Specifically, QccWAVspiht3DEncode2() functions identically to QccWAVspiht3DEncode() described above, except that both the image cube and optional mask are assumed to have had a 3D DWT applied to them prior to calling QccWAVspiht3DEncode2(). As a consequence, the image cube and mask are passed to QccWAVspiht3DEncode2() in the wavelet domain as image_subband_pyramid and mask_subband_pyramid. We note that most applications should opt for QccWAVspiht3DEncode() rather than QccWAVspiht3DEncode2(); indeed, QccWAVspiht3DEncode() is implemented essentially as a call to an appropriate 3D DWT followed by a call to QccWAVspiht3DEncode2(). If QccWAVspiht3DEncode2() is used, it is the responsibility of the calling routine to perform the appropriate 3D DWT prior to calling QccWAVspiht3DEncode2().

Decoding

QccWAVspiht3DDecodeHeader() decodes the header information in a bitstream previously produced by QccWAVspiht3DEncode(). The input bitstream is buffer, which must be of QCCBITBUFFER_INPUT type and opened via a prior call to QccBitBufferStart(3).

The header information is returned in transform_type (either QCCWAVSUBBANDPYRAMID3D_DYADIC or QCCWAVSUBBANDPYRAMID3D_PACKET to indicate a dyadic or wavelet-packet transform decomposition, respectively), zerotree_type (QCCSPIHT3D_ZEROTREE_DYADIC, QCCSPIHT3D_ZEROTREE_PACKET, or QCCSPIHT3D_ZEROTREE_ASPACKET, to indicate a dyadic, symmetric-packet, or asymmetric-packet zerotree structure, respectively), temporal_num_levels (number of levels of wavelet decomposition in the temporal direction), spatial_num_levels (number of levels of wavelet decomposition in the spatial directions), num_frames (size of the image cube in the temporal direction), num_rows (vertical size of image cube), num_cols (horizontal size of image cube), image_mean (the mean value of the original image cube), max_coefficient_bits (indicates the precision, in number of bits, of the wavelet coefficient with the largest magnitude), and arithmetic_coded (indicates whether the data stream to follow is arithmetic-coded or not).

QccWAVspiht3DDecode() decodes the bitstream buffer, producing the reconstructed image cube, image. The bitstream must already have had its header read by a prior call to QccWAVspiht3DDecodeHeader() (i.e., you call QccWAVspiht3DDecodeHeader() first and then QccWAVspiht3DDecode()). If target_bit_cnt is QCCENT_ANYNUMBITS, then decoding stops when the end of the input bitstream is reached; otherwise, decoding stops once target_bit_cnt bits of the input bitstream have been decoded.

If a SA-DWT was used in 3D-SPIHT encoding, then the original transparency mask should be passed to QccWAVspiht3DDecode() as mask. That is, mask should be the same transparency mask (untransformed) that was passed to QccWAVspiht3DEncode(). Note that QccWAVspiht3DDecode() will transform this mask via a Lazy wavelet transform, and then pass the transformed mask to QccWAVSubbandPyramid3DInverseShapeAdaptiveDWT(3). If the usual, full-volume DWT was used in encoding, then mask should be a NULL pointer.
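A corresponding decoding sketch is given below (again with error checking omitted and illustrative filenames; a full-volume DWT, and thus a NULL mask, is assumed, and the same wavelet used for encoding must be supplied):

    #include "libQccPack.h"
    #include <string.h>

    int main(void)
    {
      QccBitBuffer buffer;
      QccIMGImageCube image;
      QccWAVWavelet wavelet;
      int transform_type, zerotree_type;
      int temporal_num_levels, spatial_num_levels;
      int num_frames, num_rows, num_cols;
      double image_mean;
      int max_coefficient_bits;
      int arithmetic_coded;

      QccBitBufferInitialize(&buffer);
      QccIMGImageCubeInitialize(&image);
      QccWAVWaveletInitialize(&wavelet);

      QccWAVWaveletCreate(&wavelet, "CohenDaubechiesFeauveau.9-7.lft",
                          "symmetric");

      strcpy(buffer.filename, "output.bits");
      buffer.type = QCCBITBUFFER_INPUT;
      QccBitBufferStart(&buffer);

      /* The header must be read first */
      QccWAVspiht3DDecodeHeader(&buffer, &transform_type, &zerotree_type,
                                &temporal_num_levels, &spatial_num_levels,
                                &num_frames, &num_rows, &num_cols,
                                &image_mean, &max_coefficient_bits,
                                &arithmetic_coded);

      /* Allocate the reconstructed image cube using the header dimensions */
      image.num_frames = num_frames;
      image.num_rows = num_rows;
      image.num_cols = num_cols;
      QccIMGImageCubeAlloc(&image);

      /* Decode the entire bitstream (no early truncation) */
      QccWAVspiht3DDecode(&buffer, &image, NULL, transform_type, zerotree_type,
                          temporal_num_levels, spatial_num_levels, &wavelet,
                          image_mean, max_coefficient_bits,
                          QCCENT_ANYNUMBITS, arithmetic_coded);

      QccBitBufferEnd(&buffer);

      strcpy(image.filename, "reconstructed.icb");
      QccIMGImageCubeWrite(&image);
      return (0);
    }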

QccWAVspiht3DDecode2() provides the appropriate alternative interface to 3D-SPIHT decoding required if encoding was done via QccWAVspiht3DEncode2(). Essentially, QccWAVspiht3DDecode() is implemented by a call to QccWAVspiht3DDecode2() followed by an appropriate inverse 3D DWT. If QccWAVspiht3DDecode2() is used, it is the responsibility of the calling routine to perform the appropriate inverse 3D DWT subsequent to the call to QccWAVspiht3DDecode2(). As noted above, most applications should use QccWAVspiht3DDecode() rather than QccWAVspiht3DDecode2().

DYADIC VS. WAVELET-PACKET TRANSFORMS AND ZEROTREES

As the first step in 3D-SPIHT coding, a wavelet transform is applied. As is usual for 2D images, the 3D DWT is implemented in a separable fashion, employing 1D transforms separately in the spatial-row, spatial-column, and temporal-frame directions. In a 3D wavelet transform, different decomposition orders can yield different results. For instance, we can perform one scale of decomposition along each direction and then further decompose the lowpass subband, leading to the dyadic decomposition (transform_type equal to QCCWAVSUBBANDPYRAMID3D_DYADIC). This dyadic decomposition structure is the 3D generalization of the 2D dyadic decomposition used ubiquitously in 2D image coding. However, in 3D, we can alternatively use a so-called "wavelet-packet" transform in which we first apply a 1D decomposition in the temporal direction and then decompose each temporal frame using a separable 2D transform. With this approach, we employ a 1D decomposition of temporal_num_levels scales temporally followed by a 2D decomposition of spatial_num_levels scales spatially, where it is possible for spatial_num_levels and temporal_num_levels to be unequal. In comparing the two decomposition structures, the wavelet-packet transform is generally considered to be more flexible because the temporal decomposition can be better tailored to the data at hand than in the dyadic transform.
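As an informal illustration of the difference in structure (the functions below are illustrative arithmetic, not libQccPack routines): a dyadic decomposition of L levels partitions the volume into 7L + 1 subbands, whereas a wavelet-packet decomposition with T temporal and S spatial levels yields (T + 1)(3S + 1) subbands, since each of the T + 1 temporal subbands is further split into 3S + 1 spatial subbands.

    /* Number of subbands in a dyadic 3D decomposition of num_levels levels */
    int DyadicNumSubbands(int num_levels)
    {
      return (7 * num_levels + 1);
    }

    /* Number of subbands in a wavelet-packet 3D decomposition */
    int PacketNumSubbands(int temporal_num_levels, int spatial_num_levels)
    {
      return ((temporal_num_levels + 1) * (3 * spatial_num_levels + 1));
    }

For instance, under these formulas, temporal_num_levels = 2 and spatial_num_levels = 3 give a packet decomposition of 30 subbands, while a 3-level dyadic decomposition gives 22 subbands.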

After the wavelet transform, the 3D-SPIHT algorithm employs a zerotree-based bitplane-coding algorithm. In the case of a dyadic transform, the zerotree used is a straightforward extension to 3D of the parent-child relationship of 2D zerotrees; that is, one coefficient is the parent to a 2x2x2 cube of eight offspring coefficients in the next scale.
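Under this relationship, a parent coefficient at location (f, r, c) at one scale has its eight offspring at locations (2f + i, 2r + j, 2c + k), where i, j, and k each range over 0 and 1, at the next finer scale. The following is a generic illustration of this dyadic parent-child rule (not code from the libQccPack implementation, which must additionally treat the tree roots in the baseband specially):

    /* Enumerate the eight offspring of the parent coefficient at
       (frame, row, col); offspring[n][0..2] receives the frame, row,
       and column indices of offspring n at the next finer scale. */
    void DyadicOffspring(int frame, int row, int col, int offspring[8][3])
    {
      int i, j, k;
      int n = 0;

      for (i = 0; i < 2; i++)
        for (j = 0; j < 2; j++)
          for (k = 0; k < 2; k++)
            {
              offspring[n][0] = 2 * frame + i;
              offspring[n][1] = 2 * row + j;
              offspring[n][2] = 2 * col + k;
              n++;
            }
    }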

In the case of a wavelet-packet transform, there are several approaches to fitting a zerotree structure to the wavelet coefficients. The first, proposed by Kim et al., recognizes that wavelet-packet subbands appear as "split" versions of their dyadic counterparts; consequently, one should "split" the 2x2x2 offspring nodes of the dyadic zerotree structure appropriately. Alternatively, one could employ a dyadic zerotree structure directly on the wavelet-packet subband decomposition, oblivious to the differing structure. While the former "splitting" approach appears to be a more appropriate structure, the latter dyadic approach, which results in the intermingling of coefficients from different spatial localities within a single 2x2x2 offspring collection, is easier to implement and often results in slightly better rate-distortion performance. However, both of these zerotree structures apparently require that the wavelet-packet transform have the same number of temporal and spatial decomposition levels.

An alternative zerotree structure for packet transforms was proposed originally by He et al., and was subsequently used by Cho and Pearlman. In essence, this zerotree structure consists of 2D zerotrees within each "slice" of the subband-pyramid volume, with parent-child relationships set up between the tree-root coefficients of the 2D trees. Cho and Pearlman called this alternative structure an "asymmetric" packet zerotree, with the original splitting-based packet structure of Kim et al. then being a "symmetric" packet zerotree. The asymmetric structure usually offers slightly better rate-distortion performance than either the symmetric packet or dyadic zerotree structures; additionally, the wavelet-packet transform can have the number of temporal decomposition levels different from the number of spatial decomposition levels when the asymmetric tree is used.

The QccPack implementation of 3D-SPIHT supports both dyadic and packet transforms (as specified by transform_type). In the case that a wavelet-packet transform is used (transform_type equal to QCCWAVSUBBANDPYRAMID3D_PACKET), one can independently control whether a dyadic, symmetric-packet, or asymmetric-packet zerotree structure is used (as specified by zerotree_type). The asymmetric-packet zerotree structure (zerotree_type equal to QCCSPIHT3D_ZEROTREE_ASPACKET) is implemented by adopting the 2D zerotree offspring relationships, suitably altered at the tree roots; for an asymmetric-packet zerotree, spatial_num_levels and temporal_num_levels can differ. For the symmetric-packet zerotree structure (zerotree_type equal to QCCSPIHT3D_ZEROTREE_PACKET), the splitting approach of Kim et al. is implemented by reorganizing the wavelet coefficients with a call to QccWAVSubbandPyramid3DPacketToDyadic(3) and then coding as though a dyadic transform had been used. For the dyadic zerotree structure (zerotree_type equal to QCCSPIHT3D_ZEROTREE_DYADIC), a zerotree of spatial_num_levels levels is employed directly on the wavelet-packet coefficients. In the case of either a symmetric-packet or dyadic zerotree, spatial_num_levels and temporal_num_levels must be the same. In the case that a dyadic DWT is used (transform_type equal to QCCWAVSUBBANDPYRAMID3D_DYADIC), only the dyadic zerotree structure is supported; i.e., zerotree_type must be QCCSPIHT3D_ZEROTREE_DYADIC, and spatial_num_levels and temporal_num_levels must be the same.
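The legal combinations of transform_type, zerotree_type, temporal_num_levels, and spatial_num_levels described above can be summarized as in the following sketch (an illustrative check written against these rules, not a routine provided by libQccPack):

    #include "libQccPack.h"

    /* Return 1 if the given combination of transform type, zerotree type,
       and decomposition levels is permitted by the QccPack 3D-SPIHT coder,
       0 otherwise */
    int Spiht3DParametersValid(int transform_type, int zerotree_type,
                               int temporal_num_levels, int spatial_num_levels)
    {
      if (transform_type == QCCWAVSUBBANDPYRAMID3D_DYADIC)
        /* Dyadic transform: dyadic zerotree only, equal numbers of levels */
        return ((zerotree_type == QCCSPIHT3D_ZEROTREE_DYADIC) &&
                (temporal_num_levels == spatial_num_levels));

      /* Wavelet-packet transform: any zerotree structure is allowed, but only
         the asymmetric-packet zerotree permits unequal temporal and spatial
         decomposition levels */
      if (zerotree_type == QCCSPIHT3D_ZEROTREE_ASPACKET)
        return (1);

      return (temporal_num_levels == spatial_num_levels);
    }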

SHAPE-ADAPTIVE CODING

The usual way to handle arbitrarily shaped objects within 3D-SPIHT is to follow the approach typically used for 2D zerotree-based coders; that is, transparent regions in the image are permanently set to "insignificant" during the SA-DWT so that the 3D-SPIHT algorithm processes these transparent regions in a manner identical to that of other insignificant coefficients. This approach has been taken for a number of 2D zerotree-based coding algorithms; see Li and Li for an example. Minami et al. go one step further than this basic approach by discarding from the lists maintained by 3D-SPIHT all sets of coefficients that lie entirely within a transparent region. Although this refinement typically offers a small gain in performance, the size of the gain depends on how much of the overall image is transparent. The QccPack implementation of 3D-SPIHT follows the approach of Minami et al. for shape-adaptive coding.
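In terms of the programming interface, shape-adaptive coding merely requires that the same transparency mask be supplied to both the encoder and the decoder. Continuing the encoding and decoding sketches given earlier (the variables are as declared there; the mask filename is an illustrative assumption):

    /* Read the transparency mask, an image cube with the same dimensions
       as the input image, and pass it in place of the NULL mask argument */
    QccIMGImageCube mask;

    QccIMGImageCubeInitialize(&mask);
    strcpy(mask.filename, "mask.icb");
    QccIMGImageCubeRead(&mask);

    /* Encoding with the shape-adaptive DWT */
    QccWAVspiht3DEncode(&image, &mask, &buffer,
                        transform_type, zerotree_type,
                        temporal_num_levels, spatial_num_levels,
                        &wavelet, target_bit_cnt, arithmetic_coded);

    /* Decoding: the same (untransformed) mask is supplied to the decoder */
    QccWAVspiht3DDecode(&buffer, &image, &mask,
                        transform_type, zerotree_type,
                        temporal_num_levels, spatial_num_levels, &wavelet,
                        image_mean, max_coefficient_bits,
                        QCCENT_ANYNUMBITS, arithmetic_coded);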

Finally, note that the concept of shape-adaptive coding arose in the work surrounding the MPEG-4 standard and was not considered in the original 2D-SPIHT work by Said and Pearlman, nor in the original extension to 3D by Kim, Pearlman, and Xiong.

SEE ALSO

spihtencode3d(1), spihtdecode3d(1), QccBitBuffer(3), QccWAVSubbandPyramid3D(3), QccWAVSubbandPyramid3DDWT(3), QccWAVSubbandPyramid3DShapeAdaptiveDWT(3), QccWAVspihtEncode(3), QccPackWAV(3), QccPackIMG(3), QccPack(3)

B.-J. Kim, Z. Xiong, and W. A. Pearlman, "Low Bit-Rate Scalable Video Coding with 3-D Set Partitioning in Hierarchical Trees (3-D SPIHT)," IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 8, pp. 1374-1387, December 2000.

B.-J. Kim and W. A. Pearlman, "An Embedded Wavelet Video Coder Using Three-Dimensional Set Partitioning in Hierarchical Trees (SPIHT)," in Proceedings of the Data Compression Conference, J. A. Storer and M. Cohn, Eds., Snowbird, UT, March 1997, pp. 251-257.

A. Said and W. A. Pearlman, "A New, Fast, and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees," IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243-250, June 1996.

C. He, J. Dong, Y. F. Zheng, and Z. Gao, "Optimal 3-D Coefficient Tree Structure for 3-D Wavelet Video Coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 10, pp. 961-972, October 2003.

S. Cho and W. A. Pearlman, "Error Resilient Video Coding With Improved 3-D SPIHT and Error Concealment," in Image and Video Communications and Processing, B. Vasudev, T. R. Hsing, and A. G. Tescher, Eds., Santa Clara, CA, January 2003, Proc. SPIE 5022, pp. 125-136.

G. Minami, Z. Xiong, A. Wang, and S. Mehrotra, "3-D Wavelet Coding of Video With Arbitrary Regions of Support," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 9, pp. 1063-1068, September 2001.

S. Li and W. Li, "Shape-Adaptive Discrete Wavelet Transforms for Arbitrarily Shaped Visual Object Coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, pp. 725-743, August 2000.

AUTHOR

Copyright (C) 1997-2021 James E. Fowler

