NAME

QccVIDrwmhEncode, QccVIDrwmhDecode - encode/decode an image sequence using the RWMH algorithm

SYNOPSIS

#include "libQccPack.h"

int QccVIDrwmhEncode(QccIMGImageSequence *image_sequence, const QccFilter *filter1, const QccFilter *filter2, const QccFilter *filter3, int subpixel_accuracy, int blocksize, QccBitBuffer *output_buffer, int num_levels, int target_bit_cnt, const QccWAVWavelet *wavelet, const QccString mv_filename, int read_motion_vectors, int quiet);

int QccVIDrwmhDecodeHeader(QccBitBuffer *input_buffer, int *num_rows, int *num_cols, int *start_frame_num, int *end_frame_num, int *num_levels, int *blocksize, int *target_bit_cnt);

int QccVIDrwmhDecode(QccIMGImageSequence *image_sequence, const QccFilter *filter1, const QccFilter *filter2, const QccFilter *filter3, int subpixel_accuracy, int blocksize, QccBitBuffer *input_buffer, int target_bit_cnt, int num_levels, const QccWAVWavelet *wavelet, const QccString mv_filename, int quiet);

DESCRIPTION

Encoding

QccVIDrwmhEncode() encodes an image_sequence using the redundant-wavelet-multihypothesis (RWMH) video-coding algorithm by Cui et al. Essentially, the RWMH algorithm involves traditional block-based motion estimation and motion compensation wherein the redundant phases of the RDWT of the reference frame are used to provide a multihypothesis estimate of motion based on the diversity of the transform phases; see "ALGORITHM" below for greater detail.

image_sequence is the image sequence to be coded and should indicate a collection of grayscale images of the same size stored as separate, numbered files; the filename indicated by image_sequence must contain one printf(3)-style numerical descriptor which will then be filled in with the current frame number (e.g., football.%03d.pgm becomes football.000.pgm, football.001.pgm, etc.; see QccPackIMG(3)). Each frame of image_sequence must have a size that is an integer multiple of blocksize both horizontally and vertically. Both the start_frame_num and end_frame_num fields of image_sequence should indicate the desired starting and stopping frames, respectively, for the encoding; these should be set either manually or via a call to QccIMGImageSequenceFindFrameNums(3) prior to calling QccVIDrwmhEncode().

filter1, filter2, and filter3 are interpolation filters for supporting the subpixel accuracy specified by subpixel_accuracy which can be one of QCCVID_ME_FULLPIXEL, QCCVID_ME_HALFPIXEL, QCCVID_ME_QUARTERPIXEL, or QCCVID_ME_EIGHTHPIXEL, indicating full-, half-, quarter-, or eighth-pixel accuracy, respectively. See "SUBPIXEL ACCURACY" below.

blocksize is the size of the square blocks to use in the RWMH algorithm.

output_buffer is the output bitstream, which must be of QCCBITBUFFER_OUTPUT type and opened via a prior call to QccBitBufferStart(3).

num_levels gives the number of levels of dyadic wavelet decomposition to perform, and wavelet is the wavelet to use for decomposition, in all the RDWTs in the RWMH algorithm.

The RWMH encoder uses a cross-scale distortion measure to determine a motion vector for a "set" of spatially co-located blocks in the RDWT of the current frame. The number of transform scales used in this cross-scale measure is num_levels. The result of this motion-estimation search is an "all-phase" motion vector for the set of blocks.

When encoding the current frame, QccVIDrwmhEncode() first outputs to output_buffer all the motion vectors for the frame, followed by an embedded intraframe encoding of the motion-compensated residual. target_bit_cnt is the desired number of bits to output for each frame, including both motion vectors and the motion-compensated residual. After target_bit_cnt bits have been produced for the current frame, QccVIDrwmhEncode() calls QccBitBufferFlush(3) to flush the bit-buffer contents to the output bitstream. Encoding of the next frame starts on the next byte boundary of the output bitstream.

mv_filename gives the filename for files of motion vectors. If read_motion_vectors is FALSE, then the motion vectors are written to mv_filename via QccVIDMotionVectorsWriteFile(3). mv_filename should have a printf(3)-style numerical descriptor which will then be filled in with the current frame number before writing, so that motion vectors are separated into multiple files, one file per frame. On the other hand, if read_motion_vectors is TRUE, then motion vectors are read from mv_filename via QccVIDMotionVectorsReadFile(3), in which case QccVIDrwmhEncode() performs no motion estimation itself, simply using the motion vectors read from the files for coding.

If quiet = 0, QccVIDrwmhEncode() will print to stdout a number of statistics concerning each frame as it is encoding. If quiet = 1, this verbose output is suppressed.

Decoding

QccVIDrwmhDecodeHeader() decodes the header information in a bitstream produced by QccVIDrwmhEncode(). The input bitstream is input_buffer, which must be of QCCBITBUFFER_INPUT type and opened via a prior call to QccBitBufferStart(3). The header information is returned in num_rows (vertical size of image-sequence frames), num_cols (horizontal size of image-sequence frames), start_frame_num (number of the first frame of the sequence), end_frame_num (number of the last frame of the sequence), num_levels (number of wavelet-transform levels), blocksize (size of the motion-compensation blocks), and target_bit_cnt (number of bits encoded for each frame).

QccVIDrwmhDecode() decodes the bitstream input_buffer, reconstructing each image of the output image sequence and writing it to a separate, numbered grayscale-image file. The filename denoted by image_sequence must contain one printf(3)-style numerical descriptor which is filled in with the number of the current frame being decoded. The bitstream must already have had its header read by a prior call to QccVIDrwmhDecodeHeader() (i.e., you call QccVIDrwmhDecodeHeader() first and then QccVIDrwmhDecode()). If quiet = 0, then QccVIDrwmhDecode() prints a brief message to stdout after decoding each frame; if quiet = 1, then this message is suppressed.

filter1, filter2, and filter3 are interpolation filters for supporting the subpixel accuracy specified by subpixel_accuracy which can be one of QCCVID_ME_FULLPIXEL, QCCVID_ME_HALFPIXEL, QCCVID_ME_QUARTERPIXEL, or QCCVID_ME_EIGHTHPIXEL, indicating full-, half-, quarter-, or eighth-pixel accuracy, respectively. See "SUBPIXEL ACCURACY" below.

mv_filename gives the filename for files of motion vectors. If mv_filename is NULL, then QccVIDrwmhDecode() simply decodes the motion vectors to use in decoding from the bitstream (the usual state of affairs). On the other hand, if mv_filename is not NULL, then the motion vectors stored in the bitstream are ignored and the motion vectors are instead read from mv_filename. mv_filename should have a printf(3)-style numerical descriptor which will then be filled in with the current frame number before reading via QccVIDMotionVectorsReadFile(3).

ALGORITHM

Multihypothesis motion compensation (MHMC) forms a prediction in the current frame as a combination of multiple predictions in an effort to combat the uncertainty inherent in the motion-estimation (ME) process. A number of multihypothesis techniques for motion compensation (MC) have been proposed. One approach to MHMC is to implement multihypothesis prediction in the spatial dimensions; i.e., the predictions are culled from spatially distinct locations in the reference frame. Included in this class of MHMC would be subpixel-accurate MC and overlapped block motion compensation (OBMC). Another approach is to deploy MHMC in the temporal dimension by choosing predictions from multiple reference frames. Examples of this class of MHMC are bidirectional prediction (B-frames) and long-term-memory motion compensation (LTMMC). Cui et al. introduced a new class of MHMC by extending the multihypothesis-prediction concept into the transform domain. Specifically, Cui et al. performed ME/MC in the domain of a redundant, or overcomplete, wavelet transform, and used multiple predictions that were diverse in transform phase. The term redundant-wavelet multihypothesis (RWMH) was coined to describe this approach to phase-diversity multihypothesis.

In the RWMH approach, both the current and reference frames are transformed into RDWT coefficients, and both ME and MC take place in this RDWT domain. However, before calculating the residual frame, the motion-compensated frame is mapped from the RDWT domain back to the spatial domain via a multiple-phase inverse RDWT, and the residual is calculated and coded in the original spatial domain.

Intuitively, we observe that each of the critically sampled DWTs within an RDWT will "view" motion from a different perspective. Consequently, if motion is predicted in the RDWT domain, the multiple-phase inverse RDWT converts these multiple predictions into a single multihypothesis prediction in the spatial domain. Cui et al. present an analytic derivation that substantiates this intuition by quantifying the performance gain of RWMH over single-phase prediction. Key to this analysis is that noise in the RDWT domain undergoes a substantial reduction in variance when the multiple-phase inverse RDWT is applied due to the well-known fact that this pseudo-inverse contains a projection onto the range space of the forward transform. Consequently, noise not captured by the motion model is greatly reduced in an RWMH system, leading to substantial reduction in the prediction-residual variance and higher coding efficiency.

We note that Cui et al. have improved the performance of RWMH by combining it with other, more traditional forms of multihypothesis, e.g., OBMC. In the present implementation of RWMH, these enhancements have not (yet) been implemented.

SUBPIXEL ACCURACY

Due to the linearity of the RDWT, subpixel interpolation can be implemented in the RDWT domain in a manner similar to that used in the spatial domain to support traditional subpixel accuracy. Specifically, one simply interpolates each subband of the RDWT to subpixel accuracy independently. QccVIDrwmhEncode() and QccVIDrwmhDecode() both call QccVIDMotionEstimationCreateReferenceFrame(3) for each subband of the RDWT of the reference frame to interpolate the subband to the accuracy specified by subpixel_accuracy. The filters filter1, filter2, and filter3 are passed to QccVIDMotionEstimationCreateReferenceFrame(3) to control whether filtered interpolation or bilinear interpolation is performed at each step of the subpixel interpolation. See QccVIDMotionEstimationCreateReferenceFrame(3) for more detail.

SEE ALSO

rwmhencode(1), rwmhdecode(1), QccWAVWaveletRedundantDWT2D(3), QccVIDMotionVectorsReadFile(3), QccVIDMotionVectorsWriteFile(3), QccVIDMotionEstimationCreateReferenceFrame(3), QccSPIHTEncode(3), QccPackVID(3), QccPackSPIHT(3), QccPackWAV(3), QccPackIMG(3), QccPack(3)

S. Cui, Y. Wang, and J. E. Fowler, "Motion Compensation Via Redundant-Wavelet Multihypothesis," IEEE Transactions on Image Processing, submitted March 2004, revised February 2005.

S. Cui, Y. Wang, and J. E. Fowler, "Multihypothesis Motion Compensation in the Redundant Wavelet Domain," in Proceedings of the International Conference on Image Processing, Barcelona, Spain, September 2003, vol. 2, pp. 53-56.

H.-W. Park and H.-S. Kim, "Motion Estimation Using Lowband-Shift Method for Wavelet-Based Moving-Picture Coding," IEEE Transactions on Image Processing, vol. 9, no. 4, pp. 577-587, April 2000.

AUTHOR

Written by Joe Boettcher <jbb15@msstate.edu> based on the originally developed algorithm and code by Suxia Cui.

Copyright (C) 1997-2021 James E. Fowler

