MPEG-4 Part 10 / H.264

The Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG (Video Coding Experts Group) developed the H.264 video compression standard, which was scheduled to be completed by March 2003. It was formerly known by its working title, "H.26L." Preliminary studies using software based on this new standard suggest that H.264 offers up to 50% better compression than MPEG-2 and up to 30% better than H.263+ and the MPEG-4 Advanced Simple Profile.

The outcome of this work is actually two identical standards: ISO MPEG-4 Part 10 and ITU-T H.264. With its superior compression performance over MPEG-2, H.264 is currently one of the leading candidates to carry HDTV video content in many potential applications.

Core Features

Similar to the previous ITU-T H.263+, H.264 specifies a block-based, motion-compensated transform hybrid decoder with five major blocks:

  • Entropy decoding
  • Motion compensation or intra-prediction
  • Inverse scan, quantization, and transform of residual pixels
  • Reconstruction
  • In-loop deblocking filter on reconstructed pixels

Each picture is again divided into macroblocks (16 x 16 blocks), and arbitrary-sized slices can group multiple macroblocks into self-contained units.

VLC-Based Entropy Decoding. Two entropy methods are used in the variable-length entropy decoder: Unified-VLC (UVLC) and Context Adaptive VLC (CAVLC). UVLC uses simple exponential Golomb codes to decode header data, motion vectors, and other nonresidual data, while the more complex CAVLC decodes residual coefficients.

In CAVLC, multiple VLC tables are predefined for each data type (runs, levels, etc.), and predefined rules select the most suitable VLC table based on the context (previously decoded symbols). CAVLC thus allows multiple statistical models to be used for each data type and improves entropy coding efficiency over existing fixed VLC schemes, such as the one in H.263+.
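The exponential Golomb codes used by UVLC are simple enough to sketch in a few lines. The following is a minimal illustration, not a bitstream parser: the bitstream is modeled as a string of '0' and '1' characters, and the function names (read_ue, read_se, write_ue) are hypothetical helpers chosen for this example. A codeword consists of N leading zeros, a 1 marker, and N further bits; the signed mapping shown is the usual one for values such as motion vector differences.

```python
def read_ue(bits, pos=0):
    """Decode one unsigned Exp-Golomb codeword from a '0'/'1' string.

    Returns (value, next_pos).
    """
    leading_zeros = 0
    while bits[pos + leading_zeros] == "0":
        leading_zeros += 1
    # The marker '1' plus the next leading_zeros bits encode value + 1 in binary.
    start = pos + leading_zeros
    end = start + leading_zeros + 1
    return int(bits[start:end], 2) - 1, end


def read_se(bits, pos=0):
    """Decode a signed value: code numbers 0, 1, 2, 3, 4 map to 0, +1, -1, +2, -2."""
    k, pos = read_ue(bits, pos)
    value = (k + 1) // 2 if k % 2 else -(k // 2)
    return value, pos


def write_ue(value):
    """Encode a non-negative integer as an unsigned Exp-Golomb codeword."""
    body = bin(value + 1)[2:]          # binary form of value + 1
    return "0" * (len(body) - 1) + body
```

Small values get short codewords (0 is the single bit "1"), which suits header fields and motion vector differences that are usually close to zero.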

Motion Compensation (P-Prediction). Inter-frame motion compensation in H.264 is similar to H.263+ but more sophisticated. Instead of limiting the motion-compensation block size to either 16 x 16 or 8 x 8, as in H.263+, H.264 uses a tree-structured motion segmentation down to 4 x 4 block size (16 x 16, 16 x 8, 8 x 16, 8 x 8, 8 x 4, 4 x 8, 4 x 4). This allows much more accurate motion compensation of moving objects.
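The tree structure can be pictured as two levels: a macroblock is split into 16 x 16, 16 x 8, 8 x 16, or 8 x 8 partitions, and each 8 x 8 partition may be split again into 8 x 8, 8 x 4, 4 x 8, or 4 x 4 blocks. The sketch below only enumerates the resulting rectangles; the function and variable names are hypothetical, and no bitstream syntax is implied.

```python
MB_SIZE = 16

def tile(x0, y0, size, part_w, part_h):
    """List (x, y, w, h) rectangles that tile a size x size region."""
    return [(x, y, part_w, part_h)
            for y in range(y0, y0 + size, part_h)
            for x in range(x0, x0 + size, part_w)]

def partition_macroblock(mb_shape, sub_shapes=None):
    """Return the motion-compensation blocks of one macroblock.

    mb_shape:   (16, 16), (16, 8), (8, 16), or (8, 8)
    sub_shapes: for mb_shape == (8, 8), one of (8, 8), (8, 4), (4, 8), (4, 4)
                per 8 x 8 quadrant, in raster order
    """
    if mb_shape != (8, 8):
        return tile(0, 0, MB_SIZE, *mb_shape)
    blocks = []
    for (x, y, _, _), shape in zip(tile(0, 0, MB_SIZE, 8, 8), sub_shapes):
        blocks += tile(x, y, 8, *shape)   # second level of the tree
    return blocks
```

For example, partition_macroblock((8, 8), [(8, 8), (8, 4), (4, 8), (4, 4)]) yields nine blocks, each of which would carry its own motion vector.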

Furthermore, motion vectors in H.264 can have up to quarter-sample accuracy. A six-tap sinc filter is used for half-pixel interpolation, to preserve high-frequency detail. Simple averaging is used for quarter-pixel interpolation, which provides not only more accurate motion but also stronger low-pass filtering than the half-pixel filter. Multiple reference frames are also a standard feature in H.264, so the ability to choose a different reference frame for each macroblock is available in all profiles.
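A one-dimensional sketch of the two interpolation steps follows. The text above does not list the filter coefficients; the taps used here, (1, -5, 20, 20, -5, 1) with a divisor of 32, are the ones commonly cited for H.264 luma interpolation. The function names are hypothetical, samples are assumed to be 8-bit, and the two-dimensional cases of the real interpolator are left out.

```python
TAPS = (1, -5, 20, 20, -5, 1)   # coefficients sum to 32

def clip_pixel(v):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, v))

def half_pel(samples, i):
    """Half-sample value between samples[i] and samples[i + 1].

    Needs the six integer samples samples[i - 2] .. samples[i + 3].
    """
    window = samples[i - 2:i + 4]
    acc = sum(t * s for t, s in zip(TAPS, window))
    return clip_pixel((acc + 16) >> 5)      # round, divide by 32, clip

def quarter_pel(a, b):
    """Quarter-sample value: rounded average of two neighboring positions."""
    return (a + b + 1) >> 1
```

On a step edge such as [0, 0, 0, 0, 255, 255, 255, 255], the negative taps make the filter overshoot before clipping, which is what keeps edges sharper than plain bilinear averaging would.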

Intra-Prediction (I-Prediction). H.264 exploits much more spatial prediction than previous video standards such as H.263+. Intra-coded macroblocks are all predicted using neighboring reconstructed pixels (using both intra- and inter-coded reconstructed pixels). Similar to motion compensation, different block sizes can be chosen for each intra-coded macroblock (16 x 16 or 4 x 4). There are nine prediction modes for 4 x 4 blocks (where each 4 x 4 block in a macroblock can have a different prediction mode) and four prediction modes for 16 x 16 blocks. This sophisticated intra-prediction is powerful, as it drastically reduces the amount of data to be transmitted when temporal prediction fails.
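Three of the nine 4 x 4 modes (vertical, horizontal, and DC) are easy to sketch; the six directional modes are omitted. In this illustration, top and left are the four reconstructed pixels above and to the left of the block. The mode selection by sum of absolute differences is an assumption about what an encoder might do, not something the standard prescribes, and all names are hypothetical.

```python
def predict_vertical(top, left):
    """Each column repeats the pixel directly above the block."""
    return [list(top) for _ in range(4)]

def predict_horizontal(top, left):
    """Each row repeats the pixel directly to the left of the block."""
    return [[left[y]] * 4 for y in range(4)]

def predict_dc(top, left):
    """Every pixel is the rounded mean of the eight neighboring pixels."""
    dc = (sum(top) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]

MODES = {"vertical": predict_vertical,
         "horizontal": predict_horizontal,
         "dc": predict_dc}

def residual(block, pred):
    """Difference the encoder would transform and transmit."""
    return [[block[y][x] - pred[y][x] for x in range(4)] for y in range(4)]

def best_mode(block, top, left):
    """Pick the mode with the smallest sum of absolute differences (encoder-side heuristic)."""
    def sad(name):
        res = residual(block, MODES[name](top, left))
        return sum(abs(v) for row in res for v in row)
    return min(MODES, key=sad)
```

When the block continues the texture of its neighbors, the residual is close to zero and very few bits remain to be coded.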

Transform, Scan, Quantization. Given the powerful and accurate P- and I-prediction schemes in H.264, it is recognized that the spatial correlation in residual pixels is typically very low. Hence, a simple integer-precision 4 x 4 DCT is sufficient to compact the energy. The integer arithmetic allows an exact inverse transform on all processors and eliminates the encoder/decoder mismatch problems of previous transform-based codecs. H.264 also provides a quantization scheme with nonlinear step sizes to obtain accurate rate control at both the high and low ends of the quantization scale.
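The exactness claim can be checked with a small sketch. The matrix CF below is the integer core transform commonly cited for H.264; it is not given in the text above. The inverse shown is a demonstration only: it uses the orthogonality of the rows of CF and an integer scale factor of 400 chosen for this example, whereas the standard defines its own inverse matrix and folds the scaling into quantization. All names are hypothetical.

```python
CF = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

ROW_NORM = [4, 10, 4, 10]   # squared length of each row of CF

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward(block):
    """Y = CF * X * CF^T, computed with integers only."""
    return matmul(matmul(CF, block), transpose(CF))

def inverse(coeffs):
    """Exact integer inverse of forward().

    Because the rows of CF are orthogonal, X = CF^T * (Y / (n_i * n_j)) * CF.
    Multiplying through by 400 keeps every intermediate value an integer.
    """
    scaled = [[coeffs[i][j] * (400 // (ROW_NORM[i] * ROW_NORM[j]))
               for j in range(4)] for i in range(4)]
    out = matmul(matmul(transpose(CF), scaled), CF)
    return [[v // 400 for v in row] for row in out]
```

Since no floating-point cosine values are involved, every decoder computes bit-identical results, which is the property the paragraph above refers to.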

In-Loop Deblocking Filters. H.264 specifies a sophisticated signal-adaptive deblocking filter in which a set of filters is applied on 4 x 4 block edges. Filter length, strength, and type (deblocking/smoothing) vary, depending on macroblock coding parameters (intra- or inter-coded, motion-vector differences, reference-frame differences, coefficients coded) and spatial activity (edge detection), so that blocking artifacts are eliminated without distorting visual features. The H.264 deblocking filter is important in increasing the subjective quality of the standard.
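How the coding parameters listed above steer the filter can be sketched as a "boundary strength" decision for the edge between two neighboring 4 x 4 blocks. This is a simplified version of the commonly described rule for progressive frames with one motion vector per block; interlaced and bi-predicted cases are ignored, and the Block4x4 type and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Block4x4:
    intra: bool              # block belongs to an intra-coded macroblock
    has_coefficients: bool   # block has coded residual coefficients
    ref_frame: int           # index of the reference frame used
    mv: tuple                # (mvx, mvy) in quarter-sample units

def boundary_strength(p, q, on_macroblock_edge):
    """Return 0 (no filtering) to 4 (strongest filtering) for the edge between p and q."""
    if p.intra or q.intra:
        return 4 if on_macroblock_edge else 3
    if p.has_coefficients or q.has_coefficients:
        return 2
    if p.ref_frame != q.ref_frame:
        return 1
    # A difference of 4 quarter-samples is one full sample.
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1
    return 0
```

Edges where blocking is most likely (intra blocks, coded residuals) get the strongest filter, while edges between blocks predicted from the same area are left untouched. The spatial-activity test that decides whether an edge is a real image feature is a separate step not shown here.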

Baseline Profile Features

The Baseline profile of H.264 is intended for real-time conversational applications, such as videoconferencing. It contains all the core coding tools of H.264 discussed above and the following additional error-resilience tools, to allow for error-prone carriers such as IP and wireless networks:

Arbitrary slice order (ASO). The decoding order of slices within a picture need not be monotonically increasing. This allows out-of-order packets to be decoded as they arrive in a packet-switched network, thus reducing latency.

Flexible macroblock order (FMO). Macroblocks can be decoded in any order, such as checkerboard patterns, not just raster scan order. This is useful on error-prone networks, so that loss of a slice results in loss of macroblocks scattered in the picture, which can easily be masked from human eyes. This feature can also help reduce jitter and latency, as the decoder may decide not to wait for late slices and still be able to produce acceptable pictures.

Redundant slices. Redundant copies of the slices can be decoded, to further improve error resilience.
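The checkerboard pattern mentioned under flexible macroblock order can be sketched as a map from macroblock position to slice group. The example assumes two slice groups and uses hypothetical function names; it shows why losing one group is easy to conceal: every lost macroblock still has all of its in-picture horizontal and vertical neighbors available.

```python
def checkerboard_map(width_mbs, height_mbs):
    """Slice-group index (0 or 1) for each macroblock, indexed as map[y][x]."""
    return [[(x + y) % 2 for x in range(width_mbs)]
            for y in range(height_mbs)]

def available_neighbors(group_map, x, y, lost_group):
    """Count the 4-connected neighbors of (x, y) that are not in the lost group."""
    height, width = len(group_map), len(group_map[0])
    count = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and group_map[ny][nx] != lost_group:
            count += 1
    return count
```

With raster-order slices, by contrast, a lost slice removes a contiguous run of macroblocks, and interior macroblocks of that run have no intact left or right neighbor to interpolate from.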

Main Profile Features

The Main profile defined by H.264 targets non-low-delay applications such as broadcasting and stored media. The Main profile contains all Baseline profile features (except ASO, FMO, and redundant slices) plus the following non-low-delay and higher-complexity features, for maximum compression efficiency:

B slices. The bi-prediction mode in H.264 has been made more flexible than in existing standards. Bi-predicted pictures can also be used as reference frames. The two reference frames for each macroblock can be in any temporal direction, as long as they are available in the reference frame buffer. Hence, in addition to the normal forward + backward bi-prediction, it is legal to have backward + backward or forward + forward prediction as well.

Context Adaptive Binary Arithmetic Coding (CABAC). This coding mode replaces VLC-based entropy coding with binary arithmetic coding that uses a different adaptive statistics model for different data types and contexts.

Weighted prediction. Global weights (a multiplier and an offset) for modifying the motion-compensated prediction samples can be specified for each slice, to predict lighting changes and other global effects, such as fading.
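The sample arithmetic behind B slices and weighted prediction can be sketched as follows. Default bi-prediction is a rounded average of the two motion-compensated predictions; weighted prediction applies a multiplier and an offset to a single prediction. The parameter log2_denom (the fixed-point precision of the weight) and the function names are assumptions made for this illustration, and samples are taken to be 8-bit.

```python
def clip_pixel(v):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, v))

def bi_predict(pred0, pred1):
    """Default bi-prediction: rounded average of two predictions."""
    return [(a + b + 1) >> 1 for a, b in zip(pred0, pred1)]

def weighted_predict(pred, weight, offset, log2_denom):
    """Scale each prediction sample by weight / 2**log2_denom, then add offset."""
    rounding = 1 << (log2_denom - 1) if log2_denom >= 1 else 0
    return [clip_pixel(((p * weight + rounding) >> log2_denom) + offset)
            for p in pred]
```

For a fade to black, for instance, weight=16 with log2_denom=5 halves every prediction sample, so the residual no longer has to carry the brightness change for the whole picture.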

Extended Profile Features

The Extended profile (or profile X) is designed for new video streaming applications. This profile allows non-low-delay features, bitstream switching features, and additional error-resilience tools. It includes all Baseline profile features plus the following:

  • B slices
  • Weighted prediction
  • Slice data partitioning. This partitions slice data of differing importance into separate sequences (header information, residual information) so that more important data can be transmitted on more reliable channels.
  • SP and SI slice types. These are slices that contain special temporal prediction modes, to allow bitstream switching, fast forward/backward, and random access.

The vastly improved H.264 core features, together with the new coding tools, offer significant improvements in compression ratio, error resiliency, and subjective quality over existing ITU-T and MPEG standards.

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd