# H.261 - MULTIMEDIA

H.261 is an earlier digital video compression standard. Because its principle of motion-compensation-based compression is retained in all later video compression standards, we will start with a detailed discussion of H.261.

The International Telegraph and Telephone Consultative Committee (CCITT) initiated development of H.261 in 1988. The final recommendation was adopted by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T), formerly CCITT, in 1990.

The standard was designed for videophone, video conferencing, and other audiovisual services over ISDN telephone lines. Initially, it was intended to support multiples (from 1 to 5) of 384 kbps channels. In the end, however, the video codec supports bitrates of p x 64 kbps, where p ranges from 1 to 30. Hence the standard was once known as p * 64, pronounced "p star 64". The standard requires the video encoder's delay to be less than 150 msec, so that the video can be used for real-time, bidirectional video conferencing.
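The p x 64 arithmetic can be checked with a few lines of Python (used here only for illustration; `h261_bitrate_kbps` is a hypothetical helper, not part of any H.261 API). A minimal sketch, assuming each unit is one 64 kbps ISDN B-channel:

```python
CHANNEL_KBPS = 64  # one ISDN B-channel


def h261_bitrate_kbps(p: int) -> int:
    """Total bitrate in kbps for p aggregated 64 kbps channels (1 <= p <= 30)."""
    if not 1 <= p <= 30:
        raise ValueError("p must be in the range 1..30")
    return p * CHANNEL_KBPS


if __name__ == "__main__":
    for p in (1, 2, 6, 30):
        print(f"p = {p:2d}: {h261_bitrate_kbps(p)} kbps")
```

With p = 6 the result is 384 kbps, the channel size originally targeted, and p = 30 gives 1,920 kbps, the upper limit named in H.221.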

H.261 belongs to the following set of ITU recommendations for visual telephony systems:

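1. H.221. Frame structure for an audiovisual channel supporting 64 to 1,920 kbps
2. H.230. Frame control signals for audiovisual systems
3. H.242. Audiovisual communication protocols
4. H.261. Video encoder/decoder for audiovisual services at p x 64 kbps
5. H.320. Narrowband audiovisual terminal equipment for p x 64 kbps transmission

Table Video formats supported by H.261

| Video format | Luminance image resolution | Chrominance image resolution | H.261 support |
|---|---|---|---|
| QCIF | 176 x 144 | 88 x 72 | Required |
| CIF | 352 x 288 | 176 x 144 | Optional |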

The above table lists the video formats supported by H.261. Chroma subsampling in H.261 is 4:2:0. Considering the relatively low bitrate in network communications at the time, support for CCIR 601 QCIF is specified as required, whereas support for CIF is optional.
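To see why the low-resolution format was made the required one, a rough sketch (Python; the helper names are hypothetical, and 8-bit samples at 30 frames per second are assumed) computes the uncompressed 4:2:0 bitrate of each format and the compression ratio needed to fit it into p x 64 kbps:

```python
# Hypothetical helpers: uncompressed bitrate of a 4:2:0 picture format.
FORMATS = {"QCIF": (176, 144), "CIF": (352, 288)}


def raw_bitrate_bps(width: int, height: int, fps: float = 30.0, bits: int = 8) -> float:
    """Bits per second for uncompressed 4:2:0 video (assumes 8-bit samples, 30 fps)."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)  # Cb and Cr, each halved in both directions
    return (luma + chroma) * bits * fps


def required_ratio(width: int, height: int, p: int) -> float:
    """Compression ratio needed to fit the raw stream into p x 64 kbps."""
    return raw_bitrate_bps(width, height) / (p * 64_000)


if __name__ == "__main__":
    for name, (w, h) in FORMATS.items():
        print(f"{name}: {raw_bitrate_bps(w, h) / 1e6:.1f} Mbps raw, "
              f"~{required_ratio(w, h, 1):.0f}:1 at p = 1, "
              f"~{required_ratio(w, h, 30):.0f}:1 at p = 30")
```

Under these assumptions QCIF needs about 9.1 Mbps uncompressed, so roughly 143:1 compression is needed at p = 1; CIF needs about 36.5 Mbps, or roughly 570:1 at p = 1.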

The following figure illustrates a typical H.261 frame sequence. Two types of image frames are defined: intra-frames (I-frames) and inter-frames (P-frames).

I-frames are treated as independent images. Basically, a transform coding method similar to JPEG is applied within each I-frame, hence the name "intra".

P-frames are not independent. They are coded by a forward predictive coding method in which current macroblocks are predicted from similar macroblocks in the preceding I- or P-frame, and differences between the macroblocks are coded. Temporal redundancy removal is hence included in P-frame coding, whereas I-frame coding performs only spatial redundancy removal. It is important to remember that prediction from a previous P-frame is allowed (not just from a previous I-frame).

The interval between pairs of I-frames is variable and is determined by the encoder. Usually, an ordinary digital video has a couple of I-frames per second. Motion vectors in H.261 are always measured in units of full pixels and have a limited range of ±15 pixels; that is, the search-window parameter is p = 15 (not to be confused with the p in p x 64).

H.261 Frame sequence

I-frame coding

Intra-Frame (I-Frame) Coding

Macroblocks are of size 16 x 16 pixels for the Y frame of the original image. For the Cb and Cr frames, they correspond to areas of 8 x 8, since 4:2:0 chroma subsampling is employed. Hence, a macroblock consists of six 8 x 8 blocks: four Y blocks, one Cb block, and one Cr block.
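A minimal sketch of how one macroblock maps onto its six 8 x 8 blocks, assuming the Y, Cb, and Cr planes are stored as Python lists of rows and that each chroma plane is half the luma size in both dimensions (4:2:0). The function names are hypothetical:

```python
from typing import List

Plane = List[List[int]]  # 2-D array of samples, indexed [row][col]


def crop(plane: Plane, top: int, left: int, size: int) -> Plane:
    """Return the size x size sub-array whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in plane[top:top + size]]


def macroblock_blocks(y: Plane, cb: Plane, cr: Plane, mb_row: int, mb_col: int) -> List[Plane]:
    """Return the six 8x8 blocks of one macroblock: four Y blocks, then Cb, then Cr.

    The luma blocks are ordered top-left, top-right, bottom-left, bottom-right.
    With 4:2:0 sampling the macroblock covers only an 8x8 area in Cb and Cr.
    """
    top, left = 16 * mb_row, 16 * mb_col
    luma = [crop(y, top + dy, left + dx, 8) for dy in (0, 8) for dx in (0, 8)]
    ctop, cleft = 8 * mb_row, 8 * mb_col
    return luma + [crop(cb, ctop, cleft, 8), crop(cr, ctop, cleft, 8)]
```

For a QCIF picture, the luma plane is 144 rows of 176 samples and each chroma plane is 72 rows of 88 samples, giving 11 x 9 macroblocks.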

For each 8 x 8 block, a DCT is applied. As in JPEG, the DCT coefficients then go through a quantization stage. Afterwards, they are zigzag-scanned and finally entropy-coded.
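The transform and scan steps can be sketched in Python as below. This is an illustration under stated assumptions: it uses the orthonormal 2-D DCT-II (the transform used in JPEG) in its slow, direct form, and the JPEG-style zigzag order, assumed here to match H.261's coefficient order. The helper names are hypothetical, and quantization and entropy coding are left out:

```python
import math
from typing import List, Tuple

N = 8


def dct_2d(block: List[List[float]]) -> List[List[float]]:
    """Orthonormal 2-D DCT-II of an 8x8 block (direct O(N^4) form, for clarity)."""
    def c(k: int) -> float:
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for i in range(N):
                for j in range(N):
                    s += (block[i][j]
                          * math.cos((2 * i + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * j + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def zigzag_order(n: int = N) -> List[Tuple[int, int]]:
    """(row, col) positions of an n x n block in zigzag scan order."""
    # Visit anti-diagonals in turn; odd diagonals run top-right to bottom-left
    # (sorted by row), even diagonals run bottom-left to top-right (sorted by col).
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


def zigzag_scan(coeffs: List[List[float]]) -> List[float]:
    """Flatten a square coefficient array into zigzag order."""
    return [coeffs[r][c] for r, c in zigzag_order(len(coeffs))]
```

For example, a flat block in which every sample is 128 transforms to a DC coefficient of 1024 with all AC coefficients (numerically) zero, so the zigzag scan ends in a long run of zeros.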

Inter-Frame (P-Frame) Predictive Coding

The following figure shows the H.261 P-frame coding scheme based on motion compensation. For each macroblock in the Target frame, a motion vector is allocated by one of the search methods discussed earlier. After the prediction, a difference macroblock is derived to measure the prediction error. It is also carried in the form of four Y blocks, one Cb block, and one Cr block. Each of these 8 x 8 blocks goes through DCT, quantization, zigzag scan, and entropy coding. The motion vector is also coded.
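The text does not fix a particular search method. As one possibility, a full (exhaustive) search using the sum of absolute differences (SAD) as the matching criterion can be sketched as follows, assuming integer-pixel vectors within the ±15 range mentioned above (p = 15). The names are hypothetical:

```python
from typing import List, Tuple

Frame = List[List[int]]  # luminance samples indexed [row][col]


def sad(cur: Frame, ref: Frame, top: int, left: int, dy: int, dx: int, n: int = 16) -> int:
    """Sum of absolute differences between the n x n block of `cur` at (top, left)
    and the block of `ref` displaced by (dy, dx)."""
    return sum(abs(cur[top + i][left + j] - ref[top + dy + i][left + dx + j])
               for i in range(n) for j in range(n))


def full_search(cur: Frame, ref: Frame, top: int, left: int,
                p: int = 15, n: int = 16) -> Tuple[Tuple[int, int], int]:
    """Exhaustive block matching over integer displacements in [-p, p] x [-p, p].

    Returns ((dy, dx), cost). Candidates that would extend outside the reference
    frame are skipped. On ties, the zero vector is kept if it is a minimum;
    otherwise the earliest candidate in scan order wins.
    """
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, top, left, 0, 0, n)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            if not (0 <= top + dy <= h - n and 0 <= left + dx <= w - n):
                continue
            cost = sad(cur, ref, top, left, dy, dx, n)
            if cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost
```

Full search is the simplest and slowest choice: for p = 15 it evaluates (2p + 1)^2 = 961 candidate positions per macroblock.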

Sometimes, a good match cannot be found: the prediction error exceeds a certain acceptable level. The macroblock itself is then encoded (treated as an intra macroblock), and in this case it is termed a non-motion-compensated macroblock.

P-frame coding encodes the difference macroblock (not the Target macroblock itself). Since the difference macroblock usually has a much smaller entropy than the Target macroblock, a large compression ratio is attainable.
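The two decisions just described, forming the difference macroblock and falling back to intra coding when the match is poor, can be sketched as below. The SAD error measure and the `threshold` parameter are assumptions for illustration; the standard specifies the bitstream, not how an encoder makes this choice. The names are hypothetical:

```python
from typing import List

Block = List[List[int]]


def difference_block(target: Block, prediction: Block) -> Block:
    """Prediction error: target samples minus motion-compensated prediction."""
    return [[t - p for t, p in zip(trow, prow)]
            for trow, prow in zip(target, prediction)]


def choose_mode(target: Block, prediction: Block, threshold: int) -> str:
    """Return 'inter' if the prediction error is acceptable, else 'intra'.

    `threshold` is a hypothetical encoder setting compared against the SAD
    of the difference block.
    """
    error = sum(abs(v) for row in difference_block(target, prediction) for v in row)
    return "inter" if error <= threshold else "intra"
```

In "inter" mode the difference block is what goes on to the DCT; in "intra" mode the target block itself is coded.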

In fact, even the motion vector is not directly coded. Instead, the difference, MVD, between the motion vectors of the preceding macroblock and the current macroblock is sent for entropy coding:

$$\mathrm{MVD} = \mathrm{MV}_{\text{Preceding}} - \mathrm{MV}_{\text{Current}}$$
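A sketch of MVD coding and its inverse, following the sign convention in the text (preceding minus current) and assuming the predictor for the first macroblock is the zero vector. The real bitstream has additional rules for when the predictor is reset, which this sketch ignores; the names are hypothetical:

```python
from typing import List, Tuple

MV = Tuple[int, int]  # (vertical, horizontal) displacement in pixels


def encode_mvds(mvs: List[MV]) -> List[MV]:
    """MVD = MV_preceding - MV_current, with a zero vector before the first MB."""
    out, prev = [], (0, 0)
    for mv in mvs:
        out.append((prev[0] - mv[0], prev[1] - mv[1]))
        prev = mv
    return out


def decode_mvds(mvds: List[MV]) -> List[MV]:
    """Invert encode_mvds: MV_current = MV_preceding - MVD."""
    out, prev = [], (0, 0)
    for d in mvds:
        prev = (prev[0] - d[0], prev[1] - d[1])
        out.append(prev)
    return out
```

Neighboring macroblocks tend to move alike, so vectors such as (3, -2), (3, -2), (4, -2), (4, -1) become small MVDs (-3, 2), (0, 0), (-1, 0), (0, -1), which entropy coding can represent with short codewords.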

Quantization in H.261

Unlike JPEG and MPEG, quantization in H.261 does not use 8 x 8 quantization matrices. Instead, it uses a constant, called step_size, for all DCT coefficients within a macroblock.

H.261 P-frame coding based on motion compensation

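According to the need (e.g., bitrate control of the video), step_size can take on any one of the 31 even values from 2 to 62. One exception, however, is made for the DC coefficient in intra mode, where a step size of 8 is always used. If we use DCT and QDCT to denote the DCT coefficients before and after quantization, then for DC coefficients in intra mode,

$$\mathrm{QDCT} = \mathrm{round}\left(\frac{\mathrm{DCT}}{\mathit{step\_size}}\right) = \mathrm{round}\left(\frac{\mathrm{DCT}}{8}\right)$$

For all other coefficients,

$$\mathrm{QDCT} = \left\lfloor \frac{\mathrm{DCT}}{\mathit{step\_size}} \right\rfloor = \left\lfloor \frac{\mathrm{DCT}}{2 \times \mathit{scale}} \right\rfloor$$

where scale is an integer in the range of [1, 31].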
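In code, the two rules are QDCT = round(DCT / 8) for the intra-mode DC coefficient and QDCT = floor(DCT / (2 x scale)) for every other coefficient. A minimal Python sketch of exactly these simplified rules follows; it does not model the standard's inverse-quantization details, and the names are hypothetical:

```python
import math

# The 31 permitted step sizes: 2 * scale for scale = 1..31, i.e. 2, 4, ..., 62.
STEP_SIZES = [2 * scale for scale in range(1, 32)]


def quantize_intra_dc(dct: float) -> int:
    """Intra-mode DC coefficient: fixed step size 8, rounded to nearest."""
    # floor(x + 0.5) rounds halves up, unlike Python's round-half-to-even.
    return math.floor(dct / 8 + 0.5)


def quantize(dct: float, scale: int) -> int:
    """Any other coefficient: step_size = 2 * scale, then floor."""
    if not 1 <= scale <= 31:
        raise ValueError("scale must be an integer in [1, 31]")
    return math.floor(dct / (2 * scale))
```

Note that taking the floor is asymmetric for negative values: with scale = 4, a coefficient of 17 maps to 2 but -17 maps to -3.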

H.261 Encoder and decoder

The following figure shows a relatively complete picture of how the H.261 encoder and decoder work. Here, Q and Q⁻¹ stand for quantization and its inverse, respectively. Switching between the intra- and inter-frame modes can be readily implemented by a multiplexer. To avoid propagation of coding errors:

1. An I-frame is usually sent a couple of times in each second of the video.
2. As discussed earlier, decoded frames (not the original frames) are used as reference frames in motion estimation.

H.261: (a) encoder; (b) decoder

Table Data flow at the observation points in H.261 encoder

Table Data flow at the observation points in H.261 decoder

To illustrate the operational detail of the encoder and decoder, let's use a scenario where frames I, P1, and P2 are encoded and then decoded. The data that goes through the observation points, indicated by the circled numbers in the above figure, is summarized in the above tables. We will use $I$, $P_1$, $P_2$ for the original data, $\tilde{I}$, $\tilde{P}_1$, $\tilde{P}_2$ for the decoded data (usually a lossy version of the original), and $P'_1$, $P'_2$ for the predictions in the Inter-frame mode.

For the encoder, when the Current Frame is an Intra-frame, Point number 1 receives macroblocks from the I-frame. Each macroblock undergoes the DCT, Quantization, and Entropy Coding steps, and the result is sent to the Output Buffer, ready to be transmitted.

Meanwhile, the quantized DCT coefficients for $I$ are also sent to Q⁻¹ and IDCT and hence appear at Point 4 as $\tilde{I}$. Combined with a zero input from Point 5, the data at Point 6 remains $\tilde{I}$, and this is stored in Frame Memory, waiting to be used for Motion Estimation and Motion-Compensation-based Prediction for the subsequent frame $P_1$.

Quantization Control serves as feedback — that is, when the Output Buffer is too full, the quantization step size is increased, so as to reduce the size of the coded data. This is known as an encoding rate control process.
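The text does not specify how buffer fullness maps to a step size. One hypothetical rule, shown only to illustrate the direction of the feedback, scales the quantizer linearly with fullness (the function name and the linear mapping are assumptions):

```python
def next_scale(buffer_bits: int, buffer_capacity: int,
               min_scale: int = 1, max_scale: int = 31) -> int:
    """Hypothetical rate-control rule: map output-buffer fullness linearly onto
    the quantizer scale (step size = 2 * scale).

    A fuller buffer gives a larger scale, i.e. coarser quantization and fewer bits.
    """
    fullness = min(max(buffer_bits / buffer_capacity, 0.0), 1.0)  # clamp to [0, 1]
    return min_scale + round(fullness * (max_scale - min_scale))
```

An empty buffer selects scale 1 (step size 2), a full or overflowing buffer selects scale 31 (step size 62), and a half-full buffer selects scale 16. Practical encoders use more elaborate controllers, but the feedback direction is the same.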

When the subsequent Current Frame $P_1$ arrives at Point 1, the Motion Estimation process is invoked to find the motion vector for the best-matching macroblock in frame $\tilde{I}$ for each of the macroblocks in $P_1$. The estimated motion vector is sent to both Motion-Compensation-based Prediction and Variable-Length Encoding (VLE). For each macroblock of $P_1$, the MC-based Prediction yields its best-matching macroblock in the reference frame; together these form the prediction $P'_1$, which appears at Point 2.

At Point 3, the "prediction error" is obtained, which is $D_1 = P_1 - P'_1$. Now $D_1$ undergoes DCT, Quantization, and Entropy Coding, and the result is sent to the Output Buffer. As before, the quantized DCT coefficients for $D_1$ are also sent to Q⁻¹ and IDCT and appear at Point 4 as $\tilde{D}_1$.

Added to $P'_1$ at Point 5, we have $\tilde{P}_1 = P'_1 + \tilde{D}_1$ at Point 6. This is stored in Frame Memory, waiting to be used for Motion Estimation and Motion-Compensation-based Prediction for the subsequent frame $P_2$. The steps for encoding $P_2$ are similar to those for $P_1$, except that $P_2$ will be the Current Frame and $\tilde{P}_1$ becomes the Reference Frame.

For the decoder, the input code for frames is first decoded by Entropy Decoding, Q⁻¹, and IDCT. For Intra-frame mode, the first decoded frame appears at Point 1 and then Point 4 as $\tilde{I}$. It is sent as the first output and at the same time stored in the Frame Memory.

Subsequently, the input code for Inter-frame $P_1$ is decoded, and the prediction error $\tilde{D}_1$ is received at Point 1. Since the motion vector for the current macroblock is also entropy-decoded and sent to Motion-Compensation-based Prediction, the corresponding predicted macroblock $P'_1$ can be located in frame $\tilde{I}$ and will appear at Points 2 and 3.

Combined with $\tilde{D}_1$, we have $\tilde{P}_1 = P'_1 + \tilde{D}_1$ at Point 4, and it is sent out as the decoded frame and also stored in the Frame Memory. Again, the steps for decoding $P_2$ are similar to those for $P_1$.
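Why both encoder and decoder predict from decoded frames can be seen in a toy closed-loop model. All assumptions here are hypothetical simplifications: each frame is a short list of samples, motion vectors are zero, and the DCT plus quantizer are replaced by a single uniform quantizer with step `STEP`. Because the encoder rebuilds its reference exactly as the decoder does, the two stay in lockstep:

```python
from typing import List, Tuple

STEP = 8  # hypothetical uniform quantizer step standing in for DCT + Q


def quantize(residual: List[int]) -> List[int]:
    return [round(r / STEP) for r in residual]  # round-half-to-even; fine for a toy


def dequantize(levels: List[int]) -> List[int]:
    return [q * STEP for q in levels]


def encode(frames: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Closed-loop encoder: the first frame is intra (prediction = 0), later
    frames are inter with zero motion (prediction = previous decoded frame)."""
    reference = None
    bitstream, reconstructed = [], []
    for frame in frames:
        prediction = [0] * len(frame) if reference is None else reference  # P'
        residual = [x - p for x, p in zip(frame, prediction)]               # D
        levels = quantize(residual)
        bitstream.append(levels)
        decoded_residual = dequantize(levels)                               # D~
        reference = [p + d for p, d in zip(prediction, decoded_residual)]   # P~
        reconstructed.append(reference)
    return bitstream, reconstructed


def decode(bitstream: List[List[int]]) -> List[List[int]]:
    """Decoder: adds each decoded residual to the previous decoded frame."""
    reference, out = None, []
    for levels in bitstream:
        prediction = [0] * len(levels) if reference is None else reference
        reference = [p + d for p, d in zip(prediction, dequantize(levels))]
        out.append(reference)
    return out
```

Had the encoder predicted from the original frames instead, each decoded frame would inherit the quantization error of every earlier one, which is the error propagation the design avoids.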

A Glance at the H.261 Video Bitstream Syntax

Let's take a brief look at the H.261 video bitstream syntax. This consists of a hierarchy of four layers: Picture, Group of Blocks (GOB), Macroblock, and Block.

1. Picture layer. Picture Start Code (PSC) delineates boundaries between pictures. Temporal Reference (TR) provides a timestamp for the picture. Since temporal subsampling can sometimes be invoked such that some pictures will not be transmitted, it is important to have TR to maintain synchronization with audio. Picture Type (PType) specifies, for example, whether it is a CIF or QCIF picture.
2. GOB layer. H.261 pictures are divided into regions of 11 x 3 macroblocks (i.e., regions of 176 x 48 pixels in luminance images), each of which is called a Group of Blocks (GOB). For instance, the CIF image has 2 x 6 GOBs, corresponding to its image resolution of 352 x 288 pixels.

Each GOB has its own Start Code (GBSC) and Group Number (GN). The GBSC is unique and can be identified without decoding the entire variable-length code in the bitstream. In case a network error causes a bit error or the loss of some bits, H.261 video can be recovered and resynchronized at the next identifiable GOB, preventing the possible propagation of errors.
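The GOB arithmetic can be checked with a short sketch (the helper name is hypothetical): a GOB is 11 x 3 macroblocks, or 176 x 48 luminance pixels.

```python
from typing import Tuple

GOB_W_MB, GOB_H_MB = 11, 3  # a GOB is 11 x 3 macroblocks
MB = 16                     # luminance macroblock size in pixels


def gob_grid(width: int, height: int) -> Tuple[int, int]:
    """(GOB columns, GOB rows) needed to tile a luminance picture."""
    gob_w, gob_h = GOB_W_MB * MB, GOB_H_MB * MB  # 176 x 48 pixels
    if width % gob_w or height % gob_h:
        raise ValueError("picture size is not a whole number of GOBs")
    return width // gob_w, height // gob_h
```

CIF (352 x 288) gives a 2 x 6 grid, or 12 GOBs, while QCIF (176 x 144) gives 1 x 3, or 3 GOBs.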

Syntax of H.261 video bitstream

GQuant indicates the quantizer to be used in the GOB, unless it is overridden by any subsequent Macroblock Quantizer (MQuant). GQuant and MQuant are referred to as scale.

3. Macroblock layer. Each macroblock (MB) has its own Address, indicating its position within the GOB, a quantizer (MQuant), and six 8 x 8 image blocks (4 Y, 1 Cb, 1 Cr). Type denotes whether it is an Intra- or Inter-, motion-compensated or non-motion-compensated macroblock. Motion Vector Data (MVD) is obtained by taking the difference between the motion vectors of the preceding and current macroblocks. Moreover, since some blocks in the macroblocks match well and some match poorly in Motion Estimation, a bitmask Coded Block Pattern (CBP) is used to indicate which blocks have coded coefficients. Only the flagged blocks have their coefficients transmitted; blocks that match well enough to leave no nonzero quantized coefficients are skipped.
4. Block layer. For each 8 x 8 block, the bitstream starts with the DC value, followed by pairs of the length of a zero-run (Run) and the subsequent nonzero value (Level) for the ACs, and finally the End of Block (EOB) code. The range of "Run" is [0, 63]. "Level" reflects quantized values; its range is [-127, 127], and Level ≠ 0.

Arrangement of GOBs in H.261 luminance images
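The block-layer symbol sequence described above can be sketched as follows. It follows the simplified description in the text (DC value first, then (Run, Level) pairs for the ACs, then EOB) and produces symbols only; the actual variable-length codewords are not modeled, and the names are hypothetical:

```python
from typing import List, Tuple, Union

Symbol = Union[Tuple[str, int], Tuple[int, int], str]


def run_level_pairs(ac: List[int]) -> List[Tuple[int, int]]:
    """Convert zigzag-ordered AC coefficients into (Run, Level) pairs.

    Run counts the zeros preceding each nonzero Level; trailing zeros are
    implied by the EOB code and produce no pair.
    """
    pairs, run = [], 0
    for coeff in ac:
        if coeff == 0:
            run += 1
        else:
            pairs.append((run, coeff))
            run = 0
    return pairs


def block_symbols(zigzag: List[int]) -> List[Symbol]:
    """Symbols for one block: the DC value, the (Run, Level) pairs, then 'EOB'."""
    return [("DC", zigzag[0])] + run_level_pairs(zigzag[1:]) + ["EOB"]
```

For a zigzag-scanned block beginning 52, 3, 0, 0, -2, 1 and followed by 58 zeros, the symbols are DC 52, (0, 3), (2, -2), (0, 1), EOB.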

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
