Video Compression Based On Motion Compensation

The image compression techniques discussed in the previous chapters (e.g., JPEG and JPEG2000) exploit spatial redundancy: picture contents often change relatively slowly across an image, which makes a large suppression of the higher spatial frequency components viable.

Macroblocks and motion vector in video compression: (a) reference frame; (b) target frame

A video can be viewed as a sequence of images stacked in the temporal dimension. Since the frame rate of the video is often relatively high (e.g., > 15 frames per second) and the camera parameters (focal length, position, viewing angle, etc.) usually do not change rapidly between frames, the contents of consecutive frames are usually similar, unless certain objects in the scene move extremely fast. In other words, the video has temporal redundancy.

Temporal redundancy is often significant, and exploiting it means that not every frame of the video needs to be coded independently as a new image. Instead, the difference between the current frame and other frame(s) in the sequence is coded. If the redundancy between them is great enough, the difference images consist mainly of small values and have low entropy, which is good for compression.
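The low-entropy claim can be checked on a toy example. The sketch below is only an illustration: the two one-dimensional "frames" are made-up data, and `entropy` is a hypothetical helper that computes the Shannon entropy of a list of integer samples.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy, in bits per sample, of a sequence of integers."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Made-up data: frame_a uses every 8-bit value exactly once;
# frame_b is frame_a with a small change on every other sample.
frame_a = [(37 * i) % 256 for i in range(256)]
frame_b = [min(255, v + (i % 2)) for i, v in enumerate(frame_a)]

# Sample-by-sample difference between the two frames.
diff = [b - a for a, b in zip(frame_a, frame_b)]

print(entropy(frame_a))  # entropy of the frame coded on its own
print(entropy(diff))     # entropy of the difference signal
```

Because the difference takes only the values 0 and 1 here, its entropy cannot exceed 1 bit per sample, while the frame itself needs 8 bits per sample.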

As we mentioned, a simplistic way of deriving the difference image is to subtract one image from the other (pixel by pixel), but such an approach is ineffective in yielding a high compression ratio. Since the main cause of the difference between frames is camera and/or object motion, this motion can be "compensated" by detecting the displacement of corresponding pixels or regions in these frames and measuring their differences. Video compression algorithms that adopt this approach are said to be based on motion compensation (MC). The three main steps of these algorithms are:

  1. Motion estimation (motion vector search)
  2. Motion-compensation-based prediction
  3. Derivation of the prediction error, i.e., the difference
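The three steps above can be sketched for a single block as follows. This is a minimal illustration under stated assumptions, not the method of any particular standard: the search is a brute-force scan of a window of ±p pixels, the matching cost is the sum of absolute differences (SAD), the motion vector is stored as the (row, column) offset from the target block's position to the matching block in the reference frame, and all function names are hypothetical.

```python
def get_block(frame, top, left, n):
    """Return the n x n block whose top-left corner is (top, left)."""
    return [row[left:left + n] for row in frame[top:top + n]]

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def estimate_motion(target, reference, top, left, n, p):
    """Step 1: try every offset within +/- p and keep the one with the lowest SAD."""
    height, width = len(reference), len(reference[0])
    target_block = get_block(target, top, left, n)
    best_mv, best_cost = (0, 0), None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + n > height or c + n > width:
                continue  # candidate block would fall outside the frame
            cost = sad(target_block, get_block(reference, r, c, n))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv

def predict(reference, top, left, n, mv):
    """Step 2: fetch the reference block the motion vector points to."""
    dy, dx = mv
    return get_block(reference, top + dy, left + dx, n)

def prediction_error(target_block, predicted_block):
    """Step 3: element-wise difference between target and prediction."""
    return [[t - q for t, q in zip(row_t, row_q)]
            for row_t, row_q in zip(target_block, predicted_block)]

# Toy 8 x 8 frames: the target is the reference moved down 2 rows and right 1 column.
reference = [[r * 8 + c for c in range(8)] for r in range(8)]
target = [[reference[r - 2][c - 1] if r >= 2 and c >= 1 else 0
           for c in range(8)] for r in range(8)]

mv = estimate_motion(target, reference, top=4, left=4, n=2, p=2)
predicted = predict(reference, 4, 4, 2, mv)
error = prediction_error(get_block(target, 4, 4, 2), predicted)
print(mv, error)
```

In this toy case the content of the target block sits two rows up and one column to the left in the reference frame, so the search finds that offset and the prediction error is all zeros. Real blocks rarely match exactly, which is why the error must also be coded.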

For efficiency, each image is divided into macroblocks of size N x N. By default, N = 16 for luminance images. For chrominance images, N = 8 if 4:2:0 chroma subsampling is adopted. Motion compensation is performed at the macroblock level, not at the pixel level, nor at the level of the video object as in later video standards (such as MPEG-4).
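As a rough illustration of this partitioning, the sketch below lists the top-left corners of all blocks in a luminance image and in one 4:2:0 chrominance image. The 352 x 288 frame size and the helper name `macroblock_origins` are assumptions chosen for the example, and the helper assumes the image dimensions are exact multiples of N.

```python
def macroblock_origins(width, height, n):
    """Top-left (row, col) of every n x n block, in raster order.

    Assumes width and height are exact multiples of n.
    """
    return [(r, c) for r in range(0, height, n) for c in range(0, width, n)]

# Example frame size only; any multiple of 16 works the same way.
luma_w, luma_h = 352, 288

# 4:2:0 subsampling halves the chrominance resolution in both directions.
chroma_w, chroma_h = luma_w // 2, luma_h // 2

luma_blocks = macroblock_origins(luma_w, luma_h, 16)
chroma_blocks = macroblock_origins(chroma_w, chroma_h, 8)

print(len(luma_blocks), len(chroma_blocks))
```

Both counts come out equal (22 x 18 blocks each), which shows why N = 8 is the natural choice for 4:2:0 chrominance: every 16 x 16 luminance macroblock covers the same picture area as one 8 x 8 block in each chrominance image.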

The current image frame is referred to as the Target frame. A match is sought between the macroblock under consideration in the Target frame and the most similar macroblock in previous and/or future frame(s), referred to as Reference frame(s). In that sense, the Target macroblock is predicted from the Reference macroblock(s).

The displacement of the reference macroblock to the target macroblock is called a motion vector MV. The above figure shows the case of forward prediction, in which the Reference frame is taken to be a previous frame. If the Reference frame is a future frame, it is referred to as backward prediction. The difference of the two corresponding macroblocks is the prediction error.

For video compression based on motion compensation, after the first frame, only the motion vectors and difference macroblocks need be coded, since they are sufficient for the decoder to regenerate all macroblocks in subsequent frames.
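The decoder side of this statement can be sketched in a few lines. The example assumes lossless difference macroblocks and a motion vector stored as the (row, column) offset from the target block's position to the matching block in the reference frame; `reconstruct_block` and the sample data are hypothetical.

```python
def reconstruct_block(reference, top, left, mv, residual):
    """Rebuild a target block from its motion vector and difference block.

    The prediction is read from the already decoded reference frame at the
    position the motion vector points to, then the difference is added back.
    """
    dy, dx = mv
    n = len(residual)
    return [[reference[top + dy + i][left + dx + j] + residual[i][j]
             for j in range(n)]
            for i in range(n)]

# Made-up 4 x 4 reference frame and a 2 x 2 difference block.
reference = [[r * 4 + c for c in range(4)] for r in range(4)]
mv = (-1, -1)
residual = [[1, 0], [0, -1]]

block = reconstruct_block(reference, top=2, left=2, mv=mv, residual=residual)
print(block)
```

Nothing from the original target frame is needed here, only the reference frame the decoder has already rebuilt, the motion vector, and the difference block. In practice the difference macroblocks are usually coded lossily, so the reconstruction is an approximation of the original block.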

We will return to the discussion of some common video compression standards after the following section, in which we discuss search algorithms for motion vectors.

