Synthetic Object Coding in MPEG-4

An increasing number of video objects are created by computer graphics and animation software. These are called synthetic objects and are often presented together with natural objects and scenes in games, TV ads and programs, and animation or feature films.

Synthetic objects form a subset of the larger class of computer graphics. MPEG-4 supports the following visual synthetic objects:

  • Parametric descriptions of a synthetic face and body (body animation in Version 2)
  • Static and dynamic mesh coding with texture mapping
  • Texture coding for view-dependent applications

2D Mesh Object Coding

A 2D mesh is a tessellation (or partition) of a 2D planar region using polygonal patches. The vertices of the polygons are referred to as nodes of the mesh. The most popular meshes are triangular meshes, in which all polygons are triangles. The MPEG-4 standard makes use of two types of 2D mesh: the uniform mesh and the Delaunay mesh. Both are triangular meshes that can be used to model natural video objects as well as synthetic animated objects.

Since the triangulation structure (the edges between nodes) is known and can be readily regenerated by the decoder, it is not coded explicitly in the bitstream. Hence, 2D mesh object coding is compact. All coordinate values of the mesh are coded in half-pixel precision.

Each 2D mesh is treated as a mesh object plane (MOP). Coding of a MOP can be divided into geometry coding and motion coding. The input data consists of the x and y coordinates of all the nodes and the triangles tn in the mesh; the output data consists of the node displacements (dxn, dyn) and the motion prediction errors (exn, eyn), both of which are explained below.

2D Mesh Geometry Coding. MPEG-4 allows four types of uniform meshes with different triangulation structures. The figure below shows such meshes with 4 x 5 mesh nodes. Each uniform mesh can be specified by five parameters: the first two specify the number of nodes in each row and column, respectively; the next two specify the horizontal and vertical size of each rectangle (containing two triangles), respectively; and the last specifies the type of the uniform mesh.

Four types of uniform meshes: (a) type 0; (b) type 1; (c) type 2; (d) type 3
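
To make the five-parameter description concrete, the following Python sketch generates the node coordinates and triangles of a small uniform mesh. The function name is an assumption, and only two of the four triangulation types are sketched; in MPEG-4 the remaining types differ in how the rectangles' diagonals alternate.

```python
# Illustrative sketch: build a uniform 2D mesh from the five MPEG-4
# parameters (node counts, rectangle size, mesh type). Only types 0
# and 1 are sketched here; types 2 and 3 alternate the diagonals.
def uniform_mesh(cols, rows, rect_w, rect_h, mesh_type=0):
    """Return node coordinates and triangle index triples."""
    nodes = [(c * rect_w, r * rect_h)
             for r in range(rows) for c in range(cols)]
    tris = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            tl = r * cols + c      # top-left corner of this rectangle
            tr = tl + 1            # top-right
            bl = tl + cols         # bottom-left
            br = bl + 1            # bottom-right
            if mesh_type == 0:     # diagonal from top-left to bottom-right
                tris += [(tl, br, bl), (tl, tr, br)]
            else:                  # type 1: the opposite diagonal
                tris += [(tr, bl, tl), (tr, br, bl)]
    return nodes, tris

nodes, tris = uniform_mesh(5, 4, 16, 16)   # 4 x 5 nodes, as in the figure
print(len(nodes), len(tris))               # 20 nodes, 24 triangles
```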

Uniform meshes are simple and are especially good for representing 2D rectangular objects (e.g., the entire video frame). When used for objects of arbitrary shape, they are applied to (overlaid on) the bounding boxes of the VOPs, which incurs some inefficiency.

A Delaunay mesh is a better object-based mesh representation for arbitrary-shaped 2D objects.

Definition 1: If D is a Delaunay triangulation, then any of its triangles tn = (Pi, Pj, Pk) ∈ D satisfies the property that the circumcircle of tn does not contain any other node point Pl in its interior.

A Delaunay mesh for a video object can be obtained in the following steps:

  • Select boundary nodes of the mesh. A polygon is used to approximate the boundary of the object. The polygon vertices are the boundary nodes of the Delaunay mesh. A possible heuristic is to select boundary points with high curvatures as boundary nodes.

  • Choose interior nodes. Feature points within the object's boundary, such as edge points or corners, can be chosen as interior nodes for the mesh.

  • Perform Delaunay triangulation. A constrained Delaunay triangulation is performed on the boundary and interior nodes, with the polygonal boundary used as a constraint. The triangulation will use line segments connecting consecutive boundary nodes as edges and form triangles only within the boundary.

  • Constrained Delaunay triangulation. Interior edges are first added to form new triangles. The algorithm examines each interior edge to make sure it is locally Delaunay. Given two triangles (Pi, Pj, Pk) and (Pj, Pk, Pl) sharing an edge jk, if the circumcircle of (Pi, Pj, Pk) contains Pl in its interior, or the circumcircle of (Pj, Pk, Pl) contains Pi in its interior, then jk is not locally Delaunay and will be replaced by the new edge il.

If Pl falls exactly on the circumcircle of (Pi, Pj, Pk) (and, accordingly, Pi falls exactly on the circumcircle of (Pj, Pk, Pl)), then jk is viewed as locally Delaunay only if Pi or Pl has the largest x coordinate among the four nodes.
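
The locally Delaunay test above reduces to the standard incircle predicate from computational geometry. The following Python sketch is illustrative (the function names are not from the standard); it assumes the triangle (a, b, c) is listed in counterclockwise order.

```python
def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of CCW triangle (a, b, c)."""
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    # Sign of the standard 3 x 3 incircle determinant.
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0

def edge_is_locally_delaunay(pi, pj, pk, pl):
    """Edge jk shared by triangles (pi, pj, pk) and (pj, pk, pl).
    By symmetry, checking one opposite vertex suffices: pl lies inside the
    circumcircle of (pi, pj, pk) exactly when pi lies inside that of
    (pj, pk, pl). If the test fails, jk is flipped to the new edge il."""
    return not in_circumcircle(pi, pj, pk, pl)
```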

The figure below shows the set of Delaunay mesh nodes and the result of the constrained Delaunay triangulation. If the total number of nodes is N = Nb + Ni, where Nb and Ni denote the number of boundary nodes and interior nodes respectively, then the total number of triangles in the Delaunay mesh is Nb + 2Ni − 2. In the figure, this is 8 + 2 × 6 − 2 = 18.

Unlike a uniform mesh, the node locations in a Delaunay mesh are irregular; hence, they must be coded. By MPEG-4 convention, the location (x0, y0) of the top-left boundary node is coded first, followed by the other boundary points, proceeding either counterclockwise or clockwise. Afterward, the locations of the interior nodes are coded in any order.

Delaunay mesh: (a) boundary nodes (P0 to P7) and interior nodes (P8 to P13); (b) triangular mesh obtained by constrained Delaunay triangulation

Except for the first location (x0, y0), all subsequent coordinates are coded differentially; that is, for n ≥ 1,

$dx_n = x_n - x_{n-1}, \quad dy_n = y_n - y_{n-1},$

and afterward, dxn and dyn are variable-length coded.
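
A minimal Python sketch of this differential scheme, with the variable-length coding stage omitted:

```python
# Encode Delaunay node locations: the first node verbatim, every later
# node as a displacement (dx, dy) from its predecessor.
def encode_nodes(points):
    deltas = [(xn - xp, yn - yp)
              for (xp, yp), (xn, yn) in zip(points, points[1:])]
    return points[0], deltas           # (x0, y0) plus the differences

def decode_nodes(first, deltas):
    points = [first]
    for dx, dy in deltas:
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points

pts = [(10, 4), (12, 4), (13, 7), (11, 9)]   # coordinates in half-pel units
assert decode_nodes(*encode_nodes(pts)) == pts
```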

A breadth-first order of MOP triangles for 2D mesh motion coding

2D Mesh Motion Coding. The motion of each MOP triangle, in either a uniform or a Delaunay mesh, is described by the motion vectors of its three vertex nodes. A new mesh structure can be created only in the intra-frame, and its triangular topology does not change in the subsequent inter-frames. This enforces a one-to-one mapping in 2D mesh motion estimation.

For any MOP triangle (Pi, Pj, Pk), if the motion vectors for Pi and Pj are known to be MVi and MVj, then a prediction Predk is made for the motion vector of Pk, rounded to half-pixel precision:

Predk = 0.5 · (MVi + MVj).

The prediction error ek is coded as

ek = MVk − Predk.
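
In Python, the prediction and error computation might look as follows. Rounding each component to the nearest multiple of 0.5 is an assumption here; the standard's exact rounding rule may differ.

```python
# Predict the third vertex's motion vector from the two known ones,
# rounded to half-pel precision, and code only the prediction error.
def predict_mv(mv_i, mv_j):
    # round(a + b) / 2 rounds the average (a + b) / 2 to the nearest 0.5
    return tuple(round(a + b) / 2 for a, b in zip(mv_i, mv_j))

def encode_mv(mv_k, mv_i, mv_j):
    pred = predict_mv(mv_i, mv_j)
    return (mv_k[0] - pred[0], mv_k[1] - pred[1])   # prediction error e_k

mv_i, mv_j, mv_k = (1.0, -2.5), (2.0, -1.5), (1.5, -2.5)
print(encode_mv(mv_k, mv_i, mv_j))   # (0.0, -0.5)
```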

Once the three motion vectors of the first MOP triangle t0 are coded, at least one neighboring MOP triangle will share an edge with t0, so the motion vector for its third vertex node can be coded, and so on.

The estimation of motion vectors starts at the initial triangle t0, which is the triangle that contains the top-left boundary node and the boundary node next to it in the clockwise direction. Motion vectors for all other nodes in the MOP are coded differentially. A breadth-first order is established for traversing the MOP triangles in the 2D mesh motion coding process. The figure above shows how a spanning tree can be generated to obtain the breadth-first order of the triangles. As shown, the initial triangle t0 has two neighboring triangles, t1 and t2, which are not yet visited.

They become child nodes of t0 in the spanning tree. Triangles t1 and t2, in turn, have their own unvisited neighboring triangles (and hence child nodes) t3, t4 and t5, t6, respectively. The traversal order so far is t0, t1, t2, t3, t4, t5, t6, in a breadth-first fashion. One level down the spanning tree, t3 has only one child node, t7, since its other neighbor is already visited; t4 likewise has only one child node, t8; and so on.
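
The spanning-tree traversal amounts to a breadth-first search over edge-sharing triangles. The following Python sketch is illustrative; the adjacency construction and function name are assumptions, not the normative procedure.

```python
from collections import deque

def breadth_first_order(triangles, t0=0):
    """Return MOP triangle indices in breadth-first order, starting at t0.
    triangles: list of vertex-index triples; neighbors share an edge."""
    edge_to_tris = {}
    for t, (a, b, c) in enumerate(triangles):
        for e in ((a, b), (b, c), (c, a)):
            edge_to_tris.setdefault(frozenset(e), []).append(t)

    order, visited, queue = [], {t0}, deque([t0])
    while queue:
        t = queue.popleft()
        order.append(t)
        a, b, c = triangles[t]
        for e in ((a, b), (b, c), (c, a)):
            for n in edge_to_tris[frozenset(e)]:
                if n not in visited:       # unvisited neighbors become children
                    visited.add(n)
                    queue.append(n)
    return order
```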

2D Object Animation. The mesh motion coding described above establishes a one-to-one mapping between the mesh triangles in the reference MOP and those in the target MOP, and it generates motion vectors for all node points in the 2D mesh. Mesh-based texture mapping is now used to generate the texture for the new animated surface by warping the texture of each triangle in the reference MOP onto the corresponding triangle in the target MOP. This facilitates the animation of 2D synthetic video objects.

For triangular meshes, a common mapping function for the warping is the affine transform, since it maps a line to a line and guarantees that a triangle is mapped to a triangle. It will be shown below that, given the six vertices of two matching triangles, the parameters of the affine transform can be obtained, so that the transform can be applied to all points within the target triangle for texture mapping.

Given a point P = (x, y) on the 2D plane, represented as the row vector [x, y], a linear transform can be specified by a 2 x 2 matrix:

$$[x' \;\; y'] = [x \;\; y] \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

A transform T is linear if T(αX + βY) = αT(X) + βT(Y), where α and β are scalars. The above linear transform is suitable for geometric operations such as rotation and scaling, but not for translation, since the addition of a constant vector is not possible.

Definition 2: A transform A is an affine transform if and only if there exists a vector C and a linear transform T such that A(X) = T(X) + C.

If the point (x, y) is represented as [x, y, 1] in the homogeneous coordinate system commonly used in graphics, then an affine transform that maps [x, y, 1] to [x', y', 1] is defined as:

$$[x' \;\; y' \;\; 1] = [x \;\; y \;\; 1] \begin{bmatrix} a_{11} & a_{12} & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & 1 \end{bmatrix}$$

It realizes the following mapping:

$x' = a_{11} x + a_{21} y + a_{31}, \quad y' = a_{12} x + a_{22} y + a_{32}.$

The following 3 x 3 matrices are the affine transforms for translating by (Tx, Ty), rotating counterclockwise by θ, and scaling by factors Sx and Sy:

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ T_x & T_y & 1 \end{bmatrix}, \qquad \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} S_x & 0 & 0 \\ 0 & S_y & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

The following are the affine transforms for shearing along the x-axis and y-axis, respectively:

$$\begin{bmatrix} 1 & 0 & 0 \\ H_x & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} 1 & H_y & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

where Hx and Hy are constants determining the degree of shearing.

The above simple affine transforms can be combined (by matrix multiplication) to yield composite affine transforms, for example for a translation followed by a rotation, or a shearing followed by other transforms.
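
As an illustration of these matrix forms and of composition under the row-vector convention [x, y, 1] · A used in this section, here is a small NumPy sketch (the function names are assumptions):

```python
import numpy as np

def translate(tx, ty):
    return np.array([[1, 0, 0], [0, 1, 0], [tx, ty, 1]], float)

def rotate(theta):                  # counterclockwise by theta (radians)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]], float)

def scale(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], float)

def shear_x(hx):                    # shear along the x-axis
    return np.array([[1, 0, 0], [hx, 1, 0], [0, 0, 1]], float)

# Composite transform: translate by (5, 0), then rotate 90 degrees.
A = translate(5, 0) @ rotate(np.pi / 2)
p = np.array([1.0, 0.0, 1.0])       # the point (1, 0) in homogeneous form
print(p @ A)                        # ~[0. 6. 1.] up to float rounding
```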

It can be proven that any composite transform thus generated has exactly the same matrix form and at most six degrees of freedom, specified by a11, a21, a31, a12, a22, and a32. If the triangle in the target MOP is

(P0, P1, P2) = ((x0, y0), (x1, y1), (x2, y2))

and the matching triangle in the reference MOP is

(P'0, P'1, P'2) = ((x'0, y'0), (x'1, y'1), (x'2, y'2)),

then the mapping between the two triangles can be uniquely defined by:

$$\begin{bmatrix} x'_0 & y'_0 & 1 \\ x'_1 & y'_1 & 1 \\ x'_2 & y'_2 & 1 \end{bmatrix} = \begin{bmatrix} x_0 & y_0 & 1 \\ x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & 1 \end{bmatrix}$$

The above equation contains the six linear equations (three for the x's and three for the y's) required to resolve the six unknown coefficients a11, a21, a31, a12, a22, and a32. Let the equation be stated as X' = XA. Then A = X⁻¹X', with the inverse matrix given by X⁻¹ = adj(X)/det(X), where adj(X) is the adjoint of X and det(X) is its determinant. Therefore,

$$A = \frac{1}{\det(X)} \begin{bmatrix} y_1 - y_2 & y_2 - y_0 & y_0 - y_1 \\ x_2 - x_1 & x_0 - x_2 & x_1 - x_0 \\ x_1 y_2 - x_2 y_1 & x_2 y_0 - x_0 y_2 & x_0 y_1 - x_1 y_0 \end{bmatrix} X'$$

where det(X) = x0(y1 − y2) − y0(x1 − x2) + (x1y2 − x2y1).

Since the three vertices of a mesh triangle are never collinear, X is guaranteed to be nonsingular; that is, det(X) ≠ 0.
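
In practice, the six coefficients can also be recovered numerically rather than through the adjoint formula. A minimal NumPy sketch, assuming one matched triangle pair:

```python
import numpy as np

def affine_from_triangles(target, reference):
    """Solve X' = X A for A, given three matched (x, y) vertices."""
    X  = np.array([[x, y, 1] for x, y in target], float)
    Xp = np.array([[x, y, 1] for x, y in reference], float)
    return np.linalg.solve(X, Xp)   # valid because X is nonsingular

target    = [(0, 0), (1, 0), (0, 1)]
reference = [(2, 1), (3, 1), (2, 3)]       # translated by (2,1), y scaled by 2
A = affine_from_triangles(target, reference)
print(np.array([0.5, 0.25, 1]) @ A)        # -> [2.5 1.5 1. ]
```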

The above affine transform is piecewise: each triangle can have its own affine transform. It works well only when the object is mildly deformed during the animation sequence.

3D Model-based Coding

Because of the frequent appearances of human faces and bodies in videos, MPEG-4 has defined special 3D models for face objects and body objects. Some of the potential applications for these new video objects include teleconferencing, human-computer interfaces, games, and e-commerce. In the past, 3D wireframe models and their animations have been studied for 3D object animation. MPEG-4 goes beyond wireframes, so that the surfaces of the face or body object can be shaded or texture-mapped.

Mesh-based texture mapping for 2D object animation

Face Object Coding and Animation. Face models for individual faces can either be created manually or generated automatically through computer vision and pattern recognition techniques. However, the former is cumbersome and often inadequate, and the latter has yet to be achieved reliably.

MPEG-4 has adopted a generic default face model, developed by the Virtual Reality Modeling Language (VRML) Consortium.

Face Animation Parameters (FAPs) can be specified to achieve desirable animations, that is, deviations from the original "neutral" face. In addition, Face Definition Parameters (FDPs) can be specified to better describe individual faces. The figure below shows the feature points for FDPs. Feature points that can be affected by animation (FAPs) are shown as solid circles; those that are not affected are shown as empty circles.

Feature points for face definition parameters (FDPs). (Feature points for teeth and tongue are not shown.)

Visemes code highly realistic lip motions by modeling the speaker's current mouth position. All other FAPs cover possible movements of the head, jaw, lips, eyelids, eyeballs, eyebrows, pupils, chin, cheeks, tongue, nose, ears, and so on.

For example, expressions include neutral, joy, sadness, anger, fear, disgust, and surprise. Each is expressed by a set of features; sadness, for example, by slightly closed eyes, a relaxed mouth, and upward-bent inner eyebrows. FAPs for movement include head-pitch, head-yaw, head-roll, open-jaw, thrust-jaw, shift-jaw, push-bottom-lip, push-top-lip, and so on. For compression, the FAPs are coded using predictive coding.

Predictions for FAPs in the target frame are made based on FAPs in the previous frame, and the prediction errors are then coded using arithmetic coding. DCT can also be employed to improve the compression ratio, although it is considered more computationally expensive. FAPs are also quantized, with different quantization step sizes employed to exploit the fact that certain FAPs (e.g., open-jaw) need less precision than others (e.g., push-top-lip).
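
A minimal Python sketch of this predict-and-quantize pipeline; the arithmetic coding stage is omitted, and the FAP names and step sizes are made-up examples:

```python
# Each FAP is predicted from the previous frame's reconstructed value,
# and the prediction error is quantized with a per-FAP step size.
def encode_faps(faps, prev_recon, steps):
    symbols, recon = [], []
    for f, p, q in zip(faps, prev_recon, steps):
        sym = round((f - p) / q)     # quantized prediction error
        symbols.append(sym)          # this symbol goes to the arithmetic coder
        recon.append(p + sym * q)    # decoder-side reconstruction (avoids drift)
    return symbols, recon

prev  = [10.0, 0.0]                  # e.g. open-jaw, push-top-lip (hypothetical)
curr  = [12.3, 0.4]
steps = [1.0, 0.5]                   # coarser step where less precision suffices
print(encode_faps(curr, prev, steps))   # ([2, 1], [12.0, 0.5])
```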

Body Object Coding and Animation. MPEG-4 Version 2 introduced body objects, which are a natural extension of face objects.

Working with the Humanoid Animation (H-Anim) Group in the VRML Consortium, MPEG adopted a generic virtual human body with a default posture: standing, with the feet pointing forward, the arms at the sides, and the palms facing inward. There are 296 Body Animation Parameters (BAPs). When applied to any MPEG-4-compliant generic body, they will produce the same animation.

A large number of BAPs describe joint angles connecting different body parts, including the spine, shoulders, clavicles, elbows, wrists, fingers, hips, knees, ankles, and toes. This yields 186 degrees of freedom for the body, 25 of them for each hand alone. Furthermore, some body movements can be specified at multiple levels of detail. For example, five different levels, supporting 9, 24, 42, 60, and 72 degrees of freedom, can be used for the spine, depending on the complexity of the animation.

For specific bodies, Body Definition Parameters (BDPs) can be specified for body dimensions, body surface geometry, and, optionally, texture.

Body surface geometry uses a 3D polygon mesh representation, consisting of a set of polygonal planar surfaces in 3D space. The 3D mesh representation is popular in computer graphics for surface modeling. Coupled with texture mapping, it can deliver good (photorealistic) renderings. The coding of BAPs is similar to that of FAPs: quantization and predictive coding are used, and the prediction errors are further compressed by arithmetic coding.

