# Transform Coding - MULTIMEDIA

From basic principles of information theory, we know that coding vectors is more efficient than coding scalars. To carry out such an intention, we need to group blocks of consecutive samples from the source input into vectors.

Let X = {X1, X2… Xk)T be a vector of samples. Whether our input data is an image, a piece of music, an audio or video clip, or even a piece of text, there is a good chance that a substantial amount of correlation is inherent among neighboring samples Xi. The rationale behind transform coding is that if Y is the result of a linear transform T of the input vector X in such a way that the components of Y are much less correlated, then Y can be coded more efficiently than X.

For example, if most information in an RGB image is contained in a main axis, rotating so V that this direction is the first component means that luminance can be compressed differently from color information. This will approximate the luminance channel in the eye.

In higher dimensions than three, if most information is accurately described by the first few components of a transformed vector, the remaining components can be coarsely quantized, or even set to zero, with little signal distortion. The more decorrelated — that - is, the less effect one dimension has on another (the more orthogonal the axes), the more chance we have of dealing differently with the axes that store relatively minor amounts of information without affecting reasonably accurate reconstruction of the signal from its quantized or truncated transform coefficients.

Generally, the transform T itself does not compress any data. The compression comes from the processing and quantization of the components of Y. In this section, we will study the Discrete Cosine Transform (DCT) as a tool to decorrelate the input signal, We will also examine the Karhunen - Loeve Transform (KLT), which optimally decorrelates the components of the input X.

Discrete Cosine Transform (DCT)

The Discrete Cosine Transform (DCT), a widely used transform coding technique, is able to perform decorrelation of the input signal in a data - independent manner. Because of this, it has gained tremendous popularity. We will examine the definition of the DCT and discuss some of its properties, in particular the relationship between it and the more familiar Discrete Fourier Transform (DFT).

Definition of DCT. Let's start with the two - dimensional DCT. Given a function f {i, j) over two integer variablesi and j (a piece of an image), the 2D DCT transforms it into a new function F(u, v), with integer u and v running over the same range as i and j. The general definition of the transform is where i, u 0, 1,... , M — 1, j, v — 0,1, ... , AT — 1, and the constants C(u) and C(v) are determined by In the JPEG image compression standard, an image block is defined to have dimension M ~ N = 8. Therefore, the definitions for the 2D DCT and its inverse (IDCT) in this case are as follows: where; I, j, u, v = 0, 1,.,. , 7, and the constants C(u) and C(v) are determined by the equation.

2D Inverse Discrete Cosine Transform (2D IDCT). The inverse function is almost the same, with the roles of f (i, j) and f(u, v) reversed, except that now C(u)C(v) must stand inside the sums: where i, j, u, v 0, 1,... ,7.

The 2D transforms are applicable to 2D signals, such as digital images. As shown below, the ID version of the DCT and IDCT is similar to the 2D version.

ID Discrete Cosine Transform (ID DCT). where i = 0,1,... , 7, u = 0,1,... , 7.

ID Inverse Discrete Cosine Transform (1D - IDCT). where i = 0, 1,... , 7, u = 0,1,... ,7.

One - Dimensional DCT. Let's examine the DCT for a one - dimensional signal; almost all concepts are readily extensible to the 2D DCT.

An electrical signal with constant magnitude is known as a DC (direct current) signal. A common example is a battery that carries 1.5 or 9 volts DC. An electrical signal that changes its magnitude periodically at a certain frequency is known as an AC (alternating current) signal. A good example is the household electric power circuit, which carries electricity with sinusoidal waveform at 110 volts AC, 60 Hz (or 220 volts, 50 Hz in many other countries).

Most real signals are more complex. Speech signals or a row of gray - level intensities in a digital image are examples of such ID signals. However, any signal can be expressed as a sum of multiple signals that are sine or cosine waveforms at various amplitudes and frequencies. This is known as Fourier analysis. The terms DC and AC, originating in electrical engineering, are carried over to describe these components of a signal (usually) composed of one DC and several AC components.

If a cosine function is used, the process of determining the amplitudes of the AC and DC components of the signal is called a Cosine Transform, and the integer indices make it a Discrete Cosine Transform. When u = 0, the equation yields the DC coefficient; when u = 1, or 2,..., up to 7, it yields the first or second, etc., up to the seventh AC coefficient.

Inverse Discrete Cosine Transform uses a sum of the products of the DC or AC coefficients and the cosine functions to reconstruct (recompose) the function f(i). Since computing the DCT and IDCT involves some loss, f(i) is now denoted by f`(i).

In short, the role of the DCT is to decompose The original signal into its DC and AC components; the role of the IDCT is to reconstruct (recompose) the signal. The DCT and IDCT use the same set of cosine functions; they are known as basis functions. The following figure shows the family of eight ID DCT basis functions: u= 0..7.

The DCT enables a new means of signal processing and analysis in the frequency domain. We mean to analyze blocks of eight pixels in an image, but we can begin by considering time - dependent signals, rather than space - dependent ones (since time - signal analysis is where the method originates).

Suppose f(i) represents a signal that changes with time i (we will not be bothered here by the convention that time is usually denoted as t). The ID DCT transforms f (i), which is in the time domain, to F(u), which is in the frequency domain. The coefficients F(u) are known as the frequency responses and form the frequency spectrum of f(i).

Let's use some examples to illustrate frequency responses.

The ID DCT basis functions  Examples of ID Discrete Cosine Transform: (a) a DC signal f (i); (b) an AC signal fiiO; (c) /3(0 = f (i) + /2(/); and (d) an arbitrary signal /(/)  The characteristics of the DCT can be summarized as follows:

The DCT produces the frequency spectrum F(µ) corresponding to the spatial signal f(i). In particular, the 0th DCT coefficient F(0) is the DC component of the signal f(i). The other seven DCT coefficients reflect the various changing (i.e., AC) components of the signal f(i) at different frequencies. If we denote F(1) as AC1, F(2) as AC2, ..., F{1) as AC7, then AC1 is the first AC component, which completes half a cycle as a cosine function over [0, 71; AC2 completes a full cycle; AC3 completes one and one - half cycles; ..., and AC7, three and a half cycles. All these are, of course, due to the cosine basis functions, which are arranged in exactly this manner.

In other words, the second basis function corresponds to AC1, the third corresponds to AC2, and so on. Since the signal f% (i) and the third basis function have exactly the same cosine waveform, with identical frequency and phase, they will reach the maximum (positive) and minimum (negative) values synchronously. As a result, their products are always positive, and the sum of their products (F2 (2) or AC2) is large. It turns out that all other AC coefficients are zero, since f2(i) and all the other basis functions happen to be orthogonal. (We will discuss orthogonality later in this chapter.)

It should be pointed out that the DCT coefficients can easily take on negative values. For DC, this occurs when the average of f(i) is less than zero. (For an image, this never happens so the DC is nonnegative.) For AC, a special case occurs when f{i) and some basis function have the same frequency but one of them happens to be half a cycle behind — this yields a negative coefficient, possibly with a large magnitude.

In general, signals will look more like the one in the above figure (d). Then f(i) will produce many nonzero AC components, with the ones toward AC7 indicating higher frequency content. A signal will have large (positive or negative) response in its high - frequency components only when it alternates rapidly within the small range.

As an example, if AC7 is a large positive number, this indicates that the signal f(i) has a component that alternates synchronously with the eighth basis function — three and half cycles. According to the Nyqist theorem, this is the highest frequency in the signal that can be sampled with eight discrete values without significant loss and aliasing.

The DCT is a linear transform.

In general, a transform T (or function) is linear, iff where α and β are constants, and p and q are any functions, variables or constants. From the definition, this property can readily be proven for the DCT, because it uses only simple arithmetic operations.

One - Dimensional Inverse DCT. Let's finish the example in the above figure (d) by showing its inverse DCT (IDCT). Recall that F(u) contains the following:

F{u) (µ= 0 ..7): 69 -49 74 11 16 117 44 -5

The IDIDCT, as indicated in Eq. can readily be implemented as a loop with eight iterations. An example of ID IDCT   As it happens, even though we went from integer to integer via intermediate floats, we recovered the signal exactly. This is not always true, but the answer is always close.

The Cosine Basis FunctionsFor a better decomposition, the basis functions should be orthogonal, so as to have the least redundancy amongst them. Functions Bp(i) and Bq(i) are orthogonal if The orthonormal property is desirable. With this property, the signal is not amplified during the transform. When the same basis function is used in both the transformation and its inverse (sometimes called forward transform and backward transform), we will get (approximately) the same signal back. It can be shown that The cosine basis functions in the DCT are indeed orthogonal. With the help of constants C(p) and C(q) they are also orthonormal. (Now we understand why constants C(u) and C(v) in the definitions of DCT and IDCT seemed to have taken some arbitrary values.)

Recall that because of the orthogonality, lor f2(i) in the above figure, only F2(2) (for µ = 2) has a nonzero output whereas all other DCT coefficients are zero. This is desirable for some signal processing and analysis in the frequency domain, since we are now able to precisely identify the frequency components in the original signal. The cosine basis functions are analogous to the basis vectors x, y, z for the 3D Cartesian space, or the so - called 3D vector space. The vectors are orthonormal, because If we view the sum - of - products operation in Eq. as the dot product of one of the discrete cosine basis functions (for a specified u) and the signal f(i), then the analogy between the DCT and the Cartesian projection is remarkable. Namely, to get the x - coordinate of point P, we simply project P onto the x axis. Mathematically, this is equivalent to a dot productObviously, the same goes for obtaining yp and zp. Graphical illustration of 8 x 8 2D DCT basis 2D Basis Functions. For two - dimensional DCT functions, we use the basis shown as 8 x 8 images. These are depicted in the above figure where white indicates positive values and black indicates negative. To obtain DCT coefficients, we essentially just form the inner product of each of these 64 basis images with an 8 x 8 block from an original image. Notice that now we are talking about an original signal indexed by space, not time. We do this for each 8 x 8 image block. The 64 products we calculate make up an 8 x 8 spatial frequency image F(u, v).

2D Separable Basis. Of course, for speed, most software implementations use fixed point arithmetic to calculate the DCT transform. Just as there is a mathematically derived Fast Fourier Transform, there is also a Fast DCT. Some fast implementations approximate coefficients so that all multiplies are shifts and adds. Moreover, a much simpler mechanism is used to produce 2D DCT coefficients — factorization into two ID DCT transforms.

When the block size is 8, the 2D DCT can be separated into a sequence of two ID DCT steps. First, we calculate an intermediate function G(i, u) by performing a ID DCT on each column — in this way, we have gone over to frequency space for the columns, but not for the rows: Then, we calculate another ID DCT, this time replacing the row dimension by its frequency counterpart: This is possible because the 2D DCT basis functions are separable (multiply separate functions of i and j). It is straightforward to see that this simple change saves many arithmetic steps. The number of iterations required is reduced from 8 x 8 to 8 + 8.

Comparison of DCT and DFT. The discrete cosine transform is a close counterpart to the Discrete Fourier Transform (DFT), and in the world of signal processing, the latter is likely the more common. We have started off with the DCT instead because it is simpler and is also much used in multimedia. Nevertheless, we should not entirely ignore the DFT.

For a continuous signal, we define the continuous Fourier transform T as follows: Thus, the continuous Fourier transform is composed of an infinite sum of sine and cosine terms. Because digital computers require us to discretize the input signal, we define a DFT that operates on eight samples of the input signal {f0, f1,... , f7) as Even without giving an explicit definition of the DCT, we can guess that the DCT is likely a transform that involves only the real part of the DFT. The intuition behind the formulation of the DCT that allows it to use only the cosine basis functions of the DFT is that we can cancel out the imaginary part of the DFT by making a symmetric copy of the original input signal.

Symmetric extension of the ramp function This works because sine is an odd function; thus, the contributions from the sine terms cancel each other out when the signal is symmetrically extended. Therefore, the DCT of eight input samples corresponds to the DFT of 16 samples made up of the original eight input samples and a symmetric copy of these, as in the above figure.

With the symmetric extension, the DCT is now working on a triangular wave, whereas the DFT tries to code the repeated ramp. Because the DFT is trying to model the artificial discontinuity created between each copy of the samples of the ramp function, a lot of high - frequency components are needed.

The following table shows the calculated DCT and DFT coefficients. We can see that more energy is concentrated in the first few coefficients in the DCT than in the DFT. If we try to approximate

Table DCT and DFT coefficients of the ramp function Approximation of the ramp function: (a) three - term DCT approximation; (b) three - term DFT approximation The original ramp function using only three terms of both the DCT and DFT, we notice that the DCT approximation is much closer.

Karhunen - Loeve Transform*

The Karhunen - Loeve Transform (KLT) is a reversible linear transform that exploits the statistical properties of the vector representation. Its primary property is that it optimally decorrelates the input. To do so, it fits an n - dimensional ellipsoid around the (mean - subtracted) data. The main ellipsoid axis is the major direction of change in the data.

Think of a cigar that has unfortunately been stepped on. Cigar data consists of a cloud of points in 3 - space giving the coordinates of positions of measured points in the cigar. The long axis of the cigar will be identified by a statistical program as the first KLT axis. The second most important axis is the horizontal axis across the squashed cigar, perpendicular to the first axis. The third axis is orthogonal to both and is in the vertical, thin direction. A KLT component program carries out just this analysis.

To understand the optimality of the KLT, consider the autocorrelation matrix Rx of the input vector X, defined as where Rx(t, s) — E[XtXS] is the autocorrelation function. Our goal is to find a transform T such that the components of the output Y are uncorrelated — that is, E[YtYs] = 0, if t ≠ s. Thus, the autocorrelation matrix of Y takes on the form of a positive diagonal matrix.Since any autocorrelation matrix is symmetric and nonnegative definite, there are k orthogonal eigenvectors u1, u2,. .. , uk and k corresponding real and nonnegative eigenvalues λ1≥ λ2…. ≥λk≥0 We define the Karhunen - Loeve transform as Clearly, we have the required autocorrelation matrix for Y. Therefore, the KIT is optimal, in the sense that it completely decorrelates the input. In addition, since the KLT depends on the computation of the autocorrelation matrix of the input vector, it is data dependent: it has to be computed for every dataset.

MULTIMEDIA Topics