The Jpeg Standard - MULTIMEDIA

JPEG is an image compression standard developed by the Joint Photographic Experts Group. It was formally accepted as an international standard in 1992.

JPEG consists of a number of steps, each of which contributes to compression. We'll look at the motivation behind these steps, then take apart the algorithm piece by piece.

Main Steps in JPEG Image Compression

As we know, unlike one - dimensional audio signals, a digital image f (i, j) is not defined over the time domain. Instead, it is defined over a spatial domain — that is, an image is a function of the two dimensions i and j (or, conventionally, x and y). The 2D DCT is used as one step in JPEG, to yield a frequency response that is a function F (u, v) in the spatial frequency domain, indexed by two integers u and v.

JPEG is a lossy image compression method. The effectiveness of the DCT transform coding method in JPEG relies on three major observations:

  • Observation 1.Useful image contents change relatively slowly across the image — that is, it is unusual for intensity values to vary widely several times in a small area — for example, in an 8 x 8 image block. Spatial frequency indicates how many times pixel values change across an image block. The DCT formalizes this notion with a measure of how much the image contents change in relation to the number of cycles of a cosine wave per block.

  • Observation 2.Psychophysical experiments suggest that humans are much less likely to notice the loss of very high - spatial - frequency components than lower - frequency components.

Block diagram for JPECJ encoder

Block diagram for JPECJ encoder

JPEG's approach to the use of DCT is basically to reduce high - frequency - contents and then efficiently code the result into a bitstring. The term spatial redundancy indicates that much of me information in an image is repeated: if a pixel is red, then its neighbor is likely red also. Because of Observation 2 above, the DCT coefficients for the lowest frequencies are most important. Therefore, as frequency gets higher, it becomes less important to represent the DCT coefficient accurately. It may even be safely set to zero without losing much perceivable image information.

Clearly, a string of zeros can be represented efficiently as the length of such a run of zeros, and compression of bits required is possible, Since we end up using fewer numbers to represent the pixels in blocks, by removing some location - dependent information, we have effectively removed spatial redundancy.

JPEG works for both color and grayscale images. In the case of color images, such as YIQ or YUV, the encoder works on each component separately, using the same routines. If the source image is in a different color format, the encoder performs a color - space conversion to YIQ or YUV. As discussed in Chapter, the chrominance images (/, Q or U, V) are subsampled: JPEG uses 4:2:0 scheme, making use of another observation about vision:

  • Observation 3. Visual acuity (accuracy in distinguishing closely spaced lines) is much greater for gray ("black and white") than for color. We simply cannot see much change in color if it occurs in close proximity — think of the blobby ink used in comic books. This works simply because our eye sees the black lines best, and our brain just pushes the color into place. In fact, ordinary broadcast TV makes use of this phenomenon to transmit much less color information than gray information.

When the JPEG image is needed for viewing, the three compressed component images can be decoded independently and eventually combined. For the color channels, each pixel must be first enlarged to cover a 2 x 2 block. Without loss of generality, we will simply use one of them — for example, the Y image, in the description of the compression algorithm below.

The above figure shows a block diagram for a JPEG encoder. If we reverse the arrows in the figure, we basically obtain a JPEG decoder. The JPEG encoder consists of the following main steps:

  • Transform RGB to YIQ or YUV and subsample color
  • Perform DCT on image blocks
  • Apply Quantization
  • Perform Zigzag ordering and run - length encoding
  • Perform Entropy coding

DCT onImage Blocks. Each image is divided into 8x8 blocks. The 2D DCT is applied to each - block image f(i, j), with output _ being the DCT coefficients F(u,v) for each block. The choice of a small block size in JPEG is a compromise reached by the committee: a number larger than 8 would have made accuracy at low frequencies better, but using 8 makes the DCT (and IDCT) computation very fast.

Using blocks at all, however, has the effect of isolating each block from its neighboring context. This is why JPEG images look choppy ("blocky") when the user specifies a high compression ratio — we can see these blocks. (And in fact removing such "blocking artifacts" is an important concern of researchers.)

Quantization.The quantization step in JPEG is aimed at reducing the total number of bits needed for a compressed image. It consists of simply dividing each entry in the frequency space block by an integer, then rounding:


Here, F(u,v) represents a DCT coefficient, Q(u,v) is a quantization matrix entry, and F(u,v) represents the quantized DCT coefficients JPEG will use in the succeeding entropy coding.

The default values in the 8 x 8 quantization matrix Q(u, v) are listed in the following tables for luminance and chrominance images, respectively. These numbers resulted from psychophysical studies, with the goal of maximizing the compression ratio while minimizing perceptual losses in JPEG images. The following should be apparent:

The luminance quantization table

The luminance quantization table

The chrominance quantization table

The chrominance quantization table

  • Since the numbers in Q(u,v) are relatively large, the magnitude and variance of F(u, v) are significantly smaller than those of F(u, v). We'll see later that F(u, v) can "be coded with many fewer bits. The quantization step is the main source / or loss in JPEG compression.

  • The entries of Q (u, v) tend to have larger values toward the lower right corner. This aims to introduce more loss at the higher spatial frequencies — a practice supported by Observations 1 and 2.

We can handily, change the compression ratio simply by multiplicatively scaling the numbers in the Q (u, v) matrix. In fact, the quality factor, a user choice offered in every JPEG implementation, is essentially linearly tied to the scaling factor. JPEG also allows custom quantization tables to be specified and put in the header; it is interesting to use low - constant or high - constant values such as Q = 2 or Q = 100 to observe the basic effects of Q on visual artifacts.

The following figures show some results of JPEG image coding and decoding on the test image Lena.Only the luminance image (Y) is shown. Also, the lossless coding steps after quantization are not shown, since they do not affect the quality / loss of the JPEG images. These results show the effect of compression and decompression applied to a relatively smooth block in the image arid a more textured (higher - frequency - content) block, respectively.

Suppose f(i, j) represents one of the 8 x 8 blocks extracted from the image, F{u, v) the DCT coefficients, and F(u, v) the quantized DCT coefficients. Let F(u, v) denote the dequantized DCT coefficients, determined by simply multiplying by Q(u, v), and let /(/, j) be the reconstructed image block.

To illustrate the quality of the JPEG compression, especially the loss, the error e(/, j) — f(i, j) — f(i, j) is shown in the last row in the following figures.

In the following figure, an image block (indicated by a black box in the image) is chosen at the area where the luminance values change smoothly. Actually, the left side of the block is brighter, and the right side is slightly darker. As expected, except for the DC and the first few AC components, representing low spatial frequencies, most of the DCT coefficients F(u, v) have small magnitudes. This is because the pixel values in this block contain few high - spatial - frequency changes.

JPEG compression for a smooth image block

JPEG compression for a smooth image block

JPEG compression for a smooth image block

JPEG compression for a textured image block

JPEG compression for a textured image block

JPEG compression for a textured image block

An explanation of a small implementation detail is in order. The range of 8 - bit luminance values f(i, j) is [0,255]. In the JPEG implementation, each Y value is first reduced by 128 by simply subtracting.

The idea here is to turn the Y component into a zero - mean image, the same as the chrominance images. As a result, we do not waste any bits coding the mean value. (Think of an 8 x 8 block with intensity values ranging from 120 to 135.) Using f(i, j) 128 in place of f(i, j) will not affect the output of the AC coefficients — it alters only the DC coefficient.

In the above figure the image block chosen has rapidly changing luminance. Hence, many more AC components have large magnitudes (including those toward the lower right corner, where it and v are large).

Preparation for Entropy Coding.We have so far seen two of the main steps in JPEG compression: DCT and quantization. The remaining small steps shown in the block diagram all lead up to entropy coding of the quantized DCT coefficients. These additional data compression steps are lossless. Interestingly, the DC and AC coefficients are treated quite differently before entropy coding: run - length encoding on ACs versus DPCM on DCs.

Run - Length Coding (RLC) on AC Coefficients.The many zeros in F(u, v) after quantization is applied. Run - length Coding {RLC) {or Run - length Encoding, RLE) is therefore useful in turning the F(u, v) values into sets {# - zeros - to - skip, next non­zero value}. RLC is even more effective when we use an addressing scheme, making it most likely to hit a long run of zeros: a zigzag scan turns the 8 x 8 matrix F(u, v) into a 64 - vector, as the following figure illustrates. After all, most image blocks tend to have small high - spatial - frequency components, which are zeroed out by quantization. Hence the zigzag scan order has a good chance of concatenating long runs of zeros. For example, F (u , v) will be turned into

(32,6,-1,-1,0,-1,0,0,0,-1,0,0, 1,0,0,... ,0)

with three runs of zeros in the middle and a run of 51 zeros at the end.

Zigzag scan in JPEG

Zigzag scan in JPEG

The RLC step replaces values by a pair (RUNLENGTH, VALUE) for each run of zeros in the AC coefficients of F, where RUNLENGTH is the number of zeros in the run and VALUE is the next nonzero coefficient. To further save bits, a special pair (0,0) indicates the end - of - block after the last nonzero AC coefficient is reached. In the above example, not considering the first (DC) component, we will thus have

(0, 6)(0, -1)(0, -1)(1, -1)<3, "1)(2, 1X0, 0)

Differential Pulse Code Modulation (DPCM) on DC Coefficients.The DC coefficients are coded separately from the AC ones. Each 8 x 8 image block has only one DC coefficient. The values of the DC coefficients for various blocks could be large and different, because the DC value reflects the average intensity of each block, but consistent with Observation 1 above, the DC coefficient is unlikely to change drastically within a short distance. This makes DPCM an ideal scheme for coding the DC coefficients.

If the DC coefficients for the first five image blocks are 150,155,149,152, 144, DPCM would produce 150, 5, —6, 3, —8, assuming the predictor for the ith block is simply di = DCi + 1 DCi, and do =DC0. We expect DPCM codes to generally have smaller magnitude and variance, which is beneficial for the next entropy coding step.

It is worth noting that unlike the run - length coding of the AC coefficients, which is performed on each individual block, DPCM for the DC coefficients in JPEG is carried out on the entire image at once.

Baseline entropy coding details — size category

Baseline entropy coding details — size category

Entropy CodingThe DC and AC coefficients finally undergo an entropy coding step. Below, we will discuss only the basic entropy coding method, which uses Huffman coding and supports only 8 - bit pixels in the original images (or color image components).

Let's examine the two entropy coding schemes, using a variant of Huffman coding for DCs and a slightly different scheme for ACs.

Huffman Coding of DC CoefficientsEach DPCM - coded DC coefficient is represented by a pair of symbols (SIZE, AMPLITUDE), where SIZE indicates how many bits are needed for representing the coefficient and AMPLITUDE contains the actual bits.

Table illustrates the size category for the different possible amplitudes. Notice that - DPCM values could require more than 8 bits and could be negative values. The one's - complement scheme is used for negative numbers — that is, binary code 10 for 2, 01 for — 2; 11 for 3, 00 for — 3; and so on. In the example we are using, codes 150, 5, —6, 3, —8 will be turned into

(8, 10010110), (3, 101), (3, 001), (2, 11), (4,0111)

In the JPEG implementation, SIZE is Huffman coded and is hence a variable - length code. In other words, SIZE 2 might be represented as a single bit (0 or 1) if it appeared most frequently. In general, smaller SIZEs occur much more often —- the entropy of SIZE is low. Hence, deployment of Huffman coding brings additional compression. After encoding, a custom Huffman table can be stored in the JPEG image header; otherwise, a default Huffman table is used.

On the other hand, AMPLITUDE is not Huffman coded. Since its value can change widely, Huffman coding has no appreciable benefit.

Huffman Coding of AC Coefficients. Recall we said that the AC coefficients are run - length coded and are represented by pairs of numbers (RUNLENGTH, VALUE). However, in an actual JPEG implementation, VALUE is further represented by SIZE and AMPLITUDE, as for the DCs. To save bits, RUNLENGTH and SIZE are allocated only 4 bits each and squeezed into a single byte — let's call this Symbol 1. Symbol 2 is the AMPLITUDE value; its number of bits is indicated by SIZE:


Symbol 2: (AMPLITUDE)

The 4 - bit RUNLENGTH can represent only zero - runs of length 0 to 15. Occasionally, the zero - run length exceeds 15; then a special extension code, (15, 0), is used for Symbol 1. In the worst case, three consecutive (15, 0) extensions are needed before a normal terminating Symbol 1, whose RUNLENGTH will then complete the actual run length. As in DC, Symbol 1 is Huffman coded, whereas Symbol 2 is not.

JPEG Modes

The JPEG standard supports numerous modes (variations). Some of the commonly used ones are:

  • Sequential Mode
  • Progressive Mode
  • Hierarchical Mode
  • Lossless Mode

Sequential Mode. This is the default JPEG mode. Each gray - level image or color image component is encoded in a single left - to - right, top - to - bottom scan. We implicitly assumed this mode in the discussions so far. The "Motion JPEG" video codec uses Baseline Sequential JPEG, applied to each image frame in the video.

Progressive Mode. Progressive JPEG delivers low - quality versions of the image quickly, followed by higher - quality passes, and has become widely supported in web browsers. Such multiple scans of images are of course most useful when the speed of the communication line is low. In Progressive Mode, the first few scans carry only a few bits and deliver a rough picture of what is to follow. After each additional scan, more data is received, and image quality is gradually enhanced. The advantage is that the user - end has a choice whether to continue receiving image data after the first scan(s).

Progressive JPEG can be realized in one of the following two ways. The main steps (DCT, quantization, etc.) are identical to those in Sequential Mode.

Spectral selection. This scheme takes advantage of the spectral (spatial frequency spec­trum) characteristics of the DCT coefficients: the higher AC components provide only detail information.

Scan 1: Encode DC and first few AC components, e.g., AC1, AC2.

Scan 2: Encode a few more AC components, e.g., AC3, AC4, AC5.

Scan k: Encode the last few ACs, e.g., AC61, AC62, AC63.

Successive approximation.Instead of gradually encoding spectral bands, all DCT coeffi­cients are encoded simultaneously, but with their most significant bits (MSBs) first.

Scan 1: Encode the first few MSBs, e.g., Bits 7, 6, 5, and 4.

Scan 2: Encode a few more less - significant bits, e.g., Bit 3.

Scan m: Encode the least significant bit (LSB), Bit 0.

Hierarchical Mode. As its name suggests, Hierarchical JPEG encodes the image in a hierarchy of several different resolutions. The encoded image at the lowest resolution is basically a compressed low - pass - filtered image, whereas the images at successively higher resolutions provide additional details (differences from the lower - resolution images). Sim­ilar to Progressive JPEG, Hierarchical JPEG images can be transmitted in multiple passes with progressively improving quality.

The following figure illustrates a three - level hierarchical JPEG encoder and decoder (separated by the dashed line in the figure).

Block diagram for Hierarchical JPEG

Block diagram for Hierarchical JPEG


  • Reduction of image resolution. Reduce resolution of the input image f (e.g., 512 x 512) by a factor of 2 in each dimension to obtain f2 (e.g., 256 x 256). Repeat this to obtain f4 (e.g., 128 x 128).
  • Compress low - resolution image f4. Encode f4 using any other JPRG method (e.g., Sequential, Progressive) to obtain F4.
  • Compress difference image d2



Lossless ModeLossless JPEG is a very special case of JPEG which indeed has no loss in its image quality. It employs only a simple differential coding method, involving no transform coding. It is rarely used, since its compression ratio is very low compared to other, lossy modes. On the other hand, it meets a special need, and the newly developed JPEG - LS standard is specifically aimed at lossless image compression.

JPEG bitstream

JPEG bitstream

A Glance at the JPEG Bitstream

The above figure provides a hierarchical view of the organization of the bitstream for JPEG images. Here, a frame is a picture, a scan is a pass through the pixels (e.g., the red component), a segment is a group of blocks, and a block consists of S x 8 pixels. Examples of some header information are:

Frame header

  • Bits per pixel
  • (Width, height) of image
  • Number of components
  • Unique ID (for each component)
  • Horizontal/vertical sampling factors (for each component)
  • Quantization table to use (for each component)

Scan header

  • Number of components in scan
  • Component ID (for each component)
  • Huffman/Arithmetic coding table (for each component)

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd Protection Status