Media-on-Demand (MOD) - MULTIMEDIA

Media-on-Demand involves many fundamental multimedia network communication issues. In this section, we briefly introduce Interactive TV, broadcast schemes for Video-on-Demand, and issues of buffer management.

Interactive TV (ITV) and Set-Top Box (STB)

Interactive TV (ITV) is a multimedia system based on the television sets in homes. It can support a growing number of activities, such as

  1. TV (basic, subscription, pay-per-view)
  2. Video-on-Demand (VOD)
  3. Information services (news, weather, magazines, sports events, etc.)
  4. Interactive entertainment (Internet games, etc.)
  5. E-commerce (online shopping, stock trading)
  6. Access to digital libraries and educational materials

A new development in Digital Video Broadcasting (DVB) is the Multimedia Home Platform (DVB-MHP), which supports all the activities above as well as an electronic program guide (EPG) for television.

The fundamental differences between ITV and conventional cable TV are, first, that ITV invites user interactions, hence the need for two-way traffic: downstream (content provider to user) and upstream (user to content provider); and second, that ITV is rich in information and multimedia content.

To perform the above functions, a Set-top Box (STB) is required, which generally has the following components:

  1. Network interface and communication unit, including tuner and demodulator (to extract the digital stream from the analog channel), security devices, and a communication channel for basic navigation of the WWW and digital libraries as well as services and maintenance
  2. Processing unit, including CPU, memory, and a special-purpose operating system for the STB
  3. Audio/video unit, including audio and video (MPEG-2 and 4) decoders, Digital Signal Processor (DSP), buffers, and D/A converters
  4. Graphics unit, supporting real-time 3D graphics for animation and games
  5. Peripheral control unit, with controllers for disks, audio and video I/O devices (e.g., digital video cameras), CD/DVD readers and writers, and so on

General architecture of Set-top Box

Broadcast Schemes for Video-on-Demand

Among all possible Media-on-Demand services, the most popular is likely to be subscription to movies: over high-speed networks, customers can specify the movies they want and the times they want to view them. The statistics of such services suggest that most of the demand is usually concentrated on a few (10 to 20) popular movies (e.g., new releases and top-ten movies of the season). This makes it possible to multicast or broadcast these movies, since a number of clients can be grouped together and served by the next broadcast following their requests.

An important quality measure of such an MOD service is the waiting time (latency). We define the access time as an upper bound on the time between requesting a movie and being able to start consuming it.

Given the potentially extremely high bandwidth of fiber-optic networks, it is conceivable that an entire movie could be fed to the client in a relatively short time, provided the client has access to a high-speed network. The problem with this approach is the unnecessarily large storage space it demands at the client side.

Staggered Broadcasting. For simplicity, we assume all movies are encoded using constant-bitrate (CBR) encoding, are of the same length L (measured in time units), and are played sequentially from beginning to end without interruption. The available high bandwidth W is divided by the playback rate b to yield the bandwidth ratio B = W/b. The bandwidth of the server is usually divided up into K logical channels (K ≥ 1).

Staggered broadcasting with M = 8 movies and K = 6 channels

Assuming the server broadcasts up to M movies (M ≥ 1), all can be periodically broadcast on all these channels with the start time of each movie staggered. This is therefore referred to as Staggered broadcasting. The above figure shows an example of Staggered broadcasting in which M = 8 and K = 6.

If the division of the bandwidth is equal among all K logical channels, then the access time for any movie is δ = M · L / B. (Note that the access time is actually independent of the value of K.) In other words, the access time decreases in inverse proportion to the available network bandwidth.
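As a small numeric sketch (symbols M, L, B, and K as defined above; the particular numbers are made up for illustration), the access time and the channel stagger can be computed directly:

```python
# Staggered broadcasting: access time and channel stagger.
# Symbols follow the text: M movies of length L (in time units),
# bandwidth ratio B = W / b, and K logical channels of equal bandwidth.

def staggered_access_time(M, L, B):
    """Worst-case wait: delta = M * L / B (independent of K)."""
    return M * L / B

def staggered_start_offsets(M, L, B, K):
    """Start offsets of the K channels. One channel cycles through all M
    movies at a speed-up factor of B / K, so a cycle takes M * L * K / B;
    staggering the K channels evenly spaces starts by M * L / B."""
    cycle = M * L * K / B
    return [k * cycle / K for k in range(K)]

# Illustrative numbers: M = 8 movies of L = 120 minutes, B = 48, K = 6.
delta = staggered_access_time(8, 120, 48)        # 20 minutes
offsets = staggered_start_offsets(8, 120, 48, 6)
print(delta, offsets)
```

Note how the spacing between consecutive channel start times equals δ regardless of K, which is the reason the access time depends only on M, L, and B.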

Pyramid Broadcasting. Viswanathan and Imielinski proposed Pyramid broadcasting, in which movies are divided up into segments of increasing sizes; that is, Li+1 = α · Li, where Li is the size (length) of Segment Si and α > 1. Segment Si is periodically broadcast on Channel i. In other words, instead of staggering the movies on K channels, the segments are now staggered. Each channel is given the same bandwidth, and the larger segments are broadcast less frequently.

Since the available bandwidth is assumed to be significantly larger than the movie playback rate b (i.e., B >> 1), it is argued that the client can be playing a smaller Segment Si and simultaneously be receiving a larger Segment Si+1.

To guarantee continuous (noninterrupted) playback, the necessary condition is

M · K · Li+1 / B ≤ Li,  i.e.,  α ≤ B / (M · K),

since the worst-case wait for Segment Si+1 on Channel i+1 is its broadcast period M · K · Li+1 / B (each channel carries that segment for all M movies at channel bandwidth B · b / K), which must not exceed the playback duration Li of Segment Si.

The size of S1 determines the access time for Pyramid broadcasting. By default, we set α = B / (M · K) to yield the shortest access time. The access time drops exponentially with an increase in the total bandwidth B, because α can be increased linearly.
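A small sketch of the segment-size arithmetic (the specific values of B, M, K, and L below are illustrative only; the access-time bound follows the broadcast-period argument above):

```python
# Pyramid broadcasting sketch: geometric segment sizes L_{i+1} = alpha * L_i,
# with the default alpha = B / (M * K). Numbers are purely illustrative.

def pyramid_segments(L, K, alpha):
    """Split a movie of length L into K segments growing by factor alpha."""
    L1 = L * (alpha - 1) / (alpha**K - 1)      # geometric series sums to L
    return [L1 * alpha**i for i in range(K)]

B, M, K, L = 40.0, 2, 4, 120.0
alpha = B / (M * K)                            # = 5.0
segs = pyramid_segments(L, K, alpha)
assert abs(sum(segs) - L) < 1e-9               # segments cover the whole movie

# Access time is bounded by the broadcast period of S1 on Channel 1:
# M copies of S1 at channel bandwidth B*b/K take M * L1 * K / B time units.
access = M * segs[0] * K / B
print([round(s, 3) for s in segs], round(access, 3))
```

Doubling B (with K fixed) doubles α and thus shrinks S1, and with it the access time, much faster than linearly.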

A main drawback of the above scheme is the need for a large storage space on the client side, because the last two segments typically amount to 75-80% of the movie size. Instead of using a geometric series, Skyscraper broadcasting uses {1, 2, 2, 5, 5, 12, 12, 25, 25, 52, 52, ...} as the series of segment sizes, to alleviate the demand for a large buffer.
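A short generator for this series; the recurrence below reproduces the listed values and is the one commonly attributed to Hua and Sheu's Skyscraper scheme, so treat its exact form as an assumption if you consult a different source:

```python
# Generate the Skyscraper segment-size series {1, 2, 2, 5, 5, 12, 12, ...}.
# Recurrence (an assumption here):
#   f(1) = 1; f(2) = f(3) = 2; odd n > 3 repeats f(n-1);
#   n % 4 == 0 gives 2*f(n-1) + 1; n % 4 == 2 gives 2*f(n-1) + 2.

def skyscraper_sizes(n):
    f = []
    for i in range(1, n + 1):
        if i == 1:
            f.append(1)
        elif i in (2, 3):
            f.append(2)
        elif i % 2 == 1:
            f.append(f[-1])          # odd positions repeat the previous size
        elif i % 4 == 0:
            f.append(2 * f[-1] + 1)
        else:  # i % 4 == 2
            f.append(2 * f[-1] + 2)
    return f

print(skyscraper_sizes(11))  # [1, 2, 2, 5, 5, 12, 12, 25, 25, 52, 52]
```

Capping the growth this way bounds the largest segment, and hence the client buffer, far below the 75-80% demanded by a geometric series.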

The following figure shows an example of Skyscraper broadcasting with seven segments. As shown, two clients who made requests during the time intervals (1, 2) and (16, 17), respectively, have their respective transmission schedules. At any given moment, no more than two segments need to be received.

In Greedy Equal Bandwidth Broadcasting (GEBB), the segment sizes and their corresponding channel bandwidths are analyzed with the objective of minimizing the total server bandwidth required to broadcast a specific video. Different from the above pyramid-based broadcasting schemes, GEBB operates in a "greedy" fashion: the client receives as much data as possible from all the channels immediately after "tuning in" to a video broadcast, and ceases receiving a segment immediately before playing back that segment.

Skyscraper broadcasting with seven segments

The server bandwidth optimization problem can be formally stated as:

minimize B1 + B2 + ... + BK
subject to b · li = Bi · (w + l1 + ... + li-1) for i = 1, ..., K, and l1 + l2 + ... + lK = L

where li is the playback duration of Segment Si.

The following figure illustrates GEBB. In this figure, all the bandwidths are equal.

Illustration of GEBB. The shaded area represents data received and played back by the client

where w is the wait time and Bi is the bandwidth of Channel i. The constraint above ensures that Segment Si is completely received at the exact time when the playback of Segment Si-1 terminates. Thus, the segments become available exactly on time for their playback.

The above nonlinear optimization problem is solved using the Lagrange multiplier method. The result is that the required bandwidth is minimized when the channel bandwidths are equal. The broadcasting bandwidth of each channel is

Bi = b · ((L/w + 1)^(1/K) - 1)

The segment durations then follow a geometric sequence, and (L/w + 1)^(1/K) is called the golden factor of video segmentation.
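Under the assumptions above (equal channel bandwidths, wait time w, movie length L, K segments, playback rate b), both the per-channel bandwidth and the segment durations follow directly from the golden factor. This sketch, with illustrative numbers, also verifies the just-in-time property of the constraint:

```python
# GEBB sketch: with equal channel bandwidths, segment durations form a
# geometric sequence with ratio g = (L/w + 1)**(1/K), the golden factor,
# and each channel's bandwidth is b * (g - 1).

def gebb_plan(L, w, K, b):
    g = (L / w + 1) ** (1.0 / K)        # golden factor
    B_i = b * (g - 1)                   # equal per-channel bandwidth
    durations = [w * (g - 1) * g**i for i in range(K)]
    return g, B_i, durations

g, B_i, durs = gebb_plan(L=120.0, w=1.0, K=6, b=1.0)
assert abs(sum(durs) - 120.0) < 1e-9    # the K segments cover the movie

# Just-in-time property: greedy download of segment i (rate B_i, from tune-in
# until its playback starts) delivers exactly b * l_i of data.
t = 1.0                                  # download window starts w before playback
for l in durs:
    assert abs(B_i * t - 1.0 * l) < 1e-9
    t += l
print(round(g, 4), round(B_i, 4))
```

The assertions confirm that each segment finishes arriving exactly when the previous segment finishes playing, which is precisely the constraint the optimization enforces.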

Harmonic Broadcasting. Harmonic broadcasting adopts a different strategy. The size of all segments remains constant, whereas the bandwidth of Channel i is Bi = b/i, where b is the movie's playback rate. In other words, the channel bandwidths follow the decreasing pattern b, b/2, b/3, ..., b/K. The total bandwidth allocated for delivering the movie is thus

b + b/2 + b/3 + ... + b/K = b · H(K)

where K is the total number of segments, and

H(K) = 1 + 1/2 + 1/3 + ... + 1/K

is the Harmonic number of K.

The following figure shows an example of Harmonic broadcasting. After requesting the movie, the client is allowed to download and play the first occurrence of segment S1 from channel 1. Meanwhile, the client will download all other segments from their respective channels.

Harmonic broadcasting

Take S2 as an example: it consists of two halves, S21 and S22. Since bandwidth B2 is only b/2, during the playback time of S1, one half of S2 (say S21) will be downloaded (prefetched). It takes the entire playback time of S2 to download the other half (say S22), finishing just as S2 finishes playback. Similarly, by this time, two-thirds of S3 is already prefetched, so the remaining third of S3 can be downloaded just in time for playback from Channel 3, which has a bandwidth of only b/3, and so on.

The advantage of Harmonic broadcasting is that the Harmonic number grows slowly with K. For example, when K = 30, HK ≈ 4. If the movie is 120 minutes long, this yields small segments, only 4 minutes (120/30) each. Hence, the access time for Harmonic broadcasting is generally shorter than for Pyramid broadcasting, and the demand on total bandwidth (in this case 4b) is modest. Juhn and Tseng show that the upper bound for the buffer size at the client side is 37% of the entire movie, which also compares favorably with the original Pyramid broadcasting scheme.
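The 120-minute example can be checked in a few lines (b is normalized to 1):

```python
# Harmonic broadcasting totals: K equal segments, Channel i at bandwidth b/i.

def harmonic_number(K):
    return sum(1.0 / i for i in range(1, K + 1))

b, K, L = 1.0, 30, 120.0
H_K = harmonic_number(K)     # H(30) is just under 4
total_bw = b * H_K           # total server bandwidth, roughly 4b
segment = L / K              # each segment (and the access time) is 4 minutes
print(round(H_K, 3), segment)
```

Because H(K) grows only logarithmically, doubling K halves the access time while adding less than one playback rate b of extra server bandwidth.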

However, the above Harmonic broadcasting scheme does not always work. For example, if the client starts to download at the second instance of S1, then by the time it finishes S1, only the second half of S2, that is, S22, is prefetched. The client will not be able to simultaneously download and play S21 from Channel 2, since the available bandwidth is only half the playback rate.

An obvious fix to the above problem is to ask the client to delay the playback of S1 by one slot. The drawback of this delayed Harmonic broadcasting scheme is that it doubles the access time.

Since 1997, several variants of the original method, such as cautious harmonic broadcasting, quasi-harmonic broadcasting, and polyharmonic broadcasting, have been proposed, all of which address the problem with added complexity.

Pagoda Broadcasting. Harmonic broadcasting schemes broadcast videos using a large number of low-bandwidth streams, while Pyramid broadcasting schemes broadcast videos using a small number of high-bandwidth streams. The total required bandwidths of Pyramid broadcasting schemes are generally higher than those of Harmonic-based schemes, but managing the large number of independent datastreams in Harmonic broadcasting is likely to be daunting.

Paris, Carter, and Long proposed Pagoda broadcasting and its variants, frequency-based broadcasting schemes that try to combine the advantages of the Harmonic and Pyramid schemes.

The following figure illustrates Pagoda broadcasting. It partitions each video into n fixed-size segments of duration T = L/n, where T is defined as a time slot. It then broadcasts these segments at the consumption bandwidth b but with different periods, so the problem is to select the proper segment-to-channel mapping and the proper broadcasting period for each segment.

First three channel-segment maps of Pagoda broadcasting

Compared to Pyramid and Harmonic broadcasting, Pagoda broadcasting is not the most bandwidth-efficient scheme for a given waiting time: it requires fewer segments than Harmonic broadcasting to achieve a comparable waiting time, but more segments than Pyramid broadcasting.

All the above protocols are based on the assumption that the videos are encoded at a Constant Bit Rate (CBR). Some protocols have been proposed to deal with VBR-encoded videos.

Stream Merging. The above broadcast schemes are most effective when limited user interaction is expected; that is, once requested, clients will stay with the sequential access schedule and watch the movie in its entirety.

Stream merging is more adaptive to dynamic user interactions, which is achieved by dynamically combining multicast sessions. It still makes the assumption that the client's receiving bandwidth is higher than the video playback rate. In fact, it is common to assume that the receiving bandwidth is at least twice the playback rate, so that the client can receive two streams at the same time.

The server will deliver a video stream as soon as it receives the request from a client. Meanwhile, the client is also given access to a second stream of the same video, which was initiated earlier by another client. At a certain point, the first stream becomes unnecessary, because all its contents have been prefetched from the second stream. At this time, the first stream will merge with (or "join") the second.

As the following figure shows, the "first stream" B starts at time t = 2. The solid line indicates the playback rate, and the dashed line indicates the receiving bandwidth, which is twice the playback rate. The client is allowed to prefetch from an earlier ("second") stream A, which was launched at t = 0. At t = 4, stream B joins A.

Stream merging
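The merge point in the figure follows from simple arithmetic: a client whose stream starts τ time units after an earlier multicast, receiving at twice the playback rate, closes the τ·b data gap after exactly τ more time units. A tiny sketch, with times in the figure's units:

```python
# Stream merging arithmetic: with a receiving bandwidth of 2x the playback
# rate, the later stream joins the earlier one after a time equal to the
# starting lag (the client plays tau*b of data while prefetching 2*tau*b).

def merge_time(start_earlier, start_later):
    tau = start_later - start_earlier   # starting lag between the streams
    return start_later + tau            # moment the later stream joins

print(merge_time(0.0, 2.0))  # stream B (t = 2) joins stream A (t = 0) at t = 4
print(merge_time(2.0, 4.0))  # stream C (t = 4) joins stream B (t = 2) at t = 6
```

Both values match the timeline described for the figure: B joins A at t = 4, and C catches up with B at t = 6.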

The technique of stream merging can be applied hierarchically — Hierarchical multicast stream merging (HMSM). As the above figure shows, stream C, which started at t = 4, would join B at t = 6, which in turn joined A. The original stream B would have been obsolete after t = 4, since it joined A. In this case, it will have to be retained until t = 6, when C joins A.

A variation of Stream merging is piggybacking, in which the playback rate of the streams is slightly and dynamically adjusted, to enable merging (piggybacking) of the streams.

The data that a client can store in the buffer assists the smooth playback of the media when the media rate exceeds the available network bandwidth

Buffer Management

Continuous media usually have expected playback rates, such as 30 fps for NTSC video and 25 fps for PAL. If the video is delivered through the network, then without work-ahead smoothing at playback time, the required network throughput must be higher than the video's peak bitrate for uninterrupted video playback.

As discussed earlier, most compressed media are VBR - coded. Usually, the more activities (motions in the video, changes in the speech or music), the higher the required bitrate. The mean bitrate for MPEG - 1 is 1.5 Mbps and for MPEG - 2 is ≥ 4 Mbps. Media that have VBR characteristics can have a low bitrate at one point and a much higher bitrate at another point. The peak bitrate can be much larger than the mean bitrate for the media and may not be supported by the network bandwidth available.

Although not popular nowadays, CBR coding is also an option: variable distortions can be introduced to maintain a constant bitrate. CBR coding is less efficient than VBR: to obtain comparable quality of coded media, the CBR bitrate is typically 15-30% higher than the mean VBR video bitrate (the average bitrate of the video).

To cope with the variable bitrate and network load fluctuations, buffers are usually employed at both the sender and receiver ends. A prefetch buffer is introduced at the client side (e.g., in the client's Set-top Box) to smooth the transmission rate (reducing the peak rate). If the size of frame t is d(t), the buffer size is B, and the number of data bytes received so far (at play time for frame t) is A(t), then for all t ∈ {1, 2, ..., N}, it is required that

D(t) ≤ A(t) ≤ D(t - 1) + B,  where D(t) = d(1) + d(2) + ... + d(t)

When A(t) < D(t), we have inadequate network throughput and hence buffer underflow (or starvation), whereas when A(t) > D(t - 1) + B, we have excessive network throughput and buffer overflow. Both are harmful to smooth, continuous playback: in buffer underflow, no data is available to play, and in buffer overflow, media packets must be dropped.
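These two bounds can be checked mechanically for any candidate schedule. A minimal sketch, where the frame sizes and per-slot send amounts are made-up numbers (note the buffer must be at least as large as the largest frame for any schedule to be feasible):

```python
# Check a transmission schedule against the prefetch-buffer bounds:
# D(t) <= A(t) <= D(t-1) + B for every frame t, where D and A are the
# cumulative frame sizes and cumulative data sent.

from itertools import accumulate

def check_schedule(frame_sizes, sent_per_frame, buffer_size):
    D = list(accumulate(frame_sizes))
    A = list(accumulate(sent_per_frame))
    for t in range(len(D)):
        W = (D[t - 1] if t > 0 else 0) + buffer_size   # W(t) = D(t-1) + B
        if A[t] < D[t]:
            return (False, t, "underflow")
        if A[t] > W:
            return (False, t, "overflow")
    return (True, None, None)

frames = [3, 1, 4, 1, 5]                              # made-up frame sizes
print(check_schedule(frames, [3, 2, 3, 3, 3], 6))     # feasible
print(check_schedule(frames, [2, 2, 2, 2, 2], 6))     # starves on frame 0
```

The first schedule stays within both curves; the second sends too little in the first slot and underflows immediately.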

The above figure illustrates the limits imposed by the media playback (consumption) data rate and the buffered data rate. (The transmission rates are the slopes of the curves.) At any time, data must be in the buffer for smooth playback, so the data transmitted must be at least the data consumed. If the available network bandwidth is as in Line II in the figure, at some point during playback the data to be consumed will be greater than what can be sent. The buffer will underflow, and playback will be interrupted. Also, at any point, the total amount of data transmitted must not exceed the total consumed plus the size of the buffer.

If the available network bandwidth is as in Line I and the media is sent as fast as possible without buffer considerations (as in normal file downloads), then toward the end of the video, the data received will be greater than the buffer can store at the time. The buffer will overflow and drop the extra packets. The server will then have to retransmit the dropped packets, or these packets will be missing. Although some time is allowed for retransmission during overflow, this increases bandwidth requirements (and hence may cause underflow in the future). In many cases, such as broadcast, no back channel is available.

Techniques to maintain data in the prefetch buffer without overflowing or underflowing it are known as transmission rate control schemes. Two simple approaches are to prefetch video data to fill the buffer and try to transmit at the mean video bitrate, or to keep the buffer full without exceeding the available bandwidth. For video sections that require higher bandwidth than available, the transmission rate control schemes hope that the data already in the buffer and the available network bandwidth will enable smooth playback without buffer underflow.

An Optimal Plan for Transmission Rates. Given knowledge of the data-rate characteristics of the media stored on the server, it is possible to use the prefetch buffer more efficiently. The media server can plan ahead for a transmission rate such that the media can be viewed without interruption while the reserved bandwidth is minimized. Many transmission plans may minimize the peak rate, but there is a unique plan that also minimizes rate variability, the variance of the transmission rate. Such a rate transmission plan is referred to as the optimal work-ahead smoothing plan.

Minimizing rate variability is important, since it implies the optimal rate plan is a set of piecewise constant-rate transmission segments. Basing the bandwidth reservation strategy on the current transmission rate rather than the peak rate allows some processing and network resources to be saved and makes changes in bandwidth reservation less frequent. For discussion purposes, the following refers to video media, although the technique can be extended to general media.

The video data rate can be analyzed frame by frame, although that might not be the best strategy, since inter-frames cannot be decoded by themselves and introduce a decoding delay anyhow; additionally, the computational cost is high. Indeed, it is more practical to approximate the video data rate by considering the total data consumed by the time each I-frame should be displayed. The approximation could be made coarser by considering only the total data consumed at the first frame after a scene transition, assuming the movie data rate is constant within the same scene.

The optimal smoothing plan for a specific video and buffer size. In this case, it is not feasible to transmit at the constant (average) data rate

As before, define d(t) to be the size of frame t, where t ∈ {1, 2, ..., N} and N is the total number of frames in the video. Similarly, define a(t) to be the amount of data transmitted by the video server during the playback time for frame t (for short, at time t). Let D(t) be the total data consumed and A(t) the total data sent by time t. Formally:

D(t) = d(1) + d(2) + ... + d(t),  A(t) = a(1) + a(2) + ... + a(t)

Let the buffer size be B. Then at any time t, the maximum total amount of data that can be received without overflowing the buffer during the time 1..t is W(t) = D(t - 1) + B. Now it is easy to state the conditions for a server transmission rate that avoids buffer overflow or underflow:

D(t) ≤ A(t) ≤ W(t)

To avoid buffer overflow or underflow throughout the video's duration, the above equation has to hold for all t ∈ {1, 2, ..., N}. Define S to be the server transmission schedule (or plan), i.e., S = a(1), a(2), ..., a(N). S is called a feasible transmission schedule if, for all t, S obeys the above equation. The above figure illustrates the bounding curves D(t) and W(t) and shows that a constant (average)-bitrate transmission plan is not feasible for this video, because simply adopting the average bitrate would cause underflow.

When the frame sizes d(t) are known ahead of transmission time, the server can plan ahead to generate an optimal transmission schedule that is feasible and minimizes the peak transmission rate. Additionally, the plan minimizes the schedule variance, smoothing the transmission as much as possible.

We can think of this technique as stretching a rubber band from D(1) to D(N) bounded by the curves defined by D(t) and W(t). The slope of the total - data - transmitted curve is the transmission data rate. Intuitively, we can minimize the slope (or the peak rate) if, whenever the transmission data rate has to change, it does so as early as possible in the transmission plan.

As an illustration, consider the above figure. The server starts transmitting data when the prefetch buffer is at state (a). It determines that to avoid buffer underflow at point (c), the transmission rate has to be high enough to have enough data at point (c). However, at that rate, the buffer will overflow at point (b). Hence it is necessary to reduce the transmission rate somewhere between points (c) and (b).

The earliest such point (that minimizes transmission rate variability) is point (c). The rate is reduced to a lower constant bitrate until point (d), where the buffer is empty. After that, the rate must be further reduced (to lower than the average bitrate!) to avoid overflow until point (e), when the rate must finally be increased.

Consider any interval [p, q], and let B(t) represent the amount of data in the buffer at time t (so that A(t) = D(t) + B(t)). Then the maximum constant data rate that can be used without overflowing the buffer is given by Rmax:

Rmax = min over t ∈ (p, q] of (W(t) - A(p)) / (t - p)

The minimum data rate that must be used over the same interval to avoid underflow is given by Rmin:

Rmin = max over t ∈ (p, q] of (D(t) - A(p)) / (t - p)

Naturally, it is required that Rmax ≥ Rmin; otherwise, no constant-bitrate transmission is feasible over the interval [p, q]. The algorithm to construct the optimal transmission plan starts with the interval [p, q = p + 1] and keeps incrementing q, each time recalculating Rmax and Rmin. If the transmission rate must be increased, a rate segment is created with rate Rmax over the interval [p, qmax], where qmax is the latest point at which the buffer is full (the latest point in [p, q] where Rmax is achieved).

Equivalently, if the rate must be decreased, a rate segment is created with rate Rmin over the interval [p, qmin], where qmin is the latest point at which the buffer is empty.
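The construction can be sketched as follows. This is a simplified reading of the algorithm (anchor at p, extend q, emit a constant-rate segment when Rmax falls below Rmin), not a faithful reproduction of any particular published implementation, and it assumes the buffer is at least as large as the largest frame:

```python
# Work-ahead smoothing sketch: build piecewise constant-rate segments between
# the underflow curve D(t) and the overflow curve W(t) = D(t-1) + B.
# Assumes buffer_size >= max frame size (otherwise no feasible plan exists).

def smooth_plan(frame_sizes, buffer_size):
    N = len(frame_sizes)
    D = [0.0]
    for d in frame_sizes:
        D.append(D[-1] + d)                       # D(t) for t = 0..N
    W = [0.0] + [D[t - 1] + buffer_size for t in range(1, N + 1)]

    plan = []                                     # (start, end, rate) segments
    p, Ap = 0, 0.0                                # anchor time p and A(p)
    while p < N:
        rmax, qmax = float("inf"), p              # tightest overflow bound
        rmin, qmin = float("-inf"), p             # tightest underflow bound
        pinched = False
        for q in range(p + 1, N + 1):
            hi = (W[q] - Ap) / (q - p)            # steepest feasible rate at q
            lo = (D[q] - Ap) / (q - p)            # shallowest feasible rate at q
            if hi < rmax:
                rmax, qmax = hi, q
            if lo > rmin:
                rmin, qmin = lo, q
            if rmax < rmin:                       # corridor pinched off
                if qmax < qmin:                   # rate must increase later:
                    seg = (p, qmax, rmax)         #   run at Rmax, buffer full at qmax
                else:                             # rate must decrease later:
                    seg = (p, qmin, rmin)         #   run at Rmin, buffer empty at qmin
                plan.append(seg)
                p, Ap = seg[1], Ap + seg[2] * (seg[1] - seg[0])
                pinched = True
                break
        if not pinched:                           # rest fits in one CBR segment
            plan.append((p, N, rmin))
            p = N
    return plan

# A large frame early on forces a high rate first, then a low one.
print(smooth_plan([5, 1, 1, 1, 1], buffer_size=5))   # [(0, 1, 5.0), (1, 5, 1.0)]
```

In the example, the burst in frame 1 pins the rate at 5 for the first slot (the buffer empties exactly at t = 1), after which the rest of the video can be sent at the much lower constant rate of 1.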

Planning transmission rates can readily take the maximum allowed network jitter into account. Suppose there is no delay in the receiving rate; then at time t, A(t) bytes of data have been received, which must not exceed W(t). Now suppose the network delay is at its worst, with a maximum delay of δ seconds. Video decoding will be delayed by δ seconds, so the prefetch buffer will not be freed as early. Hence, the D(t) curve needs to be modified to a D(t - δ) curve. This provides protection against overflow or underflow in the plan for a given maximum delay jitter.
