Multimedia Over IP

Due to the great popularity and availability of the Internet, various efforts have been made to make Multimedia over IP a reality, although it was known to be a challenge. This section will study some of the key issues, technologies, and protocols.

IP-Multicast

In network terminology, a broadcast message is sent to all nodes in the domain, a unicast message is sent to only one node, and a multicast message is sent to a set of specified nodes. IP-multicast enables multicast on the Internet. It is vital for applications such as mailing lists, bulletin boards, group file transfer, audio/video-on-demand, audio/videoconferencing, and so on.

Tunnels for IP Multicast in MBone

One of the first trials of IP-multicast was in March 1992, when the Internet Engineering Task Force (IETF) meeting in San Diego was broadcast (audio only) on the Internet.

MBone. The Internet Multicast Backbone (MBone) is based on IP-multicast technology. Starting in the early 1990s, it has been used, for example, for audio and video conferencing on the Internet. Earlier applications include vat for audio conferencing, and vic and nv for video conferencing. Other application tools include wb for whiteboards in shared workspace and sdr for maintaining session directories on MBone.

Since many routers do not support multicast, MBone uses a subnetwork of routers (mrouters) that support multicast to forward multicast packets. As the above figure shows, the mrouters (or so-called islands) are connected with tunnels. Multicast packets are encapsulated inside regular IP packets for "tunneling", so that they can be sent to the destination through the islands.

Recall that under IPv4, IP addresses are 32 bits. If the first 4 bits are 1110, the message is an IP-multicast message. This prefix covers IP addresses ranging from 224.0.0.0 to 239.255.255.255.
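The prefix test described above can be checked directly. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `is_ipv4_multicast` and the constants `LOWEST` and `HIGHEST` are illustrative names, not part of any protocol.

```python
import ipaddress

def is_ipv4_multicast(addr: str) -> bool:
    """Return True when the top four bits of the 32-bit address are 1110."""
    value = int(ipaddress.IPv4Address(addr))
    return (value >> 28) == 0b1110

# Lowest and highest 32-bit addresses that share the 1110 prefix.
LOWEST = ipaddress.IPv4Address(0b1110 << 28)
HIGHEST = ipaddress.IPv4Address((0b1110 << 28) | ((1 << 28) - 1))

if __name__ == "__main__":
    print(LOWEST, HIGHEST)
    print(is_ipv4_multicast("224.0.0.1"), is_ipv4_multicast("192.168.1.1"))
```

The standard library exposes the same check as the `is_multicast` property of an address object, so the hand-written bit test is only there to make the 1110 rule visible.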

IP-multicast has anonymous membership. The source host multicasts to one of the above IP-multicast addresses; it does not know who will receive the message. The host software maps IP-group addresses into a list of recipients. Then it either multicasts when there is hardware support (e.g., Ethernet and FDDI have hardware multicast) or sends multiple unicasts through the next node in the spanning tree.

One potential problem of multicasting is that too many packets will be traveling and alive in the network. Fortunately, IP packets have a time-to-live (TTL) field that limits the packet's lifetime. Each router decrements the TTL of every packet it forwards by at least one. The packet is discarded when its TTL reaches zero.
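As a rough illustration of how TTL bounds a packet's lifetime, the sketch below counts how many routers forward a packet before it is discarded. It assumes every router decrements TTL by exactly one; `hops_survived` is a hypothetical helper name.

```python
def hops_survived(ttl: int, path_length: int) -> int:
    """Count the routers that forward a packet before its TTL reaches zero."""
    forwarded = 0
    for _ in range(path_length):
        ttl -= 1              # assumed: each router decrements TTL by exactly one
        if ttl <= 0:
            break             # the packet is discarded instead of being forwarded
        forwarded += 1
    return forwarded

if __name__ == "__main__":
    print(hops_survived(ttl=3, path_length=10))
    print(hops_survived(ttl=64, path_length=5))
```

A small initial TTL therefore keeps a multicast session local, while a large one lets it cross more of the network.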

The IP-multicast method described above is based on UDP (not TCP), so as to avoid excessive acknowledgments from multiple receivers for every message. As a result, packets are delivered by "best effort", so reliability is limited.

Internet Group Management Protocol (IGMP). IGMP was designed to help maintain multicast groups. Two special types of IGMP messages are used: Query and Report. Query messages are multicast by routers to all local hosts, to inquire about group membership. Report is used to respond to a query and to join groups.

On receiving a query, members wait for a random time before responding. If a member hears another response, it will not respond. Routers periodically query group membership, and declare themselves group members if they get a response to at least one query. If no responses occur after a while, they declare themselves nonmembers. IGMP version 2 enforces a lower latency, so the membership is pruned more promptly after all members in the group leave.
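The random-delay rule above keeps a router from being flooded with identical Reports. The sketch below simulates one Query on a single LAN under the simplifying assumption that every member hears the first Report instantly; `simulate_query` and `router_state` are hypothetical names, and the delay range is arbitrary.

```python
import random

def simulate_query(members, max_delay=10.0, seed=0):
    """Return the members that actually send a Report for one Query."""
    rng = random.Random(seed)
    # Each member picks a random delay before it would respond.
    timers = {m: rng.uniform(0.0, max_delay) for m in members}
    if not timers:
        return []
    first = min(timers, key=timers.get)
    # Assumed: all other members hear this Report and cancel their timers.
    return [first]

def router_state(reports):
    """The router keeps forwarding for the group only if some Report arrived."""
    return "member" if reports else "nonmember"

if __name__ == "__main__":
    reports = simulate_query(["host-a", "host-b", "host-c"])
    print(reports, router_state(reports))
    print(router_state(simulate_query([])))
```

However many hosts belong to the group, the router sees a single Report per Query in this model, which is all it needs to decide whether to keep forwarding.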

Reliable Multicast Transport. IETF RFC 2357 was an attempt to define criteria for evaluating reliable IP-multicast protocols.

MBone maintains a flat virtual topology and does not provide good route aggregation (at the peak time, MBone had approximately 10,000 routes). Hence, it is not scalable. Moreover, the original design is highly distributed (and simplistic). It assumes no central management, which results in ineffective tunnel management; that is, tunnels connecting islands are not optimally allocated. Sometimes multiple tunnels are created over a single physical link, causing congestion.

RTP (Real-time Transport Protocol)

The original Internet design provided "best-effort" service and was adequate for applications such as e-mail and FTP. However, it is not suitable for real-time multimedia applications. RTP is designed for the transport of real-time data, such as audio and video streams, often for audio- or videoconferencing. It is intended primarily for multicast, although it can also be applied to unicast. It was used, for example, in nv for MBone, Netscape LiveMedia, Microsoft NetMeeting, and Intel Videophone.

RTP usually runs on top of UDP, which provides efficient (but less reliable) connectionless datagram service. There are two main reasons for using UDP instead of TCP. First, TCP is a connection-oriented transport protocol; hence, it is more difficult to scale up in a multicast environment. Second, TCP achieves its reliability by retransmitting missing packets. As mentioned earlier, in multimedia data transmissions, the reliability issue is less important. Moreover, retransmitted data that arrives late may not be usable in real-time applications anyway.

Since UDP will not guarantee that the data packets arrive in the original order (not to mention synchronization of multiple sources), RTP must create its own timestamping and sequencing mechanisms to ensure the ordering. RTP introduces the following additional parameters in the header of each packet:

  • Payload type indicates the media data type as well as its encoding scheme (e.g., PCM, H.261/H.263, MPEG 1, 2, and 4 audio/video, etc.) so the receiver knows how to decode it.

  • Timestamp is the most important mechanism of RTP. The timestamp records the instant when the first octet of the packet is sampled; it is set by the sender. With the timestamps, the receiver can play the audio/video in proper timing order and synchronize multiple streams (e.g., audio and video) when necessary.

  • Sequence number is to complement the function of timestamping. It is incremented by one for each RTP data packet sent, to ensure that the packets can be reconstructed in order by the receiver. This becomes necessary, for example, when all packets of a video frame sometimes receive the same timestamp, and timestamping alone becomes insufficient.

  • Synchronization source (SSRC) ID identifies sources of multimedia data (e.g., audio, video). If the data come from the same source (translator, mixer), they will be given the same SSRC ID, so as to be synchronized.

  • Contributing Source (CSRC) ID identifies the source of contributors, such as all speakers in an audio conference.
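To see why the sequence number is needed alongside the timestamp, consider packets of one video frame that share a timestamp and arrive out of order. The sketch below is illustrative only: packets are modeled as `(sequence_number, timestamp, payload)` tuples, the helper names are hypothetical, and the 16-bit wraparound of real sequence numbers is ignored.

```python
def playout_order(packets):
    """Sort (sequence_number, timestamp, payload) tuples into sending order."""
    return sorted(packets, key=lambda p: p[0])

def missing_sequence_numbers(packets):
    """List sequence numbers absent between the lowest and highest seen."""
    seen = {p[0] for p in packets}
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]

if __name__ == "__main__":
    # Three packets of the same video frame (timestamp 9000) plus a later one.
    arrived = [(12, 9000, b"C"), (10, 9000, b"A"), (14, 12000, b"E"), (11, 9000, b"B")]
    print([p[2] for p in playout_order(arrived)])
    print(missing_sequence_numbers(arrived))
```

Sorting by timestamp alone could not order the first three packets, while the gap in sequence numbers also reveals that one packet was lost.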

The following figure shows the RTP header format. The first 12 octets are of fixed format, followed by optional (0 or more) 32-bit Contributing Source (CSRC) IDs.

Bits 0 and 1 are for the version of RTP, bit 2 (P) for signaling a padded payload, bit 3 (X) for signaling an extension to the header, and bits 4 through 7 for a 4-bit CSRC count that indicates the number of CSRC IDs following the fixed part of the header.

Bit 8 (M) signals the first packet in an audio frame or last packet in a video frame, since an audio frame can be played out as soon as the first packet is received, whereas a video frame can be rendered only after the last packet is received. Bits 9 through 15 describe the payload type, bits 16 through 31 are for the sequence number, followed by a 32-bit timestamp and a 32-bit Synchronization Source (SSRC) ID.
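The bit layout described above maps naturally onto Python's `struct` module. The following sketch packs and unpacks the 12-octet fixed header plus optional CSRC IDs; the function names are hypothetical, the version value 2 is taken from RFC 3550 rather than from the text, and padding and header extensions are left unset.

```python
import struct

def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False, csrcs=()):
    """Build the fixed 12-octet RTP header followed by any CSRC IDs."""
    version = 2                                   # assumed: current RTP version
    byte0 = (version << 6) | len(csrcs)           # V (2 bits), P=0, X=0, CC (4 bits)
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)   # M (1 bit), PT (7 bits)
    header = struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)

def unpack_rtp_header(data):
    """Decode the header fields from the start of an RTP packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    cc = byte0 & 0x0F
    csrcs = struct.unpack("!%dI" % cc, data[12:12 + 4 * cc])
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
    }

if __name__ == "__main__":
    raw = pack_rtp_header(0, 7, 160, 0x1234, marker=True, csrcs=(1, 2))
    print(len(raw), unpack_rtp_header(raw))
```

Network byte order (`!`) is used throughout, so bit 0 of the description corresponds to the most significant bit of the first octet.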

RTP packet header

RTP Control Protocol (RTCP)

The RTP Control Protocol (RTCP) is a sister protocol of the Real-time Transport Protocol (RTP). Its basic functionality and packet structure are defined in the RTP specification, RFC 3550, which supersedes the original 1996 standard (RFC 1889).

RTCP provides out-of-band statistics and control information for an RTP flow. It partners RTP in the delivery and packaging of multimedia data, but does not transport any media streams itself. Typically RTP will be sent on an even-numbered UDP port, with RTCP messages being sent over the next higher odd-numbered port. The primary function of RTCP is to provide feedback on the quality of service (QoS) in media distribution by periodically sending statistics information to participants in a streaming multimedia session.

RTCP gathers statistics for a media connection and information such as transmitted octet and packet counts, lost packet counts, jitter, and round-trip delay time. An application may use this information to control quality of service parameters, perhaps by limiting flow, or using a different codec.
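Two of these statistics are easy to compute by hand. The sketch below uses the running interarrival jitter estimate from RFC 3550, J = J + (|D| - J)/16, where D is the change in transit time between consecutive packets, together with a simple loss fraction. The function names and the sample times are illustrative assumptions, and times are in arbitrary but consistent units.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate: J += (|D| - J) / 16 for each new packet."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D: difference in spacing at the receiver versus at the sender.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter

def fraction_lost(expected, received):
    """Share of expected packets that never arrived."""
    if expected == 0:
        return 0.0
    return max(expected - received, 0) / expected

if __name__ == "__main__":
    sent = [0, 20, 40]
    print(interarrival_jitter(sent, [5, 25, 45]))   # constant network delay
    print(interarrival_jitter(sent, [5, 25, 61]))   # third packet delayed
    print(fraction_lost(100, 95))
```

The 1/16 gain smooths the estimate so that a single late packet nudges the reported jitter rather than dominating it.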

RTCP itself does not provide any flow encryption or authentication methods. Such mechanisms may be implemented, for example, with the Secure Real-time Transport Protocol (SRTP) defined in RFC 3711.

RTCP is a companion protocol of RTP. It monitors QoS in providing feedback to the server (sender) on quality of data transmission and conveys information about the participants of a multiparty conference. RTCP also provides the necessary information for audio and video synchronization, even if they are sent through different packet streams.

The five types of RTCP packets are as follows.

  • Receiver report (RR) provides quality feedback (number of last packet received, number of lost packets, jitter, timestamps for calculating round-trip delays).

  • Sender report (SR) provides information about the reception of RR, number of packets/bytes sent, and so on.

  • Source description (SDES) provides information about the source (e-mail address, phone number, full name of the participant).

  • Bye indicates the end of participation.

  • Application specific functions (APP) provides for future extension of new features.

RTP and RTCP packets are sent to the same IP address (multicast or unicast) but on different ports.
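A receiver tells the five packet types apart by the packet-type octet in the RTCP common header. The sketch below decodes that 4-octet header; the numeric type codes (200 to 204) and the length encoding come from RFC 3550 rather than from the text above, and `parse_rtcp_common_header` is a hypothetical name.

```python
import struct

# Packet-type codes as assigned in RFC 3550.
RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}

def parse_rtcp_common_header(data):
    """Decode version, padding, count, type and length from an RTCP packet."""
    byte0, packet_type, length_words = struct.unpack("!BBH", data[:4])
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "count": byte0 & 0x1F,                     # e.g. number of report blocks
        "type": RTCP_TYPES.get(packet_type, "unknown"),
        # The length field counts 32-bit words minus one.
        "length_bytes": (length_words + 1) * 4,
    }

if __name__ == "__main__":
    # An empty receiver report: header plus the reporter's SSRC.
    empty_rr = bytes([0x80, 201, 0x00, 0x01]) + (0x1234).to_bytes(4, "big")
    print(parse_rtcp_common_header(empty_rr))
```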

Resource Reservation Protocol (RSVP)

The Resource Reservation Protocol (RSVP) is a Transport Layer protocol designed to reserve resources across a network for an integrated services Internet. RSVP operates over an IPv4 or IPv6 Internet Layer and provides receiver-initiated setup of resource reservations for multicast or unicast data flows with scaling and robustness. It does not transport application data but is similar to a control protocol, like ICMP or IGMP. RSVP is described in RFC 2205.

RSVP can be used by either hosts or routers to request or deliver specific levels of quality of service (QoS) for application data streams or flows. RSVP defines how applications place reservations and how they can relinquish the reserved resources once the need for them has ended. RSVP operation will generally result in resources being reserved in each node along a path.

RSVP is not a routing protocol and was designed to interoperate with current and future routing protocols. RSVP by itself is rarely deployed in telecommunications networks, but its traffic engineering extension, RSVP-TE, has become more widely accepted in many QoS-oriented networks. Next Steps in Signaling (NSIS) was proposed as a replacement for RSVP.

A scenario of network resource reservation with RSVP: (a) senders S1 and S2 send out their PATH messages to receivers R1, R2, and R3; (b) receiver R1 sends out RESV message to S1; (c) receiver R2 sends out RESV message to S2; (d) receivers R2 and R3 send out their RESV messages to S1

The main challenges of RSVP are that many senders and receivers may compete for the limited network bandwidth, the receivers can be heterogeneous in demanding different contents with different QoS, and they can be dynamic by joining or quitting multicast groups at any time.

The most important messages of RSVP are Path and Resv. A Path message is initiated by the sender and travels towards the multicast (or unicast) destination addresses. It contains information about the sender and the path (e.g., the previous RSVP hop), so the receiver can find the reverse path to the sender for resource reservation. A Resv message is sent by a receiver that wishes to make a reservation.

RSVP is receiver-initiated. A receiver (at a leaf of the multicast spanning tree) initiates the reservation request Resv, and the request travels back toward the sender but not necessarily all the way. A reservation will be merged with an existing reservation made by other receiver(s) for the same session as soon as they meet at a router. The merged reservation will accommodate the highest bandwidth requirement among all merged requests. The user-initiated scheme is highly scalable, and it meets users' heterogeneous needs.
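The merging rule can be sketched as a recursive maximum over the multicast tree. The topology, node names, and bandwidth figures below are hypothetical and chosen only for illustration; `upstream_request` is not an RSVP API.

```python
def upstream_request(node, children, leaf_requests):
    """Bandwidth a node must request upstream for one session.

    Receivers (leaves) ask for their own requirement; a router asks for the
    highest requirement among the requests merged at it.
    """
    if node in leaf_requests:
        return leaf_requests[node]
    return max(upstream_request(c, children, leaf_requests) for c in children[node])

if __name__ == "__main__":
    # Hypothetical tree for one session: router A feeds R1 and router C,
    # and router C feeds R2 and R3.
    children = {"A": ["R1", "C"], "C": ["R2", "R3"]}
    requests_kbps = {"R1": 128, "R2": 256, "R3": 64}
    print(upstream_request("C", children, requests_kbps))
    print(upstream_request("A", children, requests_kbps))
```

Because each router forwards only the merged maximum, the sender's side of the tree never sees more than one reservation per session regardless of how many receivers join.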

RSVP creates only soft state. The receiver host must maintain the soft state by periodically sending the same Resv message; otherwise, the state will time out. There is no distinction between the initial message and any subsequent refresh message. If there is any change in reservation, the state will automatically be updated according to the new reservation parameters in the refreshing message. Hence, the RSVP scheme is highly dynamic.
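Soft state can be modeled as a table whose entries expire unless refreshed. In the sketch below the class name, the 90-unit lifetime, and the explicit `now` argument (used in place of a real clock) are all assumptions made for illustration.

```python
class SoftStateTable:
    """Reservation table in which every entry times out unless refreshed."""

    def __init__(self, lifetime):
        self.lifetime = lifetime
        self.entries = {}          # session -> (bandwidth, expiry time)

    def resv(self, session, bandwidth, now):
        # Initial and refresh messages are handled identically: the entry is
        # (re)written with the latest parameters and a new expiry time.
        self.entries[session] = (bandwidth, now + self.lifetime)

    def bandwidth(self, session, now):
        """Return the reserved bandwidth, or None once the state timed out."""
        entry = self.entries.get(session)
        if entry is None or entry[1] <= now:
            self.entries.pop(session, None)
            return None
        return entry[0]

if __name__ == "__main__":
    table = SoftStateTable(lifetime=90)
    table.resv("session-1", 100, now=0)
    table.resv("session-1", 200, now=60)       # refresh that also changes the rate
    print(table.bandwidth("session-1", now=120))
    print(table.bandwidth("session-1", now=200))
```

No explicit teardown is needed in this model: a receiver that leaves simply stops refreshing and the reservation disappears.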

The above figure depicts a simple network with two senders (S1, S2), three receivers (R1, R2, and R3), and four routers (A, B, C, D). Part (a) of the figure shows that S1 and S2 send Path messages along their paths to R1, R2, and R3. In (b) and (c), R1 and R2 send out Resv messages to S1 and S2, respectively, to make reservations for S1 and S2 resources. From C to A, two separate channels must be reserved since R1 and R2 requested different datastreams. In (d), R2 and R3 send out their Resv messages to S1, to make additional requests. R3's request was merged with R1's previous request at A, and R2's was merged with R1's at C.

Any possible variation of QoS that demands higher bandwidth can be dealt with by modifying the reservation state parameters.

Real-Time Streaming Protocol (RTSP)

The Real-Time Streaming Protocol (RTSP) is a network control protocol designed for use in entertainment and communications systems to control streaming media servers. The protocol is used for establishing and controlling media sessions between end points. Clients of media servers issue VCR-like commands, such as play and pause, to facilitate real-time control of playback of media files from the server.

The transmission of streaming data itself is not a task of RTSP. Most RTSP servers use the Real-time Transport Protocol (RTP) in conjunction with the RTP Control Protocol (RTCP) for media stream delivery; however, some vendors implement proprietary transport protocols. The RTSP server from RealNetworks, for example, also features RealNetworks' proprietary Real Data Transport (RDT).

RTSP was developed by the Multiparty Multimedia Session Control Working Group (MMUSIC WG) of the Internet Engineering Task Force (IETF) and published as RFC 2326 in 1998. Using RTSP together with RTP and RTCP allows rate adaptation to be implemented.

Streaming Audio and Video. In the early days, multimedia data was transmitted over the network (often with slow links) as a whole large file, which would be saved to a disk, then played back. Nowadays, more and more audio and video data is transmitted from a stored media server to the client in a datastream that is almost instantly decoded: streaming audio and streaming video.

Usually, the receiver will set aside buffer space to prefetch the incoming stream. As soon as the buffer is filled to a certain extent, the (usually) compressed data will be uncompressed and played back. Clearly, the buffer space needs to be sufficiently large to deal with the possible jitter and to produce continuous, smooth playback. On the other hand, too large a buffer will introduce unnecessary initial delay, which is especially undesirable for interactive applications such as audio- or videoconferencing.
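The trade-off between smooth playback and initial delay can be seen in a small simulation. The sketch below assumes packets arrive in order, that playback starts once `prefetch` packets are buffered, and that one packet is consumed every `interval` time units afterward; the function name and the arrival times are illustrative.

```python
def simulate_playout(arrival_times, interval, prefetch):
    """Return (start_time, underruns) for a simple prefetching playout buffer."""
    # Playback begins when the prefetch-th packet has arrived.
    start = arrival_times[prefetch - 1]
    underruns = 0
    for i, arrival in enumerate(arrival_times):
        deadline = start + i * interval      # moment packet i is due for playback
        if arrival > deadline:
            underruns += 1                   # the packet was not there in time
    return start, underruns

if __name__ == "__main__":
    arrivals = [0, 20, 45, 60, 95, 100]      # nominal spacing 20, with jitter
    print(simulate_playout(arrivals, interval=20, prefetch=1))
    print(simulate_playout(arrivals, interval=20, prefetch=2))
```

With these sample arrivals, buffering one extra packet delays the start by one interval but removes every gap in playback.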

A possible scenario of RTSP operations

The RTSP Protocol. RTSP is for communication between a client and a stored media server. The above figure illustrates a possible scenario of four RTSP operations:

  1. Requesting presentation description. The client issues a DESCRIBE request to the Stored Media Server to obtain the presentation description, such as media types (audio, video, graphics, etc.), frame rate, resolution, codec, and so on.

  2. Session setup. The client issues a SETUP to inform the server of the destination IP address, port number, protocols, and TTL (for multicast). The session is set up when the server returns a session ID.

  3. Requesting and receiving media. After receiving a PLAY, the server starts to transmit streaming audio/video data, using RTP. It is followed by a RECORD or PAUSE. Other VCR commands, such as FAST-FORWARD and REWIND, are also supported. During the session, the client periodically sends an RTCP packet to the server, to provide feedback information about the QoS received.

  4. Session closure. TEARDOWN closes the session.
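RTSP requests are plain text, much like HTTP. The four operations above could be written out as in the sketch below, which follows the RTSP/1.0 request syntax of RFC 2326; the URL, port numbers, and session ID are made-up placeholders, and `rtsp_request` and `session_script` are hypothetical helper names.

```python
def rtsp_request(method, url, cseq, headers=None):
    """Format one RTSP/1.0 request as text (no body)."""
    lines = ["%s %s RTSP/1.0" % (method, url), "CSeq: %d" % cseq]
    for name, value in (headers or {}).items():
        lines.append("%s: %s" % (name, value))
    return "\r\n".join(lines) + "\r\n\r\n"

def session_script(url, session_id):
    """The four operations in order: describe, set up, play, tear down."""
    return [
        rtsp_request("DESCRIBE", url, 1, {"Accept": "application/sdp"}),
        # Placeholder client ports: RTP on 5004, RTCP on 5005.
        rtsp_request("SETUP", url, 2,
                     {"Transport": "RTP/AVP;unicast;client_port=5004-5005"}),
        # In practice the session ID is the one returned by the server.
        rtsp_request("PLAY", url, 3, {"Session": session_id}),
        rtsp_request("TEARDOWN", url, 4, {"Session": session_id}),
    ]

if __name__ == "__main__":
    for request in session_script("rtsp://media.example.com/clip", "12345678"):
        print(request)
```

Only the control messages are shown; the media itself would flow separately over RTP once PLAY is accepted.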

Internet Telephony

The Public Switched Telephone Network (PSTN) relies on copper wires carrying analog voice signals. It provides reliable and low-cost voice and facsimile services. In the eighties and nineties, modems were a popular means of "data over voice networks". In fact, they were predominant before the introduction of ADSL and cable modems.

As PCs and the Internet became readily available and more and more voice and data communications became digital (e.g., in ISDN), "voice over data networks," especially Voice over IP (VoIP), started to attract a great deal of interest in research and user communities. With ever-increasing network bandwidth and the ever-improving quality of multimedia data compression, Internet telephony has become a reality. Increasingly, it is not restricted to voice (VoIP); it is about integrated voice, video, and data services.

The main advantages of Internet telephony over the Plain Old Telephone Service (POTS) are the following:

  • It provides great flexibility and extensibility in accommodating integrated services such as voicemail, audio- and videoconferences, mobile phone, and so on.

  • It uses packet switching, not circuit switching; hence, network usage is much more efficient (voice communication is bursty and VBR-encoded).

  • With the technologies of multicast or multipoint communication, multiparty calls are not much more difficult than two-party calls.

  • With advanced multimedia data-compression techniques, various degrees of QoS can be supported and dynamically adjusted according to the network traffic, an improvement over the "all or none" service in POTS.

  • Good graphical user interfaces can be developed to show available features and services, monitor call status and progress, and so on.

As the following figure shows, the transport of real-time audio (and video) in Internet telephony is supported by RTP (whose control protocol is RTCP). Streaming media is handled by RTSP and Internet resource reservation is taken care of by RSVP.

Internet telephony is not simply a streaming media service over the Internet, because it requires a sophisticated signaling protocol. A streaming media server can be readily identified by a URI (Uniform Resource Identifier), whereas acceptance of a call via Internet telephony depends on the callee's current location, capability, availability, and desire to communicate. The following are brief descriptions of the H.323 standard and one of the most commonly used signaling protocols, Session Initiation Protocol (SIP).

Network protocol structure for internet telephony

H.323. H.323 is a standard for packet-based multimedia communication services over networks (LAN, Internet, wireless network, etc.) that do not provide a guaranteed QoS. It specifies signaling protocols and describes terminals, multipoint control units (for conferencing), and gateways for integrating Internet telephony with General Switched Telephone Network (GSTN) data terminals.

The H.323 signaling process consists of two phases:

  • Call setup. The caller sends the gatekeeper (GK) a Registration, Admission and Status (RAS) Admission Request (ARQ) message, which contains the name and phone number of the callee. The GK may either grant permission or reject the request, with reasons such as "security violation" and "insufficient bandwidth".

  • Capability exchange. An H.245 control channel will be established, for which the first step is to exchange capabilities of both the caller and callee, such as whether it is audio, video, or data; compression and encryption, and so on.

H.323 provides mandatory support for audio and optional support for data and video. It is associated with a family of related software standards that deal with call control and data compression for Internet telephony. Following are some of the related standards:

Signaling and Control

  • H.225. Call control protocol, including signaling, registration, admissions, packetization and synchronization of media streams

  • H.245. Control protocol for multimedia communications, for example, opening and closing channels for media streams, obtaining gateway between GSTN and Internet telephony

  • H.235. Security and encryption for H.323 and other H.245-based multimedia terminals

Audio Codecs

  • G.711. Codec for 3.1 kHz audio over 48, 56, or 64 kbps channels. G.711 describes Pulse Code Modulation for normal telephony
  • G.722. Codec for 7 kHz audio over 48, 56, or 64 kbps channels
  • G.723.1. Codec for 3.1 kHz audio over 5.3 or 6.3 kbps channels. (The VoIP Forum adopted G.723.1 as the codec for VoIP.)
  • G.728. Codec for 3.1 kHz audio over 16 kbps channels
  • G.729, G.729a. Codec for 3.1 kHz audio over 8 kbps channels. (The Frame Relay Forum adopted G.729 as the codec for voice over frame relay.)
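The codec rates listed above are payload rates only; when voice is carried over RTP/UDP/IP, per-packet headers add to the total. The sketch below estimates the IP-level bandwidth under stated assumptions: 20 ms of audio per packet, IPv4 without options (20 octets), UDP (8 octets), the fixed RTP header (12 octets), and no link-layer overhead. The function name and the 20 ms default are illustrative choices.

```python
HEADER_BYTES = 20 + 8 + 12   # assumed: IPv4 + UDP + fixed RTP header

def ip_bandwidth_kbps(codec_kbps, packet_ms=20):
    """Estimate the IP-level bit rate of a voice stream in kbps."""
    payload_bytes = codec_kbps * 1000 / 8 * packet_ms / 1000
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_second / 1000

if __name__ == "__main__":
    for name, rate in [("G.711", 64), ("G.728", 16), ("G.729", 8)]:
        print(name, rate, ip_bandwidth_kbps(rate))
```

Under these assumptions the header overhead is a fixed 16 kbps per stream, so it weighs far more heavily on the low-bitrate codecs than on G.711.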

Video Codecs

  • H.261. Codec for video at p × 64 kbps (p ≥ 1)

  • H.263. Codec for low-bitrate video (< 64 kbps) over the GSTN

Related Standards

  • H.320. The original standard for videoconferencing over ISDN networks

  • H.324. An extension of H.320 for video conferencing over the GSTN

  • T.120. Real-time data and conferencing control

Session Initiation Protocol (SIP): A Signaling Protocol. SIP is an application-layer control protocol in charge of establishing and terminating sessions in Internet telephony. These sessions are not limited to VoIP communications; they also include multimedia conferences and multimedia distribution.

Similar to HTTP, SIP is a text-based protocol, in contrast to H.323. It is also a client-server protocol. A caller (the client) initiates a request, which a server processes and responds to. There are three types of servers. A proxy server and a redirect server forward call requests. The difference between the two is that the proxy server forwards the requests to the next-hop server, whereas the redirect server returns the address of the next-hop server to the client, so as to redirect the call toward the destination.

The third type is a location server, which finds current locations of users. Location servers usually communicate with the redirect or proxy servers. They may use finger, rwhois, Lightweight Directory Access Protocol (LDAP), or other multicast-based protocols to determine a user's address.

SIP can advertise its session using e-mail, news groups, web pages or directories, or the Session Announcement Protocol (SAP), a multicast protocol.

The methods (commands) for clients to invoke are

  • INVITE — invites callee(s) to participate in a call.
  • ACK — acknowledges the invitation.
  • OPTIONS — inquires about media capabilities without setting up a call.
  • CANCEL — terminates the invitation.
  • BYE — terminates a call.
  • REGISTER — sends user's location information to a registrar (a SIP server).
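Because SIP is text-based, a request built from one of the methods above looks much like an HTTP request. The following is a deliberately reduced sketch: the addresses and Call-ID are placeholders, `sip_request` is a hypothetical helper, and header fields that a real SIP stack requires (such as Via, Max-Forwards, and tags) are omitted for brevity.

```python
SIP_METHODS = ("INVITE", "ACK", "OPTIONS", "CANCEL", "BYE", "REGISTER")

def sip_request(method, target, caller, call_id, cseq, body=""):
    """Format a simplified SIP/2.0 request as text."""
    if method not in SIP_METHODS:
        raise ValueError("unsupported method: %s" % method)
    lines = [
        "%s sip:%s SIP/2.0" % (method, target),
        "From: <sip:%s>" % caller,
        "To: <sip:%s>" % target,
        "Call-ID: %s" % call_id,
        "CSeq: %d %s" % (cseq, method),
        "Content-Length: %d" % len(body.encode("utf-8")),
    ]
    return "\r\n".join(lines) + "\r\n\r\n" + body

if __name__ == "__main__":
    print(sip_request("INVITE", "callee@example.org", "caller@example.com",
                      "call-0001", 1))
    print(sip_request("BYE", "callee@example.org", "caller@example.com",
                      "call-0001", 2))
```

The same Call-ID ties the INVITE and the later BYE to one call, while CSeq orders the requests within it.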

A possible scenario of SIP session initiation

The above figure illustrates a possible scenario when a caller initiates a SIP session:

  • Step 1. Caller sends an INVITE to the local proxy server P1.
  • Step 2. The proxy uses its Domain Name Service (DNS) to locate the server for the callee's address and sends the request to it.
  • Steps 3, 4. The callee, John, is not logged on to that server. A request is sent to the nearby location server, and John's current address is located.
  • Step 5. Since the server is a redirect server, it returns that address to the proxy server P1.
  • Step 6. P1 tries the next proxy server, P2, for John's current address.
  • Steps 7, 8. P2 consults its location server and obtains John's local address (john_doe).
  • Steps 9, 10. The next-hop proxy server P3 is contacted, which in turn forwards the invitation to where the client (callee) is.
  • Steps 11-14. John accepts the call at his current location (at work) and the acknowledgments are returned to the caller.

SIP can also use Session Description Protocol (SDP) to gather information about the callee's media capabilities.

Session Description Protocol (SDP). As its name suggests, SDP describes multimedia sessions. As in SIP, SDP descriptions are in textual form. They include the number and types of media streams (audio, video, whiteboard session, etc.), destination address (unicast or multicast) for each stream, sending and receiving port numbers, and media formats (payload types). When initiating a call, the caller includes the SDP information in the INVITE message. The called party responds and sometimes revises the SDP information, according to its capability.
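A session description of the kind just outlined is a short list of `type=value` lines. The sketch below builds a minimal offer, reads back its media lines, and lets the called party keep only the payload types it supports. The address, ports, and helper names are placeholders; payload types 0 (PCM, G.711 µ-law), 18 (G.729), and 31 (H.261) are the static RTP/AVP assignments. As a simplification, an unsupported stream is dropped here, whereas a real SDP answer keeps it and marks it rejected.

```python
def build_sdp(address, media):
    """Build a minimal SDP description; media is [(kind, port, [payload types])]."""
    lines = ["v=0", "o=- 0 0 IN IP4 %s" % address, "s=call",
             "c=IN IP4 %s" % address, "t=0 0"]
    for kind, port, payload_types in media:
        formats = " ".join(str(p) for p in payload_types)
        lines.append("m=%s %d RTP/AVP %s" % (kind, port, formats))
    return "\r\n".join(lines) + "\r\n"

def parse_media(sdp):
    """Extract (kind, port, [payload types]) from the m= lines."""
    result = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            kind, port, _proto, *formats = line[2:].split()
            result.append((kind, int(port), [int(f) for f in formats]))
    return result

def revise(media, supported):
    """Called party keeps only streams and payload types it can handle."""
    revised = []
    for kind, port, payload_types in media:
        kept = [p for p in payload_types if p in supported]
        if kept:
            revised.append((kind, port, kept))
    return revised

if __name__ == "__main__":
    offer = build_sdp("192.0.2.10", [("audio", 49170, [0, 18]), ("video", 51372, [31])])
    print(offer)
    print(revise(parse_media(offer), supported={0}))
```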

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd