
Chapter 23: Conferencing on the Internet

Shared by: Nguyễn Văn Chiến | File type: PDF | Pages: 16




Conferencing involves communication among several users. Multimedia conferencing, including audio, video, instant messaging, whiteboard sharing, and file transfer, is a popular service on the Internet and in enterprises. Chat rooms where users exchange instant messages are an example of a conference service on the Internet. The collaboration tools used in most enterprises are also examples of conferences. Thus, conferences are not limited to traditional unmoderated audio or video conferences. They can include all types of media and can be moderated by using floor control mechanisms.

Conferencing is an important area for enterprises with employees working in different countries. A conference system that includes collaboration tools can save considerable money and time by reducing the need for face-to-face meetings to which attendees must travel great distances. However, we are still far from having conference systems that can replace face-to-face meetings completely. That is why there is much ongoing research in areas such as telepresence and virtual reality. The goal is to make virtual interactions as close to real ones as possible.

23.1 Conferencing Standardization at the IETF

In the past, working groups such as MMUSIC did some work on conferencing (e.g., SDP was designed with multiparty sessions in mind). More recently, the working groups active in this area have been SIPPING and XCON. Implementers sometimes find it confusing to have similar specifications in the same area coming from two different working groups. Knowing the history behind conferencing standardization at the IETF helps readers understand how the specifications from the two working groups relate to each other.

Initially, the SIPPING working group developed a set of specifications that described how to provide conferencing services using SIP.
Coming from the SIPPING working group, these specifications were, unsurprisingly, very much focused on SIP. Pieces needed to build a complete conference service, such as floor control and conference management mechanisms (beyond the simple ones SIP provides), were out of the scope of this work.

The XCON working group was chartered to work on generalizing the work done in SIPPING so that different signaling protocols (not only SIP) could be used, and to specify those missing pieces needed to build a complete conference system. The charter was limited to centralized conferences, where clients connect to a central server following a star topology.

The 3G IP Multimedia Subsystem (IMS): Merging the Internet and the Cellular Worlds, Third Edition. Gonzalo Camarillo and Miguel A. García-Martín. © 2008 John Wiley & Sons, Ltd. ISBN: 978-0-470-51662-1
Conferences using different topologies, such as full-meshed and cascaded conferences, were left out of scope.

The results of the work of these two working groups include two conferencing frameworks: the SIPPING conferencing framework and the XCON conferencing framework. We discuss both of them, their differences, and how they relate to each other.

23.2 The SIPPING Conferencing Framework

The SIPPING conferencing framework (specified in RFC 4353 [272]) describes three conferencing models: loosely coupled, fully distributed, and tightly coupled.

In the loosely-coupled conferencing model, shown in Figure 23.1, media streams are multicast. Conference participants join the multicast group of the conference using, for example, IGMP (Internet Group Management Protocol, specified in RFC 3376 [95]) in order to receive media. Conference participants do not typically have any signaling relationship between them. Still, they can use SIP to invite new participants into the conference. A SIP INVITE request sent to a new participant would contain (in its body) all information needed to join the multicast group.

Figure 23.1: The loosely-coupled conference model

In the fully-distributed conferencing model, shown in Figure 23.2, each participant has a signaling relationship with all of the other participants in the conference. Each participant sends media to all of the other participants.

In the tightly-coupled conferencing model, shown in Figure 23.3, each participant has a signaling relationship with a central conference server. The central conference server mixes the media received from different participants and distributes it to all of them.

Of course, the three conferencing models just described are not the only models that can be implemented with SIP. Many other variants are possible.
For example, when the central conference server in a tightly-coupled conference is distributed among several SIP nodes, the resulting model is typically referred to as the cascaded conferencing model. In any case, the SIPPING conferencing framework focuses on the tightly-coupled conferencing model; the other models are considered out of its scope.
Figure 23.2: The fully-distributed conference model

Figure 23.3: The tightly-coupled conference model

23.2.1 Signaling Architecture

Figure 23.4 shows the signaling architecture proposed by the SIPPING conferencing framework. The conference server consists of several logical functions: the conference policy, the conference policy server, and the focus, which includes the conference notification service.

Figure 23.4: Signaling architecture in the SIPPING framework

The conference policy is the set of rules that define a conference. It includes information about the participants of the conference, the time and date when the conference will take place, the media streams the conference has, etc. Participants manipulate the conference policy (e.g., to add a video stream to an audio-only conference) through the conference policy server. The protocol between participants and the conference policy server is left unspecified.

The focus interacts with the conference participants using SIP. It acts as a user agent towards all of the participants. The focus includes the conference notification service, which provides participants with information about the conference using the SIP event package for conference state (specified in RFC 4575 [289]). This event package defines an XML-based format to convey conference-related information.

Figure 23.5 shows an example of a document that uses this format. This document, which is mostly self-explanatory, describes a conference and provides information about two of its participants: Bob and Alice. Bob was kicked out of the conference because he experienced bad voice quality, and Alice was brought into the conference by Mike. Note that even though the number of participants in the conference is 33 (see the <user-count> element), the document only provides detailed information about two of them (Bob and Alice). Conferencing servers can omit information about certain users for policy reasons.

The XML document in Figure 23.5 is already fairly long, even though it only carries information about two users. A document describing a large conference with many users would be much longer. In principle, every time a small change occurs in the conference (e.g., one user leaves the conference), the conference notification service would need to send a new large XML document that would be very similar to the last one it sent (e.g., the only difference would be in the elements related to the user that left). This would result in inefficient use of bandwidth.
In order to avoid this situation, the SIP event package for conference state implements a mechanism for partial notifications. The “state” attribute indicates whether an element carries full or partial information. In addition, the “state” attribute can also indicate that an element has been deleted. Accordingly, the “state” attribute can take one of the following values: full, partial, or deleted. An element whose “state” attribute has a value of partial carries only the information that has changed since the previous document was sent to the participant. If a parent element has a “state” of full, all of its child elements should also have a “state” of full. On the other hand, if a parent element has a “state” of partial, its child elements can have any “state”. The default value for the “state” attribute is full. The only elements that can carry a “state” attribute are <conference-info>, <users>, <user>, <endpoint>, <sidebars-by-val>, and <sidebars-by-ref>. In Figure 23.5, all of the “state” attributes have a value of full.

Figure 23.5: Example of an XML-based conference description (part 1). Values in the document: Agenda: This month’s goals; http://sharepoint/salesgroup/; web-page; 33; Bob Hoskins; Bob’s Laptop; disconnected; departed; 2005-03-04T20:00:00Z; bad voice quality; sip:mike@example.com; main audio; audio; 34567; 432424; sendrecv.

Figure 23.6: Example of an XML-based conference description (part 2). Values in the document: Alice; connected; dialed-out; 2005-03-04T20:00:00Z; sip:mike@example.com; main audio; audio; 34567; 534232; sendrecv.

23.2.2 Media Architecture

The SIPPING conferencing framework describes the following media plane realizations: centralized server, endpoint server, media server component, distributed mixing, and cascaded mixers.

In the centralized-server model, a central server handles both signaling and media, as shown in Figure 23.3.

In the endpoint-server model, one of the endpoints behaves as the central server in the centralized-server model, as shown in Figure 23.7. The endpoint-server model is typically the result of a two-party call between two endpoints that transitions into an ad-hoc conference. This is the case when the users involved in the original two-party call decide to bring one or more additional users into the call at some point.
Figure 23.7: The endpoint-server model

The endpoint-server model works well when the endpoint performing the mixing does not have processing, bandwidth, or battery constraints. Conferences between endpoints with those constraints are better handled by a central server.

In the media-server-component model, the central server of the centralized-server model is divided into two servers: an application server and a mixing server. The application server interacts with the conference participants but does not have mixing capabilities. The mixing server performs the actual media mixing. The interface between the application server and the mixing server is based on SIP. The application server can use SIP mechanisms such as third-party call control (specified in RFC 3725 [282]) to instruct the mixing server how to mix the conference’s media streams.

The SIPPING conferencing framework does not cover distributed conference servers that use a protocol other than SIP (e.g., H.248 [189]) between the server handling SIP signaling (i.e., hosting the focus) and the server performing the mixing. However, this model can be considered a special case of the centralized-server model in which the internal structure of the server is distributed.

In the distributed-mixing model, the central server of the centralized-server model handles signaling but not media. The central server does not have any media mixing capabilities; instead, it instructs users to exchange media among themselves. In this model, the conference server is, effectively, a third-party call controller (as specified in RFC 3725 [282]). Figure 23.8 shows how, in this model, media can be exchanged using unicast or multicast.

In the cascaded-mixers model, the mixing functionality is distributed among several physical mixers. The central server handling the signaling of the conference coordinates all of the mixers so that all users receive the conference’s media correctly.
23.3 The XCON Conferencing Framework

As discussed earlier, the XCON working group was chartered to generalize the work on conferencing performed in SIPPING, which was specific to SIP. The XCON framework (specified in the Internet-Draft “A Framework and Data Model for Centralized Conferencing” [82]) defines the conferencing architecture shown in Figure 23.9. This figure shows a conference system able to host several conferences. That is why the figure shows more than one conference object.

Figure 23.8: The distributed-mixing model

23.3.1 Conference Objects

A conference object contains all of the information related to a given conference. It is the same concept as the conference policy in the SIPPING conferencing framework (see Figure 23.4) with a different name. Figure 23.5 shows an example of the XML-based format to describe conference policies developed by the SIPPING working group (which is specified in RFC 4575 [289]). The XCON working group extended this format so that it can be used to describe more general conferences (i.e., not only SIP-based conferences) and to provide more information about a given conference (e.g., floor-control-related information was missing from the original format and was added by the XCON working group). The resulting format is referred to as the XCON data model (which is specified in the Internet-Draft “Conference Information Data Model for Centralized Conferencing (XCON)” [224]).

The improvements in the XCON data model, with respect to the original format defined by the SIPPING working group, include the ability to carry different types of URIs and the inclusion of information that relates to floor control, conference scheduling, and media controls (e.g., a control to mute a media stream).

In order to create a conference, it is necessary to create its conference object. The initial values for the variables of a conference object are typically taken from a conference blueprint. A conference blueprint is a template to create conference objects. For example, a conference system may have a conference blueprint with the typical values to create an audio-only conference.

Figure 23.9: XCON architecture

23.3.2 Conference Control Server

Users can manipulate conference objects and, thus, the properties of any conference, using a conference control protocol. Such a protocol runs between the participant’s conference control client and the conference control server. The XCON working group is chartered to develop a conference control protocol. We expect this working group to specify such a protocol in the future.

One of the main decisions concerning this protocol is whether it should follow a semantic approach or a syntactic approach. A semantically-oriented protocol would have primitives to perform conference-related operations such as creating a conference, adding a user to a conference, and removing a media stream from a conference. Such primitives would have an effect on a conference object (which is described by an XML document). For example, the creation of a conference would create a new conference object. The addition of a user would add a new element to the XML document describing the conference object.

A syntactically-oriented protocol would have primitives to operate directly at the XML level. For example, in order to add a user to a conference, the protocol would directly instruct the conference control server to add a <user> element to the XML document describing the conference object.

Both approaches have advantages and disadvantages. A syntactically-oriented protocol may initially be more complex, since it would need to provide general XML manipulation mechanisms. On the other hand, it would not need to be extended in order to manipulate new data model elements that may be defined in the future. A semantically-oriented protocol may initially be simpler and, in general, more efficient, but would need to be extended in order to perform new operations. Specifying policies (e.g., only the moderator can add new users to the conference) seems to be easier if the semantic approach is followed.

The XCON working group started working on an XCAP-based protocol that followed a syntactic approach. However, that protocol was abandoned and, at present, it seems that the conference control protocol to be developed by XCON will follow a semantic approach.

23.3.3 Foci and Notification Service

As in the SIPPING conferencing framework, an XCON focus has a signaling relationship with the user agents in the conference. However, in the XCON framework, a conference can have multiple foci, each one handling a different protocol (e.g., SIP and H.323).

In the SIPPING framework, both the focus and the notification service used SIP and, thus, were part of the same logical entity. The XCON framework separates them into two different logical entities because they can use different protocols.

As discussed earlier, the XCON data model extends the XML-based format used by the SIPPING notification service (which is specified in RFC 4575 [289]). The XCON notification service needs to be able to use this extended format (i.e., the XCON data model) in its notifications.
An extension to the SIP event package for conference state (also specified in RFC 4575 [289]) has been defined so that the event package can carry information in the format specified in the XCON data model (this extension is specified in the Internet-Draft “Conference Event Package Data Format Extension for Centralized Conferencing (XCON)” [113]).

23.3.4 Floor Control Server

Floor control is used to manage the access to a shared resource. Examples of resources in a conferencing environment are a shared whiteboard, a video stream, and a voice stream. The user that has the floor corresponding to a resource at a given moment is allowed to access the resource. For example, the user that has the floor corresponding to a shared whiteboard is allowed to draw on the whiteboard.

It is important to note the difference between not being allowed to do something and actually being kept from doing it. Let us think of a face-to-face conference where all participants have their own microphone. The conference’s chair will indicate which participant can speak (e.g., to ask a question) at a given time. However, the chair does not need to manage access to the microphones. If the participants are polite enough, they will only talk into their microphones when they are told to by the chair. However, if participants start talking when it is not their turn, the chair may have to disable all of the microphones except the one of the participant that has the floor at any given time.

Therefore, the fact that a conference uses floor control does not imply that floor-control-related decisions are enforced in any way. They may or may not be enforced, depending on the environment.
In XCON, the enforcement of floor-control-related decisions is outside the scope of floor control. That is, the floor control server uses a floor control protocol to communicate with its clients. However, if the floor control server wants to enforce its decisions, it will use a different protocol. For example, the floor control server could use H.248 to instruct the conference’s mixer to ignore incoming media from participants that do not hold the floor.

A conference can have multiple floors. Each of them can control the access to a different resource within the conference. A floor control server can have different policies regarding multiple floors. For example, in a video conference, the floor control server can grant the video floor to the participant holding the audio floor (i.e., video follows the speaker). However, if the current speaker does not have a camera, the floor control server can grant the video floor to the conference’s chair.

A floor control server receives floor requests from different participants and needs to decide which floor request to grant. A floor control server can implement an automatic algorithm to make this decision (e.g., first come, first served) or contact a conference chair so that he or she makes it.

When a floor control server receives multiple floor requests for the same floor, it does not need to grant or deny all floor requests immediately. It can put them in a queue and grant them one by one at a later point. In this way, participants requesting the floor do not need to keep sending floor requests until one is granted.

The XCON WG has specified a floor control protocol: BFCP (Binary Floor Control Protocol, specified in RFC 4582 [106]), which is discussed in the following section.

23.4 The Binary Floor Control Protocol (BFCP)

BFCP (specified in RFC 4582 [106]) was designed to be a simple protocol.
The goal was to provide enough functionality for well-defined scenarios. The scenarios that BFCP had to be able to cover included those with low-bandwidth links. BFCP needed to be able to meet the delay requirements of applications such as Push-to-talk, where users often use low-bandwidth radio links and the time from when a user requests to speak (i.e., requests a floor) until the user is allowed to do so (i.e., the floor is granted) should be fairly short. Because of these design constraints, BFCP was designed to use a binary encoding.

Figure 23.10 shows the BFCP architecture. There is a central floor control server that communicates with floor participants and floor chairs. Floor participants request floors from the floor control server, which can contact floor chairs in order to decide whether or not to grant those floor requests. Of course, a floor participant can also act as a floor chair. These roles are defined on a transaction-by-transaction basis. In one transaction a client can act as a floor participant and in the next transaction the same client can act as a floor chair.

Floor participants and floor chairs can request to be informed about the status of a particular floor or a particular floor request. The floor control server sends notifications every time the status of the floor or the floor request changes.

BFCP supports third-party floor requests. That is, a floor participant can request a floor for another participant. This functionality is useful for distributed conferencing clients that implement floor control functionality and media handling in different devices.

A floor participant can request more than one floor in an atomic operation. Such a floor request will only be granted when all of the floors requested can be granted to the participant.
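The queuing and atomic-grant behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BFCP itself: the names `FloorControlServer`, `FloorRequest`, `request`, and `release` are hypothetical, requests are granted automatically in strict first come, first served order, and no floor chair is consulted.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FloorRequest:
    request_id: int
    user_id: int
    floor_ids: frozenset  # all of these floors must be granted together


@dataclass
class FloorControlServer:
    """Hypothetical in-memory model of a floor control server's queue."""
    holders: dict = field(default_factory=dict)  # floor_id -> user_id holding it
    queue: deque = field(default_factory=deque)  # pending FloorRequest objects
    next_id: int = 1

    def request(self, user_id, floor_ids):
        """Queue a request for one or more floors and return its request ID."""
        req = FloorRequest(self.next_id, user_id, frozenset(floor_ids))
        self.next_id += 1
        self.queue.append(req)
        self._grant_pending()
        return req.request_id

    def release(self, user_id, floor_ids):
        """Release floors held by user_id, then try to serve the queue."""
        for floor_id in floor_ids:
            if self.holders.get(floor_id) == user_id:
                del self.holders[floor_id]
        self._grant_pending()

    def _grant_pending(self):
        # Assumed policy: strict first come, first served. The request at the
        # head of the queue is granted only when ALL of its floors are free.
        while self.queue:
            head = self.queue[0]
            if any(f in self.holders for f in head.floor_ids):
                break
            for f in head.floor_ids:
                self.holders[f] = head.user_id
            self.queue.popleft()


server = FloorControlServer()
server.request(150, [100])       # floor 100 is free, so it is granted at once
server.request(125, [100, 101])  # queued: floor 100 is still held by user 150
server.release(150, [100])       # user 125 now receives floors 100 and 101
```

Because the head of the queue blocks everything behind it, a request for a free floor can wait behind an unrelated request. A real server is free to apply a different policy, for example by consulting a floor chair.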
Figure 23.10: BFCP architecture

23.4.1 Contacting the Floor Control Server

In order to contact a conference’s floor control server, the conference participants (which will act as BFCP clients when they contact the floor control server) need to obtain the floor control server’s address. Once a BFCP client (the term BFCP client refers to a floor participant or a floor chair that communicates with a BFCP floor control server) obtains the floor control server’s address, the client establishes a TCP connection with the server in order to be able to exchange BFCP messages with it. There are two ways for clients to establish a connection with a floor control server: inside and outside the context of an offer/answer exchange.

23.4.1.1 Inside an Offer/Answer Exchange

Connections within the context of an offer/answer exchange are established in the same manner as any other media stream. The client and the server perform an offer/answer exchange using SIP in which they exchange the parameters needed to establish the connection (e.g., IP addresses and port numbers). In addition, the floor control server’s SDP description (it can be the offer or the answer in the offer/answer exchange) also contains parameters related to BFCP. These parameters are the conference ID and the floor IDs to be used by the client, the role each endpoint will perform (client or floor control server), and the relation between floors and media streams (the SDP format to establish BFCP connections is specified in RFC 4583 [99]).

Figure 23.11 shows an example of an SDP session description generated by a floor control server (only “m” and “a” lines are shown for simplicity). The “setup” and “connection” attributes (specified in RFC 4145 [319]) relate to the establishment of the TCP connection.
The “fingerprint” attribute (specified in RFC 4572 [206]) relates to the establishment of TLS on top of the TCP connection in order to provide integrity protection and confidentiality.
    m=application 50000 TCP/TLS/BFCP *
    a=setup:passive
    a=connection:new
    a=fingerprint:SHA-1 \
         4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB
    a=floorctrl:s-only
    a=confid:4321
    a=userid:1234
    a=floorid:1 m-stream:10
    a=floorid:2 m-stream:11
    m=audio 50002 RTP/AVP 0
    a=label:10
    m=video 50004 RTP/AVP 31
    a=label:11

Figure 23.11: Floor control server’s SDP session description

The “floorctrl” attribute indicates that the entity generating this session description can only act as a floor control server, and not as a client. Although in a typical centralized conference it is always quite clear which entity is the floor control server and which one is the client, negotiating which entity acts as floor control server can be useful in other scenarios. For example, two endpoints establishing a floor-controlled shared whiteboarding session between them need to decide which endpoint acts as the floor control server.

The “confid” and the “userid” attributes provide the client with the conference ID and the client’s BFCP user ID, respectively. The client will use these values in the BFCP messages it sends to the floor control server.

The “floorid” attributes associate floors with media streams. In this case, the client will need to request floor “1” in order to use the audio stream (whose label is “10”) and floor “2” in order to use the video stream (whose label is “11”).

23.4.1.2 Outside an Offer/Answer Exchange

A BFCP client can also establish a connection with a floor control server without using an offer/answer exchange (how to do it is specified in RFC 5018 [100]). In order to establish the connection, the client needs to obtain the same data as when an offer/answer exchange is used (i.e., the server’s IP address and port number, the conference and user ID, floor IDs and their relationship with resources such as media streams, etc.).
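As an illustration of the data a client needs in either case, the following Python sketch extracts the conference ID, the user ID, and the floor-to-stream associations from a session description like the one in Figure 23.11. The function name `parse_bfcp_sdp` and the returned dictionary layout are hypothetical. The sketch accepts the `m-stream` token used in the figure and also `mstrm`, on the assumption that published RFC 4583 uses the latter spelling; it handles one stream per floor only and omits the “fingerprint” line.

```python
import re

# Session description modeled on Figure 23.11 (fingerprint line omitted).
SDP = """\
m=application 50000 TCP/TLS/BFCP *
a=setup:passive
a=connection:new
a=floorctrl:s-only
a=confid:4321
a=userid:1234
a=floorid:1 m-stream:10
a=floorid:2 m-stream:11
m=audio 50002 RTP/AVP 0
a=label:10
m=video 50004 RTP/AVP 31
a=label:11
"""

# Assumption: accept both the figure's "m-stream" and the "mstrm" spelling.
FLOORID_RE = re.compile(r"a=floorid:(\d+) (?:m-stream|mstrm):(\S+)")


def parse_bfcp_sdp(sdp):
    """Return conference ID, user ID, and a floor ID -> media type mapping."""
    conf_id = user_id = None
    floor_labels = {}   # floor ID -> label of the controlled stream
    label_media = {}    # label -> media type of the enclosing "m=" section
    current_media = None
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            current_media = line[2:].split()[0]
        elif line.startswith("a=confid:"):
            conf_id = int(line.split(":", 1)[1])
        elif line.startswith("a=userid:"):
            user_id = int(line.split(":", 1)[1])
        elif line.startswith("a=label:"):
            label_media[line.split(":", 1)[1]] = current_media
        else:
            match = FLOORID_RE.fullmatch(line)
            if match:
                floor_labels[int(match.group(1))] = match.group(2)
    # Labels appear after the floorid lines, so resolve them at the end.
    floors = {fid: label_media.get(lbl) for fid, lbl in floor_labels.items()}
    return {"confid": conf_id, "userid": user_id, "floors": floors}


info = parse_bfcp_sdp(SDP)
print(info)  # {'confid': 4321, 'userid': 1234, 'floors': {1: 'audio', 2: 'video'}}
```

The result matches the reading given in the text: floor 1 controls the audio stream and floor 2 controls the video stream.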
Instead of getting all of these data in a session description from the floor control server, the client typically obtains all of the data it needs using the conference event package. The XCON data model describes how to encode all of these data in an XML document. Once the client obtains, via the conference event package, all of the data it needs, it establishes a TCP connection to the floor control server. The client uses the conference ID and the user ID obtained through the conference event package in its BFCP messages.

23.4.2 BFCP Message Flow

Figure 23.12 shows a typical BFCP message flow with three entities: a BFCP client, the floor control server, and a floor chair.
Figure 23.12: BFCP message flow

The floor chair in Figure 23.12 is responsible for the floor whose ID is 100. The floor chair joins the ongoing conference and requests information about floor 100 by sending a FloorQuery message to the floor control server. The floor control server informs the floor chair that floor 100 is currently granted to the user whose ID is 150 (another floor chair was handling floor 100 before the floor chair in the figure joined the conference).

User 125 requests floor 100 by sending a FloorRequest message to the floor control server. The floor control server informs the BFCP client about the status of its floor request using FloorRequestStatus messages. When the floor chair is notified about the new floor request by user 125, the floor chair accepts the floor request by sending a ChairAction message to the floor control server.

In our example, the floor control server is configured to place accepted floor requests in a queue. In this case, the floor request is placed in the first position of the queue. At a later point, when the current holder of the floor (i.e., user 150) releases the floor, the floor control server grants the floor request in the first position of the queue, that is, the floor request by user 125.

Note that, in this conference, the floor chair simply accepts or rejects floor requests. The floor control server takes care of performing queue management and granting floor requests when they reach the top of the queue. BFCP also allows floor chairs to be more in control of how floor requests are handled. In a different conference, the floor control server may be configured to let floor chairs perform queue management and decide when to grant particular floor requests.

When user 125 is finished using the resource associated with floor 100, the user releases the floor by sending a FloorRelease message. The floor control server informs the floor chair that there are no floor requests for floor 100 with a FloorStatus message.

23.4.3 BFCP Primitives

BFCP defines a set of primitives or messages. Table 23.1 lists all of the messages defined so far. The table also shows the direction in which each message can be sent. “P” denotes floor participant, “S” denotes floor control server, and “Ch” denotes floor chair.

Table 23.1: BFCP Primitives

    Primitive             Direction
    FloorRequest          P → S
    FloorRelease          P → S
    FloorRequestQuery     P → S, Ch → S
    FloorRequestStatus    P ← S, Ch ← S
    UserQuery             P → S, Ch → S
    UserStatus            P ← S, Ch ← S
    FloorQuery            P → S, Ch → S
    FloorStatus           P ← S, Ch ← S
    ChairAction           Ch → S
    ChairActionAck        Ch ← S
    Hello                 P → S, Ch → S
    HelloAck              P ← S, Ch ← S
    Error                 P ← S, Ch ← S

23.4.4 BFCP Encoding

BFCP messages use a TLV (Type-Length-Value)-based binary encoding. A BFCP message consists of a common header followed by attributes. Figure 23.13 shows the format of the common header. The payload length field contains the length of all attributes following the common header. Figure 23.14 shows the format of BFCP attributes.
Following a TLV-based format, every attribute carries its type (i.e., which attribute it is), its length, and the actual contents of the attribute. In addition, attributes carry an “M” bit, which indicates whether or not it is mandatory for the receiver of the BFCP message to understand this attribute. The “M” bit is useful for extension attributes where, even if the receiver does not understand the extension attribute, it can still process the message.

Figure 23.13: BFCP common header format

Figure 23.14: BFCP attribute format

Figure 23.15 shows an example of a BFCP attribute: the Floor ID attribute. Its attribute type is 0000010 and, since the value of Floor IDs is always a 16-bit number (i.e., 2 bytes), the attribute’s length is always 4 bytes (binary 100).

Figure 23.15: BFCP Floor ID attribute
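The Floor ID attribute can be packed and unpacked with Python's standard `struct` module, as sketched below. The bit layout is an assumption based on the description above and on RFC 4582, since Figure 23.14 is not reproduced here: a 7-bit type followed by the “M” bit in the first byte, then a one-byte length that counts the two header bytes plus the two-byte value. The function names are hypothetical.

```python
import struct

FLOOR_ID_TYPE = 0b0000010  # 7-bit attribute type given in the text
FLOOR_ID_LENGTH = 4        # 2 header bytes + 2-byte Floor ID, per the text


def encode_floor_id(floor_id, mandatory=True):
    """Pack a Floor ID attribute (assumed layout: type(7) | M(1) | length(8) | id(16))."""
    if not 0 <= floor_id <= 0xFFFF:
        raise ValueError("Floor ID must fit in 16 bits")
    first_byte = (FLOOR_ID_TYPE << 1) | int(mandatory)
    # "!" selects network (big-endian) byte order with no padding.
    return struct.pack("!BBH", first_byte, FLOOR_ID_LENGTH, floor_id)


def decode_floor_id(data):
    """Unpack a Floor ID attribute and return (type, mandatory, length, floor_id)."""
    first_byte, length, floor_id = struct.unpack("!BBH", data[:4])
    return first_byte >> 1, bool(first_byte & 1), length, floor_id


encoded = encode_floor_id(100)
print(encoded.hex())             # 05040064
print(decode_floor_id(encoded))  # (2, True, 4, 100)
```

With the “M” bit set, the first byte is 0000010 followed by 1, that is 0x05; the length byte is 0x04; and floor 100 is encoded as 0x0064.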