Chapter 23
Conferencing on the Internet
Conferencing involves communication among several users. Multimedia conferencing,
including audio, video, instant messaging, whiteboard sharing, and file transfer, is a popular
service on the Internet and in enterprises. Chat rooms where users exchange instant messages
are an example of a conference service on the Internet. The collaboration tools used in most
enterprises are also examples of conferences.
Thus, conferences are not limited to traditional unmoderated audio or video conferences.
They can include all types of media and can be moderated by using floor control mechanisms.
Conferencing is especially important for enterprises whose employees work in different
countries. A conference system that includes collaboration tools can save considerable money and time
by reducing the need for face-to-face meetings that require attendees to travel great distances.
However, we are still far from having conference systems that can replace face-to-
face meetings completely. That is why there is much ongoing research in areas such as
telepresence and virtual reality. The goal is to make virtual interactions as close to real ones
as possible.
23.1 Conferencing Standardization at the IETF
In the past, working groups such as MMUSIC did some work on conferencing (e.g., SDP
was designed with multiparty sessions in mind). More recently, the working groups active in
this area have been SIPPING and XCON. Implementers sometimes find it
confusing to have similar specifications in the same area coming from two different working
groups. Knowing the history behind conferencing standardization at the IETF will help
readers understand how the specifications coming from both working groups relate to each
other.
Initially, the SIPPING working group developed a set of specifications that described
how to provide conferencing services using SIP. Coming from the SIPPING working group,
these specifications were, unsurprisingly, very much focused on SIP. Pieces needed to build a
complete conference service such as floor control and conference management mechanisms
(beyond the simple ones SIP provides) were out of the scope of this work.
The XCON working group was chartered to work on generalizing the work done in
SIPPING so that different signaling protocols (not only SIP) could be used and to specify
those missing pieces needed to build a complete conference system. The charter was limited
to centralized conferences where clients connect to a central server following a star topology.
Conferences using different topologies such as full-meshed and cascaded conferences were
left out of scope.
The 3G IP Multimedia Subsystem (IMS): Merging the Internet and the Cellular Worlds, Third Edition
Gonzalo Camarillo and Miguel A. García-Martín
© 2008 John Wiley & Sons, Ltd. ISBN: 978-0-470-51662-1
The results of the work of these two working groups include two conferencing frameworks:
the SIPPING conferencing framework and the XCON conferencing framework. We
discuss both of them, their differences, and how they relate to each other.
23.2 The SIPPING Conferencing Framework
The SIPPING conferencing framework (specified in RFC 4353 [272]) describes three
conferencing models: loosely coupled, fully distributed, and tightly coupled. In the loosely-
coupled conferencing model, shown in Figure 23.1, media streams are multicast. Conference
participants join the multicast group of the conference using, for example, IGMP (Internet
Group Management Protocol, specified in RFC 3376 [95]) in order to receive media.
Conference participants do not typically have any signaling relationship between them. Still,
they can use SIP to invite new participants into the conference. A SIP INVITE request sent
to a new participant would contain (in its body) all information needed to join the multicast
group.
Figure 23.1: The loosely-coupled conference model
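As a rough illustration of what the body of such an INVITE might carry, the following sketch builds a minimal SDP description that points at a multicast group. The group address, port, TTL, originator line, and the helper name `multicast_sdp` are all assumptions chosen for illustration; they do not come from the framework itself.

```python
import ipaddress


def multicast_sdp(group, port, ttl=127, session_name="Loosely coupled conference"):
    """Build a minimal SDP body announcing one audio stream on a multicast group.

    All values are illustrative; a real session description would carry the
    parameters of the actual conference.
    """
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    lines = [
        "v=0",
        # Hypothetical originator: username, session id, version, address.
        "o=alice 2890844526 2890844526 IN IP4 host.example.com",
        f"s={session_name}",
        # For IPv4 multicast, the connection line carries the group and a TTL.
        f"c=IN IP4 {group}/{ttl}",
        "t=0 0",
        # One audio stream, RTP/AVP payload type 0 (PCMU).
        f"m=audio {port} RTP/AVP 0",
    ]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(multicast_sdp("224.2.17.12", 49170))
```

A participant that receives this description knows which group to join (for example, through IGMP) and on which port to expect the audio.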
In the fully-distributed conferencing model, shown in Figure 23.2, each participant has
a signaling relationship with all of the other participants in the conference. Each participant
sends media to all of the other participants.
In the tightly-coupled conferencing model, shown in Figure 23.3, each participant has a
signaling relationship with a central conference server. The central conference server mixes
the media received from different participants and distributes it to all of them.
Of course, the three conferencing models just described are not the only models that can
be implemented with SIP. Many other variants are possible. For example, when the central
conference server in a tightly-coupled conference is distributed among several SIP nodes, the
resulting model is typically referred to as the cascaded conferencing model. In any case, the
SIPPING conferencing framework focuses on the tightly-coupled conferencing model; the
rest of the models are considered to be out of its scope.
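To make the difference between the fully-distributed and tightly-coupled models concrete, the following sketch counts signaling relationships and media streams for a conference of n participants. It is a simplification under stated assumptions: a single medium, one unidirectional stream per sender and receiver pair, and a central server that returns exactly one mixed stream to each participant. The function names are hypothetical.

```python
def fully_distributed(n):
    """Counts for n participants that all signal and send media to each other."""
    return {
        "signaling_relationships": n * (n - 1) // 2,  # one per pair of participants
        "streams_sent_per_participant": n - 1,        # one to every other participant
        "total_media_streams": n * (n - 1),           # unidirectional streams overall
    }


def tightly_coupled(n):
    """Counts for n participants connected to a single central conference server."""
    return {
        "signaling_relationships": n,                 # one per participant, with the server
        "streams_sent_per_participant": 1,            # a single stream towards the mixer
        "total_media_streams": 2 * n,                 # n uplinks plus n mixed downlinks
    }


if __name__ == "__main__":
    for n in (3, 10, 33):
        print(n, fully_distributed(n), tightly_coupled(n))
```

Under these assumptions the fully-distributed model grows quadratically with the number of participants, while the tightly-coupled model grows linearly.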
Figure 23.2: The fully-distributed conference model
Figure 23.3: The tightly-coupled conference model
23.2.1 Signaling Architecture
Figure 23.4 shows the signaling architecture proposed by the SIPPING conferencing
framework. The conference server consists of several logical functions: the conference
policy, the conference policy server, and the focus, which includes the conference notification
service.
The conference policy is the set of rules that define a conference. The conference policy
includes information about the participants of the conference, the time and date when the
conference will take place, the media streams the conference has, etc. Participants manipulate
the conference policy (e.g., to add a video stream to an audio-only conference) through the
conference policy server. The protocol between participants and the conference policy server
is left unspecified.
Figure 23.4: Signaling architecture in the SIPPING framework
The focus interacts with the conference participants using SIP. It acts as a user agent
towards all of the participants. The focus includes the conference notification service, which
provides participants with information about the conference using the SIP event package for
the conference state (specified in RFC 4575 [289]). This event package defines an XML-
based format to convey conference-related information. Figure 23.5 shows an example of a
document that uses this format. This document, which is mostly self-explanatory, describes
a conference and provides information about two of its participants: Bob and Alice. Bob
was disconnected from the conference, with bad voice quality recorded as the reason, and Alice was
brought into the conference by Mike. Note that even though the number of participants in
the conference is 33 (see the <user-count> element), the document only provides detailed
information about two of them (Bob and Alice). Conferencing servers can omit information
about certain users for policy reasons.
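A participant that receives such a document has to parse it before it can use the information. The sketch below, which uses Python's standard `xml.etree.ElementTree` module, extracts the user count and the status of each listed endpoint from a trimmed-down version of the document in Figure 23.5. The trimmed document and the helper name `summarize` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace used by the conference-info format.
NS = {"ci": "urn:ietf:params:xml:ns:conference-info"}

# Trimmed-down, illustrative version of the document in Figure 23.5.
DOC = """\
<conference-info xmlns="urn:ietf:params:xml:ns:conference-info"
    entity="sips:conf233@example.com" state="full" version="1">
  <conference-state>
    <user-count>33</user-count>
  </conference-state>
  <users>
    <user entity="sip:bob@example.com" state="full">
      <display-text>Bob Hoskins</display-text>
      <endpoint entity="sip:bob@pc33.example.com">
        <status>disconnected</status>
      </endpoint>
    </user>
  </users>
</conference-info>
"""


def summarize(xml_text):
    """Return (user_count, {user URI: [endpoint statuses]}) for a document."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("ci:conference-state/ci:user-count", namespaces=NS))
    users = {}
    for user in root.findall("ci:users/ci:user", NS):
        users[user.get("entity")] = [
            endpoint.findtext("ci:status", namespaces=NS)
            for endpoint in user.findall("ci:endpoint", NS)
        ]
    return count, users


if __name__ == "__main__":
    count, users = summarize(DOC)
    print(f"{count} participants, {len(users)} described in detail: {users}")
```

Note that the reported user count and the number of `<user>` elements are independent, which is why a client should not derive one from the other.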
The XML document in Figure 23.5 is already fairly long, even though it only carries
information about two users. A document describing a large conference with many users
would be much longer. In principle, every time a small change occurs in the conference (e.g.,
one user leaves the conference), the conference notification service would need to send a new
large XML document that would be very similar to the last one it sent (e.g., the only difference
would be in the elements related to the user that left). This would result in inefficient
use of bandwidth.
In order to avoid this situation, the SIP event package for conference state implements a
mechanism for partial notifications. The “state” attribute indicates whether an element carries
full or partial information. In addition, the “state” attribute can also indicate that an element
<?xml version="1.0" encoding="UTF-8"?>
<conference-info
    xmlns="urn:ietf:params:xml:ns:conference-info"
    entity="sips:conf233@example.com"
    state="full" version="1">
  <!-- CONFERENCE INFO -->
  <conference-description>
    <subject>Agenda: This month’s goals</subject>
    <service-uris>
      <entry>
        <uri>http://sharepoint/salesgroup/</uri>
        <purpose>web-page</purpose>
      </entry>
    </service-uris>
  </conference-description>
  <!-- CONFERENCE STATE -->
  <conference-state>
    <user-count>33</user-count>
  </conference-state>
  <!-- USERS -->
  <users>
    <user entity="sip:bob@example.com" state="full">
      <display-text>Bob Hoskins</display-text>
      <!-- ENDPOINTS -->
      <endpoint entity="sip:bob@pc33.example.com">
        <display-text>Bob’s Laptop</display-text>
        <status>disconnected</status>
        <disconnection-method>departed</disconnection-method>
        <disconnection-info>
          <when>2005-03-04T20:00:00Z</when>
          <reason>bad voice quality</reason>
          <by>sip:mike@example.com</by>
        </disconnection-info>
        <!-- MEDIA -->
        <media id="1">
          <display-text>main audio</display-text>
          <type>audio</type>
          <label>34567</label>
          <src-id>432424</src-id>
          <status>sendrecv</status>
        </media>
      </endpoint>
    </user>
Figure 23.5: Example of an XML-based conference description (part 1)