Office Communications Server
How Voice Conferencing Powers OCS 2007 R2
At a Glance:
- Ad hoc conferencing
- Centralized Conference Control Protocol
- Multimodal conferences
In a previous article in this series, "How Voice Powers OCS 2007," I discussed how voice calls work in the Office Communications Server (OCS) system. I also explained how a basic SIP INVITE is routed in the system to set up a peer-to-peer voice session and how clients traverse NAT and firewalls for call establishment.
In this article, I will explore the conferencing part of the story. OCS allows real-time conferences to be set up with users inside and outside a corporate firewall, supporting both ad hoc escalation of calls to a conference and pre-scheduled conferences or meetings.
Conferencing in OCS 2007 is built on its Session Initiation Protocol (SIP) support and leverages the NAT/firewall traversal capabilities introduced for peer-to-peer calls. Like peer-to-peer calls, conferences can be joined from anywhere outside the corporate firewall. Conferencing in OCS relies on the OCS 2007 Conferencing server roles. With dedicated servers that provide the conferencing features, conferences can scale from a few members to a hundred or even more.
Figure 1 Invite options in Office Communicator 2007 R2
Office Communicator clients can escalate an audio or audio/video call, an instant message session, or a multimodal call and instant message session to a conference, seamlessly escalating all modes together. Users can add attendees to two-party audio calls by dragging and dropping other users from the Contact list in Office Communicator or by using the Invite menu (see Figure 1) in a conversation window, which brings up a contact picker where the user can enter a phone number into the invitation.
The Invite by E-mail option, which is also available from the Invite menu, creates an e-mail message using Microsoft Office Outlook and sends the conference URI to the remote user (more on the conference URI in a moment). Invite by E-mail also sends the dial-in number in the message if the conference is scheduled using the Conferencing Add-In for Microsoft Office Outlook. When Invite by E-mail is used, the remote participant can join the conference using Communicator Web Access, authenticate (or join anonymously), and choose to have the conference dial out to a local phone number. This gives participants who are not running Office Communicator a way to join.
Office Communicator 2007 R2 also provides the Meet Now option (shown in Figure 2) for an easy way to quickly create ad hoc conferences.
Figure 2 Meet Now in Office Communicator 2007 R2
Types of Meetings
Before I delve into the details of how conferencing works, I should discuss the types of conferences OCS supports. Office Communications Server 2007 R2 supports the following conference types:
- Open Authenticated: Users authenticated against OCS 2007 can join the conference and invite other authenticated users. The word "open" signifies that membership is open. Ad hoc conferences created by Office Communicator 2007 R2 are open authenticated by default.
- Closed Authenticated: Authenticated users can join the conference, but the membership to the conference is restricted and controlled by the organizer.
- Anonymous: These conferences have the loosest restrictions on who is allowed to join. Users are able to join from Communicator Web Access without authenticating against OCS, or they can use a dial-in number to join a conference from a phone.
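The three conference types can be read as a small admission rule. The following Python sketch is only an illustration of the descriptions above; the names `ConferenceType` and `may_join` are hypothetical and are not part of OCS or Office Communicator.

```python
from enum import Enum


class ConferenceType(Enum):
    OPEN_AUTHENTICATED = "open authenticated"
    CLOSED_AUTHENTICATED = "closed authenticated"
    ANONYMOUS = "anonymous"


def may_join(conf_type, authenticated, admitted_by_organizer):
    """Return True if a user may join, following the descriptions above.

    `admitted_by_organizer` models the organizer-controlled membership of a
    closed authenticated conference (an assumption made for illustration).
    """
    if conf_type is ConferenceType.ANONYMOUS:
        # Loosest restrictions: no authentication against OCS is required.
        return True
    if not authenticated:
        return False
    if conf_type is ConferenceType.CLOSED_AUTHENTICATED:
        return admitted_by_organizer
    # Open authenticated: any authenticated user may join.
    return True
```

For example, `may_join(ConferenceType.CLOSED_AUTHENTICATED, True, False)` is false because the organizer has not admitted the user, while the same user could join an open authenticated conference.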
In this article, I am focusing primarily on open authenticated conferences.
Basic Conferencing Architecture
Conferencing in the OCS architecture is based on a star topology where all the clients connect to a central conferencing server pool. The conferencing solution in OCS typically has two major server types that participate in the conference.
First off, there is a focus factory, which acts as a conference manager. This manages the participant list in the conference and the modalities that various participants are currently using. The focus factory also terminates the signaling control connection with the conference leader and ensures that commands (such as mute, eject, and so on) are properly channeled to the correct media servers. The focus factory also maintains the connection to the Conferencing database, which is used for looking up scheduled conferences and dial-in numbers.
Secondly, there are one or more multipoint control units (MCUs). The MCU provides media multiplexing capabilities for a conference. In the OCS system, there are the following MCU roles:
- IM MCU provides Instant Messaging between multiple parties.
- Audio/Video MCU provides audio mixing and video switching between multiple parties.
- Data MCU provides Live Meeting features, such as desktop sharing, whiteboarding, and so on.
Since the focus factory is the conference manager, all the clients are given an address for the focus factory when they are provisioned. This address is sent through the inband provisioning mechanism I mentioned in the article "How Presence Powers OCS 2007."
Figure 3 shows the logical architecture for a conference that has three Office Communicator clients for three users. The dotted lines represent the SIP-based signaling channel that is established between the clients and the focus, which in turn establishes the channel with the various MCUs. The solid lines show the audio/video stream that is terminated on the Audio/Video MCU. Other media streams, such as IM and data, are intentionally omitted from this diagram for the purpose of simplicity.
Figure 3 Logical architecture of a three-party conference
The C3P Protocol
The OCS conferencing solution is based on the Centralized Conference Control Protocol (commonly referred to as CCCP, or C3P for short). This is an XML-based client-server protocol that piggybacks on SIP and provides the following mechanisms:
- A conference document (or roster) that lists the participants in the conference and the modes each participant is currently using.
- A command/response mechanism that allows clients to issue commands to the conferencing server (focus factory) so they can create the conference or control other aspects of it.
For instance, AddConference is a C3P command that is used to add a conference to the focus factory. The focus factory responds with a unique conference SIP URI, which is based on the user's own SIP URI. For example, say my SIP URI is sip:firstname.lastname@example.org. When the client issues an AddConference command, the focus factory would return a unique key to the conference that looks something like this: sip:email@example.com;gruu;opaque=app:conf:focus:id:A0DB798E3EDA984FACAD30D1A8DCD35A. This SIP URI key uniquely identifies the OCS conference. It can be shared with other participants to give them access to the conference. This is the same URI that is sent in the message generated by using the Invite by E-mail option.
Since the conference's SIP URI is created using the conference creator's own SIP URI, this ensures that the policies applied to the conference can be derived from the creator's policies. It also means that policies related to dial out to PSTN and the like can be applied based on what is allowed for the specific conference leader.
Another C3P command, AddUser, adds a participant to the conference. It also specifies the role of the participant, such as attendee or presenter. The leader/presenter has to add himself to the conference using AddUser as soon as the AddConference command is issued. The leader/presenter can use AddUser to invite endpoints or clients with a SIP URI into a conference, as well as PSTN phone numbers. To initiate dialing out to phone numbers from the MCUs, a <dialout> XML node has to be set in the AddUser command. I will refer to this combined command as AddUserDialOut.
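To make the difference between a plain AddUser and an AddUserDialOut concrete, here is a rough Python sketch that builds an XML body. It does not reproduce the real C3P schema: only the `addUser` and `dialout` names come from this article, and the `request` wrapper, the `conference` element, and the `uri` and `role` attributes are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET


def build_add_user(conference_uri, user_uri, role="attendee", dial_out=False):
    """Build an illustrative (not schema-accurate) addUser request body."""
    request = ET.Element("request")                # hypothetical wrapper
    add_user = ET.SubElement(request, "addUser")
    # Hypothetical element naming the conference the user is added to.
    ET.SubElement(add_user, "conference").text = conference_uri
    # Hypothetical attributes carrying the participant address and role.
    user = ET.SubElement(add_user, "user", {"uri": user_uri, "role": role})
    if dial_out:
        # The article states that a <dialout> node asks the MCU to dial out.
        ET.SubElement(user, "dialout")
    return ET.tostring(request, encoding="unicode")


# Placeholder conference URI and phone number, for illustration only.
print(build_add_user("sip:organizer@example.com;gruu;opaque=app:conf:focus:id:ABC",
                     "tel:+14255550100", dial_out=True))
```

The only structural difference between the two forms in this sketch is the presence of the `dialout` node, which mirrors how the article describes AddUserDialOut as AddUser plus a dial-out request.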
A third C3P command, GetConference, is used to retrieve all conference capabilities. Once a client connects to the focus, it needs to retrieve the SIP URIs of the various MCUs in the system so it can talk directly to the MCUs. This information about the MCUs is retrieved using GetConference. An Audio/Video MCU SIP URI that is retrieved using GetConference looks something like this: sip:firstname.lastname@example.org;gruu;opaque=app:conf:audio-video:id:A0DB798E3EDA984FACAD30D1A8DCD35A. Note that each of the SIP URIs, whether for a conference focus or for a specific conferencing server, is actually a Globally Routable User Agent URI (GRUU). I briefly talked about GRUUs in the "How Presence Powers OCS 2007" article.
As I mentioned before, C3P rides on top of SIP, and SIP allows sessions to be created between any two user agents (or, to be precise, between a user agent client and a user agent server). The payload of a SIP session need not always be an audio or video SDP (Session Description Protocol) body; a session can also serve as a pure signaling channel. Clients use this concept to establish a SIP-based signaling channel with the focus factory for a particular conference session.
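The focus and MCU URIs shown so far share a visible shape: the organizer's SIP URI, a `gruu` parameter, and an `opaque` value of the form `app:conf:<role>:id:<conference id>`. The Python sketch below splits a URI along those lines. The layout is inferred only from the two example URIs in this article, so treat it as an assumption, not a documented format; `ConferenceUri` and `parse_conference_uri` are hypothetical names.

```python
from typing import NamedTuple


class ConferenceUri(NamedTuple):
    owner: str    # SIP URI of the conference creator
    role: str     # "focus", "audio-video", ...
    conf_id: str  # identifier shared by the focus and MCU URIs


def parse_conference_uri(uri):
    """Split a conference GRUU of the shape shown in the examples above."""
    owner, *params = uri.split(";")
    opaque = next(
        (p[len("opaque="):] for p in params if p.startswith("opaque=")), None
    )
    if "gruu" not in params or opaque is None:
        raise ValueError("not a conference GRUU: " + uri)
    parts = opaque.split(":")
    # Expected layout (assumed): app:conf:<role>:id:<conference id>
    if len(parts) != 5 or parts[:2] != ["app", "conf"] or parts[3] != "id":
        raise ValueError("unexpected opaque value: " + opaque)
    return ConferenceUri(owner, parts[2], parts[4])


focus = parse_conference_uri(
    "sip:email@example.com;gruu;opaque=app:conf:focus:id:"
    "A0DB798E3EDA984FACAD30D1A8DCD35A"
)
print(focus.role, focus.conf_id)
```

Keeping the creator's SIP URI as the `owner` field reflects the point made earlier: policies for the conference can be derived from the creator's own policies.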
Figure 4 Creating and joining a conference
How Conferences Are Created
The first step is to create a conference and establish a SIP signaling session with the focus factory. The client begins by creating a conference session in the focus factory so that it can start interacting with the conference focus. This is done using a special SIP request called SERVICE. The SERVICE request and its 200 OK response carry the AddConference command and the AddConference response, respectively. Once the SERVICE/200 OK exchange is completed, the client has obtained a unique conference ID that it can use to talk to the focus to obtain information related to the MCUs, add other users, and manage the conference membership.
The client then adds itself to the conference by sending an INVITE to the conference URI that contains a C3P AddUser command specifying the client's own session. This completes the initial step of creating the conference and joining the focus.
The next step is for the client to join the media on the various MCUs. For example, if the conference will be an audio call, then the Office Communicator client issues a regular SIP INVITE with an audio or audio/video SDP body.
The last step is for the client to invite other clients to participate in the conference. Figure 4 illustrates this sequence of events used to initialize a conference in the focus factory.
Note that once the SERVICE request is made, the first <addUser> command is carried as the INVITE payload in place of an SDP body. Once this dialog is created, other commands, such as adding new users, are sent as SIP INFO messages over the same INVITE dialog with the focus. The first C3P command sent over this dialog is GetConference, which returns the MCU SIP URIs I've already mentioned. The client then sets up a second session, this time a media session with the audio SDP, to the SIP URI of the A/V MCU indicated in the getConference response.
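The ordering of messages described in this section can be summarized in a few lines of Python. This is a simplified sketch that lists only the messages named in the text (SIP responses other than the first 200 OK are left out), and the function name and tuple layout are made up for illustration.

```python
def conference_setup_messages(join_audio=True):
    """Return (SIP message, recipient, payload) tuples in the order described."""
    messages = [
        ("SERVICE", "focus factory", "C3P addConference"),
        ("200 OK", "client", "C3P addConference response (conference URI)"),
        ("INVITE", "focus", "C3P addUser for the client's own session"),
        ("INFO", "focus", "C3P getConference"),
    ]
    if join_audio:
        # Second INVITE: a media session to the A/V MCU URI that the
        # getConference response returned.
        messages.append(("INVITE", "A/V MCU", "SDP (audio or audio/video)"))
    return messages


for method, recipient, payload in conference_setup_messages():
    print(f"{method:8} -> {recipient:14} {payload}")
```

Note the two INVITEs: the first creates the signaling dialog with the focus and carries C3P, while the second carries SDP and sets up media with the A/V MCU.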
Inviting Others to the Conference
Now that you are familiar with how a client creates a conference from the focus factory and joins the A/V MCU, I can detail how the client can invite other clients into the conference. There are three basic methods that can be used here: an ad hoc application invite sent to the remote client using SIP, a dial-out request sent to the focus factory to create a VoIP INVITE from the A/V MCU, or joining from the conference URI (which is part of a scheduled meeting).
When a contact is selected, Office Communicator attempts to send an Application-INVITE (or App-INVITE) to the remote client to invite it into a conference instead of directly initiating a dial-out request from the A/V MCU. An App-INVITE is a special SIP INVITE that contains an XML data payload containing the conference's SIP URI. A client that receives the App-INVITE is able to join the conference based on the focus-URI in the App-INVITE. The App-INVITE is always the preferred way to invite another client because it contains information about other potential modalities, such as Instant Messaging in the conference.
When a user selects a specific phone number for a contact, or when the user selects Invite by Phone, Office Communicator instead issues an AddUserDialOut command to the A/V MCU. A dial-out INVITE is a simple VoIP INVITE originating from the A/V MCU that allows downlevel clients and PSTN endpoints to join an Audio/Video conference.
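The choice between the first two invitation methods can be sketched as a small decision function. In Office Communicator the choice follows what the user selected in the UI; the Python below approximates that by inspecting the target string, and the `tel:` prefix and digit check are assumptions made for illustration. The third method, joining from a conference URI, is initiated by the invitee and so does not appear here.

```python
def invite_method(target, invite_by_phone=False):
    """Pick how the inviting client brings `target` into the conference."""
    looks_like_phone = target.startswith("tel:") or target.lstrip("+").isdigit()
    if invite_by_phone or looks_like_phone:
        # The A/V MCU places a plain VoIP call to the number.
        return "AddUserDialOut"
    # Preferred path: the invitee learns the focus URI and joins on its own,
    # which also lets it pick up other modalities such as IM.
    return "App-INVITE"


print(invite_method("sip:bob@example.com"))
print(invite_method("+14255550100"))
```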
Creating a 3 Person Conference
Based on the information I've presented so far, I'll detail how Alice is able to create a conference with Bob and Carol directly. Alice chooses to right-click on Bob and Carol in Office Communicator's contact list and start a conference from Communicator. Figure 5 shows that Alice's Office Communicator client first creates the conference and joins it using the constructs I have discussed.
Figure 5 How a conference is created
Media is now flowing from Alice's instance of Office Communicator to the A/V MCU. Next, Alice's Office Communicator fires separate App-INVITEs to Bob's SIP URI and Carol's SIP URI. Bob happens to be at work and his Office Communicator phone rings. When Bob accepts the conference, his Office Communicator client sends 200 OK to the App-INVITE and sends a BYE to the session immediately since it has all the information from the App-INVITE body and does not need to keep the virtual session with Alice's Office Communicator client alive. Bob's Office Communicator then joins the focus factory and the A/V MCU in the same way Alice's Office Communicator first joined the conference.
Carol also receives the App-INVITE and Office Communicator rings for her, as well. Carol decides to divert the conference call to her mobile phone from the incoming call toast. When Carol selects this option, the Office Communicator client joins the focus based on the focus-URI and then issues an AddUserDialOut C3P command to dial out to Carol's mobile phone rather than joining directly.
Escalating a Two-Party Call to a Conference
Escalating a two-party call is a bit more involved than simply starting a conference. This is because the two-party call must be maintained until both parties switch over to the conference. Office Communicator clients conduct a synchronized escalation process where each client joins the conference before terminating the peer-to-peer call.
Say Alice is talking to Bob and decides to drag and drop Carol into the conversation. When escalating the two parties into a conference, the following steps occur:
- Alice's Office Communicator client creates a conference session based on the focus factory.
- Then Alice's Office Communicator joins the A/V MCU with the call on hold (RTP stream inactive).
- Once the conference join is successful, Alice's Office Communicator sends an App-INVITE to Bob's client specifying the conference URI.
- Bob's Office Communicator begins the escalation step and joins the focus and the A/V MCU with the RTP stream on hold.
- Once Bob's Office Communicator joins the A/V MCU successfully, it sends a BYE to the peer session.
- Both Alice's client and Bob's client activate the RTP stream to the A/V MCU at the same time to maintain continuity of the call.
- Now Alice's client sends an App-INVITE to Carol's Office Communicator client to invite the third party into the conference.
Note that this sequence of steps is very carefully executed. If Bob's Office Communicator client is unable to join the conferencing server, then the conference escalation fails and the peer-to-peer call continues.
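The escalation steps, including the failure case, can be traced with a short Python sketch. It is a walk-through of the sequence as described in this article, not client code; the function name and log strings are invented for illustration.

```python
def escalate_two_party_call(bob_joins_mcu=True):
    """Return (final call state, ordered log) for the escalation above."""
    log = [
        "Alice: create conference on the focus",
        "Alice: join A/V MCU, RTP inactive",
        "Alice: App-INVITE -> Bob",
    ]
    if not bob_joins_mcu:
        # No BYE has been sent yet, so the original call is untouched.
        log.append("Bob: join failed, escalation abandoned")
        return "peer-to-peer", log
    log += [
        "Bob: join focus and A/V MCU, RTP inactive",
        "Bob: BYE on the peer-to-peer session",
        "Alice and Bob: activate RTP to the A/V MCU",
        "Alice: App-INVITE -> Carol",
    ]
    return "conference", log


state, steps = escalate_two_party_call()
print(state)
print("\n".join(steps))
```

The ordering is what makes the fallback safe: the peer-to-peer call is ended only after both clients have joined the A/V MCU, so a failed join leaves the original call in place.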
Escalating PSTN calls to a conference is also supported, but instead of sending the App-INVITE in the third step, Office Communicator would issue an AddUserDialOut to the A/V MCU and add a Replaces header so that the call from the A/V MCU can replace the peer-to-peer call between the two Office Communicator endpoints. This is illustrated in Figure 6 and Figure 7.
Figure 6 Escalating PSTN Calls to a Conference—In Process
Figure 7 Escalating PSTN Calls to a Conference—Final State
Office Communicator supports interoperating with other clients that do not support the App-INVITE mechanism. For these clients, it falls back to using AddUserDialOut from the A/V MCU. In the scenario above, if Carol were on a client that didn't support App-INVITE, a 415 response code to the App-INVITE would trigger Office Communicator to fall back to the A/V MCU dial-out.
I've described an audio call between two people being escalated to a conference. But what would happen if there was instant messaging in the session? When there is more than one modality in a conference, Office Communicator clients ensure that both modalities are successfully escalated before the conference is committed and the third party is invited to the conference. This ensures that the multimodal experience between the two participants is maintained.
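The fallback can be expressed as a few lines of Python. The two callables stand in for the real signaling operations and are hypothetical; only the 415 trigger comes from the text, and how other failure codes are handled is not covered in this article, so the sketch simply reports them as failed.

```python
def invite_with_fallback(send_app_invite, dial_out, target):
    """Try an App-INVITE first; fall back to A/V MCU dial-out on a 415."""
    status = send_app_invite(target)   # assumed to return the SIP status code
    if status == 415:
        # The remote client does not understand the App-INVITE payload.
        dial_out(target)               # stands in for AddUserDialOut
        return "AddUserDialOut"
    if 200 <= status < 300:
        return "App-INVITE"
    return "failed"


dialed = []
result = invite_with_fallback(lambda t: 415, dialed.append, "sip:carol@example.com")
print(result, dialed)
```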
has worked in the communications space for 15 years, and has designed voice protocols, user experiences, and more recently the Communicator voice and Conferencing experience for Office Communicator 2007 and R2. He currently works as Lead Program Manager on the Office Communicator team. He can be reached for comments at