Understanding Unified Messaging Architecture

[This is pre-release documentation and subject to change in future releases. This topic's current status is: Writing.]

Applies to: Exchange Server 2010
Topic Last Modified: 2008-12-10

When you install the Unified Messaging (UM) server role on a computer running Microsoft Exchange Server 2010, several UM-specific components and services are installed. The Unified Messaging services and components installed by Setup enable a Unified Messaging server to answer and process incoming voice calls and enable users to interact with the Unified Messaging system by using Outlook Voice Access or by hearing a UM auto attendant when they call in to the Unified Messaging system. This topic discusses the interaction between these Unified Messaging components and services and how the services and components provide the features offered by Unified Messaging.

Overview of Unified Messaging Services

The features and components of Unified Messaging rely on the functionality of two Exchange 2010 services: the Microsoft Exchange Unified Messaging service (UMservice.exe) and the Microsoft Exchange Speech Engine service (SpeechService.exe). The Service Control Manager controls and monitors both of these services and their related processes.

The Microsoft Exchange Unified Messaging service lets voice messages be stored in an Exchange 2010 mailbox and gives users telephone access to e-mail, voice mail, calendar, and contacts. If you stop this service, Unified Messaging features won't be available for users in your organization. For the Microsoft Exchange Unified Messaging service to work, the Microsoft Exchange Speech Engine service must already be started and functioning correctly.

The Microsoft Exchange Speech Engine service controls the following:

  • The dual tone multi-frequency (DTMF), also known as touchtone, interface
  • Automatic Speech Recognition (ASR) that is used with the Voice User Interface (VUI) in Outlook Voice Access
  • The Text-to-Speech (TTS) engine that reads e-mail, voice mail, and calendar items and plays the menu prompts for callers

When the Microsoft Exchange Unified Messaging service and Microsoft Exchange Speech Engine service are starting, they each create their own worker processes: the UM worker process (UMWorkerProcess.exe) and the Speech Engine service worker process (SESWorker.exe). Each UM worker process enables the Microsoft Exchange Unified Messaging service and the Microsoft Exchange Speech Engine service to interact to provide Outlook Voice Access and call answering. The Speech Engine service worker process provides the TTS engine features, lets callers use both Outlook Voice Access interfaces, and plays the system prompts for callers. For more information about Outlook Voice Access, see Understanding Unified Messaging Subscriber Access. For more information about Unified Messaging system prompts, see Understanding Unified Messaging Audio Prompts.

The following figure illustrates the relationships between Unified Messaging components.

Figure: Unified Messaging architecture

Service Ports

The Microsoft Exchange Unified Messaging service and the UM worker process use multiple Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) service ports to communicate with IP gateways and with the Speech Engine service worker process that the Microsoft Exchange Speech Engine service creates at startup. The Microsoft Exchange Unified Messaging service and the UM worker process use Session Initiation Protocol (SIP) over TCP. By default, the Microsoft Exchange Unified Messaging service listens on TCP port 5060 in unsecured mode and on TCP port 5061 when mutual Transport Layer Security (TLS) is used, and it can listen on both ports at the same time. Each UM worker process that is created listens on ports 5065 and 5067 (unsecured) and ports 5066 and 5068 (secured). However, when an IP gateway or IP PBX sends Real-time Transport Protocol (RTP) traffic to the Speech Engine service worker process, the IP gateway or IP PBX uses a valid UDP port in the range 1024 through 65535.
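
As a quick reference, the default ports described above can be captured in a small lookup. The following Python sketch is illustrative only: the dictionary layout and the describe_port function are assumptions made for this example and are not part of Exchange. Only the port numbers come from this topic.

```python
# Default ports taken from the text above. The names used here are
# hypothetical and exist only for this example.
UM_SERVICE_PORTS = {5060: "unsecured", 5061: "secured"}
UM_WORKER_PORTS = {
    5065: "unsecured",
    5067: "unsecured",
    5066: "secured",
    5068: "secured",
}
RTP_UDP_RANGE = range(1024, 65536)  # 1024 through 65535, inclusive


def describe_port(protocol: str, port: int) -> str:
    """Return a short description of how a port relates to the UM defaults."""
    protocol = protocol.lower()
    if protocol == "tcp":
        if port in UM_SERVICE_PORTS:
            return f"UM service SIP listener ({UM_SERVICE_PORTS[port]})"
        if port in UM_WORKER_PORTS:
            return f"UM worker process SIP listener ({UM_WORKER_PORTS[port]})"
    if protocol == "udp" and port in RTP_UDP_RANGE:
        return "possible RTP media port"
    return "not a default UM port"
```

A lookup like this can help when you review firewall rules between an IP gateway and a Unified Messaging server, for example describe_port("tcp", 5061).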

A TCP control port is also used on a Unified Messaging server. When a UM worker process is created, the Microsoft Exchange Unified Messaging service passes the appropriate configuration options to the UM worker process. The configuration options sent include the TCP control port number that is used for communication between the Microsoft Exchange Unified Messaging service and the UM worker process. The TCP control port that is chosen is in the range 16000 through 17000.
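
This topic does not describe how the control port is selected within the range. The following Python sketch assumes a simple "first free port" policy and inclusive range bounds, purely to illustrate the idea of the service choosing a port and handing it to the worker process as a configuration option. The function names and the option key are hypothetical.

```python
CONTROL_PORT_MIN = 16000
CONTROL_PORT_MAX = 17000  # assumed to be inclusive for this example


def choose_control_port(ports_in_use):
    """Return the first port in the control range that is not in use.

    The selection policy is an assumption; the real service may choose
    the port differently.
    """
    for port in range(CONTROL_PORT_MIN, CONTROL_PORT_MAX + 1):
        if port not in ports_in_use:
            return port
    raise RuntimeError("no free TCP control port in the configured range")


def build_worker_options(ports_in_use):
    """Build the (hypothetical) options passed to a new UM worker process."""
    return {"control_port": choose_control_port(ports_in_use)}
```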

Unified Messaging Services

The Microsoft Exchange Unified Messaging service is one of the two services that provide Unified Messaging services for your network. The Microsoft Exchange Unified Messaging service performs the following functions:

  • Retrieves the dial plan configuration from the Active Directory directory service
  • Loads the configuration information for monitoring UM worker processes from the UmRecyclerConfig.xml file
  • Initializes the UM Worker Process Manager and the startup of a UM worker process
  • Registers SIP endpoints

The Microsoft Exchange Unified Messaging service first accepts all incoming connections, and then reroutes those requests to a UM worker process that handles the incoming request. In addition, the Microsoft Exchange Unified Messaging service monitors any UM worker process that is created and ensures that the UM worker process is functioning correctly. If a UM worker process becomes unresponsive, the Microsoft Exchange Unified Messaging service stops the UM worker process, and then creates a new UM worker process to replace it.

Note

By default, each UM worker process is recycled every seven days (604,800 seconds). The setting can be found in the \bin\UmRecyclerConfig.xml file.
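
The monitoring and recycling behavior can be pictured as a small supervisor loop. The following Python sketch is a simplified model, not the actual implementation: the Worker and WorkerSupervisor classes are hypothetical, and time is passed in as a plain number of seconds so the logic is easy to follow. It also confirms the arithmetic behind the default interval (7 days x 24 hours x 60 minutes x 60 seconds = 604,800 seconds).

```python
from dataclasses import dataclass

# Seven days expressed in seconds: 7 * 24 * 60 * 60 = 604800
RECYCLE_INTERVAL_SECONDS = 7 * 24 * 60 * 60


@dataclass
class Worker:
    """Hypothetical stand-in for a UM worker process."""
    worker_id: int
    started_at: float
    responsive: bool = True


class WorkerSupervisor:
    """Hypothetical model of the service monitoring one worker process."""

    def __init__(self, now: float = 0.0):
        self._next_id = 1
        self.worker = self._spawn(now)

    def _spawn(self, now: float) -> Worker:
        worker = Worker(self._next_id, now)
        self._next_id += 1
        return worker

    def check(self, now: float):
        """Replace the worker if needed and return the reason, else None."""
        if not self.worker.responsive:
            reason = "unresponsive"
        elif now - self.worker.started_at >= RECYCLE_INTERVAL_SECONDS:
            reason = "recycle interval reached"
        else:
            return None
        # Stop the old worker (modeled by discarding it) and create a new one.
        self.worker = self._spawn(now)
        return reason
```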

The Microsoft Exchange Unified Messaging service works with the Microsoft Exchange Speech Engine service to implement all the telephony features offered by Unified Messaging. The Microsoft Exchange Unified Messaging service handles call control and interacts with the Microsoft Exchange Speech Engine service to handle the incoming media streams that are negotiated in the SIP signaling information between the Microsoft Exchange Unified Messaging service and a SIP-enabled telephony device such as an IP gateway or IP PBX. The following events happen when the Microsoft Exchange Unified Messaging service receives an incoming call:

  1. A call session is initiated by the Microsoft Exchange Unified Messaging service.
  2. The Microsoft Exchange Unified Messaging service redirects the call to a UM worker process.
  3. The UM worker process requests that a media session be established with the Microsoft Exchange Speech Engine service, and then the UM worker process relays the media information back to the caller.
  4. The Speech Engine service worker process that is created by the Microsoft Exchange Speech Engine service provides a UDP port for the RTP stream.
  5. The UM worker process uses the SIP signaling information to inform the Speech Engine service worker process to end the call session when the RTP media stream is no longer needed.
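
The five steps above can be traced in a short simulation. The following Python sketch only records which component acts at each step, in the order listed. The CallSession class, the handle_incoming_call function, and the example port value are hypothetical and are not Exchange APIs.

```python
from dataclasses import dataclass, field


@dataclass
class CallSession:
    """Hypothetical record of one call and the events that occurred."""
    rtp_port: int
    active: bool = False
    events: list = field(default_factory=list)

    def log(self, actor: str, action: str) -> None:
        self.events.append((actor, action))


def handle_incoming_call(rtp_port: int) -> CallSession:
    """Walk through the five steps listed above for a single call."""
    session = CallSession(rtp_port)
    # Step 1: the UM service initiates the call session.
    session.log("UM service", "initiate call session")
    session.active = True
    # Step 2: the UM service redirects the call to a UM worker process.
    session.log("UM service", "redirect call to UM worker process")
    # Step 3: the worker requests a media session and relays media info.
    session.log("UM worker process",
                "request media session and relay media information to caller")
    # Step 4: the Speech Engine service worker process provides a UDP port.
    session.log("Speech Engine service worker process",
                f"provide UDP port {rtp_port} for the RTP stream")
    # Step 5: the worker signals the end of the session.
    session.log("UM worker process", "signal end of call session")
    session.active = False
    return session
```

For example, handle_incoming_call(30000).events returns the five (component, action) pairs in order, which can be useful when comparing against a SIP trace.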

UM Worker Process

A UM worker process is created during the startup of the Microsoft Exchange Unified Messaging service. UM worker processes handle all incoming and outgoing requests received by the Microsoft Exchange Unified Messaging service.

The UM Worker Process Manager is also a component of the Microsoft Exchange Unified Messaging service. The UM Worker Process Manager handles the creation and monitoring of all the UM worker processes that are created. The UM Worker Process Manager creates new instances of a UM worker process based on the configuration settings located in the UmRecyclerConfig.xml file and also monitors the health of these processes. As a new incoming call arrives, the UM Worker Process Manager determines the appropriate instance of a UM worker process to which to redirect the call. The UM worker process then interacts with the Microsoft Exchange Speech Engine service components to correctly process incoming and outgoing requests. The UM worker process is responsible for the following startup tasks:

  • Allocation of the runtime management objects
  • Loading of the Unified Messaging configuration from UMConfig.xml
  • Registration of the process with the Microsoft Exchange Speech Engine service
  • Initialization of Simple Mail Transfer Protocol (SMTP) message submission

For more information about Voice over IP (VoIP) security in Unified Messaging, see Understanding Unified Messaging VoIP Security.

Microsoft Exchange Speech Services

The Microsoft Exchange Speech Engine service is an embedded speech engine that is installed when you install the Unified Messaging server role. The Microsoft Exchange Speech Engine service is an Interactive Voice Response (IVR) platform that provides speech recognition to interpret user input and also provides TTS capabilities.

The applications in an IVR platform communicate with end users through a telephony or VoIP network. The Microsoft Exchange Speech Engine service supports SIP and RTP for telephony connectivity, and it also supports TLS. For Unified Messaging, when an incoming call is received, the Microsoft Exchange Speech Engine service processes the RTP stream associated with the call, and then passes the information and events to the UM worker process that is managing the SIP connection. The Microsoft Exchange Speech Engine service supports the following features in Unified Messaging:

  • ASR input recognition
  • DTMF, or touchtone, input recognition
  • The TTS conversion process
  • Recording e-mail and voice mail messages
  • Playing e-mail and voice mail messages to the user

For more information about ASR, see Understanding Automatic Speech Recognition Directory Lookups. For more information about the TTS engine, see Understanding Unified Messaging Audio Prompts.

When the Microsoft Exchange Speech Engine service is starting, it creates the Speech Engine service worker process. During call flow, the Speech Engine service worker process is responsible for recognizing touchtone or voice input from the user. For example, if a caller uses ASR, or voice inputs, to navigate the main menu, the following steps occur:

  1. An Outlook Voice Access user calls a subscriber access number and logs on to their mailbox, or an outside caller dials a number that is configured for a UM auto attendant. The caller then uses ASR, or voice inputs, to navigate the main menu.
  2. When a call is received by a Unified Messaging server, the Unified Messaging server determines whether the menu is speech-enabled. If the menu is speech-enabled, the Unified Messaging server uses specific prompts and grammars.
  3. The UM worker process notifies the Speech Engine service worker process to begin recognition based on the grammar file that is needed. For this example, the main menu is needed, so the Speech Engine service worker process loads the mainmenu.grxml file. The Microsoft Exchange Speech Engine service plays the main menu prompts over the telephone to the Outlook Voice Access user.
  4. For example, the user may respond by saying "e-mail". The voice traffic that is created is sent over an RTP stream and is received by the Speech Engine service worker process. The Speech Engine service worker process, which has already loaded the mainmenu.grxml file, compares the voice recognition results to the contents in the file. The result is sent to the UM worker process.
  5. The UM worker process determines what transition to make based on the results from the Speech Engine service worker process. For this example, the next transition state is to play the menu of e-mail options to the user.
  6. The correct activity manager is loaded into memory for playing the e-mail menu. The corresponding grammar file for the e-mail menu, which is email.grxml, is then loaded by the Speech Engine service worker process.
  7. The UM worker process sends a request to the Microsoft Exchange Speech Engine service to play the corresponding prompts for the e-mail menu.
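
The steps above describe a small state machine: the UM worker process tracks the current menu, and the Speech Engine service worker process matches speech against whichever grammar file is loaded. The following Python sketch models that interaction under simplifying assumptions. The class names are hypothetical, the grammar contents are reduced to a phrase-to-result dictionary (real .grxml files are XML grammars), and the only phrase taken from this topic is "e-mail"; the "main menu" phrase is invented for the example.

```python
# Which grammar file backs each menu (file names are from the text above).
MENU_GRAMMAR_FILE = {"main": "mainmenu.grxml", "email": "email.grxml"}

# Greatly simplified, hypothetical grammar contents: phrase -> next menu.
GRAMMAR_PHRASES = {
    "mainmenu.grxml": {"e-mail": "email"},
    "email.grxml": {"main menu": "main"},
}


class SpeechWorker:
    """Hypothetical model of the Speech Engine service worker process."""

    def __init__(self):
        self.loaded = None

    def load_grammar(self, filename: str) -> None:
        if filename not in GRAMMAR_PHRASES:
            raise KeyError(f"unknown grammar file: {filename}")
        self.loaded = filename

    def recognize(self, utterance: str):
        """Compare the utterance to the loaded grammar; None if no match."""
        return GRAMMAR_PHRASES[self.loaded].get(utterance.strip().lower())


class UMWorker:
    """Hypothetical model of the UM worker process menu transitions."""

    def __init__(self, speech: SpeechWorker):
        self.speech = speech
        self.menu = "main"
        self.speech.load_grammar(MENU_GRAMMAR_FILE[self.menu])

    def on_utterance(self, utterance: str) -> str:
        result = self.speech.recognize(utterance)
        if result is None:
            return self.menu  # no match: stay on the current menu
        self.menu = result
        self.speech.load_grammar(MENU_GRAMMAR_FILE[result])
        return self.menu
```

In this model, saying "e-mail" at the main menu moves the caller to the e-mail menu and causes email.grxml to be loaded, which mirrors steps 4 through 6.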

For more information about the grammar files used in Unified Messaging, see Understanding Automatic Speech Recognition Directory Lookups.

A similar series of events happens when a caller is using DTMF, or touchtone, inputs to navigate the menus. Handling of DTMF input resembles handling voice inputs, except that the Speech Engine service worker process notifies the UM worker process when DTMF events are detected in the RTP stream. The data that is passed by this event corresponds to the number pressed by the caller. For more information about the DTMF interface, see Understanding the DTMF Interface.
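
The DTMF notification path can be sketched in the same way. The following Python example assumes that DTMF digits arrive as numeric telephone-event codes in the RTP stream, using the standard event numbering (0 through 9 for digits, 10 for *, 11 for #, 12 through 15 for A through D). This topic does not state which DTMF transport is used, so treat that as an assumption. The class and function names are hypothetical.

```python
# Standard telephone-event codes for DTMF keys (assumed transport).
DTMF_EVENT_TO_KEY = {code: str(code) for code in range(10)}
DTMF_EVENT_TO_KEY.update({10: "*", 11: "#", 12: "A", 13: "B", 14: "C", 15: "D"})


class UMWorkerStub:
    """Hypothetical UM worker process that collects key presses."""

    def __init__(self):
        self.keys = []

    def on_dtmf(self, key: str) -> None:
        self.keys.append(key)


def process_rtp_events(event_codes, notify) -> None:
    """Notify the UM worker for each DTMF event detected in the stream.

    Codes that are not DTMF keys are ignored.
    """
    for code in event_codes:
        key = DTMF_EVENT_TO_KEY.get(code)
        if key is not None:
            notify(key)
```

For example, after process_rtp_events([1, 2, 11], worker.on_dtmf), the worker has received the keys 1, 2, and #.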

For More Information

For an overview of Unified Messaging, see Unified Messaging.

For more information about telephony concepts and components, see Understanding Telephony Concepts and Components.