The Microsoft Exchange Speech Engine service is an embedded speech engine that is installed when you install the Unified Messaging server role. It is an Interactive Voice Response (IVR) platform that provides speech recognition, which is used to recognize user input, and Text-to-Speech (TTS) capabilities.
The applications in an IVR platform communicate with end users through a telephony or VoIP network. The Microsoft Exchange Speech Engine service supports SIP and RTP for telephony connectivity, and it also supports TLS. For Unified Messaging, when an incoming call is received, the Microsoft Exchange Speech Engine service processes the RTP stream that is associated with the call, and then passes the information and events to the UM worker process that is managing the SIP connection. The Microsoft Exchange Speech Engine service supports the following features in Unified Messaging:
- Automatic Speech Recognition (ASR) input recognition
- DTMF, or touchtone, input recognition
- The TTS conversion process
- Recording e-mail and voice mail messages
- Playing e-mail and voice mail messages to the user
For more information about Automatic Speech Recognition, see Understanding Automatic Speech Recognition Directory Lookups. For more information about the TTS engine, see Understanding Unified Messaging Audio Prompts.
When the Microsoft Exchange Speech Engine service starts, it creates the Speech Engine service worker process. During call flow, the Speech Engine service worker process is responsible for recognizing touchtone or voice input from the user. For example, if a caller uses ASR, or voice inputs, to navigate the main menu, the following steps occur:
- An Outlook Voice Access user calls a subscriber access number and logs on to their mailbox, or an outside caller dials a number that is configured for a UM auto attendant. The caller uses ASR, or voice inputs, to navigate the main menu.
- When the call is received by a Unified Messaging server, the Unified Messaging server determines whether the menu is speech-enabled. If the menu is speech-enabled, the Unified Messaging server uses specific prompts and grammars.
- The UM worker process notifies the Speech Engine service worker process to begin recognition based on the grammar file that is needed. For this example, the main menu is needed. Therefore, the Speech Engine service worker process loads the mainmenu.grxml file. The Microsoft Exchange Speech Engine service plays the main menu prompts over the telephone to the Outlook Voice Access user.
- The user responds, for example, by saying “e-mail”. The voice traffic that is created is sent over an RTP stream and is received by the Speech Engine service worker process. The Speech Engine service worker process, which has already loaded the mainmenu.grxml file, compares the voice recognition results to the contents of the file. The result is sent to the UM worker process.
- The UM worker process determines what transition to make based on the results from the Speech Engine service worker process. For this example, the next transition state is to play the menu of e-mail options to the user.
- The correct activity manager is loaded into memory for playing the e-mail menu. The corresponding grammar file for the e-mail menu, which is email.grxml, is then loaded by the Speech Engine service worker process.
- The UM worker process sends a request to the Microsoft Exchange Speech Engine service to play the corresponding prompts for the e-mail menu.
For more information about the grammar files that are used in Unified Messaging, see Understanding Automatic Speech Recognition Directory Lookups.
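The division of work in the steps above can be illustrated with a small Python model. This is only a sketch and not how Unified Messaging is implemented: the class names, the dictionaries that stand in for grammar files, and the transition table are all hypothetical, and real .grxml files are XML documents rather than Python mappings. It shows the Speech Engine service worker process matching input against the grammar that is currently loaded, and the UM worker process choosing the next menu from the result.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-ins for grammar files: each maps a phrase the
# recognizer may hear to a result name. Only the two file names come
# from the text; the phrases and results are invented for illustration.
GRAMMARS = {
    "mainmenu.grxml": {"e-mail": "email", "voice mail": "voicemail"},
    "email.grxml": {"next": "next_message", "delete": "delete_message"},
}

# Hypothetical transition table owned by the UM worker:
# (current menu, recognition result) -> next menu.
TRANSITIONS = {("main", "email"): "email"}

# Which grammar file each menu needs.
MENU_GRAMMAR = {"main": "mainmenu.grxml", "email": "email.grxml"}


@dataclass
class SpeechEngineWorker:
    """Models the Speech Engine service worker process."""
    loaded_grammar: Optional[str] = None
    played: list = field(default_factory=list)

    def load_grammar(self, name: str) -> None:
        self.loaded_grammar = name

    def play_prompts(self, menu: str) -> None:
        # A real engine would stream audio; here we only record the request.
        self.played.append(menu)

    def recognize(self, utterance: str) -> Optional[str]:
        """Compare the utterance with the loaded grammar; None if no match."""
        return GRAMMARS[self.loaded_grammar].get(utterance.strip().lower())


@dataclass
class UMWorker:
    """Models the UM worker process that decides menu transitions."""
    engine: SpeechEngineWorker
    menu: str = "main"

    def enter_menu(self, menu: str) -> None:
        # Load the grammar for the menu, then ask for its prompts.
        self.menu = menu
        self.engine.load_grammar(MENU_GRAMMAR[menu])
        self.engine.play_prompts(menu)

    def handle_utterance(self, utterance: str) -> Optional[str]:
        result = self.engine.recognize(utterance)
        next_menu = TRANSITIONS.get((self.menu, result))
        if next_menu is not None:
            self.enter_menu(next_menu)
        return result
```

In this model, calling `enter_menu("main")` and then `handle_utterance("e-mail")` leaves the UM worker in the e-mail menu with email.grxml loaded, which mirrors the example in the steps. An utterance that the loaded grammar does not contain returns `None` and causes no transition.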
A similar series of events occurs when a caller uses DTMF, or touchtone, inputs to navigate the menus. Handling of DTMF input resembles handling of voice input, except that the Speech Engine service worker process notifies the UM worker process when DTMF events are detected in the RTP stream. The data that is passed by this event corresponds to the number pressed by the caller. For more information about the DTMF interface, see Understanding the DTMF Interface.
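The text does not say how DTMF events are encoded in the RTP stream. One common encoding is the telephone-event payload defined in RFC 4733 (which replaced RFC 2833), and the following Python sketch assumes that encoding purely for illustration. The function and type names are hypothetical. It decodes the 4-byte payload into the key that was pressed, which is the kind of data the event would carry to the UM worker process.

```python
import struct
from typing import NamedTuple

# DTMF event codes 0-15 as assigned in RFC 4733: digits, *, #, A-D.
DTMF_KEYS = "0123456789*#ABCD"


class DtmfEvent(NamedTuple):
    key: str        # key pressed by the caller
    end: bool       # True when the E (end of event) bit is set
    volume: int     # 6-bit volume field
    duration: int   # duration in RTP timestamp units


def parse_telephone_event(payload: bytes) -> DtmfEvent:
    """Decode one RFC 4733 telephone-event payload (assumed encoding)."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    # Layout: event (8 bits), E bit + R bit + volume (8 bits), duration (16 bits),
    # all in network byte order.
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(DTMF_KEYS):
        raise ValueError(f"not a DTMF event code: {event}")
    return DtmfEvent(
        key=DTMF_KEYS[event],
        end=bool(flags & 0x80),
        volume=flags & 0x3F,
        duration=duration,
    )
```

For example, `parse_telephone_event(bytes([5, 0x8A, 0x03, 0x20]))` yields the key `"5"` with the end bit set, a volume of 10, and a duration of 800 timestamp units.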