Capacity Guidelines

08/28/2007

Published: June 20, 2005 | Updated: March 29, 2006

Capacity Planning

By carefully planning the capacity of your Microsoft Speech Server (MSS) deployment, you can ensure that your system's telephony and speech recognition (SR) resources are adequately sized to meet expected demands. You can roughly estimate the number of computers you will need by factoring in information about the type of application you are using and the topology of the deployment. As soon as you have a rough estimate of the number of computers (or nodes) needed, you can fine-tune the deployment by testing the system under the load of expected call volume to discover the actual capacity and performance numbers that you need.

In general, your system's capacity is constrained by the performance characteristics you expect from each computer in your MSS deployment. As port density increases (that is, as the number of simultaneous calls increases), computer performance tends to decrease. Therefore, you must strike a balance between port density and performance to determine the sizing requirements of your deployment.

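To estimate the sizing requirements of your deployment, follow these steps:

Step 1: Estimate Necessary Capacity
Step 2: Know the Application
Step 3: Choose the Topology
Step 4: Determine the Number of Nodes
Step 5: Plan for Failover
Step 6: Test the Deployment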

These steps represent a basic process for incorporating some of the general variables that affect capacity planning. However, capacity planning involves many variables, several of which are probably unique to your deployment. To begin capacity planning, we recommend that you work through the steps in this process to develop a basic view of your capacity needs. If necessary, Microsoft can then help you develop a more thorough capacity model that incorporates the unique characteristics of your deployment.

Step 1: Estimate Necessary Capacity

Start capacity planning by estimating how many channels of communication are required to support your business need. Typically, capacity needs are based on the known or planned call volume, and you can calculate call volume by using the Erlang traffic calculation. This calculation method is a well-known way of determining how many telephony channels are required to support call center traffic, and it gives you a good starting point for your capacity planning.

An Erlang is a measure of traffic through telephony equipment. One Erlang equals one continuous call for 3600 seconds (the number of seconds in one hour). To determine the number of Erlangs, multiply the number of calls during the busy hour (B) by the average call length in seconds (L), and then divide by 3600:

Erlangs = (B*L)/3600

For example, if you estimate that your deployment receives 1000 calls during the busy hour and the average length of a call is two minutes (or 120 seconds), the estimated number of Erlangs is determined in the following way:

(1000*120)/3600 = 33.3 Erlangs
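
The calculation above can be sketched in Python as follows. The function name `erlangs` and its parameter names are illustrative only and are not part of MSS.

```python
def erlangs(busy_hour_calls: int, avg_call_seconds: float) -> float:
    """Offered traffic in Erlangs: (B * L) / 3600."""
    return busy_hour_calls * avg_call_seconds / 3600


# The example from the text: 1000 busy-hour calls of 120 seconds each.
traffic = erlangs(1000, 120)
print(round(traffic, 1))  # 33.3
```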

To determine the number of required telephony channels, use the Erlang traffic calculation, which is a mathematical algorithm that incorporates the number of Erlangs you estimated and a percentage of blocked calls (or busy signals) that you think is acceptable. Using the 33.3 Erlangs from the previous example and 1 percent call blockage, the Erlang traffic calculation would determine that your deployment requires 45 telephony channels.
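
As a rough sketch of what such a calculator does, the following Python code assumes the standard Erlang B blocking model (the text does not name the specific formula, so this choice is an assumption). The function name `channels_needed` is hypothetical. It adds channels one at a time until the predicted blocking probability drops to the acceptable level.

```python
def channels_needed(traffic_erlangs: float, max_blocking: float) -> int:
    """Smallest channel count whose Erlang B blocking is <= max_blocking.

    Assumes the Erlang B model: blocked calls are cleared, not queued.
    max_blocking is a fraction between 0 and 1 (0.01 means 1 percent).
    """
    if not 0 < max_blocking < 1:
        raise ValueError("max_blocking must be between 0 and 1")
    channels = 0
    blocking = 1.0  # with zero channels, every call is blocked
    while blocking > max_blocking:
        channels += 1
        # Standard Erlang B recurrence: B(n) = A*B(n-1) / (n + A*B(n-1))
        blocking = traffic_erlangs * blocking / (channels + traffic_erlangs * blocking)
    return channels


# The example from the text: 33.3 Erlangs at 1 percent call blockage.
print(channels_needed(1000 * 120 / 3600, 0.01))  # 45
```

Treat the result as a starting estimate; an online Erlang calculator should give the same figure for the same inputs.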

To understand more about the Erlang traffic calculation and how to use it, see a Microsoft sales representative or use one of the online Erlang calculators, such as the ones available at https://www.erlang.com.

Spoken Language Requirements

The spoken languages used in your application affect your overall capacity planning. If your application is deployed in multiple languages, you must consider the capacity needs of each language separately. For example, if your application uses English (U.S.) and Spanish (U.S.), you must estimate the expected call volumes for each language separately to determine the total number of telephony channels required to support calls in both languages. If you determine that you need 24 channels for English-speaking callers and 24 channels for Spanish-speaking callers, you will need a total of 48 channels.
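
The per-language arithmetic can be sketched as follows. The dictionary and function names are illustrative, and the channel counts are the ones from the example above.

```python
def total_channels(channels_by_language: dict[str, int]) -> int:
    """Sum channel estimates that were made separately for each language."""
    return sum(channels_by_language.values())


estimates = {"English (U.S.)": 24, "Spanish (U.S.)": 24}
print(total_channels(estimates))  # 48
```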

Step 2: Know the Application

The type of application you deploy affects the capacity of the computers or nodes in your deployment. The amount and complexity of speech recognition and text-to-speech (TTS) used by an application has a direct influence on system performance. For the purposes of estimating, categorize your application as one of the following three types:

Dual Tone Multi-Frequency (DTMF)—This type of application plays pre-recorded prompts and collects digits from a caller's handset in response to prompts. This application type can use TTS to play back small amounts of dynamic data, such as bank balances or callers' names. DTMF applications have a relatively low burden on the speech recognition engine and put a lower overall load on system resources.
Average Speech Recognition—This type of application plays pre-recorded prompts and performs moderate levels of speech recognition with small or variable grammars. This application type can also use TTS to play back relatively small amounts of dynamic data, such as bank balances or callers' names. Average Speech Recognition applications burden the speech recognition engine more than DTMF applications, but because of the moderately sized grammars, overall performance can be better than the performance of applications that have larger grammars.
Complex Speech Recognition—This type of application plays pre-recorded or TTS prompts and performs speech recognition with large grammars (grammars with more than 25,000 items). This application type can also use TTS to play back large amounts of dynamic data, such as reading back e-mail messages. Complex Speech Recognition applications put the most burden on the speech recognition and TTS resources.

Step 3: Choose the Topology

Choose a topology that fits with your capacity needs. You can deploy MSS in the following topologies:

Standard Edition—In this topology, Telephony Application Services (TAS) and Speech Engine Services (SES) run on the same node. This MSS edition is limited to 24 telephony channels per node. This can be the best choice for small deployments that require fewer than 24 channels of capacity. If you need more than 24 channels per node, consider choosing one of the Enterprise Edition topologies.
Enterprise Edition Single Server—In this topology, TAS and SES run on the same node, but the system is not limited to 24 telephony channels. Depending on your requirements, this topology might be the best choice to achieve a capacity of more than 24 channels but on fewer overall nodes.
Enterprise Edition Distributed—In this topology, TAS and SES run on separate nodes. This MSS topology requires a minimum of two computers—one for TAS and one for SES. The advantage of this topology is that it provides the greatest flexibility in planning the size, proximity, and availability of your deployment.
- Size—The TAS and SES nodes scale differently depending on your capacity requirements and the resource needs of the application. For example, if your deployment needs a lot of channels, you may need fewer overall TAS nodes if you use this topology than if you use the Standard Edition or the Enterprise Edition Single Server topology. Fewer servers and telephony boards may result in lower cost overall.
- Proximity—With TAS and SES on separate nodes, this topology provides more flexibility in the physical location of the resources. You can locate TAS nodes in proximity to other telephony facilities and SES nodes in proximity to other computer resources that are similarly managed.
- Availability—Separating the TAS and SES nodes has failover planning benefits in that node failure does not necessarily affect the availability of both types of resources. For example, SES node failure does not necessarily affect the availability of the TAS resources in the deployment because each type of resource is on a separate computer. In the Standard Edition and Enterprise Edition Single Server, node failure affects both types of resources because TAS and SES are running on the same computer.

For more information about MSS topologies, see Standard Edition Topology and Enterprise Edition Topology.

Step 4: Determine the Number of Nodes

The capacity of a particular node is determined by the type of application, the type of topology, and the performance requirements of the system under load. To estimate the sizing requirement of your deployment, determine the number of nodes you need based on the capacity of the system running your type of application.

To determine the number of nodes required for your deployment, first use the table in Estimated Capacity per Node to find the capacity of a single node in your MSS topology. Then divide the total capacity needed for your deployment (from Step 1: Estimate Necessary Capacity) by the per-node capacity found in the table. For example, if you have a Complex Speech Recognition application and need a maximum capacity of 96 channels, you can determine the required number of nodes in this way:

For MSS Standard Edition - 96 channels / 24 channels per node = 4 nodes
For Enterprise Edition Single Server - 96 channels / 41 channels per node = 2.34, rounded up to 3 nodes
For Enterprise Edition Distributed - 96 channels / 96 channels per TAS node = 1 TAS node and 96 channels / 48 channels per SES node = 2 SES nodes
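
The division above, with rounding up to whole nodes, can be sketched as follows. The per-node capacities (24, 41, 96, and 48 channels) are taken only from this example for a Complex Speech Recognition application; use the values from Estimated Capacity per Node for your own application type. The function name `nodes_needed` is hypothetical.

```python
def nodes_needed(required_channels: int, channels_per_node: int) -> int:
    """Round up, because a fraction of a node still requires a whole computer."""
    return -(-required_channels // channels_per_node)  # integer ceiling division


required = 96  # total channels from Step 1

print(nodes_needed(required, 24))  # Standard Edition: 4
print(nodes_needed(required, 41))  # Enterprise Edition Single Server: 3
print(nodes_needed(required, 96))  # Enterprise Edition Distributed, TAS: 1
print(nodes_needed(required, 48))  # Enterprise Edition Distributed, SES: 2
```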

In this example, the Enterprise Edition Single Server topology and the Enterprise Edition Distributed topology appear to use the same number of nodes overall (3). However, you can consolidate your telephony resources on fewer nodes (1) by using the Enterprise Edition Distributed topology. This difference may reduce overall cost of a deployment from a telephony standpoint, because the Distributed topology requires only one telephony board for the one TAS node, whereas all three of the Enterprise Edition Single Server nodes require telephony boards.

If your deployment supports multiple applications, consider each application separately, and add together the results. By doing this, you can roughly estimate the cumulative effect of hosting multiple applications in your deployment.

Call Transfer Considerations

If your application transfers calls to a third party, you should consider the impact of that functionality on the overall capacity needs of your deployment. Call transfers usually require an additional outbound telephony channel to connect to the third party and make the transfer. The manner in which the call transfer is handled depends on many factors related to the type of telephony equipment you use. To estimate the call transfer requirements, you should consider the type of equipment you use and the percentage of calls that would result in call transfers. This should help you determine how many additional channels you need to handle those outbound calls.

Note: Even though call transfers require additional physical telephony channels to handle the outbound calls, they do not affect the capacity of the TAS resources in MSS. Transferring a call does not require an additional TAS instance to open the outbound channel and complete the transfer. Therefore, disregard call transfer requirements when determining the number of nodes needed in the various MSS topologies. You need to consider call transfer requirements only when determining the number of physical channels needed.
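
One possible way to estimate the outbound traffic from transfers is to apply the same Erlang formula from Step 1 to the transferred calls only. This is a sketch under stated assumptions: the 10 percent transfer rate and the 60 seconds that each transfer holds an outbound channel are hypothetical figures, and the actual hold time depends on how your telephony equipment handles transfers.

```python
def transfer_erlangs(busy_hour_calls: int,
                     transfer_fraction: float,
                     outbound_seconds: float) -> float:
    """Outbound traffic in Erlangs generated by transferred calls.

    transfer_fraction: share of calls that are transferred (0.10 means 10 percent).
    outbound_seconds: assumed time each transfer occupies an outbound channel.
    """
    return busy_hour_calls * transfer_fraction * outbound_seconds / 3600


# Hypothetical inputs: 1000 busy-hour calls, 10 percent transferred, 60 seconds each.
print(round(transfer_erlangs(1000, 0.10, 60), 2))  # 1.67
```

The resulting Erlang figure can then be converted to a number of outbound channels with an Erlang calculator, in the same way as the inbound traffic.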

Load Balancing Considerations

Multiple MSS nodes must be load-balanced to use the telephony and speech recognition resources effectively and efficiently. TAS nodes in the Enterprise Edition Distributed topology, as well as Standard Edition nodes and Enterprise Edition Single Server nodes, must be load-balanced by the Private Branch Exchange (PBX) switch. In other words, the PBX that is used to provide the telephony channels for incoming calls must be configured to allocate the calls to all nodes in a balanced way. SES nodes in the Enterprise Edition Distributed topology must be load-balanced using a hardware-based load balancer or software-based load balancing based on Windows Network Load Balancing (NLB). For more information, see Load Balancing and Availability and Load Balancing Microsoft Speech Server 2004 Enterprise Edition in the September 2004 issue of Microsoft Speech Server Newsletter.

Step 5: Plan for Failover

Your capacity planning should include a contingency plan for node failure. Failover planning varies depending on the topology you use and the level of availability you need. Minimally, you should plan for at least a single-point-of-failure scenario in which the capacity needs of your application are manageable when one node in your deployment fails. For MSS Standard Edition and Enterprise Edition Single Server topologies, you should have at least one more node than indicated in Step 4: Determine the Number of Nodes. In the Enterprise Edition Distributed topology, you should plan to have at least one additional TAS node and one additional SES node to handle failover for each type of MSS resource.
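
Applied to the 96-channel example in Step 4, the minimum "one more node" rule might look like the following sketch. The function name and the dictionary layout are illustrative, and the node counts are the ones from that example.

```python
def nodes_with_failover(nodes_for_capacity: int, spare_nodes: int = 1) -> int:
    """Add spare nodes so that capacity is still met when one node fails."""
    return nodes_for_capacity + spare_nodes


# Node counts from the Step 4 example (Complex Speech Recognition, 96 channels).
capacity_plan = {
    "Standard Edition": 4,
    "Enterprise Edition Single Server": 3,
    "Enterprise Edition Distributed (TAS)": 1,
    "Enterprise Edition Distributed (SES)": 2,
}

for role, nodes in capacity_plan.items():
    print(f"{role}: {nodes_with_failover(nodes)} nodes")
```

In the Distributed topology the spare is added to each resource type separately, so the example grows from 3 nodes to 5 (2 TAS and 3 SES).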

Step 6: Test the Deployment

To ensure that your deployment has been sized appropriately for the capacity you need, test it. To test your deployment, you must use the application that you are planning to put into production, or use an application that closely approximates the resource requirements of the final production application. You also need a way to generate calls at the load you expect during peak times. You can use a system to generate the load, such as the Empirix Hammer test system, or generate the calls using some other means. It is unlikely that your deployment will be under this load at all times. The intention of the testing is to determine how the system performs under the heaviest expected call volume with the assumption that a lighter load will bring better performance.

It is very important that you determine the acceptable levels of performance for your deployment. You or your organization probably already have performance metrics that are monitored on other IT resources. Those levels are a good starting point for monitoring performance of your MSS resources. It is also important to monitor other metrics that are unique to speech applications, such as call blockage rate and User Perceived Latency (UPL).

As you monitor your deployment under load, aim to achieve the following performance levels:

CPU utilization of less than 70 percent. Keeping CPU utilization below 70 percent provides a good buffer for usage spikes and unexpected utilization demands.
UPL of less than two seconds. UPL is the amount of time between the end of the caller's speech and the start of the system response in the form of a prompt. UPL of more than two seconds begins to negatively impact the caller experience.
Call blockage rate of less than 5 percent. At this level, 95 percent or more of all calls are answered.

If your metrics fall outside the levels that are acceptable for you, adjust your deployment to bring the numbers in line with what you need. For example, if CPU utilization is running high for SES nodes, consider adding a node to your deployment to balance the load on SES resources and bring CPU utilization on all SES nodes down to an acceptable level.
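
The three targets listed above can be checked against measurements from a load test with a small script such as the following sketch. The class and field names are hypothetical; MSS does not expose metrics under these names, so you would fill in the values from your own monitoring tools. A sample passes only if it is strictly below each limit, matching the "less than" wording of the targets.

```python
from dataclasses import dataclass

# Target levels from the list above.
MAX_CPU_PCT = 70.0
MAX_UPL_SECONDS = 2.0
MAX_BLOCKAGE_PCT = 5.0


@dataclass
class LoadTestSample:
    cpu_utilization_pct: float
    upl_seconds: float
    call_blockage_pct: float


def violations(sample: LoadTestSample) -> list[str]:
    """Return the names of the targets that the sample fails to meet."""
    failed = []
    if sample.cpu_utilization_pct >= MAX_CPU_PCT:
        failed.append("CPU utilization")
    if sample.upl_seconds >= MAX_UPL_SECONDS:
        failed.append("User Perceived Latency")
    if sample.call_blockage_pct >= MAX_BLOCKAGE_PCT:
        failed.append("call blockage rate")
    return failed


# Hypothetical measurement: CPU is too high, the other two targets are met.
print(violations(LoadTestSample(82.0, 1.4, 0.8)))  # ['CPU utilization']
```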

For more information about performance metrics and monitoring MSS resources, see Log Analysis and Tuning with Microsoft Speech Server 2004.

For more information about performance and load testing applications in general, see Performance Testing for Application Blocks.