
Exchange 2010 Tested Solutions: 32,400 Mailboxes in Three Sites Running Hyper-V on Cisco Unified Computing System Blade Servers and EMC CLARiiON Storage


Topic Last Modified: 2012-03-05

Rob Simpson, Program Manager, Microsoft Exchange; Boris Voronin, Sr. Solutions Engineer, Exchange Solutions Engineering, EMC; Mike Mankovsky, Microsoft Solutions Architect, Cisco

June 2011

In Exchange 2010 Tested Solutions, Microsoft and participating server, storage, and network partners examine common customer scenarios and key design decision points facing customers who plan to deploy Microsoft Exchange Server 2010. Through this series of white papers, we provide examples of well-designed, cost-effective Exchange 2010 solutions deployed on hardware offered by some of our server, storage, and network partners.

You can download this document from the Microsoft Download Center.

Microsoft Exchange Server 2010 with Service Pack 1 (SP1)

Windows Server 2008 R2

Windows Server 2008 R2 Hyper-V


This document provides an example of how to design, test, and validate an Exchange Server 2010 solution running Windows Server 2008 R2 Hyper-V technology for a customer environment with 32,400 mailboxes deployed on Cisco Unified Computing System blade servers and EMC CLARiiON storage solutions. One of the key challenges with designing larger Exchange 2010 environments is examining the current server and storage options available and making the right hardware choices that provide the best value over the anticipated life of the solution. Following the step-by-step methodology in this document, we will walk through the important design decision points that help address these key challenges while ensuring that the customer's core business requirements are met. After we have determined the optimal solution for this customer, the solution undergoes a standard validation process to ensure that it holds up under simulated production workloads for normal operating, maintenance, and failure scenarios.


The following tables summarize the key Exchange and hardware components of this solution.

Exchange components

Target mailbox count: 32,400

Target average mailbox size: 2 gigabytes (GB), thin provisioned at an initial size of 600 megabytes (MB)

Target average message profile: 100 messages per day

Database copy count: 3

Volume Shadow Copy Service (VSS) backup: None

Site resiliency: Yes

Number of sites: 3

Database availability group (DAG) model: Active/Active distribution (multiple DAGs)

Virtualization: Hyper-V

Exchange server count: 4 virtual machines (VMs)

Physical server count: 2

Hardware components

Server partner: Cisco

Server model: M200

Server type: Blade

Processor: Intel Xeon X5570

Storage partner: EMC

Storage model: CX4-480

Storage type: Storage area network (SAN)

Disk type: 450 GB, 15,000-rpm, 3.5-inch SAS

Load balancing partner: Cisco

Hardware load balancing model: Cisco ACE


One of the most important first steps in Exchange solution design is to accurately summarize the business and technical requirements that are critical to making the correct design decisions. The following sections outline the customer requirements for this solution.


Determine mailbox profile requirements as accurately as possible because these requirements may impact all other components of the design. If Exchange is new to you, you may have to make some educated guesses. If you have an existing Exchange environment, you can use the Microsoft Exchange Server Profile Analyzer tool to assist with gathering most of this information. The following tables summarize the mailbox profile requirements for this solution.

Mailbox count requirements

Mailbox count (total number of mailboxes, including resource mailboxes): 30,000

Projected growth (%) in mailbox count over the life of the solution: 8%

Expected mailbox concurrency (%) (maximum number of active mailboxes at any time): 100%

Target mailbox count (mailbox count including growth × expected concurrency): 32,400
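The target mailbox count follows from simple arithmetic on the rows above it. A quick sketch (in Python, using the values from the table):

```python
# Target mailbox count = current count x (1 + projected growth) x concurrency
current_mailboxes = 30_000
projected_growth = 0.08   # 8% growth over the life of the solution
concurrency = 1.00        # 100% of mailboxes active at peak

target = int(current_mailboxes * (1 + projected_growth) * concurrency)
print(target)  # 32400
```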

Mailbox size requirements

Average mailbox size: 600 MB

Average mailbox archive size: Not applicable

Projected growth (%) in mailbox size over the life of the solution: 230%

Target average mailbox size: 2,048 MB

Mailbox profile requirements

Target message profile (average total number of messages sent plus received per user per day): 100 messages per day

Target average message size: 75 kilobytes (KB)

% in MAPI cached mode: 100

% in MAPI online mode: 0

% in Outlook Anywhere cached mode: 0

% in Microsoft Office Outlook Web App (Outlook Web Access in Exchange 2007 and earlier versions): 0

% in Exchange ActiveSync: 0


Understanding the distribution of mailbox users and datacenters is important when making design decisions about high availability and site resiliency.

The following table outlines the geographic distribution of people who will be using the Exchange system.

Geographic distribution of people

Number of major sites containing mailbox users: 3

Number of mailbox users in site 1: 10,800

Number of mailbox users in site 2: 10,800

Number of mailbox users in site 3: 10,800

The following table outlines the geographic distribution of datacenters that could potentially support the Exchange e-mail infrastructure.

Geographic distribution of datacenters

Total number of datacenters: 3

Number of active mailboxes in proximity to datacenter 1: 10,800

Number of active mailboxes in proximity to datacenter 2: 10,800

Number of active mailboxes in proximity to datacenter 3: 10,800

Requirement for Exchange to reside in more than one datacenter: Yes


It's also important to define server and data protection requirements for the environment because these requirements will support design decisions about high availability and site resiliency.

The following table identifies server protection requirements.

Server protection requirements

Number of simultaneous server or VM failures within a site: 1

Number of simultaneous server or VM failures during a site failure: 0

The following table identifies data protection requirements.

Data protection requirements

Requirement to maintain a backup of the Exchange databases outside of the Exchange environment (for example, a third-party backup solution): No

Requirement to maintain copies of the Exchange databases within the Exchange environment (for example, Exchange native data protection): Yes

Requirement to maintain multiple copies of mailbox data in the primary datacenter: Yes

Requirement to maintain multiple copies of mailbox data in a secondary datacenter: No

Requirement to maintain a lagged copy of any Exchange databases: No

Lagged copy period in days: Not applicable

Target number of database copies: 3

Deleted Items folder retention window: 14 days


This section includes information that isn't typically collected as part of customer requirements, but is critical to both the design and the approach to validating the design.


The following table describes the peak CPU utilization targets for normal operating conditions, and for server failure, site failure, and server maintenance conditions.

Server utilization targets

Normal operating conditions:

  Mailbox servers: <70%

  Client Access servers: <70%

  Hub Transport servers: <70%

  Multiple server roles (Client Access, Hub Transport, and Mailbox): <70%

  Multiple server roles (Client Access and Hub Transport): <70%

Node failure conditions:

  Mailbox servers: <80%

  Client Access servers: <80%

  Hub Transport servers: <80%

  Multiple server roles (Client Access, Hub Transport, and Mailbox): <80%

  Multiple server roles (Client Access and Hub Transport): <80%

Site failure conditions:

  Mailbox servers: <80%

  Client Access servers: <80%

  Hub Transport servers: <80%

  Multiple server roles (Client Access, Hub Transport, and Mailbox): <80%

  Multiple server roles (Client Access and Hub Transport): <80%


The following tables summarize some data configuration and input/output (I/O) assumptions made when designing the storage configuration.

Data configuration assumptions

Data overhead factor: 20%

Mailbox moves per week: 1%

Dedicated maintenance or restore logical unit number (LUN): No

LUN free space: 20%

Log shipping compression enabled: Yes

Log shipping encryption enabled: Yes

I/O configuration assumptions

I/O overhead factor: 20%

Additional I/O requirements: None


The following section provides a step-by-step methodology used to design this solution. This methodology takes customer requirements and design assumptions and walks through the key design decision points that need to be made when designing an Exchange 2010 environment.


When designing an Exchange 2010 environment, many design decision points for high availability strategies impact other design components. We recommend that you determine your high availability strategy as the first step in the design process, and we highly recommend that you review the product documentation on high availability and site resilience before starting this step.

If you have more than one datacenter, you must decide whether to deploy Exchange infrastructure in a single datacenter or distribute it across two or more datacenters. The organization's recovery service level agreements (SLAs) should define what level of service is required following a primary datacenter failure. This information should form the basis for this decision.

*Design Decision Point*

In this solution, there are three physical datacenter locations. The SLA states that datacenter resiliency is required for all mission-critical services including e-mail. The Exchange 2010 design will be based on a multisite deployment with site resiliency for the messaging service and data.

In this step, we look at whether all mailbox users are located primarily in one site or if they're distributed across many sites and whether those sites are associated with datacenters. If they're distributed across many sites and there are datacenters associated with those sites, you need to determine if there's a requirement to maintain affinity between mailbox users and the datacenter associated with that site.

*Design Decision Point*

In this example, each of the three datacenters is co-located with regional offices. There's a desire to maintain affinity between the user location and the location of the primary active copy of their mailbox during normal operating conditions.

Because the customer has decided to deploy Exchange infrastructure in more than one physical location, the customer needs to determine which database distribution model best meets the needs of the organization. There are three database distribution models:

  • Active/Passive distribution   Active mailbox database copies are deployed in the primary datacenter and only passive database copies are deployed in a secondary datacenter. The secondary datacenter serves as a standby datacenter and no active mailboxes are hosted in the datacenter under normal operating conditions. In the event of an outage impacting the primary datacenter, a manual switchover to the secondary datacenter is performed and active databases are hosted there until the primary datacenter returns online.

    [Figure: Active/Passive database distribution]
  • Active/Active distribution (single DAG)   Active mailbox databases are deployed in the primary and secondary datacenters. A corresponding passive copy is located in the alternate datacenter. All Mailbox servers are members of a single DAG. In this model, the wide area network (WAN) connection between two datacenters is potentially a single point of failure. Loss of the WAN connection results in Mailbox servers in one of the datacenters going into a failed state due to loss of quorum.

    [Figure: Active/Active database distribution (single DAG)]
  • Active/Active distribution (multiple DAGs)   This model leverages multiple DAGs to remove WAN connectivity as a single point of failure. One DAG has active database copies in the first datacenter and its corresponding passive database copies in the second datacenter. The second DAG has active database copies in the second datacenter and its corresponding passive database copies in the first datacenter. In the event of loss of WAN connectivity, the active copies in each site continue to provide database availability to local mailbox users.

    [Figure: Active/Active database distribution (multiple DAGs)]

*Design Decision Point*

In this example, because active mailbox databases will be deployed in each of the three datacenter locations, the database distribution model will be active/active with multiple DAGs. There are some additional design considerations when deploying an active/active database distribution model with multiple DAGs, which will be addressed in a later step.

Exchange 2010 includes several new features and core changes that, when deployed and configured correctly, can provide native data protection that eliminates the need to make traditional data backups. Backups are traditionally used for disaster recovery, recovery of accidentally deleted items, long-term data storage, and point-in-time database recovery. Exchange 2010 can address all of these scenarios without the need for traditional backups:

  • Disaster recovery   In the event of a hardware or software failure, multiple database copies in a DAG enable high availability with fast failover and no data loss. DAGs can be extended to multiple sites and can provide resilience against datacenter failures.

  • Recovery of accidentally deleted items   With the new Recoverable Items folder in Exchange 2010 and the hold policy that can be applied to it, it's possible to retain all deleted and modified data for a specified period of time, so recovery of these items is easier and faster. For more information, see Messaging Policy and Compliance, Understanding Recoverable Items, and Understanding Retention Tags and Retention Policies.

  • Long-term data storage   Sometimes, backups also serve an archival purpose. Typically, tape is used to preserve point-in-time snapshots of data for extended periods of time as governed by compliance requirements. The new archiving, multiple-mailbox search, and message retention features in Exchange 2010 provide a mechanism to efficiently preserve data in an end-user accessible manner for extended periods of time. For more information, see Understanding Personal Archives, Understanding Multi-Mailbox Search, and Understanding Retention Tags and Retention Policies.

  • Point-in-time database snapshot   If a past point-in-time copy of mailbox data is a requirement for your organization, Exchange provides the ability to create a lagged copy in a DAG environment. This can be useful in the rare event that there's a logical corruption that replicates across the databases in the DAG, resulting in a need to return to a previous point in time. It may also be useful if an administrator accidentally deletes mailboxes or user data.

There are technical reasons and several issues that you should consider before using the features built into Exchange 2010 as a replacement for traditional backups. Prior to making this decision, see Understanding Backup, Restore and Disaster Recovery.

*Design Decision Point*

In this example, with the current Exchange implementation, the primary use of the traditional backup solution is to recover from accidental deletion of mail items. Eighty percent of requests for single item recovery are for messages that are less than 15 days old. Therefore, the deleted items retention period will be 14 days. Because traditional VSS backups aren't required to restore a single item and don't meet the recovery time objective, Exchange Native Data Protection and Deleted Items folder retention features will be used as the database resiliency strategy.

The next important decision when defining your database resiliency strategy is to determine the number of database copies to deploy. We strongly recommend deploying a minimum of three copies of a mailbox database before eliminating traditional forms of protection for the database, such as Redundant Array of Independent Disks (RAID) or traditional VSS-based backups.

Prior to making this decision, see Understanding Mailbox Database Copies.

*Design Decision Point*

In this example, because a traditional VSS backup solution isn't being deployed, a minimum of three database copies will be deployed to ensure that recovery time objective and recovery point objective requirements are met. Two copies will be located in the primary datacenter and a third copy will be located in an alternate datacenter to provide site resiliency.

There are two types of database copies:

  • High availability database copy   This database copy is configured with a replay lag time of zero. As the name implies, high availability database copies are kept up-to-date by the system, can be automatically activated by the system, and are used to provide high availability for mailbox service and data.

  • Lagged database copy   This database copy is configured to delay transaction log replay for a period of time. Lagged database copies are designed to provide point-in-time protection, which can be used to recover from store logical corruptions, administrative errors (for example, deleting or purging a disconnected mailbox), and automation errors (for example, bulk purging of disconnected mailboxes).

*Design Decision Point*

In this example, all three mailbox database copies will be deployed as high availability copies. The SLA doesn't require a lagged copy of the data. Because logical corruption hasn't been experienced in the past and no other applications are being used that manipulate messaging data, a lagged copy isn't needed. The only other need for a lagged copy would be to provide the ability to recover single deleted items, but it's much easier and cost effective to meet this requirement using the Deleted Items folder retention feature.

Exchange 2010 has been re-engineered for mailbox resiliency. Automatic failover protection is now provided at the mailbox database level instead of at the server level. You can strategically distribute active and passive database copies to Mailbox servers within a DAG. Determining how many database copies you plan to activate on a per-server basis is a key aspect to Exchange 2010 capacity planning. There are different database distribution models that you can deploy, but generally we recommend one of the following:

  • Design for all copies activated   In this model, the Mailbox server role is sized to accommodate the activation of all database copies on the server. For example, a Mailbox server may host four database copies. During normal operating conditions, the server may have two active database copies and two passive database copies. During a failure or maintenance event, all four database copies would become active on the Mailbox server. This solution is usually deployed in pairs. For example, if deploying four servers, the first pair is servers MBX1 and MBX2, and the second pair is servers MBX3 and MBX4. In addition, when designing for this model, you will size each Mailbox server for no more than 40 percent of available resources during normal operating conditions. In a site resilient deployment with three database copies and six servers, this model can be deployed in sets of three servers, with the third server residing in the secondary datacenter. This model provides a three-server building block for solutions using an active/passive site resiliency model.

    This model can be used in the following scenarios:

    • Active/Passive multisite configuration where failure domains (for example, racks, blade enclosures, and storage arrays) require easy isolation of database copies in the primary datacenter

    • Active/Passive multisite configuration where anticipated growth may warrant easy addition of logical units of scale

    • Configurations that aren't required to survive the simultaneous loss of any two Mailbox servers in the DAG

    This model requires servers to be deployed in pairs for single site deployments and sets of three for multisite deployments. The following table illustrates a sample database layout for this model.

    [Table: Design for all copies activated — sample database layout]

    In the preceding table, the following applies:

    • C1 = active copy (activation preference value of 1) during normal operations

    • C2 = passive copy (activation preference value of 2) during normal operations

    • C3 = passive copy (activation preference value of 3) during site failure event

  • Design for targeted failure scenarios   In this model, the Mailbox server role is designed to accommodate the activation of a subset of the database copies on the server. The number of database copies in the subset will depend on the specific failure scenario that you're designing for. The main goal of this design is to evenly distribute active database load across the remaining Mailbox servers in the DAG.

    This model should be used in the following scenarios:

    • All single site configurations with three or more database copies

    • Configurations required to survive the simultaneous loss of any two Mailbox servers in the DAG

    The DAG design for this model requires between 3 and 16 Mailbox servers. The following table illustrates a sample database layout for this model.

    [Table: Design for targeted failure scenarios — sample database layout]

    In the preceding table, the following applies:

    • C1 = active copy (activation preference value of 1) during normal operations

    • C2 = passive copy (activation preference value of 2) during normal operations

    • C3 = passive copy (activation preference value of 3) during normal operations

*Design Decision Point*

In a previous step, it was decided to deploy an active/active database distribution model with multiple DAGs. Because each DAG in this model has an active/passive configuration with only two high availability database copies in the primary datacenter, a Mailbox server resiliency strategy that designs for all copies being activated is the best fit.

A DAG is the base component of the high availability and site resilience framework built into Exchange 2010. A DAG is a group of up to 16 Mailbox servers that hosts a set of replicated databases and provides automatic database-level recovery from failures that affect individual servers or databases.

A DAG is a boundary for mailbox database replication, database and server switchovers and failovers, and for an internal component called Active Manager. Active Manager is an Exchange 2010 component, which manages switchovers and failovers. Active Manager runs on every server in a DAG.

From a planning perspective, you should try to minimize the number of DAGs deployed. You should consider more than one DAG if:

  • You deploy more than 16 Mailbox servers.

  • You have active mailbox users in multiple sites (active/active site configuration).

  • You require separate DAG-level administrative boundaries.

  • You have Mailbox servers in separate domains. (DAG is domain bound.)

*Design Decision Point*

In a previous step, it was decided to deploy an active/active database distribution model. A single DAG with active mailbox users in each site could be deployed. However, if DAG members in one site temporarily lose connectivity with DAG members in the other sites, the cluster nodes in that site will lose quorum and cease to function correctly. For this reason, three DAGs will be deployed. Each DAG will contain Mailbox servers from the primary datacenter that will host the primary and secondary database copies. Each DAG will also contain servers in one of the alternate datacenters that will host the third database copy. The resulting design is three active/passive DAGs, with each datacenter hosting the primary and secondary copies from one DAG as well as the third copies from another DAG.

In this step, you need to determine the minimum number of Mailbox servers required to support the DAG design. This number may be different from the number of servers required to support the workload, so the final decision on the number of servers is made in a later step.

*Design Decision Point*

In a previous step, it was decided to deploy three active/passive DAGs, and design a server resiliency strategy for all copies being activated. Each DAG must be deployed in increments of three servers (two in the primary site and one in an alternate site). Because there are three DAGs deployed, the minimum number of servers required to support the DAG design is nine. The solution will have 9, 18, or 27 servers depending on the number of servers required to support the workload. The following table outlines the possible configurations.

Number of Mailbox servers per DAG

DAG1 primary   DAG1 secondary   DAG2 primary   DAG2 secondary   DAG3 primary   DAG3 secondary   Total
     2               1                2               1                2               1            9
     4               2                4               2                4               2           18
     6               3                6               3                6               3           27

Each cell is the number of Mailbox servers that a DAG places in that datacenter; "Total" is the total Mailbox server count for the solution.

Note:
In a three-node DAG model, if you lose both nodes in the primary datacenter, the cluster loses quorum and automatic activation doesn't occur. The third copy in the secondary datacenter provides additional data availability, but recovering the service in the secondary datacenter is a manual operation.
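The rows in the preceding table follow from the three-server building block (two Mailbox servers in a DAG's primary datacenter plus one in a secondary datacenter), multiplied across the three DAGs. A minimal sketch of that arithmetic:

```python
# Each DAG is deployed in multiples of a three-server building block:
# two Mailbox servers in the DAG's primary datacenter, one in a secondary.
def total_mailbox_servers(dag_count, blocks_per_dag):
    servers_per_dag = blocks_per_dag * (2 + 1)
    return dag_count * servers_per_dag

# Possible solution sizes with three DAGs:
print([total_mailbox_servers(3, b) for b in (1, 2, 3)])  # [9, 18, 27]
```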


Many factors influence the storage capacity requirements for the Mailbox server role. For additional information, we recommend that you review Understanding Mailbox Database and Log Capacity Factors.

The following steps outline how to calculate mailbox capacity requirements. These requirements will then be used to make decisions about which storage solution options meet the capacity requirements. A later section covers additional calculations required to properly design the storage layout on the chosen storage platform.

Microsoft has created a Mailbox Server Role Requirements Calculator that will do most of this work for you. To download the calculator, see E2010 Mailbox Server Role Requirements Calculator. For additional information about using the calculator, see Exchange 2010 Mailbox Server Role Requirements Calculator.

Before attempting to determine what your total storage requirements are, you should know what the mailbox size on disk will be. A full mailbox with a 1-GB quota requires more than 1 GB of disk space because you have to account for the prohibit send/receive limit, the number of messages the user sends or receives per day, the Deleted Items folder retention window (with or without calendar version logging and single item recovery enabled), and the average database daily variations per mailbox. The Mailbox Server Role Requirements Calculator does these calculations for you. You can also use the following information to do the calculations manually.

The following calculations are used to determine the mailbox size on disk for a mailbox with a 2 GB mailbox limit in this solution:

  • Whitespace = 100 messages per day × 75 KB ÷ 1024 = 7.3 MB

  • Dumpster = (100 messages per day × 75 KB ÷ 1024 × 14 days) + (2048 MB × 0.012) + (2048 MB × 0.058) = 246 MB

  • Mailbox size on disk = mailbox limit + whitespace + dumpster

    = 2048 MB + 7.3 MB + 246 MB

    = 2301 MB
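The mailbox-size-on-disk calculation can be reproduced with a short script (a sketch; the 0.012 and 0.058 factors are the overhead factors used in the dumpster formula above):

```python
# Mailbox size on disk for a 2 GB quota, 100 messages/day at 75 KB each,
# with a 14-day Deleted Items retention window.
messages_per_day = 100
avg_message_kb = 75
mailbox_limit_mb = 2048
retention_days = 14

whitespace_mb = messages_per_day * avg_message_kb / 1024
dumpster_mb = (whitespace_mb * retention_days
               + mailbox_limit_mb * 0.012   # overhead factor from the formula above
               + mailbox_limit_mb * 0.058)  # overhead factor from the formula above
mailbox_on_disk_mb = mailbox_limit_mb + whitespace_mb + dumpster_mb
print(round(mailbox_on_disk_mb))  # 2301
```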

In this step, the high level storage capacity required for all mailbox databases is determined. The calculated capacity includes database size, catalog index size, and 20 percent free space.

To determine the storage capacity required for all databases, use the following formulas:

  • Database size = (number of mailboxes × mailbox size on disk × database growth factor) + 20% data overhead

    = (32400 × 2301 × 1) + 14,910,480 MB

    = 89,462,880 MB

    = 87,366 GB

  • Database index size = 10% of database size

    = 87,366 GB × 0.10

    = 8,737 GB

  • Total database capacity = (database size + index size) ÷ 0.80 (to add 20% volume free space)

    = (87,366 + 8,737) ÷ 0.80

    = 120,128 GB
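A sketch of the same arithmetic (variable names are illustrative; the 20% data overhead, 10% index, and 20% free-space factors come from the formulas above):

```python
mailboxes = 32_400
mailbox_on_disk_mb = 2_301

db_size_mb = mailboxes * mailbox_on_disk_mb * 1.20   # +20% data overhead
db_size_gb = db_size_mb / 1024                       # ~87,366 GB

index_gb = 0.10 * db_size_gb                         # content index, ~8,737 GB
total_db_gb = (db_size_gb + index_gb) / 0.80         # +20% volume free space
print(round(total_db_gb))  # ~120128
```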

To ensure that the Mailbox server doesn't sustain an outage because of space allocation issues, the transaction log volumes also need to be sized to accommodate all of the logs that will accumulate between log truncation events. Because this architecture uses the mailbox resiliency and single item recovery features in place of traditional backups, the log capacity should allow for three times the daily log generation rate, in case a failed copy isn't repaired for three days. (Any failed copy prevents log truncation from occurring.) If the server isn't back online within three days, you would temporarily remove the copy to allow truncation to occur.

To determine the storage capacity required for all transaction logs, use the following formulas:

  • Log files size = (log file size × number of logs per mailbox per day × number of days required to replace failed infrastructure × number of mailbox users) + 1% mailbox move overhead

    = (1 MB × 20 × 4 × 32400) + (32400 × 0.01 × 2048 MB)

    = 3,255,552 MB

    = 3,179 GB

  • Total log capacity = log files size ÷ 0.80 (to add 20% volume free space)

    = 3,179 ÷ 0.80

    = 3,974 GB
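The log sizing above can be checked the same way (a sketch; the 20 logs per mailbox per day figure is the rate used in the formula for a 100 message/day profile):

```python
mailboxes = 32_400
log_file_mb = 1
logs_per_mailbox_per_day = 20
days_to_replace_failed_copy = 4
move_pct_per_week = 0.01           # 1% of mailboxes moved per week
avg_mailbox_mb = 2_048

log_mb = (log_file_mb * logs_per_mailbox_per_day
          * days_to_replace_failed_copy * mailboxes
          + mailboxes * move_pct_per_week * avg_mailbox_mb)
total_log_gb = (log_mb / 1024) / 0.80   # +20% volume free space
print(round(total_log_gb))  # 3974
```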

The following table summarizes the high level storage capacity requirements for this solution. In a later step, you will use this information to make decisions about which storage solution to deploy. You will then take a closer look at specific storage requirements in later steps.

Summary of storage capacity requirements

Average mailbox size on disk: 2,301 MB

Database capacity required: 120,128 GB

Log capacity required: 3,974 GB

Total capacity required: 124,102 GB

Total capacity required for three database copies: 372,306 GB (364 terabytes)

Total capacity required for each site: 122 terabytes
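The summary rows can be cross-checked with a few lines (a sketch; the per-site figure assumes the three database copies are spread evenly across the three sites, as described earlier, and the summary rounds it up):

```python
db_gb = 120_128
log_gb = 3_974
copies = 3
sites = 3

total_gb = db_gb + log_gb                  # 124,102 GB for one copy
all_copies_tb = total_gb * copies / 1024   # ~364 TB for three copies
per_site_tb = all_copies_tb / sites        # ~121.2 TB; summarized as 122 TB
print(round(all_copies_tb))  # 364
```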


When designing an Exchange environment, you need an understanding of database and log performance factors. We recommend that you review Understanding Database and Log Performance Factors.

Because it's one of the key transactional I/O metrics needed for adequately sizing storage, you should understand the amount of database I/O per second (IOPS) consumed by each mailbox user. Pure sequential I/O operations aren't factored in the IOPS per Mailbox server calculation because storage subsystems can handle sequential I/O much more efficiently than random I/O. These operations include background database maintenance, log transactional I/O, and log replication I/O. In this step, you calculate the total IOPS required to support all mailbox users, using the following:

  • Estimated IOPS per mailbox user = 0.10

  • Total required IOPS = IOPS per mailbox user × number of mailboxes × I/O overhead factor

    = 0.10 × 32400 × 1.2

    = 3888

Note:
To determine the IOPS profile for a different message profile, see the table "Database cache and estimated IOPS per mailbox based on message activity" in Understanding Database and Log Performance Factors.

Because this is a multisite deployment, you need to consider the IOPS requirements by site to properly size storage for each site. In a previous step, it was decided that each site would host the primary and secondary database copies from the primary DAG and the tertiary database copy from an alternate DAG. In this model, the worst case scenario would be a single site failure where 10,800 mailboxes from the primary DAG and 10,800 mailboxes from the alternate DAG are active on the storage in that site. Use the following calculation:

  • Total IOPS required per site = IOPS per mailbox user × number of mailboxes × I/O overhead factor

    = 0.10 × 21600 × 1.2

    = 2592
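
The two IOPS calculations above can be sketched together:

```python
# Transactional IOPS sizing (100-message profile).
IOPS_PER_MAILBOX = 0.10
IO_OVERHEAD = 1.2          # 20% I/O overhead factor

total_iops = IOPS_PER_MAILBOX * 32400 * IO_OVERHEAD   # entire environment
site_iops = IOPS_PER_MAILBOX * 21600 * IO_OVERHEAD    # worst case mailboxes active in one site
print(round(total_iops), round(site_iops))
```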

Return to top

Exchange 2010 includes improvements in performance, reliability, and high availability that enable organizations to run Exchange on a wide range of storage options.

When examining the storage options available, being able to balance the performance, capacity, manageability, and cost requirements is essential to achieving a successful storage solution for Exchange.

For more information about choosing a storage solution for Exchange 2010, see Mailbox Server Storage Design.

Return to top

There is a wide range of storage options available for Exchange 2010. The list of choices can be reduced by determining whether deploying a direct-attached storage (DAS) solution (including using local disk) or a SAN solution is preferred. There are many reasons for choosing one over the other, and you should work with your preferred storage vendor to determine which solution meets your business and total cost of ownership (TCO) requirements.

*Design Decision Point*

In this example, a SAN infrastructure is deployed, and SAN is used for storing all data in the environment. A SAN storage solution will continue to be used, and options for deploying Exchange 2010 will be explored.

Return to top

Use the following steps to choose a storage solution.

In this example, EMC storage has been used for many years, and an EMC storage solution will be used for the Exchange 2010 deployment. EMC Corporation offers high-performing storage arrays such as CLARiiON and Symmetrix.

The EMC CLARiiON family provides multiple tiers of storage, such as enterprise flash drives, Fibre Channel, and Serial ATA (SATA), which reduces costs because multiple tiers can be managed with a single management interface.

CLARiiON Virtual Provisioning provides benefits beyond traditional thin provisioning, including simplified storage management and improved capacity utilization. You can present a large amount of capacity to a host, and then consume space as needed from a shared pool.

CLARiiON CX4 Series provides four models with flexible levels of capacity, functionality, and performance. The features of each model are described in the following table.

CLARiiON CX4 Series features

| Feature | CX4 model 120 | CX4 model 240 | CX4 model 480 | CX4 model 960 |
| --- | --- | --- | --- | --- |
| Maximum disks | 120 | 240 | 480 | 960 |
| Storage processors | 2 | 2 | 2 | 2 |
| Physical memory per storage processor | 3 GB | 4 GB | 8 GB | 16 GB |
| Maximum write cache | 600 MB | 1.264 GB | 4.5 GB | 10.764 GB |
| Maximum initiators per system | 256 | 512 | 512 | 1024 |
| High-availability hosts | 128 | 256 | 256 | 512 |
| Minimum form factor size | 6U | 6U | 6U | 9U |
| Maximum standard LUNs | 1024 | 1024 | 4096 | 4096 |
| SnapView snapshots | Yes | Yes | Yes | Yes |
| SnapView clones | Yes | Yes | Yes | Yes |
| SAN Copy | Yes | Yes | Yes | Yes |
| MirrorView/S | Yes | Yes | Yes | Yes |
| MirrorView/A | Yes | Yes | Yes | Yes |
| RecoverPoint/S | Yes | Yes | Yes | Yes |
| RecoverPoint/A | Yes | Yes | Yes | Yes |

In this example, 450 GB Fibre Channel 15,000 rpm disks are selected, which provide good I/O performance and capacity to satisfy the initial Exchange user requirements.

Note:
Since the time of testing, 600 GB 10,000 rpm disks have come down in cost and would also be a good choice for this deployment.

In this example, the solution needs to provide 122 terabytes of usable storage and 2,592 IOPS. Any of the options in the preceding table will handle the IOPS requirements, so the decision will be based on capacity requirements. The CLARiiON CX4 model 240 only provides approximately 100 terabytes of usable capacity with 450 GB disks in a RAID-5 configuration. The EMC CLARiiON CX4 model 480 is selected because it provides the necessary capacity and I/O performance to support all Exchange 2010 requirements.

Return to top

Sizing memory correctly is an important step in designing a healthy Exchange environment. We recommend that you review Understanding Memory Configurations and Exchange Performance and Understanding the Mailbox Database Cache.

Return to top

The Extensible Storage Engine (ESE) uses database cache to reduce I/O operations. In general, the more database cache available, the less I/O generated on an Exchange 2010 Mailbox server. However, there's a point where adding additional database cache no longer results in a significant reduction in IOPS. Therefore, adding large amounts of physical memory to your Exchange server without determining the optimal amount of database cache required may result in higher costs with minimal performance benefit.

The IOPS estimates that you completed in a previous step assume a minimum amount of database cache per mailbox. These minimum amounts are summarized in the table "Estimated IOPS per mailbox based on message activity and mailbox database cache" in Understanding the Mailbox Database Cache.

The following table outlines the database cache per user for various message profiles.

Database cache per user

| Messages sent or received per mailbox per day (about 75 KB average message size) | Database cache per user |
| --- | --- |
| 50 | 3 MB |
| 100 | 6 MB |
| 150 | 9 MB |
| 200 | 12 MB |
In this step, you determine high level memory requirements for the entire environment. In a later step, you use this result to determine the amount of physical memory needed for each Mailbox server. Use the following calculation:

  • Database cache = profile specific database cache × number of mailbox users

    = 6 MB × 32400

    = 194400 MB

    = 190 GB
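
The cache calculation can be sketched as:

```python
# Aggregate database cache requirement (6 MB per user for the 100-message profile).
CACHE_PER_USER_MB = 6
MAILBOXES = 32400

cache_mb = CACHE_PER_USER_MB * MAILBOXES
cache_gb = round(cache_mb / 1024)
print(cache_mb, cache_gb)
```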

Return to top

Mailbox server capacity planning has changed significantly from previous versions of Exchange due to the new mailbox database resiliency model provided in Exchange 2010. For additional information, see Mailbox Server Processor Capacity Planning.

In the following steps, you calculate the high level megacycle requirements for active and passive database copies. These requirements will be used in a later step to determine the number of Mailbox servers needed to support the workload. Note that the number of Mailbox servers required also depends on the Mailbox server resiliency model and database copy layout.

Using megacycle requirements to determine the number of mailbox users that an Exchange Mailbox server can support isn't an exact science. A number of factors can result in unexpected megacycle results in test and production environments. Megacycles should only be used to approximate the number of mailbox users that an Exchange Mailbox server can support. It's always better to be conservative rather than aggressive during the capacity planning portion of the design process.

The following calculations are based on published megacycle estimates as summarized in the following table.

Megacycle estimates

| Messages sent or received per mailbox per day | Megacycles per mailbox (active mailbox database) | Megacycles per mailbox (remote passive mailbox database) | Megacycles per mailbox (local passive mailbox database) |
| --- | --- | --- | --- |
| 50 | 1 | 0.1 | 0.15 |
| 100 | 2 | 0.2 | 0.3 |
| 150 | 3 | 0.3 | 0.45 |
| 200 | 4 | 0.4 | 0.6 |

In this step, you calculate the megacycles required to support the active database copies, using the following:

  • Active mailbox megacycles required = profile specific megacycles × number of mailbox users

    = 2 × 32400

    = 64800

In a design with three copies of each database, there is processor overhead associated with shipping logs required to maintain database copies on the remote servers. This overhead is typically 10 percent of the active mailbox megacycles for each remote copy being serviced. Calculate the requirements, using the following:

  • Remote copy megacycles required = profile specific megacycles × number of mailbox users × number of remote copies

    = 0.2 × 32400 × 2

    = 12960

In a design with three copies of each database, there is processor overhead associated with maintaining the local passive copies of each database. In this step, the high level megacycles required to support local passive database copies will be calculated. These numbers will be refined in a later step so that they match the server resiliency strategy and database copy layout. Calculate the requirements, using the following:

  • Local passive mailbox megacycles required = profile specific megacycles × number of mailbox users × number of passive copies

    = 0.3 × 32400 × 2

    = 19440

Calculate the total requirements, using the following:

Total megacycles required = active mailbox + remote passive + local passive

= 64800 + 12960 + 19440

= 97200

Average megacycles per mailbox = 97200 ÷ 32400 = 3.0

Return to top

The following table summarizes the approximate megacycles and database cache required for this environment. This information will be used in later steps to determine which servers will be deployed in the solution.

Mailbox requirements summary

| Mailbox requirements | Value |
| --- | --- |
| Total megacycles required for entire environment | 97200 |
| Total database cache required for entire environment | 190 GB |
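
The environment-wide total can be reproduced from the per-mailbox estimates in the megacycle table above (100-message profile: 2 megacycles active, 0.2 per remote passive copy, 0.3 per local passive copy):

```python
# Environment-wide megacycle estimate for the 100-message profile with three
# database copies; per-mailbox values from the megacycle estimates table.
MAILBOXES = 32400
ACTIVE = 2.0          # megacycles per mailbox, active copy
REMOTE_PASSIVE = 0.2  # per remote passive copy (10% of active)
LOCAL_PASSIVE = 0.3   # per local passive copy
REMOTE_COPIES = 2
LOCAL_COPIES = 2

total_megacycles = round(MAILBOXES * (ACTIVE
                                      + REMOTE_PASSIVE * REMOTE_COPIES
                                      + LOCAL_PASSIVE * LOCAL_COPIES))
print(total_megacycles)
```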

Return to top

You can use the following steps to determine the server model.

In this solution, the preferred server platform is the Cisco Unified Computing System, a datacenter platform that unites computing, networking, storage access, and virtualization into a system designed to reduce TCO and increase flexibility. The system integrates a low-latency 10-gigabit Ethernet unified network fabric with enterprise-class, x86-architecture servers. With a systems approach to architecture, technology, partnerships, and services, the Cisco Unified Computing System streamlines datacenter resources, scales service delivery, and reduces the number of devices requiring setup, management, power, cooling, and cabling.

The Cisco Unified Computing System is a blade server system comprised of four primary system components. These system components are the fabric interconnect, chassis, fabric extenders (I/O modules), and blade servers.

The following blade server models are potential options for this solution.

Option 1: B200 Blade Server

The Cisco Unified Computing System B200 Blade Server is a half-width, two-socket blade server. The system uses two Intel Xeon 5500 or 5600 series processors, up to 96 GB of DDR3 memory, two optional hot-swappable small form factor SAS disk drives, and a single mezzanine connector for up to 20 gigabits per second (Gbps) of I/O throughput. The server balances simplicity, performance, and density for production-level virtualization and other mainstream datacenter workloads.

Option 2: B250 Blade Server

The Cisco Unified Computing System B250 Extended Memory Blade Server is a full-width, two-socket blade server featuring Cisco extended memory technology. The system supports two Intel Xeon 5500 or 5600 series processors, up to 384 GB of DDR3 memory, two optional small form factor SAS disk drives, and two mezzanine connections for up to 40 Gbps of I/O throughput. The server increases performance and capacity for virtualization and large dataset workloads.

Option 3: B440 Blade Server

The Cisco Unified Computing System B440 Blade Server is designed to power enterprise applications such as large dataset and transaction-intensive databases, enterprise resource planning (ERP) programs, and decision-support systems (DSSs). Powered by the scalable performance and reliability features of Intel Xeon 7500 series processors, the Cisco Unified Computing System B440 helps widen the scope of workload virtualization and unifies performance-intensive standalone applications within an integrated, simplified infrastructure. The Cisco Unified Computing System B440 supports up to 32 processing cores and 256 GB of main memory with combined I/O throughput of up to 40 Gbps.

The Cisco Unified Computing System B200 with Intel Xeon X5570 processors is selected because this blade server had the optimal balance of processing power, memory capacity, and form factor for this deployment. The two-socket server platform is frequently a good choice for Exchange 2010 deployments, based on all relevant factors, including scalability and cost. The Cisco Unified Computing System B250 supports a higher memory configuration and higher I/O throughput, but this isn't required for the solution.


Return to top

In previous steps, you calculated the megacycles required to support the number of active mailbox users. In the following steps, you determine how many available megacycles the server model and processor can support, to determine the number of active mailboxes each server can support.

Because the megacycle requirements are based on a baseline server and processor model, you need to adjust the available megacycles for the server against the baseline. To do this, independent performance benchmarks maintained by Standard Performance Evaluation Corporation (SPEC) are used. SPEC is a non-profit corporation formed to establish, maintain, and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers.

To help simplify the process of obtaining the benchmark value for your server and processor, we recommend you use the Exchange Processor Query tool. This tool automates the manual steps to determine your planned processor's SPECint 2006 rate value. To run this tool, your computer must be connected to the Internet. The tool uses your planned processor model as input, and then runs a query against the Standard Performance Evaluation Corporation Web site returning all test result data for that specific processor model. The tool also calculates an average SPECint 2006 rate value based on the number of processors planned to be used in each Mailbox server. Use the following calculation:

  • Processor = Intel X5570 2.93 gigahertz (GHz)

  • SPECint_rate2006 value = 256

  • SPECint_rate2006 value per processor core = 256 ÷ 8

    = 32

In previous steps, you calculated the required megacycles for the entire environment based on megacycle per mailbox estimates. Those estimates were measured on a baseline system (HP DL380 G5, Intel X5470 3.33 GHz, 8 cores) that has a SPECint_rate2006 value of 150 (for an 8-core server), or 18.75 per core.

In this step, you need to adjust the available megacycles for the chosen server and processor against the baseline processor so that the required megacycles can be used for capacity planning.

To determine the megacycles of the Cisco B200-M1 Intel X5570 2.93 GHz platform, use the following formulas:

  • Adjusted megacycles per core = (new platform per core value) × (megahertz per core of the baseline platform) ÷ (baseline per core value)

    = (32 × 3330) ÷ 18.75

    = 5683.2

  • Adjusted megacycles per server = adjusted megacycles per core × number of cores

    = 5683.2 × 8

    = 45466

Now that the adjusted megacycles per server are known, you need to adjust for the target maximum processor utilization. In a previous section, it was decided not to exceed 80 percent processor utilization during peak workloads or failure scenarios. Use the following calculation:

  • Adjusted available megacycles = available megacycles per server × target maximum processor utilization

    = 45466 × 0.80

    = 36372

Each server has a usable capacity of 36,372 megacycles.
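
The baseline adjustment above can be sketched as:

```python
# Adjust available megacycles using SPECint_rate2006 benchmark results.
BASELINE_RATE_PER_CORE = 150 / 8   # HP DL380 G5 baseline (X5470, 3.33 GHz): 18.75
BASELINE_MHZ = 3330
NEW_RATE_PER_CORE = 256 / 8        # Cisco B200 (X5570): 32
CORES = 8
MAX_UTILIZATION = 0.80             # 80% peak utilization target

per_core = NEW_RATE_PER_CORE * BASELINE_MHZ / BASELINE_RATE_PER_CORE
per_server = per_core * CORES
usable = per_server * MAX_UTILIZATION
print(round(per_core), round(per_server), round(usable))
```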

Return to top

You can use the following steps to determine the number of physical Mailbox servers required.

To determine the number of active mailboxes supported by a Mailbox server, use the following calculation:

  • Number of active mailboxes = available megacycles per server ÷ megacycles per mailbox

    = 36372 ÷ 3.0

    = 12124

Each DAG has 10,800 active mailboxes. To determine the minimum number of Mailbox servers required to support all mailboxes in a DAG, use the following calculation:

  • Number of servers required = total mailbox count per DAG ÷ active mailboxes per server

    = 10800 ÷ 12124

    = 0.89

A minimum of one Mailbox server is required for each DAG to support the workload of 10,800 mailboxes.

In a previous step, it was determined to design for all copies being activated in an active/passive DAG. This model requires that Mailbox server roles be deployed in groups of three for each DAG.

Number of Mailbox servers and DAGs

| DAG1 primary datacenter | DAG1 secondary datacenter | DAG2 primary datacenter | DAG2 secondary datacenter | DAG3 primary datacenter | DAG3 secondary datacenter | Total Mailbox server count |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | 1 | 2 | 1 | 2 | 1 | 9 |
| 4 | 2 | 4 | 2 | 4 | 2 | 18 |
| 6 | 3 | 6 | 3 | 6 | 3 | 27 |

Based on the DAG design, a minimum of three physical Mailbox servers in each DAG or a total of nine physical Mailbox servers for all three DAGs are required.

Return to top

You can use the following steps to determine the number of active mailboxes per Mailbox server under normal operating and failure scenarios.

To determine the number of active mailboxes hosted by each Mailbox server during normal operations, use the following calculation:

  • Number of mailboxes per server = total mailbox count in DAG ÷ number of Mailbox servers in primary datacenter

    = 10800 ÷ 2

    = 5400

In the event that one Mailbox server in the primary datacenter fails, the 5,400 active mailboxes on the failed server will become active on the remaining server. In this scenario, the remaining Mailbox server will have 10,800 active mailboxes.

In the event that the primary datacenter goes offline, the 10,800 active mailboxes in the primary datacenter will be activated in the secondary datacenter. In this scenario, the single server in the secondary datacenter will have 10,800 active mailboxes.

Return to top

When determining the number of Client Access and Hub Transport server roles to deploy in environments with smaller numbers of servers, you may consider deploying both roles on the same physical machine. This reduces the number of physical machines to manage, the number of server operating systems to update and maintain, and the number of Windows and Exchange licenses you need to purchase. Combining the Client Access and Hub Transport server roles also simplifies the design process. When deploying the roles in isolation, we recommend one Hub Transport server logical processor for every four Mailbox server logical processors, and three Client Access server logical processors for every four Mailbox server logical processors. These ratios can become confusing, especially when you factor in providing sufficient Client Access and Hub Transport servers during multiple physical server failure or maintenance scenarios. When deploying Client Access and Hub Transport servers and Mailbox servers on like physical servers or like VMs, you can instead deploy one combination Client Access and Hub Transport server for every Mailbox server in the site.

*Design Decision Point*

In this solution, it was decided to co-locate the Hub Transport and Client Access server roles together on the same physical machine. This will reduce the number of operating systems to manage as well as make it easier to plan for server resiliency.

Return to top

In a previous step, you determined that three Mailbox servers were required in each site (two Mailbox servers from the DAG hosting active mailboxes for that site and one Mailbox server from an alternate DAG supporting site resiliency in the event of a failure of the primary datacenter for that DAG).

We recommend that you deploy one combination Client Access and Hub Transport server for every Mailbox server, as shown in the following table.

Number of physical Client Access and Hub Transport combination servers required

| Server role configuration | Recommended processor core ratio |
| --- | --- |
| Mailbox : Client Access and Hub Transport combined server role | 1:1 |

When you have more than one DAG represented in the same site, you need to examine the worst case failure scenario before you can determine the number of Client Access and Hub Transport combination servers required. In this solution, the worst case failure scenario would be to lose one of the two Mailbox servers in the primary DAG and have a simultaneous site failure where the active mailboxes from another site are now being hosted in the same site. In this case, you will have 21,600 active mailboxes in the site running on two Mailbox servers and therefore will require a minimum of two Client Access and Hub Transport combination servers in each site, as shown in the following figure.

Client Access and Hub Transport servers


Return to top

So far, you have determined that 15 physical servers are required to support Client Access, Hub Transport, and Mailbox server roles for 32,400 active mailboxes in three datacenters, as shown in the following figure.

Required number of physical servers


Return to top

Several factors are important when considering server virtualization for Exchange. For more information about supported configurations for virtualization, see Exchange 2010 System Requirements.

Consider using virtualization with Exchange for the following reasons:

  • If you expect physical server capacity to be underutilized, virtualization can improve utilization and allow you to purchase fewer servers.

  • You may want to use Windows Network Load Balancing (NLB) when deploying Client Access, Hub Transport, and Mailbox server roles on the same physical server.

  • If your organization is using virtualization in all server infrastructure, you may want to use virtualization with Exchange, to be in alignment with corporate standard policy.

To determine whether virtualization should be used in this environment, consider the anticipated processor utilization and determine if the servers are likely to be underutilized.

To determine the CPU utilization of 5,400 active mailboxes on a single Mailbox server, use the following calculation:

  • Percent processor (peak normal operating) = required megacycles ÷ available megacycles

    = (5400 × 3.0) ÷ 45466

    = 35.6%

To determine the CPU utilization of 10,800 active mailboxes on a single Mailbox server, use the following calculation:

  • Percent processor (peak failure conditions) = required megacycles ÷ available megacycles

    = (10800 × 3.0) ÷ 45466

    = 71.3%
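
These two utilization checks can be sketched together (the 3.0 average megacycles per mailbox follows from the megacycle estimates table: 2 active + 2 × 0.2 remote passive + 2 × 0.3 local passive):

```python
# CPU utilization check for a physical (non-virtualized) Mailbox server.
AVG_MEGACYCLES_PER_MAILBOX = 3.0   # 2 active + 2 x 0.2 remote + 2 x 0.3 local passive
AVAILABLE_MEGACYCLES = 45466       # adjusted megacycles per server

normal_util = 5400 * AVG_MEGACYCLES_PER_MAILBOX / AVAILABLE_MEGACYCLES
failure_util = 10800 * AVG_MEGACYCLES_PER_MAILBOX / AVAILABLE_MEGACYCLES

# Both scenarios stay below the 80% design target, leaving headroom for virtualization.
print(round(normal_util * 100, 1), round(failure_util * 100, 1))
```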

*Design Decision Point*

Because the server is projected to be under the 80 percent utilization target for the worst case failure scenario, there may be an opportunity to reduce server count using virtualization. This will be explored further in the following steps.

Return to top

In the following steps, you will determine whether virtualization can be used to reduce the number of physical servers required in this solution. Microsoft Hyper-V will be used as the virtualization platform.

At the time of testing, Microsoft Hyper-V supported a maximum of four virtual processors per VM. In the physical design, the Mailbox server role for the primary DAG was deployed across two physical servers with a total of 16 logical processors. By adding virtualization, the Mailbox server role for the primary DAG is now deployed in four VMs, each with four virtual processors for a total of 16 virtual processors.

In the physical design, the Mailbox server role for the alternate DAG was deployed on a single physical server with eight logical processors. By adding virtualization, the Mailbox server role for the alternate DAG is now deployed in two VMs, each with four virtual processors for a total of eight virtual processors.

In the physical design, the Client Access and Hub Transport combination server was deployed on two physical servers with a total of 16 logical processors. By adding virtualization, the Client Access and Hub Transport combination servers are now deployed in four VMs, each with four virtual processors for a total of 16 virtual processors.

When using Hyper-V root servers with eight logical processors, it's a best practice to deploy one Mailbox server VM and one Client Access and Hub Transport combination server VM on each Hyper-V root server.

The solution now has 10 VMs running on five physical servers in each site, as shown in the following figure.

Virtual machines


Based on calculations in previous steps, you anticipate that the megacycle requirements of the worst case workload can be handled by four physical servers. In this step, you will reduce the physical server count from five to four and redistribute the Mailbox servers in the alternate DAG to the remaining four physical servers. To maintain symmetry across the four physical servers, you will need to change the two Mailbox server VMs (with four virtual processors) to four Mailbox server VMs (with two virtual processors).

This results in 12 VMs running on four physical servers in each site, as shown in the following figures.

Virtual machines


Virtual machines


In this step, you will estimate the number of virtual processors required for each VM. In the following steps, you will perform the calculations to verify the assumptions made.

Each of the four Mailbox server VMs in the primary DAG will support 25 percent of the 10,800 active mailboxes in the DAG under normal operating conditions, or 2,700 mailboxes each. In the event of a server failure, the surviving Mailbox server VM will have to support 5,400 active mailboxes.

In the event of a site failure, the 10,800 active mailboxes fail over to the four Mailbox server VMs hosting the DAG's tertiary database copies in the surviving site (the alternate DAG VMs), each supporting 25 percent of the mailboxes, or 2,700 each. In this scenario, the mailboxes are running on the third and final copy of the database, so in the event of a further server or VM failure, a surviving VM won't take on additional mailboxes. The maximum number of active mailboxes on these VMs is always 2,700.

It makes sense that the VMs in the alternate DAG have half as many virtual processors as the VMs in the primary DAG. In this solution, assign four virtual processors to the VMs in the primary DAG and two virtual processors to the VMs in the alternate DAG.

If you maintain a 1:1 ratio of logical to virtual processors, this leaves two virtual processors for each Client Access and Hub Transport combination server. Because you want to stay as close as possible to a 1:1 ratio of Mailbox server cores to Client Access and Hub Transport combination server cores, assign three virtual processors to each Client Access and Hub Transport combination server. This results in a scenario where the number of virtual processors (nine) exceeds the number of logical processors (eight) on the root server. This is referred to as oversubscription. Under most circumstances, we recommend that you don't use oversubscription. However, in this solution, the Mailbox server VMs in the alternate DAG will only be used during a site failure event. Because this is a low occurrence event, a slight oversubscription is acceptable.

The following table shows the proposed virtual processor allocations.

Virtual processor allocation

| Virtual machine | Virtual processor count |
| --- | --- |
| Client Access and Hub Transport combination | 3 |
| Mailbox (primary DAG) | 4 |
| Mailbox (alternate DAG) | 2 |
| Total | 9 |

Return to top

In previous steps, you calculated the megacycles required to support the number of active mailbox users. In the following steps, you will determine how many available megacycles the server model and processor can support, so the number of active mailboxes that each virtual server can support can be determined.

Return to top

When deploying VMs on the root server, you need to consider megacycles required to support the hypervisor and virtualization stack. This overhead varies from server to server and under different workloads. A conservative estimate of 10 percent of available megacycles will be used, as shown in the following calculation:

  • Adjusted available megacycles = available megacycles × 0.90

    = 45466 × 0.90

    = 40919

Each server has a usable capacity for VMs of 40,919 megacycles.

The usable capacity per logical processor is 5,115 megacycles.
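
The hypervisor overhead adjustment can be sketched as:

```python
# Reserve 10% of server megacycles for the hypervisor and virtualization stack.
PER_SERVER_MEGACYCLES = 45466
HYPERVISOR_OVERHEAD = 0.10
LOGICAL_PROCESSORS = 8

usable_megacycles = round(PER_SERVER_MEGACYCLES * (1 - HYPERVISOR_OVERHEAD))
per_logical_processor = round(usable_megacycles / LOGICAL_PROCESSORS)
print(usable_megacycles, per_logical_processor)
```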

Return to top

In a previous step, you determined the virtual processor allocation for the three VM types, as shown in the following table.

Virtual processor allocation

| Virtual machine | Virtual processor count |
| --- | --- |
| Client Access and Hub Transport combination | 3 |
| Mailbox (primary DAG) | 4 |
| Mailbox (alternate DAG) | 2 |
| Total | 9 |

Because there are nine virtual processors running on a root server with eight logical processors, the megacycle capacity of a virtual processor isn't equal to the megacycle capacity of a logical processor. In this step, calculate the available megacycles per virtual processor:

  • Megacycles per virtual processor = megacycles per logical processor × (number of logical processors ÷ number of virtual processors)

    = 5115 × (8 ÷ 9)

    = 4547
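
The oversubscription adjustment can be sketched as:

```python
# With 9 virtual processors on 8 logical processors, each virtual processor
# gets a proportionally smaller share of the usable megacycles.
PER_LOGICAL_PROCESSOR = 5115
LOGICAL_PROCESSORS = 8
VIRTUAL_PROCESSORS = 9   # 3 (CAS/HT) + 4 (primary MBX) + 2 (alternate MBX)

per_virtual_processor = round(PER_LOGICAL_PROCESSOR * LOGICAL_PROCESSORS
                              / VIRTUAL_PROCESSORS)
print(per_virtual_processor)
```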

In this step, to determine the available megacycles per VM, reference the following table.

Available megacycles per VM

| Virtual machine | Virtual processor count | Megacycles per virtual processor | Available megacycles |
| --- | --- | --- | --- |
| Client Access and Hub Transport combination | 3 | 4547 | 13641 |
| Mailbox (primary DAG) | 4 | 4547 | 18188 |
| Mailbox (alternate DAG) | 2 | 4547 | 9094 |

Because the design assumptions state not to exceed 80 percent processor utilization, adjust the available megacycles to reflect the 80 percent target, as shown in the following table.

Target available megacycles per VM

| Virtual machine | Available megacycles | Maximum processor utilization | Target available megacycles |
| --- | --- | --- | --- |
| Client Access and Hub Transport combination | 13641 | 80% | 10913 |
| Mailbox (primary DAG) | 18188 | 80% | 14550 |
| Mailbox (alternate DAG) | 9094 | 80% | 7275 |
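
The per-VM budgets in the preceding tables can be reproduced as:

```python
# Available and target (80% utilization) megacycles for each VM type.
PER_VCPU = 4547      # megacycles per virtual processor (from the previous step)
TARGET_UTIL = 0.80

vcpus = {
    "Client Access and Hub Transport combination": 3,
    "Mailbox (primary DAG)": 4,
    "Mailbox (alternate DAG)": 2,
}
# name -> (available megacycles, target available megacycles)
budgets = {name: (n * PER_VCPU, round(n * PER_VCPU * TARGET_UTIL))
           for name, n in vcpus.items()}
for name, (available, target) in budgets.items():
    print(name, available, target)
```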

Return to top

To verify CPU capacity of the primary Mailbox server VMs, use the following steps.

The worst case workload for the primary Mailbox server is during a server failure or maintenance scenario where 5,400 mailboxes are active on the primary Mailbox server and the second and third remote copies are being maintained (for example, following a recovery event where the passive copies are being updated, but the active mailboxes haven't been moved back to the target server). In this step, you determine the megacycle requirements for the primary Mailbox server VM, using the following calculation:

  • Mailbox megacycles required = (number of mailbox users × profile specific megacycles) + number of remote database copies × (number of mailbox users × profile specific megacycles × 10%)

    = (5400 × 2) + 2 × (5400 × 2 × 0.1)

    = 10800 + 2160

    = 12960

In this step, you determine whether the available megacycles are greater than the required megacycles. You require 12,960 megacycles and have 14,550 megacycles, so the primary Mailbox server VM has sufficient capacity to support 5,400 active mailboxes.

Return to top

To verify CPU capacity of the secondary Mailbox server VMs, use the following steps.

The worst case workload for the secondary Mailbox server is during a site failure scenario where 2,700 mailboxes are active on the secondary Mailbox server and the second and third remote copies are being maintained (for example, following the original site coming back online where the original primary and secondary copies are being updated, but the active mailboxes haven't been moved back to the original site.) In this step, to determine the megacycle requirements for the secondary Mailbox server VM, use the following calculation:

  • Mailbox megacycles required = (number of mailbox users × profile specific megacycles) + number of remote database copies × (number of mailbox users × profile specific megacycles × 10%)

    = (2700 × 2) + 2 × (2700 × 2 × 0.1)

    = 5400 + 1080

    = 6480

In this step, you determine whether the available megacycles are greater than the required megacycles. You require 6,480 megacycles and have 7,275 megacycles, so the secondary Mailbox server VM has sufficient capacity to support 2,700 active mailboxes.
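The two megacycle checks above can be expressed as one small calculation. This is an illustrative sketch (not part of the original solution), using the figures from this section: 2 megacycles per active mailbox for this user profile, plus 10% of that load per maintained remote database copy.

```python
PROFILE_MEGACYCLES = 2  # megacycles per active mailbox for this user profile (from this section)

def required_megacycles(active_mailboxes, remote_copies):
    """Base load for the active mailboxes plus 10% of that load per remote database copy."""
    base = active_mailboxes * PROFILE_MEGACYCLES
    return base + remote_copies * base * 0.10

# Primary Mailbox server VM: 5,400 active mailboxes, 2 remote copies, 14,550 megacycles available.
assert required_megacycles(5400, 2) <= 14550  # requires 12,960

# Secondary Mailbox server VM: 2,700 active mailboxes, 2 remote copies, 7,275 megacycles available.
assert required_megacycles(2700, 2) <= 7275   # requires 6,480
```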

You can use the following steps to determine the memory required per primary Mailbox server VM.

In a previous step, you determined that the database cache requirements for all mailboxes were 190 GB and the average cache required per active mailbox was 6 MB.

To design for the worst case failure scenario, you calculate the required database cache based on 5,400 active mailboxes on the remaining Mailbox server VMs:

  • Memory required for database cache = number of active mailboxes × average cache per mailbox

    = 5400 × 6 MB

    = 32400 MB

    = 31.6 GB

In this step, reference the following table to determine the recommended memory configuration.

Memory requirements

Server physical memory (RAM)   Database cache size (Mailbox server role only)
24 GB                          17.6 GB
32 GB                          24.4 GB
48 GB                          39.2 GB

The recommended memory configuration to support 31.6 GB of database cache for a Mailbox server role is 48 GB.
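The table lookup can be sketched as a small helper. This is illustrative only (not part of the original solution); the (RAM, cache) pairs come from the table above, and the function returns the smallest RAM configuration whose Mailbox-role database cache covers the requirement.

```python
# (RAM in GB, Mailbox-role-only database cache in GB) pairs from the table above.
RAM_TO_MAILBOX_CACHE_GB = [(24, 17.6), (32, 24.4), (48, 39.2)]

def recommended_ram_gb(required_cache_gb):
    """Return the smallest listed RAM size whose database cache covers the requirement."""
    for ram_gb, cache_gb in RAM_TO_MAILBOX_CACHE_GB:
        if cache_gb >= required_cache_gb:
            return ram_gb
    raise ValueError("required cache exceeds the largest configuration listed")

print(recommended_ram_gb(31.6))  # 48
```

The same lookup yields 24 GB for the secondary Mailbox server VM's 15.8 GB cache requirement, calculated later in this section.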

You can use the following steps to determine the memory required per secondary Mailbox server VM.

In a previous step, you determined that the database cache requirements for all mailboxes were 190 GB, and the average cache required per active mailbox was 6 MB.

To design for the worst case failure scenario, you calculate the required database cache based on 2,700 active mailboxes residing on the secondary Mailbox server VMs:

  • Memory required for database cache = number of active mailboxes × average cache per mailbox

    = 2700 × 6 MB

    = 16200 MB

    = 15.8 GB

In this step, reference the following table to determine the recommended memory configuration.

Memory requirements

Server physical memory (RAM)   Database cache size (Mailbox server role only)   Database cache size (multiple server roles, for example, Mailbox and Hub Transport server roles)
24 GB                          17.6 GB                                          14 GB
32 GB                          24.4 GB                                          20 GB
48 GB                          39.2 GB                                          32 GB

The recommended memory configuration to support 15.8 GB of database cache for a Mailbox server role is 24 GB.

To determine the memory configuration for the Client Access and Hub Transport combination server VM, reference the following table.

Memory configurations for Exchange 2010 servers based on installed server roles

Exchange 2010 server role                                                                                                         Minimum supported   Recommended maximum
Client Access and Hub Transport combined server role (Client Access and Hub Transport server roles running on the same physical server)   4 GB        2 GB per core (8 GB minimum)

Because the Client Access and Hub Transport combination server VM has three virtual processors, 6 GB of memory (2 GB per core) is allocated to each Client Access and Hub Transport combination server VM.

To determine the memory required per Hyper-V root server, use the following calculation:

  • Root server memory = root operating system memory + Client Access and Hub Transport combination server VM memory + primary Mailbox server VM memory + secondary Mailbox server VM memory

    = 4 GB + 6 GB + 48 GB + 24 GB

    = 82 GB

The physical memory requirement for the root server is 82 GB. To align with recommended physical memory configurations, the server will be populated with 96 GB.
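The root server memory sum can be expressed as a quick check. This is illustrative only; all figures come from the steps above.

```python
root_os_gb = 4            # root operating system reservation
cas_hub_vm_gb = 6         # Client Access and Hub Transport combination server VM
primary_mbx_vm_gb = 48    # primary Mailbox server VM
secondary_mbx_vm_gb = 24  # secondary Mailbox server VM

root_total_gb = root_os_gb + cas_hub_vm_gb + primary_mbx_vm_gb + secondary_mbx_vm_gb
print(root_total_gb)  # 82; the server is populated with 96 GB to match standard configurations
```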

In a previous step, you determined that the solution would contain three DAGs and that each DAG would span two of the three physical locations. Now that you have determined how many Mailbox servers are required to support the workload and the DAG requirements, you can continue with the DAG design.

DAG design

TBD

Use the following steps to design database copy layout.

To determine the optimal number of Exchange databases to deploy, use the Exchange 2010 Mailbox Server Role Requirements Calculator. Enter the appropriate information on the input tab and select Yes for Automatically Calculate Number of Databases / DAG. For the mailbox size limit field, use the fully provisioned mailbox quota of 2,048 MB.

Exchange databases in the DAG

tbd

On the Role Requirements tab, the recommended number of databases appears. For this solution, the calculator recommends that each DAG have a minimum of 24 unique databases.

*Design Decision Point*

Following the recommendations of the calculator, 24 databases per DAG will be deployed.

Because there are 24 unique databases per DAG and eight servers in the DAG, each of the four servers in the primary site will host six active database copies during normal operating conditions.

Start by adding the active database copies to the four servers, as shown in the following table.

Database layout during normal operating conditions

Database   MBX1   MBX2   MBX3   MBX4
DB1-6      A1
DB7-12            A1
DB13-18                  A1
DB19-24                         A1

In the preceding table, the following applies:

  • A1 = active database copy

In a previous step, you determined that the Mailbox server resiliency strategy would be designed for operational efficiency. Mailbox servers would be deployed in pairs.

Because there are four Mailbox servers in the DAG, server 1 and 2 will be a pair and server 3 and 4 will be a pair. In this step, you add the passive database copies (P1) to the alternate server in each pair as shown in the following table.

Database layout during normal operating conditions with passive copies

Database   MBX1   MBX2   MBX3   MBX4
DB1-6      A1     P1
DB7-12     P1     A1
DB13-18                  A1     P1
DB19-24                  P1     A1

In the preceding table, the following applies:

  • A1 = active database copy

  • P1 = passive database copy

During a server failure or maintenance event, the P1 copies are activated on the alternate server. The following table illustrates this when MBX2 and MBX4 are down for maintenance.

Database copy layout during in-site server failure or maintenance conditions

Database   MBX1   MBX2 (down)   MBX3   MBX4 (down)
DB1-6      A1     P1
DB7-12     A1     P1
DB13-18                         A1     P1
DB19-24                         A1     P1

In the preceding table, the following applies:

  • A1 = active database copy

  • P1 = passive database copy

In this step, add a third database copy to the DAG members in the secondary datacenter to provide site resiliency, as shown in the following table.

Database copies added to secondary datacenter to support site resiliency

Database   SiteA MBX1   SiteA MBX2   SiteA MBX3   SiteA MBX4   SiteB MBX5   SiteB MBX6   SiteB MBX7   SiteB MBX8
DB1-6      A1           P1                                     P2
DB7-12     P1           A1                                                  P2
DB13-18                              A1           P1                                     P2
DB19-24                              P1           A1                                                  P2

In the preceding table, the following applies:

  • A1 = active database copy

  • P1 = local passive database copy

  • P2 = remote passive database copy
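The copy layout and its site-failure activation can be sketched in a few lines. This is an illustrative sketch only; in particular, the specific SiteB server hosting each P2 copy is an assumption (one database group per secondary server), and the tables in this section are the authority.

```python
# Copy layout per database group: active (A1), local passive (P1), remote passive (P2).
# SiteB placement of the P2 copies is an assumed one-group-per-server mapping.
layout = {
    "DB1-6":   {"A1": "SiteA MBX1", "P1": "SiteA MBX2", "P2": "SiteB MBX5"},
    "DB7-12":  {"A1": "SiteA MBX2", "P1": "SiteA MBX1", "P2": "SiteB MBX6"},
    "DB13-18": {"A1": "SiteA MBX3", "P1": "SiteA MBX4", "P2": "SiteB MBX7"},
    "DB19-24": {"A1": "SiteA MBX4", "P1": "SiteA MBX3", "P2": "SiteB MBX8"},
}

def active_after_site_failure(layout):
    """On loss of the primary site, the remote P2 copy of each group is activated."""
    return {group: copies["P2"] for group, copies in layout.items()}

print(active_after_site_failure(layout))
```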

In the event of a failure of the primary datacenter, the P2 copies will be activated in the secondary site, as shown in the following table. Note that until the primary site comes back online, there will only be a single copy of the database.

Database layout during site failure conditions

Database   SiteB MBX5   SiteB MBX6   SiteB MBX7   SiteB MBX8
DB1-6      A1
DB7-12                  A1
DB13-18                              A1
DB19-24                                           A1

In the preceding table, the following applies:

  • A1 = active database copy

  • P1 = passive database copy

  • P2 = passive database copy

Datacenter Activation Coordination (DAC) mode is used to control the activation behavior of a DAG when a catastrophic failure occurs that affects the DAG (for example, a complete failure of one of the datacenters). When DAC mode isn't enabled and a failure affecting multiple DAG members occurs, the DAG restarts and attempts to mount databases as soon as a majority of the DAG members are restored. In a multiple datacenter configuration, this behavior could cause split brain syndrome, a condition that occurs when all networks fail and DAG members can't receive heartbeat signals from each other. Split brain syndrome can also occur when network connectivity is severed between the datacenters. Split brain syndrome is prevented by always requiring a majority of the DAG members (and in the case of DAGs with an even number of members, the DAG's witness server) to be available and interacting for the DAG to be operational. For more information, see Understanding Datacenter Activation Coordination Mode.

*Design Decision Point*

DAC mode will be enabled for all three DAGs in the environment to prevent split brain syndrome from occurring.

In Exchange 2010, the DAG uses a minimal set of components from Windows failover clustering. One of those components is the quorum resource, which provides a means for arbitration when determining cluster state and making membership decisions. It's critical that each DAG member have a consistent view of how the DAG's underlying cluster is configured. The quorum acts as the definitive repository for all configuration information relating to the cluster. The quorum is also used as a tiebreaker to avoid split brain syndrome. Split brain syndrome is a condition that occurs when DAG members can't communicate with each other but are available and running. Split brain syndrome is prevented by always requiring a majority of the DAG members (and in the case of DAGs with an even number of members, the DAG witness server) to be available and interacting for the DAG to be operational.

A witness server is a server outside of a DAG that hosts the file share witness, which is used to achieve and maintain quorum when the DAG has an even number of members. DAGs with an odd number of members don't use a witness server. Upon creation of a DAG, the file share witness is added by default to a Hub Transport server (that doesn't have the Mailbox server role installed) in the same site as the first member of the DAG. If your Hub Transport server is running in a VM that resides on the same root server as VMs running the Mailbox server role, we recommend that you move the location of the file share witness to another highly available server. You can move the file share witness to a domain controller, but because of security implications, do this only as a last resort.

In solutions where the DAG spans multiple sites, we recommend that an alternate file share witness be defined for the secondary site. This will allow the cluster to maintain quorum during a site failure event with DAC mode enabled.

*Design Decision Point*

Because it was decided to deploy three DAGs and all DAGs will contain members in multiple sites, three primary witness directories and three alternate witness directories need to be defined. These directories will be located on file servers within each site.

When you plan your Exchange 2010 organization, one of the most important decisions that you must make is how to arrange your organization's external namespace. A namespace is a logical structure usually represented by a domain name in Domain Name System (DNS). When you define your namespace, you must consider the different locations of your clients and the servers that house their mailboxes. In addition to the physical locations of clients, you must evaluate how they connect to Exchange 2010. The answers to these questions will determine how many namespaces you must have. Your namespaces will typically align with your DNS configuration. We recommend that each Active Directory site in a region that has one or more Internet-facing Client Access servers have a unique namespace. This is usually represented in DNS by an A record, for example, mail.contoso.com or mail.europe.contoso.com.

For more information, see Understanding Client Access Server Namespaces.

There are a number of different ways to arrange your external namespaces, but usually your requirements can be met with one of the following namespace models:

  • Consolidated datacenter model   This model consists of a single physical site. All servers are located within the site, and there is a single namespace, for example, mail.contoso.com.

  • Single namespace with proxy sites   This model consists of multiple physical sites. Only one site contains an Internet-facing Client Access server. The other sites aren't exposed to the Internet. There is only one namespace for the sites in this model, for example, mail.contoso.com.

  • Single namespace and multiple sites   This model consists of multiple physical sites. Each site can have an Internet-facing Client Access server. Alternatively, there may be only a single site that contains Internet-facing Client Access servers. There is only one namespace for the sites in this model, for example, mail.contoso.com.

  • Regional namespaces   This model consists of multiple physical sites and multiple namespaces. For example, a site located in New York City would have the namespace mail.usa.contoso.com, a site located in Toronto would have the namespace mail.canada.contoso.com, and a site located in London would have the namespace mail.europe.contoso.com.

  • Multiple forests   This model consists of multiple forests that have multiple namespaces. An organization that uses this model could be made up of two partner companies, for example, Contoso and Fabrikam. Namespaces might include mail.usa.contoso.com, mail.europe.contoso.com, mail.asia.fabrikam.com, and mail.europe.fabrikam.com.

*Design Decision Point*

For this scenario, the regional namespaces model is selected because it's the best fit for organizations with active mailboxes in multiple sites.

The advantage of this model is that proxying is reduced because a larger percentage of users will be able to connect to a Client Access server in the same Active Directory site as their Mailbox server. This will improve the end-user experience and performance. Users who have mailboxes in a site that doesn't have an Internet-facing Client Access server will still be proxied.

This solution also has the following configuration requirements:

  • Multiple DNS records must be managed.

  • Multiple certificates must be obtained, configured, and managed.

  • Managing security is more complex because each Internet-facing site requires a Microsoft Forefront Threat Management Gateway computer or other reverse-proxy or firewall solution.

  • Users must connect to their own regional namespace. This may result in additional Help desk calls and training.

In Exchange 2010, the RPC Client Access service and the Exchange Address Book service were introduced on the Client Access server role to improve the mailbox user's experience when the active mailbox database copy is moved to another Mailbox server (for example, during mailbox database failures and maintenance events). The connection endpoints for mailbox access from Microsoft Outlook and other MAPI clients have been moved from the Mailbox server role to the Client Access server role. Therefore, both internal and external Outlook connections must now be load balanced across all Client Access servers in the site to achieve fault tolerance. To associate the MAPI endpoint with a group of Client Access servers rather than a specific Client Access server, you can define a Client Access server array. You can only configure one array per Active Directory site, and an array can't span more than one Active Directory site. For more information, see Understanding RPC Client Access and Understanding Load Balancing in Exchange 2010.

*Design Decision Point*

Because this is a three site deployment with four servers running the Client Access server role in each site, there will be a total of three Client Access server arrays. A hardware load balancing solution will be used to distribute load across the Client Access server arrays in each site.

Use the following steps to determine a hardware load balancing model.

In this example, the preferred vendor is Cisco because the Cisco Application Control Engine (ACE) product line works with the Cisco Unified Computing System that was selected for the server, network, and storage connectivity components of this solution.

The Cisco ACE product line provides a highly available and scalable datacenter solution from which the Exchange 2010 application environment can benefit. Cisco ACE products offer interoperability, with the following advantages:

  • Performance, scalability, throughput, and application availability

  • Standards-based design

  • Virtual architecture with device partitioning

  • Role-based administration and centralized management

  • Security services through deep packet inspection, access control lists (ACLs), unicast reverse path forwarding, and network address translation (NAT)/port address translation

The Cisco ACE product line includes two different hardware load balancing models that meet the needs for the highly available and scalable datacenter solution appropriate for the Exchange 2010 application environment. These are the Cisco ACE 4710 appliance and the integrated service module in the Cisco Catalyst 6500/Cisco 7600 routing platforms.

The Cisco ACE 4710 appliance provides up to 4 Gbps throughput in a one-rack-unit (1RU) form factor, upgradeable through software licenses, which provides long-term investment protection and scalability. At its foundation, the 4710 is a 1U rack chassis with a Cavium Nitrox Octeon accelerator card, which delivers four gigabit Ethernet ports that can be bundled together using the Cisco EtherChannel and connected to a switch. By default, the Cisco ACE 4710 supports virtualization with one administrator device and five user devices, 1-Gbps bandwidth, 1,000 Secure Sockets Layer (SSL) transactions per second, and 100 megabits per second (Mbps) of compression. The solution can be expanded without the need for new equipment, through the following software license upgrades:

  • Throughput   The default throughput of 1 Gbps can be increased to 2 or 4 Gbps.

  • Virtual devices   The number of virtual devices can be increased from 5 to 20 virtual devices.

  • SSL transactions per second   The SSL transactions per second value can be increased from 1,000 to 5,000 or 7,500.

  • Compression   Compression can be increased to 500 Mbps or 1 or 2 Gbps of throughput.

  • Role-based access control   Centralized role-based management is provided via the Application Network Manager GUI or command line interface (CLI).

  • High availability   There is support for redundant configurations (intra-appliance and inter-context).

The Cisco ACE module for Cisco Catalyst 6500 Series Switches or Cisco 7600 Series Routers provides up to 16 Gbps throughput in a one-slot module form factor and, like the Cisco ACE 4710 appliance, is upgradeable through software licenses. Up to four Cisco ACE modules can be installed in a single Cisco Catalyst 6500 Series Switch or Cisco 7600 Series Router. Each module can support the business processes of multiple, independent business units while taking advantage of the wide range of connectivity options available from the switch or router. The system administrator determines the application requirements and assigns the appropriate network services as virtual contexts. Each context contains its own set of policies, interfaces, resources, and administrators:

  • Throughput   Load balancing services provide up to 16 Gbps of throughput capacity and 345,000 Layer 4 connections per second.

  • Virtual devices   The number of virtual devices can be increased from 5 to 250.

  • SSL transactions per second   The SSL transactions per second can be increased to 15,000 SSL sessions through licensing on ACE20 modules and to 30,000 on ACE30 modules.

  • Compression   Compression can be increased to 6 Gbps on ACE30 modules.

  • Role-based access control   Centralized role-based management is provided via the Application Network Manager GUI or CLI.

  • High availability   There is support for redundant configurations (intra-chassis, inter-chassis, and inter-context).

The Cisco ACE 4710 appliance is selected because it provides maximized application availability, comprehensive application security, virtualized architecture, and investment value and protection:

  • Maximized application availability   The Cisco ACE 4710 helps ensure business continuity and service to end users by enhancing availability through highly scalable Layer 4 load balancing and Layer 7 content switching, which also minimizes the effects of application or device failure.

  • Comprehensive application security   The Cisco ACE 4710 acts as a last line of server defense, providing protection against application threats and denial of service attacks (DoS) with features such as deep packet inspection, network and protocol security, and highly scalable access control capabilities.

  • Virtualized architecture   Virtualized architecture is a primary design element of Cisco ACE and a unique selling proposition in contrast to other solutions in the marketplace. IT managers can configure up to 20 virtual devices on a single Cisco ACE 4710 appliance. The benefits are fewer devices to manage as application deployments increase, significantly lower power and cooling expenses, and faster time-to-service for new applications.

A well designed storage solution is a critical aspect of a successful Exchange 2010 Mailbox server role deployment. For more information, see Mailbox Server Storage Design.

The following table summarizes the storage requirements that have been calculated or determined in a previous design step.

Summary of disk space requirements

Disk space requirement                                          Value
Mailbox size on disk for a 2 GB mailbox (MB)                    2301
Total database capacity required (GB)                           120128
Total log capacity required (GB)                                3974
Total capacity required (GB)                                    124102
Total capacity required for three database copies (GB)          372306
Total capacity required for three database copies (terabytes)   364
Total capacity required per site (terabytes)                    122

Many customers want to significantly increase their mailbox quotas as they move to Exchange 2010. However, it may take some time for mailbox sizes to grow from several hundred megabytes to several gigabytes. In this case, it may be beneficial for some organizations to try to defer additional storage purchases to a point in the future when disk storage space is likely to be less expensive.

Many storage vendors offer some type of thin provisioning solution so that you can present more storage capacity to the Exchange server than is physically available, and then dynamically add physical storage to meet increasing demand without disruption or downtime. This lowers TCO by reducing the initial allocation of storage capacity and simplifies management by reducing the steps required to support growth.

The EMC unified storage implementation of thin provisioning is provided by its virtual provisioning feature, which supports hot sparing, proactive sparing, thin pool expansion that isn't disruptive, and the ability to migrate between thin LUNs and traditional, thick LUNs without downtime. This flexibility separates EMC unified storage virtual provisioning from typical thin provisioning implementations.

*Design Decision Point*

The current Exchange implementation has a defined mailbox quota of 200 MB. After moving to Exchange 2010, it's estimated that mailbox sizes will grow approximately 300 percent in the first 12 to 18 months. The plan is to purchase sufficient storage to accommodate an average mailbox size of 600 MB. Over the life of the Exchange 2010 implementation, average mailbox size is expected to approach 2 GB. Because it's expensive to pay for 2 GB mailbox quotas, thin provisioning will be implemented, so that an initial mailbox quota of 600 MB can be deployed. The underlying physical storage will be expanded in subsequent budget cycles to meet the anticipated demand.

When leveraging thin provisioning on EMC unified storage for Exchange 2010 deployments, it's a best practice to separate log files from database files. If you anticipate growth in mailbox size but not in the message profile (messages sent/received per day), you will need to incrementally increase the database LUNs but not the log LUNs. It may not be beneficial to put the logs on thin provisioned LUNs.

Separating the database and log LUNs also allows the flexibility to put them on different disk types or use different levels of RAID.

*Design Decision Point*

Following an EMC best practice, databases and logs will be separated on different LUNs. Because the message profile is expected to remain fairly constant over the next three years, there is no benefit to putting logs on thin provisioned LUNs.

Because VSS-based backups and restores operate at the LUN level, the number of databases per LUN is usually determined by the backup strategy. In a previous step, it was decided to not include VSS-based backups as part of the database resiliency strategy. The decision on the number of databases per LUN will be based on other factors. As a best practice, you should generally deploy a single database per LUN. Having more than one database per LUN could result in the following:

  • An overloaded database impacting a healthy database

  • A seeding operation on one database impacting a healthy database

  • Passive database I/O impacting active databases

*Design Decision Point*

Because there are no requirements for deploying more than one database per LUN, the storage design will be based on a single database per LUN model.

In a previous step, you identified that each primary Mailbox server would support six active databases and six passive databases. There will be a total of 24 LUNs for each primary datacenter Mailbox server as outlined in the following table.

Number of LUNs required per Mailbox server

LUN type                LUNs per server
Active database LUNs    6
Active log LUNs         6
Passive database LUNs   6
Passive log LUNs        6
Total LUNs              24

In a previous step, you identified that each secondary Mailbox server would support six passive databases. There will be a total of 12 LUNs for each secondary datacenter Mailbox server as outlined in the following table.

Number of LUNs required per Mailbox server

LUN type                LUNs per server
Passive database LUNs   6
Passive log LUNs        6
Total LUNs              12

To simplify the remainder of the storage design steps, use a building block approach. In this solution, each database supports 450 active mailboxes. Each Mailbox server supports 6 databases or 2,700 active mailboxes on 6 database LUNs and 6 log LUNs. A 12 LUN building block supporting increments of 2,700 mailboxes will be used.

In this step, calculate the transactional IOPS required to support the 2,700 active mailbox users in the building block. In a subsequent step, you will use the IOPS requirements to determine the minimum and maximum number of spindles to deploy for the building block based on the initial and fully provisioned mailbox quota. Use the following calculation:

  • Total transactional IOPS required for the building block = IOPS per mailbox user × number of mailboxes × I/O overhead factor

    = 0.10 × 2700 × 1.2 (a 20% I/O overhead factor)

    = 324 IOPS
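As a quick check, the building block IOPS calculation looks like this. This is an illustrative sketch using the profile figures from this section.

```python
iops_per_mailbox = 0.10    # transactional IOPS per mailbox user
mailboxes = 2700           # mailboxes per building block
io_overhead_factor = 1.20  # 20% I/O overhead

required_iops = round(iops_per_mailbox * mailboxes * io_overhead_factor)
print(required_iops)  # 324
```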

In a previous step, you calculated the mailbox size on disk for a 2,048-MB mailbox quota limit to be 2,301 MB. Because thin provisioning will be used, calculate the initial mailbox size on disk. This value will be used in later steps to determine the initial capacity requirements.

The following calculations are used to determine the initial mailbox size on disk for this solution based on a 600-MB mailbox quota:

  • Whitespace = 100 messages per day × 75 KB ÷ 1024 = 7.3 MB

  • Dumpster = (100 messages per day × 75 KB ÷ 1024 × 14 days) + (600 MB × 0.012) + (600 MB × 0.058) = 144.2 MB

  • Mailbox size on disk = mailbox limit + whitespace + dumpster

    = 600 MB + 7.3 MB + 144.2 MB

    = 752 MB
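The initial mailbox size on disk can be checked with the same arithmetic. This is an illustrative sketch; the 75 KB average message size and the 0.012/0.058 dumpster factors are the figures used in the calculation above.

```python
messages_per_day = 100
avg_message_kb = 75   # average message size used in this section's whitespace figure
quota_mb = 600        # initial (thin provisioned) mailbox quota

whitespace_mb = messages_per_day * avg_message_kb / 1024
dumpster_mb = (whitespace_mb * 14) + (quota_mb * 0.012) + (quota_mb * 0.058)
size_on_disk_mb = quota_mb + whitespace_mb + dumpster_mb
print(round(size_on_disk_mb))  # 752
```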

To determine the initial storage capacity required for 2,700 mailboxes with an initial mailbox quota of 600 MB, use the following calculations:

  • Database files capacity = (number of mailboxes × mailbox size on disk × database overhead growth factor) + (20% data overhead)

    = (2700 × 752 × 1) + (406080)

    = 2436480 MB

    = 2379 GB

  • Database catalog capacity = 10% of database files capacity

    = 238 GB

  • Total database capacity = (database files capacity + database catalog capacity) ÷ 0.80 to provide 20% volume free space

    = (2379 + 238) ÷ 0.8

    = 3271 GB

The six databases in the building block require 3,271 GB of initial storage capacity.

To determine the fully provisioned storage capacity required for 2,700 mailboxes with a mailbox quota of 2,048 MB, use the following calculations:

  • Database files capacity = (number of mailboxes × mailbox size on disk × database overhead growth factor) + (20% data overhead)

    = (2700 × 2301 × 1) + (1242540)

    = 7455240 MB

    = 7281 GB

  • Database catalog capacity = 10% of database files capacity

    = 728 GB

  • Total database capacity = (database files capacity + database catalog capacity) ÷ 0.80 to provide 20% volume free space

    = (7281 + 728) ÷ 0.8

    = 10011 GB

The six databases in the building block require 10,011 GB of fully provisioned storage capacity.
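Both capacity figures follow from one formula, sketched here for illustration. Minor rounding differs from the hand calculation above, which rounds intermediate values to whole gigabytes.

```python
def building_block_db_capacity_gb(mailboxes, mailbox_size_on_disk_mb):
    """Database capacity: +20% data overhead, +10% catalog, then 20% volume free space."""
    files_gb = mailboxes * mailbox_size_on_disk_mb * 1.20 / 1024
    catalog_gb = files_gb * 0.10
    return (files_gb + catalog_gb) / 0.80

print(round(building_block_db_capacity_gb(2700, 752)))   # ~3272 GB initial (vs. 3,271 above)
print(round(building_block_db_capacity_gb(2700, 2301)))  # 10011 GB fully provisioned
```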

To determine the log storage capacity required for the 2,700 mailboxes in the building block, use the following calculations:

  • Building block log capacity required = (number of mailbox users × number of logs per mailbox per day × log size × number of days required to replace failed infrastructure) + mailbox move percent overhead

    = (2700 × 20 × 1 MB × 4) + (2700 × 0.01 × 2048 ÷ 1024)

    = 216000 + 54

    = 216054 MB

    = 211 GB

  • Total log capacity = log capacity ÷ 0.80 to give 20% volume free space

    = 211 ÷ 0.80

    = 264 GB

The six sets of logs in the building block require 264 GB of storage capacity.

Note:
Because the log volumes aren't thin provisioned, the calculated storage capacity represents the log capacity requirements of a fully provisioned environment.

In this step, determine the number of spindles required to support the IOPS requirements. In the next step, you will determine the spindle count that meets the capacity requirements.

In a previous step, it was determined that the IOPS required to support the 2,700 mailbox building block was 324. In this step, calculate the number of disks required to meet the IOPS requirements, using the following calculation:

  • Disk count = ((user IOPS × read ratio) + write penalty × (user IOPS × write ratio)) ÷ IOPS capability of the chosen disk type

    = ((324 × 0.6) + 4 × (324 × 0.4)) ÷ 155

    = 4.6

The IOPS requirements can be met by five disks in a RAID-5 configuration.

Note:
These calculations are specific to this EMC solution. You should consult your storage vendor for guidance about spindle requirements for your chosen storage solution.
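The spindle math above can be sketched as follows. This is an illustrative sketch; the RAID-5 write penalty of 4 and the 155 IOPS per disk are the EMC-specific figures used in this section.

```python
import math

def iops_disk_count(user_iops, read_ratio=0.6, write_penalty=4, disk_iops=155):
    """Back-end IOPS (reads pass through; writes are amplified) divided by per-disk IOPS."""
    backend = (user_iops * read_ratio) + write_penalty * (user_iops * (1 - read_ratio))
    return math.ceil(backend / disk_iops)

print(iops_disk_count(324))  # 5
```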

In a previous step, you determined that the 2,700 mailbox building block for an initially provisioned mailbox of 600 MB required a storage capacity of 3,271 GB. The useable capacity per a 450-GB spindle in a RAID-5 configuration on the CX4 model 480 is approximately 402 GB. To determine the number of disks required, use the following calculation:

  • Disk count = (total capacity required) ÷ (useable capacity per spindle with RAID-5)

    = 3271 GB ÷ 402 GB

    = 8.1

The initial database capacity requirements can be met with nine disks.

EMC best practices for deploying storage on EMC unified storage using thin provisioning are to configure RAID-5 thin pools in multiples of five disks. Allocating 10 disks for one building block of 2,700 mailboxes meets this best practice and leaves headroom for future growth.

In a previous step, you determined that the 2,700 mailbox building block for a fully provisioned mailbox quota of 2,048 MB required a storage capacity of 10,011 GB. The useable capacity per a 450-GB spindle in a RAID-5 configuration on the CX4 model 480 is approximately 402 GB. To determine the number of disks required, use the following calculation:

  • Disk count = (total capacity required) ÷ (useable capacity per spindle with RAID-5)

    = 10011 GB ÷ 402 GB

    = 24.9

The fully provisioned database capacity requirements can be met with 25 disks.

In a previous step, you determined that the 2,700 mailbox building block required a log storage capacity of 264 GB. Using two 450 GB drives in a RAID-1/0 configuration on a CX4-480 provides 402 GB of usable storage capacity. The proposed two disk configuration meets the log capacity requirements of the 2,700 mailbox building block.

Now that the number of spindles required to support the IOPS and capacity requirements of the building block has been determined, you need to determine the best way to provision LUNs on the array for that building block when using virtual or thin provisioning.

There are three main models for designing thin pools for Exchange:

  • Single storage pool   One large storage pool for all Exchange databases and logs is the simplest method and provides the best space utilization. However, a single thin pool isn't recommended when multiple copies of the same database are located on the same physical array.

  • One storage pool per server   A storage pool for each Exchange Mailbox server provides more granularity when laying out LUNs on the array. If designed properly, it will provide isolation of database copies to separate sets of spindles and can minimize any disk contention issues that can surface during activities such as seeding/reseeding, backup, and online maintenance (background database maintenance). However, depending on the number of Mailbox servers you have, this model may result in many thin pools, which can be more difficult to manage.

  • One storage pool per database copy   A storage pool for each database copy ensures that each copy is isolated on a different set of spindles on the array. Because most organizations are deploying between two and four database copies, the number of thin pools is kept to a manageable number. In this model, multiple Mailbox servers have database LUNs in the same thin pool. There is a chance that activities such as seeding/reseeding, backup, and online maintenance (background database maintenance) on one Mailbox server could impact performance on another Mailbox server.

*Design Decision Point*

Although the benefits of a one storage pool per server model are appealing, this would result in eight thin pools in each site, or a total of 24 thin pools. To keep things simple, the one storage pool per database copy model will be used, which will result in three thin pools in each site and guarantee that each database copy resides on a unique set of spindles. This will also ensure database copy spindle isolation is maintained during any events where additional storage must be added to accommodate growth.

The first thin pool will contain a 2,700 mailbox building block from each of the four primary datacenter Mailbox servers in the site. In a previous step, it was determined that 10 spindles were required to support the IOPS and capacity requirements of the building block. The first thin pool supporting 10,800 active mailboxes will require 40 spindles.

The second thin pool will also contain a 2,700 mailbox building block from each of the four primary datacenter Mailbox servers in the site. The second thin pool supporting 10,800 passive mailboxes will require 40 spindles.

The third thin pool will also contain a 2,700 mailbox building block from each of the four secondary datacenter Mailbox servers in the site (the servers from an alternate DAG that are supporting the site resilient database copies). The third thin pool supporting 10,800 passive mailboxes will require 40 spindles.

A total of 120 spindles per site are required to support the initial database capacity requirements.

The first thin pool will contain a 2,700 mailbox building block from each of the four primary datacenter Mailbox servers in the site. In a previous step, it was determined that 25 spindles were required to support the IOPS and fully provisioned capacity requirements of the building block. The first thin pool supporting 10,800 active mailboxes will require 100 spindles.

The second thin pool will also contain a 2,700 mailbox building block from each of the four primary datacenter Mailbox servers in the site. The second thin pool supporting 10,800 passive mailboxes will require 100 spindles.

The third thin pool will also contain a 2,700 mailbox building block from each of the four secondary datacenter Mailbox servers in the site (the servers from an alternate DAG that are supporting the site resilient database copies). The third thin pool supporting 10,800 passive mailboxes will require 100 spindles.

A total of 300 spindles per site are required to support the fully provisioned database capacity requirements.

In a previous step, it was determined that each 2,700 mailbox building block required two spindles to support log LUN requirements.

There are four building blocks supporting active mailbox databases on the primary datacenter Mailbox servers. The log LUN supporting 10,800 active mailboxes will require eight spindles.

There are four building blocks supporting passive mailbox databases on the primary datacenter Mailbox servers. The log LUN supporting 10,800 passive mailboxes will require eight spindles.

There are four building blocks supporting passive mailbox databases on the secondary datacenter Mailbox servers. The log LUN supporting 10,800 passive mailboxes will require eight spindles.

To support the log LUN requirements in a single site, 24 spindles are required.

In this step, verify that the total spindle count required can be supported by the storage array that was chosen, using the following calculation:

  • Total spindles required per site = spindles required for database LUNs + spindles required for log LUNs

    = 120 + 24

    = 144

A CX4-480 with 10 disk array enclosures has 150 spindles and meets the requirements.

In this step, calculate the total number of spindles required to support the fully provisioned environment, using the following calculation:

  • Total spindles required per site = spindles required for database LUNs + spindles required for log LUNs

    = 300 + 24

    = 324

A CX4-480 with 22 disk array enclosures has 330 spindles and meets the requirements.
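The per-site totals above can be double-checked with simple arithmetic; the figures are taken directly from the preceding steps (three thin pools per site, four 2,700 mailbox building blocks per pool).

```python
# Per-site spindle totals from the figures above
pools_per_site = 3
blocks_per_pool = 4

db_disks_initial = pools_per_site * blocks_per_pool * 10  # 10 disks per initial building block
db_disks_full = pools_per_site * blocks_per_pool * 25     # 25 disks per fully provisioned block
log_disks = pools_per_site * blocks_per_pool * 2          # 2 log disks per building block

print(db_disks_initial + log_disks)  # 144 spindles (initial): fits a CX4-480 with 10 DAEs
print(db_disks_full + log_disks)     # 324 spindles (fully provisioned): fits 22 DAEs
```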

Return to top

The previous section provided information about the design decisions that were made when considering an Exchange 2010 solution. The following section provides an overview of the solution.

Return to top

This solution consists of a total of 36 Exchange 2010 servers deployed in a multisite topology. Twelve of the 36 servers are running both the Client Access and Hub Transport server roles. The other 24 servers are running the Mailbox server role. There is a Client Access server array with four Client Access and Hub Transport combination servers in each site. There are three DAGs, each with eight Mailbox servers. File servers in each site host the primary and alternate file share witness servers for each DAG.

Diagram of logical solution


Return to top

Each of the three sites contains four Cisco B200 blade servers connected to an EMC CLARiiON CX4 model 480 storage array via redundant Cisco Fabric Interconnect 6120 and Cisco MDS 9134 switches. Redundant Cisco Nexus 5010 Ethernet switches provide the underlying network infrastructure. Client traffic is load balanced across the Client Access server array in each site via redundant Cisco ACE 4710 load-balancing devices.

Diagram of physical solution


Return to top

The following table summarizes the physical server hardware used in this solution.

Cisco Unified Computing System summary

Item                                   Description
Blade server                           4 × B200 M1
Processors                             2 × Intel Xeon X5570 (2.93 GHz)
Memory                                 96 GB RAM (12 × 8 GB DIMM)
Converged network adapter              M71KR-Q (2 × 10 Gigabit Ethernet and 2 × 4 Gbps Fibre Channel)
Internal blade storage                 2 × 146 GB SAS 10,000 RPM disks (RAID-1)
Chassis                                5108 (6RU)
Fabric extender                        2 × 2104XP
Fabric interconnect                    2 × 6120XP
Fabric interconnect expansion module   2 × 8-port 4 Gbps Fibre Channel

The following table summarizes the storage and network hardware used in this solution.

LAN and SAN switches

Item                             Description
10 Gigabit Ethernet (GbE) switch 2 × Nexus 5010 (8 fixed 1 GbE/10 GbE ports, 12 fixed 10 GbE ports, datacenter bridging)
Fibre Channel switch             2 × MDS 9134 (32 fixed 4 Gbps ports)

The following table provides information about software used in this solution.

Software summary for the solution

Item                                                              Description
Hypervisor host servers                                           Windows Server 2008 R2 Hyper-V Enterprise
Exchange Server VMs                                               Windows Server 2008 R2 Enterprise
Exchange Server 2010 Mailbox server role                          Enterprise Edition RU2
Exchange Server 2010 Hub Transport and Client Access server role  Standard Edition RU2
Multiple path and I/O balancing                                   EMC PowerPath

Return to top

The following table summarizes the Client Access and Hub Transport combination server configuration used in this solution.

Client Access and Hub Transport server configuration

Component              Value or description
Physical or virtual    Hyper-V VM
Virtual processors     3
Memory                 8 GB
Storage                Virtual hard disk on root server operating system volume
Operating system       Windows Server 2008 R2
Exchange version       Exchange Server 2010 Standard Edition
Exchange update level  Exchange 2010 Update Rollup 2
Third-party software   None

Return to top

The following table summarizes the primary Mailbox server (hosting the primary and secondary database copies in the primary site for the DAG) configuration used in this solution.

Primary Mailbox server configuration

Component              Value or description
Physical or virtual    Hyper-V VM
Virtual processors     4
Memory                 53 GB
Storage                Virtual hard disk on root server operating system volume
Operating system       Windows Server 2008 R2
Exchange version       Exchange Server 2010 Enterprise Edition
Exchange update level  Exchange 2010 Update Rollup 2
Third-party software   None

The following table summarizes the secondary Mailbox server (hosting the tertiary database copy in the secondary site for the DAG) configuration used in this solution.

Secondary Mailbox server configuration

Component              Value or description
Physical or virtual    Hyper-V VM
Virtual processors     2
Memory                 24 GB
Storage                Virtual hard disk on root server operating system volume
Operating system       Windows Server 2008 R2
Exchange version       Exchange Server 2010 Enterprise Edition
Exchange update level  Exchange 2010 Update Rollup 2
Third-party software   None

Return to top

The following diagrams summarize the database copy layout used in this solution during normal operating conditions.

Database copy layout: 1


Database copy layout: 2


Database copy layout: 3


Return to top

The following table provides information about storage hardware used in this solution.

EMC unified storage NS-480 (integrated CLARiiON CX4-480)

Item                                                          Description
Storage                                                       3 × CLARiiON CX4-480 (1 per site)
Storage connectivity (Fibre Channel, SAS, SATA, iSCSI)        Fibre Channel
Storage cache                                                 32 GB (600 MB read cache and 10,160 MB write cache per storage processor)
Number of storage controllers                                 2 per storage frame
Number of storage ports available or used                     8 available per storage frame (4 per storage processor), 4 used (2 per storage processor)
Maximum bandwidth of storage connectivity to host             8 × 4 Gbps
Total number of disks tested in solution                      432 (360 for databases and 72 for logs across 3 sites)
Maximum number of spindles that can be hosted in the storage  480 in a single storage array

Return to top

Each of the CX4 model 480 storage arrays used in the solution were configured as illustrated in the following table.

Storage configuration

Component                               Value or description
Total storage enclosures                3
Total storage enclosures per site       1
Total disks per enclosure               150
Total storage pools per enclosure       3
Total disks per storage pool (initial)  40
Total disks per database LUN (initial)  10
Total disks per log LUN                 2
Total disks used per enclosure          144
LUN size for database (initial)         4,020 GB
LUN size for logs                       402 GB
RAID level for databases                5
RAID level for logs                     1/0

The following table illustrates how the available storage was designed and allocated between the three CX4 model 480 storage systems.

Storage configuration between CX4 model 480 storage systems

Datacenter   DAG   Database   Array1    Array2    Array3
1            1     DB1-24     C1, C2              C3
2            2     DB25-48    C3        C1, C2
3            3     DB49-72              C3        C1, C2

Return to top

Prior to deploying an Exchange solution in a production environment, validate that the solution was designed, sized, and configured properly. This validation must include functional testing to ensure that the system is operating as desired as well as performance testing to ensure that the system can handle the desired user load. This section describes the approach and test methodology used to validate server and storage design for this solution. In particular, the following tests will be defined in detail:

  • Performance tests

    • Storage performance validation (Jetstress)

    • Server performance validation (Loadgen)

  • Functional tests

    • Database switchover validation

    • Server switchover validation

    • Server failover validation

    • Datacenter switchover validation

Return to top

The level of performance and reliability of the storage subsystem connected to the Exchange Mailbox server role has a significant impact on the overall health of the Exchange deployment. Additionally, poor storage performance will result in high transaction latency, primarily reflected in poor client experience when accessing the Exchange system. To ensure the best possible client experience, validate storage sizing and configuration via the method described in this section.

For validating Exchange storage sizing and configuration, we recommend the Microsoft Exchange Server Jetstress tool. The Jetstress tool is designed to simulate an Exchange I/O workload at the database level by interacting directly with the ESE, which is also known as Jet. The ESE is the database technology that Exchange uses to store messaging data on the Mailbox server role. Jetstress can be configured to test the maximum I/O throughput available to your storage subsystem within the required performance constraints of Exchange. Or, Jetstress can accept a target profile of user count and per-user IOPS, and validate that the storage subsystem is capable of maintaining an acceptable level of performance with the target profile. Test duration is adjustable and can be run for a minimal period of time to validate adequate performance or for an extended period of time to additionally validate storage subsystem reliability.

The Jetstress tool can be obtained from the Microsoft Download Center at the following locations:

The documentation included with the Jetstress installer describes how to configure and execute a Jetstress validation test on your server hardware.

There are two main types of storage configurations:

  • DAS or internal disk scenarios

  • SAN scenarios

With DAS or internal disk scenarios, there's only one server accessing the disk subsystem, so the performance capabilities of the storage subsystem can be validated in isolation.

In SAN scenarios, the storage utilized by the solution may be shared by many servers and the infrastructure that connects the servers to the storage may also be a shared dependency. This requires additional testing, because the impact of other servers on the shared infrastructure must be adequately simulated to validate performance and functionality.

The following storage validation test cases were executed against the solution and should be considered as a starting point for storage validation. Specific deployments may have other validation requirements that can be met with additional testing, so this list isn't intended to be exhaustive:

  • Validation of worst case database switchover scenario   In this test case, the level of I/O expected to be serviced by the storage subsystem in a worst case switchover scenario (largest possible number of active copies on the fewest servers) is validated. Depending on whether the storage subsystem is DAS or SAN, this test may be required to run on multiple hosts to ensure that the end-to-end solution load on the storage subsystem can be sustained.

  • Validation of storage performance under storage failure and recovery scenario (for example, failed disk replacement and rebuild)   In this test case, the performance of the storage subsystem during a failure and rebuild scenario is evaluated to ensure that the necessary level of performance is maintained for optimal Exchange client experience. The same caveat applies for a DAS vs. SAN deployment: If multiple hosts are dependent on a shared storage subsystem, the test must include load from these hosts to simulate the entire effect of the failure and rebuild.

The Jetstress tool produces a report file after each test is completed. To help you analyze the report, use the guidelines in Reading Jetstress 2010 Test Reports.

Specifically, you should use the guidelines in the following table when you examine data in the Test Results table of the report.

Jetstress results analysis

Performance counter instance                 Guidelines for performance test
I/O Database Reads Average Latency (msec)    Average value should be less than 20 milliseconds (msec) (0.020 seconds), and the maximum values should be less than 50 msec.
I/O Log Writes Average Latency (msec)        Log disk writes are sequential, so average write latencies should be less than 10 msec, with a maximum of no more than 50 msec.
%Processor Time                              Average should be less than 80%, and the maximum should be less than 90%.
Transition Pages Repurposed/sec (Windows Server 2003, Windows Server 2008, Windows Server 2008 R2)   Average should be less than 100.
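When processing many Jetstress runs, the pass/fail guidelines in the table can be applied programmatically. This is a minimal sketch: the function and the sample measurements are hypothetical, and the thresholds are the average/maximum pairs listed above.

```python
def check_jetstress(results):
    """Apply the Jetstress guideline thresholds; results maps counter -> (average, maximum)."""
    limits = {
        "I/O Database Reads Average Latency (msec)": (20, 50),
        "I/O Log Writes Average Latency (msec)": (10, 50),
        "%Processor Time": (80, 90),
    }
    failures = []
    for counter, (avg_limit, max_limit) in limits.items():
        avg, peak = results[counter]
        if avg >= avg_limit or peak >= max_limit:
            failures.append(counter)
    return failures

# Hypothetical measurements, all within the guidelines
sample = {
    "I/O Database Reads Average Latency (msec)": (14.2, 38.0),
    "I/O Log Writes Average Latency (msec)": (2.1, 12.5),
    "%Processor Time": (45.0, 71.0),
}
print(check_jetstress(sample))  # []
```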

The report file shows various categories of I/O performed by the Exchange system:

  • Transactional I/O Performance   This table reports I/O that represents user activity against the database (for example, Outlook generated I/O). This data is generated by subtracting background maintenance I/O and log replication I/O from the total I/O measured during the test. This data provides the actual database IOPS generated along with I/O latency measurements required to determine whether a Jetstress performance test passed or failed.

  • Background Database Maintenance I/O Performance   This table reports the I/O generated due to ongoing ESE database background maintenance.

  • Log Replication I/O Performance   This table reports the I/O generated from simulated log replication.

  • Total I/O Performance   This table reports the total I/O generated during the Jetstress test.

Return to top

After the performance and reliability of the storage subsystem is validated, ensure that all of the components in the messaging system are validated together for functionality, performance, and scalability. This means moving up in the stack to validate client software interaction with the Exchange product as well as any server-side products that interact with Exchange. To ensure that the end-to-end client experience is acceptable and that the entire solution can sustain the desired user load, the method described in this section can be applied for server design validation.

For validation of end-to-end solution performance and scalability, we recommend the Microsoft Exchange Server Load Generator tool (Loadgen). Loadgen is designed to produce a simulated client workload against an Exchange deployment. This workload can be used to evaluate the performance of the Exchange system, and can also be used to evaluate the effect of various configuration changes on the overall solution while the system is under load. Loadgen is capable of simulating Microsoft Office Outlook 2007 (online and cached), Office Outlook 2003 (online and cached), POP3, IMAP4, SMTP, ActiveSync, and Outlook Web App (known in Exchange 2007 and earlier versions as Outlook Web Access) client activity. It can be used to generate a single protocol workload, or these client protocols can be combined to generate a multiple protocol workload.

You can get the Loadgen tool from the Microsoft Download Center at the following locations:

The documentation included with the Loadgen installer describes how to configure and execute a Loadgen test against an Exchange deployment.

When validating your server design, test the worst case scenario under anticipated peak workload. Based on a number of data sets from Microsoft IT and other customers, peak load is generally equal to two times the average workload throughout the remainder of the work day. This is referred to as the peak-to-average workload ratio.

Performance Monitor

Screen shot of Performance Monitor

In this Performance Monitor snapshot, which displays various counters that represent the amount of Exchange work being performed over time on a production Mailbox server, the average value for RPC operations per second (the highlighted line) is about 2,386 when averaged across the entire day. The average for this counter during the peak period from 10:00 through 11:00 is about 4,971, giving a peak-to-average ratio of 2.08.

To ensure that the Exchange solution is capable of sustaining the workload generated during the peak average, modify Loadgen settings to generate a constant amount of load at the peak average level, rather than spreading out the workload over the entire simulated work day. Loadgen task-based simulation modules (like the Outlook simulation modules) utilize a task profile that defines the number of times each task will occur for an average user within a simulated day.

The total number of tasks that need to run during a simulated day is calculated as the number of users multiplied by the sum of task counts in the configured task profile. Loadgen then determines the rate at which it should run tasks for the configured set of users by dividing the total number of tasks to run in the simulated day by the simulated day length. For example, if Loadgen needs to run 1,000,000 tasks in a simulated day, and a simulated day is equal to 8 hours (28,800 seconds), Loadgen must run 1,000,000 ÷ 28,800 = 34.72 tasks per second to meet the required workload definition. To increase the amount of load to the desired peak average, divide the default simulated day length (8 hours) by the peak-to-average ratio (2) and use this as the new simulated day length.

Using the task rate example again, 1,000,000 ÷ 14,400 = 69.44 tasks per second. This reduces the simulated day length by half, which results in doubling the actual workload run against the server and achieving our goal of a peak average workload. You don't adjust the run length duration of the test in the Loadgen configuration. The run length duration specifies the duration of the test and doesn't affect the rate at which tasks will be run against the Exchange server.
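The task rate arithmetic above can be sketched as follows; `loadgen_task_rate` is an illustrative helper, not part of the Loadgen tool.

```python
def loadgen_task_rate(total_tasks, day_seconds):
    """Tasks per second Loadgen must sustain for the configured profile."""
    return total_tasks / day_seconds

# 1,000,000 tasks over a standard 8-hour simulated day
base_rate = loadgen_task_rate(1_000_000, 8 * 3600)         # 34.72 tasks per second

# Dividing the simulated day length by the peak-to-average ratio (2) doubles the load
peak_rate = loadgen_task_rate(1_000_000, (8 * 3600) // 2)  # 69.44 tasks per second

print(round(base_rate, 2), round(peak_rate, 2))
```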

The following server design validation test cases were executed against the solution and should be considered as a starting point for server design validation. Specific deployments may have other validation requirements that can be met with additional testing, so this list isn't intended to be exhaustive:

  • Normal operating conditions   In this test case, the basic design of the solution is validated with all components in their normal operating state (no failures simulated). The desired workload is generated against the solution, and the overall performance of the solution is validated against the metrics that follow.

  • Single server failure or single server maintenance (in site)   In this test case, a single server is taken down to simulate either an unexpected failure of the server or a planned maintenance operation for the server. The workload that would normally be handled by the unavailable server is now handled by other servers in the solution topology, and the overall performance of the solution is validated.

Exchange performance data has some natural variation within test runs and among test runs. We recommend that you take the average of multiple runs to smooth out this variation. For Exchange tested solutions, a minimum of three separate test runs with durations of eight hours was completed. Performance data was collected for the full eight-hour duration of the test. Performance summary data was taken from a three to four hour stable period (excluding the first two hours of the test and the last hour of the test). For each Exchange server role, performance summary data was averaged between servers for each test run, providing a single average value for each data point. The values for each run were then averaged, providing a single data point for all servers of a like server role across all test runs.

Before you look at any performance counters or start your performance validation analysis, verify that the workload you expected to run matched the workload that you actually ran. Although there are many ways to determine whether the simulated workload matched the expected workload, the easiest and most consistent way is to look at the message delivery rate.

Every message profile consists of the sum of the average number of messages sent per day and the average number of messages received per day. To calculate the message delivery rate, select the average number of messages received per day from the following table.

Peak message delivery rate

Message profile   Messages sent per day   Messages received per day
50                10                      40
100               20                      80
150               30                      120
200               40                      160

The following example assumes that each Mailbox server has 5,000 active mailboxes with a 150 messages per day profile (30 messages sent and 120 messages received per day).

Peak message delivery rate for 5,000 active mailboxes

Description                                             Calculation                           Value
Message profile                                         Number of messages received per day   120
Number of active mailboxes per Mailbox server           Not applicable                        5,000
Total messages received per day per Mailbox server      5,000 × 120                           600,000
Total messages received per second per Mailbox server   600,000 ÷ 28,800                      20.83
Total messages adjusted for peak load                   20.83 × 2                             41.67

You expect 41.67 messages per second delivered on each Mailbox server running 5,000 active mailboxes with a message profile of 150 messages per day during peak load.

The actual message delivery rate can be measured using the following counter on each Mailbox server: MSExchangeIS Mailbox(_Total)\Messages Delivered/sec. If the measured message delivery rate is within one or two messages per second of the target message delivery rate, you can be confident that the desired load profile was run successfully.
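The target delivery rate calculation generalizes to other mailbox counts and message profiles; the helper name and its defaults (8-hour day of 28,800 seconds, peak-to-average ratio of 2) follow the worked example above.

```python
def target_delivery_rate(active_mailboxes, received_per_day, peak_ratio=2, day_seconds=28800):
    """Expected Messages Delivered/sec per Mailbox server at peak load."""
    return active_mailboxes * received_per_day / day_seconds * peak_ratio

# 5,000 active mailboxes with a 150 messages/day profile (120 received per day)
print(round(target_delivery_rate(5000, 120), 2))  # 41.67
```

Comparing this target against the measured MSExchangeIS Mailbox(_Total)\Messages Delivered/sec value confirms whether the intended load profile actually ran.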

This section describes the Performance Monitor counters and thresholds used to determine whether the Exchange environment was sized properly and is able to run in a healthy state during extended periods of peak workload. For more information about counters relevant to Exchange performance, see Performance and Scalability Counters and Thresholds.

To validate the performance and health criteria of a Hyper-V root server and the applications running within VMs, you should have a basic understanding of the Hyper-V architecture and how that impacts performance monitoring.

Hyper-V has three main components: the virtualization stack, the hypervisor, and devices. The virtualization stack handles emulated devices, manages VMs, and services I/O. The hypervisor schedules virtual processors, manages interrupts, services timers, and controls other chip-level functions. The hypervisor doesn't handle devices or I/O (for example, there are no hypervisor drivers). The devices are part of the root server or installed in guest servers as part of integration services. Because the root server has a full view of the system and controls the VMs, it also provides monitoring information via Windows Management Instrumentation (WMI) and performance counters.

Processor

When validating physical processor utilization on the root server (or within the guest VM), the standard Processor\% Processor Time counter isn't very useful.

Instead, you can examine the Hyper-V Hypervisor Logical Processor\% Total Run Time counter. This counter shows the percentage of processor time spent in guest and hypervisor runtime, and should be used to measure the total processor utilization for the hypervisor and all VMs running on the root server. This counter shouldn't exceed 80 percent, or whatever maximum utilization target you have designed for.

 

Counter                                                 Target
Hyper-V Hypervisor Logical Processor\% Total Run Time   <80%

If you're interested in what percentage of processor time is spent servicing the guest VMs, you can examine the Hyper-V Hypervisor Logical Processor\% Guest Run Time counter. If you're interested in what percentage of processor time is spent in hypervisor, you can look at the Hyper-V Hypervisor Logical Processor\% Hypervisor Run Time counter. This counter should be below 5 percent. The Hyper-V Hypervisor Root Virtual Processor\% Guest Run Time counter shows the percentage of processor time spent in the virtualization stack. This counter should also be below 5 percent. These two counters can be used to determine what percentage of your available physical processor time is being used to support virtualization.

 

Counter                                                      Target
Hyper-V Hypervisor Logical Processor\% Guest Run Time        <80%
Hyper-V Hypervisor Logical Processor\% Hypervisor Run Time   <5%
Hyper-V Hypervisor Root Virtual Processor\% Guest Run Time   <5%

Memory

You need to ensure that your Hyper-V root server has enough memory to support the memory allocated to VMs. Hyper-V automatically reserves 512 MB (this may vary with different Hyper-V releases) for the root operating system. If you don't have enough memory, Hyper-V will prevent the last VM from starting. In general, don't worry about validating the memory on a Hyper-V root server. Be more concerned with ensuring that sufficient memory is allocated to the VMs to support the Exchange roles.

Application Health

An easy way to determine whether all the VMs are in a healthy state is to look at the Hyper-V Virtual Machine Health Summary counters.

 

Counter                                                  Target
Hyper-V Virtual Machine Health Summary\Health OK         1
Hyper-V Virtual Machine Health Summary\Health Critical   0
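Taken together, the Hyper-V root server targets above can be evaluated as a simple health check; the abbreviated counter names and the sample values here are illustrative, not actual Performance Monitor output.

```python
# Hyper-V root server thresholds from the tables above (counter names abbreviated)
THRESHOLDS = {
    "Logical Processor\\% Total Run Time": lambda v: v < 80,
    "Logical Processor\\% Hypervisor Run Time": lambda v: v < 5,
    "Root Virtual Processor\\% Guest Run Time": lambda v: v < 5,
    "VM Health Summary\\Health Critical": lambda v: v == 0,
}

def root_server_healthy(samples):
    """True when every sampled counter satisfies its target."""
    return all(check(samples[name]) for name, check in THRESHOLDS.items())

# Hypothetical sample: a root server within all targets
sample = {
    "Logical Processor\\% Total Run Time": 62.0,
    "Logical Processor\\% Hypervisor Run Time": 2.5,
    "Root Virtual Processor\\% Guest Run Time": 1.8,
    "VM Health Summary\\Health Critical": 0,
}
print(root_server_healthy(sample))  # True
```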

Mailbox Servers

When validating whether a Mailbox server was properly sized, focus on processor, memory, storage, and Exchange application health. This section describes the approach to validating each of these components.

Processor

During the design process, you calculated the adjusted megacycle capacity of the server or processor platform. You then determined the maximum number of active mailboxes that could be supported by the server without exceeding 80 percent of the available megacycle capacity. You also determined what the projected CPU utilization should be during normal operating conditions and during various server maintenance or failure scenarios.

During the validation process, verify that the worst case scenario workload doesn't exceed 80 percent of the available megacycles. Also, verify that actual CPU utilization is close to the expected CPU utilization during normal operating conditions and during various server maintenance or failure scenarios.

For physical Exchange deployments, use the Processor(_Total)\% Processor Time counter and verify that this counter is less than 80 percent on average.

 

Counter                              Target
Processor(_Total)\% Processor Time   <80%

For virtual Exchange deployments, the Processor(_Total)\% Processor Time counter is measured within the VM. In this case, the counter isn't measuring the physical CPU utilization. It's measuring the utilization of the virtual CPU provided by the hypervisor. Therefore, it doesn't provide an accurate reading of the physical processor and shouldn't be used for design validation purposes. For more information, see Hyper-V: Clocks lie... which performance counters can you trust.

For validating Exchange deployments running on Microsoft Hyper-V, use the Hyper-V Hypervisor Virtual Processor\% Guest Run Time counter. This provides a more accurate value for the amount of physical CPU being utilized by the guest operating system. This counter should be less than 80 percent on average.

 

| Counter | Target |
| --- | --- |
| Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <80% |

Memory

During the design process, you calculated the amount of database cache required to support the maximum number of active databases on each Mailbox server. You then determined the optimal physical memory configuration to support the database cache and system memory requirements.

Validating whether an Exchange Mailbox server has sufficient memory to support the target workload isn't a simple task. Using available memory counters to view how much physical memory is remaining isn't helpful because the memory manager in Exchange is designed to use almost all of the available physical memory. The information store (store.exe) reserves a large portion of physical memory for database cache. The database cache is used to store database pages in memory. When a page is accessed in memory, the information doesn't have to be retrieved from disk, reducing read I/O. The database cache is also used to optimize write I/O.

When a database page is modified (known as a dirty page), the page stays in cache for a period of time. The longer it stays in cache, the better the chance that the page will be modified multiple times before those changes are written to the disk. Keeping dirty pages in cache also causes multiple pages to be written to the disk in the same operation (known as write coalescing). Exchange uses as much of the available memory in the system as possible, which is why there aren't large amounts of available memory on an Exchange Mailbox server.
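The benefit of keeping dirty pages in cache can be illustrated with a toy model (a conceptual sketch, not Exchange code): repeated modifications to a cached page collapse into a single disk write, whereas an uncached design pays one write per modification.

```python
# Toy illustration (not Exchange code) of why caching dirty pages
# reduces write I/O: repeated modifications to the same page are
# absorbed in cache and flushed as a single write.

def writes_without_cache(page_modifications):
    # Every modification goes straight to disk.
    return len(page_modifications)

def writes_with_cache(page_modifications):
    # A dirty page is written once, no matter how often it changed in cache.
    return len(set(page_modifications))

mods = [10, 11, 10, 12, 10, 11]   # page numbers modified over time
writes_without_cache(mods)         # 6 disk writes
writes_with_cache(mods)            # 3 disk writes (pages 10, 11, 12)
```

In practice the cache also groups adjacent dirty pages into single larger operations (write coalescing), which reduces write I/O further still.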

It may not be easy to know whether the memory configuration on your Exchange Mailbox server is undersized. For the most part, the Mailbox server will still function, but your I/O profile may be much higher than expected. Higher I/O can lead to higher disk read and write latencies, which may impact application health and client user experience. In the results section, there isn't any reference to memory counters. Potential memory issues will be identified in the storage validation and application health result sections, where memory-related issues are more easily detected.

Storage

If you have performance issues with your Exchange Mailbox server, those issues may be storage related. Storage issues may be caused by an insufficient number of disks to support the target I/O requirements, by an overloaded or poorly designed storage connectivity infrastructure, or by factors that change the target I/O profile, such as insufficient memory, as discussed previously.

The first step in storage validation is to verify that database latencies are below the target thresholds. In previous releases, logical disk counters were used to measure disk read and write latency. In Exchange 2010, the Mailbox server that you are monitoring is likely to have a mix of active and passive mailbox database copies. The I/O characteristics of active and passive database copies are different: because the I/O size is much larger on passive copies, latencies are typically much higher on passive copies. The latency target for passive databases is 200 msec, which is 10 times higher than the target for active database copies. This isn't much of a concern because high latencies on passive databases have no impact on the client experience. But if you use the traditional logical disk counters to measure latencies, you must review the individual volumes and separate the volumes containing active databases from those containing passive databases. Instead, we recommend that you use the new MSExchange Database counters in Exchange 2010.

When validating latencies on Exchange 2010 Mailbox servers, we recommend you use the counters in the following table for active databases.

 

| Counter | Target |
| --- | --- |
| MSExchange Database\I/O Database Reads (Attached) Average Latency | <20 msec |
| MSExchange Database\I/O Database Writes (Attached) Average Latency | <20 msec |
| MSExchange Database\IO Log Writes Average Latency | <1 msec |

We recommend that you use the counters in the following table for passive databases.

 

| Counter | Target |
| --- | --- |
| MSExchange Database\I/O Database Reads (Recovery) Average Latency | <200 msec |
| MSExchange Database\I/O Database Writes (Recovery) Average Latency | <200 msec |
| MSExchange Database\IO Log Read Average Latency | <200 msec |

Note:
To view these counters in Performance Monitor, you must enable the advanced database counters. For more information, see How to Enable Extended ESE Performance Counters.
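The active and passive latency targets above lend themselves to a simple automated check. The following sketch compares sampled counter averages against the targets; the counter samples are invented for illustration, and the sampling itself (for example, exporting Performance Monitor data) is outside the scope of the sketch.

```python
# Sketch of an automated check of sampled counter averages against the
# latency targets (msec) from the tables above. Sample values are invented.

ACTIVE_TARGETS = {
    r"MSExchange Database\I/O Database Reads (Attached) Average Latency": 20.0,
    r"MSExchange Database\I/O Database Writes (Attached) Average Latency": 20.0,
}
PASSIVE_TARGETS = {
    r"MSExchange Database\I/O Database Reads (Recovery) Average Latency": 200.0,
    r"MSExchange Database\I/O Database Writes (Recovery) Average Latency": 200.0,
}

def over_threshold(samples, targets):
    """Return the counters whose average meets or exceeds the target."""
    return [name for name, values in samples.items()
            if name in targets and sum(values) / len(values) >= targets[name]]

samples = {
    r"MSExchange Database\I/O Database Reads (Attached) Average Latency": [14, 18, 16],
    r"MSExchange Database\I/O Database Writes (Attached) Average Latency": [22, 25, 24],
}
over_threshold(samples, ACTIVE_TARGETS)
# flags only the Writes (Attached) counter (average ~23.7 msec >= 20 msec)
```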

When you're validating disk latencies for Exchange deployments running on Microsoft Hyper-V, be aware that the I/O Database Average Latency counters (like many time-based counters) may not be accurate because the concept of time within the VM differs from that on a physical server. The following example shows that I/O Database Reads (Attached) Average Latency is 22.8 msec in the VM and 17.3 msec on a physical server for the same simulated workload. Even if the values of time-based counters exceed the target thresholds, your server may still be running correctly. Review all health criteria to make a decision regarding server health when your Mailbox server role is deployed within a VM.

Values of disk latency counters for virtual and physical Mailbox servers

| Counter | Virtual Mailbox server | Physical Mailbox server |
| --- | --- | --- |
| MSExchange Database |  |  |
| I/O Database Reads (Attached) Average Latency | 22.792 | 17.250 |
| I/O Database Reads (Attached)/sec | 17.693 | 18.131 |
| I/O Database Reads (Recovery) Average Latency | 34.215 | 27.758 |
| I/O Database Writes (Recovery)/sec | 10.829 | 8.483 |
| I/O Database Writes (Attached) Average Latency | 0.944 | 0.411 |
| I/O Database Writes (Attached)/sec | 10.184 | 10.963 |
| MSExchangeIS |  |  |
| RPC Averaged Latency | 1.966 | 1.695 |
| RPC Operations/sec | 334.371 | 341.139 |
| RPC Packets/sec | 180.656 | 183.360 |
| MSExchangeIS Mailbox |  |  |
| Messages Delivered/sec | 2.062 | 2.065 |
| Messages Sent/sec | 0.511 | 0.514 |

In addition to disk latencies, review the Database\Database Page Fault Stalls/sec counter. This counter indicates the rate of page faults that can't be serviced because there are no pages available for allocation from the database cache. This counter should be 0 on a healthy server.

 

| Counter | Target |
| --- | --- |
| Database\Database Page Fault Stalls/sec | <1 |

Also, review the Database\Log Record Stalls/sec counter, which indicates the number of log records that can't be added to the log buffers per second because the log buffers are full. This counter should average less than 10.

 

| Counter | Target |
| --- | --- |
| Database\Log Record Stalls/sec | <10 |

Exchange Application Health

Even if there are no obvious issues with processor, memory, and disk, we recommend that you monitor the standard application health counters to ensure that the Exchange Mailbox server is in a healthy state.

The MSExchangeIS\RPC Averaged Latency counter provides the best indication of whether other counters with high database latencies are actually impacting Exchange health and client experience. Often, high RPC averaged latencies are associated with a high number of RPC requests, which should be less than 70 at all times.

 

| Counter | Target |
| --- | --- |
| MSExchangeIS\RPC Averaged Latency | <10 msec on average |
| MSExchangeIS\RPC Requests | <70 at all times |

Next, make sure that the transport layer is healthy. Any issues in transport or issues downstream of transport affecting the transport layer can be detected with the MSExchangeIS Mailbox(_Total)\Messages Queued for Submission counter. This counter should be less than 50 at all times. There may be temporary increases in this counter, but the counter value shouldn't grow over time and shouldn't be sustained for more than 15 minutes.
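The "sustained for more than 15 minutes" rule can be expressed as a simple check over sampled counter values. This is a sketch with an assumed one-minute sampling interval, not Exchange tooling:

```python
# Illustrative check (assumed one-minute sampling interval, not Exchange
# tooling): a temporary spike in Messages Queued for Submission is
# acceptable, but the counter shouldn't stay at or above the threshold
# for more than 15 minutes.

def longest_breach_minutes(samples, threshold=50, interval_minutes=1):
    """Longest consecutive run of samples at or above the threshold."""
    longest = current = 0
    for value in samples:
        current = current + interval_minutes if value >= threshold else 0
        longest = max(longest, current)
    return longest

spike = [5, 60, 70, 8, 4]        # 2-minute spike: acceptable
sustained = [55] * 20 + [10]     # 20 minutes above 50: investigate
longest_breach_minutes(spike)        # 2
longest_breach_minutes(sustained)    # 20
```

The same pattern applies to the transport queue counters later in this section, with the appropriate per-counter thresholds substituted.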

 

| Counter | Target |
| --- | --- |
| MSExchangeIS Mailbox(_Total)\Messages Queued for Submission | <50 at all times |

Next, ensure that maintenance of the database copies is in a healthy state. Any issues with log shipping or log replay can be identified using the MSExchange Replication(*)\CopyQueueLength and MSExchange Replication(*)\ReplayQueueLength counters. The copy queue length shows the number of transaction log files waiting to be copied to the passive copy log file folder and should be less than 1 at all times. The replay queue length shows the number of transaction log files waiting to be replayed into the passive copy and should be less than 5. Higher values don't impact client experience, but result in longer store mount times when a handoff, failover, or activation is performed.

 

| Counter | Target |
| --- | --- |
| MSExchange Replication(*)\CopyQueueLength | <1 |
| MSExchange Replication(*)\ReplayQueueLength | <5 |

Client Access Servers

To determine whether a Client Access server is healthy, review processor, memory, and application health. For an extended list of important counters, see Client Access Server Counters.

Processor

For physical Exchange deployments, use the Processor(_Total)\% Processor Time counter. This counter should be less than 80 percent on average.

 

| Counter | Target |
| --- | --- |
| Processor(_Total)\% Processor Time | <80% |

For validating Exchange deployments running on Microsoft Hyper-V, use the Hyper-V Hypervisor Virtual Processor\% Guest Run Time counter. This provides an accurate value for the amount of physical CPU being utilized by the guest operating system. This counter should be less than 80 percent on average.

 

| Counter | Target |
| --- | --- |
| Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <80% |

Application Health

To determine whether the MAPI client experience is acceptable, use the MSExchange RpcClientAccess\RPC Averaged Latency counter. This counter should be below 250 msec. High latencies can be associated with a large number of RPC requests. The MSExchange RpcClientAccess\RPC Requests counter should be below 40 on average.

 

| Counter | Target |
| --- | --- |
| MSExchange RpcClientAccess\RPC Averaged Latency | <250 msec |
| MSExchange RpcClientAccess\RPC Requests | <40 |

Transport Servers

To determine whether a transport server is healthy, review processor, disk, and application health. For an extended list of important counters, see Transport Server Counters.

Processor

For physical Exchange deployments, use the Processor(_Total)\% Processor Time counter. This counter should be less than 80 percent on average.

 

| Counter | Target |
| --- | --- |
| Processor(_Total)\% Processor Time | <80% |

For validating Exchange deployments running on Microsoft Hyper-V, use the Hyper-V Hypervisor Virtual Processor\% Guest Run Time counter. This provides an accurate value for the amount of physical CPU being utilized by the guest operating system. This counter should be less than 80 percent on average.

 

| Counter | Target |
| --- | --- |
| Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <80% |

Disk

To determine whether disk performance is acceptable, use the Logical Disk(*)\Avg. Disk sec/Read and Write counters for the volumes containing the transport logs and database. Both of these counters should be less than 20 msec.

 

| Counter | Target |
| --- | --- |
| Logical Disk(*)\Avg. Disk sec/Read | <20 msec |
| Logical Disk(*)\Avg. Disk sec/Write | <20 msec |

Application Health

To determine whether a Hub Transport server is sized properly and running in a healthy state, examine the MSExchangeTransport Queues counters outlined in the following table. All of these queues will have messages at various times; what matters is that the queue lengths aren't sustained and growing over time. Sustained large queue lengths could indicate an overloaded Hub Transport server, network issues, or an overloaded Mailbox server that's unable to receive new messages. Check the other components of the Exchange environment to verify which is the cause.

 

| Counter | Target |
| --- | --- |
| MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) | <3000 |
| MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length | <250 |
| MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length | <250 |
| MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length | <100 |
| MSExchangeTransport Queues(_total)\Submission Queue Length | <100 |


Functional Validation Tests

You can use the information in the following sections to perform functional validation tests.


Database Switchover

A database switchover is the process by which an individual active database is switched over to another database copy (a passive copy), and that database copy is made the new active database copy. Database switchovers can happen both within and across datacenters. A database switchover can be performed by using the Exchange Management Console (EMC) or the Exchange Management Shell.

To validate that a passive copy of a database can be successfully activated on another server, run the following command.

Move-ActiveMailboxDatabase <DatabaseName> -ActivateOnServer <TargetServer>

Success criteria: The active mailbox database is mounted on the specified target server. This result can be confirmed by running the following command.

Get-MailboxDatabaseCopyStatus <DatabaseName>


Server Switchover

A server switchover is the process by which all active databases on a DAG member are activated on one or more other DAG members. Like database switchovers, a server switchover can occur both within a datacenter and across datacenters, and it can be initiated by using both the EMC and the Shell.

  • To validate that all passive copies of databases on a server can be successfully activated on other servers hosting a passive copy, run the following command.

    Get-MailboxDatabase -Server <ActiveMailboxServer> | Move-ActiveMailboxDatabase -ActivateOnServer <TargetServer>
    

    Success criteria: The active mailbox databases are mounted on the specified target server. This can be confirmed by running the following command.

    Get-MailboxDatabaseCopyStatus <DatabaseName>
    
  • To validate that one copy of each of the active databases will be successfully activated on another Mailbox server hosting passive copies of the databases, shut down the server by performing the following action.

    Turn off the current active server.

    Success criteria: The active mailbox databases are mounted on another Mailbox server in the DAG. This can be confirmed by running the following command.

    Get-MailboxDatabaseCopyStatus <DatabaseName>
    


Server Failover

A server failover occurs when the DAG member can no longer service the MAPI network, or when the Cluster service on a DAG member can no longer contact the remaining DAG members.

To validate that one copy of each of the active databases will be successfully activated on another Mailbox server hosting passive copies of the databases, turn off the server by performing one of the following actions:

  • Press and hold the power button on the server until the server turns off.

  • Pull the power cables from the server, which results in the server turning off.

Success criteria: The active mailbox databases are mounted on another Mailbox server in the DAG. This can be confirmed by running the following command.

Get-MailboxDatabase -Server <MailboxServer> | Get-MailboxDatabaseCopyStatus


Datacenter Switchover

A datacenter or site failure is managed differently from the types of failures that can cause a server or database failover. In a high availability configuration, automatic recovery is initiated by the system, and the failure typically leaves the messaging system in a fully functional state. By contrast, a datacenter failure is considered to be a disaster recovery event, and as such, recovery must be manually performed and completed for the client service to be restored and for the outage to end. The process you perform is called a datacenter switchover. As with many disaster recovery scenarios, prior planning and preparation for a datacenter switchover can simplify your recovery process and reduce the duration of your outage.

For more information, including detailed steps for performing a datacenter switchover, see Datacenter Switchovers.

There are four basic steps that you complete to perform a datacenter switchover, after making the initial decision to activate the second datacenter:

  1. Terminate a partially running (failed) datacenter.

  2. Validate and confirm the prerequisites for the second datacenter.

  3. Activate the Mailbox servers.

  4. Activate the Client Access servers.

The following sections describe the steps used to validate a datacenter switchover.

When the DAG is in DAC mode, the specific actions to terminate any surviving DAG members in the primary datacenter depend on the state of the failed datacenter. Perform one of the following:

  • If the Mailbox servers in the failed datacenter are still accessible (usually not the case), run the following command on each Mailbox server.

    Stop-DatabaseAvailabilityGroup -ActiveDirectorySite <insertsitename>
    
  • If the Mailbox servers in the failed datacenter are unavailable but Active Directory is operating in the primary datacenter, run the following command on a domain controller.

    Stop-DatabaseAvailabilityGroup -ActiveDirectorySite <insertsitename> -ConfigurationOnly
    
Note:
Failure to either turn off the Mailbox servers in the failed datacenter or to successfully perform the Stop-DatabaseAvailabilityGroup command against the servers will create the potential for split brain syndrome to occur across the two datacenters. You may need to individually turn off computers through power management devices to satisfy this requirement.

Success criteria: All Mailbox servers in the failed site are in a stopped state. You can verify this by running the following command from a server in the failed datacenter.

Get-DatabaseAvailabilityGroup | Format-List

The second datacenter must be updated to represent which primary datacenter servers are stopped. From a server in the secondary datacenter, run the following command.

Stop-DatabaseAvailabilityGroup -ActiveDirectorySite <insertsitename> -ConfigurationOnly

The purpose of this step is to inform the servers in the secondary datacenter about which Mailbox servers are available to use when restoring service.

Success criteria: All Mailbox servers in the failed datacenter are in a stopped state. To verify this, run the following command from a server in the secondary datacenter.

Get-DatabaseAvailabilityGroup | Format-List

Before activating the DAG members in the secondary datacenter, we recommend that you verify that the infrastructure services in the secondary datacenter are ready for messaging service activation.

When the DAG is in DAC mode, the steps to complete activation of the Mailbox servers in the second datacenter are as follows:

  1. Stop the cluster service on each DAG member in the secondary datacenter. You can use the Stop-Service cmdlet to stop the service (for example, Stop-Service ClusSvc), or use net stop clussvc from an elevated command prompt.

  2. To activate the Mailbox servers in the secondary datacenter, run the following command.

    Restore-DatabaseAvailabilityGroup -Identity <DAGname> -ActiveDirectorySite <insertsitename>
    

    If this command succeeds, the quorum criteria are shrunk to the servers in the secondary datacenter. If the number of servers in that datacenter is an even number, the DAG will switch to using the alternate witness server as identified by the setting on the DAG object.

  3. To activate the databases, run one of the following commands.

    Get-MailboxDatabase <insertcriteriatoselectDBs> | Move-ActiveMailboxDatabase -ActivateOnServer <DAGMemberInSecondarySite>
    

    or

    Move-ActiveMailboxDatabase -Server <DAGMemberInPrimarySite> -ActivateOnServer <DAGMemberInSecondarySite>
    
  4. Check the event logs and review all error and warning messages to ensure that the secondary site is healthy. Any indicated issues should be followed up and corrected prior to mounting the databases.

  5. To mount the databases, run the following command.

    Get-MailboxDatabase -Server <DAGMemberInSecondarySite> | Mount-Database
    

Success criteria: The active mailbox databases are mounted on Mailbox servers in the secondary site. To confirm, run the following command.

Get-MailboxDatabaseCopyStatus <DatabaseName>
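The alternate witness behavior described in the activation steps above follows from cluster quorum arithmetic: the DAG needs a majority of votes to stay running, and with an even number of surviving members the witness server supplies the tie-breaking vote. A minimal sketch of the arithmetic (plain math, not cluster code):

```python
# Toy illustration of DAG quorum arithmetic (not cluster code): with an
# even number of voting members, a witness server provides the
# tie-breaking vote needed to maintain a majority.

def votes_needed(members, has_witness):
    total_voters = members + (1 if has_witness else 0)
    return total_voters // 2 + 1   # majority of voters

# Four DAG members remaining in the secondary datacenter (even number):
votes_needed(4, has_witness=False)  # 3 of 4 votes required
votes_needed(4, has_witness=True)   # 3 of 5 votes required
```

With the witness, the four-member DAG can lose two votes and still hold quorum; without it, losing two members would take the DAG offline.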

Clients connect to service endpoints to access Exchange services and data. Activating Internet-facing Client Access servers therefore involves changing DNS records to point to the new IP addresses that will be configured for the new service endpoints. Clients will then automatically connect to the new service endpoints in one of two ways:

  • Clients will continue to try to connect, and should automatically connect after the Time to Live (TTL) has expired for the original DNS entry and the entry has expired from the client's DNS cache. Users can also run the ipconfig /flushdns command from a command prompt to manually clear their DNS cache. If they're using Outlook Web App, the Web browser may need to be closed and restarted to clear the DNS cache used by the browser. In Exchange 2010 SP1, this browser caching issue can be mitigated by configuring the FailbackURL parameter on the Outlook Web App virtual directory.

  • Clients starting or restarting will perform a DNS lookup on startup and will get the new IP address for the service endpoint, which will be a Client Access server or array in the second datacenter.

To validate the scenario with Loadgen, perform the following actions:

  1. Change the DNS entry for the Client Access server array to point to the virtual IP address of the hardware load balancing in the secondary site.

  2. Run the ipconfig /flushdns command on all Loadgen servers.

  3. Restart the Loadgen test.

  4. Verify that the Client Access servers in the secondary site are now servicing the load.

To validate the scenario with an Outlook 2007 client, perform the following:

  1. Change the DNS entry for the Client Access server array to point to the virtual IP address of the hardware load balancing in the secondary site.

  2. Run the ipconfig /flushdns command on the client or wait until TTL expires.

  3. Wait for the Outlook client to reconnect.


Datacenter Failback

The process of restoring service to a previously failed datacenter is referred to as a failback. The steps used to perform a datacenter failback are similar to the steps used to perform a datacenter switchover. A significant distinction is that datacenter failbacks are scheduled, and the duration of the outage is often much shorter.

It's important that failback not be performed until the infrastructure dependencies for Exchange have been reactivated, are functioning and stable, and have been validated. If these dependencies aren't available or healthy, it's likely that the failback process will cause a longer than necessary outage, and it's possible the process could fail altogether.

The Mailbox server role should be the first role that's failed back to the primary datacenter. The following steps detail the Mailbox server role failback process (assuming DAG is in DAC mode).

  1. To reincorporate the DAG members in the primary site, run the following command.

    Start-DatabaseAvailabilityGroup -Identity <DatabaseAvailabilityGroupIdParameter> -ActiveDirectorySite <insertsitename>
    
  2. To verify the state of the database copies in the primary datacenter, run the following command.

    Get-MailboxDatabaseCopyStatus
    

After the Mailbox servers in the primary datacenter have been incorporated into the DAG, they will need some time to synchronize their database copies. Depending on the nature of the failure, the length of the outage, and actions taken by an administrator during the outage, this may require reseeding the database copies. For example, if during the outage, you remove the database copies from the failed primary datacenter to allow log file truncation to occur for the surviving active copies in the secondary datacenter, reseeding will be required. At this time, each database can be synchronized individually. After a replicated database copy in the primary datacenter is healthy, you can proceed to the next step.

  1. During the datacenter switchover process, the DAG was configured to use an alternate witness server. To reconfigure the DAG to use a witness server in the primary datacenter, run the following command.

    Set-DatabaseAvailabilityGroup -Identity <DAGName> -WitnessServer <PrimaryDatacenterWitnessServer>
    
  2. The databases being reactivated in the primary datacenter should now be dismounted in the secondary datacenter. Run the following command.

    Get-MailboxDatabase | Dismount-Database
    
  3. After the databases have been dismounted, the Client Access server URLs should be moved from the secondary datacenter to the primary datacenter. To do this, change the DNS record for the URLs to point to the Client Access server or array in the primary datacenter.

    Important:
    Don't proceed to the next step until the Client Access server URLs have been moved and the DNS TTL and cache entries have expired. Activating the databases in the primary datacenter prior to moving the Client Access server URLs to the primary datacenter will result in an invalid configuration (for example, a mounted database that has no Client Access servers in its Active Directory site).
  4. To activate the databases, run one of the following commands.

    Get-MailboxDatabase <insertcriteriatoselectDBs> | Move-ActiveMailboxDatabase -ActivateOnServer <DAGMemberInPrimarySite>
    

    or

    Move-ActiveMailboxDatabase -Server <DAGMemberInSecondarySite> -ActivateOnServer <DAGMemberInPrimarySite>
    
  5. To mount the databases, run the following command.

    Get-MailboxDatabase <insertcriteriatoselectDBs> | Mount-Database
    

Success criteria: The active mailbox databases are successfully mounted on Mailbox servers in the primary site. To confirm, run the following command.

Get-MailboxDatabaseCopyStatus <DatabaseName>


Testing was conducted at the Microsoft Enterprise Engineering Center (EEC), a state-of-the-art enterprise solutions validation laboratory on the Microsoft main campus in Redmond, Washington.

With more than 125 million dollars in hardware and with ongoing strong partnerships with the industry's leading original equipment manufacturers (OEMs), virtually any production environment can be replicated at the EEC. The EEC offers an environment that enables extensive collaboration among customers, partners, and Microsoft product engineers. This helps ensure that Microsoft end-to-end solutions will meet the high expectations of customers.


The following section summarizes the results of the functional and performance validation tests.


The following table summarizes the functional validation test results.

Functional validation results

| Test case | Result | Comments |
| --- | --- | --- |
| Database switchover | Successful | Completed without errors |
| Server switchover | Successful | Completed without errors |
| Server failure | Successful | Completed without errors |
| Site failure | Successful | Completed without errors |


Testing against all disks per site on a single storage frame shows that the CX4-480 handles just over 8,000 Exchange 2010 transactional IOPS across eight Exchange VMs configured with the 150-message user profile (0.15 IOPS per mailbox) plus an additional 20 percent headroom. Performance exceeded the target baseline of 5,832 IOPS required for this configuration and provided additional headroom for peak loads. Disk latencies were all within acceptable parameters according to Microsoft best practices for Exchange 2010 performance.
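The 5,832 IOPS baseline follows directly from the user profile: 32,400 mailboxes at 0.15 IOPS per mailbox plus 20 percent headroom. A worked version of the arithmetic:

```python
# Worked version of the IOPS target above. The 0.15 IOPS-per-mailbox
# profile and 20 percent headroom come from the text; the rest is
# simple arithmetic.

mailboxes = 32400
iops_per_mailbox = 0.15        # 150-message action profile
headroom = 0.20

target_iops = mailboxes * iops_per_mailbox * (1 + headroom)
assert round(target_iops) == 5832   # total transactional IOPS baseline

# Per-site targets matching the storage validation table:
normal = 10800 * iops_per_mailbox * (1 + headroom)      # 1,944 IOPS
switchover = 21600 * iops_per_mailbox * (1 + headroom)  # 3,888 IOPS
```

The measured 8,064 IOPS therefore clears the baseline by roughly 38 percent, which is the "additional headroom for peak loads" noted above.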

Storage design validation results

| Database I/O | Target values | 4 Mailbox servers in normal operating condition (2,700 users per Mailbox server) | 4 Mailbox servers in a switchover condition (5,400 users per Mailbox server) | Total |
| --- | --- | --- | --- | --- |
| Achieved Transactional IOPS (I/O Database Reads/sec + I/O Database Writes/sec) | 1944 / 3888 | 3576 IOPS | 4488 IOPS | 8064 IOPS |
| I/O Database Reads/sec | Not applicable | 2193 | 2729 | 4922 |
| I/O Database Writes/sec | Not applicable | 1439 | 1703 | 3142 |
| I/O Database Reads Average Latency (msec) | <20 msec | 14 | 18 | 16 |
| I/O Database Writes Average Latency (msec) | Not a good indicator for client latency because database writes are asynchronous | 14 | 18 | 16 |
| I/O Log Writes/sec | Not applicable | 1238 | 1560 | 2798 |
| I/O Log Reads Average Latency (msec) | <10 msec | 2 | 2 | 2 |


The following sections summarize the server design validation results for the test cases.

Loadgen validation: test scenarios

| Test | Description |
| --- | --- |
| Normal operation | A 100 percent concurrency load for 10,800 users was simulated at one site, with each Mailbox server handling 2,700 users. |
| Single server failure or single server maintenance (in site) | The failure of a single Hyper-V host server per site was simulated. A 100 percent concurrency load was run against a single Hyper-V host with one VM handling 5,400 users. Only three combined Client Access and Hub Transport servers handled the load. |
| Site failure | A site failure was simulated, and secondary images on standby Mailbox server VMs were activated. A 100 percent concurrency load was run against 21,600 users in a single site. |

This test case represents peak workload during normal operating conditions. Normal operating conditions refer to a state where all of the active and passive databases reside on the servers they were planned to run on. Because this test case doesn't represent the worst case workload, it isn't the key performance validation test. It provides a good indication of how this environment should run outside of a server failure or maintenance event. In this test, the objective was to validate the entire Exchange environment under normal operating condition with a peak load. All of the Exchange VMs were operating under normal conditions. Loadgen was configured to simulate peak load. The 150-message action profile running in peak mode was expected to generate double the sent and delivered messages per second.

The message delivery rate verifies that the tested workload matched the target workload. The message delivery rate was slightly higher than the target, resulting in a slightly higher load than the desired profile.

 

| Counter | Target | Tested result |
| --- | --- | --- |
| Message Delivery Rate Per Mailbox | 15.0 | 15.2 |

The following tables show the validation results of the primary Mailbox server VMs.

Processor

Processor utilization is below 70 percent, as expected.

 

| Counter | Target | Tested result |
| --- | --- | --- |
| Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <70% | 69 |

Storage

The storage results are good. All latencies are under target values.

 

| Counter | Target | Tested result |
| --- | --- | --- |
| MSExchange Database\I/O Database Reads (Attached) Average Latency | <20 msec | 19 |
| MSExchange Database\I/O Database Writes (Attached) Average Latency | <20 msec (less than reads average) | 18 |
| Database\Database Page Fault Stalls/sec | 0 | 0 |
| MSExchange Database\IO Log Writes Average Latency | <20 msec | 5 |
| Database\Log Record Stalls/sec | 0 | 0 |

Application Health

Exchange is very healthy, and all of the counters used to determine application health are well under target values.

 

| Counter | Target | Tested result |
| --- | --- | --- |
| MSExchangeIS\RPC Requests | <70 | 3.0 |
| MSExchangeIS\RPC Averaged Latency | <10 msec | 2.0 |
| MSExchangeIS Mailbox(_Total)\Messages Queued for Submission | 0 | 2.0 |

The following tables show the validation results of the secondary Mailbox server VMs.

Processor

Processor utilization is below 70 percent, as expected.

 

| Counter | Target | Tested result |
| --- | --- | --- |
| Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <70% | 26 |

Storage

The storage results are good. All latencies are under target values.

 

| Counter | Target | Tested result |
| --- | --- | --- |
| MSExchange Database\I/O Database Reads (Recovery) Average Latency | <100 msec | 0 |
| MSExchange Database\I/O Database Writes (Recovery) Average Latency | <100 msec (less than reads average) | 16 |
| Database\Database Page Fault Stalls/sec | 0 | 0 |
| MSExchange Database\IO Log Writes Average Latency | <20 msec | 3 |
| Database\Log Record Stalls/sec | 0 | 0 |

Application Health

The secondary Mailbox servers are only maintaining the third passive database copies, so the standard Exchange application health indicators aren't applicable for this scenario.

 

Counter | Target | Tested result
MSExchangeIS\RPC Requests | <70 | Not applicable
MSExchangeIS\RPC Averaged Latency | <10 msec | Not applicable
MSExchangeIS Mailbox(_Total)\Messages Queued for Submission | 0 | Not applicable

The following tables show the validation results of the Client Access and Hub Transport server VMs.

Processor

Processor utilization is low, as expected.

 

Counter | Target | Tested result
Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <70% | 48

Storage

The storage results look good. The very low latencies should have no impact on message transport.

 

Counter | Target | Tested result
Logical/Physical Disk(*)\Avg. Disk sec/Read | <20 msec | 0.001
Logical/Physical Disk(*)\Avg. Disk sec/Write | <20 msec | 0.005

Application Health

The low RPC Averaged Latency values confirm a healthy Client Access server with no impact on client experience.

 

Counter | Target | Tested result
MSExchange RpcClientAccess\RPC Averaged Latency | <250 msec | 8
MSExchange RpcClientAccess\RPC Requests | <40 | 3

The Transport Queues counters are all well under target, confirming that the Hub Transport server is healthy and able to process and deliver the required messages.

 

Counter | Target | Tested result
\MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) | <3000 | 2.5
\MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length | <250 | 0
\MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length | <250 | 2.3
\MSExchangeTransport Queues(_total)\Submission Queue Length | <100 | 0
\MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length | <100 | 0.3

The following tables show the validation results of the Hyper-V root server.

Processor

As expected, the processor utilization is under target thresholds.

 

Counter | Target | Tested result
Hyper-V Hypervisor Logical Processor(_total)\% Guest Run Time | <75% | 66
Hyper-V Hypervisor Logical Processor(_total)\% Hypervisor Run Time | <5% | 2
Hyper-V Hypervisor Logical Processor(_total)\% Total Run Time | <80% | 68
Hyper-V Hypervisor Root Virtual Processor(_total)\% Guest Run Time | <5% | 3
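The Hyper-V logical processor counters are related: % Total Run Time is approximately the sum of % Guest Run Time and % Hypervisor Run Time, which the tested results bear out (66 + 2 = 68). A minimal sketch of that relationship:

```python
# Sketch of the relationship among the Hyper-V logical processor counters:
# total run time is approximately guest run time plus hypervisor run time.
def total_run_time(guest_pct, hypervisor_pct):
    return guest_pct + hypervisor_pct

# The tested results from this run: 66% guest + 2% hypervisor = 68% total.
assert total_run_time(66, 2) == 68
```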

Application Health

The Virtual Machine Health Summary counters indicate that all VMs are in a healthy state.

 

Counter | Target | Tested result
Hyper-V Virtual Machine Health Summary\Health Critical | 0 | 0
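The pass/fail evaluation used throughout these tables amounts to comparing a tested value against a target written either as an upper bound ("<20 msec", "<70%") or as an exact value ("0"). A minimal, illustrative checker (the function name and parsing are hypothetical, not part of the validation tooling):

```python
# Hypothetical helper: evaluate a tested counter value against a target
# written as "<20 msec", "<70%", or an exact value such as "0".
def meets_target(target: str, value: float) -> bool:
    t = target.strip().rstrip("%").replace("msec", "").strip()
    if t.startswith("<"):
        return value < float(t[1:])
    return value <= float(t)

assert meets_target("<20 msec", 19)       # read latency under target
assert not meets_target("<20 msec", 23)   # write latency over target
assert meets_target("<70%", 69)           # guest run time under target
```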

In this test, the objective was to validate the entire Exchange environment under physical Hyper-V root server failure or maintenance operating conditions with a peak load. All VMs running on one of the Hyper-V root servers within the site were shut down to simulate a host maintenance condition. As a result, the affected database copies were activated on other Mailbox server VMs, creating an operating condition of 5,400 users per Mailbox server VM. Only half of the combined Client Access and Hub Transport servers processed client access and mail delivery.
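The redistribution described above follows directly from the arithmetic: when half of the Mailbox server VMs are shut down, the active mailboxes spread across the surviving VMs and the users per VM double. The specific VM and user counts below are illustrative assumptions consistent with the doubling from 2,700 to 5,400 users per VM, not stated topology:

```python
# Illustrative redistribution math (VM counts are assumptions consistent
# with the doubling described above): when half of the Mailbox server VMs
# are shut down, users per surviving VM doubles.
def users_per_surviving_vm(total_users, total_vms, failed_vms):
    return total_users / (total_vms - failed_vms)

normal = users_per_surviving_vm(10_800, 4, 0)   # 2,700 users per VM
failure = users_per_surviving_vm(10_800, 4, 2)  # 5,400 users per VM
assert (normal, failure) == (2_700.0, 5_400.0)
```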

The actual message delivery rate was on target.

 

Counter | Target | Tested result
Message Delivery Rate Per Server | 30 | 30

The following tables show the validation results of the primary Mailbox server VMs.

Processor

Processor utilization is just over target. Because this test case represents a failure or maintenance scenario at peak load, it's a low occurrence event. You wouldn't want processor utilization to be this high for an extended period of time.

 

Counter | Target | Tested result
Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <80% | 83

Storage

The storage results are acceptable. The average database read latency is just over target, and the average database write latency is higher than preferred. However, this occurs during the worst case failure scenario under peak load, which is a low occurrence event. Because the high latencies don't put the application health counters over target, the user experience should still be acceptable. You wouldn't want latencies to be this high for an extended period of time.

 

Counter | Target | Tested result
MSExchange Database\I/O Database Reads (Attached) Average Latency | <20 msec | 20.5
MSExchange Database\I/O Database Writes (Attached) Average Latency | <20 msec | 23
Database\Database Page Fault Stalls/sec | 0 | 0
MSExchange Database\IO Log Writes Average Latency | <20 msec | 8
Database\Log Record Stalls/sec | 0 | 0

Application Health

The counters show that Exchange is still reasonably healthy, although some message queuing is starting to occur under peak load. You wouldn't want this to continue for an extended period of time.

 

Counter | Target | Tested result
MSExchangeIS\RPC Requests | <70 | 9.0
MSExchangeIS\RPC Averaged Latency | <10 msec | 2.0
MSExchangeIS Mailbox(_Total)\Messages Queued for Submission | 0 | 77

The following tables show the validation results of the secondary Mailbox server VMs.

Processor

Processor utilization is below 70 percent, as expected.

 

Counter | Target | Tested result
Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <70% | 21

Storage

The storage results are good. All latencies are under target values.

 

Counter | Target | Tested result
MSExchange Database\I/O Database Reads (Recovery) Average Latency | <100 msec | 0
MSExchange Database\I/O Database Writes (Recovery) Average Latency | <100 msec (<Reads average) | 21
Database\Database Page Fault Stalls/sec | 0 | 0
MSExchange Database\IO Log Writes Average Latency | <20 msec | 3
Database\Log Record Stalls/sec | 0 | 0

Application Health

The secondary Mailbox servers are only maintaining the third passive database copies, so the standard Exchange application health indicators aren't applicable for this scenario.

 

Counter | Target | Tested result
MSExchangeIS\RPC Requests | <70 | Not applicable
MSExchangeIS\RPC Averaged Latency | <10 msec | Not applicable
MSExchangeIS Mailbox(_Total)\Messages Queued for Submission | 0 | Not applicable

The following tables show the validation results of the Client Access and Hub Transport server VMs.

Processor

Processor utilization is below 80 percent, as expected.

 

Counter | Target | Tested result
Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <80% | 74

Storage

The storage results look good. The very low latencies should have no impact on message transport.

 

Counter | Target | Tested result
Logical/Physical Disk(*)\Avg. Disk sec/Read | <20 msec | 0.001
Logical/Physical Disk(*)\Avg. Disk sec/Write | <20 msec | 0.008

Application Health

The low RPC Averaged Latency values confirm a healthy Client Access server with no impact on client experience.

 

Counter | Target | Tested result
MSExchange RpcClientAccess\RPC Averaged Latency | <250 msec | 18
MSExchange RpcClientAccess\RPC Requests | <40 | 14

The Transport Queues counters are all well under target, confirming that the Hub Transport server is healthy and able to process and deliver the required messages.

 

Counter | Target | Tested result
\MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) | <3000 | 49
\MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length | <250 | 0
\MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length | <250 | 43
\MSExchangeTransport Queues(_total)\Submission Queue Length | <100 | 53
\MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length | <100 | 4

The following tables show the validation results of the Hyper-V root server.

Processor

The processor utilization is at or slightly over target thresholds, which is expected for the failure or maintenance scenario under peak load.

 

Counter | Target | Tested result
Hyper-V Hypervisor Logical Processor(_total)\% Guest Run Time | <75% | 77
Hyper-V Hypervisor Logical Processor(_total)\% Hypervisor Run Time | <5% | 2
Hyper-V Hypervisor Logical Processor(_total)\% Total Run Time | <80% | 79
Hyper-V Hypervisor Root Virtual Processor(_total)\% Guest Run Time | <5% | 3

Application Health

The Virtual Machine Health Summary counters indicate that all VMs are in a healthy state.

 

Counter | Target | Tested result
Hyper-V Virtual Machine Health Summary\Health Critical | 0 | 0

This test case simulates a site failure by switching the active databases in the primary site to the passive databases in the secondary site resulting in 21,600 mailboxes active in one site. The four primary Mailbox server VMs in the surviving site are running a normal workload of 2,700 active mailboxes each. The four secondary Mailbox server VMs in the surviving site are now running 2,700 active mailboxes each. Each Hyper-V root server is hosting 5,400 active mailboxes.

Message delivery rate is slightly higher than target, resulting in slightly higher load than the desired profile.

 

Counter | Target | Tested result
Message Delivery Rate Per Server | 15 | 15.1

The following tables show the validation results of the primary Mailbox server VMs.

Processor

The primary Mailbox server VMs are running a normal workload and are under the processor utilization target, as expected.

 

Counter | Target | Tested result
Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <70% | 63

Storage

The storage results are good. All latencies are under target values.

 

Counter | Target | Tested result
MSExchange Database\I/O Database Reads (Attached) Average Latency | <20 msec | 12
MSExchange Database\I/O Database Writes (Attached) Average Latency | <20 msec | 13
Database\Database Page Fault Stalls/sec | 0 | 0
MSExchange Database\IO Log Writes Average Latency | <20 msec | 4
Database\Log Record Stalls/sec | 0 | 0

Application Health

Exchange is very healthy, and all of the counters used to determine application health are well under target values.

 

Counter | Target | Tested result
MSExchangeIS\RPC Requests | <70 | 3.0
MSExchangeIS\RPC Averaged Latency | <10 msec | 2.0
MSExchangeIS Mailbox(_Total)\Messages Queued for Submission | 0 | 3

The following tables show the validation results of the secondary Mailbox server VMs.

Processor

Processor utilization is just over the 80 percent target. This is higher than preferred, but it doesn't appear to be impacting other Exchange health counters. Because this test represents peak load during a low occurrence site failure event, this is acceptable. You wouldn't want this level of processor utilization for a sustained period of time.

 

Counter | Target | Tested result
Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <80% | 84

Storage

The storage results are good. All latencies are under target values.

 

Counter | Target | Tested result
MSExchange Database\I/O Database Reads (Attached) Average Latency | <20 msec | 17
MSExchange Database\I/O Database Writes (Attached) Average Latency | <20 msec (<Reads average) | 12
Database\Database Page Fault Stalls/sec | 0 | 0
MSExchange Database\IO Log Writes Average Latency | <20 msec | 3
Database\Log Record Stalls/sec | 0 | 0

Application Health

The counters show that Exchange is healthy, although some message queuing is occurring.

 

Counter | Target | Tested result
MSExchangeIS\RPC Requests | <70 | 3
MSExchangeIS\RPC Averaged Latency | <10 msec | 2
MSExchangeIS Mailbox(_Total)\Messages Queued for Submission | 0 | 106

The following tables show the validation results of the Client Access and Hub Transport server VMs.

Processor

Processor utilization is below 70 percent, as expected.

 

Counter | Target | Tested result
Hyper-V Hypervisor Virtual Processor\% Guest Run Time | <70% | 63

Storage

The storage results look good. The very low latencies should have no impact on message transport.

 

Counter | Target | Tested result
Logical/Physical Disk(*)\Avg. Disk sec/Read | <20 msec | 0.002
Logical/Physical Disk(*)\Avg. Disk sec/Write | <20 msec | 0.003

Application Health

The low RPC Averaged Latency values confirm a healthy Client Access server with no impact on client experience.

 

Counter | Target | Tested result
MSExchange RpcClientAccess\RPC Averaged Latency | <250 msec | 9
MSExchange RpcClientAccess\RPC Requests | <40 | 7

The Transport Queues counters are all well under target, confirming that the Hub Transport server is healthy and able to process and deliver the required messages.

 

Counter | Target | Tested result
\MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) | <3000 | 5
\MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length | <250 | 0
\MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length | <250 | 4
\MSExchangeTransport Queues(_total)\Submission Queue Length | <100 | 0
\MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length | <100 | 1

The following tables show the validation results of the Hyper-V root server.

Processor

Processor utilization is over the target thresholds. Because this test represents peak load during a low occurrence site failure event, this is acceptable. You wouldn't want this level of processor utilization for a sustained period of time.

 

Counter | Target | Tested result
Hyper-V Hypervisor Logical Processor(_total)\% Guest Run Time | <75% | 85
Hyper-V Hypervisor Logical Processor(_total)\% Hypervisor Run Time | <5% | 2
Hyper-V Hypervisor Logical Processor(_total)\% Total Run Time | <80% | 87
Hyper-V Hypervisor Root Virtual Processor(_total)\% Guest Run Time | <5% | 3

Application Health

The Virtual Machine Health Summary counters indicate that all VMs are in a healthy state.

 

Counter | Target | Tested result
Hyper-V Virtual Machine Health Summary\Health Critical | 0 | 0


This white paper provides an example of how to design, test, and validate an Exchange 2010 solution for customer environments with 32,400 mailboxes in multiple sites deployed on Cisco and EMC hardware. The step-by-step methodology in this document walks through the important design decision points that help address key challenges while ensuring that core business requirements are met.


For the complete Exchange 2010 documentation, see Exchange Server 2010.

For additional information related to Cisco and EMC, see the following resources:

This document is provided "as-is." Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it.

This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.
