
Tips for Designing a Well Performing Exchange Disk Subsystem

 

Topic Last Modified: 2006-11-02

by Nicole Allen and Nino Bilic

This article describes a subject that is relevant to many environments: disk subsystem issues are frequently a main cause of performance problems on servers. Storage Area Networks (SANs) are mentioned frequently in this article, though the topics also apply to directly attached storage.

First, we will define IO (input/output, also written as I/O). IO is the number of reads and writes to a drive. The actual bytes that are read or written are less interesting than the number of times the disk head has to move to a location. There is frequent confusion between the size of a disk (the number of bytes that can be stored on it) and its throughput (the number of IOs per second that can be read and written). Throughput is usually measured in IOPS (IOs per second, or IO/sec). It is important to know the maximum throughput (the maximum number of IOs your disks can sustain), because when you exceed that maximum, Remote Procedure Call (RPC) latencies quickly increase. When someone refers to a disk bottleneck, they mean a throughput bottleneck, not a shortage of disk space.

In addition, when IO is mentioned, we refer to the physical disk or disk transfers per second to the database drives, but the basic principles also apply to sizing the rest of the drives. The reason for our focus on the database drives is that Microsoft® Exchange Server makes heavy use of the disks that house the databases. For comparison, the store writes about one-tenth as many IOs to the log drive as to the database drives. Even though the focus is on database drives, be aware that Simple Mail Transfer Protocol (SMTP) queue drives and Exchange Server temp drives can also be heavy consumers of IO, depending on your network and e-mail usage. You will want to make sure that the maximum disk throughput of those drives is not exceeded either.

You should understand several things about disk IO that Exchange Server generates on its data devices.

  • Transaction log drives are accessed in a sequential manner. When transaction logs are written to the disk from memory, they are written sequentially.

  • Database drives are accessed randomly. When a database is mounted, data is read and written in small chunks, at random locations all over the database.

This is important because every physical disk can handle a finite amount of IO. The amount varies with disk speed (more on this later). It always comes down to a physical disk, whether you have a SAN or direct-attached SCSI storage: at some point, the data is stored on a physical disk somewhere.

If you mix different IO profiles (random and sequential) on the same physical disk, you risk performance issues, because a disk that is busy servicing a large sequential operation will not respond in a timely manner to intermittent random operations.

This kind of IO profile mixing on the same physical drives can, by itself, account for performance problems on an Exchange server.

Here is an example:

  • The environment has a SAN with 60 drives in it.

  • The SAN is "carved" to create a logical "disk group" that contains 20 physical disks.

  • This disk group is then further divided into two volumes that Microsoft Windows sees as two different drives.

  • Volume 1 has Exchange Server databases for four different storage groups.

  • Volume 2 has the Microsoft SQL Server database that is heavily used by the accounting department.

What is really occurring here?

  • Exchange Server data is spread over 20 physical disks.

  • SQL Server data is spread over the same 20 physical disks.

Even though these volumes are logically separate, they might not even be presented to the same server. We still have the Exchange Server database (random IO) competing with the SQL Server database (sequential IO), with both demanding IO from the same physical group of disks.

This might not be obvious immediately. But what happens when the SQL Server administrator has to run an integrity check on the SQL Server database? That check generates very high sequential IO demand on those 20 disks. Depending on the load, the number of users on the Exchange server at that time, and the IO that Exchange Server demands from the disk group at the same time, the disks might run out of steam. This translates into delays on the Exchange Server side and, eventually, into RPC dialog boxes popping up on Outlook clients.

You can detect this condition through performance monitoring.

By default, Logical Disk counters will not be available. Therefore, you might have to sample Physical Disk instead. For more information about how to enable Logical Disk counters, see Microsoft Knowledge Base article 253251, "Using Diskperf in Windows 2000."

The following key counters will quickly show whether the database disk is a bottleneck:

 

LogicalDisk\Average Disk sec/Read
    Indicates the average time (in seconds) to read data from the disk.
    Expected value: below 40 ms on average; ideally, under 20 ms.

LogicalDisk\Average Disk sec/Write
    Indicates the average time (in seconds) to write data to the disk.
    Expected value: below 40 ms on average; ideally, under 20 ms.

If the LogicalDisk counters are unavailable and you cannot restart the server immediately to enable them, you can use the PhysicalDisk counters instead. Make sure that you know which physical disks the database drives use.

You may also want to monitor the following counters.

 

MSExchangeIS\RPC Requests
    Indicates the number of MAPI RPC requests currently being serviced by the Microsoft Exchange Information Store service. The service can handle only 100 concurrent RPC requests (the default maximum, unless configured otherwise) before it starts rejecting client requests.
    Expectation: below 30 at all times.

MSExchangeIS\RPC Averaged Latency
    Indicates the RPC latency in milliseconds, averaged over the past 1024 packets.
    Expectation: below 50 ms at all times.

If disk is a problem, you will typically see a clear relationship: RPC latency increases as disk read and write latencies increase. When this happens, Outlook clients are most likely seeing RPC dialog boxes or small delays.
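
If you want to automate this check, here is a minimal sketch, assuming Python is available on a monitoring machine, that samples these counters through the built-in typeperf tool and flags values over the thresholds discussed here. The D: volume, the sample counts, and the exact counter paths are assumptions; verify the paths in Performance Monitor for your Exchange version (PerfMon abbreviates the latency counters as "Avg. Disk sec/Read" and "Avg. Disk sec/Write").

    import csv
    import subprocess

    # Counter paths discussed above. "D:" stands in for your database volume;
    # verify the exact paths in Performance Monitor before relying on them.
    COUNTERS = [
        r"\LogicalDisk(D:)\Avg. Disk sec/Read",    # keep below 0.040 s (40 ms)
        r"\LogicalDisk(D:)\Avg. Disk sec/Write",   # keep below 0.040 s (40 ms)
        r"\MSExchangeIS\RPC Requests",             # keep below 30 requests
        r"\MSExchangeIS\RPC Averaged Latency",     # keep below 50 ms
    ]
    THRESHOLDS = [0.040, 0.040, 30.0, 50.0]

    def watch(samples=60, interval=1):
        """Sample the counters with typeperf and print any values over threshold."""
        cmd = ["typeperf", *COUNTERS, "-si", str(interval), "-sc", str(samples)]
        output = subprocess.run(cmd, capture_output=True, text=True).stdout
        rows = [r for r in csv.reader(output.splitlines()) if len(r) == len(COUNTERS) + 1]
        for row in rows[1:]:  # rows[0] is the CSV header that typeperf emits
            timestamp, values = row[0], row[1:]
            for path, limit, value in zip(COUNTERS, THRESHOLDS, values):
                if value and float(value) > limit:
                    print(f"{timestamp}  {path} = {value} (over {limit})")

    if __name__ == "__main__":
        watch()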

The following is an example of what the performance might look like if there are no problems:

[Screenshot of Performance Monitor showing no disk bottlenecks]

The following is an example of what performance might look like with problems:

[Screenshot of Performance Monitor showing a disk bottleneck]

Note:
While occasional spikes in disk read and write activity are expected, it is the trend that really matters here. High disk latencies for extended periods of time will translate into RPC dialog boxes that indicate delays on the Outlook client side.

While administrators usually think about storage efficiency, the concept of performance efficiency might not be as apparent. Unfortunately, being storage efficient frequently means that you will not be performance efficient. Storage efficiency determines whether you can store all the information you have to store with minimal wasted disk space; performance efficiency determines whether the disks can deliver the IO bandwidth the load requires. Both efficiencies are discussed in the following section.

Storage efficiency can be defined as having a disk subsystem that enables us to store all the information that we need to store, without wasting hard disk space.

Using the previous example:

  • The environment has a SAN with 60 drives in it. The drives are 76 GB each.

  • The SAN tech "carves" the SAN to create a logical "disk group" that contains 20 physical disks, configured as RAID 1 in this example.

  • This disk group is then further divided into two volumes that Windows sees as two different drives.

  • Volume 1 has Exchange Server databases for 4 different storage groups. The Exchange Server database occupies 20 GB on each physical disk.

  • Volume 2 has the SQL Server database that is heavily used by the Accounting department. The SQL Server database occupies 30 GB on each disk.

This means that we have, for Exchange Server: 20 GB (per physical disk) x 20 disks = 400 GB total (which is spanned across 20 disks).

For SQL Server: 30 GB (per physical disk) x 20 disks = 600 GB total (which is spanned across 20 disks).

This also means that 76 GB (total disk size) - 50 GB (Exchange Server + SQL Server data) = 26 GB of "unused" hard disk space per physical disk.

The total "unused" space = 26 GB (per disk) x 20 (disks in this disk group) = 520 GB.

Imagine what would occur if SQL Server were not put on the same physical disk group as Exchange Server. We would still be using 400 GB across the disk group for Exchange Server data. But that is 400 GB of the 1520 GB total that this disk group can physically store, which means we would be using only 26 percent of the storage capacity. That is fairly storage inefficient, and most administrators will decide that this is unacceptable and will use the rest of the storage space for some purpose.
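
To make the arithmetic explicit, here is a minimal sketch in Python that reproduces the storage-efficiency figures from this example (the numbers are taken directly from the bullets above):

    # All sizes in GB; figures from the example above.
    disk_size = 76
    disks_in_group = 20
    exchange_per_disk = 20   # 400 GB of Exchange Server data / 20 disks
    sql_per_disk = 30        # 600 GB of SQL Server data / 20 disks

    unused_per_disk = disk_size - (exchange_per_disk + sql_per_disk)   # 26 GB
    total_unused = unused_per_disk * disks_in_group                    # 520 GB

    # If Exchange Server had the disk group to itself:
    exchange_only_usage = 100 * exchange_per_disk / disk_size          # ~26 percent
    print(total_unused, round(exchange_only_usage))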

The next question might be: why not merely make a smaller disk group that includes 10 physical disks instead of 20, making this more storage efficient? If we have 400 GB of Exchange Server data, we would be using 40 GB per physical disk for Exchange Server, leaving only 36 GB unused per disk.

The problem is that each physical disk can only handle a finite amount of IO. If you have only 10 disks in your disk group, those 10 disks might not provide sufficient throughput performance to keep up with the demand that the users need from your disk subsystem.

Performance efficiency can be defined as having a disk subsystem that can handle the IO requirements of the application that stores its files on it. In our case, that application is Exchange Server and the IO requirements will change based on the number of users who have their mailboxes on Exchange Server databases that we put on those disks, plus the IO "profile" of those users.

The best way to determine maximum throughput is to measure it. The JetStress tool is an excellent way to measure the maximum throughput of your disks. The documentation explains how to do this, so we'll skip that detail here. However, to use JetStress, you have to test your disks in a lab (not in a production environment). If you already have a server in production and suspect you have exceeded the maximum throughput, the best thing that you can do is make an estimate.

To make estimates, there are many tricks, but these rules of thumb are fairly simple:

  • Most disks can do between 130 and 180 IOPS.

  • Exchange Server typically has a Read-to-Write (R:W) ratio of 3:1 or 2:1.

  • We recommend that you plan for less than 80 percent disk usage at peak load.

    Raid 0 (striping) has the same cost as no raid: reads and writes each occur one time.

    Raid 0+1 requires two disk IOs for every write (the mirrored data is written two times).

    Raid 5 requires four disk IOs for every write (two reads and two writes to calculate and write parity).

For corporate servers that have a large number of users (500 users or more), the R:W ratios are usually 3:1 or 2:1. However, servers that have fewer than 500 users will have lower R:W ratios (approaching 0:1 as the number of users and the amount of data in the database decrease). This is because, on servers that have few users, much of the users' data will be in the database cache, so some read requests are satisfied from memory. This reduces the number of read operations. Of course, all the write operations still have to be written to disk. Therefore, the net effect of having a smaller number of users on the server is that the R:W ratio decreases.

To measure your R:W ratio, examine the ratio of LogicalDisk\Disk Reads/sec to LogicalDisk\Disk Writes/sec for the database drives.
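
For example, turning the two averaged counter values into a ratio is a one-line calculation in Python; the values below are hypothetical:

    # Averaged LogicalDisk values for a database drive (hypothetical numbers).
    disk_reads_per_sec = 150.0    # LogicalDisk\Disk Reads/sec
    disk_writes_per_sec = 75.0    # LogicalDisk\Disk Writes/sec
    print(f"R:W ratio is {disk_reads_per_sec / disk_writes_per_sec:.0f}:1")  # 2:1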

The following tables can be used to look up the recommended maximum disk throughput per disk.

Estimated maximum disk throughput for No Raid or Raid 0

 

R:W ratio    Disk at 130 IOs per second    Disk at 180 IOs per second
3:1          104 IOPS                      144 IOPS
2:1          104 IOPS                      144 IOPS
0:1          104 IOPS                      144 IOPS

Estimated maximum disk throughput for Raid 0+1 (or Raid 10)

 

R:W ratio    Disk at 130 IOs per second    Disk at 180 IOs per second
3:1          83 IOPS                       115 IOPS
2:1          78 IOPS                       108 IOPS
0:1          52 IOPS                       72 IOPS

Estimated maximum disk throughput for Raid 5

 

R:W ratio    Disk at 130 IOs per second    Disk at 180 IOs per second
3:1          59 IOPS                       82 IOPS
2:1          52 IOPS                       72 IOPS
0:1          26 IOPS                       36 IOPS
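
The values in these three tables follow directly from the rules of thumb above: take 80 percent of the raw disk IOPS, then discount by the RAID write penalty weighted by the share of writes in the workload. Here is a minimal, purely illustrative Python sketch that reproduces every cell (the truncation to whole IOPS matches the tables):

    # Recommended maximum throughput per disk: 80 percent of the raw IOPS,
    # divided by the average RAID cost of one logical IO for the given R:W mix.
    def max_throughput(raw_iops, reads, writes, write_penalty):
        cost_per_io = (reads + write_penalty * writes) / (reads + writes)
        return int(0.8 * raw_iops / cost_per_io)

    WRITE_PENALTY = {"No Raid / Raid 0": 1, "Raid 0+1": 2, "Raid 5": 4}
    for level, penalty in WRITE_PENALTY.items():
        for reads, writes in [(3, 1), (2, 1), (0, 1)]:
            cells = [max_throughput(raw, reads, writes, penalty) for raw in (130, 180)]
            print(f"{level:<18} {reads}:{writes}  {cells[0]} IOPS  {cells[1]} IOPS")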

You can safely assume a throughput of about 80 IOs per second per disk for most disks in a Raid 0+1 configuration (Raid 0+1 is generally recommended for most database drives).

How much IO does one Exchange Server client create in an environment?

User IO requirements are measured in IOs per second, also referred to as IOPS. To measure your current per-user IOPS, monitor the performance of the disk subsystem:

LogicalDisk\Disk Transfers/sec ÷ number of active users = IOPS per user

Measure Disk Transfers/sec for all database drives for 20 minutes to 2 hours during your most active time. During the same period, measure the number of active users (MSExchangeIS\Active User Count). Take the average of each counter. Then sum the Disk Transfers/sec averages across the database drives and divide that sum by the average active user count; the result is the number of IOPS per user.

Note:
The Active User Count number is not a perfect representation of the number of users. From experience, however, it is very close to reality and is a good starting point for these calculations.
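
As a minimal illustration of the calculation, here is the arithmetic in Python; the counter values below are hypothetical averages, not measurements from a real server:

    # Averages over the busiest 20 minutes to 2 hours (hypothetical numbers).
    db_drive_transfers_per_sec = [310.0, 285.0, 402.0, 253.0]  # one per database drive
    active_user_count = 3100.0                                 # MSExchangeIS\Active User Count

    iops_per_user = sum(db_drive_transfers_per_sec) / active_user_count
    print(f"{iops_per_user:.2f} IOPS per user")                # ~0.40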

Be aware that the number of IOPS per user depends on how active the clients are. You may find that it differs from server to server and from database to database. These numbers are guidelines; perfectly accurate numbers are not necessary as long as you build in some overhead when planning and populating your servers. You can also use these numbers to help decide when to move users from a busy server to another server.

Note:
As a general practice, it is a good idea to measure the server when it is at peak load. When sizing servers, always plan for maximum usage, and then leave some buffer overhead for those days when all the users return from a holiday break.

Now that we know how to measure (through JetStress) or estimate (from the tables earlier) the maximum disk throughput, and we know the IOPS per user, planning how many disks we need for a new server is a simple task.

Assuming the new users have a similar e-mail usage profile (they use the same clients, have the same percentage of e-mail plug-ins, and send about the same amount of mail), here is how it is done:

Calculate the throughput needed (multiply the number of users on the new server by the number of IOPS per user).

Divide the throughput by the maximum throughput of the disks used. Use the numbers from the earlier tables, or the JetStress result multiplied by 0.8. (The numbers in the tables already include the 80 percent maximum usage to build in some overhead.)

Round up. This gives the minimum number of disks needed for the server. Next, divide by the number of databases and round up again. This gives the number of disks needed per database (or repeat with storage groups if the databases in a storage group share the same physical drives).

Example   Suppose we are hiring 5000 people, and we want to figure out how to size our server. Our current users generate 0.4 IOPS per user, and we expect the new users to be as hard working as our current employees. We will therefore need a total of 2000 IOPS.

We are purchasing fast disks capable of 180 IOPS each, which we will configure in Raid 0+1. From the table earlier, we can expect 108 IOs per second per disk. 2000 IOPS / 108 IOPS per disk = 18.5, which implies that we would need 19 disks if all IOs went to the same volume. But they do not, of course. We plan to have 20 databases spread across four storage groups. The databases in the same storage group will share the same disks. Each storage group's disks will have to support 2000/4 = 500 IOPS, which means each storage group needs 500/108 = 4.6 disks. Rounding up shows that we need 5 disks for each storage group. Therefore, the total number of disks needed (for all storage groups) is 5 x 4 = 20 disks.

After buying our disks, we test them in the lab, and the JetStress tests show only 120 IOs per second per disk. This gives us 96 IOPS to work with after multiplying by 0.8 for a 20 percent safety buffer. (Remember, we recommend 80 percent usage at peak load.) When we redo our calculations, each storage group now needs 500/96 = 5.2 disks, which rounds up to 6 disks per storage group, or 24 disks in total. After adding the extra disks, we are ready to build out the server and add the new users.
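
Here is a minimal Python sketch of the whole sizing calculation, using the numbers from this example (the 120 IOPS JetStress figure is the hypothetical lab result above):

    from math import ceil

    def disks_needed(users, iops_per_user, per_disk_iops, storage_groups):
        """Return (disks per storage group, total disks) for the given load."""
        total_iops = users * iops_per_user            # 5000 * 0.4 = 2000 IOPS
        per_group_iops = total_iops / storage_groups  # 2000 / 4   = 500 IOPS
        per_group = ceil(per_group_iops / per_disk_iops)
        return per_group, per_group * storage_groups

    # Planning estimate from the Raid 0+1 table (180 IOPS disks, 2:1 ratio).
    print(disks_needed(5000, 0.4, 108, 4))        # (5, 20)
    # Lab-measured JetStress result: 120 IOPS per disk * 0.8 safety buffer = 96.
    print(disks_needed(5000, 0.4, 120 * 0.8, 4))  # (6, 24)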

There are several general conclusions that we can draw from all this.

  • Exchange Server is very IO intensive, mostly with random IO access to its disks.

  • Because of the random nature of Exchange Server reads, the benefits of read caching are small; that is why we need to always think of actual physical disk IO capability instead of the cache IO capability (for example, the SAN cache).

  • Always think of performance efficiency first. This is what will make or break the user experience. While this might mean that the system will be storage inefficient, that is the tradeoff for good client performance.

  • Do not mix Exchange Server (random IO) with sequential applications (SQL Server, and so on) on the same physical disks. If you have more physical disks than you need for Exchange Server to be performance efficient, your disk subsystem might tolerate some of this mixing. However, it is not a good practice, because Exchange Server performance is then left in the hands of whatever the sequential applications that share the physical disk group happen to be doing.

  • As hard disks become larger, there will be more and more pressure to share the physical disks between Exchange Server and other applications merely to stay storage efficient. Unfortunately, to stay performance efficient, you may have to sacrifice the storage-efficiency side.

  • RAID 5 is, in most cases, not a good way to go for Exchange Server databases, because the parity write penalty significantly increases the number of disks required if you want this configuration to be performance efficient.

For more information about disk performance and related topics, see the following Exchange resources:

 