Disk Bottleneck Detected

[This topic is intended to address a specific issue called out by the Exchange Server Analyzer Tool. You should apply it only to systems that have had the Exchange Server Analyzer Tool run against them and are experiencing that specific issue. The Exchange Server Analyzer Tool, available as a free download, remotely collects configuration data from each server in the topology and automatically analyzes the data. The resulting report details important configuration issues, potential problems, and nondefault product settings. By following these recommendations, you can achieve better performance, scalability, reliability, and uptime. For more information about the tool or to download the latest versions, see "Microsoft Exchange Analyzers" at https://go.microsoft.com/fwlink/?linkid=34707.]  

Topic Last Modified: 2006-11-14

The Microsoft® Exchange Server Analyzer Tool has determined that your disk system is currently running within 20 percent of its expected maximum available throughput. The Analyzer makes this determination by using one of the following methods:

  • Measuring disk latencies. The performance counters that indicate latency are LogicalDisk\Avg. Disk sec/Read and LogicalDisk\Avg. Disk sec/Write.

  • Comparing the maximum possible disk I/O per second (IOPS) of your current disk configuration and spindle count against the current IOPS value, as recorded by Performance Monitor (Perfmon).
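The IOPS comparison can be sketched as follows. This is a minimal illustration, not the Analyzer's actual logic. The per-spindle IOPS figure is an assumption that you must supply for your own disks, the function names are hypothetical, and the sketch ignores RAID write overhead and controller caching. The measured value would come from a counter such as LogicalDisk\Disk Transfers/sec.

```python
def iops_utilization(measured_iops, spindles, iops_per_spindle):
    """Fraction of the estimated maximum IOPS that is currently in use."""
    estimated_max_iops = spindles * iops_per_spindle
    return measured_iops / estimated_max_iops


def near_capacity(measured_iops, spindles, iops_per_spindle, margin=0.20):
    """True if measured IOPS is within `margin` (a fraction) of the estimated maximum."""
    return iops_utilization(measured_iops, spindles, iops_per_spindle) >= 1.0 - margin


# Hypothetical example: 8 spindles at an assumed 150 IOPS each (1,200 IOPS maximum).
print(near_capacity(1000, 8, 150))  # 1000 / 1200 is about 0.83, so True
print(near_capacity(900, 8, 150))   # 900 / 1200 = 0.75, so False
```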

Acceptable Latencies

Depending on the type of data that is being accessed, the Exchange server tolerates different amounts of delay. Therefore, the maximum acceptable average latency depends on the type of resource that is on the disk. Except where indicated in the following list, the maximum acceptable value for the read or write latency is 50 milliseconds (ms).

  • Transaction log drives   The drive that hosts the transaction logs should have average write latencies below 10 ms, and spikes in write latency should stay under 50 ms. Writes to the transaction log are synchronous: a thread in the Store.exe process must wait for the write to complete before it can perform another task. Therefore, low write latency on the transaction log drive is important to server performance. The average read latency on the transaction log drive should be below 20 ms, and spikes in read latency should stay under 50 ms. Database Log Record Stalls per second and Database Log Threads Waiting should each be less than 10.

    Ordinarily, Exchange servers do not read from the transaction logs, so read latency on that drive is normally much less important than write latency. However, because transaction log write latency is so important to Exchange performance, it is recommended that, on large servers, you do not use the drives that host transaction logs for any other purpose. On such a dedicated drive, the rate of reads (as measured by LogicalDisk\Disk Reads/sec) should be minimal compared to the rate of writes (LogicalDisk\Disk Writes/sec). The Exchange Server Analyzer flags a transaction log drive if its ratio of reads to writes is greater than 0.10 (more than one read for every ten writes).

    If the drive has more than 0.10 reads for every write, identify which application is reading from the transaction log drive, and then stop it from doing so.

  • Database drives   The acceptable latencies for the drives that contain Exchange database files (*.edb and *.stm files) are as follows (higher values indicate a disk bottleneck):

    • The maximum value for LogicalDisk\Avg. Disk sec/Read on a database drive should be less than 50 ms (0.050 seconds).

    • The average value for LogicalDisk\Avg. Disk sec/Read on a database drive should be less than 20 ms (0.020 seconds).

  • TEMP and TMP drives   Drives that contain the TEMP and TMP directories should have average read and write latencies below 10 ms. The maximum read or write latency should be below 50 ms.

  • Page and system drive   Drives that contain page files and the Windows system files should have read and write latencies below 10 ms.

  • SMTP drive   Drives that contain SMTP server files should have read and write latencies below 10 ms.
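The thresholds in the preceding list can be expressed as a simple lookup table. In the following sketch, the role names (log, database, temp, system, smtp), the function names, and the choice to fall back to the general 50 ms limit wherever the list gives no role-specific value are assumptions for illustration, not the Analyzer's implementation. The Avg. Disk sec/Read and Avg. Disk sec/Write counters report seconds, so the sketch converts them to milliseconds before comparing. It also includes the transaction log read-to-write ratio check and the limit of 10 for the log stall values.

```python
# Hypothetical helpers that encode the latency guidance above.
DEFAULT_LIMIT_MS = 50.0  # general limit, used where the list gives no role-specific value

# Limits in ms: (avg read, max read, avg write, max write). None means "use DEFAULT_LIMIT_MS".
LIMITS_MS = {
    "log": (20.0, 50.0, 10.0, 50.0),
    "database": (20.0, 50.0, None, None),
    "temp": (10.0, 50.0, 10.0, 50.0),
    "system": (10.0, None, 10.0, None),  # page files and Windows system files
    "smtp": (10.0, None, 10.0, None),
}
NAMES = ("avg read", "max read", "avg write", "max write")


def latency_violations(role, avg_read_s, max_read_s, avg_write_s, max_write_s):
    """Compare LogicalDisk\\Avg. Disk sec/Read and sec/Write values (in seconds)
    with the limits for a drive role. Returns (name, measured_ms, limit_ms)
    for every value that is not below its limit."""
    measured_ms = [s * 1000.0 for s in (avg_read_s, max_read_s, avg_write_s, max_write_s)]
    violations = []
    for name, value_ms, limit_ms in zip(NAMES, measured_ms, LIMITS_MS[role]):
        if limit_ms is None:
            limit_ms = DEFAULT_LIMIT_MS
        if value_ms >= limit_ms:
            violations.append((name, value_ms, limit_ms))
    return violations


def log_read_ratio_too_high(reads_per_sec, writes_per_sec, max_ratio=0.10):
    """True if Disk Reads/sec exceeds max_ratio times Disk Writes/sec on a log drive."""
    if writes_per_sec == 0:
        return reads_per_sec > 0  # any reads with no writes is treated as too high
    return reads_per_sec / writes_per_sec > max_ratio


def log_counters_ok(log_record_stalls_per_sec, log_threads_waiting):
    """True if both log stall values are below 10."""
    return log_record_stalls_per_sec < 10 and log_threads_waiting < 10


# Hypothetical sample values for a transaction log drive.
print(latency_violations("log", 0.004, 0.030, 0.015, 0.060))
print(log_read_ratio_too_high(reads_per_sec=15, writes_per_sec=100))  # 15 / 100 > 0.10
```

In this sample, the 15 ms average write and the 60 ms maximum write both exceed the transaction log limits, while the read values do not.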

Causes of Latency

One of the most common causes of disk latency is a disk that is accessed at a higher rate than the disk subsystem can support. In this case, the disk is said to be a bottleneck, which also means that the disk subsystem is at or beyond its throughput capacity.

If the rate of disk IOPS (LogicalDisk\Disk Transfers/sec) is close to or greater than the estimated capacity, then the disk is referred to as “beyond throughput capacity” or “beyond capacity.” In these cases, to improve server performance, you must either reduce the load on the disk subsystem (move users to another server) or increase the capacity of the drive (by adding more or faster spindles).

If your hardware is not performing as expected, you may see high latencies even when the I/O rate is well below the estimated capacity.
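The reasoning in this section can be summarized as a small decision procedure. The following sketch assumes that you already have an estimate of the IOPS capacity of the disk subsystem and a yes/no result from the latency checks. The 80 percent cutoff for "close to" capacity mirrors the 20 percent margin mentioned at the start of this topic but is otherwise an assumption, and the function name and return values are hypothetical.

```python
def diagnose(transfers_per_sec, estimated_capacity_iops, latency_ok, near_fraction=0.80):
    """Rough triage of a slow drive (hypothetical helper).

    transfers_per_sec: measured LogicalDisk\\Disk Transfers/sec
    estimated_capacity_iops: your estimate of the IOPS the disk subsystem supports
    latency_ok: whether the drive's latencies meet the limits for its role
    near_fraction: what counts as "close to" capacity (assumed here to be 80 percent)
    """
    utilization = transfers_per_sec / estimated_capacity_iops
    if utilization >= near_fraction:
        # Close to or beyond capacity: reduce the load or add more or faster spindles.
        return "beyond_capacity"
    if not latency_ok:
        # High latency well below capacity: the hardware may not be performing as expected.
        return "hardware_suspect"
    return "ok"


print(diagnose(1000, 1200, latency_ok=False))
print(diagnose(300, 1200, latency_ok=False))
```

The first call reports the drive as beyond capacity; the second, with the same latency problem at a quarter of the estimated capacity, points at the hardware instead.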

Reducing Disk Latency

To reduce the load on the disk subsystem, you can reduce the number of tasks that the server performs. Specifically, remove any optional applications that contribute to disk load. The Store.exe process should be the only process that accesses the database and transaction log drives.

Depending on your current disk system configuration, you can take several actions to reduce the effect of this issue.

To reduce disk latency

  • If a database or transaction log drive is bottlenecked, you can reduce the load on that drive by moving users to a database or storage group that is hosted on a drive that is not near maximum capacity. If all the database and transaction log drives are nearing capacity, you may have to move users to another server. For more information, see Move User Mailboxes to Another Server.

  • Occasionally, a brief period of high I/O is caused by an expensive MAPI operation. In that case, to determine which user or action is causing the high I/O rate, you can use the Exchange Server User Monitor (ExMon) tool together with the logical disk performance counters. If you can isolate the user, you may be able to identify which client application is generating the load. You can download ExMon from the Microsoft Download Center (https://go.microsoft.com/fwlink/?LinkId=54983).

  • If you are running a RAID-5 disk array, consider changing to a RAID-10 disk array to increase the number of IOPS that the disk subsystem can support. Each write costs fewer disk I/Os on RAID-10 than on RAID-5.

  • To increase the number of IOPS that the disk subsystem can support, consider adding disks (spindles) to your disk system.
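The effect of the RAID level and the spindle count on supported IOPS can be estimated with the commonly used write-penalty rule of thumb. On RAID-10, each host write costs two disk writes (one per mirror). On RAID-5, each host write costs four disk I/Os (read data, read parity, write data, write parity). The following sketch applies that rule. The per-spindle IOPS value and the read/write mix are inputs that you must measure or estimate for your own hardware, the function names are hypothetical, and controller caching is ignored.

```python
import math

# Disk I/Os generated by one host write, by RAID level (common rule of thumb).
WRITE_PENALTY = {"RAID-10": 2, "RAID-5": 4}


def supported_host_iops(spindles, iops_per_spindle, read_fraction, raid):
    """Estimate the host IOPS the array can sustain for a given read/write mix."""
    raw_disk_iops = spindles * iops_per_spindle
    write_fraction = 1.0 - read_fraction
    # Each host read costs one disk I/O; each host write costs WRITE_PENALTY disk I/Os.
    disk_ios_per_host_io = read_fraction + write_fraction * WRITE_PENALTY[raid]
    return raw_disk_iops / disk_ios_per_host_io


def spindles_needed(target_host_iops, iops_per_spindle, read_fraction, raid):
    """Estimate the number of spindles needed to sustain target_host_iops."""
    write_fraction = 1.0 - read_fraction
    disk_iops = target_host_iops * (read_fraction + write_fraction * WRITE_PENALTY[raid])
    count = math.ceil(disk_iops / iops_per_spindle)
    if raid == "RAID-10" and count % 2:
        count += 1  # RAID-10 is built from mirrored pairs, so round up to an even count
    return count


# Hypothetical example: 8 spindles at an assumed 150 IOPS each, 50 percent reads.
for level in ("RAID-5", "RAID-10"):
    print(level, supported_host_iops(8, 150, 0.5, level))
```

With these assumed inputs, the same eight spindles support about 480 host IOPS as RAID-5 and 800 as RAID-10. Conversely, sustaining 800 host IOPS on RAID-5 would take 14 such spindles.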

For More Information