Ruling Out Disk-Bound Problems

 

Disk problems are a common bottleneck for large Exchange deployments.

Exchange makes extensive use of the disk subsystem, but its use varies depending on the intended function of each disk. There are five functions that are of importance:

  • Temp disk

  • Database disks

  • Transaction log disks

  • SMTP queue

  • Page file disk

Each group of disks that serve the functions above sees distinct I/O utilization patterns that require a separate analysis. Because of these different patterns, no disk should be used for more than one function.

Looking at the I/O Patterns of the Temp Disk

The operating system temporary drive is where all the format conversions, such as from RTF to HTML, occur. It is also the home for all temporary files created and accessed during crawls performed by the Microsoft Index Server Indexing Service.

When first installed, the operating system sets the location for creation and use of temporary files to the same disk used by the operating system itself. This means that any I/O for the temp disk competes with I/O for programs and page file operations being run from that drive. This competition for I/O impacts performance. To prevent the operating system from competing for I/O with the temp disk, it is recommended that you change the global environment setting of TEMP to point to another disk, thereby giving the temp disk its own disk.
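As a quick way to confirm the situation described above, the following minimal sketch compares the drive of the TEMP directory with the system drive. The function name and the sample paths are hypothetical. On a real server the two values would typically come from the TEMP and SystemDrive environment variables.

```python
import ntpath


def temp_shares_system_drive(temp_path: str, system_drive: str) -> bool:
    """Return True when the TEMP directory is on the same drive as the OS."""
    # ntpath applies Windows path rules regardless of where the script runs.
    temp_drive, _ = ntpath.splitdrive(temp_path)
    return temp_drive.upper() == system_drive.rstrip("\\").upper()


# Hypothetical values; on a server, read os.environ["TEMP"] and
# os.environ["SystemDrive"] instead.
for temp_path in (r"C:\Windows\Temp", r"T:\Temp"):
    shared = temp_shares_system_drive(temp_path, "C:")
    print(temp_path, "shares the system drive" if shared else "has its own drive")
```

A result of True suggests that temp I/O is still competing with operating system I/O and that TEMP is a candidate for relocation.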

Use the counters listed in the following table to determine if there are any resource contentions in the temp disk.

Performance Counters for Temp Disks

Counter Expected values

PhysicalDisk\Average Disk sec/Read

Indicates the average time (in seconds) to read data from the disk.

  • The average value should be below 10 ms.

  • Spikes (maximum values) should not be higher than 50 ms.

PhysicalDisk\Average Disk sec/Write

Indicates the average time (in seconds) to write data to the disk.

  • The average value should be below 10 ms.

  • Spikes (maximum values) should not be higher than 50 ms.

PhysicalDisk\Average Disk Queue Length

Indicates the average number of both read and write requests that were queued for the selected disk during the sample interval.

  • The average value should be less than the number of spindles of the disk (1 if it is a single physical disk).
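The thresholds in the table above can be checked mechanically once counter samples have been collected. The following is a minimal sketch, assuming the samples are already available as lists of numbers (latencies in seconds, as the counters report them). The function name and its parameters are illustrative and not part of any Exchange tool.

```python
from statistics import mean


def check_temp_disk(read_s, write_s, queue_len, spindles=1):
    """Return a list of findings for one temp disk; an empty list means healthy."""
    findings = []
    for name, samples in (("read", read_s), ("write", write_s)):
        avg_ms = mean(samples) * 1000  # counters report seconds
        max_ms = max(samples) * 1000
        if avg_ms >= 10:
            findings.append(f"{name} average {avg_ms:.1f} ms is not below 10 ms")
        if max_ms > 50:
            findings.append(f"{name} spike {max_ms:.1f} ms is higher than 50 ms")
    if mean(queue_len) >= spindles:
        findings.append("average queue length is not below the spindle count")
    return findings


# Illustrative samples only, not real measurements.
for finding in check_temp_disk([0.004, 0.060], [0.003, 0.004], [0.2, 0.4]):
    print(finding)
```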

Looking at the I/O Patterns of Database Disks

An Exchange database consists of two files:

  • **An .edb file (MAPI content)** This file stores all of the MAPI messages, the tables that the store process uses to locate messages, and the checksums of both the .edb and .stm files.

  • **An .stm file (non-MAPI content)** This file contains messages that are transmitted with their native Internet content.

Because access to both types of files is generally random, both file types can be placed on the same disk volume.

When analyzed per physical database disk, you can use the counters listed in the following table to determine whether there is any performance degradation on the disks.

Performance Counters for Database Disks

Counter Expected values

PhysicalDisk\Average Disk sec/Read

Indicates the average time (in seconds) to read data from the disk.

  • The average value should be below 20 ms.

  • Spikes (maximum values) should not be higher than 50 ms.

PhysicalDisk\Average Disk sec/Write

Indicates the average time (in seconds) to write data to the disk.

  • The average value should be below 20 ms.

  • Spikes (maximum values) should not be higher than 50 ms.

PhysicalDisk\Average Disk Queue Length

Indicates the average number of both read and write requests that were queued for the selected disk during the sample interval.

  • The average should be less than the number of spindles of the disk. If a SAN is being used, ignore this counter and concentrate on the latency counters: PhysicalDisk\Average Disk sec/Read and PhysicalDisk\Average Disk sec/Write.
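Average and maximum values such as those in the table above are usually read from a counter log. The sketch below assumes the log has been exported to CSV with a timestamp in the first column and one counter per remaining column. The server name, column headers, and sample values are hypothetical, so adjust the parsing to match the actual export.

```python
import csv
import io
from statistics import mean

# Illustrative log in the assumed layout; not real measurements.
SAMPLE_LOG = r'''"Timestamp","\\EXCH01\PhysicalDisk(P:)\Average Disk sec/Read","\\EXCH01\PhysicalDisk(P:)\Average Disk sec/Write"
"01/01/2005 10:00:00","0.012","0.055"
"01/01/2005 10:00:15","0.015","0.081"
"01/01/2005 10:00:30"," ","0.050"
'''


def summarize(log_text):
    """Return {counter: (average_ms, max_ms)} for each counter column."""
    rows = list(csv.reader(io.StringIO(log_text)))
    header, data = rows[0], rows[1:]
    summary = {}
    for col in range(1, len(header)):  # column 0 is the timestamp
        values = []
        for row in data:
            try:
                values.append(float(row[col]))
            except (ValueError, IndexError):
                continue  # skip blank or missing samples
        if values:
            summary[header[col]] = (mean(values) * 1000, max(values) * 1000)
    return summary


for counter, (avg_ms, max_ms) in summarize(SAMPLE_LOG).items():
    print(f"{counter}: average {avg_ms:.1f} ms, maximum {max_ms:.1f} ms")
```

The resulting averages and maximums can then be compared with the 20 ms and 50 ms limits listed for database disks.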

Example of Database Disk Monitoring

In the following figure, one of the database disks (P:\) is experiencing high write latencies (as indicated by the PhysicalDisk\Average Disk sec/Write counter), averaging 62 milliseconds (ms), and frequently spiking above 80 ms and sometimes above 100 ms.

Monitoring a database disk using the Performance snap-in


Looking at the I/O Patterns of Database Disks Using Synchronous Replication

Exchange servers should generally have database write latencies under 20 ms, with spikes (maximum values) under 50 ms. However, it is not always possible to keep write latencies in this range when synchronous replication is in use. Database write latency issues often do not become apparent to the end user until the database cache is full and cannot be written to. When using synchronous replication, the Performance Monitor Database Page Fault Stalls/sec counter is a better indicator of whether the client is being affected by write latency than the PhysicalDisk\Average Disk sec/Write counter.

On a production server, the value of the Database Page Fault Stalls/sec counter should always be zero, because a database page fault stall indicates that the database cache is full. A full database cache means that Exchange is unable to place items in cache until pages are committed to disk. Moreover, on most storage subsystems, read latencies are affected by write latencies. These read latencies may not be detectable at the default storage subsystem Performance Monitor sampling rate. Remote procedure call (RPC) latencies also increase as a consequence of database page fault stalls, which can degrade the client experience.

Because disk-related performance problems can negatively affect the user experience, it is recommended that administrators monitor disk performance as part of routine system health monitoring. When analyzing a database logical unit number (LUN) in a synchronously replicated environment, you can use the counters listed in the following table to determine whether there is any performance degradation on the disks.

Performance Monitor Counters for Database Disk Performance Evaluation

Performance Monitor Counter Expected values

PhysicalDisk\Average Disk sec/Read

Indicates the average time (in seconds) to read data from the disk.

  • The average value should be below 20 ms.

  • Spikes (maximum values) should not be higher than 50 ms.

PhysicalDisk\Average Disk sec/Write

Indicates the average time (in seconds) to write data to the disk.

  • This counter is not a good indicator of client latency in a synchronous replication environment.

Database\Database Page Fault Stalls/sec

Indicates the rate of page faults that cannot be serviced because there are no pages available for allocation from the database cache.

  • This counter should be zero on production servers.
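The table above changes which counters matter: write latency is set aside, and any non-zero rate of page fault stalls is treated as a problem. A minimal sketch of that rule follows, assuming the samples have already been collected into lists. The function name and the sample values are illustrative.

```python
from statistics import mean


def assess_replicated_lun(read_latency_s, page_fault_stalls_per_sec):
    """Return findings for a synchronously replicated database LUN.

    Write latency is deliberately not evaluated, because it is not a good
    indicator of client latency when synchronous replication is in use.
    """
    findings = []
    avg_read_ms = mean(read_latency_s) * 1000  # counters report seconds
    max_read_ms = max(read_latency_s) * 1000
    if avg_read_ms >= 20:
        findings.append(f"read average {avg_read_ms:.1f} ms is not below 20 ms")
    if max_read_ms > 50:
        findings.append(f"read spike {max_read_ms:.1f} ms is higher than 50 ms")
    if any(rate > 0 for rate in page_fault_stalls_per_sec):
        findings.append("Database Page Fault Stalls/sec is above zero")
    return findings


# Illustrative samples: acceptable reads, but the cache stalled once.
print(assess_replicated_lun([0.012, 0.016], [0, 3, 0]))
```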

Example of Database Disk Monitoring

In the following illustration, one of the database disks (Q:\) has extremely high write latencies (as indicated by the Average Disk sec/Write counter), averaging 313 ms. However, the Average Disk sec/Read value shows no sign of this latency problem and is in an acceptable range, averaging 14 ms (< 20 ms is recommended). Clearly, the Average Disk sec/Write counter is not a good indicator of client latency in a synchronous replication environment. RPC Averaged Latency averages 24 ms in this example (< 50 ms is recommended). In this scenario, users of Microsoft Outlook were not affected by the high write latencies.

Monitoring a synchronously replicated database disk with Performance Monitor

Performance Monitor graph

In the following illustration, the database is heavily loaded, with RPC Averaged Latency spikes that correlate with spikes in Database Page Fault Stalls/sec.

Example of RPC Averaged Latency being affected by Database Page Fault Stalls/sec

Performance Monitor graph
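When the two counters are logged at the same interval, the visual correlation described above can also be quantified. The following sketch computes a Pearson correlation coefficient between two series. The sample values are invented for illustration, and a high coefficient only suggests a relationship; it does not prove that the stalls caused the latency.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


# Illustrative samples taken at the same timestamps; not real measurements.
page_fault_stalls = [0, 0, 5, 0, 12, 0]
rpc_latency_ms = [20, 22, 70, 25, 140, 21]
print(f"correlation: {pearson(page_fault_stalls, rpc_latency_ms):.2f}")
```

A value close to 1 indicates that RPC latency rises and falls together with the stall rate in the sampled period.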

Looking at the I/O Patterns of Transaction Log Disks

The transaction log files maintain the state and integrity of your .edb and .stm files. This means that the log files, in effect, represent the data. There is a transaction log file set for each storage group. To increase performance, Exchange implements each transaction log file as a database. If a disaster occurs and you have to rebuild your server, use the latest transaction log files to rebuild your databases. If you have the log files and the latest backup, you can recover all of your data. However, if you lose your log files, the data is lost.

There are generally no reads to the log drives, except when restoring backups. This means that write performance is essential to the transaction logs and any analysis should closely observe this aspect. When analyzed per physical log disk, you can use the counters listed in the following table to determine whether there is any performance degradation on the disks.

Performance Counters for Transaction Log Disks

Counter Expected values

PhysicalDisk\Average Disk sec/Read

Indicates the average time (in seconds) to read data from the disk.

  • The average value should be below 5 ms.

  • Spikes (maximum values) should not be higher than 50 ms.

PhysicalDisk\Average Disk sec/Write

Indicates the average time (in seconds) to write data to the disk.

  • The average value should be below 10 ms.

  • Spikes (maximum values) should not be higher than 50 ms.

PhysicalDisk\Average Disk Queue Length

Indicates the average number of both read and write requests that were queued for the selected disk during the sample interval.

  • The average should be less than the number of spindles of the disk.

    If a SAN is being used, ignore this counter and concentrate on the latency counters:

    PhysicalDisk\Average Disk sec/Write and PhysicalDisk\Average Disk sec/Read

Database\Log Record Stalls/sec

Indicates the number of log records that cannot be added to the log buffers per second because the log buffers are full.

  • The average value should be below 10 per second.

  • Spikes (maximum values) should not be higher than 100 per second.

Database\Log Threads Waiting

Indicates the number of threads waiting to complete an update of the database by writing their data to the log.

If this number is too high, the log may be a bottleneck.

  • The average value should be less than 10 threads waiting.
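The two Database counters in the table above are specific to the transaction logs, so they are worth checking separately from the disk latency counters. A minimal sketch, assuming samples are already available as lists, follows. The function name and the sample values are illustrative.

```python
from statistics import mean


def check_log_counters(log_record_stalls_per_sec, log_threads_waiting):
    """Return findings for the log-specific counters; empty means healthy."""
    findings = []
    if mean(log_record_stalls_per_sec) >= 10:
        findings.append("average Log Record Stalls/sec is not below 10")
    if max(log_record_stalls_per_sec) > 100:
        findings.append("Log Record Stalls/sec spiked above 100")
    if mean(log_threads_waiting) >= 10:
        findings.append("average Log Threads Waiting is not below 10")
    return findings


# Illustrative samples only: one large stall spike, few waiting threads.
print(check_log_counters([2, 4, 150, 1], [1, 3, 2, 0]))
```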

Looking at the I/O Patterns of SMTP Queues

The SMTP queue stores SMTP messages until Exchange writes them to a database (private or public), or sends them to another server or connector. SMTP queues generally experience random, small I/Os.

When analyzed per physical SMTP queue disk, you can use the counters listed in the following table to determine if there is any performance degradation on the disks.

Performance Counters for SMTP Queues

Counter Expected values

PhysicalDisk\Average Disk sec/Read

Indicates the average time (in seconds) to read data from the disk.

  • The average value should be below 10 ms.

  • Spikes (maximum values) should not be higher than 50 ms.

PhysicalDisk\Average Disk sec/Write

Indicates the average time (in seconds) to write data to the disk.

  • The average value should be below 10 ms.

  • Spikes (maximum values) should not be higher than 50 ms.

PhysicalDisk\Average Disk Queue Length

Indicates the average number of both read and write requests that were queued for the selected disk during the sample interval.

  • The average should be less than the number of spindles of the disk.

Looking at the I/O Patterns of the Page File Disk

The page file extends physical memory by providing an area where the system puts unused pages or pages it will need later. The page file always sees some utilization, even in machines with a good amount of free memory. This constant utilization occurs because the operating system tries to keep in memory only the pages that it needs and enough free space for operations. For example, a printing tool that is used only at startup might have some of its memory paged to disk and never brought back if it is never used.

In servers where the physical memory is being used heavily, it is important to ensure that all access to the page file is as fast as possible and to avoid thrashing situations. It is common for servers to start seeing errors in memory operations long before the page file is full. So, observing usage patterns of the page file disk is more important than how full the disk is. Use the counters listed in the following table to determine whether there is any performance degradation on the page file disk.

Performance Counters for Page File Disks

Counter Expected values

PhysicalDisk\Average Disk sec/Read

Indicates the average time (in seconds) to read data from the disk.

  • The average value should be below 10 ms at all times.

PhysicalDisk\Average Disk sec/Write

Indicates the average time (in seconds) to write data to the disk.

  • The average value should be below 10 ms at all times.

PhysicalDisk\Average Disk Queue Length

Indicates the average number of both read and write requests that were queued for the selected disk during the sample interval.

  • The average should be less than the number of spindles of the disk.

Paging File\% Usage

Indicates the amount (as a percentage) of the paging file used during the sample interval.

A high value indicates that you may need to increase the size of your Pagefile.sys file or add more RAM.

  • This value should remain below 50%.
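Because each disk function has its own latency limits, it can help to keep them in one lookup structure. The sketch below collects the average and spike limits from the tables in this topic. The role names, class, and function are illustrative, and the database entry uses the limits for databases without synchronous replication. The page file entry has no spike value because its table lists none.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LatencyLimits:
    read_avg_ms: float
    write_avg_ms: float
    spike_ms: Optional[float]  # None where the table lists no spike limit


# Values copied from the counter tables for each disk function.
LIMITS = {
    "temp": LatencyLimits(10, 10, 50),
    "database": LatencyLimits(20, 20, 50),
    "transaction_log": LatencyLimits(5, 10, 50),
    "smtp_queue": LatencyLimits(10, 10, 50),
    "page_file": LatencyLimits(10, 10, None),
}


def averages_within_limits(role, read_avg_ms, write_avg_ms):
    """Return True when both averages are below the limits for the role."""
    limits = LIMITS[role]
    return read_avg_ms < limits.read_avg_ms and write_avg_ms < limits.write_avg_ms


print(averages_within_limits("database", 14, 18))
print(averages_within_limits("transaction_log", 7, 8))
```

In this sketch the same measured averages can pass for one disk function and fail for another, which is why each function is analyzed separately.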

Improving Disk Performance

The following list describes ways to improve disk performance:

  • Enable caching on the array controller

    For direct attach storage solutions, enabling the array controller's caching capabilities yields a significant performance improvement. In particular, the write-back cache is highly effective and should produce a noticeable improvement. It is also important to ensure that the array controller features battery backup, so that power fluctuations or outages do not cause errors or inconsistencies.

  • Increase log buffers

    Increasing the log buffers improves the performance of a log disk that is experiencing a high number of log record stalls. For more information, see the Exchange technical article, "Microsoft Exchange 2000 Internals: Quick Tuning Guide." For Exchange 2003, start with the default value of 512 for the log buffer and increase this value in increments of 512 up to the maximum value of 9000.

  • Increase the database cache

    Increasing the size of the database cache (dynamic buffer allocation) can yield better disk performance. However, increasing the size of the database cache is not recommended, because over time the larger cache can cause the server's memory to become fragmented or cause the server to run out of memory.

  • Align disk partitions with storage track boundaries

    Aligning the disk partitions with storage track boundaries has a positive effect on disk performance. However, how significant the performance benefit is depends on the storage technology and implementation, and therefore on the storage vendor.

  • Enforce message size limits

    Enforcing message size limits can reduce disk utilization and therefore result in better performance. Before enforcing message size limits, consider how this would affect the Service Level Agreement (SLA) of your organization.

  • Enforce mailbox size limits

    As with enforcing message size limits, enforcing mailbox quotas can result in better disk performance. Again, before enforcing mailbox size limits, take into consideration the SLA of your organization.
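For the log buffer guidance in the list above, the sequence of candidate values can be generated rather than worked out by hand. The sketch below starts at 512 and adds 512 at each step. Because 9000 is not a multiple of 512, the final step to exactly 9000 is an assumption made here, not something the guidance spells out. The function name is illustrative.

```python
def log_buffer_steps(start=512, step=512, maximum=9000):
    """Return candidate log buffer values from start up to maximum."""
    values = list(range(start, maximum + 1, step))
    # Assumption: finish at the stated maximum even though it is not
    # reachable in whole increments of the step size.
    if not values or values[-1] != maximum:
        values.append(maximum)
    return values


steps = log_buffer_steps()
print(steps[:3], "...", steps[-2:])
```

Move to the next value only while log record stalls remain high, and check the counters again after each change.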