
Looking at the I/O Patterns of Database Disks Using Synchronous Replication
Exchange servers should generally have database write latencies under 20 ms, with spikes (maximum values) under 50 ms. However, it is not always possible to keep write latencies in this range when synchronous replication is in use. Database write latency issues often do not become apparent to the end user until the database cache is full and cannot be written to. When using synchronous replication, the Performance Monitor Database Page Fault Stalls/sec counter is a better indicator of whether the client is being affected by write latency than the PhysicalDisk\Average Disk sec/Write counter.
On a production server, the value of the Database Page Fault Stalls/sec counter should always be zero, because a database page fault stall indicates that the database cache is full. A full database cache means that Exchange is unable to place items in cache until pages are committed to disk. Moreover, on most storage subsystems, read latencies are affected by write latencies. These read latencies may not be detectable at the default storage subsystem Performance Monitor sampling rate. Remote procedure call (RPC) latencies also increase as a consequence of database page fault stalls, which can degrade the client experience.
Because disk-related performance problems can negatively affect the user experience, it is recommended that administrators monitor disk performance as part of routine system health monitoring. When analyzing a database logical unit number (LUN) in a synchronously replicated environment, you can use the counters listed in the following table to determine whether there is any performance degradation on the disks.
Performance Monitor Counters for Database Disk Performance Evaluation
| Performance Monitor counter | Expected values |
|---|---|
| PhysicalDisk\Average Disk sec/Read. Indicates the average time (in seconds) to read data from the disk. | The average value should be below 20 ms. Spikes (maximum values) should not be higher than 50 ms. |
| PhysicalDisk\Average Disk sec/Write. Indicates the average time (in seconds) to write data to the disk. | This counter is not a good indicator for client latency in a synchronous replication environment. |
| Database\Database Page Fault Stalls/sec. Indicates the rate of page faults that cannot be serviced because there are no pages available for allocation from the database cache. | This counter should be zero on production servers. |
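As a rough illustration, the thresholds in the table above can be applied to counter samples that have already been exported from Performance Monitor. The following Python sketch is not part of Exchange or Performance Monitor. The function name `evaluate_database_disk` and the input format (plain lists of samples, with latencies in seconds) are assumptions made for this example.

```python
from statistics import mean

# Thresholds taken from the table above, expressed in seconds.
READ_AVG_LIMIT_S = 0.020    # average read latency should be below 20 ms
READ_SPIKE_LIMIT_S = 0.050  # read latency spikes should not exceed 50 ms


def evaluate_database_disk(read_latencies_s, page_fault_stalls_per_sec):
    """Return a list of findings for one database disk.

    Hypothetical helper: both arguments are lists of samples that were
    collected elsewhere (for example, exported from a counter log).
    Write latency is deliberately not checked, because it is not a good
    indicator of client latency under synchronous replication.
    """
    findings = []
    if read_latencies_s:
        if mean(read_latencies_s) >= READ_AVG_LIMIT_S:
            findings.append("average read latency is not below 20 ms")
        if max(read_latencies_s) > READ_SPIKE_LIMIT_S:
            findings.append("read latency spike above 50 ms")
    # Any nonzero sample means the database cache was full at some point.
    if any(value > 0 for value in page_fault_stalls_per_sec):
        findings.append("database page fault stalls are nonzero")
    return findings


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    print(evaluate_database_disk([0.014, 0.012], [0, 0]))
```

An empty result means that none of the table's thresholds were exceeded in the samples supplied. It does not replace reviewing the counters over a representative period.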
Example of Database Disk Monitoring
In the following illustration, one of the database disks (Q:\) has extremely high write latencies (as indicated by the Average Disk sec/Write counter), averaging 313 ms. However, the Average Disk sec/Read value shows no sign of this latency problem and is in an acceptable range, averaging 14 ms (< 20 ms is recommended). RPC Averaged Latency averages 24 ms in this example (< 50 ms is recommended), and users of Microsoft Outlook were not affected by the high write latencies. Clearly, the Average Disk sec/Write counter is not a good indicator for client latency in a synchronous replication environment.
Monitoring a synchronously replicated database disk with Performance Monitor
In the following illustration, the database is heavily loaded, with RPC Averaged Latency spikes that correlate with spikes in Database Page Fault Stalls/sec.
Example of RPC Averaged Latency being affected by Database Page Fault Stalls/sec
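One simple way to check for this kind of correlation in exported counter data is to compare RPC latency during intervals with stalls against intervals without them. The Python sketch below assumes that the two counters were sampled at the same times and exported as two lists. The function name `rpc_latency_by_stall_state` and the sample values are hypothetical.

```python
from statistics import mean


def rpc_latency_by_stall_state(rpc_latency_ms, stalls_per_sec):
    """Return (mean RPC latency while stalled, mean RPC latency otherwise).

    Hypothetical helper: the two lists are assumed to be aligned sample by
    sample. If they differ in length, the extra samples are ignored.
    A value of None means there were no samples in that group.
    """
    pairs = list(zip(rpc_latency_ms, stalls_per_sec))
    stalled = [latency for latency, stalls in pairs if stalls > 0]
    clear = [latency for latency, stalls in pairs if stalls == 0]
    return (
        mean(stalled) if stalled else None,
        mean(clear) if clear else None,
    )


if __name__ == "__main__":
    # Made-up samples, for illustration only.
    rpc = [20, 25, 90, 110, 22]
    stalls = [0, 0, 4, 6, 0]
    print(rpc_latency_by_stall_state(rpc, stalls))
```

A markedly higher mean in the stalled group is consistent with the pattern described above, although it does not by itself establish the cause.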