Monitor and troubleshoot storage performance
Updated: November 5, 2009
Applies To: Office SharePoint Server 2007
Topic Last Modified: 2009-10-28
This topic lists the performance counters that we recommend that you use to monitor the performance of the SQL Server databases that are running in your Microsoft Office SharePoint Server 2007 environment. Also listed are approximate healthy values for each counter.
For details about how to monitor performance and use performance counters, see Monitoring Performance (http://go.microsoft.com/fwlink/?LinkID=105584).
Monitor the following counters to ensure the health of your servers:
Processor: % Processor Time: _Total This counter shows the percentage of time that the processor is executing application or operating system processes other than Idle. On the computer that is running SQL Server, this counter should be kept between 50 percent and 75 percent. In case of constant overloading, investigate whether there is abnormal process activity or if the server needs additional CPUs.
System: Processor Queue Length This counter shows the number of threads in the processor queue. Monitor this counter to ensure that it remains less than two times the number of core CPUs.
Memory: Available Mbytes This counter shows the amount of physical memory, in megabytes, available to processes running on the computer. Monitor this counter to ensure that you maintain a level of at least 20 percent of the total available physical RAM.
Memory: Pages/sec This counter shows the rate at which pages are read from or written to disk to resolve hard page faults. Monitor this counter to ensure that it remains under 100.
For more information and memory troubleshooting methods, see SQL Server 2005 Monitoring Memory Usage (http://go.microsoft.com/fwlink/?LinkID=105585).
Monitor the following counters to ensure the health of disks. Note that the following values represent values measured over time — not values that occur during a sudden spike and not values that are based on a single measurement.
Physical Disk: % Disk Time: DataDrive This counter shows the percentage of elapsed time that the selected disk drive is busy servicing read or write requests. Monitor this counter to ensure that it remains less than two times the number of disks.
Logical Disk: Disk Transfers/sec This counter shows the rate at which read and write operations are performed on the disk. Use this counter to monitor growth trends and forecast appropriately.
Logical Disk: Disk Read Bytes/sec and Logical Disk: Disk Write Bytes/sec These counters show the rate at which bytes are transferred from the disk during read or write operations.
Logical Disk: Avg. Disk Bytes/Read This counter shows the average number of bytes transferred from the disk during read operations. This value can reflect disk latency — larger read operations can result in slightly increased latency.
Logical Disk: Avg. Disk Bytes/Write This counter shows the average number of bytes transferred to the disk during write operations. This value can reflect disk latency — larger write operations can result in slightly increased latency.
Logical Disk: Current Disk Queue Length This counter shows the number of requests outstanding on the disk at the time that the performance data is collected. For this counter, lower values are better. Values above 2 per disk may indicate a bottleneck and should be investigated. This means that a value of up to 8 may be acceptable for a LUN comprised of 4 disks. Bottlenecks can create a backlog that can spread beyond the current server that is accessing the disk, and result in long wait times for users. Possible solutions to a bottleneck are to add more disks to the RAID array, replace existing disks with faster disks, or move some data to other disks.
Logical Disk: Avg. Disk Queue Length This counter shows the average number of both read and write requests that were queued for the selected disk during the sample interval. The rule is that there should be two or fewer outstanding read and write requests per spindle, but this can be difficult to measure because of storage virtualization and differences in RAID levels between configurations. Look for larger than average disk queue lengths in combination with larger than average disk latencies. This combination can indicate that the storage array cache is being overused or that spindle sharing with other applications is affecting performance.
Logical Disk: Avg. Disk sec/Read and Logical Disk: Avg. Disk sec/Write These counters show the average time, in seconds, of a read or write operation to the disk. Monitor these counters to ensure that they remain below 85 percent of the disk capacity. Disk access time increases exponentially if read or write operations are more than 85 percent of disk capacity. To determine the specific capacity for your hardware, refer to the vendor documentation, or use the SQLIO Disk Subsystem Benchmark Tool to calculate it. For more information, see SQLIO Disk Subsystem Benchmark Tool (http://go.microsoft.com/fwlink/?LinkID=105586).
Logical Disk: Avg. Disk sec/Read This counter shows the average time, in seconds, of a read operation from the disk. On a well-tuned system, ideal values are from 1-5 milliseconds (ms) for logs (ideally 1 ms on a cached array), and 4-20 ms for data (ideally less than 10 ms). Higher latencies can occur during peak times, but if high values occur regularly, you should investigate the cause.
Logical Disk: Avg. Disk sec/Write This counter shows the average time, in seconds, of a write operation to the disk. On a well-tuned system, ideal values are from 1-5 ms for logs (ideally 1 ms on a cached array), and 4-20 ms for data (ideally less than 10 ms). Higher latencies can occur during peak times, but if high values occur regularly, you should investigate the cause.
When you are using RAID configurations with the Avg. Disk sec/Read or Avg. Disk sec/Write, use the formulas listed in the following table to determine the rate of input and output on the disk.
RAID level Formula
I/Os per disk = (reads + writes) / number of disks
I/Os per disk = [reads + (2 * writes)] / 2
I/Os per disk = [reads + (4 * writes)] / number of disks
I/Os per disk = [reads + (2 * writes)] / number of disks
For example, if you have a RAID 1 system that has two physical disks, and your counters are at the values that are shown in the following table:
Avg. Disk sec/Read
Logical Disk: Avg. Disk sec/Write
Avg. Disk Queue Length
The I/O value per disk can be calculated as follows: (80 + (2 * 70))/2 = 110
The disk queue length can be calculated as follows: 5/2 = 2.5
In this situation, you have a borderline I/O bottleneck.
You can also monitor disk latency and analyze trends by using the sys.dm_io_virtual_file_stats dynamic management view in SQL Server 2008. For more information, see sys.dm_io_virtual_file_stats (Transact-SQL) (http://go.microsoft.com/fwlink/?LinkID=105587).
If a disk or RAID experiences a bottleneck, consider taking the following actions:
Relocate some files to a separate disk or RAID array.
Add disks to the array.
If you cannot separate the content (for example, if there is a single content database file), replace the disks with faster disks.
Where possible, split the relevant content database into several databases; that is, relocate, or split and relocate site collections by using tools such as the Stsadm.exe command-line tool or Microsoft Office SharePoint Designer. If the problem persists, repeat the troubleshooting steps from step 1.