SQL Server: SQL Server Delays Demystified

Article
08/31/2016

You can use the tracking data for SQL Server transaction wait times to find overused resources and eliminate bottlenecks.

Excerpted from “SQL Server DMV Starter Pack,” published by Red Gate Books (2010).

Glenn Berry, Louis Davidson and Tim Ford

Every time a SQL Server session has to wait for some reason before the requested work can continue, SQL Server records the length of time waited. It also records the resource for which the transaction is waiting.

The sys.dm_os_wait_stats dynamic management view (DMV) exposes these wait statistics, aggregated across all sessions, to provide a summary review of where the major waits are on any given instance. This same DMV also exposes performance (PerfMon) counters, which provide specific resource usage measurements (disk transfer rates, amount of CPU time consumed and so on).

By correlating wait statistics with resource measurements, you can quickly locate the most “contested” resources on your system, and highlight potential bottlenecks.

SQL Server 2005 Waits and Queues

The use of “waits and queues” as the basis for a performance-tuning methodology is explained in an excellent white paper by Tom Davidson. Essentially, each request to SQL Server will result in the initiation of a number of “worker tasks.”

A SQL Server Scheduler assigns each task to a worker thread. Normally there’s one SQL OS scheduler per CPU, and only one session per scheduler running at any time. It’s the scheduler’s job to spread the workload evenly between available worker threads.

If a session’s worker thread is running on the processor, the session status will be “Running,” as exposed by the Status column of sys.dm_exec_requests DMV. If a thread is ready to go, but the scheduler to which it’s assigned currently has another session running, then it will be placed in the “Runnable” queue. This means it’s in the queue to get on the processor. This is referred to as a signal wait.

The signal wait time is exposed by the signal_wait_time_ms column, and is solely CPU wait time. If a session is waiting for another resource to become available in order to proceed, such as a locked page, or if a running session needs to perform I/O, then it’s moved to the wait list. This is a resource wait and the waiting session’s status will be recorded as “Suspended.”

The reason for the wait is recorded, and exposed in the wait_type column of the sys.dm_os_wait_stats DMV. The total time spent waiting is exposed by the wait_time_ms column, so you can calculate the resource wait time as follows:

Resource waits = Total waits – Signal waits = (wait_time_ms) - (signal_wait_time_ms)

Signal waits are unavoidable in online transaction processing (OLTP) systems, which are comprised of a large number of short transactions. The key metric, with regard to potential CPU pressure, is the signal wait as a percentage of the total waits. A high-percentage signal is a sign of excessive CPU pressure. The literature tends to quote “high” as more than about 25 percent, but it depends on your system.

On our systems, we treat values greater than 10 percent to 15 percent as a worrying sign. Overall, the use of wait statistics represents a very effective means of diagnosing response times in your system. In simple terms, you either work or you wait. Response time equals service time plus the wait time.

If response times are slow and you find no significant waits, or mainly signal waits, then you need to focus on CPU. If you find response time is mainly comprised of time spent waiting for other resources (such as the network, I/O and so on) then you know exactly where to focus your tuning efforts.

Take Out the Guesswork

Mario Broodbakker has written an excellent introductory series of articles on using wait events to diagnose performance problems. Our first script in the OS category uses the sys.dm_os_wait_stats DMV, which is defined as: Returns information about all the waits encountered by threads that executed. You can use this aggregated view to diagnose performance issues with SQL Server and also with specific queries and batches.

This query calculates signal waits and resource waits as a percentage of the overall wait time, in order to diagnose potential CPU pressure:

This query is useful to help confirm CPU pressure. Because signal waits are time waiting for a CPU to service a thread, if you record total signal waits that are roughly greater than 10 percent to 15 percent, then this is a pretty good indicator of CPU pressure.

These wait stats are cumulative since SQL Server was last restarted, so you need to know your baseline value for signal waits, and watch the trend over time. You can manually clear out the wait statistics, without restarting the server, by issuing a DBCC SQLPERF command, as follows:

If your SQL Server instance has been running for quite a while, and you make a significant change like adding a new index, then you should consider clearing the old wait stats. Otherwise, the old cumulative wait stats will mask whatever impact your change has on the wait times.

Our second sample script using the sys.dm_os_wait_stats DMV will help determine the resources on which SQL Server is spending the most time waiting:

This script will help you locate the biggest bottleneck at the instance level. This helps you focus your tuning efforts on a particular type of problem. For example, if the top cumulative wait types are disk I/O-related, then you’d want to investigate this issue further using disk-related DMV queries and PerfMon counters.

Expose Performance Counters

The DMV that exposes the PerfMon counters is sys.dm_os_performance_counters, which is described as: “Returns a row per performance counter maintained by the server.” This is a useful DMV, but it can be frustrating to use. Depending on the value for cntr_type for a given row, you’ll have to go through some interesting gyrations to get meaningful information from this DMV. It’s a replacement for the old sys.sysperfinfo from SQL Server 2000.

This script helps you investigate unusual conditions filling up your transaction log. It returns the recovery model, log reuse wait description, transaction log size, log space used, percentage of log used, compatibility level and page verify option for each database on the current SQL Server instance:

This query helps evaluate an unfamiliar database server. It’s also useful from a monitoring perspective. For example, if your log reuse wait description is something unusual and your transaction log is 85 percent full, that should trigger an alarm.

Glenn Berry

Glenn Berry works as a database architect at NewsGator Technologies in Denver, Colo. He’s a SQL Server MVP, and has a whole collection of Microsoft certifications, including MCITP, MCDBA, MCSE, MCSD, MCAD and MCTS, which proves that he likes to take tests.

Louis Davidson

Louis Davidson has been in the IT industry for 16 years as a corporate database developer and architect. He has been a SQL Server Microsoft MVP for six years and has written four books on database design. Currently he’s the data architect and sometimes DBA for the Christian Broadcasting Network, supporting offices in Virginia Beach, Va., and Nashville, Tenn.

Timothy Ford

Timothy Ford is a SQL Server MVP and has been working with SQL Server for more than 10 years. He’s the primary DBA and subject-matter expert for the SQL Server platform for Spectrum Health. He’s been writing about technology since 2007 for a variety of Web sites and maintains his own blog at thesqlagentman.com, covering SQL as well as telecommuting and professional development topics.

Learn more about “SQL Server DMV Starter Pack” at red-gate.com/our-company/about/book-store.

SQL Server: SQL Server Delays Demystified

Glenn Berry, Louis Davidson and Tim Ford

SQL Server 2005 Waits and Queues

Take Out the Guesswork

Expose Performance Counters

Related Content

Additional resources