Optimizing SQL Server CPU Performance
At a Glance:
- Troubleshooting database performance issues
- Reviewing hardware causes
- Using PerfMon to track database bottlenecks
- Evaluating query performance
Troubleshooting performance problems on a database system can be an overwhelming task. Knowing where to look for trouble is important, but more crucial is knowing why your system reacts the way that it does to a particular request. A number of factors can affect CPU utilization
on a database server: compilation and recompilation of SQL statements, missing indexes, multithreaded operations, disk bottlenecks, memory bottlenecks, routine maintenance, and extract, transform, and load (ETL) activity, among others. CPU utilization is not a bad thing in itself—performing work is what the CPU is there for. The key to healthy CPU utilization is making sure that the CPU is spending its time processing what you want it to process and not wasting cycles on poorly optimized code or sluggish hardware.
Two Paths Leading to the Same Place
When viewed from a high level, there are two paths to identifying CPU performance problems. The first is reviewing the system's hardware performance, an exercise that helps determine where to look when you head down the second path, reviewing the server's query efficiency. This second path is usually more effective in identifying SQL Server™ performance issues. Unless you know exactly where your query performance issues lie, however, you should always start with a system performance evaluation. In the end, you'll usually end up taking both paths. Let's lay some groundwork so that we can review both of these paths.
Laying the Groundwork
A high-end dual-core processor will easily outperform the RAM in a machine, which will in turn be faster than an attached storage device. A good CPU can handle roughly six times the throughput of current top-end DDR2 memory and about two times that of top-end DDR3 memory. Typical memory throughput is more than 10 times that of the fastest Fibre Channel drives. A hard disk, in turn, can only perform a finite number of IOPS (input/output operations per second), a value largely limited by the number of seeks per second the drive can perform. To be fair, enterprise database systems rarely use a single storage drive to handle all of their storage needs. Most setups today use Storage Area Networks (SANs) or large RAID groups that can nullify or minimize the disk I/O bottleneck. The most important thing to remember is that no matter what your setup looks like, disk and memory bottlenecks can affect the performance of your processors.
Because of the I/O speed differences, retrieving data from disk is much more costly than retrieving data from memory. A data page in SQL Server is 8KB. An extent in SQL Server is made up of eight 8KB pages, making it equivalent to 64KB. This is important to understand because when SQL Server requests a particular data page from disk, it is not just the data page that is retrieved but the entire extent in which the data page resides. There are reasons that actually make this more cost-effective for SQL Server, but I won't go into details here. Pulling a data page that is already cached from the buffer pool, at peak performance, should take under half a millisecond; retrieving a single extent from disk should take between 2 and 4 milliseconds in an optimal environment. I typically expect a well-performing, healthy disk subsystem read to take between 4 and 10ms. Retrieving a data page from memory is generally between 4 and 20 times faster than pulling a data page from disk.
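If you want to see how your own disk subsystem compares to these latency ranges, SQL Server 2005's sys.dm_io_virtual_file_stats function exposes cumulative I/O stall times per database file. This is a minimal sketch; note the numbers are averages since the server last started, not a live sample:

```sql
-- Average read latency per database file since the server last started.
-- Sustained values well above ~10ms suggest the disk subsystem
-- may be slowing SQL Server down.
SELECT  DB_NAME(vfs.database_id) AS database_name,
        vfs.file_id,
        vfs.io_stall_read_ms * 1.0
            / NULLIF(vfs.num_of_reads, 0) AS avg_read_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs  -- NULL, NULL = all files
ORDER BY avg_read_ms DESC;
```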
When SQL Server requests a data page, it checks the in-memory buffer cache before looking for the data page on the disk subsystem. If the data page is found in the buffer
pool, the processor will retrieve the data and then perform the work required. This is called a soft page fault. Soft page faults are ideal for SQL Server because the data that is retrieved as part of a request must be in the buffer cache before it can be used. A data page that is not found in the buffer cache must be retrieved from the server's disk subsystem. When the OS has to retrieve the data page
from disk, it's known as a hard page fault.
When correlating memory performance, disk performance, and CPU performance, a common denominator helps us put everything in perspective: throughput. In not-so-scientific terms, throughput is the measurement of how much data you can stuff down a finite pipe.
Path 1: System Performance
There are really only a few methods for determining if a server has a CPU bottleneck, and there aren't many potential causes of high CPU utilization. Some of these issues can be tracked using PerfMon or a similar system-monitoring tool while others are tracked using SQL Profiler or similar tools. Another method is to use SQL commands through Query Analyzer or SQL Server Management Studio (SSMS).
The philosophy I use when evaluating a system's performance is "start broad, then focus deep." Obviously, you can't focus on problem areas until you've identified them. After you evaluate overall CPU utilization with a tool like PerfMon, you can use it to look at a couple of very simple and easy-to-understand performance counters.
One of the most familiar performance counters is % Processor Time; when you're in PerfMon, it's highlighted as soon as you open the Add Counter window. % Processor Time is the amount of time the processors stay busy executing work. Utilization on processors is generally considered high when this value is 80 percent or higher for most of your peak operating time. It's typical, and should be expected, that you will see spikes up to 100 percent at times even when the server is not operating with 80 percent utilization.
Another counter you should review is Processor Queue Length, which can be found under the System performance object in PerfMon. Processor Queue Length shows how many threads are waiting to perform work on the CPU. SQL Server manages its work through schedulers in the database engine, where it queues and processes its own requests. Because SQL Server manages its own work, it will only utilize a single CPU thread for each logical processor, so on a system dedicated to SQL Server there should be few threads waiting in the processor queue. Typically you shouldn't see a queue length higher than five times the number of physical processors on a dedicated SQL Server, and I consider more than two times problematic. On servers where the DBMS shares the system with other applications, review this counter along with % Processor Time and Context Switches/sec (I'll discuss context switches shortly) to determine whether the other applications or the DBMS needs to be moved to a different server.
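You can also observe CPU queuing from inside SQL Server 2005 itself: the sys.dm_os_schedulers DMV reports, per scheduler, how many tasks are runnable, that is, ready to execute but waiting for processor time. A sketch:

```sql
-- Consistently nonzero runnable_tasks_count values indicate tasks
-- queuing for CPU inside SQL Server's own schedulers.
SELECT  scheduler_id,
        current_tasks_count,
        runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255;  -- visible user schedulers only
```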
When I see processor queuing along with high CPU utilization, I look at the SQL Compilations/sec and SQL Re-Compilations/sec counters under the SQL Server: SQL Statistics performance object (see Figure 1). Compiling and recompiling query plans adds to a system's CPU utilization. You should see values close to zero for recompilations, but watch trends within your systems to determine how your server typically behaves and how many compilations are normal. Recompiles can't always be avoided, but queries and stored procedures can be optimized to minimize recompiles and to reuse query plans. Compare these values to the actual SQL statements coming into the system via the Batch Requests/sec counter, also found under the SQL Server: SQL Statistics performance object. If compilations and recompilations per second make up a high percentage of the batch requests coming into the system, this is an area that should be reviewed. In some situations SQL developers may not understand how or why their code can contribute to these types of system resource problems. Later in this article I'll provide some references to help you minimize this type of activity.
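On SQL Server 2005 you can read these same counters from T-SQL through the sys.dm_os_performance_counters DMV. Note that the "/sec" counters expose cumulative values since server start, so sample twice and take the difference to compute a true rate:

```sql
-- Cumulative values since the server started; diff two samples taken
-- a known interval apart to get real per-second rates.
SELECT  counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%SQL Statistics%'
  AND counter_name IN ('Batch Requests/sec',
                       'SQL Compilations/sec',
                       'SQL Re-Compilations/sec');
```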
Figure 1 Selecting the counters to monitor
While you're in PerfMon, check out the performance counter called Context Switches/sec (see Figure 2). This counter tells you how many times per second the OS schedulers (not SQL Server's schedulers) switch a thread off a processor so that another waiting thread can run. Context switches are often much more frequent on database systems that are shared with other applications, such as IIS or vendor application server components. The threshold I use for Context Switches/sec is about 5,000 times the number of processors in the server. This value can also be high on systems that have hyper-threading turned on along with moderate to high CPU utilization. When CPU utilization and context switches both exceed their thresholds regularly, this indicates a CPU bottleneck. If this is a regular occurrence and your system is outdated, you should start planning for the purchase of more or faster CPUs.
Figure 2 Performance counters to watch

Counter                      Threshold             Notes and potential causes
% Processor Time             > 80%                 Memory pressure, low query plan reuse, non-optimized queries.
Context Switches/sec         > 5000 x processors   Other applications on the server, more than one instance of SQL Server running on the same server, hyper-threading turned on.
Processor Queue Length       > 5 x processors      Other applications on the server, high compilations or recompilations, more than one instance of SQL Server running on the same server.
Compilations/sec             Watch trend           Compare to Batch Requests/sec.
Re-Compilations/sec          Watch trend           Compare to Batch Requests/sec.
Batch Requests/sec           Watch trend           Compare with the Compilations and Re-Compilations per second.
Page Life Expectancy         < 300 seconds         Potential for memory pressure.
Checkpoint Pages/sec         Watch trend           Potential for large data cache flushes or memory pressure.
Lazy Writes/sec              Watch trend           Evaluate checkpoints against PLE and Lazy Writes/sec.
Cache Hit Ratio: SQL Plans   Watch trend           Low values indicate low plan reuse.
Buffer Cache Hit Ratio       Watch trend           Low values indicate potential memory pressure.
The SQL Server Lazy Writer (as it's called in SQL Server 2000) or the Resource Monitor (as it's called in SQL Server 2005) is another area to monitor when CPU utilization is high. Flushing the buffer and procedure caches can add to CPU time via the resource thread called the Resource Monitor. The Resource Monitor is a SQL Server process that determines which pages to keep and which pages need to be flushed from the buffer pool to disk. Each page in the buffer and procedure caches is originally assigned a cost representing the resources that are consumed when that page is placed into the cache. The Resource Monitor decrements this cost value each time it scans the page. When a request requires cache space, pages are flushed from memory based on the cost associated with each page; pages with the lowest values are the first to be flushed. The Resource Monitor's activity can be tracked through the Lazy Writes/sec performance counter under the SQL Server: Buffer Manager object within PerfMon. You should track how this value trends to determine what threshold is typical on your system. This counter is usually reviewed along with the Page Life Expectancy and Checkpoint Pages/sec counters to determine whether there is memory pressure.
The Page Life Expectancy (PLE) counter helps determine memory pressure. The PLE counter shows how long a data page stays in the buffer cache; 300 seconds is the industry-accepted threshold for this counter. Anything less than a 300-second average over an extended period of time tells you that data pages are being flushed from memory too frequently. When this happens, the Resource Monitor works harder, which in turn forces more activity onto the processors. The PLE counter should be evaluated along with the Checkpoint Pages/sec counter. When a checkpoint occurs in the system, the dirty data pages in the buffer cache are flushed to disk, causing the PLE value to drop. The Resource Monitor process is the mechanism that actually flushes these pages to disk, so during these checkpoints you should also expect to see the Lazy Writes/sec value increase. If your PLE value rises immediately after a checkpoint completes, you can ignore this temporary symptom. On the other hand, if you find that you are regularly below the PLE threshold, there is a very good chance that additional memory will alleviate your problems and at the same time release some resources back to the CPU. All of these counters are found in the SQL Server: Buffer Manager performance object.
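PLE can also be read directly with T-SQL on SQL Server 2005, which is handy for scripted checks; a minimal sketch:

```sql
-- Current Page Life Expectancy, in seconds. Sustained values under 300
-- suggest memory pressure.
SELECT cntr_value AS page_life_expectancy_sec
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name = 'Page life expectancy';
```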
Path 2: Query Performance
Query plans are evaluated, optimized, compiled, and placed in the procedure cache when a new query is submitted to SQL Server. Each time a query is submitted to the server, the procedure cache is reviewed to attempt to match a query plan with a request. If one is not found then SQL Server will create a new plan for it, which is a potentially costly operation.
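One quick way to gauge plan reuse on SQL Server 2005 is to look at use counts in the procedure cache; a large population of single-use ad hoc plans usually points to low reuse. A sketch:

```sql
-- Plans with usecounts = 1 were compiled, used once, and never reused.
SELECT TOP 20
        cp.usecounts,
        cp.objtype,
        st.text AS sql_text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE cp.cacheobjtype = 'Compiled Plan'
ORDER BY cp.usecounts ASC;
```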
Some considerations for T-SQL CPU optimization are:
- Query plan reuse
- Reducing compiles and recompiles
- Sort operations
- Improper joins
- Missing indexes
- Table/index scans
- Function usage in SELECT and WHERE clauses
- Multithreaded operations
So let's pull this back into perspective a bit. SQL Server typically pulls data from both memory and from disk, and it's not often that you are working with just a single data page. More often, you have multiple parts of an application working on a record, running multiple smaller queries or joining tables to provide a full view of relevant data. In OLAP environments, your applications may be pulling millions of rows from one or two tables so that you can consolidate, roll up, and summarize data for a regional sales report. In situations like these, returning data can be measured in milliseconds if the data is in memory, but those milliseconds can turn into minutes when retrieving the same data from disk instead of RAM.
The first scenario describes a high volume of transactions, where plan reuse depends on the application. Low plan reuse causes a large number of compilations of SQL statements, which in turn causes a great deal of CPU processing. In the second scenario, the heavy resource utilization can keep a system's CPU overly active, as existing data must constantly be flushed from the buffer cache to make room for the large volume of incoming data pages.
Consider a highly transactional system, where a SQL statement like the one shown below is executed 2000 times over a 15-minute period in order to retrieve shipping carton information. Without query plan reuse you could hypothetically have an individual execution time of around 450ms per statement.
If the same query plan is used after the initial execution, every subsequent query could probably run in about 2ms, bringing the total execution time down to about 5 seconds.
SELECT *            -- the original column list was lost; the table name below is hypothetical
FROM dbo.Cartons
WHERE Carton_ID = 982350144;
Query plan reuse is critical for optimal performance on highly transactional systems and it is most often achieved by parameterizing your queries or stored procedures. Here are a few excellent resources for information on query plan reuse:
- Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 (microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx)
- Optimizing SQL Server Stored Procedures to Avoid Recompiles (sql-server-performance.com/rd_optimizing_sp_recompiles.asp)
- Query Recompilation in SQL Server 2000 (msdn2.microsoft.com/aa902682.aspx)
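To make the parameterization point concrete, here is a hedged sketch of the carton lookup rewritten with sp_executesql; the table name and column list are hypothetical stand-ins, since only the WHERE clause of the original statement survives:

```sql
-- Ad hoc version: a different literal each execution can compile a new plan:
--   SELECT * FROM dbo.Cartons WHERE Carton_ID = 982350144;
-- Parameterized version: one plan in cache, reused for every Carton_ID.
DECLARE @CartonId int;
SET @CartonId = 982350144;

EXEC sp_executesql
    N'SELECT * FROM dbo.Cartons WHERE Carton_ID = @CartonId',  -- dbo.Cartons is hypothetical
    N'@CartonId int',
    @CartonId = @CartonId;
```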
One helpful place that has a wealth of information is the SQL Server 2005 dynamic management views (DMVs). When CPU utilization is high, there are a couple of DMVs that I use to help me determine if the CPU is being utilized appropriately or not.
One of the DMVs I review is sys.dm_os_wait_stats, which supplies DBAs with a means to see, for each resource type or function that SQL Server uses, how much time the system has spent waiting on that resource. The counters in this DMV are cumulative. This means that in order to get a clean look at what resources could be affecting different areas of the system, you will first have to issue a DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR) command to reset all of the counters after you review the data for any outstanding issues. The sys.dm_os_wait_stats DMV is the equivalent of the DBCC SQLPERF(WAITSTATS) command in SQL Server 2000. You can find out more about the different wait types in SQL Server Books Online at msdn2.microsoft.com/
It's important to know that waits are typical in a system even when everything is running optimally; you just have to determine whether the waits are being affected by a CPU bottleneck. Signal wait time is the portion of a wait that a request spends queued for a processor after the resource it was waiting on has become available, so subtracting the signal wait time from the total wait time gives you the pure resource wait time. Signal waits should be as low as possible in relation to the overall wait time; as a rule of thumb, they should not be greater than approximately 20 percent of the total wait time.
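A sketch of how you might rank waits and separate signal (CPU) waits from resource waits using sys.dm_os_wait_stats:

```sql
-- Top waits since the counters were last cleared. A high ratio of
-- signal_wait_time_ms to wait_time_ms points toward CPU pressure.
SELECT TOP 10
        wait_type,
        wait_time_ms,
        signal_wait_time_ms,
        wait_time_ms - signal_wait_time_ms AS resource_wait_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
```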
The sys.dm_exec_sessions DMV shows all of the open sessions on the SQL Server. This DMV provides a high-level view of the performance of each session and all of the work that each session has performed from its inception. This includes the total amount of time the session has spent waiting, total CPU usage, memory usage, and a count of reads and writes. The DMV will also provide you with the login, login time, host machine, and the last time the session made a request of SQL Server.
Using the sys.dm_exec_sessions DMV you can quickly single out the busiest sessions, so if you are seeing high CPU utilization this is one of the first places to look. Review the sessions with a high CPU count first, determine which application and user have been performing the work, and then start to dig deeper. Pairing sys.dm_exec_sessions with the sys.dm_exec_requests DMV can provide much of the information that is available through the sp_who and sp_who2 stored procedures. If you join this data with the sys.dm_exec_sql_text dynamic management function (DMF) through the sql_handle column, you can also get the session's currently running query. The snippet in Figure 3 shows how to pull this data together to help determine what is currently happening on a server.
Figure 3 Determining server activity
SELECT es.session_id, es.login_name, es.host_name, es.cpu_time,
       es.memory_usage, es.reads, es.writes, es.logical_reads,
       es.last_request_start_time, st.text AS sql_text  -- SELECT list reconstructed; the original was lost
FROM sys.dm_exec_sessions es
LEFT JOIN sys.dm_exec_connections ec
ON es.session_id = ec.session_id
LEFT JOIN sys.dm_exec_requests er
ON es.session_id = er.session_id
OUTER APPLY sys.dm_exec_sql_text (er.sql_handle) st
WHERE es.session_id > 50 -- session_ids 50 and below are system sessions
ORDER BY es.cpu_time DESC
I find that this statement helps determine which applications to focus on. When I compare CPU, memory, reads, writes, and logical reads for all of the sessions within an application and determine that the CPU resource is much higher than the other resources being utilized, I start focusing on those SQL statements.
To track SQL statements historically for an application I use SQL Server traces. You can get to these either through the SQL Server Profiler tool or through the trace system stored procedures to help evaluate what is going on. (See the sidebar "SP Tracing" for more information on this topic.) Profiler should be reviewed for statements with high CPU utilization as well as hash and sort warnings, cache misses, and other red flags. This can help you narrow in on specific SQL statements or a specific time period that caused high resource utilization. Profiler is capable of tracking things like the SQL statement text, execution plans, CPU usage, memory usage, logical reads, writes, caching of query plans, recompiles, ejection of query plans from the cache, cache misses, table and index scans, missing statistics, and many other events.
After I have collected data from either the sp_trace stored procedures or SQL Server Profiler, I generally load it into a database, populated with the trace data either after the fact or by setting the trace to write to the database. Populating the database after the fact can be done using the SQL Server system function fn_trace_gettable. The benefit of this approach is that I can query and sort the data in multiple ways to see which SQL statements used the most CPU or had the most reads, count how many recompiles occurred, and much more. Here's an example of how this function is used to load a table with a Profiler trace file. The default parameter specifies that all of the trace files for that trace will be loaded in the order they were created:
SELECT * INTO trc_20070401
FROM fn_trace_gettable('S:\Mountpoints\TRC_20070401_1.trc', default);
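Once the trace data is in a table, ranking statements by CPU is a simple query; a sketch against the table loaded above (column names follow the standard trace schema):

```sql
-- Ten most CPU-expensive statements captured in the trace.
SELECT TOP 10
        CAST(TextData AS nvarchar(max)) AS statement_text,
        CPU, Reads, Writes, Duration
FROM trc_20070401
ORDER BY CPU DESC;
```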
Wrapping It Up
As you've seen, high CPU utilization doesn't necessarily indicate a CPU bottleneck. High CPU utilization can also mask a number of other application or hardware bottlenecks. Once you've identified that you have high CPU utilization even though your other counters look healthy, you can start looking for the cause within the system and isolate a solution (whether that means purchasing more CPUs or optimizing your SQL code). And whatever you do, don't give up! With the tips provided in this article, along with a bit of practice and research, optimizing CPU utilization under SQL Server is an attainable execution plan.
Zach Nichter is a SQL Server professional with more than 10 years of experience. He has held a number of SQL Server support roles including DBA, team lead, manager, and consultant. Currently, Zach is employed by Levi Strauss & Co. as the DBA Architect, focusing on SQL Server performance, monitoring, and architecture as well as other strategic initiatives. In addition, Zach is the author of a video blog found on www.sqlcatch.com.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited