Chapter 6 - Performance Tuning
This chapter provides information to help you configure Microsoft SQL Server 7.0 for maximum performance and to help you determine the cause of poor performance in a SQL Server environment. This chapter also provides information about using SQL Server indexes and SQL Server tools for analyzing the I/O performance efficiency of SQL Server queries.
Performance Tuning Comparison
The table compares SQL Server version 7.0 behaviors with behaviors of earlier versions of SQL Server.
SQL 6.x considerations
SQL Server 7.0 enhancements and philosophies
There are many configuration options to consider during performance tuning.
The database engine has become self-configuring, self-tuning, and self-managing. The lazy writer and Read Ahead Manager are self-tuning. The max async IO option is the only sp_configure option that must be configured initially, and then only when you are working with servers with large amounts of storage. This reduction in tuning requirements saves valuable administrative time, which can be applied to other tasks.
There are fewer sp_configure options that you must adjust manually to achieve good SQL Server performance. Although it is still possible to manually configure and adjust many of the sp_configure options available in earlier versions of SQL Server, it is recommended that you allow SQL Server to automatically configure and tune all sp_configure options for which SQL Server provides defaults. This allows SQL Server to adjust automatically the configuration of the database server as factors affecting the server change, such as RAM and CPU utilization for SQL Server and other applications running on the database server.
Manual tuning of lazy writer is sometimes necessary.
Unlike earlier versions of Microsoft SQL Server, Microsoft SQL Server 7.0 automatically configures and tunes the lazy writer. You no longer need to manually tune free buffer and max lazywrite IO. Free buffer and lazy writer I/O activity still can be monitored with SQL Server 7.0 Performance Monitor objects.
Manual tuning of checkpoint is sometimes necessary.
In earlier versions of Microsoft SQL Server, the recovery interval option also was used to tune the checkpoint process. When the recovery interval option is set to the default of 0, Microsoft SQL Server 7.0 automatically monitors and tunes the recovery interval. The default setting maintains recovery times of less than one minute for all databases, as long as there are no exceptionally long-running transactions present on the system. For more information, see SQL Server Books Online.
SQL Server 6.x log pages share RAM cache with data pages. Manual tuning of the log manager is sometimes necessary.
The SQL Server log manager has changed significantly. The SQL Server 7.0 log manager manages its own log cache. SQL Server no longer depends on the syslogs table as it does in earlier versions of SQL Server. Separating the log file management from the data cache management brings enhanced performance for both components.
Page splitting is costly to a database server operation.
In earlier versions of Microsoft SQL Server, B-tree index pages were high maintenance due to significant row pointer recalculation necessary on the index pages during page splits (page splits occur as insert rows fill a data or index page). This issue has been minimized with the SQL Server 7.0 storage structures. Nonclustered index pages now use either Fixed RID (Row ID) for tables with no clustered index (these tables are called heaps) or a Clustering Key for tables with a clustered index. B-tree maintenance activities during inserts and page splits have been dramatically reduced. The overall effect is that index maintenance is much faster, which means that more nonclustered indexes can be created on a table with less impact on data modification performance. For more information, see SQL Server Books Online.
SQL Server Profiler replaces the SQL Trace utility in SQL Server 6.5. SQL Server Profiler provides similar but significantly enhanced functionality.
SQL Server Query Analyzer replaces ISQL/W from earlier versions of SQL Server.
Devices and segments
Files and filegroups replace the device and segment model used in earlier versions. Files and filegroups provide a more convenient method for spreading data proportionately across disk drives or RAID arrays. For more information, see SQL Server Books Online.
Principles for Performance Tuning SQL Server
Microsoft SQL Server 7.0 introduces and enhances methods and tools to tune SQL Server for optimum performance. Keep these principles in mind when you are tuning SQL Server:
Let SQL Server do most of the tuning.
Microsoft SQL Server has been enhanced to create an auto-configuring and self-tuning database server. Take advantage of the auto-tuning settings available with SQL Server. These settings help SQL Server run at peak performance even as user load and queries change over time.
RAM is a limited resource.
An integral feature of the database server environment is the management of RAM buffer cache. Access to data in RAM cache is much faster than access to the same information from disk, but RAM is a limited resource. If database I/O can be reduced to the minimum required set of data and index pages, these pages will stay in RAM longer. Too much unneeded data and index information flowing into buffer cache quickly push out valuable pages. The focus of performance tuning is to reduce I/O so that buffer cache is best utilized.
Pick good indexes.
A key factor in maintaining minimum I/O for all database queries is to ensure that good indexes are created and maintained.
Evaluate disk I/O subsystem performance.
The physical disk subsystem must provide a database server with sufficient I/O processing power for the database server to run without disk queuing. Disk queuing results in poor performance. For more information, see "Disk I/O Performance" later in this chapter.
Tune applications and queries.
Tuning your applications and queries becomes especially important when a database server will service requests from hundreds or thousands of connections by way of a given application. Because applications typically determine the SQL queries that will be executed on a database server, application developers must understand SQL Server architectural basics and how to take full advantage of SQL Server indexes to minimize I/O.
Take advantage of SQL Server Profiler and Index Tuning Wizard.
SQL Server Profiler can be used to monitor and log a SQL Server workload, which can then be submitted to the Index Tuning Wizard to tune indexes for better performance. Regular use of SQL Server Profiler and the Index Tuning Wizard helps you optimize the indexes, allowing SQL Server to perform well with changing query workloads.
Take advantage of SQL Server Performance Monitor.
SQL Server 7.0 provides a revised set of Performance Monitor objects and counters, which are designed to provide helpful information for monitoring and analyzing the operations of SQL Server. For more information, see "Key Performance Monitor Counters" later in this chapter.
Take advantage of graphical showplan and SQL Server Query Analyzer.
SQL Server 7.0 Query Analyzer introduces Graphical Showplan, an enhancement to help analyze problematic Transact-SQL queries. SQL Server Query Analyzer also includes STATISTICS IO, another important tool option for tuning queries.
Review the max async IO Option During Configuration
The max async IO option should be reviewed and adjusted if necessary during your initial configuration of Microsoft SQL Server 7.0.
The max async IO option default of 32 is sufficient for lower-end disk subsystems. With a higher-end RAID storage subsystem attached to a database server that is capable of high disk I/O transfer rates, the setting of 32 may be inadequate because the RAID subsystem is capable of completing many more simultaneous disk transfer requests than 32. If the SQL Server write activity also dictates the need for more disk transfer capability, the max async IO option should be set higher.
An appropriate value for the max async IO option is one that allows checkpoint to finish before another checkpoint is needed (based upon desired recovery characteristics), but not to finish so fast that the system is seriously stressed by the event. For more information, see "Lazy Writer" and "Checkpoint" later in this chapter.
A general rule to follow for setting the max async IO option for SQL Servers running on large disk subsystems is to multiply by two or three the number of physical drives available to do simultaneous I/O. Then watch Performance Monitor for signs of disk activity or queuing issues. The negative impact of setting this configuration option too high is that it may cause checkpoint to monopolize disk subsystem bandwidth that is required by other SQL Server I/O operations such as reads.
To set the max async IO option value, execute this command in SQL Server Query Analyzer:
sp_configure 'max async io', value
value is expressed as the number of simultaneous disk I/O requests that SQL Server system can submit to the Windows operating system during a checkpoint operation, which in turn submits the requests to the physical disk subsystem. For more information, see "Disk I/O Tuning Performance" later in this chapter. This configuration option is dynamic and does not require a stop and restart of SQL Server to take effect.
For more information, see SQL Server Books Online.
Components that Consume CPU and Disk I/O Resources
You can optimize SQL Server performance with some careful attention to Microsoft SQL Server components that can consume resources.
Microsoft SQL Server maintains a pool of Windows operating-system threads to service batches of SQL Server commands submitted to the database server. The total of these threads (called worker threads) available to service all incoming command batches is dictated by the setting for the sp_configure option max worker threads. If the number of connections actively submitting batches is greater than the number specified for max worker threads, the worker threads are shared among connections actively submitting batches. The default setting of 255 works well for many installations.
Worker threads write out most of the dirty 8-KB pages from the SQL Server buffer cache. I/O operations are scheduled by worker threads asynchronously for maximum performance.
For more information, see SQL Server Books Online.
SQL Server lazy writer helps produce free buffers, which are 8-KB data cache pages without any data contained in them. As lazy writer flushes each 8-KB cache buffer out to disk, it initializes the cache page identity so that other data can be written into the free buffer. Lazy writer produces free buffers during periods of low disk I/O, so disk I/O resources are readily available for use and there is minimal impact on other SQL Server operations.
SQL Server 7.0 automatically configures and manages the level of free buffers. Monitor the SQL Server: Buffer Manager - Free Buffers object to ensure that the free buffer level remains steady. Lazy writer ensures that the level of free buffers keeps up with the user demand for free buffers. The SQL Server: Buffer Manager - Free Buffers object should not drop to 0 because this indicates there were times the user load demanded a higher level of free buffers than the SQL Server lazy writer was able to provide.
If the lazy writer cannot keep the free buffer steady, or at least above 0, it might mean the disk subsystem cannot provide lazy writer with the disk I/O performance that it needs to maintain the free buffer level (compare drops in free buffer level to any disk queuing to confirm there is a disk subsystem problem). One solution to the disk queuing problem is to add more physical disk drives (also called spindles) to the database server disk subsystem to provide more disk I/O processing power. The SQL Server: Buffer Manager – Lazy Writes/sec object indicates the number of 8-KB pages written to disk by lazy writer.
Monitor the current level of disk queuing in Performance Monitor by looking at the counters for (logical or physical) Disk: Average Disk Queue or Current Disk Queue and ensure the disk queue is at a level less than 2 for each physical drive associated with any SQL Server activity. For database servers that employ hardware RAID controllers and disk arrays, divide the number reported by (logical or physical) disk counters by the number of actual hard disk drives associated with that logical drive letter or physical hard disk drive number reported by the Windows NT Disk Administrator program. Windows and SQL Server are unaware of the actual number of physical hard disk drives attached to a RAID controller. You should know the number of drives associated with RAID array controller to interpret the disk queue numbers Performance Monitor reports.
Adjust lazy writer disk I/O request behavior with the max async IO option, which controls the number of 8-KB disk write requests that SQL Server (including requests coming in from lazy writer, checkpoint, and the worker threads) can simultaneously submit to the Windows operating system and in turn, to the disk I/O subsystem. If disk queuing occurs at unacceptable levels, decrease the level of the max async IO option. If the currently configured level of the max async IO option must be maintained, add more disks to the disk subsystem until disk queuing reaches acceptable levels.
For more information, see SQL Server Books Online.
Checkpoint writes out dirty pages to the SQL Server data files. Dirty pages are any buffer cache pages that have been modified since being brought into the buffer cache. A buffer written to disk by checkpoint still contains the page, and users can read or update it without rereading it from disk, which is not the case for free buffers created by lazy writer.
Checkpoint allows worker threads and lazy writer to do most of the work writing out dirty pages by waiting an extra checkpoint before writing out a dirty page. This gives the worker threads and lazy writer more time to write out the dirty pages. The conditions under which this extra wait time occurs are explained in your SQL Server documentation. Checkpoint evens out SQL Server disk I/O activity over a longer time period with an extra checkpoint wait.
To make checkpoint more efficient when there are many pages to flush out of cache, SQL Server sorts the data pages to be flushed in the order the pages appear on disk. This sorting helps minimize disk arm movement during cache flush and potentially allows checkpoint to take advantage of sequential disk I/O. Checkpoint also submits 8-KB disk I/O requests asynchronously to the disk subsystem. This allows SQL Server to finish submitting required disk I/O requests faster because checkpoint does not wait for the disk subsystem to report back that the data actually has been written to disk.
You should watch disk queuing on hard disk drives associated with SQL Server data files to notice if SQL Server is sending down more disk I/O requests than the disk(s) can handle. If it is, then more disk I/O capacity must be added to the disk subsystem to handle the load.
Adjust the checkpoint dirty page flushing behavior by using the max async IO option. The sp_configure option max async IO controls the number of 8-KB cache flushes that checkpoint can submit simultaneously to the Windows operating system and in turn, to the disk I/O subsystem. If disk queuing occurs at unacceptable levels, decrease the max async IO option. If SQL Server must maintain the currently configured level of the max async IO option, add more disks to the disk subsystem until disk queuing decreases to acceptable levels.
If you must increase the speed with which SQL Server executes checkpoint and the disk subsystem is powerful enough to handle the increased disk I/O while avoiding disk queuing, then increase the max async IO option to allow SQL Server to send more asynchronous disk I/O requests. Observe the disk queuing counters carefully after you change the max async IO option. Be sure to watch disk read queuing in addition to disk write queuing. If the max async IO option is set too high for a disk subsystem, checkpoint may queue up many disk write I/O requests, which can cause SQL Server read activity to be blocked. Physical disk and logical disk objects in Performance Monitor provide the Average Disk Read Queue Length counter, which can be used to monitor queued disk read I/O requests. If disk read queuing is caused by checkpoint, you either can decrease the max async IO option or add more hard disk drives so the checkpoint and read requests can be handled simultaneously.
Like other major RDBMS products, SQL Server ensures that all write activity (inserts, updates, and deletes) performed on the database is not lost if something interrupts the SQL Server online status (for example, power failure, disk drive failure, fire in the data center, and so on). The SQL Server logging process helps guarantee recoverability. Before any implicit (single Transact-SQL query) or explicit (transaction that issues BEGIN TRANSACTION, COMMIT, or ROLLBACK statements) transactions can be completed, the SQL Server log manager must receive a signal from the disk subsystem that all data changes associated with the transaction have been written successfully to the associated log file. This rule guarantees the transaction log can be read and reapplied in SQL Server when the server is turned on after an abrupt shut down during which the transactions written into the data cache are not yet flushed to the data files. Flushing data buffers are checkpoint or lazy writer responsibility. Reading the transaction log and applying the transactions to SQL Server after a server stoppage is referred to as recovery.
Because SQL Server must wait for the disk subsystem to complete I/O to SQL Server log files as each transaction is completed, disks containing SQL Server log files must have sufficient disk I/O handling capacity for the anticipated transaction load.
The method for monitoring disk queuing is not the same for SQL Server log files and for SQL Server database files. You can use the Performance Monitor counters SQL Server: Databases database instance : Log Flush Waits Times and SQL Server: Databases database instance : Log Flush Waits/sec to view log writer requests waiting on the disk subsystem for completion.
For highest performance, you can use a caching controller for SQL Server log files if the controller guarantees that data entrusted to it is written to disk eventually, even if the power fails. For more information about caching controllers, see "Effect of On-board Cache of Hardware RAID Controllers" later in this chapter and SQL Server Books Online.
Read Ahead Manager
The Microsoft SQL Server 7.0 Read Ahead Manager is completely self-configuring and self-tuning. Read Ahead Manager is tightly integrated with the operations of SQL Server query processor. SQL Server query processor identifies and communicates situations that will benefit from read-ahead scans to the Read Ahead Manager. Large table scans, large index range scans, and probes into clustered and nonclustered index B-trees are situations that would benefit from a read-ahead. Read-ahead reads occur with 64-KB I/Os, which provide higher disk throughput potential for the disk subsystem than 8-KB I/Os do. When a large amount of data must be retrieved from SQL Server, read-ahead is the best way to do it.
Read Ahead Manager benefits from the simpler and more efficient Index Allocation Map (IAM) storage structure. The IAM is the SQL Server 7.0 method of recording the location of extents (eight pages of SQL Server data or index information for a total of 64 KB of information per extent). The IAM is an 8-KB page that tightly packs information through a bitmap about which extents within the range of extents covered by the IAM contain required data. The compact IAM pages are fast to read and tend to keep regularly used IAM pages in buffer cache.
The Read Ahead Manager can construct multiple sequential read requests by combining the query information from query processor and quickly retrieving the location of all extents that must be read from the IAM page(s). Sequential 64-KB disk reads provide excellent disk I/O performance.
Read-ahead activity is monitored by the SQL Server: Buffer Manager - Readahead Pages counter. You can find more information about read-ahead activity by executing the DBCC PERFMON (IOSTATS) statement. Some of the information provided is RA Pages Found in Cache and RA Pages Placed in Cache. If the page is already hashed (the application read it in first and read-ahead wasted a read), it is a page found in cache. If the page is not already hashed (a successful read-ahead), it is a page placed in cache.
Too much read-ahead can be detrimental to overall performance because it can fill cache with unnecessary pages, requiring additional I/O and CPU that could have been used for other purposes. The solution is a performance tuning goal that all Transact-SQL queries are tuned so a minimal number of pages are brought into buffer cache. This includes using the right index for the right job. Save clustered indexes for efficient range scans and define nonclustered indexes to help locate single rows or smaller rowsets quickly.
For more information, see SQL Server Books Online.
Disk I/O Performance
When you configure a SQL Server that contains only a few gigabytes of data and does not sustain heavy read or write activity, you are not as concerned with disk I/O and balancing SQL Server I/O activity across hard disk drives for maximum performance. If you build larger SQL Server databases, however, that contain hundreds of gigabytes of data and/or will sustain heavy read and/or write activity, you must configure SQL Server to maximize disk I/O performance by load-balancing across multiple hard disk drives.
Advertised Disk Transfer Rates and SQL Server
An important aspect of database performance tuning is I/O performance tuning. Unless SQL Server is running on a computer with enough RAM to hold the entire database, I/O performance is dictated by how fast reads and writes of SQL Server data can be processed by the disk I/O subsystem.
The typical hard disk drive is capable of providing the Windows operating system and SQL Server with about 75 nonsequential (random) and 150 sequential I/O operations per second. Advertised transfer rates for these hard disk drives are around 40 megabytes (MB) per second. It is much more likely for a database server to be constrained by the 75/150 I/O transfers per second than the 40-MB per second transfer rate. This is illustrated by these calculations:
(75 random I/O operations per second) x (8-KB transfer) = 600 KB per second
This calculation indicates that by doing strictly random single-page read and write operations on a hard disk drive, you can expect at most 600 KB (0.6 MB) per second I/O processing capability from that hard disk drive. This is much lower than the advertised 40 MB per second I/O handling capacity of the drive. SQL Server worker threads, checkpoint, and lazy writer perform I/O in 8-KB transfer sizes.
(150 sequential I/O operations per second) x (8-KB transfer) = 1200 KB per second
This calculation indicates by doing strictly sequential single-page read and write operations on a hard disk drive, it is reasonable to expect at most 1200 KB (1.2 MB) per second I/O processing capability from the hard disk drive.
(75 random I/O operations per second) x (64-KB transfer) = 4800 KB (4.8 MB) per second
This calculation illustrates a worst-case scenario for read-aheads, assuming all random I/O. Even in the completely random situation, the 64-KB transfer size provides much better disk I/O transfer rate from disk (4.8 MB per second) than the single-page transfer rates (0.6 and 1.2 MB per second):
(150 sequential I/O operations per second) x (64-KB transfer) = 9600 KB (9.6 MB) per second
This calculation indicates that by doing strictly sequential read or write operations on a hard disk drive, you can expect at most 9.6 MB per second I/O processing capability from that hard disk drive. This is much better than the random I/O case. SQL Server Read Ahead Manager performs disk I/O in the 64-KB transfer rate and attempts to arrange reads so read-ahead scans are done sequentially (often referred to as serially or in disk order). Although Read Ahead Manager performs I/O operations sequentially, page splitting tends to cause extents to be read nonsequentially rather than sequentially. This is one reason to eliminate and prevent page splitting.
Log manager writes sequentially to log files up to 32 KB.
Sequential vs. Nonsequential Disk I/O Operations
The terms sequential and nonsequential (random) have been used to refer to hard disk drive operations. A single hard disk drive consists of a set of drive platters, each of which provides services for read and write operations with a set of arms with read and write heads that can move across the platters and read information from the drive platter or write data onto the platters. Remember these points about hard disk drives and SQL Server:
Using sequential I/O is good for SQL Server performance.
The read/write heads and associated disk arms must move to find and operate on the location of the hard disk drive platter that SQL Server and the Windows operating system requested. If the data is located on nonsequential locations on the hard disk drive platter, it takes significantly more time for the hard disk drive to move the disk arm and read/write head to all of the necessary hard disk drive platter locations. This contrasts with the sequential case, in which all of the required data is located on one continuous physical section of the hard disk drive platter, so that the disk arm and read/write heads move a minimal amount to perform the necessary disk I/O operations. The time difference between the nonsequential versus sequential case is significant, about 50 milliseconds per nonsequential seek versus approximately 2 to 3 milliseconds for sequential seeks. These times are rough estimates and will vary based on how far apart the nonsequential data is spread on the disk, how fast the hard disk platters can spin (RPM), and other physical attributes of the hard disk drive.
Using Read Ahead Manager and separating SQL Server log files from other nonsequentially accessed files can improve SQL Server performance.
A typical hard disk drive supports about 75 nonsequential and 150 sequential I/Os per second. It takes almost as much time to read or write 8 KB as it does to read or write 64 KB. Within the range of 8 KB to about 64 KB the disk arm and read/write head movement account for most of the time of a single disk I/O transfer operation. Perform 64-KB disk transfers as often as possible when more than 64 KB of SQL data must be transferred, because a 64-KB transfer is fast as an 8-KB transfer and eight times the amount of SQL Server data is processed for each transfer. Read Ahead Manager does disk operations in 64-KB chunks (referred to as a SQL Server extent). Log manager also performs sequential writes in larger I/O sizes.
Disk I/O Transfer Rates and PCI Bus Bandwidth
A typical hard disk provides a maximum transfer rate of about 40 MB per second or 75 nonsequential/150 sequential disk transfers per second. Typical RAID controllers have an advertised transfer rate of about 40 MB per second or approximately 2,000 disk transfers per second. PCI buses have an advertised transfer rate of about 133 MB per second and higher. The actual transfer rates achievable for a device usually differ from the advertised rate. You should understand how to use these transfer rates as a starting point for determining the number of hard disk drives to associate with each RAID controller and in turn, how many drives and RAID controllers can be attached to a PCI bus without I/O bottlenecks.
The maximum amount of SQL Server data that can be read from or written to a hard disk drive per second is 9.6 MB. Assuming a RAID controller can handle 40 MB per second, you can calculate the number of hard disk drives that should be associated with one RAID controller by dividing 40 by 9.6 to get about 4. This means that at most 4 drives should be associated with that one controller when SQL Server is doing nothing but sequential I/O of 64 KB. Similarly, with all nonsequential I/O of 64 KB, the maximum data sent from the hard disk drive to the controller is 4.8 MB per second. Dividing 40 MB per second by 4.8 MB per second gives approximately 8 as the result. This means that at most 8 hard disk drives should be associated with the single controller in the nonsequential 64-KB scenario. The random 8-KB data transfer scenario requires the most drives. Divide 40 by 0.6 to determine that about 66 drives are needed to saturate a RAID controller doing 100 percent random 8-KB reads and writes. This is not a realistic scenario because read-ahead and log-writing use transfer sizes greater than 8-KB and it is unlikely that a SQL Server will perform 100 percent random I/O.
You can also determine how many drives should be associated with a RAID controller by looking at disk transfers per second instead of looking at the megabytes per second. If a hard disk drive is capable of 75 nonsequential (random) I/Os per second, then about 26 hard disk drives working together could theoretically produce 2,000 nonsequential I/Os per second, or enough to hit the maximum I/O handling capacity of a single RAID controller. Alternately, it takes only about 13 hard disk drives working together to produce 2,000 sequential I/Os per second and keep the RAID controller running at maximum throughput because a single hard disk drive can sustain 150 sequential I/Os per second.
RAID controller and PCI bus bottlenecks are not as common as I/O bottlenecks related to hard disk drives. For example, assume a set of hard disk drives associated with a RAID controller is busy enough to push 40 MB per second of throughput through the controller. The next consideration is how many RAID controllers can be attached safely to the PCI bus without risking a PCI bus I/O bottleneck. To make an estimate, divide the I/O processing capacity of the PCI bus by the I/O processing capacity of the RAID controller: 133 MB/sec divided by 40 MB/sec results in approximately three RAID controllers that can be attached to a single PCI bus. Most large servers come with more than one PCI bus, which increases the number of RAID controllers that can be installed in a single server.
These calculations illustrate the relationship of the transfer rates of the components that comprise a disk I/O subsystem (hard disk drives, RAID controllers, PCI bus) and are not literal figures. These calculations assume all sequential or all nonsequential data access, which is not likely in a production database server environment. In reality, a mixture of sequential, nonsequential, 8-KB and 64-KB I/O occurs. Additional factors influence how many I/O operations can be pushed through a set of hard disk drives at one time. On-board read/write caching available for RAID controllers increases the amount of I/O that a set of drives can effectively produce. How much more is difficult to estimate for the same reason that it is difficult to determine an exact number an 8-KB versus a 64-KB I/O SQL Server environment needs.
When scaling databases more than a few gigabytes (GB), it is important to have at least a basic understanding of RAID (Redundant Array of Inexpensive Disks) technology and how it relates to database performance.
These are the benefits of RAID:
Hardware RAID controllers divide read/writes of all data from Windows and applications such as Microsoft SQL Server into slices (usually 16 KB to 128 KB) that are spread across all disks participating in the RAID array. Splitting data across physical drives distributes the read/write I/O workload evenly across all physical hard disk drives participating in the RAID array. This increases disk I/O performance because the hard disks participating in the RAID array are all kept equally busy, instead of some disks becoming a bottleneck due to uneven distribution of I/O requests.
RAID provides protection from hard disk failure and accompanying data loss with two methods: mirroring and parity.
Mirroring is implemented by writing information onto two sets of drives, one on each side of the mirrored pairs of drives. If there is a drive loss with mirroring in place, the data for the lost drive can be rebuilt by replacing the failed drive and rebuilding the data from the failed drive's matching drive. Most RAID controllers can provide a failed drive replacement and rebuild from the other side of the mirrored pair while Windows and SQL Server are online (referred to as "Hot Plug" capable drives). Mirroring is the best performing RAID option when fault tolerance is required. Each SQL Server write in the mirroring situation costs two disk I/O operations, one to each side of the mirrorset. Mirroring also provides more fault tolerance than parity RAID implementations. Mirroring can sustain at least one failed drive and may be able to survive failure of up to half of the drives in the mirrorset without forcing the system administrator to shut down the server and recover from file backup. The disadvantage of mirroring is cost. The disk cost of mirroring is one drive for each drive of data. RAID 1 and its hybrid, RAID 0+1 are implemented through mirroring.
Parity is implemented by calculating recovery information about data written to disk and writing this parity information on the other drives that form the RAID array. If a drive fails, a new drive is inserted into the RAID array and the data on that failed drive is recovered by taking the recovery information (parity) written on the other drives and using this information to regenerate the data from the failed drive. RAID 5 and its hybrids are implemented through parity. The advantage of parity is low cost. To protect any number of drives with RAID 5, only one additional drive is required. Parity information is evenly distributed among all drives participating in the RAID 5 array. The disadvantages of parity are performance and fault tolerance. Due to the additional costs associated with calculating and writing parity, RAID 5 requires four disk I/O operations for each Windows NT and SQL Server write as compared to two disk I/O operations for mirroring. Read I/O operation costs are the same for mirroring and parity. Also, RAID 5 can sustain only one failed drive before the array must be taken offline and recovery from backup media must be performed to restore data.
A general rule is to stripe across as many disks as necessary to achieve solid disk I/O performance. Performance Monitor indicates whether there is a disk I/O bottleneck on a particular RAID array. Be ready to add disks and redistribute data across RAID arrays and/or SCSI channels as necessary to balance disk I/O and maximize performance.
Effect of On-board Cache of Hardware RAID Controllers
Many hardware RAID controllers have some form of read and/or write caching. Take advantage of this available caching with SQL Server because it can enhance the effective I/O handling capacity of the disk subsystem. The principle of these controller-based caching mechanisms is to gather smaller and potentially nonsequential I/O requests coming in from the host server (hence SQL Server) and try to batch them with other I/O requests so the batched I/Os can form larger (32 KB to 128 KB) and maybe sequential I/O requests to send to the hard disk drives. This helps produce more disk I/O throughput given the fixed number of I/Os that hard disks can provide to the RAID controller. The RAID controller cache arranges incoming I/O requests by making the best use of the hard disks' underlying I/O processing ability.
RAID controllers usually protect their caching mechanisms with a form of backup power. The backup power can preserve the data written in cache for a period of time (perhaps days) in case of a power outage. And in production environments, the backup power can provide the database server greater protection by providing adequate UPS protection to the server to flush data to disk if power to the server is disrupted.
RAID 1 and RAID 0+1 offer the best data protections and best performance among RAID levels but costs more required disks. When cost of hard disks is not a limiting factor, RAID 1 or RAID 0+1 are the best RAID choices for performance and fault tolerance.
RAID 5 provides fault tolerance at the best cost but has half the write performance of RAID 1 and 0+1 because of the additional I/O that RAID 5 must do reading and writing parity information onto disk. RAID 5 is not as fault tolerant as RAID 1 and 0+1.
The best disk I/O performance is achieved with RAID 0 (disk striping with no fault tolerance protection), but because there is no fault tolerance with RAID 0, this RAID level typically can be used only for development database servers or other testing environments.
Many RAID array controllers provide the option of RAID 0+1 (also referred to as RAID 1/0 and RAID 10) over physical hard disk drives. RAID 0+1 is a hybrid RAID solution. On the lower level, it mirrors all data, like RAID 1. On the upper level, the controller stripes data across all of the drives, like RAID 0. Thus, RAID 0+1 provides maximum protection (mirroring) with high performance (striping). These mirroring and striping operations are transparent to Windows NT and SQL Server because they are managed by the RAID controller. The difference between RAID 1 and RAID 0+1 is on the hardware controller level. RAID 1 and RAID 0+1 require the same number of drives for a given amount of storage. For more specifics about RAID 0+1 implementation of specific RAID controllers, contact the hardware vendor that produced the controller.
This illustration compares RAID 0, RAID 1, RAID 5, and RAID 0+1. To hold four disks of data, RAID 1 (and RAID 0+1) needs eight disks, whereas Raid 5 needs five disks. Be sure to involve the appropriate hardware vendors to learn more about RAID implementation specific to the hardware running the database server.
Online RAID Expansion
Online RAID expansion is a feature that allows disks to be added dynamically to a physical RAID array while SQL Server is online, as long as there are hot-plug slots available. Many hardware vendors offer hardware RAID controllers capable of providing this functionality. Data is automatically restriped across all drives evenly, including the newly added drive, and there is no need to shut down SQL Server or Windows. You can take advantage of this functionality by leaving hot-plug hard disk drive slots free in the disk array cages. Thus, if SQL Server is regularly over-taxing a RAID array with I/O requests (this is indicated by disk queue length for the Windows logical drive letter associated with that RAID array), you can install one or more new hard disk drives into the hot-plug slot while SQL Server is still running. The RAID controller redistributes some existing SQL data to these new drives so that SQL data is evenly distributed across all drives in the RAID array. Then, the I/O processing capacity of the new drives (75 nonsequential/150 sequential I/Os per second, per drive) is added to the overall I/O processing capacity of the RAID array.
Performance Monitor and RAID
In Windows NT Performance Monitor, logical and physical disk objects provide the same information. The difference is that logical disks in Performance Monitor are associated with what Windows NT interprets as a logical drive letter. Physical disks in Performance Monitor are associated with what Windows NT interprets as a single physical hard disk.
To enable Performance Monitor counters, use the command diskperf.exe from the command line of the command-prompt window. Use diskperf –y so that Performance Monitor reports logical and physical disk counters when using hard disk drives or sets of hard disk drives and RAID controllers, without the use of Windows NT software RAID.
When running Windows NT software RAID, use diskperf –ye so that Performance Monitor correctly reports physical counters across the Windows NT stripe sets. When diskperf –ye is used in conjunction with Windows NT stripe sets, logical counters do not report correct information and must be disregarded. If logical disk counter information is required in conjunction with Windows NT stripe sets, use diskperf –y instead. With diskperf -y and Windows NT stripe sets, logical disk counters are reported correctly but physical disk counters do not report correct information and should be disregarded.
The effects of the diskperf -y command do not occur until Windows NT has been restarted.
Hardware RAID controllers present multiple physical hard disk drives that compose a single RAID mirror set or stripe set to the Windows operating system, as one single physical. The Windows NT Disk Administrator application is used to associate logical drive letters to the single physical disk and does not need to know how many hard disks are actually associated with the single hard physical that the RAID controller has presented to it.
You should know how many physical hard disk drives are associated with a RAID array so you can determine the number of disk I/O requests that the Windows operating system and SQL Server send to each physical hard disk drive. Divide the number of disk I/O requests that Performance Monitor reports as associated with a hard disk drive by the number of actual physical hard disk drives known to be in the RAID array.
To estimate I/O activity per hard disk drive in a RAID array, multiply the number of disk write I/Os reported by Performance Monitor by either 2 (RAID 1 and 0+1) or 4 (RAID 5). This accurately accounts for the number of I/O requests being sent to the physical hard disk drives. It is at this physical level that the I/O capacity for hard disk drives apply (75 nonsequential and 150 sequential per drive). But do not expect to calculate exactly how much I/O is hitting the hard disk drives when the hardware RAID controller is using caching, because caching can change the amount of I/O that is hitting the hard disk drives.
It is best to monitor on disk queuing unless I/O is causing a problem. The Windows operating system cannot see the number of physical drives in a RAID array, so to assess disk queuing per physical disk accurately, you must divide the disk queue length by the number of physical drives participating in the hardware RAID disk array that contains the logical drive being observed. Keep this number under two for hard disk drives containing SQL Server files.
For more information about SQL Server and RAID, see SQL Server Books Online.
Windows NT Software RAID
Windows NT provides fault tolerance to hard disk failure by providing mirror sets and stripe sets (with or without fault tolerance) through the Windows NT operating system, instead of through a hardware RAID controller. The Windows NT Disk Administrator application is used to define either mirror sets (RAID 1) or stripe sets with parity (RAID 5). Windows NT Disk Administrator also allows the definition of stripe sets with no fault tolerance (RAID 0).
Software RAID uses more CPU resources because Windows NT is the component managing the RAID operations, versus the hardware RAID controller. Thus, performance with the same number of disk drives and Windows NT software RAID may be a few percent less than the hardware RAID solution if the system processors are near 100 percent utilized. But Windows NT software RAID generally helps a set of drives to service SQL Server I/O better than those drives would have been able to do separately. This can reduce the potential for an I/O bottleneck, leading to higher CPU utilization by SQL Server and better throughput. Windows NT software RAID can provide a better-cost solution for providing fault tolerance to a set of hard disk drives.
For more information about configuring Windows NT software RAID, see Online Help for Windows NT and SQL Server Books Online.
Disk I/O Parallelism
With smaller SQL Server databases located on a few disk drives, disk I/O parallelism is not a likely performance factor. But with large SQL Server databases stored on many disk drives, performance is enhanced by using disk I/O parallelism to make optimal use of the I/O processing power of the disk subsystem.
Microsoft SQL Server 7.0 introduces files and filegroups, which replace the device and segment model of earlier versions of SQL Server. The files and filegroups provide a more convenient method for spreading data proportionately across disk drives or RAID arrays. For more information, see SQL Server Books Online.
A technique for creating disk I/O parallelism is to create a single "pool of drives" that serves all SQL Server database files, excluding transaction log files. The pool can be a single RAID array that is represented in Windows NT as a single physical disk drive. Or a larger pool can be set up using multiple RAID arrays and SQL Server files/filegroups. A SQL Server file can be associated with each RAID array, and the files can be combined into a SQL Server filegroup. Then a database can be built on the filegroup so that the data is spread evenly across all of the drives and RAID controllers. The drive pool methodology depends on RAID to divide data across all physical disk drives to help ensure parallel access to the data during database server operations.
The pool methodology simplifies SQL Server I/O performance tuning because there is only one physical location to create database objects. The single pool of drives can be watched for disk queuing and, if necessary, more hard disk drives can be added to the pool to prevent disk queuing. This technique helps optimize for the common case, in which it is not known which parts of databases will see the most use. Do not segregate part of available I/O capacity on another disk partition because SQL Server might do I/O to it 5 percent of the time. The single pool of drives methodology can make all available I/O capacity available for SQL Server operations.
SQL Server log files should always be physically separated onto different hard disk drives from all other SQL Server database files. For SQL Servers with busy databases, transaction log files should be physically separated from each other. Transaction logging is primarily sequential write I/O. Separating transaction logging activity from other nonsequential disk I/O activity can result in I/O performance benefits. That allows the hard disk drives containing the log files to concentrate on sequential I/O. There are times when the transaction log must be read as part of SQL Server operations such as replication, rollbacks, and deferred updates. If you administer SQL Servers that participate in replication, make sure all transaction log files have sufficient disk I/O processing power because of the reads that must occur.
Physically separating SQL Server objects from the rest of their associated database through SQL Server files and filegroups requires additional administration. Separating the objects can be worthwhile to investigate active tables and indexes. By separating table or index from all other database objects, accurate assessments can be made of the object I/O requirements. This is not as easy to do when all database objects are placed within one drive pool. Physical I/O separation can be appropriate during database development and benchmarking so that database I/O information can be gathered and applied to capacity planning for the production database server environment.
These are the areas of SQL Server activity that can be separated across different hard disk drives, RAID controllers, PCI channels, or combinations of the three:
Transaction log files
Tables associated with considerable query or write activity
Nonclustered indexes associated with considerable query or write activity
The physical separation of SQL Server I/O activities is made convenient by using hardware RAID controllers, RAID hot plug drives, and online RAID expansion. The approach that provides the most flexibility consists of arranging the RAID controllers so that a separate RAID SCSI channel is provided for each database activity. Each RAID SCSI channel should be attached to a separate RAID hot plug cabinet to take full advantage of online RAID expansion (if it is available through the RAID controller). Windows logical drive letters are associated with each RAID array and SQL Server files can be separated between distinct RAID arrays based on known I/O usage patterns.
With this configuration you can relate disk queuing to a distinct RAID SCSI channel and its drive cabinet as Performance Monitor reports the queuing behavior during load testing or heavy production loads. If a RAID controller and drive array cabinet support online RAID expansion and if slots for hot-plug hard disk drives are available in the cabinet, disk queuing on that RAID array is resolved by adding more drives to the RAID array until Performance Monitor reports that disk queuing for that RAID array has reached acceptable levels (less than 2 for SQL Server files.) This can be accomplished while SQL Server is online.
The tempdb database is created by SQL Server to be a shared working area for a variety of activities, including temporary tables, sorting, subqueries and aggregates with GROUP BY or ORDER BY, queries using DISTINCT (temporary worktables must be created to remove duplicate rows), cursors, and hash joins. You should enable the tempdb database I/O operations to occur in parallel to the I/O operations of related transactions. Because tempdb is a scratch area and update-intensive, RAID 5 is not as good a choice for tempdb as RAID 1 or 0+1. The tempdb database is rebuilt every time the database server is restarted; therefore, RAID 0 is a possibility for tempdb on production SQL Server computers. RAID 0 provides the best RAID performance for the tempdb database with the least physical drives. The concern with using RAID 0 for tempdb in a production environment is that SQL Server must be stopped and restarted if physical drive failure occurs in the RAID 0 array—this does not necessarily occur if tempdb is placed on a RAID 1 or 0+1 array.
To move the tempdb database, use the ALTER DATABASE statement to change the physical file location of the SQL Server logical file name associated with the tempdb database. For example, to move the tempdb database and its associated log to the new file locations E:\Mssql7 and C:\Temp, use these statements:
ALTER DATABASE tempdb MODIFY FILE (NAME ='tempdev',FILENAME= 'e:\mssql7\tempnew_location.mDF') ALTER DATABASE tempdb MODIFY FILE (NAME ='templog',FILENAME= 'c:\temp\tempnew_loglocation.LDF')
The master, msdb, and model databases are not used much during production compared to user databases, so they are not typically a consideration in I/O performance tuning. The master database is used only for adding new logins, databases, and other system objects.
Nonclustered indexes reside in B-tree structures that can be separated from their related database tables with the ALTER DATABASE statement. In this example, the first ALTER DATABASE creates a filegroup. The second ALTER DATABASE creates a file with a separate physical location associated with the filegroup. At this point, indexes can be created on the filegroup as illustrated by creating the index called index1. The sp_helpfile stored procedure reports files and filegroups present for a given database. The sp_help tablename has a section in its output that provides information about the table indexes and filegroup relationships.
ALTER DATABASE testdb ADD FILEGROUP testgroup1 ALTER DATABASE testdb ADD FILE (NAME = 'testfile', FILENAME = 'e:\mssql7\test1.ndf') TO FILEGROUP testgroup1 CREATE TABLE test1(col1 char(8)) CREATE INDEX index1 ON test1(col1) ON testgroup1 sp_helpfile sp_help test1
For more information, see SQL Server Books Online.
SQL Server Indexes
This section contains information about how SQL Server data and index structures are physically placed on disk drives and how these structures apply to disk I/O performance.
SQL Server data and index pages are 8 KB each. SQL Server data pages contain all of the data associated with the rows of a table, except text and image data. For text and image data, the SQL Server data page, which contains the row associated with the text or image column, contains a pointer to a B-tree structure of one or more 8-KB pages.
SQL Server index pages contain only data from columns that comprise a particular index; therefore, an 8-KB index page contains much more information than an 8-KB data page because it compresses information associated with many more rows. If the columns picked to be part of an index form a low percentage of the row size of the table, information about more rows can be compressed in an 8-KB index page, which has performance benefits. When an SQL query requests a set of rows from a table in which columns in the query match values in the rows, SQL Server can save I/O operations by reading the index pages for the values and then accessing only the rows in the table required to satisfy the query. Therefore, SQL Server does not have to perform I/O operations to scan all rows in the table to locate the required rows.
SQL Server indexes are built upon B-tree structures formed out of 8-KB index pages. The difference between clustered and nonclustered indexes is at the bottom of the B-tree structures (referred to as leaf level). The upper parts of index B-tree structures are referred to as nonleaf levels of the index. A B-tree structure is built for every single index defined on a SQL Server table.
In a nonclustered index, the leaf-level nodes contain only the data that participates in the index. In the nonclusterered index leaf-level nodes, index rows contain pointers to the remaining row data on the associated data page. At worst, each row access from the nonclustered index requires an additional nonsequential disk I/O to retrieve the row data. At best, many of the required rows will be on the same data page and thus allow retrieval of several required rows with each data page fetched.
In the clustered index, the leaf-level nodes of the index are the actual data rows for the table. Therefore, no bookmark lookups are required for retrieval of table data. Range scans based on clustered indexes perform well because the leaf level of the clustered index (and hence all rows of that table) is physically ordered on disk by the columns that comprise the clustered index, and will perform I/O in 64-KB extents. These 64-KB I/Os are physically sequential if there is not much page splitting on the clustered index B-tree (nonleaf and leaf levels). The dotted lines indicate there are other 8-KB pages present in the B-tree structures but they are not shown. The illustration shows the structural difference between nonclustered and clustered indexes.
There can be only one clustered index per table because, while the upper parts of the clustered index B-tree structure are organized like the nonclustered index B-tree structures, the bottom level of the clustered index B-tree consists of the actual 8-KB data pages associated with the table. There are performance implications:
Retrieval of SQL data based on key search with a clustered index requires no bookmark lookup (and a likely nonsequential change of location on the hard disk) to get to the associated data page, because the leaf level of the clustered index is already the associated data page.
The leaf level of the clustered index is sorted by the columns that comprise the clustered index. Because the leaf level of the clustered index contains the actual 8-KB data pages of the table, the row data of the entire table is physically arranged on the disk drive in the order determined by the clustered index. This provides a potential I/O performance advantage when fetching a significant number of rows from tables greater than 64-KB based on the value of the clustered index, because sequential disk I/O is being used unless page splitting is occurring on this table. For more information about page-splitting, see "FILLFACTOR and PAD_INDEX" later in this chapter. You should pick the clustered index on a table based on a column that is used to perform range scans to retrieve a large number of rows.
Nonclustered indexes are most useful for fetching few rows with good selectivity from large SQL Server tables based on a key value. Nonclustered indexes are B-trees formed out of 8-KB index pages. The bottom or leaf level of the B-tree of index pages contains all the data from the columns that comprised that index. When a nonclustered index is used to retrieve information from a table based on a match with the key value, the index B-tree is traversed until a key match is found at the leaf level of the index. A bookmark lookup is made if columns from the table are needed that did not form part of the index. This bookmark lookup will likely require a nonsequential I/O operation on the disk. It even might require the data to be read from another disk if the table and its accompanying index B-tree(s) are large. If multiple bookmark lookups lead to the same 8-KB data page, there is less I/O performance penalty because it is necessary to read the page into data cache only once. For each row returned for a SQL query that involves searching with a nonclustered index, one bookmark lookup is required. These bookmark lookups are the reason that nonclustered indexes are better suited for SQL queries that return only one or a few rows from the table. Queries that require many rows to be returned are better served with a clustered index.
For more information, see SQL Server Books Online.
A special situation that occurs with nonclustered indexes is called the covering index. A covering index is a nonclustered index built upon all of the columns required to satisfy an SQL query, both in the selection criteria and in the WHERE clause. Covering indexes can save I/O and improve query performance. But you must balance the costs of creating a new index (with its associated B-tree index structure maintenance) with the I/O performance gain the covering index will bring. If a covering index will benefit a query or a set of queries that run often on SQL Server, creating the covering index may be worthwhile, for example:
SELECT col1,col3 FROM table1 WHERE col2 = 'value' CREATE INDEX indexname1 ON table1(col2,col1,col3)
Or from SQL Server Enterprise Manager, use the Create Index Wizard.
The indexname1 index in the preceding example is a covering index because it includes all columns from the SELECT statement and the WHERE clause. During the execution of this query, SQL Server does not need to access the data pages associated with table1. SQL Server can obtain all of the information required to satisfy the query by using the index called indexname1. When SQL Server has traversed the B-tree associated with indexname1 and has found the range of index keys where col2 is equal to value, SQL Server fetches all required data (col1,col2,col3) from the leaf level of the covering index. This improves I/O performance in two ways:
SQL Server obtains all required data from an index page, not a data page; therefore, the data is more compressed and SQL Server saves disk I/O operations.
The covering index organizes all of the required data by col2 physically on the disk. The hard disk drives return all of the index rows associated with the WHERE clause (col2 = value) in sequential order, which gives better I/O performance. From a disk I/O standpoint, a covering index becomes a clustered index for this query and any other query that can be satisfied completely by the columns in the covering index.
If the number of bytes from all the columns in the index is small compared to the number of bytes in a single row of that table, and you are certain the query taking advantage of the covered index will be executed frequently, then it makes sense to use a covering index. But, before building many covered indexes, consider how SQL Server 7.0 can effectively and automatically create covered indexes for queries on the fly.
Automatic Covering Indexes or Covered Queries
The Microsoft SQL Server 7.0 query processor provides index intersection. Index intersection allows the query processor to consider multiple indexes from a given table, build a hash table based on those multiple indexes, and use the hash table to reduce I/O for a query. The hash table that results from the index intersection becomes a covering index and provides the same I/O performance benefits that covering indexes do. Index intersection provides greater flexibility for database user environments in which it is difficult to determine all of the queries that will run against the database. A good strategy in this case is to define single column, nonclustered indexes on all columns that will be queried frequently, and let index intersection handle situations in which a covered index is needed; for example:
SELECT col3 FROM table1 WHERE col2 = 'value' CREATE INDEX indexname1 ON table1(col2) CREATE INDEX indexname2 ON table1(col3)
Or from SQL Server Enterprise Manager, use the Create Index Wizard.
In the preceding example, indexname1 and indexname2 are nonclustered, single column indexes created on the SQL Server table called table1. When the query executes, the query processor recognizes that index intersection using the two indexes is advantageous. The query optimizer automatically hashes the two indexes together to save I/O while executing the query. No query hints were required. Queries handled by covering indexes (whether by explicitly declared covering indexes or index intersection) are called covered queries.
For more information, see SQL Server Books Online.
How indexes are chosen significantly affects the amount of disk I/O generated and, subsequently, performance. Nonclustered indexes are appropriate for retrieving a few rows and clustered indexes are good for range scans. In addition, you should try to keep indexes compact (few columns and bytes). This is especially true for clustered indexes because nonclustered indexes use the clustered index to locate row data. For more information, see SQL Server Books Online.
Consider selectivity for nonclustered indexes because, if a nonclustered index is created on a large table with only a few unique values, use of that nonclustered index does not save I/O during data retrieval. In fact, using the index causes much more I/O than a sequential table scan of the table. Possible candidates for a nonclustered index include invoice numbers, unique customer numbers, social security numbers, and telephone numbers.
Clustered indexes are much better than nonclustered indexes for queries that match columns or search for ranges of columns that do not have many unique values, because the clustered index physically orders the table data and allows for sequential 64-KB I/O on the key values. Possible candidates for a clustered index include states, company branches, date of sale, zip codes, and customer district. Defining a clustered index on the columns that have unique values is not beneficial unless typical queries on the system fetch large sequential ranges of the unique values. To pick the best column on each table to create the clustered index, ask if there will be many queries that must fetch many rows based on the order of this column. The answer is very specific to each user environment. One company may do more queries based on ranges of dates, whereas another company may do many queries based on ranges of bank branches.
These are examples of WHERE clauses that benefit from clustered indexes:
WHERE column_name >some_value
WHERE column_name BETWEEN some_value AND some_value
WHERE column_name < some_value
Clustered Index Selection
Clustered index selection involves two major steps: First, determine the column of the table that will benefit most from the clustered index by providing sequential I/O for range scans, and second, use the clustered index to affect the physical placement of table data while avoiding hot spots. A hot spot occurs when data is placed on hard disk drives so that, as a result, many queries try to read or write data in the same area of the disk(s) at the same time. This creates a disk I/O bottleneck because more concurrent disk I/O requests are received by the hard disk than it can handle. The solution is either to stop fetching as much data from this disk or to spread the data across multiple disks to support the I/O demand. This consideration for the physical placement of data can be critical for good concurrent access to data among hundreds or thousands of SQL Server users.
These two decisions often conflict with one another, and the best decision is to balance the two. In high user-load environments, improved concurrency (by avoiding hot spots) can be more valuable than the performance benefit gained by placing the clustered index on that column.
With SQL Server 7.0, nonclustered indexes use the clustered index to locate data rows if there is a clustered index present on the table. Because all nonclustered indexes must hold the clustered keys within their B-tree structures, it is better for performance to keep the overall byte size of the clustered index keys as small as possible. Keep the number of columns in the clustered index to a minimum and carefully consider the byte size of each of the columns chosen to be included in a clustered index. This helps reduce the size of the clustered index and subsequently, all nonclustered indexes on a table. Smaller index B-tree structures can be read more quickly and help improve performance. For more information, see SQL Server Books Online.
In earlier versions of SQL Server, tables without clustered indexes insert rows at the end of the table stored on the disk. This can create a hot spot at the end of a busy table. The SQL Server 7.0 storage management algorithms provide free space management that removes this behavior. When rows are inserted in heaps, SQL Server uses the Page Free Space (PFS) pages to quickly locate available free space in the table in which the row is inserted. PFS pages find free space throughout the table, which recovers deleted space and avoids insertion hot spots. Free space management affects clustered index selection. Because clustered indexes affect physical data placement, hot spots can occur when a clustered index physically sequences based on a column in which many concurrent inserts occur at the highest column value and are at the same physical disk location. For columns with monotonically increasing values, a clustered index sequentially orders data rows on disk by that column. By placing the clustered index on another column or by not including a clustered index on the table, this sequential data placement moves to another column or does not occur at all.
Another way to think about hot spots is within the context of selects. If many users select data with key values that are very close to but are not in the same rows, most disk I/O activity occurs within the same physical region of the disk I/O subsystem. This disk I/O activity can be spread out more evenly by defining the clustered index for this table on a column that spreads these key values evenly across the disk. If all selects are using the same unique key value, then using a clustered index does not help balance the disk I/O activity of this table. By using RAID (either hardware or software), you can alleviate this problem by spreading the I/O across many disk drives. This behavior can be described as disk access contention. It is not locking contention.
Clustered Index Selection Scenario
A scenario can illustrate clustered index selection. For example, a table contains an invoice date column, a unique invoice number column, and other data. About 10,000 new records are inserted into this table every day, and the SQL queries often search this table for all records for one week of data. Many users have concurrent access to this table. The invoice number is not a candidate for the clustered index. The invoice number is unique, and users do not usually search on ranges of invoice numbers; therefore, placing invoice numbers physically in sequential order on disk is not appropriate. Next, the values for invoice number increase monotonically (1001,1002,1003, and so on). If the clustered index is placed on invoice number, inserts of new rows into this table occur at the end of the table beside the highest invoice number on the same physical disk location, and create a hot spot.
Consider the invoice date column. To maximize sequential I/O, the invoice date column is a candidate for a clustered index because users often search for one week of data (about 70,000 rows). But for concurrency, the invoice date column may not be a candidate for the clustered index. If the clustered index is placed on an invoice date, all data tends to be inserted at the end of the table, and a hot spot can occur on the hard disk that holds the end of the table. The insertions at the end of the table are offset by the 10,000 rows that are inserted for the same date, therefore, invoice date is less likely to create a hot spot than invoice number. Also, a hardware RAID controller helps spread out the 10,000 rows across multiple disks, which can minimize the possibility of an insertion hot spot.
There is no perfect answer to this scenario. You can place the clustered index on invoice date to speed up queries involving invoice date ranges, even at the risk of hot spots. In this case, you should monitor disk queuing on the disks associated with this table for possible hot spots. It is recommended that you define the clustered index on invoice date because of the benefit to range scans based on invoice date and so that invoice numbers are not physically sequential on disk.
In this example, a table consists of the invoice number, invoice date, invoice amount, sales office where the sale originated, and other data. Suppose 10,000 records are inserted into this table every day, and users often query invoice amounts based on sales office. Sales office should be the column on which the clustered index is created because that is the range on which scans are based. Newly inserted rows will have a mix of sales offices; inserts should be spread evenly across the table and across the disks on which the table is located.
In some cases, range scans may not be an issue. For example, a very large employee table has employee number, social security number, and other data. As rows are inserted, employee number is incremented. There are 100,000 retrievals from this table every day and each retrieval is a single record fetch based on social security number. A nonclustered index created on social security number provides excellent query performance in this scenario. A clustered index on social security number provides slightly better query performance than the nonclustered index, but may be excessive because range scans are not involved. If there will be only one index on this table, place the clustered index on the social security number column. The question then is whether to define a clustered index on this table. In earlier versions of SQL Server, it is important to define a clustered index on a table even if it is not required for queries, because it helps with deleted row space recovery. This is not an issue with the SQL Server 7.0 space allocation algorithms and storage structures.
The recommendation in this example is to create the clustered index on social security number, because the social security number has data distributed so it does not follow the sequential pattern of employee number, and social security number tends to have an even distribution. If a clustered index is created on this evenly distributed column data, the employee records are evenly distributed on disk. This distribution, in conjunction with FILLFACTOR and PAD_INDEX, provides open data page areas throughout the table to insert data. Assuming that newly inserted employee records have an even distribution of social security numbers, the employee table fills evenly and page splitting is avoided. If a column with even distribution does not exist on the table, it is worthwhile to create an integer column on the table and populate the column with values that are evenly distributed and then create the clustered index column. This "filler" or "dummy" column with a clustered index defined on it is not being used to query, but to distribute data I/O across disk drives evenly to improve table access concurrency and overall I/O performance. This can be an effective methodology with large and heavily accessed SQL tables.
Another possible solution in this example is to refrain from creating a clustered index on this table. In this case, SQL Server 7.0 manages all aspects of space management. SQL Server finds a free space to insert the row, reuses space from deleted rows, and automatically reorganizes physical ordering of data pages on disk when it makes sense (to allow greater amounts of sequential I/O). The reorganization of data pages happens during database file autoshrink operations. For more information, see SQL Server Books Online.
FILLFACTOR and PAD_INDEX
If a SQL Server database is experiencing a large amount of insert activity, you should plan to provide and maintain open space on index and data pages to prevent page splitting. Page splitting occurs when an index page or data page can no longer hold any new rows and a row must be inserted into the page because of the logical ordering of data defined in that page. When this occurs, SQL Server must divide the data on the full page and move about half of the data to a new page so that both pages have some open space. This consumes system resources and time.
When indexes are built initially, SQL Server places the index B-tree structures on contiguous physical pages, which supports optimal I/O performance by scanning the index pages with sequential I/O. When page splitting occurs and new pages must be inserted into the logical B-tree structure of the index, SQL Server must allocate new 8-KB index pages somewhere. This occurs elsewhere on the hard disk drive and breaks up the physically sequential index pages, which switches I/O operations from sequential to nonsequential and cuts performance in half. Excessive page splitting should be resolved by rebuilding the index to restore the physically sequential order of the index pages. This same behavior can be encountered on the leaf level of the clustered index, which affects the data pages of the table.
Use Performance Monitor to watch the SQL Server: Access Methods - Page Splits counter. Nonzero values for this counter indicate page splitting. Further analysis should be done with the DBCC SHOWCONTIG statement. For more information about how to use this statement, see SQL Server Books Online.
The DBCC SHOWCONTIG statement can reveal excessive page splitting on a table. Scan Density, a key indicator that DBCC SHOWCONTIG provides, should be a value as close to 100 percent as possible. If this value is below 100 percent, rebuild the clustered index on that table by using the DROP_EXISTING option to defragment the table. The DROP_EXISTING option of the CREATE INDEX statement permits re-creating existing indexes and provides better index rebuild performance than dropping and re-creating the index. For more information, see SQL Server Books Online.
The FILLFACTOR option on the CREATE INDEX and DBCC DBREINDEX statements provides a way to specify the percentage of open space to leave on index and data pages. The PAD_INDEX option for CREATE INDEX applies what has been specified for FILLFACTOR on the nonleaf-level index pages. Without the PAD_INDEX option, FILLFACTOR mainly affects the leaf-level index pages of the clustered index. You should use the PAD_INDEX option with FILLFACTOR. For more information, see SQL Server Books Online.
The optimal value to specify for FILLFACTOR depends on how much new data is inserted into an 8-KB index and data page within a given time frame. Keep in mind that SQL Server index pages typically contain many more rows than data pages because index pages contain only the data for columns associated with that index, whereas data pages hold the data for the entire row. Also consider how often there will be a maintenance window that permits the rebuilding of indexes to avoid page splitting. Strive to rebuild the indexes only as the majority of the index and data pages have become filled with data by properly selecting clustered index for a table. If the clustered index distributes data evenly so that new row inserts into the table occur across all of the data pages associated with the table, then the data pages will fill evenly. This provides time before page splitting occurs and forces you to rebuild the clustered index. FILLFACTOR should be selected based partly on the estimated number of rows that will be inserted within the key range of an 8-KB page in a certain time frame, and partly by how often scheduled index rebuilds can occur on the system.
You must make a decision based on the performance trade-offs between leaving a lot of open space on pages and page splitting. A small specified percentage for FILLFACTOR leaves large open spaces on the index and data pages, which helps avoid page splitting but also negates some performance gained by compressing data onto a page. SQL Server performs faster if more data is compressed on index and data pages because it can fetch more data with fewer pages and I/Os. Specifying too high a FILLFACTOR may leave too little open space on pages and can allow pages to overflow too quickly, which causes page splitting.
Before using FILLFACTOR and PAD_INDEX, remember that reads tend to outnumber writes, even in an online transaction processing (OLTP) system. Using FILLFACTOR slows down all reads, because it spreads tables over a wider area (reduction of data compression). Before using FILLFACTOR and PAD_INDEX, you should use Performance Monitor to compare SQL Server reads to SQL Server writes and use these options only if writes are more than 30 percent of reads.
If writes are a substantial percentage of reads, the best approach in a busy OLTP system is to specify as high a FILLFACTOR as will leave a minimal amount of free space per 8-KB page and still prevent page splitting and allow SQL Server to reach the next available time window for rebuilding the index. This method balances tuning I/O performance (keeping the pages as full as possible) and avoiding page splits (not letting pages overflow). You can experiment by rebuilding the index with varying FILLFACTOR values and then simulating load activity on the table to validate an optimal value for FILLFACTOR. After the optimal FILLFACTOR value has been determined, automate the scheduled rebuilding of the index as a SQL Server task. For more information, see SQL Server Books Online.
SQL Server Performance Tuning Tools
Microsoft SQL Server version 7.0 includes several tools that can assist database administrators with performance tuning.
Sample Data and Workload
This example shows how to use SQL Server performance tools. First, the table is constructed:
CREATE TABLE testtable (nkey1 int IDENTITY, col2 char(300) DEFAULT 'abc', ckey1 char(1))
Next, the table is loaded with 10,000 rows of test data:
DECLARE @counter int SET @counter = 1 WHILE (@counter <= 2000) BEGIN INSERT testtable (ckey1) VALUES ('a') INSERT testtable (ckey1) VALUES ('b') INSERT testtable (ckey1) VALUES ('c') INSERT testtable (ckey1) VALUES ('d') INSERT testtable (ckey1) VALUES ('e') SET @counter = @counter + 1 END
These queries comprise the database server workload:
SELECT ckey1,col2 FROM testtable WHERE ckey1 = 'a' select nkey1,col2 FROM testtable WHERE nkey1 = 5000
SQL Server Profiler
SQL Server Profiler records detailed information about activity occurring on the database server. SQL Server Profiler can be configured to watch and record one or many users executing queries on SQL Server and to provide a widely configurable amount of performance information, including I/O statistics, CPU statistics, locking requests, Transact-SQL and RPC statistics, index and table scans, warnings and errors raised, database object create/drop, connection connect/disconnects, stored procedure operations, cursor operation, and more. For more information about what SQL Server Profiler can record, see SQL Server Books Online.
Using SQL Server Profiler with Index Tuning Wizard
SQL Server Profiler and Index Tuning Wizard can be used together to help database administrators create proper indexes on tables. SQL Server Profiler records resource consumption for queries into a .trc file. The .trc file can be read by Index Tuning Wizard, which evaluates the .trc information and the database tables, and then provides recommendations for indexes that should be created. Index Tuning Wizard can either automatically create the proper indexes for the database by scheduling the automatic index creation or generate a Transact-SQL script that can be reviewed and executed later.
These are the steps for analyzing a query load:
To set up SQL Server Profiler (Enterprise Manager)
On the Tools menu, click SQL Server Profiler.
On the File menu, point to New, and then click Trace.
Type a name for the trace.
Select Capture to file, then select a .trc file to which to output the SQL Server Profiler information.
To run the workload (Enterprise Manager)
On the Tools menu, click SQL Server Query Analyzer.
Connect to SQL Server and set the current database to be where the table was created.
Enter these queries into the query window of SQL Server Query Analyzer:
SELECT ckey1,col2 FROM testtable WHERE ckey1 = 'a' SELECT nkey1,col2 FROM testtable WHERE nkey1 = 5000
On the Query menu, click Execute.
To stop SQL Server Profiler
On the File menu, click Stop Traces.
In the Stop Selected Traces dialog box, choose the traces to stop.
To load the .trc file into the Index Tuning Wizard (SQL Server Profiler)
On the Tools menu, click Index Tuning Wizard, and then click Next.
Select the database to analyze, and then click Next.
Make sure I have a saved workload file is selected, and then click Next.
Select My workload file, locate the .trc file created with SQL Server Profiler, click OK, and then click Next.
In Select Tables to Tune, select the tables, and then click Next.
In Index Recommendations, select the indexes to create, and then click Next.
Select the preferred option, and then click Next.
This is the Transact-SQL generated by Index Tuning Wizard for the sample database and workload:
/* Created by: Index Tuning Wizard */ /* Date: 9/7/98 */ /* Time: 6:42:00 PM */ /* Server: HENRYLNT2 */ /* Database : test */ /* Workload file : E:\Mssql7\Binn\Profiler_load.sql */ USE [test] BEGIN TRANSACTION CREATE CLUSTERED INDEX [testtable2] ON [dbo].[testtable] ([ckey1]) if (@@error <> 0) rollback transaction CREATE NONCLUSTERED INDEX [testtable1] ON [dbo].[testtable] ([nkey1]) if (@@error <> 0) rollback transaction COMMIT TRANSACTION
The indexes recommended by Index Tuning Wizard for the sample table and data are expected. There are only five unique values for ckey1 and 2,000 rows of each value. Because one of the sample queries (SELECT ckey1, col2 FROM testtable WHERE ckey1 = a) requires retrieval from the table based on one of the values in ckey1, it is appropriate to create a clustered index on the ckey1 column. The second query (SELECT nkey1, col2 FROM testtable WHERE nkey1 = 5000) fetches one row based on the value of the column nkey1. nkey1 is unique, and there are 10,000 rows; therefore, it is appropriate to create a nonclustered index on this column.
SQL Server Profiler and Index Tuning Wizard are powerful tools in database server environments in which there are many tables and queries. Use SQL Server Profiler to record a .trc file while the database server is experiencing a representative set of queries. Then load the .trc file into Index Tuning Wizard to determine the proper indexes to build. Follow the prompts in Index Tuning Wizard to automatically generate and schedule index creation jobs to run at off-peak times. Run SQL Server Profiler and Index Tuning Wizard regularly (perhaps weekly) to see if queries executed on the database server have changed significantly, thus possibly requiring different indexes. Regular use of SQL Server Profiler and Index Tuning Wizard can keep SQL Server running in top form as query workloads change and database size increase over time.
For more information, see SQL Server Books Online.
Analyzing SQL Server Profiler Information
SQL Server Profiler provides an option to log information into a SQL Server table. When it is complete, the table can be queried to determine if specific queries are using excessive resources.
To log SQL Server Profiler information into a SQL Server table (Enterprise Manager)
On the Tools menu, click SQL Server Profiler.
On the File menu, point to New, and then click Trace.
Type a name for the trace, then select Capture to Table.
In the Capture to Table dialog box, enter a SQL Server table name to which to output the SQL Server Profiler information. Click OK.
On the File menu, click Stop Traces.
In the Stop Traces dialog box, choose the traces to stop.
For more information, see SQL Server Books Online.
SQL Server Query Analyzer
After the information is recorded into the SQL Server table, you can use SQL Server Query Analyzer to determine which queries on the system are consuming the most resources, and database administrators can concentrate on improving the queries that need the most help. For example, this query is typical of the analysis performed on data recorded from SQL Server Profiler into a SQL Server table:
SELECT TOP 3 TextData,CPU,Reads,Writes,Duration FROM profiler_out_table ORDER BY cpu desc
The query retrieves the top three consumers of CPU resources on the database server. Read and write I/O information, along with the duration of the queries in milliseconds is returned. If a large amount of information is recorded with the SQL Server Profiler, you should create indexes on the table to help speed analysis queries. For example, if CPU is going to be an important criteria for analyzing this table, you should create a nonclustered index on CPU column.
SQL Server Query Analyzer provides a Show stats I/O option under the General tab of the Connections Options dialog box. Select this checkbox for information about how much I/O is being consumed for the query just executed in SQL Server Query Analyzer.
For example, the query SELECT ckey1, col2 FROM testtable WHERE ckey1 = a returns this I/O information in addition to the result set when the Show stats I/O connection option is selected:
Table 'testtable'. Scan count 1, logical reads 400, physical reads 382, read-ahead reads 400.
Similarly, the query SELECT nkey1, col2 FROM testtable WHERE nkey1 = 5000 returns this I/O information in addition to the result set when the Show stats I/O connection option is selected:
Table 'testtable'. Scan count 1, logical reads 400, physical reads 282, read-ahead reads 400.
Using STATISTICS I/O is a good way to monitor the effect of query tuning. For example, create the two indexes on this sample table as recommended by Index Tuning Wizard and then run the queries again.
In the query SELECT ckey1, col2 FROM testtable WHERE ckey1 = a, the clustered index improved performance as indicated below. The query must fetch 20 percent of the table; therefore, the performance improvement is reasonable.
Table 'testtable'. Scan count 1, logical reads 91, physical reads 5, read-ahead reads 32.
In the query SELECT nkey1, col2 FROM testtable WHERE nkey1 = 5000, the creation of the nonclustered index had a dramatic effect on the performance of the query. Because only one row of the 10,000 row table must be retrieved for this query, the performance improvement with the nonclustered index is reasonable.
Table 'testtable'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0.
Showplan can be used to display detailed information about what the query optimizer is doing. SQL Server 7.0 provides text and graphical versions of Showplan. Graphical Showplan output can be displayed in the Results pane of SQL Server Query Analyzer by executing a Transact-SQL query with Ctrl+L. Icons indicate the operations that the query optimizer will perform if it executes the query. Arrows indicate the direction of data flow for the query. Details about each operation can be shown by holding the mouse pointer over the operation icon. The equivalent information can be returned in text-based showplan by executing SET SHOWPLAN_ALL ON. To reduce the query optimizer operation details from text-based showplan, execute SET SHOWPLAN_TEXT ON.
For more information, see SQL Server Books Online.
Examples of Showplan Output
This section shows sample showplan plan output using the following example queries and the set showplan_text on option in SQL Server Query Analyzer.
SELECT ckey1,col2 FROM testtable WHERE ckey1 = 'a'
Text-based showplan output:
|--Clustered Index Seek(OBJECT:([test].[dbo].[testtable].[testtable2]), SEEK:([testtable].[ckey1]='a') ORDERED)
This query takes advantage of the clustered index on column ckey1, as indicated by Clustered Index Seek.
The illustration shows equivalent graphical showplan output.
If the clustered index is removed from the table, the query must use a table scan. The showplan output below indicates the change in behavior.
Text-based showplan output:
|--Table Scan(OBJECT:([test].[dbo].[testtable]), WHERE:([testtable].[ckey1]='a'))
The illustration shows equivalent graphical showplan output.
Table scans are the most efficient way to retrieve information from small tables. But on larger tables, table scans indicated by showplan are a warning that the table may need better indexes or that the existing indexes must have their statistics updated (by using the UPDATE STATISTICS statement). SQL Server 7.0 provides automatically updating indexes. You should let SQL Server automatically maintain index statistics because the maintenance helps guarantee queries will always work with good index statistics.
SELECT nkey1,col2 FROM testtable WHERE nkey1 = 5000
Text-based showplan output:
|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([test].[dbo].[testtable])) |--Index Seek(OBJECT:([test].[dbo].[testtable].[testtable1]), SEEK:([testtable].[nkey1]=5000) ORDERED)
The illustration shows equivalent graphical showplan output.
This query uses the nonclustered index on the column nkey1, which is indicated by the Index Seek operation on the column nkey1. The Bookmark Lookup operation indicates that SQL Server must perform a bookmark lookup from the index page to the data page of the table to retrieve the requested data. The bookmark lookup was required because the query asked for the column col2, which was not part of the nonclustered index.
SELECT nkey1 FROM testtable WHERE nkey1 = 5000
Text-based showplan output:
|--Index Seek(OBJECT:([test].[dbo].[testtable].[testtable1]), SEEK:([testtable].[nkey1]=[@1]) ORDERED)
The illustration shows equivalent graphical showplan output.
This query uses the nonclustered index on nkey1 as a covering index. No bookmark lookup operation was needed for this query because all of the information required for the query (both SELECT and WHERE clauses) is provided by the nonclustered index. A bookmark lookup to the data pages is not required from the nonclustered index pages. I/O is reduced compared to the case in which a bookmark lookup was required.
Windows NT Performance Monitor provides information about Windows and SQL Server operations occurring on the database server. For SQL Server specific counters, see SQL Server Books Online.
In Performance Monitor graph mode, note the Max and Min values. Do not emphasize the average because polarized data points can affect this value. Study the graph shape and compare to Min and Max to gather an accurate understanding of the behavior. Use BACKSPACE to highlight counters.
You can use Performance Monitor to log all available Windows NT and SQL Server performance monitor objects and counters in a log file and also to view Performance Monitor interactively in chart mode. Setting a sampling interval determines how quickly the log file grows. Log files can get big fast (for example, 100 MG in one hour with all counters turned on and a sampling interval of 15 seconds). The test server should have a couple of gigabytes free to store these files. But if conserving space is important, try running with a large log interval of 30 or 60 seconds. By doing so, Performance Monitor does not sample the system as often, and all of the counters are resampled with reasonable frequency, but a smaller log file size is maintained.
Performance Monitor also consumes some CPU and Disk I/O resources. If a system does not have much disk I/O and/or CPU to spare, consider running Performance Monitor from another computer to monitor the server running SQL Server over the network. This applies to Performance Monitor graph mode only. When using log mode, it is more efficient to log Performance Monitor information locally on the server running SQL Server. If you must use log mode over the network, then consider reducing the logging to only the most critical counters.
You should log all counters available during performance test runs into a file for analysis later. Configure Performance Monitor to log all counters into a log file and at the same time monitor the most interesting counters in one of the other modes, such as graph mode. This way, all of the information is recorded, but the most interesting counters are presented in an uncluttered Performance Monitor graph while the performance run is taking place.
To start the logging feature (Performance Monitor)
On the View menu, click Log.
In the Add to Log dialog box, select the counters to add to the log.
Click Add, and then click Done.
On the Options menu, click Log.
In File Name, enter the name of the file into which the performance information will be logged.
Click Start Log.
To stop the logging feature (Performance Monitor)
On the Options menu, click Log.
Click Stop Log.
To load the logged information into Performance Monitor (Performance Monitor)
On the View menu, click Log.
On the Options menu, click Data From, and then select Log File.
Click browse (…), then double-click the file.
In the Data From dialog box, click OK.
On the View menu, click Chart, and then click (+).
Click the (+) button.
In the Add to Chart dialog box, add desired counters to the graphical display by highlighting the object/counter combination, and then click Add.
To relate Performance Monitor logged events back to a point in time
Follow the steps for How to load information into Performance Monitor.
On the Edit menu, click Time Window.
In the Input Log File Timeframe dialog box, you can adjust the start and stop time window of the logged data by clicking and holding down the mouse button on the slidebars.
Click OK to reset the chart to display only data logged for the selected time window.
Key Performance Monitor Counters
You can observe several Performance Monitor disk counters. To enable these counters, run the diskperf –y command from a Windows NT command window and restart Windows NT. The diskperf -y command consumes some resources on the database server, but it is worthwhile to run the diskperf -y command on all production SQL Server servers. As a result, disk queuing problems can be confirmed immediately and all Performance Monitor counters are immediately available to diagnose disk I/O issues. If disk I/O counters are required, the diskperf -y command must be executed, and Windows NT must be restarted before the disk I/O counters will report data in Performance Monitor.
(Physical or Logical) Disk Queue > 2
Physical hard disk drives that experience disk queuing hold back disk I/O requests while they catch up on I/O processing. SQL Server response time is degraded for these drives, which costs query execution time.
If you use RAID, you must know how many physical hard disk drives are associated with each drive array that Windows NT interprets as a single physical drive so you can calculate disk queuing per physical drive. A hardware expert can explain the SCSI channel and physical drive distribution to help you understand how SQL Server data is held by each physical drive and how much SQL Server data is distributed on each SCSI channel.
There are several choices for viewing disk queuing in Performance Monitor. Logical disk counters are associated with the logical drive letters assigned by the Windows NT Disk Administrator, whereas physical disk counters are associated with what Windows NT Disk Administrator interprets as a single physical disk device. What looks like a single physical device to the Windows NT Disk Administrator may be either a single hard disk drive or a RAID array, which consists of several hard disk drives. The Current Disk Queue counter is an instantaneous measure of disk queuing, whereas the Average Disk Queue counter averages the disk queuing measurement over the Performance Monitor sampling period. Take note of any counter in which Logical Disk: Average Disk Queue is greater than 2, Physical Disk: Average Disk Queue is greater than 2, Logical Disk: Current Disk Queue is greater than 2, or Physical Disk: Average Disk Queue is greater than 2.
These recommended measurements are specified per physical hard disk drive. If a RAID array is associated with a disk queue measurement, the measurement must be divided by the number of physical hard disk drives in the RAID array to determine the disk queuing per physical hard disk drive.
On physical hard disk drives or RAID arrays that hold SQL Server log files, disk queuing is not a useful measure because SQL Server log manager does not queue more than a single I/O request to SQL Server log file(s). For more information, see SQL Server Books Online.
System: Processor Queue Length > 2 (per CPU)
When the System: Processor Queue Length counter is greater than 2, the server processors are receiving more work requests than they can handle as a group; therefore, Windows must place these requests in a queue.
Some processor queuing can be an indicator of good SQL Server I/O performance. If there is no processor queuing and if CPU use is low, it can be an indication of a performance bottleneck somewhere in the system, and most likely in the disk subsystem. A reasonable amount of work in the processor queue means that the CPUs are not idle and the rest of the system is keeping pace with the CPUs.
A general rule for determining an optimal processor queue number is to multiply the number of CPUs on the database server by 2.
Processor queuing significantly above 2 per CPU should be investigated. Excessive processor queuing costs query execution time. Eliminating hard and soft paging can help save CPU resources. Other methods that help reduce processor queuing include tuning SQL queries, picking better SQL indexes to reduce disk I/O (and hence CPU), or adding more CPUs (processors) to the system.
Hard Paging - Memory: Pages/Sec > 0 or Memory: Page Reads/Sec > 5
Memory: Pages/Sec greater than 0 or Memory: Page Reads/Sec greater than 5 means that Windows is going to disk to resolve memory references (hard page fault), which costs disk I/O and CPU resources. The Memory: Pages/Sec counter is a good indicator of the amount of paging that Windows is performing and the adequacy of the database server RAM configuration. A subset of the hard paging information in Performance Monitor is the number of times per second Windows had to read from the paging file to resolve memory references, which is represented by the Memory: Pages Reads/Sec counter. A value of Memory: Pages Reads/Sec greater than 5 is bad for performance.
Automatic SQL Server memory tuning adjusts SQL Server memory use dynamically so that paging is avoided. A small number of pages per second is normal but excessive paging requires corrective action.
If SQL Server is automatically tuning memory, then adding more RAM or removing other applications from the database server are options to help bring the Memory: Pages/Sec counter to a reasonable level.
If SQL Server memory is being configured manually on the database server, it may be necessary to reduce memory given to SQL Server, remove other applications from the database server, or add more RAM to the database server.
Keeping Memory: Pages/Sec at or close to 0 helps database server performance because Windows and all its applications (including SQL Server) are not going to the paging file to satisfy any data in memory requests; therefore, the amount of RAM on the server is sufficient. A Pages/Sec value slightly greater than 0 is not critical, but a relatively high performance penalty (Disk I/O) is paid every time data is retrieved from the paging file rather than from RAM.
You should understand the difference between the Memory: Pages Input/sec counter and the Memory: Pages Reads/sec counter. The Memory: Pages Input/sec counter indicates the actual number of Windows 4-KB pages being brought from disk to satisfy page faults. The Memory: Pages Reads/sec counter indicates how many disk I/O requests are made per second to satisfy page faults, which provides a slightly different point of view. A single page read can contain several Windows 4-KB pages. Disk I/O performs better as the packet size of data increases (64 KB or more), so it can be worthwhile to consider these counters at the same time. For a hard disk drive, completing a single read or write of 4 KB costs almost as much time as a single read or write of 64 KB. Consider this situation: 200 page reads consisting of eight 4-KB pages per read could finish faster than 300 page reads consisting of one 4-KB page. And we are comparing 1600 4-KB page reads finishing faster than 300 4-KB page reads. The key to all disk I/O analysis is to watch not only the number of disk bytes/sec but also the disk transfers/sec. For more information, see "Disk I/O Counters" and "The EMC Disk I/O Tuning Scenario" later in this chapter.
It is useful to compare the Memory: Page Input/sec counter to the Logical Disk: Disk Reads/sec counter across all drives associated with the Windows NT paging file, and the Memory: Page Output/sec counter to the Logical Disk: Disk Writes/sec counter across all drives associated with the Windows paging file. They provide a measure of how much disk I/O is strictly related to paging as opposed to other applications, such as SQL Server. Another way to isolate paging file I/O activity is to ensure the paging file is located on a separate set of drives from all other SQL Server files. Separating the paging file from the SQL Server files also can help disk I/O performance, because disk I/O associated with paging can be performed in parallel to disk I/O associated with SQL Server.
Soft Paging - Memory: Pages Faults/Sec > 0
A Memory: Pages Faults/Sec counter greater than 2 indicates that Windows NT is paging but includes hard and soft paging within the counter. Soft paging means that there are application(s) on the database server that requests memory pages still inside RAM but outside of the application's Working Set. The Memory: Page Faults/Sec counter is helpful for deriving the amount of soft paging that occurs. There is no counter called soft faults per second. Instead, calculate the number of soft faults happening per second this way:
Memory: Pages Faults/sec - Memory: Pages Input/sec = Soft Page Faults per Second
To determine if SQL Server is causing excessive paging, monitor the Process: Page Faults/sec counter for the SQL Server process and note whether the number of page faults per second for Sqlservr.exe is similar to the number of pages per second.
Because soft faults consume CPU resources and hard faults consume disk I/O resources, soft faults are better than hard faults for performance. The best environment for performance is to have no faulting of any kind.
Until SQL Server accesses all of its data cache pages for the first time, the initial access to each page causes a soft fault, so do not be concerned if soft faulting occurs under these circumstances. For more information, see "Monitoring Processors" later in this chapter.
For more information on memory tuning, see SQL Server Books Online.
Keep all server processors busy enough to maximize performance but not so busy that processor bottlenecks occur. The performance tuning challenge is to determine the source of the bottleneck. If CPU is not the bottleneck, then a primary candidate is the disk subsystem, and the CPU is being wasted. CPU is the most difficult resource to expand above a specific configuration level, such as four or eight on many current systems, so it is a good sign when CPU utilization is above 95 percent. In addition, the response time of transactions should be monitored. Greater than 95 percent CPU use can mean that the workload is too much for the available CPU resources and either the CPU must be increased or the workload must be reduced or tuned.
Monitor the Processor: Processor Time % counter to ensure all processors are consistently below 95 percent utilization on each CPU. The System: Processor Queue counter is the processor queue for all CPUs on a Windows NT–based system. If the System: Processor Queue counter value is greater than 2 per CPU, there is a CPU bottleneck, and it is necessary to either add processors to the server or reduce the workload on the system. Reducing workload can be accomplished by query tuning or improving indexes to reduce I/O and subsequently CPU usage.
Another counter to monitor when a CPU bottleneck is suspected is System: Context Switches/sec because it indicates the number of times per second that Windows NT and SQL Server had to change from executing on one thread to executing on another. Context switching is a normal component of a multithreaded, multiprocessor environment, but excessive context switching slows down a system. Only be concerned about context switching if there is processor queuing. If processor queuing is observed, use the level of context switching as a gauge when you performance tune SQL Server. Consider using the lightweight pooling option so that SQL Server switches to a fiber-based scheduling model versus the default thread-based scheduling model. Think of fibers as lightweight threads. Use command sp_configure 'lightweight pooling', 1 to enable fiber-based scheduling. Watch processor queuing and context switching to monitor the effect.
For more information about I/O, memory, and CPU use mapped to SPID, see the DBCC SQLPERF (THREADS) statement in SQL Server Books Online.
Disk I/O Counters
The Disk Write Bytes/sec and Disk Read Bytes/sec counters provide the data throughput in bytes per second per logical drive. Weigh these numbers carefully along with the values of the Disk Reads/sec and Disk Writes/sec counters. Do not allow few bytes per second to lead you to believe that the disk I/O subsystem is not busy. A single hard disk drive is capable of supporting a total of 75 nonsequential and 150 sequential disk reads and disk writes per second.
Monitor the Disk Queue Length counter for all drives associated with SQL Server files and determine which files are associated with excessive disk queuing.
If Performance Monitor indicates that some drives are not as busy as others, you can move SQL Server files from drives that are bottlenecking to drives that are not as busy. This spreads disk I/O activity more evenly across hard disk drives. If one large drive pool is used for SQL Server files, the resolution to disk queuing is to make the I/O capacity of the pool bigger by adding more physical drives to the pool.
Disk queuing may be a symptom that one SCSI channel is saturated with I/O requests. Performance Monitor cannot directly detect if this is true. Hardware vendors can provide tools to detect the amount of I/O serviced by a RAID controller and whether the controller is queuing I/O requests. This is more likely to occur if many disk drives (10 or more) are attached to the SCSI channel, and they all perform I/O at full speed. To resolve this issue, take half of the disk drives and connect them to another SCSI channel or RAID controller to balance the I/O. Rebalancing drives across SCSI channels requires a rebuild of the RAID arrays and full backup and restore of the SQL Server database files.
Performance Monitor Graph Output
The illustration shows typical counters that Performance Monitor uses to observe performance. The Processor Queue Length counter is being observed. Click BackSpace to highlight the current counter. This helps to distinguish the current counter from other counters being observed and can be particularly helpful when observing many counters at the same time with Performance Monitor.
The Max value for Processor Queue Length is 22.000. The values of Max, Min, and Average for the Performance Monitor Graph cover only the current time window for the graph, as indicated by Graph Time. By default, graph time covers 100 seconds. To monitor longer periods and to ensure getting representative Max, Min, and Average values for those time periods, use the logging feature of Performance Monitor.
The shape of the Processor Queue Length graph line indicates that the Max of 22 occurred only for a short period. But there is a period preceding the 22 value when Processor Queue Length is greater than 5, 100 percent is 22. And there is a period prior to the 22 value when the graph has values above 25 percent, which is approximately 5. In this example, the database server HENRYLNT2 has only one processor and should not sustain Processor Queue Length greater than 2. Therefore, Performance Monitor indicates that the processor on this computer is being overtaxed. Further investigation should be made to reduce the load on the processor or more processors should be added to HENRYLNT2 to handle these periods of higher processor workloads.
Other Performance Topics
This section presents other factors that can influence the performance of Microsoft SQL Server.
Reduce Network Traffic and Resource Consumption
When you work with the relatively easy to use SQL Server interfaces such as Microsoft ActiveX Data Objects (ADO), Remote Data Objects (RDO), and Data Access Objects (DAO) database APIs, you must remain aware of the result sets they are building. ADO, RDO, and DAO provide database development interfaces that allow rich database rowset functionality without requiring a lot of database programming experience. You can encounter performance problems if you neither account for the data your application is returning to the client nor remain aware of where the SQL Server indexes are placed and how the SQL Server data is arranged. SQL Server Profiler, Index Tuning Wizard, and showplan are helpful tools for pinpointing and fixing problem queries.
Look for opportunities to reduce the size of the result set by eliminating columns in the select list that do not have to be returned, or by returning only the required rows. This reduces I/O and CPU consumption.
For more information, see SQL Server Books Online.
You may avoid deadlocking if applications accessing SQL Server are built so that transactions access tables in the same chronological order across all user transactions. You should explain the concept of chronological table access to SQL Server application developers as early as possible during the application design process to avoid deadlocking problems, which are expensive to solve later.
By reducing SQL Server query I/O and shortening transaction time you can help prevent deadlocking. This can make queries faster, ensure lock resources are held for a shorter period, and reduce all locking contention (including deadlocking). Use the SQL Server Query Analyzer Show stats I/O option to determine the number of logical page fetches associated with large queries. Use the SQL Server Query Analyzer Show query plan option to review index use, and then consider a SQL Server query redesign that is more efficient and uses less I/O.
For more information, see SQL Server Books Online.
Language to Avoid in Queries
Use of inequality operators in SQL queries forces databases to use table scans to evaluate the inequalities. These queries generate high I/O if they run regularly against large tables.
For example, using any WHERE expression with NOT in it:
WHERE column_name != some_value
WHERE column_name <> some_value
If you must run these queries, restructure the queries to eliminate the NOT keyword or operator, for example:
SELECT * FROM tableA WHERE col1 < 'value' or col1 > 'value'
SELECT * FROM tableA WHERE col1 != 'value'
SQL Server can use the index (preferably clustered), if it is built on col1, rather than resorting to a table scan.
On frequently accessed tables, move columns that a database application does not need regularly to another table. By eliminating as many columns as possible, you can reduce I/O and increase performance. For more information, see SQL Server Books Online.
You can horizontally partition tables through views in SQL Server 7.0, which can provide I/O performance benefits when database users query subsections of large views. For example, if a large table documents sales for all sales departments for a year and if retrievals from this table are based on a single sales department, a partitioned view can be employed. A sales table is defined for each sales department, a constraint is defined on the sales department column on each table, and then a view is created on all of the tables to form a partitioned view. The query optimizer uses the constraint on the sales department column. When the view is queried, all of the sales department tables that do not match the sales department value provided in the query are ignored by the query optimizer and no I/O is performed against those base tables. This improves query performance by reducing I/O. For more information, see SQL Server Books Online.
Replication and Backup Performance
If you ensure that the disk I/O subsystem and CPUs are performing well, you ensure performance benefits to all SQL Server operations, including replication and backups. Transactional replication and transaction log backups read from transaction log files. Snapshot replication and backups perform serial scans of database files. The SQL Server 7.0 storage structures have made these operations fast and efficient, as long as there is no queuing occurring in the database server CPUs or disk subsystems. For more information, see SQL Server Books Online.
The EMC Disk I/O Tuning Scenario
For those implementing SQL Server database systems on the unique EMC Symmetrix Enterprise Storage Systems, some disk I/O balancing methods can help avoid disk I/O bottleneck problems and maximize performance.
Symmetrix storage systems contain up to 16 GB of RAM cache and contain internal processors within the disk array that help speed the I/O processing of data without using host server CPU resources. You must understand the four components in the Symmetrix storage system to balance disk I/O. One component is the 16-GB cache inside the Symmetrix. Up to 32 SA channels can be used to cable up to 32 SCSI cards from Windows NT–based host servers into the Symmetrix; all of these SA channels can be requesting data simultaneously from the 16-GB cache. Within the Symmetrix storage system, there are up to 32 connectors called DA controllers (internal SCSI controllers) that connect all of the internal disk drives within the Symmetrix into the 4-GB internal cache. And finally, there are the hard disk drives within the Symmetrix.
EMC hard disk drives are SCSI hard disk drives with the same I/O capability of the other SCSI drives referred to in this chapter. One feature commonly used with EMC technology is referred to as hyper-volumes. A hyper-volume is the logical division of an EMC hard disk drives, so the hyper-volume looks like another physical drive to the Windows NT Disk Administrator and can be manipulated with Windows NT Disk Administrator like any other disk drive. Multiple hyper-volumes can be defined on each physical drive. You should work closely with EMC field engineers to identify how hyper-volumes are defined when conducting database performance tuning on EMC storage. You can overload a physical drive with database I/O if you think two or more hyper-volumes are separate physical drives but actually are two or more hyper-volumes on the same physical drive.
SQL Server I/O activities should be divided evenly among distinct DA controllers because DA controllers are assigned to a defined set of hard disk drives. DA controllers are not likely to suffer an I/O bottleneck, but the set of hard disk drives associated with a DA controller may be more susceptible. SQL Server disk I/O balancing is accomplished the same way with DA controllers and their associated disk drives as with other vendors disk drives and controllers.
To monitor the I/O on a DA channel or on separate physical hard disk drives, get help from EMC technical support staff, because I/O activity occurs beneath the EMC internal cache and is not visible to Performance Monitor. EMC storage units have internal monitoring tools that allow an EMC technical support engineer to monitor I/O statistics within the Symmetrix. Performance Monitor can see I/O coming to and from an EMC storage unit only by the I/O coming from an SA channel. This is enough information to indicate that a SA channel is queuing disk I/O requests, but is not enough information to determine the disk or disks that are causing the disk queuing. If an SA channel is queuing, the bottleneck may be caused by the disk drives, not by the SA channel. One way to isolate the disk I/O bottleneck between the SA channels and the DA channels and drives is to add a SCSI card to the host server and connect it to another SA channel. If Performance Monitor indicates that I/O across both SA channels has not changed in volume and disk queuing is still occurring, then the SA channels are not causing the bottleneck. Another way to isolate the I/O bottleneck is to have an EMC engineer use EMC monitoring tools to monitor the EMC system and analyze the drives or DA channels that are bottlenecking.
Divide SQL Server activities evenly across as many disk drives as are available. When working with a smaller database that sustains a large amount of I/O and resides on EMC hardware using hypervolumes, you should continually monitor hyper-volumes definitions. Observing SQL Server activity will help you avoid overloading multiple hyper-volumes on one disk. For example, suppose the SQL Server consists of a 30-GB database. EMC hard disk drives can provide up to 23 GB in capacity, so you can fit the entire database onto 2 drives. For manageability and cost, this is appealing; but for I/O performance, it is not. An EMC storage unit can work with more than 100 internal drives. Involving only 2 drives for SQL Server can lead to I/O bottlenecks. Consider defining smaller hyper-volumes, perhaps of 2 GB each. Then approximately 12 hyper-volumes can be associated with a given 23-GB hard disk drive. Assuming 2-GB hyper-volumes, 15 hyper-volumes are required to store the database. Make sure that each hyper-volume is associated with a separate physical hard disk drive. Do not use 12 hyper-volumes from 1 physical drive and then 3 hyper-volumes associated on another physical drive, because this is the same as using 2 physical drives (150 nonsequential I/O / 300 sequential I/O across the two drives). But with 15 hyper-volumes, each of which is associated with a separate physical drive, SQL Server uses 15 physical drives for providing I/O (1125 nonsequential / 2250 sequential I/O activity per second across the 15 drives).
Also consider employing several SA channels from the host server to divide the I/O work across controllers for host servers that support more than a single PCI bus. Consider using one SA channel per host server PCI bus to divide I/O work across PCI buses as well as SA channels. On EMC storage systems, each SA channel is associated with a specific DA channel and a specific set of physical hard disk drives. Because SA channels read and write their data to and from the EMC internal cache, it is unlikely the SA channel is a point of I/O bottleneck. Because SCSI controller bottlenecks are not likely, it is probably best to concentrate on balancing SQL Server activities across physical drives rather than focus on how many SA channels to use.