Chapter 7 - Index Tuning Wizard
Enterprise-class databases require database administrators (DBAs) to select a physical database design appropriate for the system. An important component of physical database design is selecting indexes. In data intensive applications such as decision support and data warehousing, choosing the right set of indexes is crucial for performance. This chapter describes the Index Tuning Wizard: a Microsoft SQL Server 7.0 wizard that automates the challenge of index selection, while achieving performance competitive with that of indexes handpicked by DBAs.
Why Is Index Selection Difficult?
Despite a long history of development in the area of index selection, no widely deployed commercial products are available that select indexes automatically. Several factors make automating physical database design extremely difficult.
First, when viewed as a search problem, the variety of alternatives for indexes is large. A database may have many tables and each table may have many columns that need to be considered for indexing. An index may be clustered or nonclustered. Furthermore, in recommending a set of indexes, we cannot restrict ourselves to single-column indexes only. Considering multicolumn indexes increases the search space dramatically because for a given set of k columns, k! multicolumn indexes are possible.
Second, the textbook solution of using semantic information such as uniqueness, reference constraints, and rudimentary statistics ("small" versus "big" tables) to produce a physical database design leads to poor performance because it ignores valuable information on usage statistics. For example, the indexing requirement for a decision-support application is very different from the requirements of online transaction processing (OLTP) applications.
Third, even when index selection tools have taken the usage statistics into account, they suffer from being disconnected from the query processor. Modern query optimizers use complex strategies such as index intersection and index-only access. For example, if a table has 100 columns but a query references only 4 of the 100 columns, an index on those 4 columns may benefit the query significantly even if the query has no selection condition on any of the 4 columns. This is because the index acts as a vertical partition and saves the cost of scanning the remaining 96 columns of the table. Similarly, if a query has two selection conditions on columns A and B of a table, the query optimizer may choose to use indexes on both A and B to answer the query by taking their intersection. An index selection tool that does not take into account these strategies of the query processor can result in gross inefficiencies and poor quality of design. Therefore, even tools that adopt an expert system-like approach are unsatisfactory because they often rely on an inaccurate model of index usage by the query processor.
Finally, even if we are successful in identifying an ideal set of indexes for each Transact-SQL statement in the workload, it is challenging to obtain a set of indexes that acts as the best compromise, particularly when the workload contains queries as well as INSERT, DELETE, and UPDATE statements.
The Index Tuning Wizard in SQL Server 7.0 can help you avoid these problems. It is guided by usage statistics, synchronized with the query processor in evaluating promise of indexes, and it uses a unique search strategy to navigate the search space of indexes.
Using the Index Tuning Wizard
To use the Index Tuning Wizard, the server name and the database name must be specified. In addition, the wizard requires a workload file as its input. The workload file is used to capture the nature of the usage statistics against the database system. Any SQL Server Profiler trace can be used as a workload file. SQL Server Profiler is a graphical SQL Server client tool that makes it possible to monitor and record engine events in a workload file. A typical entry in such a workload file may consist of a variety of fields: event-class, text of the event (for example, text of the Transact-SQL query), start-time, and duration of the event. The Index Tuning Wizard can extract engine relevant events (such as Transact-SQL statements) and fields from a SQL Server Profiler trace automatically. Alternatively, any file that contains a set of Transact-SQL statements (separated by the delimiter GO) also can be used as a workload file. Such files may contain customer benchmarks for which the index selection must be tuned.
Customizing the Index Tuning Wizard
The Index Tuning Wizard provides a rich set of options to customize index selection:
Keep all existing indexes
This option can be exercised on the server and database choice screen. By selecting this option, the user instructs the wizard not to drop any of the existing indexes. Thus, this option allows conservative use of the tool and incremental changes in the design of the indexes. Unless the user is experienced, it is recommended that this mode of operation be used.
For large workload files and large databases, index tuning may require a significant amount of time and resources. However, to lessen the elapsed time and the workload on the server, SQL Server allows the user to request the Index Tuning Wizard to be less extensive in its search for an appropriate set of indexes. Users can choose this mode by deselecting the exhaustive enumeration operation. This option is also presented on the server and database choice screen. Although this mode of operation searches fewer possibilities, in many cases it is able to provide a respectable set of recommendations. The Index Tuning Wizard deselects exhaustive enumeration by default.
Additional customization options are presented in the Advanced Options screen:
Maximum queries to tune
If this number is set to k, the Index Tuning Wizard ignores the workload that follows the first k queries (for example, Transact-SQL statements). By default, this number is set to 32,767. By reducing the value of this parameter, the size of the effective workload file can be controlled and the execution of the Index Tuning Wizard may be accelerated. However, it should be remembered that the Index Tuning Wizard considers the first k queries, not events. Specifically, events that are not considered by the Index Tuning Wizard to be queries are not counted towards this limit.
Maximum space for the recommended indexes
This parameter sets the limit on the sum total of all storage for all indexes. By default, this parameter is set to twice the size of the current data set. This limit includes the storage devoted to indexes that must be included due to integrity constraints (for example, uniqueness constraint). In case the Keep all indexes option is selected, this limit also includes the storage required for existing indexes. Because databases grow over time, the administrator should adjust the parameter so that the assigned storage is appropriate for the current data size.
Maximum columns per index
This parameter can be tuned to set the maximum width of indexes. An index with few columns potentially can be used in many queries in a workload. An index with many columns may enable index-only access and eliminate data scans for some of the queries even though it requires more storage space than an index with fewer columns. Given the complexity of the trade-off, it is recommended that only experienced administrators tune this parameter.
Select Tables to tune
Another significant way in which the Index Tuning Wizard can be customized is by restricting index tuning to only a subset of all tables by selecting the Select Tables to Tune option. This allows the user to focus the design on selected tables in the database without altering the indexes for the remaining tables.
Analyzing Index Tuning Wizard Output
The most important output from the Index Tuning Wizard is a set of recommended indexes. The Index Recommendations dialog box displays the list of these indexes, indicating the assigned index name, the order of columns in the index, whether the index is clustered, and whether the index exists. The wizard also produces an estimate of expected improvement in the execution of the workload compared to the existing configuration. The Index Tuning Wizard uses the optimizer component of the query processor to project the above estimate. Because the optimizer's projection is based on statistical information, the actual change in performance may be different from the projected estimate.
The Index Tuning Wizard recommendations are supplemented by a variety of reports that provide further analysis of the recommendations and their quantitative impact. These reports affect the decision about whether to accept or reject the recommendations. All the reports can be saved as text files for further analysis. In the Index Recommendations dialog box, click Analyze to view these report options:
The Index Usage Report option (recommended configuration) presents information about the expected relative usage of the recommended indexes and their estimated sizes.
The Query Cost Report option indicates to the user the estimated reduction or increase in the cost for execution of the most expensive, 100 Transact-SQL statements in the workload file if the recommended configuration is accepted.
The Table Analysis Report option provides information about the relative hits of the queries in the workload by tables in the database.
The Workload Analysis Report option provides information about the relative frequencies of SELECT, INSERT, UPDATE, and DELETE queries and their relative impact on total cost of the workload.
The Tuning Summary Report option provides important summary information about the execution of the Index Tuning Wizard. In particular, this report indicates if the recommendations suggest dropping indexes. The report also indicates the total number of queries in the workload that were considered by the Index Tuning Wizard.
Finally, the Index Tuning Wizard allows the scheduling of a task to update the existing index configuration. The index creation/alteration step can be initiated immediately or can be scheduled to occur at a specific date and time. In addition, a script to perform the index update can be created. This is particularly useful because the index recommendations can be ported from the test computer to production computers by using the script. Furthermore, the script makes it easy to identify the indexes that will be dropped if the recommendations of the Index Tuning Wizard are accepted. Examining the script identifies two essential components of index tuning: a set of indexes and a set of statistics. Executing the recommendations to create a set of statistics is vital to harnessing the full benefits of indexing. This is because the query processor exploits statistical information during query optimization to determine whether to use an index.
Starting the Wizard
The Index Tuning Wizard can be started from SQL Server Enterprise Manager by selecting a database, and then clicking Index Tuning Wizard from the list of available wizards in the Management subgroup. Alternatively, it can be started from SQL Server Profiler on the Tools tab of the Profiler menu. After the wizard obtains all the necessary user input on required parameters, it carefully begins searching the space of possible configurations. If the search for the index configurations is terminated during this time, the wizard returns the best available configuration that has been considered thus far.
SQL Server Query Analyzer provides another mode in which the Index Tuning Wizard can be started for a workload consisting of a single Transact-SQL statement only. This is accomplished by selecting one Transact-SQL statement in the query buffer and then selecting Perform index analysis from the options available on the Query tab. In this mode, for all the customization options, the default settings are used and the workload file is assumed to contain only the single query. (The only exception is the Perform Thorough Analysis option, which is selected while starting Index Tuning Wizard from SQL Server Query Analyzer.) Because the Keep all indexes option is the default, all indexes suggested in this mode assume that existing indexes must be retained. Because single query workloads typically are not representative, the Index Tuning Wizard imposes an additional constraint that no changes in the clustering of data will be recommended, whether or not there is a clustered index. Such a restriction discourages clustering data based on index recommendations for a single query. The index recommendations are presented as a Transact-SQL script. You have the option of either accepting or rejecting the recommendations. The Index Tuning Wizard should be used on a representative workload and not on single queries. However, the ability to start the index tuning capability from SQL Server Query Analyzer is useful for tuning an under-performing query in an otherwise well-tuned system.
Understanding the Architecture of the Index Tuning Wizard
The Index Tuning Wizard takes as input a workload on a specified database. The tool iterates through several alternative sets of indexes called configurations, and chooses the configuration that results in the lowest cost for the given workload. Evaluating a configuration by materializing it physically is not practical because this approach requires adding and dropping indexes, which can be resource-intensive and affect operational queries on the system. Therefore, the Index Tuning Wizard must simulate a configuration without materializing it. SQL Server 7.0 has been extended to support the ability to simulate a configuration and estimate the cost of evaluating a query for a simulated configuration. The illustration shows the architectural overview of the Index Tuning Wizard and its interaction with SQL Server.
During the course of its execution, the Index Tuning Wizard may have to evaluate the cost of many alternative configurations.
The Cost Evaluation module exploits the commonality among configurations to reduce the server work needed to estimate expected cost. The basic search module that picks the set of indexes has an iterative structure. The iterative structure arises from the consideration of multicolumn indexes of increasing width until wider indexes exceed the width threshold or fail to provide any additional reduction in the cost of the workload.
The Candidate Index Selection module examines each query in the workload and helps eliminate from further consideration a large number of indexes from the current set that provide no tangible benefit for any query in the workload. The resulting candidate indexes provide significant improvement potentially to one or more queries in the workload.
The Configuration Enumeration module uses a search algorithm to intelligently search the space of candidate indexes and pick a configuration with low total cost.
The Multicolumn Index Generation module determines the initial set of multicolumn indexes to consider in the next iteration, based on indexes chosen by the configuration enumeration module in the current iteration.
In summary, the Index Tuning Wizard works with the query processor to determine the viability of a configuration. The wizard uses workload information and is therefore able to tune the selection of indexes to the expected usage of the system. By taking into account the space of multicolumn indexes, the wizard finds indexes appropriate for index-only access. Finally, the wizard has been designed to be scalable and can handle large schemas as well as large workloads by staging its execution steps appropriately.
Working with the Index Tuning Wizard
Because the index recommendations are made with respect to a workload file, the single most important prerequisite is to pick a workload that is representative of the database system's usage. Although tools such as SQL Server Profiler can help the user record a workload by logging activity on the server over a specified period of time, it is important to ensure that the logged events are representative. Furthermore, the choice of the indexes must be reevaluated periodically. In particular, if the data volume, the data distribution, or the queries against the system change, the Index Tuning Wizard must be executed to ensure that the choice of indexes remains sound.
Another important aspect to consider is that the projected reduction in the cost of the workload estimated by the index selection tool is based on statistical summary of data. Therefore, the actual decrease or increase in cost can diverge from the estimation. It is advisable to reexecute the workload with the new index configuration to verify the projected improvement before the index configuration update is applied to production servers.
The following questions and answers provide additional tips for working with the Index Tuning Wizard.
Q: Why does the Index Tuning Wizard take so long to complete index recommendations?
A: You can reduce the execution time of the Index Tuning Wizard in several ways. First, make sure that Perform Thorough Analysis in the Select Server and Database dialog box is not selected. Next, consider tuning only a subset of the tables in the database. Finally, reduce the size of the workload file that you are using to significantly speed up the execution of the Index Tuning Wizard.
Q: I ran SQL Server Query Analyzer and want to accept the recommendations. How can I schedule the actual creation of indexes to occur at a later time?
A: You can save the script generated by SQL Server Query Analyzer and schedule that task at a convenient time.
Q: How can I determine the indexes that will be dropped if I accept the recommendations?
A: Select Save script file in the Schedule Index Update Job dialog box instead of applying the proposed changes immediately. By examining the script file, you can determine the indexes that will be dropped if you decide to accept the recommendations. You can edit the script to customize the recommendations if you want.
Q: An error occurred when I invoked Perform Index Analysis from SQL Server Query Analyzer on a query buffer consisting of multiple valid Transact-SQL statements. How do I avoid this?
A: Make sure that the query buffer consists purely of a consecutive set of Transact-SQL statements separated by blank lines. In particular, there should be no GO commands separating the Transact-SQL statements.
Q: Why do SQL Server Query Analyzer and the Index Tuning Wizard give different recommendations for the same query?
A: SQL Server Query Analyzer does not consider the option of building a clustered index. The Index Tuning Wizard may consider building a clustered index if none already exists. Therefore, the recommendations from the two tools may differ even for the same query.