Index Statistics

SQL Server 2005 allows for statistical information to be created regarding the distribution of values in a column. The query optimizer uses this statistical information to determine the optimal query plan by estimating the cost of using an index to evaluate the query.

When statistics are created, the Database Engine sorts the values of the columns on which the statistics are being built and creates a histogram based on up to 200 of these values, separated by intervals. The histogram specifies how many rows exactly match each interval value, how many rows fall within an interval, and a calculation of the density of values, or the incidence of duplicate values, within an interval.

SQL Server 2005 introduces additional information collected by statistics created on char, varchar, varchar(max), nchar, nvarchar, nvarchar(max), text, and ntext columns. This information, called a string summary, helps the query optimizer estimate the selectivity of query predicates on string patterns. String summaries lead to more accurate estimates of result set sizes and frequently better query plans when LIKE conditions are present in a query. This includes conditions such as WHERE ProductName LIKE '%Bike' and WHERE Name LIKE '[CS]heryl'.

Note

String summary information is not maintained if the summary for a column sample is larger than the Database Engine can maintain. For example, a string summary will not be maintained on statistics created by using WITH FULLSCAN on a unique varchar(80) column with 80 characters in each string, almost no similarity between strings, on a table with 85,000 rows. To determine whether a string summary is stored for a particular statistics object, use DBCC SHOW_STATISTICS (Transact-SQL) .

How Automatic Statistics Work

When you create an index, the query optimizer automatically stores statistical information about the indexed columns. Also, when the AUTO_CREATE_STATISTICS database option is set to ON (default), the Database Engine automatically creates statistics for columns without indexes that are used in a predicate.

As the data in a column changes, index and column statistics can become out-of-date and cause the query optimizer to make less-than-optimal decisions on how to process a query. For example, if you create a table with an indexed column and 1,000 rows of data, all with unique values in the indexed column, the query optimizer considers the indexed column a good way to collect the data for a query. If you update the data in the column so there are many duplicate values, the column is no longer an ideal candidate for use in a query. However, the query optimizer still considers it to be a good candidate based on the index's outdated distribution statistics that are based on the data before the update.

Note

Out-of-date or missing statistics are indicated as warnings (table name in red text) when the execution plan of a query is graphically displayed using SQL Server Management Studio. For more information, see Displaying Graphical Execution Plans (SQL Server Management Studio). Additionally, monitoring the Missing Column Statistics event class by using SQL Server Profiler indicates when statistics are missing. For more information, see Errors and Warnings Event Category (Database Engine).

When the AUTO_UPDATE_STATISTICS database option is set to ON (the default), the query optimizer automatically updates this statistical information periodically as the data in the tables changes. A statistics update is initiated whenever the statistics used in a query execution plan fail a test for current statistics. The sampling is random across data pages, and is taken from the table or the smallest nonclustered index on the columns needed by the statistics. After a data page has been read from disk, all the rows on the data page are used to update the statistical information. Almost always, statistical information is updated when approximately 20 percent of the data rows has changed. However, the query optimizer always makes sure that a minimum number of rows are sampled. Tables that are smaller than 8 megabytes (MB) are always fully scanned to gather statistics.

Sampling the data, instead of analyzing all the data, minimizes the cost of automatic statistical update. Under some circumstances, statistical sampling will not be able to accurately characterize the data in a table. You can control the amount of data that is sampled during manual statistics updates on a table-by-table basis by using the SAMPLE and FULLSCAN clauses of the UPDATE STATISTICS statement. The FULLSCAN clause specifies that all data in the table is scanned to gather statistics, whereas the SAMPLE clause can be used to specify either the percentage of rows to sample or the number of rows to sample.

Asynchronous Statistics Updates

A query that initiates an update of out-of-date statistics must wait for those statistics to be updated before compiling and returning a result set. This can cause unpredictable query response times and may cause applications that have aggressive time-outs to fail.

In SQL Server 2005, the database option AUTO_UPDATE_STATISTICS_ASYNC provides asynchronous statistics updating. When this option is set to ON, queries do not wait for the statistics to be updated before compiling. Instead, the out-of-date statistics are put on a queue for updating by a worker thread in a background process. The query and any other concurrent queries compile immediately by using the existing out-of-date statistics. Because there is no delay for updated statistics, query response times are predictable; however, the out-of-date statistics may cause the query optimizer to choose a less-efficient query plan. Queries that start after the updated statistics are ready will use those statistics. This may cause the recompilation of cached plans that depend on the older statistics version. Updating asynchronous statistics cannot take place if any data definition language (DDL) statements, such as CREATE, ALTER, and DROP statements, occur in the same explicit user transaction.

The AUTO_UPDATE_STATISTICS_ASYNC option is set at the database level and determines the update method for all statistics in the database. It is applicable only to statistics updating and cannot be used to asynchronously create statistics. Setting this option to ON has no effect unless AUTO_UPDATE_STATISTICS is also set to ON. By default, the AUTO_UPDATE_STATISTICS_ASYNC option is OFF. For more information about setting this option, see ALTER DATABASE (Transact-SQL).

Before you set a database to SINGLE_USER, verify the AUTO_UPDATE_STATISTICS_ASYNC option is set to OFF. When set to ON, the background thread used to update statistics takes a connection against the database, and you will be unable to access the database in single-user mode. If the option is set to ON, perform the following tasks:

  1. Set AUTO_UPDATE_STATISTICS_ASYNC to OFF.
  2. Check for active asynchronous statistics jobs by querying the sys.dm_exec_background_job_queue dynamic management view.
  3. If there are active jobs, either allow the jobs to complete or manually terminate them by using KILL STATS JOB.

Best Practice Considerations

You should consider setting the AUTO_UPDATE_STATISTICS_ASYNC option ON when the following characteristics apply to your application:

  • Has experienced client request time-outs caused by one or more queries waiting for updated statistics.
  • Requires predictable query response times, even at the expense of occasionally running queries with less efficient query plans because of outdated statistics.

Viewing Asynchronous Update Statistics Properties

To view the ON or OFF status of the AUTO_UPDATE_STATISTICS_ASYNC option, select the is_auto_update_stats_async_on column from the sys.databases catalog view. For more information, see sys.databases (Transact-SQL).

To view whether statistics are queued for updating or in the process of updating, use the sys.dm_exec_background_job_queue dynamic management view. For statistics, the column object_id1 displays the table or view ID, and the column object_id2 displays the statistics ID. Use the sys.dm_exec_background_job_queue_stats dynamic management view to view aggregate statistics of all background job queues, such as the number of job requests waiting for execution, number of failed requests, and average execution times of previously submitted requests.

Disabling Automatic Statistics

You can disable automatic statistics generation for a particular column or index in the following ways:

  • Use the sp_autostats system stored procedure.
  • Use the STATISTICS_NORECOMPUTE clause of the CREATE INDEX statement.
  • Use the NORECOMPUTE clause of the UPDATE STATISTICS statement.
  • Use the NORECOMPUTE clause of the CREATE STATISTICS statement.
  • Set the AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS database options to OFF using the ALTER DATABASE statement. For more information, see Setting Database Options.

If you instruct the Database Engine not to automatically maintain statistics, you must manually update the statistical information.

Note

The UPDATE STATISTICS statement re-enables automatic statistical updating unless the NORECOMPUTE clause is specified.

Creating and Updating Statistics Manually

Statistics can also be created on all eligible columns in all user tables in the current database in a single statement by using the sp_createstats system stored procedure. Statistics can be created on specific table or view columns by using the CREATE STATISTICS statement, and updated by using the UPDATE STATISTICS statement. The maximum number of statistics that can be created on a table or view independently of an index is 2000. Any column or combination of columns that is eligible as an index key is eligible for statistics, with the following exceptions:

  • Large object type columns, except xml, can be specified. varchar(max), nvarchar(max), varbinary(max), image, text, and ntext, types can be specified.
  • The maximum allowed size of the combined column values can exceed the 900-byte limit that is imposed on the index key value.

The statistics generated for a column can be deleted if you no longer want to retain and maintain them.

Creating statistics manually lets you create statistics that contain multiple column densities. These are average number of duplicates for the combination of columns. For example, a query contains the clause WHERE a = 7 and b = 9.

Creating manual statistics on both columns together (a, b) could allow the Database Engine to make a better estimate for the query, because the statistics also contain the average number of distinct values for the combination of columns a and b.

To create statistics on a column

CREATE STATISTICS (Transact-SQL)

To create statistics on all eligible columns on all user tables

sp_createstats (Transact-SQL)

To manually update statistics

UPDATE STATISTICS (Transact-SQL)

To view the statistics for a table

DBCC SHOW_STATISTICS (Transact-SQL)

To drop the statistics for a column

DROP STATISTICS (Transact-SQL)

Working with Statistics After Upgrading a Database to SQL Server 2005

When you upgrade a database to SQL Server 2005 from an earlier version of SQL Server, all statistics from the earlier version are considered to be out of date. Therefore, on first use, statistics that are eligible for update under the AUTO_UPDATE_STATISTICS database option are updated by using the default sampling rate. This feature has important benefits, and you typically do not have to take any action. However, in rare cases this can result in less accurate statistics if statistics were manually computed in an earlier version of SQL Server by using FULLSCAN or another high sampling rate; or if the table that is being sampled is larger than 8 MB, and the data distribution is not random. In fact, a reduction in sampling rate can always occur for FULLSCAN statistics for tables larger than 8 MB when AUTO_UPDATE_STATISTICS occurs. The initial statistics update may just occur at an earlier time after you upgrade to the new version of SQL Server.

The benefits of upgrading statistics to SQL Server 2005 format are that SQL Server 2005 statistics for a specific sampling rate are greater quality generally than those for SQL Server 2000 and earlier. Also, SQL Server 2005 creates special string summary statistics for character columns, as described previously. For more information about statistics in SQL Server 2005, see this Microsoft Web site.

Best Practice Considerations

In most cases, you do not have to take any special action regarding statistics after you upgrade a database. However, if you have a large database with demanding performance requirements, we recommend that after upgrading, you run sp_updatestats (Transact-SQL) with the RESAMPLE option. This preserves the earlier sampling rates and updates all statistics to the latest format. Be aware that statistics created during index creation are created by using the FULLSCAN sampling rate. These, and other FULLSCAN statistics, use the default sampling rate when updated because of AUTO_UPDATE_STATISTICS. If you prefer not to update all statistics by running sp_updatestats, consider using UPDATE STATISTICS to selectively update statistics on indexes and other FULLSCAN statistics with the FULLSCAN sampling rate after database upgrade.

See Also

Concepts

Optimizing Indexes

Other Resources

CREATE INDEX (Transact-SQL)
Query Tuning
sp_autostats (Transact-SQL)

Help and Information

Getting SQL Server 2005 Assistance

Change History

Release History

12 December 2006

New content:
  • Under the section "Asynchronous Statistics Updates," added that asynchronous statistics should be turned off before setting a database to single-user mode.

5 December 2005

New content:
  • Under the section "Asynchronous Statistics Updates," added that asynchronous statistics updating cannot take place if any DDL statements occur in the same explicit user transaction.