Optimizing Queries That Access Correlated datetime Columns
SQL Server 2005 introduces the DATE_CORRELATION_OPTIMIZATION database SET option. This option improves the performance of queries that perform an equi-join between two tables whose datetime columns are correlated, and which specify a date restriction in the query predicate.
Tables whose datetime column values are correlated, and which can benefit from enabling DATE_CORRELATION_OPTIMIZATION, are typically part of a one-to-many relationship and are used primarily for decision support, reporting, or data warehousing purposes.
For example, in the AdventureWorks sample database, the OrderDate column of the Purchasing.PurchaseOrderHeader table and the DueDate column of the Purchasing.PurchaseOrderDetail table are correlated. The date values of PurchaseOrderDetail.DueDate tend to follow shortly after those of PurchaseOrderHeader.OrderDate.
When the DATE_CORRELATION_OPTIMIZATION database option is set to ON, SQL Server maintains correlation statistics between any two tables in the database that have datetime columns and are linked by a one-column foreign key constraint. By default, this option is set to OFF.
SQL Server uses these correlation statistics together with the date restriction specified in the query predicate to infer that additional restrictions can be added to the query without changing the result set. The query optimizer uses these inferred conditions when it chooses a query plan. A faster query plan may result, because the added restrictions let SQL Server read less data when it is processing the query. Performance is also improved when both tables have clustered indexes defined on them, and their datetime columns for which correlation statistics are maintained are the first or only key of the clustered index.
For example, suppose you prepare the AdventureWorks database to maintain correlation information for Purchasing.PurchaseOrderDetail and Purchasing.PurchaseOrderHeader by running the following Transact-SQL script:
USE AdventureWorks GO -- Create a unique index to take the place of the existing -- primary key constraint CREATE UNIQUE NONCLUSTERED INDEX IX_PurchaseOrderDetail_PurchaseOrderID_PurchaseOrderDetailID ON Purchasing.PurchaseOrderDetail(PurchaseOrderID,PurchaseOrderDetailID) -- Drop existing clustered index by dropping constraint ALTER TABLE Purchasing.PurchaseOrderDetail DROP CONSTRAINT PK_PurchaseOrderDetail_PurchaseOrderID_PurchaseOrderDetailID -- Create new clustered index on DueDate CREATE CLUSTERED INDEX IX_PurchaseOrderDetail_DueDate ON Purchasing.PurchaseOrderDetail(DueDate) --Enable DATE_CORRELATION_OPTIMIZATION database option ALTER DATABASE AdventureWorks SET DATE_CORRELATION_OPTIMIZATION ON
Now, suppose you run the following query:
SELECT * FROM Purchasing.PurchaseOrderHeader h, Purchasing.PurchaseOrderDetail d WHERE h.PurchaseOrderID = d.PurchaseOrderID AND h.OrderDate BETWEEN '20020101' AND '20020201'
The values of PurchaseOrderDetail.DueDate returned by this query may generally fall within a certain period of days, such as 14 days, of the values of PurchaseOrderHeader.OrderDate. Because of this, SQL Server may be able to infer that the previous query can be better expressed by using a query comparable to this one:
SELECT * FROM Purchasing.PurchaseOrderHeader h, Purchasing.PurchaseOrderDetail d WHERE h.PurchaseOrderID = d.PurchaseOrderID AND h.OrderDate BETWEEN '1/1/02' AND '2/1/02' AND d.DueDate BETWEEN CAST ('20020101' AS datetime) + 14 AND CAST ('20020201' AS datetime) + 14
The exact form of the added condition, specified in the second AND clause, depends on the original query and the values of the data in your database. After adding an implied condition, the optimizer uses it to construct an execution plan. In this example, there is a clustered index on PurchaseOrderDetail.DueDate so that index can be used to retrieve the rows that satisfy
d.DueDate BETWEEN CAST ('20020101' AS datetime) + 14 AND CAST ('20020201' AS datetime) + 14. If there are several years worth of data in Purchasing.PurchaseOrderDetail, this query may cause a significant (several-fold) decrease in the execution time compared to the original query.
Before executing a query plan with a condition that is inferred because of enabling DATE_CORRELATION_OPTIMIZATION, SQL Server verifies that the query will produce the correct answer, based on the current contents of the database.
All the following conditions must be met for two tables to benefit from enabling the DATE_CORRELATION_OPTIMIZATION database option:
The database SET options must be set in the following way. ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, and QUOTED IDENTIFIER must be SET to ON. NUMERIC_ROUNDABORT must be SET to OFF.
There must be a single-column foreign key relationship between the tables.
The tables must both have datetime columns that are defined NOT NULL.
At least one of the datetime columns must be the key column of a clustered index (if the index key is composite, it must be the first key), or it must be the partitioning column, if it is a partitioned table.
Both tables must be owned by the same user.
Consider the following when you set the DATE_CORRELATION_OPTIMIZATION database option to ON:
SQL Server maintains correlation information in the form of statistics. These statistics are updated by SQL Server during INSERT, UPDATE and DELETE operations on the applicable tables, which can affect the performance of these operations. You should not enable DATE_CORRELATION_OPTIMIZATION in update-intensive database environments.
If any one of the datetime columns for which correlation statistics are maintained is not the first or only key of a clustered index, consider creating a clustered index on it. Doing this generally leads to better performance on the types of queries covered by correlation statistics. If a clustered index already exists on the primary key columns, you can modify a table so that the clustered index and primary key use different column sets.
Enabling DATE_CORRELATION_OPTIMIZATION does not provide any benefit in the following situations:
There are no pairs of tables that meet the previously stated criteria for maintaining correlation statistics.
There are pairs of tables that meet the criteria for maintaining correlation statistics, but queries that join these tables do not specify a date restriction in their predicates.
- There are no pairs of tables that meet the previously stated criteria for maintaining correlation statistics.
To set the DATE_CORRELATION_OPTIMIZATION database option
For all eligible pairs of matching tables, correlation statistics are automatically created in the form of indexed views when you set the DATE_CORRELATION_OPTIMIZATION database option to ON. When the SQL Server query optimizer is able to take advantage of correlation between pairs of datetime columns, it uses these correlation statistics in its query plan. Correlation statistics are also included in the logic of INSERT, UPDATE, and DELETE statements where they are affected. The names of correlation statistics take the following form:
<FK_constraint_name> is the name of the foreign key constraint in the sys.objects catalog view on which the datetime match is based. <constraint_object_id> is an 8-digit hexadecimal representation of the objectid of the foreign key constraint.
|SQL Server shortens the FK_constraint_portion of the correlation statistics name if the name would exceed the limit for identifier length.|
When executing a query using SET SHOWPLAN XML, any filter node that is derived from correlation statistics includes the following attribute:
For example, a <Predicate> node influenced by correlation statistics looks like this:
This attribute is included with any filter node generated entirely from correlation statistics, or by combining a predicate influenced by correlation statistics with some other predicate.
Generally, when the DATE_CORRELATION_OPTIMIZATION database option is set to ON, SQL Server creates correlation statistics for all eligible pairs of datetime columns. SQL Server creates additional correlation statistics when you perform the following:
You create foreign key constraints through CREATE TABLE or ALTER TABLE that satisfy the requirements for datetime correlation optimization.
You create a clustered index on a datetime column, and that column is eligible for correlation matching with the datetime column of another table.
Note: No correlation statistics are created when clustered indexes are created by using the ONLINE = ON option. However, after the index build is committed, correlation statistics that depend on the index may be built as the result of an event in another transaction, such as creation of a foreign key constraint.
You change the nullability or data type of a column to make it eligible for correlation matching with the datetime column of another table.
You should not refer to correlation statistics directly in applications, because SQL Server may decide to drop them at any time. You may decide to drop individual correlation statistics if you determine that the cost to maintain them affects performance. The default for DROP permissions on correlation statistics is set to the members of the sysadmin fixed server role, the db_owner and db_ddladmin fixed database roles, and the owner of the pair of tables on which the correlation statistics are defined. These permissions are not transferable.
Correlation statistics are dropped in the following situations:
When you set the DATE_CORRELATION_OPTIMIZATION database option to OFF, any correlation statistics created by SQL Server are dropped.
Correlation statistics that require excessive storage to maintain or that are notexpected to be beneficial are dropped.
When you drop a foreign key constraint by using DROP TABLE or ALTER TABLE, any correlation statistics associated with that constraint are dropped.
When an operation causes the tables that are participating in correlation matching to no longer be owned by the same user, the corresponding correlation statistics are dropped.
When you execute an ALTER TABLE…SWITCH statement and either the source table or target table has correlation statistics defined on it, those correlation statistics are dropped.
When you create a clustered index on a datetime column and correlation statistics are built on a different datetime column of the same table, the correlation statistics are dropped. SQL Server may create new correlation statistics based on the newly created clustered index, if eligible.
When you drop a clustered index whose leading index key is a datetime column, any associated correlation statistics are dropped if another datetime column exists on the same table on which new correlation statistics can be created.
When you execute ALTER TABLE to change the data type or nullability of a column participating in correlation statistics, those statistics are dropped.
Correlation statistics are created or dropped as part of the same transaction that caused them to be created or dropped. This transaction is neither online nor asynchronous.
When you use Database Engine Tuning Advisor in a simple, one-server-based tuning scenario to tune the production server directly, it considers the costs and benefits of correlation statistics. However, when you use Database Engine Tuning Advisor in a test-production server scenario, it does not consider correlation statistics to be internal system objects. Therefore, correlation statistics are not used in query optimization by Database Engine Tuning Advisor during its index tuning analysis. In a test-production scenario, you may want to ignore any recommendations that Database Engine Tuning Advisor makes about the indexed views that hold correlation statistics, because it knows their costs but not their benefits. In both scenarios, Database Engine Tuning Advisor may not recommend selection of certain indexes such as clustered indexes on datetime columns, which could be beneficial when DATE_CORRELATION_OPTIMIZATION is enabled.