Clustered Index Design Guidelines

Clustered indexes sort and store the data rows in the table based on their key values. There can only be one clustered index per table, because the data rows themselves can only be sorted in one order. For more information about clustered index architecture, see Clustered Index Structures

With few exceptions, every table should have a clustered index defined on the column, or columns, that offer the following:

  • Can be used for frequently used queries.

  • Provide a high degree of uniqueness.

    Note

    When you create a PRIMARY KEY constraint, a unique index on the column, or columns, is automatically created. By default, this index is clustered; however, you can specify a nonclustered index when you create the constraint.

  • Can be used in range queries.

If the clustered index is not created with the UNIQUE property, the Database Engine automatically adds a 4-byte uniqueifier column to the table. When it is required, the Database Engine automatically adds a uniqueifier value to a row to make each key unique. This column and its values are used internally and cannot be seen or accessed by users.

Query Considerations

Before you create clustered indexes, understand how your data will be accessed. Consider using a clustered index for queries that do the following:

  • Return a range of values by using operators such as BETWEEN, >, >=, <, and <=.

    After the row with the first value is found by using the clustered index, rows with subsequent indexed values are guaranteed to be physically adjacent. For example, if a query retrieves records between a range of sales order numbers, a clustered index on the column SalesOrderNumber can quickly locate the row that contains the starting sales order number, and then retrieve all successive rows in the table until the last sales order number is reached.

  • Return large result sets.

  • Use JOIN clauses; typically these are foreign key columns.

  • Use ORDER BY, or GROUP BY clauses.

    An index on the columns specified in the ORDER BY or GROUP BY clause may remove the need for the Database Engine to sort the data, because the rows are already sorted. This improves query performance.

Column Considerations

Generally, you should define the clustered index key with as few columns as possible. Consider columns that have one or more of the following attributes:

  • Are unique or contain many distinct values

    For example, an employee ID uniquely identifies employees. A clustered index or PRIMARY KEY constraint on the EmployeeID column would improve the performance of queries that search for employee information based on the employee ID number. Alternatively, a clustered index could be created on LastName, FirstName, MiddleName because employee records are frequently grouped and queried in this way, and the combination of these columns would still provide a high degree of difference.

  • Are accessed sequentially

    For example, a product ID uniquely identifies products in the Production.Product table in the AdventureWorks database. Queries in which a sequential search is specified, such as WHERE ProductID BETWEEN 980 and 999, would benefit from a clustered index on ProductID. This is because the rows would be stored in sorted order on that key column.

  • Defined as IDENTITY because the column is guaranteed to be unique within the table.

  • Used frequently to sort the data retrieved from a table.

    It can be a good idea to cluster, that is physically sort, the table on that column to save the cost of a sort operation every time the column is queried.

Clustered indexes are not a good choice for the following attributes:

  • Columns that undergo frequent changes

    This causes in the whole row to move, because the Database Engine must keep the data values of a row in physical order. This is an important consideration in high-volume transaction processing systems in which data is typically volatile.

  • Wide keys

    Wide keys are a composite of several columns or several large-size columns. The key values from the clustered index are used by all nonclustered indexes as lookup keys. Any nonclustered indexes defined on the same table will be significantly larger because the nonclustered index entries contain the clustering key and also the key columns defined for that nonclustered index.

Index Options

There are several index options that can be specified when you create a clustered index. Because clustered indexes are typically quite large, you should give special consideration to the following options:

  • SORT_IN_TEMPDB

  • DROP_EXISTING

  • FILLFACTOR

  • ONLINE

For more information, see Setting Index Options.