Planning and Creating Indexes

Updated : July 19, 2001

On This Page

About This Chapter
Lesson 1: Introduction to Indexes
Lesson 2: Index Architecture
Lesson 3: Creating Indexes
Review

About This Chapter

The two most important performance issues for SQL Server are memory and table indexing. In this chapter and the next, you will learn how SQL Server 7.0 indexes work and how to create and use appropriate indexes on your tables.

Before You Begin

To complete the lessons in this chapter you must:

  • Have installed SQL Server 7.0.

  • Be able to log on SQL Server as an Administrator.

  • Complete the exercises in Chapter 5 or run the c:\sqlimpl\exercise\bldlib\bldlib.cmd batch file with a chapter argument of 6.

  • Have installed the exercise files from the Supplemental Course Materials CD-ROM to your hard drive using the Setup.exe discussed in the "Getting Started" section of the book.

Lesson 1: Introduction to Indexes

Understanding how data is stored is the basis for understanding how Microsoft SQL Server version 7.0 accesses data.

After this lesson, you will be able to:

  • Describe how data and indexes are stored in SQL Server databases. You will also be able to determine when indexes are useful.

Estimated lesson time: 15 minutes

How SQL Server Stores and Accesses Data

To understand what indexes are and how they are used you need to understand a little of the underlying architecture of SQL Server databases. The following topics explain how SQL Server physically stores and accesses data in a database when you issue Transact-SQL statements.

How Data Is Stored

SQL Server 7.0 stores the rows of a table in data pages. In addition to the data pages, many other page types are used to store supporting information for tables.

Note: All the pages in a database are stored in the data file(s). Pages are allocated as needed and the page type is determined by the object to which the page is allocated. All pages are standard 8 KB pages within the database but a page allocated to an index is called an index page, a page allocated to a table is called a data page, and so on.

One of the following two methods is used to organize the data pages of a table:

  • For tables that have a clustered index:

The data rows on each data page are stored in order, based on the clustered index key. Each data page stores a reference to the previous and next pages to create an ordered sequence of pages called a doubly-linked list. The links not only keep the pages in sequence but they make it easy and fast to insert a new page anywhere in the sequence by updating the links on the pages between which the new page is inserted.

  • For heaps, which are tables that have no clustered index:

The data rows are not stored in any particular order, and the data pages are not sequenced in a linked list. New pages are simply added to the heap.

In addition to the data pages just described, index pages are created for tables with a clustered index. Index pages are also created when nonclustered indexes are added to tables. Index pages make it possible to directly access any row in a table.

SQL Server uses index pages in a way similar to how you use the index in a book. To find a topic in a book, you locate the topic in the index at the back of the book; the index gives you a page reference for the topic. You then turn to the referenced page where you read the details about the topic. The index duplicates the topics from the body of the book and stores a reference to the page on which each topic occurs. SQL Server index pages are similar to an index at the back of a book; they duplicate a selected part of each row and store a reference to the data page on which the complete row occurs.

Tables with a clustered index always have index pages based on the clustered index values. You can optionally add other indexes to clustered tables or heaps.

How Data Is Accessed

SQL Server accesses data using a table scan or indexed access. A table scan does the following:

  • Starts at the first data page in the table.

  • Scans from page to page through all the rows in the table. Page links are used to move from one page to the next in a clustered table, and Index Allocation Maps (IAMs) are used to move from one page to the next in a heap.

  • Reads every row in the table and extracts the rows that meet the criteria of the query.

  • Is best for accessing small tables or for accessing a large percentage of the rows in a large table.

An index is used to directly access one row or a range of rows. Indexed access does the following:

  • Traverses the index tree structure to find rows that the query requests.

  • Reads and extracts only the rows that meet the criteria of the query.

  • Is best for accessing single rows and ranges of rows that return a small percentage of the rows from a large table.

SQL Server first determines whether an index exists. Then, the query optimizer, the component responsible for generating the optimum execution plan for a query, determines whether scanning a table or using the index is more efficient for accessing data.

How Linked Pages are Updated

When new rows of data are inserted into a table or the values in the indexed columns are changed, SQL Server may have to reorganize the storage of the data in the table or the index:

  • When adding a new row to a full page, SQL Server will move approximately half the rows to a new page to make room for the new row; this is known as a page split. (See Figure 6.1)

  • When a page split occurs, the new page is linked into the page chain in the correct sequence by updating the page links.

  • When a row is deleted, space on the data page is reclaimed and is available for a new row.

Figure 6.1 demonstrates how a page split occurs. The figure shows three pages that are part of a chain of linked pages. The links between the pages order the pages—page 25 links to page 29, which links to page 32. A new row, which has an index value that requires it to be placed on a full page (page 29), is added to the table. The page is split, with approximately half of the rows being moved to a new page (page 35), leaving room on page 29 for the new row to be added. The page links on three pages (29, 32, and 35) are updated to reflect the new page order—page 25 links to page 29, which links to page 35, which links to page 32.

Figure 6.1: How a page split occurs.

Figure 6.1: How a page split occurs.

Figure 6.2 shows how rows are ordered on linked pages. Rows are accessed in order by following the links from one page to another. The figure shows how the rows are accessed before and after a page split.

Cc917624.f06xx02(en-us,TechNet.10).gif

Figure 6.2: How rows are ordered on linked pages.

When inserting rows into or updating index columns in very large tables, only a few pages need to be updated. This is because pages are accessed according to their linked sequence, rather than their physical sequence. The rows are always available in the correct order, even if extra pages have been added to the table.

Whether to Create Indexes

When you are considering whether to create an index, evaluate two factors to ensure that the index will be more efficient than a table scan: the nature of the data and the nature of the queries based on the table.

Why to Create an Index

Indexes accelerate data retrieval. For example, without an index, you would have to go through an entire textbook one page at a time to find information about a topic.

SQL Server uses indexes to point to the location of specific information on a data page instead of having to look through all the data pages of a table. Consider the following facts and guidelines about indexes:

  • Indexes generally accelerate queries that join tables and perform sorting or grouping operations.

  • Indexes enforce the uniqueness of rows if uniqueness is defined when you create the index.

  • Indexes are created and maintained in ascending sorted order.

  • Indexes are best created on columns with a high degree of selectivity—that is, columns or combinations of columns in which the majority of the data is unique.

Why Not to Create an Index

Indexes consume disk space and incur overhead and maintenance costs. Consider the following facts and guidelines about indexes:

  • When you modify data in an indexed table, SQL Server has to update the associated indexes.

  • Maintaining indexes requires time and resources. Therefore, do not create an index that you will use infrequently.

  • Indexes on small tables may have few benefits because it can be more costly to traverse the index pages than to scan the whole table.

Note: Indexes are not required. You can query and manipulate data without an index. However, data access is considerably faster when appropriate indexes are available.

Indexing Guidelines

Your business environment, data characteristics, and use of the data determine the columns that you specify to build indexes. The usefulness of an index is directly related to the percentage of rows in a table returned by a query. Low percentages or highly selective queries derive the most benefit from indexes.

When you create an index on a column, the column is referred to as the index column. A value within an index column is called a key value.

Note: Every column in a table can be indexed and a column can be part of more than one index but doing so is generally inefficient.

Columns to Index

Create indexes on frequently searched columns, such as:

  • Primary keys

  • Foreign keys or other columns that are used frequently in joining tables

  • Columns that are searched for ranges of key values

  • Columns that are accessed in sorted order

Columns Not to Index

Do not index columns that:

  • You seldom reference in a query.

  • Contain few unique values. For example, an index on a column with only two values, such as male and female, returns a high percentage of rows and is not beneficial.

  • Are defined with bit, text, and image data types. Columns with these data types cannot be indexed.

Lesson Summary

Tables are stored in doubly-linked page chains if a clustered index has been created on the table. Tables are stored in a collection of unlinked pages called a heap if there is no clustered index on the table. SQL Server indexes, which are similar to an index in a book, make it possible to quickly retrieve data in a table even when accessing data in a different order to which it is stored. Indexes are required to enforce unique data in a column in a table.

Lesson 2: Index Architecture

The index architecture for clustered and nonclustered indexes is different. Understanding the differences in architecture will help you create the most effective type of index.

A clustered index physically orders the rows in a table. Nonclustered indexes use storage location information in the index pages to navigate to the data pages—this is referred to as logical ordering.

For a video demonstraton that covers SQL Server Indexing, run the index.htm file from the \Media folder on the Supplemental Course Material CD-ROM.

After this lesson you will be able to:

  • Describe the structure of clustered and nonclustered indexes.

  • Describe how SQL Server accesses data using indexes.

  • Decide on the types of indexes to create.

Estimated lesson time: 60 minutes

Clustered Indexes

A clustered index is structured like an inverted tree, called a B-tree. The root of the tree at the top of the structure is the entry point to the index. The branches of the tree leading down from the root are used to navigate the index to the leaves at the bottom of the tree, which are the data pages. SQL Server splits pages in the tree to keep the tree balanced even when many rows with similar key values are added to the table.

Figure 6.3 illustrates the following major components of a clustered index balanced tree:

  • The non-leaf level of the tree stores only the subset of the key values that is necessary for index navigation. There are many levels in the non-leaf level. The non-leaf level is stored in index pages.

  • The leaf level of the tree stores all the rows of the table in key order. There is only one level in the leaf level. The leaf level for clustered indexes is stored in data pages.

  • The first page in the non-leaf level is called the root page. The other pages in the non-leaf level are called intermediate levels. The number of intermediate levels increases as the size of the table increases but always stays relatively small even for extremely large tables. The small number of intermediate levels means that relatively few pages are read for any search.

  • SQL Server adds an internal identifier to rows with duplicate key values. These identifiers are shown in Figure 6.3 but are not accessible to SQL Server users or administrators.

Cc917624.f06xx03(en-us,TechNet.10).gif

Figure 6.3: Clustered index architecture.

Clustered Index Architecture

Together, the leaf and non-leaf levels of a clustered index make up a special type of balanced tree. The lowest level of the index is the leaf level. In a clustered index, the data pages of a table are the leaf level and the index pages above the leaf level are the non-leaf level. The order of the values in a clustered index is always ascending, as shown in the balanced tree, or B-tree diagram, in Figure 6.3.

The purpose of an index is to quickly locate a specific key value. Non-leaf levels of an index facilitate this as follows:

The leaf level pages of an index store every key value in order. The non-leaf levels store the first key values from each of a set of leaf or lower non-leaf level pages and the page addresses of each those pages. This makes it possible to quickly search for a key value even when the table contains millions of rows.

Just as you use a telephone directory by turning many pages at a time and checking only the value of the first entry on each page, non-leaf level pages of an index allow SQL Server to quickly locate the leaf level page that contains a desired key value.

Creating Clustered Indexes

Create a clustered index on columns in tables that are searched frequently for ranges of key values or are accessed in sorted order. When you create a clustered index, consider the following facts and guidelines:

  • Each table can have only one clustered index because rows can only be stored in one physical order.

  • The physical row order of the table and the order of rows in the index are the same. You should create clustered indexes before you create any nonclustered indexes because a clustered index changes the physical row order of the table. Rows are sorted into a sequenced order and maintained in that order.

  • Key values in a clustered index must be unique. Uniqueness is maintained explicitly if the clustered index is created with the UNIQUE keyword because duplicates are not allowed. If the clustered index is not created with the UNIQUE keyword, SQL Server implicitly maintains uniqueness by adding an internal unique identifier to rows that have duplicate key values. These 8-byte unique identifiers are internal to SQL Server and are not accessible to the user.

  • The average size of a clustered index is about five percent of the table size. The extra space is used by the index pages and by the 8-byte identifier added to each duplicate row. The actual percentage increase in table size due to adding a clustered index varies depending on the size of the indexed column and on how many rows with duplicate key values exist in the table.

  • During index creation, SQL Server temporarily uses space in the database to copy and sort the table. A clustered index requires about 1.2 times the table size for working space when the index is created. The space that is used during index creation is available after the index is created.

Be sure you have sufficient disk space in the database when you create clustered indexes. For example, if you are creating a clustered index on a table that is currently 20 MB in size, then an extra 24 MB of space (44 MB in all) will be used while the clustered index is being created. The table will be about 21 MB in size after the clustered index has been added.

Accessing Data with a Clustered Index

When SQL Server needs to locate a single row or a range of rows, it first determines whether an index exists on an appropriate column and whether that index is useful for retrieving the desired data. If no useful index exists, SQL Server performs a table scan. If an index exists and is useful, SQL Server navigates through the index to the data. On successive index pages, starting from the root page, the search value is checked against each key value on the page. When the key value that is equal to or immediately preceding the search value is found, SQL Server moves to the page addressed by that value. If the page is another non-leaf level page, the process continues. If the page is a leaf level page, the desired row is accessed on the page.

Example

Figure 6.4 shows how the index tree is navigated, using a clustered index on the lastname column, for the following query:

SELECT lastname, firstname
FROM member
WHERE lastname = 'Rudd'

Cc917624.f06xx04(en-us,TechNet.10).gif

Figure 6.4: Accessing data with a clustered index.

SQL Server goes through the following steps to retrieve the information requested in the preceding query:

  1. SQL Server determines that an index exists on the lastname column and that it is useful for retrieving rows that contain the last name Rudd.

  2. In the non-leaf level, the search begins on the root page (page 140). On this page the search value Rudd is greater than Martin, the last key value on the page. The search proceeds to the page to which this key value points (page 145).

  3. The search continues in the non-leaf level on the index page (page 145). On this page the search value Rudd lies between the key values Martin and Smith. The search proceeds to the page to which the first of these key values points (page 120).

  4. The search has reached the leaf level so the data page (page 120) is scanned for the row that has a key value matching the search value Rudd. The data row is then retrieved.

Accessing a Range of Data using a Clustered Index

When a range of rows is accessed using a clustered index, the first row is located using an index search. Now, because the rows are stored in order, the rest of the rows in the range can be scanned in succession. When a row is encountered that has a key value larger than the end of the range, processing stops.

Example

Using the previous example in Figure 6.4, the query is changed to:

SELECT lastname, firstname
FROM member
WHERE lastname BETWEEN 'Rudd' AND 'Smith'

The first row is located using an index search, as in the previous query . The rest of the rows are then accessed in order until a row with a key value greater than Smith is encountered, at which point all rows have been accessed and processing stops. Page links are used to move from page 120 to page 130. Only one index search is necessary and each data page is read only once because the rows in the range are adjacent to each other.

Nonclustered Indexes

Nonclustered indexes have the same B-tree structure as clustered indexes, with two significant differences. The data rows are not stored in order and the leaf level of a nonclustered index contains key values and references to the rows of data, instead of the data rows themselves. Figure 6.5 illustrates how the nonclustered index is built on the rows of data in the table, which are stored either in a heap or in the clustered index, if there is one.

Cc917624.f06xx05(en-us,TechNet.10).gif

Figure 6.5: Nonclustered index architecture.

Architecture of a Nonclustered Index on a Heap

When a nonclustered index is built on top of a heap (a table without a clustered index), SQL Server uses row identifiers in the index pages that point to rows in the data pages. The row identifiers store data location information.

Heaps are maintained using IAM pages. IAM pages contain the addresses of the extents that have been allocated to a heap. The sysindexes system table stores a pointer to the first IAM page associated with a heap. SQL Server uses the IAM pages to navigate through the heap and to find available space for new rows being inserted. The data pages and the rows within them are not in any specific order and are not linked together. The only logical connection between data pages is that which is recorded in the IAM pages.

Page splits never need to occur in heaps because pages are not linked. This improves performance because if a page split did occur, the row identifiers of all the rows that were moved to a new page would have to be updated in all nonclustered indexes.

Architecture of a Nonclustered Index on a Table with a Clustered Index

When a nonclustered index is built on a table with a clustered index, SQL Server uses the clustered index keys in the index pages to point to the rows in the data pages.

Nonclustered Indexes Are Automatically Rebuilt

SQL Server automatically rebuilds existing nonclustered indexes when any of the following occurs:

  • An existing clustered index is dropped using the DROP INDEX statement.

  • A clustered index is created.

  • The DROP_EXISTING option is used to change a clustered index column definition.

Creating Nonclustered Indexes

Nonclustered indexes are useful when users require multiple ways to search data. For example, if you have created a clustered index on the customer code column of a customer table, you could create nonclustered indexes on customer names and customer categories.

When you create a nonclustered index, consider the following facts:

  • Nonclustered indexes are the SQL Server default.

  • You can have up to 249 nonclustered indexes per table.

  • Nonclustered indexes are best created on columns that contain highly selective or unique data.

  • SQL Server maintains the data pages in a heap unless a clustered index is defined on the table.

Nonclustered Index on a Heap

When you create a nonclustered index on a heap, for each row in the data pages, the leaf level contains the key value of the row and a row identifier that points to the row. In Figure 6.6, notice how:

  • The row identifiers specify the physical location of rows using the file ID, page number, and the number of the row on the page.

  • The order of the rows in the leaf level pages of a nonclustered index differs from the physical order of the rows of the table. The key values are maintained in ascending sorted order.

  • Uniqueness can be validated at the leaf level because the key value for each row is stored in sequence in the leaf level.

Cc917624.f06xx06(en-us,TechNet.10).gif

Figure 6.6: Nonclustered index on a heap architecture.

Accessing Data using a Nonclustered Index on a Heap

When SQL Server needs to locate a single row or a range of rows, it first determines whether an index exists on an appropriate column and whether that index is useful for retrieving the desired data. If no useful index exists, SQL Server performs a table scan. If an index exists and is useful, SQL Server navigates through the index to the data. On successive index pages, starting from the root page, the search value is checked against each key value on the page. When the key value that is equal to or immediately preceding the search value is found, SQL Server moves to the page addressed by that value. If the page is another non-leaf level page, the process continues. If the page is a leaf level page, the row identifier from the leaf level row is used to access the desired row on the data page.

Example

Figure 6.7 shows how the index tree is navigated, using a nonclustered index on the lastname column, for the following query:

SELECT lastname, firstname
FROM member
WHERE lastname = 'Rudd'

Cc917624.f06xx07(en-us,TechNet.10).gif

Figure 6.7: Accessing data using a nonclustered index on a heap.

SQL Server goes through the following steps to retrieve the information:

  1. SQL Server determines that an index exists on the lastname column and that it is useful for retrieving rows that contain the last name Rudd.

  2. In the non-leaf level, the search begins on the root page (page 12). On this page the search value Rudd is greater than Martin, the last key value on the page. The search proceeds to the page to which this key value points (page 28).

  3. The search continues in the non-leaf level on the index page (page 28). On this page the search value Rudd lies between the key values Martin and Smith. The search proceeds to the page to which the first of these key values points (page 61).

  4. The search has reached the leaf level so the index page (page 61) is scanned for the index row that has a key value matching the search value Rudd. The row identifier 470501, is used to retrieve row 1 on the data page (page 5).

Nonclustered Index on a Table with a Clustered Index

If a table has a clustered index, the leaf level of the nonclustered index contains clustered index key values of the data rows rather than to the physical row identifiers used for heaps. Figure 6.8 shows how rows in the leaf level of a nonclustered index, built on member numbers in the member table, reference the clustering keys of the clustered index, which is built on the lastname column.

Cc917624.f06xx08(en-us,TechNet.10).gif

Figure 6.8: Nonclustered index on a table with a clustered index.

When you create a nonclustered index on a table with a clustered index, consider the following facts and guidelines:

  • To locate a row using the nonclustered index, SQL Server evaluates both indexes—the nonclustered index, followed by the clustered index.

  • Create clustered indexes before nonclustered indexes because the nonclustered indexes are automatically dropped and recreated when a clustered index is created.

  • Using the clustered index key values as the row locators reduces the cost of maintaining nonclustered indexes because nonclustered indexes do not have to be updated if a page split occurs in a data page. However, some maintenance cost is incurred when data in the clustered index columns is modified.

  • Using the clustered index key values as the row locators means that nonclustered indexes will be considerably larger if the clustered index is created on one or more large columns.

  • Using the clustered index key values as the row locators for nonclustered indexes is the reason why duplicate clustered index key values have an internal identifier added to them to make them unique.

Note: Using the clustered index key values as the row locators is new in SQL Server 7.0. Previous versions of SQL Server used row identifiers for tables with or without a clustered index.

Accessing Data Using a Nonclustered Index on a Table with a Clustered Index

When SQL Server accesses data using a nonclustered index that is built on a table with a clustered index, it first searches the nonclustered index and then finds the data row by searching the clustered index. Notice that the index in the following example is different from that in the previous examples. This index is created on member numbers rather than last names.

Example

Figure 6.9 shows how the index tree is navigated using the nonclustered index on the member_no column. The clustered index key values are used as the row locators at the leaf level (for example, the clustered index key value for member number 6078 is Rudd). The clustered index key value Rudd is used to navigate the clustered index.

SELECT lastname, firstname
FROM member
WHERE member_no = 6078

Cc917624.f06xx09(en-us,TechNet.10).gif

Figure 6.9: Accessing data using a nonclustered index on a table with a clustered index.

SQL Server goes through the following steps to retrieve the information:

  1. SQL Server determines that an index exists on the member_no column and that it is useful for retrieving rows that contain the member number 6078.

  2. In the non-leaf level, the search begins on the root page (page 12). On this page the search value 6078, is greater than 5678, the last key value on the page. The search proceeds to the page to which this key value points (page 28).

  3. The search continues in the non-leaf level on the index page (page 28). On this page the search value 6078, lies between the key values 5678 and 7678. The search proceeds to the page to which the first of these key values points (page 61).

  4. The search has reached the leaf level so the index page (page 61) is scanned for the index row that has a key value matching the search value 6078. The clustered index key value (lastname = Rudd) is then used to perform a clustered index search and locate the row on page 120 (a data page).

Accessing a Range of Data using a Nonclustered Index

When you access a range of data through a nonclustered index, data retrieval may be slower than if you had accessed the data through a clustered index.

In a nonclustered index, the key values in the leaf level pages are sorted, but the data rows and the sequence of the data pages are:

  • Stored in no particular order if the nonclustered index is built on a heap

  • Stored in a different order if the nonclustered index is built on a table with a clustered index

Therefore, to retrieve a range of values, SQL Server processes the leaf level key values in order after performing a single nonclustered index search, but must read a different data page to access each row in the range. For a nonclustered index built on a table with a clustered index, SQL Server will have to perform a clustered index search for each row in the range in order to read the data page and access the row.

Example

Figure 6.10 shows how the index tree is navigated using a nonclustered index on the lastname column to retrieve all rows in a heap table for which the last names are in the range between Martin and Rudd.

SELECT lastname, firstname
FROM member
WHERE lastname BETWEEN 'Martin' AND 'Rudd'

Cc917624.f06xx10(en-us,TechNet.10).gif

Figure 6.10: Accessing a range of data using a nonclustered index on a heap.

SQL Server goes through the following steps to retrieve the information:

  1. SQL Server determines that an index exists on the lastname column and that it is useful for retrieving rows that contain the last names Martin through Rudd. The search value for this search is Martin, the value of the beginning of the range.

  2. In the non-leaf level, the search begins on the root page (page 12). On this page the search value Martin, is equal to Martin, the last key value on the page. The search proceeds to the page to which this key value points (page 28).

  3. The search continues in the non-leaf level on the index page (page 28). On this page the search value Martin, is equal to the key value Martin. The search proceeds to the page to which this key values points (page 61).

  4. The search has reached the leaf level so the index page (page 61) is scanned for the index rows starting at the row for key value Martin and ending with the row for key value Rudd. The row identifiers from these index rows are used to retrieve the data rows. Five data rows, located on pages 5, 6, 7, and 8 are retrieved.

Example

Figure 6.11 shows how the index tree is navigated using a nonclustered index on the lastname column to retrieve all rows in a table with a clustered index for which the last names are in the range between Martin and Rudd. The member_no column maps to a clustering key value at the leaf level. For example, the supplied starting key value of 5678 is mapped to a clustering key value of Barr. For each key value in the range, SQL Server traverses the clustered index using the mapped clustering key.

SELECT lastname, firstname
FROM member
WHERE member_no BETWEEN 5678 AND 6078

Cc917624.f06xx11(en-us,TechNet.10).gif

Figure 6.11: Accessing a range of data using a nonclustered index on a table with a clustered index.

SQL Server goes through the following steps to retrieve the information:

  1. SQL Server determines that an index exists on the member_no column and that it is useful for retrieving rows that are between member numbers 5678 and 6078. The search value for this search is 5678, the value of the beginning of the range.

  2. In the non-leaf level, the search begins on the root page (page 12). On this page the search value 5678, is equal to 5678, the last key value on the page. The search proceeds to the page to which this key value points (page 28).

  3. The search continues in the non-leaf level on the index page (page 28). On this page the search value 5678, is equal to the key value 5678. The search proceeds to the page to which this key values points (page 61).

  4. The search has reached the leaf level so the index page (page 61) is scanned for the index rows starting at the row for key value 5678 and ending with the row for key value 6078. The clustered index key values from these index rows are used to perform clustered index searches for the data rows. Five data rows, located on pages 100, 110, 120, and 130 are retrieved.

Lesson Summary

Clustered indexes store data in a table in index order, changing the physical layout of the data. Nonclustered indexes are stored separately from the data in a table and have no effect on the physical layout of the table. Nonclustered indexes use a physical row identifier to retrieve rows from a table with no clustered index but they use the clustering key to retrieve rows from a table with a clustered index.

Lesson 3: Creating Indexes

Now that you are familiar with the different index architectures, this lesson will discuss creating indexes and obtaining information on existing indexes.

After this lesson you will be able to:

  • Create clustered and nonclustered indexes with unique or composite characteristics.

  • Use the CREATE INDEX options to expedite index creation and improve index performance.

  • Apply the appropriate fillfactor value to accommodate the future growth of tables.

Estimated lesson time: 75 minutes

Creating and Dropping Indexes

You create indexes using the CREATE INDEX statement and remove them using the DROP INDEX statement.

Using the CREATE INDEX Statement

Use the CREATE INDEX statement to create indexes. You also can use the Create Index wizard in SQL Server Enterprise Manager. When you create an index on one or more columns in a table, consider the following facts and guidelines:

  • SQL Server automatically creates indexes when a PRIMARY KEY or UNIQUE constraint is created on a table. Defining a PRIMARY KEY or UNIQUE constraint is preferred over creating standard indexes.

  • You must be the table owner to execute the CREATE INDEX statement.

  • Indexes cannot be created on views.

  • SQL Server stores index information in the sysindexes system table.

  • Before you create an index on a column, determine whether indexes already exist on that column.

  • Keep your indexes small by limiting the key values to one or two columns. Typically, smaller indexes are more efficient than indexes with larger key values.

  • Select columns on the basis of uniqueness so that each key value identifies a small number of rows.

Syntax

CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED]
INDEX index_name ON table (column [, ...n])
[WITH
[PAD_INDEX]
[[,] FILLFACTOR = fillfactor]
[[,] IGNORE_DUP_KEY]

If you do not specify UNIQUE in the CREATE INDEX statement, the index will allow duplicates. If you do not specify CLUSTERED in the CREATE INDEX statement, a nonclustered index will be created.

Example

This example creates a clustered index on the lastname column in the member table.

CREATE CLUSTERED INDEX cl_lastname
     ON library..member (lastname)

Using the DROP INDEX Statement

Use the DROP INDEX statement to remove an index from a table. When you drop an index, consider the following facts:

  • SQL Server reclaims disk space that is occupied by the index when you execute the DROP INDEX statement.

  • You cannot use the DROP INDEX statement on indexes that are created by PRIMARY KEY or UNIQUE constraints. You must drop the constraint in order to drop these indexes.

  • When you drop a table, all indexes for that table are also dropped.

  • When you drop a clustered index, all nonclustered indexes on the table are rebuilt automatically.

  • You must make the database in which an index resides the current database in order to drop that index.

  • The DROP INDEX statement cannot be used on system tables.

Syntax

DROP INDEX table . index [, ...n]

Example

This example drops the cl_lastname index from the member table.

USE library
DROP INDEX member.cl_lastname
  • To create an index on the loan table

In this exercise, you will open a script file that creates an index, review the contents of the script, execute the script, and then verify that the index was created.

  1. Open SQL Server Query Analyzer and verify that you are using the library database.

  2. Open the c:\sqlimpl\exercise\ch06\creaind1.sql script file.

  3. Review the CREATE INDEX statement. This script creates a nonclustered index named loan_member_link on the member_no column in the loan table with a fillfactor value of 75.

  4. Execute the script file.

  5. Verify that the loan_member_link index was created by executing the following statement:

EXEC sp_helpindex loan

  1. The results of the sp_helpindex system stored procedure show that the loan_member_link nonclustered index exists on the member_no column of the loan table.

Creating Unique Indexes

A unique index ensures that all data in an indexed column does not contain duplicate values. If the table has a PRIMARY KEY or UNIQUE constraint, SQL Server automatically creates a unique index when you execute the CREATE TABLE or ALTER TABLE statements.

Create a unique index if you want an index that is independent of constraints. However, in most cases, create PRIMARY KEY or UNIQUE constraints rather than creating unique indexes. When you create a unique index, consider the following facts and guidelines:

  • If a table contains data, SQL Server checks for duplicate values when you create the index. If the data contains duplicates, the CREATE INDEX statement fails.

  • SQL Server checks for duplicate values each time you use the INSERT or UPDATE statement. If duplicate key values exist, SQL Server rolls back the entire update and returns an error message with the first duplicate.

  • Create unique indexes only on columns in which entity integrity can be enforced. For example, you would not create a unique index on the lastname column of the member table because some members may have the same last name.

Example

Figure 6.12 shows the creation of a unique, nonclustered index named title_ident on the title table. The index is built on the title_no column. The value in the title_no column must be a unique value for each row of the table.

USE library
CREATE UNIQUE INDEX title_ident
 ON title (title_no)

Figure 6.12: Creating unique indexes.

Figure 6.12: Creating unique indexes.

Finding All Duplicate Values in a Column

If duplicate key values exist when you create a unique index, the CREATE INDEX statement fails. SQL Server returns an error message with the first duplicate, but other duplicate values may exist as well. Use the following sample script on any table to find all duplicate values in a column. Replace the italicized text with information specific to your query.

SELECT index_col, COUNT(index_col) AS '# of Duplicates'
FROM tablename
GROUP BY index_col HAVING COUNT(index_col) > 1 
ORDER BY index_col

Example

This example determines whether duplicate member numbers exist in the member_no column in the member table. If so, SQL Server returns the member number and number of duplicate entries for each duplicate member number in the table.

SELECT member_no, COUNT(member_no) AS '# of Duplicates'
FROM library..member
GROUP BY member_no HAVING COUNT(member_no) > 1 
ORDER BY member_no

Creating Composite Indexes

An index that is created on more than one column in a table is called a composite index. You can create composite indexes:

  • When two or more columns are best searched as a key.

  • If queries reference only the columns in the index. This is called a covered query. When a query is covered, the data rows do not have to be retrieved as the column values are read directly from the index keys.

For example, a telephone directory is a good example of where a composite index would be useful. The directory is organized by last names. Within the last names, it is organized by first names because entries with the same last name often exist. Although the entries are indexed by last name and first name, you cannot find an entry in the telephone directory by first name only.

When you create a composite index, consider the following facts and guidelines:

  • You can combine as many as 16 columns into a single composite index. The sum of the lengths of the columns that make up the composite index cannot exceed 900 bytes.

  • All columns in a composite index must be from the same table.

  • Define the most unique column first. The first column defined in the CREATE INDEX statement is referred to as the highest order. In the telephone directory example, the last name is the high order column and the first name is the low order column.

  • An index on (column1, column2) is not the same as an index on (column2, column1)—each has a distinct column order. Generally, the column that contains more selective data or that would return the lowest percentage of rows should be the higher order column.

  • If the WHERE clause of a query references a lower order column of the composite index, it must also reference all higher order columns in the index for the query optimizer to use the composite index. Like using the first name in a telephone directory, it is only useful if you have first found the last name.

  • Composite indexes are useful for tables with multiple column keys. For example, in a sales table the key may be made up of a column that specifies a sales region and another column that specifies an invoice number.

  • Used correctly, composite indexes can increase query performance and reduce the number of indexes that you create on a table.

  • Multiple indexes on the same columns are typically not useful.

Example

Figure 6.13 shows the creation of a nonclustered, composite index on the loan table. The isbn and the copy_no columns are the composite key values. Notice that the isbn column is listed first because it is more selective than the copy_no column.

USE library
CREATE INDEX loan_ident
ON loan (isbn, copy_no)

Figure 6.13: Creating composite indexes.

Figure 6.13: Creating composite indexes.

CREATE INDEX Options

SQL Server offers several CREATE INDEX options that can speed up index creation and enhance index performance over time.

Using the FILLFACTOR Option

Use the FILLFACTOR option to optimize the performance of INSERT and UPDATE statements on tables that contain clustered or nonclustered indexes.

When an index page becomes full, SQL Server must take time to split the page in order to make room for new rows. The FILLFACTOR option specifies, as a percentage, how much to fill the leaf level pages. Leaving free space reduces page splitting, because pages do not start out full, but increases the size of the index. Figure 6.14 shows leaf pages before and after specifying a fillfactor of 50 percent.

Cc917624.f06xx14(en-us,TechNet.10).gif

Figure 6.14: Using the FILLFACTOR option.

Remember that the leaf level pages are the data pages for a clustered index. For clustered indexes, fillfactor affects the level to which data pages are filled. The level, in turn, affects how many rows are stored on a page, which determines the size of the table.

The FILLFACTOR option is applied when the index is created or rebuilt. SQL Server does not maintain the specified percentage of allocated space on the index pages dynamically; this would be counter-productive and increase page splitting.

The value that you specify for fillfactor has the following effects:

  • A lower fillfactor value increases UPDATE and INSERT performance because of reduced page splitting. A lower fillfactor value is suited to online transaction processing (OLTP) environments.

  • A higher fillfactor value increases query or read performance because the rows can be read from fewer pages. A higher fillfactor value is suited to decision support services (DSS) environments.

When you use the FILLFACTOR option, consider the following facts and guidelines:

  • Fillfactor values range from 0 to 100 percent.

  • The default fillfactor value is 0. A fillfactor value of 0 does not mean that pages are 0 percent full; it is similar to a fillfactor value of 100 percent in that SQL Server creates clustered indexes with full data pages and nonclustered indexes with full leaf pages. A fillfactor value of 0 is different from 100 in that SQL Server leaves some space within the upper level of the index tree and leaves room for the maximum size of one index entry in the non-leaf level index pages.

  • The default fillfactor value can be changed at the server level by using the sp_configure system stored procedure.

  • The sysindexes system table has a row for each index that stores the fillfactor value that was applied when the index was last rebuilt, along with other index information.

  • The number of rows that are placed on each page is rounded up. For example, a fillfactor of 65 specifies that 65 percent of each leaf level page should be filled. If SQL Server calculates that 23.2 rows can fit into that space, it will place 24 rows on each page when the index is built.

Using the PAD_INDEX Option

The PAD_INDEX option specifies that free space be left on non-leaf level index pages. The PAD_INDEX option can only be used when FILLFACTOR is specified because the PAD_INDEX percentage value is determined by the percentage value specified for FILLFACTOR.

When you use the PAD_INDEX option, consider the following facts:

  • SQL Server applies the percentage that the FILLFACTOR option specifies to the non-leaf level pages as well as the leaf level pages.

  • SQL Server always leaves enough room to accommodate at least one row of the maximum index row size on each non-leaf level page, regardless of how high the fillfactor value is.

  • The number of rows on the non-leaf level index page is never less than two, regardless of how low the fillfactor value is.

The following table summarizes the effect of FILLFACTOR and PAD_INDEX option settings.

FILLFACTOR percentage

Leaf level pages

PAD_INDEX specified

Non-leaf level pages

Not specified (default to 0)

Fill completely

No

Leave room for one index entry

 

 

Yes

Leave room for one index entry

1–99

Fill to specified percentage

No

Leave room for one index entry

 

 

Yes

Fill to specified percentage (But place at least 2 rows on each page and leave room for at least 1 row on each page.)

100

Fill completely

No

Leave room for one index entry

 

 

Yes

Leave room for one index entry

Example

This example creates the author_id index on the author column in the title table. By specifying the PAD_INDEX option with the FILLFACTOR option, SQL Server creates leaf level and non-leaf level pages that are 10 percent full. However, if you do not use the PAD_INDEX option, the leaf level pages are 10 percent full and the non-leaf level pages are filled almost completely.

USE library
CREATE INDEX author_ind
 ON title (author)
 WITH PAD_INDEX, FILLFACTOR=10 

Obtaining Information on Existing Indexes

You may require information about existing indexes before you create, modify, or remove an index.

Using the sp_helpindex Stored Procedure

You can execute the sp_helpindex system stored procedure to obtain information such as index name, description, and keys for the indexes on a specific table. You can also use SQL Server Enterprise Manager to obtain this information.

Example

This example lists the indexes on the member table.

USE library
EXEC sp_helpindex member

Using the sp_help tablename Stored Procedure

You can also execute the sp_help tablename system stored procedure to obtain information on indexes, as well as other table information.

Example

This example lists information about the member table.

USE library
EXEC sp_help member
  • To use Transact-SQL to create indexes on foreign keys that reference the title table

In this exercise, you will create clustered indexes for foreign key references to the title_no column of the title table. The c:\sqlimpl\exercise\ch06\creaind2.sql script file contains a completed script for this exercise.

  1. Open SQL Server Query Analyzer and verify that you are using the library database.

  2. Write and execute a script that creates the indexes described in the following table:

CREATE CLUSTERED INDEX item_title_link ON item (title_no) CREATE CLUSTERED INDEX copy_title_link ON copy (title_no)

<table>
<colgroup>
<col style="width: 25%" />
<col style="width: 25%" />
<col style="width: 25%" />
<col style="width: 25%" />
</colgroup>
<thead>
<tr class="header">
<th><p>Index type</p></th>
<th><p>Name</p></th>
<th><p>Table</p></th>
<th><p>Column</p></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><p>Clustered</p></td>
<td><p>item_title_link</p></td>
<td><p>item</p></td>
<td><p>title_no</p></td>
</tr>
<tr class="even">
<td><p>Clustered</p></td>
<td><p>copy_title_link</p></td>
<td><p>copy</p></td>
<td><p>title_no</p></td>
</tr>
</tbody>
</table>
  1. Use the sp_helpindex system stored procedure to verify that the indexes were created.
  • To use Create Index Wizard to create indexes on foreign keys that reference the title table

In this exercise, you will use the Create Index Wizard to create a clustered and a nonclustered index for foreign key references to the title_no column of the title table.

  1. Open SQL Server Enterprise Manager, expand your server, and click on it.

  2. On the Tools menu, click Wizards.

  3. Expand Database, select Create Index Wizard, and then click OK.

  4. Use the wizard to create the following indexes in the library database:

    Table

    Column

    Index type

    Fillfactor

    Name

    Loan

    title_no

    Clustered

    Optimal

    loan_title_link

    Loanhist

    title_no

    Nonclustered

    Optimal

    loanhist_title_link

  • To create all other indexes for the library database

In this exercise, you will execute a script that creates clustered and nonclustered indexes. You will verify the successful creation and space usage of the indexes.

  1. Open SQL Server Query Analyzer and verify that you are using the library database.

  2. Open and execute c:\sqlimpl\exercise\ch06\creatind3.sql to create the remaining indexes.

You may notice that indexes are not created for all foreign keys. This is because some of the foreign keys are also primary keys, which had indexes created for them automatically when the primary key constraints were created.

  • To verify that the indexes that you created exist

In this exercise, you will query the sysindexes system table to verify that the indexes you created exist and are correct.

  1. Open and execute c:\sqlimpl\exercise\ch06\verify.sql to list all the indexes created in the library database.

    Are all the foreign key indexes listed for the library database?

  2. Execute the sp_helpindex system stored procedure on the loanhist table.

    What are the results?

  3. Execute the sp_helpindex system stored procedure on the juvenile table.

    Why should you have the juvenile_adult_link index on the adult_member_no column in the juvenile table?

    Why are some of the indexes not unique?

Lesson Summary

Use the CREATE INDEX statement to create any type of index and use the DROP INDEX statement to drop indexes. Unique indexes are required to prevent duplicate values in table columns. Composite indexes are useful when more than one column in a table is used to identify rows and when queries are covered by the composite index. Specifying a fillfactor value when creating indexes on tables that are frequently modified reduces fragmentation, which improves write performance but may decrease read performance.

Review

The following questions are intended to reinforce key information presented in this chapter. If you are unable to answer a question, review the appropriate lesson and then try the question again. Answers to the questions can be found in Appendix A, "Questions and Answers," located at the back of the book.

  1. You are considering creating a unique, composite, clustered index on the company name, last name, and first name columns of a table. What are some important points to consider when you create an index? Is there a better solution?

The above article is courtesy of Microsoft Press https://www.microsoft.com/mspress/. Copyright 1999, Microsoft Corporation.

We at Microsoft Corporation hope that the information in this work is valuable to you. Your use of the information contained in this work, however, is at your sole risk. All information in this work is provided "as -is", without any warranty, whether express or implied, of its accuracy, completeness, fitness for a particular purpose, title or non-infringement, and none of the third-party products or information mentioned in the work are authored, recommended, supported or guaranteed by Microsoft Corporation. Microsoft Corporation shall not be liable for any damages you may sustain by using this information, whether direct, indirect, special, incidental or consequential, even if it has been advised of the possibility of such damages. All prices for products mentioned in this document are subject to change without notice.

International rights = English only.

Link