All About SQL Server Indexes

Article
08/07/2008

Wednesday, March 27, 2002, 10:00 AM PST

Host Guide_KenM

Welcome to today's TechNet Chat. Our topic is All About SQL Server Indexes, sponsored by PASS. Questions, comments, and suggestions are welcome.

Host Guide_KenM

We have a full line-up of SMEs (Subject Matter Experts) for you today:

Host Guest_Lubor_MS

Hello, I'm Program Manager in the Query Processor group.

Host Guest_Kurt_PASS

Hi, I'm Kurt Windisch, PASS Director of program development. Thanks for attending!

Host Guest_Mirek_MS

Hi, I am Mirek Program Manager in Storage Engine

Host Guest_Sameet_MS

Hi, I am Sameet Agarwal, Developer, SQL Server Storage Engine

Host Guest_stephd_MS

Hi, I'm Steve Dybing, a program manager in the Communities group within PSS.

Host Guest_Sri_MS

Hi, I am Sri Developer in SQL Server Storage Engine

Host Guest_Lubor_MS

Q: Guru : Indexes on view will be beneficial in OLTP environment or not

Host Guest_Lubor_MS

A: Usually not. Indexes on views are beneficial more in decision support type of applications.

Host Guest_Sameet_MS

Q: Kevin: (paraphrased by me): Is it a bad idea to have clustered index on fields such that it creates hot pages?

Host Guest_Sameet_MS

A: In SQL Server 6.5 and before, it was a bad idea to have hot pages in clustered indexes because of page level locking. With SQL Server 7.0 and beyond, the penalty for having hot pages is not that severe. In fact in several cases, it might be better for performance to have hot clustered index pages since they are always in the buffer pool

Host Guest_Lubor_MS

Q: OSUcrew : Help? When I tune index with maintenance plan, I got the following error : [Microsoft][ODBC SQL Server Driver][SQL Server]UPDATE STATISTICS failed because the following SET options have incorrect settings:'QUOTED_IDENTIFIRER, ARITHABORT'

Host Guest_Lubor_MS

A: Not sure what you mean by "tune with maintenance plan". Do you have indexed views?

Host Guest_Mirek_MS

Q: SpeedyDB: What is a good guideline for telling when and index is fragmented enough that it should be rebuilt or defragged?

Host Guest_Mirek_MS

A: Run DBCC showcontig It will indicate the density if its around 20-20% you need to defrag

Host Guest_Sri_MS

Q: SUE: What method does SQL maint use to rebuild indexes (via database maintenance plan)

Host Guest_Sri_MS

A: They use the regular DBCC Rebuild index statement to build indexes. Internally DBCC reindex does a drop of the index followed by the create.

Host Guest_Lubor_MS

Q: jude : What is the best way to identify which indexes are being used via Profiler?

Host Guest_Lubor_MS

A: Use the Scans even category.

Host Guest_Mirek_MS

Q: Edster: If we are experiecing alot of page level locks, is it wise to use the sp_indexoptions to turn off page locks?

Host Guest_Mirek_MS

A: yes you can turn page lock off and stay with row level lock

Host Guest_Sameet_MS

Q: blueNote (followup to Kevin's question): What are hot clustered index pages?

Host Guest_Sameet_MS

A: Hot pages are pages in clustered indexes which most users will use at the same time. If the clustering keys are chosen in a particular way (like monotonically increasing identity columns
it is possible to make some pages much hotter than others

Host Guest_Sri_MS

Q: Busmaster Jones: For tables with a single index, I have always used a clustered index in such cases. Co-workers are concerned about hotspots. With a good drive system, should the rule be clustered index or unclustered to avoid hotspots for active tab

Host Guest_Sri_MS

A: If you have a single index, clustered index should be the way to go if you are just worried about hotspots. Having a non-clustered index just will move the hot spot to the non-clustered if you are contending on the key values. If you have queries that can be satisfied by the key columns, non-clustered indexes may be better.

Host Guest_Lubor_MS

Q: Tom_Moreau : What criteria do you use for a clustered index when you have anumber of columns that are each used in range queries? Is there some objective criteria that can be used to choose the best one?

Host Guest_Lubor_MS

A: Higher cardinality (distinct values) should go first. I assume you are talking about a single query with more predicates.

Host Guest_Mirek_MS

Q: Bgris: Can someone explain index padding and how it improves index performance (or not)?

Host Guest_Mirek_MS

A: Padding allows to set the fillfactor for higher b-tree level (above leaf level). Usually you don't have to use it, but sometimes it may help

Host Guest_Sameet_MS

Q: didi : Using Data Mart tables for reporting we need to group the data by multiple columns. SQL Server recommends index on all columns that are used in the group by. But such an index will be very large and we will need multiple indexes. What do you reccomend?

Host Guest_Sameet_MS

A: Creating a single index on all the columns in the group by will give the best performance for the group by query. However, if it is not possible to create an index on all the columns, you can still improve the performance of the query by creating indexes on a prefix of the group by columns.

Host Guest_Lubor_MS

Q: don : Is there a usage counter stored somewhere is SQL which tell how many times a particular index has been used?

Host Guest_Lubor_MS

A: Not directly. Use Profiler (answered similar question before in this Chat).

Host Guest_Sameet_MS

Q: kittu: which is better, recreateing the index or dbcc reindex?

Host Guest_Sameet_MS

A: dbcc reindex and creating an index with a drop existing clause are exactly identical internally. So they will give the same performance

Host Guest_Sri_MS

Q: GamCall : I am unclear of the difference between clustered & unclustered indexes & which is best to use in the case of single & multi-field primary keys.

Host Guest_Sri_MS

A: Clustered indexes, have the data of the table in the as its leaf level. Non-clustered indexes have only the key column values and a pointer to the data record. Generally having large clustering keys is not very good, as the clustering keys are also added to all the non-clustered indexes. The number of key columns should not be a consideration, rather the size of the key column values.

Host Guest_cesar_MS

Q: pardy : Let's say I have an index on LastName. My queries says "Select * from this_table where LastName = 'Pardy'"

Host Guest_cesar_MS

A: The index structure is search to directly find rows where LastName is 'Pardy'. Only those pages will be read.

Host Guest_Sri_MS

Q: Techno : do nonclustered indexes point to the "leaf" level of a clustered index

Host Guest_Sri_MS

A: No, actually they point to individual data records of the clustered index. They contain the clustering key column values, which identifies the clustered index row.

Host Guest_Sameet_MS

Q: BusmasterJones : Sameet_MS said "With SQL Server 7.0 and beyond, the penalty for having hot pages is not that severe." So, it is worth having the clustered index rather than making the hop from NC index to the data page?

Host Guest_Sameet_MS

A: Very likely. In general, an non-clustered index to clustered index fetch would be quite expensive. You are better off creating a clustered index for it

Host Guest_Mirek_MS

Q: SpeedyDB: Is there a guide or article that explains maintenance routines one would use for large tables (50M+ rows) and their indexes? Using DBCC SHOWCONTIG and DBCC INDEXDEFRAG are good but has anyone come up with some way to automate it all?

Host Guest_Mirek_MS

Up front I am not aware of any automated process, but you can definitely use EM to create your own maintenance plan using DBCC Showcontig, anf based

Host Guest_Kurt_PASS

Q: Gero : I am pretty experienced as Access coder; but newbie to SQL7. Am halfway through Microsoft Press "SQL Server 7.0 Implementation Training. Am I likely to be prepared to moving my apps to SQL or will I need other books, which? Advice please.!

Host Guest_Kurt_PASS

A: Experience will be the best teacher :). I'd also recommend "Inside SQL Server 2000" by Kalen Delany, "Programming SQL 2000" by Rob Vieria and subscriptions to SQL Mag and Visual Studio Mag.

Host Guest_Mirek_MS

A: Up front I am not aware of any automated process, but you can definitely use EM to create your own maintenance plan using DBCC Showcontig, and based on the Density (below 20%) you can run DBCC INDEXDEFRAG

Host Guest_Sameet_MS

Q: iforbes : Is there a way to programmatically do a DBCC to tell if defrag is necessary?

Host Guest_Sameet_MS

A: Yes. Please look at example 3 in the books online for SQL Server 2000 under DBCC SHOWCONTIG

Host Guest_Lubor_MS

Q: boanerges : are theere any problems that could occur from rebuilding indexes too frequently?

Host Guest_Lubor_MS

A: No if you don't mind the maintenance cost and locking the table while the create runs.

Host Guest_cesar_MS

Q: PKelley : <pt. 1-3, 2-3, 3-3> How are multi-column indices used in query execution.

Host Guest_cesar_MS

A: An index will be used in three ways: To provide columns, to enable seek, and to provide sorting. A multicolumn index, say on (A, B) will be used to seek conditions such as A = 5 and B > 7. In general, the seek (direct access to qualifying pages) will be done on a multi-column condition for which the initial columns are equalities, and the last is non-equality. If you have a condition like A > 5 AND B > 7, then we will seek only on A > 5, and then will evaluate the condition B > 7 on the rows found. If you have a condition on the second column, say just B > 7, the index will be scanned. So, you are better off with one index if you know your queries will always have an equality condition on the earlier columns, say A = x in this case.

Host Guest_Sameet_MS

Q: Pardy : I have a table with approx. 10 million records with no keys or indexes. How can I speed up queries retrieving data from these miserable tables?

Host Guest_Sameet_MS

A: In general, you will need to analyze your query pattern and figure out the right indexes to build. Without indexes, there isn't a general way to speed up access to data

Host Guest_Sri_MS

Q: Pat: When you drop & recreate a clustered index will the unclustered indexes also be defregged?

Host Guest_Sri_MS

A: If you drop a clustered index, the non-clustered indexes are automatically rebuild. On the other hand if you rebuild a clustered index using create index with drop existing, or dbcc rebuild, the non-clustered indexes are rebuild only if the clustered index is non-unique.

Host Guest_Mirek_MS

**Q:**DaveLS : Can the hosts recommend their favorite reference books or materials regarding SQL indexing?

Host Guest_Mirek_MS

A: Inside SQL Server 2000 by Kalen Dalaney

Host Guest_Lubor_MS

Q: Edster : Does SQL Server support bitmap indexes?

Host Guest_Lubor_MS

A: You cannot create them. Engine does it during query execution (as temporary structures). There is a white paper on it I believe.

Host Guest_Sameet_MS

Q: Sem : What field is more preferable for using in an index, for example Char or Varchar

Host Guest_Sameet_MS

A: You should use a char datatype only if the data in that column will all be of the same size. For example, employee-ids, SSNs, etc. For varying size data, it is, in general, preferable to use varchar instead of char

Host Guest_Sri_MS

Q: Denny : We have a 6 million row table and are experimenting with non-clustered indexes. Unfortunately, the indexes we have added have actually slowed response query time. Why would the addition of indexes degrade query performance?

Host Guest_Sri_MS

A: Adding additional indexes involved more maintenance. If your application is update intensive, you will see degradation if all the indexes have to be updates. Is you application update intensive

Host Guest_Lubor_MS

Q: jude : I have been having an obscure problem with Index Tuning Wizard lately...Error Message in gray box "Error when trying to optimize query", that's it....

Host Guest_Lubor_MS

A: Try to isolate the query and tune it separately from Query Analyzer.

Host Guest_cesar_MS

Pardy : cesar_MS: how does SQL know to only look for 'Pardy'?

Host Guest_cesar_MS

A: Because we analyze the predicates in the query, which specify in your example LastName = 'Pardy'

Host Guest_Sameet_MS

Q: Fernando_MVP : Q. From CSacristan: is there any guideline to use composite indexes or multiple single-column indexes when queries incolve conditions on several columns?

Host Guest_Sameet_MS

A: A multiple column index is preferable if a lot of queries lookup using the exact set of columns or a prefix of those columns.

Host Guest_Lubor_MS

Q: Tom_Moreau : My queries would use one predicate or the other; sometimes both.

Host Guest_Sameet_MS

A: If there is no common set of columns among the queries, you are better off creating multiple single column indexes and let the optimizer choose the right set. I think you should use the index tuning wizard as a guide for picking the best set of indexes for your workload

Host Guest_Lubor_MS

A: If you don't mind to pay the price, use all combinations. This is a classic cost/benefit problem.

Host Guest_Sri_MS

Q: Edster : Could you list top 3 index enhancements in the next generation of SQL Server (Yukon)?

Host Guest_Sri_MS

A: Edster, we cannot currently discuss Yukon Features.

Host Guest_cesar_MS

Q: BillAtHCI : Does the order in which you specify your "Where..." clauses matter?

Host Guest_cesar_MS

A: No.

Host Guest_Sri_MS

Q: Fernando_MVP From ecordov

A: Do DBCC DBREINDEX update statistics as well? or do we need to run UPDATE STATISTICS after it?

Host Guest_Sri_MS

A: Yes statistics are automatically updated on rebuild.

Host Guest_Sameet_MS

Q: sqlpal : What would be the limit on no. of indexes on table we can have beyond which we there would be considerable decrease in performance on INSERT/UPDATE/DELETE. Assume table is having 75 columns ?

Host Guest_Sameet_MS

A: Every non-clustered index you build slows down the performance of I/U/D while improving the performance of the queries. The number of indexes you build is basically a tradeoff between how much you want to improve the query performance at the expense of I/U/D

Host Guest_Lubor_MS

Q: Vera : Is there a way to find out when a index or statistic was last updated?

Host Guest_Lubor_MS

A: dbcc show_statistics for stats and index statistics. Index itself - no.

Host Guest_Sri_MS

Q: Techno : Two processes updating two tables. Two update statements accessing the tables in the same order but using different indexes, can deadlock occur?

Host Guest_Sri_MS

A: Yes, if the updates going through different indexes update columns in the other index you can get into deadlocks.

Host Guest_Sri_MS

Q: Pardy : Let's say I have an index on LastName. My queries says "Select * from this_table where LastName = 'Pardy'" - does SQL have to read through each previous record in this table or will it automatically position to PARDY (where the key is on LastName).

Host Guest_Sri_MS

A: It will position to PARDY using the index.

Host Guest_Sameet_MS

Q: boanerges : In a situation where a table has a bit column that seperates 5% or less of the table (ie. open orders vs. a history of closed orders in a table), is that a good time to break the general rule and use an index on a bit?

Host Guest_Sameet_MS

A: It basically depends on how important that index is for your queries. There is a considerable space penalty for building an index on a bit column (there is a key or RID stored for every row). But if the index is important, go ahead and build it.

Host Guest_Lubor_MS

Q: kittu : table have columns like C1 int, C2 char ,C3 int..., having cluster index on column C1 where caluse , using both columns C1 and C2 . which index is better ,index on both c1 and c2 or index only on c2 ( becasue c1 is having alrady cluster index

Host Guest_Lubor_MS

A: Index on c1, c2

Host Guest_Mirek_MS

Q: only_SQL : Can someone please explain the algorithm of Indexes ????

Host Guest_Mirek_MS

A: Index assess allows a quicker access to the underlying data w/o scanning sequentially the whole table. The most common technique to store an index is called B-tree method. Fore more information on SQL Server indexes please use Inside SQL Server 2000 b

Host Guide_KenM

We are going to have to wrap up this chat in about 12 min. We are working on answers to some last questions though. Sorry that we haven't been able to get to everyone-it's been a very busy chat.

Host Guest_Lubor_MS

Q: Fernando_MVP : Q. What is the maximum size of a statblob value you ever came accross? I presume it would be near to (900 + step overhead) * 200 + stat overhead, isn't it?

Host Guest_Lubor_MS

A: That's correct for SQL Server 2000 (approximately)

Host Guest_Sri_MS

Q: Pardy : Sri_MS: please explain - how is SQL getting to PARDY - algorithm maybe?

Host Guest_Sri_MS

A: It is the standard BTree algorithm. You can find this in any database book.

Host Guest_Sameet_MS

Q: didi : What means creating index on a prefix of the columns?

Host Guest_Sameet_MS

A: Let us say you want to group by columns (A, B, C). You could build an index on (A, B, C). But if that index is too large, you can build an index on (A, B) or just A and the optimizer will use that index to do a better group by

Host Guest_Lubor_MS

Q: SQL2Kcoder : If 40 - 50% of the data in a column is "NULL" or "exact same" would you index it ????

Host Guest_Lubor_MS

A: Yes. But it also depends on queries. In general, a good idea if in predicates.

Host Guest_Sameet_MS

Q: Vera : Will the next release relief the 900 bytes key size max?

Host Guest_Sameet_MS

A: We are currently looking at various improvements to indexes in the next release of SQL Server. However, we are not yet in a position yet to discuss what specific features are available in the next release.

Host Guest_cesar_MS

Q: Pardy, let_me_in, etc = How is a predicate analyzed What is the index seek algorithm?

Host Guest_cesar_MS

A: This requires more space than this chat. We will look into providing pointers to documents.

Host Guest_Sri_MS

Q: randy : Any downside using (nolock) on non-critical queries? Do Latches increase with (nolock).

Host Guest_Sri_MS

A: Other than in-correct results and non-deterministic behavior no. We take latches to protect data movement when reading the page in memory. So this will be taken always and should not increase with nolock hint.

Host Guest_Mirek_MS

Q: deneshac : clustered index??

Host Guest_Mirek_MS

A: Allows physically to order a table in a form of an index (b-tree). When in place allows very quick access (using clustered key) to the underlying data

Host Guest_cesar_MS

Q: Lewie : Is it possible to index a union view, if so what is the best practice for doing this.

Host Guest_cesar_MS

A: No, union views are not indexable.

Host Guest_Lubor_MS

Q: deneshac : Can Lubor_MS follow up on the answer to kittu? You say if there is a clustered index on C1 and both C1 and C2 is in the Where clause, to also have an unclustered index C1, C2. But earlier in the chat I saw that the unclustered index would

Host Guest_Lubor_MS

A: Having the C1 explicitely in the index definition makes it part of the key so we may use it to navigate.

Host Guest_Sameet_MS

Q: Vera : When does placing an index on a separate file group make sense? Will it help in parallel the IO? (Assuming the filegroups are on different disks with a decant controller inftrastructure)

Host Guest_Sameet_MS

A: A filegroup gives you the ability to control the placement of data and the amount of I/O for it.
If there is a particular table or index whose I/Os are more performance critical than the others (more important or a lot of I/Os, etc), then you should place them on a separate filegroup with their own disks, etc

Host Guest_Sameet_MS

Q: Vera : Are there some best practices with regards to index maintenance when bulk loading? When do I drop and rebuild, when do I load into the index?

Host Guest_Sameet_MS

A: In BOL under "Optimizing Bulk Copy Performance" and "Bulk Copy Performance Considerations" there is a lot of detail for these.

Host Guide_KenM

We're finishing up our last questions now.

Host Guest_Mirek_MS

Q: JZ : when you create a index on a view, does it mean the view will become a table?

Host Guest_Mirek_MS

A: It looks like a table since we have to materialize it (but it's not a table)

Host Guest_Sri_MS

Q: dbguru : I have a colleague that insists that Primary Key Auto Increment Integer fields should always be the No.1 candidate for clustered indexes. I disagree and tell his that processes dependent on the table should dictate but what's your opinions?

Host Guest_Sri_MS

A: Having a clustered index on an auto increment has the down side that the tail end of the index becomes a hot spot as all inserts will go to it. It has the upside of having the key size small. This has benefits as all non-clustered indexes contain the clustered key columns If they are indexed on some other key columns the inserts now are dispersed. I agree, choosing the key is application dependent. If you just choose it on auto increment integer, you can only use the clustered index for lookups.

Host Guest_Lubor_MS & Fernando_MVP

Q: From CSacristan: is there any guideline to use composite indexes or multiple single-column indexes when queries incolve conditions on several columns?

A: If a query is using multiple predicates then using a multi-column index is better. If there are many queries with different sets of columns used in predicates it may be hard to come up with complete coverage of various permutations. Then use single column indexes, and add multi-column for important or long running queries. Index Tuning Wizard is making these kind of recommendations automatically.

Host Guest_Lubor_MS

Q: rtalmage : How does the query processor make use of _WA statistics?

A: Same as user created statistics. The _WA name is generated and it is intentionally complex to lower probability of a conflict with user statistics name.

Host Guest_Lubor_MS & Fernando_MVP

Q: from lalA: When it is convinient to change the JOIN strategy to get benefits from existing indexes?

A: This is a complex issue automatically handled by the query optimizer based on the estimated cost of the alternatives.

Host Guest_Lubor_MS

Q: gamcall : Would like to know the best vartype for a field in a primary key: varchar or int? & how significant would index/lookup performance difference be?

A: Int is better than varchar. Then shorter varchar is better than longer. The difference depends on the operations performed but most often is not significant if the varchars are not very long.

Host Guest_Lubor_MS

Q: Tom_Moreau : Follow-up: If I have multiple columns, each of which can be used in a range query either individually or in tandem, which is considered the best for clustering - the narrowest one or the most selective?

A: The selectiveness is more important for query performance. However, the situation gets more complex if you are also considering secondary indexes. It may turn to be better to use the narrower column for clustering and use a secondary index with the most selective column as the leading one. Index Tuning Wizard may help here.

Host Guest_Lubor_MS & Fernando_MVP

Q: BillAtHCI : Is it ever helpful to index a column that has only 2, 3 or 4 discrete values? Is there a rule-of thumb?

A: It is not very useful. A good rule-of thumb is the number of rows per page. The number of different values should be several times greater than this number before the index may be efficient.

Host Guest_Lubor_MS & Fernando_MVP

Q: dick : I have an index on firstname, fresh statistics, SS 2000, 1 million

rows: name1 with 4000 matches uses the index, name2 with 7000 matches does tablescan taking 22 sec though with hint to use the index it takes just 1 sec. Is there a way to have the optimizer lean more toward choosing the index?

A: Most probably the index seek is followed by fetches of the base table rows. You will make the optimizer lean toward choosing the index if you add to it the column(s) that are missing there for the query.

Host Guide_KenM

Thanks for joining us today and thanks for the questions. It's time for us to go now. We're sorry we haven't been able to get to everyone. Please come back for our next SQL/PASS chat next month.

Host Guest_Kurt_PASS

Thanks for all the great questions!! Sorry we didn't make it through all of them. We will have some additional answers to questions we didn't get to in the transcripts on the PASS site.

All About SQL Server Indexes

Additional resources