SQL Server Query Optimization and Tuning
Published: Published September 9, 2003
Please note:Portions of this transcript have been edited for clarity
Hosts:
- Lubor Kollar, Lead Program Manager
- Conor Cunningham, Technical Lead
- Milind Joshi, Technical Lead
- Campbell Fraser, Software Design Engineer
- Stephen Wynkoop, http://www.sswug.org
- Stephen Dybing, MVP Lead
Moderator: Stephen_D (Microsoft)
Welcome to today's Query Optimization and Tuning chat. I will ask the hosts to introduce themselves.
Host: Swynk (MVP)
Hi everyone, I'm Stephen Wynkoop, editor of the SSWUG.ORG newsletter and MS SQL Server MVP.
Host: Lubor (Microsoft)
I'm Lead Program Manager in the SQL Server Engine in charge of Query Processor.
Host: Conor (Microsoft)
Conor Cunningham, SQL Server Query Processor - Query Optimization Technical Lead, I'm a developer on the Query Optimizer for SQL Server. I work on a number of different areas, including but not limited to distributed queries. I like slow walks on the beach and football games. :) Mostly, I make queries go fast...
Host: Milind (Microsoft)
I work on the query optimizer.
Host: Campbell (Microsoft)
Hi, I'm Campbell Fraser, a developer in SQL Server's Query Optimizer. My main focus is on Costing/Cardinality Estimation and Statistics.
Moderator: Stephen_D (Microsoft)
And... I am Stephen Dybing, SQL Server MVP Lead. Glad you all could make it today!
Host: Lubor (Microsoft)
Q: When we run statistics why it only create 200 buckets..can we control the number and size of buckets?
A: This is optimal number considering the size of the columns, and size of the histogram. There is always a tradeoff between system complexity and num
Host: Campbell (Microsoft)
Q: Can I print out the execution plan and if yes, how ?..
A: Before executing the query run "set showplan_text on". After running the query, run "set showplan_text off".
Host: Swynk (MVP)
Q: Frank : ...structure, is it a bad idea to have a self - audit table.. that is the table accepts inserts but no deletes, so to find the current value of anything you'd have to run a max of the time on that id?
A: Frank: I would say it's a design issue - it will depend on the number of rows, how often you'll be doing this, etc. That said, I'd suggest you look at triggers and pull pre-change data and insert it into a secondary table. This would keep the primary table with only current data, and your historical values could then be pulled from the secondary table.
Host: Campbell (Microsoft)
Q: Can I print out the graphical execution plan of the query?..
A: Yes, in the query analyzer, click the pane with the graphical view and then select print.
Host: Lubor (Microsoft)
Q: what do update statistics do that may hinder performance on an OLTP system?
A: If you are accessing a large table and auto-update statistics kicks in it will cause the query that caused the stats to be updated wait.
Host: Milind (Microsoft)
Q: HARV: In our production system due to histogram information optimizer don't use index when should have
A: I am not sure I completely understand your question. Why do you believe the histogram is the culprit? Is your histogram up to date? Thanks.
Host: Campbell (Microsoft)
Q: Why the numbers (cost, CPU) in the execution plan differ from sql 7 to sql 2000?
A: There are many reasons that costs have changed. SQL 2000 includes many bug fixes in the areas of costing / cardinality estimation / statistics. All of these can affect query cost. There are also extensions to the cosint model (such as parallel query costing) that were not present in 7.0
Moderator: Stephen_D (Microsoft)
Q: My company is considering upgrading from SQL Server 7.0 to SQL Server 2000. Specifically, has anyone run into issues regarding security and user permissions when they are migrated over during the upgrade?
A: You might get lucky and get responses from someone else attending this chat, but the folks we have on this end are not experts in that area. This would probably be a good question to ask in the microsoft.public.sqlserver.security newsgroup, available via http://www.microsoft.com/technet/community/newsgroups/server/sql.mspx
Host: Lubor (Microsoft)
Q: @-vars in WHERE Clause: In SQL2K if I write ...WHERE (@Flag=1 OR col=1), run time is fine. If I reverse clauses (col=1 OR @Flag=1), query appears to hang. Why is this?
A: You should compare the two plans first if they are the same. Let us know.
Host: Conor (Microsoft)
Q: HARV: When we run queries using hyperthreading servers...why we don't see any performance improvement but only see degradation sometimes....is it due to hyperthreading or parallelism?
A: Hyperthreading is an interesting new technology that has the potential to yield good performance gains on a number of applications. However, it is important to understand that hyperthreading does NOT guarantee the same performance as an existing machine with two processors. The purpose of hyperthreading is to allow the CPU to remain idle less - however, it doesn't guarantee the throughput of the system (disk IO, memory bandwidth, etc). However, that may or may not be your performance issue on your query.
Host: Campbell (Microsoft)
Q: Campbell_MS.. But I may not be able to print the tooltips too right ?..
A: Correct. The tooltip information can be printed in a text view. Before running the query, run "set showplan_all on". Use "set showplan_all off" to disable.
Host: Lubor (Microsoft)
Q: Does a clustered index on an identity column create contiguous data pages?
A: It will populate pages one after another. But this is true for any clustered index.
Host: Lubor (Microsoft)
Q: What about auto create statistics? Is it advisable to turn them on or off on an OLTP system?
A: You should leave them on.
Host: Conor (Microsoft)
Q: BusmasterJones: Running a report (select only) on very large (70 mill+ row) table using read committed is causing locking issues that make users doing inserts and updates timeout. Is there a work around using isolation levels?
A: This is not really a query optimization question, but I'll do my best... Locking issues on long-running queries can be difficult. You can run with the nolock hint or you can use replication to have a reporting server that is different from your insert update server. This separates the load.
Host: Campbell (Microsoft)
Q: SQL: What’s better..showplan_text or showplan_all ?
A: showplan_text just shows the plan. showplan_all shows much more detail like estimated cost (IO / CPU) etc.
Host: Lubor (Microsoft)
Q: So GertD, why did Lubor_MS say you can print it ?
A: It was not Lubor... I print it with the Windows "print screen" (or print window). But Campbell claims you can do it even from window.
Host: Conor (Microsoft)
Q: OK, but if in sql 7 I have something like 0,045 in subtree cost, in 2000, this number shows 0 or 1, the percentajes jump from 0 to 600%, there no values like 15% , 60%..... it's like the statistics where wrong.....i updated all the statistics...
A: Please clarify your question. Are you asking a question about the difference in the costing framework between SQL 7.0 and SQL2000?
Host: Lubor (Microsoft)
Q: Big picture question on tuning. After carefully structuring your tables, selecting indexes and putting non-clustered indexes on separate disk drives from data tables, what are the other procedures for improving performance? With cost based optimizers t
A: Use Index Tuning Wizard. Also it is not always good to separate the indexes from tables. The more drives you use for each table/index, the better. If you want to see how to build huge systems, go to www.tpc.org and you will find there SQL Server databases with terabytes of disks and data (in full-disclosures for TPCH for example).
Host: Milind (Microsoft)
Q: HARV: We are finding lot of deadlock issues on production system...is there specific readon that row lock is always upgraded to table lock and not page level first
A: This is a locking question and we don't have anyone with expertise on this topic hosting right now. Thanks
Host: Conor (Microsoft)
Q: WaldenL: UDF in WHERE clause, shortcut evaluation and performance: I have a UDF that tells me if a user is authorized to a row. The answer to that question involves IO so I don't want to test, unless I know I've selected the row. If I have a where clause like "select ID, fld 3 where fld1=8 and fld2=3 and dbo.udf_IsAuthorized(ID, 1)= 'Y'" will SQL eval my UDF for each row, or only for those rows that pass the other tests?
A: There is no guarantee of the order of execution of scalar predicates in SQL 2000
Host: Lubor (Microsoft)
Q: Our developers make an identity column the primary key and then add indexes to use in accessing the data. Is this a good technique as far as performance?
A: It depends on the queries. If those go after singleton rows, this is good design.
Host: Conor (Microsoft)
Q: Vinay: We provide a shrinkwrap solution to our customers.. would you suggest, i keep autoupdatestats on or off ??..
A: Autostats should be on in all cases unless you have a very, very good reason to turn it off.
Host: Campbell (Microsoft)
Q: SQL: Could it be possible that certain auto create stats could lead to an inefficient query plan?
A: Unfortunately yes. Anything that could change plans could lead to a worse plan for a query. There are complex data situations that are not effectively modeled and any plan affecting change could expose us in such a way as to cause a bad plan. Sometimes we pick good plans despite bad situation, by luck.
Host: Lubor (Microsoft)
Q: What determine how many steps SQL2000 concluded in statistics? Have seen 20million+ rows table, but about 60 steps in statistics
A: If there is limited number of different values or the ranges are "very similar" SQL Server shrinks the number of steps.
Host: Lubor (Microsoft)
Q: What are some best practice methods when creating indexes on views?
A: Best is to use Index Tuning Wizard.
Host: Conor (Microsoft)
Q: HARV: we have to use FORCE ORDER hint lot of times since optimizer is not using correct plan in most cases. How optimizer decide whether to use index or not ...how selective should value of index column
A: If you are seeing queries that have incorrect join orders, please make sure that the statistics are all up-to-date on all the columns participating in the query. You can try creating statistics manually with fullscan to see if that helps or not. If you have further issues with the query plan, please send a repro query/database to Product Support.
Host: Lubor (Microsoft)
Q: Could somebody explain when lock escalation really happens? I experiences it at random times and can not explain, only that it changed between service packs in SQL 2000, we use Enterprise Edition if that matters
A: There are several factors. Available memory is one of them. So that may change even from run to run of the same query.
Host: Milind (Microsoft)
Q: Is there a better way (fast) to do paging with SQL Queries??
A: Currently there isn't. We are considering alternatives for the next version.
Host: Conor (Microsoft)
Q: SydWayne: Showplan GUI: are there specific session settings required (e.g ANSI_NULLS ON, ANSI_PADDING ON) to produce valid output. I do OK with ANSI_NULLS OFF but not with ANSI_NULLS ON. Is there a documentation source for this?
A: We don't know of any particular reason why this wouldn't work for any value on each of these settings. If you have a problem where one of these settings doesn't work, please contact PSS
Host: Conor (Microsoft)
Q: Vinay: Conor_MS: Would you advise the same for the other auto maintenance functions.. like auto shrink db..
A: Yes, we recommend they all be left on by default unless you have a very specific reason to turn them off, even in an OLTP environment.
Moderator: Stephen_D (Microsoft)
Q: How can you find the no. of levels from the root of the index to the leaf?
A: Take a look at DBCC SHOWCONTIG in Books Online. I believe it will provide what you're looking for.
Host: Milind (Microsoft)
Q: SQL: If we have identical h/w and I restored a backup from one server and restored to another, are there possibilities where execution plans could be different
A: I can't think of any as long as you have the same version of the server. It will of course change once you start doing updates to the database.
Host: Conor (Microsoft)
Q: pmackay: my english...sorry...... when I see the tooltip, not the execution plan, the tooltips shows other statistics....the cost, percentage etc, are very different in 7.0 or 2000.
A: There isn't a guarantee that the costs are identical between 7.0 and SQL2000. It's possible that we've found that disks have gotten faster compared to memory, etc. We tune the costing for a broad range of queries to get reasonable performance for the largest set of queries.
Host: Lubor (Microsoft)
Q: How do I do FAST bcp in SQL 2k -non-logged?
A: Use TABLOCK and minimally logged.
Host: Lubor (Microsoft)
Q: how to capture showplan text in profiler ? Placing the event in there does not seem to show me the showplan in textdata...what else do i need?
A: That should do it. Try to run dbcc freeproccache before capturing the event.
Host: Conor (Microsoft)
Q: Martin: What are the limitations of the Tuning Wizard? I have had many cases where the wizard makes no suggestions for large complicated queries.
A: The tuning wizard tries to create a series of indexes and see if the queries can be optimized to have a lower cost. However, complex queries can make it difficult to find the right indexes to create. It might also be that what you are asking is just hard to compute quickly.
Host: Campbell (Microsoft)
Q: About execution plan: When I run a query with "Show execution plan" turned on, would the execution plan show the actual process in which the data engine got me the resultset?..
A: No, the query analyzer makes 2 calls to the server; one to run and one to get the plan. However, the plan will be the same as it will be taken from the plan cache after the first call.
Host: Milind (Microsoft)
Q: Vinay: DBCC DROPCLEANBUFFERS - Does that clear SQL Server's cache completely ?.. or for the session in which it is run ?..
A: yes it cleans it completely.
Host: Lubor (Microsoft)
Q: autostats use lot of cpu cycle..if this statement is true..how it is beneficial to leave autostats open when production systems are already short of cpu time...isn't it better to do at odd times manually
A: If you are low on CPU at certain periods of time you are correct it would be better to update them manually when the system is less busy.
Host: Conor (Microsoft)
Q: Sami: Does the order of tables in the from clause or the order of columns in the where clause have any impact on the performance?
A: In general, it should not matter - the optimizer will reorder the tables based on cardinality and estimated selectivity of predicates. Then it will consider a number of reorderings during optimization (but not all).
Host: Conor (Microsoft)
Q: Vinay: We have a lot of UDF's that we're using.. which I think is amounting to row based processing on a lot of our tables.. i'm looking for some best practices on usage of UDF's.. help..
A: UDFs are opaque and hard for the query optimizer to do anything with. In general, if these are expensive, this can cause some queries to pick bad plans (as the optimizer doesn't reason about them). If you are using table-valued UDFs, please try to make sure that they are in-lined. This typically helps improve performance significantly.
Host: Campbell (Microsoft)
Q: Under what condition SQL2k update the statistics? How does it know whether it is outdated or not? Had seen the statistics not being updated for over 10 days while having lots of read/write activities.
A: Auto stats must be enabled for the particular database. See sp_dboption '<db>'. Auto-update stats is based on the percentage of the table that is updated and whether or not the relevant column is updated.
Host: Lubor (Microsoft)
Q: Lubor_MS: You said, " it is not always good to separate the indexes from tables. The more drives you use for each table/index, the better." These two statements appear contradictory to me; please clarify when it is not good to separate indexes from tables
A: If I have 10 disks it is usually better to use all 10 for the whole table and for each index. That way I will minimize probability to hit the same disk twice.
Host: Milind (Microsoft)
Q: Vinay: Milind_MS: So, can I say that running a DBCC DROPCLEANBUFFERS is as good as restarting SQL Server?
A: That depends on why you wanted to drop the buffers.
Host: Lubor (Microsoft)
Q: Lubor we have 16 GB of memory with AWE enabled and we see no memory contention, it looks like when I hit X number of row locks the query escalates to a table lock. I heard different numbers, Kalen Daleny said something like 4500 in SP3 is that correct
A: The additional AWE memory cannot be used for locks, only for pages.
Host: Conor (Microsoft)
Q: franky: There was a part in the execution plan called "bookmark lookup" which SQL thought it took a lot of time. Table scan was a better choice while in actual run, forcing it to index seek works much better. 10 mins vs 10 seconds
A: If you have specific repro queries/databases, please send them to PSS - In general, we would pick a bookmark lookup plan if we thought the cardinality would be reasonably low. Another common problem here is that the collation of the index does not match what is needed in the query (and therefore we don't consider the index).
Host: Campbell (Microsoft)
Q: Vinay: Campbell_MS: Following up on the Show execution plan - does that mean that the Execution plan I see is just an estimated?
A: No, that will be the actual execution plan used.
Host: Lubor (Microsoft)
Q: is there any chance that we can see better partitioning methods in future versions of sql server
A: Yes.
Host: Conor (Microsoft)
Q: HARV: in our case 90% time indexed views are very helpful but in 10% cases optimizer is not using good plan when indexed views are involved....is there any good practice suggestion when using indexed views except running update statistics
A: Use them 90% of the time ;).. Seriously, you should use indexed views in cases where you know there is an expensive subtree or where you find that the optimizer didn't quite pick the right plan in a complex scenario. You can use the NOEXPAND hint in the 10% of the cases where it isn't being picked and you want it to be.
Host: Conor (Microsoft)
Q: Vinay: Conor_MS: Following up on the UDF's question - I've added primary key's to the table returned and this has helped the dataengine to do clustered index scans which have helped significantly..
A: It can, but generally, because it would pick a different plan that isn't seeing the expensive UDF run on as many rows.
Host: Lubor (Microsoft)
Q: Lubor, we are not even filling the 3GB during normal operation and I am still escalating, why is the question?
A: SQL Server does not wait for hitting the limit. It has to think about subsequent possible queries; therefore the use of memory of single query is limited.
Host: Campbell (Microsoft)
Q: HARV: we see lot of times that cost shown by execution plan for particular step is not correct then we have to make optimal decision on our knowledge of application. is there any known issues with cost parameter in execution plan
A: There are known deficiencies in the model. If your scenario is about parameters to an SP, then there is the following issue - The SP might be optimized with one parameter and produce a plan not suitable for another parameter. For variables, we optimize without knowing the value so can easily get a bad plan as a consequence.
Host: Conor (Microsoft)
Q: WaldenL: How can I get an execution plan for a UDF? I have a UDF we use for authority checks. If I run the UDF by itself I see "constant scan" as the type. If I run it in a bigger query I see "filter". Neither of these tell me about the IO _inside_ the UDF
A: You can use "set statistics profile on" to get a rough idea how much time is spent in each operator - there is no mechanism to describe the runtime cost of each individual scalar UDF.
Host: Milind (Microsoft)
Q: Cowboy : Exactly what are the thresh-holds that optimizer uses to move a query from phase 2 to phase 3 optimization and is it possible to determine at what phase the optimizer and is it possible to look at an already stored execution plan.
A: The cost of the plan at the end of phase 2 determines if we will go into phase 3. There is no easy way of determining what phase the optimization went to. Try dbcc proccache.
Host: Milind (Microsoft)
Q: Vinay: Milind_MS: Following up on the DBCC question- I'm trying to make sure I can test different scenario's of queries without being bothered about cache..
A: I think you will also have to use dbcc freeproccache along with dbcc dropcleanbuffers.
Host: Conor (Microsoft)
Q: HARV: when we do select count from big_table having about 40M rows..we see 265 scans when running set statistics io on how is it possible to have 256 scan on 2 cpu machine
A: We believe the reason for this is that the storage engine is prefetching for you. Note that the scans can be running in different threads or fibers - while some are waiting on I/O, the others can run.
Host: Lubor (Microsoft)
Q: Do logical reads in profiler or set stats io on reflect even pages being read from disk to buffer
A: No. Logically I can read from the same page many times.
Host: Campbell (Microsoft)
Q: SQL: how can i find out that all the data pages say needed for a particular query are always in buffer and do not get flushed out
A: Not in advance. If you run your query with "set statistics io on" then you will learn if that was the case. But you must run to find out
Host: Lubor (Microsoft)
Q: Are there any generally known SQL query examples that could benefit from 64-bit processing?
A: Any memory intensive queries will do (hash join, hash agg, sort)
Host: Lubor (Microsoft)
Q: We have a database where 90% tables are updated once a day. Can I put these tables in a read only file group, make it updateable while updating and then make it read only? Is this a good practice?
A: Not really.
Host: Conor (Microsoft)
Q: WaldenL: Conor_MS: re: UDF Plans: So tuning a UDF becomes difficult. It's hard to tell what the UDF is doing, except I guess, by looking at the plan for each statement in the UDF. Would the plan for a UDF in a where clause be the same as the plan for the individual statements that make up that UDF (assuming the same dataset)?
A: Yes, it is difficult. You should limit your logic to cases where you are sure it will be beneficial to do this work inside the query plan. If you can write your UDF into the original query (say, using sub-selects or sub-queries), this typically has a much better chance of being optimized to an efficient execution plan. Remember that the relational algebra has a particular semantic that allows lots of reordering to satisfy the same SQL - UDFs are, in general, procedural, and can not be reordered as freely. So please use them with care.
Host: Conor (Microsoft)
Q: curwiler: Conor_MS: I was wondering about the possible caching of UDFs myself. So you're saying there really isn't any caching for UDFs?
A: No, we don't cache the results of UDFs - Not all UDFs are deterministic and without side effects. If I have a UDF that writes to a text file, then caching the result and re-using it for multiple rows will not have the same side effects.
Host: Conor (Microsoft)
Q: HARV: I read in one of docs that it is better to specify owner name with object name to avoid recompilation..i try to reproduce this sceanrio but fails..sql server is not compling again even if i don't specify owner name...is the statement true and how to repro
A: Sorry, we (collectively) don't know - it's not really a question in our area of expertise. Sorry.
Host: Campbell (Microsoft)
Q: HARV: when sql server update statistics with sampling ratio..does it do it at page leve(take all rows of that page) or row level
A: Yes - the page level.
Host: Swynk (MVP)
Q: HARV: If we call proc2 from proc1, how can we see in profiler what parameters are passed to 2nd procedure
A: I'd have to check, but I thought if you just expanded the tree in profiler you would see the passed-in values.
Host: Milind (Microsoft)
Q: HARV: when we do full scan of big tables...does it populate all the memory data buffers with this table pages...or it uses some mechanism to not remove pages of small table
A: It doesn't do anything special. It uses the standard algorithm for buffering pages.
Host: Campbell (Microsoft)
Q: Cowboy: Is it possible to find out why the optimizer made the decisions that it made with regards to any given execution plan (SQL 2000)
A: No.
Host: Conor (Microsoft)
Q: Jeroen: Is there any use in changing the sp_configure "cost threshold for parallelism" setting to achieve better use of parallelism?
A: Feel free to experiment to see if it helps your specific scenario, but we don't see any particular need to change this for general workloads.
Host: Lubor (Microsoft)
Q: Again on lock escalation: http://support.microsoft.com/default.aspx?scid=kb;EN-US;323630, besides the trace flag can I prevent escalation via TSQL HINTS?
A: No. Only you can disallow e.g. pagelocks .that will delay escalation to table lock. Look at sp_indexoption.
Host: Lubor (Microsoft)
Q: Lubor_MS: You said with 10 disks to use all 10 for the whole table and for each index. Do you mean to stripe across the 10 disks and put data and indexes there?
A: Exactly. You can have them in different filegroups both on all disks.
Host: Conor (Microsoft)
Q: Sami: What performance impact will using a view (with joins on 4 tables) with additional where clause have as opposed to using the tables directly?
A: Assuming you mean a non-indexed view, it doesn't matter at all. We actually expand the view and look at all the tables as if you had written it using the original tables
Host: Campbell (Microsoft)
Q: SQL: Is there a way through TSQL to find out if hyperthreading is enabled on the server ?
A: No. The OS will be able to tell you.
Host: Lubor (Microsoft)
Q: for clustered index do we expect rows in sequential order on same page or they can be on different pages (when fragmentation)
A: No in general. Just after the load yes, but then they can be shuffled.
Host: Conor (Microsoft)
Q: ding: We have a simple UDF(RETURN '1753-01-31 00:00:00.000') which is used by many other queries and stored procedures. It turns out that this simple UDF is the most expensive in SQL CPU and IO. Should SQLServer cache it in memory?
A: No - you should try to write the query without using the UDF - especially in a scenario like this.
Host: Lubor (Microsoft)
Q: Lubor_MS: (Sorry about my striping question.)
A: No problem. Thanks for chatting!
Moderator: Stephen_D (Microsoft)
Thanks for joining us today! You've asked some great questions, but unfortunately, it's time to go.
Host: Campbell (Microsoft)
Thanks for coming along, hope it was helpful. Bye.
Host: Swynk (MVP)
Great questions everyone!
Host: Swynk (MVP)
Thanks everyone for stopping by today - I hope you've had a good time and had a chance to post questions, or even lurk and read others. We'll post transcripts to www.SSWUG.org shortly.
Host: Conor (Microsoft)
Thanks for letting me get questions from you guys - always a pleasure, and we get to learn a bunch every time we do this
Host: Milind (Microsoft)
Thanks, great questions.
For further information on this topic or about SQL Server, please visit the following:
Newsgroups: SQL Server Newsgroups
SQL Transcripts: Read the archive of past SQL chats.
Website: Visit the Microsoft SQL Website