Rebuilding Indexes, Disk Queue Length, and More
Edited by Nancy Michell
Q How are SQL Server indexes rebuilt? I need to know why DBCC DBREINDEX consumes all my disk space and why space is not freed when the command fails. My database size is 90GB and the largest table is 70GB.
When I run DBCC DBREINDEX, leaving 10 percent free space, it consumes all the available disk space and the command fails. While initially the database file is 90GB, when the DBCC command fails, the database file grows to 160GB and that extra 70GB is never freed; I have to manually shrink the database several times to reclaim it.
When the command does work, the size of the database file is also 160GB, but the space is automatically freed a few hours later by autoshrink. Would it be acceptable to do a defrag instead of a DBREINDEX to consume less space? I have no clustered indexes, and I'm using simple recovery mode.
A At the simplest level, indexes are rebuilt by constructing a new copy of the index and then deleting the old one. This means that essentially there are two copies of the index for a short period of time. Constructing the new index could require as much database file space as the original index, and if the rebuild requires a sort operation, then an additional 20 percent of the index size is required for the sort.
So the worst case is that rebuilding an index requires 1.2 times the space of the old index. If the database file does not have enough free space, then the file will have to grow as the operation proceeds. It's possible that if autogrow is not enabled or there is not enough space on the disk volume, then there may not be enough free space available and the rebuild operation will fail.
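If you want to know in advance whether a rebuild will fit, you can compare the index size against the free space inside the data file. The following is a rough sketch, assuming SQL Server 2005; the table name is a placeholder, and the 1.2 multiplier reflects the worst-case sort space described above:

```sql
-- Worst-case space needed to rebuild the indexes on one table
-- (~1.2x current index size). 'dbo.BigTable' is a placeholder.
SELECT i.name AS index_name,
       ps.used_page_count * 8 / 1024.0 AS index_mb,
       ps.used_page_count * 8 * 1.2 / 1024.0 AS worst_case_rebuild_mb
FROM sys.dm_db_partition_stats ps
JOIN sys.indexes i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE ps.object_id = OBJECT_ID('dbo.BigTable');

-- Free space currently available inside the data file(s):
SELECT name,
       size * 8 / 1024.0 AS file_mb,
       (size - FILEPROPERTY(name, 'SpaceUsed')) * 8 / 1024.0 AS free_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';
```

If the worst-case rebuild size exceeds the free space in the file, the file will have to grow (or the rebuild will fail), exactly as described above.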
Whether the operation fails or not, the extra space allocated to the database file is not freed up after the rebuild operation completes. The assumption is that the space will be used for regular database operations.
Running shrink (either manually or automatically) is almost guaranteed to create index fragmentation due to the way the algorithm works; for more details, see the SQL Server Storage Engine blog. Autoshrink can be particularly detrimental to performance if the database needs free space for regular operations, because you can get into an autogrow-autoshrink-autogrow-autoshrink cycle that can wreak havoc on fragmentation and performance.
Using DBCC INDEXDEFRAG (or ALTER INDEX ... REORGANIZE in SQL Server™ 2005) has the advantage that it uses almost no additional database file space, but it can take longer and generate a lot more transaction logging than an index rebuild. DBCC INDEXDEFRAG is always fully logged, regardless of the recovery mode in use, whereas in simple recovery mode an index rebuild will be bulk-logged. There are a variety of pros and cons to each method, and they are explained more fully in the SQL Server Index Defragmentation Best Practices whitepaper.
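For reference, the two approaches look like this (database, table, and index names here are placeholders):

```sql
-- SQL Server 2000 syntax: defragment a single index in place.
DBCC INDEXDEFRAG (MyDatabase, 'dbo.BigTable', 'IX_BigTable_Col1');

-- SQL Server 2005 equivalent:
ALTER INDEX IX_BigTable_Col1 ON dbo.BigTable REORGANIZE;

-- Contrast with a full rebuild, which needs the extra file space
-- discussed above but is bulk-logged under the simple recovery model:
ALTER INDEX IX_BigTable_Col1 ON dbo.BigTable REBUILD;
```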
Before making the decision on how to fix fragmentation, first decide whether to fix fragmentation at all. Depending on the type of operations the index is used for, fragmentation may have no effect on performance and so fixing it is a waste of resources. The whitepaper has great detail.
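On SQL Server 2005 you can measure fragmentation first with sys.dm_db_index_physical_stats. A sketch, using commonly cited thresholds that you should tune for your own workload:

```sql
-- List indexes in the current database with noticeable fragmentation.
-- The 10% and 1000-page thresholds are rules of thumb, not hard limits;
-- fragmentation on tiny indexes is rarely worth fixing.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name AS index_name,
       s.avg_fragmentation_in_percent,
       s.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') s
JOIN sys.indexes i
  ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.avg_fragmentation_in_percent > 10
  AND s.page_count > 1000
ORDER BY s.avg_fragmentation_in_percent DESC;
```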
Bottom line: make sure you choose the most appropriate method of fragmentation removal for your environment and that removing fragmentation is helping query performance.
Q I have configured database mirroring successfully between two SQL Server 2005 instances. My application is connecting to SQL Server using a SQL Server login and is built using ADO and the SQL native client. My connection string and connection settings specify the correct information, including the appropriate failover partner. I have also created all the same logins on the mirror server as exist on the principal server. Upon testing a database failure, the mirror successfully assumes the principal role and everything looks correct on the SQL Server instance. (I can even connect to the mirror using my Windows® login.) However, the application reconnection fails with the following error:
Cannot open database "<db name>" requested by the login. The login failed.
It appears that the login is not associated with a user in the new principal (originally the mirror) database. I run sp_change_users_login to synchronize the users and logins for the database, and I get a message saying it fixed multiple orphaned users. My application then reconnects successfully to the new principal server. I have tried multiple failovers, and each time I see the same behavior: the association between the login and the user gets lost.
Is there a way to configure the mirroring setup so this is not a problem?
A Yes. The problem is due to the fact that the security identifiers (SIDs) for the SQL Server logins on each server do not match, even though the names for the logins are the same. This is not a problem with Windows/domain user/group logins because the SIDs for these logins are created based on the domain SID for the user/group, and hence will be the same for the same given user/group no matter which SQL Server the user/group is added to.
In order to make the sp_change_users_login synchronization step unnecessary, you'll need to create the SQL Server logins on the mirror server not only with the same name, but also with the same SID as on the principal server. You do this by using the SID specification in the CREATE LOGIN statement when creating the logins on the mirror server, as follows:
CREATE LOGIN <loginname>
WITH PASSWORD = <password>,
     SID = <sid for same login on principal server>, ...
You can retrieve the SID for each login from the principal server by querying the sys.sql_logins catalog view. An example of a query that will generate an actual CREATE LOGIN statement for each SQL Server/Windows login on a given server is shown in Figure 1.
Figure 1 Generate CREATE LOGIN statement
select 'create login [' + p.name + '] ' +
case when p.type in('U','G') then 'from windows ' else '' end +
'with ' +
case when p.type = 'S' then 'password = ' + master.sys.fn_varbintohexstr(l.password_hash) +
' hashed, ' + 'sid = ' + master.sys.fn_varbintohexstr(l.sid) + ', check_expiration = ' +
case when l.is_policy_checked > 0 then 'ON, ' else 'OFF, ' end + 'check_policy = ' +
case when l.is_expiration_checked > 0 then 'ON, ' else 'OFF, ' end +
case when l.credential_id > 0 then 'credential = ' + c.name + ', ' else '' end
else '' end +
'default_database = ' + p.default_database_name +
case when len(p.default_language_name) > 0 then ', default_language = ' + p.default_language_name else '' end
from sys.server_principals p
left join sys.sql_logins l
on p.principal_id = l.principal_id
left join sys.credentials c
on l.credential_id = c.credential_id
where p.type in('S','U','G')
and p.name <> 'sa'
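For a single login, the manual version of what Figure 1 generates looks roughly like this; the hash and SID values shown are truncated placeholders you would copy from the principal:

```sql
-- On the principal: retrieve the password hash and SID for one login.
-- 'app_login' is a placeholder name.
SELECT name, password_hash, sid
FROM sys.sql_logins
WHERE name = 'app_login';

-- On the mirror: recreate the login with the same SID. Substitute the
-- actual varbinary values returned by the query above.
CREATE LOGIN app_login
WITH PASSWORD = 0x0100ABCD... HASHED,
     SID = 0x1234ABCD...;
```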
Q What should my average disk queue length be? For example, if I have 20 separate physical spindles on a Storage Area Network (SAN) in a RAID 01 configuration, how do I calculate the average disk queue length? Is it Avg. Disk Queue Length/20 or Avg. Disk Queue Length/2?
A First of all, you should look at the disk latency before wasting any time on average disk queue length in a SAN environment. But it really does depend on what you are trying to figure out. You'll see why in a moment.
The explanation of this counter (from Perfmon) is "Avg. Disk Queue Length is the average number of both read and write requests that were queued for the selected disk during the sample interval." This is either a physical disk or logical disk counter, so the number that you get depends on how the underlying storage is presented to the operating system.
Let's look at your case. You have 20 spindles in a RAID 01 configuration, which means they are striped and mirrored (or mirrored and striped, depending on whether you read it as 0+1 or 1+0). The key point about your storage array is that there are 10 spindles in the stripe set.
But I'm missing essential information, such as what the stripe size is, how big your writes are, and what kind of I/O you're issuing (read, write, sequential, or random).
Ignoring the missing information for now, if the Avg. Disk Queue Length says 10, then the OS queued 10 I/Os to the disk array. Theoretically, that could be one I/O to each of the 10 mirrored sets in the stripe, or it could have been 10 I/Os all to the one disk. You have no way of knowing which it was.
This is where the missing information comes in. Let's assume your stripe size is 64KB, your write size is 8KB, and you're going to do a whole chunk of sequential writes. This is a typical scenario for SQL Server storage activity. In this case, there is a good chance that eight of the I/Os all went to the first disk, and the next two I/Os went to the next disk. So, if you're trying to work out the disk queue length per disk for this scenario, it is 8 for the first disk, 2 for the second disk, and 0 for the remaining eight disks in the array.
Now let's change the scenario to a theoretical stripe size of 8KB and a write block size of 64KB, and keep the disk queue length at 10. In this case, each 64KB block gets spread over 8 disks, so each queued I/O turns into a write on 8 disks, and the 10 queued I/Os are spread across 80 disk writes over all 10 disks in the array. If you're trying to calculate the disk queue length per disk in the array, it's going to be 8 for each disk.
Let's be realistic and add another level of uncertainty into the picture. In most cases, your SAN storage is going to be connected to the server using one or more HBAs in the server, a number of fibers to connect the HBA to the SAN, a number of ports on the front end of the SAN, and perhaps some kind of fiber switch in the fabric between the server and the SAN. Then we get into the internal architecture of the buses inside the SAN, and how the disks are connected to the ports on the front of the SAN.
Any queuing you see reported in Perfmon can be a symptom of high latency or queuing at any of these points between where the OS measures its disk queue length and the surface of the disks. This is why you should look at the latency counters (Avg. Disk sec/Read and Avg. Disk sec/Write) and base any decisions on those rather than on the average disk queue length.
Q I'm using transactional replication, and I know that many rows have been manually changed in a table at the subscriber, so I'm getting errors when the publisher tries to update a row that no longer exists at the subscriber.
I need to know if there is any way through replication to just re-initialize this one table from the publisher instead of reapplying a whole snapshot again. I looked into the tablediff function, which looks like it will do what I want, but I wonder how it interacts with replication.
For example, does tablediff take a snapshot-in-time of the publisher table and compare it with the subscription equivalent? Do I need to stop replication to use the tablediff utility to ensure the consistency of the data? What else do I need to know?
A First of all, tablediff does not take a literal snapshot of either the publishing or subscribing table. Regarding your particular scenario, there are a couple of options you could look at.
The first option would be to temporarily stop replication and run the utility. If you're worried about users attempting to make modifications to the data, you can use the -sourcelocked and -destinationlocked parameters, which take out an exclusive lock on both tables while the utility is running. If that is unacceptable, another option is to look at the -rc (retry count) and -ri (retry interval) parameters while leaving replication running. This runs tablediff once and then retries just the rows where it found differences, eliminating errors that could simply be due to replication propagation delays. Be warned, however, that with this option, depending on your replication latency, you may not catch all of the rows that have changed on the subscriber.
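As a sketch, a tablediff invocation for this scenario might look like the following; the server, database, and table names are placeholders, and the -f option writes a T-SQL script that brings the subscriber table back in line with the publisher:

```
REM All names below are placeholders; -rc/-ri retry differences three
REM times at five-second intervals to filter out in-flight replication.
tablediff -sourceserver PUBSRV -sourcedatabase PubDB -sourcetable MyTable ^
          -destinationserver SUBSRV -destinationdatabase SubDB -destinationtable MyTable ^
          -rc 3 -ri 5 -f C:\fixes\MyTable_fix.sql
```

You can then review and run the generated script on the subscriber to repair the out-of-sync rows.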
Thanks to the following Microsoft IT pros for their technical expertise: Sunil Agarwal, Chad Boyd, David Browne, Gilles Comeau, Emmanuel Dreux, Amanda Foote, Matt Hollingsworth, Paul Mestemaker, Uttam Parui, Paul Randal, Dennis Tighe, and Steven Wort.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited