SQL Q&A: The Lore of Logs
Transaction logs are an essential component of any instance of SQL Server. Properly and efficiently managing them is equally essential.
Q. I’ve been successfully using the online index rebuild since it was introduced in SQL Server 2005. We upgraded to SQL Server 2008 recently and now the transaction log is growing more than ever before. Can you explain what’s happening?
A. Index rebuild operations prior to SQL Server 2005 always acquired blocking locks for the duration of the rebuild operation. Rebuilding a clustered index meant an exclusive table lock (no concurrent readers or writers). Rebuilding non-clustered indexes meant a shared table lock (no concurrent writers to the table).
With the advent of online index operations in the Enterprise Edition of SQL Server 2005, locking was changed for online operations. Blocking locks were only held for short periods at the start and end of the operation (see this blog post for an explanation).
In SQL Server 2005, all index rebuild operations use minimal logging. This means only page allocations are logged, instead of all row inserts into the new index. This drastically reduces the amount of transaction log records generated by the index rebuild operation. It also means the transaction log file itself doesn’t have to be as large (because it doesn’t have to accommodate a fully logged index-rebuild operation). This behavior only occurs when using the bulk-logged or simple recovery models. All operations are fully logged in the full recovery model—hence the name. The exception was online index operations, which used minimal logging even in the full recovery model.
From SQL Server 2008 onward, online index operations were changed to be fully logged in the full recovery model. This ensures database restore operation that encompasses an online index operation in its transaction log won’t encounter any problems. (For more on this, see the Microsoft KB article 2407439.)
If limiting transaction log growth is more important than permitting concurrent table access during the index rebuild, consider using the bulk-logged recovery model, as the KB article explains. Another thing to consider is using ALTER INDEX … REORGANIZE to remove fragmentation. This always operates without causing blocking and can reduce the amount of transaction log. (There’s a comparison of rebuilding and reorganizing in one of the following answers.)
Q. We’ve been trying to design a backup strategy that coincides with our nightly data import process so our full backup contains all the data from the import process. We haven’t been able to work out a reliable way to do this—sometimes the data is all there and sometimes none of it’s there. Can you help?
A. You state that, “sometimes the data is all there, and sometimes none of it’s there.” That makes me think you’re doing your import process as one large transaction.
If this is what you’re doing, you might consider splitting it into smaller transactions. This will give SQL more potential for clearing the transaction log between import transactions and less long-term blocking. A long-running, single-transaction process can cause lock escalation to an exclusive table lock. You may not be able to change the import process, but you should do so if you can.
A full backup has two phases: reading the data and reading the transaction log. When it has fully read the data, it will then read the transaction log from that point back as far as is necessary (see my July 2009 article, “Understanding SQL Server Backups,” for more details).
When you restore a full backup, the point to which the database is restored is the time at which the data-reading portion of the backup is completed. The transaction log included in the backup essentially runs crash recovery on the restored database. This makes it transactionally consistent.
You’ll have to roll back any transactions in-flight at that time, just like a regular crash recovery. If the import process was still running when the data-reading portion completed, it would be rolled back during the restore operation.
The only way to guarantee a particular transaction is contained within a full backup is to serialize the operations so the transaction in question completes before the backup begins. Otherwise, there’s no way to know whether the transaction completes before the data-reading is complete.
There’s an alternative to serializing operations, of course. Perform a transaction log backup after the import process completes. Then restore the full backup plus the transaction log backup. This will contain the entire import process transaction.
Q. I’m designing an index-maintenance strategy. I don’t understand why there are two methods to remove index fragmentation. How can I determine whether to rebuild our indexes or reorganize them? I can’t find a list of the trade-offs between the two methods.
A. There have been two methods for removing fragmentation since I wrote DBCC INDEXDEFRAG for SQL Server 2000. The two methods are necessary because they have quite different characteristics. Unfortunately, there’s no white paper that adequately describes the differences between the three methods in SQL Server 2005 onward: ALTER INDEX … REORGANIZE (the new DBCC INDEXDEFRAG), ALTER INDEX … REBUILD (the new DBCC DBREINDEX) and the online version of ALTER INDEX … REBUILD.
Here’s a quick comparison of the ALTER INDEX options REBUILD and REORGANIZE:
- REBUILD requires building the new index before dropping the old one. This means there has to be enough free space in the database to accommodate the new index; otherwise the database will grow to provide the required free space. This can be problematic for large indexes. REORGANIZE only requires 8KB of additional space in the database.
- REBUILD can use multiple CPUs so the operation runs faster. REORGANIZE is always single-threaded.
- REBUILD may require long-term locks on the table that can limit concurrent operations. REORGANIZE doesn’t hold blocking locks (it’s always an online operation).
- REBUILD can use minimal-logging to reduce transaction log growth. REORGANIZE is always fully logged, but doesn’t prevent transaction log clearing.
- REBUILD will automatically rebuild all index-column statistics, whereas REORGANIZE won’t update statistics at all.
So, there are several trade-offs in terms of disk space requirements, locking, logging and parallelism. The most important thing to consider is the algorithmic differences between the two operations.
An index rebuild will always rebuild the entire index, regardless of the extent of fragmentation. This means an index rebuild for a lightly fragmented index is really overkill. An index reorganize will only take care of the existing fragmentation. This makes it a better choice for removing fragmentation from a lightly fragmented index, but a poor choice for removing fragmentation from a heavily fragmented index. In that case it would be better to just rebuild the index.
This leads us to the various thresholds for choosing between rebuild and reorganize. There are widely used fragmentation thresholds based on the avg_fragmentation_in_percent column in the output from sys.dm_db_index_physical_stats:
- 0 percent to 10 percent: do nothing
- 10 percent to 30 percent: use ALTER INDEX … REORGANIZE
- 30 percent and higher: use ALTER INDEX … REBUILD
These are general guidelines and you may find that different values work better in your environment. You also need to consider whether you’re using database mirroring, which requires the full recovery model. This means index rebuild operations will be fully logged. Many people find this produces too much transaction log to efficiently send between the mirroring principal and mirror. In such cases, it can be better to reorganize indexes to minimize the transaction log.
You might also consider using someone else’s index-maintenance scripts to save time. Ola Hallengren has some comprehensive and widely used scripts.
Q. I’ve just become a DBA. I’m having trouble setting up some new databases for our developers. The issue I have is figuring out how large the transaction log should be. Whatever size I choose, it grows larger and stays that way. If I shrink it, it still grows to the same size again. Is there a way to set the size properly when creating the database?
A. First of all, don’t regularly shrink the transaction log. When the transaction log needs to grow, the new space allocated to the transaction log has to be zero-initialized. This means transactions waiting to generate log records may have to wait while this initialization takes place.
If you’re in a shrink-grow, shrink-grow cycle with the transaction log, you’re causing an unnecessary performance hit on SQL Server. It’s better to leave the transaction log at its needed size. Shrink should be a rare operation on data or transaction logs.
You can estimate transaction log size based on database operations. Read more about that in my January 2011 column. If the transaction log grows to a steady size after each shrink, that’s likely the size it should be. Leave it at that size, unless it’s truly too large. If it’s too large, determine what’s causing the growth and whether you can split that operation into multiple parts so the transaction log has a chance to clear.
Paul S. Randalis the managing director of SQLskills.com, a Microsoft regional director and a SQL Server MVP. He worked on the SQL Server Storage Engine team at Microsoft from 1999 to 2007. He wrote DBCC CHECKDB/repair for SQL Server 2005 and was responsible for the Core Storage Engine during SQL Server 2008 development. Randal is an expert on disaster recovery, high availability and database maintenance, and is a regular presenter at conferences worldwide. He blogs at SQLskills.com/blogs/paul, and you can find him on Twitter at Twitter.com/PaulRandal.