SQL Q&A: Defrags and Disasters

Error logs and temp database files can grow out of hand in short order. Following these configuration tactics can help.

Paul S. Randal

Default for Defrag

Q. I’ve been addressing some database maintenance issues we have on our server, and I’m considering index fragmentation. Instead of spending time figuring out a fillfactor for each index, wouldn’t it be easier to just set the default fillfactor for that instance? Are there any downsides to doing that?

A. I generally wouldn’t recommend setting the instance-wide fillfactor to anything but the default—100 percent. Changing that setting can lead to wasted space in the database.

When you create indexes on a table, usually only one of them will match the table insertion pattern, and all the other indexes will inevitably become fragmented. As long as the insertion pattern is append-only (no random inserts from having a random key such as a globally unique identifier [GUID]), it makes sense for the largest index to be the one that doesn't become fragmented.

The largest index is always the clustered index, as the clustered index is the table. This contains all the data records with all the table columns. It makes sense for the clustered index to be the one that doesn’t get fragmented. Removing fragmentation from the clustered index will be more expensive (in terms of disk space, time and transaction log) than doing so for any other index.

Because the clustered index usually doesn't get fragmented, it makes sense to leave its fillfactor at 100 so it requires no extra space. The clustered index may still become fragmented for other reasons (such as updates to variable-length columns that make table rows longer), but in general it stays unfragmented.

If you set the instance-wide fillfactor to anything except 100 percent, you have to specifically set the fillfactors of all clustered indexes that don’t get fragmented. If you don’t, they’ll use the new instance-wide fillfactor and leave space when they’re rebuilt. This essentially wastes space in the database. You may choose to do this, but it’s generally not considered a best practice.

As there are a variety of indexes in any database, it would be unusual to find a single fillfactor value that’s optimal for all indexes. It’s usually better from a manageability perspective to leave the instance-wide fillfactor alone. Then you can specifically set lower fillfactors on only those indexes that need them.
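For example, you can check the instance-wide setting with sp_configure and set a lower fillfactor only where it's needed. Here's a minimal sketch; the table and index names are hypothetical, and the 90 percent value is only an illustration:

  -- View the instance-wide fill factor (0 means the 100 percent default)
  EXEC sp_configure 'show advanced options', 1;
  RECONFIGURE;
  EXEC sp_configure 'fill factor (%)';

  -- Set a lower fillfactor only on the one index that needs it
  -- (dbo.Orders and IX_Orders_CustomerID are hypothetical names)
  ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
  REBUILD WITH (FILLFACTOR = 90);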

Reboot Responses

Q. Some servers in our environment are infrequently rebooted. While this is a good thing in many ways, it does mean the SQL Server error log can grow impractically large. It seems to fill up with tens of thousands of backup completion messages for which I have no use. Is there anything I can do to make the error logs smaller and more manageable?

A. There are two things you can do: Cut down on the backup messages and configure error log management. Every time a backup completes, it will write an entry into the error log. This is of little use. Successful backup completion is the norm, not an error.

There’s a documented trace flag—trace flag 3226—that will prevent backup success messages. You should add that to the list of startup trace flags (using the SQL Server Configuration Manager). To enable this without first having to stop and restart SQL Server, you can also turn on that trace flag using the command “DBCC TRACEON (3226, -1).” The -1 means to apply the trace flag globally. The SQL Server team blogged about the trace flag back in 2007.
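As a quick sketch, you can turn the trace flag on immediately and confirm it's active; adding -T3226 as a startup parameter in SQL Server Configuration Manager makes it persist across restarts:

  -- Suppress backup-success messages for all sessions until the next restart
  DBCC TRACEON (3226, -1);

  -- Confirm the trace flag is enabled globally
  DBCC TRACESTATUS (3226);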

You can also configure error log management within SQL Server Management Studio (SSMS). Open Object Explorer in SSMS and connect to SQL Server. Expand the Management node, right-click SQL Server Logs and select Configure. When the Configure SQL Server Error Logs dialog appears, check the “Limit the number of error log files before they are recycled” option and set the maximum number of error log files to 99. This overrides the default of six retained error logs.
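If you prefer to script this rather than click through the dialog, SSMS stores the setting in the instance's registry key. The following sketch mirrors what the dialog writes; it relies on the undocumented xp_instance_regwrite procedure, so treat it as an assumption and verify it against your build:

  -- Keep up to 99 error log files before they're recycled
  -- (assumed equivalent of the SSMS dialog setting)
  EXEC xp_instance_regwrite
      N'HKEY_LOCAL_MACHINE',
      N'Software\Microsoft\MSSQLServer\MSSQLServer',
      N'NumErrorLogs',
      REG_DWORD,
      99;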

The final configuration step limits the size of each error log: instruct SQL Server to create a new error log every day (called “cycling” the error log). Do this by creating a daily SQL Server Agent job that simply runs “EXEC sp_cycle_errorlog.” After this, your error logs should be manageable.
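A minimal sketch of such a job follows; the job and schedule names are arbitrary, and the schedule runs once a day at midnight:

  USE msdb;
  GO
  -- Create the job (the name is arbitrary)
  EXEC dbo.sp_add_job @job_name = N'Cycle SQL Server error log';

  -- Single T-SQL step that cycles the error log
  EXEC dbo.sp_add_jobstep
      @job_name = N'Cycle SQL Server error log',
      @step_name = N'Cycle error log',
      @subsystem = N'TSQL',
      @command = N'EXEC sp_cycle_errorlog;';

  -- Daily schedule starting at midnight
  EXEC dbo.sp_add_jobschedule
      @job_name = N'Cycle SQL Server error log',
      @name = N'Daily at midnight',
      @freq_type = 4,            -- daily
      @freq_interval = 1,        -- every day
      @active_start_time = 0;    -- 00:00:00

  -- Target the local server
  EXEC dbo.sp_add_jobserver @job_name = N'Cycle SQL Server error log';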

Tame the Tempdb

Q. Our data volume has grown tremendously over the last two years. Our tempdb always seems to fill whichever drive it’s on. We run some complex queries, so we make a lot of use of temp tables. Can you give any advice on how to reduce our tempdb usage?

A. Tempdb usage is a perennial problem for SQL Server users. There’s only a single tempdb for each SQL Server instance, so you have to be careful how it’s used.

One of the most common uses of tempdb is temp tables. These tables let intermediate results from complex aggregations or joins persist and become part of a larger query. This can be an efficient way to break up a long, complex query, but it isn’t always. Developers sometimes reach for temp tables by default, without any performance testing to see whether a temp table actually is more efficient.

The problem with creating an intermediate result set in a temp table is that it can interrupt the efficient data pipeline through a complex query. It forces the query optimizer to process the temp table creation or population separately from subsequent operations that use the temp table. Sometimes the query optimizer produces a more efficient query plan if it doesn’t use a temp table. It can also run more efficiently with a different query construct like a derived table or a Common Table Expression.
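As a hypothetical illustration (the table and column names are invented), the same intermediate aggregation can be written either way. The temp table version materializes the intermediate rows in tempdb; the CTE version lets the optimizer consider a single plan for the whole query:

  -- Temp table version: the aggregate is materialized in tempdb
  SELECT CustomerID, SUM(TotalDue) AS TotalSales
  INTO #CustomerSales
  FROM dbo.Orders
  GROUP BY CustomerID;

  SELECT c.CustomerName, cs.TotalSales
  FROM dbo.Customers AS c
  JOIN #CustomerSales AS cs ON cs.CustomerID = c.CustomerID;

  -- CTE version: the optimizer builds one plan for the whole query
  WITH CustomerSales AS
  (
      SELECT CustomerID, SUM(TotalDue) AS TotalSales
      FROM dbo.Orders
      GROUP BY CustomerID
  )
  SELECT c.CustomerName, cs.TotalSales
  FROM dbo.Customers AS c
  JOIN CustomerSales AS cs ON cs.CustomerID = c.CustomerID;

Test both forms against your own data; sometimes the materialized temp table wins, and sometimes the single plan does.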

If a temp table is the most efficient way to break up your query, there are two things you can do to reduce tempdb usage:

  • Create appropriate indexes: Make sure the only indexes created on the temp table are actually useful for further processing (check this by examining which indexes the query plan uses). Nonclustered indexes created on each temp table column are unlikely to be useful. Also make sure indexes are created after the temp table is populated, so they have statistics to help the query optimizer use them.
  • Minimize the temp table size: Make sure the only columns persisted in the temp table are those used for further processing; otherwise they’re a complete waste of space. Temp tables are often created with a SELECT * construct with no thought for which columns are really required. If you’re processing a large result set, this wasted space can really add up. The sketch after this list illustrates both points.
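Here's a minimal sketch, with hypothetical table and column names: persist only the columns needed downstream, then create the index after the load so it's built with current statistics:

  -- Keep only the columns needed for further processing (names are hypothetical)
  SELECT OrderID, CustomerID, OrderDate
  INTO #RecentOrders
  FROM dbo.Orders
  WHERE OrderDate >= DATEADD(MONTH, -3, GETDATE());

  -- Create the index after the temp table is populated so it has statistics
  CREATE CLUSTERED INDEX CIX_RecentOrders ON #RecentOrders (CustomerID);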

Bad Backups

Q. Last week our SAN crashed and we suffered some data loss in our production database. The most recent backups were stored on the SAN along with the database files, so they were damaged, too. We also found that our slightly older backups were corrupted, sometimes with the same corruption. How can we avoid this situation in the future?

A. This is a common situation—no good backups and a single point of failure, which ultimately leads to data loss.

The first problem is that your backups are corrupted. You need to implement a backup testing strategy so you can determine whether the backup is corrupt or contains a corrupt database. There are multiple parts to this:

  • Implement a regular consistency check for the production database. This means running the DBCC CHECKDB command on the production database itself or a copy of the database.
  • Enable page checksums on the production database (if not already enabled). Use the WITH CHECKSUM option on all backups. This will test page checksums as data file pages are read for inclusion in the backup, which helps avoid creating a backup with a corrupt database.
  • Implement a validity check of backups after they’re taken. This means copying the backup to another SQL Server instance and restoring it (again using the WITH CHECKSUM option), or at the very least performing a RESTORE VERIFYONLY of the backup using WITH CHECKSUM. Ideally, restore the database and run DBCC CHECKDB against the restored copy, which is also a good way to offload the consistency-checking workload from the production server (see the sketch after this list).
  • Implement a regular testing schedule where you practice restoring the production database from available backups.
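Here's a hedged sketch of those checks; the database name, backup path and logical file names are hypothetical:

  -- Enable page checksums if the database isn't already using them
  ALTER DATABASE Production SET PAGE_VERIFY CHECKSUM;

  -- Back up with checksums so bad pages are caught as the backup is written
  BACKUP DATABASE Production
  TO DISK = N'X:\Backups\Production_Full.bak'
  WITH CHECKSUM, INIT;

  -- Quick validity check of the backup file (doesn't prove the database is clean)
  RESTORE VERIFYONLY
  FROM DISK = N'X:\Backups\Production_Full.bak'
  WITH CHECKSUM;

  -- Better: restore on another instance and run a full consistency check there
  RESTORE DATABASE Production_Check
  FROM DISK = N'X:\Backups\Production_Full.bak'
  WITH CHECKSUM,
       MOVE N'Production' TO N'D:\Data\Production_Check.mdf',
       MOVE N'Production_log' TO N'D:\Data\Production_Check.ldf';

  DBCC CHECKDB (Production_Check) WITH NO_INFOMSGS;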

Another easy way to test the validity of transaction log backups is to create a log shipping secondary. This will continually restore the transaction log backups from production, and it also provides you with a redundant copy of the database.

The second problem is that your backups are stored on the same I/O subsystem as the database itself. This means you have no protection against I/O subsystem failure.

You need to keep copies of all backups on a separate I/O subsystem from the production database—ideally, this would be in an entirely separate location. Local copies of backups help with faster disaster recovery, and remote copies ensure that disaster recovery is always possible if local storage is damaged or destroyed.

Another thing to consider is running regular disaster recovery drills. Pretend a disaster has struck and work through your disaster recovery plan to determine its effectiveness. My April 2011 article, “SQL Server: Protect Data at All Costs,” discusses this from a management perspective.

Disaster recovery preparedness is like security. One of the often-repeated guidelines is “defense in depth.” The more options you have for recovery, and the more potential problems you anticipate and proactively avoid, the more likely you’ll be able to recover from a disaster within your downtime and data-loss service-level agreements.

Paul S. Randal

Paul S. Randal is the managing director of SQLskills.com, a Microsoft regional director and a SQL Server MVP. He worked on the SQL Server Storage Engine team at Microsoft from 1999 to 2007. He wrote DBCC CHECKDB/repair for SQL Server 2005 and was responsible for the Core Storage Engine during SQL Server 2008 development. Randal is an expert on disaster recovery, high availability and database maintenance, and is a regular presenter at conferences worldwide. He blogs at SQLskills.com/blogs/paul, and you can find him on Twitter at Twitter.com/PaulRandal.