SQL Q&A: Undo the damage

Take a look inside query plan selection, excessively slow backups and the process of repairing a replication database.

Paul S. Randal

Plan your query

Q. When SQL Server is choosing a query plan to execute my queries, does it take into account the data that’s currently in memory?

A. The simple answer is that the query optimizer never considers the buffer pool contents when choosing a query plan. The query optimizer evaluates various plans as it narrows down the possible choices, looking for the best plan it can determine within a reasonable time frame. The query optimizer doesn’t always identify the absolute best plan; it can’t spend an inordinate amount of time on plan compilation, so it settles for a “good enough” plan.

The query optimizer needs to know about the various tables involved in the query. It finds this information in the relational metadata for the tables. The relational metadata describes the table columns, indexes and constraints, as well as statistics about the value distributions in the various columns (if these statistics have been created). The storage metadata, by contrast, describes how the table and indexes are actually stored in the data files — and the query optimizer doesn’t use that information in determining a plan.

SQL Server doesn’t keep track of which portions of a table and its indexes are in memory at any time. The buffer pool tracks the data file pages from a database in memory. However, nothing in SQL Server does any kind of automated aggregation. It won’t, for example, be able to determine that 50 percent of index two of table X is in memory, whereas only 5 percent of index three of table X is in memory.

The query optimizer assumes nothing is in memory, so it’s likely to choose a query plan that involves the lowest amount of physical I/O. Plans with a lot of physical I/O are time-consuming and expensive. Consider a table with two non-clustered indexes that could satisfy a SELECT query. The first index has all the required columns. The second index has all the required columns, plus several additional columns.

The first index will have smaller index records, so it will have more index rows per data file page. The query optimizer will choose this index. Accessing the required index records to satisfy the query will require reading fewer data file pages into memory using physical I/O, compared with using the second index, which has larger records and fewer records per data file page. This rationale is called cost-based optimization, and it’s how the query processor in SQL Server is designed.
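The cost-based choice described above can be seen with two covering indexes. The table and index names below are purely illustrative, but given a pair of indexes like these, the optimizer will cost the narrower one lower because satisfying the query requires reading fewer pages:

```sql
-- Hypothetical table and indexes to illustrate cost-based index choice
CREATE TABLE dbo.Sales (
    SaleID     INT IDENTITY PRIMARY KEY,
    CustomerID INT,
    SaleDate   DATE,
    Amount     MONEY,
    Notes      NVARCHAR (500)
);

-- Narrow index: covers the query with small index records
CREATE NONCLUSTERED INDEX IX_Sales_Narrow
    ON dbo.Sales (CustomerID) INCLUDE (SaleDate);

-- Wide index: also covers the query, but each record is larger,
-- so fewer records fit on each data file page
CREATE NONCLUSTERED INDEX IX_Sales_Wide
    ON dbo.Sales (CustomerID) INCLUDE (SaleDate, Amount, Notes);

-- Both indexes cover this query; the optimizer picks the narrow one
-- because its estimated page reads (and hence I/O cost) are lower
SELECT CustomerID, SaleDate
FROM dbo.Sales
WHERE CustomerID = 42;
```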

However, what if a large proportion of the second, wider index is already in memory and none of the first index is? The query will require physical I/O to read the chosen, narrower index into memory — likely a lot slower than using the wider index that’s already there. In that case, the query plan is actually suboptimal. However, the query optimizer has no way of knowing what’s in memory, and this is only an example.

Even with that in mind, though, what if the query optimizer recognized what was in memory and generated a plan that used the less-efficient index because it was already in memory? That plan would be optimal only as long as that situation persisted. If the more-efficient index was read into memory for another query, the query would then have a suboptimal plan. How would the plan be invalidated so that it could be recompiled again?

As a former software engineer on the SQL Server team, I know that the engineering complexity of keeping an aggregate view of which tables and indexes are in memory — to aid in query plan selection and invalidation — is extremely challenging. It would likely add an undesirable performance overhead for only an occasional benefit, so it’s unlikely ever to happen.

If you’re interested in seeing the buffer pool contents, check out the Dynamic Management View (DMV) sys.dm_os_buffer_descriptors and the various queries I’ve put together in the Buffer Pool section of my SQLskills blog.
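As a starting point, a query along the following lines — a common pattern built on sys.dm_os_buffer_descriptors, not one of the specific blog queries — aggregates the buffer pool by index for the current database. Note that it joins on hobt_id, which covers in-row and row-overflow allocation units; LOB allocation units would need a join on partition_id instead:

```sql
-- Approximate how much of each index in the current database
-- is currently in the buffer pool (pages are 8KB each)
SELECT
    OBJECT_NAME (p.object_id) AS TableName,
    i.name                    AS IndexName,
    COUNT (*) * 8 / 1024      AS BufferMB
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au
    ON bd.allocation_unit_id = au.allocation_unit_id
JOIN sys.partitions AS p
    ON au.container_id = p.hobt_id
JOIN sys.indexes AS i
    ON p.object_id = i.object_id
    AND p.index_id = i.index_id
WHERE bd.database_id = DB_ID ()
GROUP BY p.object_id, i.name
ORDER BY BufferMB DESC;
```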

Long backups

Q. We use log shipping to provide a secondary database for reporting purposes. Every so often, we run into an issue where applying a log backup to the secondary database takes far longer than usual. Do you have any idea what might cause this?

A. Yes, I’ve seen this situation a few times. If you’re using a log shipping secondary database for reporting, this means you’re using the WITH STANDBY option when restoring the log backups on the secondary database. This works in three steps:

  1. Write all log records from the backup into the database log file.
  2. Perform the REDO part of recovery (ensuring that all operations from committed transactions are present in the database).
  3. Perform the UNDO part of recovery (ensuring that all operations from uncommitted transactions aren’t present in the database).

Step 3 writes all the log records generated by the UNDO operations into a special file called the undo file. This leaves the database in a read-only but transactionally consistent state, so users can have access. The reason these log records are written into the undo file is so the transaction log of the database isn’t altered in any way. This lets you restore subsequent log backups.

When the restore process begins on the secondary database, if an undo file exists, there’s another step performed before the first three steps. This initial step takes all the log records in the undo file and undoes their effects. This essentially puts the database back into the state it was in at the end of step 2. This database state is the same as if the previous log backup had been restored using WITH NORECOVERY, instead of WITH STANDBY.
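The restore sequence described above looks like this in practice. The database name, backup paths and undo file path are illustrative; the key point is that naming an undo file via WITH STANDBY is what keeps the database readable between restores:

```sql
-- Restore a log backup on the secondary, keeping it readable.
-- SQL Server writes the UNDO log records into the named undo file.
RESTORE LOG ReportingDB
FROM DISK = N'\\backups\ReportingDB_log1.trn'
WITH STANDBY = N'C:\SQLBackups\ReportingDB_undo.dat';

-- When the next log backup is restored, SQL Server first undoes
-- the contents of the undo file, then performs the three steps again
RESTORE LOG ReportingDB
FROM DISK = N'\\backups\ReportingDB_log2.trn'
WITH STANDBY = N'C:\SQLBackups\ReportingDB_undo.dat';
```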

The problem you’re experiencing occurs when a log backup being restored contains a long-running transaction that doesn’t commit before the end of the log backup, so the transaction has to be completely undone as part of restoring that backup. This results in a large undo file, which in itself can make restoring a log backup take longer. If the next log backup being restored also has an uncommitted, long-running transaction, then it’s the perfect storm: step 3 will also take a long time.

This can happen when the primary database is undergoing index maintenance and a log backup finishes near the end of a long-running index rebuild operation of a large clustered index. The initial restore of that log backup on the secondary database takes much longer than usual to complete because of step 3 in the restore process.

The next log backup on the primary database also completes just before an index rebuild. When it’s restored on the secondary, the entire undo file has to be undone again. Then the log restore happens, and another large undo file is generated to undo the second uncommitted index rebuild.

If the secondary database needs 24x7 access for reporting, you must be aware of this possibility. In that case, carefully coordinate the index maintenance operations on the primary database with log backups. This will ensure that only complete, committed index rebuilds are present in the log backups being restored on the secondary database.
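One way to coordinate the two, sketched below with illustrative object and path names, is to take a log backup immediately after a large rebuild commits, so that no log backup ends in the middle of the rebuild:

```sql
-- Rebuild the large clustered index, then immediately back up the log,
-- so the rebuild is fully committed within a single log backup
ALTER INDEX CL_BigTable ON dbo.BigTable REBUILD;

BACKUP LOG ProductionDB
TO DISK = N'\\backups\ProductionDB_postrebuild.trn';
```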

An alternative is to move from log shipping to database mirroring. Here, log records are continually being sent from the principal to the mirror database. There aren’t any extra steps involving undoing log operations multiple times. There’s a complexity trade-off involved, as the drawback of this approach is that reporting would have to use database snapshots.
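With database mirroring, the mirror database itself isn’t readable, so reporting queries run against a database snapshot of it. A snapshot is created along these lines (the logical file name must match the mirror database’s data file; all names here are illustrative):

```sql
-- Create a point-in-time, read-only snapshot of the mirror database
-- for reporting; it must be periodically dropped and recreated to
-- give reports fresher data
CREATE DATABASE ReportingDB_Snapshot
ON (NAME = ReportingDB_Data,
    FILENAME = N'C:\Snapshots\ReportingDB_Data.ss')
AS SNAPSHOT OF ReportingDB;
```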

Replicate to repair

Q. Every so often we end up with corrupted files. Our backups also end up damaged, so we have to run a repair operation. Last week, one of the databases I had to repair was a replication publication database. In Microsoft Books Online, it says all subscribers must be reinitialized after repairing a publication database. Can you explain why?

A. If you’re considering using the REPAIR_ALLOW_DATA_LOSS option of database consistency checking (DBCC) CHECKDB (I’ll just say “repair” from here), you have to think twice if the database you’re going to repair is a replication publication database. If at all possible, use your backups instead of running repair.
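For reference, running the repair option requires taking the database into single-user mode first (the database name is illustrative):

```sql
-- Repair requires single-user mode; this disconnects all other users
ALTER DATABASE PublicationDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

DBCC CHECKDB (N'PublicationDB', REPAIR_ALLOW_DATA_LOSS);

ALTER DATABASE PublicationDB SET MULTI_USER;
```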

If you’re using merge replication, Data Manipulation Language (DML) triggers capture changes to the publication database and convert them into logical operations. If you’re using transactional replication, database transaction log analysis captures the publication database changes. Then the logged, physical operations are converted into logical operations. In both cases, the logical operations are then applied to the replication subscription databases.

Neither mechanism allows the capture of repair operations. Repair operations are always direct physical changes to the database structures. These are necessary to fix inconsistencies in the structures, such as a database page, a table record or a linkage between two pages. They’re different from the physical database changes that result from queries performing inserts, updates or deletes on tables.

These repair operations can’t be translated into logical operations that can be applied to replication subscribers. There are no logical operations that can be expressed using Transact-SQL for the equivalent of the direct structural changes the repair is performing. Imagine a repair operation is forced to remove a data page from a unique clustered index (essentially deleting some table records). If the subscription is not reinitialized, those records will still exist on the replicated copy of the table.

If a subsequent insert operation inserted records with cluster key values corresponding to the records deleted by the repair operation, the Distribution Agent would fail when trying to apply the inserts to the replicated copy. Because the repair operation was never applied to the subscription database, a duplicate key violation error would occur when attempting to apply the insert to the replicated table. This means if a repair operation changes any table that’s part of a replication publication, the replication subscription is no longer valid. It will have to be reinitialized.
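For a transactional publication, marking a subscription for reinitialization is done at the publisher with sp_reinitsubscription (the publication and subscriber names here are illustrative; merge replication uses sp_reinitmergesubscription instead):

```sql
-- Mark every article in the subscription for reinitialization;
-- the Distribution Agent then reapplies a fresh snapshot
EXEC sp_reinitsubscription
    @publication = N'SalesPub',
    @article     = N'all',
    @subscriber  = N'REPORTSRV';
```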

If the repair operation affects any of the replication metadata tables, the entire replication publication is in an inconsistent state. Replication must be completely removed and reconfigured. Obviously, this is a much more invasive process than just reinitializing a subscription. The bottom line is that, if at all possible, you want to avoid having to repair a replication publication database.

Paul S. Randal

Paul S. Randal is the managing director of SQLskills.com, a Microsoft regional director and a SQL Server MVP. He worked on the SQL Server Storage Engine team at Microsoft from 1999 to 2007. He wrote DBCC CHECKDB/repair for SQL Server 2005 and was responsible for the Core Storage Engine during SQL Server 2008 development. Randal is an expert on disaster recovery, high availability and database maintenance, and is a regular presenter at conferences worldwide. He blogs at SQLskills.com, and you can find him on Twitter.