SQL Q & A: Moving, Performance Tuning, Backing Up and Mirroring Databases

Paul S. Randal

New Array Moving Day

Q: Our current RAID is filling up quickly, so we need to move some SQL Server 2005 databases elsewhere. The new array is ready and I’ve been preparing to move the databases. I’ve just discovered that one of the databases is a transactional replication publisher and I know that means I can’t move the database. What should I do?

A: There is good news for you: only SQL Server 2000 (and earlier) required you to either reinitialize transactional replication or directly alter various system tables in order to move a publication database.

With SQL Server 2005 and SQL Server 2008, there is a documented process that lets you move a database without having to do anything with transactional replication, as long as the database remains attached to the same instance of SQL Server. You have to accept some downtime, as there’s no way to move a database file while it’s still online. The procedure is as follows:

First, take the database offline using the code below. If any users are connected to the database, you'll need to drop their connections first for this process to succeed:

ALTER DATABASE MyDatabaseName SET OFFLINE;
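
Incidentally, if you'd rather have SQL Server force existing connections off instead of dropping them manually, one option is the WITH ROLLBACK IMMEDIATE termination clause, which rolls back any open transactions and disconnects the users:

-- Take the database offline, rolling back open transactions
-- and disconnecting current users immediately
ALTER DATABASE MyDatabaseName SET OFFLINE WITH ROLLBACK IMMEDIATE;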

Next, copy the data files to the new location. Use copy instead of move to allow for a fast rollback in case anything goes wrong (otherwise, you will have to perform a restore). Then let SQL Server know the new location of each file with the following code:

ALTER DATABASE MyDatabaseName 
MODIFY FILE
   (NAME = N'LogicalFileName',
   FILENAME = N'pathname\filename');

Once you’ve physically copied all the files and updated their locations in SQL Server, bring the database back online with the code:

ALTER DATABASE MyDatabaseName SET ONLINE;
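
As a quick sanity check (the database name here is just a placeholder), you can confirm that SQL Server has recorded the new file locations by querying the sys.master_files catalog view:

-- Verify the physical file locations recorded for the database
SELECT name, physical_name, state_desc
FROM sys.master_files
WHERE database_id = DB_ID (N'MyDatabaseName');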

Closing the Page Latch

Q: I’m having problems understanding some of the concepts around performance tuning. I’ve read several times that I need to prevent “page latch” issues. I don’t know what it means by a “page” or a “latch,” or why a page latch could even be an issue. Can you explain all this?

A: All data in a SQL Server database is stored in data files. Internally, these files are organized into sequences of 8KB chunks called pages. A page is the basic unit of storage and I/O that SQL Server manages. Pages usually reside in the data files on disk and must be read into SQL Server's cache (known as the buffer pool) before they can be used to process queries.

SQL Server uses different kinds of pages to store different kinds of relational data (such as rows from a table, rows from a non-clustered index or text/LOB data). There are also pages that store parts of the internal data structures necessary for SQL Server to organize and access the pages storing the relational data.

A latch is a lightweight internal mechanism SQL Server uses to synchronize access to a page within the cache. There are two types of page latches you need to watch: regular page latches and page I/O latches. If a SQL Server thread has to wait to acquire one of these latches, that can indicate a performance problem.

When SQL Server is waiting for a portion of a data file to be read from disk, it can cause a page I/O latch wait. If a page I/O latch takes an excessive amount of time, this usually indicates a performance problem with the underlying disk subsystem (that is, it’s overloaded).

When multiple threads within SQL Server are trying to access the same 8KB data file page in memory, and there is contention for access to the page, this can cause a page latch wait. The most common occurrence of this involves heavy use of small temporary objects in the tempdb database.

A deeper explanation of how to monitor and mitigate page latch waits is beyond the scope of this column.
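
As a rough starting point, though, SQL Server 2005 and later expose aggregate wait statistics through the sys.dm_os_wait_stats DMV. A minimal query like the one below (a sketch, not a full monitoring solution) shows how much time threads have spent waiting on the two kinds of page latches discussed above:

-- Aggregate page latch (PAGELATCH_*) and page I/O latch
-- (PAGEIOLATCH_*) waits since the instance last started
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE N'PAGELATCH%'
   OR wait_type LIKE N'PAGEIOLATCH%'
ORDER BY wait_time_ms DESC;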

Looking Through Database Snapshots

Q: I’ve just discovered database snapshots. Now I’m considering using them as an alternative to the full recovery model and log backups. I’ll create a snapshot every hour or so, and that way if something goes wrong, I can pull back the damaged data. They seem like a lot less hassle and a much faster way to restore. Do you see any problems with making this change?

A: Yes—database snapshots are not a practical or viable substitute for a comprehensive disaster recovery strategy. A database snapshot does not provide the same capability as a transaction log backup in terms of fully recovering from a disaster. The database snapshot does not contain a copy of all pages from the database, only those that have changed since it was first created. This means that if the database is damaged in any way, the database snapshot is useless without the underlying database. It’s just a collection of disparate pages from the database and can’t be used for recovery.

A database snapshot will let you pull back data that has been accidentally deleted from the database, as long as the database itself is still available. If a dropped table in the database still exists in the snapshot, for example, you can use it to recreate that dropped table.
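
To make this concrete, here's a sketch of that scenario (all names and paths are placeholders, and it assumes the source database has a single data file with the logical name shown). Creating a snapshot requires listing a sparse file for each data file of the source database:

-- Create a snapshot of the database (one entry per data file)
CREATE DATABASE MyDatabaseName_Snapshot ON
   (NAME = N'LogicalFileName',
   FILENAME = N'pathname\MyDatabaseName_Snapshot.ss')
AS SNAPSHOT OF MyDatabaseName;

-- If a table is later dropped from the source database, its rows
-- can be copied back out of the snapshot into a new table
SELECT * INTO MyDatabaseName.dbo.RecoveredTable
FROM MyDatabaseName_Snapshot.dbo.DroppedTable;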

That being said, it's not a good idea to create too many snapshots of a database (as a substitute for, say, half-hourly transaction log backups) because of the potential performance problems. Before a database page can be changed (see the explanation in the "Closing the Page Latch" section above), it must first be synchronously copied into all existing database snapshots that don't already contain a version of that page. The more database snapshots you create, the more page copies must be made, and the more performance degrades.

One other reason not to create too many database snapshots is that each one stores pre-change copies of database pages, so each snapshot grows as more of the database changes. This can lead to disk space problems as well as performance problems.

Database snapshots are not designed to be a substitute for frequent log backups. You can read a more in-depth study of the performance implications of database snapshots in the white paper “Database Snapshot Performance Considerations Under I/O-Intensive Workloads.”

Also, if you’re using the full recovery model and transaction log backups, then you’re obviously interested in being able to recover up to the point of a disaster and/or make use of point-in-time restores. (For an explanation of these see my July 2009 and November 2009 articles, “Understanding SQL Server Backups” and “SQL Server: Recovering from Disasters Using Backups,” respectively.)

Mirror, Mirror

Q: I’ve been asked to set up a database mirror for our database, but I’m concerned that database mirroring isn’t going to help with our problem. We’ve had some corruption issues with our SAN, so the plan is to have database mirroring protect us from corruption. Won’t any corruption automatically be sent over to the mirror? How is database mirroring going to help with this?

A: This is an issue that causes a lot of confusion. It would seem that any technology that provides a redundant copy of a database would be susceptible to corruption propagating from the principal to the mirror database (to use the database mirroring terminology)—but in reality this does not occur.

The crux of the matter lies in understanding how the mirror database is maintained. Corruption would certainly be propagated to the mirror if the underlying synchronization mechanism copied complete database pages from the principal to the mirror database. A corrupt page from the principal would then be placed in the mirror.

However, database mirroring specifically avoids this because it doesn’t copy database pages from one database to the other. Database mirroring works by copying the transaction log records from the principal database to the mirror. Transaction log records describe physical changes made to database pages, and do not contain the actual pages themselves. (For a complete explanation of transaction log records, logging and recovery see my article from February 2009: “Understanding Logging and Recovery in SQL Server.”)

Even if a database page is corrupted by the I/O subsystem underneath the principal database, there's no way for that corruption to directly propagate to the mirror database. The worst that could happen is that SQL Server doesn't detect a page corruption (because page checksums are not enabled), and a corrupt column value is used to calculate a value stored in the database. That incorrect value would then be propagated to the mirror database, a second-order corruption effect. As I mentioned, if page checksums are enabled, such corruption would be detected when the page is read from disk, and the second-order corruption would not occur.
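
If you're not sure whether page checksums are enabled (they're the default for databases created on SQL Server 2005 and later, but databases upgraded from earlier versions may still use the older TORN_PAGE_DETECTION option), you can turn them on with:

-- Enable page checksums so corrupt pages are detected on read
ALTER DATABASE MyDatabaseName SET PAGE_VERIFY CHECKSUM;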

This behavior also explains why running a consistency check on the principal database doesn’t yield any information about the consistency state of the mirror database and vice-versa. They are two separate databases kept synchronized by shipping descriptions of physical changes to the database, not the actual database pages.

Editor’s Note: Thanks to Kimberly L. Tripp of SQLskills.com for providing a technical review of this month’s column.

Paul S. Randal is the managing director of SQLskills.com, a Microsoft regional director and a SQL Server MVP. He worked on the SQL Server Storage Engine team at Microsoft from 1999 to 2007. He wrote DBCC CHECKDB/repair for SQL Server 2005 and was responsible for the Core Storage Engine during SQL Server 2008 development. Randal is an expert on disaster recovery, high availability and database maintenance, and is a regular presenter at conferences worldwide. He blogs at SQLskills.com/blogs/paul, and you can find him on Twitter at Twitter.com/PaulRandal.