Backup Compression, Client Redirection with Mirroring, and More
Paul S. Randal
Q We're going to be upgrading most of our servers to SQL Server 2008, and one of the features that I'm looking forward to putting into production is backup compression. I know that I can turn it on by default for all databases on each server, but I've also heard that I might not want to do that. I'm not sure why I wouldn't want to have the feature enabled by default, as it seems like I've got nothing to lose. Can you help explain the reasoning behind what I've heard?
A The answer is my perennial favorite: it depends! Let me give some background to explain.
The key point to consider is the compression ratio that each database backup will have when backup compression is enabled. The compression ratio of anything being compressed by any algorithm is determined by the actual data being compressed.
Random data (small integer values, for instance) will not compress very well, so the contents of the tables and indexes in the database will determine, for the most part, the compression ratio that can be achieved.
Here are some examples of when backup compression may not produce a high compression ratio:
- If the database has transparent data encryption enabled, then the compression ratio will be very low because the data being compressed is random small values.
- If most of the data in the database is encrypted at the column level, then the compression ratio will be low, again because the column encryption essentially randomizes the data.
- If most tables in the database have data compression enabled, then the compression ratio will be low; compressing data that is already mostly compressed usually has little effect.
In the case when the compression ratio is low, the problem is not the low ratio but the fact that CPU resources are used to run the compression algorithm for no gain. No matter how well a chunk of data can be compressed, CPU resources are always used to run the compression and decompression algorithms.
This means that you need to check how well each database compresses in a backup before deciding to use backup compression for that database all the time. Otherwise you may possibly be wasting CPU resources. This is the basis for what you've heard.
To summarize, if the majority of the databases will benefit from backup compression, it makes sense to enable backup compression at the server level and manually change a few backup jobs to specifically use the WITH NO_COMPRESSION option. Alternatively, if the majority of the databases will not benefit from backup compression, it makes sense to leave backup compression turned off at the server level and manually change a few backup jobs to specifically use the WITH COMPRESSION option.
Q Last year we upgraded our databases to have database mirroring so that if a failure occurs, we can failover to the mirror and the application continues. While we were designing the system, we practiced doing failovers of the database and everything worked fine. Last week we had a real failure and the database failover occurred, but all of the application transactions stopped and the application didn't connect to the failover server. In the future, how can I set up SQL Server so that it doesn't drop the application connections during the failover so the transactions can continue?
A Let me break this down into two parts—how applications can cope with failovers and how to manage client redirection with database mirroring.
When a failover occurs using any of the high-availability technologies available with SQL Server, the client connection to the failed server is dropped and any in-flight transactions are lost. It is not possible to migrate an in-flight transaction between servers (in a failover situation or otherwise). Depending on the high availability technology, the in-flight transaction either will not exist at all on the failover server or it will exist as an in-flight transaction but will be rolled back as part of the process to bring the database online on the failover server.
With regard to database mirroring, which continually ships transaction log records from the principal server to the mirror server, it is usually the latter case—any in-flight transactions are rolled back as part of bringing the mirror database online as the new principal.
Therefore, there are two things that an application must be able to do gracefully when running on a server with the possibility of having to failover to another server:
- It must be able to gracefully handle the server connection being dropped and then try reconnecting after a small time interval.
- It must be able to gracefully handle a transaction being aborted and then retrying the transaction after a connection is established with the failover server (possibly using a mid-tier transaction manager).
The only high-availability technology here that does not require client changes specifically to allow redirection of the client connection after a failover is failover clustering. Clients connect to a virtual server name and are transparently redirected to whichever physical cluster node is active.
With high-availability technologies, such as log shipping and replication, the server name of the failover server is different, which means that manual redirection of client connections is required after a failover. This manual redirection can be done in several ways:
- You can hardcode the failover server name into the client so the reconnection attempts are made to the failover server.
- You can use Network Load Balancing with a 100/0—0/100 configuration, which will then allow the connection to be switched to the failover server.
- You can use something like a server name alias or switching entries in a DNS table.
With database mirroring, any of these options will work. But database mirroring also has built-in client direction capabilities. The client connection string is able to explicitly specify the name of the mirror server, and if the principal server cannot be contacted, the mirror will then automatically be tried. This process is known as explicit redirection.
If the client connection string cannot be changed, then implicit redirection may be possible if the failed server is now running as the mirror server. Any connections to it will be automatically redirected to the new principal—but this only works if the mirror server is running.
The SQL Server 2005 white paper "Implementing Application Failover with Database Mirroring
" explains these options in more detail.
When we upgraded to SQL Server 2005, we redesigned our large tables to be partitioned so that we could take advantage of partitioned maintenance and the sliding-window mechanism. You described this in the August 2008 installment ("Partitioning, Consistency Checks, and More
"). But we've encountered a problem. Occasionally, concurrent application queries are experiencing blocking across the whole table when the queries aren't even accessing the same partitions. I've heard that SQL Server 2008 fixes this problem—can you please explain how I can stop this blocking?
Figure 1 Examining locks on a partitioned table
A The problem you're seeing is caused by a mechanism called lock escalation. SQL Server acquires locks on data to protect them while a query is reading or writing the data. It can acquire locks on entire tables, data file pages, or individual table/index rows, and every lock takes up a little memory.
If a query causes too many locks to be acquired, SQL Server can decide to replace all the locks on rows or pages in a table with a single lock on the entire table (the threshold, when this takes place, is approximately 5000 locks, but the exact algorithm is complicated and configurable). This process is called lock escalation.
In SQL Server 2005, if query A is operating on a single partition of a table and causes enough locks to be taken to trigger lock escalation, then the entire table becomes locked. This can prevent query B from being able to operate on a different partition of the same table. Hence, query B is blocked until query A completes and its locks are dropped.
In SQL Server 2008, the lock escalation mechanism has been improved to allow a table to have partition-level lock escalation. Using the example above, this means that the lock escalation caused by query A would only lock the single partition query A is using, rather than the entire table.
Query B will then be able to operate on another partition without being blocked. Query B could even trigger lock escalation itself, which would then lock just the partition that query B is operating on, rather than the entire table.
This model of lock escalation can be set using the following syntax:
ALTER TABLE MyTable SET (LOCK_ESCALATION = AUTO);
This syntax instructs the SQL Server Lock Manager to use partition-level lock escalation if the table is partitioned and regular table-level lock escalation if the table is not partitioned. The default behavior is to use table-level lock escalation. Care should be taken when setting this option, as it could lead to deadlocks depending on the behavior of your queries.
For example, if queries A and B both cause lock escalation on different partitions of a table, but then they each try to access the partition the other query has locked, one of the queries will be aborted by the Deadlock Monitor process.
The figure shows an example of querying the sys.partitions system catalog view (the first set of results) and the sys.dm_os_locks DMV (the second set of results) to examine the locks being held for queries against a partitioned table where partition-level lock escalation has taken place. In this case, there are two partition-level exclusive locks (the HOBT locks in the output), but the table locks (the OBJECT locks in the output) are not exclusive, so multiple queries can access partitions even though lock escalation has taken place. Notice that the resource IDs for these two partition locks match the partition IDs for the first two partitions of the table in the output from sys.partitions.
Earlier this year I blogged about an example script showing how partition-level lock escalation works
and the potential for deadlocks. The SQL Server 2008 Books Online topic called "Locking in the Database Engine
" has an in-depth explanation of all aspects of locking in SQL Server 2008.
Q One of our servers had some issues with the disk holding the transaction log for a database, and the database became suspect. The most recent full backup was from five weeks ago, and it was going to take too long to restore all the log backups, too. It was out of hours when the problem occurred, so we rebuilt the broken transaction log to avoid the downtime. Under some circumstances, this could cause problems. But if nothing was accessing the data, then I think we're safe. Did we do the right thing?
The simple answer is that the only time I would consider rebuilding a transaction log is when it's not possible to recover from backups. Although you are aware of the dangers of rebuilding a transaction log (for those readers who are not, see my blog post, "Last resorts that people try first...
," the fact that the database went suspect means that it failed during recovery—either when running crash recovery or while rolling back a transaction. This means that there is the real possibility of data corruption in the database.
Although the problem occurred during your quiet period, did you consider scheduled jobs and background tasks? A maintenance job could have been running that was rebuilding or reorganizing a clustered index when the log became corrupted. A background task might have been running ghost cleanup on pages in a heap or clustered index. Either of these, for example, could have been making changes to clustered index structures that, if not properly rolled back, would result in corruption in the database and possible data loss.
The bottom line is that rebuilding a transaction log should always be the absolute last resort in any disaster recovery scenario due to the enormous potential for causing more corruption and data loss. At the very least, you should run a full DBCC CHECKDB on that database to check whether any corruption exists.
Moving forward, you should change your backup strategy so that you are able to perform timely restores and don't have to resort to drastic measures, such as transaction log rebuilding. The steps to design a backup strategy extend beyond the scope of this column, but I plan on covering this topic in a full-length feature article at some point in the coming year. So please stay tuned!
Paul S. Randal
is the Managing Director of SQLskills.com
and a SQL Server MVP. HeHe worked on the SQL Server Storage Engine team at Microsoft from 1999 to 2007. Paul wrote DBCC CHECKDB/repair for SQL Server 2005 and was responsible for the Core Storage Engine during SQL Server 2008 development. Paul is an expert on disaster recovery, high availability, and database maintenance and is a regular presenter at conferences around the world. He blogs at SQLskills.com/blogs/paul