SQL Q&A: Size Does Indeed Matter
Database size, index fragmentation and post-failover availability are among the issues dogging SQL admins this month.
Paul S. Randal
Fear the Fragmentation
Q. I’ve read a few blog posts that seem to imply we don’t need to be concerned about index fragmentation if our databases are hosted on solid-state storage—the theory being that solid-state drives (SSDs) are so much faster than spinning disks. I understand the performance degradation would be reduced, but can we really just completely ignore index fragmentation?
A. Whether you’re using spinning disks or .SSDs., you need to pay attention to index fragmentation. Index fragmentation encompasses two phenomena—out-of-order index pages and page-density issues. The former prevents efficient read ahead during index-range scans and the latter reduces data density.
It’s true that read/write latencies with SSDs are very small. Consequently, the need to perform more frequent, smaller read-ahead I/Os when range scanning a fragmented index will not have anywhere near as much of a performance effect as in the same situation when using spinning disks.
However, the reduction in data density from index fragmentation can still be a big problem. Most index fragmentation occurs from an operation called a “page split.” This happens when free space is created on a page in an index by moving half the index rows to a new page. This leaves the old and new pages with roughly 50 percent empty space. With a heavily fragmented index, it’s not uncommon to see indexes with an average page density of 70 percent or less (with 30 percent free space).
With this in mind, if a large number of indexes in the databases stored on your SSDs have low page density, this means your expensive SSDs could be storing a large amount of empty space. This is clearly not an optimal situation. Also, even though the extra I/Os necessary to read in the lower-density pages will have low latencies on the SSDs, they’ll take up more space in the SQL Server buffer pool (the in-memory data-file page cache). This also means your precious server memory isn’t being used optimally.
The other thing to consider besides the index fragmentation itself is the cause of the fragmentation: page splits. These are expensive operations that generate a lot of transaction log records (take a look at my blog post to see how bad it can be). These extra log records mean extra processing by anything that reads the transaction log (such as transactional replication, backups, database mirroring, log shipping). This can cause performance degradation for those same processes. So don’t ignore index fragmentation just because you’re using SSDs.
Don’t Look to the Mirror
Q. We’re redesigning our availability strategy, but have become stuck on how to make a couple of transactional replication subscription databases more highly available. We couldn’t use database mirroring on SQL Server 2005 because we’d have to reinitialize the subscription after a failover. Is there a better solution now that we’re on SQL Server 2008 R2?
A. You’re correct in stating that mirroring a subscription database on SQL Server 2005 only provides a redundant copy of the data that has already been mirrored to the mirror copy of the subscription database. There’s no way to recreate the replication subscription without a complete re-initialization. This obviously makes database mirroring a poor choice on SQL Server 2005.
With SQL Server 2008, there was a new mechanism introduced into transactional replication that allows for partial re-initialization of a subscription. The option is called “initialize from lsn.” It’s specified as the @sync_type parameter when calling sp_addsubscription.
Peer-to-peer transactional replication in SQL Server 2008 was enhanced to allow you to add and remove nodes in a peer-to-peer topology without having to make all activity in the topology completely quiescent first. This was a major boost to the data availability peer-to-peer topology provides. The “initialize from lsn” option was added as these enhancements were out in place.
For database mirroring, there’s no additional support for mirroring subscription databases (as there is in the Log Reader Agent for mirroring publication databases). However, you can use the “initialize from lsn” method to provide a fast re-initialization of a subscription after a mirroring failover.
This methodology relies on determining the Log Sequence Number (LSN—a unique number identifying a transaction log record) of the most recent replicated operation applied to the subscription database before the mirroring failover occurred. We’ll call this LSN2.
Some of these operations will have also been mirrored to the mirror copy of the database before the failover occurred. This could go up to LSN3, for example, a bit further back in time than LSN2. There will also be some operations that haven’t been applied to the subscription database at all. These are more recent in time than either LSN2 or LSN3. We’ll call those LSN1.
All operations up to LSN2 were applied to the principal subscription database. All operations up to LSN3 were applied to the principal subscription database and mirrored to the mirror subscription database. To perform an “initialize from lsn” initialization of a new subscription after a mirroring failover, the call to sp_addsubscription must use LSN3 as the starting point.
The distribution retention time period must also be set so operations are retained in the distribution database for some time after they’ve been applied to the subscription database. In short, you can now use database mirroring to provide higher availability of a subscription database with only a minimal re-initialization required after a mirroring failover. For a more more in-depth explanation on this, download the white paper, “SQL Server Replication: Providing High-Availability Using Database Mirroring.”
Too Big to Handle
Q. Our main database has reached almost 9TB. We’re finding we simply don’t have the capability to perform regular maintenance tasks without seriously affecting our regular workloads. We’re most concerned about being able to back up the database to allow disaster recovery. Do you have any advice?
A. This is a case where splitting the database up into more manageable chunks would be beneficial. You can do this in a number of ways, the most common of which use the SQL Server table/index partitioning feature (in Enterprise Edition) or manually split things into separate tables.
In either case, the crucial point is to create multiple filegroups in the database. With partitioning, each partition of the largest tables/indexes resides in a separate filegroup. With manual splitting, each large table resides in a separate filegroup (possibly with all of its indexes as well).
By using separate filegroups, you have more granular units of the database that you can backup and restore. You won’t have to operate on the entire 9TB each time. If you had a sales database, for example, with data from 2012 back to 2008, you could partition the various tables by data range into calendar year partitions. Each year partition would be in a separate filegroup.
With only the 2012 filegroup undergoing changes, you could back it up frequently. You could back up other unchanging filegroups much less frequently. This saves on backup storage space and the amount of time that the extra I/O overhead of performing the backup is incurred on the production system.
With such an arrangement, disaster recovery also becomes faster (using Enterprise Edition). You only need to quickly restore the filegroups required to bring the Online Transaction Processing (OLTP) portion of the workload online. You can do this with a partial restore, then use partial database availability to bring the database online. You can restore the filegroups containing the older data later using online piecemeal restore, while the OLTP activity is occurring in the already-online filegroups.
You can read more about this approach in these white papers:
- “Partitioned Tables and Indexes in SQL Server 2005”
- “Partial Database Availability”
- “Partitioned Table and Index Strategies Using SQL Server 2008”
Q. One thing that confuses our DBA team is how to tell whether the buffer pool is being pressured. There’s a lot of conflicting information about which PerfMon counters and what thresholds to use. Most of what I’ve read says to use Page Life Expectancy (PLE) and 300 as a threshold. Can you shed some light on this?
A. You’re not alone in your confusion. The number 300 was first referenced in a Microsoft white paper published five years ago. That is now badly outdated.
PLE is the right counter to use, but you have to understand what it means and when to be concerned. That number provides an instant measure of how aggressively the buffer pool is making space for required data file pages to be read from disk into memory. It’s not a rolling average. It’s the expected time in seconds that a page read from disk will stay in memory before you need to flush it out so another page can take its place.
As such, looking at a single PLE value doesn’t tell you much. You need to look at value trends. It’s entirely possible for valid SQL Server operations to cause the PLE to drop drastically. It will often then recover to its former value. If PLE drops and stays low, that’s a cause for concern.
The threshold for when to be concerned isn’t a fixed value, as many people describe. The 300 value means the entire buffer pool is being replaced every 300 seconds. If you have a 100GB buffer pool, for example, this means 100GB of new data is being read into memory every five minutes. That’s clearly a performance issue. However, it becomes a massive performance issue way before PLE hits 300. You can calculate a more reasonable value using (buffer pool memory in GB / 4) * 300, as explained in this blog post.
You also need to be aware of the non-uniform memory access (NUMA) configuration of your server. The PLE counter in the Buffer Manager performance object is actually the average of the PLEs for each NUMA node, if you have NUMA configured. This means monitoring the Buffer Manage PLE isn’t a true indicator of memory pressure on the server. In that case, you should measure the PLE counter in each of the Buffer Partition performance objects. You can read more about PLE and NUMA in this blog post.
PLE is the right counter to monitor, but you should only be concerned if the value drops significantly below normal and stays there for a long time. That’s a general guideline, but unfortunately there are no specifics that apply to all situations.