SQL Q&A
Partitioning, Consistency Checks, and More
Paul S. Randal
Q I accidentally attached a production database to a SQL Server® 2005 server and now I'm trying to attach it to the correct server, which is running SQL Server 2000. I've tried simply detaching the database and attaching to the SQL Server 2000 server, and I've tried doing the same thing with a backup and restore, but neither of these work. Why won't it work on my SQL Server 2000 server? This is the only copy of the database I have.
A The first thing to note here is the importance of having backups. DBAs usually think of needing backups to recover from corruption or some other destructive disaster. But you've hit upon a less obvious disaster—something going wrong during an upgrade, though in this case it was an accidental upgrade. The point remains, however, that you should always have a recent full database backup in case something happens to your database.
An upgrade, whether intentional or accidental, is a one-way operation and it is extremely difficult to reverse its effects. When you upgrade between versions of SQL Server, a series of upgrade steps are performed on the database. Each step usually involves some physical changes to the database, and each step increases the version number of the database.
For example, one of the major changes performed when upgrading a database from SQL Server 2000 to SQL Server 2005 is to change the structure of a database's system catalogs (often called the system tablesor database metadata) that hold various metadata about tables, indexes, column, allocations, and other details regarding the relational and physical structure of the database.
As each of these upgrade steps is performed, the database version number is increased. For example, SQL Server 7.0 databases have version number 515, SQL Server 2000 databases have version number 539, and SQL Server 2005 databases have version number 611 (or 612 if the vardecimal feature is enabled). This allows SQL Server to know the last upgrade step performed on the database.
Releases of SQL Server cannot read databases upgraded to more recent releases of SQL Server (for instance, SQL Server 2000 cannot read a database that's been upgraded to SQL Server 2005). This is because older releases do not have the code needed to interpret the upgraded structures and database layout. And here lies the problem with your database, which has been upgraded to SQL Server 2005: not being able to attach back to SQL Server 2000.
In the absence of a full database backup, your only option is to export all the data from the upgraded database and manually transfer it back to a new SQL Server 2000 database. As long as you haven't used any new features in SQL Server 2005, you can script out the database structure, create the database on SQL Server 2000, and then export/import the data.
To script out the database in SQL Server 2005, you can use the Object Explorer in SQL Server Management Studio (right-click on the database and select Tasks and then the Generate Scripts option). The wizard is self-explanatory and will generate a script to create all the objects, indexes, constraints, triggers, and so on.
Q We've recently redesigned our schema to use table partitioning on the main table—we've been told this helps improve performance. The database is stored on a single 120GB drive and the table is contained in a single filegroup. There's no sliding-window being done, just new partitions being added every week, and all data needs to be available online. Most of our queries process data from within a single week and there are a few that work across the last year. This seems to be easier to do in a single filegroup. Am I correct or is there more to it?
A While it may seem easier to use a single filegroup, you're unlikely to see the benefits of using partitioning that way. The major uses of partitioning are for more efficient database maintenance and for increasing data availability in the event of a disaster, plus you can create a schema that gives performance gains.
The archetypal example of partitioning is for a sales table, with 12 partitions, each representing one month of the last year's sales data. At the end of the month, the oldest partition is switched out (and archived or deleted) and a new partition is switched in. This is the sliding-window scenario to which you referred. The current month's partition is set to read/write and the older months are read-only. Each partition is stored in a separate filegroup. This schema allows all the benefits of partitioning, but is not the most optimal in all circumstances.
My wife, Kimberly, came up with a twist on the schema described above that allows more efficient indexing. Consider splitting the sales table into two tables—one to hold the single read/write partition, and one to hold the 11 read-only partitions, with a partitioned view over both tables.
This allows the read/write table to have a small number of indexes, and the read-only table to have more indexes to support reporting queries. This, in turn, makes Data Manipulation Language (DML) operations on the read/write data significantly more efficient because they do not have to maintain so many non-clustered indexes.
Also, queries on the read/write data don't have to also process the read-only data. Partition elimination in query plans is not perfect in SQL Server 2005, especially in cases with complex predicates, but it has been much improved in SQL Server 2008. For more information on this, see Kimberly's blog post at
sqlskills.com/blogs/kimberly/default,month,2007-10.aspx#a71c70243-3d57-4940-9af7-a802b73f2f93.
To show you what I mean, I'll explain some of the features that are enabled by partitioning over multiple filegroups.
Partial Database Availability This is the ability for a database to be online and accessible during a disaster recovery situation as long as the primary filegroup is online. If you have only a single filegroup, then the entire database is down while it is being restored. With the data spread over multiple filegroups, only the damaged filegroups are offline during the restore and your application may be able to continue operating.
Piecemeal Restore This scheme is similar to partial database availability. With a single filegroup, the unit of restoration is either a single page or the whole database. With multiple filegroups, you can restore just a single filegroup—thus allowing partial database availability.
Partitioned Database Maintenance With either of the partitioning schemes discussed above, you can perform per-partition index fragmentation removal, even when all the partitions are on a single filegroup. But with a single filegroup, you lose the ability to do per-filegroup consistency checks, which can drastically cut back on the amount of data that database consistency checking (DBCC) needs to process (and, in turn, the amount of CPU and I/O resources being used).
Simply put, while you can have multiple partitions within the same filegroup, there are numerous benefits to having a 1-1 mapping between partitions and filegroups.
Q We recently had a scare on one of our high-end database servers—a bad memory board was causing corruptions. We discovered these when random data started to appear in our application. We ran DBCC CHECKDB and found all sorts of corruption. Unfortunately, this was also present in our backups so we had to manually prune out the bad data.
Long story short—we replaced the bad hardware and turned on page checksums. We'd like to run regular consistency checks but we don't have a large maintenance window and our 2.4TB database takes a long time to check. What can we do?
A The question of how to perform consistency checks and other maintenance on a VLDB (very large database) is becoming more and more common. Many DBAs give up after finding that DBCC CHECKDB takes longer to run than their maintenance window allows. (In some situations, the database is used 24x7 and there is no valid time to give up the CPU and I/O overhead required by DBCC CHECKDB for an extended period of time.)
Apart from giving up and not running any consistency checks (which I definitely do not recommend) there are four methods you can use. I've personally helped customers use all four of these approaches.
Use the WITH PHYSICAL_ONLY Option of DBCC CHECKDB A regular DBCC CHECKDB runs a large number of logical consistency checks, taking up a lot of CPU (and basically being a CPU-bound operation). Using the WITH PHYSICAL_ONLY option limits the checks to running the very fast DBCC CHECKALLOC allocation bitmap consistency checks, and then reading and auditing every allocated page in the database, forcing any page checksums present on the pages to be tested. This turns DBCC CHECKDB into an I/O-bound operation, with a much shorter run time (in fact, it's sometimes even a magnitude or more faster than a full DBCC CHECKDB, which can translate to pretty significant time savings).
Split Up the Consistency Checking Workload This entails breaking up the tables in the database into equal sized groups (the easiest approach is to do this by number of pages) and then using DBCC CHECKTABLE commands to consistency check all the tables in a single group each night. So, for example, with, say, seven groups being checked one-group-per-day, with a DBCC CHECKALLOC and DBCC CHECKCATALOG once a week, the equivalent of a DBCC CHECKDB can be achieved, albeit spread out over the course of a week.
Use Table Partitioning with Multiple Filegroups The largest tables in the VLDB can be split over several filegroups. A sample consistency- checking schema may be to run a daily DBCC CHECKFILEGROUP on the filegroup holding the read/write partition and also a weekly DBCC CHECKFILEGROUPs on the filegroups holding the read-only partitions. The logic is that the read-only data is comprehensively backed up and not used in day-to-day processing. As a result, it does not need to be consistency checked as often because corruption is not so critical in this data.
Offload the Consistency Checks to Another Server This option involves restoring the regular full database backups onto a different server and running the DBCC CHECKDB on that server. Obviously, this completely offloads the consistency checking workload from the production server. However, the downside is that if a corruption is found, a consistency check must then be run on the production server—but that should be a very infrequent occurrence.
As you can see, there are plenty of options available for DBAs to perform consistency checks on a VLDB, thereby reducing the extra resource load required when running a full DBCC CHECKDB. As I mentioned previously, I've personally helped customers use all four of these approaches. I think you'll find that these options will work for you.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.