Click to Rate and Give Feedback
Tips
You can run Check Disk from the command line or within other utilities. At a command prompt, you can test the integrity of the E drive by typing these commands. ...

Read more!

Windows 7 beta 1 introduces the built-in ability to burn discs directly from ISO and IMG files. Find out how you can easily burn ISO and IMG files to disc. ...

Read more!

Mailbox and public folder databases have several associated states. You can determine the status of a database by following these four easy steps. ...

Read more!

Simple steps for creating and restoring Shadow Copies on Windows Server 2008! ...

Read more!

Windows Vista includes an assortment of utilities for diagnosing, monitoring, and repairing network connections. Here’s an overview of the key tools and what they’re used for. ...

Read more!

Related Articles

In this installment, Paul Randal answers questions about backing up and restoring, looks at the differences between log shipping and database mirroring, and explains why the Repair function should only be used as a last resort.

Paul S. Randal

TechNet Magazine October 2008

...

Read more!

Security continues to be an area of deliberate improvement for SQL Server. Explore some of the most important security-related changes you’ll find in SQL Server 2008, from encryption and authentication enhancements to Policy-Based Management and the new auditing system that will help with regulatory requirements.

Rick Byham

TechNet Magazine April 2008

...

Read more!

Saleem Hakani and Dan Carollo

TechNet Magazine February 2008

...

Read more!

SQL Server 2008 offers seven new built-in data types that let you work with more complex data and simplify the management of structured, semi-structured, and even unstructured data. Get an in-depth look at these new data types and find out how you can use them to improve data management in your infrastructure.

Kelly Wilson

TechNet Magazine April 2008

...

Read more!

SQL Server 2008 offers more powerful relational data warehousing capabilities. Take a close look at key performancerelated data warehousing features and find out how you can use all this new technology to optimize query performance.

S. Agarwal, T. Grabs, and J. Hammer

TechNet Magazine April 2008

...

Read more!

Also by this Author

Paul Randal answers your SQL questions: How does removing index fragmentation affect statistics? Does data compression interfere with backups? Can you explain database mirroring states? And more!

Paul S. Randal

TechNet Magazine June 2009

...

Read more!

In most situations, database backups are critical for recovering from a disaster, but there are lots of misconceptions about how backups work and what a good backup strategy is. In this article, the fi rst in a three-part series on disaster recovery, Paul S. Randal explains how the three most common types of backups work and how you can combine them into an effective backup strategy.

Paul S. Randal

TechNet Magazine July 2009

...

Read more!

Should you enable Backup Compression? How can you avoid dropped application connections during a failover? Paul Randal answers these questions and more in the latest installment of SQL Q&A.

Paul S. Randal

TechNet Magazine February 2009

...

Read more!

In this installment, Paul Randal answers questions about backing up and restoring, looks at the differences between log shipping and database mirroring, and explains why the Repair function should only be used as a last resort.

Paul S. Randal

TechNet Magazine October 2008

...

Read more!

Logging and recovery are critical parts of SQL Server, but they are often poorly understood. Paul Randal gives an overview of how the logging and recovery features work in SQL Server and explains the transaction log and recovery models to help you maintain a better database.

Paul S. Randal

TechNet Magazine February 2009

...

Read more!

Popular Articles

Aaron Margosis

TechNet Magazine August 2006

...

Read more!

See how the free Microsoft USB Flash Drive Manager can help you easily copy, delete, and back up files from a flash drive, as well as perform other useful tasks.

Lance Whitney

TechNet Magazine January 2009

...

Read more!

Project Server 2007 delivers significant enhancements, not only to the features and functionality for users but also for administrators. Alan Maddison explores some of the most significant new features and walks you through the installation and configuration of Microsoft Office Project Server 2007.

Alan Maddison

TechNet Magazine January 2009

...

Read more!

Why do attachment sizes increase when sending and receiving e-mail messages? How can you ensure databases on a passive node in a CCR cluster are defragmented during online maintenance? Can you use an external trust between forests? We answer these questions and more.

Henrik Walther

TechNet Magazine January 2009

...

Read more!

Virtualization brings significant changes to disaster recovery. Here’s an introduction to how the Microsoft virtualization platform factors into your disaster recovery plan, as well as a deeper look into backup and restore options and considerations for Windows Server 2008 Hyper-V.

Adam Fazio

TechNet Magazine October 2008

...

Read more!

Our Blog

NAP monitors the health of specified computers when they attempt to connect to a network and includes a number of mechanisms to enforce health requirements. In this article, Geek of All Trades Greg Shields gives readers an overview of these enforcement mechanisms and, as an example, takes a closer look at setting ...

Read more!

Use Windows PowerShell to Manage Virtual Machines Here are a few examples of how you can use Windows PowerShell scripts to manage virtual machines running on a Server Core installation. Note that these scripts are presented as samples and may need to be customized to work in your environment.

Create a New ...

Read more!

Disabling an Unused Part of Group Policy Objects One way to disable a policy is to disable an unused part of the GPO. By disabling part of a policy that isn’t used, the application of GPOs and security will be faster.

Administer Windows Server 2008 Server Core from the Command Prompt ...

Read more!

In the August 2008 issue of TechNet Magazine, Paul Randal wrote an article Top Tips for Effective Database Maintenance.  It was geared toward "involuntary  DBAs" (IT pros who inadvertently wind up responsible for a SQL Server instance).  The article had a great response from our readers so Paul has written another ...

Read more!

Microsoft Forefront is designed to deliver an integrated security solution that makes it much easier to deploy and manage security across an organization’s IT infrastructure. In this, our annual security issue, we feature two articles that describe how Forefront Security protects instant messaging and e-mail.

Protect ...

Read more!

SQL Server 2008
Tracking Changes in Your Enterprise Database
Paul S. Randal
 
At a Glance:
  • The need for tracking changes
  • Tracking changes in SQL Server 2005
  • Change tracking in SQL Server 2008
  • Change data capture in SQL Server 2008

For developers, one difficult problem in SQL Server is tracking what data has changed in a database. An even greater challenge is architecting a simple solution that doesn’t heavily impact workload performance and isn't difficult to create, implement, and manage. So why go to all the trouble to track changes? Is tracking changes really worth all this effort? Two commonly cited examples are to support updates to a data warehouse and to support the synchronization of heterogeneous, occasionally connected systems.
A data warehouse usually has some representation of the tables in the Online Transaction Processing (OLTP) database, but the table schemas may actually be quite different. This means that there needs to be an ETL (extract, transform, load) process that moves data from the OLTP database to the data warehouse.
Video
Watch Paul Randal demonstrate how you can track changes in your database using the new Change Data Capture feature in SQL Server 2008.


I can think of three possibilities for doing this. The first is to refresh the entire data warehouse periodically. This is clearly impractical for large data volumes and also means that updates to the data warehouse are not continuous. The second method is to use a partitioning scheme in the OLTP database to allow the ETL process to work only on data that is new since the previous ETL process. This method only works for data inserts, not updates or deletes, and requires a complex mechanism to manage partition boundary definition and switching partitions. The third method is to track changes to the OLTP data and only perform the ETL process using the changed data. This is the most efficient method in terms of data volume.
Mobile devices are ubiquitous in today's business environment, which means that dealing with occasionally connected systems is a requirement. In terms of database systems, the problem is how to efficiently update a data store on a device that does not connect frequently, especially when the data store itself may be small and radically different in schema from the main database.
Consider a mobile sales rep who is responsible for a portion of a very large product catalog. Every night she connects her handheld device to the main database to download the latest data—all the changes to that portion of the product catalog, simplified for storage on a handheld device. The data transfer should be as efficient as possible.
Now, you could have the database system prepare the entire relevant portion of the product catalog for download to the device and have the device download it. In other words, all the data is downloaded every time the device connects, even if the data hasn't changed. This is obviously a rather inefficient approach.
Another approach is to have the database system track changes to the relevant portion of the product catalog. Then, when the handheld device connects, it asks for the data that has changed since the last time it connected. In this solution, the database system only has to prepare a subset of the data and the download is as efficient as possible.
Another reason for tracking changes is to support auditing, which is essential these days. Auditing tracks the changes being made as well as when the change occurred and who made the change. This really takes things to another level, with rigid constraints around durability, security, and correctness of a complete audit trail.
The technologies that were designed for tracking data changes in SQL Server 2008 were not intended to support auditing; however, SQL Server 2008 offers a new feature, which is called SQL Server Audit, that was designed specifically for auditing. Rick Byham discussed the SQL Server Audit feature in his "SQL Server 2008: Security" article from the April 2008 issue of TechNet Magazine (available at technet.microsoft.com/magazine/cc434691).
As you can see, there are a number of compelling reasons to track changes to your data. So the important question is, how best can tracking be done?

How You Track Changes in SQL Server 2005
With SQL Server 2005 (and earlier versions of SQL Server) there is not a simple, canned solution. So for these platforms, developers have had to create custom solutions for their applications, usually involving timestamp columns, DML (Data Manipulation Language) triggers, and extra tables. These solutions, however, present a variety of potential problems. For instance:
  • The addition of timestamp columns causes the table schema to change (with possible knock-on effects in stored procedures and other code).
  • A DML trigger is implicitly part of the transaction containing the DML by which it is triggered, so its execution time will increase the length of the transaction. The more complex a trigger, the longer it will take to execute and so the higher the detrimental effect on workload performance. DML triggers used for tracking changes need to process the inserted and deleted tables to harvest all the changes and then insert them into another tracking table.
  • The tracking table needs to be managed in some way to avoid its growing out of control, which may require you to create something like an Agent job to periodically trim old data.

Easier Ways to Track Changes in SQL Server 2008
SQL Server 2008 introduces two new technologies that make it much easier to track changes to data: change tracking and change data capture. Both features track data that has changed (as well as use the insert, update, or delete operations to track exactly how the data was changed), and they eliminate the need for custom solutions. Those similarities aside, their mechanisms and what exactly they track are actually quite different.
Change data capture uses an asynchronous mechanism that tracks all the changes that occur to a table (or a defined set of columns of the table), including the column values themselves. This is designed for scenarios like the data warehouse ETL process I described previously.
Figure 1 illustrates change data being consumed in time slices. The change data capture mechanism extracts the changed data into a set of tables, with the most recent changes being at the top of the table. The ETL process can then query the tables holding the change data for all changes that occurred within a set time period. This mechanism allows the ETL process to limit the amount of data that must be consumed in each batch.
Figure 1 Historical change data being consumed in time slices (Click the image for a larger view)
Change tracking, on the other hand, uses a synchronous mechanism that tracks only that a particular row changed in a table (and optionally the list of columns that changed). This is designed to address such problems as the occasionally connected system scenario I described previously. This approach is illustrated in Figure 2.
Figure 2 An occasionally connected system using change tracking data (Click the image for a larger view)
Both features introduce an increase in I/O and logging, but this is true with custom solutions as well—the change data has to be stored somewhere. What makes these two features potentially different from a custom solution is that the tables used to store the change data must be in the same database as the tables being tracked. This means all the change data will be included in backups and potentially transmitted over the network by log shipping or database mirroring.
In terms of development, these two features should remove a lot of the complexity from tracking changes. There are no table schema changes or triggers required for either technology. Both technologies have configurable, automatic cleanup processes, order changes by transaction commit times, and provide built-in functions to retrieve change information.
From the management perspective, there are pros and cons to each approach. As with any technology, there is a lot of information you must understand before you develop and deploy solutions that use these features. In the rest of this article I'll give an overview of each of these features, touching on how they work and the practical points to consider before using them in production.

How Change Data Capture Works
Change data capture doesn't do anything as part of the transactions that change the table being tracked. Instead, the insert, update, and delete operations are written to the transaction log, as normal, and periodically harvested from the log. Harvesting is performed by a SQL Agent log reader job, and the harvested operations are stored in a separate table called a change table. At some later point, the change table can be queried to obtain the change data using one of two functions. The combination of the change table and the two functions is called a capture instance. Figure 3 shows the flow of data using change data capture to drive a data warehouse ETL process.
\\msdnmagtst\MTPS\TechNet\issues\en\2008\11\Randal - SQL\layout\FIGURES\fig03.gif
Enabling change data capture is a two-stage process. First a member of the sysadmin fixed server role must enable change data capture for the database using sys.sp_cdc_enable_db. Then a member of the db_owner fixed server role must enable change data capture on a specific table using sys.sp_cdc_enable_table. These security requirements are due to the potential for high disk usage if change data capture is misconfigured. It makes perfect sense that a table owner cannot enable the feature and surprise a database administrator with extra disk usage.
When change data capture is enabled for a database, a few things are added to the database, including a new schema (called cdc), some metadata tables, and a trigger to capture Data Definition Language (DDL) events. (One capability that I think is cool is that you can get a list of the DDL changes to a table.)
Enabling change data capture also creates the capture instance for the table—the change table and up to two functions to return change data. The change table name is the same as the capture instance, with _CT appended. The first function is always created and is the one used to return change data from the change table. The second function is created if the option to allow net changes is specified. This means that only the final result of all the captured changes is returned, rather than all the intermediate changes that the first function returns. The two function names are, respectively, fn_cdc_get_all_changes_ and fn_cdc_get_net_changes_, with the capture instance name appended. Note that (like the change tracking feature) this functionality requires the table to have a primary key or other unique index.
When you're dealing with the first table in the database to have change data capture enabled, two SQL Agent jobs may be created: the capture job and the cleanup job. I say "may be created" because the capture job is the same as the one used for harvesting transactions in transactional replication. If transactional replication is already configured, then only the cleanup job will be created and the existing log reader job will also be used as the capture job. This is good because having two log reader jobs would very quickly lead to contention problems with the log and hence performance problems. Either way, SQL Agent must be running if you want to use change data capture.
The logic inside the log reader automatically copes with tables being enabled and disabled for change data capture and alters what is harvested from the transaction log accordingly. One major point to note here is that once change data capture is enabled, the transaction log behaves just as it does with transactional replication—the log cannot be truncated until the log reader has processed it. This means a checkpoint operation, even in SIMPLE recovery mode, will not truncate the log unless it has already been processed by the log reader.
Also, if the BULK_LOGGED recovery model is used to reduce logging, change data capture will force everything to become fully logged, except for index create/drop/rebuild operations. If you have never experienced such behavior, beware that this may cause transaction log size problems, especially if the capture job defaults are changed so the log isn't processed as frequently.
By default, the capture job runs continuously, scanning the log every five seconds and processing a maximum of 500 transactions from the log. Also by default, the cleanup job runs every day at 2 A.M. and removes all change data entries older than three days from the change tables. You can change these settings using the sys.sp_cdc_change_job procedure, and then changes will not take effect until you restart the jobs using sys.sp_cdc_stop_job and sys.sp_cdc_start_job.
Although the log reader process will usually have a low impact on system performance, it is possible on heavily loaded OLTP systems with a lot of changing data that the addition of just one log reader process can cause contention on the transaction log. The actual contention would be caused by the disk heads having to move back and forth between the point at which the log is being written to by transactions and the point at which it is being read by the log reader process. In this case, it may be necessary to change the frequency at which the capture job runs to ensure OLTP performance will not suffer. This, however, creates a classic disk space versus performance trade-off—the log will continue to grow until the capture job processes it.
The same problem occurs if the cleanup job frequency or change data retention periods are changed—the change tables will continue to grow until the change data is cleaned. This leads to a wholesale design consideration of what gets tracked and how long it is retained. Important things to consider here include:
  • The required column list for the capture instance. When more columns are captured, more change data is inserted into the change tables.
  • The amount of disk space used by the change tables.
  • The frequency at which the process runs that consumes the change data. Keep in mind that the data can't be deleted if it hasn't been used yet.
  • The frequency that the cleanup process runs—there may be so much change data generated that the cleanup process that deletes it can only run on the weekend, for example, because otherwise it generates too much transaction log.
Change data capture can be set up to simply track all changes to a table or to track a subset of the columns in a table. Using a subset can be useful if some of the unimportant columns are very wide varchar columns or Binary Large Object (BLOB) columns (such as text, image, or XML); otherwise, the space used by the change table could grow unwieldy very quickly.
Given the increased potential for disk space usage, the filegroup location of the change table can be set when change data capture is enabled. This allows easier management of the underlying disk space and means that all the change data can potentially be stored on a volume with a less expensive RAID level than the main database. Also, while the cleanup job settings apply to all capture instances, an individual capture instance can be separately cleaned at any time if disk space becomes an issue. You can easily monitor disk space usage using sp_spaceused on the capture tables.
The actual row written to the change table contains metadata about the transaction (the commit log sequence number, or LSN), as well as the order within the transaction that the change occurred, what the operation was, a bitmask of which columns were changed, and the actual column values.
DDL changes are unrestricted while change data capture is enabled. However, they may have some effect on the change data collected if columns are added or dropped. If a tracked column is dropped, all further entries in the capture instance will have NULL for that column. If a column is added, it will be ignored by the capture instance. In other words, the shape of the capture instance is set when it is created.
If column changes are required, it is possible to create another capture instance for a table (to a maximum of two capture instances per table) and allow consumers of the change data to migrate to the new table schema. But care should be taken when doing this because two capture instances for a tracked table mean twice the amount of disk space, I/Os, and logging.
Without going into too much depth, changes are retrieved from the change tables using the functions I've described. The functions take a start LSN and end LSN, and there are other functions supplied that allow you to convert a regular time into an LSN. When retrieving updates, you can even specify whether to see the before and after values, or just the before. There is a screencast of me using change data capture available at www.technetmagazine.com/video.

How Change Tracking Works
As I've mentioned, change tracking is a synchronous process and is a lot less complex than change data capture. It is part of the transaction that makes a change to a row in a table being tracked, and the fact that the row has changed is tracked in a separate table. The table is what's called an internal table, and there is no control over its name or where it is stored. I don't consider this an issue, as there should be a lot less data in this table than in a change table used for change data capture. But there can still be disk space problems, as I'll explain in a moment.
The fact that change tracking is done synchronously means that there is extra processing done inside each transaction that changes the table being tracked. The effect on performance is similar to that when a non-clustered index exists on the table and must be updated with each change to the table. Transactions themselves are also tracked when they commit by a row in the internal sys.syscommittab table.
Change tracking is enabled and disabled using regular ALTER DATABASE and ALTER TABLE syntax, and it follows the same model as change data capture, where it must be enabled at the database level before the table level. The sequence of operations would look something like this:
ALTER DATABASE AdventureWorks2000 SET CHANGE_TRACKING = ON
  (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
GO
USE AdventureWorks2000;
GO
ALTER TABLE Person.Person ENABLE CHANGE_TRACKING
  WITH (TRACK_COLUMNS_UPDATED = ON);
GO
The required permissions for enabling change tracking at the database and table levels are also different from those for enabling change data capture: db_owner and the table owner, respectively. When change tracking is enabled at the database level, the retention period can be set as well as whether change data is automatically cleaned up. The default retention period is 2 days, with a maximum of 90 days and a minimum of one minute.
Automatic cleanup is also turned on by default. When making changes to these settings, you need to evaluate the same trade-offs as I discussed with change data capture—basically disk space and performance against application needs.
By default, what is captured for each row is just that it changed. This is done by making a note of the primary key of the row that changed (which means that change tracking on a table requires that it has a primary key), along with a version number (once a database is enabled for change tracking, a version number is instituted, which allows ordering of operations), and the type of operation that made the change. You can also optionally track which columns changed; this requires 4 bytes per changed column.
Disk space monitoring is slightly different with change tracking, since the change data is stored in internal tables. To find the names of the internal tables being used, simply use the sys.internal_tables system catalog view:
SELECT [name] FROM sys.internal_tables
  WHERE [internal_type_desc] = 'CHANGE_TRACKING';
GO
Then pass the name into sp_spaceused to see how much disk space is being used.
Unlike with change data capture, when change tracking is enabled, there are restrictions on the DDL that can be performed on a table being tracked. The most notable restriction is that the primary key cannot be altered in any way. The other restriction worth calling out here is that an ALTER TABLE SWITCH will fail if either table involved has change tracking enabled. This is most likely due to the fact that it doesn't make sense to automatically start or remove change tracking for a partition being switched out of a change tracked, partitioned table, or a change tracked table being switched into a partitioned table, respectively.
Changes are retrieved from the internal change tables using a new CHANGE­TABLES (CHANGES …) function. This takes the name of the change tracked table plus the version number from the previous time it was used and returns information about all rows that have changed since that previous time. There are various functions to find the current and oldest valid version. The application can then use the information that is returned to query the table being change tracked to get the actual column values. This, of course, is a multi-step process—you get the current version, use that version to query change tracking, and then query the actual tables for the column data corresponding to that version.
On a constantly changing system, it is possible to get inconsistent or incorrect results unless some kind of unchanging view of the version, change data, and actual column data is maintained. To do this, you can use snapshot isolation and wrap the multi-step process in an explicit transaction. This works well but has potential drawbacks. Snapshot isolation can affect the workload performance, and it affects the performance and space usage of tempdb. You can find more details about this at technet.microsoft.com/library/cc280358.

Wrapping Up
Figure 4 provides a side-by-side comparison of change tracking and change data so you can get a better idea of the major differences that DBAs will care about. You can see from the table that change data capture is much heavier weight than change tracking. It requires more care when deciding what to track because of the potential for fast growth in the size of the tracking table if, say, the table that is tracked contains BLOB columns or very wide rows. There is also the potential for transaction log management issues, as the log will not be truncated until the log reader has harvested records from the log.
Feature Change Tracking Change Data Capture
Synchronous Yes No
Requires SQL Agent No Yes
Forces full logging of some bulk operations No Yes
Prevents log truncation No Yes, until log records harvested
Requires snapshot isolation Recommended No
Requires separate tables to store tracking data Yes Yes
Requires primary key Yes Not by default
Allows placement of tracking tables No Yes
Potential for space consumption issues Some Lots
Automatic cleanup process Yes Yes
Restrictions on DDL Yes No
Permission required to enable Sysadmin Database owner
Change tracking is not without its own requirements, however. For instance, a primary key is required, and it is strongly recommended that you use snapshot isolation when change tracking is enabled. Snapshot isolation itself can add significant workload overhead and requires much more careful management of tempdb.
There is one further issue that developers and DBAs need to cope with: disaster recovery. Although discussing this in depth is beyond the scope of this article, the topic of disaster recovery is too important to not at least mention here.
Both features play well with BACKUP and RESTORE. The problem comes when a database gets restored and is essentially taken back in time. How should the overall application/system behave? Custom solutions designed for tracking changes also face this problem, and it still needs to be considered when using SQL Server 2008.
As always, make sure you read through all the available documentation (technet.microsoft.com/library/bb418491) and any existing white papers before embarking on a design and deployment project that involves the new features to track changes. You need to first find out if any potential issues I haven't covered here may apply to you. You should also get details on the new monitoring SPs and Dynamic Management Views (DMVs).
Overall these new features are a huge advance over previous methods for tracking data changes. Now that they exist, you can be sure developers will want to use them in solutions that you manage.
There are important configuration and management issues to consider, and I hope this article has given you a solid overview of the technologies so you can anticipate and prepare for some of the issues I have discussed. If you have any feedback on this article or questions, go ahead and drop me a line at Paul@SQLskills.com.

Paul S. Randal is the Managing Director of SQLskills.com and a SQL Server MVP. He worked on the SQL Server Storage Engine team at Microsoft from 1999 to 2007. Paul wrote DBCC CHECKDB/repair for SQL Server 2005 and was responsible for the Core Storage Engine during SQL Server 2008 development. Paul is an expert on disaster recovery, high availability, and database maintenance and is a regular presenter at conferences around the world. He blogs at SQLskills.com/blogs/paul.

Page view tracker