Source tables change over time. A data mart or data warehouse that is based on those tables needs to reflect these changes. However, a process that periodically copies a snapshot of the entire source consumes too much time and resources. Alternate approaches that include timestamp columns, triggers, or complex queries often hurt performance and increase complexity. What is needed is a reliable stream of change data that is structured so that it can easily be applied by consumers to target representations of the data. Change data capture in SQL Server provides this solution.
The change data capture feature of the Database Engine captures insert, update, and delete activity applied to SQL Server tables, and makes the details of the changes available in an easily-consumed, relational format. The change tables used by change data capture contain columns that mirror the column structure of the tracked source tables, along with the metadata needed to understand the changes that have occurred on a row by row basis.
Change data capture is available only in SQL Server 2008 Enterprise, Developer, and Evaluation editions.
An Integration Services package can easily harvest the change data in the SQL Server databases to perform efficient incremental loads to a data warehouse. However, before you can use Integration Services to load change data, an administrator must enable change data capture on the database and the tables from which you want to capture changes. For more information on how to configure change data capture on a database, see Configuring Change Data Capture.
Once an administrator has enabled change data capture on the database, you can create a package that performs an incremental load of the change data. The following diagram shows the steps for creating such a package that performs an incremental load from a single table:
As shown in the previous diagram, creating a package that performs an incremental load of changed data involves the following steps:
Step 1: Designing the Control Flow
In the control flow in the package, the following tasks need to be defined:
Calculate the starting and ending datetime values for the interval of changes to the source data that you want to retrieve.
To calculate these values, use an Execute SQL task or Integration Services expressions with datetime functions. You then store these endpoints in package variables for use later in the package.
Determine whether the change data for the selected interval is ready. This step is necessary because the asynchronous capture process might not yet have reached the selected endpoint.
To determine whether the data is ready, start with a For Loop container to delay execution, if necessary, until the change data for the selected interval is ready. Inside the loop container, use an Execute SQL task to query the time mapping tables maintained by change data capture. Then, use a Script task that calls the Thread.Sleep method, or another Execute SQL task with a WAITFOR statement, to delay the execution of the package temporarily, if necessary. Optionally, use another Script task to log an error condition or a timeout.
In the data flow of the package, the following tasks need to be defined:
Retrieve the change data from the change tables.
To retrieve the data, use a source component to query the change tables for the changes that fall within the selected interval. The source calls a Transact-SQL table-valued function that you must have previously created.
Apply the inserts, deletes, and updates to the destination.
To apply the changes to the destination, use a destination component to apply the inserts to the destination. Also, use OLE DB Command transformations with parameterized UPDATE and DELETE statements to apply updates and deletes to the destination. You can also apply updates and deletes by using destination components to save the rows to temporary tables. Then, use Execute SQL tasks to perform bulk update and bulk delete operations against the destination from the temporary tables.
The process outlined in the previous diagram and steps involves an incremental load from a single table. When having to perform an incremental load from multiple tables, the overall process is the same. However, the design of the package needs to be changed to accommodate the processing of multiple tables. For more information on how to create a package that performs an incremental load from multiples tables, see Performing an Incremental Load of Multiple Tables.