Change Data Capture - SQL Server 2008

Data within our systems changes over time, and it has been notoriously difficult to capture change information for further processing. Certainly, we are able to create backups and snapshots of data, but these techniques don’t really allow us to easily determine changes to data values within tables.

We’ve tried adding metadata columns such as UpdateDate and CreateDate to all of our tables or, in the worst-case scenario, we’ve just reprocessed all data every time we ran our ETL process. The insert and update dates allowed us to make comparisons against a processing date in order to determine data that needed to be processed. However, this technique did not give us the before and after picture of data that so many applications require. Therefore, triggers were typically used to track more detailed change information. These triggers provided a way to capture the before and after data values, and write them to another table or to create versioned records within the same table. For the vast majority of applications, this functionality worked very well. But wouldn’t it be nice to have this type of functionality natively provided by SQL Server?

With SQL Server 2008, the ability to determine changes to data values within tables has been provided, and it is called Change Data Capture (CDC). CDC provides the necessary foundation for easily tracking changes to data within specific tables, and removes the requirement for application developers to create triggers and tables to capture changes.

CDC works by capturing insert, update, and delete information and automatically storing it in change data tables for future retrieval. Metadata regarding the type of operation performed (insert, delete, and update) and log sequence (which can be translated to time) is stored along with the column data from the table. Once enabled, CDC captures changes to the source tables by monitoring the database transaction log. The changed data is then written to corresponding change tables. External applications can use the change tables to perform various types of processing.

In this section, we will work through an example of implementing CDC. This example will demonstrate
one of the primary use cases for CDC, which is within ETL processes to extract changed data from a line-of-business system. This changed data will then typically be inserted into a data warehouse.

Enabling Change Data Capture

Before individual tables can be configured for CDC, an administrator must enable the database for CDC. Then the database owners can enable individual tables for CDC. Here are the steps for enabling CDC for this example:

  1. Open SQL Server Management Studio and run the following query in the context of the database for which you want to enable CDC (sysadmin permissions are required to make this change):
    USE AdventureWorks
    EXECUTE sys.sp_cdc_enable_db_change_data_capture
  2. Enable CDC on the Production.Product table by running the following DDL:
    EXECUTE sys.sp_cdc_enable_table_change_data_capture
    @source_schema = N'Production'
    ,@source_name = N'Product'
    ,@role_name = N'cdc_Admin'
    ,@supports_net_changes = '1'

    This will create the cdc.Production_Product_CT change table in the AdventureWorks database. Along with the table, two table-valued functions are created: cdc.fn_cdc_get_all_ changes_Production_Product and cdc.fn_cdc_get_net_changes_Production_Product.

  3. The sys.databases and sys.tables catalog views can be used to ensure that CDC is properly configured for the database and table. Issue the following query:
    SELECT name, is_cdc_enabled FROM sys.databases

    You should see that the is_cdc_enabled column for the AdventureWorks database is set to 1.

  4. Issue the following query:
    SELECT name, is_tracked_by_cdc FROM sys.tables WHERE name = 'Product'

    You should see that the is_tracked_by_cdc column is set to 1.

Extracting Change Data with SSIS

Once CDC is enabled for the AdventureWorks database and for the Production.Product table, SQL Server will automatically start to populate the cdc.Production_Product_CT with all data changes. We could now start building an SSIS package that would use the fn_cdc_get_net_changes_Production_Product or fn_cdc_get_all_changes_Production_Product to extract data from the change table and perform the appropriate action within the data warehouse table.

SQL Server Books Online contains a very detailed example of configuring CDC, under the title “Improving Incremental Loads with Change Data Capture.

All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd Protection Status

SQL Server 2008 Topics