Validating Replicated Data

Transactional and merge replication allow you to validate that data at the Subscriber matches data at the Publisher. Validation can be performed for specific subscriptions or for all subscriptions to a publication. Specify one of the following validation types and the Distribution Agent or Merge Agent will validate data the next time it runs:

  • Row count only. This validates whether the table at the Subscriber has the same number of rows as the table at the Publisher, but does not validate that the content of the rows matches. Row count validation provides a lightweight approach to validation that can make you aware of issues with your data.
  • Row count and binary checksum. In addition to taking a count of rows at the Publisher and Subscriber, a checksum of all the data is calculated using the checksum algorithm introduced in Microsoft SQL Server 2000. If the row count fails, the checksum is not performed.
  • Row count and checksum. This option is deprecated and should be used only for Subscribers running Microsoft SQL Server version 7.0. It uses the same approach as row count and binary checksum, but uses the checksum algorithm available in SQL Server version 7.0. For more information, see the section "How Data Validation Works" in this topic.

In addition to validating that data at the Subscriber and Publisher match, merge replication provides the ability to validate that data is partitioned correctly for each Subscriber. For more information, see Validating Partition Information for a Merge Subscriber.

To validate data

To validate all articles in a subscription, use SQL Server Management Studio, stored procedures or Replication Management Objects (RMO). To validate individual articles in snapshot and transactional publications, you must use stored procedures.

Data Validation Results

When validation is complete, the Distribution Agent or Merge Agent logs messages regarding success or failure (replication does not report which rows failed). These messages can be viewed in SQL Server Management Studio, Replication Monitor, and replication system tables. The how to topics listed above demonstrate how to run validation and view the results.

To handle validation failures, consider the following:

Considerations for Data Validation

Take the following issues into consideration when validating data:

  • You must stop all update activity at Subscribers before validating data (it is not necessary to stop activity at the Publisher when validation is occurring).
  • Because checksums and binary checksums can require large amounts of processor resources when validating a large data set, you should schedule validation to occur when there is the least activity on the servers used in replication.
  • Replication validates tables only; it does not validate whether schema only articles (such as stored procedures) are the same at the Publisher and Subscriber.
  • Binary checksum can be used with any published table. Checksum cannot validate tables with column filters, or logical table structures where column offsets differ (due to ALTER TABLE statements that drop or add columns).
  • Data in text, ntext or image columns is not included in checksum calculations.
  • Validation using binary checksum or checksum can erroneously report a failure if data types are different at the Subscriber than they are at the Publisher. This can occur if you initialize a subscription manually and are using different data types at the Subscriber, or if you use the following schema options for an article:
    • Mapping user defined types to base data types (schema option 0x20)
    • Mapping xml to ntext (schema option 0x10000000)
    • Mapping nvarchar(max) and varbinary(max) to data types supported on earlier versions of SQL Server (schema option 0x20000000).
      For more information about schema options, see Publishing Data and Database Objects.
  • Binary checksum and checksum validations are not supported for transformable subscriptions for transactional replication.
  • Validation is not supported for data replicated to non-SQL Server Subscribers.

How Data Validation Works

SQL Server validates data by calculating a row count or a checksum at the Publisher and then comparing those values to the row count or checksum calculated at the Subscriber. One value is calculated for the entire publication table and one value is calculated for the entire subscription table, but data in text, ntext or image columns is not included in the calculations.

While the calculations are performed, shared locks are placed temporarily on tables for which row counts or checksums are being run, but the calculations are completed quickly and the shared locks removed, usually in a matter of seconds.

When binary checksums are used, 32-bit redundancy check (CRC) occurs on a column-by-column basis rather than a CRC on the physical row on the data page. This allows the columns with the table to be in any order physically on the data page, but still compute to the same CRC for the row. Binary checksum validation can be used when there are row or column filters on the publication. Subscribers running SQL Server version 7.0 must use the checksum algorithms released in SQL Server version 7.0, which generate CRC values that are different from those generated with SQL Server 2000 and later. The checksum routines released in SQL Server version 7.0 cannot validate tables with column filters, or logical table structures where column offsets differ (due to ALTER TABLE statements that drop or add columns).

See Also

Concepts

Best Practices for Replication Administration

Other Resources

Administering Replication

Help and Information

Getting SQL Server 2005 Assistance