Chapter 11 - Replication Architecture

Replication is an important technology for distributing data and stored procedures across an enterprise. The replication technology in Microsoft SQL Server allows you to make duplicate copies of your data, move those copies to different locations, and synchronize the data automatically so that all copies have the same data values. Replication can be implemented between databases on the same server or different servers connected by LANs, WANs, or the Internet.

The replication model used in SQL Server continues to build on the publish and subscribe metaphor introduced by earlier versions of SQL Server. SQL Server also supports:

  • Replication to heterogeneous data sources.

  • Programmatic interfaces for replicating data from heterogeneous data sources. 

  • Programmatic interfaces for invoking replication agents. 

  • SQL-DMO interfaces for installing and monitoring replication. 

See Also 

In Other Volumes 

"Replicating to Heterogeneous Data Sources" in Microsoft SQL Server Distributed Data Operations and Replication 

"Implementing Transactional and Snapshot Replication from Heterogeneous Data Sources" in Microsoft SQL Server Distributed Data Operations and Replication 

"Monitoring Replication Agents" in Microsoft SQL Server Distributed Data Operations and Replication 

Replication Dimensions

Data replication is a complex technology, and Microsoft SQL Server replication recognizes that there cannot be a single solution that works for all applications. SQL Server provides a variety of replication technologies that can be tailored to your application's specific requirements. Each technology produces different benefits and restrictions, depending on your requirements, across three important dimensions:

  • Transactional consistency 

  • Site autonomy 

  • Partitioning data to avoid conflicts 

Transactional consistency in regard to distributed operations like replication adds additional wrinkles to the typical database issues of ACID properties (atomicity, consistency, isolation, and durability) and transaction isolation levels. In the context of replication, transactional consistency means that, after replication, data at any participating site will be the same data that would have resulted had all transactions been performed at only a single site.

Site autonomy refers to the effect of one site's operations on another. There is complete site autonomy if one site's ability to do its usual work is independent of its connectivity to another site, and independent of the state of operations at that site. For example, the two-phase commit protocol (2PC) makes every data change dependent on every other participating site being able to accept the transaction successfully and immediately. If one site is unavailable, no work proceeds. At the other end of the spectrum, merge replication sites work independently and can be completely disconnected from all other sites.

You can segregate data at multiple sites to provide your own guarantee of transactional consistency. The absence of guaranteed transactional consistency does not necessarily imply transactional inconsistency. If you can design your application so each participating site works with data that is strictly segregated, or partitioned from other sites, you can maintain transactional consistency. For example, design your order entry system so a given sales representative has a unique territory, so that orders will never conflict.

See Also 

In Other Volumes 

"Transactional Consistency" in Microsoft SQL Server Distributed Data Operations and Replication 

"Site Autonomy" in Microsoft SQL Server Distributed Data Operations and Replication 

"Partitioning Data to Avoid Conflicts" in Microsoft SQL Server Distributed Data Operations and Replication 

Replication Types

Microsoft SQL Server provides three major types of replication, and two options, to use as you design your distributed applications:

  • Snapshot 

    Transactional

    • Immediate-updating Subscribers option 

    • Replication of stored procedure execution option 

  • Merge 

Each of these types provides different capabilities and different attributes for transactional consistency and can be arrayed along a continuum from eventual data convergence to immediate guaranteed consistency.

Snapshot replication takes a snapshot of current data in a publication at a Publisher and replaces the entire replica at a Subscriber on a periodic basis, in contrast to publishing changes when they occur.

Transactional replication marks selected transactions in the Publisher's database transaction log for replication and then distributes them asynchronously to Subscribers as incremental changes, while maintaining transactional consistency.

Merge replication allows sites to make autonomous changes to replicated data, and at a later time, merge changes made at all sites. Merge replication does not guarantee transactional consistency.

See Also 

In Other Volumes 

"Snapshot Replication" in Microsoft SQL Server Distributed Data Operations and Replication 

"Transactional Replication" in Microsoft SQL Server Distributed Data Operations and Replication 

"Merge Replication" in Microsoft SQL Server Distributed Data Operations and Replication 

"Option: Immediate-updating Subscribers" in Microsoft SQL Server Distributed Data Operations and Replication 

"Option: Replicating the Execution of Stored Procedures" in Microsoft SQL Server Distributed Data Operations and Replication 

Replication Components

The Microsoft SQL Server version 7.0 replication model is based on the publish and subscribe metaphor popularized by SQL Server 6.0. The replication model consists of:

  • Publishers 

  • Distributors 

  • Subscribers 

  • Publications 

  • Articles 

  • Push subscriptions 

  • Pull subscriptions 

The Publisher is a server that makes data available for replication to other servers. In addition to identifying which data is to be replicated, the Publisher detects which data has changed and maintains information about all publications at that site. Any given data element that is replicated has a single Publisher, even if it may be updated by any number of Subscribers or published again by a Subscriber.

The Distributor is the server that contains the distribution database. The exact role of the Distributor is different in each type of SQL Server 7.0 replication.

Subscribers are servers that store replicas and receive updates. In earlier versions of SQL Server, updates could typically be performed only at the Publisher. However, SQL Server 7.0 allows Subscribers to make updates to data (but a Subscriber making updates is not the same as a Publisher). A Subscriber can, in turn, become a Publisher to other Subscribers.

A publication is a collection of articles, and an article is a grouping of data to be replicated. An article can be an entire table, only certain columns (using a vertical filter), only certain rows (using a horizontal filter), or even a stored procedure (in some types of replication). A publication often has multiple articles. This grouping of multiple articles makes it simpler to subscribe to a unit (the publication), which has all the relevant and required data. Subscribers subscribe only to a publication, not to individual articles within a publication.

With a push subscription, the Publisher propagates the changes to a Subscriber without a request from the Subscriber to do so. Typically, push subscriptions are used in applications that are required to send changes to Subscribers whenever and as soon as they occur. Push subscriptions are best for publications that require near-real-time movement of data without polling and where the higher processor overhead at the Publisher does not affect performance. Changes can also be pushed to Subscribers on a scheduled basis.

With a pull subscription,** **the Subscriber asks for periodic updates of all changes at the Publisher. Pull subscriptions are best for publications having a large number of Subscribers (for example, Subscribers using the Internet). Pull subscriptions are also best for autonomous mobile users because they allow the user to determine when the data changes are synchronized. A single publication can support a mixture of push and pull subscriptions.

The above replication components are implemented using a modular design. You can install these components on separate computers to balance workloads and minimize SQL Server replication's effect on server performance.

See Also 

In Other Volumes 

"Configuring Replication" in Microsoft SQL Server Distributed Data Operations and Replication 

"Publishing Data and Stored Procedures" in Microsoft SQL Server Distributed Data Operations and Replication 

"Subscribing to Publications" in Microsoft SQL Server Distributed Data Operations and Replication 

Replication Agents

In addition to the basic components, your replication design may have two or more replication agents:

  • Snapshot Agent

  • Log Reader Agent

  • Distribution Agent

  • Merge Agent 

The Snapshot Agent prepares schema and initial data files of published tables and stored procedures, stores the snapshot on the Distributor, and records information about the synchronization status in the distribution database. Each publication has its own Snapshot Agent that runs on the Distributor and connects to the Publisher. The Snapshot Agent is typically run under SQL Server Agent and can be administered directly using SQL Server Enterprise Manager.

The Log Reader Agent moves transactions marked for replication from the transaction log on the Publisher to the distribution database. Each database published using transactional replication has its own Log Reader Agent that runs on the Distributor and connects to the Publisher.

The Distribution Agent moves the transactions and snapshot jobs held in distribution database tables to Subscribers. Transactional and snapshot publications that are set up for immediate synchronization when a new push subscription is created each have their own Distribution Agent that runs on the Distributor and connects to the Subscriber. Transactional and snapshot publications not set up for immediate synchronization share a Distribution Agent across the Publisher/Subscriber pair that runs on the Distributor and connects to the Subscriber. Pull subscriptions to either snapshot or transactional publications have Distribution Agents that run on the Subscriber instead of the Distributor. Merge publications do not have a Distribution Agent. The Distribution Agent typically runs under SQL Server Agent and can be administered directly using SQL Server Enterprise Manager.

The Merge Agent moves and reconciles incremental data changes that occurred after the initial snapshot was created. Each merge publication has its own Merge Agent that connects to both the Publisher and the Subscriber and updates both. In a full merge, the agent first uploads all changes from the Subscriber where the generation is 0 or is greater than the last generation sent to the Publisher. The agent gathers the rows, and those rows without conflicts are applied to the publishing database. Those rows with conflicts are handled by the conflict resolver associated with the article in the publication definition. All changes are applied using stored procedures derived from the Publisher tables at the time snapshot is generated or first applied. Finally, the agent reverses the process by downloading any changes from the Publisher to the Subscriber and applying the changes to the subscribing database. Push subscriptions to merge publications have Merge Agents that run on the Publisher, while pull subscriptions to merge publications have Merge Agents that run on the Subscriber. Snapshot and transactional publications do not have Merge Agents.

See Also 

In Other Volumes 

"Snapshot Replication" in Microsoft SQL Server Distributed Data Operations and Replication 

"Merge Replication" in Microsoft SQL Server Distributed Data Operations and Replication 

"Transactional Replication" in Microsoft SQL Server Distributed Data Operations and Replication 

"Monitoring Replication Agents" in Microsoft SQL Server Distributed Data Operations and Replication 

Replication Tools

SQL Server Enterprise Manager includes several wizards and dialog boxes you can use to simplify installation and maintenance of replication. SQL Server Enterprise Manager enables you to view and modify the properties of replication, and provides a graphical navigational tool, Replication Monitor, to monitor and troubleshoot replication activity.

Microsoft SQL Server also allows you to install and maintain replication using Transact-SQL stored procedures and programming components such as SQL-DMO.

See Also 

In Other Volumes 

"Replication Wizards" in Microsoft SQL Server Distributed Data Operations and Replication 

"Replication and SQL Server Enterprise Manager" in Microsoft SQL Server Distributed Data Operations and Replication 

"Replication and System Stored Procedures" in Microsoft SQL Server Distributed Data Operations and Replication 

"Replication Programming Interfaces" in Microsoft SQL Server Distributed Data Operations and Replication