Replication (OLTP)---a Technical Reference Guide for Designing Mission-Critical OLTP Solutions

Article
03/05/2012

Want more guides like this one? Go to Technical Reference Guides for Designing Mission-Critical Solutions.

We have observed that customers often use transactional replication (or peer-to-peer replication) when the need for scaling out is driven primarily by the desire for additional server capacity for reporting purposes. Typical examples of transactional replication include the creation of a replica for reporting, the population of an operational data store, and when using a data warehouse. The Microsoft SQL Server replication technology is mature and widely understood by the user community; see SQL Server Replication1 for an overview.

Peer-to-peer replication offers the additional advantage that both copies of data can be updated, if you adopt a scheme to avoid or minimize the conflicts that can potentially arise when the same record is updated by both (or multiple) servers. In some scenarios, peer-to-peer replication can also provide a high availability and disaster recovery (HA/DR) solution. Generally, these scenarios address cases where the user can tolerate some data loss in case of a localized failure (high availability) or a major failure (disaster recovery). Merge replication is often used in scale-out deployments as well, but primarily not for additional processing capacity. (This topic is also discussed in the "Scale-Out" Technical Reference Guide.)

When compared with data-dependent routing (DDR) and Service Broker scale-out technologies, the design and deployment flexibility of replication is relatively limited. User-written application code can be used with DDR and Service Broker to provide more flexibility than is available with replication. However, replication technology is widely understood by the user community, requiring less time and education to gain the appropriate skill level.

Best Practices

The following articles provide further information, as well as best practices and pitfalls to avoid.

The article Selecting the Appropriate Type of Replication2 provides guidance for selecting the appropriate type of replication.
The article Using Replication for High Availability and Disaster Recovery3 describes a scenario in which the goal is to provide an architecture that will enable all servers to be utilized in a multi-master fashion while providing HA/DR abilities. Note that HA/DR scenarios for replication are typically designed around Peer to Peer Transactional replication.
How Oracle Publishing Works 4 shows that scale-out with replication can be done even when the data source is not a SQL Server. With appropriate drivers, this scenario is also possible with other database management systems (DBMSs).
The article Best Practices for Replication Administration5 discusses best practices for replication administration. To simplify administration, it is advisable to script out the replication topology so that it can be recreated quickly and accurately.
The article Performance (Replication)6 discusses enhancing replication performance. An often missed tuning tip is the "NOT FOR REPLICATION" option for constraints, identities, and triggers. For example, if a row is inserted by a user at the Publisher, and that insert satisfies a check constraint on the table, it might not be required to enforce the same constraint when the row is inserted by a replication agent at the Subscriber. In the case of certain constraints such as identity constraints, the need to include the "NOT FOR REPLICATION" option is important for logical consistency as well.
The white paper SQL Server Replication: Providing High Availability using Database Mirroring7 discusses how to use database mirroring for high availability of a replication solution, in addition to how one can use the two technologies together.
The article Initializing a Transactional Replication Subscriber from an Array-Based Snapshot8 explains how to initialize the subscriber using a storage area network (SAN)-based restore solution. Initializing the Subscriber using a SAN-based restore solution is particularly beneficial for very large databases (VLDBs) because the standard transactional replication initialization process, which is typically restricted by either the network or storage input/output (I/O) bandwidth, could take longer than the business service-level objectives (SLO) permit because of the time needed to initialize or recover the Subscriber. (In this context, we use the term VLDB to mean a database that is typically multi-terabyte, and requires specialized administration and management.)
For transactional replication, the transaction log of a published database might require more space than the log of an identical unpublished database because the log records are not truncated until they have been moved to the distribution database, as discussed in Considerations for Transactional Replication.9 Ensure that you have enough disk space to store replicated transactions in the distribution database. The same consideration needs to be made for the size of the distribution database when using named subscribers. The article also explains that all published tables in transactional replication must contain a declared primary key.
The article Troubleshooting (Replication)10 discusses concepts and tools for troubleshooting replication.
The blog post Troubleshooting SQL Server Transactional Replication11 is based on practical experiences gained at a multi-terabyte online transaction processing (OLTP) installation where replication is used to create a copy for reporting.

Case Studies and References

This section describes some customers' uses for replication.

The article United States Government: Federal Institution Supports 60 Million Daily Database Hits with Reliability, Added Security12 discusses how a federal institution of the U.S. government deployed load balancing using peer-to-peer replication for scalability.

Questions and Considerations

This section provides questions and issues to consider when working with your customers.

Evaluate the scale-out replication use scenario, including the acceptable latency for Subscriber access. For example, distributing a few tables that are updated either nightly or infrequently across a number of servers might integrate appropriately with replication. On the other hand, if you need to query the data in real time, remember that replication is an asynchronous data-movement process that is likely to have some latency. Whether or not replication is an acceptable option depends on the real-time needs of the scenario.
It is important to consider the scope of the objects that need to be replicated across servers. Replication lets you choose particular tables, or even subsets of tables, to be replicated to other databases. However, if you want to replicate entire databases or objects outside of the databases, it is useful to evaluate other technologies, including SAN-based replication. SAN-based replication often works well for scenarios in which an application requires consistency across several databases or DBMS for the integrity of reports from an application perspective. SQL Server databases, along with Oracle or IBM DB2, might need to be replicated in sync for the application to be able to use them.
The replication agent jobs and the creation of a distribution database add overhead to the system in terms of the number of databases to manage, along with I/O and CPU resources. All of these should be considered when you consider adding replication (or scale-out in general) to the environment.
For republishing scenarios, it can be difficult to move components in and out of the topology without downtime or without quiescing part or all of the system. This is somewhat easier with peer-to-peer deployment.
For topologies that allow updates at the Subscribers, all controls used to validate data changes at the Publisher (such as constraints) should be present at the Subscriber, or should be handled in the application. See the article Controlling Constraints, Identities, and Triggers with NOT FOR REPLICATION13 for implementation considerations.
By default, replication agents run as jobs scheduled under SQL Server Agent. SQL Server Agent must be running for these jobs to execute.
The replication agent security model, discussed in the article Replication Agent Security Model14 allows for fine-grained control over the accounts under which replication agents run and make connections―a different account can be specified for each agent. For more information about how to specify accounts, see Managing Logins and Passwords in Replication.15
Understand the volume of data that is transferred during the initial snapshot and the subsequent replication. Network bandwidth is especially critical when applying the initial snapshot. The volume of incremental data changes might be low, but the volume of data initially distributed might be high. Possible solutions include the following:
- Transfer the snapshot using a CD-ROM or tape device, as discussed in Managing Logins and Passwords in Replication.15
- Initialize transactional subscriptions without a snapshot, as discussed in Alternate Snapshot Folder Locations.16
The transactional replication log reader agent reads the transaction log and applies the logical changes to each record into the distribution database, and ultimately into the Subscriber. Consider the following:
- The type of the statement (for example, an update statement, discussed in UPDATE Statements May be Replicated as DELETE/INSERT Pairs17).
- An update that modifies 100,000 rows will become one row in the msrepl_transactions table, but will become 100,000 distinct rows in the msrepl_commands table. This can have a significant impact on performance and the size of the distribution database. Considerations are discussed in the article Enhancing Transactional Replication Performance.18
See the article Considerations for All Types of Replication19 for considerations for SQL Server Agent, security, network, performance, and administration.
It is important to consider replication when designing backup and restore strategies. The article Backing up and Restoring Replicated Databases20 describes the considerations.

Appendix

Following are the full URLs for the hyperlinked text.

1 SQL Server Replicationhttps://msdn.microsoft.com/en-us/library/ms151198.aspx

2 Selecting the Appropriate Type of Replicationhttps://msdn.microsoft.com/en-us/library/ms152565.aspx

3 Using Replication for High Availability and Disaster Recoveryhttp://sqlcat.com/whitepapers/archive/2009/09/23/using-replication-for-high-availability-and-disaster-recovery.aspx

4 How Oracle Publishing Workshttps://msdn.microsoft.com/en-us/library/ms151845.aspx

5 Best Practices for Replication Administrationhttps://msdn.microsoft.com/en-us/library/ms151818.aspx

6 Performance (Replication)https://msdn.microsoft.com/en-us/library/ms151786.aspx

7 SQL Server Replication: Providing High Availability using Database Mirroringhttps://download.microsoft.com/download/d/9/4/d948f981-926e-40fa-a026-5bfcf076d9b9/ReplicationAndDBM.docx

8 Initializing a Transactional Replication Subscriber from an Array-Based Snapshothttp://sqlcat.com/technicalnotes/archive/2009/05/04/initializing-a-transactional-replication-subscriber-from-an-array-based-snapshot.aspx

9 Considerations for Transactional Replicationhttps://msdn.microsoft.com/en-us/library/ms151254.aspx

10 Troubleshooting (Replication)https://msdn.microsoft.com/en-us/library/ms151756.aspx

11 Troubleshooting SQL Server Transactional Replicationhttps://blogs.msdn.com/b/benjones/archive/2009/09/10/troubleshooting-sql-server-transactional-replication.aspx

12 United States Government: Federal Institution Supports 60 Million Daily Database Hits with Reliability, Added Securityhttps://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000004415

13 Controlling Constraints, Identities, and Triggers with NOT FOR REPLICATIONhttps://msdn.microsoft.com/en-us/library/ms152529.aspx

14 Replication Agent Security Modelhttps://msdn.microsoft.com/en-us/library/ms151868.aspx

15 Managing Logins and Passwords in Replicationhttps://msdn.microsoft.com/en-us/library/ms151219.aspx

16 Alternate Snapshot Folder Locationshttps://msdn.microsoft.com/en-us/library/ms151327.aspx

17 UPDATE Statements May be Replicated as DELETE/INSERT Pairshttps://support.microsoft.com/kb/238254/EN-US

18 Enhancing Transactional Replication Performancehttps://msdn.microsoft.com/en-us/library/ms151762.aspx

19 Considerations for All Types of Replicationhttps://msdn.microsoft.com/en-us/library/ms152479.aspx

20 Backing up and Restoring Replicated Databaseshttps://msdn.microsoft.com/en-us/library/ms151152.aspx

Replication (OLTP)---a Technical Reference Guide for Designing Mission-Critical OLTP Solutions

Best Practices

Case Studies and References

Questions and Considerations

Appendix

Additional resources