Replication (OLTP)---a Technical Reference Guide for Designing Mission-Critical OLTP Solutions
Want more guides like this one? Go to Technical Reference Guides for Designing Mission-Critical Solutions.
We have observed that customers often use transactional replication (or peer-to-peer replication) when the need for scaling out is driven primarily by the desire for additional server capacity for reporting purposes. Typical examples of transactional replication include the creation of a replica for reporting, the population of an operational data store, and when using a data warehouse. The Microsoft SQL Server replication technology is mature and widely understood by the user community; see SQL Server Replication1 for an overview.
Peer-to-peer replication offers the additional advantage that both copies of data can be updated, if you adopt a scheme to avoid or minimize the conflicts that can potentially arise when the same record is updated by both (or multiple) servers. In some scenarios, peer-to-peer replication can also provide a high availability and disaster recovery (HA/DR) solution. Generally, these scenarios address cases where the user can tolerate some data loss in case of a localized failure (high availability) or a major failure (disaster recovery). Merge replication is often used in scale-out deployments as well, but primarily not for additional processing capacity. (This topic is also discussed in the "Scale-Out" Technical Reference Guide.)
When compared with data-dependent routing (DDR) and Service Broker scale-out technologies, the design and deployment flexibility of replication is relatively limited. User-written application code can be used with DDR and Service Broker to provide more flexibility than is available with replication. However, replication technology is widely understood by the user community, requiring less time and education to gain the appropriate skill level.
The following articles provide further information, as well as best practices and pitfalls to avoid.
The article Selecting the Appropriate Type of Replication2 provides guidance for selecting the appropriate type of replication.
The article Using Replication for High Availability and Disaster Recovery3 describes a scenario in which the goal is to provide an architecture that will enable all servers to be utilized in a multi-master fashion while providing HA/DR abilities. Note that HA/DR scenarios for replication are typically designed around Peer to Peer Transactional replication.
How Oracle Publishing Works
4 shows that scale-out with replication can be done even when the data source is not a SQL Server. With appropriate drivers, this scenario is also possible with other database management systems (DBMSs).
The article Best Practices for Replication Administration5 discusses best practices for replication administration. To simplify administration, it is advisable to script out the replication topology so that it can be recreated quickly and accurately.
The article Performance (Replication)6 discusses enhancing replication performance. An often missed tuning tip is the "NOT FOR REPLICATION" option for constraints, identities, and triggers. For example, if a row is inserted by a user at the Publisher, and that insert satisfies a check constraint on the table, it might not be required to enforce the same constraint when the row is inserted by a replication agent at the Subscriber. In the case of certain constraints such as identity constraints, the need to include the "NOT FOR REPLICATION" option is important for logical consistency as well.
The white paper SQL Server Replication: Providing High Availability using Database Mirroring7 discusses how to use database mirroring for high availability of a replication solution, in addition to how one can use the two technologies together.
The article Initializing a Transactional Replication Subscriber from an Array-Based Snapshot8 explains how to initialize the subscriber using a storage area network (SAN)-based restore solution. Initializing the Subscriber using a SAN-based restore solution is particularly beneficial for very large databases (VLDBs) because the standard transactional replication initialization process, which is typically restricted by either the network or storage input/output (I/O) bandwidth, could take longer than the business service-level objectives (SLO) permit because of the time needed to initialize or recover the Subscriber. (In this context, we use the term VLDB to mean a database that is typically multi-terabyte, and requires specialized administration and management.)
For transactional replication, the transaction log of a published database might require more space than the log of an identical unpublished database because the log records are not truncated until they have been moved to the distribution database, as discussed in Considerations for Transactional Replication.9 Ensure that you have enough disk space to store replicated transactions in the distribution database. The same consideration needs to be made for the size of the distribution database when using named subscribers. The article also explains that all published tables in transactional replication must contain a declared primary key.
The article Troubleshooting (Replication)10 discusses concepts and tools for troubleshooting replication.
The blog post Troubleshooting SQL Server Transactional Replication11 is based on practical experiences gained at a multi-terabyte online transaction processing (OLTP) installation where replication is used to create a copy for reporting.
Case Studies and References
This section describes some customers' uses for replication.
The article United States Government: Federal Institution Supports 60 Million Daily Database Hits with Reliability, Added Security12 discusses how a federal institution of the U.S. government deployed load balancing using peer-to-peer replication for scalability.
Questions and Considerations
This section provides questions and issues to consider when working with your customers.
Evaluate the scale-out replication use scenario, including the acceptable latency for Subscriber access. For example, distributing a few tables that are updated either nightly or infrequently across a number of servers might integrate appropriately with replication. On the other hand, if you need to query the data in real time, remember that replication is an asynchronous data-movement process that is likely to have some latency. Whether or not replication is an acceptable option depends on the real-time needs of the scenario.
It is important to consider the scope of the objects that need to be replicated across servers. Replication lets you choose particular tables, or even subsets of tables, to be replicated to other databases. However, if you want to replicate entire databases or objects outside of the databases, it is useful to evaluate other technologies, including SAN-based replication. SAN-based replication often works well for scenarios in which an application requires consistency across several databases or DBMS for the integrity of reports from an application perspective. SQL Server databases, along with Oracle or IBM DB2, might need to be replicated in sync for the application to be able to use them.
The replication agent jobs and the creation of a distribution database add overhead to the system in terms of the number of databases to manage, along with I/O and CPU resources. All of these should be considered when you consider adding replication (or scale-out in general) to the environment.
For republishing scenarios, it can be difficult to move components in and out of the topology without downtime or without quiescing part or all of the system. This is somewhat easier with peer-to-peer deployment.
For topologies that allow updates at the Subscribers, all controls used to validate data changes at the Publisher (such as constraints) should be present at the Subscriber, or should be handled in the application. See the article Controlling Constraints, Identities, and Triggers with NOT FOR REPLICATION13 for implementation considerations.
By default, replication agents run as jobs scheduled under SQL Server Agent. SQL Server Agent must be running for these jobs to execute.
The replication agent security model, discussed in the article Replication Agent Security Model14 allows for fine-grained control over the accounts under which replication agents run and make connections―a different account can be specified for each agent. For more information about how to specify accounts, see Managing Logins and Passwords in Replication.15
Understand the volume of data that is transferred during the initial snapshot and the subsequent replication. Network bandwidth is especially critical when applying the initial snapshot. The volume of incremental data changes might be low, but the volume of data initially distributed might be high. Possible solutions include the following:
The transactional replication log reader agent reads the transaction log and applies the logical changes to each record into the distribution database, and ultimately into the Subscriber. Consider the following:
The type of the statement (for example, an update statement, discussed in UPDATE Statements May be Replicated as DELETE/INSERT Pairs17).
An update that modifies 100,000 rows will become one row in the msrepl_transactions table, but will become 100,000 distinct rows in the msrepl_commands table. This can have a significant impact on performance and the size of the distribution database. Considerations are discussed in the article Enhancing Transactional Replication Performance.18
- The type of the statement (for example, an update statement, discussed in UPDATE Statements May be Replicated as DELETE/INSERT Pairs17).
See the article Considerations for All Types of Replication19 for considerations for SQL Server Agent, security, network, performance, and administration.
It is important to consider replication when designing backup and restore strategies. The article Backing up and Restoring Replicated Databases20 describes the considerations.
Following are the full URLs for the hyperlinked text.
1 SQL Server Replicationhttp://msdn.microsoft.com/en-us/library/ms151198.aspx
2 Selecting the Appropriate Type of Replicationhttp://msdn.microsoft.com/en-us/library/ms152565.aspx
3 Using Replication for High Availability and Disaster Recoveryhttp://sqlcat.com/whitepapers/archive/2009/09/23/using-replication-for-high-availability-and-disaster-recovery.aspx
4 How Oracle Publishing Workshttp://msdn.microsoft.com/en-us/library/ms151845.aspx
5 Best Practices for Replication Administrationhttp://msdn.microsoft.com/en-us/library/ms151818.aspx
6 Performance (Replication)http://msdn.microsoft.com/en-us/library/ms151786.aspx
7 SQL Server Replication: Providing High Availability using Database Mirroringhttp://download.microsoft.com/download/d/9/4/d948f981-926e-40fa-a026-5bfcf076d9b9/ReplicationAndDBM.docx
8 Initializing a Transactional Replication Subscriber from an Array-Based Snapshothttp://sqlcat.com/technicalnotes/archive/2009/05/04/initializing-a-transactional-replication-subscriber-from-an-array-based-snapshot.aspx
9 Considerations for Transactional Replicationhttp://msdn.microsoft.com/en-us/library/ms151254.aspx
10 Troubleshooting (Replication)http://msdn.microsoft.com/en-us/library/ms151756.aspx
11 Troubleshooting SQL Server Transactional Replicationhttp://blogs.msdn.com/b/benjones/archive/2009/09/10/troubleshooting-sql-server-transactional-replication.aspx
12 United States Government: Federal Institution Supports 60 Million Daily Database Hits with Reliability, Added Securityhttp://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000004415
13 Controlling Constraints, Identities, and Triggers with NOT FOR REPLICATIONhttp://msdn.microsoft.com/en-us/library/ms152529.aspx
14 Replication Agent Security Modelhttp://msdn.microsoft.com/en-us/library/ms151868.aspx
15 Managing Logins and Passwords in Replicationhttp://msdn.microsoft.com/en-us/library/ms151219.aspx
16 Alternate Snapshot Folder Locationshttp://msdn.microsoft.com/en-us/library/ms151327.aspx
17 UPDATE Statements May be Replicated as DELETE/INSERT Pairshttp://support.microsoft.com/kb/238254/EN-US
18 Enhancing Transactional Replication Performancehttp://msdn.microsoft.com/en-us/library/ms151762.aspx
19 Considerations for All Types of Replicationhttp://msdn.microsoft.com/en-us/library/ms152479.aspx
20 Backing up and Restoring Replicated Databaseshttp://msdn.microsoft.com/en-us/library/ms151152.aspx