Merge Replication Performance Improvements in SQL Server 2005

Writer: Michael Blythe

Technical Reviewers: Philip Vaughn and Greg Yvkoff

Designer: Paul Carew

Applies To: Microsoft® SQL Server™ 2005

Summary: This paper describes performance optimizations and new features that provide improved performance for merge replication in SQL Server 2005. Each improvement is described and test results are provided to demonstrate the benefits.

On This Page

Introduction
Test Scenario and Workload
Obtaining an Initial Copy of the Database
Sending and Receiving Only the Data Required by the Application
Sending Lookup Tables and Reference Data
Replicating Data in a High Volume Server to Server Environment
Conclusion
Appendix: Publication, Article, and Subscription Properties

Introduction

Merge replication is a synchronization technology that lets database applications work while disconnected from the network. Originally released in SQL Server 7.0, merge replication is designed primarily for mobile applications and for distributed server applications in which data conflicts are possible. The most common uses of merge replication include:

  • Exchanging data with mobile users: Many applications require data to be available to remote users, including sales people, delivery drivers, and so on. These applications include customer relationship management (CRM), sales force automation (SFA), and field force automation (FFA) applications.

  • Consumer point of sale (POS) applications: POS applications, such as checkout terminals, typically require data to be replicated from remote sites to a central site.

  • Integrating data from multiple sites: Applications often integrate data from multiple sites. For example, an application that supports regional offices might require data to flow between regional offices and a central office.

Merge replication performance improved considerably from SQL Server 7.0 to SQL Server 2000, and has dramatically increased in SQL Server 2005. This paper describes performance optimizations and new features that provide improved performance for merge replication in SQL Server 2005. Each improvement is described and test results are provided to demonstrate the benefits.

The paper first provides an overview of merge replication, and then presents the test scenario and workloads used for testing. The paper then goes on to present tests that address the following requirements, which are common to most offline applications:

  • Obtain an initial copy of the database

  • Send and receive only the data required by the application

  • Ensure that some data, such as reference data, is only updated centrally

Finally, the paper discusses the use of merge replication in a high-volume server-to-server environment. The results of the tests in this paper clearly demonstrate that merge replication in SQL Server 2005 has the performance necessary to support critical enterprise-class distributed applications.

Replication Model and Components

Replication uses a publishing industry metaphor to represent the components in a replication system, which include the Publisher, Distributor, Subscribers, publications, articles, and subscriptions. It is helpful to think of SQL Server replication in terms of a magazine:

  • A magazine publisher produces one or more publications.

  • A publication contains articles.

  • The publisher either distributes the magazine directly or uses a distributor.

  • Subscribers receive publications to which they have subscribed.

Although the magazine metaphor is useful for understanding replication, merge replication includes functionality that is not represented in this metaphor, especially the ability for a Subscriber to make updates and for a Publisher to send out incremental changes to the articles in a publication.

A replication topology defines the relationship between servers and copies of data and clarifies the logic that determines how data flows between servers. There are several replication processes (referred to as agents) that are responsible for copying and moving data between the Publisher and Subscribers. Figure 1 provides an overview of the components and processes involved in merge replication.

Figure 1: Merge Replication Components

The Merge Agent applies the initial set of data (the snapshot) to the Subscriber and moves and reconciles incremental data changes that occur after the snapshot is applied. Changes are tracked by using triggers and change tracking tables. Each subscription has its own Merge Agent that connects to the Publisher and the Subscriber and updates both of them. The Merge Agent runs at either the Distributor (for push subscriptions) or the Subscriber (for pull subscriptions).

The Snapshot Agent, which is not shown in this diagram, prepares the schema and initial data files for the published tables and other objects, stores the snapshot files, and records information about synchronization in the distribution database. The Snapshot Agent runs at the Distributor. For merge replication, the Publisher and Distributor are typically on the same computer.
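
To make this concrete, the following T-SQL is a minimal sketch of enabling merge publishing at the Publisher. The database name SalesDB and publication name SalesPub are hypothetical, and Distributor configuration and agent security settings are omitted.

    -- Enable the database for merge publishing (run at the Publisher).
    USE master;
    EXEC sp_replicationdboption
        @dbname = N'SalesDB',
        @optname = N'merge publish',
        @value = N'true';
    GO

    -- Create a merge publication in the publication database.
    USE SalesDB;
    EXEC sp_addmergepublication
        @publication = N'SalesPub',
        @publication_compatibility_level = N'90RTM';

    -- Create the Snapshot Agent job that generates the initial snapshot.
    EXEC sp_addpublication_snapshot @publication = N'SalesPub';
    GO

Articles, filters, and subscriptions are then added with the stored procedures discussed in the appendix of this paper.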

Test Scenario and Workload

The tests in this paper were conducted using tables, workloads, and hardware that are typical for applications that use merge replication. Merge replication is used in many sales force automation (SFA) and field force automation (FFA) applications that use sales and customer data. The test schema used for this paper reflects this fact and includes the following tables:

  • salespeople

  • customers

  • orders

  • order_details

  • products

Figure 2 shows the columns in each table and the relationships between the tables.

Figure 2: Test Schema

Each table was populated with data that simulates order processing for a medium-sized to large-sized company. As Table 1 and Table 2 show, the Publisher contained more data than each Subscriber, except for the products table, which contained the same data at the Publisher and Subscribers. Each Subscriber received data only for its customers and its customers' orders, but received all product data.

For more information about the filtering used for this schema, see the section "Sending and Receiving Only the Data Required by the Application" in this paper. Table 3 shows the average size of the snapshots sent to each Subscriber. For more information about generating and applying the snapshot, see the section "Obtaining an Initial Copy of the Database" in this paper.

Table 1   Row and Table Sizes at the Publisher

Table           Number of Rows   Approximate Data Size (KB)   Approximate Index Size (KB)
customers       50,100           28,632                       1,592
order_details   15,030,000       1,431,432                    615,752
orders          5,010,000        2,672,000                    137,856
products        100,000          14,552                       5,464
salespeople     501              48                           80

Table 2   Row and Table Sizes at Each Subscriber

Table           Number of Rows   Approximate Data Size (KB)   Approximate Index Size (KB)
customers       100              64                           64
order_details   30,000           2,864                        1,992
orders          10,000           5,336                        576
products        100,000          14,552                       7,744
salespeople     1                8                            56

Table 3   Size of Snapshot

Version           Approximate Size of Files     Approximate Size of Files
                  for Schema Snapshot           for Partitioned Snapshot(1)
SQL Server 2000   25 files: 115 KB              33 files: 20,949 KB
SQL Server 2005   26 files: 2,874 KB(2)         35 files: 23,714 KB(2)

(1) Referred to as "dynamic snapshot" in SQL Server 2000 Books Online. For more information, see the section "Sending and Receiving Only the Data Required by the Application" in this paper.

(2) The difference in size is due to a set of batching procedures that SQL Server 2005 creates. These procedures let changes be applied more efficiently.

Standard Workload

Table 4, Table 5, and Table 6 show the number of insert, delete, and update operations performed for the standard workload tests. These tests are used in the following sections of this paper:

  • Sending and Receiving Only the Data Required by the Application

    The tests in this section measured performance for data that is filtered, with Subscribers receiving different subsets of data.

  • Sending Lookup Tables and Reference Data

    The tests in this section measured performance for data that is not updated at the Subscriber.

The period over which changes were made varies depending on the test.

Table 4   Total Changes Made at the Publisher

Table           Number of DML Operations
orders          50,100 inserts (one additional order for each customer)
order_details   200,400 inserts (four order detail records for every new order created)
products        Inserts: 640; Deletes: 640; Updates: 1,920

Table 5   Changes Downloaded to Each Subscriber

Table           Number of DML Operations
orders          Inserts: 100
order_details   Inserts: 400
products        Inserts: 640; Deletes: 640; Updates: 1,920

Table 6   Changes Uploaded from Each Subscriber

Table           Number of DML Operations
orders          Deletes: 50; Updates: 200
order_details   Inserts: 400

The total changes each Subscriber uploaded and downloaded are as follows:

  • Upload inserts: 400

  • Upload updates: 200

  • Upload deletes: 50

  • Download inserts: 1,140

  • Download deletes: 640

  • Download updates: 1,920

    Total changes: 4,350

High Volume Workload

Table 7 and Table 8 show the number of insert, delete, and update operations performed for the high-volume workload tests, all of which use a single Subscriber. These tests are used in the section "Replicating Data in a High Volume Server to Server Environment" in this paper.

Table 7   Changes Downloaded to a Single Subscriber

Table      Number of DML Operations
products   Inserts: 40,000; Deletes: 10,000; Updates: 50,000

Table 8   Changes Uploaded from a Single Subscriber

Table      Number of DML Operations
products   Inserts: 101,000

Publication, Article, and Subscription Properties

Merge replication lets you customize behavior by setting publication, article, and subscription properties. For information about properties that are relevant to the tests in this paper, see the section "Appendix: Publication, Article, and Subscription Properties" in this paper.

Hardware Used for Testing

We used the following hardware for all tests in this paper.

Table 9   Hardware Used for Testing

                 Processors   Processor Speed      RAM
Publisher        Eight        3 gigahertz (GHz)    12 gigabytes (GB)
Subscribers(1)   One or two   1.6 GHz to 2.4 GHz   1 GB

(1) A single Subscriber computer was used for a varying number of subscriptions, depending on the test.

Obtaining an Initial Copy of the Database

Before replicating data changes to Subscribers, you must initialize each Subscriber so that it has the system objects required by replication and the schema and data required by your application. For example, if you have a sales staff that takes orders at customer sites, you might provide a new sales person with a laptop that runs an SFA application. The first time the sales person starts the application when connected to the corporate network, the application initializes the subscription, downloading the appropriate customer data to the laptop.

There are several ways to initialize a Subscriber, but the most common is to use a replication snapshot. (For more information, see the section "Sending and Receiving Only the Data Required by the Application" in this paper.) The snapshot is generated by the Snapshot Agent and applied by the Merge Agent during the first synchronization of a subscription. The process that replication uses depends on whether parameterized filters are used; parameterized filters let each Subscriber receive a different subset of data:

  • If the publication does not use parameterized filters, the Snapshot Agent creates a single snapshot that is used by all subscriptions. The agent copies object schema and replication scripts to the snapshot folder, and then uses the bulk copy program (bcp) to copy user data to the same folder. The Merge Agent then copies the snapshot to each Subscriber.

  • If the publication uses parameterized filters, the Snapshot Agent creates a snapshot for each data partition, such as the customers and order data for a given sales person. To do this, first the agent creates a schema snapshot that contains object schema and replication scripts, but not user data. Next, the agent uses bcp to copy the user data to a different snapshot folder for each partition. The Merge Agent then copies to each Subscriber a snapshot that includes the schema snapshot and the data for the Subscriber's partition. (A sketch of generating a partitioned snapshot follows this list.)
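
As an illustration of the second case, the following is a hedged sketch of pre-creating a Subscriber's partition and a partitioned-snapshot job. It assumes the hypothetical SalesPub publication, a hypothetical login CORP\salesperson1, and a parameterized filter based on SUSER_SNAME(), as in the test schema used in this paper.

    -- Run at the Publisher, in the publication database.
    -- Define the Subscriber's partition; the partition is identified by
    -- login because the parameterized filter uses SUSER_SNAME().
    EXEC sp_addmergepartition
        @publication = N'SalesPub',
        @suser_sname = N'CORP\salesperson1';

    -- Create a Snapshot Agent job that generates the filtered snapshot
    -- for just this partition.
    EXEC sp_adddynamicsnapshot_job
        @publication = N'SalesPub',
        @suser_sname = N'CORP\salesperson1';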

Snapshot generation performance has been improved dramatically in SQL Server 2005. Because subscriptions are typically initialized by using snapshots, these improvements affect the performance of most merge replication deployments. Improvements are especially noticeable when concurrently generating partitioned snapshots for a large number of Subscribers.

Tests and Results

In our test scenario, we generated snapshots for each Subscriber by using the process described earlier for publications with parameterized filters. The total size of each snapshot was approximately 20-23 megabytes (MB), depending on the version of SQL Server that was used to create the snapshot. We conducted three tests to measure snapshot performance:

  • Generating the schema snapshot.

  • Concurrently generating 50 partitioned snapshots.

  • Applying one schema snapshot and one partitioned snapshot to a new Subscriber.

Generating the Schema Snapshot

A schema snapshot is created when replication is first configured. It contains the system objects and user schema that replication requires to track changes at the Publisher and Subscribers, as well as the database schema (tables, views, stored procedures, and so on) required to initialize a Subscriber. Only one schema snapshot is required per Publisher, regardless of the number of partitioned snapshots required for a specific set of Subscribers.

For this test, we generated a schema snapshot of all the tables in our publication. Figure 3 shows the time required to create the schema snapshot; Figure 4 shows the time required to create the partitioned snapshots for each Subscriber. In both figures, a smaller bar is better, because it indicates less total time to generate the snapshot. These results demonstrate how much faster it is to generate a schema snapshot in SQL Server 2005.

Figure 3: Time Required to Generate the Schema Snapshot

Generating Partitioned Snapshots

In this test, we generated snapshots for 50 different data partitions, each partition representing the data for a single sales person. This test is representative of an application with mobile users who each have a separate partition of data. The results, shown in Figure 4, were even more dramatic than the results from the schema snapshot test: the partitioned snapshots were generated far more quickly in SQL Server 2005 than in SQL Server 2000.

It is a common practice to generate snapshots on a schedule, so that a snapshot is always available if a subscription requires reinitialization. With the improved performance in SQL Server 2005, it is now possible to run many Snapshot Agents concurrently without overwhelming the Publisher. These improvements in snapshot processing make it possible to create initial datasets in dramatically less time than in SQL Server 2000.

Figure 4: Time Required to Generate 50 Partitioned Snapshots Concurrently

Applying the Schema Snapshot and Partitioned Snapshot to a New Subscriber

In this test, we applied the snapshot to a Subscriber under the following conditions:

  • With no pending changes at the Publisher. This means that no changes were made to any published tables between the time the snapshot was generated and the time it was applied at the Subscriber.

  • With 3,700 pending changes.

  • With 7,400 pending changes.

Snapshots are typically generated at off-peak hours and then applied during business hours, such as when a sales person arrives for work in the morning. Data changes often occur between the time that the snapshot is generated and the time it is applied, so there may be pending changes to apply after a snapshot is applied to the Subscriber.

Pending changes are also typically present when an existing Subscriber is reinitialized or when a new Subscriber is added after a system is in production. For example, a new sales person might initialize a subscription through an SFA application.

Figure 5 shows the time required to apply a snapshot when there are pending changes to download. Initial synchronization performance with no pending changes is slightly slower in SQL Server 2005 than in SQL Server 2000, because SQL Server 2005 generates additional stored procedures for each article during initialization. These procedures are used to batch data changes and provide a significant benefit when synchronizing changes: for example, at 7,400 pending changes, SQL Server 2005 is over 220% faster than SQL Server 2000.

Figure 5: Time Required to Apply Snapshots When There Are Pending Changes to Download

Sending and Receiving Only the Data Required by the Application

After a Subscriber is initialized, it is ready to accept incremental data changes from the Publisher and other Subscribers. (All changes from other Subscribers are routed through the Publisher.) The rest of this paper deals with performance improvements in applying incremental changes.

Merge replication is used in many applications that support mobile users. These applications usually have many subscriptions that receive a different subset or partition of the database. Depending on application needs, a specified partition can be published to one or more Subscribers. For example, with an SFA application, data for a particular customer could be replicated to more than one sales person if sales people provide backup for each other during vacations or busy times. In other scenarios, such as FFA, partitions are almost always delivered to a single Subscriber. For example, a delivery driver downloads only the partition relevant to his or her deliveries for a particular day.

To support data partitioning, merge replication offers four types of filters:

  • Parameterized row filters, which use a data value supplied by a Subscriber to send Subscribers different data sets (referred to as "dynamic filters" in SQL Server 2000 Books Online). For more information, see "Parameterized Row Filters" in SQL Server 2005 Books Online.

  • Join filters, which extend a row filter from one published table to another. For more information, see "Join Filters" in SQL Server 2005 Books Online.

  • Static row filters, which specify a data set that all Subscribers to a publication receive. For more information, see "Filtering Published Data" in SQL Server 2005 Books Online.

  • Column filters, which specify a subset of columns to be published. For more information, see "Filtering Published Data" in SQL Server 2005 Books Online.

In merge replication, parameterized filters and join filters are typically used together and provide a powerful way to define partitions of data across multiple tables. A parent table is filtered using a parameterized filter, and then one or more join filters are defined, much as you define a join between tables. The join filters extend the parameterized filter so that the data in the related tables is replicated only if it matches the join filter. In the test schema, the filters were defined as follows:

  • A parameterized row filter on the salespeople table: dbo.username = SUSER_SNAME()

  • A join filter between the salespeople and customers tables: salespeople.userid = customers.userid

  • A join filter between the customers and orders tables: customers.custid = orders.custid

  • A join filter between the orders and order_details tables: orders.orderid = order_details.orderid

Each Subscriber receives a single row from the salespeople table based on the sales person's login. The join filters then ensure that customer, order, and order detail data is downloaded to the Subscriber only if the data relates to a customer who is assigned to the particular sales person. Each join filter uses the join_unique_key optimization, based on the fact that there is a one-to-many join between each pair of tables. For more information about this property, see "sp_addmergefilter" in SQL Server 2005 Books Online.
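
The following T-SQL sketches how the first two of these filters might be defined, assuming the hypothetical SalesPub publication and default values for the parameters that are not shown; the remaining join filters follow the same pattern.

    -- Run at the Publisher, in the publication database.
    -- Parameterized row filter on the salespeople article.
    EXEC sp_addmergearticle
        @publication = N'SalesPub',
        @article = N'salespeople',
        @source_object = N'salespeople',
        @subset_filterclause = N'username = SUSER_SNAME()';

    EXEC sp_addmergearticle
        @publication = N'SalesPub',
        @article = N'customers',
        @source_object = N'customers';

    -- Join filter that extends the salespeople filter to customers.
    -- The one-to-many relationship allows the join_unique_key optimization.
    EXEC sp_addmergefilter
        @publication = N'SalesPub',
        @article = N'customers',
        @filtername = N'salespeople_customers',
        @join_articlename = N'salespeople',
        @join_filterclause = N'salespeople.userid = customers.userid',
        @join_unique_key = 1;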

The products table is not filtered, because information for a product might be required even if that product is not represented in one of the sales person's orders. Column filters are not used in the tests in this paper.

How Performance Was Improved for Replicating Filtered Data

When a Subscriber synchronizes with a Publisher, the Publisher must evaluate the Subscriber's row filters to determine which rows belong to that Subscriber's partition. This process of determining partition membership of changes at the Publisher for each Subscriber that receives a filtered dataset is referred to as partition evaluation. In SQL Server 2000, partition evaluation must be performed for each change made to a filtered column at the Publisher, beginning from the last time that the Merge Agent ran for a specific Subscriber. This process must be repeated for every Subscriber that synchronizes with the Publisher.

However, if the Publisher and Subscriber are running on SQL Server 2005, by default partition membership for all changes at the Publisher is precomputed and persisted at the time that the changes are made. As a result, when a Subscriber synchronizes with the Publisher, it can immediately start to download changes relevant to its partition without having to go through the partition evaluation process. This can lead to significant performance gains when a publication has many changes, Subscribers, or articles in the publication.

For more information, see "Optimizing Parameterized Filter Performance with Precomputed Partitions" in SQL Server 2005 Books Online.

Tests and Results

For the tests in this section, we measured two important indicators of performance:

  • The number of rows per second that merge replication could process. We increased the number of Subscribers for each test run to determine how the number of concurrent Subscribers affected the number of rows that could be processed.

  • Performance of insert, delete, and update operations on published tables in both SQL Server 2000 and SQL Server 2005.

Concurrently Synchronizing a Large Number of Subscribers

In these tests, we synchronized an increasing number of Subscribers concurrently to determine how many rows per second could be processed. Figure 6 and Figure 7 show the same data from two different points of view:

  • Figure 7 shows the number of rows that could be processed at each Subscriber as the number of concurrent Subscribers increased.

  • Figure 6 shows the total number of rows that could be processed at the Publisher as the number of concurrent Subscribers increased.

For example, consider the case of 50 concurrent Subscribers:

  • Each SQL Server 2000 Subscriber processed 5 rows per second; each SQL Server 2005 Subscriber processed 189 rows per second. (Figure 7)

  • The SQL Server 2000 Publisher processed 250 rows per second (50 Subscribers * 5 rows per Subscriber per second); the SQL Server 2005 Publisher processed 9,450 rows per second (50 Subscribers * 189 rows per Subscriber per second). (Figure 6)

As you can see in both graphs, SQL Server 2005 scales linearly through 500 concurrent synchronizations. Because of the large number of changes processed, the maximum number of concurrent synchronizations we could maintain for SQL Server 2000 was 100, and even that run included several Merge Agent failures. Even with 500 concurrent Subscribers, SQL Server 2005 processed rows over 3,000% faster than SQL Server 2000.

Although the goal of this test was to highlight concurrency improvements, you can see that even with a single Subscriber, performance increased by over 460%. SQL Server 2005 also offers far more flexibility than SQL Server 2000: many applications built on merge replication have a large number of Subscribers, and if you have a large sales force, SQL Server 2005 lets you more confidently allow sales people to synchronize at a time that is appropriate for them, instead of carefully staggering synchronization schedules to avoid bottlenecks.

Figure 6: Number of Rows Processed Per Second by the Publisher as the Number of Subscribers Increases

Figure 7: Number of Rows Processed Per Second at Each Subscriber as the Number of Subscribers Increases

Insert, Delete, and Update Performance at the Publication and Subscription Databases

In this test, we measured the performance of insert, delete, and update operations on the publication database. The test did not measure the performance of replicating changes, only the performance of making changes with replication enabled on a database. Figure 8 shows the time required to complete these changes (DML operations).

To provide significantly better scaling, we redesigned filter processing in SQL Server 2005 to use precomputed partitions. As part of this redesign, we added logic to the merge tracking triggers. This test was designed to measure the cost of the trigger logic compared to SQL Server 2000, with the expectation that SQL Server 2000 would show slightly better performance. However, as Figure 8 shows, SQL Server 2005 performs better in this test even with the more complex trigger logic. This can be attributed to improvements in trigger performance and other database engine enhancements in SQL Server 2005. Be aware that performance can drop if you use complex filtering logic or many join filters. For more information about filter design, see "Enhancing Merge Replication Performance" in SQL Server 2005 Books Online.

Note   We did not include results for update operations, because these operations completed in less than a second for both versions of SQL Server with the test workload.

Figure 8: Time Required to Complete DML Operations on Published Tables

Sending Lookup Tables and Reference Data

In many application scenarios, one or more tables are updated at the Publisher, but not at the Subscriber; these tables are usually referred to as lookup or reference tables. For example, a sales person has product data available on her laptop, but she never updates this data; all updates occur on the central server at the head office. Merge replication provides download-only articles for this kind of data.

Because download-only articles cannot be updated at the Subscriber, triggers are not added to published tables at the Subscriber, and the tracking metadata that merge replication uses for standard articles is not sent to Subscribers. This can lead to reduced storage on the Subscribers and a performance benefit, especially if the network connection is slow.

Merge replication offers two types of subscriptions: client and server. Client subscriptions are appropriate for most applications, whereas server subscriptions are typically used for Subscribers that republish data to other Subscribers. Download-only articles work in conjunction with client subscriptions: if an article is designated as download-only, Subscribers that use client subscriptions cannot insert, update, or delete data in that article. Publishers and Subscribers that use the server subscription type can insert, update, and delete data.

For more information, see "Optimizing Merge Replication Performance with Download-Only Articles" in SQL Server 2005 Books Online.

Tests and Results

In this test, we replicated data to a single Subscriber using the workload described in the "Standard Workload" section of this paper (Table 4, Table 5, and Table 6), making changes to the orders, order_details, and products tables. We first established a baseline by comparing SQL Server 2000 to SQL Server 2005 using only standard articles. We then specified the products table as download-only.

Figure 9 shows the rows processed per second using SQL Server 2000, SQL Server 2005 with a standard article for the products table, and SQL Server 2005 with the download-only option specified for the products table. Although the changes to the products table represented only part of the total changes in the test, the download-only setting made a significant difference: overall performance was 150% better than SQL Server 2005 with a standard article, and over 700% better than SQL Server 2000.

The improvement over SQL Server 2000 is even more dramatic with a large volume of changes. For more information, see the section "Replicating Data in a High Volume Server to Server Environment" in this paper.

Figure 9: Number of Rows Processed Per Second with Download-Only Articles

Replicating Data in a High Volume Server to Server Environment

Merge replication is typically used in mobile applications that have disconnected clients. However, in certain cases, such as applications in which conflicts are expected, it is appropriate to use merge replication to synchronize two or more servers. For example, if a company has multiple sales offices, the same order data might be updated by staff at more than one office, requiring conflict detection and resolution, and possibly custom business logic during synchronization.

Transactional replication uses an efficient log reading mechanism to move changes between servers, making it a natural choice where low latency between the Publisher and Subscriber is required. Merge replication uses triggers to track changes; although it is still not as efficient as transactional replication, merge replication in SQL Server 2005 is noticeably faster when replicating changes between computers that are connected by a fast, reliable network. Unlike transactional replication, merge replication detects and resolves conflicts by default, making it an appropriate choice when the replication environment can have conflicts.

SQL Server 2005 offers a new Merge Agent parameter, -ParallelUploadDownload, which lets the agent process changes uploaded to the Publisher and changes downloaded to the Subscriber at the same time. This parallel processing can provide a significant benefit in high volume environments with high network bandwidth. The parameter is included in the new high volume server-to-server Merge Agent profile. (A profile contains a set of parameters that are used each time an agent runs.)
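
For illustration, a pull Merge Agent run from the command line with this parameter might look like the following. The server, database, and publication names are placeholders, and the command is a single line that is wrapped here for readability.

    replmerg.exe -Publisher [CORPSQL01] -PublisherDB [SalesDB]
        -Publication [SalesPub] -Subscriber [BRANCHSQL01]
        -SubscriberDB [SalesDB] -Distributor [CORPSQL01]
        -SubscriptionType 1 -ParallelUploadDownload 1

Here, -SubscriptionType 1 specifies a pull subscription.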

For more information about agents and agent parameters, see "Administering Replication Agents" in SQL Server 2005 Books Online.

Tests and Results

In these tests, we used the workload that was described in the "High Volume Workload" section (Table 7 and Table 8). We conducted two sets of tests, using a high volume workload of 100,000 changes to the products table. In the first test (Figure 10), we made 100,000 changes at the Publisher and at a single Subscriber, for a total of 200,000 changes. In the second test (Figure 11) we made 100,000 changes at the Publisher only.

Uploading and Downloading a Total of 200,000 Changes

For the first set of tests, we established a baseline by comparing SQL Server 2000 to SQL Server 2005. Figure 10 shows the number of rows processed per second using SQL Server 2000 and SQL Server 2005. We then specified the -ParallelUploadDownload parameter to evaluate the performance when a large number of changes are pending at both the Publisher and Subscriber. As you can see in Figure 10, SQL Server 2005 is much faster than SQL Server 2000, even without the parameter; specifying -ParallelUploadDownload provides an additional 70% performance benefit in this scenario.

Figure 10: Number of Rows Processed Per Second Using the Server to Server Agent Profile

Downloading 100,000 Changes

For the second set of tests, we again established a baseline by comparing SQL Server 2000 to SQL Server 2005 using a standard article. We then specified the download-only option for the products table, which was expected to yield an even greater benefit. Figure 11 shows the results for SQL Server 2000, SQL Server 2005 with a standard article for the products table, and SQL Server 2005 with the download-only option specified for the products table. The performance gains are impressive even without the download-only option, which gives a good sense of the improvements for high-volume scenarios in SQL Server 2005.

Note   This test used a client subscription for the download-only part of the test, because download-only articles require client subscriptions.

Figure 11: Number of Rows Processed Per Second with Download-Only Articles

Conclusion

SQL Server 2005 merge replication is an ideal solution for many distributed applications, especially when data is updated at multiple sites, when disconnected Subscribers are used, or when there is a chance of data conflicts. The performance of merge replication in SQL Server 2005 is significantly improved over SQL Server 2000, because of important performance modifications and new merge replication functionality. The improved performance and scalability let businesses confidently use merge replication for enterprise deployments.

Even with the impressive performance of merge replication in SQL Server 2005, it is important to understand that performance and scalability are dependent on good database and application design and the appropriate setting of replication parameters. For additional information about performance tuning, see the section "Enhancing Replication Performance" in SQL Server 2005 Books Online.

Appendix: Publication, Article, and Subscription Properties

The following tables show the properties that are most relevant to the tests in this paper. For each property, we first list the phrase used in the replication property sheets in SQL Server Management Studio and then list the property name used in the stored procedure.

Table A1   Publication Properties

Property                      Parameter                          SQL Server 2000   SQL Server 2005
Allow parameterized filters   @dynamic_filters                   TRUE              TRUE
Optimize synchronization      @keep_partition_changes            FALSE             FALSE
Compatibility level           @publication_compatibility_level   N/A(1)            90RTM
Precompute partitions         @use_partition_groups              N/A(1)            TRUE

(1) These properties were not available in SQL Server 2000.

Publication Property Descriptions

  • @dynamic_filters Enables the merge publication to use parameterized row filters. Parameterized filters are used when a Subscriber wants only a subset of the data from the Publisher.

  • @keep_partition_changes An optimization for filter processing that has been superseded in SQL Server 2005 by @use_partition_groups.

  • @publication_compatibility_level Indicates the backward compatibility of the publication, with values ranging from 70RTM to 90RTM.

  • @use_partition_groups Specifies that precomputed partitions should be used to optimize filter processing.

For more information about these and other publication properties, see "sp_addmergepublication" in SQL Server 2005 Books Online.
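
As a hedged sketch, a publication using the SQL Server 2005 values from Table A1 could be created as follows (the publication name SalesPub is hypothetical):

    -- Run at the Publisher, in the publication database.
    EXEC sp_addmergepublication
        @publication = N'SalesPub',
        @dynamic_filters = N'true',
        @keep_partition_changes = N'false',
        @publication_compatibility_level = N'90RTM',
        @use_partition_groups = N'true';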

Table A2   Article Properties

Property                    Parameter                    SQL Server 2000   SQL Server 2005   SQL Server 2005         SQL Server 2005
                                                                                             (download-only tests)   (unique partitions test)
Tracking level              @column_tracking             TRUE              TRUE              TRUE                    TRUE
Partition options           @partition_options           N/A(1)            0(2)              0(2)                    3(2)
Synchronization direction   @subscriber_upload_options   N/A(1)            0(3)              2(3)                    0(3)

(1) These properties were not available in SQL Server 2000.

(2) All tests use a value of 0 for this property, except for the test involving unique partitions of data for each Subscriber. For more information, see the section "Sending and Receiving Only the Data Required by the Application" in this paper.

(3) All tests use a value of 0 for this property, except for the tests involving download-only articles. For more information, see the section "Sending Lookup Tables and Reference Data" in this paper.

Article Property Descriptions

  • @column_tracking Specifies whether conflicts are detected at the column level or at the row level.

  • @partition_options Defines how data in the article is partitioned. Use of this option enables performance optimizations when all rows belong in only one partition or in only one subscription.

  • @subscriber_upload_options Defines restrictions on updates made at a Subscriber.

For more information about these and other article properties, see "sp_addmergearticle" in SQL Server 2005 Books Online.
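
As a hedged sketch, an article using the standard-test values from Table A2 could be added as follows (the publication name is hypothetical, the table name follows the test schema, and other parameters are left at their defaults):

    -- Run at the Publisher, in the publication database.
    EXEC sp_addmergearticle
        @publication = N'SalesPub',
        @article = N'orders',
        @source_object = N'orders',
        @column_tracking = N'true',        -- detect conflicts at the column level
        @partition_options = 0,            -- 0 = overlapping partitions
        @subscriber_upload_options = 0;    -- 0 = no restrictions on Subscriber updates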

Table A3   Subscription Properties

Property               Parameter            SQL Server 2000   SQL Server 2005   SQL Server 2005 (high volume test)
Merge Agent location   @subscription_type   pull              pull              pull
Subscription type      @subscriber_type     local(1)(2)       local(1)(2)       global(1)(3)

(1) In SQL Server 2005 Books Online, "local" and "global" are referred to as "client" and "server", respectively. However, in SQL Server 2005 the values for @subscriber_type are still specified as local or global.

(2) In previous versions of SQL Server, we recommended specifying a value of anonymous for @subscriber_type when there was a large number of Subscribers. We now recommend a value of local in these cases.

(3) The high-volume test used a value of global, which is recommended for server-to-server environments. For more information, see the section "Replicating Data in a High Volume Server to Server Environment" in this paper.

Subscription Property Descriptions

  • @subscription_type Specifies whether the subscription is push or pull.

  • @subscriber_type Specifies whether the Subscriber is local (client), global (server), or anonymous.

For more information about these and other subscription properties, see "sp_addmergesubscription" and "sp_addmergepullsubscription" in SQL Server 2005 Books Online.
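
As a hedged sketch, a pull subscription with the client (local) type could be created as follows (server and database names are placeholders; a separate call to sp_addmergepullsubscription_agent creates the Merge Agent job):

    -- Run at the Subscriber, in the subscription database.
    EXEC sp_addmergepullsubscription
        @publication = N'SalesPub',
        @publisher = N'CORPSQL01',
        @publisher_db = N'SalesDB',
        @subscriber_type = N'local';   -- local = client subscription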

For more information:

https://www.microsoft.com/technet/prodtechnol/sql/default.mspx
