Diagnosing and Troubleshooting Slow Partitioned Merge Processes

Updated: March 25, 2002

November 2001

Abstract

Partitioning data is fundamental to many merge replication applications; it can be used to enforce business rules and to improve performance. However, if it is not implemented correctly, it can also become a source of performance problems. Following the processes and suggestions outlined in this paper can help you achieve scalable, high-performance merge replication applications.

On This Page

Introduction
Partitioning and Merge Replication
Diagnosing Merge Partitioning Performance Problems
Improving Performance by Improving Query Plans and Tuning Agent Properties
Additional Issues and Considerations for Optimizing Merge Replication Performance
Conclusion

Introduction

A merge replication application must not only provide the functionality required by the business rules of the application, but also complete its operations in a timely manner, even if the volume of data and the number of users increase over time.

Because merge replication is implemented through tables and queries in Microsoft® SQL Server™ 2000, performance depends on how efficiently SQL Server handles the processing needs of replication, which in turn depends on how well the database is designed and performance-tuned.

Many merge replication applications include partitioning, which allows different users of the replicated data to receive different data sets. A common source of performance problems in merge replication lies in how partitioning is implemented and how efficiently SQL Server evaluates and processes the expressions associated with partitioning. This paper focuses primarily on diagnosing and troubleshooting the performance of partition processing and addresses some general database design principles that affect performance.

Partitioning and Merge Replication

Processing partition definitions in a publication can be a major source of performance problems in merge replication, but partitioning data can also enhance application performance and flexibility and help ensure that the application's business rules are properly implemented.

A fundamental design goal for merge replication is to partition the data in such a way that each Subscriber receives the smallest data set possible. When you are publishing tables for replication, consider carefully whether every Subscriber needs to see all the data, or whether you can replicate a smaller subset of the data to each Subscriber. Partitioning the data can be extremely beneficial because it can:

  • Eliminate or reduce the possibility of conflicting changes to the data.

    When the same data set is shared and changed at more than one database, there is always the possibility that a change made at one database can conflict with a change made at another database. Conflicting changes to data add to the complexity of an application and to the processing needs of replication; therefore, they should be eliminated by careful design wherever possible.

  • Minimize the volume of data maintained at each Subscriber.

    Larger data sets incur more processing overhead. Also, Subscriber hardware is generally less powerful than Publisher hardware and might not be able to accommodate the processing and storage requirements associated with copies of entire tables. In addition, initializing a Subscriber with a large data set can be a significant issue when the Subscriber does not have a high-speed connection.

  • Prevent Subscribers from receiving sensitive data.

    You might want to prevent Subscribers from seeing data that is inappropriate. The complete data set at the Publisher can be restricted to a data set that is appropriate for each Subscriber.

Filters

In merge replication, partitions are implemented through two different kinds of filters: subset filter clauses and join filters (also referred to as merge filters or cross table merge filters). The two kinds of filters are often combined in publications, with subset filter clauses filtering individual tables and join filters filtering according to relationships between tables.

Both kinds of filters can be used with static partitioning or dynamic partitioning. When you define a publication with static partitioning, all Subscribers receive the same data set when the merge process runs. When you define a publication with dynamic partitioning, Subscribers can receive different data sets on the basis of their connection properties, such as the login.

Subset Filter Clauses

A subset filter clause is an article property that allows you to define a logical expression. This logical expression is used in the WHERE clause of queries that replicate the tables. For example, a CUSTOMER table at the Publisher contains customer data for several states, but the business rules of the application require that the publication limit the published data to a single state. An example of a subset filter clause for this table is:

State = 'WA'
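
A subset filter clause is supplied when the article is added to the publication. The following is a minimal sketch using sp_addmergearticle; the publication name CustPub is a placeholder, and the remaining parameters are left at their defaults:

EXEC sp_addmergearticle 
    @publication         = 'CustPub',
    @article             = 'CUSTOMER',
    @source_owner        = 'dbo',
    @source_object       = 'CUSTOMER',
    @subset_filterclause = 'State = ''WA'''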

Join Filters

Foreign key relationships are central to most database designs. For example, in a database that tracks product orders, you often see a CUSTOMER table with a CustID primary key and an ORDER table with a CustID foreign key that references the CUSTOMER table.

In merge replication, if you filter the CUSTOMER table with a subset filter clause, you must also filter the ORDER table, so that only the rows that reference the filtered subset of rows from the CUSTOMER table are replicated. Filtering requirements that are based on a foreign key relationship must be explicitly represented in the merge replication configuration through a join filter (which lists the two related table articles and a logical expression that identifies the relationship), as shown in the following example:

CUSTOMER.CustID = ORDER.CustID

The effect of this join filter between the CUSTOMER table and the ORDER table, together with the subset filter clause on the CUSTOMER table, is that Subscribers to the publication receive only those rows from the CUSTOMER table where State = 'WA' and only those rows from the ORDER table that correspond to the CustID values in the filtered CUSTOMER table.

Filtering and partitioning that is based on foreign key relationships is the most common approach to limiting data, but sometimes this approach is not appropriate for your situation. For example, Subscribers might need to receive all of the CUSTOMER rows, but only the ORDER rows for the customers in their state. The CUSTOMER table is not filtered; therefore, the ORDER table cannot be filtered with a join filter that is based on the CUSTOMER table. All filtering must be expressed through the use of a subquery in a subset filter clause on the ORDER article, such as:

ORDER.CustID IN 
(SELECT CustID FROM CUSTOMER WHERE State = 'WA')

Although some applications require the use of a subquery, this filtering technique might not be as efficient as the join filter technique in the previous example. It is recommended that you use join filters whenever possible.

Dynamic Partitioning

The filtering expressions discussed in the preceding sections are static and produce only one partitioned data set. Publishing a partition based on a different state requires another publication, so if the application requires a partition for each state, 50 publications are required. Because the overhead for managing and maintaining this many publications would be high, merge replication supports dynamic partitions that publish different data sets in a single publication.

If you use dynamic partitions, the subset filter clause contains a function that returns a different value for each partition. (Each Subscriber might have a different partition, but different Subscribers can have the same partition.) For example, the CUSTOMER table can include a SALESREP column that contains the network user account for each customer's sales representative. The subset filter clause would be:

SALESREP = SUSER_SNAME()

SUSER_SNAME() is a Transact-SQL function that returns the login for the user who is currently connected. When a Subscriber connects to the Publisher to run the merge process, the SUSER_SNAME() function evaluates to a value that is compared to the column specified in the dynamic filter; rows that match the filter are replicated. For example, if the sales representative who is running the merge process connects with the login domain\bobjones, the partition includes only those CUSTOMER rows where SALESREP = 'domain\bobjones'.

The Transact-SQL function HOST_NAME() can also be used in dynamic filters, so that filtering is based on the identity of the Subscriber server rather than on the Subscriber login. SUSER_SNAME() corresponds to the Merge Agent parameter -PublisherLogin, which in turn corresponds to the SQL Profiler column name LoginName. Similarly, HOST_NAME() corresponds to the Merge Agent parameter -Hostname and the SQL Profiler column name HostName. SQL Profiler column names are discussed later in the section "Data Columns." For more information, see the topics "Replication Merge Agent Utility" and "SQL Profiler Data Columns" in SQL Server Books Online.
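
As a sketch, a dynamically filtered publication is created by setting the @dynamic_filters option of sp_addmergepublication; the publication name CustPub is a placeholder, and the remaining parameters are left at their defaults:

EXEC sp_addmergepublication 
    @publication     = 'CustPub',
    @dynamic_filters = 'true'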

Dynamic filtering has implications for initializing new Subscribers; see the sections "sp_MSinitdynamicsubscriber" and "Using Dynamic Snapshots" later in this paper.

Internal Implementation of Filters

When you define partitions in a merge replication publication, merge replication creates views in the published database that are based on your filtering expressions and then publishes the subset of data represented by these views.

For example, the view created for the CUSTOMER article with a subset filter clause of State = 'WA' would be:

CREATE VIEW [publication_CUSTOMER_VIEW] AS  
SELECT * FROM [dbo].[CUSTOMER] [CUSTOMER] 
WHERE  (State = 'WA')

The view created for the ORDER article that has a join filter with the CUSTOMER table would be:

CREATE VIEW [publication_ORDER_VIEW] AS 
SELECT [ORDER].*  
FROM [dbo].[ORDER] [ORDER], [publication_CUSTOMER_VIEW] [CUSTOMER] 
WHERE CUSTOMER.CustID = ORDER.CustID

These partitioning views and the filtering expressions they contain are central to the performance of replicating partitioned data. They define the operations that must be performed by SQL Server. Indexes determine how SQL Server is to perform the operations. If the views are overly complex or cannot make use of indexes on the base tables, performance for the entire set of related published tables could be degraded.

Join Filters and @join_unique_key

The @join_unique_key property of sp_addmergefilter (the stored procedure used to add a join filter) affects how SQL Server builds join filter views. Set it to 1 when the join is based on a unique column, or on a set of columns covered by a unique index, in the filter criteria table. Otherwise, set it to 0.

For example, the join filter described earlier in this paper expresses a join between the ORDER table and the CUSTOMER table based on the condition CUSTOMER.CustID = ORDER.CustID. If the CustID field in the CUSTOMER table (the filter criteria table) has a unique index, the comparison is based on a unique column (or a set of columns with a unique index) and the @join_unique_key value should be 1. The result is a partition view that performs a join between the ORDER table and the CUSTOMER view:

CREATE VIEW [sfa_publ_ORDER_art_VIEW] AS 
SELECT [ORDER].* 
FROM [dbo].[ORDER] [ORDER] , [sfa_publ_CUSTOMER_art_VIEW] [CUSTOMER] 
WHERE (CUSTOMER.CustID = ORDER.CustID)

However, if the CUSTOMER.CustID values in this example are not unique, this join technique is not valid for the partitioning process. The ORDER rows would be erroneously duplicated to match the duplicated CustID values in the CUSTOMER table. In this situation, the @join_unique_key setting for the filter should be set to 0, which results in a subselect rather than a join in the ORDER view:

CREATE VIEW [sfa_publ_ORDER_art_VIEW] as 
SELECT *  FROM [dbo].[ORDER] ORDER_alias_1
WHERE rowguidcol IN 
(
SELECT [ORDER].rowguidcol 
FROM [dbo].[ORDER] [ORDER_alias_2] , 
     [sfa_publ_CUSTOMER_art_VIEW] [CUSTOMER] 
WHERE ( CUSTOMER.CustID = ORDER_alias_2.CustID)
)

This example is relatively straightforward because it reflects the relationship of primary and foreign keys between the tables, which requires the CustID column to be unique. In more complex scenarios, it can be difficult to determine whether the logic is unique. If your join filter expressions contain multiple expressions combined by the OR operator, or if they include wildcards or the NOT operator, be very careful about setting the @join_unique_key property to 1.

The @join_unique_key property is available because using a join in the view allows the SQL Server query optimizer to use a more efficient technique to process the query and return the result set. If you set the property to 0 and the join logic is actually unique, you are not taking advantage of the most efficient processing technique available to you. On the other hand, if you set the property to 1 and the join logic is not unique, the join technique returns duplicated data, and the result is primary key violations when the data is applied to the Subscriber.
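
For reference, the join filter in this example is added with sp_addmergefilter. The following is a minimal sketch; the publication name and filter name are placeholders:

EXEC sp_addmergefilter 
    @publication       = 'CustPub',
    @article           = 'ORDER',
    @filtername        = 'CUSTOMER_ORDER',
    @join_articlename  = 'CUSTOMER',
    @join_filterclause = 'CUSTOMER.CustID = ORDER.CustID',
    @join_unique_key   = 1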

Partitioning and Its Effect on Performance

It is important to note that the views in the previous section are nested when they represent join filters. The ORDER view is based on a join with the CUSTOMER view. If an ORDER_ITEM article is added to this publication and the ORDER_ITEM article is filtered on the basis of the ORDER article, the partition view for ORDER_ITEM includes a join with the ORDER partition view:

CREATE VIEW [publication_ORDER_ITEM_VIEW] AS 
SELECT [ORDER_ITEM].*  
FROM [dbo].[ORDER_ITEM] [ORDER_ITEM], [publication_ORDER_VIEW] [ORDER] 
WHERE ORDER_ITEM.OrderID = ORDER.OrderID

View definitions become increasingly complex as they involve more levels of relationships. This can be a major source of performance issues because problems at the higher levels (for example, in the CUSTOMER table) are magnified when the complexity of the queries at the lower levels increases.

If partitions are defined in a publication, the merge process must ensure that each Subscriber receives the appropriate data set for its partition. This is relatively straightforward when a new Subscriber is initialized, because the merge process selects the appropriate data from the views at the Publisher and uses the resulting data set to populate the Subscriber. But as changes are made to data at either the Publisher or the Subscriber and as rows are added to the partition or removed from it, the merge process must update the data set at the Subscriber by adding or removing rows as appropriate.

For example, if the CUSTOMER article is filtered using the logic State = 'WA' and one of the customers relocates to Oregon, the State column is updated accordingly. The row that contains this customer no longer belongs in the State = 'WA' partition, so the merge process must remove it from all Subscribers who should have only data in which the value for the State column is WA. The ORDER data is based on the CUSTOMER table, and the ORDER_ITEM data is based on the ORDER table. Therefore, removing the customer from the State='WA' partition implies that the merge process must also remove the associated Subscriber rows in the ORDER and ORDER_ITEM tables that belong to this customer. Conversely, the customer row and its related rows in the ORDER and ORDER_ITEM tables must be added to the data sets of Subscribers that receive a State = 'OR' partition (for Oregon).

The merge replication process can require significant SQL Server processing resources to populate and maintain the partitioned data sets, especially as more data sets are published, the nested logic becomes more complex, and the volume of data in the published tables increases. For this reason, problems related to partition performance are usually associated with larger tables at lower levels in the hierarchy.

The efficiency of merge replication partition processing is determined by the ability of SQL Server to use efficient query optimization techniques when processing partition views. The type of filtering expressions you use and the presence of indexes that support the filtering expressions in turn determine what query optimization techniques SQL Server can use.

Indexes and Merge Partition Performance

Indexes can dramatically improve the processing time and reduce the processing resource requirements (memory, locks, and so on) of a query, provided SQL Server can take advantage of indexes on the columns in the WHERE clause.

When merge replication processes a partitioned data set, it executes queries based on the partition's views as described in the previous section. A quick return of the partition's data set is primarily based on the ability to use indexes for the queries. If the database is designed so that indexes cannot be used efficiently or if indexes are not even present, partition processing performance suffers and the replication process might fail.

Define partitions by using filtering conditions that result in efficient queries, and use indexes whenever possible. For more information about effective index design, see the topic "Indexes, Designing" in SQL Server Books Online.

Note: Although the presence of indexes can significantly improve query performance, using indexes increases overhead costs because SQL Server has to maintain the indexes. Whenever a user adds or deletes a row or updates an indexed column, SQL Server must update the index. Only use indexes when their benefit (for the merge process and other queries as well) outweighs their cost. Use the Index Tuning Wizard to help determine which indexes are appropriate.

Index Maintenance

Indexes can become fragmented over time as rows are inserted, updated, and deleted. It might be necessary to defragment or rebuild them occasionally so that they are stored in an efficient and compact manner. Defragmenting or rebuilding indexes is important for user tables and merge replication system tables. For example, the MSmerge_contents system table is changed whenever a row is inserted, updated, or deleted in any published table. When the table receives a lot of activity, its indexes can become fragmented; the result is slower merge replication processing.

There are a number of ways to defragment or rebuild an index. To defragment an index use DBCC INDEXDEFRAG. To rebuild an index, drop and then re-create the index or use DBCC DBREINDEX(). If the index is associated with a PRIMARY KEY or UNIQUE constraint, you might not be able to drop and re-create the index. In this case, you should use DBCC DBREINDEX(). See the topics "Indexes, Defragmenting" and "Indexes, Rebuilding" in SQL Server Books Online for more information.
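
The following is a sketch of these maintenance commands applied to the MSmerge_contents system table; the database and index names shown are placeholders, and you can use sp_helpindex to list the actual index names before defragmenting a specific index:

-- Report fragmentation for the table's indexes.
DBCC SHOWCONTIG (MSmerge_contents)

-- Defragment one index in place (database and index names are placeholders).
DBCC INDEXDEFRAG (PublisherDB, MSmerge_contents, nc1MSmerge_contents)

-- Or rebuild all indexes on the table.
DBCC DBREINDEX ('MSmerge_contents')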

Problems Resulting from Poorly Indexed Partitions

Poorly designed databases and sub-optimal partition implementation can produce several different types of problems:

  • Long running merge processes

  • Time-outs

  • Locking, blocking, and deadlocks

The problems might occur during the first attempt at replicating data, or they might only manifest themselves after the amount of data and number of users grows over time.

Long Running Merge Processes

One of the most common problems encountered with merge processes is that the process takes a long time to execute. If partitions are present, the problem might be the result of poorly optimized partition logic.

Time-outs

The Merge Agent (Replmerg.exe or the merge ActiveX® control) allows you to specify a -QueryTimeout value for the merge process. If a query does not complete within the number of seconds specified by the -QueryTimeout value, the process fails with the following error message: "A timeout occurred."

This is not necessarily a problem or a bug; the query for a perfectly optimized partition in the best possible circumstances can still require more than the specified number of seconds to complete. Increasing the -QueryTimeout value for the Merge Agent might be the only solution. However, a time-out is an indication that you should re-evaluate the partitioning logic and optimize it if possible.

Locking, Blocking, and Deadlocks

When SQL Server accesses the data in a table, it locks the data for the duration of the process to ensure the transactional consistency of the data during the process. During that time, other processes are blocked and must wait until locks are released before they can complete. If queries are not optimized, more data is locked and the locks are held for a longer period of time. This means that other processes are blocked for a longer period of time.

Blocking is an issue for merge replication primarily when other processes are accessing data at the same time as the merge process. Poorly optimized partition queries might hold locks and block other processes, or other processes might hold locks that block the merge process. Depending on the connection settings (that is, SET LOCK_TIMEOUT), the blocked process might result in an error.
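
As a minimal illustration of that setting, the following limits how long the current connection waits for a lock; a statement that remains blocked past the limit fails with the "Lock request time out period exceeded" error instead of waiting indefinitely:

-- Wait at most 5 seconds (5000 milliseconds) for any lock request on this connection.
SET LOCK_TIMEOUT 5000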

Deadlocks are a slightly different manifestation of a locking problem. Essentially, one process or connection is holding one resource (a table, index, or row) and waiting for another resource. A second process or connection is holding the second resource and is waiting for the first one. Neither process can complete until the other completes and releases its resource. SQL Server recognizes this situation and causes one of the processes to fail with the error message: "A deadlock occurred and this process has been chosen as the victim."

When the Merge Agent fails with this error, it is usually the result of a temporary resource problem. The merge process will probably succeed on the next execution, when the other processes have completed their operations and released the resources. For this reason, the merge process fails with a retryable error; if it is automated with a SQL Server Agent job, the job automatically starts again.

If partitions are not optimal, merge replication requires excessive locks and holds them longer and the merge replication process is more prone to deadlock failures. Again, this may be a natural side effect of the implementation of merge replication, but using efficient partitions reduces the potential for deadlocks.

Diagnosing Merge Partitioning Performance Problems

If you are encountering long running merge processes or if the merge processes are failing with time-outs or locking and blocking problems, evaluate your partitioning logic to ensure that you are using efficient techniques wherever possible. To diagnose performance problems, you can:

  • Evaluate partitioning definitions.

  • Analyze agent time-out failures.

  • Analyze the steps in a slow merge process to look for sub-optimal query processing.

Evaluating Partition Definitions

The easiest way to investigate performance problems is to analyze partition definitions to ensure that:

  • The filtering expressions in the subset filter clauses and join filters use "index friendly" expressions.

  • The columns used in the expressions are indexed.

  • The @join_unique_key setting is used correctly.

You should perform this analysis when the database and publication are originally designed and created. Follow-up reviews are also beneficial when the application is in production because several factors (volume of data, number of users, patterns of data changes, and so on) can change over the life of an application in ways that are not foreseen during the design and implementation phase of a project.

The best way to evaluate a partition definition is to review the partition views created by merge replication based on the subset filter clause and join filter expressions. The partition views are named in the form publicationname_articlename_VIEW. To review a view definition, use one of the following methods:

  • In SQL Server Enterprise Manager, right-click the view name and then select Properties.

  • In SQL Query Analyzer, run:

EXEC sp_helptext publicationname_articlename_VIEW

Another approach is to review the definitions of the articles and filters defined in the publication. You can do this in either of the following ways:

  • In SQL Server Enterprise Manager, right-click the publication name and then select Properties. Then select the Filter Rows tab.

  • In SQL Server Enterprise Manager, right-click the publication name and select Generate SQL Script… The Transact-SQL script that is generated includes the sp_addmergearticle and sp_addmergefilter calls that contain the partition expressions.

When you are looking at partition views, be sure all WHERE clause expressions make use of indexes.

To review the index definitions, do one of the following:

  • Use SQL Server Enterprise Manager.

  • Review Transact-SQL scripts (created either manually or from the scripting feature in SQL Server Enterprise Manager).

  • Query the system table information in the database.

Although reviewing view definitions and indexes might not clearly identify the portions of the merge process that are causing performance problems, it does provide a quick way to identify obvious problems, which can lead to significant performance improvements.
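
As a quick query-based check (assuming the CUSTOMER table from the earlier examples), sp_helpindex lists the indexes on a published table so that you can compare them against the columns used in the filtering expressions:

EXEC sp_helpindex 'CUSTOMER'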

Analyzing an Agent Time-out Failure

After the database and application have been in production for some time, the amount of data and number of users can grow, which causes increasingly slower merge processing. Eventually, one of the steps in the merge process can fail with a time-out error. If the merge process fails with an error, check the error details in SQL Server Enterprise Manager:

  1. Select Replication Monitor, then select Agents, and then select Merge Agents.

  2. Right-click the Merge Agent with the error, and then select Error Details….

If the time-out is related to partitioned merge performance, the command being executed is probably either sp_MSinitdynamicsubscriber or sp_MSsetupbelongs. From this point, the diagnostic steps are the same as those outlined in the next section.

Analyzing the Steps in a Slow Merge Process

The most thorough approach to analyzing merge performance problems is to step through the merge process itself, identify the steps that perform slowly, and analyze the SQL Server execution plans to identify whether indexes are being used.

When a merge process starts, it opens a minimum of three connections: to the Publisher, to the Distributor, and to the Subscriber. The merge process can open more connections if it can benefit from opening multiple threads to the Publisher or to the Subscriber for faster parallel processing.

After the connections are opened in the initialization process, changes are uploaded from the Subscriber to the Publisher. Then changes are downloaded from the Publisher to the Subscriber. When the merge process is run for the first time with a new Subscriber, there are no Subscriber changes to upload. So the process consists of applying the initial data set (the snapshot) to the Subscriber and then downloading any changes from the Publisher that have occurred since the snapshot was created.

The analysis performed in this section begins with defining a SQL Profiler trace. It involves only the connection to the Publisher in the download step because this is where partitioning queries are executed.

Defining a SQL Profiler Trace

You can use the SQL Profiler utility to display the queries being sent by clients to an instance of SQL Server and provide information about each query that is submitted. SQL Profiler can be used to gain information on many processes, but this section only provides information specific to diagnosing a merge process. For more information about SQL Profiler, see SQL Server Books Online.

SQL Profiler uses a trace definition, which is a collection of settings that describe what operations are included in the profile and what information is recorded for each operation. In addition to the connection information for the server being analyzed, a trace definition contains three types of settings: events, data columns, and filters.

A trace can return hundreds of thousands of rows of extraneous information; therefore, you must limit the rows of information that are returned to those generated by the merge process. Include only the columns that are useful for your analysis. It is easiest to limit the rows returned by SQL Profiler to only those relevant for the merge process if the merge process is the only process that is currently running on the server. This is often not possible in a production environment, so you might need to use the settings listed in the following sections to restrict the quantity and type of information returned in a trace.

SQL Profiler Events

SQL Profiler events include the activities (such as connecting or disconnecting, executing queries, and so on) that are performed by users or by SQL Server. You must explicitly identify the events that are included in the trace. For the purposes of analyzing a merge replication process, the most important event is Stored Procedures, RPC: Completed because this is how merge replication submits queries and performs operations in SQL Server. All other events should be removed from the trace unless you have a specific need to include them.

SQL Profiler Data Columns

Data columns define the types of information that are recorded for each event. For analyzing the merge replication process, the most important columns to include are:

  • TextData, which shows the query or procedure call being executed by the step in the merge process.

  • Duration, which shows how long it took SQL Server to execute the step.

Other useful columns are:

  • DatabaseID and SPID, which allow you to distinguish Publisher connections from Distributor connections when the Publisher and Distributor are on the same server (which is usually the case in merge replication).

  • LoginName and HostName, which allow you to distinguish one Subscriber's merge process from another when multiple merges are running at the same time.

SQL Profiler Filters

SQL Profiler filters allow you to explicitly exclude or include certain events with more flexibility and granularity than using a trace definition alone. This is only necessary in high-traffic servers where many other operations are being performed. See SQL Server Books Online for more information about using filters in SQL Profiler.

Tracing a Merge Process

After you have defined a SQL Profiler trace, start it and then start the merge process. You can start the merge process manually through SQL Server Enterprise Manager by using the Start option for the merge agent job. However, SQL Profiler also includes extraneous information in the trace from the operations of SQL Server Enterprise Manager. We recommend that you either use a trace filter that excludes the application MS SQL EM or start the process from the command line. To start the merge process from the command line, copy the command line parameters from the Merge Agent job in SQL Server Enterprise Manager:

  1. Select Replication Monitor, and then select Merge Agents.

  2. Select the merge agent you want to analyze.

  3. Right-click the agent, select Agent Properties, and then select Steps.

  4. Select the Run Agent step and select Edit.

  5. In the Command box, copy the text that is similar to the following:

    -Publisher ServerName -PublisherDB PublisherDBName -Publication PublicationName -Subscriber [SubscriberName] -SubscriberDB [SubscriberDBName] -Distributor [DistributorName] -DistributorSecurityMode 1

After you copy the parameters, execute Replmerg.exe from the command line with the parameters from the agent job. (If the parameters include -Continuous, exclude this parameter for the purpose of this analysis.) Use an explicit path, if you receive the following error message:

'replmerg.exe' is not recognized as an internal or external command, 
operable program or batch file

For a default SQL Server 2000 installation, this path is %systemdrive%\Program Files\Microsoft SQL Server\80\COM\replmerg.exe.
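
For example, the full command might resemble the following; the server, database, and publication names shown are placeholders that you replace with the values copied from the agent job's Command box:

"%systemdrive%\Program Files\Microsoft SQL Server\80\COM\replmerg.exe" -Publisher PUBSERVER -PublisherDB SalesDB -Publication CustPub -Subscriber SUBSERVER -SubscriberDB SalesDB -Distributor PUBSERVER -DistributorSecurityMode 1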

After you have started the merge process, you can see commands added to the trace window as they are executed. Allow the merge process to complete, and then stop the SQL Profiler trace. At this point, the SQL Profiler screen displays a grid with rows of information about the activities performed.

Note: If you are performing this analysis using a test system, it is important to simulate a realistic merge process. The merge process should have changes to replicate, and the changes that are replicated should be representative of the types of changes that are encountered in a production environment.

Identifying the Operations Being Performed in a Step

After the merge process and the SQL Profiler trace are complete, identify the trace steps with the longest durations. You can work directly with the information in the SQL Profiler screen, but it might be easier to save the trace as a table, especially if a large number of rows are returned in the trace. To save the trace as a table, click File, point to Save As, point to Trace Table, and then select SQL Server Table. After the information is saved, you can query the saved table to sort the data by duration and then use a WHERE clause to eliminate extraneous rows based on DatabaseID, SPID, LoginName, HostName, and so on.
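
For example, assuming the trace was saved to a table named dbo.MergeTrace (the name is whatever you chose when saving), the following query surfaces the slowest steps; the EventClass filter is optional if the trace captured only RPC:Completed events:

SELECT TOP 20 Duration, TextData, SPID, LoginName, HostName
FROM dbo.MergeTrace
WHERE EventClass = 10   -- 10 = RPC:Completed
ORDER BY Duration DESC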

After you have located any long running steps, you can retrieve the text of the executed command from the TextData column of the trace data.

Common Steps Involved with Partition Processing

The two replication stored procedure calls used in partitioned data processing are sp_MSsetupbelongs and sp_MSinitdynamicsubscriber. If either of these procedure calls is included in the longest running steps, it is likely that partition processing is the cause of your performance problems. Determine what articles are being processed by the procedure calls, and then determine the subset filter clause or join filter that is causing the step to take a long time to complete.

sp_MSsetupbelongs

The sp_MSsetupbelongs procedure determines which set of rows in each table should be replicated to the Subscriber. It does this by querying the partition views for each filtered table. If the partition views contain poorly optimized queries, sp_MSsetupbelongs takes longer than necessary to execute.

sp_MSsetupbelongs steps can be difficult to associate with a specific article (and partitioning expression) because the procedure is called with a list of generations (essentially version numbers for a batch of changes) that contain changes from several different articles. To further complicate the matter, the sp_MSsetupbelongs procedure uses several temporary tables that are not present after the merge process completes.

For this reason, you might need to experiment with different trace events, such as:

  • TSQL: Stmt Starting and TSQL: Stmt Completed

  • Scan:Started and Scan:Completed

  • Performance:ShowPlanStatistics, Performance:ShowPlanAll, and Performance:ShowPlanText

Use caution when including these events in a trace, because even a relatively simple merge process can generate millions of rows of trace information for these events. You might be able to filter the output, but even if you are able to do so, it can still be difficult to use these events successfully.

Modifying sp_MSsetupbelongs to Record Diagnostic Information

Because it is difficult to determine which articles are causing performance problems during sp_MSsetupbelongs processing, you can manually modify sp_MSsetupbelongs to add a step that records diagnostic information about the amount of time spent processing each article.

These suggestions are intended for diagnostic purposes and should only be used on test systems that simulate production processing. Instructions for reversing these changes are included at the end of the next section. This code example applies to SQL Server 2000, but the use of view_sel_proc in SQL Server 7.0 is generally the same.

Before modifying sp_MSsetupbelongs, create a table in the publication database that will be loaded with data during the execution of sp_MSsetupbelongs:

CREATE TABLE article_diagnostics
(
view_sel_proc sysname, 
start_stamp datetime default getdate()
)

sp_MSsetupbelongs has two main stages:

  1. It creates and populates temporary tables with information about the changes that need to be replicated.

  2. It obtains the changes on an article-by-article basis by querying the temporary tables that are joined with the published tables.

After sp_MSsetupbelongs is modified, it populates the article_diagnostics table that you created with the names of the articles being processed in the second step.

To modify sp_MSsetupbelongs, you must first locate the Replmerg.sql file in the INSTALL directory of your SQL Server installation, which is by default located at:

  • %systemdrive%\Program Files\Microsoft SQL Server\MSSQL\Install for a default instance, or

  • %systemdrive%\Program Files\Microsoft SQL Server\MSSQL$InstanceName\Install for a named instance

Copy the file with a new name, such as Newreplmerg.sql, and then open it with an editor. Find the section that loops through the articles; this section starts:

while (@artnick is not null)
   begin
      select   @artbaseobjid = objid, @procname = 
         view_sel_proc, @before_view_objid = before_view_objid,
            @before_table_objid = before_image_objid 
            from sysmergearticles 
            where pubid = @pubid 
            and nickname = @artnick

After the SELECT statement, paste the command:

INSERT INTO article_diagnostics (view_sel_proc) 
VALUES (@procname)

The view_sel_proc procedure is stored in the article_diagnostics table and can be used later to find the partition view name. The start_stamp column is loaded with the current time when each row is added to the table, which allows you to determine the amount of time from one view_sel_proc execution to the next.

After you have inserted this command, save the file (with its new name) and install the modified procedure by executing the script at the command line with the osql utility:

OSQL -Usa -P -S -n -inewreplmerg.sql

(You might need to adjust this command line as appropriate for the security settings of your SQL Server installation.)

Determining the Partitions

After you have created the table and modified sp_MSsetupbelongs, re-execute the merge process as usual (it is not necessary to run SQL Profiler). When it is done, review the article_diagnostics table to see the view_sel_proc names and the amount of time spent processing each partition.
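
As a sketch, the following query approximates the time spent on each article by measuring the gap between consecutive rows logged by the modified procedure; the last row has no following row, so its duration is returned as NULL:

SELECT d.view_sel_proc,
       d.start_stamp,
       DATEDIFF(ms, d.start_stamp,
                (SELECT MIN(d2.start_stamp)
                 FROM article_diagnostics d2
                 WHERE d2.start_stamp > d.start_stamp)) AS approx_duration_ms
FROM article_diagnostics d
ORDER BY d.start_stamp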

When you are performing the analysis, you must simulate actual data flow by performing inserts, updates, and deletes between each run of the Merge Agent. If there are no partition changes to process, sp_MSsetupbelongs is still called, but it does not step through the view select procedures for the articles that depend on the partitions.

Note: After your analysis is complete, you must reinstall the original version of the procedure by reinstalling the original Replmerg.sql file:

OSQL -Usa -P -S -n -ireplmerg.sql

sp_MSinitdynamicsubscriber

An initial merge is performed when the merge process is run for the first time with a new Subscriber. The function of an initial merge is to populate the Subscriber with the complete data set for that Subscriber. Subscribers to non-dynamic publications are initialized using a bulk copy program (bcp) operation that copies in data from files created during snapshot processing. This technique is very fast and is rarely associated with performance problems; it has few options for optimization.

On the other hand, for Subscribers subscribing to dynamically filtered publications, the snapshot process cannot determine the appropriate set of data for a Subscriber (unless you use a dynamic snapshot). Therefore, the merge process must populate the data set by querying the published tables using sp_MSinitdynamicsubscriber. This technique is much less efficient than initializing a Subscriber with bcp files and is often the cause of partition performance problems.

If your trace information indicates that an sp_MSinitdynamicsubscriber call is taking a relatively long time, you can see the article nickname as the second parameter of the executed procedure. This article nickname can be used to obtain the view select procedure (view_sel_proc) associated with the long step:

SELECT view_sel_proc
FROM sysmergearticles
WHERE nickname = 123456

Obtaining the Partition View Definition

The view_sel_proc procedure associated with the long running query is used to query the partition view and obtain the partitioned data set. Execute sp_helptext to see the view_sel_proc definition:

EXEC sp_helptext view_sel_proc

The view_sel_proc definition begins with several initialization steps, followed by two queries that refer to the partition view, as demonstrated in the following example:

SELECT @tablenick, v.[rowguid], coalesce(c.generation, 1),
       coalesce(c.lineage, @lin), coalesce(c.colv1, @cv), v.*
FROM [dbo].[CustPub_Customer_VIEW] v
LEFT OUTER JOIN dbo.MSmerge_contents c
    ON v.[rowguid] = c.rowguid AND c.tablenick = @tablenick
WHERE v.[rowguid] > @guidlast
ORDER BY v.[rowguid]

INSERT INTO #belong (tablenick, rowguid, flag,
    skipexpand, partchangegen, joinchangegen)
SELECT ct.tablenick, ct.rowguid, 0, 0, ct.partchangegen, ct.joinchangegen
FROM #contents_subset ct, [dbo].[CustPub_Customer_VIEW] v
WHERE ct.tablenick = @tablenick
    AND ct.rowguid = v.[rowguid]

The FROM clause of these two queries contains the partition view name; in this example, the view name is CustPub_Customer_VIEW.

Gathering Partition View Query Plans

When you know which partition view definition is causing performance problems, your goal is twofold:

  • To understand the view and how it represents the internal implementation of your filtering logic.

  • To determine how SQL Server is processing this view and what aspects of the view are causing it to run slowly.

To extract the query from the view definition, use sp_helptext, as follows:

EXEC sp_helptext CustPub_Customer_VIEW

After you have the text of the partition view, review the Transact-SQL definition of the view. You might be able to identify problems with the view that require changing the filtering expressions in the publication.

To gain information about how SQL Server processes this view, you must execute the query from the view definition with a special setting so that SQL Server shows the query plan. The query plan shows you how SQL Server breaks the query into individual steps, the percent of time spent on each step, and what technique it uses to execute each step.

To see the query plan, copy the SELECT statement from the view and paste it into the query window of SQL Query Analyzer. In SQL Query Analyzer, select Query, then select Show Execution Plan, and then execute the query. The query results window now includes an Execution Plan tab that shows a graphical representation of the query plan.

Executing the view's query should give you a reasonable simulation of the queries executed during merge processing. It does not, however, include the join with the MSmerge_contents or #contents_subset tables that you can see in the previous code sample. To see a more accurate simulation of the query, you can add further diagnostic steps to the procedure to save a copy of the MSmerge_contents and #contents_subset data to another table, which can then be used later when you derive the query plan.

If a partition contains dynamic expressions, you must ensure that the query being analyzed uses the same values for the dynamic expressions that the merge process uses. Otherwise, the partitioned data set you are analyzing might be significantly different from the merge process. For example, consider the following filtering expression:

SALESREP = SUSER_SNAME()

The merge process might run under the 'domain\bobjones' account, and you might be performing your analysis logged in as the system administrator. When you obtain query plans, either log in with the same account that is used during the merge process or manually extract the SELECT statement from the view and replace the SUSER_SNAME() value with the hard-coded literal value 'domain\bobjones'.
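
If you prefer a text-based plan that you can capture from a script instead of the graphical plan, the same substitution can be combined with SET SHOWPLAN_TEXT; the table and column names below follow the examples in this paper:

SET SHOWPLAN_TEXT ON
GO
-- The dynamic function is replaced with the literal value used by the merge process.
SELECT * FROM [dbo].[CUSTOMER] WHERE SALESREP = 'domain\bobjones'
GO
SET SHOWPLAN_TEXT OFF
GO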

Analyzing Query Plans

Although SQL Server uses a number of complex techniques to process the steps of a query, it basically processes the rows in a query in one of two ways: it performs a scan by stepping through each row in a table or clustered index, or it performs an index seek by going to a specific location in an index. Scans require a lot more overhead and processing time than index seeks, and cause more performance problems.

SQL Server only performs a scan when it cannot use an index for one of the following reasons: an index is not present; the WHERE clause cannot use the indexes that are present; or the data in the indexed column is not sufficiently unique.

When the longest step of a partition view query plan shows a scan, it is not using an index. The details of the plan step usually show the WHERE clause expression SQL Server is using when scanning the rows. This expression is a prime candidate for query optimization.

Using the partition examples listed earlier, this expression could be:

WHERE State = 'WA'

After you have found the query and expression that are creating a sub-optimal plan, you have several options for improving the query plan, as described in the next section.

Improving Performance by Improving Query Plans and Tuning Agent Properties

Before we address specific approaches to improving partition performance, it is worth re-iterating a point made in the introduction to this paper: replication performance depends on SQL Server performance, which in turn depends on effective database design.

Database design involves the table definitions and indexes present in the database, and how efficiently they support the needs of the application's business rules and replication processing. One of the most common and serious causes of performance problems with merge replication is poor database design: the design is either fundamentally flawed, or it does not successfully accommodate the needs of replication processing.

Even a database that is designed and tuned well without replication being installed can encounter performance problems if you then later install and use replication. It is possible to publish a small, simple database with little consideration of performance issues and have it run successfully indefinitely. But when you publish a large, complex database that will grow in size and scope, replication performance needs can often determine the success or failure of the application. Take into account replication needs in the very early stages of designing a new application; when you are adding replication to an existing database, you might need to change the database itself to accommodate the needs of replication.

Adding Indexes to Support Filtering Expressions

The simplest solution for improving partition performance is to add an index if one is not present. In the earlier example, the WHERE State = 'WA' expression used a table scan. If this column has no index, consider adding one. If your partition expression contains several conditions in the WHERE clause, you might need to add a compound index that covers the query. Consider an example where the expression is:

WHERE State = 'WA' AND
      City = 'Redmond' AND
      Postal_Code = 98056

Adding a compound index on all three columns covers the query and is more useful for this query than an index on a single column. For example:

CREATE INDEX cust_loc_index ON CUSTOMER (Postal_Code, City, State)

It is usually better to have the most selective columns listed first in compound indexes. The SQL Server Index Tuning Wizard might be able to provide specific recommendations for index changes that will improve the performance of the query.

Changing Filtering Expressions

If the filtering expression contains complex expressions with functions, multiple operators, and so on, SQL Server might not be able to use an index efficiently for the query. You should evaluate the expression to determine whether it can be simplified or changed to work more effectively with an index.

Using Query Hints

Most discussions of advanced query optimization include references to query hints, which can be used to override standard query optimization and force SQL Server to use the methods you specify. SQL Server usually selects the best execution plan for a query and can adjust the execution plan to accommodate changes in data that may affect the best plan for a query. For these reasons, you should generally avoid using hints in an attempt to improve a query. However, if other optimization techniques do not address your performance problems, query hints might be beneficial. But they should be used with the utmost caution and be tested extensively.

To use a query hint to improve the partition performance, alter the partition view definition to include the hint.
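
As a sketch only, the following alters the CUSTOMER partition view from earlier in this paper to force a hypothetical index named cust_state_index on the State column; keep in mind that replication generates these views, so a manual change can be lost if the publication or article is re-created:

ALTER VIEW [publication_CUSTOMER_VIEW] AS
SELECT * FROM [dbo].[CUSTOMER] [CUSTOMER] WITH (INDEX(cust_state_index))
WHERE (State = 'WA')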

Setting @keep_partition_changes

@keep_partition_changes is a true/false property associated with a publication; setting it to TRUE can improve the performance of partitioned data set processing. When a table is published with partitioned data sets, this setting determines whether changes to one Subscriber's data set impose partition maintenance overhead on the other Subscribers' partitions.

For example, if there are two Subscribers, one for State = 'WA' and another for State = 'OR' and a row in the data table with State = 'WA' is updated to State = 'CA', this row must be removed from the State = 'WA' partition.

If @keep_partition_changes is FALSE (default), the merge replication process recognizes that the partitioned data sets have changed, but not which partition has changed. Therefore, the sp_MSsetupbelongs procedure, which populates the Subscribers' data sets at the Publisher, must run for each Subscriber to determine whether the Subscriber has rows that must be removed, even if the Subscriber's partition never contained the moved row in the first place.

If @keep_partition_changes is TRUE, merge replication stores extra data in its system tables to track which Subscribers will be affected by partition changes. This means that only the State = 'WA' Subscriber will be involved in the partition cleanup operation.

The tradeoff for this option is that when it is TRUE, the Publisher must store and maintain a tracking table. If there are a large number of Subscribers with discrete partitions and a small amount of shared data, this cost is probably justified. If there are a smaller number of Subscribers or if the Subscribers share more partitioned data, it is more likely that the partitioned data set of each Subscriber will need to be updated based on changes made by the other Subscribers. The cost of the extra system table data might not be justified.
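
Extending the earlier publication-creation sketch (the publication name CustPub is again a placeholder), the option is specified when the publication is added:

EXEC sp_addmergepublication 
    @publication            = 'CustPub',
    @dynamic_filters        = 'true',
    @keep_partition_changes = 'true'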

Using Dynamic Snapshots

When a publication does not use dynamic partitioning, the snapshot process creates a set of data files that the merge process can load quickly into each Subscriber during the first merge run.

However, if dynamic partitions are used in the publication, the snapshot process cannot create a single set of data files that are appropriate for Subscribers with different partitioned data sets. In this case, the merge process must query the Publisher database by using the dynamic logic for each Subscriber and then load the data into the Subscriber. This process is much less efficient than copying in data files, and it can degrade performance when you have large data sets and complex partitioning expressions.

SQL Server 2000 introduces dynamic snapshots, which allow the snapshot process to create separate sets of Subscriber-specific data files. The merge process can then load these files quickly into the Subscriber, without having to query the Publisher for the data. For more information, see the topic "Dynamic Snapshots" in SQL Server Books Online.

Changing the Database and Filtering Design

If the previous optimization techniques have not yielded the required performance improvement, it might be necessary to redesign the database to better fit the partitioning needs of replication.

Redesigning the database is the most extreme solution for improving merge replication performance and is usually much easier if done early in the project when there are fewer dependencies on the current design. If serious performance problems are discovered only after the application is in production, there could be significant limitations with respect to how the database can be structurally changed. Therefore, investigate design changes that have minimal impact on or are even completely transparent to the other components of the application. Two common approaches are the use of mapping tables and the denormalization of the database.

Mapping Tables

Sometimes the structure of your base tables does not lend itself to efficient partitioning. For example, if your CUSTOMER table needs to be partitioned for each sales representative, you might be inclined to add the SalesRep column to the table and use this column as a dynamic filter. But you might need additional logic that cannot be implemented and indexed efficiently, such as partitioning that requires OR operator logic, complex expressions, or complex cross-table logic. Then you might not be able to use filtering expressions that are based on the existing tables and take advantage of indexes. However, you can add a mapping table that associates the customers with their sales representatives. You can use the dynamic subset filter clause expression on the mapping table and use a join filter to associate the CUSTOMER table with the mapping table, as shown below:

CREATE TABLE CUSTOMER
(CustID INT PRIMARY KEY.........)
CREATE TABLE CUSTOMER_REP_MAPPINGS
(MappingID INT PRIMARY KEY,
SalesRep VARCHAR(40),
CustID INT)

With these table definitions, the partitioning expression for the CUSTOMER_REP_MAPPINGS article is:

SalesRep = SUSER_SNAME()

A join filter is defined between the two articles by using the expression:

CUSTOMER.CustID = CUSTOMER_REP_MAPPINGS.CustID

For an example of a mapping table used in a more complex replication topology, see the section "Reducing the Number of Subscribers by Implementing a Hierarchy of Publishers" later in this paper.

Denormalization

When you use a hierarchical partitioning technique similar to the CUSTOMER, ORDER, and ORDER_ITEM tables used as examples earlier in this paper, the table at the top of the relationship (CUSTOMER) usually has the subset filter clause. The other tables that are related to the CUSTOMER table through constraints have join filters that identify the relationships. This approach is known as following the references.

When this hierarchy grows larger, the subset filter clause logic grows farther from the lower level tables, and the partition views can become more complex, as shown in this example:

CREATE TABLE CUSTOMER
(CustID INT PRIMARY KEY, 
 SalesRep VARCHAR(40)...)
CREATE TABLE ORDER
(OrderID INT PRIMARY KEY,
 CustID INT FOREIGN KEY 
      REFERENCES CUSTOMER(CustID) ...)
CREATE TABLE ORDER_ITEM
(OrderItemID INT PRIMARY KEY,
 OrderID  INT FOREIGN KEY
      REFERENCES ORDER(OrderID)....)

The CUSTOMER article usually contains this subset filter clause:

CUSTOMER.SalesRep = SUSER_SNAME()

There is also a join filter between the CUSTOMER and ORDER table and a join filter between the ORDER and ORDER_ITEM table.

It might be possible to denormalize the table design by carrying the columns used in the subset filter clause in some of the lower level tables, as shown here:

CREATE TABLE CUSTOMER
(CustID INT PRIMARY KEY, 
SalesRep VARCHAR(40)...)
CREATE TABLE ORDER
(OrderID INT PRIMARY KEY,
 CustID INT,
 SalesRep VARCHAR(40),
   FOREIGN KEY (CustID, SalesRep)     
   REFERENCES   
   CUSTOMER(CustID, SalesRep)...)
CREATE TABLE ORDER_ITEM
(OrderItemID INT PRIMARY KEY,
 OrderID  INT,
 SalesRep VARCHAR(40),
   FOREIGN KEY (OrderID, SalesRep)     
   REFERENCES   
   ORDER(OrderID, SalesRep)...)

With this type of table structure, it is not necessary to use join filters between the table articles. Each article is defined with the subset filter clause of:

SalesRep = SUSER_SNAME()

This design is not normalized, according to database design theory, because the SalesRep columns are redundant and should be included only in the top-level table. But the cost of somewhat denormalizing the tables might be outweighed by the performance benefits for replication partitioning.

This technique simplifies the partitioning views, but it is not always more efficient than following the references. The efficiency of the technique depends on several application-specific and database-specific factors and, like all optimization techniques, should be thoroughly tested.

Additional Issues and Considerations for Optimizing Merge Replication Performance

Improving the performance of any database application involves identifying processing bottlenecks. The bulk of this paper focuses on the bottlenecks associated with partition evaluation at the Publisher. But the overall performance of merge replication is affected by several factors, any of which can be a bottleneck. This section lists a number of these factors and provides suggestions about how you can address them or investigate them further.

Processing Resources Available to the Merge Processes

As the processing requirements for a server increase due to more data, an increase in the number of simultaneous users, and so on, the Publisher or Subscriber hardware might not be adequate for the job. Evaluating hardware performance usually includes running the Microsoft Windows® System Monitor tool, which has performance counters for processor, memory, disks, and so on. You may be able to identify a specific component of your server hardware that can be upgraded, or you might need to upgrade the entire server computer to a more current model or faster class of computer.

The Throughput of the Publisher/Subscriber Connection

Depending on the volume of data and the complexity of the database, merge processing can send a lot of data over the network. The throughput of the connections in your merge replication topology can be a factor in the overall performance of merge replication.

The Publisher and Subscribers should be connected through the highest-speed connection possible. If you are using slow RAS connections, consider taking advantage of DSL or other high-bandwidth technologies.

Performance problems with slow connections are usually related to the initialization of a new Subscriber, especially when a large data set is sent to the Subscriber. If you are seeing performance problems in the initial merge, you can pursue alternative methods for delivering the initial data set to a new Subscriber. For more information, see the topic "Applying the Initial Snapshot" in SQL Server Books Online.

Setting the @keep_partition_changes publication property to TRUE can avoid unnecessary partition re-evaluation, which can significantly reduce the amount of data sent over the network during merge processing. Configuring your subscriptions as global or anonymous instead of local can also be beneficial because less replication metadata is sent over the network.
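
For example, the following is a minimal sketch of creating a publication with this option enabled; the publication name is an assumption, and other publication options are omitted:

-- Sketch only; the publication name and the omitted options are assumptions.
EXEC sp_addmergepublication
   @publication = N'SalesPub',
   @keep_partition_changes = N'true'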

The Type of Subscriptions

Merge Agents for push subscriptions are executed at the Publisher, which is appropriate for publications with a small number of Subscribers. If you have a large number of push Subscribers running simultaneously, the processing resources (memory, CPU cycles) for the agents are imposed on the Publisher hardware, leaving fewer resources available for SQL Server and other applications. If you use pull subscriptions or anonymous subscriptions, the processing resources for the Merge Agents are imposed on the Subscriber computers rather than on the Publisher hardware.

The Number of Simultaneous Merge Replication Processes

Performance can also degrade when multiple Subscribers are merging changes simultaneously. Faster hardware and more efficient replication configuration can address this issue to a certain degree. But even in a perfectly designed and tuned application, every Publisher server has an upper limit with respect to the number of simultaneous Subscriber merge processes it can handle effectively.

One solution to this problem is to coordinate the timing of the Subscribers' merge processes so that they are staggered. If all your Subscribers currently connect at the beginning or end of their work day to merge changes, you can encourage or require some users to perform their merge processes at off-peak times during the day.

If you have no control over when Subscribers perform their merge processes, you can define an upper limit on the number of Subscribers that are allowed to merge at the same time. For more information, see the @max_concurrent_merge publication property of sp_addmergepublication in SQL Server Books Online.
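
For example, a minimal sketch of setting this limit when the publication is created; the publication name and the limit of five are assumptions:

-- Sketch only; allows at most five Merge Agents to run against this
-- publication at the same time.
EXEC sp_addmergepublication
   @publication = N'SalesPub',
   @max_concurrent_merge = 5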

Reducing the Number of Merges at a Single Publisher by Implementing a Hierarchy of Publishers

Another way to reduce the demands on a Publisher server is to spread the processing load across more servers. A common technique is to use several servers in a republishing hierarchy. For example, if you have a single server that acts as the Publisher for all of the sales representatives in the country, you could add two more Publisher servers:

  • The Central Publisher publishes data to East and West Subscribers.

  • The East Subscriber republishes data to the sales representatives on the East coast.

  • The West Subscriber republishes data to the sales representatives on the West coast.

The mapping tables technique described earlier can be used to define how data is partitioned for each Subscriber. In that example, the mapping table created an association between customers and sales representatives. To use the same technique for partitioning in a hierarchy, you can use a column, such as ReplHostName, that identifies the HOST_NAME() value of each Subscriber (whether a republisher or a sales representative) that should have the customer in its partition, as follows:

CREATE TABLE CUSTOMER
(CustID INT PRIMARY KEY, ...)
CREATE TABLE CUSTOMER_PARTITION_MAPPINGS
(MappingID INT PRIMARY KEY,
 ReplHostName VARCHAR(40),
 CustID INT)

CUSTOMER_PARTITION_MAPPINGS data

MappingID   ReplHostName   CustID
---------   ------------   ------
1           East           1
2           BobJones       1
3           East           2
4           JaneGray       2
5           West           3
6           JimBrown       3

In this example, each customer needs two rows: one row to identify the regional republisher partition and a second row to identify the SalesRep partition. The BobJones and JaneGray Subscribers subscribe to the East republisher, and the JimBrown Subscriber subscribes to the West republisher. The Merge Agent for each Subscriber must use the -Hostname value appropriate for its subscription: East, West, BobJones, JaneGray, or JimBrown.

At first glance, one might think that the dynamic filtering expression should be in the subset filter clause for the CUSTOMER_PARTITION_MAPPINGS article:

CUSTOMER_PARTITION_MAPPINGS.ReplHostName = HOST_NAME()

And a join filter would represent the join between the two tables:

CUSTOMER.CustID = CUSTOMER_PARTITION_MAPPINGS.CustID

However, if this logic is used, the East republisher would get only the CUSTOMER_PARTITION_MAPPINGS rows (and the related customers) where ReplHostName = 'East', that is, the rows with MappingID 1 and 3. The BobJones Subscriber would then not get any data, because there are no CUSTOMER_PARTITION_MAPPINGS rows at the East republisher for BobJones.

The implication for this publication is that the entire, unfiltered CUSTOMER_PARTITION_MAPPINGS table must be replicated to all of the Subscribers. This means that the dynamic filtering logic must be in the join filter expression rather than in the subset filter clause. The CUSTOMER_PARTITION_MAPPINGS article would have no subset filter clause, and the join filter for the CUSTOMER article would be:

CUSTOMER.CustID = CUSTOMER_PARTITION_MAPPINGS.CustID AND
CUSTOMER_PARTITION_MAPPINGS.ReplHostName = HOST_NAME()
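
For illustration, a minimal sketch of registering this join filter with sp_addmergefilter follows; the publication and filter names are assumptions, and other options are omitted:

-- Sketch only; the CUSTOMER_PARTITION_MAPPINGS article is published without a
-- subset filter clause, and the dynamic logic is carried in the join filter.
EXEC sp_addmergefilter
   @publication = N'SalesPub',
   @article = N'CUSTOMER',
   @filtername = N'CUSTOMER_to_PARTITION_MAPPINGS',
   @join_articlename = N'CUSTOMER_PARTITION_MAPPINGS',
   @join_filterclause = N'CUSTOMER.CustID = CUSTOMER_PARTITION_MAPPINGS.CustID
      AND CUSTOMER_PARTITION_MAPPINGS.ReplHostName = HOST_NAME()',
   @join_unique_key = 0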

Using this filtering technique has two benefits:

  • It does not require different filtering expressions for each publication; the same filtering expressions can be used for all Publishers.

  • The partitions are defined by data in the mapping table, which means the approach is flexible. The republishing hierarchy can be made either wider or deeper to accommodate future growth needs.

For example, if the number of Subscribers to the East republisher grows beyond the capacity of the East server, you can make the hierarchy wider by adding a SouthEast republisher. This requires:

  • Adding a SouthEast Subscriber to the Central Publisher.

  • Creating a publication at SouthEast to make it a republisher.

  • Updating the ReplHostName to SouthEast for the southeast customers in the CUSTOMER_PARTITION_MAPPINGS table (see the sketch after this list).

  • Removing the southeast Subscribers from the East publication and adding them to the SouthEast publication.
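
The repartitioning step in the list above is ordinary data modification. The following hypothetical sketch moves the southeast customers to the new republisher by changing the mapping data; the list of southeast customer IDs is an assumption:

-- Sketch only; repartition customers by updating the mapping rows.
UPDATE CUSTOMER_PARTITION_MAPPINGS
SET ReplHostName = 'SouthEast'
WHERE ReplHostName = 'East'
  AND CustID IN (2)  -- hypothetical list of southeast customer IDs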

To make the hierarchy deeper, another city layer can be added. This requires:

  • Adding NewYork and Boston Subscribers to the East republisher and creating the republishing publications for New York and Boston.

  • Adding rows to the CUSTOMER_PARTITION_MAPPINGS table for the NewYork and Boston customers.

  • Removing the BobJones and JaneGray subscribers from the East republisher.

  • Adding BobJones as a Subscriber to the NewYork republisher and adding JaneGray as a Subscriber to the Boston republisher.

In either redesign of the merge topology, the next merge process for each Subscriber automatically updates the data sets.

The Number of Retries During Merge Processing

When the Merge Agent is processing changes and it encounters a problem, it might attempt to apply the change again at the end of the process. The rows that are retried are processed much less efficiently than rows that are processed normally.

Problems in applying changes are most often caused when there are primary key / foreign key references defined in the tables. These problems include the following:

  • The merge process tries to insert a foreign key row before the related primary key row is present.

  • The partitioning logic differs from the referential integrity logic, so the partition might include foreign key rows whose related primary key row is not in the partitioned data set; these foreign key insert failures are always retried at the end of the merge process.

To detect merge retries, define and run a SQL Profiler trace for the merge process that is filtered to include the sp_MSenumretries and sp_MSgetonerow procedure calls.

Tables Containing Text or Image Columns

If your published table includes text or image columns, these columns can impose extra processing requirements on the merge replication process, even if the text or image column is not changed. If the table is regularly updated, consider optimizing your design by moving the text and image columns to another table. For example, if you have a table such as:

CREATE TABLE CUSTOMER
(CustID INT PRIMARY KEY,
.
.
CustComments TEXT)

Moving the large columns to a separate table (sometimes called overnormalizing) could be implemented as:

CREATE TABLE CUSTOMER
(CustID INT PRIMARY KEY,
.
.
)
CREATE TABLE CUSTCOMMENTS
(CommentID INT PRIMARY KEY,
 CustID INT FOREIGN KEY
    REFERENCES CUSTOMER(CustID),
 CustComment TEXT)

The Size of the Merge Replication System Tables

Merge replication tracks data changes primarily in the MSmerge_contents system table. Because this table records changes for all the tables in the database that are published in merge publications, it can grow very large, which can have a significant impact on the performance of merge replication.

To control the size of the system tables, regularly remove outdated or unnecessary data from the MSmerge_contents table. If you are using SQL Server 2000 Service Pack 1 or later, the retention-based metadata cleanup process automatically removes metadata for changes that are older than the @retention property of the publication. Reducing the @retention property to the lowest value acceptable for the needs of the application can dramatically improve the performance of merge applications when you have many users or large volumes of changes.
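
For example, a minimal sketch of lowering the retention period on an existing publication; the publication name and the 14-day value are assumptions:

-- Sketch only; Subscribers that do not merge within the retention period
-- must be reinitialized, so choose the lowest value the application can tolerate.
EXEC sp_changemergepublication
   @publication = N'SalesPub',
   @property = N'retention',
   @value = N'14'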

In addition to controlling the size of the system tables, consider whether it is necessary to replicate all of the tables with merge replication. For example, reference or lookup data is usually relatively static and often better suited to snapshot replication, which does not involve the merge replication system tables. You could also use a custom technique to update the tables independently of replication.

If the previous techniques are insufficient to control the size of the system tables, you can separate published data and put it into publications in different databases. Putting the data into different publications in the same database does not improve performance because one MSmerge_contents table is used for all publications within a database. This technique assumes that the data can be logically separated. (Remember that all data related through join filters must be in the same publication.)

The Overall Volume of Data in Published Tables

In general, tables with large numbers of rows require more processing resources than smaller tables. One of the goals of partitioning, as discussed earlier in this document, is to reduce the size of the tables in the Subscriber databases. But you might also be able to reduce the size of the published tables by moving outdated data to archive tables or databases.

The Volume of Changes Replicated in Each Merge Process

The volume of changes that are made in each merge process must be weighed against the frequency of merge processing. A Subscriber that merges less often has more changes to process than a Subscriber that merges more frequently. You can control to some degree the volume of changes replicated in each merge by having the Subscribers merge more frequently. As mentioned earlier, you can also reduce the @retention property for the publication, which prevents Subscribers from getting too far behind.

Merge Agent Properties

These properties all have performance implications with respect to the merge process and can be adjusted either through the profile for the Merge Agent or by manually changing the properties of the Merge Agent job steps in SQL Server Enterprise Manager.

-Continuous, -PollingInterval

The -Continuous property forces the merge process to loop indefinitely through the upload and download stages of processing without disconnecting and reconnecting with each loop. Although it does not affect partition processing itself, it does allow the merge process to remain more up-to-date when there are ongoing changes to replicate. Remaining current can help avoid replicating a huge number of changes in infrequent merges.

You can also adjust the frequency of merge processing by adjusting the -PollingInterval setting of the Merge Agent job schedule. But when the application needs near real-time latency and the Publisher receives regular, ongoing changes from a Subscriber, using -Continuous is usually more appropriate because it avoids the connect/initialize/disconnect overhead of having the agent job scheduled to run every few minutes.

-Validate, -ValidateInterval

The -Validate property specifies whether the data at the Publisher and the Subscriber is compared after all changes have been merged, to ensure that the two are in sync. This extra processing can take a long time to complete and can require significant query processing resources. Use the -ValidateInterval property to control how often validation is performed when the merge process runs continuously.

-HistoryVerboseLevel, -OutputVerboseLevel

These properties control the amount of information that is recorded by the merge process. Reducing the values can improve performance when a large number of Subscribers are exchanging relatively small sets of data changes.

-MaxBcpThreads

This property controls how many threads are used during bcp operations. This setting only affects the initialization of a new Subscriber.

-SrcThreads, -DestThreads

These properties control how many threads can be used to process the articles in a publication. Increasing these values on servers that have a large amount of memory and processing resources allows the merge process to perform more activities in parallel.

-MaxDownloadChanges, -MaxUploadChanges, and Batch-size Parameters

These settings all control how many changes the merge process handles at one time:

  • -MaxDownloadChanges and -MaxUploadChanges

  • -UploadGenerationsPerBatch and -DownloadGenerationsPerBatch

  • -UploadReadChangesPerBatch and -DownloadReadChangesPerBatch

  • -UploadWriteChangesPerBatch and -DownloadWriteChangesPerBatch

When there are a large number of changes to process and there is a high-speed connection between the Publisher and Subscriber, increasing these values can result in higher throughput because the merge process then has less per-batch overhead.

When there is a low-speed or unreliable connection between the Publisher and Subscriber, decreasing these values can help ensure that smaller batches are completed entirely, avoiding the costly retry operations associated with incomplete batches.

-ExchangeType

The default for this property (-ExchangeType 3) forces the merge process to upload Subscriber changes and then download Publisher changes. You can use this property to specify upload only (-ExchangeType 1) or download only (-ExchangeType 2).

Specifying 1 or 2 might be useful when there are a large number of changes from several Subscribers and the changes from each Subscriber affect the data sets of other Subscribers. In this situation, each Subscriber uploads its changes (using -ExchangeType 1) but does not download any changes. When all Subscribers have uploaded their changes, they can all then download the combined set of changes by using -ExchangeType 2. This approach reduces the overhead incurred when performing several bidirectional merges to apply all Subscribers' changes.

Conclusion

Merge replication is an ideal solution for many distributed applications, particularly when data is updated at multiple sites and disconnected Subscribers are used. However, for merge replication to be effective:

  • It must be implemented in a manner consistent with efficient partition query processing.

  • The database around which applications are built must be carefully designed and tuned with replication in mind.

Because merge replication uses standard SQL Server queries, procedures, tables, and views, you can observe how merge replication processes, particularly partitioning processes, use these objects and then tune and adjust the applications accordingly.

Partitioning data is fundamental to many merge replication applications; it can be used to enforce business rules and to improve performance. However, if it is not implemented correctly, it can also become a major source of performance problems. Following the processes and suggestions outlined in this paper can help you achieve scalable, high-performance merge replication applications.