Offloading Batch Processing

Some applications require processing-intensive batch operations to be performed on their data. These operations often cannot run on the online transaction processing (OLTP) server, because the processing overhead interferes with other operations on that server; in such cases, the batch processing must be performed on a separate server. Sometimes the batch processing is simply offloaded; at other times, the results of the batch process are propagated back to the online processing server.

The following diagram shows a typical scenario with data replicated to a batch processing server:

Replicating data for batch processing

Adventure Works Cycles Example

Adventure Works Cycles is a fictional manufacturing company used to demonstrate database concepts and scenarios. For more information, see AdventureWorks Sample Databases.

Adventure Works Cycles uses batch processing to check for credit card fraud on its website. Data collected from website transactions is replicated from the instance of Microsoft SQL Server that services the website to a separate instance that is used for a number of Adventure Works Cycles applications. On the batch processing server, the data is checked for patterns of credit card fraud. Although the fraud detection produces only a small amount of data (it updates a small number of columns if an account shows suspicious activity), the checks are computation-intensive and require substantial server resources. After the batch process runs, a small amount of data is sent back to the OLTP server for the website, indicating any accounts that show possible signs of fraud.

Common Requirements for This Scenario

Batch processing applications typically have the following requirements, which an appropriate replication solution must address:

  • The system must maintain transactional consistency.

  • The system should have low latency: updates at the online processing server must reach the batch processing server quickly.

  • The system should have high throughput: it should handle the replication of a large number of transactions.

  • Replication processing should require minimal overhead on the online processing server.

  • Data changes might flow in both directions: the results of batch processing might be propagated back to the online processing server.

  • The data required at the batch processing server might be a subset of the data available at the online processing server.

The Type of Replication to Use for This Scenario

SQL Server uses a publishing industry metaphor to describe the components of the replication system. The components include the Publisher, Subscribers, publications and articles, and subscriptions.

  • In the diagram above, the online processing server is the Publisher. Some or all of the data at the online processing server is included in the publication, with each table of data being an article (articles can also be other database objects, such as stored procedures). The batch processing server is a Subscriber to the publication, receiving schema and data as a subscription.

  • If results are propagated back to the online processing server, the batch processing server is also a Publisher (typically with a publication identical to the one at the online processing server) and the online processing server subscribes to this publication.

For more information on the components of the system, see Replication Publishing Model Overview.

SQL Server offers different types of replication for different application requirements: snapshot replication, transactional replication, and merge replication. This scenario is best implemented with transactional replication, which is well suited to handle the requirements outlined in the previous section. For more information on transactional replication, see Transactional Replication Overview and How Transactional Replication Works.
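
For example, the publication for this scenario could be created at the online processing server by using the replication stored procedures. The following is a minimal sketch that assumes the Distributor and distribution database are already configured; the database, publication, schema, and table names (WebSalesDB, WebSalesPub, Sales.CreditCardTransaction) are hypothetical placeholders, not objects from the AdventureWorks sample database.

    -- Run at the Publisher (the online processing server).
    -- Enable the database for publishing.
    USE WebSalesDB;
    EXEC sp_replicationdboption
        @dbname = N'WebSalesDB',
        @optname = N'publish',
        @value = N'true';

    -- Create a transactional publication that propagates changes continuously.
    EXEC sp_addpublication
        @publication = N'WebSalesPub',
        @status = N'active',
        @repl_freq = N'continuous',
        @allow_push = N'true';

    -- Publish a table as an article. Other objects, such as stored procedures,
    -- can also be published as articles.
    EXEC sp_addarticle
        @publication = N'WebSalesPub',
        @article = N'CreditCardTransaction',
        @source_owner = N'Sales',
        @source_object = N'CreditCardTransaction',
        @type = N'logbased';

A Snapshot Agent job, typically created with sp_addpublication_snapshot, is also required so that new subscriptions can be initialized.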

By design, transactional replication addresses the principal requirements for this scenario:

  • Transactional consistency

  • Low latency

  • High throughput

  • Minimal overhead

The options to consider for this scenario are filtering, peer-to-peer transactional replication, and bidirectional transactional replication.
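
If the batch processing server requires only a subset of the rows at the online processing server, a static row filter can be added to an article. The following sketch filters the hypothetical article from the previous example; the filter column (RequiresFraudCheck) and the generated object names are illustrative only.

    -- Run at the Publisher on the publication database.
    -- Create the filter stored procedure for the article; only rows that
    -- satisfy the filter clause are replicated to Subscribers.
    EXEC sp_articlefilter
        @publication = N'WebSalesPub',
        @article = N'CreditCardTransaction',
        @filter_name = N'FLTR_CreditCardTransaction',
        @filter_clause = N'RequiresFraudCheck = 1';

    -- Create the view that defines the filtered article; it is used as the
    -- source of schema and data for the Subscriber.
    EXEC sp_articleview
        @publication = N'WebSalesPub',
        @article = N'CreditCardTransaction',
        @view_name = N'VIEW_CreditCardTransaction',
        @filter_clause = N'RequiresFraudCheck = 1';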

Steps for Implementing This Scenario

To implement this scenario, you must first create a publication and subscriptions, and then initialize each subscription.
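
For example, a push subscription for the batch processing server could be created at the Publisher as shown in the following sketch. The Subscriber server and database names (BATCHSRV, BatchProcessingDB) are hypothetical; with @sync_type = N'automatic', the subscription is initialized from a snapshot generated by the Snapshot Agent.

    -- Run at the Publisher on the publication database.
    EXEC sp_addsubscription
        @publication = N'WebSalesPub',
        @subscriber = N'BATCHSRV',
        @destination_db = N'BatchProcessingDB',
        @subscription_type = N'Push',
        @sync_type = N'automatic';

    -- Create the Distribution Agent job that delivers transactions
    -- from the distribution database to the Subscriber.
    EXEC sp_addpushsubscription_agent
        @publication = N'WebSalesPub',
        @subscriber = N'BATCHSRV',
        @subscriber_db = N'BatchProcessingDB',
        @subscriber_security_mode = 1;  -- 1 = Windows Authentication

If results are propagated back to the online processing server, the same steps are repeated in the other direction, with the batch processing server acting as the Publisher.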

After the subscription is initialized and data is flowing between the Publisher and Subscribers, you might need to perform common administration and monitoring tasks.
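
For example, the status and latency of the subscription could be checked at the Distributor with sp_replmonitorhelpsubscription, as in the following sketch; the Publisher server name (WEBSRV) and the database and publication names are the hypothetical placeholders used in the earlier examples. Replication Monitor presents the same information graphically.

    -- Run at the Distributor on the distribution database.
    USE distribution;
    EXEC sp_replmonitorhelpsubscription
        @publisher = N'WEBSRV',
        @publisher_db = N'WebSalesDB',
        @publication = N'WebSalesPub',
        @publication_type = 0;  -- 0 = transactional publication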