Batch Processing (OLTP)---a Technical Reference Guide for Designing Mission-Critical OLTP Solutions
Want more guides like this one? Go to Technical Reference Guides for Designing Mission-Critical Solutions.
Batch (sometimes also called Background or Asynchronous) processing is generally characterized by the absence of user interactivity. Few systems are purely transactional or batch-oriented; even systems that are designed primarily as batch systems may have an online component. For example:
Most month-end or year-end application processing is performed in batches.
Utility (electric, gas) bills are generally produced via batch processing for mailing to clients, while current month usage computation for review by the online user is likely to be computed independently online and not via batch processing, even though most of the application logic may be very similar in batch and online processing.
The following resources provide additional information about batch (or background or asynchronous) processing. (Note that the full URLs for the hyperlinked text are provided in the Appendix at the end of this document.)
Batch processing may require scheduling help. The SQL Server 2008 Books Online article Implementing Jobs1 describes how SQL Server Agent can be used for scheduling jobs.
Consider implementing Message Queuing and SQL Server Service Broker for asynchronous processing. The article Building Reliable, Asynchronous Database Applications Using Service Broker2 discusses the advantages of including an asynchronous, reliable messaging feature in a SQL Server database. For further information on Service Broker, consult the "Scale-Out: Asynchronous" Technology Reference Guide in this series.
The Microsoft Developer Network (MSDN) article Offloading BatchProcessing3 demonstrates how replication can be used to offload batch processing workloads.
The bulk loading of data with a concurrent workload is much more common in a data warehousing environment, but it can be applicable in a transactional batch environment as well. The article Bulk Loading Data into a Table with Concurrent Queries4 may be a useful resource.
Switching to the bulk-logged recovery model, as discussed in Operations That Can Be Minimally Logged5, enhances the performance of certain operations that are used in asynchronous data load processing.
Consider using partitioned tables for large extraction, transformation, and load (ETL) operations. Loading data to an empty partition, creating the indexes, and switching the partition into the table can be faster than normal insert operations by orders of magnitude. For more information, see the following articles:
Case Studies and References
Examples of successful architectures are described in the following documents:
Questions and Considerations
This section provides questions and issues to consider when working with your customers.
A thorough understanding of the system workload helps drive the architecture and design of the system, including aspects that may be appropriate for asynchronous or batch processing.
Define potential boundaries between online and background processing by interviewing stakeholders, such as the customer’s business analysts and architects. Consider designs that make use of common code paths and components for both batch and online processing, yet are consistent with performance and scalability goals.
Consider the tradeoffs between using an entirely online system versus introducing batch processes. Avoid unnecessary complexity introduced by mixing the two approaches.
In some cases, batched background processing can be more efficient than singleton transactions. Generally, batched transactions use fewer resources per unit of work because the overhead associated with transaction initiation and termination is amortized over a larger number of transactions. When using batched background processing, incorporate batch sizes that are consistent with other concurrent operations. For example, work on a few thousand rows at a time to mitigate lock escalation (for example, to table locks).
In other cases, batch processing can be very resource intensive because of the nature of the queries. In our experience, batch processing may involve queries that access or aggregate large amounts of data, which requires parallel execution plans and introduces the risk of negatively affecting transaction response times. If you use online transactional processing in this case, be sure that the queries do not introduce significant resource consumption, which would impact the transactions, and ensure that the queries do not introduce blocking. We have observed that the Resource Governor, although useful in most cases, is unable to effectively manage CPU consumption when large queries could interfere with short duration queries.
Be careful in designing and implementing a "queue" for asynchronous processing. If inefficient application polling is used to select work for the asynchronous processing, an unnecessary burden may be placed on the server. It is better to use an efficient mechanism like SQL Server Service Broker.
ETL processes are usually batch driven. See the Data Warehouse collection of the Technical Reference Guides.
Design the processing window carefully so that daily maintenance procedures are not affected.
Following are the full URLs for the hyperlinked text.
1 Implementing Jobshttp://msdn.microsoft.com/en-us/library/ms187880.aspx
2 Building Reliable, Asynchronous Database Applications Using Service Brokerhttp://msdn.microsoft.com/en-us/library/ms345113(SQL.90).aspx
3 Offloading Batch Processinghttp://msdn.microsoft.com/en-us/library/ms151801.aspx
4 Bulk Loading Data into a Table with Concurrent Querieshttp://sqlcat.com/technicalnotes/archive/2009/04/06/bulk-loading-data-into-a-table-with-concurrent-queries.aspx
5 Operations That Can Be Minimally Loggedhttp://msdn.microsoft.com/en-us/library/ms191244.aspx
6 Partitioned Tables and Indexes in SQL Server 2005http://msdn.microsoft.com/en-us/library/ms345146(SQL.90).aspx
7 Partitioned Table and Index Strategies Using SQL Server 2008http://msdn.microsoft.com/en-us/library/dd578580(v=SQL.100).aspx
8 MySpace: MySpace Uses SQL Server Service Broker to Protect Integrity of 1 Petabyte of Datahttp://www.microsoft.com/casestudies/Microsoft-Windows-Server-2003-Enterprise-x64-Edition/MySpace/MySpace-Uses-SQL-Server-Service-Broker-to-Protect-Integrity-of-1-Petabyte-of-Data/4000004532
9 Washington State: Washington State Saves Millions of Dollars Creating its Voter Registration Databasehttp://www.microsoft.com/casestudies/Microsoft-Windows-Server-2003-Enterprise-Edition-32-Bit-X86/Washington-State/Washington-State-Saves-Millions-of-Dollars-Creating-its-Voter-Registration-Database/4000002511