Scale-Out (OLTP): A Technical Reference Guide for Designing Mission-Critical OLTP Solutions
Want more guides like this one? Go to Technical Reference Guides for Designing Mission-Critical Solutions.
You can choose to scale out applications in an installation for various reasons. For this discussion, we broadly classify the reasons into two categories: scale-out for throughput (or scalability), and scale-out for occasionally connected devices or to meet an organizational need (for reasons other than throughput or scalability).
As technology has evolved over the last few decades to provide more processing power at a lower cost, the need to scale out to meet throughput requirements has continually declined. Additionally, compared to scale-up systems, scale-out systems are more complex to develop, deploy, and maintain, requiring a more highly skilled staff. For example, meeting disaster recovery service-level objectives requires more work for scaled-out systems because of the additional coordination necessary. Similarly, upgrading to newer versions of applications, schemas, or system software is more complex in a scaled-out, distributed environment. (For a list of the various trade-offs, refer to the Wikipedia article, “Distributed database.”1) The response time for a scaled-out transaction can be lower or higher than that of a transaction that is not scaled out, depending on many factors, including query patterns, data distribution, optimizer sophistication, and network interconnect. These factors have kept scale-up as the predominant option for IT professionals.
However, there continues to be a desire to use lower-cost commodity hardware to meet throughput requirements, to add or remove capacity in smaller increments, to reduce power consumption, and to lower the overall total cost of ownership. This desire has led to renewed interest in scale-out architectures, with even more impetus since the introduction of Windows Azure and SQL Azure, which require scale-out architectures for some workloads (requirements that can currently be met easily with a scale-up architecture using medium to high-end hardware).
Scale-Out for Throughput
Scale-out techniques used for throughput or scalability include:
Data-dependent routing (DDR), under application control
Semi-transparent (or translucent) distributed partitioned views (DPVs)
Replication, either transactional or peer-to-peer
Microsoft SQL Server Service Broker, for parallel or distributed processing of conversations
In the section, "Questions and Considerations," below, we provide guidance on the relative merits and usage scenarios of these options, and we cover these technologies individually in the five associated Technical Reference Guides (TRGs).
Although not covered in the associated online transaction processing (OLTP) TRG, Microsoft Windows Server AppFabric (described in the article, “AppFabric Cache: Real-World Usage and Integration”2) provides another scale-out technology that can be used by an application to obtain more throughput and scalability by caching database data at the application tier for either scale-up or scale-out database management system (DBMS) solutions. This is discussed in the Middleware TRGs.
Scale-Out for Non-Throughput
We now briefly cover the predominant scenarios in which scaling out is adopted for reasons other than throughput. With occasionally connected devices (where network connectivity is not available, not desired, or not reliable), installations might choose scale-out architectures. Scenarios in which there is a need to operate independently of a connection to a remote host include workers at retail stores, travelling sales agents, field service workers, and workers at secure sites that do not permit external electronic communication. Another example of a non-capacity scale-out scenario is one driven by government regulations that require keeping data in the country/region where it was gathered, with only summary data allowed out of the country/region; scale-out is also sometimes adopted to provide high availability or disaster recovery.
Scale-out technologies used for data synchronization between the various scaled-out devices and remote hosts in non-throughput scenarios include:
Microsoft Sync Framework
In the section, "Questions and Considerations," below, we share thoughts on the relative merits and use case scenarios for these technologies, and we cover them individually in their associated TRGs.
Several of the links that follow point to articles that discuss capabilities and features in SQL Server 2005; however, with some minor exceptions, these also apply to SQL Server 2008.
The white paper, “Scaling Out SQL Server 2005,”3 is a very useful article that discusses scalable shared databases, peer-to-peer replication, linked servers, DPVs, DDR, and the trade-offs associated with these options. However, the article does not cover Service Broker in detail. It concludes with a table listing the factors that influence the selection of these technologies (including Service Broker) for scale-out.
This article is helpful for selecting an appropriate technology based on usage scenario. It also classifies the data to be scaled out into three types based on usage pattern: reference data, activity data, and resource data.
The article, “Architecting Service Broker Applications,”4 discusses where and why Service Broker should be used, and describes the decisions you must make to successfully design and build a Service Broker application. You should review this article to understand how to use Service Broker for scale-out design.
Another pertinent article is “How SQL Server 2005 Enables Service-Oriented Database Architectures,”5 which discusses how SQL Server enables Service-Oriented Architectures (SOAs). Of special interest are Figures 7, 8, and 9, which illustrate how Service Broker enables scale-out topologies by routing messages and by allowing messages to be processed on different servers.
After reviewing the articles listed above, architects can reference the article, "Planning, Implementing, and Administering Scaleout Solutions with SQL Server 2005.”6 This article covers various technologies, including Service Broker, DDR, DPVs, peer-to-peer replication, and query notification.
Currently, we are not seeing widespread adoption of new systems using query notifications, and we suggest that you instead consider AppFabric (described in the article, “AppFabric Cache: Real-World Usage and Integration”2). AppFabric can be used by an application to obtain more throughput and scalability by caching database data at the application tier for either scale-up or scale-out DBMS solutions. AppFabric is covered in the Middleware technical reference guides.
As the name implies, the technical article, “Internals, Troubleshooting, and Best Practices for use of Scaleout Technologies in SQL Server 2005,”7 is very useful for better understanding Service Broker, DDR, peer-to-peer replication, and DPVs. The article is probably of more value during the development and debugging phases, and of somewhat less value during the technology selection and architecting phases. The companion article, “Planning, Implementing, and Administering Scaleout Solutions with SQL Server 2005,”6 appears to be more valuable during the initial design phase.
Case Studies and References
Different scale-out techniques have been employed by different companies. This section describes some of these techniques.
MySpace developed and deployed five major architecture changes as it experienced rapid growth; the experience is described in the article, “Inside MySpace: The Story.”8 Its latest architecture includes most of the scale-out techniques (DDR, SOA, DPV, Service Broker, and peer-to-peer replication). This illustrates that for a large system, it is reasonable to use different solutions based on need. The drawback is that the installation must develop and maintain skills in all of these technologies.
The case study, “United States Government: Federal Institution Supports 60 Million Daily Database Hits with Reliability, Added Security,”9 describes how the U.S. government uses peer-to-peer replication to scale out its database structure, and to incorporate remote server locations for disaster recovery.
A retailer with 2,500 stores and 15,000 point-of-sale (POS) terminals uses merge replication to exchange product and price changes with headquarters. Prices can be changed either by the stores or by headquarters. The retailer also uses Service Broker to exchange detailed sales data between headquarters and the servers based in the retail stores.
Questions and Considerations
This section provides questions and issues to consider when working with your customers.
It is not easy to decide whether to scale out or scale up to meet scalability requirements. In our opinion, well over 90 percent of applications can be deployed with a scale-up design. However, after reviewing the documents referenced above and discussing the trade-offs with the customer, you might decide to adopt scale-out for its associated benefits. The next decision is which technology to adopt for the scale-out deployment. Below, we summarize some of the trade-offs in choosing a technology, and point to other documentation for more detailed discussion.
It is useful to gather as much technical and business information as practical, because this information will influence the selection of a technology and architecture. Once the choice of architecture is made and development begins, changing the architecture is very difficult and can attract negative visibility. Traditional volumetric information about users, data sizes, transaction counts, and network topology continues to be important. In addition, you need to understand the data characteristics and usage (as discussed in “Scaling Out SQL Server 2005”3). You must decide how best to federate and partition data. A good understanding of the application domain is required, and the customers are the best source for that domain expertise.
We have not discussed scalable shared databases, log shipping, and snapshot databases with SQL Server 2008 R2 database mirroring as potential solutions for scaling out for read-only purposes. In some cases, these can be acceptable solutions, but our experience indicates that these technologies are not practical for tier-one workloads, where customers generally need real-time data for reporting. If the data currency requirements are relaxed, however, these solutions might be acceptable.
DDR (application driven) and Service Broker both permit scale-out architectures. DDR offers a system designer more flexibility and control over the distribution options and schema design because the application develops and manages the scale-out capability. Also, DDR does not require the developer to learn a new technology. However, development, deployment, and operation can be considerably more complex because the application must account for all of the challenges associated with distribution. The level of sophistication and skill required for DDR is related to the complexity of the data schema and the processing. If the data model and processing requirements are straightforward (which is unlikely for a tier-one application), the scale-out design might not be overly complex.
For simpler data models and processing, we have noticed that federating and partitioning by date/time, geography, user ID, item number, or a synthetic key often suffices. Additionally, designers try to avoid joining data from multiple partitions, for predictable, good performance, and also avoid updating data on two servers within the same transaction, to reduce the overhead associated with distributed transactions.
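To make the DDR approach concrete, the following is a minimal sketch of application-driven routing on a synthetic key. The server names, ID ranges, and partitioning scheme are hypothetical; a real application would hold a connection per shard and issue its queries against the server this function selects.

```python
# Minimal data-dependent routing (DDR) sketch: the application, not the
# database, decides which server owns a given row. Server names and the
# partitioning scheme below are hypothetical illustrations.
from zlib import crc32

# Each shard owns a contiguous range of customer IDs (range partitioning).
SHARDS = [
    {"server": "sql-shard-01", "lo": 0,         "hi": 999_999},
    {"server": "sql-shard-02", "lo": 1_000_000, "hi": 1_999_999},
    {"server": "sql-shard-03", "lo": 2_000_000, "hi": 2_999_999},
]

def route_by_range(customer_id: int) -> str:
    """Range routing: pick the shard whose ID range covers the key."""
    for shard in SHARDS:
        if shard["lo"] <= customer_id <= shard["hi"]:
            return shard["server"]
    raise KeyError(f"no shard owns customer_id {customer_id}")

def route_by_hash(customer_id: int) -> str:
    """Hash routing: spreads load evenly, but range scans touch every shard."""
    return SHARDS[crc32(str(customer_id).encode()) % len(SHARDS)]["server"]
```

Range routing keeps related keys together (which helps the single-partition joins and single-server transactions recommended above), while hash routing avoids hot spots when key values arrive in increasing order; the choice is part of the schema-design flexibility that DDR gives the application.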
Scale-out architectures based on Service Broker reduce the design, development, and operational complexity because Service Broker provides an infrastructure that addresses some of the toughest scale-out problems associated with synchronization, multi-threading, messaging, and routing. However, this type of architecture involves a learning curve for the architects, developers, and database administrators, and it also imposes some design requirements that reduce the freedom enjoyed by the DDR developer. In particular, database administrators are often not familiar with message-based architectures, and the tooling currently available for administering and troubleshooting Service Broker applications in SQL Server Management Studio is very limited.
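Service Broker itself is configured in Transact-SQL, so nothing below is its actual API; this is only a conceptual sketch of the queued-conversation pattern it provides, which is what makes it suitable for scale-out: messages in one conversation are processed in order by one reader, while distinct conversations can be spread across many processors or servers.

```python
# Conceptual sketch of conversation-grouped message processing (the pattern
# Service Broker implements inside the database; this is NOT its API).
from collections import defaultdict, deque
from zlib import crc32

class ConversationQueue:
    def __init__(self, n_processors: int):
        self.n = n_processors
        self.queues = defaultdict(deque)  # conversation id -> pending messages

    def send(self, conversation: str, message: str) -> None:
        self.queues[conversation].append(message)

    def drain(self):
        """Deliver all pending messages. A conversation is always routed to
        the same processor, so per-conversation order is preserved even
        though the overall workload is spread across processors."""
        delivered = defaultdict(list)  # processor index -> (conv, msg) pairs
        for conv, msgs in self.queues.items():
            target = crc32(conv.encode()) % self.n  # stable routing choice
            while msgs:
                delivered[target].append((conv, msgs.popleft()))
        return delivered
```

The point of the sketch is the routing invariant: because ordering is guaranteed only within a conversation, adding processors (or servers) scales throughput without the application having to write its own synchronization code, which is the complexity reduction described above.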
If the scale-out requirement is primarily driven by the need to provide capacity for reporting on a separate server (so as to reserve the OLTP data copy for the operational workload only), then transactional replication is often sufficient. Peer-to-peer replication offers an additional advantage in that both copies can be updated, provided that a scheme is adopted to avoid or minimize the potential conflicts arising from the same record being updated on both servers. The advantage of using replication is that it is a mature and widely used technology, with skilled workers typically available for deployment.
For some workloads, peer-to-peer replication can also provide a high availability and disaster recovery solution. These workloads can generally tolerate some data loss in the case of a disaster. The drawback of using replication is that the flexibility is limited when compared to DDR or Service Broker, with which user-written application code can provide relatively more flexibility.
DPV (also called transparent scale-out) can often work in simple cases, but there can be a negative impact on performance when the Transact-SQL predicates become complex, or when there are joins on columns not used for partitioning. Refer to the blog post, “Distributed Partitioned Views / Federated Databases: Lessons Learned,”10 for a few of the lessons learned.
DPV can be used in combination with other technologies such as DDR. Using DPV simplifies read access because the view masks the fact that the table has been physically partitioned. For writes, the application can be directed to the appropriate partition by using DDR, for efficiency.
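The combination just described can be sketched as follows. The in-memory "shards" stand in for separate servers, and the modulo partitioning scheme is a hypothetical choice: reads fan out across all partitions the way a DPV's union does, while writes and point lookups are routed DDR-style to the single owning partition.

```python
# Sketch of DPV-style reads combined with DDR-style writes. Each dict in
# self.shards stands in for a table fragment on a separate server.
class PartitionedTable:
    def __init__(self, n_shards: int):
        self.shards = [dict() for _ in range(n_shards)]  # order_id -> row

    def _owner(self, order_id: int) -> dict:
        # DDR routing: modulo partitioning on the key (hypothetical scheme)
        return self.shards[order_id % len(self.shards)]

    def insert(self, order_id: int, row: dict) -> None:
        self._owner(order_id)[order_id] = row  # touches exactly one shard

    def select_all(self) -> dict:
        # DPV-style read: union over every shard, so the caller never needs
        # to know how the table was physically partitioned
        rows = {}
        for shard in self.shards:
            rows.update(shard)
        return rows

    def select_one(self, order_id: int):
        # A point lookup can still use DDR to avoid querying all shards
        return self._owner(order_id).get(order_id)
```

Note how `select_all` pays the fan-out cost on every call; this is the performance risk of DPVs with complex predicates mentioned above, and the reason point lookups and writes should route directly when the key is known.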
Merge replication is often selected when scaling out for reasons other than processing capacity because it is a well-understood, mature technology and it provides a default conflict resolution mechanism. Merge replication can be used in conjunction with other technologies; an example is the retail store case study listed above. We have observed that, depending on data volumes and the concurrency of merging, performance can suffer with 400 to 500 subscribers. Often the solution is a fan-out design with 8 to 10 subscribers in a middle layer that serve as republishers, enabling thousands of additional subscribers. The downside is that merge replication does not permit the degree of flexibility that other approaches, such as Service Broker or Sync Framework, provide, and changes are not propagated in a transactional manner; they are propagated on a table-by-table basis. These other approaches are more flexible because the user has the option to develop code to meet specific needs.
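The fan-out sizing above is simple arithmetic, sketched below. The 450-subscriber ceiling is illustrative, taken from the middle of the 400-to-500 range where we have seen performance degrade; the actual limit depends on data volumes and merge concurrency.

```python
# Back-of-the-envelope sizing for a merge replication fan-out design:
# insert a middle tier of republishers so that no single publisher serves
# more direct subscribers than it can handle.
import math

def republishers_needed(total_subscribers: int,
                        per_publisher_limit: int = 450) -> int:
    """Middle-tier republishers needed so no node exceeds the limit.
    The 450 default is an illustrative figure, not a product limit."""
    return math.ceil(total_subscribers / per_publisher_limit)

# For example, the 2,500-store retailer above: ceil(2500 / 450) = 6
# republishers, so an 8-to-10-node middle tier leaves headroom for growth.
```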
The article, “Enhancing Merge Replication Performance,”11 provides useful performance-tuning tips.
An advantage of merge replication is that it is a mature technology, while Sync Framework and Service Broker require the development and maintenance of specific skills. However, a significant advantage of Sync Framework is its flexibility: you can access any data source, including non-relational data, and there is no dependency on the availability of drivers, as there is with merge replication and Service Broker. Sync Framework can also be used to synchronize on-premises data with SQL Azure databases.
Following are the full URLs for the hyperlinked text.
1 "Distributed database," http://en.wikipedia.org/wiki/Distributed_data_base
2 "AppFabric Cache: Real-World Usage and Integration," http://msdn.microsoft.com/en-us/magazine/ff714581.aspx
3 "Scaling Out SQL Server 2005," http://msdn.microsoft.com/en-us/library/aa479364.aspx
4 "Architecting Service Broker Applications," http://msdn.microsoft.com/en-us/library/aa964144.aspx
5 "How SQL Server 2005 Enables Service-Oriented Database Architectures," http://technet.microsoft.com/en-us/library/cc966516.aspx
6 "Planning, Implementing, and Administering Scaleout Solutions with SQL Server 2005," http://technet.microsoft.com/en-us/library/cc917714.aspx
7 "Internals, Troubleshooting, and Best Practices for use of Scaleout Technologies in SQL Server 2005," http://www.microsoft.com/technet/prodtechnol/sql/2005/scaleout.mspx
8 "Inside MySpace: The Story," http://www.baselinemag.com/article2/0,1540,2082921,00.asp
9 “United States Government: Federal Institution Supports 60 Million Daily Database Hits with Reliability, Added Security,” http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000004415
10 "Distributed Partitioned Views / Federated Databases: Lessons Learned," http://blogs.msdn.com/b/sqlcat/archive/2007/06/20/distributed-partitioned-views-federated-databases-lessons-learned.aspx?wa=wsignin1.0
11 "Enhancing Merge Replication Performance," http://msdn.microsoft.com/en-us/library/ms152770.aspx