Application Driven Scale-Out (OLTP)---a Technical Reference Guide for Designing Mission-Critical OLTP Solutions

Article
03/05/2012

Want more guides like this one? Go to Technical Reference Guides for Designing Mission-Critical Solutions.

With application-driven scale-out, the application is aware of where data resides; to be even more precise, the application has a method for discovering where the data resides. The data can be partitioned according to use (for example, login, mailbox, instrumentation, profile, shopping basket, or catalog), geography (East, Central, West, and so on), time (year, month, date, or other descriptor), hierarchy (for example, corporate, regional, branches, or locations), or simply based on data values, key ranges, key hashing, and other data information. Once the data is partitioned the application can direct queries to the appropriate data partition residing on various servers, achieving additional processing capacity.

Using this data-dependent routing (DDR), the application, or a middleware service, can route queries to the correct database. The degree of knowledge that the application possesses and uses to take advantage of the data partitioning impacts both the complexity and the benefits. In a simple design, the application determines the location by a simple key look up; the rest of the logic might remain the same. Slightly more complex logic might have additional flags indicating maintenance operations. A record that is moved to another partition for load balancing is an example of such a condition, because there is the potential for a duplicate record location. The application might use some logic to decide which record to use or where to post the new updates.

The deeper the knowledge the application has of the data partitioning, the more it can use this knowledge to achieve functionality. However, the application does become more complex and more dependent on partitioning schemes. Any change in the partitioning approach might necessitate application maintenance.

Databases in which the partitioned transactional data aligns with the usage scenarios are often good candidates for application-driven scale-out. When the alignment is poor, application-driven scale-out does not work well. Partitioning of data is also very challenging if the data cannot be partitioned along the most frequently used joining column, or if joins have to be made on the non-partitioning keys. However, there are workarounds that can be used to deal with this issue; more discussion can be found in the documents referenced in the Best Practices section of this guide.

Some of the primary challenges with DDR include developing a predictable and stable partitioning scheme, maintaining consistency in workload, and coping with unforeseen changes in the workload patterns. Failure to address these challenges can lead to a non-uniform workload on the servers with a regular need for rebalancing the data. There is more discussion in the Considerations section of this guide, and also in the documents referred to in this guide.

Best Practices

The following are some best practices and pitfalls to avoid:

A good article covering several architectural choices for scaling out is Scaling-out SQL Server 2005.1 This article covers several options, but the specific one of interest is the DDR. You can also review the other options to get an idea of other approaches.
The article Scaling-out SQL Server with Data Dependent Routing2 illustrates the ideas followed by an internal Microsoft installation to achieve their goal. This is a good example if your application fits a similar profile.
Microsoft SQL Server Service Broker can be used to help with routing with DDR implementations that consist of routing logic built into the data layer of the application, and that use a table to hold the routing data. When the data layer sends a message, it checks for the message destination and sends the message on the appropriate Service Broker dialog. The data layer opens a dialog to each destination and keeps them open to speed up the process. For maximum flexibility, you could use the same tables to hold routing data and security data. In this way, you can expand the number of destinations by adding services. You could also use the built-in Service Broker routing capability to implement DDR.

Starting on page 29, the white paper Internals, Troubleshooting, and Best Practices for use of Scaleout Technologies in SQL Server 20053 illustrates how to use Service Broker and DDR together.
You might like to review other massive scale-out concepts and projects outside of Microsoft, such as Hadoop4 and MapReduce.5 These Wikipedia articles include useful links covering actual projects that use these technologies. Some of your customers might talk about these options, and it is beneficial to have a background in these technologies because these also use DDR.

Case Studies and References

Following are some case studies that provide more information.

The article Inside MySpace: The Story6 describes the five major architecture changes Myspace made as they experienced rapid growth, including most scale-out techniques (DDR, service-oriented data architecture (SODA), distributed partitioned view (DPV), and Service Broker). All of these techniques are used today at MySpace.
The article Scaling-out SQL Server with Data Dependent Routing2 illustrates ideas followed by an internal Microsoft installation to achieve their goal. You might like to review it as an example if your application fits a similar profile.
Other slides cover the retrieval, insertion, and deletion of records. Based on the available key, there is either a direct read to a specific server or a DPV is used. Insertions are accomplished by connecting to the specific server.

Questions and Considerations

This section provides questions and issues to consider when working with your customers.

Discuss and evaluate how much data-distribution awareness the customer wants to build into the application. If only high-level knowledge is desired, it might be easier to develop and maintain the application code. However, it can be more difficult to rebalance the data among the partitions without taking the databases offline for some time.

Data redistribution is needed for a variety of reasons, including the following:
- The initially projected usage pattern is not accurate.
- The data access pattern changes over time. For example, suppose a retail company uses Stock Keeping Units (SKUs) for distributing data. It is very likely that the access pattern will change over the course of the year. In the spring, sales volumes will likely be higher for summer clothes and in the fall, they might be higher for the winter clothes. This might require data distribution strategies over the four seasons. Alternately, a sales campaign might dictate a much more aggressive resource need for a particular SKU when the campaign is running.
- Data volumes grow unevenly, causing disk space or response time issues.
These events require such actions as redistribution of data or addition or removal of servers, causing down-time for application availability if the data distribution cannot be achieved online.

Based on these considerations, you might decide to build more logic into the application layer, leading to more complex application design, development, and maintenance.
The need to select a good distribution scheme is paramount. There are no ready-made simple answers; you will need to understand the data domains (or data subject areas), data volumes, transitional volumes, usage patterns, availability requirements, skills set available for development, on-going maintenance, and operation, because they influence the design patterns. These considerations are covered in the white papers and case studies referred in this guide.
Identify the usage pattern for the reference data. Generally, reference data changes slowly with time, and is used in the majority of queries. It might be worthwhile to maintain a copy of the reference data on every node, and update the copy using replication, SQL Server Integration Services (SSIS) packages, or some other technique. While there is a trade-off with the maintenance of a copy of reference data on all nodes because of the additional resources for storage and maintenance, the reference data is generally much smaller than other data and the reduction in access time can be significant.
Depending on the data usage, you might use a different strategy for scale-out; it is not necessary to adopt only a single approach. While the use of multiple approaches increases the learning and maintenance required, it is often necessary for tier-one applications as they are generally not simple. MySpace and Centipede are examples of companies that use multiple scale-out technologies.
Any process that requires transactional data to be accessed across the partition boundary needs to be identified and, where possible, appropriate workarounds need to be implemented. Joining data across servers is possible, but you must carefully consider the response-time impact. Large joins are challenging in any online transaction processing (OLTP) application, but across server nodes they can have a significant negative impact on performance.
Databases are typically backed up individually and restored as necessary. With distributed databases, it is necessary to carefully evaluate the backup and recovery procedures to determine the degree of coordination required because the logical database has been partitioned into many physical databases. The disaster recovery implications are also likely to be more complex, because the databases might have to be recovered to the same point in time for consistency, and typically technologies such as backups, restores, log shipping, and database mirroring operate at the database level. One common approach that offers a reasonable degree of consistency is the use of marked transactions in each database prior to performing backups.

Appendix

Following are the full URLs for the hyperlinked text.

1 Scaling Out SQL Server 2005https://msdn.microsoft.com/en-us/library/aa479364.aspx

2 Scaling Out SQL Server with Data Dependent Routinghttps://technet.microsoft.com/en-us/library/cc966448.aspx

3 Internals, Troubleshooting, and Best Practices for use of Scaleout Technologies in SQL Server 2005https://www.microsoft.com/technet/prodtechnol/sql/2005/scaleout.mspx

4 Hadoophttp://en.wikipedia.org/wiki/Hadoop

5 MapReducehttp://en.wikipedia.org/wiki/Mapreduce

6 Inside MySpace: The Storyhttp://www.baselinemag.com/article2/0,1540,2082921,00.asp

Application Driven Scale-Out (OLTP)---a Technical Reference Guide for Designing Mission-Critical OLTP Solutions

Best Practices

Case Studies and References

Questions and Considerations

Appendix

Additional resources