Designing Distributed Partitioned Views

When you design a set of distributed partitioned views to implement a federation of database servers, consider the following:

  • Determine the pattern of SQL statements executed by the application

  • Determine how the tables are related to each other.

  • Match the frequency of SQL statements against the partitions defined from analyzing the foreign keys.

  • Define the SQL statement routing rules.

Pattern of SQL Statements Executed by the Application

Develop a list of the SQL statements that will be executed by the application during typical processing periods. Divide the list into SELECT, UPDATE, INSERT, and DELETE categories, and order the list in each category by frequency of execution. If the SQL statements reference stored procedures, use the base SELECT, INSERT, UPDATE, and DELETE statements from the stored procedure. If you are partitioning an existing SQL Server database, you can use SQL Server Profiler to obtain such a list.

The recommendation for using the frequency of SQL statements is a reasonable approximation in the typical online transaction processing (OLTP) or Web site database in which distributed partitioned views work best. These systems are characterized by having individual SQL statements that retrieve relatively small amounts of data when compared to the types of queries in a decision support, or OLAP, system. When each SQL statement references a small amount of data, just studying the frequency of each statement yields a reasonable approximation of the data traffic in the system. However, many systems have some group of SQL statements that reference lots of data. You may want to take the additional step of weighting these queries to reflect their larger data requirements.

Relationships of the Tables

The intent is to find clusters of tables that can be partitioned along the same dimension, for example, part number or department number; so that all the rows related to individual occurrences of that dimension will end up on the same member server. For example, you may determine that one way to partition your database is by region. To support this, even tables that do not have a region number in their key must be able to be partitioned in some manner related to a region. In such a database, even when the Customer table does not have a region number column, if regions are defined as collections of whole states or provinces, the Customer.StateProvince column can be used to partition the customers in a manner related to region.

Because they define the relationships between tables, explicit and implicit foreign keys are the prime elements to review in looking for ways to partition data. Study the explicit foreign key definitions to determine how queries would frequently use rows in one table to find rows in another table. Also study implicit foreign keys, or ways that SQL statements use values in the rows of one table to reference rows from another table in join operations, even when there is no specific foreign key definition. Because implicit foreign keys are not explicitly defined as part of the database schema, you must review the SQL statements generated by the application to understand whether there are statements that join tables using nonkey columns. These implicit foreign keys are typically indexed to improve join performance. Therefore, you should also review the indexes defined in the database.

Frequency of SQL Statements Against the Partitions

Match the frequency of SQL statements against the partitions defined from analyzing the foreign keys. Select the partitioning that will best support the mix of SQL statements in your application. If some sets of tables can be partitioned in more than one way, use the frequency of SQL statements to determine which of the partitions satisfies the largest number of SQL statements. The tables most frequently referenced by SQL statements are the ones that you want to partition first. Prioritize the sequence in which you partition the tables based on the frequency in which the tables are referenced.

The pattern of SQL statements also influences the decision on whether a table should be partitioned:

  • Partition a table if more than 5 percent of the statements referencing a table are INSERT, UDATE, or DELETE statements, and the table can be partitioned along the dimension you have chosen.

  • Maintain complete copies of tables on each member server if less than 5 percent of the statements referencing the table are INSERT, UPDATE, or DELETE statements. You also must define how updates are made so that all the copies of the table are updated. If high transactional integrity is required, you can code triggers that perform distributed updates of all the copies within the context of a distributed transaction. If you do not need high transactional integrity, you can use one of the SQL Server replication mechanisms to propagate updates from one copy of the table to all other copies.

  • Do not partition or copy a table if more than 5 percent of the statements referencing a table are INSERT, UDATE, or DELETE statements, and the table cannot be partitioned along the dimension you have chosen.

SQL Statement Routing Rules

The routing rules must be able to define which member server can most effectively process each SQL statement. They must establish a relationship between the context of the input of the user and the member server that contains the bulk of the data required to complete the statement. The applications must be able to take a piece of data entered by the user, and match it against the routing rules to determine which member server should process the SQL statement.