Microsoft SQL Server 2000 Analysis Services Performance Guide

By Carl Rabeler, Len Wyatt, Dave Wickert

Summary: This paper describes techniques you can use to optimize query responsiveness and processing performance in Microsoft® SQL Server™ 2000 Analysis Services.

On This Page

Introduction
Optimizing Analysis Services to Improve Query Responsiveness
Optimizing Analysis Services to Improve Processing Performance
Optimizing the Data Model to Improve Performance
Optimizing Hardware Resources to Improve Performance
Optimizing the Architecture to Improve Performance
Optimizing Clients for Slow Network Connections
Summary
Appendix A: Resources
Appendix B: Interpreting Performance Monitor Counters
Appendix C: How to Flush the Query Results Cache
Appendix D: How to Tune the Process Buffer Size
Appendix E: A Checklist of Configuration Steps for an Analysis Server
Appendix F: When to Consider Moving to SQL Server 2000 (64-bit)

Introduction

Users can analyze data stored in Microsoft® SQL Server™ 2000 Analysis Services with greater speed and flexibility than they can query the same data stored in a relational database. Analysis Services queries data more efficiently because it organizes data into multidimensional structures, stores aggregations of fact data, and stores frequently used results in a memory cache for quick access. Many sites deploy Analysis Services with little or no performance tuning and still receive good query responsiveness and processing performance. Nevertheless, there are many techniques you can employ to increase the speed with which Analysis Services processes new data into multidimensional cubes and to reduce the time Analysis Services requires to respond to well-formed user queries.

This paper addresses these techniques, which are based on the collective experience of the Microsoft Business Intelligence (BI) Practices team and the Analysis Services development team. Sites with large quantities of data, many users, or complex query patterns can benefit the most from employing the techniques discussed in this paper.

This paper is divided into the following sections:

  • "Optimizing Analysis Services to Improve Query Responsiveness" discusses techniques that you can employ within Analysis Services to increase the speed with which Analysis Services responds to user queries.

  • "Optimizing Analysis Services to Improve Processing Performance" discusses techniques that you can employ to increase the speed with which new data is processed into Analysis Services, and to minimize the time during which cubes are unavailable to users due to the processing of new data. The impact of techniques used to improve query responsiveness on processing performance is also discussed.

  • "Optimizing the Data Model to Improve Performance" discusses design decisions you can make to improve query responsiveness and processing performance.

  • "Optimizing Hardware Resources to Improve Performance" discusses how additional system resources and the appropriate configuration of existing system resources can improve query responsiveness and processing performance.

  • "Optimizing the Architecture to Improve Performance" discusses techniques that you can employ at the architecture level to improve query responsiveness and processing performance.

  • "Optimizing Clients for Slow Network Connections" discusses techniques that you can employ on the client side to increase query responsiveness over a slow network connection.

This paper does not address performance issues related to the efficiency of the MDX queries actually submitted to Analysis Services from clients. However, the MDX used in queries and calculated members can have a significant impact on the responsiveness of Analysis Services to user queries, so MDX efficiency should not be overlooked when you attempt to improve query responsiveness. For information on MDX resources, see Appendix A, "Resources," in this paper.

In addition to the techniques discussed in this paper, you should also apply SQL Server 2000 Service Pack 3 (SP3) to the computer on which Analysis Services is running (the Analysis server) and update the Pivot Table Service (PTS) on each Analysis Services client computer. Updating PTS on each client computer is particularly important because the client-server architecture of PTS places a significant portion of the PTS code on each client computer and SP3 includes significant performance enhancements to the client-side components of PTS. Specific SP3 performance enhancements are discussed in context throughout this white paper.

Optimizing Analysis Services to Improve Query Responsiveness

Understanding the steps involved in resolving a client query will help you improve query responsiveness. When a client application submits a query to Analysis Services, the query is resolved in the following order (a simplified sketch of this resolution cascade follows the list):

  1. PTS on the client parses the query and, if the query references multiple data slices, breaks the query into separate query requests. PTS breaks the query into one query request per slice when there are multiple slices on the same dimension.

  2. For each query request, PTS reads dimension information stored in memory to determine the levels and members needed to resolve the query request. If additional dimension information is required, PTS requests it from the server.

  3. PTS determines whether the query request can be resolved using information contained in the client's cache. Query requests that cannot be resolved from the client's memory are forwarded to Analysis Services for resolution.

  4. Analysis Services attempts to resolve the query request from the data stored in its query results cache.

  5. For the portion of each query request that cannot be resolved from the query results cache, Analysis Services retrieves the data from the appropriate cube partitions (using aggregations when they are useful). With small cubes, this data might be in the file system cache; with large cubes, this data is retrieved from disk.

  6. Analysis Services forwards the returned data for each query request to the client and also places the retrieved data in its query result cache for use in resolving future queries.

  7. PTS on the client post-processes the returned data as needed and returns the resolved query to the client application.
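
To make the flow of this cascade concrete, the following Python sketch models it in simplified form. The cache structures and function names here are purely illustrative assumptions; they do not correspond to any actual PTS or Analysis Services interface.

    # Simplified model of the PTS / Analysis Services query-resolution cascade.
    # All names here are illustrative; they are not part of any real API.

    client_cache = {}   # PTS cache on the client
    server_cache = {}   # Analysis Services query results cache


    def read_partition(request):
        """Stand-in for reading aggregations or fact data from a cube partition."""
        return f"data for {request}"


    def resolve_on_server(request):
        if request in server_cache:                 # step 4: server query results cache
            return server_cache[request]
        data = read_partition(request)              # step 5: partition read (aggregation or fact scan)
        server_cache[request] = data                # step 6: cache the result for future queries
        return data


    def resolve_query(query, slices):
        results = []
        for request in slices:                      # step 1: one request per data slice
            if request in client_cache:             # step 3: try the client cache first
                results.append(client_cache[request])
                continue
            data = resolve_on_server(request)       # steps 4-6 on the Analysis server
            client_cache[request] = data
            results.append(data)
        return f"{query}: " + "; ".join(results)    # step 7: PTS post-processing


    print(resolve_query("Units Sold by Country", ["slice: Canada", "slice: US"]))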

There are many factors that directly affect this process. These include:

  • Aggregation design

  • Multidimensional storage mode

  • Number of partitions

  • Existing data in the query result cache

  • Existence of writeback tables

  • Use of parent-child dimensions with a ROLAP partition

In addition to these factors, which are discussed below, the data model design, the availability and configuration of hardware resources on the Analysis server, the architecture of the Analysis Services solution, and client-side settings also affect the speed with which Analysis Services responds to user queries. These effects are discussed in separate sections of this paper.

Design Effective Aggregations

The most effective technique you can use to improve overall query responsiveness (assuming the Analysis server has sufficient memory and hard disk resources) is to design effective aggregations of fact data in each cube partition. Aggregations are pre-calculated subtotals of fact data at various levels in a cube. Because users rarely execute queries at the lowest level of a cube, designing aggregations of fact data at different dimension levels enables Analysis Services to respond quickly to most queries. For example, suppose that a client requests that Analysis Services compare the sales for two customers and return the name of the customer with the greater volume of sales for the current year. Analysis Services could resolve this query by first scanning for individual facts in the cube at its lowest level (the fact level) that meet the criteria specified in the query, and then subtotaling the sales for these customers by year at query time. Analysis Services could resolve this query faster if these subtotals had been pre-calculated and stored with the cube.

Since it is impractical to store every possible aggregation in the cube, designing effective aggregations is the art of storing the appropriate aggregations for the queries your users are submitting. The aggregation design wizards consider the cost and value of each possible aggregation.

  • Aggregation cost is a function of the number of rows that could be in the aggregation. Smaller aggregations have a lower cost than larger aggregations. For more information on aggregation size, see "Number and Size of Records in an Aggregation" later in this section.

  • Aggregation value is a function of how many queries the aggregation would be useful for. For more information on how the Analysis Services design wizards determine the value of an aggregation, see "Understanding the Aggregation Design Tools" later in this section.

The first step toward designing the appropriate set of aggregations is to understand exactly what aggregations are and how Analysis Services uses aggregations to resolve queries.

Understanding an Aggregation and How It Is Used

Each aggregation represents the subtotaling of facts in a cube partition at some dimension level for each dimension in the cube. For example, imagine that you are working with a Sales cube that contains the following dimensions, levels (with counts), and measures:

  • A Customers dimension with five levels: All (1), Country (3), State (80), City (578), Name (3811)

  • A Products dimension with five levels: All (1), Category (60), Brand (911), Name (7621), SKU (8211)

  • A Time dimension with four levels: All (1), Year (3), Quarter (12), Month (36)

  • The cube contains two measures: Units Sold, Sales

Notice that the All level counts as a level. The table in Figure 1 represents a small sample of the rows at the Sales cube fact level. The fact level may contain millions of rows of data that are initially populated and then periodically updated from one or more fact tables in a relational database.

CustID    SKU       Date         Units Sold   Sales
345-23    135123    3/12/2001    2            $45.67
563-01    451236    7/19/2001    34           $67.32
...       ...       ...          ...          ...

Figure 1 Excerpt from Sales cube fact level

Now, suppose that a client requests that Analysis Services determine the total number of units of Brand X sold in Canada during the first quarter of 2001. In the absence of any pre-calculated subtotals stored in the cube, Analysis Services scans the lowest level in the cube for individual facts that meet the specified criteria, and then summarizes these facts according to Brand, Country, and Quarter to resolve the query. The size of the cube, the storage mode of the partitions in the cube, and the hardware resources on the Analysis server determine the length of time Analysis Services requires to resolve this query. If Analysis Services had pre-calculated subtotals at various levels and stored them in this cube, Analysis Services could resolve this query using existing subtotals rather than creating subtotals at query time.

For example, suppose that the facts in this cube had been subtotaled at the Brand, Country, and Quarter levels. Figure 2 illustrates this particular aggregation: the <Country, Brand, Quarter> aggregation.

Country   Brand     Quarter    Units Sold   Sales
Can       Brand X   Q1,2001    9,456        $23,914.30
Can       Brand X   Q2,2001    ...          ...
Can       Brand X   ...        ...          ...
Can       Brand Y   Q1,2001    6,722        $37,382.33
Can       Brand Y   Q2,2001    ...          ...
Can       Brand Y   ...        ...          ...
US        Brand X   Q1,2001    22,687       $57,375.61
US        Brand X   Q2,2001    ...          ...
US        Brand X   ...        ...          ...
US        Brand Y   Q1,2001    14,623       $81,321.30
US        Brand Y   Q2,2001    ...          ...
US        Brand Y   ...        ...          ...
...       Brand X   Q1,2001    ...          ...
...       Brand X   Q2,2001    ...          ...
...       Brand X   ...        ...          ...
...       Brand Y   Q1,2001    ...          ...
...       Brand Y   ...        ...          ...
...       Brand Y   ...        ...          ...
Mexico    Brand X   Q1,2001    4,286        $12,592.29

Figure 2 Excerpt from <Country, Brand, Quarter> aggregation

As you can see, this aggregation is a set of records containing the same dimensions and measures as the fact level of the cube but subtotaled at a particular dimension level above the fact level. If the aggregation displayed in Figure 2 were stored in the Sales cube, Analysis Services could resolve the total number of units of Brand X sold in Canada during the first quarter of 2001 directly from this aggregation without having to scan for individual facts and subtotal them at query time.

If subtotals of facts at the exact combination of levels required to resolve a query do not exist within an existing aggregation, Analysis Services attempts to resolve the query by using aggregations with subtotals at lower levels than those required to resolve the query. For example, if Analysis Services received a query requesting the total number of units of Brand X sold in each country during 2001, Analysis Services would resolve this query by using the aggregation in Figure 2. Analysis Services would begin with the subtotals in this aggregation and then calculate the units sold for the year 2001 for each country by using the subtotals of units sold during each quarter of 2001 for each country. Similarly, if Analysis Services received a query requesting the total number of units of Brand X sold in all countries during 2001, Analysis Services would begin with the subtotals in this same aggregation and then calculate the units sold during 2001 in all countries by subtotaling quarters to years and individual countries to all countries. Analysis Services resolves each of these queries more quickly by using this aggregation stored with the cube rather than scanning for individual facts and then calculating the necessary subtotals at query time. The performance benefit of storing the appropriate aggregations with the cube becomes more significant as the amount of data in the cube increases because the length of time required to scan individual facts is directly related to the size of the cube.
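
To make this roll-up behavior concrete, the following Python sketch starts from records shaped like the <Country, Brand, Quarter> aggregation in Figure 2 and derives a <Country, Brand, Year> subtotal at query time. The quarterly values beyond Q1 are invented for illustration only.

    from collections import defaultdict

    # A few records shaped like the <Country, Brand, Quarter> aggregation in
    # Figure 2 (values beyond Q1 are illustrative).
    country_brand_quarter = [
        ("Can", "Brand X", "Q1,2001", 9456),
        ("Can", "Brand X", "Q2,2001", 8120),
        ("Can", "Brand X", "Q3,2001", 7874),
        ("Can", "Brand X", "Q4,2001", 9012),
    ]

    # Roll the quarterly subtotals up to <Country, Brand, Year>, the way
    # Analysis Services can answer a Year-level query from a Quarter-level
    # aggregation.
    country_brand_year = defaultdict(int)
    for country, brand, quarter, units in country_brand_quarter:
        year = quarter.split(",")[1]
        country_brand_year[(country, brand, year)] += units

    print(dict(country_brand_year))   # {('Can', 'Brand X', '2001'): 34462}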

As you have seen, Analysis Services does not require that an aggregation contain the exact subtotals required to resolve a query in order to use the aggregation to resolve the query. However, Analysis Services is only able to resolve queries based on existing aggregations if the existing aggregations are at levels matching or below those of the queries being submitted. For example, if the records in an aggregation contain subtotals at the Year level and the query requests information at the Quarter level, Analysis Services would have to query at the fact level or use another aggregation to resolve the query; the subtotals of data in this aggregation at the Year level would be of no use in resolving this query.

While it is fairly easy to see that these simple queries benefit from pre-calculated subtotals, and to explain why performance is slower without these subtotals, the performance benefits of aggregations with some types of queries are not as obvious. Top count and median are examples of queries that must touch a large number of cells to resolve, but only return a few values (these are called wide queries). For example, a query that returns the top three customers in sales for the current year or the median volume of sales per customer for the current year may require a long time for Analysis Services to resolve in a large cube. Without pre-calculated subtotals of sales information for each customer by year, Analysis Services would have to calculate subtotals for each customer at query time to resolve either of these queries. Calculating these subtotals at query time for all customers can be a very time-consuming task because Analysis Services must read all cells at the fact level containing sales information for each customer for the current year, calculate subtotals for each customer, and then compare these subtotals to determine the top three or the median. It is not obvious from the query results that are returned why Analysis Services took a significant amount of time to resolve these types of queries. Analysis Services could resolve these two queries more rapidly if pre-calculated subtotals of sales for each customer for the current year (or month) were stored in the cube.

While appropriate aggregations enable Analysis Services to quickly resolve many queries, you should also review the actual MDX used with poorly performing queries to ensure that the most efficient MDX statement is being submitted. Poorly written MDX queries are also a common reason that queries perform poorly. For information on MDX resources, see Appendix A, "Resources." Good aggregation design (the subject of the next section) helps well-written and efficient MDX queries in the same manner that proper indexing can greatly help well-written SQL queries.

Understanding the Aggregation Design Tools

SQL Server 2000 Analysis Services and the SQL Server 2000 Resource Kit include a number of tools that you can use to design and manage aggregations. These include the following:

  • Storage Design Wizard designs aggregations based on the assumption of a uniform pattern of query distribution.

  • Usage-Based Optimization Wizard designs aggregations based on information captured in a query log about the actual distribution of queries from your users, filtered by criteria that you specify.

  • Partition Manager, a SQL Server 2000 Resource Kit utility, enables you to manually design aggregations and to copy aggregation designs between partitions.

  • The Partition Aggregation utility, which is an updated version of Partition Manager, contains additional functionality for setting row counts on a per-partition basis and for applying the aggregation design for a single partition to other partitions in the cube. The Partition Aggregation utility is available with the SQL Server Accelerator for Business Intelligence (SSABI) and can be downloaded at no cost from https://www.microsoft.com/sql/ssabi/default.asp.

  • Cube Editor enables you to manage dimension and cube properties that affect aggregations.

After you use one or more of these tools to design aggregations for one or more partitions in a cube, you must fully process the partitions containing the new aggregation design so that the new aggregations are calculated, stored, and available to queries. Also, a partition with a new aggregation design must be fully processed before it can be incrementally updated. You must create a new aggregation design whenever you add a level or a dimension to a cube or delete one from it, because Analysis Services drops all existing aggregations whenever you make structural changes to a cube. Processing the partitions in a cube containing new or deleted levels or dimensions will not generate any aggregations until you create a new aggregation design incorporating the new dimension structure.
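
If you prefer to script this processing step rather than run it from Analysis Manager, the following sketch shows one way to fully process a cube through the DSO COM library from Python. It assumes the pywin32 package and the Analysis Services client components are installed and that the script runs under an account in the OLAP Administrators group; the database and cube names are placeholders.

    # Hedged sketch: fully processing a cube after its aggregation design
    # changes, using the DSO COM library from Python (requires pywin32 and the
    # Analysis Services client components). Database and cube names are
    # placeholders.
    import win32com.client

    dso = win32com.client.gencache.EnsureDispatch("DSO.Server")
    dso.Connect("LocalHost")

    db = dso.MDStores.Item("FoodMart 2000")     # placeholder database name
    cube = db.MDStores.Item("Sales")            # placeholder cube name

    # processFull comes from DSO's ProcessTypes enumeration (exposed through
    # win32com constants once EnsureDispatch has loaded the type library).
    cube.Process(win32com.client.constants.processFull)

    dso.CloseServer()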

Storage Design Wizard

In the Storage Design Wizard, you specify the number of aggregations that will be generated based on an estimate of the amount of disk space you are willing to have the aggregations consume or the amount of performance gain that you want to achieve (both values are estimated conservatively). The Storage Design Wizard determines which aggregations to design by using statistical coverage algorithms that assume all queries are equally probable. In other words, the Storage Design Wizard assumes that a query is as likely to contain any level in a dimension as any other level in the dimension. The Storage Design Wizard uses this assumption in assessing the value of each aggregation and uses the member counts for each level of each dimension enabled for aggregation in assessing the cost of each aggregation. You may want to limit certain dimensions that you know are not queried frequently to only the All level in order to reduce the total number of possible aggregations and the potential size of each aggregation. This is done using the Aggregation Usage property of the dimension. For information on setting the Aggregation Usage property, see "Cube Editor" later in this section.

The Storage Design Wizard uses the member counts and the fact table row count for each partition to determine which aggregations are too large to be generated. It assumes that the fact data is randomly distributed across dimension members when determining the amount of data in each aggregation. For more information on aggregation size limits, see "Aggregation Size Limited to 30 Percent of Fact-Level Size" later in this section. The Storage Design Wizard can design the best set of aggregations only if the dimension level member counts and partition row counts are set correctly.

When you define a dimension, Analysis Services counts the current members for each level in the dimension and stores this value in the Member Count property for each dimension level. Thereafter, Analysis Services never recalculates the member counts. If you have a growing dimension, you should update the member counts manually before you use the Storage Design Wizard to design aggregations for a partition. You can query the relational tables to determine the actual number, or you can enter the number of members you anticipate will be in the cube when it is fully populated. You can update the member count for each level in each dimension individually by using the Dimension Editor in Analysis Manager, or you can update all dimension member counts at once by using the Partition Aggregation utility. You can also set the member counts using the Decision Support Objects (DSO) API. In most cases, you should set the member counts equal to the member counts you anticipate will be in the cube in the production environment. See "Decrease Dimension Counts in a Partitioned Cube" later in this section for other recommendations when using multiple partitions.

When you define a cube, Analysis Services counts the current rows in the fact table for the initial partition and stores this value in the Fact Table Size property for the cube. Thereafter, Analysis Services never recalculates the row count for any partition. Furthermore, because aggregations are designed on a per partition basis, it is the row count for the partition that is important and that must be set properly for each partition. If you have a growing fact level, you should update the row count manually for a partition before you design the aggregations for that partition. You can query the relational tables to determine the actual number, or you can enter the number of rows you anticipate will be in the partition when it is fully populated. If your cube has only one partition, you can update the row count for this partition by using the Cube Editor in Analysis Manager. This value will automatically be copied to the only partition in the cube. If you have multiple partitions, you must use the Partition Aggregation utility to set the row count for each partition (Cube Editor will not update this value across all partitions). You can use the Partition Aggregation utility to set the row counts for all partitions at once. You can also set the row count for each partition using the DSO API. See "Aggregation Size Limited to 30 Percent of Fact-Level Size" later in this section for a discussion regarding when to increase the row count above the actual number of rows at the fact level for a partition.
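
The following sketch illustrates how the member counts and partition row counts described above might be set through the DSO API from Python, using the Level object's EstimatedSize property and the partition's EstimatedRows property. It assumes pywin32 and the Analysis Services client components are installed; all object names and count values are placeholders.

    # Hedged sketch: updating dimension-level member counts (EstimatedSize) and
    # a partition's row count (EstimatedRows) through DSO before designing
    # aggregations. Object names and values are placeholders; requires pywin32
    # and the Analysis Services client components.
    import win32com.client

    dso = win32com.client.Dispatch("DSO.Server")
    dso.Connect("LocalHost")

    db = dso.MDStores.Item("FoodMart 2000")              # placeholder database name

    # Update the member counts stored for levels of a growing dimension.
    customers = db.Dimensions.Item("Customers")          # placeholder dimension name
    customers.Levels.Item("City").EstimatedSize = 578    # anticipated member counts
    customers.Levels.Item("Name").EstimatedSize = 3811
    customers.Update()

    # Update the row count on a specific partition of the Sales cube.
    cube = db.MDStores.Item("Sales")                     # placeholder cube name
    partition = cube.MDStores.Item("Sales_2001")         # placeholder partition name
    partition.EstimatedRows = 25000000                   # anticipated fact rows in production
    partition.Update()

    dso.CloseServer()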

When users' queries follow a pattern (which is normally the case) rather than being uniformly random, the Storage Design Wizard will generate some aggregations that are superfluous, and it will overlook the design of some aggregations that would be useful. For example, suppose your company regularly generates a query of sales by state by brand by month. If the Storage Design Wizard decides to calculate an aggregation by state by brand by month, this query would benefit from this aggregation. If instead the Storage Design Wizard decides on an aggregation by country by brand by month, the query would not benefit from that aggregation. However, another query might benefit from that aggregation. The Storage Design Wizard has no information regarding which query is more likely to be submitted.

With large or complex cubes, the number of generated but unused aggregations increases, consuming more processing time and storage space. Query responsiveness is not optimal because some useful aggregations were not generated. However, until the actual query distribution pattern can be logged and aggregations created based on actual usage patterns, creating aggregations based on a uniform distribution of queries is frequently the only way to design aggregations (other than starting with no aggregations).

Usage-Based Optimization Wizard

Unlike the Storage Design Wizard, the Usage-Based Optimization Wizard enables you to design aggregations based on an analysis of the actual pattern of levels queried. The Usage-Based Optimization Wizard generates aggregations based on the usage patterns logged. In the Usage-Based Optimization Wizard, you still specify the number of aggregations that will be generated based on an estimate of the amount of disk space you are willing to have the aggregations consume or the amount of performance gain that you want to achieve (both values are estimated conservatively). However, the Usage-Based Optimization Wizard determines which aggregations to design based on the dimension levels queried by the queries that you select from the Analysis Services query log, rather than an assumption that all queries are equally probable. You can select queries from the log by date range, by queries that ran more than a specified number of times, by queries that ran longer than a specified amount of time, by specified users, and by queries to MOLAP cubes, ROLAP tables, or the server cache. By default, the Analysis Services query log records the levels queried by every tenth query. You can change this logging pattern to log the levels queried by every query, in order to capture the actual query pattern of your users, and then turn off query logging until you are ready to create a new query log (to save system resources), or leave it at every tenth query to detect changing query patterns.

Designing aggregations based on past query patterns increases the likelihood that future queries will benefit from the generated aggregations. The Usage-Based Optimization Wizard is particularly effective in increasing query responsiveness with large or complex cubes, because only a small percentage of total possible aggregations can be created with these cubes due to time and space limitations. The Usage-Based Optimization Wizard ensures that the aggregations that are created are the ones most likely to be used.

Similar to the Storage Design Wizard, the Usage-Based Optimization Wizard uses dimension level member counts to evaluate the cost of each aggregation, and uses the member counts and the partition row count to determine whether particular aggregations are too large to be built. It is crucial that these counts be accurate. See "Storage Design Wizard" earlier in this section for information on how to set these counts manually. See "Decrease Dimension Counts in a Partitioned Cube" later in this section for other recommendations when using multiple partitions. See "Aggregation Size Limited to 30 Percent of Fact-Level Size" later in this section for a discussion regarding when to increase the row count above the actual number of rows at the fact level for a partition.

The Usage-Based Optimization Wizard honors the settings for each dimension's Aggregation Usage property. If you have previously excluded certain dimensions and levels from consideration, you should use the Cube Editor to enable these dimensions and levels before running the Usage-Based Optimization Wizard, to ensure that all dimensions and levels are considered based on actual query patterns. For example, if you have marked the quarter level of the Time dimension as disabled for aggregation and users are actually querying at the quarter level, the Usage-Based Optimization Wizard will not design aggregations at the quarter level. For information on setting the Aggregation Usage property, see "Cube Editor" later in this section.

You can use the Usage-Based Optimization Wizard to add aggregations to those previously designed or to replace all existing aggregations. In both cases, the Usage-Based Optimization Wizard designs new aggregations as if there were no existing aggregations; it does not make any evaluation of the existing aggregations. When you are designing aggregations to add to existing aggregations, select the queries that are running most slowly to avoid creating excessive aggregations. When you are replacing all existing aggregations in your aggregation design, select all of the queries to ensure the best overall aggregation design. When you use the Usage-Based Optimization Wizard to design new aggregations to add to the existing aggregations, the wizard merges the newly designed aggregations with the existing aggregations (duplicate aggregations are not included twice). For example, suppose that a partition starts with 80 aggregations and the Usage-Based Optimization Wizard designs 20 aggregations. After the aggregation designs are merged, the number of aggregations in the design will be somewhere between 80 and 100. You must fully process a partition after changing its aggregation design in order to implement the new aggregation design.

Partition Manager and Partition Aggregation Utility

Partition Manager enables you to manage aggregations across multiple partitions of a cube, or to manually examine and change the aggregations in a partition. An updated version of this utility, called the Partition Aggregation utility, ships with the SQL Server Accelerator for Business Intelligence (SSABI). The SSABI version enables you to specify the estimated row count on a per-partition basis and to update member counts for all dimensions and levels at once to aid you in designing the appropriate aggregations. While the Storage Design Wizard and the Usage-Based Optimization Wizard enable you to design aggregations for only one partition at a time, Partition Manager and the Partition Aggregation utility enable you to design a set of aggregations for a single partition and then copy them to additional partitions. With these utilities, you can design aggregations for a partition with data that is representative of the target system in terms of size and data distribution, and then copy the aggregation design to other partitions.

You can also design aggregations manually, using either Partition Manager or the Partition Aggregation utility. If you plan to do so, there are a number of factors that you must consider:

  • Query usage varies over time. Aggregations that were appropriate at the time they were designed may not be appropriate based on current usage patterns.

  • Only add or change a few aggregations at a time. It is tedious and slow to make a large number of changes manually. Use the Storage Design Wizard or the Usage-Based Optimization Wizard to make major changes to existing aggregations.

  • Only add aggregations that will be used. Unused aggregations are a waste of processor and disk storage resources.

  • Monitor the impact of additional aggregations on processing performance. As you add additional aggregations, the time required for processing the aggregations increases. For more information on the impact of aggregations on processing performance, see "Avoid Designing Excessive Aggregations" later in this paper.

Designing the appropriate aggregations manually requires a thorough knowledge of user query patterns and an understanding of the benefits and costs associated with aggregations.

Cube Editor

Analysis Manager's Cube Editor enables you to control what dimensions and levels are considered for aggregation by setting a dimension's Aggregation Usage property. These settings do not directly determine aggregation designs; rather, they alter the behavior of the aggregation design wizards and tools. Values for the Aggregation Usage property are the following:

  • Top Level Only ensures that only one level, typically the All level (which contains only one member), is considered by the aggregation design wizards. Including only this single level reduces the exponential growth in the number and size of aggregations. For example, if you have 15 dimensions in a cube and you set 8 of the dimensions to Top Level Only, the effective complexity of the cube is reduced to 7 dimensions. If the top level is included in an aggregation, it reduces the size of the aggregation and the time required to calculate it. To avoid the use of flexible aggregations (see "Types of Aggregations" later in this section), Top Level Only is the default setting for virtual dimensions.

  • Bottom Level Only ensures that only the bottom level is considered by the aggregation design wizards, which also reduces cube complexity. Aggregations containing the bottom level will be larger than those that use the top level only, and can provide better or worse overall query performance depending on the query mix. Bottom Level Only can be a good idea for small dimensions with a single level under the All level. A customer gender dimension is an excellent example of this.

  • Top and Bottom Levels has the combined effects of choosing Top Level Only and Bottom Level Only. The effective complexity of the cube is reduced if there are levels between the top and bottom levels. To avoid the use of flexible aggregations (see "Types of Aggregations" later in this section), Top and Bottom Levels is the default setting for changing dimensions.

  • Standard permits the aggregation design wizards to evaluate all levels in considering potential aggregations to include in the cube.

  • Custom allows you to choose the levels to be considered for inclusion in aggregations. Individual levels are then included or excluded by setting the Enable Aggregations property on the level to Yes or No. Two cubes that use the same shared dimension can have different levels enabled for aggregation in each cube.

Aggregation Design Considerations and Guidelines

Once administrators understand that aggregations improve query performance, their initial tendency may be to generate excessive aggregations in an attempt to ensure that subtotals exist to cover the greatest number of potential queries. This tendency is particularly tempting because the Storage Design Wizard enables you to design aggregations based on an estimated percentage gain in performance. If one aggregation design achieves a 20 percent performance gain, it is an easy assumption to make that an aggregation design that achieves a 40 percent gain must be better, and that an aggregation design that achieves a 60 percent gain must be even better.

While having more aggregations increases overall query responsiveness, more aggregations require more time to process and more space to store. If you have a large or complex cube, the generation of excessive aggregations requires a tremendous amount of processor resources, generates substantial disk I/O, and consumes a substantial amount of disk space. Even with a small to medium sized cube, generating excessive aggregations substantially increases processing time with little gain in query responsiveness. While aggregations are being calculated, query responsiveness decreases due to competition for hardware resources (memory, disk I/O, and processor).

You should design only aggregations that are likely to be used; designing excessive aggregations frequently causes Analysis Services to generate aggregations that are rarely or never used. For more information about the impact of the design of excessive aggregations on processing, see "Avoid Designing Excessive Aggregations" later in this paper.

Before discussing how to design the appropriate set of aggregations, you need to understand the impact of your cube design on the number of possible aggregations and the number and size of records within an aggregation. Each of these factors affects your aggregation design and the time required to calculate the aggregations.

Number of Aggregations

The theoretical maximum number of possible aggregations in a cube is the product of the number of levels in each cube dimension. As you add levels and dimensions to a cube, the number of possible aggregations increases exponentially. The higher the number of dimensions and levels in a cube, the greater its complexity. In the example in Figure 2, the Time dimension has four levels, the Customers dimension has five levels, and the Products dimension has five levels. This yields a theoretical maximum number of aggregations of 100 (5 customer levels x 5 product levels x 4 time levels). However, this number increases exponentially as you add dimensions or levels. For example, if you add the Day level to the Time dimension, the theoretical maximum number of aggregations increases to 125 (5 x 5 x 5). Now, suppose you add two more dimensions to this cube, each with three levels. The theoretical maximum number of aggregations increases to 1125 (5 x 5 x 5 x 3 x 3). A cube with nine dimensions containing five levels each yields a theoretical maximum number of aggregations of 1,953,125. A cube of this complexity is considered a cube of medium complexity. A cube of high complexity might have 20 dimensions with five levels each and yield a theoretical maximum number of aggregations of approximately 95 trillion. As you can see, you can directly affect the theoretical maximum number of aggregations in a cube by changing the number of dimensions or the number of levels. Having multiple dimensions with deep hierarchies improves the ability of users to perform analysis, but having too many of either can lead to resource problems during querying and processing.
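
The arithmetic behind these figures is easy to reproduce; the short Python sketch below computes the theoretical maximum number of aggregations for the cube shapes discussed above.

    from math import prod

    def max_aggregations(levels_per_dimension):
        """Theoretical maximum number of aggregations: the product of the number
        of levels (including the All level) in each cube dimension."""
        return prod(levels_per_dimension)

    print(max_aggregations([5, 5, 4]))            # Customers, Products, Time -> 100
    print(max_aggregations([5, 5, 5]))            # add a Day level to Time -> 125
    print(max_aggregations([5, 5, 5, 3, 3]))      # add two 3-level dimensions -> 1125
    print(max_aggregations([5] * 9))              # medium complexity -> 1,953,125
    print(max_aggregations([5] * 20))             # high complexity -> ~95 trillion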

Because Analysis Services can easily calculate higher-level subtotals at query time by using existing aggregations at lower levels, you do not need to design more than a subset of the total number of possible aggregations to obtain a sufficiently useful set of aggregations. Before discussing how to design the appropriate set of aggregations, you need to understand the amount of space consumed by aggregations, in order to understand the impact cube design decisions have on aggregation design. Large aggregations consume more memory, require more disk space, and generate more disk I/O. For more information on the impact of the number of levels, member properties, and dimensions on performance, see "Minimize Dimensions, Levels, Members, and Member Properties" later in this paper.

Number and Size of Records in an Aggregation

The theoretical maximum number of records in an aggregation is initially determined by the product of the number of members in each dimension level at which the aggregation is being calculated. So, while the number of dimensions and levels determines the theoretical maximum number of aggregations, the number of members at each level in the aggregation determines the theoretical maximum number of records within an aggregation. For example, if the aggregation in Figure 2 contains 3 members at the Country level, 911 members at the Brand level, and 12 members at the Quarter level, the aggregation will contain a maximum of 32,796 records. If, however, the aggregation were at the Country, Brand, and Year levels, the aggregation would only contain a maximum of 8,199 records. As you can see, the levels at which the aggregations are calculated have a major impact on the maximum number of records in an aggregation and in turn the maximum size of the aggregation.
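
The same kind of arithmetic applies to a single aggregation, as the following Python sketch shows for the <Country, Brand, Quarter> and <Country, Brand, Year> aggregations used in the example.

    from math import prod

    # Theoretical maximum records in an aggregation: the product of the member
    # counts at the levels the aggregation is calculated at.
    country_brand_quarter = prod([3, 911, 12])
    country_brand_year = prod([3, 911, 3])

    print(country_brand_quarter)   # 32,796 possible records
    print(country_brand_year)      # 8,199 possible records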

Factors Affecting the Actual Size of an Aggregation

Note: The numbers used in this section are for discussion purposes only. You should not expect to see these sizes on real systems, for the reasons discussed in this section.

The actual size (number of bytes) of each aggregation depends on three factors: the number of records, the number of measures, and the density of the data.

  • Number of records The aggregation for the Sales cube in Figure 1 at the All Customers, All Products, and All Time levels consists of a single record containing the total of the Units Sold measure and the Sales measure at the top level of these dimensions. This record would consume up to 8 bytes (before compression), assuming that each measure is a 4-byte integer. Due to the way Analysis Services compresses data, less than 4 bytes might be used to store a 4-byte integer. The aggregation for the Sales cube in Figure 1 at the Country level in the Customers dimension, the All level in the Products dimension, and the Year level in the Time dimension would consist of nine records and consume up to 72 bytes (assuming the same 4-byte integer size for each measure). However, an aggregation near the bottom of each dimension will be much larger. For example, the aggregation at the Country, Brand, Quarter level, with a theoretical maximum of 32,796 records, could consume up to approximately 256 kilobytes (KB).

  • Number of measures Each additional measure increases the size of an aggregation linearly (not exponentially). If you add three more measures to the example in Figure 2, the aggregation with 32,796 records grows to approximately 640 KB (each measure consuming up to approximately 128 KB). Notice that increasing the number of measures in a cube has a much smaller impact on the size of aggregations than increasing the number of levels or dimensions (its complexity).

  • Density of data Analysis Services does not store a record for aggregations with empty member combinations. For example, in Figure 2 above, if Brand X were not sold in Canada, no record would be stored for this cube cell and no time would be required for these calculations. Aggregations in sparse partitions consume substantially less space than the theoretical maximum and take substantially less time to process than those in dense partitions.

The theoretical storage numbers in this section refer to the space required to store the data only; some additional space is required to store the key information used by Analysis Services to identify each record. In addition, these theoretical numbers do not take into account the compression algorithms used by Analysis Services to reduce the amount of actual space consumed.
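
For illustration, the following Python sketch reproduces the raw data sizes discussed above; as the text notes, it ignores key storage and the compression Analysis Services applies.

    def aggregation_bytes(records, measures, bytes_per_measure=4):
        """Uncompressed data size of an aggregation: records x measures x bytes
        per measure. Ignores key storage and Analysis Services compression."""
        return records * measures * bytes_per_measure

    print(aggregation_bytes(1, 2))              # All/All/All level: 8 bytes
    print(aggregation_bytes(9, 2))              # Country/All/Year: 72 bytes
    print(aggregation_bytes(32796, 2) / 1024)   # Country/Brand/Quarter: ~256 KB
    print(aggregation_bytes(32796, 5) / 1024)   # same levels with 5 measures: ~640 KB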

Two situations — distinct count measures and virtual dimension aggregations — can cause each aggregation to require a significant amount of additional time to generate and additional space to store.

  • Distinct count measures When a cube contains a distinct count measure, the size of each aggregation is multiplied by the number of distinct members of the measure. For example, if you create a distinct count measure on the Sales cube in the FoodMart sample database and design aggregations for a 20 percent performance gain using the Storage Design Wizard, the Sales cube grows from 90 kilobytes (KB) to 13 MB. However, if the data in your cube is not dense, the multiplicative impact on the size of each aggregation is mitigated by the number of empty records. For strategies to minimize the negative performance impact of distinct count measures, see "Use Distinct Count Measures Appropriately" later in this paper.

  • Virtual dimension aggregations By default, only the All level of a virtual dimension is included in aggregations. However, if you change the Aggregation Usage property for a virtual dimension from Top Level Only to Standard (using the Cube Editor in Analysis Manager), aggregations can include the dimension's intermediate and leaf levels. Administrators may choose to design aggregations containing these levels in order to increase the performance of queries against these levels. The impact of enabling aggregations for intermediate and leaf levels in a virtual dimension is the same as adding a standard dimension to a cube.

You can affect the size of aggregations by limiting the dimensions and levels that are included in aggregations. For more information on how to accomplish this, see "Cube Editor" earlier in this paper. You can also affect the size of aggregations by keeping the size of member keys small. For information on choosing the appropriate data type for member keys to reduce their impact on aggregation size, see "Choose Appropriate Data Types and Sizes" later in this paper.

Aggregation Size Limited to 30 Percent of Fact-Level Size

The Analysis Services aggregation design wizards will not design an aggregation whose possible size is over 30 percent of the size of the fact level (this is sometimes called the One-Third Rule). When you have dimensions with a large number of members, this threshold can easily be exceeded at or near the leaf level. The aggregation in Figure 2 has a potential size of 32,796 records (ignoring empty records). As long as the estimate of the number of rows at the fact level is more than 109,320, this aggregation will be considered by the aggregation wizards for creation. However, an aggregation at a lower level, such as the <State, Name, Month> aggregation, would not be considered unless the estimate of the number of records at the fact level is more than approximately 73 million (80 x 7621 x 36 / 0.3). As the number of dimension members increases at deeper levels in a cube, it becomes less likely that an aggregation will contain these lower levels because of the One-Third Rule. The aggregations excluded by the One-Third Rule are those that would be almost as large as the fact level itself and almost as expensive for Analysis Services to use for query resolution as the fact level. As a result, they add little or no value.
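
The following Python sketch shows how the One-Third Rule translates into a minimum estimated fact-level row count for an aggregation to be considered by the design wizards.

    def min_fact_rows_for_aggregation(agg_max_records, threshold=0.3):
        """Smallest estimated fact-level row count at which the aggregation
        design wizards will still consider an aggregation of the given maximum
        size (the One-Third Rule)."""
        return agg_max_records / threshold

    # <Country, Brand, Quarter>: 3 x 911 x 12 = 32,796 possible records
    print(min_fact_rows_for_aggregation(3 * 911 * 12))     # 109,320
    # <State, Name, Month>: 80 x 7,621 x 36 possible records
    print(min_fact_rows_for_aggregation(80 * 7621 * 36))   # ~73 million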

Administrators will sometimes trick the Analysis Services aggregation design wizards by setting the fact-level size higher than the actual row count in a partition in order to force these tools to calculate large aggregations that exceed the One-Third Rule. This approach is not recommended, because it results in excessively large aggregations being calculated and built. The fact-level size should be set according to the actual or expected size of the fact level in production, but not larger. You might consider increasing the row count above the actual count when data is distributed unevenly or when you are working on a proof-of-concept system.

  • Uneven distribution of data

    When the aggregation design wizards determine whether a particular aggregation will be built, these tools assume a random distribution of fact data over the dimension members. If, however, the actual data is distributed unevenly such that the actual facts are concentrated in a small number of records, you can increase the estimated fact table size to artificially alter the application of the One-Third Rule. In this scenario, the actual aggregation sizes will be substantially smaller than the estimates of their size by the aggregation design wizards because many of the records in the aggregation will be empty.

    Using our example in Figure 1, suppose that certain brands were not sold in Canada and the cube contained a large amount of data from Canada. When designing aggregations, the aggregation design wizards assume that the fact level is randomly distributed over all dimension members in the cube and might decide not to create an aggregation containing the Product Name level because the wizard would estimate that this aggregation would violate the One-Third Rule. However, because many records at the fact level are empty, the aggregation would not be as large as estimated. As a result, this aggregation would not actually violate the One-Third Rule if it were built. If you increase the estimated fact level row count, aggregations will be created lower in the dimension hierarchy without violating the One-Third Rule.

    You should use this technique only when you are designing aggregations with the Usage-Based Optimization Wizard, rather than with the Storage Design Wizard. The Usage-Based Optimization Wizard ensures that the additional aggregations that are designed are likely to be used, while the Storage Design Wizard does not. For more information on these tools, see "Understanding the Aggregation Design Tools" earlier in this section.

  • Proof-of-concept system

    When you are in the design phase of a pilot or proof-of-concept (POC) system, you normally work with a subset of the actual data. However, you need to ensure that aggregations are designed to reflect the size of data that the cube will actually contain, not the subset of data used for the pilot. In this case, you should increase the row count to reflect the row count for the actual data that will exist in the cube once the full dataset is loaded. If you fail to increase the row count, the aggregation design wizards will select from an artificially small pool of candidate aggregations because of the impact of the One-Third Rule, and will design aggregations that are artificially high in the hierarchy trees. Because the goal is to have a distribution of aggregations at combinations of levels appropriate to the actual data in the cube, queries against the fully loaded cube will perform worse than they otherwise would, because the aggregations were designed for a smaller dataset. When you correct this row count, the aggregation design wizards will design and calculate larger aggregations using levels that are lower in the hierarchy. As a result, processing will take more time than it would with aggregations designed artificially high in the hierarchy, but overall query performance will improve.

The row count is set initially when each partition in a cube is created. It is not automatically updated. The Fact Table Size property for a cube, which is exposed in Cube Editor, is the initial value for the first partition created. If you have a single-partition cube, the row count for the partition will be set according to the row count for the cube. For a multi-partition cube you will need to use a utility such as the Partition Aggregation utility to set the count for each partition. For more information on setting row counts manually, see "Understanding the Aggregation Design Tools" earlier in this section.

Types of Aggregations

Analysis Services generates both rigid and flexible aggregations for MOLAP and HOLAP partitions, each with different benefits and performance impacts. ROLAP partitions contain only rigid aggregations. (For more information on storage modes, see "Choose MOLAP to Improve Query Responsiveness" later in this section.)

  • Rigid aggregations Rigid aggregations are aggregations that include only levels in a standard dimension, the All level or the leaf level in a changing dimension, or the All level in a virtual dimension. Rigid aggregations are updated when the partition or cube is incrementally processed, but are only completely recalculated when a partition or cube is fully processed or its data is refreshed. By default, while rigid aggregations are being recalculated, Analysis Services resolves queries using the existing aggregations until the new aggregations are completely recalculated (unless the processing of a shared dimension knocks the cube completely offline). You can choose to have the aggregations recalculated as a background process (called "lazy processing") to enable users to see the updated data sooner, but without the benefit of aggregations until they have been recalculated. During incremental processing, existing aggregations are not dropped and recalculated. Rather, aggregations are generated for the new data (using a temporary partition) and then merged with the existing aggregations. While the new aggregations are being generated, Analysis Services continues to use the existing aggregations to resolve queries.

  • Flexible aggregations Flexible aggregations are aggregations that include an intermediate level in a changing dimension or an intermediate or leaf level in a virtual dimension. Flexible aggregations are created only if you enable aggregations at intermediate levels of changing dimensions or enable aggregations for virtual dimension levels. All flexible aggregations in all partitions are dropped and completely recalculated whenever a partition or cube is fully processed, in the same manner as rigid aggregations. In addition, all flexible aggregations in all partitions are dropped whenever you incrementally or fully process a changing dimension or a physical dimension on which a virtual dimension is based, if flexible aggregations exist that are based on that changing or virtual dimension. In each MOLAP partition, these dropped aggregations are recalculated as a background task, unless you choose to retain the old data and aggregations until new aggregations can be calculated. However, in a HOLAP partition, these dropped aggregations are not recalculated until each partition is processed.

    By default, Analysis Services recalculates flexible aggregations as a background task using lazy processing. While the flexible aggregations are being recalculated, users can continue to query the cube, but without the benefit of those aggregations: queries that would benefit from them run slower because Analysis Services resolves these queries by scanning for individual facts at the fact level and then summarizing the data at query time. As the flexible aggregations are recalculated, they become available incrementally on a partition-by-partition basis.

    SP3 provides an option to update changing dimensions in a transaction and enable existing flexible aggregations to be retained on disk while new flexible aggregations are calculated as a background task. This approach retains the stale data with the existing aggregations until the new aggregations are completely recalculated. For more information on changing dimensions and virtual dimensions, see "Use Incremental Processing and Changing Dimensions" and "Use Virtual Dimensions Effectively" later in this paper.

An Aggregation Design Strategy

Given the range of tools and choices available to the administrator, it is not easy to formulate an overall approach for managing aggregation design. This section outlines a general plan for managing the aggregation design over the life of a system.

When you initially deploy a cube, you cannot use the Usage-Based Optimization Wizard because no usage data yet exists. However, because Analysis Services generally resolves user queries faster with some aggregations than with none, you should initially design a limited number of aggregations using the Storage Design Wizard. The number of initial aggregations that you should design depends on the complexity and size of the cube (the fact-level size).

With a small cube, an effective approach is to initially design aggregations to achieve a 20 to 30 percent increase in performance (this should take less than 15 minutes). With a large and complex cube, it will take Analysis Services a long time just to design a small percentage of the possible aggregations. For example, in a complex cube with twenty dimensions containing five levels each, it will take the Storage Design Wizard approximately 3024 years to consider all 95 trillion possible aggregations, if you assume that the wizard can examine 1000 aggregations per second (which is a very generous estimate). Furthermore, a large number of aggregations will take a long time to actually calculate and will consume a large amount of disk space. An effective approach with large and complex cubes is to initially design aggregations to achieve a small performance increase (less than 10 percent, possibly down to even 1 or 2 percent with very complex cubes) and then allow the Storage Design Wizard to run for no more than 15 minutes. With a medium-complexity cube, design aggregations to achieve a 10 to 20 percent increase in performance and then allow the wizard to run for no more than 15 minutes. While it is difficult to define what constitutes a high-complexity cube versus a medium-complexity cube, consider this general guideline: a high-complexity cube is a cube with more than 15 multi-level dimensions.

Before you create these initial aggregations with the Storage Design Wizard, you should disable aggregations (by setting their Aggregation Usage property to Top Level Only using Cube Editor) for dimensions you anticipate users will rarely query. This minimizes the aggregations that are calculated for dimensions that you know are used rarely and increases the probability that the aggregations that are designed and created will be useful. While this technique is important for data stored using any storage mode, this technique is especially important for data stored in a ROLAP partition, because aggregations stored using ROLAP require significantly more disk storage than aggregations stored using MOLAP or HOLAP. For more information on aggregations and storage mode, see "Choose MOLAP to Improve Query Responsiveness" later in this section.
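
If you script your aggregation design work, you can make the same change through the DSO API. The following sketch sets a cube dimension's Aggregation Usage property to Top Level Only; the object names are placeholders, and the aggUsageTopOnly constant name should be verified against the DimensionAggUsageTypes enumeration in your DSO type library. It assumes pywin32 and the Analysis Services client components are installed.

    # Hedged sketch: setting a cube dimension's Aggregation Usage property to
    # Top Level Only through DSO. Object names are placeholders; verify the
    # constant name against the DimensionAggUsageTypes enumeration in your DSO
    # type library. Requires pywin32 and the Analysis Services client components.
    import win32com.client

    dso = win32com.client.gencache.EnsureDispatch("DSO.Server")
    dso.Connect("LocalHost")

    cube = dso.MDStores.Item("FoodMart 2000").MDStores.Item("Sales")   # placeholders
    rarely_queried = cube.Dimensions.Item("Promotions")                # placeholder dimension

    # Only the top (All) level of this dimension will be considered for aggregations.
    rarely_queried.AggregationUsage = win32com.client.constants.aggUsageTopOnly
    cube.Update()

    dso.CloseServer()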

After users have queried the cube for a sufficient period of time to gather useful query pattern data in the query log (perhaps a week or two), use the Usage-Based Optimization Wizard to perform a usage-based analysis for designing additional aggregations that would be useful based on actual user query patterns. You can then process the partition to create the new set of aggregations. Remember that the Usage-Based Optimization Wizard only works on one partition at a time. You will have to either perform this same process on each partition, or use the Partition Aggregation utility to copy the aggregation design to additional partitions.

As usage patterns change, use the Usage-Based Optimization Wizard, Partition Manager, or the Partition Aggregation utility to add aggregations (but avoid adding too many). After you have added aggregations a considerable number of times, consider replacing the current aggregation design with a new design containing a smaller total number of aggregations based on current query patterns. Over time, the incremental addition of new aggregations to your initial design may cause processing times to exceed the processing window (nightly or weekly), while a new design based on current usage patterns may be just as effective for query responsiveness.

If you are using partitions, you can create different aggregation designs for different partitions. For more information on the use of partitions, see "Use Multiple Partitions to Improve Query Responsiveness" later in this section, and "Use Multiple Partitions to Improve Processing Performance" later in this paper.

In a cube with multiple partitions based on time, new data should flow into the single partition corresponding to the most recent time period (for example, the latest month). Generally, that is also the partition queried most frequently. A common aggregation strategy in this case is to apply usage-based optimization to the most recent partition, leaving older, less frequently queried partitions as they are. The newest aggregation design can also be copied to a "base" partition. This base partition holds no data; it serves only to hold the current aggregation design. When it is time to add a new partition (for example, at the start of a new month), the base partition can be cloned to a new partition. When the slice is set on the new partition, it is ready to take data as the current partition. Following an initial full process, the current partition can be incrementally updated for the remainder of the period. See also "Incremental Partition Processing" later in this paper.

Choose MOLAP to Improve Query Responsiveness

The Analysis Services storage mode you choose for each partition in your cube is another factor determining the speed with which Analysis Services resolves user queries. You can store each partition in your cube using one of three storage modes: multidimensional OLAP (MOLAP), hybrid OLAP (HOLAP), or relational OLAP (ROLAP).

Multidimensional OLAP (MOLAP)

MOLAP partitions store aggregations and a copy of the source data (fact and dimension data) in a multidimensional structure on the Analysis server. All partitions are stored on the Analysis server. Analysis Services responds to queries faster with MOLAP than with any other storage mode for the following reasons:

  • Compression. Analysis Services compresses the source data and its aggregations to approximately 20 percent of the size of the same data stored in a relational database. The actual compression ratio varies based on a variety of factors, such as the number of duplicate keys and bit encoding algorithms. This reduction in storage size enables Analysis Services to resolve a query against fact-level data or aggregations stored in a MOLAP structure much faster than against data and aggregations stored in a relational structure because the size of the physical data being retrieved from the hard disk is smaller.

  • Multidimensional data structures. Analysis Services uses native multidimensional data structures to quickly find the fact data, at either the fact level or at higher aggregation levels. With ROLAP and HOLAP partitions, Analysis Services relies on the relational engine to perform potentially large table joins against fact-level data stored in the relational database to resolve some or all queries. Large table joins against relational structures take longer to resolve than similar queries against the MOLAP structures. For information on reducing the number of these joins, see "Eliminate Joins Between Fact and Dimension Tables" later in this paper.

  • Data in a single service. MOLAP partitions are generally stored on a single Analysis server, with the relational database frequently stored on a server separate from the Analysis server. When the relational database is stored on a separate server and partitions are stored using ROLAP or HOLAP, Analysis Services must query across the network whenever it needs to access the relational tables to resolve a query. The impact of querying across the network depends on the performance characteristics of the network itself. Even when the relational database is placed on the same server as Analysis Services, inter-process calls and the associated context switching are required to retrieve relational data. With a MOLAP partition, calls to the relational database, whether local or over the network, do not occur during querying. For more information on placement, see "Place the Relational Database on the Analysis Server" later in this paper.

Hybrid OLAP (HOLAP)

HOLAP partitions store aggregations in a multidimensional structure on the Analysis server, but leave fact-level data in the original relational database. As a result, whenever Analysis Services needs to resolve a query against fact-level data stored in a HOLAP partition, Analysis Services must query the relational database directly rather than querying a multidimensional structure stored on the Analysis server. Furthermore, Analysis Services must rely on the relational engine to execute these queries. Querying the relational database is slower than querying a MOLAP partition because of the large table joins generally required.

Many administrators choose HOLAP because HOLAP appears to require less total storage space while yielding excellent query performance for many queries. However, these apparent justifications for using the HOLAP storage option are negated by the likelihood of excessive aggregations and additional indexes on relational tables.

  • Excessive aggregations. Query responsiveness with HOLAP partitions relies on the existence of appropriate aggregations so that Analysis Services does not have to resolve queries against the fact table in the relational database. To ensure that a wide range of aggregations exists, administrators sometimes resort to generating excessive aggregations by increasing the performance improvement percentage in the Storage Design Wizard, or artificially increasing the partition row counts (and sometimes both). While these techniques will increase the percentage of queries that Analysis Services can resolve using aggregations, there will always be some queries that can only be resolved against the fact level (remember the One-Third Rule). In addition, generating additional aggregations to improve query responsiveness comes at the cost of significantly longer processing times and increased storage requirements (which also negates the space savings). For more information about how the storage mode affects partition processing, see "Choose MOLAP to Improve Processing Performance" later in this paper.

  • Additional indexes on relational tables. To ensure that the relational engine can quickly resolve queries that Analysis Services must resolve against the fact table in the relational database, administrators frequently add appropriate indexes to the fact and dimension tables. These additional indexes frequently require more space than MOLAP requires to store the entire cube. The addition of these indexes negates the apparent savings in disk space that is sometimes used to justify HOLAP. In addition, maintaining the indexes on the relational tables slows the relational engine when adding new data to the relational tables. For more information on indexes on the relational database, see "Tune the Relational Database" later in this paper.

Relational OLAP (ROLAP)

ROLAP partitions store aggregations in the same relational database that stores the fact-level data. By default, ROLAP partitions store dimensions in MOLAP on the Analysis server, although the dimensions can also be stored using ROLAP in the relational database (for very large dimensions). Analysis Services must rely on the relational engine to resolve all queries against the relational tables, which store both the fact-level data and the aggregations. The sheer number of queries with large table joins in large or complex cubes frequently overwhelms the relational engine.

Before the release of SQL Server 2000 (64-bit), ROLAP dimensions were required for cubes with very large dimensions (ROLAP dimensions can only be used with ROLAP partitions). Very large dimensions could not be stored in MOLAP because they were too large to be loaded into memory (MOLAP dimensions are always loaded into memory). With the 32-bit version of SQL Server 2000, very large dimensions may still require ROLAP dimensions, and therefore ROLAP partitions. For more information on how Analysis Services uses physical and virtual memory, see "How Analysis Services Uses Memory" later in this paper. With the release of SQL Server 2000 (64-bit), however, Analysis Services can store very large dimensions (in excess of 10,000,000 members across all dimensions) using MOLAP.

If performance is your goal, the only situation in which ROLAP storage should be used is when implementing real-time OLAP. In this case, to minimize the performance cost with ROLAP, consider creating a small real-time ROLAP partition and create all other partitions using MOLAP. Using MOLAP for the majority of the partitions in a real-time OLAP solution allows you to optimize the query responsiveness of Analysis Services for most queries, while obtaining the benefits of real-time OLAP. For more information on using multiple partitions to increase query responsiveness, see "Use Multiple Partitions to Improve Query Responsiveness" later in this section.

With both HOLAP and ROLAP partitions, eliminating some or all of the large table joins increases query responsiveness. For more information, see "Eliminate Joins Between Fact and Dimension Tables" later in this paper.

Use Multiple Partitions to Improve Query Responsiveness

Every cube has at least one partition, and the cube reflects the combined data of all of its partitions. You can divide a cube into multiple partitions if you are using SQL Server 2000 Enterprise Edition (or SQL Server 7.0 Enterprise Edition). During development, you can do the same using SQL Server 2000 Developer Edition. Each partition can have its own fact table, aggregation design, and storage mode. The division of a cube into multiple partitions is transparent to the user. When a query requests data that spans multiple partitions, Analysis Services uses aggregations or scans data in each partition and then combines this information to resolve the query and return the results to the user. You can significantly increase query responsiveness and processing performance by horizontally segmenting the data by one or more keys, such as date or department, and dividing the cube into multiple partitions. This section focuses on improving query responsiveness by using multiple partitions. For information on using partitions to increase processing performance, see "Use Multiple Partitions to Improve Processing Performance" later in this paper.

Understanding the Effect of Storage Mode on Partitions

By default, each Analysis Services cube consists of a single partition, which is stored using MOLAP, ROLAP, or HOLAP. Using multiple partitions is most beneficial when a partition is stored using MOLAP.

When using MOLAP to store a partition, a small number of files on the Analysis server store the fact-level and aggregation data for each partition. The files holding the fact-level data for each partition are smaller than if all of the cube data were stored in a single partition. Because the size of an aggregation is based on the non-empty combinations of members across the levels in the aggregation, the aggregations for each partition may also be smaller. Smaller aggregations require less time to calculate and less time to store.

When using HOLAP to store a partition, a small number of files on the Analysis server store the aggregation data for each partition. Just as with MOLAP, the files holding the aggregations are smaller than if all of the cube aggregations were stored in a single partition. When using ROLAP to store a partition, the aggregations are stored in a greater number of smaller tables than would be used for the aggregations of a single partition.

Regardless of the storage mode, Analysis Services does not perform any partitioning of fact or dimension data in the relational database. Fact table partitioning, which must be initiated by a system administrator using relational database management tools, will increase processing performance for MOLAP and overall performance for HOLAP and ROLAP. For more information on the benefits of fact table partitioning, see "Use Multiple Partitions to Improve Processing Performance" and "Eliminate Joins Between Fact and Dimension Tables" later in this paper.

Analysis Services resolves queries faster against multiple MOLAP and HOLAP partitions because there is less data in each partition to be evaluated. For example, imagine a cube with 36 months of data that is divided into 36 monthly partitions. A query requesting information about the most recent month need only evaluate 1/36th of the data that would otherwise need to be evaluated. This query will perform dramatically faster with partitioning by month than with no partitioning. Partitioning can increase query responsiveness by 100 to 1000 percent, particularly under multi-user loads. For maximum performance, Analysis Services should touch as few partitions as possible when resolving a query. If you are not using MOLAP for all partitions, use MOLAP at least for the most frequently queried partitions to maximize query responsiveness against them.

Define the Data Slice for Each Partition

If the data slice value for a partition is set properly, Analysis Services can quickly eliminate irrelevant partitions from the query processing and significantly reduce the amount of physical I/O and processor time needed for many queries issued against MOLAP and HOLAP partitions. With multiple ROLAP partitions, Analysis Services generates more queries to the relational database, but each query is resolved against a smaller table. For more information on partitioning in the relational database, see Using Partitions in a Microsoft SQL Server 2000 Data Warehouse in the MSDN library.

To enable Analysis Services to take full advantage of partitions, you must define the data slice for each partition in the Partition Wizard of Analysis Manager. The data slice identifies the actual subset of data contained in each partition. The Partition Wizard does not require you to set this data slice when you create a partition. As a result, it is possible to create a partition without setting the data slice.

Caution: Creating a partition without setting the data slice is not a good practice, and can result in considerable overhead being added to the system (artificially increasing response times). Without the data slice, Analysis Services cannot limit a query to the appropriate partitions and must scan each partition even if zero cells will be returned.

The data slice enables Analysis Services to determine which partitions contain data relevant to the query.

To maximize querying performance with partitions, construct partitions with data slices that mirror the data slices required by your users. For example, suppose you are deploying a cube that typically tracks data as a time series (such as a financial cube), and most queries will retrieve data based on time period. You should partition the cube by time period to provide the greatest performance benefit. With very large cubes, partitioning along multiple dimensions (such as time and department) can yield substantial query responsiveness benefits. Remember that each partition can have a different aggregation design.

Decrease Member Counts in a Partitioned Cube

When you divide a cube into multiple partitions, you should modify the member counts to enable the aggregation design wizards to correctly evaluate the cost of each aggregation and to compare the size of each aggregation to the fact-level size of each partition. The member count stored in the MemberCount property of each dimension level reflects the number of members at that level in the entire cube, not the number of members represented at that level in a given partition. The following two examples help explain how to modify the member counts used by the aggregation design wizards.

Suppose the Sales cube in Figure 1 is partitioned by month, with the data slice defined properly for each partition. The member counts for each Time dimension level are All (1), Year (3), Quarter (12), Month (36). The wizards will not design any aggregations in these partitions that include the All, Year, or Quarter levels of the Time dimension because each partition only contains data for a single month, and the aggregation design wizards will not create an aggregation in a partition that includes a level higher than the slice level for the partition. However, when the aggregation design wizards attempt to determine the cost of an aggregation at the month level and compare the size of an aggregation at that level to the fact level size, the member count at the month level is 36, rather than 1.

Next, suppose that a cube comprises a partition for each quarter rather than for each month. In this case, the aggregation design wizards will not design any aggregations at the All or the Year level because each partition only contains data for a single quarter. However, the member counts for the Quarter and Month level are 12 and 36 respectively, rather than 1 and 3. An aggregation at the Month level will only contain three records for Month, not 36 as the aggregation design wizards will estimate based on the member counts. As a result, the cost of this aggregation is overestimated.

In both of these examples, applying the original member counts to the partition aggregation design will result in aggregations being designed at artificially high levels in each partition, because the size estimates used by the aggregation design wizards apply to the cube as a whole, not to the partition. To avoid this problem and to ensure the aggregation design wizards design the appropriate set of aggregations for the data in each partition, you should manually set the member counts for levels based on your partitioning scheme. However, because these member counts are stored in the MemberCount property at each dimension level (not in the partition), if a dimension is shared with other cubes that use a different partitioning scheme, you must set the member counts for the dimension to numbers appropriate for each cube immediately before designing aggregations for that cube.

Warm the Query Results Cache When Possible

Another technique that you can use to increase query responsiveness is to preload the query results cache using a generalized set of representative user queries. This is called cache warming. The query results cache is a portion of memory that Analysis Services uses to store the results of user queries for use in resolving future queries. While the query results cache stores the results of most queries, the query results cache does not cache the results of query time calculations. For more information, see "Use Query Time Calculations Sparingly in Complex Cubes" later in this paper.

When Analysis Services first starts, or after a partition or cube is processed, the query results cache is empty. Until the query results cache is loaded with data from new queries, Analysis Services must resolve user queries using data stored on disk, either by scanning the fact level or by using aggregations. To expedite the process of populating the query results cache, you can create an application that executes a set of generalized queries to simulate typical user activity. For example, if you determine that users are querying by month and by product, you can create a set of queries that request data by product and by month. If you run these queries whenever you start Analysis Services, or whenever you process the cube or one of its partitions, the query results cache is preloaded with the data needed to resolve them before users submit these types of queries. This technique substantially improves Analysis Services response times to user queries that were anticipated by this set of queries.

To determine a set of generalized queries, use the Analysis Services query log to identify the dimension levels typically touched by user queries (a sketch for summarizing the log appears at the end of this section). For example, if you review the query log and determine that users frequently query by month, product name, and country, the following query would pre-fetch that data:

select
   [Time].[Month].members on 0,
   [Product].[Product Name].members on 1,
   [Customer].[Country].members on 2
from Orders

If you only wanted to warm the cache with data from the current year, rather than for all years, you could use the following query, which assumes that a named set [Current Year] tracks the current calendar year:

select
   Descendants([Current Year].Item(0), [Time].[Month]) on 0,
   [Product].[Product Name].members on 1,
   [Customer].[Country].members on 2
from Orders

Once the results of these queries are loaded into the query results cache, they remain there until the cleaner thread removes them or the cache is flushed during partition processing. For information about optimizing the use of memory in Analysis Services and the cleaner thread, see "Provide Sufficient and Appropriately Configured Memory" later in this paper.

You can use an application, such as a Microsoft Excel macro, an Office Web Component Web page, or a script file, to warm the cache whenever you have performed an operation that flushes the query results cache. For example, this application could be executed automatically at the end of the cube processing step. Running this application under an identifiable user name enables you to exclude that user name from the Usage-Based Optimization Wizard's processing and avoid designing aggregations for the queries submitted by the cache warming application.

While cache warming can be a useful technique, cache warming should not be used as a substitute for designing and calculating an appropriate set of aggregations. When you are testing the effectiveness of different cache-warming queries, you should empty the query results cache between each test to ensure the validity of your testing. For information on flushing the query results cache, see Appendix C, "How to Flush the Query Results Cache."
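
The query log can also help you decide which level combinations are worth warming (and which may merit additional aggregations). The following is a minimal sketch, assuming the QueryLog table from the query log (kept by default in an Access database, Msmdqlog.mdb) has been imported into a SQL Server table named QueryLog; the column names shown, such as Dataset and Duration, should be verified against your imported table:

-- A hedged sketch: summarize the imported query log to find the level
-- combinations (encoded in the Dataset column) that users request most often.
-- Table and column names are assumptions; verify them against your own log.
SELECT TOP 20
   MSOLAP_Cube,
   Dataset,
   COUNT(*)      AS QueryCount,
   AVG(Duration) AS AvgDuration
FROM QueryLog
GROUP BY MSOLAP_Cube, Dataset
ORDER BY COUNT(*) DESC

Level combinations near the top of this list are good candidates both for cache-warming queries and for input to the Usage-Based Optimization Wizard.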

Optimize Cell Writeback

If you write-enable a cube, users can change the values of a cube's cells. Analysis Services stores the deltas in a separate writeback table in the relational database. For example, suppose a cell contains a value of 100, and the user changes it to a value of 20. In this case, Analysis Services stores the value of -80 in the writeback table; the original fact-level data is not changed. When a user queries the changed cell, Analysis Services rolls up the changes in the writeback table, combines them with the value from the fact level, and returns the new value of 20.
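
The following Transact-SQL sketch is illustrative only; the actual writeback table is created and named by Analysis Services, and its columns depend on the cube's dimensions and measures, so every table and column name here is a hypothetical stand-in:

-- Hypothetical writeback table; Analysis Services creates and names the real
-- one, and its columns vary with the cube's dimensions and measures.
CREATE TABLE WriteTable_Sales (
   time_id      int,
   product_id   int,
   store_id     int,
   sales_delta  money,
   audit_user   nvarchar(128),
   audit_time   datetime
)

-- The cube cell originally shows 100 and the user writes back 20, so only
-- the difference (-80) is recorded:
INSERT INTO WriteTable_Sales
   (time_id, product_id, store_id, sales_delta, audit_user, audit_time)
VALUES
   (20030115, 1234, 17, -80.00, N'DOMAIN\analyst1', GETDATE())

-- At query time, the delta is rolled up with the fact value: 100 + (-80) = 20.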

Users can update cell values for any cell, not only leaf-level cells. When a higher-level value is updated, the client application uses the UPDATE CUBE statement to distribute the changes for each cell value to leaf-level entries in the writeback table. Users see the net effect of all changes in the writeback table when browsing the cube. To accomplish this, Analysis Services merges this data with fact-level data and existing aggregations at query time. Query performance with a write-enabled cube will therefore be lower than with a cube that is not write-enabled. Actual performance depends on the extent of the writeback changes and the actual query. Concurrent writeback operations also incur a performance penalty.

To increase the performance of write-enabled cubes, consider taking the following steps:

  • Store the writeback table in SQL Server. SP3 introduced significant performance increases for concurrent data writebacks that are stored in SQL Server and accessed through the Microsoft OLE DB Provider for SQL Server. When storing the writeback table in SQL Server 2000 with SP3, set the database that holds the writeback table to the simple recovery model (see the sketch following this list) and specify the SSFastLoadOptions registry key with one of the following string values:

    • Empty string value enables the use of IRowsetFastLoad, the SQL Server bulk insert feature.

    • TABLOCK enables the use of IRowsetFastLoad and enables a table lock, which maximizes the performance of bulk inserts.

    • TABLOCK, FIRE_TRIGGERS enables the use of IRowsetFastLoad, enables a table lock, and enables triggers to fire. For more information on the FIRE_TRIGGERS option, see Using BCP and BULK INSERT in the MSDN library.

The SSFastLoadOptions registry key is located at HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\OLAP Server\CurrentVersion and is disabled by default. If you change this default to one of the above values, the value is passed to the SSPROP_FASTLOADOPTIONS OLEDB property. For more information, see IRowsetFastLoadRowsets in Appendix A, "Resources."

  • Convert writeback tables to partitions. Because the writeback tables have no aggregations until they are converted into a partition, querying write-enabled cubes becomes slower over time as the amount of data in the writeback tables increases. Periodically convert the writeback tables to partitions to increase query performance. For more information, see "Converting Writeback Data to a Partition" in SQL Server Books Online.

  • Submit changes in a single transaction. Because the update of a cell value on a client flushes the query results cache on both the server and the clients, you can increase overall query responsiveness by accumulating multiple changes in a single transaction.
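
For the first recommendation in the list above, setting the database that holds the writeback table to the simple recovery model is a one-line Transact-SQL change. A minimal sketch follows; the database name WritebackDB is hypothetical:

-- Hypothetical database name; substitute the database that contains the
-- writeback table.
ALTER DATABASE WritebackDB SET RECOVERY SIMPLE

-- Verify the current recovery model.
SELECT DATABASEPROPERTYEX('WritebackDB', 'Recovery') AS RecoveryModel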

Avoid Using Parent-Child Dimensions With a ROLAP Partition

When you use a relational engine other than SQL Server to store a ROLAP partition, and the cube contains a parent-child dimension, Analysis Services generates a SQL statement containing an IN clause with many expressions. This SQL statement can become so big that it generates an out-of-memory error. If the ROLAP partition is stored in SQL Server, Analysis Services instead generates temporary tables to improve performance. For best performance, store all partitions using MOLAP when using parent-child dimensions.

Optimizing Analysis Services to Improve Processing Performance

Processing a cube updates the entire cube with new data from the underlying relational database, sorts the data for multidimensional access, constructs multidimensional mapping files to speed data retrieval, and creates aggregations of the fact data. Each dimension and partition in a cube can also be processed separately, which updates only the particular dimension or partition.

There are many factors that directly affect the time required to process a cube or partition. These include:

  • Aggregation design

  • Storage mode of the partitions

  • Use of multiple partitions

  • Number of joins used to retrieve data from the relational database

  • Use of incremental updates

  • Performance of the relational database

In addition to these factors, which are discussed below, the design of the data model itself (such as the use of distinct count measures), the availability and configuration of hardware resources on the Analysis server for processing, and the architecture of the Analysis Services solution also affect processing performance. These effects are discussed in separate sections of this paper.

Avoid Designing Excessive Aggregations

Designing excessive aggregations for one or more partitions in your cube is one of the common causes of poor processing performance. More aggregations take longer to process than fewer aggregations. Excessive aggregations can take an extraordinarily long time to process and may require the use of temporary files (which are slow and I/O intensive). Long processing time for full processing or cube refreshing affects cube availability, and long processing time for incremental processing competes with queries for hardware resources. You can use different aggregation designs on different partitions to increase the number of aggregations on frequently queried partitions and decrease the number of aggregations on infrequently queried partitions. For more information, see "Use Multiple Partitions to Improve Processing Performance" later in this section.

To minimize the length of time required to calculate aggregations during processing, only design aggregations that are actually used to resolve queries. For information on designing effective aggregations and avoiding the processing performance penalty associated with excessive aggregations, see "Design Effective Aggregations" earlier in this paper.

Choose MOLAP to Improve Processing Performance

Analysis Services can store data, create multidimensional mapping files, and calculate aggregations faster than a relational engine can create indexes and calculate aggregations. The primary reason the relational engine is slower is due to the large table joins that the relational engine must perform during the processing of a ROLAP partition. In addition, because the relational engine performs the actual processing tasks, competing demands for resources on the computer hosting the relational tables can negatively affect processing performance for a ROLAP partition.

There is no significant difference in processing performance between MOLAP partitions and HOLAP partitions. In both cases, all fact-level data is read from the relational database, and aggregations are calculated. With MOLAP, Analysis Services writes the fact-level data into the MOLAP structure. With HOLAP, Analysis Services does not store fact-level data. This difference has minimal impact on processing performance, but can have a significant impact on query performance. Because HOLAP and MOLAP processing speeds are approximately the same and MOLAP query performance is superior, MOLAP is the optimum storage choice.

If you choose to deploy a real-time ROLAP partition, you should make the ROLAP partition as small as possible and use MOLAP for all other partitions in the cube. This configuration will maximize query responsiveness and minimize processing time.

Use Multiple Partitions to Improve Processing Performance

You can significantly reduce the overall processing time for medium and large cubes by dividing the cube into multiple partitions. Using multiple partitions reduces total processing time for the following reasons:

  • Multiple partitions can be processed in parallel. You can decrease the total time required to process a cube by processing multiple partitions in parallel (provided you have sufficient processor and memory resources). By default, Analysis Services processes each partition in a cube serially. Parallel processing results in dramatic performance benefits with large cubes during the initial load of the data warehouse, during full cube processing, and during cube refreshes, provided that the Analysis server has sufficient memory to process several partitions in parallel. If the Analysis server has insufficient memory to store the aggregations for each partition being processed, Analysis Services uses temporary files, which negates the performance benefit you are trying to achieve through the use of parallel processing. For more information about configuring memory, see "Configure Available Memory Effectively" later in this paper. For more information about sufficient processor resources to support parallel processing, see "Use Available Processor Resources Effectively," later in this paper. To process partitions in parallel, you must use a tool that calls the Decision Support Objects (DSO) interface. For an example, see the Parallel Processing sample application in the SQL Server 2000 Resource Kit.

  • Only some partitions will need to be processed. You can process only the partitions that have been updated with new data. This enables you to decrease the overall time required to update a cube with new data. For example, if you partition your cube by time, you merely need to process the partitions containing data from the most recent time period, rather than processing the entire cube. Analysis Services can process one or several small partitions more quickly than it can process a single large partition containing all of the data for the cube.

  • Different partitions can have different aggregation designs. You can design additional aggregations on heavily queried partitions and design fewer aggregations on less heavily queried partitions. If you keep the number of the partitions with a large number of aggregations small, and design fewer aggregations for those partitions that are queried less frequently, you can reduce overall processing time. However, if you plan to merge partitions at a later date, you must ensure that the partitions being merged have the same aggregation design at the time of the merger.

  • Different partitions can have different storage modes. You can use different storage modes for particular purposes. For example, you can create a small partition using ROLAP to enable you to implement real-time OLAP, and then use MOLAP for all other partitions to maximize query responsiveness.

  • Partitions can be refreshed individually. You can refresh a partition more quickly than an entire cube, thereby consuming fewer resources and affecting fewer users. When a partition is incrementally updated, a temporary partition is created and then merged into the existing partition. This can result in data fragmentation, which is similar to disk or index fragmentation. As a result, you can occasionally do a refresh of a partition to enable Analysis Services to re-sort the data for faster multidimensional access, create better multidimensional mapping files, and make smaller aggregations.

In addition to saving overall processing time, partitioning can also significantly improve query responsiveness. For more information, see "Use Multiple Partitions to Improve Query Responsiveness" later in this paper.

Eliminate Joins Between Fact and Dimension Tables

When Analysis Services processes a MOLAP or HOLAP partition, it issues SQL queries to the relational engine to retrieve fact and dimension data from the underlying relational tables. If the partition is stored using MOLAP, Analysis Services organizes and saves the retrieved data into multidimensional data files and then calculates and stores aggregations on the Analysis server. If the partition is stored using HOLAP, the retrieved data is used only for calculating and then storing aggregations on the Analysis server. When Analysis Services processes a ROLAP partition, it issues SQL queries to the relational engine that cause the relational engine itself to calculate and then store the aggregations in the relational database. With a HOLAP or ROLAP partition, Analysis Services also issues queries to the relational engine to resolve some or all user queries.

Regardless of the storage mode, Analysis Services constructs the queries that retrieve fact and dimension data from the relational tables by using the dimension member keys identified in the Member Key Column property for each dimension. By default, the dimension member key defined in the Member Key Column property is a column in the dimension table for each dimension in the cube. As a result, in a complex cube with many dimensions, the query submitted against the relational tables for each partition can be a very large, multi-table join. These types of complex joins can take a long time to execute.

Analysis Services uses this method of constructing the SQL query to guarantee that it generates the safest possible query, which is not necessarily the most efficient one. Certain prerequisites must be satisfied before you can safely use more efficient queries. For example, Analysis Services constructs the following SELECT statement when processing the first partition of the Warehouse cube in the FoodMart 2000 database:

SELECT
   store.store_id,
   time_by_day.quarter,
   time_by_day.month_of_year,
   warehouse.warehouse_id,
   product.product_id,
   inventory_fact_1997.store_invoice,
   inventory_fact_1997.supply_time,
   inventory_fact_1997.warehouse_cost,
   inventory_fact_1997.warehouse_sales,
   inventory_fact_1997.units_shipped,
   inventory_fact_1997.units_ordered,
   inventory_fact_1997.warehouse_sales-inventory_fact_1997.warehouse_cost
FROM
   inventory_fact_1997,
   time_by_day,
   store,
   warehouse,
   product
WHERE
   (time_by_day.the_year=?) AND                 -- clause added by data slice
   (inventory_fact_1997.time_id=time_by_day.time_id) AND
   (inventory_fact_1997.store_id=store.store_id) AND
   (inventory_fact_1997.warehouse_id=warehouse.warehouse_id) AND
   (inventory_fact_1997.product_id=product.product_id)

This SELECT statement joins four dimension tables to the fact table. With complex cubes containing many more dimensions, the number of joins increases significantly. However, if a dimension member's foreign key in the fact table can be used rather than the dimension member key in the dimension table, much simpler queries can be generated that can speed the retrieval of data from the relational tables. A dimension member's foreign key in the fact table can be used in the processing query if the dimension meets the following criteria:

  • The dimension is shared and has been processed.

  • The member key column for the lowest level of the dimension contains the keys that relate the fact table and the dimension table, and is the only key column necessary to relate the fact table to the dimension table.

  • The keys in the member key column for the lowest level of the dimension are unique within the dimension. (Use a surrogate key to ensure this.)

  • The lowest level of the dimension is represented in the cube (it does not need to be visible).

If these criteria are met, you can use the Optimize Schema option in Analysis Manager to simplify the queries issued against the relational tables and increase their execution speed. In the case of the Warehouse cube in the FoodMart 2000 database, the Store, Product, and Customer dimensions meet these criteria and, as a result, three of the dimension table joins can be eliminated by using the Optimize Schema option. The resulting SELECT statement is:

SELECT
   inventory_fact_1997.store_id,
   time_by_day.quarter,
   time_by_day.month_of_year,
   inventory_fact_1997.warehouse_id,
   inventory_fact_1997.product_id,
   inventory_fact_1997.store_invoice,
   inventory_fact_1997.supply_time,
   inventory_fact_1997.warehouse_cost,
   inventory_fact_1997.warehouse_sales,
   inventory_fact_1997.units_shipped,
   inventory_fact_1997.units_ordered,
   inventory_fact_1997.warehouse_sales-inventory_fact_1997.warehouse_cost
FROM
   inventory_fact_1997,
   time_by_day                                  -- join added by data slice
WHERE
   (time_by_day.the_year=?) AND                 -- clause added by data slice
   (inventory_fact_1997.time_id=time_by_day.time_id)

This query uses only one join on a dimension table and will execute much faster than the original query. The other join is related to the partition slice setting and is discussed below.

When Analysis Services uses an INNER JOIN between the fact table and dimension tables, fact-level rows without a corresponding row in a dimension table are eliminated from the result set (if foreign key constraints are used on the fact table, such rows will never be entered in the first place). When you eliminate the INNER JOIN, Analysis Services reads the entire fact table, not merely those rows represented in the dimension tables. As a consequence, you may discover during the process of attempting to eliminate joins that your fact table contains referential integrity violations that were masked by the INNER JOIN. Best practice dictates that you fix the bad data in the fact table or add an "unknown member" to the dimension tables. You can then map facts without associated dimension members to this "unknown member." Unless you do so, facts without associated dimension members are reported as processing errors rather than disappearing as they did previously. Once you eliminate the referential integrity violations, you can use the Optimize Schema option to significantly increase processing performance, regardless of the storage mode. If you cannot clean all of the dimensions, you can selectively eliminate joins by manually pointing the Member Key Column property to the fact table in Cube Editor for those dimensions that you know are clean.
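
Before eliminating joins, it can help to look for such referential integrity violations directly in the relational database. The following is a minimal sketch against the FoodMart tables used in the example above; the warehouse_name column and the -1 key used for the "unknown member" are illustrative assumptions:

-- Find fact rows whose warehouse_id has no matching row in the warehouse
-- dimension table (rows an INNER JOIN would silently discard).
SELECT f.warehouse_id, COUNT(*) AS orphan_rows
FROM inventory_fact_1997 AS f
LEFT JOIN warehouse AS w ON f.warehouse_id = w.warehouse_id
WHERE w.warehouse_id IS NULL
GROUP BY f.warehouse_id

-- One remedy (shown commented out): add an "unknown member" row to the
-- dimension table and remap the orphaned facts to it. The -1 key and the
-- warehouse_name column are assumptions; adjust them to your schema.
-- INSERT INTO warehouse (warehouse_id, warehouse_name) VALUES (-1, 'Unknown')
-- UPDATE f
-- SET f.warehouse_id = -1
-- FROM inventory_fact_1997 AS f
-- LEFT JOIN warehouse AS w ON f.warehouse_id = w.warehouse_id
-- WHERE w.warehouse_id IS NULL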

There are a couple of additional steps you can take to eliminate joins that cannot be immediately eliminated by the Optimize Schema option. It is helpful to view the query Analysis Services submits to the relational engine when it processes a partition. To do this quickly without actually executing the query, add "1 = 0" to the partition's WHERE clause using the Partition Wizard (on the Advanced Settings tab) and then process the partition. With this new WHERE clause, the query returns from the relational database with no rows. This enables you to view the partition population query in the partition processing window or by capturing it with SQL Profiler (or an analogous tool) on the relational database. When you are finished optimizing the schema, return the partition's WHERE clause to its original value.

Reviewing the SQL query may enable you to identify the reason for any remaining joins in the SQL statement. Two common reasons why joins may remain are discussed below:

  • The lowest level in the dimension is not represented in the fact table or is not marked as unique.

    In the dimension editor for each dimension, make sure that the member key on the lowest level in the dimension is set to the key (generally an integer key) used for the join between the dimension table and the fact table. For some dimensions, you may need to add a level to the dimension and mark this lowest level as MemberKeysUnique = True (the dimension members must be unique). If you add a level solely to eliminate joins, you can hide the newly added dimension level. For example, if your Time dimension uses Month as the lowest level and the fact table has a daily grain, you can add the day level to the Time dimension (this assumes that the underlying relational table actually contains day-level data) and then hide the day level from users. This will enable you to optimize the schema and eliminate the join to the Time dimension table.

  • The join was caused by a partition slice.

    The use of partitions introduces a join clause (see the SELECT statements for the FoodMart 2000 warehouse previously discussed). The join clause comes from the WHERE clause generated to enforce the slice setting. The WHERE clause ensures that only data meeting the slice criteria arrives in the cube partition.

    The WHERE clause is not needed if each partition comes from a separate relational table containing only data for that slice. To suppress the automatic generation of a WHERE clause for all partitions on the Analysis server, modify the DataCompressionSettings registry setting and add the hex value of 0x00100000 to the existing value for this key. This registry change suppresses the generation of the WHERE clause for all partition slices. The setting applies to the entire server and is read at server startup.

    Note: Do not set this option unless you can guarantee that each partition table contains only data for that partition. Remember that the SQL query returns the entire fact table. A common practice is to partition the fact table in the relational database and use a single, separate fact table for each partition. If you eliminate the WHERE clause, which eliminates the join, and you have not properly partitioned the fact table in the relational database, you will receive no warnings. However, you will have corrupted data, including the possibility of double counted or missed rows.

    Caution: Because corrupt data can result if the fact table is not partitioned properly and because this is a server-wide setting, modifying the DataCompressionSettings registry key is not generally recommended.

These steps will eliminate two common reasons for remaining joins. Different designs may yield additional explanations for the remaining joins. Each join has its own reason and you may need to be creative to understand why the join exists and how you can eliminate it. For example, you may have joins required for dimensions that really should be virtual dimensions. If you had a dimension called DayOfWeek, you may be using a join to the Time dimension table to look up its value (Mon, Tues, Wed, Thurs, Fri, Sat or Sun). If you converted this DayOfWeek physical dimension to a virtual dimension, based on a member property at the day level, you could eliminate this join.

Once you have eliminated the reasons for the remaining joins, to the extent possible, use the Optimize Schema option to eliminate additional joins. You must process a newly added dimension before you can use it to eliminate a join. If the MemberKeyColumn property of any of these levels still references a dimension table column, you can manually change this property to include the fact table column that joins to the dimension. When you are finished, remember to return the partition's WHERE clause to its original value.

You do not need to repeat this process for each partition in the cube, because the cube structure is defined at the cube level, each partition is a clone of that structure, and all partition fact tables are identical in structure.

Use Incremental Processing and Changing Dimensions

New data and changes to dimensions require dimensions and partitions to be processed. Some changes require full processing, while some changes can be incorporated using incremental processing. For example, changes to the structure of a standard dimension require Analysis Services to fully process the dimension to incorporate structural changes. When a standard dimension is fully processed, no cube that uses the newly processed dimension can be queried until that cube is also fully processed. As a result, a structural change to one dimension can have a ripple effect on multiple cubes, making all of them unable to be queried ("knocking them offline") until the cubes have been fully reprocessed. Given these dependencies, full dimension processing of standard dimensions should be avoided whenever possible. Defining a dimension as "changing" can minimize the frequency with which the dimension, and the cubes that use it, needs to be fully processed when relocating members in the hierarchy.

Incremental Processing

Analysis Services can add new data and incorporate changes that do not affect the structure of a standard dimension by using incremental processing. Examples of changes that can be incorporated incrementally include adding new members, changing member properties, and changing custom member options.

Incremental Dimension Processing

When a standard dimension is incrementally processed, Analysis Services adds the new dimension information but does not modify cube data or aggregations. While incremental processing of a dimension is occurring, users can continue to query the cube using the existing dimension information until the new dimension information is available. After the new dimension information has been incrementally processed, new fact data can be incrementally processed.

Incremental Partition Processing

When a partition is incrementally processed, Analysis Services adds the new fact-level data if the partition is stored using MOLAP and updates the aggregations for a partition stored using either MOLAP or HOLAP. Analysis Services does this by creating a temporary partition, processing the data into it if the partition is stored using MOLAP, and computing aggregates on the new data if the partition is stored using either MOLAP or HOLAP. Analysis Services then merges the two partitions together. Remember that cube processing is merely the serial processing of all partitions in the cube in a single transaction.

Incremental processing of partitions improves performance by processing only the data that is new, which generally takes much less time than processing the entire partition. Incremental processing also ensures that a partition always remains available for user queries. While incremental processing of a partition is occurring, users can continue to query the partition based on the existing data in the partition until the new data is merged from the temporary partition.

Because merging appends the new data and aggregations rather than sorting them into the existing data and aggregations, you can improve query performance by periodically performing a full process on a partition that has been incrementally updated numerous times, to re-sort and re-optimize the multidimensional data. Typically this is only important in situations where a partition is incrementally processed frequently. For example, suppose you are managing a system with a rolling 48 months' worth of data, grouped into two-month partitions. Twenty-three of those partitions are historic and one is the current two-month period. The current partition is incrementally processed daily to load a day's worth of data. By the end of the two-month period, it has been incrementally processed approximately 60 times. Before you roll the current partition off to historic status and start a new current partition, you should fully process it so that it is re-sorted and re-optimized. For information on using multiple partitions to enable you to periodically refresh individual partitions and eliminate fragmentation, see "Use Multiple Partitions to Improve Processing Performance" earlier in this section.

Changing Dimensions

If you have a dimension that contains intermediate levels whose members frequently move within a level (such as during a sales territory reorganization), you can mitigate the performance impact of these structural changes by designating the dimension as a changing dimension. These types of changes would otherwise require full processing of the dimension and all cubes that use the dimension. A changing dimension generally does not require full processing if a member below the top level and above the bottom level is added, moved, changed, or deleted; a standard dimension must be fully processed whenever a member is moved, changed, or deleted. After moving, changing, or deleting a dimension member in a changing dimension, the lazy processing background process will drop and then re-create the flexible aggregations.

If the member in a changing dimension that is to be deleted has corresponding fact data, the data must be deleted before the member can be deleted. Deleting data is not a typical operation for an OLAP system and there is no way to remove data from partitions other than by reprocessing and refreshing the data. Typically the way data goes away is that it "expires" or exceeds its useful lifetime. Partitioning is a good way to manage this. For example, suppose you have a cube that is partitioned by month and you want to roll off old periods, such as after twelve months. You can add members for the new period, delete the partition containing the old period, and perform an incremental process to eliminate dimension members that only appear in the old period.

Designating a dimension as a changing dimension can result in uneven query performance. As discussed in "Aggregation Design Considerations" earlier in this paper, when you update any data in a changing dimension, all flexible aggregations are dropped and recalculated as a background task (this is the default). This includes all flexible aggregations on cubes using the dimension. The lazy aggregator calculates these aggregations serially through each partition in the cube. While these aggregations are being recalculated, query performance drops significantly for queries that would have used those aggregations. The recalculated aggregations become available on a partition-by-partition basis.

Remember that Analysis Services does not design flexible aggregations by default. Flexible aggregations exist only when you change the Aggregation Usage property of a changing or virtual dimension to a non-default value.

Service Pack 3 introduced a new option that enables you to mitigate the effect of recalculating flexible aggregations. You can choose to have the flexible aggregations recalculated in one transaction to minimize the performance impact of dimension structure changes. With SP3, you can use the DSO Process method ProcessFullReaggregate to enable this feature on a per-dimension basis or set the DSOCommitMode registry setting to enable this feature on a server-wide basis. When you use this new feature, users can continue to benefit from the existing flexible aggregations until the new flexible aggregations are calculated. The new aggregations are calculated in a shadow folder until all of the aggregations have been calculated; then the new aggregations are made "current" and the old aggregations are deleted in a single transaction. The downside of using this approach is that the old data is visible as "current" until the aggregations have been fully recalculated. In effect, a longer time elapses before the new data is visible to users. In the pre-SP3 mode, which is still the default, while the system runs slower, all of the data is "current" immediately.

Guidelines for Changing Dimensions

Use incremental processing rather than full processing to enable users to continue querying the cube. Use changing dimensions to minimize the frequency with which a dimension needs to be fully processed if dimension changes need to be quickly reflected to users. For example, if you have customers that move between cities and wish to reflect those changes with each daily update, marking the dimension as changing would be appropriate rather than performing a full process daily. On the other hand, if sales territories are rarely changed, it might be better to leave the dimension as a standard dimension and accept that full processing is required for those events. Changing dimensions are particularly useful for large or complex cubes that cannot be easily and quickly processed.

If you decide to employ changing dimensions, consider the following guidelines:

  • Use the default Aggregation Usage property settings.

    Do not change the default Aggregation Usage property settings unless you determine that actual query performance demonstrates the need for aggregations at intermediate levels of a changing dimension or at intermediate or leaf levels in a virtual dimension.

  • Limit the frequency of changes.

    To minimize the performance impact of dimension changes, limit the frequency with which changes are made to a changing dimension, and submit changes in batches to limit the frequency with which flexible aggregations are recalculated. For example, if three new members are added to a changing dimension, the impact on query performance will vary based on whether all three changes were made at one time or at three separate times. In the former case, the flexible aggregations are only recalculated once. However, in the latter case, the flexible aggregations are recalculated three times. Notice that the aggregations were dropped and recalculated despite the fact that only additions were made; the hierarchy was not changed. With a standard dimension, the addition of dimension members does not necessitate the recalculation of aggregations. Aggregations only need to be recalculated when a structural change occurs.

  • Consider using a dedicated Analysis server.

    Since flexible aggregations are recalculated as a background process, using a dedicated Analysis server ensures that the competing demands for processor resources are minimized and that this background process receives sufficient processor cycles to complete in a timely fashion. The Analysis server should have sufficient capacity to perform the background processing without degrading query responsiveness. For more information, see "Place the Relational Database on the Analysis Server" later in this paper.

  • Consider incorporating updates in a transaction.

    If the Analysis server has sufficient memory and uneven query performance is a problem, use the new SP3 feature to ensure that flexible aggregations remain available while the new flexible aggregations are being calculated. If insufficient virtual memory address space is a problem, consider upgrading Analysis Services to SQL Server 2000 (64-bit).

Tune the Relational Database

During processing, new data is retrieved from the relational database. You can take a number of steps on the relational side to increase processing performance.

  • Consider placing indexes on fact table keys.

    If you optimized to eliminate joins in the processing query and are using MOLAP, do not place any indexes on the fact table, because the processing query performs a full table scan on the fact table and does not access any of the dimension tables; in this case, fact table indexes provide no benefit. If your relational store is SQL Server and you are not using MOLAP, consider placing a clustered index on the fact table time key and a separate single-column nonclustered index on each of the fact table's foreign keys to improve query responsiveness and processing performance (see the index example at the end of this section). If you are not using SQL Server, the indexing recommendations may differ. In addition, if you have not eliminated the joins in the processing query and are using MOLAP, these indexes may improve processing performance. However, the relational engine incurs a maintenance cost by keeping these indexes current as new data is added to the relational database. If you use indexes, make sure the statistics used by these indexes are kept current.

  • Place an index on the distinct count measure.

    If you are using a distinct count measure, place a clustered index on the fact table column that supplies that measure to improve the performance of the processing query sent to the relational engine by Analysis Services. This index improves processing performance because the processing query uses an ORDER BY clause when retrieving data for cubes with a distinct count measure. The index can be clustered because Analysis Services supports only one distinct count measure per cube.

  • Partition the fact table.

    If you are using multiple MOLAP or HOLAP partitions for your cube, using a separate relational fact table (or partition, if your relational database supports partitioning) for each Analysis Services partition will improve processing performance. If the fact table is physically partitioned, the relational engine does not need to scan extra data when returning rows to Analysis Services. Fact table partitioning may also enable you to eliminate the join added as a result of the partition slice. If you must keep a single logical fact table, consider using a partitioned view rather than a single physical table, as sketched in the example below. A partitioned view still has a physical table per partition, but uses CHECK constraints so that the optimizer scans only the single member table that can contain the requested rows. However, you must ensure that the query optimizer makes the proper decisions when optimizing the join through the partitioned view. If a CHECK constraint is incorrect, or if the optimizer cannot determine which member table contains the rows a query needs, all partitions are scanned and performance drops considerably.
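The following Transact-SQL is a minimal sketch of such a partitioned view. The table names (SalesFact_200301, SalesFact_200302, SalesFactAll) and the simplified schema are hypothetical; adapt the member tables and CHECK constraints to your own partitioning scheme, and verify in the query plan that only one member table is scanned for each partition's processing query.

-- Hypothetical monthly member tables, one per Analysis Services partition.
CREATE TABLE SalesFact_200301 (
    TimeKey     int NOT NULL CHECK (TimeKey BETWEEN 20030101 AND 20030131),
    ProductKey  int NOT NULL,
    CustomerKey int NOT NULL,
    SalesAmount money NOT NULL
)
CREATE TABLE SalesFact_200302 (
    TimeKey     int NOT NULL CHECK (TimeKey BETWEEN 20030201 AND 20030228),
    ProductKey  int NOT NULL,
    CustomerKey int NOT NULL,
    SalesAmount money NOT NULL
)
GO

-- The view unions the member tables; the CHECK constraints allow the optimizer
-- to scan only the member table that can contain the rows a partition requests.
CREATE VIEW SalesFactAll
AS
SELECT TimeKey, ProductKey, CustomerKey, SalesAmount FROM SalesFact_200301
UNION ALL
SELECT TimeKey, ProductKey, CustomerKey, SalesAmount FROM SalesFact_200302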

In addition, use a native OLE DB driver to access your relational database, if one is available, to maximize performance.
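The indexing recommendations in the list above can be sketched in Transact-SQL as follows. The table and column names (SalesFact, TimeKey, ProductKey, CustomerKey) are hypothetical, and, as noted in the first bullet, these indexes provide no benefit if the processing query performs only a full table scan of the fact table.

-- Hypothetical fact table: SalesFact (TimeKey, ProductKey, CustomerKey, SalesAmount, ...).
-- A clustered index on the time key supports the time-based processing query.
CREATE CLUSTERED INDEX IX_SalesFact_TimeKey ON SalesFact (TimeKey)

-- Separate single-column nonclustered indexes on the remaining foreign keys.
CREATE NONCLUSTERED INDEX IX_SalesFact_ProductKey ON SalesFact (ProductKey)
CREATE NONCLUSTERED INDEX IX_SalesFact_CustomerKey ON SalesFact (CustomerKey)

-- A table can have only one clustered index, so if the cube defines a distinct
-- count measure (for example, on CustomerKey), cluster on that column instead:
-- CREATE CLUSTERED INDEX IX_SalesFact_CustomerKey ON SalesFact (CustomerKey)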

Optimizing the Data Model to Improve Performance

When designing the data model for your cube, your decisions will have a significant impact on both query responsiveness and processing performance. While the impact of a design decision on performance should not dictate the design of your data model, taking into account the performance implications of each design decision will help you improve the query responsiveness and processing performance of Analysis Services. This section delineates the design decisions that have the most significant impact on performance.

Minimize Dimensions, Levels, Members, and Member Properties

Avoid unneeded dimensions, levels, members, and member properties because they have a cumulative negative impact on processing performance. Each additional level or dimension increases the size and number of aggregations and decreases processing performance.

There are also several memory considerations when you use member properties. To increase query responsiveness, Analysis Services stores a copy of each member, level, and member property in each dimension in memory. In addition, each client stores copies in memory of the members, levels, and member properties needed to resolve queries submitted by the client.

Analysis Services stores member properties as Unicode strings, which means that they require two bytes per character. However, duplicate string values are only stored once, and then 4-byte pointers link the members to the string value.

Member properties consume more memory the lower they are defined in the hierarchy. Therefore, you should place a member property at the highest level in the hierarchy that makes sense. For example, if you have a sales region level and a ZIP code level, define the member properties for the region at the sales region level rather than repeating the property on all ZIP codes in the sales region.

Consider using an action rather than a member property to retrieve data associated with a member, especially if the data is large (such as a picture or a long description). Actions allow users to look up data at the point when it is needed, rather than always loading the member property into memory.

Consider also why you have a dimension or level in the first place. For example, imagine that you have a very large customer dimension, greater than one million members (or greater than ten million members on a 64-bit system). If you are more interested in the demographics of your customers than in the customers themselves, you could replace the customer dimension with one or more dimensions that represent all of the possible combinations of demographics (these are sometimes called mini-dimensions). This technique works only if you have a small number of demographics (because the number of combinations grows exponentially), and it requires you to map the customer key in the fact table to the matching demographic member for that customer. However, it can greatly reduce the size of a large dimension.
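As a rough sketch of this technique, the following Transact-SQL builds a demographics mini-dimension from an existing customer table and maps each fact row to it. The table and column names (CustomerDim, SalesFact, Gender, AgeBand, IncomeBand, DemographicsKey) are hypothetical, and the fact table is assumed to already have a DemographicsKey column.

-- Hypothetical mini-dimension holding the combinations of a few demographics.
CREATE TABLE CustomerDemographics (
    DemographicsKey int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Gender          nvarchar(10) NOT NULL,
    AgeBand         nvarchar(10) NOT NULL,
    IncomeBand      nvarchar(20) NOT NULL
)

-- Populate it with the distinct combinations that actually occur.
INSERT INTO CustomerDemographics (Gender, AgeBand, IncomeBand)
SELECT DISTINCT Gender, AgeBand, IncomeBand
FROM CustomerDim

-- Map each fact row to the matching demographics member
-- (assumes SalesFact already has a DemographicsKey column).
UPDATE f
SET DemographicsKey = d.DemographicsKey
FROM SalesFact AS f
JOIN CustomerDim AS c ON c.CustomerKey = f.CustomerKey
JOIN CustomerDemographics AS d
    ON d.Gender = c.Gender AND d.AgeBand = c.AgeBand AND d.IncomeBand = c.IncomeBand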

If the only reason you have a level in a dimension is to support a member property, consider creating a separate physical dimension instead. For example, if you define gender as a member property in a large dimension, you will store a 4-byte pointer to this string value for every member in the dimension, plus approximately 125 bytes for each of the members. Creating a separate physical dimension that contains only a single level will save a significant amount of memory that Analysis Services can use for other tasks (such as the query results cache or for processing).

Choose Appropriate Data Types and Sizes

Selecting the appropriate data type and size can significantly affect query responsiveness, processing performance, cube size, network transfer times, and client memory usage. Use the following guidelines when choosing data types and sizes:

  • Use integer key values rather than string values for member keys. Numeric data types, such as a 32-bit long integer, perform much better than strings, especially if the string used for the member key is different from the member name. Key lookup routines can be 10 to 100 times slower for characters than for integers, so use integers rather than strings whenever possible for best performance. While you can use the "natural" or "production" key from the production database if it is an integer, it is considered a best practice to assign an integer surrogate key to each dimension table in the relational data warehouse (see the example following this list). If a string is necessary, use a string with a data length as small as possible.

  • Use small cube measures. Use the smallest possible data type to represent a cube measure, but make sure the data type is large enough to contain the summarized values when aggregations are calculated. Because fact data represents the vast majority of data in a data warehouse, using a data type larger than necessary will have a negative impact on the overall size of a cube.

  • Use small member keys. A 4-byte numeric, such as a long integer, performs better than an 8-byte numeric.

  • Avoid data coercion. Do not create a member key with a data type different from that of the value actually stored in the underlying dimension table. Coercion of data types reduces processing performance.

  • Avoid creating member properties for data requiring a large data type. Member properties are strings by default, which are stored as Unicode data. Creating a large binary column in a dimension as a member property, such as a binary image, will consume a large amount of dimension memory. You can use an action to implement this functionality and consume less memory. An action also limits the network impact by loading the image only when a user requests it.
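For example, a dimension table in the relational data warehouse might carry an integer surrogate key alongside the natural key from the source system. The following Transact-SQL sketch uses hypothetical names (ProductDim and its columns); the surrogate key becomes the member key in the dimension.

CREATE TABLE ProductDim (
    ProductKey   int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- integer surrogate key used as the member key
    ProductCode  nvarchar(20) NOT NULL,                   -- natural key from the production system
    ProductName  nvarchar(50) NOT NULL,
    CategoryName nvarchar(50) NOT NULL
)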

Use Unique Member Keys Whenever Possible

Use unique member keys and names whenever possible. Unique member keys allow the Analysis server to use the identifier supplied by the underlying relational database, and they permit faster addressing when MDX queries are run. With unique member keys, joins to the dimension tables in the relational database can also be eliminated during processing.

Use Virtual Dimensions Effectively

A virtual dimension is created from member properties of a physical dimension. Virtual dimensions enable you to display categorical labels from a single dimension table on multiple axes in browsers without increasing cube size. However, if you are simply representing a physical dimension with a different hierarchy, you should use a dimension hierarchy rather than a virtual dimension for optimal performance.

While virtual dimensions are fully indexed to improve performance, by default intermediate and leaf levels are not included in aggregations. To improve query responsiveness, you can enable aggregations using these levels by changing the Aggregation Usage property in Cube Editor to Standard or Custom. However, as discussed in "Design Effective Aggregations" earlier in this paper, aggregations on virtual dimensions are flexible aggregations. Because the dropping and recalculating of the flexible aggregations can result in uneven query responsiveness, consider replacing the virtual dimension with a physical dimension or hierarchy, instead of enabling aggregations on virtual dimensions.

Use Virtual Cubes Appropriately

When you need to query data from multiple cubes, there are two ways you can accomplish this. You can either design a virtual cube across multiple physical cubes that share at least one dimension, or you can use the LookupCube MDX function. Using a virtual cube is most appropriate when many values will be obtained from another cube using shared dimension values. For example, suppose that you have a Sales cube and an Inventory cube and you want to determine the current inventory levels for products that sold more than 1000 units in the last week. If you created a virtual cube across these two cubes, the Time and Products dimensions would be common across both cubes. Analysis Services can resolve the query using the standard multidimensional engine. If the LookupCube MDX function were used for this query, Analysis Services would execute a separate query for each value retrieved from the lookup cube. In this scenario, a virtual cube will perform much better than the LookupCube MDX function.

With a virtual cube, each of the dimensions in the underlying physical cubes is included in the virtual cube. Because of this, cube sparsity can degrade query responsiveness under some conditions when very large, dissimilar cubes are used to create a virtual cube. For example, MDX functions involving empty cells, such as the CoalesceEmpty or NonEmptyCrossJoin (NECJ) functions, take slightly longer to process due to the sheer number of empty cells that Analysis Services must negotiate in responding to a query. In this situation, and also when you only need to return a few values from another cube, the LookupCube MDX function can be used instead of a virtual cube and will still yield good query performance.

Use Distinct Count Measures Appropriately

Cubes with distinct count measures take longer to process, for two reasons: because the processing query against the relational database uses an ORDER BY clause, and because distinct counts dramatically increase the size of each aggregation. Large aggregations take longer to calculate because of their size, and because they may require Analysis Services to use temporary files to supplement the process buffer memory. Cubes with distinct count measures are also slower to query because Analysis Services must query each partition serially rather than querying partitions in parallel, which it normally does.

To solve these problems, optimize the relational database for distinct count measures, create a separate physical cube for the distinct measure with the same dimensions as the initial cube, and then use a virtual cube that links these two cubes together to give the appearance to the user of one seamless cube. Remember that you can only have one distinct count measure per physical cube. Analysis Services resolves user queries that do not use the distinct count measure against the initial cube in parallel with queries that require the distinct count measure, using aggregations where appropriate.

You can design the appropriate set of aggregations for the initial cube, without regard to the explosive effect of distinct count measures on aggregation size. You can then design aggregations selectively on the distinct count measure cube. Aggregations on distinct count cubes are significantly larger and slower to compute, and their cost may not justify the gain in query responsiveness. After creating a few aggregations on a distinct count measure cube, you should test query performance to determine how much improvement in query performance the aggregations actually provide.

When using a distinct count measure, avoid nulls in the source data. A null cannot be compared with another null. Also, as a general rule, you should avoid nulls altogether in your data warehouse. Dimensionally, nulls have no meaning — you should have a specific "unknown" member if you want to capture nulls in your dimensions. Nulls are not permitted in dimension foreign keys of fact tables. Nulls as measures at the fact level are converted to zeros, which you probably do not want.
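One common way to keep nulls out of the fact table is to add an explicit "unknown" member to each dimension and map missing keys (and measures) to it during the load. The following Transact-SQL is a minimal sketch with hypothetical table and column names (CustomerDim, SalesFact, StagingSales); it assumes the dimension key column accepts an explicitly supplied value such as -1.

-- Hypothetical "unknown" member that stands in for missing dimension keys.
INSERT INTO CustomerDim (CustomerKey, CustomerName)
VALUES (-1, N'Unknown')

-- During the fact load, replace null foreign keys and null measures explicitly
-- rather than letting them flow into the cube.
INSERT INTO SalesFact (TimeKey, CustomerKey, SalesAmount)
SELECT s.TimeKey,
       COALESCE(s.CustomerKey, -1),   -- map null keys to the Unknown member
       COALESCE(s.SalesAmount, 0)     -- or filter out such rows if zero would be misleading
FROM StagingSales AS s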

For more information about optimizing the relational database when you use distinct count measures, see "Tune the Relational Database" earlier in this paper.

Limit the Use of Complex Named Sets

Named sets are useful for sharing common definitions, such as the current month or last eight quarters. A named set can contain a constant set of members or an expression that resolves to a set. Named sets are evaluated and resolved at connection time for each client connection, regardless of whether the client actually uses the named set in a query.

Experience has shown that simple named sets perform well, but that complex named sets (such as an expression filtering for the top five customers) can reduce query responsiveness, because each complex named set requires the scanning of many cells by Analysis Services at connection time. For example, if you have one million customers and have a named set containing an expression that evaluates the top five customers, Analysis Services must evaluate all one million customers to determine the top five customers for each client that establishes a connection. This query is a very expensive query against a large cube with many dimension members. If you have multiple named sets, each must be resolved at connection time before the client can begin querying.

To improve performance in this scenario, consider enumerating the top five customers in the named set definition, rather than calculating the top five customers with every connection. You can then periodically update the contents of the named set by changing the named set definition (this does not require any processing).

Use a Virtual Cube to Group Measures by Usage

When a client queries a cube, all measures in the cube are read from disk and downloaded, regardless of the measures specified in the query. In most cases, this design feature increases query responsiveness for many users at the cost of some client memory. As the number of cube measures increases, the internal disk record for each cell storing data in the cube gets wider; wider records require more I/O bandwidth and network bandwidth to download. When clients are using a connection with limited capacity, downloading a large number of measures may consume much of the available bandwidth. In this case the queries may appear to take a long time to resolve, while in fact the queries are running quickly on the server.

You can change this behavior by creating multiple physical cubes and grouping measures in each physical cube by use. In the first physical cube, place the first set of measures that are typically queried together. In a second physical cube, group the second set of measures that are typically queried together. Repeat this process for each group of measures. When you are finished creating these physical cubes, create a single virtual cube over all of the physical cubes. When a client queries the virtual cube, only the measures in the underlying physical cube that Analysis Services uses to respond to the query are retrieved from disk and downloaded (rather than all of the measures in the virtual cube). Because measures are grouped by usage, unnecessary measures are not read and downloaded, which reduces the number of bytes transferred and returns the resolved query to each client faster.

As a general rule, consider this approach when you have more than 40 measures in a cube, and organize the measures into groups of 20 or fewer. The result is less disk I/O on the server and less network bandwidth consumed to support client queries.

Use Query Time Calculations Sparingly in Very Large and Complex Cubes

Calculated cells, calculated members, custom member formulas, custom rollup formulas, custom rollup operators, and unary operators are all calculated dynamically when a user submits a query that requires their calculation. Analysis Services does not save the calculated results in the query results cache, so these calculations must be re-executed each time they are requested. Simple calculations are performed very quickly, but Analysis Services may consume a significant amount of time resolving queries that invoke these calculations when complex calculations are involved. The performance impact is most pronounced when users query very large and complex cubes (in excess of several hundred gigabytes) with calculations that require large amounts of data to be evaluated to derive a result.

While complex calculations should be used sparingly in very large and complex cubes, this discussion should not discourage you from incorporating the many simple, useful calculations that can be performed at query time, such as change from last year, rolling average, or percent contribution to total. Calculations involving large dimensions that descend to the lowest level to obtain their results take the longest to resolve. An example of such a calculation is the ranking of all customers in a large customer dimension or the computing of a median value (which requires ordering all the items in the set to find the middle value).

Use the following guidelines when working with complex query time calculations:

  • Ensure that the additional functionality provided through the use of complex query time calculations justifies the cost in query responsiveness. In particular, it is best whenever possible to avoid calculations that recursively operate on cells or members, or scan large numbers of members to obtain the desired result.

  • When you use a calculated member on the Measures dimension, set the Non Empty Behavior property to the name of a measure, in order to treat the calculated measure as empty if the specified measure is empty. Otherwise, computations that include the NON EMPTY keyword can be slow and consume a lot of client resources, because the calculation is performed for every row or column just to determine whether the result is empty. On sparse cubes, the computation time can be cut dramatically.

  • Consider shifting the performance impact of complex calculations to the processing phase. You can accomplish this by using views or computed columns in the relational database or by using SQL queries to define computed columns in the Measures dimension.
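For example, a calculation such as gross margin can be pushed into the relational layer with a computed column or a view, so that Analysis Services reads the value as an ordinary source column during processing. The following Transact-SQL is a sketch only; SalesFact, SalesAmount, and CostAmount are hypothetical names.

-- Option 1: a computed column on the fact table.
ALTER TABLE SalesFact
ADD GrossMargin AS (SalesAmount - CostAmount)
GO

-- Option 2: a view that the cube partition reads instead of the base table.
CREATE VIEW SalesFactWithMargin
AS
SELECT TimeKey, ProductKey, CustomerKey,
       SalesAmount, CostAmount,
       SalesAmount - CostAmount AS GrossMargin
FROM SalesFact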

Optimizing Hardware Resources to Improve Performance

Query responsiveness and processing performance require sufficient and properly configured hardware resources. This section discusses how Analysis Services uses existing resources, the impact of insufficient resources, and the configuration settings that enable you to make the best use of existing resources. For more information on the registry entries mentioned in this section, see Registry Entries for Microsoft SQL Server 2000 Analysis Services in the MSDN library. For information on monitoring the use of existing resources using Analysis Services counters, see Appendix B, "Interpreting Performance Monitor Counters."

Provide Sufficient and Appropriately Configured Memory

Processing performance and query responsiveness depend on sufficient physical memory, appropriately configured paging files, and sufficient virtual memory address space. Analysis Services balances the competing demands of processing and querying operations using a number of memory management techniques.

Analysis Services relies on Microsoft Windows® virtual memory for its memory page pool. Unless you are using SQL Server 2000 (64-bit), the maximum amount of virtual memory that an Analysis Services process can address is 3 gigabytes (GB). By default, an Analysis Services process can only address 2 GB. For information on enabling Analysis Services to address 3 GB, see "Configure Available Memory Effectively" later in this section. Analysis Services cannot allocate physical or virtual memory beyond its virtual address space. This means that because most Analysis Services activity occurs within the main process space, Analysis Services can only use more than 3 GB of physical memory for limited purposes. Additional physical memory may be usable on the Analysis server by other applications on the same server, such as SQL Server. However, SQL Server 2000 (64-bit) is not subject to the 3-GB virtual address space limit, which enables the 64-bit version of Analysis Services to use as much address space as it needs.

You can frequently increase the overall performance of your Analysis server by adding additional physical memory, provided that Analysis Services does not hit the 3-GB virtual address space limit. To properly configure Analysis Services to maximize the use of memory on the Analysis server, you need to understand how Analysis Services uses memory.

How Analysis Services Uses and Manages Memory

Analysis Services requests allocations of memory from the Windows operating system for a variety of purposes, and returns that memory to the Windows operating system when the allocated memory is no longer needed. Analysis Services employs a number of mechanisms to keep the amount of memory allocated to it within a certain range. The range is defined by two settings on the Environment tab of the Server Properties dialog box in Analysis Manager: the Memory conservation threshold setting (the HighMemoryLimit value in the registry) and the Minimum allocated memory setting (the LowMemoryLimit value in the registry). The default value for the Memory conservation threshold setting is the amount of physical memory on the Analysis server; the default value for the Minimum allocated memory setting is half the amount of physical memory on the Analysis server. If you change the amount of memory on the Analysis server after installation, you must manually modify these values. To stay within this range, Analysis Services uses memory buffer size settings that limit the amount of memory used for certain purposes, and a cleaner thread that reduces the amount of memory allocated to Analysis Services when allocations reach the halfway point between the Memory conservation threshold and Minimum allocated memory settings.

When the cleaner thread is activated, it begins evicting entries in the query results cache, based on a cost/benefit algorithm that takes into account a variety of factors, including how frequently the data is being used, the amount of resources required to resolve the entries, and how much space is being consumed by related entries. By default, the cleaner thread runs at below-normal priority. The frequency with which it runs is determined by the BackgroundInterval registry setting. This setting actually governs the number of seconds between processing periods for a variety of background tasks, including the cleaner thread, query logging, and lazy processing. The default value is thirty seconds. If you want to set an interval for the cleaner thread separate from these other background tasks, add the CleanerInterval registry key and set a value just for the cleaner thread.

When the amount of memory used by Analysis Services exceeds the Memory conservation threshold setting, Analysis Services increases the priority of the cleaner thread to normal, to more quickly reduce the allocated memory to the Minimum allocated memory setting. If the total memory allocated to all Analysis Services tasks exceeds the memory conservation threshold by more than approximately 6.25 percent, Analysis Services immediately begins dropping the cache entries for entire cubes in order to quickly reduce the amount of memory used by Analysis Services. In this scenario, because Analysis Services is shedding memory extremely quickly, the total amount of allocated memory may drop below the Minimum allocated memory value.

Having sufficient memory for all of the Analysis Services tasks increases query responsiveness and processing performance. Properly configuring available memory will maximize the use of memory, limit the use of disk resources for processing and memory paging, and prevent the cleaner thread from evicting cache entries too quickly. In order to properly configure available memory and add memory when it is needed, you need to understand how Analysis Services uses memory. Understanding how Analysis Services uses memory also helps you understand the impact of data model design decisions.

This section describes the different ways in which Analysis Services uses memory and describes how you can control the amount of memory used by each memory consumer.

Dimension Memory

On startup, Analysis Services loads into memory all MOLAP dimension members for all databases on the Analysis server, and all of their member properties, to help increase query responsiveness. By default, MOLAP, HOLAP, and ROLAP cubes contain MOLAP dimensions. If you create a ROLAP cube and specify ROLAP dimensions, these dimensions are not loaded into memory. New dimensions are added to memory as they are created. The amount of memory consumed by an existing dimension in memory is only adjusted during dimension processing. This means that large MOLAP dimensions can consume a significant amount of virtual memory on the Analysis server, reducing the address space that is left for other tasks.

For an estimate of the memory space required to hold each dimension, you can look at the sizes of the files that hold the dimension structure in the file system. For shared dimensions, this dimension structure information is stored in four types of files: .dim, .dimcr, .dimprop and .dimtree. You can find these files in the database folder for the cube, which is stored in the Analysis Services data folder. The amount of memory required for dimension memory is approximately equal to the sum of the sizes of these files.

If you need to estimate the space required before the dimensions have been defined, such as when planning a hardware purchase, you can use the following formula as an approximation:

DimSize = CMembers*(61 + 4*CLevels + Size(name) + Size(key)) + 4*CProps + Size(props)

where:

CMembers = the total number of members in the dimension.

CLevels = the number of levels in the dimension, including the All level.

Size(name) = the average size required to hold the member names. For example, a 10-character string stored as Unicode requires 20 bytes.

Size(key) = the size required to hold the member key. For example, an integer key requires 4 bytes. If the member name is the same as the member key, Size(key) is zero.

CProps = the number of member property settings in the dimension for all levels. For example, if a level with 1000 members has two properties on each member, there are 2000 property settings for that level.

Size(props) = the size required to hold the distinct member property values for all member levels. Remember that all member properties are stored as Unicode strings and each unique string is only stored once. For example, a customer gender property with possible values of Male, Female and Unknown requires only 34 bytes of storage (17 characters x 2 for Unicode) no matter how many times they are referred to.
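As a worked illustration with hypothetical numbers, consider a dimension with 500,000 members, four levels (including the All level), an average 10-character Unicode member name (20 bytes), a 4-byte integer key, and one member property per member whose distinct values total about 100 bytes:

DimSize = 500,000*(61 + 4*4 + 20 + 4) + 4*500,000 + 100
        = 500,000*101 + 2,000,000 + 100
        = 52,500,100 bytes, or roughly 50 MB of dimension memory

A dimension of this size would stay below the default 64-MB VLDM threshold described in the next paragraph.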

Analysis Services will use as much memory as required for dimension memory. You can control the amount of memory used as dimension memory by eliminating unnecessary dimensions, levels, and member properties. In addition, because all dimension information in all cubes on the Analysis server is loaded into memory when Analysis Services starts, you should also eliminate unnecessary test cubes and unused dimensions on the Analysis server to save memory. Analysis Services attempts to prevent large dimensions from using all of the available virtual memory address space by loading very large dimensions in a separate process space with its own virtual memory address space. A very large dimension is one that exceeds the VLDMThreshold value in the registry. The default VLDM threshold is 64 megabytes (MB). While using a separate address space for dimensions that exceed the VLDM threshold does save virtual memory address space for other uses in the main process, overall performance slows when one or more dimensions exceed the VLDM threshold. Analysis Services in SQL Server 2000 (64-bit) does not use a VLDM threshold because this version does not have a 3-GB virtual address limit.

Query Results Cache

Analysis Services stores data returned by client queries (but not calculated data) in its query results cache. The cleaner thread begins evicting entries from the query results cache when the memory used by the Analysis Services process exceeds the halfway point between the Memory conservation threshold and Minimum allocated memory settings. As the amount of memory allocated for other Analysis Services uses increases, the amount of memory for the query results cache decreases.

Connection Memory

Analysis Services allocates approximately 32 kilobytes (KB) for each client connection. If Analysis Services is supporting a large number of connections, the memory required for each connection reduces the amount of memory available for the query results cache. When a client specifies that the execution location for a query is the Analysis server (this is called a remote query), additional memory is required to service the query. The amount of memory that can be allocated to a remote query is determined by the value of the AgentCacheSize registry key. By default, up to ten percent of memory on the Analysis server can be allocated to each agent cache. Because more than one of these caches can be allocated at the same time (to service multiple clients issuing remote queries), reduce this value when many remote queries are being evaluated to reserve memory for the query results cache. For more information on remote queries, see the "Optimizing Clients for Slow Network Connections" section later in this paper.

Replica Dimensions

When a user queries a cube that is secured by dimension security, Analysis Services calculates and loads replica dimensions for each unique combination of security roles actually used by the client. For example, if a user is a member of the two roles of plant users and plant administrators, then the final list of permitted and denied members is the combined list. A replica dimension contains all permitted members plus their siblings, their ascendants, and the siblings of the ascendants; the names and properties of members that the client is not permitted to view are removed.

Replica dimensions are unloaded when the dimension (or cube containing the dimension) is processed, Analysis Services is restarted, or role membership changes. Replica dimensions are not unloaded when the client disconnects. This enables replicas to be reused across clients that have the same permissions. Before Analysis Services builds a new replica, it checks to see whether a replica already exists with the same allowed and denied list. If so, the replica is reused. If not, a new replica is created. Analysis Services creates many different replicas if there are a large number of security roles (for example, 100) or if dynamic security is used and each user has a different allowed and denied set. In dynamic security, the list of permitted and denied members for each user is based on a custom MDX statement containing the USERNAME function. Replicas will be shared among users, and with security roles, if the allowed and denied member sets are the same.

If a cube contains a large number of security roles, combines roles by user, and has a large number of roles that can see all but a few members, replicas can consume a substantial amount of memory on the Analysis server. In this scenario, use cell-level security rather than dimension security to limit the impact of security on performance.

Shadow Dimensions

When Analysis Services processes a dimension, it creates the dimension in memory as a shadow dimension until processing is complete. While Analysis Services is processing a dimension that has been processed before, user queries are resolved against the previously existing dimension in memory until the processing transaction commits. After the dimension is processed, the old dimension is released from memory and user queries are resolved against the newly processed dimension. If dimensions are processed as part of cube processing (as a single transaction), the creation of shadow dimensions can have a major impact on memory if the cube has large dimensions. When the dimensions of a cube are processed as a single transaction, a shadow copy of each dimension is stored in memory until the transaction commits. While Analysis Services loads dimensions larger than the VLDM threshold in a separate process address space, the shadow dimensions are always created in the main process address space.

If the Analysis server has insufficient memory, processing all existing dimensions in a single transaction can fail. All dimensions are processed in a single transaction whenever you click Process all dimensions or Process the database in Analysis Manager, or specifically process all dimensions using Decision Support Objects.

Read-Ahead Buffer

Analysis Services creates a read-ahead buffer during processing for each partition being processed simultaneously. Analysis Services uses the read-ahead buffer when retrieving data from the relational database to maximize data throughput. Asynchronously, Analysis Services loads data from the read-ahead buffer into the process buffer. The Read-ahead buffer size setting in the Server Properties dialog box in Analysis Manager (the ProcessReadAheadSize value in the registry) determines the size of each read-ahead buffer. By default, the size of the read-ahead buffer is 4 MB; experience indicates that the read-ahead buffer is rarely a performance bottleneck.

Process Buffer

Analysis Services creates a process buffer during processing for each partition being processed simultaneously. It uses this buffer for two separate tasks.

First, Analysis Services loads fact data from the read-ahead buffer into the process buffer and then sorts, indexes, and writes the fact data into the fact level of a MOLAP partition file in segments. Each segment consists of 65,536 rows; the number of bytes in each segment varies based on the size of each row. The sorting process includes as much data as can be held in the process buffer at one time. A large process buffer enables Analysis Services to improve the data organization among segments.

Second, the process buffer is used during the calculation of aggregations.

  • The Process buffer size setting in Analysis Manager (the ProcessReadSegmentSize value in the registry) determines the maximum size of each process buffer. By default, the maximum size of the process buffer is approximately 32 MB.

    If the size of the process buffer is large enough to efficiently sort and index large portions of the fact data before it is written to segments in the partition file, overall data organization and query responsiveness improves. Furthermore, if the fact table contains many duplicate records, a large process buffer allows Analysis Services to merge duplicate records in memory, which also saves space and improves query speed.

    If Analysis Services exceeds the size of the process buffer while creating aggregations, Analysis Services uses temporary files to augment the memory in the process buffer. In this case, Analysis Services moves aggregations between these temporary files and the memory allocated to the process buffer as the aggregations are being calculated. Reading and writing these temporary files is much slower than in-memory calculations and very I/O-intensive. You should tune your system to eliminate the use of these temporary files. A procedure for doing this is given in Appendix D, "How to Tune the Process Buffer Size." Remember that all aggregations for a partition are calculated at once and must fit into memory; otherwise temporary files will be used.

Each partition being processed in parallel has its own process buffer space limit. When processing multiple partitions in parallel or processing an entire cube in a single transaction, you must ensure that the total memory required for the process buffers, shadow dimensions, and other memory requirements does not exceed the memory conservation threshold. If Analysis Services reaches the memory conservation threshold during processing, it will not allow the process buffer to grow and may cause temporary files to be used during aggregation processing. Furthermore, if you have insufficient virtual address space for these simultaneous operations, you may receive out-of-memory errors. If you have insufficient physical memory, memory paging will occur.

Configure Available Memory Effectively

While adding memory can frequently improve the overall performance of Analysis Services, merely adding memory is not sufficient. Configuring memory appropriately is particularly important as you add memory to the Analysis server. When you properly configure existing memory, you increase performance during querying and processing. Use the following guidelines when configuring Analysis Services:

  • Consider using SQL Server 2000 (64-bit).

    If you have one or more very large and complex cubes and your Analysis Services memory needs cannot be met within the 3-GB address space, SQL Server 2000 (64-bit) allows the Analysis Services process to access more than 3 GB of memory. You are most likely to exceed the 3-GB limit with very large dimensions or when processing multiple partitions in parallel and you have configured a large process buffer. For more information about considering SQL Server 2000 (64-bit), see Appendix F, "When to Consider Moving to SQL Server 2000 (64-bit)."

  • Consider enabling up to 3 GB of virtual address space if you need additional virtual address space for large dimensions, processing partitions, and the query results cache.

    If you have one or more large or complex cubes and are not using SQL Server 2000 (64-bit), use Windows 2000 Advanced Server or Datacenter Server with SQL Server 2000 Enterprise Edition (or SQL Server 2000 Developer Edition) to enable Analysis Services to address up to 3 GB of memory. Otherwise, the maximum amount of memory that Analysis Services can address is 2 GB.

    To enable Analysis Services to address more than 2 GB of physical memory with either of these editions, enable the Application Memory Tuning feature of Windows 2000. To accomplish this, you must use the /3GB switch in the boot.ini file and also set the Memory conservation threshold value in Analysis Manager. When configuring the memory conservation threshold, do not set this value to more than approximately 2.7 GB, to ensure that the cleaner thread has sufficient time to respond to low memory conditions and to reduce allocated memory before Analysis Services uses the entire 3-GB address space. Remember, the memory conservation threshold does not prevent Analysis Services from using more memory than this threshold. For more information on setting the /3GB switch, see How To Enable Analysis Server To Use 3 GB of RAM in the Microsoft Knowledge Base.

    If you set the /3GB switch in the boot.ini file, the Analysis server should have at least 4 GB of memory to ensure that the Windows operating system also has sufficient memory for system services. If you are running other applications on the Analysis server, you must factor in their memory requirements as well.

  • Set the VLDM threshold appropriately.

    Evaluate the size of your dimensions and your available virtual memory address space. If Analysis Services has sufficient virtual memory address space in the main process space, increase the VLDM threshold to a value larger than 3 GB. Analysis Services must have enough virtual memory address space to do all of the following:

    • Load all dimensions into memory.

    • Load all shadow dimensions during processing.

    • Perform all processing without using temporary files (have a sufficiently large processing buffer).

    • Include a sufficiently large query results cache.

    If Analysis Services does not have enough virtual memory address space in the main process space for all of these tasks, set the VLDM threshold so that only the largest dimensions are loaded into a separate address space. Remember that the shadow dimensions for these large dimensions are still loaded in the main address space during processing.

  • Set the process buffer size appropriately.

    If the process buffer size is too small, Analysis Services supplements the process buffer with temporary files when calculating aggregations. If the process buffer is too large and processing actually uses most or all of the memory specified for the process buffer, there may be insufficient memory available for the query results cache. It is not uncommon to set the process buffer size to 300 or 500 MB on servers with large cubes. A procedure for finding an appropriate process buffer size is given in Appendix D, "How to Tune the Process Buffer Size."

    To determine whether a temporary file is being used during processing, monitor the Temp file bytes written/sec or the Temp file rows written/sec counters on the Proc Aggs object in System Monitor. To ensure maximum processing performance, these values should remain at zero for the duration of the processing operation. For more information about Performance Monitor counters, see Appendix B, "Interpreting Performance Monitor Counters."

  • Consider increasing the size of the operating system paging files.

    If you cannot add additional physical memory to increase performance, increasing the size of the paging files on the Analysis server can prevent out-of-memory errors when the amount of virtual memory allocated exceeds the amount of physical memory on the Analysis server.

  • While these steps can assist Analysis Services in using available memory effectively, do not overlook the impact of design decisions on the use of memory. Modifying a design that contains excessive dimensions, levels, or member properties, or using large data types, can also enable Analysis Services to use memory more efficiently. For more information, see "Optimizing the Data Model to Improve Performance" earlier in this paper.

Ensure Analysis Services Has Sufficient Processor Resources

Analysis Services uses processor resources for both querying and processing. Increasing the number and speed of processors can significantly improve processing performance and, on cubes with a large number of users, improve query responsiveness as well. However, unless you have a large number of users, Analysis Services is more likely to be constrained by insufficient processor resources during processing than during querying.

How Analysis Services Uses Processor Resources during Querying

Analysis Services uses listener, worker, and process threads to respond to user queries.

The listener thread waits for new connection requests. It is responsible for creating server connections as needed. The listener thread also manages the worker thread pool; it assigns worker threads to requests, initiates worker threads if there are not enough active worker threads in the pool, and terminates idle worker threads as needed.

One worker thread is required for each query being handled concurrently. Worker threads check the query results cache for data pertinent to a client request and store the results obtained by process threads in the query cache to resolve future queries. They also allocate process threads to retrieve data from storage. The number of worker threads initially started by the listener thread is determined by the value of the InitWorkerThreads registry setting; its default setting is 4. The maximum number of worker threads maintained in the worker thread pool is determined by the value of PoolWorkerThreads in the registry; its default setting is 30. If many queries are to be serviced concurrently, this number will need to be increased. The number of Analysis Services threads that can simultaneously use each processor on the Analysis server is determined by the Maximum number of threads setting in Analysis Manager (the WorkerThreads value in the registry); its default setting is 2.

Process threads retrieve data from storage, asynchronously querying and retrieving stored data in parallel. Different process threads can concurrently retrieve data from different segments. The value of the PoolProcessThreads registry setting determines the maximum number of process threads maintained by the process thread pool; its default setting is 40. A process queue holds additional work items when no process threads are available. The maximum number of process threads that can simultaneously use each processor available on the Analysis server is determined by the ProcessThreads value in the registry; its default setting is 2.

How Analysis Services Uses Processor Resources during Processing

Analysis Services processes MOLAP partitions in two stages. Each stage is a multi-threaded activity.

During the first stage of processing a MOLAP partition, Analysis Services populates the read-ahead buffer from the relational database, populates the process buffer from the read-ahead buffer, and writes segments to the partition file. Analysis Services utilizes multiple threads for this stage, which execute asynchronously. This means that while data is being added to the read-ahead buffer, data is being moved from the read-ahead buffer into the process buffer and sorted into segments. When a segment is complete, it is written to the partition file. Processor usage during this first stage depends on the speed of the data transfer from the relational tables. Generally, this stage is not particularly processor-intensive, using less than one processor; it is limited instead by the speed of retrieving data from the relational database. The Read-ahead buffer size setting in the Analysis Manager Server Properties dialog box (the value of the ProcessReadAheadSize registry setting) determines the maximum amount of data that can be staged in memory before being moved into the process buffer. While setting this value too low can cause unnecessary context switches, experience has shown that the default size of 4 MB is usually adequate.

During the second stage, Analysis Services creates mapping files and computes aggregations for the data. Analysis Services utilizes multiple threads for this stage, executing these tasks asynchronously. These threads read the fact-level data into the process buffer from the MOLAP cube. (If HOLAP is used, these threads read fact-level data from the fact table.) If sufficient memory is allocated to the process buffer, these aggregations are calculated entirely in memory. If Analysis Services does not have sufficient memory in the process buffer to calculate all aggregations in memory, Analysis Services uses temporary files to supplement the process buffer. Analysis Services spools partially completed aggregations to these temporary files while it calculates additional aggregations in memory. Analysis Services then merges these aggregations with the aggregations in the first temporary file, and writes them to a second temporary file. This process continues until all aggregations have been calculated. Aggregations are not written to the partition file until all aggregations have been calculated. This stage can be processor-intensive; Analysis Services will take advantage of multiple processors if they are available. If temporary files are employed, I/O to those files slows the entire process and causes under-utilization of the processor resources.

As this section shows, both MOLAP and HOLAP partitions can obtain significant benefits from multiple processors. ROLAP partitions use the processor resources on the relational database computer during processing, which may be on the same or a different computer.

Using Available Processor Resources Effectively

While adding additional processor resources can improve the overall performance of Analysis Services, use the following guidelines to use existing processor resources effectively.

  • Consider increasing the process buffer size.

    If you have sufficient memory, and the temporary files are being used during partition processing, increase the size of the process buffer to improve processing performance by eliminating temporary file I/O. A procedure for finding an appropriate process buffer size is given in Appendix D, "How to Tune the Process Buffer Size."

  • Add physical memory to avoid memory paging.

    If the Analysis server has sufficient memory, the Windows operating system will not need to page memory from disk. Paging reduces processing performance and query responsiveness.

  • Avoid context switching during processing.

    Processing too many partitions at once is counter-productive, because it causes context switching.

    Number of processors    Partitions to process in parallel
    Four                    Two or three
    Eight                   Four or five
    Sixteen                 Four to six (measured on SQL Server 2000 (64-bit))

  • Consider increasing the PoolProcessThreads registry setting to increase query responsiveness.

    By default, the value of the PoolProcessThreads registry setting is set to 125 percent of the value of the PoolWorkerThreads registry setting, because each worker thread can allocate more than one process thread to complete a request. However, if the Analysis server consistently handles numerous and complex requests, you can raise the value of the PoolProcessThreads registry setting to 150 percent or more of the PoolWorkerThreads value, to reduce queuing and improve query responsiveness.

  • Consider increasing the number of worker threads to avoid connection failures.

    If Analysis Services returns errors to queries during peak usage periods, you can increase the number of worker threads by increasing the value of the Maximum number of threads setting in Analysis Manager (the WorkerThreads value in the registry).

In addition to these recommendations, if you cannot add processor resources to the Analysis server, consider distributing the processing and querying load across multiple computers. For more information, see "Optimizing the Architecture to Improve Performance" later in this paper.

Minimize and Distribute Disk I/O

Analysis Services uses disk I/O resources for both querying and processing. Increasing the speed of your disks, spreading the I/O across multiple disks, and using multiple controllers, can significantly improve processing performance. These steps will also significantly improve query responsiveness when Analysis Services is required to perform fact-level scans. However, unless you have a large number of queries that require fact-level scans, Analysis Services is more likely to be constrained by insufficient disk I/O during processing than during querying.

How Analysis Services Uses Disk I/O During Processing

During processing of a MOLAP partition, Analysis Services first reads records from the relational database, sorts and organizes them into segments, and then writes these segments to multidimensional storage on disk as file system writes. Analysis Services then reads this data back to create aggregations. After aggregations are computed, Analysis Services writes these aggregations to disk (again as file system writes).

If the process buffer is not large enough to contain all of the aggregations being calculated, Analysis Services spools them to a temporary file. As aggregations are computed for each segment of fact-level data, existing aggregations are read from the first temporary file, merged with the newly calculated aggregations, and then written to a second temporary file. This reading and writing of aggregation data to and from temporary files is very disk-intensive, and significantly slows processing performance. Avoid using temporary files if possible, by increasing the process buffer size or reconsidering the aggregation design to reduce the number and size of the aggregations.

How Analysis Services Uses Disk I/O During Querying

During querying, Analysis Services may request arbitrary parts of the data set, depending on user query patterns. This I/O is essentially random. However, if commonly used portions of the cube (particularly the mapping files) fit in the file system cache, the Windows operating system may satisfy the I/O requests from memory rather than generating physical I/O. With large cubes, using a 64-bit version of the Microsoft Windows Server 2003 family increases the amount of memory that the operating system can use to cache Analysis Services requests. With sufficient memory, much of the cube can be stored in the file system cache.

Using Available Disk I/O Effectively

While increasing disk I/O capacity can significantly improve the overall performance of Analysis Services, there are several steps you can take to use existing disk I/O more effectively.

  • Increase the process buffer size.

    If the process buffer size is set too low, Analysis Services uses temporary files to store aggregations during processing, generating substantial disk I/O. A procedure for finding an appropriate process buffer size is given in Appendix D, "How to Tune the Process Buffer Size."

  • Place the multidimensional data file on a fast disk subsystem.

    To optimize disk access for querying and processing, place the Analysis Services data folder on a dedicated disk subsystem (RAID 5, RAID 1+0, or RAID 0+1).

  • If you must use temporary files, then optimize temporary file disk I/O.

    The location of the default temporary file is controlled by the Temporary file folder setting in Analysis Manager (the TempDirectory registry setting). The default location of this temporary file is the ...\Microsoft Analysis Services\Data subdirectory. If a temporary file is used during processing, you can increase processing performance by placing this temporary file on a fast disk subsystem (such as RAID 0 or RAID 1+0) that is separate from the data disk. You can further increase temporary file performance by splitting writes between two temporary files located on separate physical drives. To add a second temporary file location, add the TempDirectory2 registry setting and specify a location on a separate physical drive for the second temporary file (see the scripting sketch after this list). When you must use temporary files, using two temporary files increases processing performance because one temporary file is always being read sequentially while the other is being written sequentially.

  • Consider adding page files.

    If you cannot add sufficient physical memory to avoid memory paging, create multiple paging files on different drives to spread disk I/O across multiple drives when memory paging is required.

  • Consider enabling the UnbufferedThreshold registry setting.

    By default, when Analysis Services opens files, they are cached (buffered) by the Windows operating system. If you have very large cube files that exceed the size of the file system cache, consider enabling the UnbufferedThreshold setting and specifying a file size. Files larger than this threshold (such as 100 MB) are not cached by the operating system. This can prevent one large file from using the entire file system cache and evicting other cache entries.
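
The registry settings described in this list can also be scripted. The following VBScript is a minimal sketch, not a definitive procedure: it assumes that the Analysis Services server settings are stored under HKLM\SOFTWARE\Microsoft\OLAP Server\CurrentVersion, that D:\ASTemp2 is an existing folder on a separate physical drive, and that UnbufferedThreshold is expressed in bytes. Verify each of these assumptions for your installation, run the script with cscript under an administrative account, and then restart the Analysis Services service.

Option Explicit

' Minimal sketch only. The registry path, the D:\ASTemp2 folder, and the
' assumption that UnbufferedThreshold is specified in bytes (100 MB shown)
' should all be verified against your installation before running this.

' Assumed location of the Analysis Services server settings.
Const REG_BASE = "HKLM\SOFTWARE\Microsoft\OLAP Server\CurrentVersion\"

Dim shell
Set shell = CreateObject("WScript.Shell")

' Second temporary file location on a separate physical drive.
shell.RegWrite REG_BASE & "TempDirectory2", "D:\ASTemp2\", "REG_SZ"

' Stop caching files larger than the threshold in the file system cache.
shell.RegWrite REG_BASE & "UnbufferedThreshold", 104857600, "REG_DWORD"

WScript.Echo "Settings written. Restart the Analysis Services service to apply them."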

Ensure Sufficient Network Bandwidth

Insufficient network bandwidth can affect both query responsiveness and processing performance.

Clients that are connected to Analysis Services using slow connections experience slower query response times from Analysis Services than clients using fast connections. For more information on optimizing query responsiveness over slow network connections, see "Optimizing Clients for Slow Network Connections" later in this paper.

If the relational database is located on a different server from the Analysis server, processing performance can be negatively affected if the network is congested. For more information on the impact of architecture on processing performance, see "Optimizing the Architecture to Improve Performance" later in this paper.

The techniques that you can implement to increase network performance are beyond the scope of this paper, but are well documented. For example, you can place the Analysis server on the same network segment as the relational database or create a dedicated network segment between these two servers. For more information on network issues that relate to SQL Server, see Chapter 11, "Configuring Microsoft SQL Server on the Network," in Microsoft SQL Server 2000 Administrator's Companion. Also see "Monitoring Network Activity" in System Monitor online Help.

Optimizing the Architecture to Improve Performance

The architecture of your data warehouse deployment can have a significant impact on query responsiveness and processing performance. This section provides an overview of the architectural issues that have the most direct impact on performance.

Optimizing the Middle Tier

When a client queries Analysis Services through Microsoft Internet Information Services (IIS), IIS makes calls to the PivotTable Service (PTS) to obtain data from Analysis Services. PTS makes the connection to Analysis Services and submits the client query for resolution. However, PTS was optimized for a two-tier client/server environment, not a three-tier environment. In a three-tier environment, consider taking one or more of the following actions to improve query responsiveness.

  • Enable connection pooling.

    Establishing and closing connections to Analysis Services requires time and resources. Pooling connections results in a significant increase in query responsiveness, because existing connection objects can be reused rather than discarded. To enable PTS to use connection pooling, install XML for Analysis. Installing XML for Analysis installs the XML for Analysis Provider, which makes connection pooling available automatically for all applications.

    With connection pooling, the connection pool object resolves a connection request as follows:

    1. It first attempts to use a free connection that matches the client SID information.

    2. If none is found, it parses the client connection information to determine whether a free connection for the same requested database already exists in the connection pool.

    3. If a database match is found, it attempts to match the role security of the client request to the role security of the existing free connection.

    4. If a role security match is found, it compares the user name on the free connection to the user name of the client request. If the user names match, the free connection is returned to the client application. If the user names do not match, the free connection is re-authenticated against role security on the Analysis server, using the domain and user name of the client request, and is then returned to the requesting client application.

    5. If a role security or database match is not found, a new connection is created in the connection pool and allocated to the requesting client application.

    Unlike typical approaches to resource sharing, this approach reuses existing active connections that have identical role security privileges, even if a different user originally requested that connection. The new user name associated with the free connection is still authenticated, and therefore maintains security, but the existing connection can be reused. This reduces connection time and overhead for a client application, such as PTS, that services a large number of concurrent users. For client applications that perform many operations and repeatedly request and return connections, this mechanism is particularly efficient. For more information on connection pooling, see Connection Pooling with SQL Server 2000 Analysis Services in the MSDN library.

  • Modify the PTS client cache setting.

    Because PTS was optimized for a two-tier environment, the default configuration for PTS uses up to 25 percent of the available memory on the client computer for data caching. If PTS uses up to 25 percent of the available memory for each connection, the IIS server may become starved for memory as the number of client connections increases. You can specify the maximum amount of memory used for data caching for each client by changing the Client Cache Size property in the connection string that IIS uses to connect to Analysis Services; a connection string that sets this property is sketched after this list. The appropriate setting will vary depending on the number of users connecting to Analysis Services through IIS.

  • Control dimension memory use.

    As client queries are resolved using PTS, all dimension members returned are cached on the client side (due to the meta data caching architecture of PTS). In a two-tier client-server environment, this design feature optimizes performance by reducing the number of round trips to the server over the course of a client session with Analysis Services. However, in a three-tier environment with connection pooling enabled, cubes with enormous dimensions (millions of members) can cause the amount of memory consumed for dimension members to continually increase as pooled connections are reused.

    If the connection pool is being controlled by IIS (for example, if you are allocating and controlling connections for ASP and ASP.NET applications), this caching of dimension memory will appear to the IIS administrator as a memory leak. The amount of memory consumed by dimension members cannot be controlled directly. You can use the Large Level Threshold property in the connection string to specify the point at which a level is too large to be sent to the client in a single piece. Reducing this threshold will reduce the amount of dimension memory consumed on the middle tier by each connection. You can also set the value of the HugeLevelThreshold registry setting on the Analysis server to specify the minimum number of members that indicates a large level for all client connections.

    To clear the dimension members from memory, close and restart the connection using one of the following methods:

    • Structure the application so that it can be restarted independently of IIS, and then re-start the application.

    • Have the application count some fixed number of times that a pool can be re-used, and re-allocate the connection pool objects after that number of requests.

    • Have the application monitor its virtual memory allocation and re-allocate the connection pool objects at a specified memory allocation threshold.

  • Push processing to the Analysis server.

    Use connection string properties to push as much processing as possible to Analysis Services. For more information on pushing processing to Analysis Services, see "Optimizing Clients for Slow Network Connections" later in this paper. In addition, avoid features that demand PTS resources, such as visual totals and user-defined functions, which must be resolved by PTS and cannot be pushed to Analysis Services.
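
To make the Client Cache Size and Large Level Threshold settings discussed in this list concrete, the following VBScript fragment is a minimal sketch of the kind of connection a middle-tier ASP page might open. The server name and the specific values of 10 and 500 are illustrative assumptions rather than recommendations; test different values against the memory profile of your own IIS server.

Option Explicit

Dim cn
Set cn = CreateObject("ADODB.Connection")

' "YourAnalysisServer" is a placeholder. Client Cache Size=10 restricts the PTS
' data cache for this connection, and Large Level Threshold=500 lowers the point
' at which a level is sent to the middle tier in pieces, reducing the dimension
' memory held by each pooled connection.
cn.Open "Provider=MSOLAP;Data Source=YourAnalysisServer;" & _
        "Initial Catalog=FoodMart 2000;" & _
        "Client Cache Size=10;Large Level Threshold=500;"

' ... run MDX queries over this connection ...

cn.Close
Set cn = Nothing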

Use a Load-Balancing Cluster

If your performance bottleneck is processor utilization on a single system, you can increase query performance by using a cluster of Analysis servers to service query requests. Requests can be load-balanced across two Analysis servers, or across a larger number of Analysis servers to support a large number of concurrent users (this is called a server farm). Load-balancing clusters generally scale linearly. Both Microsoft and third-party vendors provide cluster solutions. Microsoft's load-balancing solution is Network Load Balancing (NLB), a feature of Windows 2000 Advanced Server and Windows 2000 Datacenter Server. With NLB, you can create a cluster of Analysis servers running in multiple host mode, in which incoming requests are load balanced among the servers. When you use a load-balancing cluster, be aware that the query results cache on each server in the cluster will be different, so query response times can vary from query to query for the same client.

A load-balancing cluster can also be used to ensure availability in the event that a single Analysis server fails. An additional option for increasing performance with a load-balancing cluster is to distribute processing tasks to an offline server. When new data has been processed on the offline server, you can update the Analysis servers in the load-balancing cluster using scripts or Application Center 2000.

If your users submit a lot of queries that require fact-level scans, a load-balancing cluster may be a good solution. Queries that may require a large number of fact-level scans include wide queries (such as top count or medians) and random queries against very complex cubes where the probability of hitting an aggregation is very low.

However, a load-balancing cluster is generally not needed to increase Analysis Services performance if aggregations are being used to resolve most queries. In other words, concentrate on good aggregation and partitioning design first. In addition, a load-balancing cluster does not solve your performance problem if processing is the bottleneck or if you are trying to improve an individual query from a single user. One downside to using a load-balancing cluster is the inability to use writeback, because there is no single server to which to write back the data. For more information on clustering, see https://www.microsoft.com/downloads/details.aspx?FamilyID=9989a445-142b-4872-ac68-2b50f05228e2&displaylang=en.

Place the Relational Database on the Analysis Server

You can place the relational database on the same computer as Analysis Services, or on a separate computer. When the relational database is placed on a separate computer, network traffic will increase. The effect depends on whether you use MOLAP, HOLAP, or ROLAP as your storage mode, as well as the speed of the network.

When a query is resolved against a MOLAP partition, the location of the relational database has no direct impact on query responsiveness because all queries are resolved against the MOLAP data structures. During the processing of a MOLAP partition, however, the location of the relational database can have a significant impact on processing performance, because all new data is loaded from the relational database.

With HOLAP, the location of the relational database has a direct impact on query responsiveness whenever a query must be resolved against fact-level data. The location of the relational database also has a significant performance impact during the processing of a HOLAP partition, because all new data is loaded from the relational database.

For ROLAP partitions, the location of the relational database always affects query responsiveness, because all queries are resolved against facts and aggregations stored on the relational database. However, the location of the relational database has minimal impact on processing performance, because all processing occurs on the relational server.

When there are sufficient resources on the server, placing the relational database and Analysis Services on the same server can increase performance, because network communication between computers is eliminated. However, there are some additional factors to consider, with regard to memory, processor resources, and network connections.

If there is insufficient memory or processor resources on the Analysis server for both Analysis Services and the relational engine, placing the relational database on a different server will increase overall performance with both MOLAP and HOLAP partitions. Insufficient overall resources is a common reason to separate the relational database and Analysis Services when working with large or complex cubes, or when the relational engine is performing other tasks that consume significant memory or processor resources. Even in cases where Analysis Services only contributes a small part of the overall server load, separating the Analysis server from the relational engine may improve query responsiveness because the queries will not have to compete with the relational engine for resources. In this case, the Analysis server could be placed on a smaller computer that is dedicated to Analysis Services and still achieve significant performance gains.

If you separate the relational database and Analysis Services, ensure that the network connection between them is fast enough that it does not become a bottleneck, and tune each server to maximize network throughput. For example, consider using a fast, dedicated network segment between the servers and increasing the network packet size to increase throughput on a high-speed network. In addition, if you do not plan to enable write-back to the relational database, configure the driver to use a fire hose cursor.
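
The packet size can be tested from script before you change the data source definition. The following VBScript is a minimal sketch that opens a SQL Server OLE DB connection requesting a larger network packet size; the server name, database name, and the 8192-byte value are assumptions to adjust for your environment, and the same Packet Size property can then be carried over to the connection string of the Analysis Services data source if your provider supports it.

Option Explicit

' Minimal sketch: confirm that the relational server accepts a larger network
' packet size. "YourRelationalServer", "YourDataWarehouse", and the 8192-byte
' packet size are assumptions.
Dim cn
Set cn = CreateObject("ADODB.Connection")
cn.Open "Provider=SQLOLEDB;Data Source=YourRelationalServer;" & _
        "Initial Catalog=YourDataWarehouse;Integrated Security=SSPI;" & _
        "Packet Size=8192;"
WScript.Echo "Connected using Packet Size=8192."
cn.Close
Set cn = Nothing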

Migrating the Repository

The Analysis Services repository is a small database where Analysis Services stores its meta data about databases, dimensions and cubes, and security. You can improve the performance of management tasks in Analysis Services if you migrate the meta data repository to a dedicated SQL Server database. By default, the meta data repository is stored in a Microsoft Access database, and the Microsoft Access relational database engine is not as fast as the SQL Server relational database engine. When migrating the meta data repository, choose the native format for best performance; do not choose the Meta Data Services format.

Optimizing Clients for Slow Network Connections

When a client queries a cube in Analysis Services, the speed of the network connection can have a significant impact on the perceived response time. By default, the query resolution work is shared between the local version of PivotTable Service (PTS) on the client computer and PTS on the Analysis server. While no client optimization is generally required for clients connecting to Analysis Services over a fast network connection, you should consider one or more of the following steps to compensate for a slow network connection.

  • Install Service Pack 3.

    SP3 includes significant performance enhancements for clients running on slow networks or WANs. PTS runs on the Analysis server and on the client, with much of the functionality running on the client side. To benefit fully from these enhancements, SP3 must be installed on all clients running PTS. You can load either PTSFULL.EXE or PTSLITE.EXE from the ..\msolap\install\PTS folder. PTSFULL contains PTS and the entire MDAC stack, whereas PTSLITE contains only PTS. If your clients are connecting using a Web browser through IIS, SP3 must be installed on the computer running IIS.

  • Use the Execution Location setting of 3 and the Default Isolation Mode setting of 1.

    The Execution Location and Default Isolation Mode properties are always used together.

    The Execution Location property controls whether a query is executed on the client or on the Analysis server. By default, the execution of the query is split between the client and the server in order to distribute the processing load across multiple computers. Using an Execution Location setting of 3 in the connection string causes most queries to be executed on the Analysis server. This setting also reduces the amount of data that is returned to the client over a slow network connection. In most cases, only the final result is returned to the client, and the client processor resources are not used to help resolve the query. The value of the Execution Location property is set in the connection string when a session is established. It can be changed during the session. However, every client using this setting adds to the load on the Analysis server.

    The Default Isolation Mode property controls the refreshing of the cache on the client. Using a Default Isolation Mode setting of 1 causes the cache to be invalidated whenever a query statement is executed. When this setting is used in conjunction with the Execution Location setting of 3, you can ensure that most queries are resolved on the Analysis server rather than at the client. By default, PTS will use data in the local cache to resolve queries when possible, which conflicts with the Execution Location setting if you are attempting to execute as many queries as possible on the Analysis server. The value of the Default Isolation Mode property is set in the connection string when a session is established. This value can also be changed during a session. A connection string that combines these settings is sketched after this list.

  • Set an appropriate value for the Large Level Threshold property.

    When PTS connects to a cube, dimension members in the cube are loaded into the client cache. The Large Level Threshold property controls the point at which an intermediate or leaf level is too large to be sent to a client as a single piece. When the number of members in an intermediate or leaf level exceeds this threshold, only a portion of the dimension members are sent initially. Use this setting to reduce the amount of data transmitted to the client across the slow network connection. The default value for this property is 1000 and the minimum value is 10. However, decreasing the Large Level Threshold setting will increase the number of round trips to the Analysis server to load a large dimension, as more of the cube is browsed. The value of the Large Level Threshold property is set in the connection string when a session is established; it cannot be changed during a session. You can also set the HugeLevelThreshold registry value on the Analysis server to specify the minimum number of members used to indicate a large level for all client connections. Finally, if you must browse very large dimensions over a slow connection, you will achieve better performance if you reduce the total size of the dimension and its members or use a terminal server solution.

  • Group measures by usage.

    Place any measures that are not commonly used together in the same queries into separate physical cubes, and then use a single virtual cube over the physical cubes. This technique minimizes the amount of network traffic sent between the client and the Analysis server. For more information, see "Use a Virtual Cube to Group Measures by Usage" earlier in this paper.

  • Submit MDX queries using TCP/IP.

    When you submit MDX queries over HTTP, an IIS server must relay the requests to Analysis Services over TCP/IP. This extra step in the middle tier introduces additional latency. Submitting MDX queries directly to the Analysis server using TCP/IP yields better performance.
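
The following VBScript is a minimal sketch that combines the Execution Location, Default Isolation Mode, and Large Level Threshold settings described in this list in a single connection string for a client on a slow link. The server name, the Large Level Threshold value of 500, and the MDX statement are illustrative assumptions.

Option Explicit

Dim cst, strConnect

' "YourAnalysisServer" and the threshold value are placeholders. Execution
' Location=3 resolves queries on the Analysis server, and Default Isolation
' Mode=1 keeps the client cache from overriding that choice.
strConnect = "Provider=MSOLAP;Data Source=YourAnalysisServer;" & _
             "Initial Catalog=FoodMart 2000;" & _
             "Execution Location=3;Default Isolation Mode=1;" & _
             "Large Level Threshold=500;"

' Execute a simple MDX query through ADO MD using this connection string.
Set cst = CreateObject("ADOMD.Cellset")
cst.Open "SELECT [Measures].MEMBERS ON COLUMNS FROM [Sales]", strConnect

WScript.Echo "Columns returned: " & cst.Axes(0).Positions.Count

cst.Close
Set cst = Nothing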

Summary

There are many steps that you can take to improve both query responsiveness and processing performance. The most important steps you can take to improve performance with a well-designed cube are to design appropriate aggregations, use multiple partitions, use MOLAP, and eliminate joins. However, it is equally important to design a data model that can perform well, before ever rolling out to a physical implementation.

Appendix A: Resources

The following books, papers, Web sites, and courses are excellent resources for additional information about query responsiveness and processing performance in Analysis Services.

Books

  • MDX Solutions: With Microsoft SQL Server Analysis Services, George Spofford

  • Fast Track to MDX by Mark Whitehorn, Mosha Pasumansky, Robert Zare

  • Microsoft SQL Server 2000 Resource Kit, Microsoft Corporation

  • Microsoft SQL Server 2000 Bible, Microsoft Corporation

Web Sites

Microsoft Official Curriculum Courses

Appendix B: Interpreting Performance Monitor Counters

The counters below are grouped by the Performance Monitor object that exposes them. Each entry names the counter to monitor, describes it, and lists possible conclusions or actions. For the operating system objects at the end of the list, general guidance is given instead of individual counters.

Analysis Server: Connection

  • Current connections: Number of client connections established. Monitor for changes over time; additional connections may require additional resources.

  • Failures/sec: Rate at which user sessions request to connect but do not succeed. Too few worker threads can cause connection failures. In cases of failure, compare the Current connections counter to the maximum size of the worker thread pool. Connection failures may occur in benchmarking situations where the number of connections and rate of queries are set artificially high. Note that agents also require worker threads; when a user query is executed on the server instead of on the client computer, an agent performs that task. See "Optimizing Clients for Slow Network Connections" earlier in this paper.

  • Requests/sec: Rate of incoming connection requests. Monitor for changes over time; additional connections may require additional resources.

Analysis Server: Query (see note 1)

  • Avg time/query: Response time in milliseconds, based on queries answered since the last counter measurement. Monitor for changes over time; increasing query times may suggest a need for new aggregations or a need for additional resources.

  • Current queries: Number of queries being actively worked on. This counter should ideally be low because the goal with OLAP queries is to make them as fast as possible. If this value climbs, it is likely that query response times are growing too.

  • Data bytes/sec: Bytes read from the data file. Use this counter, together with Data reads/sec, Map bytes/sec, and Map reads/sec, to look for very high disk activity. I/O to the data files and map files should be added together to see total file I/O activity. Be aware that requests satisfied by the file system cache are included in these counters.

  • Data reads/sec: Number of logical read operations per second from the data file.

  • Map bytes/sec: Bytes read from the map file.

  • Map reads/sec: Number of logical read operations per second from the map file.

  • Bytes sent/sec: Rate of bytes sent from the Analysis server to clients in response to queries. If this counter is high, network traffic could be a bottleneck.

  • Current worker thread pool: Count of threads in the worker thread pool. Worker threads are required to establish user connections and execute queries. One worker thread is required for each concurrent query or agent. If the worker thread pool is exhausted, new queries may fail; increasing the worker thread pool in this situation can prevent such failures.

  • Current worker threads active: The number of active worker threads.

  • Current process thread pool: Count of threads in the processing thread pool. Process threads are required to execute queries that cannot be answered from the query results cache; they fetch data from storage segments in the data files. A single query may use multiple process threads. If all threads in the pool are active, additional requests are queued. If the queue length is non-zero for a significant period, consider increasing the size of the process thread pool, but bear in mind that excessive threads will introduce excessive context switching; the queue may be more efficient than more threads. Faster disks or more processors may also help.

  • Current process threads active: Count of threads actively working on queries.

  • Current process thread queue length: Current length of the queue for processing thread pool work items.

Analysis Server: Agg Cache

  • Current bytes: Size of the query results cache. This counter is most useful if looked at over time to assess how the cache grows and shrinks compared to other uses of memory (such as processing). You should also compare the cache size over time to the cache hits and misses, to detect changes over time.

  • Direct hits/sec: Rate of cache direct hits; queries answered from an existing cache entry. Direct hits and filter hits are the cases where the query results cache was successful at providing results for new queries. In an ideal case, direct hits and filter hits should be an increasing percentage of the number of lookups, bearing in mind that the cache is empty at system startup and is flushed when a partition is processed.

  • Filter hits/sec: Rate of cache filter hits; queries answered by filtering an existing cache entry.

  • Evictions/sec: Rate of evictions from the query results cache. Cache evictions are an indication of memory pressure. If the cleaner thread is evicting cache entries, increasing the amount of memory on the Analysis server and increasing the memory conservation threshold may improve query performance. If cache evictions are occurring during parallel processing, consider processing fewer partitions in parallel.

  • Lookups/sec: Rate of cache lookups. This counter enables you to put cache misses and cache hits in perspective (what percentage of lookups are successful?).

  • Misses/sec: Rate of cache misses. Low values are better, but inevitably this counter will be highest at system startup and after processing a partition, because all cache entries for a cube are flushed when a partition is processed. It is most useful to monitor this counter after the cache is warm.

Analysis Server: Proc (see note 2)

  • Current partitions: The number of partitions in this phase of processing at one time. Processing partitions simultaneously requires additional resources. Monitor this counter in conjunction with counters indicating resource usage, particularly memory.

  • Rows read/sec: The rate of retrieval of data from the relational database. If this counter is low, it may mean that the data source is delivering data slowly. Executing the same query that Analysis Services ran to fetch the data in another tool (such as SQL Query Analyzer) may help you determine the bottleneck.

  • Rows created/sec: Rate at which aggregation rows are created. Rows created/sec may be lower than Rows read/sec if rows are being combined because they belong in the same cube cell (all dimension member keys are duplicates). If these counters are different, data is being combined; this may or may not be intentional in the design of the cube.

  • Memory size bytes: Estimate of the size of the current rows in memory. Together with Memory size rows, this is an estimate of the current amount of process buffer space used for sorting, cumulative for all partitions processing at once.

  • Memory size rows: Estimate of the number of current rows in memory.

  • File bytes written/sec: Rate of writing bytes to a MOLAP partition file. Monitor for changes over time.

  • File rows written/sec: Rate of writing rows to a MOLAP partition file. Monitor for changes over time.

Analysis Server: Proc Aggs (see note 3)

  • Current partitions: The number of partitions in this phase of processing at one time. Processing partitions simultaneously requires additional resources. Monitor this counter in conjunction with counters indicating resource usage, particularly memory.

  • Rows created/sec: Rate at which aggregation rows are created. As rows are read in, they must either be placed in new rows of aggregations or merged into existing aggregation rows. A higher percentage of rows merged means that the overall aggregation set will be smaller. Monitor these counters in conjunction with the Temp file bytes written/sec counter.

  • Rows merged/sec: Rate at which rows are merged or inserted into an aggregation.

  • Memory size bytes: Estimate of the size of the current rows in memory. Together with Memory size rows, this is an estimate of the current amount of process buffer space used for sorting, cumulative for all partitions processing at once.

  • Memory size rows: Estimate of the number of current rows in memory.

  • Temp file bytes written/sec: Rate of writing bytes to a temporary file. This counter indicates whether temporary files are being used to hold aggregations that do not fit in process buffer space. It should always be zero, if at all possible. Allow more space for process buffers and/or process fewer partitions in parallel so that temporary file I/O is not needed. When Analysis Services uses temporary files, performance slows dramatically.

Cache

Monitor the percentage of hits on the file system cache. A sudden decrease may mean there is not enough physical memory to support processes (including msmdsrv.exe) and to keep cube data in the file system cache.

Memory

Compare the available virtual memory to the amount in use. Watch for high page fault rates, which indicate memory paging. Use this object to determine how much memory is being used for the file system cache.

Network Interface

If network traffic exceeds one-half of the theoretical maximum for your network or network interfaces, this could indicate a network bottleneck.

Physical Disk (see note 4)

Be sure that the byte rate and the request rate stay below the theoretical sustained maximums for your disks and interfaces. This can be hard to determine with modern controllers, RAID systems, striping, and mirroring. You may have to ask the hardware vendor what the sustained maximums are. Long-running queries accessing the disks are often a sign of design or aggregation issues.

Processor

During querying, processor usage should not stay near 100% for long periods of time. Processor usage should be very bursty (lots of spiking). Long-running queries consuming processor resources are often a sign of design or aggregation issues. During processing, processor usage should stay high while creating aggregations.

Process (msmdsrv.exe)

Monitor the virtual memory (virtual bytes) and physical memory (working set) used by the msmdsrv.exe process, as well as paging rates (page faults/sec). Paging should be nonexistent or very low once the process is fully started. If paging continues at high rates, Analysis Services does not have enough physical memory.

Note 1: A query as seen by the server may not be the same as a query sent by the client application, because PivotTable Service (PTS) may respond to some client queries from the client cache while other client queries may result in multiple server queries. A different object, Analysis Server:Query Dims, tracks information about requests for dimension member information. This is not usually the source of performance problems, but it is worth monitoring if large dimensions are involved.

Note 2: Processing MOLAP partitions is a two-phase operation. The counters in this object relate to the first phase, where data is read from the data source, sorted in memory, and saved (in a compressed format) to multidimensional storage on disk. These counters are not relevant to HOLAP or ROLAP partitions.

Note 3: Processing MOLAP partitions is a two-phase operation. The counters in this object relate to the second phase, where data is read from multidimensional storage and aggregations are created. These counters also apply to the creation of aggregations for HOLAP partitions. There is also a phase where indexes are created, with counters in the object Analysis Server:Proc Indexes, but there are so few issues related to indexing that we do not have recommended counters.

Note 4: Physical disk counters are enabled by default in Windows 2000, but logical disk counters must be enabled using the diskperf utility.
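
The Analysis Server: Agg Cache counters are easiest to interpret as a hit ratio rather than as raw rates. The following VBScript is a minimal sketch of that arithmetic; the sampled counter values are made-up numbers used only to illustrate the calculation.

Option Explicit

' Hypothetical values sampled from the Analysis Server: Agg Cache object
' over the same Performance Monitor interval.
Dim directHitsPerSec, filterHitsPerSec, lookupsPerSec, hitRatio
directHitsPerSec = 40
filterHitsPerSec = 15
lookupsPerSec = 80

' Percentage of cache lookups satisfied from the query results cache.
hitRatio = 100 * (directHitsPerSec + filterHitsPerSec) / lookupsPerSec

WScript.Echo "Query results cache hit ratio: " & FormatNumber(hitRatio, 1) & "%"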

Appendix C: How to Flush the Query Results Cache

You can intentionally empty the query results cache by pausing and restarting Analysis Services. You can also use the following script to pause a single cube and then flush the cache for that specific cube, rather than flushing the cache on all cubes on the Analysis server.

Note: To make this script readable, some lines were "wrapped". If you use this script, this can be easily fixed using your editor.

Option Explicit

'/*********************************************************************
'
' Invoke via DOS commandline:
'
' CScript flushcache.vbs  "<servername>" "<database_name>" "<cube_name>"
'
'/*********************************************************************

dim strMsg : strMsg = ""
Call FlushCache
If Len(strMsg) <> 0 Then Msgbox strMsg

Sub FlushCache

   dim strAnalysisServer : strAnalysisServer = "localhost"
   dim strOlapDB         : strOlapDB = "FoodMart 2000"
   dim strCube           : strCube = "Sales"

   ' Use the command-line arguments described above, if they were supplied.
   If WScript.Arguments.Count >= 3 Then
      strAnalysisServer = WScript.Arguments(0)
      strOlapDB = WScript.Arguments(1)
      strCube = WScript.Arguments(2)
   End If
   
   dim dsoServer, dsoDb, dsoCube
   
   ' DSO constants
   Const stateFailed = 2
   Const processDefault = 0
   Const processFull = 1
   Const processRefreshData = 2
   Const processFlushCache = 3
   Const processResume = 4
   Const processSuspend = 5
   Const processRefreshDataAndIndex = 6
   Const processReaggregate = 7
   
   ' Handle connection and lookup failures in script rather than aborting.
   On Error Resume Next

   ' Connect to the Analysis server.
   Set dsoServer = CreateObject("DSO.Server")
   dsoServer.Connect strAnalysisServer
   
   ' If connection failed, then end the script.
   If dsoServer.State = stateFailed Then
      strMsg = "Not able to connect to Analysis Server '" & strAnalysisServer & "'."
      Err.Clear
      Exit Sub
   End if
   
   ' Find the database on the server.
   If (dsoServer.mdStores.Find(strOlapDB)) = 0 Then
      strMsg = "Database '" & strOlapDB & "' not found on '" & strAnalysisServer & "'."
      Err.Clear
      Exit Sub
   End If
   Set dsoDB = dsoServer.mdStores(strOlapDB)
   
   ' Find the cube.
   If (dsoDB.mdStores.Find(strCube)) = 0 then
      strMsg = "Cube '" & strCube & "' not found in database '" & strOlapDB & "'."
      Err.Clear
      Exit Sub
   End If
   Set dsoCube = dsoDB.MDStores(strCube)
   
   ' Return the old state
   if ConvertState(dsoCube.State) <> "Current" Then
      strMsg = "Cannot flush cache. Cube '" & strCube & "' previous state was: _
       " & ConvertState(dsoCube.State) & "." & VbCRLF
   Else
      ' Flush the cube's query cache
      dsoCube.Process processSuspend
      dsoCube.Process processResume
      strMsg = strMsg & "   Cube '" & strCube & "' has it's query cache flushed." & vbCRLF
   End If
   
   Set dsoCube = Nothing
   Set dsoDB = Nothing
   
   dsoServer.CloseServer
   Set dsoServer = Nothing
End Sub

Function ConvertState(dsoState)
    Const olapStateNeverProcessed = 0
    Const olapStateStructureChanged = 1
    Const olapStateMemberPropertiesChanged = 2
    Const olapStateSourceMappingChanged = 3
    Const olapStateCurrent = 4
    Select Case dsoState
        Case olapStateCurrent
            ConvertState = "Current"
        Case olapStateMemberPropertiesChanged
            ConvertState = "Properties changed"
        Case olapStateNeverProcessed
            ConvertState = "Never processed"
        Case olapStateSourceMappingChanged
            ConvertState = "Source mapping changed"
        Case olapStateStructureChanged
            ConvertState = "Structure changed"
        Case Else
            ConvertState = "Unknown state"
    End Select
End Function

Appendix D: How to Tune the Process Buffer Size

Perform the following steps to tune the process buffer size on an Analysis server:

  1. If you have 4 gigabytes (GB) or more of physical memory on the computer, you are running Microsoft Windows Advanced Server or Windows Datacenter Server, and either large dimensions or large process buffers are causing memory concerns, then enable the /3GB switch for the operating system and enable Analysis Services to use this additional memory. For more information, see "Configure Available Memory Effectively" earlier in this paper.

  2. Set Performance Monitor to collect the Temp file bytes written/sec counter for the Analysis Server:Proc Aggs object.

  3. Using Analysis Manager, configure the Analysis server properties to assign the temporary file folder (on the General tab of the Server Properties dialog box) to an unused physical drive and configure the process buffer size (on the Processing tab) to a minimal value, such as 32 MB.

  4. Restart Analysis Services and then use Performance Monitor or Windows Task Manager to determine what the virtual memory usage stabilizes at for the Analysis Services process (msmdsrv.exe).

  5. Process the cube or partitions under consideration and observe the Temp file bytes written/sec counter you added to Performance Monitor. Once the aggregation calculation phase starts, you will start to see I/O to the temporary files.

  6. Gradually increase the process buffer size and re-process (restarting the Analysis Services service each time) until the Temp file bytes written/sec counter shows that the temporary file is not being used. Then increase the number by 10 percent. If the virtual memory allocation for the Analysis Services service exceeds the HighMemoryLimit threshold, increase that value as well.

  7. Repeat these steps for any large partitions (or groups of partitions) to determine the best system-wide process buffer size.

Appendix E: A Checklist of Configuration Steps for an Analysis Server

Whenever you walk up to a new Analysis server, perform the following steps to ensure that the Analysis server is optimally configured:

  1. If Analysis Services is the only application running on the Analysis server, you have more than 3 gigabytes (GB) of physical memory on the computer, you are running Windows Advanced Server, and either large dimensions or large process buffers are causing memory concerns, then enable the /3GB switch for the operating system and enable Analysis Services to use this additional memory. For more information, see "Configure Available Memory Effectively" earlier in this paper.

  2. On a second physical disk, add a second page file that is the same size as the amount of physical memory on the Analysis server.

  3. On the Processing tab of the Analysis Services Server Properties dialog box, increase your process buffer size to a minimum of 400 megabytes (MB), and then tune this later (see Appendix D, "How to Tune the Process Buffer Size").

  4. On the Logging tab of the Server Properties dialog box, set up a processing log file. This provides a history of all processing performed on the system, which you will find very useful as a DBA for debugging processing problems. For example, users start reporting problems on Wednesday, and you find that a new operator had inadvertently performed a full process of a changing dimension on Tuesday, but didn't tell anyone.

  5. Ensure that the Server Properties dialog box has an Error Reporting tab. This tab is a new feature in SP3; its presence confirms that SP3 is installed, and it enables the reporting of errors so that Microsoft can better support customers in the future.

  6. Move your data folder to a hardware RAID array or SAN device. This ensures that you get maximum disk I/O. On a production server, do not place the data folder on your C drive, to avoid contention with the disk I/O that the operating system generates.

  7. Examine each cube on the Analysis server. If a cube has multiple partitions, ensure that data slices have been set up correctly and appropriately for the typical query pattern.

  8. If you have existing databases, go to the data folder, and look at the disk space allocated to the .fact files. If individual fact files are over 30 GB, determine why they are not partitioned, or more highly partitioned. Partitioning is an important mechanism to ensure that you are getting fast, even response times from your queries. Ensure that the partitioning design matches user query patterns.

  9. If you have existing databases that users have been querying, use the Usage-Based Optimization Wizard to check whether certain combinations of levels lack appropriate aggregations and are therefore causing poorly performing queries. Check these against any user reports of poorly performing queries. Consider adding aggregations based on the recommendations of the Usage-Based Optimization Wizard.

  10. Check the query logging frequency. Except when testing the system or building an initial query database, Analysis Services should not log more than one out of every 10 queries. In production, under heavy concurrent user load, reduce the logging frequency to one out of every 50 or 100 queries.

  11. If you have existing databases, open each cube in Cube Editor. On the Tools menu, click Optimize Schema. Examine the resulting analysis of the join techniques. Ensure that the cubes have an optimized schema. For more information, see "Eliminate Joins Between Fact and Dimension Tables" earlier in this paper.

  12. If you have existing databases, open each dimension in Dimension Editor. Examine each level. Check to see if the member keys are integers. If not, examine the design to see if you can make the member keys integers (a scripted check is sketched after this list). If at all possible, do not use strings as member keys. For more information, see "Choose Appropriate Data Types and Sizes" earlier in this paper.

  13. If you have existing databases, look at the number, size, and use of member properties. Remove member properties that are not currently being used, and minimize the size in bytes of those member properties that are being used, if possible. Carefully examine those member properties that are dependent on the member itself (that have a different value for every member). A member property that has a limited number of possible values, such as Male or Female, has little overhead, whereas a member property with a large number of possible values, such as a mailing address for every member, has much higher storage requirements and requires more memory to load.

  14. If you have existing databases, examine the processing log file. Determine whether there are any SELECT statements used to populate cubes that are taking a considerable amount of elapsed time. If so, copy the SELECT statement out of the processing log file and run the query interactively in Query Analyzer. Examine the show plan and determine whether there is any RDBMS tuning that you can do to improve processing time.

  15. Examine the RDBMS networking parameters, such as TCP/IP packet size and SQL Server TDS packet size, to determine whether increasing any of these parameters will improve network bandwidth utilization.
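
Steps 11 through 13 can be partly automated with DSO, in the same way as the script in Appendix C. The following VBScript is a minimal sketch for the member key check in step 12: it lists the member key column and its reported data type for every level in a database. The server and database names are placeholders, and the sketch assumes that the DSO Level object's ColumnType property reports the ADO data type of the member key column (for example, 3 for a 4-byte integer and 130 for a Unicode string); verify that assumption against your DSO documentation before relying on the output.

Option Explicit

' Minimal sketch for checklist step 12: list member key columns and their data
' types so that string keys are easy to spot. "localhost" and "FoodMart 2000"
' are placeholders, and the ColumnType interpretation is an assumption.
Dim dsoServer, dsoDB, dsoDim, dsoLevel

Set dsoServer = CreateObject("DSO.Server")
dsoServer.Connect "localhost"

Set dsoDB = dsoServer.MDStores("FoodMart 2000")

For Each dsoDim In dsoDB.Dimensions
   For Each dsoLevel In dsoDim.Levels
      WScript.Echo dsoDim.Name & "." & dsoLevel.Name & ": " & _
         dsoLevel.MemberKeyColumn & " (ColumnType=" & dsoLevel.ColumnType & ")"
   Next
Next

dsoServer.CloseServer
Set dsoServer = Nothing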

Appendix F: When to Consider Moving to SQL Server 2000 (64-bit)

When considering the use of 64-bit computers over 32-bit systems, examine the following characteristics of your existing system.

  1. What are the largest dimensions? Are there many member properties? If large dimensions or many member properties are causing memory paging, or causing Analysis Services to approach the 3-gigabyte (GB) virtual address limit, consider that a 64-bit system does not have a 3-GB virtual address limit. Remember that all dimension members (and their member properties) in all databases must be loaded into memory when Analysis Services starts up.

  2. Are there many databases on the same Analysis server (even if individually each database is small to medium)? Remember that Analysis Services loads all dimension members from disk for each of the cubes in each database serially at startup time. This serial reading of dimension data for all cubes must complete before Analysis Services can begin responding to queries.

  3. Do you have business requirements that result in users performing a high level of querying while partitions are being processed? A 64-bit system allows Analysis Services to have a large query results cache and large process buffers at the same time.

  4. How large is the file system cache? It is very helpful when the file system cache is large enough to hold actively queried cubes in memory. A 64-bit system allows the operating system to have a larger file system cache than a 32-bit system can.

  5. Are you unable to increase the size of the process buffer to a size large enough to avoid the use of temporary files during processing? If so (and you have not designed excessive aggregations), then a 64-bit system allows Analysis Services to have an extremely large process buffer size (10 to 20 GB), which can significantly improve your processing times.

While 64-bit systems provide a lot more memory and computing resources, there are presently some limitations. You need to ensure that you have a complete system with all of the prerequisite components available and in place. For example:

  1. Do you have access to 64-bit OLE-DB providers for all of your data sources? SQL Server 2000 (64-bit) includes a 64-bit OLE-DB provider for SQL Server. If you are using other RDBMS products, such as Oracle, DB2, or Informix, to load data directly into Analysis Services, you need to verify that the prerequisite OLE-DB providers are available.

  2. Do you have any user-defined functions (UDFs) or other components written in Microsoft Visual Basic®? Not all software available for a 32-bit platform is available for the 64-bit platform. A notable case in point is Microsoft Visual Basic 6, which is not available in 64-bit. You need to verify this in Analysis Services, SQL Server, and all other software that you will be using on the 64-bit system.

  3. Is it a requirement to run Data Transformation Services (DTS) on the 64-bit system? The DTS runtime and its designer tools are not available on the 64-bit platform. You can read data from and write data to a 64-bit system from a 32-bit system running DTS, but the DTS packages themselves will not run on the 64-bit platform.