Performance of BI Systems Built with Microsoft SQL Server Accelerator for Business Intelligence

Article
01/28/2010

Updated : November 19, 2002

Len Wyatt

October 2002

Summary: This paper describes a number of performance experiments conducted on a business intelligence (BI) system built with Microsoft SQL Server Accelerator for Business Intelligence (BI Accelerator). Based on the results of those experiments, the paper makes recommendations for the operation of similar systems. It also suggests a methodology for conducting performance or system sizing experiments that are specific to customer implementations.

Many of the findings and recommendations contained here are applicable to other BI systems that use Microsoft tools, regardless of whether BI Accelerator is used to build them.

Introduction
Performance Measurements for BI Systems
About BI Accelerator
Hardware and Software Configuration and Test Data
Data Preparation Performance Tests
Query Performance Tests
Aggregation Designs
Recommendations for Deployments
Designing Performance and Scalability Experiments
Conclusions
Appendix A: Query Examples

Introduction

When a company uses the data from its operational systems to make business decisions, thats business intelligence (BI). Operational systems are used for order entry, inventory management, financial account tracking, airline reservations, and so on, depending on the industry. BI systems take data from operational systems and restructure it so users can discover previously unknown information about the business, its customers, its operations, or its products. We say take data from operational systems advisedly, because a companys operational system is typically not very good at providing business intelligence. Usually the data from an operational system is copied using Extract, Transform, and Load (ETL) functions to a separate database, which is structured and optimized for BI operations. This type of database is sometimes known as a data warehouse or a data mart.

An industry has developed that focuses on creating and supporting data warehouses and data marts. The teams creating them must have extensive business skills as well as deep technical expertise. Often, teams with these skills are found in consulting organizations, which can build these teams and deploy them for many customers, but even so, most BI systems take a long time to design, build, and deploy.

Microsoft SQL Server Accelerator for Business Intelligence (BI Accelerator) is a tool that allows these skilled teams to create BI applications much more quickly. BI Accelerator automates the creation of detailed schema designs and complex ETL operations, which are low in business value but vitally important to make the system work properly. Early evidence shows that BI Accelerator can shorten by three to four months in some cases. This time savings is made possible by Analytics Builder Workbook, a Microsoft Excel workbook that allows developers to design all aspects of a BI data model in one place, and the Analytics Builder utility, which creates the databases and ETL operations needed to implement the data model.

This paper addresses the following questions:

When an application is created with BI Accelerator, what performance can be expected on given hardware?
What settings do sites use and what guidelines should sites follow to get optimal performance?
What should sites consider when planning a new system?

Because BI Accelerator can be used to create many different designs, we cannot characterize every possible system. It would be a little like asking, How fast is a program generated by this compiler? Clearly it depends on the program being compiled. Still, we will investigate the most important performance aspects of a BI system.

In a sense, this paper is an extension of the documentation that is included with BI Accelerator. The paper contains performance-oriented recommendations and measurements that were not available when BI Accelerator was released.

Finally, a word about what this paper is not. This paper is not an exhaustive study of all possible configurations, nor does it contain specific configuration guidance that will define a system for deployment. We attempt to give general information that is helpful in planning a deployment, tuning the configuration, and avoiding pitfalls. The information and advice given here should be regarded as indicative, not definitive.

To get the most out of this paper, you should be familiar with BI concepts and with BI Accelerator. If you are not familiar with BI, the book Business Intelligence: Making Better Decisions Faster (Microsoft Press, 2002) provides a good business-oriented overview of the field. A more technical treatment is available in The Data Warehouse Lifecycle Toolkit (John Wiley & Sons, 1998). To learn more about BI Accelerator, visit https://www.microsoft.com/sql/ssabi/, read the overview, and most important, download BI Accelerator and review the documentation; it contains chapters about designing, deploying, and maintaining a BI application.

Performance Measurements for BI Systems

There are two key measurements for BI systems: the speed at which data is prepared for user queries, and the number of users that can query the system concurrently with acceptable response times.

Update Rate

The nature of BI requires that data must be moved from operational systems to BI systems (data warehouses, data marts, or decision support systems). The data is moved not just so the company has a separate copy to query against; through ETL operations, the data is copied into a database with a star or snowflake schema for both usability and performance reasons. ETL operations can involve large amounts of data, and often companies try to perform these tasks during off-hours to minimize the impact on other operations of the business. This presents a common operational question: can the daily (or weekly, and so on) updates be performed during the time available? To address this question, we will measure the update rate for the system. We measure the update rate by the number of fact table rows updated per hour, because fact tables are larger than dimension tables by orders of magnitude.

Active User Sessions

We build BI systems so that users can query them. There are many ways to think about query performance. OLTP benchmarks such as the TPC-C study the rate at which large volumes of simple queries are processed, measuring the number of queries per second. But these benchmarks do not address the needs of BI systems—BI queries are more complex and address much more data and they are read-only. In addition, the typical BI system supports a smaller set of users than an OLTP system, although deployments to thousands of users are increasing.

Although BI queries are larger and more complex, they must run in a short amount of time (preferably in less than a second) in order to encourage users to explore the data. Response time is a critical measure of the performance of a BI system under load. Many IT organizations have a service level agreement with users, stating that the average response time must be less than a certain threshold, such as five seconds.

Users tend to submit a query, consider the results for a time, and then submit another query. This time between queries (the "think time") limits the rate at which queries arrive from any user session. This allows us to consider the number of concurrent active user sessions that the system can support while meeting response time requirements as the key query performance metric. While we can consider a simple metric like queries per second, measuring concurrent sessions seems more useful.

So what "think time" should be used? Some users submit queries every few seconds, but others leave the session idle to answer the phone, have coffee, or attend a meeting. This makes the average think time quite long. In this paper, an average of 30 seconds is used. That time may seem long when users are actively using the system, but considering sessions that are on hold, its really quite short.

Measurements of query performance depend greatly on the actual queries chosen for the workload. The numbers reported in this or any performance study may be higher or lower than those seen in an actual deployment. Therefore, we recommend site-specific evaluation tests for large deployments. A methodology for such tests is discussed at the end of the paper.

About BI Accelerator

As noted earlier, this paper is not a replacement for the BI Accelerator documentation. What follows is a recap of some salient points that will help you understand the experiments described in this paper.

When the Analytics Builder utility runs, three databases are created:

The Staging database provides a temporary holding area for data arriving from operational systems before it is incorporated into the Subject Matter database. Sites must develop or purchase an ETL module to populate this database. When data has been incorporated into the Subject Matter database it is deleted from the Staging database. The Staging database is always located on the same server as the Subject Matter database.
The Subject Matter database is the long-term storage area for data. Tables and columns in the Subject Matter database are created in accordance with the definitions in Analytics Builder Workbook. The Subject Matter database has a snowflake schema—every level of each dimension has its own level table, which helps to enforce relational integrity. Dimension members in the Subject Matter database are assigned surrogate keys, which are different from the business keys (or codes) assigned by the operational systems. All rows in fact tables have corresponding keys in the dimensions, and this is also enforced by table constraints (dependencies).
The Analysis database is structured to support fast user queries through Analysis Services cubes, and also to implement calculations and named sets that are defined in Analytics Builder Workbook. The definition of this database is tightly linked to the Subject Matter database, which is the source of its incoming data. The Analysis database may or may not be on the same server as the Subject Matter database.

The Analytics Builder utility also creates a number of Data Transformation Services (DTS) packages, which manage the ETL operations that prepare data for user queries. Most packages are controlled by two top-level packages: Master Import and Master Update.

Master Import brings data from flat files into the Staging database, which is largely a bulk insert operation. We expect that most sites will use Master Import as an example when creating custom packages for populating the Staging database.
Master Update controls the packages that bring data from the Staging database into the Subject Matter database and the Analysis Services database. This part of the ETL process includes error checking, surrogate key assignment, history tracking, cube processing, and partition management. Because Master Update and its subordinate packages do bulk of the work to prepare data for user queries, this paper focuses primarily on the performance of Master Update.

BI Accelerator embodies a number of best practices that are generally recommended by the BI Practices team at Microsoft. Thus, certain functionality is built in by default, along with its corresponding performance implications—positive or negative. For example, it has long been recommended that Analysis Services cubes be partitioned and that the relational fact tables be partitioned in a corresponding manner to aid management. Partitioning can help both processing and query performance. In the past, partitioning has been tedious to implement, requiring attention to the relational database, to Analysis Services, and to the DTS packages that manage the flow of data. BI Accelerator takes care of all that in a consistent manner when you use Analytics Builder Workbook to set the Partition by Months property of a cube to TRUE. This is a huge benefit for the large majority of sites that partition by time. On the other hand, BI Accelerator does not presently implement partitioning by any other dimension, although Analysis Services clearly supports this.

Some other design implications that affect performance include the following:

Indexes and constraints are placed on Subject Matter database tables before data is loaded.
Indexes are not maintained on the fact tables. Because user queries are directed to cubes rather than relational tables and the primary purpose of the Subject Matter database is to maintain a clean and consistent dataset to feed into Analysis Services, indexes are not necessary. The fact tables are loaded more quickly as a result, and the space savings are significant.
Master Update and its subordinate packages perform extensive data validation. While this can be turned off, it is generally recommended that sites leave it in place. For more information see "Enabling Error Checking" later in this paper.
Fact tables in the Subject Matter database can be partitioned by month and their corresponding cube partitions can in principle, be processed in parallel. Master Update does not implement this level of parallelism, although multiple fact tables are processed in parallel. Partition-level parallelism could be a local site customization.
BI Accelerator does not have information about how much data will be loaded into dimensions and fact tables. Without member counts and fact table counts, it cannot instruct Analysis Services to design aggregations for cubes. The Partition Aggregation utility is provided to assist with this task, which is generally performed after data is loaded.

Hardware and Software Configuration and Test Data

This section describes the software and hardware configuration and the sample data with which the tests were performed.

BI Accelerator

While the purpose of BI Accelerator is to facilitate rapid custom deployments, two example schemas are provided with BI Accelerator, one for Sales and Marketing Analytics (SMA) and one for Retail Analytics (RA). The work in this paper is based on the SMA schema, but with the following changes:

Additional member properties have been added to the Customers dimension. The Unisys Corporation provided a data generator that we used to create various size datasets for this schema, but with the additional properties. This slight alteration is not relevant to the tests this paper describes.
The named set Top 5 Customers was removed from the Sales cube—the reasons for this are explained later in this paper in "Query Performance Tests."
The Enable Aggregations property has been set to Yes for all levels of the Time dimension in all cubes. For sites using the 1.0 release of BI Accelerator, this change is strongly recommended. The change is included in version 1.1.

The SMA schema has a Time dimension on which several virtual dimensions are based (for example, Day of Week) and a number of multi-level dimensions: Sales Force, Product, Geography, Customer, and Campaigns. In addition, the schema includes several flat dimensions containing a single level: Sales Type, Order Type, Order Status, Forecast Scenario, Quota Version, Lead Quality, and Customer Status.

Seven physical cubes and six virtual cubes are created based on these dimensions. The physical cubes are Sales, Orders, Backlog, Forecast, Quota, Mktg Campaign, and Campaign Leads. The virtual cubes are Forecast and Sales, Orders and Backlog, Orders and Sales, Sales Force Performance, Mktg Campaign Analyses, Sales vs Forecast, and Quota.

One set of dimension data was used for all tests. The largest dimension was the Customer dimension, with 1 million members.

Test Data

We used the Unisys data generator to create data for the Sales, Orders, and Backlog fact tables, which were used for the tests described in this paper. Three datasets were used, a 10 million row database, a 100 million row database, and a 1 billion row database. The following table shows the number of rows per fact table in each of the three databases.

Table name	10 million row database	100 million row database	1 billion row-database
Fact_Backlog	3,850,466	38,146,462	382,756,178
Fact_Orders	3,265,839	32,369,806	324,823,696
Fact_Sales	2,955,561	29,291,120	293,907,214
Total	10,071,866	99,807,388	1,001,487,088

At query time, a mix of queries was used. Five-hundred user sessions were created from 23 template queries. Each user session used a query from each template but with some changes, so that the queries from various users were different. This prevented Analysis Services from answering all of the queries from its query result cache. The cache is a tremendous benefit in normal operations, but when you are collecting performance data, you must find a way to work around the cache, or you will get unrealistically fast response times. The following two simple queries were created from the same template and appear in different user sessions. Each query selects a different measure from the Sales cube and shows the value of that measure for some month and for all industries:

SELECT { [Time].[Standard].[Quarter].[Q1 - 2000] } ON COLUMNS ,
{ [Customer].[Direct].[Industry].members } ON ROWS
FROM [Sales]
WHERE ( [Measures].[Actual Discount %] )
SELECT { [Time].[Standard].[Quarter].[Last Year]} ON COLUMNS ,
{ [Customer].[Direct].[Industry].members } ON ROWS
FROM [Sales]
WHERE ( [Measures].[Actual Avg Gross Margin] )

Although the queries are similar, they refer to different quarters and different measures. Note that while one query refers to a specifically named quarter, the second query refers to a calculation—the quarter from last year that corresponds to the current quarter.

Each of the 23 templates contains variations that prevent the server from answering all queries based on previously cached results. The queries address a wide range of dimension levels and Multidimensional Expressions (MDX) functionality. They are intended to reflect real queries that users might ask against the SMA model. For additional query examples, see Appendix A.

Hardware

The servers for this performance study are standard four-way computers, set up in a pair so that one server supports the Subject Matter and Staging databases and runs the DTS packages, and another runs the Analysis server. We chose this configuration for two reasons:

First, it separates most data preparation tasks from query support so that new data can be prepared in the Subject Matter database without having an impact on user queries. This setup seems like a reasonable real-world deployment scenario.
Second, we could more easily measure the resources used by the SQL Server relational engine and Analysis Services.

The RDBMS server is a Compaq 5800 with four 700-megahertz (MHz) CPUs, 8 gigabytes (GB) of memory, and 880 GB of usable RAID-5 disk space using a StorageWorks controller. The Analysis server is a Compaq 5800 with four 700-MHz CPUs, 4 GB of memory, and 406 GB of usable RAID-5 disk using a StorageWorks controller. Note that while these are capable systems, the state-of-the-art in processor speeds has advanced since we obtained these systems. Today, systems are available with CPU speeds more than twice as fast as the CPUs we used in this study.

In addition, four client computers were used to submit queries to the Analysis server. These are also four-CPU computers, each running up to 125 client sessions. For a query performance test, the client computers must not become bottlenecks in the system.

Software

The software configurations on the computers met the requirements given in the BI Accelerator documentation. In short, the computers were running Microsoft Windows 2000 Enterprise Edition, SQL Server 2000 Enterprise Edition with Service Pack 2 (SP2), and Excel 2002.

The SQL Server databases were installed on the RDBMS server, and both SQL Server and Analysis Services were installed on the Analysis server. The Analysis Services repository was migrated to SQL Server as recommended in the BI Accelerator documentation.

In the RDBMS installation, the database logs were placed on a separate single-disk drive from the database files, following typical recommendations. All other configuration settings and options were typical default values.

On the Analysis server, the /3GB switch was set in the operating system, allowing a process access to a 3-GB address space (the default is 2 GB). The Analysis Services memory conservation threshold was set to 3 GB as well, although as noted in the BI Accelerator configuration documents, this value cannot be set in the user interface and must be set through the registry. The Analysis Services process buffer size was set to 1 GB to benefit processing performance. This buffer size is larger than most sites require. No test was performed to find the optimal size.

In the Analysis Services installation, the temporary files directory was placed on a separate single-disk drive. These temporary files are used during processing if the aggregations being computed do not fit in the allocated process buffer space. Then the aggregations are spooled to a temporary file, and are read back to be merged with data from additional rows and written to another temporary file. Whether this configuration with a single-drive temporary directory is beneficial is unclear, because a single drive spends much time seeking, with both reads and writes occurring. This was not relevant to the performance work described here, because the process buffer was set to allow growth to 1 GB, making temporary files unnecessary. Using a process buffer large enough to avoid the use of temporary files is recommended for Analysis Services.

To support high degrees of query concurrency, especially when inadequate aggregation designs were used and the server was overloaded, two thread count settings were changed for Analysis Services. These values are set using the registry editor; they are not available through the user interface: PoolProcessThreads was set to 600, and PoolWorkerThreads was set to 500.

All computers on the Microsoft corporate network, including the ones used for these tests, run an anti-virus utility. There is no way to control when this software runs, short of disabling it. The presence of the anti-virus software introduced some noise when we ran the performance tests. This noise doesn't appear to have affected the results, but it may explain certain variances in the data.

Data Preparation Performance Tests

This section deals with the operations that are required to prepare data for users to query. These operations include running Master Import and Master Update and processing Analysis Services cubes (which is generally performed by Master Update also).

Although preparing dimensions is a crucial part of what Master Update does, dimension tables are typically smaller than fact tables by orders of magnitude. Therefore, the emphasis of this section is on fact table processing. All speeds reported in this section are based on fact table rows, unless otherwise noted.

Because Master Import is largely a template from which sites design custom import processes, this paper does not spend much time on the performance of Master Import. There is another reason for this: because Master Import essentially launches bulk import tasks, it runs very quickly compared to Master Update and is unlikely to be a bottleneck. For example, Master Import ran almost four times faster than Master Update on the 100 million row database.

Cube processing is included in the tasks performed by Master Update, but cube processing is not the bottleneck: cube processing performs at more than twice the speed of Master Update. Thus, the RDBMS operations performed by Master Update deserve the most careful characterization. The following table shows the number of rows processed per hour during import, update, and cube-processing operations.

Operation	Rows processed per hour
Import 100 million rows	95,364,238
Update 100 million rows	22,932,858
Process cubes serially	51,027,640

Table-Level Parallelism in Master Update

Master Update works with dimensions in parallel, and when all dimensions are complete, it works with fact tables in parallel. In the SMA sample data used here, the Sales, Orders, and Backlog fact tables are roughly the same size: they contain 29 percent, 32 percent and 38 percent of the rows, respectively. For a large part of the time that Master Update spent working on fact tables, these tables were handled in parallel. But what if the fact tables were less balanced in size, such that a single fact table dominated in row count? To test this, the 10 million row database was used and a similar database was created with only rows in the Sales fact table. The following table shows the speed of Master Update in both scenarios. Clearly, the parallelism was a substantial help in processing this data.

Number of fact tables	Rows processed per hour
Single fact table	10,587,084
Three fact tables	20,156,775

Figure 1 shows CPU usage during a run that loaded three fact tables in parallel, followed by a run that loaded a single fact table. Note that during the single-table test, the workload was slightly greater than a single processor could handle without becoming a bottleneck (remember that this is a four-CPU server, so 100 percent usage means that all four CPUs are in use). This suggests that the work saturates one CPU, and the others arent helping substantially. A greater number of CPUs are involved when multiple tables are prepared in parallel.

Figure 1: Processor usage when Master Update loads fact tables

Partitioning

Partitioning can have great performance advantages for Analysis Services and make BI data easier to manage. When the Partition by Month property is set to TRUE for a cube in Analytics Builder Workbook, the Analytics Builder utility creates packages that are implement partitioning and create a separate fact table in the Subject Matter database for each months data. The fact table is created on-the-fly as Master Update processes data for each new month. The Analysis Services cubes have partitions that map to these fact tables.

Note: For detailed information about partitioning in a SQL Server data warehousing environment, see the white paper, Using Partitions in a Microsoft SQL Server 2000 Data Warehouse, which is available on the MSDN site.

While partitioning is good for Analysis Services cubes, it introduces extra work for Master Update, which scans the fact table in the Staging database once for each partition, to determine which rows belong in each partition table. Scanning does not produce significant overhead when data for a single partition is included in the table in the Staging database, as is the case for a typical incremental update. But the overhead can be significant when Master Update processes a large initial load, with many months of data in the same table.

The 100 million row database contains data representing 48 months; thus, many passes over the fact table in the Staging database are required. This represents a worst-case scenario for partition processing. Even though Master Update always incurs a cost when partitions are used, the cost is rarely as extreme as in this example. In an incremental update scenario, any one update would probably have data from a single month. Thus instead of 48 passes over the fact table, there would be one. The following table shows the number of rows per hour that Master Update processes when moving non-partitioned and partitioned data from the Staging database to the Subject Matter database.

Configuration	Rows processed per hour
Standard	23,523,262
Partitioned	8,081,715

Because partitioning has important benefits, it is worth expending some effort to find other approaches that might relieve some of this overhead in Master Update.

One technique that can help performance is to presort the data by time. The data in the following table, obtained by running Master Update on the 10 million row database, shows only a slight improvement.

Configuration	Rows processed per hour
Partitioned	5,966,192
Partitioned and sorted	6,325,778

Another approach might be to add a clustered index on the time key field of the fact table in the Staging database, which should enable Master Update to select the rows that belong in each partition table in the Subject Matter database more quickly. A clustered index slows Master Import somewhat, but its impact can be minimized if you presort the data.

As noted earlier, this issue is primarily a concern for large loads of historical data and not for ongoing incremental updates. For these large historical loads, another approach is available if the source data can be obtained on a monthly basis. For each month, you could run Master Import and then Master Update on just that months data, eliminating the need for Master Update to perform multiple scans.

Batch Size and Error Checking

Certain global variable settings can improve the performance of Master Update. You can set global variables in configuration sets using the Global Variable Configuration utility, or you can pass them to Master Update as command line arguments. The global variable giCommitBatchSize controls how often data is committed to the database and the global variable giProgressRowCount determines how often progress is reported in logs. By default these are both set to 1,000, but for this experiment they were set to 10,000.

The setting gbDontCheckErrors determines whether relational integrity checks are performed on incoming rows. This is only one type of error checking BI Accelerator performs, but it affects performance because every row is checked. By default, the natural keys (or codes) of each fact table row in the Staging database are checked for consistency against the corresponding dimensions, and bad rows are copied to an error table. If this is not done, the bad rows still do not arrive in the Subject Matter database because table joins filter them out, but error checking allows you to spot bad data easily by using the Browse Batch utility and repair it by referencing the error table. You should turn off error checking only when there is absolutely no doubt as to the integrity of the incoming data. When the value of gbDontCheckErrors is set to TRUE, error checking is disabled, which saves some time.

The following table shows the results of using these settings on the 10 million row database with partitioned cubes. Note that in the experiment with larger batch sizes, both giCommitBatchSize and giProgressRowCount were set to 10,000.

Settings	Rows processed per hour
None	5,966,192
Batch size set to 10 kilobytes (KB)	6,198,347
Error checking turned off	6,562,158
Batch size set to 10 KB and error checking turned off	6,449,301

While increasing the batch size made some difference, disabling error checking was more significant at around 10 percent improvement. Using both techniques at once did not result in further improvement.

Note that while increasing the batch size has a small effect on Master Update, it has a major impact on Master Import. Although Master Import is already very fast, increasing the batch size is a way to make it faster. The data in the following table results from running Master Import on the 10 million row database.

Batch size	Non-partitioned database	Partitioned database
Batch size set to the default	52,023,121	70,312,500
Batch size set to 10 KB	93,264,249	122,448,980

Preallocated RDBMS Storage

By default, the Staging database and Subject Matter database are initially fairly small at 50 megabytes (MB) each. It is expected that the files for these databases will autogrow to the appropriate sizes, and the database administrator must ensure that adequate space is available. A common technique for speeding up RDBMS insertion of new data is to ensure that sufficient file system storage is already allocated to the database, so that it doesn’t have to grow during normal operation. This was tested on a 10 million row partitioned database—we used the ALTER DATABASE statement to set the size of the Subject Matter database prior to running Master Update. The following table shows the processing speed of Master Update both with and without pre-allocated space for the Subject Matter database.

Configuration	Rows processed per hour
Partitioned only	5,966,192
Partitioned and space preallocated	6,517,017

Update Rate and Database Size

Most of the data preparation measurements given in this paper were taken based on a rather small (10 million row) database. With a small dataset, its easier to perform many varied tests because a configuration can be run every few hours. But it is vital to note that throughput is higher with larger datasets—to a significant degree. The following table shows the speed of Master Update, in rows per hour, for various size databases. Part of the performance gain with larger databases can be explained by overhead costs that average out over a greater number of rows in the larger databases. But the gains are more dramatic than the averaging of overhead costs can explain. The results appear to show that the self-tuning features of SQL Server work over time, although the exact reason for these benefits has not been investigated.

Database size	Partitioned	Standard
10,000,000 rows	5,966,192	20,156,775
100,000,000 rows	7,973,775	22,932,858
1,000,000,000 rows		27,161,612

Aggregations

The number and size of the aggregations included in Analysis Services cubes have a clear impact on the time required to process the cubes or cube partitions. To show the effect of aggregations separately from the rest of the Master Update tasks, we varied the aggregation designs and processed the cubes using Analysis Manager (and therefore, sequentially, not in parallel). We used three aggregation designs: no aggregations (the default), a design for a 10 percent performance gain, and a design based on usage-based optimization (which we designed using the Usage-Based Optimization Wizard in Analysis Manager and propagated to partitions using the Partition Aggregation utility, as recommended later in this paper).

Note: In illustrations that show how aggregation design affects performance, the abbreviations Agg0, Agg10, and AggUBO are used to represent no aggregations, the 10 percent performance gain design, and the usage-based optimization design, respectively.

Figure 2 shows the processing rates for the various aggregation designs. For this test, we used partitioned and non-partitioned 100 million row databases in which the Backlog, Orders, and Sales tables were processed sequentially.

Figure 2: Rows processed per hour—various aggregation designs

Two points arise from this data:

Non-partitioned (actually, single partition) cubes are processed more quickly than partitioned cubes, until the number of aggregations grows large enough so that the overhead of computing the aggregations exceeds the overhead of extra partitions. Designs that use partitions often require fewer and smaller aggregations. In a cube with monthly partitions, aggregations that slice time by month or year are unnecessary, and the aggregations in each partition are smaller because they contain fewer rows. This is illustrated by the aggregation design for a 10 percent performance gain shown in Figure 2.
The main justification for usage-based optimization is to pre-compute the aggregations that are most beneficial to user queries, but there is another important benefit: the system does not have to compute aggregations on speculation that users might use them. Instead, the system can more accurately determine which aggregations are useful and not compute extra aggregations. You can see this by comparing the 10 percent performance gain design to the usage-based optimization design: the usage-based optimization design processes data faster. The benefit of usage-based optimization increases as the amount of data increases, and the difference is greater still if you compare a usage-based optimization design to a design with a higher degree of performance gain than 10 percent.

Another interesting point is illustrated in Figure 3, which shows the processing rates for partitioned cubes of various sizes in which the Backlog, Orders, and Sales tables were processed sequentially. As the sizes of the cubes increase, so do their processing speeds. At an improvement of nearly 100 percent, this improvement is caused by more than just amortizing the overhead over a greater number of rows—more likely, the self-tuning features of SQL Server are coming in to play, so that data is delivered from the Subject Matter database faster. An important lesson here is to not extrapolate processing times from small cubes to large ones. You should conduct processing performance experiments at the expected scale of operation, and with a best approximation of a real aggregation design.

Figure 3: Rows processed per hour—various size databases

Resource Usage

The most important resource to consider when planning a BI system deployment is disk storage space—you need enough to hold all of the data. Once sufficient storage space is assured, you can consider other system resources, particularly CPU time, memory usage, disk I/O bandwidth, and network I/O bandwidth.

Storage Requirements

Because fact tables are generally larger by orders of magnitude than dimension tables, this section focuses on the storage requirements for fact tables. The following table shows the storage space used in various locations to hold the same data during different phases of its handling. The values don’t vary significantly with the size of the database, but the size of Analysis Services cubes increase if excessive numbers of aggregations are computed. The numbers presented here are based on the SMA schema and can increase or decrease when the number of keys and measures in fact tables, and their data types, changes in an actual deployment.

Location of data	Bytes per row
Flat files	103
Staging database fact tables	195
Subject Matter database fact tables	142
Analysis Services cubes: Agg0	33
Analysis Services cubes: AggUBO	38
Analysis Services cubes: Agg10	65

Data is held in flat files before Master Import loads the data into the Staging database. The flat files are a transitory storage location—we assume that sites will not retain data in this form. Once the data is loaded into the Staging database, the flat files can be deleted. Note that sites that replace Master Import with ETL processes of their own may not need flat files from which to import data. In flat files, a row of fact data occupies about 103 bytes.

Data is held in the Staging database until a Master Update batch runs to prepare the data and move it into the Subject Matter database. The Staging database is also a transitory storage location—data should be removed from this database after it is successfully moved into the Subject Matter database. Thus, data does not accrue here over time. In the Staging database, a row of fact data occupies about 195 bytes.

That same data occupies about 142 bytes per row after it is moved into the Subject Matter database. The difference in size is primarily because the Staging database fact tables still have production keys, or codes, on the rows, which are character fields. Numeric surrogate keys are substituted in the Subject Matter database, saving considerable space. No indexes are created on fact tables in either database, so no space is required for indexes. The Subject Matter database is the long-term holding place for data in a system built with BI Accelerator.

That same data occupies less than 40 bytes per row when processed into Analysis Services cubes while small numbers of aggregations are stored. This huge efficiency in space usage is because of the compression that Analysis Services performs on data that it stores and on the unique indexing mechanism it uses. The space required increases as larger numbers of aggregations are stored, as evidenced by the aggregation design for a 10 percent performance gain. This data was collected using the 1 billion row database, and represents the largest set of aggregations used in this study. The storage efficiency of the usage-based optimization design is another argument for using that approach.

CPU Usage

Figure 1, which is shown earlier in this paper, shows higher CPU usage while processing multiple fact tables in parallel. This resulted in higher processing throughput.

Unless other resources become bottlenecks, it is reasonable to expect that newer processors can substantially increase the rate at which Master Update runs. While this work was done with 700-MHz processors, systems easily twice that fast are now available.

While the Analysis server has bursts of CPU activity, particularly when computing aggregations, the Analysis server CPU activity is low compared to the activity on the RDBMS server when Master Update runs. When the RDBMS and Analysis Services are installed on separate servers as they were for these experiments, the Analysis server is unlikely to become a bottleneck when Master Update runs unless an excessive number of aggregations are used. Although two computers do not appear to be required for good Master Update performance, splitting the installation between two servers can be beneficial if users are querying the system while updates are occurring.

With regard to CPU usage while running Master Update, there seems to be little benefit in placing the RDBMS and Analysis Services on separate computers.

Query Performance Tests

Having completed a substantial amount of work measuring the performance of the system while it prepares data for querying, we now arrive at the actual purpose of a BI system—to let users query data. As with any complex system, the query response times depend on a number of factors that are sometimes difficult to isolate. The following sections attempt to separate them as much as possible.

For a description of the query workload, see "Hardware and Software Configuration and Test Data" earlier in this paper.

Aggregation Designs

The use of pre-aggregated summaries of data is one of the key techniques that make online analytical processing (OLAP) systems fast, and Analysis Services uses this technique extensively. Analysis Services provides two mechanisms that you can use to determine the aggregations that are computed: the Storage Design Wizard and the Usage-Based Optimization Wizard.

The Storage Design Wizard applies an algorithm to determine what aggregations are necessary. The algorithm makes these determinations based on the member counts of all levels in the dimensions and the row counts in each cube or partition. This algorithm does not have information about the queries that are likely to be submitted to the server; it assumes that a query with any combination of levels is as likely to be submitted as any other. Using the row counts, the algorithm attempts to determine a set of aggregations that provide an appropriate gain in query performance or use a given amount of storage space. This algorithm is totally dependent on the row counts. Therefore, setting good values for the dimension and partition row counts is vital. This information is not available in the Analytics Builder utility, which is why cubes created with BI Accelerator initially have no aggregations. It is assumed that you will use the Partition Aggregation utility to provide accurate row counts after the cubes are populated with data, or that you will use the tool to input estimated row counts. You can also use the Partition Aggregation utility to design aggregations. The Partition Aggregation utility creates aggregations using the same algorithm as the Storage Design Wizard.

The second mechanism is the Usage-Based Optimization Wizard in Analysis Manager. The Usage-Based Optimization Wizard also requires row counts, but instead of assuming that any combination of levels is equally likely, the Usage-Based Optimization Wizard uses information from a query log maintained by the Analysis server. This log records the levels of queries that come in to the server, but it does not record the queries themselves. Using this information, the Usage-Based Optimization Wizard designs an optimal set of aggregations for the query pattern recorded in the log. Because query patterns can change over time, you should re-evaluate usage-based optimization aggregation designs from time to time. As noted earlier, besides designing the optimal aggregations for a query pattern, you can also use the Usage-Based Optimization Wizard to make sure that the system does not design aggregations that won't be used saving processing time and storage space.

In this study, three aggregation designs were used: no aggregations, a design for a 10 percent performance gain as defined by the Storage Design Wizard and the Partition Aggregation utility, and a usage-based optimization design for a 100 percent performance gain. The third aggregation design was obtained by logging all queries while running the first two designs, and then doing a usage-based optimization design using that log.

Figure 4, which illustrates query response times in seconds for a non-partitioned 100 million row database, shows the system functioning as expected: the aggregation design for the 10 percent performance gain was faster than having no aggregations, but by much more than 10 percent. This is typical. The usage-based optimization design was by far the fastest. Also, as the number of concurrent users increases, the average query response time generally slows. When operating with a usage-based optimization aggregation design, though, 500 concurrent users were not enough to cause an increase in response times. Lab resources were not sufficient to increase the query workload further.

Figure 4: Query response times—100M row non-partitioned database with various aggregations and numbers of users, cold cache

Note that the single-user response time is slower than the response time for 100 users. This is a cold-cache scenario. Analysis Services was restarted before the queries were run, so that the query result cache would not contain helpful information. The data suggests that it may be worthwhile to run some typical queries after restarting an Analysis server to populate the query cache and to incur any other transient costs, such as bringing data into the file system cache. Then the first users to submit queries won’t have to incur those costs.

Partitioning and Database Size

Figure 5 shows the average query response times (in seconds) against partitioned and non-partitioned versions of the 10 million row database with no aggregations. Even with no aggregations, this database offers exceptional response times. The server is able to contain the entire (compressed) Analysis database in the file system cache, so disk I/O is not required to respond to a query once the data has been read in. Larger databases require more attention to aggregation design and other tuning issues.

Figure 5: Query response times—10 million row partitioned and non-partitioned databases with no aggregations, cold cache

Now look at the results for the 100 million row database, which are shown in Figure 6. Because the database no longer fits in memory, aggregation design and partitioning are much more important. Both techniques offer significant benefits in query performance. Note that while the 10 percent performance gain aggregation design appears adequate for this size database, a 20 percent performance gain design requires extraordinary amounts of time to process—operating with this number of aggregations is impractical. To improve query performance, usage-based optimization is essential.

Figure 6: Query response times for various aggregation designs—100 million row partitioned and non-partitioned databases, cold cache

Finally, Figure 7 shows the results for the 1 billion row database. At this size, we assume that the only practical way to operate the system is by using partitions. From a query performance point of view, it is not reasonable to leave aggregations off or to use the 10 percent performance gain design. Only usage-based optimization provides good response times—and those are excellent. You have to look at the numbers to see the usage-based optimization response times; they arent visible on the chart.

Figure 7: Query response times for various aggregation designs—1 billion row database, cold cache

Cold Cache vs. Warm Cache

Figure 8 shows the effect of running the queries when the Analysis Services query result cache was cold and running the same queries when the cache was warm. The warm cache results were obtained by re-running the same query set in a new set of user sessions, but without restarting the Analysis server.

The cache can make query responses nearly instant. However, it would be unusual to find all of the results available as they were in this test. In practice, average response times are somewhere between the cold cache scenarios presented here and the warm cache scenario. How many queries find results in the cache depends on how similar the query patterns of users are.

It is a good idea to run a set of known typical queries when you restart the system for the purpose of warming the query result cache.

Figure 8: Query response times, cold cache vs. warm cache—100 million row partitioned database

Query Complexity

The template queries used for this experiment span a range of complexities, from fairly simple to somewhat complex. Some require simple slicing of the data along various dimensions, some require cross-joining dimensions, and some require sorting sets of data. Certain aspects of query complexity have been generally observed to require more resources and take longer to execute. These include:

Topcounts, sorting, and so on. Many users want to see the top five customers, the top ten products, and so on. This is reasonable, but these queries require enumerating all customers or products and sorting them in order to determine the top ones. Sorting sets is reasonable; sorting very large sets takes longer.
Cross-joining large sets. Cross-joining sets is a normal and essential part of an analysis. However, cross-joining very large sets requires substantial computation. For example, a query that cross-joins all destination airports with all origin airports, when there are tens of thousands of airports, takes longer to run.
Measures requiring fact-level detail. Measures that require data from the fact level cannot be aggregated to higher levels, and therefore take longer to compute. The average (arithmetic mean) of a measure is easy to compute because the sum of the measure and the count of items can be aggregated and the mean can be computed at any level from those aggregates. The median, however, is not so simple. It requires enumerating all individual fact measures, sorting them, and finding the middle item of the set.
Bringing large dimension sets to the client. Queries that require large numbers of dimension members to be brought to the client computer run more slowly. This is not stressful for the server, but the latency that occurs when information is sent over the network slows the response time.

The results shown here represent an average of all query response times, and do not distinguish queries by their complexity. (The query results were analyzed by using averages computed in an Analysis Services cube. The results are sliced by database size, aggregation design, and so on.)

Named Sets

Named sets are useful for predefining frequently-used collections in Analysis Services. For example, BI Accelerator defines a set, Standard Last 4 Quarters, that can be used by any query to refer to the last four quarters, regardless of when that query is run. The actual contents of the set are determined dynamically by the system by checking the current date in the Time dimension. Named sets are evaluated on behalf of a user session at the time of connection to the server. That means that all named sets are evaluated whether they are used or not.

For this reason, some types of functions should not be included in sets defined on a system-wide basis. An example of this type of function is included in the SMA schema: the Top 5 Customers set on the Sales cube, which is shown in the following code:

TOPCOUNT(  { DESCENDANTS(  [Customer].[Direct].[All Customers],
                           [Customer].[Direct].[Customer] ) },
5,
( [Time].[Standard].[Year].[Current],
  [Measures].[Actual Net Sales Amt] )
  )

To evaluate this set, the system must total all sales for the current year by customer and then sort the customer list by sales list to find the top five. If the set of customers is small, as it is in the sample SMA dataset, evaluating the set is not an issue. But if the set of customers is large, as it is with the 1 million customers used for this performance study, evaluating this set takes a significant amount of work, which should not be done unnecessarily.

When named sets are important but require so much work that they should not be evaluated for all user sessions, they can be defined on a per-session basis using the CREATE SET statement in MDX, or on a per-query basis using the WITH clause on the SELECT statement.

For purposes of this performance study, the Top 5 Customers set was deleted from the Sales cube.

Recommendations for Deployments

This section offers a number of recommendations on how to deploy and manage BI systems. The recommendations in this section are based on the findings of these performance experiments or on best practices known to the development team. A few recommendations are based on discoveries that occurred during the performance experiments. Some items in this section are believed to work but were not specifically tested as part of this study. Those are described using terms such as should or may.

The saying goes that there is an exception for every rule. Every recommendation here probably does not apply to every site. You should consider each recommendation in the context of a specific project to make sure it makes sense.

Managing Aggregations

We have already seen that pre-computed aggregations are a key mechanism for getting fast query performance. We have also seen that there are costs—notably more CPU time is required for processing aggregations—associated with pre-computing the aggregations. As mentioned earlier, BI Accelerator cannot determine an aggregation design at the time a BI application is generated, because it does not have row count information. Given all this, how should aggregations be managed?

As seen earlier, small databases—those with 10 million fact rows or less, and maybe even larger ones—are not significantly affected by aggregation design. If the server has enough memory to keep the key database information in memory, the system runs quickly.

Larger databases—those with 100 million or more fact rows—require a reasonable aggregation plan. Following are some general guidelines:

Use partitioning. Even though partitioning can slow down Master Update, its valuable in improving query performance. (This may not seem as though it is related to aggregation design, but it is, because the aggregation design changes when partitions are used.)
Perform the initial historical data load and full cube processing with no aggregations on the database.
Set dimension member counts and partition row counts before designing aggregations. If you skip this step, the aggregation design is useless. The Partition Aggregation utility is a very convenient tool with which to do this, particularly because it counts all dimensions and levels at once, and all partitions in cubes at once. This operation is slow, but be patient—it is working.
Design general aggregations for a modest performance gain, such as the 10 percent performance gain used in this study. As noted earlier, this can result in a gain that is significantly more than 10 percent, depending on the query mix. After designing the aggregations, either reprocess the cubes, if the amount of data does not make this an inordinately long process, or use the re-aggregate option in the Partition Aggregation utility. Because Analysis Services re-aggregates without rereading data from the Subject Matter database, but using already read data instead, it can get data extremely quickly and the CPU loads become high. Do not re-aggregate when users are querying against the server.

Note: For very large datasets (for example, 1 billion or more rows), aggregations based on expected performance gain may be very expensive to perform. It may be necessary to move quickly to usage-based optimization instead.
Make sure query logging is enabled on the server. If the initial user population is small, consider logging every query instead of the default of every tenth query.
Put the system into a pilot deployment or give access to the first users. The query log will start to accumulate information for a usage-based optimization aggregation design.
Design a set of aggregations using usage-based optimization after users have had a reasonable time to establish a usage pattern, or if poorly performing queries require immediate attention. Here are some important points to keep in mind:
- If you select the slowest running queries to optimize, add the new usage-based optimization aggregations to the existing set. If you replace the existing set, queries that were aided by the old aggregates become slow. An alternative is to include all queries in the set to optimize.
- Users query recent partitions more often than they query older partitions. If you have a partitioned cube, design aggregations on a recent partition and copy them to other partitions. Do not use usage-based optimization on an old partition and copy the design to recent partitions.
- Periodically rerun the Usage-Based Optimization Wizard on only the slowest queries, and add new aggregations to the set. This enables you to gradually respond to changes in query patterns. You do not have to re-run the Usage-Based Optimization Wizard for all partitions. Create the design on the most recent partition, and copy the design to the default partition, which becomes the template for future partitions.
- Use usage-based optimization on the appropriate (recent) partition, and then use the Partition Aggregation utility to copy that aggregation design to the default partition for the cube. This partition contains no data, but holds the default aggregation design for new partitions. For example, in the Sales cube, the partition Sales is the default partition, while Sales_2002_01 contains data for January of 2002. It may be desirable to copy the aggregation design to other partitions as well, depending on the user workload.
- Reprocess the partitions to implement the new aggregation design (the new aggregations have no effect on the partitions until they are processed). To create aggregations based on the new designs, either reprocess the partitions or re-aggregate them using the Partition Aggregation utility.

Managing Partitions

Master Update handles the most difficult aspects of partition management automatically. Master Update determines when new partitions are necessary based on the incoming data, creates both the relational tables and the cube partitions as required, ensures that the right data is included in each partition table, and processes the appropriate cube partitions as part of an update. You need to do only two things:

Make sure that partitions have the right aggregation designs, as discussed in the previous section. It is not necessary for all partitions to have the same aggregation design, but remember that the design in the default partition is the design that is propagated to new partitions.
Determine the useful lifespan of data in the system and delete partitions when appropriate. For some systems this is practically never, and for others it is a practical necessity. You cannot delete data from within a cube partition, or from a single-partition cube. The way to delete old data is to drop the partition the data resides in. To drop a partition:
1. Back up the relational databases and the Analysis database.
2. Use Analysis Manager to delete the partition from the cube.
3. Delete the corresponding relational table.

Planning Initial Loads

For most sites the performance of Master Update is sufficient for the initial data loads. But if a site is loading large amounts of initial data, waiting for Master Update may be frustrating, because Master Update is doing massive amounts of work. In addition, a multi-day execution puts the process at risk, because a failure late in the process can require a restart. Here are some options for making the process run faster and more smoothly:

Preallocating storage space and possibly increasing the batch size can result in small performance gains. For more information about these options, see "Preallocated RDBMS Storage" and "Batch Size and Error Checking" earlier in this paper.
Turning off error checking can result in a performance gain, but you should not attempt this unless you are absolutely certain of the integrity of your data. Weve had many customers tell us their data is clean when actually it is not, so be very careful when considering this option.
Waiting to process the cubes until after the initial data load may make sense, especially if you plan on changing the aggregation design after loading the data, as you typically do when using BI Accelerator.
Loading the data in batches instead of all at once may be a good idea for three reasons: first, less space is required in the Staging database, because you can remove the data from that database as soon as Master Update finishes running. Second, if the data can be loaded in time-based batches (for example, monthly) most of the performance impact of partitioning should be relieved. As noted earlier, this performance impact is caused by multiple scans of the Staging database fact table, once for each partition involved in the load. Third, in the event of a catastrophic failure in loading, a batch can be rolled back. In essence, loading the data in batches provides checkpoints in the process.
Loading data directly into the Subject Matter database might be helpful if no other options provide acceptable performance, but keep in mind that all functions performed by Master Update must then be performed beforehand. This rarely provides a net savings, because tools and procedures must be developed and tested before beginning the process. That usually takes more time than Master Update requires to execute.

Planning Incremental Loads

In most cases you should set up incremental loads to run automatically, calling Master Import and Master Update at the appropriate times.

Some sites may need to bring data in during the course of a business day, but not want the data in user-viewable cubes to change until a set time (usually after close of business). In those cases, the data can be checked, prepared, and stored in the Subject Matter database without altering the cube data, and the cube data can be processed later. The next section describes how this can be done.

Processing Cubes Outside Master Update

Processing cubes is normally part of the work of Master Update, once data is securely stored in the Subject Matter database. This task can be deferred to a time other than when the Subject Matter database is updated. One important reason might be to allow Master Update to run continuously or many times throughout the day without changing the data that users see. Then, at a scheduled time the cubes can be processed, which is a significantly faster operation. Another reason relates to large initial loads. The data can be prepared in the Subject Matter database, then aggregations can be designed, and then the cubes can be processed.

To run Master Update without processing dimensions, set the global variable gbManualProcessDim to TRUE. To run Master Update without processing cubes, set the global variable gbManualProcessFact to TRUE. These settings cause the appropriate Subject Matter database updates to occur without modifying the Analysis database.

The following tools are available for processing Analysis Services structures:

Analysis Manager provides the default user interface for all administrative tasks, including processing.
The Partition Aggregation utility included with BI Accelerator can be helpful when you are selecting multiple partitions to process. However, the utility does not process multiple cubes or partitions in parallel unless multiple instances are started at the same time. As was discussed earlier in "Aggregation Designs," the Partition Aggregation utility also has an important role in re-aggregating cubes.
The Analysis Services Processing Task in DTS can be used to include dimension and cube processing in an automated operation.
A parallel processing tool is available with the SQL Server 2000 Resource Kit. You can use this tool to process Analysis Services cubes and partitions in parallel. Automated scripts can call the tool for routine operations.

Configuring Systems

Its not hard to estimate the disk space required for flat files, the Staging database, the Subject Matter database, and the Analysis database that are used in a BI application. You can also measure disk space requirements on a small scale and extrapolate—the bytes per row don’t change much with as the row count increases. Make sure that you have enough disk space to store the data in the Subject Matter database and the Analysis database, that you have room for transient data in flat files and the Staging database, and that you allow for database logs and temp files. These items should all be on fast, redundant, multi-drive configurations, such as RAID-5 or RAID-10.

Many installations can run both the RDBMS and Analysis Services on the same server, but this is not recommended if user response times would be negatively impacted by data preparation activities in the RDBMS, or if contention for memory becomes an issue. As evidenced by the results in this study, standard four-CPU servers are capable computers, and newer machines are much faster than the ones used in these experiments.

Your computers must have enough memory to run the operating system, the SQL Server RDBMS, and Analysis Services. Memory most likely will become an issue when large dimensions come in to play. Analysis Services stores all dimension members in memory, and dimensions with millions of members place memory pressure on the server. The limiting factor is the address space available to a single process. This is determined by the operating system. Adding physical memory beyond 4 GB to a computer supporting only Analysis Services does not help.

For additional configuration suggestions, see the BI Accelerator documentation.

Enabling Error Checking

During the development cycle, developers commonly have BI Accelerator create DTS packages with full error checking turned off. In Analytics Builder Workbook, on the Processing page, the option Generate DTS packages with error handling for each package step is turned off by default. However, you should enable this option when your application goes into production. The DTS packages catch important types of errors and record them in the package logs, but you cannot see them in the Browse Batch utility unless error checking is enabled. For example, if the computer runs out of disk space while Master Import or Master Update is running, the packages report the correct error in their logs and fail, but you cannot use the Browse Batch utility to see the error unless error checking was enabled when the packages were generated.

Updating RDBMS Statistics

The SQL Server relational engine uses statistics on the relational tables to determine the optimal methods for executing SQL statements. If the statistics are incorrect, performance suffers. We observed this when using the large datasets for this paper, and it is likely to occur in a situation in which large amounts of data are deleted from a table and then re-inserted. For example, suppose a site loads a large amount of data during a pilot program. Before going into production, the site deletes all of the data from the databases using Master Delete and Master Truncate, and then re-loads the data from scratch. The second time through, Master Update may run more slowly than the first time. This appears to be because the RDBMS statistics have not kept up with the changes in the system. You can reset the statistics on any table by using either DBCC REINDEX (tablename) or UPDATE STASTISTICS tablename WITH FULLSCAN. For the purposes of this paper, no attempt was made to determine which tables required updating; they could be updated quickly so we updated them all.

Designing Performance and Scalability Experiments

This paper has discussed at length the results of a performance experiment that was conducted using one schema, one hardware configuration, and several variations on one dataset. Naturally, the results will vary if any of those factors are changed, which is why it is difficult to predict the performance of any individual site. In particular, the query performance results are highly dependent on the actual query pattern of the site and on the aggregation design. For these reasons, we recommend that you conduct a performance experiment when the following issues come into play:

The system receives queries that require data at the most granular level in order to provide a higher-level result. As discussed earlier, topcounts are a common example, as are calculations of medians or percentiles.
You have large datasets; for example, more than 100 million rows.
You have large dimensions; for example, more than 2 million members.
You have many concurrent active users; for example, more than 100 concurrent active users.

Note that such an experiment is meaningful only if you have a reasonably accurate estimate of the types of queries that users submit.

Workloads

Consider two parts of the system workload when planning performance tests: the data to be used and the query set.

Do not make performance projections using small datasets. Suppose this study had projected the test results from the 10 million row dataset onto even the 100 million row dataset. The measurements of data preparation speeds and query performance would be off by sizable amounts. Only the disk space requirements would have been close.

In general, system performance is not sensitive to the measure values that are in the dataset; for the most part, random values and a number of measures that correspond to what you will use in production should suffice. The distribution of data in dimensions should be somewhat realistic.

Query performance is quite sensitive to the query patterns and aggregation designs. If possible, collect actual queries submitted by pilot users who are using the query tool proposed for production. Microsoft PivotTable Service, the client component of Analysis Services, has a facility for logging the actual text of the MDX queries that are submitted. This is not the same as the server log, which only notes the level of each dimension used in a query. For more information about logging MDX queries, see the topic "Log File Property" in SQL Server 2000 Books Online.

For more information about designing a query set for testing, see the discussion earlier in this paper about the query set used for this study. Any Analysis Services performance study must pay attention to caching on the client and the server and generate enough distinct queries to make the experiment meaningful. Generating such query sets is one of the greater challenges in characterizing the performance of an Analysis Services deployment.

What to Measure

Just as in this study, it is always important to measure the performance of data preparation (the update rate) and the response times for queries. Response times should be measured when the system is under a multi-user workload, to determine the number of concurrent users that the system can support.

In addition, if the data is to be used as input to determine the optimal size of the system, measurements of CPU usage, memory usage, disk space, bandwidth, and network usage are helpful.

Unless you are performing a sophisticated analysis, this will be enough data to work with.

Tools Required

Given a set of data in flat files, simple calls to Master Import and Master Update can test the update rate of your system. Time stamps reported by the Browse Batch utility can provide timing information.

Query performance is a bit trickier. Once a set of queries is available, you need a tool to submit those to the server and time the results. The MDX Sample Application, which provided as a sample with Analysis Services, shows you all of the code you need to submit MDX queries. A wrapper around the code that submits queries can report elapsed query times. Be sure each emulated user session uses a separate instance of PivotTable Service, or client-side caching can artificially decrease the server's workload. Also, be sure not to overload the client computers. A performance study is not valid if the client computers create a bottleneck. For this study, we performed an exercise to determine how many emulated client sessions we could run on each client computer.

Repeatability

Pay close attention to the repeatability and correctness of tests in a performance experiment. Its easy to take a measurement that is subject to variation under the influence of other factors, known or unknown, and not realize that the measurement is unstable. Here are some of the factors we encountered in this study:

Restarting Analysis Services flushes its query cache.
Restarting the RDBMS flushes its page cache.
Rebooting a server flushes the file system cache and breaks network connections.
Reusing RDBMS storage space means that the server does not have to allocate it again.
Deleting rows from an RDBMS table and reinserting them can result in unexpected values in the table statistics.
Anti-virus scans of a server file system can compete for resources.
Shared networks can become saturated by users who are not included in the experiment.
File systems can fill up, causing errors that may not be detected by simple test scripts.

Conclusions

Much data has been presented here along with a number of findings and recommendations. Here are some of the key points.

Data preparation
- Parallel table loads are helpful if fact tables are of similar size.
- Partitioned loads are expensive if many partitions will receive data. Consider breaking data into monthly sets and processing them month-by-month.
- A batch size of 10 KB provides substantial performance improvement over the default batch size for running Master Import.
Query performance
- Analysis Services can reliably deliver sub-second query performance for at least 500 simultaneous user sessions issuing a wide range of queries.
- Usage-based optimization and partitioning are key to providing this level of performance.
- A warm query result cache improves performance. If possible, warm the query result cache by running typical queries before users access the system.
- A system with approximately 10 million rows of fact data may not require aggregations to provide good query response times.
System management
- Fact table data that required 103 bytes per row in flat files used 142 bytes per row, or 40 percent more, in the RDBMS after key assignment. When processed into Analysis Services cubes, the data required fewer than 40 bytes per row, or 60 percent less space.
- Partitioning the data provides a clean way to manage data updates and control the life cycle of data in the system.

Appendix A: Query Examples

Each of the 23 query templates used in the performance tests contains variations that prevent the server from answering all queries based on previously cached results. This section shows some of the queries that were generated based on the templates. They are examples of the actual queries run during the performance experiments. The queries address a wide range of dimension levels and Multidimensional Expressions (MDX) functionality. They are intended to reflect real queries that users might ask against the SMA model.

-- Show discounts for sales regions (ordered by margin) for a month.
SELECT { [Measures].[Actual Discount %] } ON COLUMNS ,
 { ORDER (
           { DESCENDANTS (
              [Sales Force].[Standard].[All Sales Force],
              [Sales Force].[Standard].[Region] ) },
 ( [Measures].[Actual Gross Margin] ),
   BDESC ) } ON ROWS
FROM [Sales Force Performance]
WHERE ( [Time].[Standard].[Quarter].&[1/1/1998] )
-- Show invoice count by manufacturer for last four quarters 
--      for one marketing area.
SELECT { [Standard Last 4 Quarters] } ON COLUMNS ,
 { [Manufacturer].[Standard].[Manufacturer].members } ON ROWS
FROM [Sales]
WHERE (   [Measures].[Actual Invoice Count],
[Geography].[Marketing].[Direct Mktg Area].&[110] )
-- Show count of returns by manufacturer for four quarters 
--      for one marketing area.
SELECT { [Standard Last 4 Quarters] } ON COLUMNS ,
 { [Manufacturer].[Standard].[Manufacturer].members } ON ROWS
FROM [Sales]
WHERE (   [Measures].[Actual Invoice Count],
[Geography].[Marketing].[Direct Mktg Area].&[110],
[Sales Type].[Standard].[Sales Type].[Return] )
-- For a manufacturer and the last two quarters, show the 
--      sum of order amounts.
SELECT { [Manufacturer].[Standard].[Manufacturer].&[7] } ON COLUMNS,
 { [Time].[Standard].[Quarter].&[10/1/2000].lag(1),
   [Time].[Standard].[Quarter].&[10/1/2000] } ON ROWS
FROM [Orders]
WHERE [Measures].[Order List Sales Amt]
-- Find the top three sales cities in a region, and show 
--      units sold in the current quarter.
SELECT { [Measures].[Actual Sales Units] } ON COLUMNS ,
 { TOPCOUNT(
descendants([Geography].[Standard].[Region].&[2],
[Geography].[Standard].[City]),
3,
[Measures].[Actual Sales Units] ) } ON ROWS
FROM [Sales]
WHERE ([Standard Current Quarter].item(0))
-- Find the top the sales zip codes in a city, and show 
--      the units sold in the current quarter.
SELECT { [Measures].[Actual Sales Units] } ON COLUMNS ,
 { TOPCOUNT(
     descendants([Geography].[Standard].[City].&[11844],
[Geography].[Standard].[Postal Code]),
3,
[Measures].[Actual Sales Units] ) } ON ROWS
FROM [Sales]
WHERE ([Standard Current Quarter].item(0))
-- Define a calculation of gross profit. Show that, show the 
--      sales amount, and show the product cost for all days 
--      in a quarter, for one product.
WITH MEMBER [Measures].[Gross Profit] AS
  '[Measures].[Actual List Sales Amt]-
   [Measures].[Actual Product Cost]'
SELECT { [Measures].[Actual List Sales Amt],
         [Measures].[Actual Product Cost],
         [Measures].[Gross Profit] } ON COLUMNS ,
       { [Time].[Standard].[Quarter].&[1/1/1900].CHILDREN } ON ROWS  
     FROM [Sales]
WHERE ( [Product].[Standard].[Product Item].&[804] )
-- Find the top three product lines by sales units. Show those sales
-- for the three named quarters for a given industry.
SELECT { [Time].[Standard].[Quarter].&[1/4/1900].lag(2),
         [Time].[Standard].[Quarter].&[1/4/1900].lag(1),
         [Time].[Standard].[Quarter].&[1/4/1900] } ON COLUMNS ,
       TOPCOUNT( {[Product].[Standard].[Product Line].members},
         3,
         ( [Standard Current Year].item(0),
           [Measures].[Actual Sales Units] ) ) ON ROWS
FROM [Sales]
WHERE ( [Measures].[Actual Sales Units],
        [Customer].[Direct].[Industry].&[126] )
-- For a product group, show units sold for each day of a month.
SELECT { [Measures].[Actual Sales Units] } ON COLUMNS ,
       { CROSSJOIN( { [Product].[Standard].[Product Group].&[2] },
           Descendants([Time].[Standard].[Quarter].&[1/1/1999],
           [Time].[Standard].[Day]) ) } ON ROWS
FROM [Sales]
WHERE ( [Sales Type].[Standard].[Sales Type].[sale])
-- For three different manufacturers, show backlog by day of week for 
--      open orders for a given quarter.
SELECT {[Day Of Week].[Standard].[Day Of Week].members} ON COLUMNS,
   {[Manufacturer].[Standard].[Manufacturer].&[4],
    [Manufacturer].[Standard].[Manufacturer].&[6],
    [Manufacturer].[Standard].[Manufacturer].&[2] } ON ROWS
FROM [Backlog]
WHERE ( [Measures].[Backlog Count],
        [Order Status].[Standard].[Order Status].[open],
        [Time].[Standard].[Quarter].&[1/3/1900] )