The Advantages of 64-bit for SQL Server 2000 Enterprise Edition Business Intelligence Customers

By John H. Miller and Sean Boon, Microsoft Corporation

Summary: This white paper discusses the advantages that 64-bit Itanium® technology offers the business intelligence customers of Microsoft SQL Server 2000 Enterprise Edition. The information in this paper applies only to SQL Server 2000 and not to earlier versions of SQL Server, and the reader should not assume that future versions of SQL Server will use 64-bit the same way. This white paper examines scenarios in which 32-bit may have presented specific challenges for existing SQL Server customers, and explains how 64-bit Itanium can help improve those situations.

Two main components of SQL Server 2000 are examined in this paper in some detail: the Relational Database component and the Analysis Services component.

The following are a few of our key findings:

Overall 64-bit Advantages

64-bit provides:

  • Significantly larger direct-addressable memory space than 32-bit.

  • Improved chip architecture designed to handle parallel processing tasks more effectively.

  • Better on-chip cache management.

  • Improved multiprocessor performance and enhanced on-processor parallelism.

  • Increased bus and I/O bandwidth for faster and wider throughput.

64-bit Advantages - Relational Engine

With 64-bit, the Relational Engine:

  • Eliminates limitations associated with Address Windowing Extensions (AWE).

  • Provides all database operations with larger memory address space.

  • Provides improved parallel processing capabilities and multiprocessor support.

  • Provides increased bus and I/O bandwidth for faster and wider throughput.

64-bit Advantages - Analysis Services

With 64-bit, Analysis Services:

  • Enables loading of very large dimensions into memory.

  • Allows for a significantly larger query cache.

  • Allows for a significantly larger aggregate cache during partition processing.

  • Eliminates issues associated with placing dimension-level security on large dimensions.

  • Provides enhanced capabilities for processing very large dimensions/large partitions.

On This Page

Introduction
Impact of 64-bit Technology on the Relational Engine
Impact of 64-bit Technology on Analysis Services
Conclusion
Appendix A: SQL Server 2000 Relational Engine Memory Management
Appendix B: SQL Server 2000 Analysis Services Memory Management
Appendix C: Query Examples

Introduction

Microsoft SQL Server 2000 Enterprise Edition (64-bit) represents a major advance for Microsoft business intelligence (BI) customers. The new generation 64-bit systems enable SQL Server 2000 to support much larger and more complex BI applications than could have been attempted on 32-bit systems. Recent TPC-C and TPC-H benchmarks show the significant advancement that 64-bit technology brings to the Microsoft Windows platform. For more information about these benchmarks, see https://www.tpc.org. Throughout this white paper, all references to 64-bit or 64-bit chipsets are referring to 64-bit chipsets based on Itanium technology. Microsoft SQL Server 2000 Enterprise Edition (64-bit) is only supported on Itanium based architectures.

Microsoft partnered with Hewlett-Packard (HP) to obtain the equipment needed for writing this white paper. HP generously provided multiple enterprise-level 64-bit servers, as well as the storage needed to support the experiments. This paper provides specific computer configurations for each set of experiments. Two 64-bit enterprise-class servers were used. Each server was independently configured to use a separate storage subsystem. The reason for separating the servers and storage subsystems was to allow different experiments to take place, in parallel, without one affecting the other. In reality, many customers deploy their BI solutions on a single appropriately sized server. However, to enable both sets of experiments to proceed in parallel for writing this paper, we chose to use two. One set of experiments was conducted on the SQL Server 2000 Relational Database engine. A second set of experiments was conducted on the SQL Server 2000 Analysis Services OLAP database engine.

Although 64-bit technology benefits various types of customers, this paper focuses on the benefits to customers who use SQL Server to host large business intelligence solutions. Application workloads that could benefit significantly by moving to 64-bit include:

  • Data-intensive applications that hold large amounts of data in memory, such as Enterprise Resource Planning (ERP) applications, CRM applications, enterprise data warehouses, large data marts, and large analytical (OLAP) applications.

  • Computation-intensive applications that employ enhanced computation capabilities, such as data mining and complex analytical computations.

  • Applications that demand significant parallel processing or I/O throughput, such as partitioned data loading, partitioned cube processing, parallel database backup and recovery, and parallel index building.

As you might surmise from above, what really sets the 64-bit chipsets apart from their 32-bit predecessors is a core set of enhancements to the architecture, including:

  • The ability to directly address a vast physical memory address space (RAM) and improved on-chip memory cache management.

  • Enhanced symmetrical multi-processor parallelism, as well as improvements in single processor parallelism (threading).

  • Increased bus architecture and better I/O bandwidth, offering faster and wider throughput.

Our Relational Database testing confirms that 64-bit offers significant advantages to SQL Server 2000 business intelligence customers in the following areas:

  • Direct addressability of very large memory benefits all database operations, not just the handful that can use AWE memory support on 32-bit.

  • The ability to perform database operations in memory allows for faster completion of tasks and helps minimize disk I/O, a major contributor to performance degradation in most systems.

  • After table data is preloaded into a buffer cache by prior query operations, subsequent queries against the same cached data complete in a fraction of the time.

  • Where disk I/O cannot be avoided, such as when initial table reads must be performed, the enhanced throughput of 64-bit helps speed I/O operations along.

  • Parallel database operations are able to take advantage of the enhanced parallelism capabilities offered by 64-bit.

These are some of the benefits for data warehouses based on the Relational Engine. Business intelligence solutions based on Analysis Services can benefit from 64-bit as well. Because the 64-bit edition of Analysis Services can address a vast virtual memory address space, Analysis Services BI solutions can benefit from the following:

  • The ability to create very large MOLAP-based dimensions that can support tens of millions of members. In the tests described in this white paper, a customer dimension was created that consists of 50 million members.

  • The ability to support applications that heavily use dimension security.

  • The ability to cache more data in memory and to keep the data in cache for potentially long periods of time. Queries that are resolved using the cache return results significantly faster than queries that must rely on disk resources.

  • The ability to process partitions without the use of temporary files. Temporary files are required for partition processing when Analysis Services has exhausted the memory available to it. For those installations in which temporary files are currently being utilized during partition processing, the 64-bit edition of Analysis Services may significantly decrease processing time. Implementations in which partitions are processed in parallel, and/or in which partitions use the distinct count aggregate, are also likely to benefit from this capability.

  • The ability to consolidate Analysis Services databases. Customers that have separated Analysis Services databases due to memory limitations may be able to consolidate these databases onto fewer 64-bit servers. Before deciding to consolidate servers, make sure all Analysis Services databases that would reside on the same server have identical administrative rights associated with them.

Impact of 64-bit Technology on the Relational Engine

For a Relational Database management system such as SQL Server, the ability to directly address massive amounts of memory is the most significant enhancement introduced by 64-bit technology. To fully understand the benefits 64-bit brings in this area, you should first have a basic understanding of how SQL Server manages memory. Likewise, you should understand how SQL Server (32-bit) uses Address Windowing Extensions (AWE) to address large amounts of physical memory. These issues are discussed in detail in Appendix A, and summarized here:

  • A key tenet of database system design and management is to minimize disk I/O, which is very slow compared to in-memory operations.

  • Only 2 to 3 gigabytes (GB) of memory are directly addressable on 32-bit Windows systems.

  • Some SQL Server operations benefit from AWE by addressing a significantly larger buffer cache than is addressable using normal 32-bit addressing. Not all SQL Server operations benefit from AWE.
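
For comparison, the following is a minimal sketch of how AWE support is turned on for a 32-bit instance of SQL Server 2000 Enterprise Edition; it assumes the service account has already been granted the Lock Pages in Memory privilege, and the change takes effect only after a restart. On 64-bit, no such setting is needed because the memory is directly addressable.

-- Sketch: enabling AWE on a 32-bit SQL Server 2000 Enterprise Edition instance.
-- 'awe enabled' is an advanced option, so advanced options must be shown first;
-- the service must be restarted for the change to take effect.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'awe enabled', 1
RECONFIGURE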

This section describes the following advantages of the 64-bit architecture:

  • Large direct-addressable memory space

  • Enhanced parallelism

  • Improved bus architecture

Large Direct-Addressable Memory Space

The major advantage 64-bit offers over AWE is that all SQL Server database operations benefit from the increased memory. Systems based on the 64-bit architecture enable SQL Server to directly address an extremely large physical memory space. The following are just a few of the database operations that cannot take advantage of the extended memory under AWE, but benefit from 64-bit:

  • Queries and other operations that sort large volumes of data. These operations are common in data warehousing workloads.

  • Queries whose execution requires storage of intermediate query results. Queries with complex, multi-step query plans are common in data warehousing workloads.

  • Systems with many database objects defined and open simultaneously, such as databases, tables, and views.

  • Systems with many concurrent users, cursors, transactions, locks, and so on.

  • Systems with many stored procedures (procedure cache). Such systems are relatively uncommon in relational data warehouses.
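
As a concrete illustration of the first two categories, the following sketch shows a sort- and aggregation-heavy warehouse query of the kind that benefits from large directly addressable memory. The table and column names are taken from the SMA schema used later in this paper; the particular column choice is illustrative only.

-- Illustrative sketch: an aggregation followed by a large sort, both of which
-- can complete in memory when enough address space is available.
SELECT Cust_Direct_Cust_Idx, SUM(ASalesAmt) AS TotalSales
FROM Fact_Sales
GROUP BY Cust_Direct_Cust_Idx
ORDER BY TotalSales DESC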

Enhanced Parallelism

The new generation 64-bit chipsets based on Itanium also offer significant improvements in terms of parallel processing capability. The enhancements are in two major areas:

  • The 64-bit chipsets perform symmetrical multi-processor (SMP) tasks faster and more efficiently. The improvements in SMP support enable SQL Server to likewise improve performance on parallel operations.

  • The 64-bit chipsets achieve better in-processor parallelism. Improved in-processor parallelism is beneficial to SQL Server in situations where overhead from context switching hurts performance. Administrators can configure SQL Server to use fibers (run in fiber mode) instead of threads, which may enable SQL Server to handle parallel operations more efficiently. To configure SQL Server to use fibers instead of threads, you set the server configuration option lightweight pooling to a value of 1. Using fibers over threads may be beneficial if, for example:

    • SMP computers are configured with a large number of CPUs.

    • The majority of CPUs are consistently running at or near maximum capacity.

    • A high level of context switching is occurring.

For more information about how SQL Server manages threads or fibers, see Thread and Task Architecture, SQL Server Task Scheduling, and lightweight pooling Option in SQL Server Books Online.
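
A minimal sketch of switching to fiber mode follows; the option takes effect only after the service is restarted.

-- Sketch: enabling fiber mode (lightweight pooling) in SQL Server 2000.
-- 'lightweight pooling' is an advanced option; a restart is required.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'lightweight pooling', 1
RECONFIGURE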

Parallel operations that benefit from the enhanced parallel processing capabilities offered by the 64-bit chipsets include:

  • Parallel query resolution (parallel table scans and parallel index scans)

  • Parallel index builds

  • Parallel database backup and restore operations

  • Parallel data load

  • Parallel database maintenance operations

Improved Bus Architecture

Several enhancements to the current generation 64-bit chipsets provide much improved I/O throughput:

  • On-chip caching: Minimizes the amount of time a processor spends waiting for data by buffering the necessary data ahead of time.

  • Improved 64-bit processors: Perform more instructions per clock cycle and can process those instructions in parallel.

  • Bus architecture: The bus architecture on current 64-bit chipsets is faster and wider than earlier generations. More data is passed to the cache and processor; this is somewhat analogous to the improvement broadband connections offer over traditional dial-up.

Combined, the enhancements found on current generation 64-bit chipsets add up to a total bandwidth that is three times greater than that of the previous generation.

Demonstrable 64-bit Performance Gains

This section describes the computer configuration and database profile that were used for the Relational Database testing, and presents the results from those tests.

System Configuration

For testing purposes, Hewlett-Packard provided an HP Integrity Server rx5670 with four 900 MHz Intel® Itanium 2 processors and 16 gigabytes (GB) of RAM. This server was fiber-attached to a Hewlett-Packard StorageWorks Virtual Array 7400 storage device. The disks in this array were configured as two separate logical drives using AutoRAID (a pseudo RAID 5 configuration).

Figure 1 depicts the hardware (server and storage configuration) that was used to conduct the Relational Engine testing.

Figure 1: The server and storage configuration used for the Relational Engine testing.

Database Profile

For testing purposes we used the same Sales and Marketing Analytics (SMA) database schema and data as described in the white paper titled Performance of BI Systems Built with Microsoft SQL Server Accelerator for Business Intelligence, available at www.microsoft.com/technet/default.mspx in the SQL Server section.

The SMA schema has a Time dimension on which several virtual dimensions, such as Day of Week, are based. The SMA schema also has several multi-level dimensions: Sales Force, Product, Geography, Customer, and Campaigns; and various flat dimensions containing a single level: Sales Type, Order Type, Order Status, Forecast Scenario, Quota Version, Lead Quality, and Customer Status.

To perform the tests described in this paper, we used a data generator to create data for the Sales, Orders, and Backlog fact tables. The dataset created contained approximately 1 billion rows spread across multiple tables. The following table shows the number of rows per fact table.

Table name       Rows (1-billion-row database)     Table size
Fact_Backlog     382,756,178                       ~44 GB
Fact_Orders      324,823,696                       ~38 GB
Fact_Sales       293,907,214                       ~50 GB
Total            1,001,487,088                     ~132 GB

Prior to performing most of the operations described below, the instance of SQL Server was stopped and restarted to ensure that results were not being read from cache.
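
Where a full restart is impractical, a similar effect can be approximated in T-SQL. The following is a sketch of that alternative, not the procedure used for these tests.

-- Sketch: clearing the buffer and procedure caches between timing runs.
CHECKPOINT                -- flush dirty pages so clean buffers can be dropped
DBCC DROPCLEANBUFFERS     -- empty the data (buffer) cache
DBCC FREEPROCCACHE        -- empty the plan (procedure) cache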

Benefits of Large Direct-Addressable Memory

As noted earlier, the single biggest benefit 64-bit brings to the Relational Engine of SQL Server 2000 is the ability to use the increased RAM when conducting a wide array of common database operations. The following sections present a series of experiments that illustrate how SQL Server uses the increased RAM across a number of these tasks.

Resolving Complex Multi-Step Queries

As described above, one of the drawbacks of AWE is that AWE cannot use extended memory to store the intermediate result sets between multiple steps of a complex query. Consider the following query, which produces a multi-step query plan:

SELECT Cust_Direct_Cust_Idx, SUM(AUnits) AS AUnits, SUM(ASalesAmt) AS ASalesAmt
FROM Fact_Sales
WHERE
   Cust_Direct_Cust_Idx = 30602
GROUP BY
   Cust_Direct_Cust_Idx

This SQL statement generates the query plan shown in Figure 2.

Figure 2: The query plan generated by the preceding SQL statement.

A 32-bit server configured to use AWE would not have been able to take full advantage of extended memory to resolve this query, because AWE cannot be used to store intermediate result sets between the steps. In contrast, we ran this query against our HP Integrity Server rx5670 configured with 16 GB of RAM. The following diagram depicts the memory used by the SQL Server database instance in resolving the query, compared to the total memory available on this system:

Figure 3: Memory used by the SQL Server instance while resolving the query, compared to the total memory available on the system.

  • Available Memory (KB): The amount of physical memory immediately available for allocation to a process or for system use.

  • Maximum Workspace Memory (KB): Total amount of memory granted to executing processes. This memory is used for hash, sort, and create index operations.

  • Total Server Memory (KB): Total amount of dynamic memory that the instance of SQL Server is currently consuming.

As the query was executing, the table data read in from disk was placed into SQL Server's buffer cache, depleting the available memory on the system. In this scenario, we enabled SQL Server to dynamically manage available resources, a choice the system administrator can turn on or off using server-level settings. This highlights an advantage enjoyed by many SQL Server customers: the ability of the database server to intelligently and dynamically manage its own resources.

Resolving Common Queries from the Buffer Cache

This first experiment demonstrates that SQL Server running on 64-bit can address a large amount of available memory. However, it does not illustrate the most important reason for a database server to have a large amount of direct-addressable memory: the ability to use that memory to hold commonly referenced table and index pages in buffer cache. A large buffer cache enables the database server to quickly resolve queries from memory, and helps avoid costly disk I/O. Our next experiment illustrates this point.

To set up this experiment, we first created a smaller version of the Fact_Sales table named Fact_Sales_24MRows. This smaller table contained 24 million rows from Fact_Sales. Because the Fact_Sales table is approximately 50 GB in size, we obviously could not load that entire table into the 16 GB of RAM available on our computer. However, the Fact_Sales_24MRows table, at approximately 4 GB, fits comfortably into our 16 GB of memory with plenty of room to spare.
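
As a sketch, a subset table such as this can be built with a single SELECT INTO statement. The exact rows chosen for Fact_Sales_24MRows are not documented here, so the row selection below is illustrative only.

-- Illustrative sketch: copy roughly 24 million rows of Fact_Sales into a new table.
SELECT TOP 24000000 *
INTO Fact_Sales_24MRows
FROM Fact_Sales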

We then ran the following query four times.

SELECT Cust_Direct_Cust_Idx, SUM(AUnits) AS AUnits, SUM(ASalesAmt) AS ASalesAmt
FROM Fact_Sales_24MRows
WHERE
   Cust_Direct_Cust_Idx = 30602
GROUP BY
   Cust_Direct_Cust_Idx
GO

The following diagram shows the query plan that was generated from the above SQL statement. Note that steps were performed in parallel.

Figure 4: The query plan for the query against Fact_Sales_24MRows, showing the parallel steps.

On the first run, we set the connection properties in Query Analyzer to record statistics I/O as well as statistics time. The output below compares the first run with subsequent runs of the same query. It shows that, to resolve the initial query, the data first had to be read in from disk, which was accomplished by issuing read-ahead requests. Waiting for that I/O to complete caused the initial query to take nearly 29 times longer (in elapsed time) than subsequent runs of the same query.
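
The same settings can also be issued directly as T-SQL at the start of a Query Analyzer session; a minimal sketch:

-- Sketch: report I/O statistics and execution times for each statement.
SET STATISTICS IO ON
SET STATISTICS TIME ON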

Initial Query Statistics

Table 'Fact_Sales_24MRows'. Scan count 4, logical reads 533329, 
physical reads 0, read-ahead reads 536630. 
SQL Server Execution Times: 
   CPU time = 8740 ms,  elapsed time = 59987 ms. 
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms. 

Subsequent Query Statistics

Table 'Fact_Sales_24MRows'. Scan count 4, logical reads 533329, 
physical reads 0, read-ahead reads 0. 
SQL Server Execution Times: 
   CPU time = 7460 ms,  elapsed time = 2074 ms. 
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms. 

By analyzing the processor utilization on the system while these queries were running, we can see the impact that caching had on the CPUs during query resolution. The graph below shows % Processor Time for each processor. This is the primary indicator of processor activity, and displays the average percentage of busy time observed during the sample interval. Notice that during resolution of the first query, a significant amount of time passed, yet utilization of the CPUs rarely approached 20%. That is because the CPUs were waiting for I/O to complete. In contrast, each of the subsequent queries completed very quickly with all CPUs experiencing spikes to nearly 100% utilization.

Another important concept this graph illustrates is that when queries are performed against a table or tables with a large number of rows, SQL Server can choose to break the query apart and process segments of it in parallel by spreading the load across multiple CPUs. SQL Server considers a parallel plan only on systems that have multiple CPUs, and only when the estimated cost of the query exceeds the server's cost threshold for parallelism setting. This is what is happening here, and this is why all CPUs are showing as busy. The query plan in Figure 4 also indicates that this query would be processed in parallel.
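
As an aside, the degree of parallelism SQL Server may use is configurable at the server level and can be overridden per query. The following sketch shows both; the MAXDOP value of 4 is illustrative only.

-- Sketch: server-wide setting (0 = use all available CPUs for parallel plans).
-- 'max degree of parallelism' is an advanced option, as shown earlier.
EXEC sp_configure 'max degree of parallelism', 0
RECONFIGURE

-- Per-query override using a hint (value chosen for illustration).
SELECT Cust_Direct_Cust_Idx, SUM(AUnits) AS AUnits
FROM Fact_Sales_24MRows
GROUP BY Cust_Direct_Cust_Idx
OPTION (MAXDOP 4)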

Figure 5 shows CPUs waiting for data from the table to be read in from disk for the initial query. Subsequent queries are resolved in cache, and CPUs do not have to wait.

Figure 5: CPU utilization during the initial query (waiting on disk I/O) and the subsequent queries resolved from cache.

By analyzing the memory utilization on the system while these queries were running, we can see how SQL Server reuses its existing buffer cache to resolve subsequent queries that access data already in cache. Figure 6 shows Performance Monitor being used to track the memory allocation counters. There are two memory allocation counters:

  • Available Memory (KB): The amount of physical memory immediately available for allocation to a process or for system use.

  • Total Server Memory (KB): Total amount of dynamic memory the SQL Server instance is currently consuming.

When the first query was executed, SQL Server acquired memory as the data pages for the table were initially read in. These pages get placed into buffer cache. When subsequent queries of a similar nature are executed, SQL Server does not have to increase its memory footprint, because it is able to resolve those queries from data pages already in its buffer cache. Figure 6 illustrates this point.

Figure 6: Memory allocation counters showing how the buffer cache is reused by subsequent queries.

Building Clustered Indexes

Another operation that benefits greatly from having large amounts of addressable memory is building clustered indexes. Similar to a complex query, the building of a clustered index in SQL Server relies on intermediate result sets to hold data as it is being sorted in order of the clustered index key. After they are sorted, the data and index are written back to disk. If there is not enough available memory to complete such tasks in memory, SQL Server writes intermediate results to disk. This can take significantly longer. To test whether this limitation was alleviated in 64-bit SQL Server, we ran the following CREATE CLUSTERED INDEX command on top of our 24-million row table:

CREATE CLUSTERED INDEX [IX_Fact_Sales_24MRows] 
    ON [dbo].[Fact_Sales_24MRows]([AOrderNum]) 
  WITH  FILLFACTOR = 100 ON [PRIMARY] 

To better illustrate how SQL Server manages the buffer cache resource, we performed this operation twice in a row. The sequence of commands we ran was as follows (a T-SQL sketch of the sequence follows the list):

  1. Create the index for the first time.

  2. Drop the index just created.

  3. Re-create the index (exactly as in step 1).
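
A T-SQL sketch of the drop and re-create sequence (SQL Server 2000 uses the table.index form for DROP INDEX):

-- Step 2: drop the clustered index created above.
DROP INDEX Fact_Sales_24MRows.IX_Fact_Sales_24MRows

-- Step 3: re-create it exactly as in step 1.
CREATE CLUSTERED INDEX [IX_Fact_Sales_24MRows]
    ON [dbo].[Fact_Sales_24MRows]([AOrderNum])
    WITH FILLFACTOR = 100 ON [PRIMARY]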

Figure 7 shows the results captured from Performance Monitor for the initial CREATE CLUSTERED INDEX operation:

Figure 7: Performance Monitor results for the initial CREATE CLUSTERED INDEX operation.

Three sets of statistics are presented above to illustrate exactly what was going on in the system during the CREATE INDEX operation:

Graph 1 - Memory Allocation

  • Available Memory (KB): The amount of physical memory immediately available for allocation to a process or for system use.

  • Maximum Workspace Memory (KB): Total amount of memory granted to executing processes. This memory is used for hash, sort, and CREATE INDEX operations.

  • Total Server Memory (KB): Total amount of dynamic memory the SQL Server instance is currently consuming.

Graph 2 - CPU Utilization

  • % Processor Time: Average percentage of busy time observed during the sample interval. This counter is the primary indicator of processor activity.

Graph 3 - Logical Disk I/O

  • Disk Reads/sec: Rate of read operations performed to disk.

  • Disk Writes/sec: Rate of write operations performed to disk.

We can see from Graph 1 of Figure 7 that SQL Server acquired increasing memory from the system as it proceeded with the index creation. We can see in Graph 2, Phase 1 of Figure 7 that all CPUs were approximately 60-80% busy reading the table data and sorting the data in the order of the index key. Later on, in Graph 2, Phase 2, utilization drops on all CPUs to approximately 20% as the sorted index/table is written to disk. In Graph 3, we see the impact on logical disk I/O. During Phase 1, SQL Server is reading the unsorted table data and sorting the data on the index key column(s). During Phase 2, the sorted table and index are written back to disk. SQL Server performed this index sort operation in memory, which on the 24-million row table required more memory than is available on a 32-bit system.

To fully illustrate the effect of in-memory processing, we immediately dropped and re-created an identical index. Figure 8 shows the results captured from Performance Monitor during the drop and re-create (the second CREATE CLUSTERED INDEX operation):

Figure 8: Performance Monitor results for the drop and re-create of the clustered index.

The initial blip that appears in both Graph 2 and Graph 3 of Figure 8 is a result of the DROP INDEX operation, which completes very quickly. We can see from Graph 1 that SQL Server acquires only a small amount of additional memory for the second CREATE INDEX operation, illustrating that, for the most part, the memory already allocated to SQL Server was reused. More interesting is what we see in Graph 2 and Graph 3. In Graph 2, Phase 1, CPU utilization jumps almost immediately to 100% across all processors. SQL Server already has the table data in buffer cache from the first CREATE CLUSTERED INDEX operation, so there is no need to read the data from disk again. Further evidence of this appears in Graph 3, Phase 1, where virtually no read I/O operations are being issued. Phase 2 of all three graphs in Figure 8 looks virtually identical to the first CREATE CLUSTERED INDEX operation (in Figure 7), because both CREATE CLUSTERED INDEX operations perform essentially the same work of writing the sorted table and index back to disk.

Impact of 64-bit Technology on Analysis Services

The following sections discuss the changes to Analysis Services when you use 64-bit technology.

Analysis Services and Memory

The single most important benefit that the 64-bit architecture brings to users of Analysis Services is the ability to address large amounts of virtual memory. To fully understand the benefits 64-bit technology offers in this area, you should have a basic understanding of how Analysis Services manages memory. Unlike SQL Server, Analysis Services does not use AWE memory to address more than 2 GB of virtual memory. There is no difference in how Analysis Services addresses memory in the 64-bit version versus the 32-bit version, except that the 64-bit edition does not use the Very Large Dimension Manager (VLDM). Because Analysis Services can address much larger amounts of memory in the 64-bit release, there is less competition for memory resources on 64-bit installations, which removes the need for the VLDM process.

Memory is a limited resource in Analysis Services. By default, the 32-bit version of Analysis Services can use up to 2 GB of virtual memory. It is possible to raise this limit to 3 GB by making changes to the boot.ini file (see KB article Q295443), and this additional 1 GB of addressable memory can prove to be a large benefit for many Analysis Services implementations. However, if your application requires more than 3 GB of memory, you should consider using the 64-bit version of Analysis Services.

With the release of the 64-bit version, Analysis Services supports up to 512 GB of RAM (on Windows Server 2003 Datacenter Edition) and up to 8 terabytes (TB) of virtual memory. To determine whether the 64-bit version of Analysis Services is right for your implementation, you must understand how Analysis Services utilizes the virtual memory made available to it by the Windows operating system. Appendix B discusses how Analysis Services uses memory.

The next section explores characteristics of applications that may benefit from the additional memory made available by the 64-bit release, and assumes the reader understands how Analysis Services uses memory.

Implementations of Analysis Services that Can Benefit from 64-bit Architecture

Analysis Services applications that use significant memory can benefit from the 64-bit architecture. There are several types of scenarios in this category:

  • Analysis Services installations with very large dimensions.

  • Systems that make extensive use of dimension security, either explicit or role based.

  • Analysis Services installations that support user queries during cube processing.

  • Systems with a short processing window.

  • Cubes that contain a distinct count measure.

  • Customers who wish to consolidate multiple Analysis Services databases onto fewer servers.

Very Large Dimensions

As described in Appendix B, Analysis Services loads all members and member properties of shared and private dimensions into memory when the service is started. This is not a configurable option, and it should be viewed as a constant memory cost: The memory required to store all dimension members and properties cannot be used for other functions during normal operations of the data warehouse. Solutions that use very large Analysis Services dimensions will have less memory available for the process buffer during the processing of partitions, and for the aggregate cache at query time. The memory diagram in Appendix Figure B1 illustrates why this is the case.

Dimension processing also requires memory resources. During the processing phase of a dimension, a shadow copy of the dimension is stored in virtual memory in order to support users who are currently browsing the data and to ensure that the data will be left in a consistent state at the end of the transaction. As a result, incremental dimension processing requires twice the amount of virtual memory that the dimension would otherwise consume. This is an important factor to consider when sizing an Analysis Services based business intelligence solution, because it is possible for an administrator to process a dimension for the very first time successfully but for subsequent attempts to fail if there is not sufficient available memory.

Although it is possible to use multi-million member MOLAP dimensions in the 32-bit edition of Analysis Services, storing and processing dimensions of this size will likely consume a significant amount of the addressable memory that is available to the server, thus leaving less memory for functions such as the aggregate cache or the process buffer. This can have a negative effect on your solution.

Before the release of the 64-bit edition of Analysis Services, customers with very large dimensions would need to choose the ROLAP storage option for these dimensions. This requirement also meant that any cubes that used such a dimension would need to be stored using the ROLAP storage mode. ROLAP cubes typically perform less well than MOLAP cubes at query time.

Because the 64-bit edition of Analysis Services can handle much larger MOLAP dimensions, customers can now use the MOLAP storage option even with very large dimensions. There are several analytical applications that should benefit from the 64-bit edition of Analysis Services, for example:

  • A retail chain with tens of millions of customers may wish to analyze purchases made by a single customer over multiple periods of time.

  • A telecommunications firm with millions of subscribers may wish to identify which customers have not used their cell phones for an extended period of time and target them for a special promotion.

For solutions that require the use of very large dimensions, the 64-bit edition of Analysis Services may be an enabling version of the product, allowing users to ask business questions that they would not have been able to answer previously.

Dimension Security

Applications that make extensive use of dimension security may also see tremendous benefits by utilizing the 64-bit edition of Analysis Services. As noted earlier in this paper, dimension security is a feature of Analysis Services that gives an administrator the ability to limit views of a dimension based on a user's role. For example, a sales manager might be allowed to see the sales of all his or her direct reports, while each direct report can only see his or her own results. Many analytical applications require this type of security, and Analysis Services implements it by building replica dimensions based on allowed and denied sets of members. Users can share replica dimensions only if their allowed and denied sets of members are identical. Each replica dimension consists of all permitted members plus their siblings, all their ascendants, and all the siblings of their ascendants.

Simultaneous Query and Load

In recent years, the need for data warehouses to have the same level of availability as OLTP systems has increased. As OLAP cubes become an important source of information for decision makers, many OLAP systems are now required to be constantly available. This is particularly important where business requirements dictate that data should be uploaded into the warehouse at multiple times during the day.

In Analysis Services, both querying activities and processing activities are memory-dependent. When users are querying the cube, the amount of memory devoted to the query cache is based on the amount of memory remaining for the Analysis server to use with all other functions accounted for. The purpose of this cache is to help resolve queries as fast as possible. As long as cubes or partitions are not being processed while users are querying the system, there will probably not be an excessive demand for memory on the system that would compete with the needs of the aggregation cache. The one exception to this rule is if dimension replicas need to be built as more users connect to cubes that utilize dimension security.

When partition processing begins, the server builds process buffers for each partition being processed. These process buffers store the temporary aggregates that are built during the aggregation phase; they must store all of the aggregates until each partition has been processed and the aggregates have been written to disk. If users query the system during cube processing, the queries compete with the cube processing for memory, and both can have decreased performance. Remember, there is a 3-GB virtual address limit with the 32-bit edition of Analysis Services, and it is possible that the server will have competing demands for memory in this case. If you have a requirement for users to be able to query cubes while cubes are processing, the 64-bit edition of Analysis Services may be the appropriate version for your solution.

Short Processing Window

Some data warehouses must be updated within a very short window of time. The definition of a short window depends on how much processing time is available relative to the volume of data to be processed. In some scenarios, this might mean processing all sales from all stores from the previous day in time for the next day's start of business. In other cases, it may mean that quarterly data is made available to the data warehouse and that cubes must be processed overnight and be ready the next day for browsing.

Because partitions are the unit of cube processing in Analysis Services, the logical way to implement incremental data loads is with the use of partitions. Partitions in Analysis Services allow the administrator to load a particular slice of the data and process it by itself. In the quarterly data example, partitions might be built for each month, and each of these monthly partitions will need to be processed in a potentially small window of time. If you are currently processing partitions serially, processing partitions in parallel should reduce overall processing times, thus making the cubes available to users much sooner.

The 64-bit edition of Analysis Services may be beneficial for scenarios in which parallel processing of partitions is needed to complete processing within a short window. Recall that Analysis Services allocates memory for process buffers during the aggregation phase, and all of a partition's aggregates must fit in the process buffer while the partition is processing before they are finally written to disk. If the server needs more memory to store these aggregates, the server will use temporary disk files. The use of temporary files is much slower than the use of memory. The 64-bit edition of Analysis Services may allow your application to complete aggregate processing in memory, which should decrease the overall processing time for the partitions. You can monitor temporary file usage by Analysis Services by looking at the performance monitor counters Temp File bytes written/sec and Temp File rows written/sec under the Analysis Services:Proc Aggs object.

Even if you are not processing partitions in parallel, you may find that the 32-bit edition of Analysis Services creates temporary files during partition processing. Moving to the 64-bit edition of Analysis Services may enable you to avoid the use of temporary files, and thereby decrease overall partition processing time.

Distinct Count Measures

Analysis Services allows you to define cubes with a distinct count aggregate. Distinct counts are a challenge for OLAP systems because they are non-additive in nature. If other factors are equal, cube measures based on a distinct count aggregate require much more storage than measures based on other aggregate types (SUM, COUNT, MIN, and MAX). All of a partition's aggregates must fit into memory during partition processing; therefore, partitions with distinct count measures generally require more memory for processing. If memory is not available, temporary files will be used to process the partition.
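
In relational terms, a distinct count corresponds to a query like the following sketch, which borrows column and table names from the SMA fact table used earlier for illustration. Unlike SUM or COUNT, the per-partition results of a distinct count cannot simply be added together, which is what makes the aggregate non-additive and memory-hungry during processing.

-- Illustrative sketch: a distinct count at the relational level.
-- Two monthly partitions may share customers, so their distinct counts
-- cannot be summed to produce the yearly distinct count.
SELECT COUNT(DISTINCT Cust_Direct_Cust_Idx) AS DistinctCustomers
FROM Fact_Sales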

It is considered a best practice to keep distinct count measures in their own physical cubes with as few aggregates as possible. This practice saves disk space and reduces overall processing time. This is particularly important for 32-bit implementations of Analysis Services that are limited to 3 GB of virtual address space, because temporary files are much more likely to be used on 32-bit installations than on 64-bit installations. To monitor temporary file usage during cube processing, you can use performance monitor counters.

Analysis Services Database Server Consolidation

Another opportunity for the SQL Server 2000 64-bit edition of Analysis Services is for customers who wish to consolidate their existing Analysis Services databases onto fewer servers. Because the 64-bit edition of Analysis Services can address much more virtual memory, the server can hold in memory more of the Analysis Services objects that must reside there (primarily dimensions). Customers that currently run multiple Analysis Services servers strictly because of the amount of memory that can be used by a single instance of Analysis Services may find it possible to consolidate their Analysis Services databases onto a single server.

Note: When consolidating servers running Analysis Services, it is important to remember that OLAP administrators are defined at the server level, and there is no finer level of granularity possible for the definition of this group. This means that consolidating servers might result in giving administrators access to data they may not have had read or administrative privileges for previously. Keep this possibility in mind when considering server consolidation.

64-bit Performance Tests for Analysis Services

This section includes detailed information on the performance tests that were run on the 64-bit edition of Analysis Services. To demonstrate the benefits of the 64-bit edition, a series of tests was conducted to prove some of the assertions stated in previous sections. All tests were based on the schemas from the SQL Server Accelerator for Business Intelligence Sales and Marketing Analytics (SMA) application. Any modifications to the default schemas are noted. The hardware used for these tests was identical to the hardware used for the Relational Database tests discussed earlier in this white paper; in fact, we used the same server and the same storage array. The only difference between the two scenarios was that an additional 16 GB of RAM was added to the server for the Analysis Services tests, bringing the total amount of physical memory on the server to 32 GB.

Large MOLAP Dimension Support

The first series of tests consists of straightforward dimension processing tests to identify how many members could be stored in a MOLAP-based dimension on the provided hardware. We ran tests with 1 million, 2 million, 4 million, 6 million, 8 million, and 10 million members, and then increased the member count in intervals of 5 million until a practical limit was reached. All Analysis Services counters and related Performance Monitor counters were captured for these tests. One of the main benefits of using the 64-bit edition of Analysis Services is the ability to store large dimensions in the MOLAP storage format, which provides much faster query response than ROLAP storage. To verify this, we constructed a customer dimension based on the SMA schema that ships with the SQL Server Accelerator for Business Intelligence.

The goal of this exercise was to test the feasibility of building extremely large dimensions and storing them in the MOLAP storage format. To conduct these tests, we generated a relational table of 50 million customers. Then we set the source table filter property in Analysis Manager to limit the number of members to that specified for a given test. The dimension consists of three levels and a total of two member properties (Customer Acquisition Date and Zip Code).

Figure 9 depicts the dimension size in memory against how many members are included in the dimension. The total time to process the dimension increases as the number of members in the dimension increases. This is as expected, and it appears that the increase is approximately linear. Notice that, for a dimension that consists of 50 million members, the time to fully process the dimension was approximately 80 minutes.

Figure 9: Dimension processing results as the number of dimension members increases.

Figure 10 depicts the rate of processing for the dimension in its various sizes. The rate of processing decreases as more dimension members are added, because there is a larger set of members that the server has to sort to build out the dimension.

Figure 10: Dimension processing rate at the various dimension sizes.

Figure 11 depicts the memory that Analysis Services used to store the dimension in its process space during full dimension processing. In most production environments, dimensions are incrementally updated on a regular basis, instead of being fully processed. When dimensions are incrementally updated, a shadow copy of the dimension is stored in memory. Thus the total amount of memory that Analysis Services needs to hold the dimension in memory is twice the amount reported for these tests. These results also do not reflect any replica dimensions that might be created as a result of securing this dimension, as discussed in Appendix B. It is easy to underestimate how much memory Analysis Services devotes to dimensions, particularly when you process an Analysis Services database for the first time and don’t incur these additional memory costs.

Figure 11: Memory used by Analysis Services to store the dimension during full dimension processing.

The most important observation from these results is that the amount of memory required to hold dimension members in memory is a linear function of the number of members in the dimension. As a result, aside from accounting for the replica dimension memory needed for dimension security, the best way to determine how much memory a dimension of a particular size will require is to process a small portion of that dimension, note how much memory is consumed, and then extrapolate the memory that the production version of the same dimension would require.

50 Million Member MOLAP Customer Dimension Query Tests

A 50-million-member customer dimension only has value if users can query the data associated with it. After our dimension processing tests were completed, the next step was to develop a data set that utilized this dimension.

Test Data and Query Generation

The data for the performance tests reported here came from the Sales and Marketing Analytics schema of the SQL Server Accelerator for Business Intelligence, but with the following changes:

  • The named set Top 5 Customers was removed from the Sales cube; the reasons for this are explained in "Query Performance Tests," later in this paper.

  • The Enable Aggregations property has been set to Yes for all levels of the Time dimension in all cubes. For sites using the 1.0 release of Business Intelligence Accelerator, this change is strongly recommended. The change is included in version 1.1.

  • A virtual cube called Sales and Orders and Backlog was created. All queries were directed against this cube to minimize client memory requirements.

The SMA schema has a Time dimension on which several virtual dimensions are based (for example, Day of Week), and a number of multi-level dimensions: Sales Force, Product, Geography, Customer, and Campaigns. In addition, the schema includes several flat dimensions containing a single level: Sales Type, Order Type, Order Status, Forecast Scenario, Quota Version, Lead Quality, and Customer Status.

Seven physical cubes and six virtual cubes are created based on these dimensions. The physical cubes are Sales, Orders, Backlog, Forecast, Quota, Mktg Campaign, and Campaign Leads. The virtual cubes are Forecast and Sales, Orders and Backlog, Orders and Sales, Sales Force Performance, Mktg Campaign Analyses, Sales vs Forecast, and Quota.

One set of dimension data was used for all tests. The largest dimension was the Customer dimension, with 50 million members at the leaf level.

Data for these tests was generated and distributed in a manner consistent with most retail establishments: the holiday season generally accounts for the majority of sales in a given year, and not every customer makes a purchase every month. Based on these assumptions, data was generated for the Sales, Orders, and Backlog fact tables used for the tests described in this paper. The following table shows the number of rows per fact table.

Table name       Rows (1.8-billion-row database)
Fact_Backlog     619,119,405
Fact_Orders      645,856,465
Fact_Sales       584,570,945
Total            1,849,546,815

Great pains were taken to generate queries that span the dataset, instead of focusing on a subset of data that is quickly cached. At query time, 500 user sessions were created from 23 or 24 template queries. Each user session used a query from each template, but with random changes so that no two queries were identical. This prevented Analysis Services from answering all of the queries from its query result cache. The cache is a tremendous benefit in normal operations, but when you are collecting performance data, you must find a way to work around the cache, or you will get unrealistically fast response times.

The following two simple queries were created from the same template and appear in different user sessions. Each query selects a different measure from the Sales cube and shows the value of that measure for a particular quarter and for all industries:

SELECT { [Time].[Standard].[Quarter].[Q1 - 2000] } ON COLUMNS , 
{ [Customer].[Direct].[Industry].members } ON ROWS 
FROM [Sales and Orders and Backlog] 
WHERE ( [Measures].[Actual Discount %] ) 
SELECT { [Time].[Standard].[Quarter].[Last Year]} ON COLUMNS , 
{ [Customer].[Direct].[Industry].members } ON ROWS 
FROM [Sales and Orders and Backlog] 
WHERE ( [Measures].[Actual Avg Gross Margin] ) 

Although the queries are similar, they refer to different quarters and different measures. The first query refers to a specific quarter, and the second query refers to a calculation: the quarter from last year that corresponds to the current quarter. Every query in the template uses the time dimension as a source of variability. Therefore, it is possible that the query workload generated for 100 users might actually require more work on behalf of the server than a workload that is generated for 150 users on a per-average-query basis.

There were two sets of query templates. The first set contained 23 query templates and did not reference the customer level of the customer dimension. The second set added a query at the customer level of the customer dimension for a total of 24 queries. The only difference between these two query sets is that one references the customer level of the customer dimension. This allows us to measure the impact of querying against the 50-million-member customer dimension. The queries address a wide range of dimension levels and Multidimensional Expressions (MDX) functionality. They are intended to reflect real queries that users might ask against the SMA model. For additional query examples, see Appendix C.

The query template sets balance the need for a realistic query set against the ability to test both cold and warm cache scenarios.

Aggregation Design

Aggregates are precalculated summaries of data that enable Analysis Services to resolve queries quickly. For a detailed discussion of aggregates and how to effectively design them, see the Microsoft SQL Server 2000 Analysis Services Performance Guide at https://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ansvcspg.mspx. Based on the recommendations in the performance guide, two aggregation sets were selected.

  • The first aggregation set contains zero aggregates; that is, there are no pre-computed aggregates to help the server resolve queries. For a completely cold cache test this means that every query must be resolved by looking at the MOLAP fact tables and calculating the result set.

  • The second aggregation set consists of aggregates created by using the Usage-Based Optimization Wizard in Analysis Services, and an additional set of specifically created aggregates that were added using the Partition Aggregation Utility. The Partition Aggregation Utility can be downloaded as part of the SQL Server Accelerator for Business Intelligence.

In addition to these two sets of aggregates, an initial attempt was made to develop a set of aggregates based on the performance gain option in Analysis Manager. Because the possible number of aggregates for this solution was so large, even a performance gain value as small as 5% resulted in over 270 aggregates being defined for a given partition. This is quite a common occurrence with large Analysis Services installations and is described in detail in the Analysis Services Performance Guide. We elected not to process such aggregates, because only 10 to 15 aggregates were needed to resolve all queries on a per-cube basis. For large Analysis Services deployments, always start with an extremely low performance gain percentage (less than 10%, and possibly as low as 0% to 2%), and then add aggregates using the Usage-Based Optimization Wizard.

Query Test Parameters

The purpose of building Analysis Services cubes is to allow users to query them using a client application. Client applications range from traditional Windows applications to applications that rely on Web servers to display results in a Web browser. The client for these tests was a simple C++ application that simulates a single user connecting to Analysis Services. For the purposes of these tests, each user's session consisted of 23 or 24 queries, depending on whether the customer level of the dimension was queried. To minimize the impact of memory utilization on the client machines participating in the test (in cases where the customer level of the dimension was queried), the connection to the server was dropped and re-created after the query to the customer level was completed.

Tests were performed with a variable number of users, ranging from a single user up to and including 500 concurrent users. For each user, an interval of 30 seconds was introduced between each of the queries. This interval represents a think time; that is, a period in which the user would absorb the results and then determine the next query to issue against the Analysis server. In addition to this interval, each user was brought online and into the test at the rate of one per second. For example, if 500 users were participating in the test, each user would be brought into the test one after another, with one second separating the start of each session, so 500 seconds would elapse before all 500 users were online. This period is called the ramp-up period for the test. Query results during this period are included in these results.

Cold and Warm Caches

When Analysis Services resolves user queries, the client cache is first checked for results that may satisfy the query. If the client cannot satisfy the query result with its own data cache or derive the result set from data in its cache, the client makes a request of the server to resolve the query. After the server receives the request, the server attempts to use its data cache to resolve the query, either by retrieving the exact data set from memory, or creating the required data set from data that already exists in the cache. The latter operation is called a pyramid operation and can be monitored using Analysis Services performance counters. If the query can be resolved by one of these methods, it is called a warm cache query.

For example, suppose a user requests a total for Quarter 1, 2003 and the aggregate cache contains data from all months that make up Quarter 1, 2003. In this case, the server performs a pyramid calculation and derives the result from the results that already exist in its aggregate cache. If the server cannot resolve the query with either of these two methods, a request is made to the file system to retrieve the appropriate data to resolve the query. This data is then placed in the server's data cache, where it can benefit other users' queries.
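
In relational terms, the pyramid operation is analogous to deriving a quarterly total from month-level subtotals that are already at hand, rather than rescanning the fact table. The table and column names below are hypothetical and purely illustrative.

-- Illustrative sketch only: derive a Q1 2003 total from existing monthly subtotals.
SELECT SUM(MonthlySalesAmt) AS Q1_2003_Sales
FROM Monthly_Sales_Totals
WHERE SalesYear = 2003
  AND SalesMonth BETWEEN 1 AND 3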

If Analysis Services cannot resolve a query by retrieving or collapsing data in its aggregate cache, the server must request the data from the file system where the MOLAP aggregates or fact tables are stored. If Analysis Services must read data from disk to resolve the query, the query is considered to be a cold cache query.

Real customer implementations of Analysis Services generally do not fit perfectly into strict cold or warm cache scenarios. There are multiple reasons why this is the case.

One reason is that there are likely to be views of the cube that multiple users are interested in. A typical example of this might be national sales for the previous day. Most likely, more than one user would be interested in these results. In such a case, the first user executing the query would require the server to read data from disk if it was not already in the cache. All subsequent queries would benefit from the data being in the cache and would execute a warm cache query.

Another factor that keeps actual implementations from being strictly defined as either a cold or warm cache scenario is that Analysis Services derives result sets from other result sets that already exist in the cache. This behavior, described previously, is called a pyramid operation. A common example of this is when one query asks for a quarterly breakdown of sales data, and a follow-up query asks for yearly totals. The yearly totals would simply be derived from the quarterly totals already existing in the cache, and as such would constitute a warm cache query.

A final factor is that Analysis Services uses a capability known as sonar to retrieve data sets that are nearby the result set of the current query. This capability allows the server to pre-load the data into the data cache before the user actually makes a query for the specific data set.

All of these factors make it very unlikely for an actual customer query workload to be characterized as a 100% warm cache or 100% cold cache scenario. Instead, you will probably find that some queries are answered by the cache and that some queries are answered by relying on aggregates stored on disk. Every workload is different, which is why developing a query set for a performance test is extremely difficult: what percentage of warm cache queries do you allow? A realistic query set, or one that is more likely to be representative of a typical customer scenario, will have both cold and warm cache queries. Rather than debating what percentage of cold and warm cache queries to allow, we decided to test the server under the strict definitions of cold and warm cache. However, when you perform query tests for your specific implementations, we recommend that you not adhere to such strict definitions. Instead, develop your query performance test plans based on the actual workloads that the application is expected to handle. Doing so will give you a much better picture of how your specific implementation will perform in a production environment.

In fact, we went through a very thorough process to be able to run tests that could be characterized as either 100% cold cache or 100% warm cache. Some of the things we did to ensure this included:

  • Developing a query set in which no result set could be derived from any other result set. This was done to avoid pyramid operations.

  • Selecting, for all non-Time dimensions, only a few members per dimension per query. The purpose of this was to keep Analysis Services from proactively loading data sets associated with the siblings of dimension members that were part of an existing query into its result cache.

  • Developing queries that referenced lower (and thus more populated) levels of dimensions, to ensure that no two queries were identical.

Developing strict cold and warm cache scenarios was even more difficult due to the benefits associated with 64-bit Analysis Services. The larger memory space allows query results to live in the server cache longer before they are cleaned out, so it is quite possible that after a result set enters the cache it can reside in cache until some processing option is executed that would invalidate the data. Some of our test results below confirm this.

Results with No Aggregates Defined

Figure 12 represents the average response time for queries in which no aggregates were defined for any of the cubes participating in the query test.

Figure 12: Average query response time with no aggregates defined

As expected, the system illustrated above performed poorly with a cold cache. Only 50 concurrent users could be supported, at an average response time of over 3 minutes per query. When Analysis Services could rely on its aggregate cache to resolve queries, the average response time was less than 1 second. In fact, the warm cache results are not even visible in the chart; they can only be read from the table provided in the chart.

The primary reason for performing query tests with zero aggregates is to populate the Analysis Server query log. By populating the query log, you can perform usage-based optimization to design appropriate aggregates based on actual usage of the cubes.

Results with Usage-Based Optimization

After the tests with no aggregates concluded, usage-based optimization was performed on a single cube partition on a per-cube basis. The aggregates generated from this process were then copied to each monthly partition in every cube, and each cube was then processed using the reaggregate option. This processing option avoids rereading the data from the relational fact tables; instead, data is read directly from the existing MOLAP fact tables. The following figure shows the average query response time for sessions in which usage-based optimization was used to define the aggregates in each partition.

Figure 13 shows the average response time (in seconds) with usage-based optimization and with no customer-level query.

Figure 13: Average response time (in seconds) with usage-based optimization and no customer-level query

With usage-based aggregates defined, Analysis Services can support many more concurrent users with much better average response times. It is likely that typical use of Analysis Services will result in average query times somewhere between the cold and warm cache average response times. Again, notice that warm cache response times were less than 1 second.

Measuring the Impact of Querying a Large Dimension

One of the main benefits of the 64-bit edition of Analysis Services is the ability to build dimensions with many millions of members and to store them in the MOLAP storage format, thereby gaining the query performance benefits associated with MOLAP cubes. For the purposes of our testing, we created a dimension with 50 million customers at the leaf level. Previously, such solutions required the use of ROLAP cubes and could not take advantage of faster MOLAP storage.

To measure the impact of querying the customer dimension, a single query that required querying the customer level was added to each user session. The previously existing queries in each user's session were not modified in any way; the customer-level query was inserted at a random point in the session. This increased the number of queries per user session to a total of 24. The results below depict the average response time for those sessions that include the customer query.

Figure 14 shows the average response time (in seconds) with usage-based optimization and with a customer-level query included in the query set.

Figure 14: Average response time (in seconds) with usage-based optimization and a customer-level query included in the query set

These results are very similar to the results for when the customer query is not included in the set.

Figure 15 highlights the impact of the customer-level query on the response times. In many cases, adding the customer-level query actually improved average response times.

Figure 15: Impact of the customer-level query on average response times

Figure 16 below represents the average query response time for just the customer-level query. To perform this test, we added a customer-level query to each user session at a random point in the test. The results below show the average response time for the single customer-level query that was inserted into each user's session. For example, for 300 concurrent users the average is computed across 300 customer-level queries. These queries ran within a set that consisted of a total of 300 * 24 queries, or 7,200 queries. There is a significant increase in response times, because client memory consumption increased from 120 MB per user to as much as 1 GB per user as customer members were downloaded to the client.

Figure 16: Average response time (in seconds) for the customer-level query only

Simultaneous Query and Processing Test

With the 64-bit edition of Analysis Services, it should be possible to keep the aggregate cache from being depleted during cube partition processing. This capability is valuable in several scenarios, including:

  • Customers who need to query one set of cubes while processing another set of cubes.

  • Installations that need to maintain the aggregate cache during partition processing.

To demonstrate this effect, we added a second Analysis Services database to our existing Analysis Services server. The second database contained a 1-million-member customer dimension based on the same schema that we used for all our previous tests. The source database for this additional Analysis Services database resided on a separate 32-bit instance of SQL Server 2000; thus, the impact on the 64-bit server from processing a partition in the second database is isolated to Analysis Services.

With this additional Analysis Database, a partition was created with one large aggregate defined. This aggregate was based on the leaf level of every dimension; therefore, it represents an exact duplicate of the MOLAP fact table. In fact, when we processed this partition, the resulting aggregation file was exactly the same size as the MOLAP fact table. Prior to processing the partition, the Analysis Server service was using approximately 13.1 GB of memory on the server. This memory represents the amount of memory required to start the Analysis Server service, and is equal to the amount of memory required to load all of the dimension members and their properties across all databases on the server.

For the purposes of these tests, a script was created to reprocess the single partition for the entire duration of the query tests. The process buffer was set to 1.5 GB to keep Analysis Services from using temporary files. To process the partition, Analysis Services required an additional 1.2 GB of memory during the read phase, for a total of 14.3 GB. On a 32-bit system, this additional memory could force the server to clean the aggregate cache, thereby increasing the average response time for users. During the aggregation phase for this partition, the server again required 1.2 GB above the 13.1 GB baseline, for the same total of 14.3 GB. The 1.2 GB of additional memory is well under the 1.5 GB process buffer size.
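
As a rough illustration of the memory accounting in this test, the short Python sketch below simply adds the observed components together; the figures are the ones reported above, and only the layout of the calculation is our own.

    # Memory accounting for the simultaneous query-and-processing test (values in GB).
    baseline_dimension_memory = 13.1   # memory after service start: all dimensions loaded
    processing_overhead = 1.2          # extra memory observed in the read phase and again
                                       # in the aggregation phase
    process_buffer_limit = 1.5         # configured process buffer size

    peak_during_processing = baseline_dimension_memory + processing_overhead
    print(f"Peak while processing: {peak_during_processing:.1f} GB")        # 14.3 GB

    # The aggregation-phase overhead stays under the configured process buffer,
    # so no temporary files are needed.
    print(f"Fits in process buffer: {processing_overhead <= process_buffer_limit}")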

While this partition was processing repeatedly, the same query sets consisting of 23 queries that were used to test query performance (excluding the customer level) were run against the Analysis database with the 50-million-member customer dimension. The results of these tests are shown in Figure 17 below.

Figure 17: Average query response time while a partition was being processed

The average response time for cold cache queries increased from the earlier case when the query test ran alone. This should be expected, because for a cold cache run the server must conduct file I/O to return the results to the user. These file operations must compete with the file operations needed to process the partition.

The results from the majority of warm cache test runs, however, support the hypothesis that the aggregate cache is not cleared while users query the system and the server simultaneously processes a partition. It is not until the system is queried with 300 concurrent users that there is any noticeable impact for warm cache users. Based on Analysis Services performance counters, approximately 3 percent of the aggregate cache entries were evicted while 250 concurrent users were querying the system. The average response times for the warm cache queries indicate that the aggregate cache was not significantly depleted by the additional 1.2 GB of memory required to process the single partition. In fact, the server utilized just over 16 GB of virtual memory in total during the 500-concurrent-user test. This is an additional 1.7 GB of memory above the 14.3 GB of peak memory required to process the partition without any users querying the system; this additional 1.7 GB indicates that the aggregate cache had not been cleared.

Figure 18 below represents a collection of data from the performance monitor, based on back-to-back runs with 200 concurrent users while a partition was processing. The first query test is a cold cache run, and the test following it is a warm cache run.

This data illustrates some interesting points:

  • The aggregate cache builds up during the cold cache run and does not get depleted during the warm cache run. This should be expected.

  • Disk queue lengths during the cold cache test account for the difference in average response times compared with the tests in which no partition was being processed while users queried the cubes. For best performance, you are advised to use disk systems with as many spindles as possible, and/or to use one server to process cubes and another server to query cubes.

Figure 18: Performance monitor data for back-to-back 200-concurrent-user runs during partition processing

Distinct Count Tests

The goal of this series of tests was to determine if cubes that contain distinct count measures can be processed more quickly using the 64-bit edition of Analysis Services. Distinct count aggregates tend to be much larger than their non-distinct counterparts. Processing distinct count cubes often requires the creation of temporary files, which often incurs a noticeable performance hit.

Due to the nature of the distinct count problem, Analysis Services cubes that utilize the distinct count aggregate type are larger in size than other cubes. The reason for this is that the item that is being counted distinctly is effectively added as a level to every aggregate that exists in the aggregation scheme.

Let's take a look at a simple example in which a cube is dimensioned by store, product, and time. If there is no distinct count aggregate in the cube, a typical aggregate might contain the following records.

Month     Product item   Store     Sales units
January   Product A      Store 1   100
January   Product B      Store 1   200
January   Product C      Store 1   50

The same aggregate with a distinct count measure added will be much larger. Instead of containing just a few records, this aggregate will now contain a record for each customer that purchased one of these products.

Month     Product item   Store     Customer key   Sales units
January   Product A      Store 1   1              25
January   Product A      Store 1   2              75
January   Product B      Store 1   1              50
January   Product B      Store 1   3              25
January   Product B      Store 1   4              125
January   Product C      Store 1   1              10
January   Product C      Store 1   2              40

Remember that Analysis Services needs to be able to keep all aggregates in memory during processing. The addition of these records to support the distinct count measure will require more memory for the process buffer. If the memory required to store all aggregates in memory exceeds the amount of memory available in the process buffer, temporary files will be created and overall processing time will be significantly impacted.
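
A back-of-the-envelope sketch can show why the distinct count key inflates the memory needed during processing. The record counts, the bytes-per-record figure, and the assumption that every aggregate grows by the same factor are invented for illustration; they are not measurements from these tests.

    # Rough estimate of process buffer demand with and without a distinct count key.
    aggregate_rows_without_distinct = 1_000_000   # assumed rows across all aggregates
    avg_customers_per_cell = 40                   # assumed distinct customers per aggregate cell
    bytes_per_row = 100                           # assumed average bytes per aggregate row

    rows_with_distinct = aggregate_rows_without_distinct * avg_customers_per_cell

    def to_mb(n_rows):
        return n_rows * bytes_per_row / (1024 * 1024)

    print(f"Without distinct count: ~{to_mb(aggregate_rows_without_distinct):,.0f} MB")
    print(f"With distinct count:    ~{to_mb(rows_with_distinct):,.0f} MB")

    # If the estimate with the distinct count key exceeds the process buffer size,
    # Analysis Services spills to temporary files and processing slows down.
    process_buffer_mb = 1536
    print("Temporary files likely:", to_mb(rows_with_distinct) > process_buffer_mb)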

The 64-bit edition of Analysis Services can help address implementations that are using distinct count aggregates, because much larger process buffers can be supported. To demonstrate this, the following test was conducted. We used the same cubes previously used in our query tests, with the following changes:

  • The customer dimension was removed from the cube of interest. We did this because it is considered a best practice to build distinct count measures into their own cube, which can later be combined with other cubes into a virtual cube. The customer dimension is not needed to count distinct customers.

  • A set of aggregates was designed for the partition based on a 1 percent performance gain setting. This resulted in 38 aggregates being defined for the partition.

With these changes made, the December 2002 partition was processed using the reaggregate option in two test cases. In the first case, a distinct count aggregate based on the customer key was added to the partition. In the second case, the distinct count aggregate was removed and replaced by a count aggregate. The reaggregate processing option was used to eliminate the impact of querying the relational data, which is done only to build the MOLAP fact tables.

The following table depicts the file sizes of the partition after the completion of processing and the peak amount of memory used by Analysis Services to process the partition.

Partition                File size for aggregates   Peak memory usage
Without distinct count   5,167 KB                   122 MB
With distinct count      1,991,697 KB               8,929 MB

For our testing purposes, the addition of the distinct count aggregate to the partition required approximately 385 times more storage space.
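
The ratio quoted above follows directly from the table; the short calculation below simply reproduces it from the reported file sizes.

    # Storage ratio implied by the measured aggregate file sizes (values from the table above).
    size_without_distinct_kb = 5_167
    size_with_distinct_kb = 1_991_697

    ratio = size_with_distinct_kb / size_without_distinct_kb
    print(f"The distinct count aggregates are about {ratio:.0f} times larger")   # ~385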

We can conclude from this test that cubes that utilize the distinct count are larger than cubes that do not contain a distinct count. The increase in size is dependent on how many distinct members exist for the distinct count aggregation, assuming all other things are equal.

Notice in the above tests that Analysis Services required only 122 MB of memory to process the partition without a distinct count, but required almost 9 GB of memory to process the distinct count partition. On a 32-bit system, the additional memory required for the process buffer would cause temporary files to be used during processing. To test the impact of temporary files, the distinct count partition was processed with two different process buffer sizes. The table below depicts the processing times based on the process buffer size.

Process buffer size   Partition        Total processing time
500 MB                Distinct count   1:15:30
1500 MB               Distinct count   30:00

The table above indicates that due to the larger process buffer size, Analysis Services was able to process the distinct count partition in under half the amount of time required to process the partition when the process buffer was set to 500 MB. A 500-MB processing buffer would be considered fairly large for a 32-bit installation of Analysis Services. The extra amount of time required to process the partition when the process buffer was set to 500 MB was due to the creation of temporary files during processing. Since Analysis Services could not fit all 38 aggregates in memory during processing, temporary files were created. When the process buffer was set to 1.5 GB, no temporary files were created.

Conclusion

SQL Server 2000 Relational Database business intelligence customers will find that the 64-bit chipset offers the most benefit to systems that need large direct-memory addressability, particularly customers who perform similar operations against the same datasets on a repeated basis. Additionally, customers may find that the 64-bit chipset yields improvements on systems that require improved disk I/O throughput; systems that perform intense, complex computations; and systems that conduct significant numbers of parallel operations. Microsoft provides a variety of performance monitoring tools, techniques, and resources to help determine whether your system is a candidate for 64-bit; many of these are available from the Microsoft SQL Server Web site.

The Relational Database testing conducted for this white paper verified that 64-bit offers significant advantages to SQL Server 2000 business intelligence customers in the following areas:

  • Direct addressability of very large memory benefits all database operations, not just a handful (as is the case with AWE memory support in 32-bit).

  • The ability to perform database operations in memory allows for faster completion of tasks, and helps minimize disk I/O, a major contributor to performance degradation in most systems.

  • After table data has been preloaded into buffer cache by some query operations, subsequent queries against the same data in cache complete in a fraction of the time.

  • When disk I/O can't be avoided (such as when initial table reads must be performed), the enhanced throughput of 64-bit helps to speed such operations along.

  • Parallel database operations can take advantage of the enhanced parallelism capabilities offered by 64-bit.

In addition to the benefits associated with the Relational Engine, Analysis Services implementations can benefit greatly from the additional addressable memory available from the 64-bit platform. In particular, the 64-bit edition of Analysis Services can support the following:

  • Creation of very large MOLAP based dimensions that can support tens of millions of members.

  • Applications that heavily utilize dimension security.

  • Caching more data in memory, and keeping the data in cache for potentially long periods of time. Queries resolved via the cache will return results significantly faster than queries that must rely on disk resources.

  • Processing partitions without the use of temporary files. Temporary files are required for partition processing when Analysis Services has exhausted the memory available to it. For installations in which temporary files are currently being utilized during partition processing, the 64-bit edition of Analysis Services may significantly decrease processing time. Implementations in which partitions are processed in parallel and/or partitions utilize the distinct count aggregate are also likely to benefit from this capability.

  • Consolidation of Analysis Services databases. Customers that have separated Analysis Services databases due to memory limitations may be able to consolidate these databases onto fewer 64-bit servers. Before consolidating servers, make sure that all Analysis Services databases that would reside on the same server have identical administrative rights associated with them.

For more information, see https://www.microsoft.com/sql/.

Appendix A: SQL Server 2000 Relational Engine Memory Management

The default configuration for SQL Server allows the Relational Database server to dynamically manage its own memory. In this default configuration, SQL Server acquires or releases memory according to application requirements. The ability of SQL Server to self-manage memory resources simplifies SQL Server management. However, static configuration options exist and are required in some situations. One of those situations is when SQL Server is configured to use Address Windowing Extensions (AWE); this scenario is discussed below.

Relational Engine Memory Architecture

Each instance of Microsoft SQL Server has an address space with two main components, each of which has several subcomponents. The two main components are the executable code and the memory pool. The executable code is essentially composed of the executable files and dynamic link libraries (DLLs) used by an instance of SQL Server; this is the program code needed to load and run the instance. The amount of memory and size of the code varies slightly depending on the size of executable files, DLLs used by Open Data Services, the SQL Server engine, server Net-Libraries, extended stored procedure use, and so on. However, memory used for executable code is relatively static when compared to memory taken up by the memory pool.

The memory pool is the area we primarily want to focus on in understanding how 64-bit most affects SQL Server. The memory pool is the main unit of memory for an instance of SQL Server. The majority of structures which would use (and therefore benefit from) large amounts of memory are allocated in the SQL Server memory pool. The main types of objects managed in the memory pool are:

  • System-level data structures: Global data for the instance, such as database descriptors and the lock table.

  • Buffer cache: The pool of buffer pages into which data pages are read.

  • Procedure cache: The pool of pages containing the execution plans for all Transact-SQL statements currently executing in the instance.

  • Log caches: Buffer pages used to read and write log pages. Each log's cache is managed separately from the buffer cache to reduce the synchronization between log and data buffers.

  • Connection context: Data structures that record the current state of each connection. The connection context holds items such as parameter values for queries and stored procedures, cursor positioning information, and tables currently being referenced.

Graphically, the memory architecture for SQL Server looks as follows:

Figure A1: SQL Server memory architecture

The regions within the memory pool area change size dynamically according to demands being made on the database server by applications and users. Within the memory pool, certain areas such as system data structures and connection context are sized as a result of user actions and connection activity. As new databases are defined, SQL Server allocates data structures to hold the attributes representing each of the databases. As new connections are made, SQL Server allocates data structures to store the context of each connection. As tables and views are referenced, SQL Server allocates data structures describing their structure and attributes. That leaves the buffer cache, procedure cache, and log caches as memory units for which the size can be varied dynamically. SQL Server adjusts the relative sizes of these areas to optimize performance.

Balancing Memory and I/O

The SQL Server memory pool works in concert with the amount of virtual memory that is allocated on the machine. Virtual memory is a method the Windows operating system uses to extend the amount of actual physical memory available on a computer and is a traditional workaround for physical memory limitations. Virtual memory achieves the appearance of a much larger physical memory space by creating a pagefile on disk and storing information that won't fit in physical memory in the pagefile. The operating system uses complex algorithms to determine which pages get written to the pagefile. In brief, the algorithms maintain recently referenced memory pages in physical memory and move less recently referenced pages to the pagefile. The actual process is called swapping or paging memory, and occurs only when the operating system determines actual physical memory is running low. If a page of memory gets swapped out, but is later needed or referenced, the operating system swaps the referenced memory back into physical memory from the pagefile. The total amount of memory available to SQL Server is the sum of physical memory plus the size of the pagefile configured for use by the Windows operating system.

In a perfect world with unlimited memory resources, SQL Server would be configured with enough memory so that the entire database could be cached in memory. If it were possible and cost effective to cache the entire database, SQL Server would perform the absolute minimum of physical disk I/O. Unfortunately, data warehousing databases are typically far larger than the amount of physical memory that could reasonably be allocated to SQL Server.

A primary design goal of all database servers is to minimize the amount of physical disk I/O which must be performed. Disk I/O is one of the most costly, resource-intensive operations a database server has to perform. Electro-mechanical data transfer operations occur an order of magnitude slower than electronic in-memory data transfer operations. SQL Server minimizes physical disk I/O by managing the buffer cache which holds data and index pages in memory when possible. Like the operating system, SQL Server employs a sophisticated set of memory management algorithms to determine which pages should stay in the buffer pool and which should be swapped back to disk. The common wisdom is that the larger the buffer cache, the less physical disk I/O SQL Server needs to perform.

Database administrators need to understand the basic relationship between SQL Server memory management and Windows operating system virtual memory management. If the SQL Server memory requirements exceed the available physical memory, the operating system may begin swapping memory to the pagefile in an attempt to make more physical memory available for SQL Server. When this occurs, physical I/O that would normally be managed by the database is simply traded for physical I/O against the virtual memory pagefile, managed by the Windows operating system. The database administrator must balance the needs of SQL Server against the needs of the operating system, and monitor both, in order to achieve optimal performance.

Support for AWE

With SQL Server 2000 Enterprise Edition, Microsoft introduced support for Address Windowing Extensions (AWE) in order to allow the Relational Database to address larger amounts of physical memory than 32-bit architecture would normally allow. By employing AWE, SQL Server 32-bit instances can be configured to address about 8 GB of memory when running on Windows 2000 Advanced Server, and about 64 GB of memory when running on Windows 2000 Datacenter Server. Note that only the Relational Database server in SQL Server 2000 uses AWE. Other SQL Server components like Analysis Services and Data Transformation Services do not support AWE.

AWE is a set of extensions to the memory management functions of the Microsoft Win32 API. AWE enables SQL Server instances to address more physical memory than is normally available to any single application running on the 32-bit versions of the Windows operating system. Most applications are limited to 2 or 3 GB of memory space using standard 32-bit memory addressing. The theoretical addressable limit for 32-bit is 2^32 bytes, that is, 4,294,967,296 bytes (4 GB). However, it is generally recommended that 1-2 GB of memory be made available to the operating system to avoid virtual memory pagefile swapping. As a result, 2-3 GB of directly addressable memory is typically left over for SQL Server purposes, depending on how the system is configured.
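
The arithmetic behind these limits is straightforward, and the small sketch below restates it. Reserving 2 GB for the operating system is just the example case mentioned above, not a fixed rule.

    # 32-bit direct addressability and what is typically left for SQL Server.
    addressable_bytes_32bit = 2 ** 32
    print(addressable_bytes_32bit)            # 4294967296 bytes, or 4 GB

    os_reservation_gb = 2                     # assumed: 1-2 GB kept for the operating system
    sql_server_direct_gb = 4 - os_reservation_gb
    print(f"Directly addressable by SQL Server: ~{sql_server_direct_gb} GB")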

Using AWE, SQL Server is able to indirectly access extended physical memory as nonpaged memory: the extended memory is not a candidate for the Windows operating system to page in and out of virtual memory. AWE works by dynamically mapping views of this nonpaged extended memory to and from the directly addressable 32-bit memory space allocated to SQL Server, as needed. Leveraging AWE, SQL Server is able to map memory between the buffer cache and extended memory, thereby avoiding the costly physical disk I/O that is normally required to manage the memory pool. Some, but not all, operations under the control of SQL Server benefit by accessing a significantly larger buffer cache than is addressable using normal 32-bit addressing. Limitations of AWE include:

  • Limited benefit to specific database operations: The benefits derived from AWE are limited to operations in SQL Server that can take direct advantage of accessing database pages stored in an expanded buffer cache. Specifically, AWE is valuable for building indexes on large tables, or for simple queries that can be satisfied by reading database pages from the buffer cache. Memory-intensive operations that could benefit from a larger stored procedure cache, log cache, or connection context do not benefit from AWE. Such operations are common and are discussed in more detail in the main body of the paper.

  • Loss of dynamic memory management: Any instance of SQL Server configured to use AWE must statically allocate memory. This limitation is frustrating for long-time SQL Server administrators who are accustomed to dynamic memory management. Configuration decisions are even more complex for administrators who run multiple instances of SQL Server on the same computer.

  • Memory management overhead: A certain amount of overhead (memory, processor time, and so on) is used for managing AWE extended memory. Although the overhead introduced is small, direct memory addressing with 64-bit does not incur this cost.

Appendix B: SQL Server 2000 Analysis Services Memory Management

Dimension Memory

Analysis Services loads all dimension members and dimension member properties for all Analysis Services databases on the server into memory when the service is started, regardless of whether the dimensions are shared or private. Each dimension member requires approximately 125 bytes of memory, plus the memory required to store its member properties. For example, a 1-million-member customer dimension with a 25-character address member property would consume roughly 175 MB of virtual memory (125 MB for the members and 50 MB for the member properties).
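
The estimate above can be reproduced with a short calculation. The 125 bytes per member comes from the text; treating the 25-character property as 2 bytes per character (Unicode storage) is our assumption, chosen because it reproduces the 50 MB figure quoted above.

    # Rough dimension-memory estimate for Analysis Services.
    def dimension_memory_mb(member_count, property_chars=0, bytes_per_char=2):
        member_bytes = member_count * 125                    # ~125 bytes per member
        property_bytes = member_count * property_chars * bytes_per_char
        return (member_bytes + property_bytes) / 1_000_000   # MB, decimal, as in the text

    # 1-million-member customer dimension with a 25-character address property.
    print(f"{dimension_memory_mb(1_000_000, 25):.0f} MB")    # ~175 MB

    # The 50-million-member customer dimension used in the query tests (members only).
    print(f"{dimension_memory_mb(50_000_000):.0f} MB")       # ~6250 MB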

Dimension Security and Replica Dimensions

Dimension security is a feature of Analysis Services that enables an administrator to limit access to dimension members based on the definition of a role and the user's credentials. For example, a Salesperson dimension may contain the following levels: (All), Region, State, City, Store, and Salesperson. Corporate policy might dictate that each salesperson can see their individual sales results, but cannot access sales information about other salespeople. Dimension security is one option that can be used to ensure that each salesperson sees only their own data.

In implementing dimension security, Analysis Services creates a replica dimension for each unique combination of roles. Replica dimensions are created and loaded into memory when a user connects to the Analysis Server. These dimensions remain in memory until the service is restarted or until the dimension is processed (either incrementally or fully). A replica dimension consists of all permitted members plus their siblings, plus all their ascendants and all the siblings of their ascendants.
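
To make the replica definition concrete, the following sketch computes the member set of a replica over a toy hierarchy. The hierarchy, the permitted member, and the helper functions are invented for illustration; only the rule itself (permitted members, their siblings, their ascendants, and the siblings of those ascendants) comes from the text.

    # Toy hierarchy expressed as child -> parent; "All" is the root.
    parent = {
        "East": "All", "West": "All",
        "NY": "East", "MA": "East", "CA": "West", "WA": "West",
        "NYC Store": "NY", "Albany Store": "NY", "Boston Store": "MA",
    }

    def children(node):
        return [c for c, p in parent.items() if p == node]

    def siblings(node):
        p = parent.get(node)
        return [] if p is None else [c for c in children(p) if c != node]

    def ascendants(node):
        result = []
        while node in parent:
            node = parent[node]
            result.append(node)
        return result

    def replica_members(permitted):
        members = set()
        for m in permitted:
            members.add(m)
            members.update(siblings(m))          # siblings of permitted members
            for a in ascendants(m):
                members.add(a)                   # ascendants...
                members.update(siblings(a))      # ...and the siblings of the ascendants
        return members

    # A role permitted to see only "NYC Store":
    print(sorted(replica_members({"NYC Store"})))
    # ['Albany Store', 'All', 'East', 'MA', 'NY', 'NYC Store', 'West']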

Shadow Dimensions

When Analysis Services processes a dimension, provided that a version of the dimension already exists, a shadow copy of the dimension is created for the duration of the processing transaction. The shadow dimension ensures that a valid version of the dimension can be produced in the event that processing fails, and allows users to continue to query the affected cubes during an incremental update.

Connections

Connections are made to Analysis Services for the purposes of returning results to users. Each connection uses approximately 32 KB of memory.

Process Buffer and Aggregation Cache

The remaining memory available to the server after all dimensions (shadows and replicas) have been loaded into memory, and all connections have been accounted for, is set aside for the process buffer and aggregation cache.

The process buffer is responsible for storing aggregates as they are computed during a partition's processing phase. After the cube's fact data is written to disk, Analysis Services builds the aggregates that are specified for the partition. These are the aggregates that are defined when you use Analysis Manager and opt to select either performance-based or usage-based optimization. If you'd like to know exactly which aggregates are defined for a partition (and it's always a good idea to have this information available), you can either write a Decision Support Objects (DSO) script or use available tools such as the Partition Manager utility that ships with the SQL Server 2000 Resource Kit. For more information about aggregates, see the Microsoft SQL Server 2000 Analysis Services Performance Guide at https://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ansvcspg.mspx.

During cube partition processing, all aggregates for a partition must fit into memory at once. For example, suppose a partition in your cube contains 100 aggregates. Because Analysis Services makes a single pass over the data, all of the aggregates must fit into memory. Analysis Services does not finish one aggregate, write it to disk, and then process the next aggregate. Instead, all aggregates are processed in memory and are written to disk together.

If there is not enough memory available in the process buffer to hold all of the aggregates, the server will make use of temporary files in order to finish processing. Temporary files can have a significant impact on overall processing time.

The remaining available memory is primarily devoted to the aggregate cache. The purpose of the aggregate cache is to help resolve queries as quickly as possible by keeping some aggregate records in memory. The aggregate records that are kept in memory are those that have been previously requested. These aggregates remain in memory until the process of memory cleaning is initiated. At that point, all, some, or none of the aggregates will be removed from memory, depending on whether the cleaner thread determines that cleaning is actually required; aggregates are removed in the reverse order in which they were last accessed (least recently used first).
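
The reverse-order-of-last-access behavior described above is essentially least-recently-used eviction. The following sketch models it with a small ordered map; the entry limit and cache keys are invented, and real cleaning is driven by the memory thresholds described in the next section rather than by a fixed entry count.

    from collections import OrderedDict

    class ToyAggregateCache:
        """Minimal least-recently-used model of aggregate cache eviction order."""

        def __init__(self, max_entries):
            self.max_entries = max_entries
            self.entries = OrderedDict()          # oldest access first, newest last

        def get(self, key):
            if key in self.entries:
                self.entries.move_to_end(key)     # a hit refreshes the access order
                return self.entries[key]
            return None

        def put(self, key, value):
            self.entries[key] = value
            self.entries.move_to_end(key)

        def clean(self):
            # Evict in reverse order of last access: least recently used goes first.
            while len(self.entries) > self.max_entries:
                self.entries.popitem(last=False)

    cache = ToyAggregateCache(max_entries=2)
    cache.put("Q1 2003 sales", 310)
    cache.put("Q2 2003 sales", 290)
    cache.get("Q1 2003 sales")                    # touch Q1 so Q2 becomes least recent
    cache.put("Q3 2003 sales", 275)
    cache.clean()
    print(list(cache.entries))                    # ['Q1 2003 sales', 'Q3 2003 sales']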

Dimension Memory Diagram

Figure B1 depicts the memory usage of Analysis Services as discussed in this Appendix.

Figure B1: Analysis Services memory usage

Reclaiming Memory

Previously, we've discussed which components of the Analysis Server architecture use memory. The next item to address is how memory is reclaimed by Analysis Services. Based on the activity performed by the server, the server is able to free memory and then use it for other purposes. Based on the Background Interval registry setting, a cleaner thread is dispatched every n seconds (where n is the value in the registry) to determine whether Analysis Services needs to clean memory, and to what extent. To make this determination, the server uses the following threshold:

     0.5 * (Memory Conservation Threshold + Minimum Allocated Memory)

If the memory being consumed by the server is less than this value, no memory cleaning is performed. If the cleaner thread determines that the amount of memory consumed is greater than this value but less than the Memory Conservation Threshold, the server goes into a mild cleaning phase. The most extreme cleaning occurs when the cleaner determines that the amount of memory being consumed is greater than the Memory Conservation Threshold; in that case, the server commences a phase called aggressive cleaning to bring down the amount of virtual memory in use.
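
A sketch of the decision logic just described may help. The threshold formula and the three cleaning levels come from the text above; the numeric settings in the example are arbitrary.

    def cleaning_level(memory_in_use, memory_conservation_threshold, minimum_allocated_memory):
        """Return the cleaning phase implied by the thresholds described above."""
        low_water_mark = 0.5 * (memory_conservation_threshold + minimum_allocated_memory)
        if memory_in_use < low_water_mark:
            return "no cleaning"
        if memory_in_use < memory_conservation_threshold:
            return "mild cleaning"
        return "aggressive cleaning"

    # Example with arbitrary settings (values in MB).
    for in_use in (2_000, 5_000, 9_000):
        print(in_use, cleaning_level(in_use,
                                     memory_conservation_threshold=8_000,
                                     minimum_allocated_memory=1_000))
    # 2000 -> no cleaning, 5000 -> mild cleaning, 9000 -> aggressive cleaning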

Some items must always remain in memory. For example, shared dimensions and replica dimensions must remain in memory while the service is running. The server can never reclaim the virtual memory that is being used to support this functionality. In addition, while a dimension is processing, the server cannot clean the memory that is needed to store the dimension's shadow version. After the dimension has finished processing, the server can reclaim the memory consumed by the shadow. Connections also require memory that cannot be released until the connection is closed.

After the server reclaims memory, what does it do with it? Depending on the current demands placed on the system, there are many activities that the newly acquired virtual memory could support. The primary reason the cleaning process occurs is to help keep the aggregate cache as relevant as possible for the users who are querying the system, thus improving query performance. The server might also have to juggle other requests as well. For example, suppose some partitions need to be processed. This process requires the use of process buffers, and therefore the cleaner thread may need to free some memory from the aggregation cache. The cleaner thread's responsibility is only to clean memory; other threads are responsible for consuming the memory after it is made available.

Appendix C: Query Examples

The following are examples of the queries that were generated for the query tests conducted against Analysis Services for this white paper.

Query #1

-----------
SELECT { [Measures].[Actual List Sales Amt],[Measures].[Actual Net Sales Amt] } ON COLUMNS ,
{ [Geography].[Standard].[City].&[16861],[Geography].[Standard].[City].&[16862] } ON ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Quarter].&[1/23/1900] )

Query #2

-----------
SELECT { [Measures].[Actual List Sales Amt],[Measures].[Actual Net Sales Amt] } ON COLUMNS ,
{ DESCENDANTS({[Geography].[Standard].[City].&[16861],[Geography].[Standard].[City].&[16862]},[Geography].[Standard].[Postal Code]) } ON ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Quarter].&[1/23/1900] )

Query #3

-----------
SELECT { [Measures].[Actual List Sales Amt] } ON COLUMNS ,
NON EMPTY CROSSJOIN({[Sales Force].[Standard].[Sales Rep].&[5114],[Sales Force].[Standard].[Sales Rep].&[5119]},{[Time].[Standard].[Month].&[11/1/2002],[Time].[Standard].[Month].&[11/1/2002].lag(1),[Time].[Standard].[Month].&[11/1/2002].lag(2)}) ON ROWS
FROM [Orders and Sales and Backlog]

Query #4

-----------
SELECT { [Measures].[Actual List Sales Amt] } ON COLUMNS ,
NON EMPTY CROSSJOIN({TOPSUM({[Day of Week].[Standard].members},3,[Measures].[Actual List Sales Amt])},{[Sales Force].[Standard].[Sales Rep].&[5114],[Sales Force].[Standard].[Sales Rep].&[5119]}) ON ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Month].&[11/1/2002] )

Query #5

-----------
SELECT { [Measures].[Order List Sales Amt],[Measures].[Order Net Sales Amt] } on COLUMNS,
{ [Geography].[Marketing].[Postal Code].&[19667] } on ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Month].&[11/1/1999] )

Query #6

-----------
SELECT { [Measures].[Order List Sales Amt],[Measures].[Order Net Sales Amt] } on COLUMNS,
NON EMPTY CROSSJOIN({[Geography].[Marketing].[Postal Code].&[19667]},{[Product].[Standard].[Product Line].members}) on ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Month].&[11/1/1999] )

Query #7

-----------
SELECT { [Measures].[Order Count],[Measures].[Order Discount Amt] } ON COLUMNS ,
{ [Product].[Standard].[Product Item].&[478],[Product].[Standard].[Product Item].&[480] } ON ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Month].&[6/1/2001] )

Query #8

-----------
SELECT { [Measures].[Order Count],[Measures].[Order Discount Amt] } ON COLUMNS ,
NON EMPTY CROSSJOIN({ [Product].[Standard].[Product Item].&[478],[Product].[Standard].[Product Item].&[480] },{[Day of Week].[Standard].members}) ON ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Month].&[6/1/2001] )

Query #9

-----------
SELECT { [Measures].[Backlog List Sales Amt] } ON COLUMNS ,
{ [Sales Force].[Standard].[Sales Rep].&[2216],[Sales Force].[Standard].[Sales Rep].&[2218] } ON ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Quarter].&[1/25/1900] )

Query #10

-------------
SELECT { [Measures].[Backlog List Sales Amt] } ON COLUMNS ,
NON EMPTY CROSSJOIN({TOPSUM({[Day of Week].[Standard].members},3,[Measures].[Backlog List Sales Amt])},{[Sales Force].[Standard].[Sales Rep].&[2216],[Sales Force].[Standard].[Sales Rep].&[2218]}) ON ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Quarter].&[1/25/1900] )

Query #11

-------------
SELECT { [Measures].[Backlog List Sales Amt],[Measures].[Backlog Net Sales Amt] } ON COLUMNS ,
{ [Product].[Standard].[Product Item].&[586],[Product].[Standard].[Product Item].&[587] } ON ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Quarter].&[7/1/2002] )

Query #12

-------------
SELECT { [Measures].[Backlog List Sales Amt],[Measures].[Backlog Net Sales Amt] } ON COLUMNS ,
NON EMPTY CROSSJOIN({[Product].[Standard].[Product Item].&[586],[Product].[Standard].[Product Item].&[587]},{[Day of Week].[Standard].members}) ON ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Quarter].&[7/1/2002] )

Query #13

-------------
SELECT { [Measures].[Order Product Cost] } ON COLUMNS ,
{ [Sales Force].[Standard].[Sales Rep].&[6236],[Sales Force].[Standard].[Sales Rep].&[6237] } ON ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Month].&[1/24/1900] )

Query #14

-------------
SELECT { [Measures].[Actual Discount Amt],[Measures].[Actual Invoice Count] } ON COLUMNS ,
{ [Product].[Standard].[Product Item].&[980],[Product].[Standard].[Product Item].&[1102] } ON ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Month].&[5/1/2000] )

Query #15

-------------
SELECT { [Measures].[Actual Discount Amt],[Measures].[Actual Invoice Count] } ON COLUMNS ,
NON EMPTY CROSSJOIN({[Product].[Standard].[Product Item].&[980],[Product].[Standard].[Product Item].&[1102]},{[Day of Week].[Standard].members}) ON ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Month].&[5/1/2000] )

Query #16

-------------
SELECT { [Measures].[Backlog List Sales Amt],[Measures].[Backlog Net Sales Amt] } ON COLUMNS ,
{ [Geography].[Standard].[City].&[12814],[Geography].[Standard].[City].&[12816] } ON ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Month].&[11/1/2000] )

Query #17

-------------
SELECT { [Measures].[Backlog List Sales Amt],[Measures].[Backlog Net Sales Amt] } ON COLUMNS ,
{ DESCENDANTS({[Geography].[Standard].[City].&[12814],[Geography].[Standard].[City].&[12816]},[Geography].[Standard].[Postal Code]) } ON ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Month].&[11/1/2000] )

Query #18

-------------
SELECT { [Measures].[Backlog Discount Amt],[Measures].[Backlog List Sales Amt] } on COLUMNS,
{ [Geography].[Marketing].[Postal Code].&[31680] } on ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Year].&[1/22/1900] )

Query #19

-------------
SELECT { [Measures].[Backlog Discount Amt],[Measures].[Backlog List Sales Amt] } on COLUMNS,
NON EMPTY CROSSJOIN({[Geography].[Marketing].[Postal Code].&[31680]},{[Product].[Standard].[Product Line].members}) on ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Year].&[1/22/1900] )

Query #20

-------------
SELECT { [Measures].[Actual Discount Amt],[Measures].[Actual Invoice Count] } on COLUMNS,
NON EMPTY CROSSJOIN({[Geography].[Marketing].[Postal Code].&[34236]},{[Time].[Standard].[Month].&[1/3/1900],[Time].[Standard].[Month].&[1/3/1900].lag(1),[Time].[Standard].[Month].&[1/3/1900].lag(2)}) on ROWS
FROM [Orders and Sales and Backlog]

Query #21

-------------
SELECT { [Measures].[Actual Discount Amt],[Measures].[Actual Invoice Count] } on COLUMNS,
NON EMPTY CROSSJOIN({[Geography].[Marketing].[Postal Code].&[34236]},{[Product].[Standard].[Product Line].members}) on ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Month].&[1/3/1900] )

Query #22

-------------
SELECT { [Measures].[Order Discount Amt],[Measures].[Order List Sales Amt] } ON COLUMNS ,
NON EMPTY CROSSJOIN({[Geography].[Standard].[City].&[8351],[Geography].[Standard].[City].&[8352]},{[Time].[Standard].[Year].&[1/24/1900],[Time].[Standard].[Year].&[1/24/1900].lag(1),[Time].[Standard].[Year].&[1/24/1900].lag(2)}) ON ROWS
FROM [Orders and Sales and Backlog]

Query #23

-------------
SELECT { [Measures].[Order Discount Amt],[Measures].[Order List Sales Amt] } ON COLUMNS ,
{ DESCENDANTS({[Geography].[Standard].[City].&[8351],[Geography].[Standard].[City].&[8352]},[Geography].[Standard].[Postal Code]) } ON ROWS
FROM [Orders and Sales and Backlog]
WHERE ( [Time].[Standard].[Year].&[1/24/1900] )