Using the Teradata Database as a Source for a Microsoft SQL Server 2000 Analysis Services Application

Updated: July 22, 2003

Joy Mundy

Microsoft Corporation

June 2003

Summary: This paper describes the best practices for building an analytical application that leverages both SQL Server 2000 Analysis Services and the Teradata Database for a Hybrid OLAP (HOLAP), Multidimensional OLAP (MOLAP), or Relational OLAP (ROLAP) solution. This paper is targeted at systems integrators and database developers. Readers are expected to have a basic understanding of the features of both database systems.

On This Page

Introduction
Getting started
Design Considerations
Relational design tips
Designing for low latency OLAP
Improving MOLAP processing performance
Summary of Design Considerations
Summary
Appendix A: Resources

Introduction

Microsoft® Corporation and Teradata®, a Division of NCR®, jointly recognize that mutual customers may want to combine both companies' products when building analytic applications. The Analysis Services analytical database includes OLAP cubes and possibly data mining models and can use data from the Teradata Database in a HOLAP, MOLAP, or ROLAP fashion. As detailed below, most Analysis Services features are supported using the Teradata Database.

This paper assumes that the Teradata Database source exists and that user query, analysis, and data latency requirements have been defined.

Before you begin, it is recommended that you be familiar with the following:

  • Dimensional modeling concepts including facts and dimensions, surrogate keys, and slowly changing dimensions.

  • Analysis Services fundamentals. As a minimum introduction, work through the Analysis Services 2000 tutorial that is available from the start screen of the Analysis Manager.

  • Teradata Database fundamentals, including Teradata physical design techniques, Teradata Database open interfaces such as the OLE DB Provider for Teradata or the ODBC Driver for Teradata, and database features like the Aggregate Join index.

Analysis Services features supported by Microsoft and Teradata

Almost every feature of Analysis Services is supported using the Teradata Database. The following list includes features which are supported using the Teradata Database:

  • Alternative storage modes ROLAP, HOLAP, and MOLAP

  • Relational storage of dimensions

  • User-defined partitions

  • Parent-child dimensions

  • Linked cubes

  • Drillthrough from the cube to the Teradata Database

  • Analysis Services actions

  • Data mining

  • Calculated members

  • Calculated cells

  • Custom rollups

The Microsoft SQL Server Books Online provides an introduction to these features. In particular, the topic Features Supported by the Editions of SQL Server 2000 specifies which of these features are supported in the different editions of SQL Server 2000. Many of the features are available only in the Enterprise Edition and Developer Edition of SQL Server 2000; notably, user-defined partitions are not available in the Standard Edition. Partitioning is key to successfully building and deploying a large-scale Analysis Services database with the Teradata Database, as discussed later in this whitepaper. It is highly recommended that you use the Enterprise Edition.

Analysis Services features not supported by Microsoft and Teradata

The complete list of Analysis Services features that are not supported by Microsoft and Teradata is:

  • ROLAP aggregations: An Analysis Services cube partition that uses ROLAP storage with the Teradata Database must be defined with zero aggregations. This issue is discussed later in the "Design Considerations" section.

  • Automated real-time OLAP: An Analysis Services real-time OLAP system built on a Teradata Database must itself manage notifications to the analysis server when the relational data change. This issue is discussed later in the "Design Considerations" section.

  • Cell writeback: An Analysis Services analytical application that requires cell writeback cannot be developed directly on a Teradata Database. The cell writeback feature is used most often by budgeting and "what-if" applications. It is recommended that a customer who wants to build or install an application with cell writeback populate a dependent data mart in Microsoft SQL Server. The dependent data mart would be the immediate source for the Analysis Services analytical application.

  • Dimension writeback: An Analysis Services analytical application that requires dimension writeback cannot be developed directly on a Teradata Database. The dimension writeback feature is used most often by financial applications. As with cell writeback, a customer who needs this feature may build a dependent data mart in Microsoft SQL Server, which serves as the immediate source for the Analysis Services application.

Getting started

Installation and configuration

The recommended production configuration is to install Analysis Services on a separate server from the Teradata Database. The analysis server requires the following software:

Operating System

As defined in the Books Online topic Operating Systems Supported by the Editions of SQL Server 2000.

Analysis Services

Microsoft SQL Server 2000 Analysis Services (Service Pack 3) or higher. Editions include:

  • Standard Edition

  • Enterprise Edition contains a superset of features from the Standard Edition

  • Developer Edition is feature-equivalent to Enterprise Edition

Many customers use Developer Edition on their development servers. If you are using Standard Edition in the production environment, you need to remain aware that some features available in Developer Edition are not available in Standard Edition. These are detailed in the Books Online topic Features Supported by the Editions of SQL Server 2000.

Connectivity

  • OLE DB Provider for Teradata, or

  • ODBC Driver for Teradata combined with the Microsoft OLE DB Provider for ODBC (installed with Microsoft SQL Server). See the Microsoft Knowledge Base article KB822215 for information about the current versions of Teradata Database connectivity software.

Analysis Services configuration

There are several Analysis Services settings that should be changed as soon as the product is installed.

  • For improved performance, reliability, and manageability, migrate the Analysis Services metadata repository to a dedicated SQL Server database. On installation, the metadata repository is stored in a Microsoft Access database. You can migrate the repository at any time; however, you are encouraged to do so immediately. When migrating the metadata repository, choose the Native format; do not choose the Metadata Services format. It is recommended that your system management procedures back up the metadata repository whenever cube data or metadata change. Isolating the Analysis Services metadata repository in its own SQL Server database simplifies the design of the backup schedule.

  • The process buffer size is set to 32 MB by default. Most applications that perform MOLAP or HOLAP processing benefit from a much larger process buffer size. Please read the section "How Analysis Services Uses and Manages Memory" in the Analysis Services Performance Guide for details.

Test connectivity

To test connectivity, first verify that Analysis Services is installed properly on the development machine. Launch Analysis Manager and browse the definition of the FoodMart 2000 database. Reprocess a dimension or cube. If you have never used Analysis Services before, it is suggested that you work through the tutorial that is available on the "Getting Started" screen of Analysis Manager.

Next, verify connectivity to the Teradata Database. In Analysis Manager, create a new data source that points to the Teradata Database. On the first screen of the data source creation wizard, choose the OLE DB Provider for Teradata, as pictured below:

The following illustration shows the Provider screen of the data source creation wizard.

On the Connection tab of the Data Link Properties dialog box, enter the information necessary to connect to the Teradata Database. The example below illustrates a connection to a server named Teradata1. You can also connect to a specific IP address. Note that the "Allow saving password" check box must be selected.

The following illustration shows the Connection screen of the data source creation wizard.

Use the "Test Connection" button to test connectivity to the Teradata Database.

You can set several parameters of the OLE DB Provider for Teradata by editing Extended Properties on the "All" tab of the Data Link Properties dialog box. The parameters that will most likely be of interest are:

  • Max response size

  • Use X Views

  • Default database

  • Session mode

For additional configuration options and troubleshooting techniques, refer to the OLE DB Provider for Teradata Installation and Users Guide.

When you use the Analysis Manager Dimension Wizard and Cube Wizard, you choose the database from which to source the dimension or cube.

To ensure that connectivity is working properly, create and process a small dimension. Use a very small table in a familiar Teradata Database as the source table. If you can create and process a dimension, the data source is defined correctly.
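
If no suitable small table already exists, you can create a throwaway test dimension for this purpose. The following is a minimal sketch only; the database name SANDBOX and the table and column names are hypothetical and should be replaced with names appropriate to your environment.

    -- Hypothetical smoke-test dimension table for verifying connectivity.
    CREATE TABLE SANDBOX.Dim_TestRegion
        (Region_Id   INTEGER NOT NULL,
         Region_Name VARCHAR(30) NOT NULL)
    UNIQUE PRIMARY INDEX (Region_Id);

    INSERT INTO SANDBOX.Dim_TestRegion VALUES (1, 'North');
    INSERT INTO SANDBOX.Dim_TestRegion VALUES (2, 'South');

Point the Dimension Wizard at this table, build a one-level dimension on Region_Name, and process it. If processing succeeds, connectivity through the data source is working end to end.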

Design Considerations

The logical design of an OLAP cube that is sourced from the Teradata Database will be the same as that for a cube from any other source, with the exception of the unsupported features described above.

Most of the existing information about best practices for designing and building Analysis Services databases has been written assuming that data is sourced from the SQL Server relational database. Most of these recommendations hold true for data sourced from a Teradata Database, with the following exceptions:

  • Database-specific features such as index types. In particular, Teradata Database features such as Aggregate Join indexes may offer substantial performance improvements for relational querying of aggregate data.

  • Considerations of bandwidth between the Teradata Database and the Analysis Services database. It is important to have good network connectivity and an optimally configured OLE DB provider transporting data between the two systems. This is especially important during partition processing when the Analysis Services partitions are stored in MOLAP mode, as discussed later.

Although the principles are the same whether data are sourced from SQL Server or the Teradata Database, these different characteristics can lead to a different assessment of the costs and benefits of alternative approaches to the physical design.

At this point, it will be useful to review some basic features of Analysis Services cube structure, processing, storage modes, and querying. This information is important in evaluating alternative designs.

Analysis Services cubes and dimensions

An Analysis Services database contains dimensions and cubes. Dimensions are analogous to, and usually populated from, relational dimension tables. Cubes are analogous to and populated from relational fact tables. Most dimensions are shared between multiple cubes.

Cubes consist of two kinds of fact data: leaf level and aggregate. Leaf level facts are the most detailed fact data that are contained in the cube. Often, the grain of the cube is the same as the grain of the fact table from which the cube is sourced. In other words, the cube contains as many leaf level rows as the relational source table. Other times, the grain of the cube is higher than the grain of the source fact table: the bottom-most level of the cube is in effect an aggregate of the source fact table.

Aggregates are optional, although creating useful aggregates is the single most important way to improve cube query performance.

Cube fact data and aggregate data created by Analysis Services are stored in one or more partitions. If a cube has multiple partitions, it is usually partitioned along a dimension that is often used for slicing in queries. Time is the most common partitioning dimension. As discussed in the Analysis Services Performance Guide, partitioning is valuable for improving both query and processing performance, as well as cube manageability. A partitioned cube can:

  • Process multiple partitions in parallel.

  • Have a different aggregate design for each partition.

  • Have a different storage mode for each partition.

  • Improve query performance by eliminating irrelevant partitions from the query.

Partitioned cubes are strongly recommended. This feature requires the Enterprise Edition of SQL Server 2000.
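
In practice, each partition is commonly sourced from its own filtered view of the fact table (or from the same table with a filter supplied in the Partition Wizard), with the partition's slice set to match. The sketch below is illustrative only; the table name Sales_Fact and the integer Date_Id key are hypothetical.

    -- One source view per annual partition (names and key format are hypothetical).
    CREATE VIEW Sales_Fact_2002 AS
        SELECT *
        FROM   Sales_Fact
        WHERE  Date_Id BETWEEN 20020101 AND 20021231;

    CREATE VIEW Sales_Fact_2003 AS
        SELECT *
        FROM   Sales_Fact
        WHERE  Date_Id BETWEEN 20030101 AND 20031231;

Keeping the filter in a view makes the partition boundaries visible in the Teradata Database and easy to adjust when partitions are added or merged.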

Analysis Services processing

When you fully process an Analysis Services database, the first step is to process the dimensions. In almost all cases, the dimension data are pulled from the relational database into the Analysis Services engine, validated, and written to the analysis server in the Analysis Services dimension format. The exception to this case, relational storage of dimensions, is discussed briefly later in this whitepaper. The next step is to fully process all the cubes in the database, which means processing each partition in each cube. Cube partition processing itself consists of two steps: leaf fact processing and aggregate processing.

Full processing is relatively unusual. After the initial load of historic data, the periodic (usually daily, weekly, or monthly) processing is incremental. Incremental processing consists of the same steps as full processing, but only dimension members and fact data that have been added to the relational store are processed.

Appendix A contains many references to detailed discussions of Analysis Services processing.

Analysis Services querying

An Analysis Services query is generated by a client tool such as Microsoft Excel® PivotTables. This query, expressed in the Multidimensional Expressions (MDX) language, is passed to a client or middle-tier component called the PivotTable Service (PTS). PTS contains a data cache and resolves the query from that cache if possible. If the PTS cache does not contain the requested data, it makes a request of the analysis server.

The analysis server also has a cache that is shared across all users. If the query cannot be resolved from server cache, the server requests data from permanent storage. The server fetches data from the highest level aggregates that can be used to resolve the query; aggregates the resulting cellset up to the requested level; and computes all calculations such as calculated members.

Calculated members are defined in MDX and can be very simple: MeasureA minus MeasureB. However, calculated members can also be very complex MDX expressions that implicitly request a broad set of leaf data. When designing the cube and evaluating query performance, it is important to remember that a seemingly simple calculated member such as "Average Sales of Top 10 Customers" might require a substantial volume of leaf data to evaluate. In this example, a dynamic evaluation of the top customers might be an expensive operation. This evaluation always takes place on the analysis server, even if the leaf data are stored in the relational database in ROLAP or HOLAP storage mode.

Analysis Services storage modes

Analysis Services supports cube partition data storage in three modes:

MOLAP

Leaf data and all aggregates are stored in the Analysis Services format on the analysis server. During processing, leaf data are pulled from the relational source and stored on the analysis server in the Analysis Services format. Aggregates are computed and stored in the Analysis Services format.

MOLAP storage is the most common Analysis Services storage mode, and provides the best performance at query time.

When there is poor bandwidth between the Teradata Database and the Analysis Services database, the speed of full processing of a MOLAP partition is limited by the rate at which data can be pulled through the OLE DB provider. Care must be taken to design the cube management system so that cube full processing is of sufficiently high performance.

Most MOLAP cubes require 20-30 percent of the storage of the corresponding source data as stored in the relational database (data only, no indexes). Of this storage, approximately half is required by the leaf MOLAP data, and half by the aggregations. Cubes that contain Distinct Count measures have somewhat larger aggregations.

ROLAP

Detailed data and aggregates (if any) are stored in the relational database. As stated earlier, in the Teradata Database / Analysis Services configuration, ROLAP aggregates created by Analysis Services are not supported: ROLAP storage in this case means ROLAP with zero aggregates as defined in the Storage Design Wizard. To ensure ROLAP aggregates are set to zero, run the Storage Design Wizard, and on the Set Aggregation Options pane set performance gain to 0 percent. Aggregates used in this scenario would need to be created through the Teradata Aggregate Join index feature. See the Teradata Database and Client User Documentation for information on implementing Aggregate Join indexes.

Usually, dimensions are stored in the Analysis Services format on the analysis server, which provides an extremely fast dimension browsing user experience. At query time, the Analysis Services engine resolves data requests by identifying which cube cells are requested by the client application, and then issuing one or more queries against the relational database. Any calculations such as calculated members, calculated cells, or custom rollups are performed on the data, and the result cellset is returned to the client application.

If all cube partitions are stored in ROLAP mode, cube dimensions may also be stored in the Teradata Database. This architecture is intellectually appealing and is a supported configuration. However, this configuration is not common.

One of the great appeals of the ROLAP storage mode is low latency data delivery, often called real-time OLAP. This topic is discussed in greater detail later, in the section titled "Designing for low latency OLAP."

HOLAP

Detailed data are stored in the Teradata Database, and aggregates are stored in MOLAP format. During processing, detailed data are pulled from the Teradata Database but not stored; aggregates are computed and stored in the Analysis Services format.

Processing times for HOLAP partitions are nearly the same as for MOLAP partitions with the same aggregate design. Although MOLAP partitions require approximately twice the storage on the analysis server as HOLAP partitions, HOLAP partitions are seldom recommended. If you can design the cube processing application for adequate processing performance, you should reap the benefit of MOLAP storage of the leaf data, at the relatively minor cost of disk space.

Regardless of storage mode, the cube behaves the same in all respects except possibly performance. At query time, most computation occurs in the Analysis Services database engine. If relational data are required to resolve a query on a partition using ROLAP or HOLAP, relatively simple queries are submitted from the Analysis Services engine to the Teradata Database. A MOLAP cube does not access the Teradata Database except during cube partition processing.

Relational design tips

Most of the existing documentation and best practices for building Analysis Services cubes recommends that the relational source be structured dimensionally. A dimensional relational model has two kinds of table structures:

  • Dimensions that have been flattened into the classic single-table star dimension structure, fully normalized as a snowflake dimension, or a hybrid structure somewhere in between. An Analysis Services dimension is built from a dimension table structure. It is also possible for a dimension to be built from a fact table, if the source data model is highly denormalized.

  • Facts for a partition of a cube must be stored in one table or view, and that table must contain data of a single granularity – in other words, do not mix data at a "Daily" and "Monthly" grain within a single fact table. The fact table contains foreign key references to the dimension tables. A cube is built from a fact table and one or more dimensions, as illustrated in the sketch after this list.
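
For illustration only, a minimal star schema of this shape might look like the following; all object names, data types, and index choices are hypothetical and are reused in other sketches in this paper.

    -- Hypothetical dimension table (one row per product).
    CREATE TABLE Dim_Product
        (Product_Id   INTEGER NOT NULL,      -- surrogate key
         Product_Name VARCHAR(50) NOT NULL,
         Category     VARCHAR(30) NOT NULL)
    UNIQUE PRIMARY INDEX (Product_Id);

    -- Hypothetical fact table at a single (transaction-level) grain,
    -- with foreign key references to the dimension tables.
    CREATE TABLE Sales_Fact
        (Date_Id    INTEGER NOT NULL,
         Product_Id INTEGER NOT NULL,
         Store_Id   INTEGER NOT NULL,
         Sales_Amt  DECIMAL(18,2) NOT NULL,
         Sales_Qty  INTEGER NOT NULL)
    PRIMARY INDEX (Date_Id, Product_Id, Store_Id);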

Memory requirements

As described in detail in many resources, notably the Analysis Services Performance Guide, the analysis server must contain enough memory to hold all dimension members. During dimension processing, the server must contain memory to hold the shadow copy of the dimension being processed. The Performance Guide provides information on how to calculate memory requirements, but this is a fundamental characteristic of Analysis Services cubes that strongly influences cube design.

If you have more dimension members and member properties than will fit in memory, you may choose one of three options:

  • Modify the cube design to eliminate the very large dimension.

  • Use relational storage of dimensions (discussed earlier).

  • Use SQL Server 2000 Analysis Services (64-bit), which supports very large dimensions but requires a 64-bit OLE DB provider to connect to the relational database. At the time of this writing, a 64-bit OLE DB Provider for Teradata is not available.

Eliminating the very large dimension sounds like a compromise, but it is often the best solution. Business users seldom want to see individual customer names during analysis. It is often the case that defining more aggregate dimensions, and providing a capability to view detailed data through Drillthrough or Actions, is a better means for meeting business user requirements.

Relational schema requirements

The data model that underlies an Analysis Services database must conform to a simple dimensional structure: a star schema, a snowflake schema, or a single flat table. These models can be mixed. For example, within the same Analysis Services database, some dimensions can be sourced from a classic star dimension table, others from a fully normalized set of "snowflake" tables, and still others teased apart from the fact table's data. Analysis Services does not require integer surrogate keys between the fact and dimension tables, although their use is recommended. For large datasets, and especially for large dimensions, use the smallest possible data type for keys.

When a dimension is processed, Analysis Services verifies referential integrity between dimension levels, ensuring that all child dimension members have valid parents. Similarly, when a cube is processed, Analysis Services verifies referential integrity between facts and all dimensions. No fact is permitted within the cube if it does not have corresponding members in all dimensions.
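
Because of this, it can be worth checking for orphaned fact keys in the Teradata Database before processing a partition. A minimal sketch, using the hypothetical tables from the earlier star schema example:

    -- Count fact rows whose Product_Id has no matching dimension member.
    SELECT COUNT(*)
    FROM   Sales_Fact f
    LEFT OUTER JOIN Dim_Product p
           ON f.Product_Id = p.Product_Id
    WHERE  p.Product_Id IS NULL;

A nonzero count indicates fact rows that will cause errors during partition processing until the missing dimension members are added or the processing error options are set to skip them.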

Use views as cube source

A general best practice is to source the Analysis Services database from relational views rather than directly from tables. This level of indirection provides a layer of insulation between the relational and Analysis Services databases. In the simplest case, define the views as "select * from PhysicalTable".

If the relational source data are not already in a dimensional structure, it may be possible to implement a logical dimensional database by defining views that simplify the relational structure. A common example is to denormalize an Order/Detail table structure into a single Fact_OrdersLineItems view.
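
A sketch of both kinds of views follows. The Orders and Order_Detail tables and their columns are hypothetical; only the Fact_OrdersLineItems name comes from the example above.

    -- Simplest case: a pass-through view over a physical table.
    CREATE VIEW V_Sales_Fact AS
        SELECT * FROM Sales_Fact;

    -- Denormalizing a header/detail pair into a single line-item fact view.
    CREATE VIEW Fact_OrdersLineItems AS
        SELECT o.Order_Date,
               o.Customer_Id,
               d.Product_Id,
               d.Quantity,
               d.Extended_Price
        FROM   Orders o
        INNER JOIN Order_Detail d
               ON o.Order_Id = d.Order_Id;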

It is recommended that you go through the process of developing the business requirements of the analytical users and designing a logical dimensional model. This logical dimensional model maps very closely to the Analysis Services dimensions and cubes that you build. Whether you restate a normalized data warehouse in order to physically instantiate the dimensional structure in the Teradata Database depends on:

  • Dimension change management requirements. Many business users need to track the history of changes in dimension structure or attributes. It would be very difficult to provide this functionality, known as "Type 2 slowly changing dimensions," without physically creating a dimensional relational model.

  • Relational query performance, both for the cube population queries that load MOLAP partitions and for business user queries against ROLAP partitions.

Minimize dimension key size

It is a best practice for a large Analysis Services database to use a small data type for the dimension ID. Recall that all dimensions must fit in memory – a small key minimizes dimension size, reducing memory requirements and improving performance. For MOLAP cubes, the leaf fact multidimensional structure uses the dimension ID as its key. By using small keys, you minimize disk storage requirements of the ROLAP or MOLAP fact structure and improve performance.

A general dimensional warehouse best practice is to use integer surrogate keys for dimensions. Most of the effort of populating and maintaining a dimensional data warehouse surrounds issues of evaluating when a dimension member has "changed," creating a new dimension member and surrogate key, and propagating that surrogate key into the fact tables. Teradata data warehouses often are not dimensional, so surrogate keys may not be available. If a surrogate key is available, specify it as the dimension ID in the definition of the Analysis Services dimension.

Optimize the schema for MOLAP storage

Many Analysis Services best practices and whitepapers discuss the importance of "optimizing the schema." This process defines the cube in such a way that the system generates simplified relational queries for cube processing and ROLAP queries. In most systems this step provides a substantial improvement in MOLAP processing performance. For cubes built on a Teradata Database, the use of Teradata Join or Aggregate Join indexes may make this step less important. Nonetheless, the cost of dropping several dimension tables from the relational queries is often zero, and this step is recommended for cubes that use MOLAP storage. Detailed instructions are provided in the Books Online topic Optimizing Cube Schemas.
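
To make the effect of this optimization concrete, the two query shapes differ roughly as follows: with the default (non-optimized) definition, the processing query joins the fact table to each dimension table to obtain the member keys, while with an optimized schema the keys are read directly from the fact table. This is an illustrative sketch only, using the hypothetical tables from earlier (Dim_Date and Dim_Store are additional hypothetical dimension tables).

    -- Default, non-optimized schema: dimension tables are joined in.
    SELECT d.Date_Id, p.Product_Id, s.Store_Id,
           f.Sales_Amt, f.Sales_Qty
    FROM   Sales_Fact f
    INNER JOIN Dim_Date    d ON f.Date_Id    = d.Date_Id
    INNER JOIN Dim_Product p ON f.Product_Id = p.Product_Id
    INNER JOIN Dim_Store   s ON f.Store_Id   = s.Store_Id;

    -- Optimized schema: member keys are read straight from the fact table.
    SELECT f.Date_Id, f.Product_Id, f.Store_Id,
           f.Sales_Amt, f.Sales_Qty
    FROM   Sales_Fact f;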

If Teradata Aggregate Join indexes are deployed, do not optimize the schema in this way. The Teradata Database query optimizer prefers the query syntax generated by the default, non-optimized, cube definition.

Parent-child dimensions and ROLAP storage

Analysis Services supports standard dimensions, which have a fixed and known number of named levels, and parent-child dimensions, which are less structured. Parent-child dimensions are not recommended if any of your partitions are stored in the relational database.

Indexing the Teradata Database

Some Analysis Services cubes are built at the same level of granularity as the source Teradata Database; other cubes are built at a higher level of granularity. For example, the relational data may contain transactions that are time stamped during the day, but the cube might not include a time-of-day dimension. In this case, building an Aggregate Join index at the same level of granularity as the cube significantly improves performance for user queries against a ROLAP store.

If the cube uses ROLAP storage, significant performance improvements may result from adding one or more Aggregate Join indexes.
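
For example, if the fact table holds transaction-grain rows but the cube's lowest levels are Day, Product, and Store, an Aggregate Join index built at the cube's grain lets the Teradata optimizer answer most ROLAP queries from the pre-aggregated structure. A minimal sketch, again using the hypothetical Sales_Fact table; the index name and column list are illustrative only.

    -- Aggregate Join index summarizing the fact table to the cube's grain.
    CREATE JOIN INDEX Sales_Fact_Daily_AJI AS
        SELECT Date_Id, Product_Id, Store_Id,
               SUM(Sales_Amt) AS Sales_Amt,
               SUM(Sales_Qty) AS Sales_Qty
        FROM   Sales_Fact
        GROUP BY Date_Id, Product_Id, Store_Id
    PRIMARY INDEX (Date_Id, Product_Id, Store_Id);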

The best recommendations for which Aggregate Join indexes to build will come from an analysis of users' queries. This analysis can be accomplished through the Database Query Logging feature (in Teradata Database V2R5 or later) or the Access Logging feature (in earlier versions of the Teradata Database).

The addition of an Aggregate Join index to a relational schema can affect the overall maintenance strategy; Aggregate Join indexes potentially impose a cost on the data management infrastructure. To optimize user query performance, the choices are ROLAP with Aggregate Join indexes or MOLAP storage mode. The HOLAP storage mode is supported, but it is seldom chosen over MOLAP.

For the Teradata Database optimizer to do its job well, it is recommended that statistics be collected on all tables after the initial load. It is also recommended that statistics be recollected periodically, if subsequent data changes modify the table data demographics significantly.
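
For example, statistics might be collected as follows after the initial load; the table, column, and index choices are illustrative only and should be driven by the actual query workload.

    -- Collect statistics on join and constraint columns used by cube
    -- processing and ROLAP queries (hypothetical objects from earlier sketches).
    COLLECT STATISTICS ON Sales_Fact INDEX (Date_Id, Product_Id, Store_Id);
    COLLECT STATISTICS ON Sales_Fact COLUMN (Date_Id);
    COLLECT STATISTICS ON Dim_Product COLUMN (Product_Id);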

Virtual cubes

An Analysis Services OLAP database typically contains several physical cubes, which are often combined into one or more virtual cubes. To a user querying the Analysis Services database, a virtual cube looks and behaves no differently than a physical cube. Virtual cubes are a very powerful analysis design construct used by experienced designers to:

  • Provide multiple user experiences to different user groups.

  • Simplify a complex physical structure.

  • Enable users to construct queries that combine data from multiple fact tables.

Designing for low latency OLAP

Relational storage of one or more cube partitions enables low latency or "real time" OLAP. An OLAP application with low latency means that the analytical data are available to business users very soon after a transaction occurs. For the Analysis Services application to have low latency, upstream processes must populate the Teradata data warehouse with low latency.

Alternative low latency designs

Very low latency can be delivered if the cube includes at least one ROLAP partition. Many customers store most of the cube in MOLAP partitions, and have a single ROLAP partition, with zero aggregates, to hold low latency data. For example, the ROLAP partition could hold data for "today" or "this week." In this configuration, a nightly or weekly process could convert the ROLAP partition into MOLAP. All the cube's partitions can be stored as ROLAP partitions if desired.
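
One way to express the low latency slice is a view over the fact table restricted to the current period; the periodic process then moves those rows into a MOLAP partition and reprocesses the small ROLAP partition. A sketch under the same hypothetical schema, assuming the fact table carries a DATE column named Sale_Date:

    -- Source view for the low latency ROLAP partition: today's rows only.
    CREATE VIEW Sales_Fact_Today AS
        SELECT *
        FROM   Sales_Fact
        WHERE  Sale_Date = CURRENT_DATE;

The rest of the cube's partitions remain in MOLAP, so only queries that touch the current day reach the Teradata Database at query time.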

There are more options, and challenges, for managing dimension changes. In the most common configuration, dimensions are stored in the multidimensional store. New transactions for existing dimension members (for example, existing customers) are available for analysis with low latency. Periodically (nightly or more frequently), dimension incremental processing makes the new dimension members available for analysis.

Customers who need low latency access to dimension changes must use relational storage of dimensions, which in turn requires relational storage for all cube partitions. It is recommended that you use relational dimension storage only for dynamic dimensions, and use the normal multidimensional storage for static dimensions.

Other technical materials describe a popular technique for low latency, in which the cube is incrementally processed on an automated schedule, such as every 5 to 15 minutes.

Flushing the cache

Recall that Analysis Services will attempt to use local and server caches to resolve queries. In a low latency application, the caches introduce a risk of query result inconsistency. Imagine that two queries are executed simultaneously, one at an aggregate level and the second at a detailed level. In this example, imagine that the aggregate query is answered from the local or server cache, but the detailed query must be resolved from the RDBMS. Any new transactions will be picked up by the detailed query, and an aggregation of the detailed data may be inconsistent with the aggregate query resolved from cache. There is no way to turn off caching; the caches must be flushed on a frequent schedule.

As discussed above in the section "Analysis Services features not supported by Microsoft and Teradata," automated real-time OLAP is not a supported feature for the Teradata Database. This feature is unique to cubes built from the SQL Server 2000 relational database, and relies on communication between the relational and analysis servers to flush the cache. It is not difficult to flush the cache; a VBScript script, CacheFlusher.vbs, is available as Appendix C of the Analysis Services Performance Guide. CacheFlusher.vbs must run on the analysis server. Flushing the server cache automatically sends a message to client-side caches to clear their contents. CacheFlusher.vbs could be scheduled on the analysis server to run every N seconds. Alternatively, you could write an application that monitors the Teradata data warehouse and triggers the VBScript when rows have been inserted or updated.

Query performance always degrades in a low latency environment, because frequent flushing makes the caches effectively useless.

Improving MOLAP processing performance

The Analysis Services Performance Guide provides many suggestions for how to improve MOLAP cube processing performance. In general, all those recommendations hold true for cubes built from Teradata databases.

The most notable difference for Teradata MOLAP implementations is the importance of parallel processing, and the recommended degree of parallelism. As discussed earlier in this paper, the bottleneck in MOLAP partition processing is most likely to occur in the communication between the Teradata Database and Analysis Services. When this communication channel is constrained, it makes sense to process more partitions in parallel than is generally recommended in other Analysis Services technical documents.

You should always test parallel processing in your own environment to find the best performance. In the usual case, Analysis Services fact and aggregate processing is the gating factor, and most technical documents recommend that you devote two CPUs to each partition. In the case of constrained bandwidth, you can process more partitions in parallel, and one or more partitions per CPU can lead to improved performance.

In a simple lab test on a 4x700 MHz processor server, parallel processing performance was tested for 10 partitions, each with 160,000 rows (1.6 million rows total) and 30 MOLAP aggregates, sourced from a two-node Teradata Database. In this application, on this analysis server, processing 4 partitions in parallel was the best choice, providing performance 3 times better than serial processing. The fastest processing rate obtained on the system was 300,000 rows per minute (including reading, writing leaf facts, and computing and writing aggregates). The system was constrained by the communication channel.

MOLAP processing performance is of greatest concern for the occasional full refresh of the cube. A parallel processing rate of 300,000 rows / minute (including reading, writing leaf facts, and computing and writing aggregates) is an acceptable daily or weekly incremental processing rate for many Business Intelligence (BI) applications. The pre-update image of the cube is available for querying during partition processing.

MOLAP storage provides the most reliable query performance. Analysis Services MOLAP cubes built on a Teradata Database should process partitions in parallel. For more information see the Analysis Services Performance Guide, and the Parallel Processing Utility available from the SQL Server 2000 Resource Kit.

Summary of Design Considerations

The greatest technical challenge in implementing an Analysis Services database on top of a Teradata Database is deciding how and where to store the data. The three most useful options are summarized here.

ROLAP, no Analysis Services aggregates, using Aggregate Join indexes: This configuration provides excellent query performance on most user queries, and is most appropriate for low latency applications. The existence of Aggregate Join indexes on a table requires that the data loading scenario be carefully designed.

ROLAP, no Analysis Services aggregates, no Aggregate Join indexes: This configuration provides good query performance on most user queries, and is most appropriate for very low latency applications.

MOLAP: The MOLAP configuration provides the most consistent high performance storage mode for user queries. However, in situations with constrained communication bandwidth, the time required to process a MOLAP cube may be unacceptable. This is particularly true if the cube requires full reprocessing after it has been brought into production. MOLAP storage is recommended when 1) cube incremental processing times meet system requirements, and 2) reprocessing scenarios can be accommodated. A dimensional change that invalidates the cube structure, in a system with constrained communication bandwidth, requires time-consuming reprocessing of MOLAP partitions. The same dimensional change also triggers reprocessing of ROLAP partitions, but because these partitions in the Teradata Database are implemented with zero aggregations, the processing occurs quickly. An in-depth discussion of the kinds of dimensional changes that trigger reprocessing is provided in the Analysis Services Performance Guide.

The HOLAP configuration, which keeps detailed data in the Teradata Database and aggregations in the Analysis Services multidimensional structure, is seldom preferred to MOLAP.

Summary

Analysis Services and the Teradata Database are very complementary for creating world-class analytic environments. The products provide unprecedented flexibility in how the solution is implemented so that with a proper understanding of user requirements, implementers can build the best solution in the most appropriate way.

Appendix A: Resources

Microsoft SQL Server 2000 Books Online

Microsoft SQL Server 2000 Resource Kit

Analysis Services Performance Guide whitepaper

Microsoft Developer Network Library

The Teradata site

Teradata Database and Client User Documentation, available at https://www.info.ncr.com.