Export (0) Print
Expand All

Accelerating Microsoft adCenter with Microsoft SQL Server 2008 Analysis Services

How Microsoft adCenter takes advantage of groundbreaking performance and scalability enhancements and innovative design tools available with Microsoft SQL Server 2008 Analysis Services

Technical White Paper

Published: August 2009

Download

Download Technical White Paper, 1.23 MB, Microsoft Word file

Situation

Solution

Benefits

Products & Technologies

With data volumes exceeding two terabytes per month and increasing numbers of data mart users, adCenter experienced performance issues with its existing SQL Server Analysis Services 2005 infrastructure. Query responsiveness and throughput did not meet SLAs during peak usage. It proved difficult to increase scalability beyond established levels.

SQL Server Analysis Services 2008 provides adCenter with next-generation OLAP technology that offers increased performance and scalability on existing hardware, without infrastructure changes.

  • Faster MDX query execution and higher code quality for analytics solutions.
  • Better cube designs through Analysis Management Objects (AMO) warnings in Business Intelligence Development Studio (BIDS) 2008.
  • Optimized cube processing that doubles performance.
  • Increased return on investment on server hardware by enabling more users per server.
  • Increased infrastructure flexibility to scale out query servers.
  • New maintenance and monitoring tools for faster root cause discovery.
  • Readiness for future relational data-warehousing technologies, such as SQL Server code-named Project "Madison."
  • SQL Server 2008 Enterprise Edition
  • SQL Server 2008 Analysis Services
  • SQL Server 2008 Integration Services
  • Business Intelligence Development Studio
  • Storage Area Networks
  • Direct Attached Storage

Contents

Ee410017.arrow_px_down(en-us,TechNet.10).gif Executive Summary

Ee410017.arrow_px_down(en-us,TechNet.10).gif Introduction

Ee410017.arrow_px_down(en-us,TechNet.10).gif Upgrade Goals and Benefits

Ee410017.arrow_px_down(en-us,TechNet.10).gif OLAP Infrastructure Design

Ee410017.arrow_px_down(en-us,TechNet.10).gif Test Lab Integration

Ee410017.arrow_px_down(en-us,TechNet.10).gif Cube Processing Optimization

Ee410017.arrow_px_down(en-us,TechNet.10).gif Query Processing Optimization

Ee410017.arrow_px_down(en-us,TechNet.10).gif Conclusion

Ee410017.arrow_px_down(en-us,TechNet.10).gif For More Information

Executive Summary

As an Internet platform for online advertising campaigns on Microsoft® Bing™, Microsoft adCenter offers customers a wide range of intelligence and analytics tools to target, reach, and engage the right audience with the right ad at the right time. A complex, multi-faceted Online Analytical Processing (OLAP) environment based on SQL Server® 2008 Analysis Services drives the adCenter intelligence solutions. This large-scale environment includes numerous data marts that consolidate information from various sources through SQL Server 2008 Integration Services. The adCenter data marts, based on SQL Server 2008 Enterprise Edition, maintain up to two terabytes of data in relational databases, equivalent to one month of Web activities at a current accumulation rate of approximately 60 to 70 gigabytes per day. The OLAP cubes built on top of these relational stores hold 13 months of data and are up to three terabytes in size.

Prior to SQL Server 2008, adCenter relied on Microsoft SQL Server 2005 as the technological foundation of its intelligence and analytics infrastructure. Among other benefits, SQL Server 2005 enabled adCenter to take full advantage of the 64-bit platform, the Unified Dimensional Model (UDM), and Multidimensional Expressions (MDX). However, it was difficult to sustain processing and query execution performance levels during peak usage as scalability became a limiting factor. At 500 users per cube, adCenter began to notice performance issues, challenging a continually growing user base. By upgrading to SQL Server 2008, adCenter was able to break through existing limitations without the need to replace server hardware or redesign infrastructure. The performance and scalability improvements are vital for keeping adCenter agile, competitive, and ready for further growth.

In comparison to its predecessor, SQL Server 2008 offers better hardware utilization, optimized cube processing, and substantially faster MDX query execution, and it promotes efficient UDM designs and code quality in analytics solutions. These are key factors that enable adCenter to increase its return on investment (ROI) in server hardware by allowing more users per server, while at the same time improving user experience and service level agreements (SLAs). For the most critical data marts, adCenter SLAs now state that 80 percent of the queries must return results in 3 seconds or less. Moreover, adCenter gained new options to track service level compliance and establish baselines for Analysis Services capacity planning. By exploiting the new monitoring capabilities in addition to Analysis Services traces, such as Management Data Warehouse (MDW) and new performance counters for Analysis Services-related system resources, adCenter can locate root causes of performance issues faster than with traces alone. This ultimately increases the productivity of OLAP developers when troubleshooting performance issues and optimizing analytics solutions.

The upgrade to SQL Server 2008 also allows adCenter to prepare for future OLAP infrastructure optimizations and relational data-warehousing technologies, such as SQL Server code-named Project "Madison." The adCenter Data Mart team has developed concrete plans to assess a load-balanced scale-out data warehouse appliance for the largest data marts and data warehouses. adCenter also plans to increase the size of the relational data stores to a massive volume of more than 150 terabytes as soon as Madison becomes available. This will allow adCenter to maintain between five and six months of Web activity data online, assuming an increase in data collection rates of one terabyte per day. For adCenter, the upgrade to SQL Server 2008 represents an important step toward the future.

The purpose of this technical white paper is to share Microsoft knowledge, experiences, and recommendations related to the architecture and design of SQL Server 2008 Analysis Services for high performance and scalability in a large-scale OLAP environment. This paper is not intended to serve as a procedural guide. Although many organizations have similar requirements, each enterprise environment also has unique needs, making it necessary to adapt the information discussed in this paper.

This white paper assumes that readers are IT professionals and technical decision makers, already familiar with Windows Server®, Active Directory®, and SQL Server. Specifically, knowledge about SQL Server 2008 Analysis Services, Integration Services, and Business Intelligence Development Studio is helpful. Detailed product information is available in the SQL Server 2008 TechCenter at http://technet.microsoft.com/en-us/sqlserver/default.aspx.

Note: For security reasons, the sample names of forests, domains, servers, databases, and other internal resources used in this paper do not represent real resource names used within Microsoft and are for illustration purposes only.

Introduction

Like most large-scale enterprise environments, Microsoft adCenter relies on dedicated teams of experts that specialize in designing, implementing, and operating its OLAP infrastructure and solutions. There is an OLAP Development Team responsible for all enterprise OLAP-based data marts and key analytics services within adCenter; the Engineering Team creates business data models and analytics solutions; and the Operations Team deploys and runs the data services, server hardware, and storage systems. While preparing to upgrade to Analysis Services 2008, the adCenter teams established close relationships with other Microsoft teams, such as the SQL Server Customer Advisory Team (SQLCAT) and the SQL Performance Team, to solve complicated design and performance issues. The adCenter upgrade was a significant undertaking for all participants. It enabled the adCenter teams to reach higher service levels in the OLAP environment. The SQLCAT team was able to develop best practices and document lessons learned as part of its Project REAL | Practices effort (http://www.projectrealpractices.com). SQL Performance quantified the benefits of upgrading to SQL Server 2008 and drove high product quality through extended performance research.

The initial project preparation began in early 2008 with an assessment of the existing OLAP infrastructure to justify the upgrade plans from a business perspective, determine improvement opportunities, and define project goals. The assessment focused on general product features and improvement potential without conducting specific performance tests.

Performance research commenced with the release of SQL Server 2008 to manufacturing on August 6, 2008. Throughout the third and most of the fourth quarter of 2008, the OLAP Development Team, SQLCAT, and the SQL Performance Team collaborated to conduct performance tests and optimizations based on authentic adCenter data in various storage area network (SAN)-based and direct attached storage (DAS)-based lab configurations. As part of this effort, the SQL Performance Team devised a solution to mask the adCenter data, replacing actual customer information with fictitious names and values while maintaining the data's high level of complexity and depth to the schema. The masked adCenter data, unlike synthetic data, provides a very relevant basis for meaningful large-scale performance tests.

The OLAP Development Team performed the final adCenter upgrade on November 30, 2008. It was a straightforward switchover because Analysis Services 2008 is fully backward compatible with existing analytics solutions and because the OLAP Development Team opted to postpone advanced infrastructure optimizations until after the successful completion of the upgrade project. From the beginning, the Analysis Services 2008-based environment delivered stable performance levels that corroborated the research results determined during the preparation phase.

The successful completion of the adCenter upgrade also marks an important milestone in the relationship between the OLAP Development Team, SQLCAT, and the SQL Performance Team. The teams continue their collaboration to maintain adCenter's performance targets ahead of hardware upgrades and infrastructure optimizations, and to share their findings and experiences with the customer community as part of Project REAL | Practices. Project REAL | Practices is a cooperative effort between Microsoft and business intelligence (BI) partners to develop best practices for creating BI applications based on SQL Server 2008. The Project REAL | Practices workload is based on the adCenter Application BI set of Analysis Services cubes, data warehouses, and extract, transform, and load (ETL) pipelines.

"Upgrading to SQL Server 2008 Analysis Services is like taking the driver's seat in a race car. Out of the box, we noticed a 27.5 percent performance increase during cube processing, and with optimizations, we were able to gain another 31.2 percent, cutting the time required for cube processing effectively in half. Our cubes build 26 hours faster and most of our queries now return with results in less than 3 seconds, some 200 times faster than before. It's exhilarating."

Bilal Obeidat

Senior Development Lead

Microsoft Corporation

Upgrade Goals and Benefits

According to Bilal Obeidat, Lead of the OLAP Development Team, optimizing processing and query execution performance is a continuous effort at adCenter to ensure cube usability and a positive user experience overall. Achieving high performance requires addressing a combination of factors including MDX query efficiency, aggregation scheme effectiveness, appropriate cube design and partitioning, and high performance and scalability in the underlying Analysis Services platform. In this effort, the OLAP Development Team tunes and optimizes its analytics solutions prior to production deployment, and traces Analysis Services activities in the production environment on a daily basis. The team keeps track of discovered issues and the corresponding solution steps in an issue tracking and support system and communicates findings related to the underlying Analysis Services platform back to the SQL Server developer units.

By reviewing the existing OLAP infrastructure, including patterns of past efficiency and performance issues, the OLAP Development Team determined the following key improvement opportunities and requirements for the upgrade to Analysis Services 2008:

  • Seamless transition to ensure customer satisfaction adCenter customers work with a diverse portfolio of OLAP cube client solutions based on Microsoft Excel® 2007, SQL Server 2008 Reporting Services, and Microsoft Office SharePoint® Server 2007. It is vital that adCenter remains fully compatible with these existing solutions while delivering service improvements in terms of cube processing and query execution performance.
  • More users per server With 500 users per cube, adCenter reached the scalability ceiling in the existing OLAP environment. By exploiting the performance gains achievable with an upgrade to Analysis Services 2008, adCenter can increase the number of users per server for better hardware utilization, which provides adCenter with new options to maximize ROI.
  • Optimized cube processing With adCenter cube sizes between 1.5 and 3 terabytes, full cube processing easily exceeds 50 hours with SQL Server 2005 (54 hours for an adCenter cube of 1.5 terabytes on a lab server running Analysis Services 2005). Taking into consideration that adCenter plans to increase capacities to 150 terabytes and that future accumulation rates might exceed one terabyte of data per day, it is clear that adCenter requires much faster cube loads.
  • Faster query execution Analysis Services 2008 provides tremendous query performance improvements in the vast majority of the MDX functions and the formula engine through improvements in subspace computation, which directly benefits quality of service and user friendliness in analytics solutions. Subspace computation improvements open new tuning and optimization possibilities for adCenter to reduce query execution times from seconds to milliseconds.
  • Better code quality During tuning and optimization, adCenter repeatedly noticed certain avoidable performance-related issues, such as suboptimal use of attribute relationships and hierarchies. Considering that manual optimization of analytics solutions is time-consuming and tedious (especially when dealing with recurring issues), adCenter welcomed Business Intelligence Development Studio (BIDS) 2008 features that help BI developers adhere to solution designs that follow best practices and performance guidelines, such as Analysis Management Objects (AMO) warnings. The upgrade to Analysis Services 2008 enables adCenter to standardize BI development for all new solution projects on BIDS 2008.
  • Enhanced monitoring capabilities In the past, adCenter primarily relied on server-side traces created in SQL Server Profiler to track Analysis Services activities and locate performance issues, but server-side traces do not necessarily reveal performance issues that correlate well with the overall consumption of underlying hardware and operating system resources. By using new Analysis Services 2008 performance counters, adCenter can extend system monitoring activities to locate and eliminate performance bottlenecks faster and with greater precision, such as by tracking idle and busy threads, current client connections and clients waiting for a lock, total number of queries answered, data rows processed per second during cube processing, and so forth.
  • Improved Analysis Services database backups Although Analysis Services 2005 eliminated the 2-gigabyte limitation for database backups, adCenter found that it was difficult to backup large databases because the backup time increased exponentially with the database size. Backup times now increase only linearly because of an optimized storage subsystem in Analysis Services 2008. In view of future growth plans, the backup improvements are critical for adCenter.

OLAP Infrastructure Design

"The performance tuning techniques are practically identical between Analysis Services 2005 and Analysis Services 2008. Use 64-bit hardware, provide sufficient memory and processor resources, and design the storage subsystem for high I/O performance and capacity. Engineers don't have to learn new tricks to optimize server and storage performance."

Bong Kang

Infrastructure Engineering Lead

Microsoft Corporation

The adCenter OLAP infrastructure, established during the Analysis Services 2005 timeframe, consists of various data marts that consolidate subsets of adCenter data for specific intelligence and analytics purposes. For example, the Coverage data mart, one of the largest adCenter data marts, includes the Hourly Impression Coverage cube and the Hourly Search cube to analyze current and historical Web activities by distribution channel, distribution medium, calendar, and fiscal dates. Both of these cubes are multiple terabytes in size. Other important data marts allow adCenter to manage advertising campaigns, analyze Web site statistics and conversion rates, track SLA compliance, and more.

Figure 1 shows the data-mart design in the adCenter production environment. Senior Systems Engineer Mike Anderson and Infrastructure Engineering Lead Bong Kang created this design based on separate OLAP servers, Relational Database Management System (RDBMS) servers, and SQL Server Integration Services (SSIS) data feeds; all systems eliminate single points of failure and ensure high availability. High availability is crucial in all areas of the production environment because adCenter processes large amounts of data on a continuous basis. Downtime could quickly lead to massive workload backlog. The RDBMS and OLAP servers rely on SAN-based failover cluster configurations and the SSIS farm uses hardware load balancing as its high-availability solution. The load-balancing cluster also provides the necessary scalability to support adCenter's large number of data marts.

The adCenter data mart design

Figure 1. The adCenter data mart design

The adCenter OLAP infrastructure relies on the following server systems and processes:

  • Server farm for SSIS-based data feeds adCenter imports, cleanses, validates, and combines the data from various sources through ETL processes on an hourly and daily basis. Every hour, an SSIS package loads new facts data into the delta and auditing tables of a relational staging database. Other packages load dimension data from different sources at different intervals, while auto-discovery processes derive additional dimension data from facts data, such as dynamically generated Web activity categories. Having completed the hourly processes for a given calendar day, adCenter moves the data from the staging database into facts and dimension databases by means of a separate daily ETL job, which also purges the staging data after each daily run to maintain a lightweight staging database.
  • RDBMS Server The RDBMS server maintains the staging database and the relational facts and dimension databases. While the staging database acts as an intermediate repository for the ETL pipeline, the facts and dimension databases represent the relational data sources of the corresponding data marts. Each OLAP cube in a data mart has its own set of facts and dimension databases for better performance management. Keeping the facts and dimension data in separate databases also facilitates maintenance and backups. At the present time, adCenter maintains one month of activity data in facts databases. The dimension database holds the complete set of dimension data.
  • OLAP Servers The OLAP servers maintain the Analysis Services databases and cubes. The OLAP processing server also handles the user queries. The standby server facilitates cube processing, maintenance, and performance tuning of new solutions prior to release. At night, after completion of the daily ETL job, adCenter runs an XML for Analysis (XMLA)-based script on each OLAP server to add new dimension and facts data to the cubes. Every three months, adCenter fully reprocesses the indexes on the OLAP cubes to update the dimensions, indexes, and aggregations. The process can take several days to complete depending on cube size. adCenter processes the cubes on the standby server. When all standby-server processing has completed, all users are migrated to the standby server. Post-migration jobs then process cubes on the OLAP processing server.

Minimizing Upgrade Downtime

The standby server also played in important role during the adCenter upgrade. By first upgrading the standby server, verifying its operational state, and then switching the users over prior to upgrading the processing server, adCenter was able to keep downtime at a minimum. The rolling standby upgrade allows a robust rollback strategy, thus safeguarding data and operational SLAs.

OLAP Server and Storage Configuration

"The key to achieving superior performance with Analysis Services 2008 in large-scale environments is finding the right balance between processor power, memory capacity, and storage performance. Inadequate processor resources cause thread contention. Insufficient memory capacity causes excessive paging and premature evictions from the data and calculation caches with a subsequent increase in data reads from disk. Slow and high latency storage systems cause disk queuing and other I/O bottlenecks."

Mike Anderson

Senior Systems Engineer

Microsoft Corporation

The hardware configuration of the OLAP servers is identical and includes 16 processor cores, 64 gigabytes of memory, redundant host bus adaptors (HBA) running Microsoft Multipath I/O (MPIO). Each OLAP server is allocated separate, identical SAN-based storage systems.

Although the processing server and the standby server maintain the same cube data, the servers perform cube processing separately. As a legacy from Analysis Services 2005, adCenter does not use Analysis Services synchronization in the production environment because the time required for synchronizing terabytes of OLAP data and metadata far exceeded the cube processing time. It was more efficient to load the cubes on each server individually. Analysis Services 2008 exhibits much better synchronization behavior. For this reason, adCenter is increasingly using Analysis Services synchronization as the preferred method to copy OLAP data and metadata on developer systems.

Table 1 summarizes the server and storage configurations in the adCenter production environment.

Table 1. adCenter Server and Storage Configurations

Server

Processor Cores

Memory

Storage Configuration

Comments

OLAP processing server

8 Intel Xeon dual-core processors

64 GB

RAID 10

The RAID drives include 180 physical 300GB 10K RPM disks for a raw capacity of approximately 26 terabytes and a random I/O performance of 9,000 reads and 4,500 writes per second.

OLAP standby server

8 Intel Xeon dual-core processors

64 GB

RAID 10

Identical to the OLAP processing server.

RDBMS server

8 Intel Xeon dual-core processors

32 GB

RAID 6

The relational data is spread over 32 physical146GB 10K RPM disks for a raw capacity of approximately 4 terabytes.

With 16 processor cores, adCenter OLAP servers can run a sufficient number of query and processing threads in parallel to satisfy queries from multiple users. The storage system must also be able to sustain the resulting input/output (I/O) load. If the data and calculation caches do not contain the data, Analysis Services allocates storage threads to retrieve the data from disk. At the present time, the resulting I/O load reaches up to 6,500 I/O operations per second (IOPS) during peak hours. Figure 2 shows a weekly report for total IOPS performed on an OLAP processing server per day and hour (each line represents a different day).

Total IOPS on adCenter OLAP servers per day and hour

Figure 2. Total IOPS on adCenter OLAP servers per day and hour (April 2009)

The current SAN-based RAID 10 Volume supports 18,000 random IOPS (100 IOPS per 10K RPM disk * 180 = 18,000 IOPS total). Assuming a conservative read/write mix of 1:1, this amounts to 9,000 reads and 4,500 writes per second (each write request requires two write operations to mirror the data). Considering that write requests occur mainly at nighttime according to the adCenter schedule for cube processing, read requests represent almost 100 percent of the I/O load during peak hours. This implies a theoretical maximum read performance of 18,000 reads per second, which is ample I/O headroom in the current SAN configuration. If necessary, adCenter can increase performance by adding more fibre adapters to each server, adding more fibre ports to the array, and adding more physical disks to the volume.

Dedicated Query Servers

In the Analysis Services 2005 environment, adCenter did not take advantage of dedicated query servers in a load-balanced configuration due to high storage costs and administrative overhead. Although load-balanced query servers can help to ensure high availability if a query server fails, adCenter determined that its OLAP design already provided a sufficient level of redundancy to ensure 99.99 percent availability. The SAN-based storage subsystem can sustain multiple component failures without causing a cluster node failure, and the server cluster can sustain a single server failure. The MSCS failover process is rapid, completing within a brief window. In the unlikely event that an entire OLAP server cluster fails, adCenter can still switch the users over to the standby system. The solution is focused on scaling up instead of scaling out and emphasized good aggregation and partitioning design to increase Analysis Services performance.

Test Lab Integration

"We performed our performance research on a 1.5-terabyte adCenter cube with1,928 partitions and a schema, skew, and inner relationships representative of our largest enterprise customers. This workload enabled us to collect very relevant key performance metrics. Going forward, we will use these metrics as a baseline for continued performance research to drive product quality at every major SQL Server Analysis Services milestone."

Tim Shea

Senior Development Lead

Microsoft Corporation

At adCenter, business justification is an integral part of project preparation. Justification entails quantifying the benefits of the proposed undertaking, such as upgrading to Analysis Services 2008. As part of the business justification effort, the OLAP Development Team contacted Senior Program Manager Denny Lee in the SQLCAT Best Practices Team for assistance. Prior to joining the SQLCAT team Denny Lee spent four years on the adCenter OLAP Development Team so he knew the adCenter environment very well. Recognizing the significance of the adCenter upgrade as a showcase for other large-scale OLAP environments and acting as adCenter liaison, Denny turned to Senior Development Lead Tim Shea of the SQL Performance Team for help in devising a performance research strategy that would deliver meaningful results not only for adCenter but also for other OLAP customers.

Together, the OLAP Development, SQLCAT, and SQL Performance teams developed the following approach to conduct Analysis Services 2008 performance research:

  1. The SQL Performance Team deployed a DAS-based lab environment on commodity server hardware while the SQLCAT team used a separate SAN-based lab environment on high-end server hardware to conduct performance research (see Figure 4 below).
  2. The SQL Performance Team developed a solution to replicate and mask 240 days of adCenter production data, pertaining to the Hourly Impression Coverage cube from the Coverage data mart. At the time, this was equivalent to 6.9 terabytes (39,743,726,688 rows) of facts data, uncompressed. The SQL Performance Team applied page compression to the facts table to reduce this data volume to 2.5 terabytes on disk. It was important to test database compression on an authentic, large-scale data warehouse because compression is a great, new feature for customers and a compelling reason to upgrade to SQL Server 2008, as the overall gains found in performance, the drop in system price, and the corresponding drop in price-for-performance, on the data warehouse, can be significant. Moreover, database compression facilitates the use of a modern, high-performing storage technology, such as solid-state flash disks.
  3. The SQLCAT team copied the masked data from the SQL Performance lab to the SQLCAT lab environment. SQLCAT decided not to use compression initially in its SAN-based environment to arrive at a configuration that resembles current customer large-scale OLAP environments as closely as possible, and so they could subsequently evaluate the impacts of compression in highly concurrent, large-volume data warehousing environments. According to Denny Lee, data compression can help to lower the I/O load and capacity requirements, and the CPU overhead is relatively minor in comparison to the I/O capacity and performance gains so that most SQLCAT data-warehouse workloads now use page compression.
  4. The OLAP Development Team supplied a set of actual MDX queries that adCenter analysts used to create daily and weekly reports. The SQLCAT team provided guidance regarding usage scenarios and the SQL Performance Team used the queries to develop corresponding MDX query templates for the ASQueryGen tool to generate 100 MDX query instances for 100 users to test a variety of performance scenarios.

    ASQueryGen Source Code and Query Template

    The ASQueryGen source code and a sample query template are available for download as part of the Microsoft SQL Server Community Samples for Analysis Services at http://sqlsrvanalysissrvcs.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=22769.

  5. The SQL Performance Team and the SQLCAT team conducted performance research and communicated their findings to the OLAP Development Team. The OLAP Development Team verified the results on its standby servers and fine-tuned the adCenter systems as necessary prior to upgrading the production environment.

SQL Performance and SQLCAT test lab integration

Figure 3. SQL Performance and SQLCAT test lab integration

Table 2 summarizes the lab server and storage configurations depicted in Figure 3.

Table 2. SQL Performance Team and SQLCAT Lab Configurations

Server

Processor Cores

Memory

Storage Configuration

Comments

SQLPerf LAB DW

4 Intel Tigerton quad-core processors

32 GB

RAID 6

The relational data is spread over 6 disk arrays using SQL Server file groups. Each disk array contains 8 data disks and 2 parity disks in a RAID 6 configuration. All disks are 146GB 15K RPM drives.

SQLPerf OLAP server

4 Intel Tigerton quad-core processors

64 GB

RAID 6

40 drives in total in a RAID 6 configuration. All disks are 146GB 15K RPM drives.

SQLCAT LAB DW

4 Intel Xeon quad-core processors

64 GB

RAID 5

RAID 1

6 LUNs (REL_DATA) Each disk array contains 7 data disks and 1 parity disk in a RAID 5 configuration.

2 LUNs (REL_LOG1 and REL_LOG2) Each array has 2*12 disks (24 disks) in a RAID 1 configuration.

4 LUNs (TEMPDB) Each array has 2*8 disks (16 disks) in a RAID 1 configuration.

All disks are 300GB 15K RPM drives.

SQLCAT OLAP server

4 Intel Xeon six-core processors

64 GB

RAID 5

RAID 1

1 LUN (AS_CUBE) with 15 data disks and 1 parity disk in a RAID 5 configuration.

1 LUN (AS_TEMP_AREA) with 2*8 disks (16 disks) in a RAID 1 configuration.

1 LUN (WORK_AREA) with 2*48 disks (96 disks) in a RAID 1 configuration.

All disks are 300GB 15K RPM drives.

SQLCAT Articles and Project REAL | Practices

The Project REAL | Practices documentation and the majority of the SQLCAT articles related to SQL Server 2008 Analysis Services are based on the SQLCAT lab environment illustrated in Figure 4. To visit the Project REAL | Practices site, go to http://www.projectrealpractices.com. The SQLCAT articles are available at http://sqlcat.com/ .

Dedicated Query Servers

In the Analysis Services 2008 environment, the adCenter team is now considering the scale-out scenario. Figure 4 demonstrates the load-balanced query server design presently under review by the adCenter OLAP team. The SQLCAT team has tested this configuration with positive results.

adCenter data-mart scale out

Figure 4. adCenter data-mart scale out

The adCenter data-mart scale out is based on the following strategy:

  • Query server farm Multiple read-only query servers running Analysis Services 2008 use a shared LUN in a SAN-based storage system, optimized for read operations. Query servers in read-only mode do not generate write requests. A network load-balancing solution distributes the query load within the server farm.
  • Database detach after cube processing After cube processing, detach the Analysis Services database on the processing server and copy the resulting detached log file to the shared SAN volume of the query servers.
  • Database reattach in read/write mode After completion of the copy process, the Analysis Services database is reattached on the processing server to bring the OLAP processing system back into operational state.

Database detach and attach in read-only mode When attaching the copied database to the query servers, the old database is first detached and then the new database is attached, setting the ReadWriteMode to ReadOnly so that the query servers can access the same database files without conflicts. It is necessary to detach the old database first in order to attach the new database because both databases have the same identifier. Microsoft does not support changing the ID of an Analysis Services database.

Cube Processing Optimization

"For best cube processing performance, I highly recommend using an intermediate data warehouse dedicated to the task of supplying pre-processed source data to Analysis Services during cube processing. SQL Server 2008 is a perfect choice. The Relational Engine includes numerous OLAP-related improvements, such as optimized star-query joins, partitioned table parallelism, minimally logged insert operations, and Change Data Capture, and can succeed in this role with little or no tuning."

Denny Lee

Senior Program Manager

Microsoft Corporation

Given the amount of data generated on a daily basis and the size of the OLAP cubes, it is necessary for adCenter to materialize much of the dimensionalized data prior to loading the actual cubes. The Coverage data mart is a good example. Every day, this data mart accumulates 60 to 70 gigabytes of new data, amounting to around two terabytes per month. The Hourly Impression Coverage cube alone exceeds 2.75 terabytes in size, with 26 dimensions and 14 measures in a single measure group spread over more than 3,000 partitions across more than 1 million files. Fully processing this cube takes 48 hours with Analysis Services 2008 (equivalent to more than 96 hours with Analysis Services 2005). The Hourly Search cube is only slightly smaller with 2.0 terabytes. To retain a daily data load cycle for these types of cubes while keeping existing cube data available for browsing, adCenter implemented a solution based on hourly ETL processes, daily cube loads, and quarterly cube reprocessing.

adCenter implemented the following cube processing strategy:

  1. Hourly: ETL processes derive measures and calculate measure values during the transfer of new data from the source systems into the staging database of the data mart. There are separate processes for facts and dimension data, and multiple processes can run in parallel. An individual ETL process might take several hours to complete, but every hour new processes start to keep the data flowing into the staging database. Fact data arrives every hour, although in varying quantity, while dimension members change less frequently or not at all. adCenter removes the processed data from the staging database every day.
  2. Daily: An ETL job materializes the processed source data from the staging database in separate relational fact and dimension databases for subsequent cube loading. Every cube has a separate set of fact and dimension databases. adCenter maintains one month of data in the relational fact and dimension databases.
  3. Daily: An XMLA script loads the new data from the relational fact and dimension databases into the cube at nighttime. During nightly cube processing, adCenter does not delete or update existing dimension members to retain all indexes and aggregations, which benefits performance. adCenter maintains 13 months of data in the cubes.
  4. Quarterly: adCenter performs a ProcessUpdate operation every quarter to refresh dimension data that might have changed during the last three months. The ProcessUpdate operation preserves valid dimension contents, performs inserts, updates, and deletions as necessary, and drops invalid aggregations and indexes. To maintain query performance, adCenter runs the ProcessIndexes command after ProcessUpdate to create new aggregations and bitmap indexes. Because reprocessing takes an extended amount of time, adCenter performs the quarterly processes on the standby server, as discussed earlier in this paper.

Figure 5 illustrates the adCenter solution for hourly and daily cube processing.

Cube processing waterfall

Figure 5. Cube processing waterfall

Fact Table and Cube Partitioning

In the relational portion of the data mart, adCenter uses a single fact database per cube. The only purpose of this database is to supply fact data to Analysis Services during cube processing. adCenter currently does not use the relational databases for querying. All adCenter cubes use the multidimensional OLAP (MOLAP) storage model, which copies all source data to the cube for fastest query performance.

The facts database holds one month of data in a large table partitioned according to calendar days. Each calendar day corresponds to a separate table partition, which facilitates data pruning based on a monthly sliding window. Every day, adCenter archives and drops one expired data partition and adds a new partition for the current day to the fact table. The partition function uses a date key in the format yyyyMMdd, converted to the int data type. The fact table also includes a tinyint column to track the hour of the day for each fact row. adCenter uses this hourly information to distribute the daily facts data across eight equal-sized, non-overlapping cube partitions to increase cube processing and querying performance. With 16 processor cores, Analysis Services can process the eight partitions in parallel while retaining 50 percent of the processor power for queries running on the system. An XMLA script creates the new cube partitions and then performs a ProcessFull operation on these partitions to add the new data without affecting existing partitions or taking the cube offline.

An interesting aspect of the adCenter design revolves around the distribution of the hourly facts data across the eight cube partitions. Each cube partition holds three hours of the data. Taking into consideration that the facts volume varies with peaks and valleys across the day, adCenter does not place the hours sequentially in each cube partition. To achieve even distribution, adCenter uses a non-sequential pattern, which was determined through an analysis of the daily workload over the course of a month. The source queries use a WHERE clause that matches the hourly fact value to a list of three-hour values per partition. Figure 6 illustrates this approach. To achieve fast fact retrieval, adCenter created a clustered, partition-aligned index for the facts table based on the date key and the hourly information.

Cube processing based on one relational partition and eight cube partitions

Figure 6. Cube processing based on one relational partition and eight cube partitions

The adCenter cube processing design provides the following advantages:

  • Simplified data maintenance and archiving The sliding-window technique facilitates loading of new data into the data mart and swapping out aged data.
  • Fast cube processing By distributing the workload evenly across eight cube partitions, Analysis Services can run eight concurrent processing jobs with equal efficiency to load the data. Each individual job sequentially performs bitmap indexing and aggregation processing for a single partition.
  • High availability cubes Loading facts data every day into eight new cube partitions does not affect the availability of analytics data from previous days. The new partitions become available for browsing as soon as Analysis Services completes the processing.
  • Different retention policies for relational data and cubes While the relational data stores currently hold only one month of data, the cubes maintain up to 13 months. adCenter does not discard the contents of the cube partitions during quarterly reprocessing.

Transact-SQL Views

As a best practice, adCenter generally references views instead of tables in the relational databases. In this way, adCenter gains flexibility regarding dimension processing optimization and backward compatibility between cube releases. For fast dimension processing, adCenter relies heavily on indexes views used for creating dimensions and uses Transact-SQL hints to remove locking overhead from the database. adCenter does not utilize data source views (DSVs) for data modeling or shaping. The DSVs in the adCenter cubes are straightforward mappings to views in the underlying fact and dimension databases, optimized for loading data into the cubes.

Regarding backward compatibility, views provide the option to maintain dimension consistency between releases. According to Bilal Obeidat, "this is a real lifesaver for adCenter because dimensions can change or vanish between cube releases. If a dimension disappears from the underlying data source, existing intelligence and analytics solutions might break." adCenter solves this issue by simulating the required dimension members with default values directly in the Transact-SQL view definitions of the fact tables, which comprise a set of UNION ALL SELECT statements.

Optimizing Server Performance

In order to ensure the best daily and quarterly cube processing performance, adCenter optimizes the server configuration according to performance research results. The research revealed that Analysis Services 2008 performs approximately 30 percent better than Analysis Services 2005, straight out of the box, and that adCenter can achieve an additional 30 percent performance increase by maximizing CPU utilization, pre-allocating memory, and tuning processing threads and aggregation tasks. Figure 7 shows the cube processing performance observed during research in the SQL Performance Team lab.

Performance comparison based on a 1.5 terabytes adCenter cube

Figure 7. Performance comparison based on a 1.5 terabytes adCenter cube

Table 3 summarizes the configuration parameters that adCenter uses to optimize cube processing in comparison to Analysis Services 2008 default values on Windows Server 2003 (most are similar for Windows Server 2008).

Table 3. Server configuration

Configuration Setting

Default

Optimized

Comments

LimitSystemFileCacheSizeMB

Unlimited

48000

Reducing the maximum file cache size on a server with 64 gigabytes of memory ensures that system processes have enough memory when Analysis Services processes large quantities of data. In the default configuration, Analysis Services might exhaust the memory resources, causing excessive paging during cube processing.

Memory\PreAllocate

0

20

A value of 20 causes Analysis Services to allocate 20 percent of the available memory at startup. PreAllocate has little impact on small cube partitions, but helps to push the CPU workload to 100 percent at full cube scale. Note, increasing this parameter may also increase the size of your data.

Memory\HeapTypeForObjects

1

0

By default, Analysis Services uses the standard Windows heap allocator. Switching to the Low Fragmentation Heap allocator by setting Memory\HeapTypeForObjects to 0 and Memory\MemoryHeapType to 2 helps to reduce memory fragmentation and stabilize query throughput rates.

Memory\MemoryHeapType

1

2

Threadpool\Process\MaxThreads

160

256

Increasing the number of processing threads achieves full CPU utilization and better parallelism during cube processing if the underlying I/O subsystem provides the necessary throughput.

OLAP\Process\ AggregationMemoryLimitMin

10

1

Lowering the minimum amount of server memory used for aggregation processing to 1 percent enables Analysis Services to spin up more aggregation tasks in parallel and initially consume a smaller percentage of the system memory.

OLAP\Process\ AggregationMemoryLimitMax

80

5

Analysis Services can only use a maximum of 5 percent of the server memory for aggregation processing.

Network packet size

4096

32767

A packet size of 32 kilobytes helps to lower the TCP/IP protocol overhead incurred during communication between SQL Server and Analysis Services.

Query Processing Optimization

"After all is said and done, what matters most is query execution performance. Having grown accustomed to Analysis Services 2008 performance levels over the last six month, I can say with confidence that I would never want to go back to Analysis Services 2005."

Bilal Obeidat

Senior Development Lead

Microsoft Corporation

To ensure a positive querying experience and meet SLAs, adCenter performs query optimization on a continuous basis. Apart from analyzing and tuning MDX code, this effort also includes optimizing processor, memory, and storage utilization. Processor power determines the number of queries, jobs, and aggregation tasks that Analysis Services can run concurrently. Memory capacity directly translates into cache capacity for the formula and storage engines to reuse calculation results and aggregations. Storage performance determines how fast Analysis Services can retrieve facts, bitmap indexes, and aggregation data from disk. For details regarding the server and storage configuration, see the section "OLAP Infrastructure Design" earlier in this white paper.

In addition to scaling up server and storage resources, adCenter relies on the following optimization techniques to achieve optimum query performance:

  • Establishing query baselines As a prerequisite to efficient query performance optimization, adCenter establishes query baselines by using SQL Server Profiler traces and performance counters. adCenter uses the standby server for baselining to minimize interference from other queries and non-query activities.
  • Performing usage-based optimization for long-running queries The Usage-Based Optimization Wizard is a popular adCenter optimization tool for creating effective aggregations. Prior to full production release of new analytics solutions, adCenter sets the aggregations for the cube at a low percentage, tracks the user queries in the query log, and then uses the Usage-Based Optimization Wizard to create aggregation designs automatically. adCenter then reviews the suggested aggregations and applies those with the most promising performance gains. adCenter tests the results with an aggregation level of 100 percent and repeats usage-based optimization steps until the cube exhibits optimum performance characteristics. For details regarding usage-based optimization, refer to the SQLCAT article " Reintroducing Usage-Based Optimization in SQL Server 2008 Analysis Services" at http://sqlcat.com/technicalnotes/archive/2008/11/18/reintroducing-usage-based-optimization-in-sql-server-2008-analysis-services.aspx.
  • Optimizing MDX queries Tremendous query performance improvements are achievable by exploiting subspace computation mode given that the vast majority of the MDX functions support subspace computations in Analysis Services 2008.
  • Warming storage engine and query processor caches During querying, Analysis Services performance varies depending on the state of the storage engine and query processor caches. Query responsiveness generally increases when these caches are warm. In contrast, a cold cache forces Analysis Services to go to disk more often with a corresponding high number of file reads across a larger set of files. To exploit the benefits of the storage engine and query processor caches, adCenter executes a set of generalized queries to simulate typical user activity after operations that flush the cache contents, such as at the end of cube processing.
  • Optimizing the querying thread pool The size of the querying thread pool determines the maximum number of querying threads that Analysis Services can use to processes queries. The default size is 32 threads (twice the number of cores), but adCenter doubles this number to increase parallelism during querying. adCenter uses the default CoordinatorExecutionMode setting of zero for unlimited parallelism.
  • Pre-calculating values during ETL Calculations can be moved to the relational engine and processed as simple aggregates with much better performance. Computing this result in the source database offers superior performance.
  • Scaling out large data marts As discussed earlier in this white paper, adCenter is currently testing a load-balanced querying environment to increase Analysis Services scalability in the largest data marts.

SQL Server Analysis Services 2008 Performance Guide

For a detailed discussion of performance-tuning techniques for query and cube processing optimization, including comprehensive background information and server configuration settings, refer to the Microsoft SQL Server Analysis Services 2008 Performance Guide published by SQLCAT and available for download at http://www.microsoft.com/downloads/details.aspx?FamilyID=3be0488d-e7aa-4078-a050-ae39912d2e43&displaylang=en. Also refer to the SQLCAT Web site at http://SQLCAT.com for updated information and additional performance-tuning recommendations.

Conclusion

SQL Server Analysis Services 2008 represents a natural choice for adCenter to realize significant performance gains and break through scalability limitations in existing large-scale OLAP environments. The new Analysis Services version offers better hardware utilization, optimized cube processing, substantially faster MDX query execution, and cost-efficient infrastructure flexibility, and it provides a seamless migration path with straightforward cutover that preserves backward compatibility. The upgrade to SQL Server 2008 does not require modifications on server hardware, storage systems, or the OLAP infrastructure. It is likewise not necessary to modify existing intelligence and analytics solutions or provide developer and analyst training on new and improved OLAP development tools, such as BIDS 2008.

It was easy for adCenter to justify the upgrade decision. Performance research based on an authentic 1.5-terabyte adCenter cube proved that Analysis Services 2008 performs substantially better than Analysis Services 2005, even without extended server tuning. And with server optimizations, Analysis Services 2008 performs approximately 50 percent better during cube processing.

It is straightforward to take advantage of Analysis Services 2008 innovations for faster query processing. The vast majority of the MDX functions now support subspace computation mode and BIDS 2008 promotes good solution designs that follow best practices and performance guidelines through helpful features, such as AMO warnings and optimization wizards. adCenter took the upgrade to Analysis Services 2008 as an opportunity to standardize all new BI development on BIDS 2008.

Analysis Services 2008 also puts adCenter in a better position to extend the OLAP infrastructure to meet future demand by deploying a scalable querying environment for the largest data marts. Analysis Services 2008 supports query servers attached to the same Analysis Services database in read-only mode, which eliminates the need for redundant SAN-based storage, reducing costs and administrative overhead. Analysis Services 2008 provides the answers adCenter needs in the current business environment. Key business benefits include higher service levels and user satisfaction through faster MDX queries and larger data marts, higher ROI through better hardware utilization, and ultimately lower total cost of ownership through improved maintainability and infrastructure flexibility. Innovative development tools help adCenter to lower OLAP development costs while at the same time increasing the quality of the solutions. Another strategic benefit is readiness for further growth through future relational data-warehousing technologies, such as SQL Server code-named Project "Madison." The adCenter data marts will soon maintain 150 terabytes of data. The upgrade to SQL Server 2008 is an important step toward this future.

For More Information

For more information about Microsoft products or services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada information Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information through the World Wide Web, go to:

http://www.microsoft.com

http://www.microsoft.com/technet/itshowcase

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place, or event is intended or should be inferred.

© 2009 Microsoft Corporation. All rights reserved.

Microsoft, Active Directory, Bing, Excel, SharePoint, SQL Server, Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

All other trademarks are property of their respective owners.

Was this page helpful?
(1500 characters remaining)
Thank you for your feedback
Show:
© 2014 Microsoft