Massively Parallel Processing (DW)---a Technical Reference Guide for Designing Mission-Crtical DW Solutions

Want more guides like this one? Go to Technical Reference Guides for Designing Mission-Critical Solutions.

Almost all tier-one companies are likely to have data warehousing (DW) initiatives in place, and most will probably have the need for very large data warehouse (VLDW) capabilities. Most will also have reached a more mature level in the DW capability model and will appreciate the benefits of massively parallel processing (MPP) DW.

Parallel data warehousing (PDW) takes data warehousing to new levels by introducing an MPP capability to Microsoft SQL Server. In simple terms, PDW extends the data capacity of SQL Server by distributing the data across multiple "nodes," which are separate and distinct instances of SQL Server. However, to the end-user, the MPP appears as one database.

Furthermore, many of the same DW design techniques, query techniques apply, but without all of the complex DBA work (for example, capacity planning, storage management, and TEMPDB issues). The appliance model presents a comprehensive, fault-tolerant, less-expensive environment for VLDW applications making it an attractive augmentation to their SQL Server implementations.

Best Practices

The following resources provide reference material and additional information.

Questions and Considerations

This section provides questions and issues to consider when working with your customers.

  • Fully understand the scalability and performance requirements.

  • The client must have at least 20 TB or more of data to enjoy the benefits of PDW.

  • The best PDW workloads are for very large table scans, near real-time data loads with heavy query concurrency, and well-developed query plans.

  • Given the price-point of PDW (including maintenance and implementation), this technology should not be targeted to a simple project; unless that project has VLDW needs (e.g. Telco, Large Retail, Large Web Analytics (clickstream), and so on.

  • Consider carefully the complexity of the client’s ETL/ELT and reporting requirements. PDW does not yet have all the T-SQL capabilities, stored procedures, UDF’s, and so on. normally seen in standard SQL Server implementations.

  • Understand where the pain is for the client:

    • Query/Load Performance

    • DW manageability/maintenance

    • Cost of Ownership

    • Scalability

    • Fault tolerance (High Availability or "HA")

    • Internal DW politics with the data center or H/W groups

Appendix

Following are the full URLs for the hyperlinked text.

1 Data warehouse appliance http://en.wikipedia.org/wiki/Data_warehouse_appliance

2 Getting Started with Parallel Data Warehouse http://www.windowsitpro.com/article/business-intelligence/Getting-Started-with-Parallel-Data-Warehouse/5.aspx

3 SQL Server 2008 R2 Parallel Data Warehouse Demo http://www.demomate.com/content/demos/SQL Server 2008 R2 Parallel Data Warehouse Demo.zip