Pre-configured Reference Architectures give you a tested roadmap for configuring SQL Server to manage business intelligence workloads.
Poorly performing queries are the bane of any online transaction processing (OLTP) system. You could look to improve hardware performance, but the best way to solve this is to optimize those queries.
You could accomplish this by changing the index strategy, changing the code or, more dramatically, changing the table schema. The reason this works is that OLTP typically has a predictable workload. You know what’s coming from the application servers and you can design the system to cater for that.
This approach rarely works, however, for data warehouse workloads, especially as database size and concurrency increase. Community forums are full of DBAs seeking help with poorly performing, complex data warehouse queries.
Although there may be many commonly run reports in a typical business intelligence (BI) workload, there’s often a mix of ad hoc queries coming from various other tools. These are unpredictable and difficult to influence. Users also regularly modify their queries, often through a BI tool, as they drill into the data to extract business value. Attempting to optimize these queries can feel like a losing battle.
IT managers, reacting out of frustration, often attempt to resolve this by throwing hardware at the problem. They'll buy a high-end server and connect it to the largest enterprise storage array their budget allows. If you want to lose credibility with your business community, follow this path. This approach rarely succeeds, and it's certainly an expensive option to implement once you factor in licensing costs.
Microsoft has teamed up with a number of well-respected hardware vendors to turn this problem on its head. The companies posed the questions: “What would happen if we accepted that users often scanned large fact tables and performed complex Group By queries? What would happen if every component in a solution were able to operate at full speed and not be bottlenecked by another component? What would that look like?”
Enter Fast Track Data Warehouse (Fast Track DW) for SQL Server. Since 2005, Microsoft has been investing heavily in optimizing SQL Server for BI workloads. In 2009, Microsoft released its first Fast Track Reference Architecture for Data Warehouses. Today there are new Reference Architectures emerging that take advantage of new features in SQL Server 2012.
Fast Track DW isn’t a product. It’s a series of well-engineered, pre-tested hardware and software configurations designed specifically to address this problem. There’s no special hardware and no magic software. It’s all built on commodity components, such as Windows Server and SQL Server.
You probably already have many of these components operating within your environment. What's missing is an operational mind shift: instead of trying to prevent large data scans, Fast Track DW openly encourages them, so much so that it actually depends on them for its lightning-fast performance.
If you've ever done any hardware performance testing, you'll have quickly come to the conclusion that most systems like sequential reads. Analyze most BI workloads and you'll see they typically consist of 80 percent to 90 percent sequential reads. What would happen if you designed your system to cater only to sequential reads? What would happen if your components—CPU, memory, PCI bus, host bus adapters (HBAs), network, storage, SQL Server and database files—were designed for this as well?
There can be problems, though. What happens if you have slow disks or not enough RAM, or the HBAs can’t keep up with the disks, or a PCI bus becomes saturated? Worse yet, as you remove one bottleneck by adding more memory or upgrading the disks from 10,000 to 15,000 RPM, another bottleneck will soon appear.
A typical scenario might go something like this: A DBA asks the infrastructure team for more RAM in a SQL Server. Adding the RAM solves the first problem, but only boosts performance by 3 percent. Immediately after adding the RAM, a bottleneck in the disk subsystem reveals itself. When IT next asks the business owner for a big chunk of money, how would you rate their chances of securing additional funding?
To overcome this, hardware and software engineers have developed a series of balanced systems that can service different-size data warehouses. Entry-level systems start at 5TB, while larger systems can service almost 100TB databases. For data warehouses in the hundreds of terabytes, Microsoft and its hardware partners have assembled a boxed product called the Parallel Data Warehouse.
Most infrastructure teams could build a balanced system like this, yet they rarely have time to investigate and match components to achieve the maximum potential throughput. If you have 20TB of storage on 600GB disks, for example, you must determine how many HBAs you'll need on how many switch ports to drive two eight-core processors at 100 percent. You would also have to plan for capacity issues such as how many storage processors you'd need to achieve 6GB per second of sustained throughput.
Although many infrastructure teams consider and plan for these issues, balanced systems rarely make it into a production environment. The Reference Architecture takes out all the guesswork.
Infrastructure implementation issues are renowned for causing project delays. Fast Track DW is designed to improve time-to-implement metrics. Each Reference Architecture comes with a pre-defined commodity hardware parts list. You can quickly convert this into a bill of materials to submit to your hardware vendor.
Because hardware vendors were involved in putting these Reference Architectures together, they should take less time actually fulfilling the order. No longer do you need to go back and forth with your supplier regarding the order list. There are no more discussions about using a 2.4 GHz processor or a 2.5 GHz processor, Fibre Channel or iSCSI—the specification is predefined.
When the hardware arrives on site, the infrastructure team doesn't need to debate how best to implement the solution, as that's clearly articulated in the Reference Architecture. These details include physical disk placement, cabling, software and driver versions, firmware, storage configuration, HBA queue depths, SQL Server configuration, and even database file placement.
Fast Track DW operates on the concept of Maximum Core Rate (MCR), the maximum number of megabytes of data per second a single processor core can consume. Today's multi-core processors can consume 300MB to 400MB of data per second per core. On a server with two CPU sockets and eight cores per CPU, that translates to approximately 6GB per second. To drive that, you'll have four dual-port HBAs capable of a maximum sustained total throughput of 6.4GB per second. Each underlying storage array contains four sets of physical disks in a RAID 10 configuration, each set generating 1.6GB per second for a total of 6.4GB per second.
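As a rough illustration, the MCR arithmetic can be sketched in a few lines of T-SQL. The figures below are the example numbers from this article (a 350MB-per-second mid-point of the 300MB-to-400MB range is an assumption), not measured values for any particular processor:

```sql
-- Illustrative back-of-envelope MCR math only; not a measurement.
DECLARE @MCRMBPerCore INT = 350;  -- assumed mid-point of 300MB-400MB/sec per core
DECLARE @Sockets      INT = 2;
DECLARE @CoresPerCPU  INT = 8;

SELECT
    @MCRMBPerCore * @Sockets * @CoresPerCPU            AS TargetMBPerSec,  -- 5,600MB/sec
    (@MCRMBPerCore * @Sockets * @CoresPerCPU) / 1024.0 AS TargetGBPerSec;  -- roughly 6GB/sec
```

The point of the exercise is that every other component—HBAs, switch ports, storage processors and disk sets—is then sized to sustain that same target figure.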
Fast Track Reference Architectures typically specify 10Gbit iSCSI or 8Gbit Fibre Channel for the storage network on a dedicated switch. Unlike a typical SAN environment—where storage is shared with multiple workloads like file servers, database servers and virtual machine hosts—all the storage is dedicated to the Fast Track server.
Nothing is left to chance. In many Reference Architectures, LUN paths are mapped to HBAs, switch ports, storage processors and physical disk sets. This reduces the contention that can happen when you let I/O traffic from one disk set share a path with I/O traffic from another. No component should be allowed to flood the channel of another component. Everything should be able to run in parallel at maximum speed.
Building and configuring your own high-performance data warehouse solution on the latest hardware would be a significant task. To create a repeatable build, infrastructure teams often spend hours trolling through installation guides, blog posts and community forums in order to build the complex scripts required.
The Reference Architectures are more than just parts lists and some performance statistics. Although there’s some variation between hardware vendors, Reference Architectures also include scripts to configure the hardware. If the thought of assembling all the components seems a little daunting, there are some vendors who have programs to ship everything racked and pre-assembled.
SQL Server 2012 introduces a new type of index called a ColumnStore. ColumnStores are all about performance and improving the price-to-performance ratio. In a traditional row-based query, SQL Server processes data one row at a time. With a ColumnStore index, SQL Server can process rows in batches. Not only is the data for a column across many rows stored together on data pages, but that data can also be processed in batches. On top of that, the data is heavily compressed, at approximately a 7-to-1 ratio.
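Creating a ColumnStore index in SQL Server 2012 is ordinary DDL. A minimal sketch, assuming a hypothetical dbo.FactSales fact table (the table and column names here are illustrative, not from any particular Reference Architecture):

```sql
-- Hypothetical fact table and columns, for illustration only.
CREATE NONCLUSTERED COLUMNSTORE INDEX csi_FactSales
ON dbo.FactSales (DateKey, ProductKey, StoreKey, Quantity, SalesAmount);
```

Including the commonly scanned and aggregated columns lets large Group By queries against the fact table read compressed column segments and use batch-mode processing instead of scanning full rows.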
ColumnStore provides much better throughput because the CPU overhead is reduced while executing queries. Processing requires less I/O and RAM, which is well suited to the Fast Track DW architecture. ColumnStore indexes provide a massive 10- to 100-times performance improvement over regular row-based indexes.
Keep in mind that not all queries can take advantage of ColumnStore indexes. Recent performance tests have shown an average overall performance boost of two times is more reasonable across a mixed workload. Still, a 100 percent performance boost for relatively little effort is certainly worthwhile. So what’s the catch? You can’t update ColumnStore indexes. But most data warehouse applications can cope with this limitation during their extract, transform and load process.
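One common way to cope with the read-only limitation during the load window is to disable the ColumnStore index before the load and rebuild it afterward. A sketch of that pattern, again using the hypothetical index and table names from above:

```sql
-- Disable the ColumnStore index so the nightly load can modify the table.
ALTER INDEX csi_FactSales ON dbo.FactSales DISABLE;

-- ... run the extract, transform and load process here ...

-- Rebuild the index so queries get ColumnStore performance again.
ALTER INDEX csi_FactSales ON dbo.FactSales REBUILD;
```

For very large fact tables, loading into a staging table and switching it into a partitioned table is an alternative that avoids rebuilding the entire index.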
Newer Reference Architectures are also starting to support high availability for Fast Track DW. As BI becomes more business-critical, this is certainly good news. Currently available Reference Architectures use the ever-dependable Windows Server Failover Clustering technology, which has been used in OLTP circles for more than a decade.
A key benefit of using Fast Track Reference Architectures is that they use regular software such as Windows Server and SQL Server. This is helpful for systems administrators, DBAs and support staff. Although developers will need to start thinking about taking an index-light approach, regular T-SQL will still run on Fast Track. Because Fast Track is a Reference Architecture and not a boxed product, patch management is straightforward as well. You can simply add the server to your regular patch-management process.
Because Fast Track is built on SQL Server 2012, it integrates well into most standard BI architectures. Source systems can feed the Fast Track DW via a dedicated SQL Server Integration Services server or an operational data store. You can expose data through departmental data marts built on top of SQL Server Analysis Services or have your BI tools access Fast Track directly.
There's a growing trend to give users access to data portals with dashboards and PowerPivot or Power View with SharePoint. SQL Server Reporting Services can expose structured reports via Analysis Services cubes. Your users can build ad hoc reports in Report Builder, or use PowerPivot for Excel. With these options and performance improvements, it's easy to see how a scalable platform like Fast Track DW for SQL Server 2012 can become a core component of your BI strategy.
Stephen Strong has more than 25 years of experience with database systems, from application architecture and DBA mentoring to infrastructure architecture and design. Over the last nine years, working in conjunction with Microsoft Services, he has been instrumental in the architectural design and support of some of Australia's largest and most complex SQL Server implementations.