Planning Your First Microsoft BI Solution
At a Glance:
- Introduction to BI
- Avoiding common problems with corporate databases
- A walk through the Microsoft BI stack
- Building a simple solution
Most database administrators (DBAs) have encountered some form of business intelligence (BI) while managing their organization’s data and Microsoft SQL Server implementation. Many other IT professionals, who don't have DBA responsibilities, have likely heard of BI but don't have firsthand experience with it or really even know what it is. In this article, you'll find out what all the fuss is about. Once you understand what BI technologies can really do and see how BI solutions are constructed on the Microsoft BI platform, you'll discover why BI isn't just for DBAs. If you're savvy about BI, you'll be in a better position to support your organization's BI initiatives when they come up (and they will) and you'll also realize how you can use BI to track and analyze data relevant to your role, making your job easier and deepening your understanding of that data.
This article is the first in a series of articles that introduce the Microsoft BI stack. In this initial article, I define BI and describe the high-level architecture of a BI solution in general terms. I also provide some insight into the process of building a BI solution. To delve into the SQL Server BI technologies mentioned in this article, you can read the companion articles written by Derek Comingore in this issue (see "Building a Data Foundation for a BI Solution") and by Scot Reagin and me in upcoming issues.
What Is BI?
Several years ago, while I was coauthoring Business Intelligence: Making Better Decisions Faster (Microsoft Press, 2002), I was surprised by how strongly my peers disagreed about which technologies should be considered within the domain of BI and thus within the scope of the book. It was an enlightening experience to hear such divergent opinions among BI professionals about which tools they considered BI tools and which they excluded. This difference of opinion still exists within the IT industry, and many continue to debate the definition of BI. To me, BI is as much about business process as it is about technology, so I start defining BI from that perspective.
As a business process, BI is a series of activities you perform to gather and analyze data so that you can make better decisions and improve your business by sharing the results of your analysis with others. Whether you need information to decide how to make your daily routine more efficient or to support long-range planning, such as next year's budget, the steps you take to find, transfer, format and study the data are all part of BI. In addition, BI includes the processes you use to make your results available for later reference so you and others can measure the impact of the decisions you made after studying the data. Typically, BI is an iterative process. You analyze the data to see what has happened, you take action to ensure good things keep happening and bad things stop, and you then analyze the data at a later point to determine whether your actions made things better or worse and whether external factors helped or hindered your efforts.
Given this broad definition of BI, you're using BI even when you're jotting down bits of information or creating lists to help you make decisions throughout the day. Introducing technology into some or all of the business processes you use to gather, analyze and share information can make these processes more efficient. Organizations tend to start small when adopting BI technology, often using it at first to solve specific problems. Over time, the use of BI technology grows incrementally as the emphasis shifts to disseminating information efficiently across the organization. As its BI strategy continues to mature, the organization usually acquires more sophisticated tools to enable greater interaction with and exploration of data.
What's Wrong with Querying Your Corporate Databases?
In any definition of BI, data is always the focal point. You might be wondering why you need to bother to create a BI solution when you can simply query one or more of your organization's databases to get the data you need. If you're the sole consumer of the data that you're gathering, if you have the tools, skills and security privileges to access the corporate databases, and if all the data you need is in those sources, you probably don't need a formal BI solution. By contrast, if you regularly need to share information with others whose technical skills and security privileges vary, you need to create a BI solution that is simple to use and maintain. You also need to consider that once your colleagues hear about this neat BI solution, they'll want to use it too. To anticipate this demand, your solution should be scalable from the beginning.
On the surface, allowing everyone in the company to run reports to get data from the corporate databases might seem like a good idea, but this approach probably won't be very popular with the DBAs. As they will tell you, querying the corporate databases directly can come with more than a few problems. Here are a few of the more common ones:
- If the organization's data is stored in different platforms in different formats, consolidating the data into a common format that is useful for analysis can be challenging. Data can't be copied directly from source tables into a common target table but will have to be manipulated in some way before it's stored.
- Data definitions might not be consistent across multiple databases, and reconciling data that appears to be similar can be difficult. For example, the revenue in a sales database might be calculated according to one set of business rules but be subject to an entirely different set of business rules in a general ledger database.
- Each database is probably structured and optimized for inserting data or for performing lookups. Even if you want to query just one database, running analytical queries usually requires the summarization of large quantities of data, which is an expensive operation in terms of database resources. Consequently, your queries can take a long time to execute and can create contention for resources with other applications performing insert, update or lookup operations.
- Historical data is often archived rather than maintained indefinitely in a corporate database. If you want to look at trends over time—a common BI activity—your solution might need to be a repository for data that doesn't persist in the corporate database.
- Some data required for analysis might not be available in the corporate database. It might be in flat files, spreadsheets, or unstructured data formats like Microsoft Word documents. More problematic is getting to the information people have stored on their local machine—or in their head.
- Even if data is available, quality issues sometimes mean you can't use it directly from the source. You might need to download the data and clean it up before you can analyze it. Unless the data can be cleaned within the source, you'll need to cleanse the data manually each time you access it—making sure you apply the same rules every time. Furthermore, you can't be sure that everyone else using the same data is following the same rules for cleaning it.
To address these problems with data access, a BI solution often includes a database created exclusively to house the data used for analysis. Having such a database means you can avoid any resource contention problems between activities generating data and activities consuming that data. Going a step further, you can restructure the data so that queries summarizing data can run much faster. When you need to consolidate data from multiple data sources, you can centralize it and apply business logic to put the data into a common format with consistent meaning. You can also incorporate data that didn't come from a database, such as an XML document or a spreadsheet, into this central location. Another benefit of creating a separate database is that you can preserve historical data for as long as necessary after that data is purged from the source databases. Finally, you can automate the process of cleansing and enhancing the data for analysis, ensuring that the same rules are applied each time the data is accessed.
Support for Decision Making
A BI solution should do more than give you better access to data. It should specifically support your decision-making efforts. In general, a BI solution should help you assess and respond to business conditions, whether you require an all-encompassing view of the entire organization or a narrow perspective of a department, workgroup, or even a team of one. In fact, the ability to move quickly from a summarized view to a detailed view of data is an important capability in BI.
The goal of a BI solution is to let you spend your time analyzing the data and finding answers to questions rather than tracking down, consolidating, reformatting and reconciling the data itself. When you have enough quality time to analyze the data, you can usually spot problems early and take steps to stop negative trends from continuing. You can also use BI to discover correlations between seemingly unrelated data points and then adapt your strategies to turn your insights into dollars saved or dollars earned. Every decision you make each day as you do your job, whether you're solving a problem or planning for the future, translates directly or indirectly into a cost or a profit for your company.
BI in Action
Understanding BI on a theoretical level is good, but seeing it in action really helps you understand its benefits. To show you how BI works, the three other articles in this series describe the development of a BI solution for a fictional company called Adventure Works. At the end of this article, I'll explain how to obtain the sample database for Adventure Works, which is a fictional bicycle manufacturer that sells its products worldwide. It maintains a sales staff to sell its products wholesale to resellers such as small specialty bicycle shops or large warehouse-style shopping outlets. Adventure Works also sells products directly to individual customers through the Internet. The Adventure Works data provides lots of analysis opportunities that fit well into a BI solution.
Remember that a BI solution is intended to support decision making. With this in mind, let's consider the types of questions that Adventure Works needs to answer before making key decisions:
- Which sales channel is more profitable? Adventure Works must decide whether to invest in adding more sales staff to develop relationships with more resellers or to expand its Internet sales presence. To help guide this decision, analysts need the ability to compare the sales performance over time between resellers and its Internet site. The sales performance data points (called measures) that analysts need to compare include sales dollars, order quantity and profitability. A positive trend in profitability is the most important measure because high sales measured in dollars or number of units sold don't benefit the company if these sales result in a net loss.
- Is demand for certain products growing or declining? Adventure Works must match production levels to sales demand. If demand for some products is growing, Adventure Works must adjust its manufacturing processes to ensure more of these products are available for sale and thus increase sales. If demand for other products is declining, Adventure Works needs to reduce their production or possibly eliminate product lines to avoid an oversupply situation in which these products must be sold at a loss.
Even if you don't analyze sales data in your job, I encourage you to follow along as we build a BI solution for Adventure Works. You can apply the same design and development principles described in this series to your own data.
The Microsoft BI Stack
Now let's take a closer look at the technology architecture of BI. The Microsoft BI stack provides all the tools you need to build, manage and use a BI solution. SQL Server 2008 is the foundation of the stack as the data platform hosting the data mart or data warehouse. A data mart is a subject-specific data store. A data warehouse is an enterprise-wide collection of data containing multiple subjects. The line between data marts and data warehouses is blurry, but you don't need to worry about the distinctions. In this series, I use the term data mart. (Although this series of articles refers specifically to SQL Server 2008, you can also build a similar BI solution by using SQL Server 2005 and its BI components with little or no modification of the instructions provided.)
SQL Server 2008 includes three BI components: Integration Services (SSIS), Analysis Services (SSAS), and Reporting Services (SSRS). These components extend the data platform with data integration functionality, multidimensional database support and a data presentation layer, respectively. Figure 1 illustrates the relationship of these components to one another in a BI solution.
Figure 1 SQL Server 2008 Components in a BI Solution
After you design the physical structures of your data mart, you use SSIS to populate it with the data you extract from other data sources. SSIS provides the tools necessary to automate the processes for cleansing data, consolidating data from multiple sources and transforming the data into a structure well suited for analysis. You can schedule the periodic execution of these processes using SQL Server Agent. In his article on SSIS on p. 31, Derek Comingore explains how to develop extract, transform and load (ETL) processes for a BI solution.
Adding an SSAS database to your BI solution allows you to support more sophisticated, high-performing interactive queries. You use SSAS to copy your relational data into a multidimensional database structure called a cube. A well-designed cube optimizes data for ad hoc queries by adding indexes and the functional equivalent of summary tables (known as aggregations) to return query results that can be exponentially faster than a comparable query to a relational database. You can also embed complex calculation logic in the cube to simplify queries that would otherwise require hundreds of lines of Transact-SQL code to replicate when using a relational data source. Many front-end tools (called cube browsers) let you query a cube without writing a single line of code. In next month's issue, Scot Reagin will show you how to develop a cube as part of a BI solution.
Whether you're storing your data in a SQL Server data mart or an SSAS cube, you can add SSRS to your solution architecture to make the data available to users. SSRS is a reporting platform that includes tools to develop reports, to secure and manage published reports using a centralized administrative infrastructure and to support user access to reports. You can use an SSRS Web application or Microsoft Office SharePoint Server 2007 (MOSS) to view reports, use the subscription feature to receive reports via e-mail, or call the SSRS Web service in your own application to display your reports. The default view of a report displays in HTML format, but you can also export a report to other file types, such as PDF or Excel. Next month, I'll explain more about using SSRS in the data presentation layer of your BI solution.
The Microsoft BI stack also includes several Microsoft Office technologies that expand your options for the data presentation layer. Excel 2007 is a popular choice for supporting data analysis in BI solutions. You can access your SQL Server data mart or browse an SSAS cube directly from Excel (as shown in Figure 2) and explore data much more freely than you can when viewing an SSRS report.
Figure 2 Using Excel 2007 to Browse a Cube
In addition to using Excel to explore both relational and multidimensional data, you can use its data mining algorithms to uncover hidden patterns of information in your data or to detect anomalies in the data (which means you can fix problems before populating your data mart). A great way to get started with data mining is to download a free add-in for Excel from Microsoft and then use it to analyze data that you import into Excel from any source or to view the output of a data mining model created and stored on the SSAS server. An add-in for Visio 2007 is also available for sharing annotated data mining models. You can learn more about downloading and using these add-ins at "Data Mining Add-Ins for Office 2007."
An increasingly popular choice for sharing data is Excel Services, which is available in MOSS. Using Excel 2007, you first create a workbook containing a pivot table that uses an SSAS cube as its data source and you then publish the data connection and your workbook to Excel Services. Although you can publish workbooks that contain data from other types of data sources, one of the advantages of using SSAS, pivot tables and Excel Services together is the ability to use much of the same cube browsing functionality within a pure HTML interface that preserves the familiarity of Excel. Another advantage is the ability to centralize the administration and access of Excel workbooks. For more information about Excel Services, refer to "Excel Services Technical Overview."
Yet another advantage of using Excel Services for cube browsing is that you can embed pivot tables and pivot charts into a MOSS dashboard page using the Excel Web Access Web Part. A dashboard is a special SharePoint content type that allows you to present data from multiple sources on a single page using a variety of Web Parts. You can even add a filter to the dashboard page and connect it to all or some of the Web Parts to dynamically change the content on the page based on the filter selection. Figure 3 shows a sample MOSS dashboard. MOSS also includes a Web Part to display SSRS reports that are stored on the MOSS server (an optional SSRS configuration known as integrated mode), or you can use Web Parts that ship with SSRS to display reports stored on the report server (the default configuration known as native mode).
Figure 3 Using a MOSS Dashboard Page to Display Workbooks and Reports
For dashboards with even more functionality, you can use Microsoft Office PerformancePoint Server 2007 (PPS). You use PPS to develop scorecards and dashboards that you can deploy to MOSS. A scorecard is a report that compares actual performance to a defined target and shows the results using color-coded icons. You can display a PPS scorecard using a special type of Web Part in a standard MOSS dashboard or as part of a PPS dashboard. In the dashboard, you can add PPS reports to dashboard zones to present different views of the same data (as shown in Figure 4) or to show related data from different sources.
Figure 4 Using a PPS Dashboard to Display Data
As you can see, the Microsoft BI stack provides numerous options for the development, administration and implementation of your BI solution, but it by no means prevents you from using other options. The Microsoft BI stack from top to bottom is an extensible architecture into which you can plug your own custom applications or third-party applications when you need to support specific requirements.
An Approach to Solution Development
The best way to start learning about the Microsoft BI stack is to build a simple solution. The companion articles in this series show you how to build a solution by using SSIS, SSAS and SSRS. When you've finished the series, you'll have a general idea of how each of these components works. Don't feel compelled, however, to use each component in the stack for every BI solution you build.
As you put together your BI solution, plan to spend most of your time performing data preparation tasks, such as restructuring and cleansing. In BI terms, this set of tasks is called the extract, transform and load (ETL) process.
Before you start ETL development, you must carefully plan the design of your BI solution. In my experience, the development of a BI solution goes more smoothly when you have a specific business problem to solve and approach the design by first considering how people need to interact with information. By taking a user-centric approach, you can work your way backward through the applicable business processes to design a solution that properly retrieves and structures the data to support your business need. This recommendation might seem to be an obvious approach, but I've seen many people attempt to drive the solution development from the data available out to the users and wind up with a solution that never gets used.
After you come up with the initial design, you're ready to start developing. If your BI solution uses SSIS, SSAS and SSRS, you begin by creating and populating the BI solution's data structures using SSIS. Once the data is in place, you continue to the next step by building a cube. After your development is complete, you process the cube to load it with data. You then use SSRS to develop a report that queries the cube and displays the query results in a report. You'll see how to step through this process over the next three articles in this series. For your own projects, you should approach this process incrementally and iteratively to make sure the results of one step work satisfactorily in subsequent steps.
To complete the entire BI solution described in the remaining articles in this series, you need to install SQL Server 2008, including SSIS, SSAS and SSRS. If you have access to SQL Server, an SSAS server and a report server on your network, you need only install the development tools on your computer. For product installation instructions, see "How to: Install SQL Server 2008 (Setup) ." You can download the sample databases used to build the solution for Adventure Works from CodePlex. The first database, AdventureWorks2008, is representative of an online transactional processing (OLTP) database, which captures the transactions generated by business operations in the sales, production and human resources departments. You'll use this database as a source for the data mart that you build using SSIS. The second database, AdventureWorksDW2008, is a sample of data representing best practices in data warehouse design. You can use this database as a source for your SSAS cube if you decide to skip learning about SSIS and jump directly into cube development.
After you get familiar with the SQL Server BI components, you should find a simple project of your own to continue building your knowledge. You don't need a full-fledged data mart to get started, but you should try to structure your data using the principles described in the next article in this series. Once you start using BI, you'll probably never look at data quite the same way again.
Stacia Misner is a BI consultant, educator and author, as well as the founder and principal of Data Inspirations. She has 25 years of experience in the IT industry, with nine years focused on the Microsoft BI stack. Stacia has written several books about BI and SQL Server. Her latest book, Microsoft SQL Server 2008 Reporting Services Step by Step (Microsoft Press, 2009), was released earlier this year. She can be reached at firstname.lastname@example.org.