Book excerpt: Collaboration Meets Business Intelligence
Published: March 10, 2011
This book excerpt is from Building Integrated Business Intelligence Solutions with SQL Server 2008 R2 & Office 2010, McGraw-Hill Osborne Media, 1st edition February 3, 2011.
Building Integrated Business Intelligence Solutions with SQL Server 2008 R2 & Office 2010 is a good introduction to business intelligence and provides an overview of the current Microsoft BI technology that is available. This book describes Microsoft Office 2010, Microsoft SQL Server 2008 R2, Microsoft SharePoint Server 2010, PowerPivot for SharePoint, and PowerPivot for Excel, to name but a few.
Philo Janus, a senior technology specialist with Microsoft. He has presented before thousands of users and developers on various topics, including SQL Server and BI, and has worked with large organizations on BI implementations.
Stacia Misner, MCITP, MCTS, is a consultant, trainer, and author specializing in Microsoft BI technologies. She has written several books and articles on SQL Server and is a frequent speaker at SQL Server conferences.
Building Integrated Business Intelligence Solutions with SQL Server 2008 R2 & Office 2010
Book excerpt -- Chapter 2: Collaboration Meets Business Intelligence
In This Chapter
Business Intelligence and Collaboration
Business Connectivity Services
As we were researching this book, we found a curious theme throughout the business intelligence community. A lot of time and effort was expended discussing how to deal with semistructured and unstructured data, and how to bring them into a BI infrastructure:“More complex text analysis tools are used to try to transform the data embedded within the text into a structured form.” – David Loshin, Business Intelligence: The Savvy Manager’s Guide (Morgan Kaufmann, 2003)“The incorporation of text analytics with traditional business intelligence is still in its infancy.” – Cindi Howson, Successful Business Intelligence (McGraw-Hill/Professional, 2007)“Therefore, it is much more appealing to bring the unstructured data to the structured environment than it is to recreate the analytical infrastructure in the unstructured environment.” – William Inmon and Anthony Nesavich, Tapping Into Unstructured Data (Prentice Hall, 2007)
Each of these statements takes the approach of treating the structured business intelligence environment as the one-stop toolset and brings the unstructured data (e-mail, documents, and slide decks) into it. What we find fascinating is that Microsoft has done the exact opposite—they have brought business intelligence into the middle of the collaborative stack. Integration between structured and unstructured data isn’t going to be about moving data from one side to the other—the data will stay in place; the tools will reach out as necessary.
We’ve given you a basic foundation in business intelligence, and you should now have an appreciation for the requirements on the BI side to clean data; bring it into a data mart; create OLAP structures; and then build reporting, analysis, scorecards, and dashboards on top. In this chapter we’re going to start by looking at SQL Server and understanding how it fits into the whole “BI stack.” Then we’re going to look at SharePoint Server and the services that supplement the hard-structured data integration with the collaborative framework that SharePoint is known for.
When most people think of SQL Server, they think of the relational database—tables of data, running queries, etc. However, that’s just the beginning of what SQL Server is capable of. Consider “SQL Server” to be an umbrella technology label that encompasses:
SQL Server Relational Database Management System (RDBMS), the database engine
SQL Server Integration Services (SSIS), an enterprise-class ETL
SQL Server Analysis Services (SSAS), multidimensional data analysis
SQL Server Reporting Services (SSRS), web-enabled reporting from any data source
With SQL Server 2008 R2, two new technologies are introduced that have enormous implications for a business intelligence solution: PowerPivot for Excel and SharePoint, and Master Data Services, which provides a service-based framework for managing standard data structures for an organization.
This makes SQL Server a powerful basis for a business intelligence platform. With a SQL Server installation, you have everything you need to move and clean data, create dimensional OLAP repositories, and achieve enterprise-class reporting. You also have data mining, end-user ad hoc reporting, and now end-user ad hoc modeling. When you couple this with the user interface capabilities of Office and SharePoint server, it’s a complete BI suite.
Let’s take a look at the services in SQL Server and how they create the foundation for our business intelligence solution.
Business Intelligence Development Studio (BIDS)
Actually, before we start, let’s introduce one of the most compelling aspects of the SQL Server BI experience—BIDS, shown in Figure 2-1. SQL Server uses a Visual Studio shell with specialized projects to develop Integration Services packages, Analysis Services cubes and data mining solutions, and Reporting Services reports. The benefit here is that by using a common tool for all the BI tasks, it’s easier for BI developers to move from one task to another, and to learn the various services.
Figure 2-1 The Business Intelligence Development Studio
While the shell is based on Visual Studio, there’s no licensing necessary to install BIDS—all you have to worry about is the licensing of SQL Server with respect to your developers. Beyond that, BIDS is installed with the SQL client tools. If you already have the appropriate version of Visual Studio installed, then installing the client tools will just add the BI templates to Visual Studio, as shown in Figure 2-2.
Now that we understand our toolset, let’s look at the services. This is just a quick introduction to ease the transition from the vendor-agnostic BI material in the first chapter to the Microsoft BI platform that we will be using moving forward.
SQL Server Integration Services was introduced with SQL Server 2005 to replace Data Transformation Services (DTS). SSIS is an enterprise-class Extract, Transform, Load (ETL) engine for moving data from one place to another. One of the most compelling aspects to the designer is that the component/canvas approach allows for the creation of some complex processes, as shown in Figure 2-3.
Figure 2-2 BI project templates in Visual Studio
While SSIS is a SQL Server service, there is no requirement for its data to be in SQL Server. For example, you can use SSIS to load data from a text file into Oracle, or to extract data from SAP and load it into MySQL.
The components in SSIS cover dozens of tasks—processing Analysis Services cubes, profiling data, sending mail, interacting with a message queue, operating on the file system, or any number of database maintenance tasks. This is so that SSIS packages can perform maintenance operations in conjunction with moving data. For example, you may want to compact a database before loading a large-batch data dump. Or, you may want to process an OLAP cube after loading new data.
Figure 2-3 Integration Services package in BIDS
With respect to moving data, SSIS can extract from any OLE DB source, and has a number of data destinations, from OLE DB to more specific destinations (such as data mining, to train mining models, or an ADO.Net data reader). Between the source and destination, there are a number of data transformation tasks to manipulate the data while it’s in transit.
Once you’ve built an Integration Services package, you can deploy it to SQL Server and use the SQL Server agent to execute the package as necessary (on a schedule, as the result of another job running, etc.).
We’ll dig into Integration Services in Chapter 6.
Analysis Services has been part of SQL Server in various forms since SQL Server 7. SSAS is an online analytical processing (OLAP) server that provides a means for creating a business model for structured data, and then aggregating and caching numerical data in structures to enable fast response times when end users do analysis on the data.
As we discussed in Chapter 1, when analyzing data, we are generally more interested in aggregations of the values than the atomic values themselves. We want to know how many bicycles were purchased in California in October; we don’t care that John Smith bought a Red Touring 150 on October 15. SSAS lets a database administrator (DBA) design a model that reflects the business needs of analysts and then map underlying data sources into that framework. By creating dimensions in business-aware terms like “product categories” or “sales territory,” we can make it easy for an analyst to understand the data and how it relates. Figure 2-4 shows an example of working in the semantic layer and focusing on results instead of worrying about queries.
Analysis Services acts as a data source for various front ends—Excel, SharePoint, various reporting engines, PerformancePoint, and third-party vendor tools. One of the most important aspects of using SSAS to unify data is that as a single point source for analytic data, each user gets what is referred to as “one version of the truth.”
Figure 2-4 An Analysis Services cube in BIDS
SQL Server 2008 R2 introduces what is effectively a new capability, called PowerPivot. This is an OLAP engine that integrates with SharePoint Server and (with a free download) Excel 2010. This enables an end user to use Excel to connect to various data sources directly (providing they have the right credentials) and build a model in Excel. Essentially, they are creating an Analysis Services cube on the fly, on their desktop.
PowerPivot is actually an Analysis Services instance. However, it can only host PowerPivot models; it cannot host standard SSAS cubes.
Once a user is happy with their PowerPivot model, they can publish it to Analysis Services running in PowerPivot mode, and it can be accessed via the SharePoint Server. From there, it will look like a standard cube in an Analysis Services instance, so other users can connect reports to it, Excel spreadsheets (shown in Figure 2-5), PerformancePoint—any application that can connect to SSAS.
Figure 2-5 A PowerPivot report in Excel, hosted in SharePoint
In addition to the modeling aspects, PowerPivot technology is woven into the SharePoint Server infrastructure, giving administrators powerful capabilities regarding management and scalability. Performance metrics are tracked and rolled into a cube so they can be displayed in a dashboard, reported on, and so on.
Reporting is simply presenting a detailed list of records—answering structured questions. OLAP analysis lets analysts dig into data to ask various questions and try to find the answers. Data mining is what you do when you don’t even know what question to ask.
Essentially, data mining is about identifying patterns in large volumes of data. Data mining is how grocery stores identify purchases that are generally made together, or alternatively, purchases that are not made in the same trip. Data mining feeds the question when the fast food cashier asks, “Would you like a side of coleslaw with that?”
There are a number of definitions or explanations of data mining. We like the following:“Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.” – David Hand, Heikki Mannila, and Padhraic Smyth, Principles of Data Mining (MIT Press, 2001)
SQL Server Analysis Services has a data mining engine built in. This data mining engine provides a number of different algorithms to perform different types of analysis, from basket analysis (who will buy what based on what’s in their shopping cart), classification (estimating what a customer will purchase based on their demographics), to time-series analysis (predicting future values based on past performance).
Data mining in SQL Server was largely ignored until Office 2007 was released. Excel 2007 had an add-in available that enabled end-user creation of data mining models and performing mining analysis on data in Excel. This made data mining far more accessible, for experimentation and hands-on learning. Models created in Excel can be published back to Analysis Services for consumption in other front ends.
Speaking of front ends, SQL Server Reporting Services is an amazing tool for creating enterprise reports. Reporting Services was designed by the SQL product team for SQL Server 2005. However, after the initial alpha release, customer demand for it was so great that they released Reporting Services as an add-on to SQL Server 2000.
Reporting Services is a web-based application. IT professionals design reports in BIDS, shown in Figure 2-6, and then publish them to the server. From the server, users can view reports through the Report Manager website, or the URL for a report can be linked from any other website. Reports can also be embedded in webpages.
Figure 2-6 Designing a report in BIDS
In addition to simply looking at reports via a browser, reports can be exported to Excel, Word, XML, PDF, or TIFF formats. Reporting Services 2008 R2 adds the ability to make reports available as an Atom data feed. Users can also subscribe to reports and have them e-mailed on a regular basis or when specific data points change value. Through a .Net control, reports can also be embedded in Windows applications or ASP.Net websites.
If you’ve dealt with enterprise reporting in the past, you will know that often, reporting servers can get to be a bit of a mess—old reports that aren’t used, no configuration management, new reports created because users can’t find an existing one, and so on. SSRS offers one option out of that madness: SharePoint integration.
Starting with SQL Server 2005, Reporting Services can be integrated with SharePoint Server. In this case, there is no longer an independent report manager—all report management is performed via SharePoint. In addition to making it easier to collaborate on reports and put reports in a location where they make sense, this means the capabilities of SharePoint are available—workflow for report approval, version control, publishing, and content management capabilities.
Figure 2-7 A report with sparklines and data bars
Reporting Services continues to improve on the charting engine. In 2008 R2, you now have the ability to add data bars, sparklines, maps, and KPI indicators to a report or chart, as shown in Figure 2-7. This gives us a lot more flexibility to design charts that can speak visually at a quick glance.
If you haven’t worked with charts in Reporting Services since 2005, you absolutely must upgrade. The shift from 2005 to 2008 was remarkable, and 2008 R2 just adds to that.
SQL Server 2008 introduced the spatial data type, allowing DBAs to capture geometric and geographic data with their records. There were some interesting architecture changes to accompany this, but not a lot surfaced in the tools. SQL Server 2008 R2 changes this—Reporting Services now has a map control that can parse spatial data for display graphically, as shown in Figure 2-8. You can see how powerful this is for data that has a geographic component.
Reporting Services 2005 included an ad hoc report builder, which was a client application that end users could use to create reports and publish them back to Reporting Services. The big problem was that the Report Builder would only work against special reporting models created and published in Reporting Services. The combination of lead work required and limitations on the client meant Report Builder didn’t see much use.
Figure 2-8 Report with a map
SQL Server 2008 saw Report Builder 2.0, which no longer had the requirement for models—it could work directly against data sources, or even create data sources and publish them back to Reporting Services. SQL Server 2008 R2 introduces Report Builder 3.0, shown in Figure 2-9, which is a huge stride forward (again).
Report Builder also has all of the capabilities of BIDS with respect to Tablix, charting, sparklines, indicators, and maps.
Master Data Services
One problem often faced in large organizations is ensuring that the same thing is referred to in the same way across the organization. Whether it’s a project name, customer name and information, or job title, there is a tendency for semantics to drift across the organization. And if you want to change a name, there can be a dozen places or more to change it. You can see how this can get a bit nightmarish when talking business intelligence.
Figure 2-9 Report Builder 3.0
SQL Server 2008 R2 includes a new technology called Master Data Services. This is a web-based service that lets the business maintain a standard set of definitions for objects and standard values for those objects. In other words, the address of vendor companies will always be Street1: and Street2:, not Street & Apt, or Address & Suite. And the names and addresses of vendors will be uniform, and if they change, they only need to be changed in one place. Figure 2-10 shows the master data model for “Product.”
Figure 2-10 Master data model for “Product”
Business Intelligence and Collaboration
We’ve reviewed the data infrastructure that will serve as the foundation for our business intelligence solution. As mentioned earlier, the BI data and enterprise collaboration have traditionally been viewed as two separate worlds that had to interact on rare occasion somehow. However, Microsoft has successfully merged them; not by pulling the unstructured data into the BI world, as often suggested–they have pulled the BI world into the collaboration space.
In The Jericho Principle (Wiley and Sons, 2003), Ralph Welborn and Vince Kasten define a “Jericho Zone” (“as in bring down the walls of…”). They say:
In the Jericho Zone, you can rapidly create high-value collaborative relationships with other companies. Based on everything we have seen […], operating in the Jericho Zone means that you and your partners have found rapid and efficient ways of:
Quantifying the value each of you brings to the relationship
Controlling the risks associated with high intimacy
Equitably sharing the rewards of the collaboration
Now in The Jericho Principle, the discussion is about collaboration between two business entities or companies. But we found the approach also appropriate for considering merging business intelligence and collaboration. In an established organization, merging BI and collaboration will bring disparate groups into close proximity—it is worth remembering that this merger is nontrivial, but the payoff can be great.
We highly recommend The Jericho Principle. Although it is about enterprise collaboration, there is a lot of salient advice regarding collaboration and business agility. If you’re in the collaboration space, you should have this on your bookshelf.
SharePoint debuted in 2001 as a collaboration platform leveraging Exchange as a data store. Everyone hated it. After a significant redesign, it was relaunched in 2003 as an ASP.Net application with SQL Server as the data store. Despite some interesting drawbacks (nobody ever did figure out what area pages were for), it took off like wildfire as a collaboration platform.
In 2007 Microsoft rebranded SharePoint to Microsoft Office SharePoint Server (MOSS). There were a lot of revolutionary changes (no more area pages), but what we’re most interested in are Excel Services, the Business Data Catalog, and the architecture that made PerformancePoint Server possible. MOSS was definitely the foundation for a capable business intelligence platform.
SharePoint returns in 2010 with a facelift, as shown in Figure 2-11. Now Windows SharePoint Services is termed SharePoint Foundation, while MOSS has gone back to simply being called SharePoint Server. Add in PowerPivot capabilities, tighter integration with Reporting Services, improvements to Excel Services, and the fact that PerformancePoint Services is now included instead of a separate product, and SharePoint Server is a business intelligence platform in its own right.
SharePoint is really just the platform, though. The heavy lifting is done by the services under the SharePoint umbrella. Let’s take a look at them.
Figure 2-11 SharePoint Server 2010
Excel, Excel Services, and PowerPivot
The SharePoint service here is Excel Services, but it’s so tightly integrated with Excel and PowerPivot we wanted to be sure that was clear. Excel Services is an engine that runs in the SharePoint space and can emulate the behavior of the Excel desktop client. A lot of power Excel users will have incredibly complex spreadsheets with macros and reams of data. This gives rise to them having the most powerful desktop in the organization, and they still have to leave a “do not turn off” sign on it when they leave it crunching at night. With Excel Services, that same spreadsheet can be published to a server, where it will leverage the server hardware (and failover), and the results can be published to a webpage or other output.
Excel Services provides three key scenarios:
Using server hardware to process heavy computing jobs, either by publishing the spreadsheet into Excel Services or by leveraging the Excel Services object model from a custom code base
Publishing interactive spreadsheets to a server for public consumption
Publishing spreadsheets for secure consumption
Spreadsheets published to Excel Services can be displayed via a web browser and have some interactivity. They can be wired to live data and parameterized, but end users can’t just use the spreadsheet on an ad hoc basis. Thus, Excel is necessary as a front end for Excel Services. SharePoint 2010 adds Office Web Apps, which includes a web-based edition of Excel as an interactive end-user application. This enables the creation and editing of Excel spreadsheets via a browser.
Excel is also the front end for PowerPivot, the modeling engine in SQL Server Analysis Services 2008 R2. With PowerPivot, the primary scenario is publishing to SharePoint, so again they’re pretty well welded together. Figure 2-12 shows a PowerPivot gallery in SharePoint 2010. Note the automatic thumbnails created and presented.
PowerPivot gives us the data modeling and analysis, but for a full BI implementation, we also need a scorecard and analytics. Let’s look at PerformancePoint Services.
Figure 2-12 A PowerPivot gallery in SharePoint 2010
PerformancePoint Server launched in 2007. It had the ability to build scorecards, create and associate analytic charts, and it had a modeling engine for doing what-if analysis and budgeting. The reception was interesting—the BI community wanted to like it, but it had a lot of rough edges, and to really get the full benefit out of it, you needed MOSS Enterprise Edition, which was a commitment many organizations weren’t ready to make.
In January 2009, Microsoft folded PerformancePoint into SharePoint. It became a “feature” of SharePoint, available to users who had or bought MOSS Enterprise Edition. The planning module, which had shown a lot of promise but still had a way to go, was end-of-lifed. Its modeling functions would eventually be picked up by PowerPivot, while its budgeting and financial capabilities were migrated to Microsoft’s Dynamics line.
If you are planning to stay with MOSS 2007 for some time, and are interested in PerformancePoint Server, allow us to recommend our book Pro PerformancePoint Server 2007 (APress, 2008). Although written when PerformancePoint was a separate product, it’s still relevant.
In SharePoint Server 2010, PerformancePoint Services has been fully integrated into SharePoint. To create a scorecard or dashboard, you start at SharePoint at a PerformancePoint site, as shown in Figure 2-13. This site makes it easy to organize the content in PerformancePoint dashboards. (And, of course, you can have multiple sites, so as to keep everything clean.)
To build a dashboard in PerformancePoint, launch the Dashboard Designer—a click-once application that is deployed from SharePoint on demand. The Dashboard Designer allows users to create KPIs and connect them to data sources, as shown in Figure 2-14; collect KPIs into scorecards; create analytic reports; and combine everything into dashboards.
Figure 2-13 SharePoint Server PerformancePoint site
Figure 2-14 The KPI designer in Dashboard Designer
The scorecard designer is WYSIWYG (what you see is what you get)—if you’ve made data connections, you can see the scorecard as you build it. You can also build a scorecard from a dimension, automatically creating KPIs for every district in the state, for example. Dashboard design, however, is not WYSIWYG—there’s a block designer where you can drag items to the dashboard, but you’ll have to deploy it to SharePoint to see how it looks.
Dashboards can have scorecards, reports, embedded webpages, filters, and so on. Reports can be linked to scorecards, so clicking a KPI on the scorecard will filter the reports. In addition, the dashboard is active—there are means to drill down into data, drill up or down hierarchies, annotate KPI values, etc. A sample dashboard is shown in Figure 2-15.
Another feature in PerformancePoint Services is the ability to connect a strategy map to a scorecard. Strategy maps are just Visio diagrams, and the Dashboard Designer will open an editor that allows you to link shapes to KPIs, resulting in the KPIs being color-coded based on the KPI status. The cool thing here is that the Visio diagram doesn’t have to be a strategy map—any Visio diagram will work. So you could show a shop floor and color machines based on the status read from performance metrics.
And, on the topic of Visio, let’s talk about another brand-new feature in SharePoint Server 2010: Visio Services.
Figure 2-15 A dashboard in PerformancePoint Services
Since Visio 2003, Visio has been getting increasingly data-connected. Visio 2010 has amazing capabilities to connect it to data, and hopefully what we’ve covered so far in the Microsoft suite has you thinking in terms of pervasive data availability. However, the problem to date has been the ability to create actual reports out of Visio diagrams—you always had to have Visio to view a .vsd file.
Figure 2-16 Visio Services in action
In SharePoint Server 2010, there is a Visio Service that is similar to Excel Services. It allows you to publish a Visio diagram to SharePoint, enable live updates of the data-connected shapes, and dynamically zoom and pan around the diagram. Although this sounds like limited functionality, if you think about it, it’s just opened the door to Visio as a report designer. An example of a published Visio diagram is in Figure 2-16.
Business Connectivity Services
MOSS 2007 introduced the Business Data Catalog (BDC). This was an XML-driven modeling library to enable SharePoint developers to create business data connections to any business system. The idea was that then business data could show up in searches and be read by reporting applications. Unfortunately, the BDC was read-only, hard to configure correctly, connections required hacking XML, and the usefulness was somewhat limited.
Enter SharePoint 2010 and Business Connectivity Services (BCS). Business Connectivity Services provides “no-code” connectivity to external business systems so that using them internally is uniform and easy to understand for SharePoint users. In addition, the connections are now read/write, governed by the security around the connection ID.
With BCS, a DBA creates a model of the business system, which is then maintained in a metadata catalog. From that catalog, SharePoint users can access and leverage connections to any business system mapped.
Hopefully, these first two chapters have given you a general feel for business intelligence in general, and then how the SQL Server and Office technologies fit into a BI world. From here, we’re going to dive deeply into these technologies to give you a working understanding of their features, limitations, and how to use them, both as a content creator and as a content consumer.