Implementing Smart Reports with the Microsoft Business Intelligence Platform
This article presents an excerpt from the book, Applied Microsoft Analysis Services, by Teo Lachev. Learn how to author “smart” reports using Reporting Services (SSRS), Analysis Services (SSAS), and SQL Server CLR stored procedures. Get the reports demonstrated in this article by downloading the sample code.
On This Page
Authoring Data Mining Reports
Business intelligence should help users analyze patterns from the past to plan for the future. Data mining presents ample opportunities to implement a new generation of reports with predictive analytics features. Such report could process historical patterns to present views of opportunities and trends. The two sample reports (Targeted Campaign and Sales Forecast) included in this article demonstrate how you can leverage data mining features of SSAS to author “smart” reports. The first report presents a list of the top ten customers that are most likely to purchase a given product. The second report uses the past sales history to forecast future sales.
These reports use the SOS OLAP cube whose definition is included in the SOS OLAP Analysis Services project. The cube includes data mining models for targeted campaigning (Targeted Campaign.dmm) and time forecasting (Sales Forecasting.dmm). Discussing the data mining models is beyond the scope of this article. Before running the reports, make sure to deploy the SOS OLAP cube to the Analysis Services server (in the BI Studio, right-click on SOS OLAP project node and choose Deploy).
Suppose that you need to prepare a standard report that lists the most likely buyers. Figure 1 shows the Targeted Campaign report that meets this requirement. The end user can filter the report by product category. The report shows the top ten customers that may purchase a product from the selected category and the estimated probability.
Constructing the DMX query
To implement a report that sources data from a mining model, you need to create a DMX (Data Mining Extension) query to retrieve predicted data from a mining model. Constructing a DMX query statement is easy with the graphical Query Builder.
Create a new dataset dsCampaign that uses SOS OLAP as a data source.
In the Dataset dropdown (Data tab), select the dsCampaign dataset to open it in the Query Builder.
Select the pickaxe toolbar button to switch to the DMX command mode (Figure 2).Figure 2. Switch to the DMX command mode to create a DMX query.
To produce the query needed for the Targeted Campaign report, I used the Design mode of the Query Builder. The vTargetMail view fulfils the role of the case table that contains the new customers. As with MDX, DMX queries can have parameters. The @ProductCategory parameter placeholder will be replaced during runtime with the value of the selected product category. If you want to test the parameter during design time, you can click on the Query Parameters toolbar button. In the Query Parameters dialog that follows, enter @ProductCategory as a parameter name and the category name as the parameter value.
Since the report needs the top ten customers who are most likely to buy a product from the selected category, I had to finalize the query in SQL mode (right-click outside the two tables and choose SQL. I needed to introduce an ORDER clause in the query to sort by the predicted probability column and use SELECT TOP 10 to cap the query.
Constructing the report parameter
As with the MDX command mode, the moment you introduce a parameter placeholder in the DMX query, the Report Designer creates a report-level parameter. To make the parameter data-driven, I used a second dataset (dsCategory) that fetches the product categories from the AdventureWorks database. I used this dataset to derive the parameter available values.
Once the report query and parameters have been taken care of, authoring the report layout is easy. I used a table region that is bound to the dsCampaign dataset. Next, I dragged the dsCampaign fields from the Datasets pane and drop them on the table region to create the three columns. The report also demonstrates how expression-based conditional formatting can be used to alternate the background color of the table rows.
Sales forecasting is a critical reporting requirement for most businesses. At the same time, sales forecasting has been traditionally difficult to implement and forecasts have been often wrong as a result of the incorrect output from the mathematical models used. In this section, I would like to show you how SSAS data mining can be used to create compelling and consistent sales forecast reports.
From the user’s standpoint, the Sales Forecast report (Figure 3) is simple. The end user would enter the year for the historical period and the number of the forecasted months. The report displays the sales figures for the selected year followed by the forecasted figures (shaded in gray). The data is broken down by the product model and territory on rows and time (in months) on columns.
Choosing implementation approach
The Time Series algorithm doesn’t support passing an input dataset by using a Prediction Join statement. As a result, the only way to predict from an arbitrary dataset is to build a new Time Series data model that is bound to that dataset. One way to do this is to create a session mining model which the server automatically disposes of at the end of the user session. From an SSRS standpoint, you have at least three options to author a report that uses a session mining model. First, you can implement a custom data extension to generate the model, retrieve the results as an ADO.NET dataset (or reader), and translate the results to a format that SSRS understands.
Second, you can call custom code in an external .NET assembly that generates the session mining model and retrieve the results via expressions in your report. I demonstrated a similar approach in my book Microsoft Reporting Services in Action (see Resources). Both approaches mentioned above share the same tradeoffs. They are difficult to implement and debug. You have to distribute and register the custom dataset extension (or external assembly) with the Report Server. Finally, custom extensions are supported only by the Standards and Enterprise editions of SSRS.
With the advent of the CLR stored procedures in SQL Server 2005, a third approach is to move the programming logic into a CLR stored procedure. A CLR stored procedure is .NET code packaged as a stored procedure. In my opinion, CLR stored procedures will open a new world of implementation and integration scenarios that were difficult, if not impossible, with prior releases of SQL Server. Once the CLR stored procedure is implemented and deployed, it can be called from your report just like a regular stored procedure. The downside of this approach is that it requires SQL Server 2005. Let’s use a CLR stored procedure to implement the Sales Forecast report.
Implementing the SalesForecast mining model
When implementing a CLR stored procedure, you may find it easier to adopt a two-step approach. First, for faster development and testing, you write and debug the .NET code in a standard .NET project, e.g. a console application. Once the code is ready, you move it to a CLR stored procedure and register the procedure with the SQL Server 2005. In the Ch18 solution, you will find a TimeSeries console application that demonstrates the first step of the process, and the AdventureWorks assembly that includes the final version of the CreateSessionModel stored procedure.
You can use AMO or DMX DDL statements to create the mining model. Listing 1 shows the abbreviated code of the TimeSeries console application that generates and queries the session model using DMX only. In the TimeSeries folder, you will find a SalesForecast.dmx file which you can use to test the DMX statements in Management Studio.
Listing 1. Creating a session mining model that returns the historical and forecasted data.
Private Sub CreateSessionModel(ByVal Year As Integer, _ ByVal numberMonths As Integer) command.CommandText = "DROP MINING STRUCTURE SessionForecast_Structure" command.Connection = connection command.ExecuteNonQuery() ' Create a session model query = "CREATE SESSION MINING MODEL SessionForecast ( " _ & "TimeIndex LONG KEY TIME, ModelRegion TEXT KEY," _ & "Amount LONG CONTINUOUS PREDICT" _ & ") Using Microsoft_Time_Series WITH DRILLTHROUGH" command.CommandText = query command.ExecuteNonQuery() Dim startMonth As Integer = Int32.Parse(Year.ToString() & "01") Dim endMonth As Integer = Int32.Parse(Year.ToString() & "12") ' Train the model query="INSERT INTO SessionForecast,TimeIndex,ModelRegion,Amount,Quantity) " _ & "OPENQUERY([Adventure Works DW], " _ & "'Select TimeIndex, ModelRegion, Amount " _ & "FROM [vTimeSeries] WHERE (TimeIndex >= " & startMonth & _ & "And TimeIndex <= " & endMonth & ") ORDER BY TimeIndex')" command.CommandText = query command.ExecuteNonQuery() Dim predictions As New DataSet ' Get predicted results query = "SELECT ModelRegion, " _ & "(SELECT $TIME, [Amount], PredictVariance([Amount]) " _ & "FROM PredictTimeSeries([Amount], @NumberMonths)) " _ & "FROM [SessionForecast]" command.CommandText = query command.Parameters.Add("NumberMonths", numberMonths) adapter.SelectCommand = command ' Fill the dataset adapter.Fill(predictions) ' Get the cases query = "SELECT ModelRegion, TimeIndex, Amount FROM [SessionForecast].CASES" command.CommandText = query adapter.SelectCommand = command Dim cases As New DataTable adapter.Fill(cases) ' Merge cases and predictions Dim results As DataTable = PopulatePredictedColumns(predictions.Tables(0)) results.Merge(cases) End Sub
First, the code attempts to drop the session mining structure. Normally, the code will fail to find it and will error out. That’s because a session mining model is automatically disposed of when the user session terminates. However, you may find the DROP MINING STRUCTURE statement useful during debugging, e.g. if you need to change a query and recreate the structure without restarting the application.
Creating the mining model
Next, the code creates the SalesForecast session mining model. For the purposes of the Sales Forecast report, we need only three columns – TimeIndex (the month), the Model Region (we are forecasting by the model-region combination on rows), and Amount (the column we are predicting against). I used the same columns for the model as the ones included in the vTimeSeries SQL view which is used to populate the model. When the server executes CREATE SESSION MINING MODEL statement, the server creates the SalesForecast model and the SalesForecast_Structure to host the model.
Training the model
Once the session model is generated, we are ready to train the model with the historical dataset. We do this by sending an INSERT INTO DMX statement which feeds the model with data from the vTimeSeries SQL view for the time period selected in the report. This statement is the workhorse of the CreateSessionModel function and it takes the most time to execute. During the statement execution, the server populates the model with data and processes (trains) the model to produce the predicted results.
Querying the model
The code proceeds by sending a prediction query to get the predicted results. A Time Series model exposes the predicted results as a nested table that contains as many rows as the number of the prediction steps. Therefore, to retrieve the forecasted numbers, the SELECT statement uses a subquery that returns the forecasted month ($TIME function), amount, and the estimated variance of the prediction (PredictVariance function). Given the results shown in Figure 4, the SalesForecast model has forecasted that the sales amount for the first predicted month (January 2004) will be $121,415.
In real life, you should spend some time to optimize the mining model and reduce the variance to obtain more accurate predictions.
Note that the number of the predicted months is passed as a parameter to the prediction query. The forecasted results are stuffed into the predictions ADO.NET dataset.
Besides the forecasted values, the Sales Forecast report displays the historical data. The second SELECT statement queries the mining model to retrieve the cases used to train the model. Each mining model exposes a CASES property which returns the case table. Needless to say, we can obtain the cases directly the vTimeSeries view for the time period requested.
Combining the historical and forecasted results
We need to combine the historical and forecasted results in a single dataset that contains all data needed to generate the report. However, as they stand, both datasets have different schemas. Only datasets with identical schemas can be merged together. The PopulatePredictedColumns helper function (not shown in Listing 1) converts the predictions dataset to the schema of the cases dataset by creating a new DataTable object.
Since the predictions dataset has two tables that are joined with a relation (see Figure 4), PopulatePredictedColumns loops through each parent row (ModelRegion), retrieves the child records (forecasted monthly values), and flattens both rows into a single row. Finally, the code merges the predicted results and the case results into one table. The Sales Forecast report uses a matrix region to rotate the TimeIndex field on columns.
Working with CLR Stored Procedures
The last implementation task is to convert the .NET code into a CLR stored procedure which will be invoked by the Sales Forecast report. This involves creating an SQL Server database project and changing the code to serialize the ADO.NET DataTable object. You will need VS.NET to create a SQL Server project. If you don’t have VS.NET, you can deploy the AdventureWorks assembly found in the AdventureWorks\bin folder.
If you need to execute just a DMX prediction query, consider using an SSAS stored procedure that returns IDataReader as a data source for your report. For example, an SSAS stored procedure can query a mining model and return the results using the AdomdDataReader.ExecuteReader method. The Sales Forecast report uses a CLR stored procedure for two main reasons. First, it needs the mining results as an ADO.NET dataset so it can combine the forecasted dataset with the historical dataset. As it stands, the ADOMD Server object library doesn’t expose the AdomdDataAdapter object and I wanted to avoid populating an ADO.NET table programmatically. Second, CLR stored procedures are more flexible because they are not restricted to using ADOMD Server only.
Creating an SQL Server project
The easiest way to work with a CLR stored procedure is to create an SQL Server project. An SQL Server project can include stored procedures, user-defined functions, user-defined types, and triggers written in .NET code. In addition, an SQL Server project automatically deploys the CLR objects to the database server and supports debugging.
Select the File --> New --> Project menu. Expand the Visual Basic project type and select Database.
In the Templates pane, choose SQL Server Project. Enter AdventureWorks as the project name. Choose the location where the project will be created and click OK.
Rename the Class1.vb file to Predictions. Rename the Class1 class in it to StoredProcedures.
Copy the CreateSessionModel function and paste it inside the StoredProcedures class and add the Shared modifier. That’s because, all CLR stored procedures must be shared (static in C#).
To tell BI Studio that the CreateSessionModel function is a CLR stored procedure, decorate it with the following attribute <Microsoft.SqlServer.Server.SqlProcedure()>.
The CreateSessionModel requires a reference to the AdomdClient object library. Much to my surprise, when I opened the Add References dialog in an SQL Server project, I found that the SQL Server tab lists only a few trusted assemblies and AdomdClient is not among them. To get AdomdClient to show up in Add References, you need to first register this assembly in the AdventureWorksDW database.
In Management Studio, connect to the SQL Server hosting the AdventureWorksDW database. Right-click on the AdventureWorksDW, select New Query to open a new query window and execute the following T-SQL command:
ALTER DATABASE [AdventureWorksDW] SET TRUSTWORTHY ONFigure 6. You need to register a non-trusted assembly explicitly before you can reference it from an SQL Server project.
This command tells the SQL Server to trust the AdventureWorksDW database and it is required because we need to elevate the permissions of the AdomdClient assembly to Unsafe.
Expand AdventureWorksDW database --> Programmability --> Assemblies.
Right-click on the Assemblies folder and choose New Assembly.
Browse to the location of Microsoft.AnalysisServices.AdomdClient.dll (default install location is C:\Program Files\Microsoft.NET\ADOMD.NET\90).
Change the permission set to Unsafe (Figure 6). Click OK to register AdomdClient.
Back to the AdventureWorks SQL Server project, choose Project --> Add Reference. In the Add References dialog, flip to the SQL Server tab, select the AdomdClient assembly and click OK to create a reference to it.
Build the project. If all is well, the project should compile successfully.
The last conversion task is to serialize the results of the stored procedure.
Returning the results
A CLR stored procedure cannot return tabular results as a DataTable object. Instead, the results must be streamed in a well-defined format using a connected pipe (SqlPipe object). Listing 2 demonstrates how this could be done.
Listing 2 A CLR stored procedure must pipe tabular results.
Dim results As DataTable = PopulatePredictedColumns(predictions.Tables(0)) results.Merge(cases) Dim record As SqlDataRecord = TableRowToSqlDataRecord(results) SqlContext.Pipe.SendResultsStart(record) For Each row As DataRow In results.Rows record.SetValues(row.ItemArray) SqlContext.Pipe.SendResultsRow(record) Next SqlContext.Pipe.SendResultsEnd()
First, the CLR stored procedure must inform the caller about the format of the returned results by sending back a metadata header (Pipe.SendResultsStart). The TableRowToSqlDataRecord (not shown in Listing 2 function is used to generate the header. Its task is to map the .NET data types to SQL Server data types. Next, CreateSessionModel loops through the table rows and calls Pipe.SendResultsRow to pipe each row. Finally, CreateSessionModel calls Pipe.SendResultsEnd to let the caller know that the end of the resultset is reached.
Testing CLR stored procedures
Testing CLR stored procedure is easy with BI Studio. Follow these steps to debug the CreateSessionModel stored procedure.
Open the AdventureWorks project properties and select the Database tab. Click the Browse button and choose the AdventureWorksDW database reference to deploy the project to the AdventureWorksDW database. If the AdventureWorksDW doesn’t exist, create a new database reference that points to the AdventureWorksDW database.
Back to the Solution Explorer, right-click on the AdventureWorks SQL Server project and choose Deploy. BI Studio registers the AdventureWorks assembly and all of its CLR objects in the AdventureWorksDW database.
Place a breakpoint inside the CLR stored procedure.
Press Ctrl+Alt+S to open the Server Explorer. Expand the AdventureWorksDW connection --> Stored Procedures. Right-click on the CreateSessionModel stored procedure and choose Step Into Stored Procedure (Figure 7).
In Run Stored Procedure dialog that follows, enter the parameter values, e.g. 2003 for the @Year parameter and 3 for the @NumberMonths parameter.
Once you’ve tested the CreateSessionModel procedure, you can use it to feed a report dataset.
Implementing Drillthrough Navigation
The Sales Forecast report demonstrates also the drillthrough navigation interactive feature that SSRS supports. Similar to a drillthrough UDM action, a drillthrough SSRS action can be used to generate a report that shows the details behind a report item. Once the Sales Forecast report is rendered, the user can click on any historical sales figure. This generates the Daily Product Sales report that displays the daily product history for the corresponding month and sales territory. For example, if the user clicks on the first cell ($41,227), the Daily Product Sales report will show the aggregated product sales in Europe broken down by day.
To implement a drillthrough action on the Amount textbox in the matrix region:
Right-click on the Amount textbox, choose Properties and flip to the Navigation tab.
In the Hyperlink panel, select the Jump to report option and select the Daily Product Sales report.
Click the Parameters button and match the parameters of the master report and the drillthrough report (Figure 8).
Initially, I was planning to demonstrate the Jump to URL option which gives more flexibility to construct the URL link. For example, assuming that you have an embedded function that sets the reportLink variable to contain the URL address of the Daily Product Sales report, the following URL navigation action would open the report in a new window and resize the window.
SSAS and SSRS are completing technologies that can be used together to implement end-to-end business intelligence solutions. In this article, I showed you how you can author smart reports that use data mining to display prediction results.
SQL Server CLR stored procedures open a new world for data retrieval, processing, and reporting. They can tap into the power of Microsoft .NET and encapsulate complex programming logic. Many data integration scenarios that required SSRS custom data extensions can be implemented much easier with SQL Server 2005 and CLR stored procedures.