Predicting Using the Averaged Forecasting Model (Intermediate Data Mining Tutorial)

In an earlier task in this lesson, you used the Prediction Query Builder to create a query that generates the default predictions for the Forecasting model. In this task, you will learn to create a prediction query that applies the worldwide sales model to one of the individual regions.

Using Replacement Data in a Time Series Prediction

In SQL Server 2008, the PredictTimeSeries (DMX) function has two new parameters, EXTEND_MODEL_CASES and REPLACE_MODEL_CASES, which enable you to customize time series predictions.

  • EXTEND_MODEL_CASES allows you to add data to an existing model and make predictions based on the new data.

  • REPLACE_MODEL_CASES allows you to replace the data series that was used to create the model, and use data from another series.

In this scenario, you will use REPLACE_MODEL_CASES. However, you must first set up a data source view that contains only the replacement data. Replacement data is the data used for each individual series. After you have created the replacement data source view, you can then create a prediction query that applies the general model to the replacement data.

To create the data source view that contains the replacement data

  1. In Solution Explorer, right-click Data Source Views, and then select New Data Source View.

  2. In the Data Source View wizard, make the following selections:

    Data Source: Adventure Works DW2008R2

    Select Tables and Views: Do not select any tables

    Name: T1000 Pacific Region

  3. Click Finish.

  4. Right-click the empty design surface for T1000 Pacific Region.dsv, and then select New Named Query.

    The Create Named Query dialog box appears. Retype the name, and then add the following description:

    Name: T1000 Pacific Region

    Description: Filter vTimeSeries by region and model

  5. In the text pane, type the following query:

    SELECT ReportingDate, ModelRegion, Quantity, Amount
    FROM dbo.vTimeSeries
    WHERE (ModelRegion = N'T1000 Pacific')
    

    Note

    You can also build the query by using the Query Builder, but it is faster to type the text of the query. However, after you have typed the query text, you can click another pane to see how the query text is parsed and the query is built from the supporting tables. You might also want to copy the query text and save it to a text file so that you can re-use it for another data series.

  6. Click OK..

  7. In the Data Source View design surface, right-click T1000 Pacific, and then select Explore Data to verify that the data is filtered correctly.

Now, you will build a time series prediction query as usual, but add the REPLACE_MODEL_CASES parameter to specify that the data series on which the prediction is based should be replaced with the new data that you provide. You must also specify the mapping between the mining model and the input table.

To build the prediction query and provide replacement data

  1. If the model is not already open, double-click the AllRegions structure, and in Data Mining Designer, click the Mining Model Prediction tab.

  2. In the Mining Model pane, the model AllRegions should already be selected. If it is not selected, click Select Model, and then select the model, AllRegions.

  3. In the Select Input Table(s) pane, click Select Case Table.

  4. In the Select Table dialog box, change the data source to T1000 Pacific Region, and then click OK.

    The data source view that you created appears as a table with columns. Some columns might be already mapped to the mining model columns.

  5. Right-click the join line between the mining model and the input data and select Modify Connections.

  6. In the Modify Mapping dialog box, verify that the ReportingDate column in the mining model is mapped to the ReportingDate column in the input data.

  7. In the Modify Mapping dialog box, in the row for AvgQty, click under Table Column and then select T1000 Pacific.Quantity. Click OK.

    This step maps the aggregated quantity column in the general model to the quantity column in the replacement data series.

  8. In the grid, click the first empty row, under Source, and then select AllRegions. In the Field column, select Region and in the Alias column, type Model Used.

    This step adds a column to the results that you can refer to for the predictions from the general model.

  9. Click an empty row, and under Source, select Custom Expression. In the Alias column, type ModelRegion. In the Criteria/Argument column, type 'T1000 Pacific'.

    This step adds a label to the results, so that you can see which series the prediction is for.

  10. Click an empty row, and under Source, select Prediction Function. In the Field column, select PredictTimeSeries. In the Alias column, type Predicted Values.

  11. Drag the field AvgQty from the Mining Model pane into the Criteria/Argument column by using the drag and drop operation.

  12. In the Criteria/Argument column, after the field name, type the following text: ,5, REPLACE_MODEL_CASES

    The complete text of the Criteria/Argument text box should be as follows: [AllRegions].[AvgQty],5,REPLACE_MODEL_CASES

  13. Click Results.

Viewing the Results

The prediction query returns results similar to the following table:

Model Used

ModelRegion

Predicted Quantity

All Regions

T-1000 Pacific

$TIMEAvg Qty
7/25/2008 12:00:00 AM68
8/25/2008 12:00:00 AM52
9/25/2008 12:00:00 AM48
10/25/2008 12:00:00 AM56
11/25/2008 12:00:00 AM44

To apply the general model to a different data series, such as the T1000 product model in the North America region, you must create a different query for each series. However, rather than starting the process again, you can edit the DMX statement that you created, and filter the inputs differently. For example, the following DMX statement represents the query that you just built:

SELECT
  ([All Regions].[Region]) as [Model Used],
  ( 'T-1000 Pacific') as [ModelRegion],
  (PredictTimeSeries([All Regions].[Avg Qty],5, REPLACE_MODEL_CASES)) as [Predicted Quantity]
FROM
  [All Regions]
PREDICTION JOIN
  OPENQUERY([Adventure Works DW2008R2],
    'SELECT
      [ReportingDate]
    FROM
      (SELECT        ReportingDate, ModelRegion, Quantity, Amount
FROM            dbo.vTimeSeries
WHERE        (ModelRegion = N''T1000 Pacific'')) as [T1000 Pacific]
    ') AS t
ON
  [All Regions].[Reporting Date] = t.[ReportingDate] AND
   [All Regions].[Avg Qty] = t.[Quantity]

To apply this to a different model, you can edit the query statement to replace the filter condition and the labels applied to each result. For example, if you change the filter conditions and column labels by replacing 'Pacific' with 'North America', you will get predictions for the T1000 product in North America, based on the patterns in the general model.

Model Used

ModelRegion

Predicted Quantity

All Regions

T-1000 North America

$TIMEAvg Qty
7/25/2008 12:00:00 AM103
8/25/2008 12:00:00 AM84
9/25/2008 12:00:00 AM79
10/25/2008 12:00:00 AM85
11/25/2008 12:00:00 AM68