Time Series Predictions using Updated Data (Intermediate Data Mining Tutorial)
Applies To: SQL Server 2016 Preview
In this lesson, you will create a prediction query that adds the new sales data to the model. By extending the model with new data, you can get up-to-date predictions that include the newest data points.
Creating time series predictions that use new data is easy: you simply add the parameter EXTEND_MODEL_CASES to the PredictTimeSeries (DMX) function, specify the source of the new data, and specify how many predictions you want to get.
The parameter EXTEND_MODEL_CASES is optional; by default the model is extended any time that you create a time series prediction query by joining new data as inputs.
To build the prediction query and add new data
If the model is not already open, double-click the Forecasting structure, and in Data Mining Designer, click the Mining Model Prediction tab.
In the Mining Model pane, the model Forecasting should already be selected. If it is not selected, click Select Model, and then select the model, Forecasting.
In the Select Input Table(s) pane, click Select Case Table.
In the Select Table dialog box, select the data source, Adventure Works DW Multidimensional 2012 .
From the list of data source views, select NewSalesData and then click OK.
Right-click the surface of the design area and select Modify Connections.
Using the Modify Mapping dialog box, map the columns in the model to the columns in the external data as follows:
Map the ReportingDate column in the mining model to the NewDate column in the input data.
Map the Amount column in the mining model to the NewAmount column in the input data.
Map the Quantity column in the mining model to the NewQty column in the input data.
Map the ModelRegion column in the mining model to the Series column in the input data.
Now you will build the prediction query.
First, add a column to the prediction query to output the series the prediction applies to.
In the grid, click the first empty row, under Source, and then select Forecasting.
In the Field column, select Model Region and for Alias, type Model Region.
Next, add and edit the prediction function.
Click an empty row, and under Source, select Prediction Function.
For Field, select PredictTimeSeries.
For Alias, type Predicted Values.
Drag the field Quantity from the Mining Model pane into the Criteria/Argument column.
In the Criteria/Argument column, after the field name, type the following text: 5,EXTEND_MODEL_CASES
The complete text of the Criteria/Argument text box should be as follows: [Forecasting].[Quantity],5,EXTEND_MODEL_CASES
Click Results and review the results.
The predictions begin in July (the first time slice after the end of the original data) and end at November (the fifth time slice after the end of the original data).
You can see that to use this type of prediction query effectively, you need to know when the old data ends, as well as how many time slices there are in the new data.
For example, in this model, the original data series ended in June, and the data is for the months of July, August, and September.
Predictions that use EXTEND_MODEL_CASES always begin at the end of the original data series. Therefore, if you want to get only the predictions for the unknown months, you need to specify the starting point and the end point for prediction. Both values are specified as a number of time slices starting at the end of the old data.
The following procedure demonstrates how to do this.
Change the start and end points of the predictions
In Prediction Query Builder, click Query to switch to DMX view.
Locate the DMX statement that contains the PredictTimeSeries function and change it as follows:
Click Results and review the results.
Now the predictions begin at October (the fourth time slice, counting from the end of the original data) and end at December (the sixth time slice, counting from the end of the original data).