Adding an Aggregated Forecasting Model (Intermediate Data Mining Tutorial)

You saw from exploring the forecasting model that although sales in most of the regions follow a pattern, some regions and some models, such as the M200 model in the Pacific region, exhibit very different trends. Differences among regions are common and can be caused by many factors, including marketing promotions, inaccurate reporting, or geopolitical events.

To minimize the effect of these factors on projections, you decide to build a generalized mining model that is based on aggregated measures of worldwide sales. You can then make predictions with this model and apply them to individual regions. Finally, you will compare the predictions created by the various products.

Creating the Data for the General Model

The first step in creating the generalized model is to aggregate the worldwide sales data. You do this by creating a special data source view that uses an existing data source but performs calculations such as sums or averages.

To create a data source view using a custom query or calculation

  1. In Solution Explorer, right-click Data Source Views, and select New Data Source View.

  2. On the welcome page of the wizard, click Next.

  3. On the Select Data Source page, select Adventure Works DW2008, then click Next.

  4. In the page, Select Tables and Views, click Next.

  5. On the page, Completing the Wizard, type the name AllRegions, then click Finish.

  6. Next, right-click the blank data source view design surface, and select New Named Query.

  7. In the Create Named Query dialog box, for Name, type AllRegions, and for Description, type Sum and average of sales for all models and regions.

  8. In the SQL text pane, type the following statement:

    SELECT ReportingDate, 
    SUM([Quantity]) as SumQty, AVG([Quantity]) as AvgQty,
    SUM([Amount]) AS SumAmt, AVG([Amount]) AS AvgAmt,
    'All Regions' as [Region]
    FROM dbo.vTimeSeries 
    GROUP BY ReportingDate
    
  9. Click OK.

  10. Right-click the AllRegions table and select Explore Data.

    The new data source view contains both a sum and an average for sales of all products, worldwide. You could also group and aggregate sales by model, but for the purpose of this tutorial you will create a single time series model that you can use for prediction with any combination of region or product.

After you have created the new view of the data, you must build a new mining structure and then build a mining model based on that structure. By now you should be familiar with how to build a mining structure. Therefore, the following directions are simplified.

To build a mining structure and mining model using the aggregated data

  1. In Solution Explorer, right click Mining Structures, and select New Mining Structure to start the Data Mining Wizard.

  2. In the Data Mining Wizard, make the following selections:

    • Algorithm: Microsoft Time Series

    • Data source view: AllRegions

    • Key: ReportingDate (Key Time)and Region (Key)

    • Input and Predict: AvgAmt, AvgQty, SumAmt, SumQty,

    • Mining structure name: All Regions

    • Mining model name: All Regions

  3. Process the structure and the model.

Viewing the Results

Before you decide which model to apply as the general model for making worldwide projections, you should understand the predictions better. When you look at the mining models and predictions for the various aggregated series in the Microsoft Time Series Viewer, several questions come to mind:

  • Until June 2002, each of the trend lines follows nearly the same pattern. At that point, the lines for quantity and amount diverge. What caused the change?

  • At July 2004, the lines diverge again. What happened?

  • Remember that the predictions for the M200 North America series were much higher than for other products and regions. You are concerned that these projections might be incorrect, and that incorporating this series might affect the general model that you created.

In the next task, you will compare the trend lines and predictions for the individual series models versus the model based on aggregated data, to see how the underlying data affects the model.

If you are satisfied that the model is correct, and you do not need to understand the results better, you can skip to the task, Predicting Using the Averaged Forecasting Model (Intermediate Data Mining Tutorial).