Testing Accuracy with Lift Charts (Basic Data Mining Tutorial)
On the Mining Accuracy Chart tab of Data Mining Designer, you can calculate how well each of your models makes predictions, and compare the results of each model directly against the results of the other models. The chart used for this comparison is called a lift chart. Typically, the predictive accuracy of a mining model is measured by either lift or classification accuracy. This tutorial uses only the lift chart. For more information about lift charts and other accuracy charts, see Tools for Charting Model Accuracy (Analysis Services - Data Mining).
In this topic, you will perform the following tasks: select the data set to use for testing, and then show the lift of the models.
The first step in testing the accuracy of your mining models is to select the data source that you will use for testing. You will first test how well the models perform against your testing data, and then use them with external data.
To select the data set
Switch to the Mining Accuracy Chart tab in Data Mining Designer in SQL Server Data Tools (SSDT) and select the Input Selection tab.
In the Select data set to be used for Accuracy Chart group box, select Use mining structure test cases to test your models by using the testing data that you set aside when you created the mining structure.
For more information on the other options, see Choose an Accuracy Chart Type and Set Chart Options.
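The idea behind the mining structure test cases can be illustrated with a minimal Python sketch of a holdout split. This is not how Analysis Services partitions data internally; the function name `holdout_split`, the 30 percent test fraction, and the seed are assumptions chosen for illustration.

```python
import random

def holdout_split(cases, test_fraction=0.3, seed=42):
    """Shuffle the cases reproducibly and reserve a fraction for testing.

    test_fraction and seed are illustrative assumptions, not values
    taken from the tutorial.
    """
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled cases become the test set; the rest train the models.
    return shuffled[n_test:], shuffled[:n_test]

# Ten placeholder case IDs stand in for the rows of a case table.
train, test = holdout_split(range(10))
print(len(train), len(test))
```

Because the test cases never take part in training, accuracy measured on them gives a fairer estimate of how the models behave on new data.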
The next step is to select the models that you want to include in the lift chart, the predictable column against which to compare the models, and the value to predict.
The mining model columns in the Predictable Column Name list are restricted to columns that have the usage type set to Predict or Predict Only and have a content type of Discrete or Discretized.
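The restriction described above amounts to a simple filter on usage and content type. The following Python sketch shows that rule under stated assumptions: the `ModelColumn` class and the `chartable_columns` function are hypothetical, and the usage and content types assigned to the Age and Yearly Income columns are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class ModelColumn:
    name: str
    usage: str         # for example "Input", "Predict", or "Predict Only"
    content_type: str  # for example "Discrete", "Discretized", or "Continuous"

def chartable_columns(columns):
    """Return names of columns that satisfy both conditions from the text."""
    return [
        c.name
        for c in columns
        if c.usage in {"Predict", "Predict Only"}
        and c.content_type in {"Discrete", "Discretized"}
    ]

# Hypothetical column definitions, used only to exercise the filter.
columns = [
    ModelColumn("Bike Buyer", "Predict", "Discrete"),
    ModelColumn("Age", "Input", "Continuous"),
    ModelColumn("Yearly Income", "Predict", "Continuous"),
]
print(chartable_columns(columns))  # ['Bike Buyer']
```

In this sketch, a predictable column with a continuous content type is excluded, just as an input column is.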
To show the lift of the models
On the Input Selection tab of Data Mining Designer, under Select predictable mining model columns to show in the lift chart, select the Synchronize Prediction Columns and Values check box.
In the Predictable Column Name column, verify that Bike Buyer is selected for each model.
In the Show column, select each of the models.
By default, all the models in the mining structure are selected. You can decide not to include a model, but for this tutorial leave all the models selected.
In the Predict Value column, select 1. The same value is automatically filled in for each model that has the same predictable column.
Select the Lift Chart tab to display the lift chart.
When you click the tab, a prediction query runs against the server and database for the mining structure and the input table or test data. The results are plotted on the graph.
When you specify a Predict Value, the lift chart plots a Random Guess Model as well as an Ideal Model. The mining models you created will fall between these two extremes: a random guess and a perfect prediction. Any improvement over the random guess is considered to be lift.
Use the legend to locate the colored lines representing the Ideal Model and the Random Guess Model.
You'll notice that the TM_Decision_Tree model provides the greatest lift, outperforming both the Clustering and Naive Bayes models.
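The relationship between the Random Guess Model, the Ideal Model, and a mining model's curve can be sketched in Python. This is a simplified illustration, not the query that Analysis Services runs: the function names, the scores, and the actual outcomes are all invented for the example. Each curve gives the fraction of target cases (here, value 1) captured after contacting a growing share of the population.

```python
def gains_curve(scores, actuals, target=1):
    """Fraction of target cases captured when cases are ranked by score."""
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    total = sum(1 for _, actual in ranked if actual == target)
    if total == 0:
        raise ValueError("no cases with the target value")
    captured, curve = 0, []
    for _, actual in ranked:
        captured += int(actual == target)
        curve.append(captured / total)
    return curve

def random_guess_curve(n):
    """A random ordering captures targets in proportion to the population seen."""
    return [(i + 1) / n for i in range(n)]

def ideal_curve(actuals, target=1):
    """A perfect model ranks every target case ahead of all other cases."""
    total = sum(1 for actual in actuals if actual == target)
    return [min(i + 1, total) / total for i in range(len(actuals))]

# Invented data: predicted probability of buying, and whether the case bought.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

model = gains_curve(scores, actuals)
baseline = random_guess_curve(len(actuals))
ideal = ideal_curve(actuals)

for m, b, i in zip(model, baseline, ideal):
    print(f"random={b:.2f}  model={m:.2f}  ideal={i:.2f}")
```

At every point in this example the model's curve lies on or above the random-guess line and on or below the ideal line. The vertical gap above the random-guess line is the lift.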
For an in-depth explanation of a lift chart similar to the one created in this lesson, see Lift Chart (Analysis Services - Data Mining).