Lift Chart (Analysis Services - Data Mining)
You can view different types of charts in the Lift Chart tab of the Mining Accuracy Chart tab of Data Mining Designer, depending on the model that you select, the predictable attribute in the model, and other settings.
If your model predicts a discrete value, you can create a lift chart or profit chart. A lift chart compares the accuracy of the predictions of each model, and can be configured to show accuracy for predictions in general, or for predictions of a specific value. A profit chart is a related chart type that contains the same information as a lift chart, but also displays the projected increase in profit that is associated with using each model. Use the Chart Type list to select the type of chart you want.
Note You cannot display time series models in a lift chart or profit chart, but you can view a chart that contains both the historical series and predictions based on the series by using the Mining Model Prediction tab. For more information, see Microsoft Time Series Algorithm.
The Lift Chart tab displays a graphical representation of the change in lift that a mining model causes. For example, the marketing department at Adventure Works Cycles wants to create a targeted mailing campaign. From past campaigns, they know that a 10 percent response rate is typical. They have a list of 10,000 potential customers stored in a table in the database. Therefore, based on the typical response rate, they can expect 1,000 of the potential customers to respond.
However, the money budgeted for the project is not enough to reach all 10,000 customers in the database. Based on the budget, they can afford to mail an advertisement to only 5,000 customers. The marketing department has two choices:
Randomly select 5,000 customers to target
Use a mining model to target the 5,000 customers who are most likely to respond
If the company randomly selects 5,000 customers, they can expect to receive only 500 responses, based on the typical response rate. This scenario is what the random line in the lift chart represents. However, if the marketing department uses a mining model to target their mailing, they can expect a larger response rate because they can target those customers who are most likely to respond. If the model is perfect, it means that the model creates predictions that are never wrong, and the company could expect to receive 1,000 responses by mailing to the 1,000 potential customers recommended by the model. This scenario is what the ideal line in the lift chart represents. The reality is that the mining model most likely falls between these two extremes; between a random guess and a perfect prediction. Any improvement from the random guess is considered to be lift.
You can create two types of lift charts: one in which you specify a target value for the predictable column, and one in which you do not specify the value. When you switch between the Input Selection tab and the Lift Chart tab, the chart is updated to reflect any changes that you made in the column mappings or other settings.
Lift Chart with Target Value
The following chart shows a lift chart for the Targeted Mailing model that you create in the Basic Data Mining Tutorial. In this chart, the target attribute is [Bike Buyer] and the target value is 1, meaning that the customer purchased a bike or is likely to do so. The lift chart thus shows the improvement the model provides when identifying customers who are likely to buy a bike.
In addition to the basic model, the chart includes a related model that has been filtered to target specific customers. You can add multiple models to a lift chart, as long as the models all have the same predictable attribute. This filter restricts the cases used in both training and evaluation to customers who are under the age of 30. As a result, the number of cases that the model is evaluated against differs for the basic model and the filtered model. This point is important to remember when you interpret the prediction results and other statistics.
The x-axis of the chart represents the percentage of the test dataset that is used to compare the predictions. The y-axis of the chart represents the percentage of predicted values.
The diagonal straight line, shown here in blue, appears in every chart. It represents the results of random guessing, and is the baseline against which to evaluate lift. For each model that you add to a lift chart, you get two additional lines: one line shows the ideal results for the training data set if you could create a model that always predicted perfectly, and the second line shows the actual lift, or improvement in results, for the model.
In this example, the ideal line for the filtered model is shown in dark blue, and the line for actual lift in yellow. You can tell from the chart that the ideal line peaks at around 40 percent, meaning that if you had a perfect model, you could reach 100 percent of your targeted customers by sending a mailing to only 40% of the total population. The actual lift for the filtered model when you target 40 percent of the population is between 60 and 70 percent, meaning you could reach 60-70 percent of your targeted customers by sending the mailing to 40 percent of the total customer population.
The Mining Legend contains the actual values at any point on the curves. You can change the place that is measured by clicking the vertical gray bar and moving it. In the chart, the gray line has been moved to 30 percent, because this is the point where both the filtered and unfiltered models appear to be most effective, and after this point the amount of lift declines.
The Mining Legend also contains scores and statistics that help you interpret the chart. These results represent the accuracy of the model at the gray line, which in this scenario is positioned to include 30 percent of the overall test cases.
Targeted mailing all
Targeted mailing under 30
Random guess model
Ideal model for: Targeted mailing all
Ideal model for: Targeted mailing under 30
From these results, you can see that, when measured at 30 percent of all cases, the general model (Targeted mailing all) can predict the bike buying behavior of 47.40% of the target population. In other words, if you sent out a targeted mailing to only 30 percent of the customers in your database, you could reach slightly less than half of your target audience. If you used the filtered model, you could reach about 51 percent of your targeted customers.
The value for Predict probability represents the threshold required to include a customer among the "likely to buy" cases. For each case, the model estimates the accuracy of each prediction and stores that value, which you can use to filter out or to target customers. For example, to identify the customers from the basic model who are likely buyers, you would use a query to retrieve cases with a Predict probability of at least 61 percent. To get the customers targeted by the filtered model, you would create query that retrieved cases that met all the criteria: age and a PredictProbability value of at least 46 percent.
It is interesting to compare the models. The filtered model appears to capture more potential customers, but when you target customers with a prediction probability score of 46 percent, you also have a 53 percent chance of sending a mailing to someone who will not buy a bike. Therefore, if you were deciding which model is better, you would want to balance the greater precision and smaller target size of the filtered model against the selectiveness of the basic model.
The value for Score helps you compare models by calculating the effectiveness of the model across a normalized population. A higher score is better, so in this case you might decide that targeting customers under 30 is the most effective strategy, despite the lower prediction probability.
Lift Chart for Model with No Target Value
If you do not specify the state of the predictable column, you create the type of chart shown in the following diagram. This chart shows how the model performs for all states of the predictable attribute. For example, this chart would tell you how well the model predicts both customers who are likely to buy a bike, and those who are unlikely to buy a bike.
The x-axis is the same as in the chart with the predictable column specified, but the y-axis now represents the percentage of predictions that are correct. Therefore, the ideal line is the diagonal line, which shows that at 50 percent of the data, the model correctly predicts 50% of the cases, the maximum that can be expected.
You can click in the chart to move the vertical gray bar, and the Mining Legend displays the percentage of cases overall, and the percentage of cases that were predicted correctly. For example, if you position the gray slider bar at the 50 percent mark, the Mining Legend displays the following accuracy scores. These figures are based on the TM_Decision Tree model created in the Basic Data Mining Tutorial.
This table tells you that, at 50 percent of the population, the model that you created correctly predicts 40 percent of the cases. You might consider this a reasonably accurate model. However, remember that this particular model predicts all values of the predictable attribute. Therefore, the model might be accurate in predicting that 90 percent of customers will not buy a bike.
The prediction accuracy for all discrete values of the predictable attribute is shown in a single line. If you want to see prediction accuracy lines for any individual value of the predictable attribute, you must create a separate lift chart for that value.
Creating a Lift Chart
The Basic Data Mining Tutorial includes a walkthrough of how to create a lift chart for the Targeted Mailing model. For more information, see Testing Accuracy with Lift Charts (Basic Data Mining Tutorial).
For a step-by-step procedure that applies to all chart types, see How to: Create an Accuracy Chart for a Mining Model.