Validating Data Mining Models

You can use the Mining Accuracy Chart tab of Data Mining Designer to validate the accuracy of the mining models in a mining structure and to compare their predictive ability. This is helpful when you are trying to choose the best algorithm for a task or to decide how to adjust the parameters of an individual algorithm.

Validation is an important step in the data mining process. Before you deploy your mining models into a production environment, you should know how well they perform against real data. For more information about how model validation fits into the larger data mining process, see Data Mining Concepts.

Validation Tools

The Mining Accuracy Chart tab provides the following tools for use in validating mining models:

  • Lift Chart
  • Classification Matrix

Lift Chart

A lift chart is created by plotting the results of prediction queries from a testing dataset against known values for the predictable column that exist in the dataset. The following diagram provides an example of this kind of chart.

Lift chart of target versus overall populations

The chart displays a line for the results of each mining model, together with two reference lines: one that represents the results of an ideal model, whose predictions are never wrong, and one that represents the results of a random guess. The line for your model typically falls somewhere between the ideal line and the random line. Any improvement over the random line is called lift; the more lift a model demonstrates, the more effective it is.
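The three lines can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the procedure that Data Mining Designer uses: it assumes each test case has a predicted probability of the target state (the hypothetical `scores` list) and a known actual value (the hypothetical `actual` list). It ranks cases from most to least likely, then reports what percentage of all target cases has been captured after each percentage of the population. The random line captures targets in proportion to the cases examined, and the ideal line captures every target case first.

```python
from typing import List, Tuple


def lift_curves(scores: List[float], actual: List[bool]) -> List[Tuple[float, float, float, float]]:
    """Return (population %, model %, random %, ideal %) after each ranked case.

    scores: hypothetical predicted probability of the target state for each test case.
    actual: whether each test case actually has the target state.
    """
    if len(scores) != len(actual):
        raise ValueError("scores and actual must have the same length")
    n = len(actual)
    total_targets = sum(actual)
    if n == 0 or total_targets == 0:
        raise ValueError("need at least one case and at least one target case")

    # Rank cases from most to least likely to be a target, according to the model.
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda p: p[0], reverse=True)]

    points = []
    captured = 0
    for i, is_target in enumerate(ranked, start=1):
        captured += is_target  # bool counts as 0 or 1
        population_pct = 100.0 * i / n
        model_pct = 100.0 * captured / total_targets
        # Random guess: targets are found in proportion to the cases examined.
        random_pct = population_pct
        # Ideal model: every target case is ranked ahead of every non-target case.
        ideal_pct = 100.0 * min(i, total_targets) / total_targets
        points.append((population_pct, model_pct, random_pct, ideal_pct))
    return points
```

With the hypothetical scores `[0.9, 0.8, 0.7, 0.4, 0.3, 0.1]` and actual values `[True, True, False, True, False, False]`, the model has found two of the three target cases (about 66.7%) after examining 50% of the population, whereas the random line has reached 50% and the ideal line 100%. The gap between the model value and the random value at each point is the lift.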

Lift charts that are built from continuous predictable attributes display a scatter plot instead of lines.

To create a lift chart, you need the following:

  • One or more trained mining models
  • An input dataset that contains a value for the predictable column
  • A mapping between the input data and the structure of the mining model

For More Information: Mining Accuracy Chart Tab How-to Topics, Column Mappings (Lift Chart), Lift Chart


Classification Matrix

The Classification Matrix tab provides another way to examine how accurately the mining models in a structure make predictions. A classification matrix compares the actual values in the testing dataset with the values that the mining model predicts. The matrix is a valuable tool because it shows not only how often the model predicted a value correctly, but also which other values the model most often predicted instead.
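The comparison can be sketched in Python by counting (actual, predicted) pairs into a matrix whose rows are actual values and whose columns are predicted values, the layout this topic uses. The `build_matrix` function and its inputs are hypothetical illustrations; Data Mining Designer builds the matrix for you from the results of the prediction queries.

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_matrix(actual: Sequence[str], predicted: Sequence[str],
                 states: List[str]) -> Dict[str, Dict[str, int]]:
    """Return matrix[actual_state][predicted_state] = number of test cases.

    Pairs whose values are not listed in `states` are ignored.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter(zip(actual, predicted))
    # Rows are actual states, columns are predicted states; unseen pairs count as 0.
    return {a: {p: counts[(a, p)] for p in states} for a in states}


# Hypothetical example data for six customers.
states = ["Bronze", "Silver", "Gold"]
actual = ["Bronze", "Bronze", "Silver", "Gold", "Silver", "Bronze"]
predicted = ["Bronze", "Silver", "Silver", "Silver", "Silver", "Bronze"]
matrix = build_matrix(actual, predicted, states)
for a in states:
    print(a, [matrix[a][p] for p in states])
```

For this made-up data, the Bronze row is `[2, 1, 0]`: two bronze customers were predicted correctly and one was predicted to be Silver.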

For example, consider a model that has been built to predict the type of member card that customers of a grocery store use. The cards come in three varieties: bronze, silver, and gold. The following table represents a classification matrix for a model that predicts member card values on a testing dataset in which the member card value is known.

                  Predicted Bronze   Predicted Silver   Predicted Gold
Actual Bronze     Correct            Error for Bronze   Error for Bronze
Actual Silver     Error for Silver   Correct            Error for Silver
Actual Gold       Error for Gold     Error for Gold     Correct

The values that run diagonally from the upper-left corner to the lower-right corner of the matrix are the number of cases that the model predicted correctly, that is, the cases where the predicted value matches the actual value in the testing dataset. Columns in the matrix represent the values that the model predicted, and rows represent the actual values of the attribute as they exist in the testing dataset.

For example, look at how the mining model predicted customers who had a bronze card. The value at the intersection of the Bronze row and the Bronze column is the number of customers in the testing dataset who had a bronze card and whom the model correctly predicted to have one. The value at the intersection of the Bronze row and the Silver column is the number of cases that were incorrectly predicted to be Silver when they were actually Bronze. The total number of cases incorrectly predicted to be Bronze is the sum of the Bronze column in the Silver and Gold rows. Conversely, the total number of bronze customers whom the model misclassified is the sum of the Silver and Gold columns in the Bronze row. The same analysis applies to the other card types.
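The row and column sums described above can be sketched in Python. This assumes the matrix is stored as a nested dictionary indexed as `matrix[actual][predicted]`; the function names and the card counts in the example are hypothetical and are not output from Analysis Services.

```python
from typing import Dict

Matrix = Dict[str, Dict[str, int]]  # matrix[actual][predicted] = number of cases


def correct(matrix: Matrix, state: str) -> int:
    """Diagonal cell: cases whose actual and predicted values are both `state`."""
    return matrix[state][state]


def wrongly_predicted_as(matrix: Matrix, state: str) -> int:
    """Off-diagonal sum of the `state` column: predicted `state`, actually something else."""
    return sum(row[state] for actual, row in matrix.items() if actual != state)


def misclassified(matrix: Matrix, state: str) -> int:
    """Off-diagonal sum of the `state` row: actually `state`, predicted something else."""
    return sum(n for predicted, n in matrix[state].items() if predicted != state)


def accuracy(matrix: Matrix) -> float:
    """Share of all test cases that fall on the diagonal."""
    total = sum(sum(row.values()) for row in matrix.values())
    return sum(matrix[s][s] for s in matrix) / total


# Hypothetical counts for the member card example.
cards: Matrix = {
    "Bronze": {"Bronze": 50, "Silver": 8, "Gold": 2},
    "Silver": {"Bronze": 5, "Silver": 30, "Gold": 5},
    "Gold": {"Bronze": 1, "Silver": 4, "Gold": 15},
}
```

With these made-up counts, 50 bronze customers are predicted correctly, 6 cases (5 + 1) are wrongly predicted to be Bronze, and 10 bronze customers (8 + 2) are misclassified. Overall, 95 of 120 cases lie on the diagonal.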

For More Information: Mining Accuracy Chart Tab How-to Topics, Column Mappings (Lift Chart), Classification Matrix


See Also

Concepts

Using the Data Mining Tools
Data Mining Concepts
Working with Data Mining

Other Resources

Mining Accuracy Chart Tab How-to Topics

Help and Information

Getting SQL Server 2005 Assistance