Cross-Validation Tab (Mining Accuracy Chart View)
Cross-validation lets you partition a mining structure into cross-sections and iteratively train and test models against each cross-section. You specify the number of folds to divide the data into; each fold is then used in turn as the test data, while the remaining folds are used to train a new model. Analysis Services then generates a set of standard accuracy metrics for each model. By comparing the metrics for the models that were generated for each cross-section, you can get a good idea of how reliable the mining model is for the whole data set.
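The fold rotation described above can be sketched as follows. This is an illustrative outline of generic k-fold partitioning, not Analysis Services internals; the function name and the contiguous-split strategy are assumptions for the example.

```python
# Sketch of k-fold cross-validation: split the cases into k folds, then use
# each fold in turn as test data while the remaining folds train a new model.
def k_fold_indices(n_cases, k):
    """Split case indices 0..n_cases-1 into k roughly equal, contiguous folds."""
    fold_size, remainder = divmod(n_cases, k)
    folds, start = [], 0
    for i in range(k):
        size = fold_size + (1 if i < remainder else 0)  # spread the remainder
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = k_fold_indices(10, 3)
for test_fold in folds:
    train = [idx for f in folds if f is not test_fold for idx in f]
    # train a new model on `train`, then compute accuracy metrics on `test_fold`
```

Every case appears in exactly one test fold, so each case is scored exactly once across the whole run.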
For more information, see Cross-Validation (Analysis Services - Data Mining).
Cross-validation cannot be used with models that were built by using the Microsoft Time Series algorithm or the Microsoft Sequence Clustering algorithm. If you run the report on a mining structure that contains these types of models, the models will not be included in the report.
To perform cross-validation, follow these steps:

1. Specify the number of folds.
2. Specify the maximum number of cases to use for cross-validation.
3. Specify the predictable column.
4. Optionally, specify a predictable state.
5. Optionally, set parameters that control how the accuracy of prediction is assessed.
6. Click Get Results to display the results of cross-validation.
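After the steps above complete, the report lists each accuracy metric once per fold. A common way to judge overall reliability is to compare the average of a metric with its spread across folds, which can be sketched as follows; the per-fold accuracy values here are hypothetical.

```python
import statistics

# Hypothetical per-fold classification accuracy from a cross-validation run.
fold_accuracy = [0.81, 0.79, 0.84, 0.80, 0.78]

mean_acc = statistics.mean(fold_accuracy)
stdev_acc = statistics.pstdev(fold_accuracy)  # spread of the metric across folds

# A high mean with a low standard deviation suggests the model generalizes
# consistently across the whole data set; a large spread suggests the model
# is sensitive to which cases it was trained on.
print(f"mean={mean_acc:.3f}, stdev={stdev_acc:.3f}")
```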
For more information about how to interpret the results of the cross-validation report, see Cross-Validation Report (Analysis Services - Data Mining).
You can control the standard for measuring prediction accuracy by setting a value for Target Threshold. A threshold represents a kind of accuracy bar. Each prediction is assigned a probability that the predicted value is correct. Therefore, if you set the Target Threshold value closer to 1, you require the probability for any particular prediction to be fairly high before it is counted as a good prediction. Conversely, if you set Target Threshold closer to 0, even predictions with low probability values are counted as "good" predictions.
There is no recommended threshold value because the probability of any prediction depends on the amount of data and the type of prediction you are making. You should review some predictions at different probability levels to determine an appropriate accuracy bar for your data. It is important that you do this, because the value that you set for Target Threshold affects the measured accuracy of the model.
For example, suppose three predictions are made for a particular target state, and the probabilities of each prediction are 0.05, 0.15, and 0.8. If you set the threshold to 0.5, only one prediction is counted as being correct. If you set Target Threshold to 0.10, two predictions are counted as being correct.
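The arithmetic in this example can be checked with a short sketch. The counting rule (a prediction passes the bar when its probability meets the threshold) is taken from the description above; the function name is an assumption for the example.

```python
# The three example prediction probabilities from the text.
probabilities = [0.05, 0.15, 0.8]

def correct_count(probs, threshold):
    """Count predictions whose probability meets the Target Threshold bar."""
    return sum(1 for p in probs if p >= threshold)

print(correct_count(probabilities, 0.5))   # only the 0.8 prediction passes
print(correct_count(probabilities, 0.10))  # 0.15 and 0.8 both pass
```

Lowering the threshold from 0.5 to 0.10 moves the count of "good" predictions from one to two, which is why the threshold you choose directly changes the measured accuracy of the model.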
When Target Threshold is set to null, which is the default value, the most probable prediction for each case is counted as correct. In the example just cited, 0.05, 0.15, and 0.8 are the probabilities for predictions in three different cases. Although the probabilities are very different, each prediction would be counted as correct, because each case generates only one prediction and these are the best predictions for these cases.
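The null-threshold behavior can be sketched as follows. The cases and their state distributions are hypothetical, constructed so that each case's top prediction mirrors the 0.05, 0.15, and 0.8 probabilities from the example; the remaining probability mass belongs to other states omitted here.

```python
def best_prediction(state_probs):
    """Return (state, probability) for a case's most probable target state."""
    state = max(state_probs, key=state_probs.get)
    return state, state_probs[state]

# Three hypothetical cases; each maps target states to probabilities.
case_a = {"Yes": 0.05, "No": 0.04}
case_b = {"Yes": 0.15, "No": 0.10}
case_c = {"Yes": 0.8,  "No": 0.2}

# With Target Threshold null, each case contributes its single best
# prediction, and that prediction counts with no minimum probability bar.
counted = [best_prediction(c) for c in (case_a, case_b, case_c)]
print(counted)
```

Even though the top probability in the first case is only 0.05, it is still the best prediction that case can produce, so it is counted the same as the 0.8 prediction.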