Microsoft Decision Trees Algorithm

The Microsoft Decision Trees algorithm is a classification and regression algorithm provided by Microsoft SQL Server 2005 Analysis Services (SSAS) for use in predictive modeling of both discrete and continuous attributes.

For discrete attributes, the algorithm makes predictions based on the relationships between input columns in a dataset. It uses the values, or states, of those columns to predict the states of a column that you designate as predictable. Specifically, the algorithm identifies the input columns that are correlated with the predictable column. For example, in a scenario to predict which customers are likely to purchase a bicycle, if nine out of ten younger customers buy a bicycle, but only two out of ten older customers do so, the algorithm infers that age is a good predictor of bicycle purchase. The decision tree makes predictions based on this tendency toward a particular outcome.

For continuous attributes, the algorithm uses linear regression to determine where a decision tree splits.

If more than one column is set to predictable, or if the input data contains a nested table that is set to predictable, the algorithm builds a separate decision tree for each predictable column.
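
For example, in the following hypothetical DMX definition (the model and column names are invented for illustration), two columns are marked PREDICT, so the algorithm would build one tree for [BikeBuyer] and a separate tree for [HelmetBuyer]:

    CREATE MINING MODEL [CustomerTrees]
    (
        [CustomerKey]  LONG KEY,
        [Age]          LONG CONTINUOUS,
        [Region]       TEXT DISCRETE,
        [BikeBuyer]    LONG DISCRETE PREDICT,  -- first tree is built for this column
        [HelmetBuyer]  LONG DISCRETE PREDICT   -- a second tree is built for this column
    )
    USING Microsoft_Decision_Trees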

Example

The marketing department of the Adventure Works Cycles company wants to identify characteristics of previous customers that might indicate whether those customers are likely to buy a product in the future. The AdventureWorks database stores demographic information that describes previous customers. By using the Microsoft Decision Trees algorithm to analyze this information, the marketing department can build a model that predicts whether a particular customer will purchase products, based on the states of known columns about that customer, such as demographics or past buying patterns.
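
As a sketch of how such a model might be defined and trained in DMX (the column list, the dbo.vTargetMail source view, and the [Adventure Works DW] data source name are assumptions for illustration; the actual AdventureWorks schema may differ):

    -- Define the model; BikeBuyer is the predictable column.
    CREATE MINING MODEL [TM_DecisionTree]
    (
        [CustomerKey]      LONG KEY,
        [Age]              LONG CONTINUOUS,
        [CommuteDistance]  TEXT DISCRETE,
        [YearlyIncome]     DOUBLE CONTINUOUS,
        [BikeBuyer]        LONG DISCRETE PREDICT
    )
    USING Microsoft_Decision_Trees

    -- Train the model from the relational source.
    INSERT INTO [TM_DecisionTree]
        ([CustomerKey], [Age], [CommuteDistance], [YearlyIncome], [BikeBuyer])
    OPENQUERY([Adventure Works DW],
        'SELECT CustomerKey, Age, CommuteDistance, YearlyIncome, BikeBuyer
         FROM dbo.vTargetMail')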

How the Algorithm Works

The Microsoft Decision Trees algorithm builds a data mining model by creating a series of splits, also called nodes, in the tree. The algorithm adds a node to the model every time an input column is found to be significantly correlated with the predictable column. The way that the algorithm determines a split differs depending on whether it is predicting a continuous column or a discrete column. For a more detailed explanation of how the Microsoft Decision Trees algorithm works with discrete predictable columns, see Scalable Classification over SQL Databases and Learning Bayesian Networks: The Combination of Knowledge and Statistical Data. For more information about how the Microsoft Decision Trees algorithm works with a continuous predictable column, see the appendix of Autoregressive Tree Models for Time-Series Analysis.

Predicting Discrete Columns

The way that the Microsoft Decision Trees algorithm builds a tree for a discrete predictable column can be demonstrated by using a histogram. The following diagram shows a histogram that plots a predictable column, Bike Buyers, against an input column, Age. The histogram shows that the age of a person helps distinguish whether that person will purchase a bicycle.

[Figure: Histogram from the Microsoft Decision Trees algorithm]

The correlation that is shown in the diagram would cause the Microsoft Decision Trees algorithm to create a new node in the model.

[Figure: Decision tree node]

As the algorithm adds new nodes to a model, a tree structure is formed. The top node of the tree describes the breakdown of the predictable column for the overall population of customers. As the model continues to grow, the algorithm considers all input columns as candidates for additional splits.
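
The nodes of a trained tree can be examined with a DMX content query. The following sketch runs against the hypothetical [TM_DecisionTree] model from the earlier example; each returned row describes one node, including the number of training cases (NODE_SUPPORT) that reached it:

    SELECT NODE_UNIQUE_NAME, NODE_CAPTION, NODE_SUPPORT
    FROM [TM_DecisionTree].CONTENT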

Predicting Continuous Columns

When the Microsoft Decision Trees algorithm builds a tree based on a continuous predictable column, each node contains a regression formula. A split occurs at a point of non-linearity in the regression formula. For example, consider the following diagram.

[Figure: Multiple regression lines showing non-linearity]

The diagram contains data that can be modeled either by a single line or by two connected lines. A single line, however, would represent the data poorly, whereas two connected lines approximate it much more closely. The point where the two lines come together is the point of non-linearity, and it is the point where a node in a decision tree model would split. For example, the node that corresponds to the point of non-linearity in the previous graph could be represented by the following diagram. The two equations represent the regression equations for the two lines.

[Figure: Equation that represents a point of non-linearity]
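
To build a regression tree of this kind, you declare the predictable column as CONTINUOUS; continuous inputs that should be candidates for the leaf regression formulas can carry the REGRESSOR flag. A minimal DMX sketch, with invented names:

    CREATE MINING MODEL [IncomeTree]
    (
        [CustomerKey]   LONG KEY,
        [Age]           LONG CONTINUOUS REGRESSOR,  -- candidate term in leaf regression formulas
        [Occupation]    TEXT DISCRETE,
        [YearlyIncome]  DOUBLE CONTINUOUS PREDICT   -- continuous predictable column
    )
    USING Microsoft_Decision_Trees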

Using the Algorithm

A decision tree model must contain a key column, input columns, and one predictable column.

The Microsoft Decision Trees algorithm supports the input column content types, predictable column content types, and modeling flags that are listed below.

  • Input column content types: Continuous, Cyclical, Discrete, Discretized, Key, Table, and Ordered
  • Predictable column content types: Continuous, Cyclical, Discrete, Discretized, Table, and Ordered
  • Modeling flags: MODEL_EXISTENCE_ONLY, NOT NULL, and REGRESSOR (a sketch of how these flags are applied follows this list)
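
Modeling flags are attached to individual column definitions in DMX. The following is a minimal, hypothetical sketch with invented names; the comments state the intended effect of each flag:

    CREATE MINING MODEL [FlagDemo]
    (
        [CustomerKey]   LONG KEY,
        [Age]           LONG CONTINUOUS NOT NULL,            -- NULL values in this column raise an error during training
        [HomeOwner]     TEXT DISCRETE MODEL_EXISTENCE_ONLY,  -- only the presence or absence of a value is modeled
        [YearlyIncome]  DOUBLE CONTINUOUS REGRESSOR,         -- candidate regressor for continuous predictions
        [BikeBuyer]     LONG DISCRETE PREDICT
    )
    USING Microsoft_Decision_Trees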

All Microsoft algorithms support a common set of functions. However, the Microsoft Decision Trees algorithm also supports the following functions:

  • IsDescendant
  • IsInNode
  • PredictAdjustedProbability
  • PredictAssociation
  • PredictHistogram
  • PredictNodeId
  • PredictProbability
  • PredictStdev
  • PredictSupport
  • PredictVariance

For a list of the functions that are common to all Microsoft algorithms, see Data Mining Algorithms. For more information about how to use these functions, see Data Mining Extensions (DMX) Function Reference.
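
For example, the following DMX prediction query, sketched against the hypothetical [TM_DecisionTree] model defined earlier (the dbo.ProspectiveBuyer table and the [Adventure Works DW] data source are assumptions), combines several of these functions:

    SELECT
        t.[CustomerKey],
        Predict([BikeBuyer])             AS PredictedBuyer,    -- most likely state
        PredictProbability([BikeBuyer])  AS BuyerProbability,  -- probability of the predicted state
        PredictSupport([BikeBuyer])      AS SupportingCases,   -- training cases behind the prediction
        PredictNodeId([BikeBuyer])       AS LeafNode           -- tree node used for the prediction
    FROM [TM_DecisionTree]
    PREDICTION JOIN
        OPENQUERY([Adventure Works DW],
            'SELECT CustomerKey, Age, CommuteDistance, YearlyIncome
             FROM dbo.ProspectiveBuyer') AS t
    ON  [TM_DecisionTree].[Age] = t.[Age]
    AND [TM_DecisionTree].[CommuteDistance] = t.[CommuteDistance]
    AND [TM_DecisionTree].[YearlyIncome] = t.[YearlyIncome]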

The Microsoft Decision Trees algorithm supports using the Predictive Model Markup Language (PMML) to create mining models.

The Microsoft Decision Trees algorithm supports several parameters that affect the performance and accuracy of the resulting mining model. Each parameter is described below.

MAXIMUM_INPUT_ATTRIBUTES

Defines the number of input attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255.

MAXIMUM_OUTPUT_ATTRIBUTES

Defines the number of output attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255.

SCORE_METHOD

Determines the method that is used to calculate the split score. Available options: Entropy (1), Bayesian with K2 Prior (2), or Bayesian Dirichlet Equivalent (BDE) Prior (3). The default is 3.

SPLIT_METHOD

Determines the method that is used to split the node. Available options: Binary (1), Complete (2), or Both (3). The default is 3.

MINIMUM_SUPPORT

Determines the minimum number of leaf cases that is required to generate a split in the decision tree. The default is 10.

COMPLEXITY_PENALTY

Controls the growth of the decision tree. A low value increases the number of splits, and a high value decreases the number of splits. The default value is based on the number of attributes for a particular model, as described in the following list:

  • For 1 through 9 attributes, the default is 0.5.
  • For 10 through 99 attributes, the default is 0.9.
  • For 100 or more attributes, the default is 0.99.

FORCED_REGRESSOR

Forces the algorithm to use the indicated columns as regressors, regardless of the importance of the columns as calculated by the algorithm. This parameter is used only in decision trees that predict a continuous attribute.
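
Parameters are supplied in the USING clause when the model is created. The following DMX sketch (parameter values are chosen only for illustration) reduces the tree's tendency to split by raising COMPLEXITY_PENALTY and MINIMUM_SUPPORT, and restricts the tree to binary splits:

    CREATE MINING MODEL [TunedTree]
    (
        [CustomerKey]  LONG KEY,
        [Age]          LONG CONTINUOUS,
        [Region]       TEXT DISCRETE,
        [BikeBuyer]    LONG DISCRETE PREDICT
    )
    USING Microsoft_Decision_Trees
        (COMPLEXITY_PENALTY = 0.9, MINIMUM_SUPPORT = 20, SPLIT_METHOD = 1)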

See Also

Concepts

Data Mining Algorithms
Data Mining Wizard
Feature Selection in Data Mining
Viewing a Mining Model with the Microsoft Tree Viewer

Other Resources

CREATE MINING MODEL (DMX)
