Microsoft Naive Bayes Algorithm Technical Reference
The Microsoft Naive Bayes algorithm is a classification algorithm provided by Microsoft SQL Server Analysis Services for use in predictive modeling. The algorithm calculates the conditional probability between input and predictable columns, and assumes that the input columns are independent of one another. This assumption of independence is what makes the method "naive" and gives Naive Bayes its name.
This algorithm is less computationally intense than other Microsoft algorithms, and therefore is useful for quickly generating mining models to discover relationships between input columns and predictable columns. The algorithm considers each pair of input attribute values and output attribute values.
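The pairwise counting described above can be sketched in a few lines of Python. This is a minimal illustration of generic Naive Bayes over discrete columns, not the Analysis Services implementation: the function names `train` and `predict`, the column names, and the toy rows are all made up for the example, and no priors or missing-value adjustments are applied.

```python
from collections import Counter, defaultdict

def train(rows, target):
    """Count target states and (input attribute, input value, target state) triples."""
    class_counts = Counter()
    pair_counts = defaultdict(Counter)  # (attribute, target state) -> counts of input values
    for row in rows:
        label = row[target]
        class_counts[label] += 1
        for attr, value in row.items():
            if attr != target:
                pair_counts[(attr, label)][value] += 1
    return class_counts, pair_counts

def predict(class_counts, pair_counts, case):
    """Return normalized P(target state | case), assuming independent inputs."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(label)
        for attr, value in case.items():
            # P(value | label) estimated from raw counts, with no smoothing
            score *= pair_counts[(attr, label)][value] / n
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()} if norm else scores

# Toy data invented for this sketch.
rows = [
    {"Commute": "Short", "Owner": "Yes", "Buyer": "Yes"},
    {"Commute": "Short", "Owner": "No", "Buyer": "Yes"},
    {"Commute": "Long", "Owner": "Yes", "Buyer": "No"},
    {"Commute": "Long", "Owner": "No", "Buyer": "No"},
    {"Commute": "Short", "Owner": "Yes", "Buyer": "No"},
]
class_counts, pair_counts = train(rows, "Buyer")
posterior = predict(class_counts, pair_counts, {"Commute": "Short", "Owner": "Yes"})
```

Because training is a single counting pass over the data, the cost grows with the number of input and output value pairs, which is consistent with the low processing cost noted above.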
A description of the mathematical properties of Bayes' theorem is beyond the scope of this documentation; for more information, see the Microsoft Research paper titled "Learning Bayesian Networks: The Combination of Knowledge and Statistical Data".
For a description of how probabilities in all models are adjusted to account for potential missing values, see Missing Values (Analysis Services - Data Mining).
The Microsoft Naive Bayes algorithm performs automatic feature selection to limit the number of values that are considered when building the model. For more information, see Feature Selection in Data Mining.
Method of analysis
Bayesian with K2 Prior
Bayesian Dirichlet with uniform prior (default)
Naive Bayes only accepts discrete or discretized attributes; therefore, it cannot use the interestingness score.
The algorithm is designed to minimize processing time and efficiently select the attributes that have the greatest importance; however, you can control the data that is used by the algorithm by setting parameters as follows:
To limit the number of attributes that are used as inputs, decrease the value of MAXIMUM_INPUT_ATTRIBUTES.
To limit the number of predictable attributes analyzed by the model, decrease the value of MAXIMUM_OUTPUT_ATTRIBUTES.
To limit the number of values that can be considered for any one attribute, decrease the value of MAXIMUM_STATES.
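As a rough illustration of what these limits do, the sketch below caps the number of states per attribute and the number of attributes kept. It rests on two assumptions that this section does not state: that attributes are ranked by some feature selection score, and that states beyond the cap are folded into a missing state. The function and argument names are hypothetical and are not Analysis Services APIs.

```python
from collections import Counter

MISSING = None  # stand-in for the Missing state in this sketch

def cap_states(values, maximum_states):
    """Keep the most frequent states; map every other state to MISSING."""
    counts = Counter(v for v in values if v is not MISSING)
    kept = {state for state, _ in counts.most_common(maximum_states)}
    return [v if v in kept else MISSING for v in values]

def top_attributes(scores, maximum_attributes):
    """Return the highest-scoring attribute names, at most maximum_attributes of them."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:maximum_attributes]

# Hypothetical column values and scores, invented for the example.
capped = cap_states(["a", "a", "b", "b", "b", "c"], 2)
selected = top_attributes({"Age": 0.9, "Region": 0.4, "Color": 0.1}, 2)
```

Lowering either limit shrinks the set of value pairs the model has to count, which is where the savings in processing time come from.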
The Microsoft Naive Bayes algorithm supports several parameters that affect the behavior, performance, and accuracy of the resulting mining model. You can also set modeling flags on the model columns to control how data is processed, or set flags on the mining structure to specify how missing values or nulls should be handled.
Setting Algorithm Parameters
The Microsoft Naive Bayes algorithm supports several parameters that affect the performance and accuracy of the resulting mining model. The following table describes each parameter.
The Microsoft Naive Bayes algorithm supports the following modeling flags. When you create the mining structure or mining model, you define modeling flags to specify how values in each column are handled during analysis. For more information, see Modeling Flags (Data Mining).
MODEL_EXISTENCE_ONLY: The column is treated as having two possible states, Missing and Existing. A null is a missing value. Applies to mining model column.
NOT NULL: The column cannot contain a null. An error results if Analysis Services encounters a null during model training. Applies to mining structure column.
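The two behaviors described above can be mimicked on a plain Python list, with `None` standing in for a null. This is only a sketch of the semantics; the helper names `existence_only` and `require_not_null` are hypothetical, and in Analysis Services these behaviors are set as flags on columns rather than called as functions.

```python
def existence_only(values):
    """Collapse a column to two states: 'Missing' for nulls, 'Existing' otherwise."""
    return ["Missing" if v is None else "Existing" for v in values]

def require_not_null(values, column_name):
    """Raise an error on the first null, like a column that forbids nulls."""
    for index, v in enumerate(values):
        if v is None:
            raise ValueError(f"null in column {column_name!r} at row {index}")
    return values

# Example column invented for the sketch.
states = existence_only([3, None, "x"])
```

Note the difference in outcome: the first behavior turns a null into information the model can use, while the second stops training as soon as a null appears.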
A Naive Bayes model must contain a key column, at least one predictable attribute, and at least one input attribute. No attribute can be continuous; continuous numeric columns in your data are either ignored or discretized.
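These requirements can be expressed as a simple pre-flight check. The sketch below assumes a hypothetical representation of columns as `(name, content_type, usage)` tuples. The equal-width binning is just one possible way to discretize a continuous column, shown for illustration, and is not claimed to be the method Analysis Services applies.

```python
def check_columns(columns):
    """Return a list of problems for (name, content_type, usage) column tuples."""
    problems = []
    usages = [usage for _, _, usage in columns]
    for required in ("key", "input", "predict"):
        if required not in usages:
            problems.append(f"missing {required} column")
    for name, content_type, _ in columns:
        if content_type == "Continuous":
            problems.append(f"{name} is continuous")
    return problems

def discretize(values, buckets):
    """Map numeric values to equal-width bucket indexes from 0 to buckets - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / buckets
    # The maximum value would land in bucket number `buckets`, so clamp it.
    return [min(int((v - lo) / width), buckets - 1) for v in values]

# Hypothetical column definitions and values.
problems = check_columns([("Id", "Key", "key"), ("Age", "Continuous", "input")])
bins = discretize([0.0, 2.0, 10.0], 2)
```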
Input and Predictable Columns
The Microsoft Naive Bayes algorithm supports the specific input columns and predictable columns that are listed in the following table. For more information about what the content types mean when used in a mining model, see Content Types (Data Mining).
Input attribute: Cyclical, Discrete, Discretized, Key, Table, and Ordered
Predictable attribute: Cyclical, Discrete, Discretized, Table, and Ordered
Cyclical and Ordered content types are supported, but the algorithm treats them as discrete values and does not perform special processing.