Distributions (DMX)

In Microsoft SQL Server 2005 Analysis Services (SSAS), you can define the content of the columns in a mining structure to control how algorithms process the data in those columns when you create mining models. For some algorithms, it is useful to define the distribution of a continuous column before you process the model, if the column is known to follow a common distribution of values. If you do not define the distribution, the resulting mining models may produce less accurate predictions, because the algorithms have less information with which to interpret the data.

Microsoft data mining algorithms support the following distribution types:

  • NORMAL
    The values for the continuous column form a histogram with a normal (Gaussian) distribution.
  • Log Normal
    The values for the continuous column form a histogram, where the logarithm of the values is normally distributed.
  • UNIFORM
    The values for the continuous column form a flat curve, in which all values are equally likely.
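
As an illustration of how a distribution is declared, the following is a minimal sketch of a mining structure definition. It assumes the column definition order documented for CREATE MINING STRUCTURE (data type, then distribution, then content type). The structure name and column names are hypothetical and are not taken from any sample database.

```
-- Hypothetical structure and column names, for illustration only.
CREATE MINING STRUCTURE [Customer Profile]
(
    [Customer Key] LONG KEY,
    -- Ages are assumed here to be roughly bell-shaped.
    [Age] LONG NORMAL CONTINUOUS,
    -- Wait times are assumed here to be equally likely across their range.
    [Wait Minutes] DOUBLE UNIFORM CONTINUOUS
)
```

Columns that do not follow a known distribution simply omit the distribution keyword.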

For more information about Microsoft data mining algorithms, see Data Mining Algorithms. Third-party algorithm providers may support additional distribution types. To determine which distribution types an algorithm supports, check the SUPPORTED_DISTRIBUTION_FLAGS column of the DMSCHEMA_MINING_SERVICES schema rowset.
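
The lookup described above can be sketched as a query against the mining services schema rowset. This sketch assumes a server version that exposes schema rowsets through the $system prefix, which was introduced after SQL Server 2005; on SQL Server 2005 the same rowset is typically retrieved with an XMLA Discover request instead.

```
-- Assumes $system schema rowset queries are available (SQL Server 2008 or later).
-- Returns one row per installed algorithm with its supported distribution flags.
SELECT SERVICE_NAME, SUPPORTED_DISTRIBUTION_FLAGS
FROM $system.DMSCHEMA_MINING_SERVICES
```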

For more information about distribution types, see Column Distributions.

See Also

Reference

Data Mining Extensions (DMX) Reference
Data Mining Extensions (DMX) Syntax Elements
Data Mining Extensions (DMX) Function Reference
Data Mining Extensions (DMX) Operator Reference
Data Mining Extensions (DMX) Statement Reference
Data Mining Extensions (DMX) Syntax Conventions
Mapping Functions to Query Types (DMX)
Prediction Queries (DMX)
Understanding the Select Statement (DMX)

Other Resources

Content Types (Data Mining)
