Data Mining Algorithms
Data mining algorithms are central to the data mining process: they determine how the cases for a data mining model are analyzed. They provide the decision-making capabilities needed to classify, segment, associate, and analyze data in order to process the data mining columns that provide predictive, variance, or probability information about the case set.
Many data mining algorithms are goal-oriented: given a case set, the algorithm predicts something about each case, usually an attribute of the case itself. Most algorithms require a training set of cases in which the attributes to be predicted are already known. From this training set, the algorithm constructs a data mining model capable of predicting these attributes for cases in which they are unknown. For more information about training data mining models, see Introduction to Data Mining Models.
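The train-then-predict cycle described above can be illustrated with a minimal Python sketch. It is not taken from any data mining algorithm provider. The case attributes (age_group, buys_bike) and the functions train and predict are hypothetical names chosen for illustration, and the "model" is only a lookup of the most common known value per input value.

```python
from collections import Counter, defaultdict

def train(cases, input_attr, target_attr):
    """Build a toy model: for each input value, remember the most common target value."""
    counts = defaultdict(Counter)
    for case in cases:
        counts[case[input_attr]][case[target_attr]] += 1
    overall = Counter(case[target_attr] for case in cases)
    return {
        "input": input_attr,
        "rules": {value: c.most_common(1)[0][0] for value, c in counts.items()},
        # Fallback for input values never seen during training.
        "default": overall.most_common(1)[0][0],
    }

def predict(model, case):
    """Predict the target attribute for a case in which it is unknown."""
    return model["rules"].get(case[model["input"]], model["default"])

# Training set: the attribute to be predicted (buys_bike) is already known.
training_cases = [
    {"age_group": "young", "buys_bike": "yes"},
    {"age_group": "young", "buys_bike": "yes"},
    {"age_group": "young", "buys_bike": "no"},
    {"age_group": "senior", "buys_bike": "no"},
    {"age_group": "senior", "buys_bike": "no"},
]

model = train(training_cases, "age_group", "buys_bike")
prediction = predict(model, {"age_group": "young"})  # buys_bike is unknown here
```

A real algorithm builds a far richer model, but the division of labor is the same: known attributes go in during training, and predictions come out for cases that lack them.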
Each data mining algorithm is supported by a data mining algorithm provider, which is an OLE DB provider that supports the OLE DB for Data Mining specification. Because the needs and functions of each data mining algorithm provider are different, it may be necessary for a client application to first determine the capabilities of a data mining algorithm provider.
Not all data mining algorithm providers support all data mining options. Some providers may work only with certain data mining column data or content types, and other providers may not support certain options for source data queries. To help a client application determine the capabilities of a data mining algorithm provider, the MINING_SERVICES schema rowset details the data mining options that each provider supports. Also, because each provider is an OLE DB provider, the standard OLE DB provider schema rowsets, such as the PROVIDER_TYPES schema rowset, can supply additional information.
Data Mining Algorithm Providers
Data mining algorithms fall into three general categories. The categories described here are not a comprehensive list of the data mining algorithms that might be used; other data mining algorithm providers may be constructed based on, for example, back-propagation neural networks or genetic algorithms.
A decision tree is a form of classification represented as a tree structure, in which each node represents a question used to further classify the data. The methods used to create decision trees have been in wide use for decades, and a large body of work describes these statistical techniques. For more information about the decision trees technique and the Microsoft® Decision Trees algorithm, see Microsoft Decision Trees.
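The idea that each node holds a question can be shown with a small Python sketch. The tree below is built by hand purely for illustration; it does not show how the Microsoft Decision Trees algorithm learns a tree from training cases. The Node class, the classify function, and the income and age attributes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    # Interior nodes carry a question; leaf nodes carry a label instead.
    question: Optional[Callable[[dict], bool]] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    label: Optional[str] = None

def classify(node, case):
    """Walk from the root, answering each node's question, until a leaf is reached."""
    while node.label is None:
        node = node.yes if node.question(case) else node.no
    return node.label

# Hand-built example tree (an assumption, not learned from data).
tree = Node(
    question=lambda c: c["income"] > 50000,
    yes=Node(
        question=lambda c: c["age"] < 40,
        yes=Node(label="likely buyer"),
        no=Node(label="possible buyer"),
    ),
    no=Node(label="unlikely buyer"),
)

result = classify(tree, {"income": 60000, "age": 30})
```

Each question narrows the set of cases further, so a case reaches a leaf after answering only the questions along one path from the root.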
Like decision trees, clustering is a well-documented data mining technique. Clustering is the classification of data into groups based on specific criteria. For more information about the clustering technique and the Microsoft Clustering algorithm, see Microsoft Clustering.
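As an illustration of grouping data by a criterion, the following Python sketch applies a simple one-dimensional k-means procedure, in which the criterion is distance to the nearest group center. K-means is used here only as a familiar example; this topic does not state which method the Microsoft Clustering algorithm uses. The function k_means_1d, the ages data, and the starting centers are hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    """Group numeric values around the nearest center, then recompute the centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            # Criterion: assign each value to the closest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

ages = [18, 21, 22, 45, 48, 52]
centers, groups = k_means_1d(ages, centers=[20, 50])
```

Unlike the prediction techniques above, no attribute is known in advance here; the groups emerge from the data and the chosen criterion alone.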