Data Mining Algorithms (Analysis Services - Data Mining)

A data mining algorithm is the mechanism that creates a data mining model. To create a model, the algorithm first analyzes a set of data, looking for specific patterns and trends. It then uses the results of this analysis to define the parameters of the mining model. These parameters are then applied across the entire dataset to extract actionable patterns and detailed statistics.
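As a rough illustration of that sequence (analyze the data, derive parameters, apply the parameters), the following Python sketch fits a one-variable least-squares line. It is not how Analysis Services implements any of its algorithms; the function names and the sample figures are invented for illustration only.

```python
from statistics import mean

def train_linear_model(xs, ys):
    """Analyze the training data and return the model parameters (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def apply_model(params, xs):
    """Apply the learned parameters to a set of input values."""
    slope, intercept = params
    return [slope * x + intercept for x in xs]

# Hypothetical data: advertising spend versus units sold.
spend = [1.0, 2.0, 3.0, 4.0]
units = [3.0, 5.0, 7.0, 9.0]

params = train_linear_model(spend, units)   # the "parameters of the mining model"
predictions = apply_model(params, spend)    # parameters applied across the dataset
```

Here the "model" is just the pair of numbers in `params`; real mining models store far richer content, but the train-then-apply split is the same idea.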

The mining model that an algorithm creates can take various forms, including:

  • A set of rules that describe how products are grouped together in a transaction.

  • A decision tree that predicts whether a particular customer will buy a product.

  • A mathematical model that forecasts sales.

  • A set of clusters that describe how the cases in a dataset are related.

Microsoft SQL Server Analysis Services provides several algorithms for use in your data mining solutions. These algorithms are a subset of all the algorithms that can be used for data mining. You can also use third-party algorithms that comply with the OLE DB for Data Mining specification. For more information about third-party algorithms, see Plugin Algorithms.

Types of Data Mining Algorithms

Analysis Services includes the following algorithm types:

  • Classification algorithms predict one or more discrete variables, based on the other attributes in the dataset. An example of a classification algorithm is the Microsoft Decision Trees Algorithm.

  • Regression algorithms predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset. An example of a regression algorithm is the Microsoft Time Series Algorithm.

  • Segmentation algorithms divide data into groups, or clusters, of items that have similar properties. An example of a segmentation algorithm is the Microsoft Clustering Algorithm.

  • Association algorithms find correlations between different attributes in a dataset. The most common application of this kind of algorithm is for creating association rules, which can be used in a market basket analysis. An example of an association algorithm is the Microsoft Association Algorithm.

  • Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a Web path flow. An example of a sequence analysis algorithm is the Microsoft Sequence Clustering Algorithm.
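To make the association category above more concrete, here is a minimal Python sketch of support and confidence, two measures widely used to score association rules in market basket analysis. It is a generic textbook calculation and does not claim to reproduce the Microsoft Association Algorithm; the basket contents and function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0  # the rule never fires, so report zero rather than divide by zero
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / antecedent_support

# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

bread_milk_support = support(baskets, {"bread", "milk"})       # 2 of 4 baskets
bread_to_milk_conf = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
```

A rule such as "bread implies milk" is usually kept only when both its support and its confidence exceed thresholds chosen by the analyst.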

Applying the Algorithms

Choosing the best algorithm to use for a specific business task can be a challenge. While you can use different algorithms to perform the same business task, each algorithm produces a different result, and some algorithms can produce more than one type of result. For example, you can use the Microsoft Decision Trees algorithm not only for prediction, but also as a way to reduce the number of columns in a dataset, because the decision tree can identify columns that do not affect the final mining model.
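The column-reduction idea can be sketched with information gain, a common textbook criterion for choosing decision tree splits. This is a stand-in for illustration and not necessarily the score that the Microsoft Decision Trees algorithm uses; the column names and rows are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, column, target):
    """Reduction in entropy of `target` obtained by splitting rows on `column`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[column], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical training cases.
rows = [
    {"owns_home": "yes", "shirt_color": "red",  "buys": "yes"},
    {"owns_home": "yes", "shirt_color": "blue", "buys": "yes"},
    {"owns_home": "no",  "shirt_color": "red",  "buys": "no"},
    {"owns_home": "no",  "shirt_color": "blue", "buys": "no"},
]

gains = {c: information_gain(rows, c, "buys") for c in ("owns_home", "shirt_color")}

# Columns that tell us nothing about the target are candidates for removal.
uninformative = [c for c, g in gains.items() if g < 1e-9]
```

In this toy data, `owns_home` separates buyers from non-buyers perfectly, while `shirt_color` contributes nothing and ends up in `uninformative`.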

Algorithms also do not have to be used independently. In a single data mining solution, you can use some algorithms to explore data, and then use other algorithms to predict a specific outcome based on that data. For example, you can use a clustering algorithm, which recognizes patterns, to break data into groups that are more or less homogeneous, and then use the results to create a better decision tree model. You can also use multiple algorithms within one solution to perform separate tasks: for example, a regression tree algorithm to obtain financial forecasting information, and a rule-based algorithm to perform a market basket analysis.
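The explore-then-predict pattern might look like the following Python sketch. It assumes a tiny one-dimensional k-means for the clustering step and a per-segment majority vote as a simplified stand-in for the decision tree step; neither is the Analysis Services implementation, and all names and figures are hypothetical.

```python
from collections import Counter

def assign(values, centers):
    """Index of the nearest center for each value."""
    return [min(range(len(centers)), key=lambda k: abs(v - centers[k])) for v in values]

def kmeans_1d(values, initial_centers, iterations=10):
    """Tiny 1-D k-means: returns the final centers and a cluster index per value."""
    centers = list(initial_centers)
    for _ in range(iterations):
        labels = assign(values, centers)
        for k in range(len(centers)):
            members = [v for v, a in zip(values, labels) if a == k]
            if members:  # leave an empty cluster's center where it is
                centers[k] = sum(members) / len(members)
    return centers, assign(values, centers)

def majority_by_segment(segments, outcomes):
    """Most common outcome per segment (a one-level stand-in for a tree)."""
    by_segment = {}
    for s, o in zip(segments, outcomes):
        by_segment.setdefault(s, []).append(o)
    return {s: Counter(os).most_common(1)[0][0] for s, os in by_segment.items()}

# Hypothetical customers: income in thousands, and whether they bought.
incomes = [20, 22, 25, 80, 85, 90]
bought = ["no", "no", "no", "yes", "yes", "no"]

# Step 1: explore the data by grouping similar customers.
centers, segment = kmeans_1d(incomes, initial_centers=[20, 90])

# Step 2: use the discovered segment as the input to a predictive rule.
rules = majority_by_segment(segment, bought)
```

The point of the design is that the second model sees a cleaner input (the segment) than the raw column, which is the sense in which clustering can help produce a better predictive model.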

Mining models can predict values, produce summaries of data, and find hidden correlations. To help you select algorithms for your data mining solution, the following table provides suggestions for which algorithms to use for specific tasks.

| Task | Microsoft algorithms to use |
| --- | --- |
| Predicting a discrete attribute. For example, predict whether the recipient of a targeted mailing campaign will buy a product. | Microsoft Decision Trees Algorithm; Microsoft Naive Bayes Algorithm; Microsoft Clustering Algorithm; Microsoft Neural Network Algorithm |
| Predicting a continuous attribute. For example, forecast next year's sales. | Microsoft Decision Trees Algorithm; Microsoft Time Series Algorithm |
| Predicting a sequence. For example, perform a clickstream analysis of a company's Web site. | Microsoft Sequence Clustering Algorithm |
| Finding groups of common items in transactions. For example, use market basket analysis to suggest additional products to a customer for purchase. | Microsoft Association Algorithm; Microsoft Decision Trees Algorithm |
| Finding groups of similar items. For example, segment demographic data into groups to better understand the relationships between attributes. | Microsoft Clustering Algorithm; Microsoft Sequence Clustering Algorithm |

Because each model returns a different type of result, Analysis Services provides a separate viewer for each algorithm. When you browse a mining model in Analysis Services, the model is displayed on the Mining Model Viewer tab of Data Mining Designer, which uses the appropriate viewer for the model. For more information, see Viewing a Data Mining Model.

Algorithm Details

The table at the end of this section provides links to the following types of information for each algorithm:

  • Basic algorithm description: Provides a basic explanation of what the algorithm does and how it works, together with a business scenario where the algorithm might be useful.

  • Technical reference: Lists the parameters that you can set to control the behavior of the algorithm and customize the results in the model. Provides additional technical detail about the implementation of the algorithm, performance tips, and data requirements.

  • Querying a model: Provides examples of queries that you can use with each model type. You can query a model to learn more about the patterns in the model, or to make predictions based on those patterns.

  • Mining model content: Describes how information is stored in a common structure for all model types, and explains how to interpret the information. After you have built a model, you can explore the model by using the viewers provided in BI Development Studio, or you can write queries to return information directly from the model content by using DMX.

| Basic Algorithm Description | Technical Reference | Querying | Mining Model Content |
| --- | --- | --- | --- |
| Microsoft Association Algorithm | Microsoft Association Algorithm Technical Reference | Querying an Association Model (Analysis Services - Data Mining) | Mining Model Content for Association Models (Analysis Services - Data Mining) |
| Microsoft Clustering Algorithm | Microsoft Clustering Algorithm Technical Reference | Querying a Clustering Model (Analysis Services - Data Mining) | Mining Model Content for Clustering Models (Analysis Services - Data Mining) |
| Microsoft Decision Trees Algorithm | Microsoft Decision Trees Algorithm Technical Reference | Querying a Decision Trees Model (Analysis Services - Data Mining) | Mining Model Content for Decision Tree Models (Analysis Services - Data Mining) |
| Microsoft Linear Regression Algorithm | Microsoft Linear Regression Algorithm Technical Reference | Querying a Linear Regression Model (Analysis Services - Data Mining) | Mining Model Content for Linear Regression Models (Analysis Services - Data Mining) |
| Microsoft Logistic Regression Algorithm | Microsoft Logistic Regression Algorithm Technical Reference | Querying a Logistic Regression Model (Analysis Services - Data Mining) | Mining Model Content for Logistic Regression Models (Analysis Services - Data Mining) |
| Microsoft Naive Bayes Algorithm | Microsoft Naive Bayes Algorithm Technical Reference | Querying a Naive Bayes Model (Analysis Services - Data Mining) | Mining Model Content for Naive Bayes Models (Analysis Services - Data Mining) |
| Microsoft Neural Network Algorithm | Microsoft Neural Network Algorithm Technical Reference | Querying a Neural Network Model (Analysis Services - Data Mining) | Mining Model Content for Neural Network Models (Analysis Services - Data Mining) |
| Microsoft Sequence Clustering Algorithm | Microsoft Sequence Clustering Algorithm Technical Reference | Querying a Sequence Clustering Model (Analysis Services - Data Mining) | Mining Model Content for Sequence Clustering Models (Analysis Services - Data Mining) |
| Microsoft Time Series Algorithm | Microsoft Time Series Algorithm Technical Reference | Querying a Time Series Model (Analysis Services - Data Mining) | Mining Model Content for Time Series Models (Analysis Services - Data Mining) |