Mining Models (Analysis Services - Data Mining)
This section explains the basic architecture of a data mining model, provides an overview of the properties of a data mining model, and describes the ways that you can create and work with a mining model.
A data mining model gets data from a mining structure and then analyzes that data by using a data mining algorithm. The mining structure and mining model are separate objects. The mining structure stores information that defines the data source. A mining model stores information derived from statistical processing of the data, such as the patterns found as a result of analysis.
A mining model is empty until the data provided by the mining structure has been processed and analyzed. After a mining model has been processed, it contains metadata, results, and bindings back to the mining structure.
The metadata specifies the name of the model and the server where it is stored. It also contains a definition of the model: the list of columns from the mining structure that were used in building the model, the definitions of any optional filters that are applied when processing the model, and the algorithm that was used to analyze the data. The choice of columns, filters, and algorithm greatly influences the results of analysis. For example, if you create a clustering model and a decision trees model by using the same data, the model content might be very different, because the two models use different algorithms and possibly different filters. For more information, see Mining Model Content (Analysis Services - Data Mining).
The results that are stored in the model vary depending on the algorithm, but can include patterns, itemsets, rules, and formulas. These results can be used for making predictions.
The bindings that are stored in the model point back to the data cached in the mining structure. If the data has been cached in the structure and has not been cleared after processing, these bindings enable you to drill through from the results to the cases that support the results. However, the actual data is stored in the structure cache, not in the model.
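As a minimal sketch of how these bindings are used, the following DMX queries drill through to the underlying cases. The names [Mailing Trees] and [Customer Mailing] are hypothetical. The queries assume that drillthrough was enabled when the model was created, that the structure cache has not been cleared, and that you have drillthrough permissions. Each statement is executed separately.

```dmx
// Return the cases that were used to train the model (hypothetical model name).
SELECT * FROM [Mailing Trees].CASES

// Return the cases cached in the structure itself (hypothetical structure name).
SELECT * FROM MINING STRUCTURE [Customer Mailing].CASES
```

In both queries the rows come from the structure cache; the model supplies only the bindings.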
You can create a data mining model in the following steps:
Create the underlying mining structure.
Select an algorithm.
Specify the model columns and usage.
Optionally, set parameters to fine-tune the processing by the algorithm.
Process the model.
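The steps above can be sketched in DMX as follows. This is an illustration under stated assumptions, not a definitive procedure: the structure, model, and column names, the data source [Adventure Works DW], and the source view dbo.vTargetMail are all hypothetical, and the parameter value is arbitrary. Each statement is executed separately.

```dmx
// Step 1: create the underlying mining structure (hypothetical names).
CREATE MINING STRUCTURE [Customer Mailing]
(
    [Customer Key] LONG KEY,
    [Gender] TEXT DISCRETE,
    [Age] LONG CONTINUOUS,
    [Number Cars Owned] LONG DISCRETE,
    [Bike Buyer] LONG DISCRETE
)

// Steps 2-4: select an algorithm, specify the model columns and usage,
// and optionally set an algorithm parameter (value chosen arbitrarily).
ALTER MINING STRUCTURE [Customer Mailing]
ADD MINING MODEL [Mailing Trees]
(
    [Customer Key],
    [Gender],
    [Age],
    [Number Cars Owned],
    [Bike Buyer] PREDICT
)
USING Microsoft_Decision_Trees (COMPLEXITY_PENALTY = 0.5)
WITH DRILLTHROUGH

// Step 5: process the structure and its models from a relational source.
// The data source and view names are assumptions.
INSERT INTO MINING STRUCTURE [Customer Mailing]
(
    [Customer Key], [Gender], [Age], [Number Cars Owned], [Bike Buyer]
)
OPENQUERY([Adventure Works DW],
    'SELECT CustomerKey, Gender, Age, NumberCarsOwned, BikeBuyer
     FROM dbo.vTargetMail')
```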
Analysis Services provides the following tools to help you manage your mining models:
The Data Mining Wizard helps you create a structure and related mining model. This is the easiest method to use. The wizard automatically creates the required mining structure and helps you with the configuration of the important settings.
A DMX CREATE MINING MODEL statement can be used to define a model. The required structure is automatically created as part of the process; therefore, you cannot reuse an existing structure with this method. Use this method if you already know exactly which model you want to create.
A DMX ALTER MINING STRUCTURE statement with an ADD MINING MODEL clause can be used to add a new mining model to an existing structure. Use this method if you want to experiment with different models that are based on the same data set.
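To make the difference between the two DMX methods concrete, the following sketch shows one statement of each kind. All object and column names are hypothetical, and the second statement assumes that a structure named [Customer Mailing] with these columns already exists. Each statement is executed separately.

```dmx
// Method 1: define a model directly. Column data types and content types
// are declared here because the underlying structure is created
// automatically and cannot be an existing one.
CREATE MINING MODEL [Quick Bayes]
(
    [Customer Key] LONG KEY,
    [Gender] TEXT DISCRETE,
    [Number Cars Owned] LONG DISCRETE,
    [Bike Buyer] LONG DISCRETE PREDICT
)
USING Microsoft_Naive_Bayes

// Method 2: add another model to an existing structure, here with a
// different algorithm, so that results on the same data can be compared.
ALTER MINING STRUCTURE [Customer Mailing]
ADD MINING MODEL [Mailing Clusters]
(
    [Customer Key],
    [Gender],
    [Age],
    [Number Cars Owned]
)
USING Microsoft_Clustering (CLUSTER_COUNT = 5)
```

In the second statement the columns carry no type declarations, because they are inherited from the structure.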
You can also create mining models programmatically, by using Analysis Management Objects (AMO) or XML for Analysis (XMLA), or by using other clients such as the Data Mining Client for Excel.
Each mining model has properties that define the model and its metadata. These might include the name, description, the date the model was last processed, the permissions on the model, and any filters on the data that is used for training.
Each mining model also has properties that are derived from the mining structure, and that describe the columns of data used by the model. If the column is a nested table, the column can also have a separate filter applied.
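One way to inspect these properties is to query the data mining schema rowsets. The following is a sketch that assumes a version of Analysis Services that exposes schema rowsets through the $system schema; the model name 'Mailing Trees' is hypothetical. Each statement is executed separately.

```dmx
// Model-level properties: name, description, algorithm, parent structure,
// and the date the model was last processed.
SELECT MODEL_NAME, DESCRIPTION, SERVICE_NAME, MINING_STRUCTURE, LAST_PROCESSED
FROM $system.DMSCHEMA_MINING_MODELS

// Column-level properties for one model (hypothetical model name).
SELECT MODEL_NAME, COLUMN_NAME, CONTENT_TYPE, IS_INPUT, IS_PREDICTABLE
FROM $system.DMSCHEMA_MINING_COLUMNS
WHERE MODEL_NAME = 'Mailing Trees'
```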
Algorithm property: Specifies the algorithm that is used to create the model. The algorithms that are available depend on the provider that you are using. For a list of the algorithms that are included with SQL Server Analysis Services, see Data Mining Algorithms (Analysis Services - Data Mining). The Algorithm property applies to the mining model as a whole, and each model uses exactly one algorithm. You can change the algorithm later, but some columns in the mining model might become invalid if the algorithm that you choose does not support them. You must always reprocess the model after such a change.
Usage property: Defines how each column is used by the model. You can define the column usage as Input, Predict, Predict Only, or Key. The Usage property applies to individual mining model columns and must be set individually for every column that is included in a model. If the structure contains a column that you do not use in the model, the usage is set to Ignore.
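In DMX, column usage is expressed in the model's column list. The following sketch uses hypothetical structure, model, and column names and assumes that the structure already exists and that [Customer Key] is its key column.

```dmx
ALTER MINING STRUCTURE [Customer Mailing]
ADD MINING MODEL [Mailing Trees Usage]
(
    [Customer Key],                   // Key: taken from the structure definition
    [Gender],                         // Input: no prediction flag
    [Age],                            // Input
    [Number Cars Owned] PREDICT,      // Predict: used as input and also predicted
    [Bike Buyer] PREDICT_ONLY         // Predict Only: predicted, never used as input
)
USING Microsoft_Decision_Trees
```

Any structure column that is left out of the list is not used by this model.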
You can change the value of mining model properties after you create a mining model. However, any change, even to the name of the mining model, requires that you reprocess the model. After you reprocess the model, you might see different results.
Like the mining structure, the mining model contains columns. You can choose which columns from the mining structure to use in the model. In addition to using the columns that are in the underlying mining structure, you can create copies of the mining structure columns and then rename them or change their usage.
Depending on which algorithm you choose, some columns in the mining structure might be incompatible with the model, or might lead to poor results. You should review the data in the structure carefully and include in the model only those columns that make sense for analysis. If you think a column should not be used, you do not need to delete it from the mining structure or mining model; instead, you can just set a flag on the column that specifies that it should be ignored when building the model. This means that the column will remain in the mining structure, but will not be used in the mining model; however, if drillthrough is enabled from the model to the mining structure, you can retrieve the information from the column later.
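The following sketch illustrates renaming a column in the model and leaving another column out of the model. All names are hypothetical. It assumes a structure [Customer Mailing] that also contains an [Occupation] column, a version of Analysis Services that supports the StructureColumn function, and drillthrough permissions on both the model and the structure. Each statement is executed separately.

```dmx
// [Age] is exposed in the model under a different name, and the
// [Occupation] structure column is simply left out of the model.
ALTER MINING STRUCTURE [Customer Mailing]
ADD MINING MODEL [Mailing Trees Renamed]
(
    [Customer Key],
    [Gender],
    [Age] AS [Customer Age],
    [Bike Buyer] PREDICT
)
USING Microsoft_Decision_Trees
WITH DRILLTHROUGH

// After processing, drill through to the cases and retrieve the
// structure column that the model did not use.
SELECT [Customer Key], [Customer Age], StructureColumn('Occupation') AS [Occupation]
FROM [Mailing Trees Renamed].CASES
```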
After you have created the model, you can make changes such as adding or removing columns, or changing the name of the model. However, any change, even only to the model metadata, requires that you reprocess the model.
A data mining model is an empty object until it is processed. When you process a model, the data that is cached by the structure is passed through a filter, if one has been defined in the model, and is analyzed by the algorithm. The algorithm identifies the rules and patterns within the data, and then uses these rules and patterns to populate the model. For more information about how algorithms are used to create mining models, see Data Mining Algorithms (Analysis Services - Data Mining).
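As an illustration of filtered processing, the following sketch defines a model with a filter and then processes only that model. The names are hypothetical, and the second statement assumes that the parent structure has already been processed, so that the cached data and existing bindings can be reused. Each statement is executed separately.

```dmx
// Define a model that is trained only on the cases that pass the filter.
ALTER MINING STRUCTURE [Customer Mailing]
ADD MINING MODEL [Mailing Trees Over 30]
(
    [Customer Key],
    [Gender],
    [Age],
    [Bike Buyer] PREDICT
)
USING Microsoft_Decision_Trees
WITH FILTER([Age] > 30)

// Process just this model, using the bindings that already exist.
INSERT INTO MINING MODEL [Mailing Trees Over 30]
```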
After it has been processed, the mining model also stores information about the results of analysis. For more information about the kind of data that is stored in a mining model, see Mining Model Content (Analysis Services - Data Mining).
After you have processed a model, you can explore it by using the custom viewers that are provided in Business Intelligence Development Studio and SQL Server Management Studio. For more information about the custom viewers in Analysis Services, see Viewing a Data Mining Model.
You can also create queries against the mining model either to make predictions, or to retrieve model metadata or the patterns created by the model. You create queries by using Data Mining Extensions (DMX). For information about the different types of queries that you can use against a data mining model, see Querying Data Mining Models (Analysis Services - Data Mining).
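The two main kinds of query can be sketched as follows. The model name [Mailing Trees], its columns, and the input values are hypothetical, and the model is assumed to have been processed. Each statement is executed separately.

```dmx
// Prediction query: a singleton query that supplies one new case inline.
SELECT
    Predict([Bike Buyer]) AS [Predicted Buyer],
    PredictProbability([Bike Buyer]) AS [Probability]
FROM [Mailing Trees]
NATURAL PREDICTION JOIN
(SELECT 'M' AS [Gender], 35 AS [Age], 2 AS [Number Cars Owned]) AS t

// Content query: retrieve the patterns that the model stored.
SELECT NODE_CAPTION, NODE_TYPE, NODE_SUPPORT
FROM [Mailing Trees].CONTENT
```

NATURAL PREDICTION JOIN matches the input columns to model columns by name, which keeps a singleton query short; a PREDICTION JOIN with an explicit ON clause is the alternative when the names differ.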