Adding Mining Models to a Structure (Analysis Services - Data Mining)
Whereas a mining structure defines the data domain, a mining model defines how to apply the data in that domain to a particular problem. After you have created a mining structure, you can add multiple mining models to it. Each model can target a different business problem, or approach the same problem in a different way. For example, you might change the algorithm parameters to try a slightly different approach, or train on a different subset of the data to obtain different results or to extract patterns specific to a target population.
There are two approaches to building mining models. You can define the mining structure first, and then experiment with different models that are based on that structure. Alternatively, you can create the model that you want first, and then use the structure that is generated for it to create additional models.
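The relationship between one structure and several models can be sketched in a few Python classes. This is a minimal in-memory illustration, not an Analysis Services API: the `MiningStructure` and `MiningModel` classes, the column names, and the usage strings are hypothetical, and the algorithm names serve only as labels.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical classes illustrating the structure/model relationship.
# They are not part of any Analysis Services client library.


@dataclass
class MiningModel:
    name: str
    algorithm: str
    column_usage: dict[str, str]  # column name -> usage, e.g. "Key", "Input", "Predict"


@dataclass
class MiningStructure:
    name: str
    columns: dict[str, str]  # column name -> content type, e.g. "Key", "Continuous"
    models: list[MiningModel] = field(default_factory=list)

    def add_model(self, model: MiningModel) -> None:
        # A model can only use columns that already exist in its structure.
        unknown = set(model.column_usage) - set(self.columns)
        if unknown:
            raise ValueError(f"columns not in structure: {sorted(unknown)}")
        self.models.append(model)


structure = MiningStructure(
    name="Customers",
    columns={"CustomerKey": "Key", "Age": "Continuous",
             "Region": "Discrete", "BikeBuyer": "Discrete"},
)
# Two models share one structure but answer different questions.
structure.add_model(MiningModel(
    "BuyerTree", "Microsoft_Decision_Trees",
    {"CustomerKey": "Key", "Age": "Input", "Region": "Input", "BikeBuyer": "Predict"},
))
structure.add_model(MiningModel(
    "RegionClusters", "Microsoft_Clustering",
    {"CustomerKey": "Key", "Age": "Input", "Region": "Input"},
))
print([m.name for m in structure.models])  # ['BuyerTree', 'RegionClusters']
```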
When you use the Data Mining Wizard to create a new mining model, you create a mining structure first. The wizard then gives you the option to add an initial mining model to the structure, and to configure training and testing data sets within that structure. However, you don't need to create a model right away. If you create only the structure, you do not have to decide which column to use as the predictable attribute, or how to use the data in a particular model. Instead, you set up the general data structure that you want to use in the future, and later you can use Data Mining Designer to add new mining models that are based on the structure.
If you already know the type of mining model you want to build, you can build the structure, and then use the Data Mining Wizard to add your first model to the mining structure. You can add more models to the structure after the wizard finishes.
In DMX, the CREATE MINING MODEL statement begins with the mining model. That is, you define your choice of mining model, and Analysis Services automatically generates the underlying structure. Later you can continue to add new mining models to that structure by using the ALTER MINING STRUCTURE… ADD MINING MODEL statement.
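As a rough illustration of the two DMX entry points, the hypothetical Python helpers below only assemble statement text; they do not connect to a server. The model, column, and structure names are invented for the example, and the name used for the generated structure is an assumption you should check on your own server.

```python
# Hypothetical helpers that build DMX statement text only.


def _column_list(columns: list[str]) -> str:
    # One column definition per line, comma-separated.
    return ",\n    ".join(columns)


def create_mining_model(model: str, columns: list[str], algorithm: str) -> str:
    """CREATE MINING MODEL: define the model first; the server generates
    the underlying mining structure, so columns carry full type definitions."""
    return (f"CREATE MINING MODEL [{model}] (\n    {_column_list(columns)}\n)"
            f" USING {algorithm}")


def add_mining_model(structure: str, model: str, columns: list[str], algorithm: str) -> str:
    """ALTER MINING STRUCTURE ... ADD MINING MODEL: add another model to an
    existing structure; columns refer to structure columns by name."""
    return (f"ALTER MINING STRUCTURE [{structure}]\n"
            f"ADD MINING MODEL [{model}] (\n    {_column_list(columns)}\n)"
            f" USING {algorithm}")


first = create_mining_model(
    "BuyerTree",
    ["[CustomerKey] LONG KEY", "[Age] LONG CONTINUOUS", "[Bike Buyer] LONG DISCRETE PREDICT"],
    "Microsoft_Decision_Trees",
)
# Assumption: "BuyerTree_Structure" stands in for the generated structure's
# name; verify the actual name before running the statement.
second = add_mining_model(
    "BuyerTree_Structure",
    "BuyerClusters",
    ["[CustomerKey]", "[Age]", "[Bike Buyer] PREDICT"],
    "Microsoft_Clustering",
)
print(first)
print(second)
```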
For More Information: Managing Mining Models in Data Mining Designer
After you have defined your data domain, you tell Analysis Services how to use each column in the data by specifying the column content and the column usage. A new mining model does not have to use every column that is included in the mining structure. Even when two models are based on the same structure, you can tell Analysis Services to ignore a particular column in one model but not in the other. For more information, see Logical Architecture (Analysis Services - Data Mining).
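A minimal sketch of per-model column selection, assuming a hypothetical structure with five columns: each model carries its own usage map, and a column that is missing from the map or marked `Ignore` is simply not passed to that model's algorithm. The names here are illustrative only.

```python
# Hypothetical structure columns shared by every model.
STRUCTURE_COLUMNS = ["CustomerKey", "Age", "Income", "Region", "BikeBuyer"]


def model_columns(usage: dict[str, str]) -> dict[str, str]:
    """Return the structure columns a model actually uses, with their usage.

    Columns absent from `usage`, or flagged "Ignore", are left out."""
    return {col: usage[col] for col in STRUCTURE_COLUMNS
            if usage.get(col, "Ignore") != "Ignore"}


tree_usage = {"CustomerKey": "Key", "Age": "Input", "Income": "Input",
              "Region": "Input", "BikeBuyer": "Predict"}
# Same structure, but this model deliberately ignores Income.
no_income_usage = {**tree_usage, "Income": "Ignore"}

print(model_columns(no_income_usage))
```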
Choosing an Algorithm
When you add a model to a structure, you must select the data mining algorithm that the model will use. Each algorithm performs a different type of analysis, and some algorithms have specific requirements for the number and type of columns used for input or prediction.
Therefore, depending on the algorithm that you select, some columns that you included in the mining structure might be ignored, might need to be converted to another data type, or might need some of their values removed. The Data Mining Wizard automatically changes some values for you so that the model can function. In other cases, however, it recommends that you fix the data first, or add a required column, such as a case key.
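The split between automatic fixes and user-facing recommendations can be sketched as follows. The `Requirements` flags and the `check_model` function are hypothetical stand-ins for whatever checks the wizard actually performs; real algorithms document their own requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical requirement flags for an algorithm (illustration only)."""
    needs_case_key: bool
    needs_predictable: bool
    allows_continuous: bool


def check_model(content: dict[str, str], usage: dict[str, str],
                req: Requirements) -> tuple[dict[str, str], list[str]]:
    """Return (adjusted content types, problems the user must fix first)."""
    adjusted = dict(content)
    problems: list[str] = []
    # Missing pieces cannot be invented, so they are reported back.
    if req.needs_case_key and "Key" not in content.values():
        problems.append("add a case key column")
    if req.needs_predictable and not any(
            u in ("Predict", "PredictOnly") for u in usage.values()):
        problems.append("mark at least one column as predictable")
    # Content-type mismatches can be fixed automatically.
    if not req.allows_continuous:
        for col, kind in content.items():
            if kind == "Continuous":
                adjusted[col] = "Discretized"
    return adjusted, problems


content = {"CustomerKey": "Key", "Age": "Continuous", "Region": "Discrete"}
usage = {"CustomerKey": "Key", "Age": "Input", "Region": "Input"}
discrete_only = Requirements(needs_case_key=True, needs_predictable=True,
                             allows_continuous=False)
adjusted, problems = check_model(content, usage, discrete_only)
print(adjusted)
print(problems)
```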
In some cases you can change the algorithm that is used in a model, but most changes in the definition of the model require you to reprocess the model and its data. Generally, whenever you change the algorithm used in a model, you should consider it as a completely new model that must be reprocessed.
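One way to encode this advice is to treat an algorithm change as producing a new, unprocessed model definition. In this hypothetical sketch, `ModelDefinition` and `change_algorithm` are invented names, and discarding the old parameters is a design choice made here because algorithm parameters are specific to each algorithm.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelDefinition:
    """Hypothetical, immutable description of a mining model."""
    name: str
    algorithm: str
    parameters: tuple[tuple[str, object], ...] = ()
    processed: bool = False


def change_algorithm(model: ModelDefinition, algorithm: str) -> ModelDefinition:
    # Treat the result as a brand-new model: parameters tuned for the old
    # algorithm are dropped, and the model must be processed again.
    return replace(model, algorithm=algorithm, parameters=(), processed=False)


trained = ModelDefinition("BuyerModel", "Microsoft_Decision_Trees",
                          (("COMPLEXITY_PENALTY", 0.5),), processed=True)
swapped = change_algorithm(trained, "Microsoft_Naive_Bayes")
print(swapped.processed)  # False
```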
For More Information: Data Mining Algorithms (Analysis Services - Data Mining)
Specifying Column Usage
After you have selected an algorithm, you must specify how the algorithm uses the data in your structure. This includes selecting one or more predictable columns if the model requires them, selecting the columns that serve as inputs, and specifying a case key or nested table key. These column definitions can vary from model to model, even when the models use the same data, because each algorithm has different requirements. We recommend that you select only the columns that are most useful for analysis, because unnecessary columns increase processing time and can reduce the quality of the results. The Data Mining Wizard includes an optional Suggest feature that analyzes the columns in the structure and, using an entropy-based score, recommends the columns that provide the most information.
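The exact scoring that the Suggest feature uses is not described here, so the sketch below substitutes information gain, a standard entropy-based measure, to show how such a ranking can work. The functions and the sample rows are hypothetical.

```python
import math
from collections import Counter


def entropy(values: list) -> float:
    """Shannon entropy, in bits, of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(inputs: list, target: list) -> float:
    """How much knowing `inputs` reduces uncertainty about `target`."""
    n = len(target)
    groups: dict = {}
    for x, y in zip(inputs, target):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(target) - conditional


def suggest_inputs(rows: list[dict], target: str,
                   candidates: list[str]) -> list[tuple[str, float]]:
    """Rank candidate input columns by information gain, highest first."""
    y = [row[target] for row in rows]
    scores = [(col, information_gain([row[col] for row in rows], y))
              for col in candidates]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Invented data: Region determines BikeBuyer, Gender is unrelated to it.
rows = [
    {"Region": "North", "Gender": "F", "BikeBuyer": 1},
    {"Region": "North", "Gender": "M", "BikeBuyer": 1},
    {"Region": "South", "Gender": "F", "BikeBuyer": 0},
    {"Region": "South", "Gender": "M", "BikeBuyer": 0},
]
print(suggest_inputs(rows, "BikeBuyer", ["Gender", "Region"]))
# [('Region', 1.0), ('Gender', 0.0)]
```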
Specifying Column Content
For some columns, you might also need to specify the column content. In SQL Server data mining, the Content Type property of each data column tells the algorithm how to process the data in that column. For example, if your data has an Income column containing numeric values that vary over a range, you can have the algorithm treat them as continuous numbers by setting the content type to Continuous. Alternatively, you can have the values in the Income column grouped into buckets by setting the content type to Discretized and, optionally, specifying the exact number of buckets. You can create different models that handle the same column differently: for example, one model might bucket customers into three age groups, and another into 10 age groups.
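Analysis Services supports several discretization methods. The sketch below uses equal-area (quantile) bucketing as one plausible choice, to show how the same Age values can become three groups in one model and 10 in another. It is an illustration under that assumption, not the server's implementation, and the sample ages are invented.

```python
from bisect import bisect_right


def equal_area_boundaries(values: list[float], bucket_count: int) -> list[float]:
    """Cut points that split the sorted values into roughly equal-sized buckets."""
    if bucket_count < 1 or not values:
        raise ValueError("need at least one value and one bucket")
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[k * n // bucket_count] for k in range(1, bucket_count)]


def bucket_of(value: float, boundaries: list[float]) -> int:
    """Bucket index 0..len(boundaries) for a value, given sorted cut points."""
    return bisect_right(boundaries, value)


ages = [18, 22, 25, 31, 36, 40, 47, 52, 58, 63, 67, 71]
three = equal_area_boundaries(ages, 3)  # [36, 58]
ten = equal_area_boundaries(ages, 10)
print([bucket_of(a, three) for a in ages])
print([bucket_of(a, ten) for a in ages])
```

Cut points are taken from the sorted values, so repeated values always share a bucket; with many ties, some cut points coincide and fewer buckets than requested end up populated.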