Modeling Flags (Data Mining)
You can use modeling flags in SQL Server Analysis Services to provide additional information to a data mining algorithm about the data that is defined in a case table. The algorithm can use this information to build a more accurate data mining model.
You can use Data Mining Extensions (DMX) to define modeling flags programmatically, or you can define them in Data Mining Designer in Business Intelligence Development Studio. For more information about how to define these flags, see Mining Model Columns.
Some modeling flags are defined at the level of the mining structure, whereas others are defined at the level of the mining model column. For example, the NOT NULL modeling flag is used with mining structure columns. You can define additional modeling flags on the mining model column.
The following list describes the modeling flags that are supported in Analysis Services. For information about modeling flags that are supported by specific algorithms, see the technical reference topic for the algorithm.
Third-party plug-ins might have other modeling flags, in addition to those pre-defined by Analysis Services.
In Data Mining Designer, you can view and modify the modeling flags associated with a mining structure or mining column by viewing the properties of the structure or model.
To view or change the modeling flag for a structure column or model column
In BI Development Studio, In Solution Explorer, double-click the mining structure.
To set the NOT NULL modeling flag, click the Mining Structure tab.
To set the REGRESSOR or MODEL_EXISTENCE_ONLY flags, click the Mining Model tab.
Right-click the column you want to view or change, and select Properties.
To add a new modeling flag, click the text box next to the ModelingFlags property, and select the check box or check boxes for the modeling flags you want to use.
Modeling flags are displayed only if they are appropriate for the column data type.
After you change a modeling flag, you must reprocess the model.
You cannot change the modeling flags used in an existing mining model and structure by using DMX. You must create a new mining model by using the ALTER MINING STRUCTURE….ADD MINING MODEL syntax.
If you are not sure which modeling flags are being used in the current structure, you can create a query that returns the modeling flags by using the following syntax:
SELECT COLUMN_NAME, MODELING_FLAG FROM $system.DMSCHEMA_MINING_STRUCTURE_COLUMNS WHERE STRUCTURE_NAME = '<structure name>'
When you set the REGRESSOR modeling flag on a column, you are indicating to the algorithm that the column contains potential regressors. The actual regressors that are used in the model are determined by the algorithm. A potential regressor can be discarded if it does not model the predictable attribute.
When you build a model using the Data Mining wizard, all continuous input columns are flagged as possible regressors. Therefore, even if you do not explicitly set the REGRESSOR flag on a column, the column might be used as a regressor in the final model.
You can determine the regressors that were actually used in the final model by performing a query against the schema rowset for the mining model, as shown in the following example:
SELECT COLUMN_NAME, MODELING_FLAG FROM $system.DMSCHEMA_MINING_columnS WHERE MODEL_NAME = '<model name>'
Note If you modify a mining model and change the content type of a column from continuous to discrete, you must manually change the flag on the mining column and then reprocess the model.
Regressors in Linear Regression Models
Linear regression models are based on the Microsoft Decision Trees algorithm. Even if you do not use the Microsoft Linear Regression algorithm, any decision tree model can contain a tree or nodes that represents a regression on a continuous attribute.
You do not need to specify that a continuous column represents a regressor. The Microsoft Decision Trees algorithm will partition the dataset into regions with meaningful patterns even if you do not set the REGRESSOR flag on the column. The difference is that when you set the modeling flag, the algorithm will try to find regression equations of the form a*C1 + b*C2 + ... to fit the patterns in the nodes of the tree. The sum of the residuals is calculated, and if the deviation is too great, a split is forced in the tree.
For example, if you are predicting customer purchasing behavior using Income as an attribute, and set the REGRESSOR modeling flag on the column, the algorithm would first try to fit the Income values by using a standard regression formula. If the deviation is too great, the regression formula is abandoned and the tree would be split on some other attribute. The decision tree algorithm would then try fit a regressor for income in each of the branches after the split.
You can use the FORCED_REGRESSOR parameter to guarantee that the algorithm will use a particular regressor. This parameter can be used with the Decision Trees algorithm and Linear Regression algorithm.