Modeling Flags (Data Mining)
You can use modeling flags in SQL Server Analysis Services to provide additional information to a data mining algorithm about the data that is defined in a case table. The algorithm can use this information to build a more accurate data mining model.
Some modeling flags are defined at the level of the mining structure, whereas others are defined at the level of the mining model column. For example, the NOT NULL modeling flag is used with mining structure columns. You can define additional modeling flags on the mining model columns, depending on the algorithm you use to create the model.
Note Third-party plug-ins might have other modeling flags, in addition to those predefined by Analysis Services.
You can view the modeling flags associated with a mining structure column or model column in Data Mining Designer by viewing the properties of the structure or model.
To determine which modeling flags have been applied to the current mining structure, you can query the data mining schema rowset. The following query returns the modeling flags for just the structure columns:
SELECT COLUMN_NAME, MODELING_FLAG FROM $system.DMSCHEMA_MINING_STRUCTURE_COLUMNS WHERE STRUCTURE_NAME = '<structure name>'
You can add or change the modeling flags used in a model by using the Data Mining Designer and editing the properties of the associated columns. Such changes require that the structure or model be reprocessed.
You can specify modeling flags in a new mining structure or mining model by using DMX, or by using AMO or XMLA scripts. However, you cannot use DMX to change the modeling flags of an existing mining model or structure. Instead, you must create a new mining model by using the syntax ALTER MINING STRUCTURE….ADD MINING MODEL.
When you set the REGRESSOR modeling flag on a column, you are indicating to the algorithm that the column contains potential regressors. The actual regressors that are used in the model are determined by the algorithm. A potential regressor can be discarded if it does not model the predictable attribute.
When you build a model by using the Data Mining wizard, all continuous input columns are flagged as possible regressors. Therefore, even if you do not explicitly set the REGRESSOR flag on a column, the column might be used as a regressor in the model.
You can determine the regressors that were actually used in the processed model by performing a query against the schema rowset for the mining model, as shown in the following example:
SELECT COLUMN_NAME, MODELING_FLAG FROM $system.DMSCHEMA_MINING_COLUMNS WHERE MODEL_NAME = '<model name>'
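Once the rows returned by a query like this are available in client code, the flagged columns can be picked out with a few lines. The following is a minimal Python sketch, not part of Analysis Services. It assumes the rows have already been fetched as (COLUMN_NAME, MODELING_FLAG) pairs and that MODELING_FLAG holds a comma-delimited string of flags. The helper name `columns_with_flag` and the sample rows are hypothetical and do not come from a real model.

```python
def columns_with_flag(rows, flag):
    """Return the column names whose MODELING_FLAG value includes `flag`.

    `rows` is an iterable of (COLUMN_NAME, MODELING_FLAG) pairs.
    Assumption: MODELING_FLAG is a comma-delimited string, or None/empty
    when no flag is set on the column.
    """
    wanted = flag.strip().upper()
    matches = []
    for column_name, modeling_flag in rows:
        # Split the flag list and normalize whitespace and case.
        flags = {part.strip().upper() for part in (modeling_flag or "").split(",")}
        if wanted in flags:
            matches.append(column_name)
    return matches


# Hypothetical rows, shaped like the result of the query above.
sample_rows = [
    ("Income", "REGRESSOR"),
    ("Yearly Purchases", "MODEL_EXISTENCE_ONLY"),
    ("Age", "REGRESSOR"),
    ("Gender", None),
]

regressors = columns_with_flag(sample_rows, "REGRESSOR")
print(regressors)
```

The same helper could be applied to rows from the structure-level query by passing a different flag name.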
Note If you modify a mining model and change the content type of a column from continuous to discrete, you must manually change the flag on the mining column and then reprocess the model.
Regressors in Linear Regression Models
Linear regression models are based on the Microsoft Decision Trees algorithm. Even if you do not use the Microsoft Linear Regression algorithm, any decision tree model can contain a tree or nodes that represent a regression on a continuous attribute.
Therefore, in these models you do not need to specify that a continuous column represents a regressor. The Microsoft Decision Trees algorithm will partition the dataset into regions with meaningful patterns even if you do not set the REGRESSOR flag on the column. The difference is that when you set the modeling flag, the algorithm will try to find regression equations of the following form to fit the patterns in the nodes of the tree:
a*C1 + b*C2 + ...
Then, the sum of the residuals is calculated, and if the deviation is too great, a split is forced in the tree.
For example, if you are predicting customer purchasing behavior using Income as an attribute, and you set the REGRESSOR modeling flag on the column, the algorithm first tries to fit the Income values by using a standard regression formula. If the deviation is too great, the regression formula is abandoned and the tree is split on some other attribute. The decision tree algorithm then tries to fit a regressor for Income in each of the branches after the split.
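The fit-then-split behavior described above can be illustrated with a small sketch. This is not the Microsoft Decision Trees implementation. It is a simplified Python illustration with a single regressor, and it assumes that "deviation" is measured as the root-mean-square of the residuals. The function names, the Region attribute, the sample data, and the threshold value are all hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the x values are not all identical
    return y_bar - slope * x_bar, slope


def residual_deviation(xs, ys):
    """Root-mean-square of the residuals around the fitted line."""
    intercept, slope = fit_line(xs, ys)
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


def needs_split(xs, ys, threshold):
    """True when a single regression line fits the node too poorly."""
    return residual_deviation(xs, ys) > threshold


# Hypothetical cases: (Region, Income, Purchases).
cases = [
    ("North", 10, 1), ("North", 20, 2), ("North", 30, 3),
    ("South", 10, 9), ("South", 20, 6), ("South", 30, 3),
]
THRESHOLD = 0.5  # hypothetical tolerance for the deviation

# Step 1: try one regression on Income for the whole node.
income = [c[1] for c in cases]
purchases = [c[2] for c in cases]
root_needs_split = needs_split(income, purchases, THRESHOLD)

# Step 2: after splitting on another attribute (Region), fit Income again
# in each branch.
branch_needs_split = {}
for region in ("North", "South"):
    xs = [c[1] for c in cases if c[0] == region]
    ys = [c[2] for c in cases if c[0] == region]
    branch_needs_split[region] = needs_split(xs, ys, THRESHOLD)

print(root_needs_split)    # True: one line cannot describe both regions
print(branch_needs_split)  # {'North': False, 'South': False}
```

In this sample data, Income relates to Purchases in opposite directions in the two regions, so a single line fits poorly at the root but fits well within each branch.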
You can use the FORCE_REGRESSOR parameter to guarantee that the algorithm will use a particular regressor. This parameter can be used with the Decision Trees algorithm and Linear Regression algorithm.
Use the following links to learn more about using modeling flags.
| Task | Topic |
|---|---|
| Edit modeling flags by using the Data Mining Designer | |
| Specify a hint to the algorithm to recommend likely regressors | |
| See the modeling flags supported by specific algorithms (in the Modeling Flags section for each algorithm reference topic) | |
| Learn more about mining structure columns and the properties that you can set on them | |
| Learn about mining model columns and modeling flags that can be applied at the model level | |
| See syntax for working with modeling flags in DMX statements | |
| Understand missing values and how to work with them | |
| Learn about managing models and structures and setting usage properties | |
