
Customize Mining Models and Structure

 

Applies To: SQL Server 2016

After you have selected an algorithm that meets your business needs, you can customize the mining model in the following ways to potentially improve results.

  • Use different columns of data in the model, or change the usage, content type, or discretization method for the columns.

  • Create filters on the mining model to restrict the data used in training the model.

  • Change the algorithm that was used to analyze data.

  • Set algorithm parameters to control thresholds, tree splits, and other important conditions.

This topic describes these options.

The decisions that you make about which columns of data to use in the model, and how to use and process that data, greatly affect the results of analysis. The following topics provide information to help you understand these choices.

Using Feature Selection

Most data mining algorithms in Analysis Services use a process called feature selection to select only the most useful attributes for addition to a model. Reducing the number of columns and attributes can improve performance and the quality of the model. The feature selection methods that are available differ depending on the algorithm that you choose.

For more information, see Feature Selection (Data Mining).
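As a rough illustration of the idea, the following Python sketch scores each candidate column by information gain against a target column and keeps only the highest-scoring ones. It is not the method that Analysis Services uses internally; the scoring function, the function names, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(values, labels):
    """Reduction in label entropy after grouping the rows by `values`."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_features(columns, labels, max_attributes):
    """Return the names of the `max_attributes` highest-scoring columns."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:max_attributes]


# Hypothetical toy data: which column helps predict the buyer flag?
columns = {
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "Commute": ["short", "short", "short", "long", "long", "long"],
}
bike_buyer = ["yes", "yes", "yes", "no", "no", "no"]

selected = select_features(columns, bike_buyer, max_attributes=1)
```

In this toy data the Commute column separates buyers from non-buyers completely, so it is the one column kept. Note that a plain information-gain score tends to favor columns with many unique values, which is one reason such columns are often removed before modeling.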

Changing Usage

You can change which columns are included in a mining model and how each column is used. If you do not get the results you expect, examine the columns you used as input. Ask whether the columns are a good choice, and whether you can improve the handling of the data, for example by:

  • Identifying categorical variables that have been mistakenly labeled as numbers.

  • Adding categories to collapse the number of attributes and make it easier to find correlations.

  • Changing the way that numbers are binned, or discretized.

  • Removing columns that have a lot of unique values, or columns that are really reference data and not useful for analysis, such as addresses or middle names.

You don’t need to physically remove columns from the mining structure; you can flag the column as Ignore. The column is removed from the mining model, but can still be used by other mining models in the structure, or referenced in a drillthrough query.
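To see why the binning choice matters, the following Python sketch compares two generic discretization strategies on the same skewed column. This is a general illustration only, not the discretization code that Analysis Services runs; the function names and the income values are hypothetical.

```python
def equal_width_bins(values, bucket_count):
    """Split the value range into `bucket_count` intervals of equal width."""
    low, high = min(values), max(values)
    width = (high - low) / bucket_count
    # Interior boundaries only; the outer edges are the minimum and maximum.
    return [low + width * i for i in range(1, bucket_count)]


def equal_frequency_bins(values, bucket_count):
    """Choose boundaries so each bucket holds roughly the same number of rows."""
    ordered = sorted(values)
    step = len(ordered) / bucket_count
    return [ordered[int(step * i)] for i in range(1, bucket_count)]


def assign_bucket(value, boundaries):
    """Return the index of the bucket that `value` falls into."""
    return sum(value >= b for b in boundaries)


incomes = [20, 22, 25, 27, 30, 32, 35, 40, 150, 400]  # hypothetical, in thousands

width_bounds = equal_width_bins(incomes, 3)
freq_bounds = equal_frequency_bins(incomes, 3)

width_buckets = [assign_bucket(v, width_bounds) for v in incomes]
freq_buckets = [assign_bucket(v, freq_bounds) for v in incomes]
```

With equal-width buckets, the two outliers push eight of the ten rows into a single bucket, so the column carries little information. With equal-frequency buckets, the same rows are spread across all three buckets.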

Creating Aliases for Model Columns

When Analysis Services creates the mining model, it uses the same column names that are in the mining structure. You can add an alias to any column in the mining model. This might make it easier to understand the column contents or usage, or make the name shorter for convenience in creating queries. Aliases are also helpful when you want to create a copy of a column and name it something descriptive.

You create an alias by editing the Name property of the mining model column. Analysis Services continues to use the original name as the ID of the column. The new value that you type for Name becomes the column alias and appears in the grid in parentheses next to the column usage.

[Figure: aliases on mining model columns]

The graphic shows related models that have multiple copies of a mining structure column, all related to Income. Each copy of the structure column has been discretized in a different way. The models in the diagram each use a different column from the mining structure; however, for convenience in comparing the columns across the models, the column in each model has been renamed to [Income].
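If you define models in DMX rather than in the designer, an alias is typically written as an AS clause in the model's column list. The following Python sketch only assembles that text. The helper names and the structure column names are hypothetical, and the sketch assumes that names do not themselves contain square brackets.

```python
def bracket(name):
    """Wrap an identifier in DMX square brackets."""
    return f"[{name}]"


def model_column(structure_column, alias=None, usage=None):
    """Build one column entry for a mining model definition.

    `alias` becomes the model column name; `usage` is an optional
    keyword such as PREDICT.
    """
    parts = [bracket(structure_column)]
    if alias:
        parts += ["AS", bracket(alias)]
    if usage:
        parts.append(usage)
    return " ".join(parts)


# Hypothetical structure columns: two differently discretized copies of income,
# each exposed to its own model under the same friendly name.
clause_a = model_column("Income Equal Areas", alias="Income")
clause_b = model_column("Income Clusters", alias="Income")
target = model_column("Bike Buyer", usage="PREDICT")
```

Here `clause_a` is `[Income Equal Areas] AS [Income]` and `clause_b` is `[Income Clusters] AS [Income]`, which mirrors the scenario in the graphic: different structure columns, one shared alias.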

Adding Filters

You can add a filter to a mining model. A filter is a set of WHERE conditions that restrict the data in the model cases to some subset. The filter is used when training the model, and can optionally be used when you test the model or create accuracy charts.

By adding filters, you can reuse mining structures but create models based on very different subsets of the data. Or, you can simply use filters to eliminate certain rows and improve the quality of analysis.

For more information, see Filters for Mining Models (Analysis Services - Data Mining).
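As a sketch of how a filtered model might be added to an existing structure, the following Python helper assembles a DMX ALTER MINING STRUCTURE statement with a WITH FILTER clause. The helper only builds the statement text and does not connect to a server. The helper name and the structure, model, and column names are hypothetical; check the generated statement against the DMX reference before running it.

```python
def add_model_statement(structure, model, columns, algorithm, filter_expr=None):
    """Assemble an ALTER MINING STRUCTURE ... ADD MINING MODEL statement."""
    column_list = ",\n".join(f"    {c}" for c in columns)
    lines = [
        f"ALTER MINING STRUCTURE [{structure}]",
        f"ADD MINING MODEL [{model}]",
        "(",
        column_list,
        ")",
        f"USING {algorithm}",
    ]
    if filter_expr:
        # The filter restricts which cases are used to train this model only.
        lines.append(f"WITH FILTER({filter_expr})")
    return "\n".join(lines)


# Hypothetical structure and columns, for illustration only.
statement = add_model_statement(
    structure="New Mailing",
    model="Naive Bayes Women",
    columns=["[CustomerKey]", "[Gender]", "[Number Cars Owned]", "[Bike Buyer] PREDICT"],
    algorithm="Microsoft_Naive_Bayes",
    filter_expr="[Gender] = 'F' AND [Age] < 50",
)
print(statement)
```

Because the filter belongs to the model and not to the structure, a second model with a different filter expression can be added to the same structure without reprocessing the source data definition.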

Although new models that you add to a mining structure share the same data set, you can get different results by using a different algorithm (if the data supports it), or by changing the parameters for the algorithm. You can also set modeling flags.

The choice of algorithm determines what kind of results you will get. For general information about how a specific algorithm works, or the business scenarios where you would benefit from using a particular algorithm, see Data Mining Algorithms (Analysis Services - Data Mining).

See the technical reference topic for each algorithm for a description of the requirements and restrictions, as well as detailed information about the customizations that each algorithm supports.

  • Microsoft Decision Trees Algorithm

  • Microsoft Time Series Algorithm

  • Microsoft Clustering Algorithm

  • Microsoft Neural Network Algorithm

  • Microsoft Naive Bayes Algorithm

  • Microsoft Logistic Regression Algorithm

  • Microsoft Association Algorithm

  • Microsoft Linear Regression Algorithm

  • Microsoft Sequence Clustering Algorithm

Each algorithm supports parameters that you can use to customize the behavior of the algorithm and fine-tune the results of your model. The topic for each algorithm type also lists the prediction functions that can be used with models based on that algorithm.

For a description of how to use each parameter, see the following topics:

Property name | Applies to
AUTO_DETECT_PERIODICITY | Microsoft Time Series Algorithm Technical Reference
CLUSTER_COUNT | Microsoft Clustering Algorithm Technical Reference; Microsoft Sequence Clustering Algorithm Technical Reference
CLUSTER_SEED | Microsoft Clustering Algorithm Technical Reference
CLUSTERING_METHOD | Microsoft Clustering Algorithm Technical Reference
COMPLEXITY_PENALTY | Microsoft Decision Trees Algorithm Technical Reference; Microsoft Time Series Algorithm Technical Reference
FORCE_REGRESSOR | Microsoft Decision Trees Algorithm Technical Reference; Microsoft Linear Regression Algorithm Technical Reference; Modeling Flags (Data Mining)
FORECAST_METHOD | Microsoft Time Series Algorithm Technical Reference
HIDDEN_NODE_RATIO | Microsoft Neural Network Algorithm Technical Reference
HISTORIC_MODEL_COUNT | Microsoft Time Series Algorithm Technical Reference
HISTORICAL_MODEL_GAP | Microsoft Time Series Algorithm Technical Reference
HOLDOUT_PERCENTAGE | Microsoft Logistic Regression Algorithm Technical Reference; Microsoft Neural Network Algorithm Technical Reference. Note: This parameter is different from the holdout percentage value that applies to a mining structure.
HOLDOUT_SEED | Microsoft Logistic Regression Algorithm Technical Reference; Microsoft Neural Network Algorithm Technical Reference. Note: This parameter is different from the holdout seed value that applies to a mining structure.
INSTABILITY_SENSITIVITY | Microsoft Time Series Algorithm Technical Reference
MAXIMUM_INPUT_ATTRIBUTES | Microsoft Clustering Algorithm Technical Reference; Microsoft Decision Trees Algorithm Technical Reference; Microsoft Linear Regression Algorithm Technical Reference; Microsoft Naive Bayes Algorithm Technical Reference; Microsoft Neural Network Algorithm Technical Reference; Microsoft Logistic Regression Algorithm Technical Reference
MAXIMUM_ITEMSET_COUNT | Microsoft Association Algorithm Technical Reference
MAXIMUM_ITEMSET_SIZE | Microsoft Association Algorithm Technical Reference
MAXIMUM_OUTPUT_ATTRIBUTES | Microsoft Decision Trees Algorithm Technical Reference; Microsoft Linear Regression Algorithm Technical Reference; Microsoft Logistic Regression Algorithm Technical Reference; Microsoft Naive Bayes Algorithm Technical Reference; Microsoft Neural Network Algorithm Technical Reference
MAXIMUM_SEQUENCE_STATES | Microsoft Sequence Clustering Algorithm Technical Reference
MAXIMUM_SERIES_VALUE | Microsoft Time Series Algorithm Technical Reference
MAXIMUM_STATES | Microsoft Clustering Algorithm Technical Reference; Microsoft Neural Network Algorithm Technical Reference; Microsoft Sequence Clustering Algorithm Technical Reference
MAXIMUM_SUPPORT | Microsoft Association Algorithm Technical Reference
MINIMUM_IMPORTANCE | Microsoft Association Algorithm Technical Reference
MINIMUM_ITEMSET_SIZE | Microsoft Association Algorithm Technical Reference
MINIMUM_DEPENDENCY_PROBABILITY | Microsoft Naive Bayes Algorithm Technical Reference
MINIMUM_PROBABILITY | Microsoft Association Algorithm Technical Reference
MINIMUM_SERIES_VALUE | Microsoft Time Series Algorithm Technical Reference
MINIMUM_SUPPORT | Microsoft Association Algorithm Technical Reference; Microsoft Clustering Algorithm Technical Reference; Microsoft Decision Trees Algorithm Technical Reference; Microsoft Sequence Clustering Algorithm Technical Reference; Microsoft Time Series Algorithm Technical Reference
MISSING_VALUE_SUBSTITUTION | Microsoft Time Series Algorithm Technical Reference
MODELLING_CARDINALITY | Microsoft Clustering Algorithm Technical Reference
PERIODICITY_HINT | Microsoft Time Series Algorithm Technical Reference
PREDICTION_SMOOTHING | Microsoft Time Series Algorithm Technical Reference
SAMPLE_SIZE | Microsoft Clustering Algorithm Technical Reference; Microsoft Logistic Regression Algorithm Technical Reference; Microsoft Neural Network Algorithm Technical Reference
SCORE_METHOD | Microsoft Decision Trees Algorithm Technical Reference
SPLIT_METHOD | Microsoft Decision Trees Algorithm Technical Reference
STOPPING_TOLERANCE | Microsoft Clustering Algorithm Technical Reference
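
In DMX, algorithm parameters are usually supplied in parentheses after the algorithm name in the USING clause. The following Python sketch builds such a clause and rejects parameters that the table above does not list for the chosen algorithm. The helper name is hypothetical, the lookup covers only a small excerpt of the table, and the parameter values are arbitrary examples, not recommendations.

```python
# Excerpt of the parameter table above, keyed by DMX algorithm name.
# Only a few entries are included; extend the sets as needed.
SUPPORTED = {
    "Microsoft_Decision_Trees": {
        "COMPLEXITY_PENALTY", "MINIMUM_SUPPORT", "SCORE_METHOD", "SPLIT_METHOD",
    },
    "Microsoft_Clustering": {
        "CLUSTER_COUNT", "CLUSTER_SEED", "CLUSTERING_METHOD", "MINIMUM_SUPPORT",
    },
}


def using_clause(algorithm, parameters=None):
    """Build a DMX USING clause, optionally with algorithm parameters."""
    parameters = parameters or {}
    # Algorithms missing from the excerpt accept no parameters in this sketch.
    allowed = SUPPORTED.get(algorithm, set())
    unknown = sorted(set(parameters) - allowed)
    if unknown:
        raise ValueError(f"{algorithm} does not list: {', '.join(unknown)}")
    if not parameters:
        return f"USING {algorithm}"
    settings = ", ".join(f"{name} = {value}" for name, value in parameters.items())
    return f"USING {algorithm} ({settings})"


tree_clause = using_clause(
    "Microsoft_Decision_Trees",
    {"COMPLEXITY_PENALTY": 0.5, "MINIMUM_SUPPORT": 10},
)
cluster_clause = using_clause("Microsoft_Clustering", {"CLUSTER_COUNT": 5})
```

Validating against the table before sending a statement catches a common mistake: reusing a parameter from one algorithm, such as SPLIT_METHOD, on a model that is based on a different algorithm.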

Data Mining Algorithms (Analysis Services - Data Mining)
Physical Architecture (Analysis Services - Data Mining)
