Creating Filters for Mining Models (Analysis Services - Data Mining)
Data-based model filtering helps you create mining models that use subsets of data in a mining structure. Filtering gives you flexibility when you design your mining structures and data sources, because you can create a single mining structure, based on a comprehensive data source view. You can then create filters to use only a part of that data for training and testing a variety of models, instead of building a different structure and related model for each subset of data.
For example, you define the data source view on the Customers table and related tables. Next, you define a single mining structure that includes all the fields you need. Finally, you create a model that is filtered on a particular customer attribute, such as Region. You can then easily make a copy of that model, and change just the filter condition to generate a new model based on a different region.
Some real-life scenarios where you might benefit from this feature include the following:
Creating separate models for discrete values such as gender, regions, and so forth. For example, a clothing store might use customer demographics to build separate models by gender, even though the sales data comes from a single data source for all customers.
Experimenting with models by creating and then testing multiple groupings of the same data, such as ages 20-30 vs. ages 20-40 vs. ages 20-25.
Specifying complex filters on nested table contents, such as requiring that a case be included in the model only if the customer has purchased at least two of a particular item.
This section explains how to build, use, and manage filters on mining models.
You can create and apply filters in the following ways:
Using the Mining Models tab in Data Mining Designer to build conditions with the help of filter editor dialog boxes.
Typing a filter expression directly into the Filter property of the mining model.
Setting filter conditions on a model programmatically, by using AMO.
Creating Model Filters using Data Mining Designer
You filter a model in Data Mining Designer by changing the Filter property of the mining model. You can either type a filter expression directly into the Properties pane, or you can open a filter dialog box to build conditions.
There are two filter dialog boxes. The first lets you create conditions that are applied to the case table. If the data source contains multiple tables, first you select a table, and then you select a column and specify operators and conditions that apply to that column. You can link multiple conditions by using AND/OR operators. The operators that are available for defining values depend on whether the column contains discrete or continuous values. For example, with continuous values, you can use greater than and less than operators. However, for discrete values, you can only use = (equal to), != (not equal to), and is null operators.
The LIKE keyword is not supported. If you want to include multiple discrete attributes, you must create separate conditions and link them by using the OR operator.
If the conditions are complex, you can use the second filter dialog box to work with one table at a time. When you close the second filter dialog box, the expression is evaluated and then combined with filter conditions that have been set on other columns in the case table.
Creating Filters on Nested Tables
If the data source view contains nested tables, you can use the second filter dialog box to build conditions on the rows in the nested tables.
For example, if your case table is related to customers, and the nested table shows the products that a customer has purchased, you can create a filter for customers who have purchased particular items by using the following syntax in the nested table filter: [ProductName]=’Water Bottle’ OR ProductName=’Water Bottle Cage'.
You can also filter on the existence of a particular value in the nested table by using the EXISTS or NOT EXISTS keywords and a subquery. This lets you create conditions such as EXISTS (SELECT * FROM Products WHERE ProductName=’Water Bottle’). The EXISTS SELECT(<subquery>) returns true if the nested table contains at least one row that includes the value, Water Bottle.
You can combine conditions on the case table with conditions on the nested table. For example, the following syntax includes a condition on the case table (Age > 30 ), a subquery on the nested table (EXISTS (SELECT * FROM Products)), and multiple conditions on the nested table (WHERE ProductName=’Milk’ AND Quantity>2) ).
(Age > 30 AND EXISTS (SELECT * FROM Products WHERE ProductName=’Milk’ AND Quantity>2) )
When you have finished building the filter, the filter text is evaluated by Analysis Services, translated to a DMX expression, and then saved with the model.
For instructions on how to use the filter dialog boxes in Business Intelligence Development Studio, see How to: Apply a Filter to a Mining Model.
Data-based model filtering greatly simplifies the task of managing mining structures and mining models, because you can easily create multiple models that are based on the same structure. You can also quickly make copies of existing mining models and then change only the filter condition. The following are some considerations for managing model filters:
To determine the filter that is applied to a model, view the Filter property of the mining model.
You can rename the model to reflect the contents of the filter. This can make it easier to tell related models apart.
The filter expression is saved as a script that is stored with the associated mining model or nested table. If you delete the filter text, it can only be restored by manually re-creating the filter expression. Therefore, if you create complex filter expressions, you should create a backup copy of the filter text.
Whenever you change or add a filter expression, you must reprocess the structure and model before you can view the effects of the filter.
For more information about filter syntax, and examples of filter expressions, see Model Filter Syntax and Examples (Analysis Services - Data Mining).
For information about how to use model filters when you are testing a mining model, see Measuring Mining Model Accuracy (Analysis Services - Data Mining).