Data Mining Wizard

Article
12/03/2008

The Data Mining Wizard in Microsoft SQL Server 2005 Analysis Services (SSAS) starts every time that you add a new mining structure to a data mining project. The wizard defines new structures, and also defines the initial mining model for each structure. The structure of the initial model, including tables and columns, is derived from an existing data source view or cube.

You can complete the following tasks with the Data Mining Wizard:

Specify whether to build the data mining structure and models from a relational database or from an existing cube from an OLAP database.
Specify the algorithm to use in building the initial model.
Specify the data source view or cube dimension that contains the training data on which to build the model.
Define the case set, by specifying a case table and nested tables from a relational data source, or the case key and case level columns from an OLAP data source.
Define the usage for the columns in the case set.
Auto-detect and set the column content and data types.
Slice the cube, if your mining model is based on an OLAP data source.
Name the new mining structure and the initial mining model.
Create a new data mining dimension, if your mining model is based on an OLAP data source.

After you complete the Data Mining Wizard, you use Data Mining Designer to modify the mining structure and models, and to browse and predict against the models.

For More Information:Data Mining Designer

Using the Data Mining Wizard

To start the Data Mining Wizard, add a new mining structure to an Analysis Services project by using Solution Explorer or the Project menu in Business Intelligence Development Studio.

The Data Mining Wizard has two branches, one for each type of data source:

Relational Mining Models
OLAP Mining Models

Relational Mining Models

When you build a mining model from a relational data source in Analysis Services, you first specify in the Data Mining Wizard that you want to use an existing relational database to define the structure of the model. Next, you specify the data mining technique to use, by selecting the algorithm that is most appropriate for the type of data mining analysis that you want.

For More Information:Data Mining Algorithms

Specifying the Data Source View and Table Types

The next steps in the wizard are to select the specific data source view that you want to use to define the mining structure, and to specify a case table. The case table includes the training data that you will use to train the mining model. You can also specify tables to nest within the case table, if your data is contained in multiple tables.

For More Information:Nested Tables

Specifying the Column Usage

After you specify the case table and the nested tables, you determine the usage type for each column in the tables that you will include in the mining structure. If you do not specify a usage type for a column, the column will not be included in the mining structure. Data mining columns can be one of four types: key, input, predictable, or a combination of input and predictable. Key columns contain a unique identifier for each row in a table. Some mining models, such as sequence clustering and time series models, can contain multiple key columns. Input columns provide the information from which predictions are made. Predictable columns contain the information that you try to predict in the mining model.

For example, a series of tables may contain customer IDs, demographic information, and the amount of money each customer spends at a specific store. The customer ID uniquely identifies the customer and also relates the case table to the nested tables, therefore you would use it as the key column. You could use a selection of columns from the demographic information as input columns, and the column that describes the amount of money each customer spends as a predictable column. You could then build a mining model that relates demographics to how much money a customer spends in a store. You could use this model as the basis for targeted marketing.

The Data Mining Wizard provides the Suggest feature, which is enabled when you select a predictable column. Datasets frequently contain more columns than you want to use to build a mining model. The Suggest feature calculates a numeric score, from 0 to 1, that describes the relationship between each column in the dataset and the predictable column. Based on this score, the feature suggests columns to use as input for the mining model. If you use the Suggest feature, you can use the suggested columns, modify the selections to fit your needs, or ignore the suggestions.

Specifying the Content and Data Types

After you select one or more predictable columns and input columns, you can specify the content and data types for each column.

For More Information:Data Types (Data Mining), Content Types (Data Mining)

Completing the Wizard

OLAP Mining Models

When you build a multidimensional mining model from an OLAP data source in Analysis Services, you first specify in the Data Mining Wizard that you want to use an existing cube to define the structure of the model. Next, you specify the data mining technique to use, by selecting the algorithm that is most appropriate for the type of data mining analysis that you want.

For More Information:Data Mining Algorithms

Specifying the Data Source and Case Key

Next, you select the cube dimension to use as the data source to define the mining structure. Then you select an attribute to use as the key, or case key, of the mining model.

Note

The OLAP mining model that you are building and the source cube that you use to create the model must be contained in the same Analysis Services database.

Specifying Case Level Columns and Column Usage

After you select a case key, the attributes and measures that are associated with that key are displayed in a tree view on the next page of the wizard. From this list, you select the attributes and measures to be used as the columns of the structure. These columns are known as case level columns. As with a relational model, you must also specify how each column should be used in the structure, which you can do on the next page of the wizard. Columns can be key, input, predictable, input and predictable, or unselected.

Adding Nested Tables

The OLAP branch of the Data Mining Wizard includes the option to add nested tables to the mining model structure. On the Specify Mining Model Column Usage page of the wizard, click Add Nested Tables to open a separate dialog box that guides you through the steps to add nested tables. Only the measure groups that apply to the dimension are displayed. Select a measure group that contains the foreign key of the case dimension. Next, specify the usage for each column in the measure group, either input or predictable. The wizard then adds the nested table to the case table. The default name for the nested table is the nested dimension name, but you can rename the nested table and its columns. For More Information:Nested Tables

Specifying the Content and Data Types

After you select one or more predictable columns and input columns, you can specify the content and data types for each column.

For More Information:Data Types (Data Mining), Content Types (Data Mining)

Slicing the Source Cube

In the OLAP branch of the wizard, you can limit the scope of your mining model by slicing the source cube before you train the mining model. Slicing the cube is similar to adding a WHERE clause to an SQL statement. For example, if a cube contains information about the purchase of products, you might limit an age attribute to more than 30, a gender column to only female, and a purchase date to no earlier than March 2000. In such a way you can limit the model to cover the scope of a female who is older than 30 years and who bought a product after March, 2000.

Completing the Wizard

The last step in the wizard is to name the mining structure and the associated mining model. If you select Allow drill through, the drill through functionality is enabled in the model. This lets you explore the source data that is used to build the model. You can also specify, based on the mining model, whether you want to add a new dimension to the source cube, or create a new cube from the mining model.

Data Mining Wizard

Using the Data Mining Wizard

Relational Mining Models

Specifying the Data Source View and Table Types

Specifying the Column Usage

Specifying the Content and Data Types

Completing the Wizard

OLAP Mining Models

Specifying the Data Source and Case Key

Specifying Case Level Columns and Column Usage

Adding Nested Tables

Specifying the Content and Data Types

Slicing the Source Cube

Completing the Wizard

See Also

Concepts

Other Resources

Help and Information

Additional resources