Creating a New Mining Structure

When you build a data mining solution in Microsoft SQL Server Analysis Services, the first step is to create a mining structure by using the Data Mining Wizard (Analysis Services - Data Mining) in Business Intelligence Development Studio. The mining structure defines the data domain from which mining models are built. All mining models are based on a structure.

Mining structures use either relational or online analytical processing (OLAP) data sources. Relational mining structures are based on data that is stored in tables, files, or relational database systems and that has been defined as a data source view. OLAP mining structures are based on a dimension and related measures from an OLAP cube that exists on the same database as the mining structure.

For More Information:Designing Databases, Designing Analysis Services Multidimensional Database Objects

The Data Mining Wizard automatically defines a mining structure, and lets you add an initial mining model to the structure. Because a mining structure can contain multiple mining models, you can use Data Mining Designer to add more mining models to the structure.

The following sections provide more information about how to create new mining structures with the Data Mining Wizard, and how to set options on the mining structure that will let you create a test set or run queries on the data in the mining structure.

  • Creating a Relational Mining Structure

  • Creating an OLAP Mining Structure

  • Enabling Holdout and Drillthrough

Creating a Relational Mining Structure

Relational mining structures can be based on any data that is available through an OLE DB data source. If the source data is contained within multiple tables, you can create a data source view that brings together the tables and columns that you need. If the tables include any one-to-many relationships -- for example, you have multiple purchase records for each customer that you want to analyze -- you can add both tables and use one table as the case table and data from the many side of the relationship as nested tables.

For More Information:Nested Tables (Analysis Services - Data Mining)

The Data Mining Wizard guides you through the following steps to create the structure for a new mining model:

  1. Selecting a data source type, in this case a relational database.

  2. Deciding whether to build just a structure, or a structure with a mining model.

  3. Selecting an algorithm for the model.

  4. Selecting a data source.

  5. Selecting a case table and, optionally, any nested tables.

  6. Selecting the type for each column; predictable, input, or key.

  7. Specifying the column content types.

  8. Specifying an optional holdout data set.

  9. Enabling drillthrough on the structure; naming and saving the new mining structure and the associated mining model.

For More Information:Data Mining Algorithms (Analysis Services - Data Mining), Mining Model Columns, Mining Structure Columns, Data Types (Data Mining), Content Types (Data Mining)

Creating an OLAP Mining Structure

OLAP cubes frequently contain so many members and dimensions that it can be difficult to know where to begin with data mining. To help identify the patterns that the cubes contain, typically you identify a single dimension of interest, and then begin to explore patterns related to that dimension. The following table lists several common OLAP data mining tasks, describes sample scenarios in which you might apply each task, and identifies the data mining algorithm to use for each task.

Task

Sample scenario

Algorithm

Group members into clusters

Segment a customer dimension based on customer member properties, the products that the customers buy, and the amount of money that the customers spend.

Microsoft Clustering Algorithm

Find interesting or abnormal members

Identify interesting or abnormal stores in a store dimension based on sales, profit, store location, and store size.

Microsoft Decision Trees Algorithm

Find interesting or abnormal cells

Identify store sales that go against typical trends over time.

Microsoft Time Series Algorithm

The Data Mining Wizard guides you through the following process to create the structure for a new mining model:

  1. Selecting a data source type, in this case a cube.

    Note

    The OLAP cube must exist within the same database as the mining structure. You cannot use a cube created by the PowerPivot for Excel add-in as a source for data mining.

  2. Selecting an algorithm.

  3. Selecting a source cube dimension.

  4. Selecting a case key.

  5. Selecting case columns.

  6. Selecting any nested tables.

  7. Selecting the usage for each column; predictable, input, or key.

  8. Specifying the column content types.

  9. Slicing the source cube.

  10. Creating an optional test data set.

  11. Naming and saving the new mining structure and the associated mining model.

You can set the following options on the last page of the wizard:

  • Allow drillthrough

  • Create mining model dimension

  • Create a cube using mining model dimension

If you choose to create a new mining model dimension in the source cube, you can include the information that the data mining algorithm finds in the OLAP data source. By creating a mining model dimension, you can browse and query the model content that has been stored in the form of a dimension. This option is available for models that are built by using the Microsoft Clustering, the Microsoft Decision Trees, and Microsoft Association Rules algorithms.

If you select the option to create a new cube, a new cube is defined on the database that includes the mining model dimension, and optionally any related dimensions. This lets you slice your fact data by the hierarchy that was discovered by the mining model.

For More Information:Data Mining Algorithms (Analysis Services - Data Mining), Mining Model Columns, Mining Structure Columns, Data Types (Data Mining), Content Types (Data Mining)

Enabling Drillthrough and Holdout

When you create the mining structure, you must also set two important options for working with the data: holdout and drillthrough. Holdout enables you to partition the data in the mining structure into a training set and a testing set, for use with all models associated with that structure. For more information, see Partitioning Data into Training and Testing Sets (Analysis Services - Data Mining).

Drillthrough lets you view source data in the mining structure by querying the mining model. This is useful when you are viewing the results of a mining model, and want to see additional details from the underlying cases. For example, you may want to find contact information, the cases that were used to a train a particular cluster, and so forth. To use drillthrough, you must enable it when you create the mining structure; you cannot enable it later. For more information, see Using Drillthrough on Mining Models and Mining Structures (Analysis Services - Data Mining).