Data Mining Projects (Analysis Services - Data Mining)

When you develop a data mining solution in Analysis Services, you first create an Analysis Services project. Within this project, you define the source of data that you will use for analysis, and then set up a model that includes an algorithm and custom instructions for handling the data. You can also continue to test and refine the model within the project. When you are satisfied with the solution, you can deploy it to another server or use it in an application to provide predictions and analysis.

The following sections outline the tools and processes for creating a data mining solution, and provide links to resources to use for each step.

Creating an Analysis Services Project

When you develop a data mining solution, you must first create a new Analysis Services project by using Business Intelligence Development Studio. Each data mining project contains the following four kinds of objects: data sources; data source views, which are based on the data sources; mining structures, which define how the data is used in the model; and mining models, which create and store patterns.

For More Information:Defining an Analysis Services Project, Defining a Data Source Using the Data Source Wizard (Analysis Services)

Defining a Data Source

The data source defines the connection string and authentication information that the Analysis Services server will use to connect to the data source. The data source can contain multiple tables or views. Analysis Services can use datasets from both relational and Online Analytical Processing (OLAP) databases, or from external providers.

After you have defined this connection to a data source, you create a view that identifies the specific data that is relevant to your model. The data source view also enables you to customize the way that the data in the data source is supplied to the mining model. You can modify the structure of the data to make it more relevant to your project, or choose only certain kinds of data. If you want to filter the data, you can do so in the data source view, or in filters that are applied at the level of the model.

The requirements for how much data you will need, and how that data should be cleaned and formatted, will differ depending on the algorithm that you use to investigate that data.

For More Information:Defining a Data Source View (Analysis Services)

Adding Mining Structures to an Analysis Services Project

Once you have enough data to begin analysis, you select the columns of data that are most relevant to your business problem, and add mining structures to the project. A mining structure defines the columns of data, and columns with nested tables, that are obtained from the data source view or from an OLAP cube in the project.

To add a new mining structure, you start the Data Mining Wizard, which walks you through the process of defining the data and optionally creating an initial data mining model. When you create a structure, you can also partition your data to include a training data set, used for building models, and a testing data set, which can be used to test or validate all mining models that are based on that structure. You can use the Mining Structure tab of Data Mining Designer to modify existing mining structures, including adding columns and nested tables.

For More Information:Creating a New Mining Structure, Data Mining DesignerData Mining Wizard (Analysis Services - Data Mining)

Working with Data Mining Models

To each mining structure, you add one or more mining models. The mining model defines the algorithm, or the method of analysis that you will use on the data. You process each model by running the data in the data source view through the algorithm, which generates a mathematical model of the data. This process is also known as training the model.

After the model has been processed, you can then visually explore the mining model and create prediction queries against it.

Analysis Services provides several options for processing mining model objects, including the ability to control which objects are processed and how they are processed. For example, you can process a structure and cache the data, and then continue to add new models to the structure. If the data is cached, you can use drillthrough queries to return detailed information about the cases used in the model.

For More Information:Data Mining Algorithms (Analysis Services - Data Mining), Processing Analysis Services Objects, Using Drillthrough on Mining Models and Mining Structures (Analysis Services - Data Mining).

Validating Data Mining Models

After you have created a model, you can investigate the results and make decisions about which models perform the best. On the Mining Model Viewer tab in Data Mining Designer, Analysis Services provides viewers for each mining model type, which you can use to explore the mining models.

In the Mining Accuracy Chart tab of the designer, Analysis Services provides tools that you can use to directly compare mining models and choose the most accurate or useful mining model. These tools include a lift chart, profit chart, and a classification matrix. 

You can also use the cross-validation report, new in SQL Server 2008, to perform iterative subsampling of your data to determine whether the model is biased to a particular set of data. The statistics that the report provides can be used to objectively compare models and assess the quality of your training data.

For More Information:Viewing a Data Mining Model, Validating Data Mining Models (Analysis Services - Data Mining)

Creating Predictions

The main goal of most data mining projects is to use a mining model to create predictions. After you explore and compare mining models, you can use one of several tools to create predictions. Analysis Services provides a query language called Data Mining Extensions (DMX) that is the basis for creating predictions and is easily scriptable. To help you build DMX prediction queries, SQL Server provides a query builder, available in SQL Server Management Studio and Business Intelligence Development Studio, and DMX templates for the query editor in Management Studio. Within BI Development Studio, you access the query builder from the Mining Model Prediction tab of Data Mining Designer.

For More Information:Creating DMX Prediction Queries, Data Mining Extensions (DMX) Statement Reference

SQL Server Management Studio

After you have used BI Development Studio to build mining models for your data mining project, you can manage and work with the models and create predictions in Management Studio. By using the query tools in SQL Server Management Studio, you can explore the data in your models, create complex content queries, or manage data mining objects stored in an instance of SQL Server.

For More Information:Data Mining in SQL Server Management Studio

SQL Server Reporting Services

After you create a mining model, you may want to distribute the results to a wider audience. Because the results of data mining are stored in a consistent schema that is readily accessible via database queries, you can use a variety of client tools to present the results of analysis, to explore the patterns in the model, or to make predictions.

You can use Report Designer in Microsoft SQL Server Reporting Services to create reports, which you can use to present the information that a mining model contains. You can use the result of any DMX query as the basis of a report, and can take advantage of the parameterization and formatting features that are available in Reporting Services.

For More Information:Using the Analysis Services DMX Query Designer (Reporting Services), Integrating Reporting Services into Applications

Working Programmatically with Data Mining

Analysis Services provides several tools that you can use to programmatically work with data mining. The DMX language provides statements that you can use to create, train, and use data mining models. You can also perform these tasks by using a combination of XML for Analysis (XMLA) and Analysis Services Scripting Language (ASSL), or by using Analysis Management Objects (AMO).

You can access all the metadata that is associated with data mining by using data mining schema rowsets. For example, you can use schema rowsets to determine the data types that an algorithm supports, or the model names that exist in a database.

For More Information:Data Mining Extensions (DMX) Reference, Data Mining Schema Rowsets, Using XML for Analysis in Analysis Services (XMLA)