Using Integration Services with Data Mining
Data mining is frequently described as the process of extracting valid and actionable information from large databases. In other words, data mining helps you discover patterns and trends that exist in data. Microsoft SQL Server Analysis Services provides powerful data mining capabilities and tools for extracting and analyzing these patterns. Integration Services offers a variety of components to assist you in managing and querying these data mining models.
For more information about the data mining features in SQL Server Analysis Services, see Overview (Analysis Services - Data Mining) and Data Mining Projects (Analysis Services - Data Mining).
Some of the Integration Services components described in this topic are available only in the Enterprise and Developer editions of SQL Server.
A connection manager is a logical representation of a connection to a data source. You can use the Analysis Services connection manager to connect to an Analysis Services database in which mining structures and models are stored. All of the tasks and data flow components described on this topic require an Analysis Services connection manager.
Analysis Services Connection Manager
An Analysis Services connection manager enables a package to connect to a server that runs an Analysis Services database or to an Analysis Services project that provides access to mining structures and models. You can only connect to an Analysis Services project while developing packages in Business Intelligence Development Studio. At run time, packages connect to the server and the database to which you deployed the Analysis Services.
For more information, see Analysis Services Connection Manager.
Tasks are control flow elements that define units of work that are performed in a package. You can use the following tasks to create or modify a data mining structure or model, and to process or query a model.
Analysis Services Execute DDL Task
The Analysis Services Execute DDL task runs data definition language (DDL) statements that can create, drop, or alter mining structures and models. The DDL statements are represented as statements in Analysis Services Scripting Language (ASSL), and are framed in an XML for Analysis (XMLA) command.
For more information about this task, see Analysis Services Execute DDL Task. For more information about creating and altering Analysis Services objects by using XMLA, see Creating and Altering Objects (XMLA).
Analysis Services Processing Task
The Analysis Services Processing task automates the processing of Analysis Services objects, such as mining models.
Data Mining Query Task
The Data Mining Query task runs prediction queries based on data mining models built in Analysis Services. The prediction query creates a prediction for new data by using mining models and saves its output to a table or tables.
Transformations are the components in the data flow of a package that aggregate, merge, distribute, and modify data.
Data Mining Query Transformation
The Data Mining Query transformation performs prediction queries against data mining models. This transformation contains a query builder for creating Data Mining Extensions (DMX) queries. The query builder lets you create custom statements, using the DMX language, for evaluating the transformation input data against an existing mining model. Unlike the Data Mining Query task, which saves its output directly to a table or tables, the Data Mining Query transformation makes its output available to downstream components in the data flow of the package.
Destinations are the data flow components that load the output of a data flow into different types of data sources or create an in-memory dataset.
Data Mining Model Training Destination
The Data Mining Model Training destination trains data mining models by passing the data that the destination receives through the data mining model algorithms. During the training process, the algorithms calculate the patterns and relationships between elements in the data. After you have trained a new mining model, you can use the model to run prediction queries.
For more information, see Data Mining Model Training Destination.
Integration Services has many other components that may be useful as part of a data mining solution.
For training or testing your data mining models with a subset of your data, you can use:
Row Sampling transformation. Copies only a subset of rows based on a count that you specify. For more information, see Row Sampling Transformation.
Percentage Sampling transformation. Copies only a subset of rows based on a percentage that you specify. For more information, see Percentage Sampling Transformation.
For cleansing your data by removing duplicates and standardizing values, you can use:
Sort transformation. Removes duplicate rows while sorting data. For more information, see Sort Transformation.
Fuzzy Grouping transformation. Groups rows based on similarity rather than an exact match. For more information, see Fuzzy Grouping Transformation and How to: Identify Similar Data Rows by Using the Fuzzy Grouping Transformation.
Fuzzy Lookup transformation. Locates a match in a reference table based on similarity rather than an exact match. For more information, see Fuzzy Lookup Transformation.
To perform text mining, you can use:
Term Extraction transformation. Creates a table of the terms discovered in the source, after filtering out standard and user-defined noise words. For more information, see Term Extraction Transformation.
Term Lookup transformation. Matches terms in your input with terms in a reference table. For more information, see Term Lookup Transformation.