Processing Data Mining Objects

A data mining object is only an empty container until it has been processed. Processing a data mining model is also called training.

Processing mining structures: A mining structure reads data from an external data source, as defined by the column bindings and usage metadata. The data is read in full and then analyzed to extract various statistics. Analysis Services stores a compact representation of the data, which is suitable for analysis by data mining algorithms, in a local cache. You can either keep this cache or delete it after your models have been processed. By default, the cache is kept. For more information, see Process a Mining Structure.

Processing mining models: A mining model is empty, containing definitions only, until it is processed. To process a mining model, the mining structure that it is based on must have been processed. The mining model gets the data from the mining structure cache, applies any filters that may have been created on the model, and then passes the data set through the algorithm to detect patterns. After the model is processed, the model stores only the results of processing, not the data itself. For more information, see Process a Mining Model.
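The two processing steps described above can be pictured with a small in-memory sketch. This is only a conceptual illustration in Python, not the Analysis Services API: the class names MiningStructure and MiningModel, their methods, the sample rows, and the counting "algorithm" are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MiningStructure:
    """Toy stand-in for a mining structure: column bindings plus an optional cache."""
    columns: list
    cache: Optional[list] = None  # None means "not processed" or "cache cleared"

    def process(self, source_rows):
        # Read the source in full, keeping only the bound columns.
        self.cache = [{c: row[c] for c in self.columns} for row in source_rows]

    def clear_cache(self):
        self.cache = None


@dataclass
class MiningModel:
    """Toy stand-in for a mining model: definitions only until process() runs."""
    structure: MiningStructure
    target: str
    row_filter: Optional[Callable] = None
    content: Optional[dict] = None  # patterns only, never the rows themselves

    def process(self):
        if self.structure.cache is None:
            raise RuntimeError("process the mining structure first")
        # Apply the model filter (if any) to the cached cases.
        cases = [r for r in self.structure.cache
                 if self.row_filter is None or self.row_filter(r)]
        # Hypothetical "algorithm": count the values of the target column.
        counts = {}
        for r in cases:
            counts[r[self.target]] = counts.get(r[self.target], 0) + 1
        self.content = counts


# Hypothetical source rows; "note" is not bound, so it never reaches the cache.
source = [
    {"age": 25, "bought": "yes", "note": "x"},
    {"age": 41, "bought": "no", "note": "y"},
    {"age": 37, "bought": "yes", "note": "z"},
]
structure = MiningStructure(columns=["age", "bought"])
model = MiningModel(structure, target="bought",
                    row_filter=lambda r: r["age"] >= 30)

structure.process(source)   # source -> structure cache
model.process()             # structure cache -> filter -> algorithm -> results
structure.clear_cache()     # the model keeps its results without the cache
print(model.content)        # {'no': 1, 'yes': 1}
```

In this sketch, clearing the cache after training leaves the model results intact, but any later attempt to process a model against the same structure fails until the structure is processed again.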

The following diagram illustrates the flow of data when a mining structure is processed, and when a mining model is processed.

Processing of data: source to structure to model

After a mining structure has been processed, it contains a compact representation of the data for use in statistical analysis. If the cache has not been cleared, you can query the cached structure cases directly.

After a mining model has been processed, it contains only the patterns that were derived from analysis, and mappings from the model results to the cached training data. You can browse or query the model results, called model content, or you can query the model and structure cases, if they have been cached.

The model content for each mining model depends on the algorithm that was used to create it. For example, if one model is a clustering model and another is a decision trees model, the model content is very different even though the models use exactly the same data. For more information, see Mining Model Content (Analysis Services - Data Mining).
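As a rough illustration of why model content differs by algorithm, the following Python sketch runs two toy "algorithms" over the same cases. Neither function is an Analysis Services algorithm: clustering_content, tree_content, the sample cases, and the fixed split at age 40 are assumptions made up for this example.

```python
# Hypothetical cached cases shared by both toy models.
cases = [
    {"age": 22, "income": 30},
    {"age": 25, "income": 34},
    {"age": 48, "income": 80},
    {"age": 52, "income": 90},
]


def mean(values):
    """Average of a list, or None when the list is empty."""
    return sum(values) / len(values) if values else None


def clustering_content(rows):
    """Toy 'clustering': two groups around the mean age, one centroid each."""
    pivot = mean([r["age"] for r in rows])
    groups = {
        "cluster_1": [r for r in rows if r["age"] < pivot],
        "cluster_2": [r for r in rows if r["age"] >= pivot],
    }
    return {
        name: {
            "age": mean([r["age"] for r in members]),
            "income": mean([r["income"] for r in members]),
            "size": len(members),
        }
        for name, members in groups.items()
    }


def tree_content(rows, split_age=40):
    """Toy 'decision tree': one split on age, each leaf predicts mean income."""
    left = [r["income"] for r in rows if r["age"] < split_age]
    right = [r["income"] for r in rows if r["age"] >= split_age]
    return {
        "split": f"age < {split_age}",
        "left_prediction": mean(left),
        "right_prediction": mean(right),
    }


print(clustering_content(cases))
print(tree_content(cases))
```

The first result describes groups and their centroids, while the second describes a split rule and leaf predictions, even though both were derived from identical cases.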

Processing requirements may differ depending on whether your mining models are based solely on relational data or on a multidimensional data source.

For a relational data source, processing requires only that you create training data and run mining algorithms on that data. However, mining models that are based on OLAP objects, such as dimensions and measures, require that the underlying data be in a processed state. This may require that the multidimensional objects be processed before the mining model can be populated.

For more information, see Processing Requirements and Considerations (Data Mining).
