Logical Architecture (Analysis Services - Data Mining)
Topic Status: Some information in this topic is preview and subject to change in future releases. Preview information describes new features or changes to existing features in Microsoft SQL Server 2016 Community Technology Preview 2 (CTP2).
Data mining is a process that involves the interaction of multiple components.
You access sources of data in a SQL Server database or any other data source to use for training, testing, or prediction.
You define data mining structures and models by using SQL Server Data Tools (SSDT) or Visual Studio.
You manage data mining objects and create predictions and queries by using SQL Server Management Studio.
When the solution is complete, you deploy it to an instance of Analysis Services.
The process of creating these solution objects has already been described elsewhere. For more information, see Data Mining Solutions.
The following sections describe the logical architecture of the objects in a data mining solution.
The data that you use in data mining is not stored in the data mining solution; only the bindings are stored. The data might reside in a database created in a previous version of SQL Server, a CRM system, or even a flat file. When you train the structure or model by processing, a statistical summary of the data is created and stored in a cache that can be persisted for use in later operations, or deleted after processing. For more information, see Mining Structures (Analysis Services - Data Mining).
You combine disparate data within the Analysis Services data source view (DSV) object, which provides an abstraction layer on top of your data source. You can specify joins between tables, or add tables that have a many-to-one relationship to create nested table columns. The definition of these objects, the data source and the data source view, are stored within the solution with the file name extensions, *.ds and *.dsv. For more information about creating and using Analysis Services data sources and data source views, see Supported Data Source Types (SSAS Multidimensional).
You can also define and alter data sources and data source views by using AMO or XMLA. For more information about working with these objects programmatically, see Logical Architecture Overview (Analysis Services - Multidimensional Data).
A data mining structure is a logical data container that defines the data domain from which mining models are built. A single mining structure can support multiple mining models.
When you need to use the data in the data mining solution, Analysis Services reads the data from the source and generates a cache of aggregates and other information. By default this cache is persisted so that training data can be reused to support additional models. If you need to delete the cache, change the CacheMode property on the mining structure object to the value, ClearAfterProcessing. For more information, see AMO Data Mining Classes.
SQL Server 2016 Analysis Services (SSAS) also provides the ability to separate your data into training and testing data sets, so that you can test your mining models on a representative, randomly selected set of data. The data is not actually stored separately; rather, case data in the structure cache is marked with a property that indicates whether that particular case is used for training or for testing. If the cache is deleted, that information cannot be retrieved.
For more information, see Mining Structures (Analysis Services - Data Mining).
A data mining structure can contain nested tables. A nested table provides additional detail about the case that is modeled in the primary data table. For more information, see Nested Tables (Analysis Services - Data Mining)
Before processing, a data mining model is only a combination of metadata properties. These properties specify a mining structure, specify a data mining algorithm, and a define collection of parameter and filter settings that affect how the data is processed. For more information, see Mining Models (Analysis Services - Data Mining).
When you process the model, the training data that was stored in the mining structure cache is used to generate patterns, based both on statistical properties of the data and on heuristics defined by the algorithm and its parameters. This is known as training the model.
The result of training is a set of summary data, contained within the model content, which describes the patterns that were found and provides rules by which to generate predictions. For more information, see Mining Model Content (Analysis Services - Data Mining).
In limited cases, the logical structure of the model can also be exported into a file that represents model formulas and data bindings according to a standard format, the Predictive Modeling Markup Language (PMML). This logical structure can be imported into other systems that utilize PMML and the model so described can then be used for prediction. For more information, see Understanding the DMX Select Statement.
Other objects that you use in the context of a data mining project, such as accuracy charts or prediction queries, are not persisted within the solution, but can be scripted using ASSL or built using AMO.
Additionally, you can extend the services and features available on an instance of Analysis Services by adding these custom objects: