What's New (Analysis Services - Data Mining)
This latest release of Microsoft SQL Server Analysis Services introduces new features and enhancements.
When you create a mining structure, you can now divide the data in the mining structure into training and testing sets. The definition of the partition is stored with the structure, so that you can reuse the training and testing sets with any mining models that are based on that structure.
The data is divided randomly between the training and testing partitions, but you can also specify the seed used to create the partitions if you need to re-create a partition.
You specify the size of the testing set either as a percentage of the total number of rows, as a maximum number of rows, or as a combination of these requirements. For more information about how to use training and testing data sets, see Partitioning Data into Training and Testing Sets (Analysis Services - Data Mining).
For more information about all the model validation features in SQL Server 2008, see Validating Data Mining Models (Analysis Services - Data Mining).
You can now attach filters to a mining model, and apply the filter during both training and testing. Applying a filter to the model lets you control the data that is used to train the model, and lets you more easily assess the performance of the model on subsets of the data.
For example, you might want to know the accuracy of a targeted mailing model for only those customers over a certain income level. You can apply the filter on the Income column of the mining model when you create the lift chart, and see results for that demographic only.
Business Intelligence Development Studio also provides new filter editors to help you build complex conditions that can apply to both case tables and nested tables.
For more information about how to create mining model filters, see Creating Filters for Mining Models (Analysis Services - Data Mining).
For information about filtering data for mining model testing, see Tools for Charting Model Accuracy (Analysis Services - Data Mining).
Cross-validation is an established method of assessing the accuracy of data mining models. In cross-validation, you iteratively partition the mining structure data into subsets, build models on the subsets, and then measure the accuracy of the model for each partition. By reviewing the returned statistics, you can determine how reliable the mining model is, and more easily compare models that are based on the same structure.
Cross-validation is available in the Mining Accuracy Chart view of Data Mining Designer. You can also partition a mining structure, test multiple mining models, and generate an analysis by using Analysis Services stored procedures.
In SQL Server 2008, to generate a cross-validation report, you specify the mining structure and the predictable attribute, and then specify the number of folds into which to segment the case data.
Analysis Services returns a table that reports statistics such as likelihood or root mean square error for individual partitions, and the mean and standard deviation of all measures for the aggregate models. For more information, Cross-Validation (Analysis Services - Data Mining).
SQL Server 2008 supports the creation, management, and use of data mining models from Microsoft Excel when you use the SQL Server 2008 Data Mining Add-ins for Office 2007. The newest version of this popular free add-in has been enhanced by adding support for server-side training and test partitions, cross-validation, and several new analytic tools such as shopping basket analysis and a printable prediction calculator.
You can also use the Document Model wizard to easily create documentation for structures and models that are stored in an instance of SQL Server 2008. For more information about the add-ins, see Data Mining Add-ins for Office 2007..
To improve the accuracy and stability of some predictions in time series models, a new algorithm has been added to the Microsoft Time Series algorithm. Based on the well-known ARIMA algorithm, the new algorithm provides better long-term predictions than the ARTxp algorithm that Analysis Services has been using. (ARTxp is an auto-regressive tree algorithm that is optimized for either a single time slice or short-term predictions.)
By default, the new implementation of the Microsoft Time Series algorithm uses the ARTxp algorithm to train one version of the model and the ARIMA algorithm to train another version. The algorithm then weights the results of these two models to provide the prediction characteristics that you prefer. If you do not want to use this default implementation, you can specify that the Microsoft Time Series algorithm use only the ARTxp or the ARIMA algorithm. In SQL Server 2008 Enterprise, you can specify a custom weighting of the algorithms to provide the best prediction over a variable time span.
The Microsoft Time Series algorithm also now accepts data during prediction to enable new business scenarios. For example, you can create a revenue prediction model that is based on averages across products, regional aggregates, or some other broad data set. You can then apply that model to the time series that shows sales of an individual product. By applying the general model, you can take advantage of the stability and availability of aggregate data and customize prediction to the individual product.
You could also train models by using multiple series and then apply the models to new data to predict "what if" scenarios.
In SQL Server 2008, if you enable drillthrough on a mining structure, you can query the mining structure and return details about the cases used for both training and testing. You can create drillthrough queries on a structure by using Data Mining Extensions (DMX).
Moreover, if drillthrough is enabled on a mining model and the related mining structure, you can create queries that retrieve columns of data from the underlying mining structure. This is useful if you want to discover detailed information about cases in a particular node. For example, you could retrieve contact information for customers in a particular cluster.
For more information, see Using Drillthrough on Mining Models and Mining Structures (Analysis Services - Data Mining).
For examples of DMX queries on a mining structure, see SELECT FROM <structure>.CASES.
For examples of drillthrough from a model to structure data, see SELECT FROM <model>.CASES (DMX).
You can now add aliases to columns in a mining model to make it easier to understand column content and reference the column in DMX statements.
For example, if you build a mining structure that contains continuous and discretized versions of the same data, you might give both columns the same name for easier comparison.
For information about how to create a column alias by using DMX, see ALTER MINING STRUCTURE (DMX).
In SQL Server 2008. many of the existing OLE DB data mining schema rowsets have been exposed as a set of system tables that you can easily query by using DMX statements. This makes it easy to retrieve metadata related to models and structures, to extract details from the mining model content, or to monitor an Analysis Services instance or service.
For more information, see Querying the Data Mining Schema Rowsets (Analysis Services - Data Mining).
Books Online no longer includes SQL Server sample databases and sample applications. These sample databases and sample applications are now available on the SQL Server Samples Web site. This Web site makes it easier for users to find these samples, and provides additional new samples that are related to Microsoft SQL Server and Business Intelligence. On the SQL Server Samples Web site, you can do the following:
Browse through samples contributed by developers, users, and the Microsoft Most Valuable Professional (MVP) community.
Download both sample databases and code projects.
View or participate in a discussion area where you can report issues and ask questions about the samples for each technology area.
SQL Server 2008 Analysis Services can now be installed side-by-side with SQL Server 2005 Analysis Services (SSAS). For more information, see Working with Multiple Versions and Instances of SQL Server.
For important side-by-side installation issues that affect Analysis Services, see the Readme that accompanies this release.
The ability to back up and restore an Analysis Services database has been enhanced. There are fewer restrictions on the size of the database, and the time that is required for backup and restore operations has been greatly reduced.
For more information, see What's New (Analysis Services - Multidimensional Database).
If you use Analysis Services to create OLAP cubes that you also use for data mining, you might find it much easier to design dimensions and their related hierarchies and attributes. The Dimension Designer includes a new Attribute Relationship designer that helps you design attribute relationships and make sure that attribute relationships follow best practices.
For more information, see What's New (Analysis Services - Multidimensional Database).