Article
10/01/2010

Querying a Linear Regression Model (Analysis Services - Data Mining)

When you create a query against a data mining model, you can create a content query, which provides details about the patterns discovered in analysis, or you can create a prediction query, which uses the patterns in the model to make predictions for new data. For example, a content query might provide additional details about the regression formula, while a prediction query might tell you if a new data point fits the model. You can also retrieve metadata about the model by using a query.

This section explains how to create queries for models that are based on the Microsoft Linear Regression algorithm. For more information about the structure of a linear regression model, see Mining Model Content for Linear Regression Models (Analysis Services - Data Mining).

Note

Because linear regression is based on a special case of the Microsoft Decision Trees algorithm, some decision tree models that use continuous predictable attributes can contain regression formulas. For more information, see Microsoft Decision Trees Algorithm Technical Reference.

Content Queries

Returning Model Parameters by Using the Data Mining Schema Rowset

Returning the Regression Formula for the Model by Using DMX

Returning the Coefficient for the Model
Prediction Queries

Making Predictions by Using a Singleton Query

Returning Descriptive Statistics from a Regression Model

To reproduce the results for the sample queries, you can create a linear regression model as described in the following topic. The model is very simple but demonstrates basic concepts of how to use the Data Mining Designer to customize a linear regression model.

How to: Force the Use of Regressors in a Model

Finding Information about the Linear Regression Model

The structure of a linear regression model is extremely simple: the mining model represents the data as a single node, and that node defines the regression formula.

This section provides samples of how to get more information about the model itself, including the regression formula and descriptive statistics about the data.

Sample Query 1: Returning Model Parameters by Using the Data Mining Schema Rowset

By querying the data mining schema rowset, you can find metadata about the model. This might include when the model was created, when the model was last processed, the name of the mining structure that the model is based on, and the name of the column designated as the predictable attribute. You can also return the parameters that were used when the model was first created.

SELECT MINING_PARAMETERS 
FROM $system.DMSCHEMA_MINING_MODELS
WHERE MODEL_NAME = 'Call Center Regression'

Sample results:

MINING_PARAMETERS
MINING_PARAMETERS MAXIMUM_INPUT_ATTRIBUTES=255, MAXIMUM_OUTPUT_ATTRIBUTES=255, FORCE_REGRESSOR=[Average Time Per Issue],[Total Operators]

MINING_PARAMETERS

MAXIMUM_INPUT_ATTRIBUTES=255, MAXIMUM_OUTPUT_ATTRIBUTES=255, FORCE_REGRESSOR=[Average Time Per Issue],[Total Operators]

Note

If you have not specified any regressors, the parameter, FORCE_REGRESSOR, has the value "FORCE_REGRESSOR = ".

Sample Query 2: Returning the Regression Formula for the Model by Using DMX

The following query returns the mining model content for the linear regression model. The query returns the contents of the node that contains the regression formula. You can also use this query in a decision trees model if one of the nodes contains a regression formula.

Each variable and coefficient is stored in a separate row of the nested NODE_DISTRIBUTION table. If you want to view the complete regression formula, use the Microsoft Tree Viewer, click the (All) node, and open the Mining Legend.

SELECT FLATTENED NODE_DISTRIBUTION as t
FROM [Call Center Regression].CONTENT

Note

If you reference individual columns of the nested table by using a query such as SELECT <column name> from NODE_DISTRIBUTION, some columns, such as SUPPORT or PROBABILITY, must be enclosed in brackets to distinguish them from reserved keywords of the same name.

Expected results:

t.ATTRIBUTE_NAME	t.ATTRIBUTE_VALUE	t.SUPPORT	t.PROBABILITY	t.VARIANCE	t.VALUETYPE
Service Grade	Missing	0	0	0	1
Service Grade	0.09875	120	1	0.00157927083333334	3
Average Time Per Issue	0.00136989326310586	0	0	187.866597222222	7
Average Time Per Issue	12.0822151449249	0	0	0	8
Average Time Per Issue	79.8416666666667	0	0	187.866597222222	9
Total Operators	-0.000426156789860463	0	0	24.0799305555556	7
Total Operators	-3.19762422385219	0	0	0	8
Total Operators	10.6916666666667	0	0	24.0799305555556	9
	-0.00606823493688524	0	0	0.00121526993847281	11

In comparison, in the Mining Legend, the regression formula appears as follows:

Service Grade = 0.070+0.001*(Average Time Per Issue-79.842)-0.0004*(Total Operators-10.692)

In the Mining Legend, some numbers might be rounded; however, the NODE_DISTRIBUTION table and the Mining Legend essentially contain the same values.

The values in the VALUETYPE column tell you what kind of information is contained in each row, which is useful if you are processing the results programmatically. The following table shows the value types that are output for a linear regression formula.

VALUETYPE
1 (Missing)
3 (Continuous)
7 (Coefficient)
8 (Score Gain)
9 (Statistics)
7 (Coefficient)
8 (Score Gain)
9 (Statistics)
11 (Intercept)

For more information about the meaning of each value type for regression models, see Mining Model Content for Linear Regression Models (Analysis Services - Data Mining).

Sample Query 3: Returning the Coefficient for the Model

By using the VALUETYPE enumeration, you can return only the coefficient for the regression equation, as shown in the following query:

SELECT FLATTENED MODEL_NAME,
    (SELECT ATTRIBUTE_VALUE, VALUETYPE
     FROM NODE_DISTRIBUTION
     WHERE VALUETYPE = 11) 
AS t
FROM [Call Center Regression].CONTENT

This query returns two rows, one from the mining model content, and the row from the nested table that contains the coefficient. The ATTRIBUTE_NAME column is not included here because it is always blank for the coefficient.

MODEL_NAME	t.ATTRIBUTE_VALUE	t.VALUETYPE
Call Center Regressors2	-0.00606823493688524	11

Making Predictions Using the Model

You can build prediction queries on linear regression models by using the Mining Model Prediction tab in Data Mining Designer. The prediction query builder is available in both SQL Server Management Studio and Business Intelligence Development Studio.

Note

You can also create queries on regression models by using the SQL Server 2005 Data Mining Add-ins for Excel or the SQL Server 2008 Data Mining Add-ins for Excel. Even though the Data Mining Add-ins for Excel do not create regression models, you can browse and query any mining model that is stored on an instance of Analysis Services.

Sample Query 4: Making Predictions by Using a Singleton Query

The easiest way to create a singleton query on a regression model is by using the Singleton Query Input dialog box. For example, you can build the following DMX query by selecting the appropriate regression model, choosing Singleton Query, and then typing 10 as the value for Total Operators.

SELECT
  Predict([Call Center Regression].[Service Grade])
FROM
  [Call Center Regression]
NATURAL PREDICTION JOIN
(SELECT 10 AS [Total Operators]) AS t

Sample results:

Yearly Income
0.0992841946529471

Sample Query 5: Returning Descriptive Statistics from a Regression Model

You can use many of the standard prediction functions with linear regression models. The following example illustrates how to add some descriptive statistics to the prediction query results.

SELECT
  Predict([Call Center Regression].[Service Grade]) as [Predicted Service],
  PredictStdev([Call Center Regression].[Service Grade]) as [Standard Deviation]
FROM
  [Call Center Regression]
NATURAL PREDICTION JOIN
(SELECT 10 AS [Total Operators]) AS t

Sample results:

Predicted Service	Standard Deviation
0.0990447584463201	0.0348607220015996

List of Prediction Functions

All Microsoft algorithms support a common set of functions. However, the Microsoft Linear Regression algorithm supports the additional functions listed in the following table.

IsDescendant (DMX)	PredictStdev (DMX)
IsInNode (DMX)	PredictSupport (DMX)
PredictHistogram (DMX)	PredictVariance (DMX)
PredictNodeId (DMX)

For a list of the functions that are common to all Microsoft algorithms, see Data Mining Algorithms (Analysis Services - Data Mining). For more information about how to use these functions, see Data Mining Extensions (DMX) Function Reference.

Change History

Updated content
Added in-topic links to make it easier to review query samples.
Changed sample to use a model with a better fit. Added link to new topic that describes how to build the sample linear regression model.

Querying a Linear Regression Model (Analysis Services - Data Mining)

Finding Information about the Linear Regression Model

Sample Query 1: Returning Model Parameters by Using the Data Mining Schema Rowset

Sample Query 2: Returning the Regression Formula for the Model by Using DMX

Sample Query 3: Returning the Coefficient for the Model

Making Predictions Using the Model

Sample Query 4: Making Predictions by Using a Singleton Query

Sample Query 5: Returning Descriptive Statistics from a Regression Model

List of Prediction Functions

See Also

Reference

Concepts

Change History

Additional resources