Microsoft Linear Regression Algorithm

The Microsoft Linear Regression algorithm is a variation of the Microsoft Decision Trees algorithm, where the MINIMUM_LEAF_CASES parameter is set to be greater than or equal to the total number of cases in the dataset that the algorithm uses to train the mining model. With the parameter set in this way, the algorithm will never create a split, and therefore performs a linear regression.
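The reasoning behind "never create a split" can be sketched in a few lines. This is only an illustration: it assumes MINIMUM_LEAF_CASES means that every leaf produced by a split must hold at least that many cases, and that splits are binary. The function name valid_splits is hypothetical and is not part of any Microsoft API.

```python
def valid_splits(n_cases, minimum_leaf_cases):
    """Return the (left, right) leaf sizes a binary split could produce.

    Assumption: a split is allowed only if both resulting leaves contain
    at least minimum_leaf_cases cases.
    """
    return [
        (left, n_cases - left)
        for left in range(1, n_cases)
        if left >= minimum_leaf_cases and n_cases - left >= minimum_leaf_cases
    ]


# With the minimum at or above the number of training cases, no split
# qualifies, so the tree stays a single node holding one regression formula.
print(valid_splits(10, 3))
print(valid_splits(10, 10))
```

Under these assumptions, any minimum that is greater than or equal to the number of cases leaves the list empty, because the two leaf sizes must add up to the total.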

You can use linear regression to determine a relationship between two continuous columns. The relationship takes the form of an equation for a line that best represents a series of data. For example, the line in the following diagram is the best possible linear representation of the data.

[Figure: A line that models a set of data]

The equation that represents the line in the diagram takes the general form y = ax + b, and is known as the regression equation. The variable y represents the output variable, x represents the input variable, and a and b are adjustable coefficients. Each data point in the diagram has an error associated with its distance from the regression line. The coefficients a and b adjust the angle (slope) and location (intercept) of the regression line. You obtain the regression equation by adjusting a and b until the sum of the errors associated with all the points reaches its minimum.
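As a rough illustration of how a and b can be found, the sketch below fits y = ax + b with ordinary least squares. It assumes that the "error" of a point is its squared vertical distance from the line, which is the usual convention but is not stated in this topic. The function names fit_line and sum_squared_error and the sample data are hypothetical, and this is not how Analysis Services computes the model internally.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (a, b) for the line y = a*x + b that minimizes squared error."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = mean(xs)
    y_bar = mean(ys)
    # Spread of x around its mean; zero means the slope is undefined.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    # Co-variation of x and y around their means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx            # slope: the angle of the line
    b = y_bar - a * x_bar    # intercept: the location of the line
    return a, b


def sum_squared_error(xs, ys, a, b):
    """Total squared vertical distance between the points and the line."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Made-up sample data that lies close to a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.1, 7.0, 9.2, 10.8]
a, b = fit_line(xs, ys)
print(a, b, sum_squared_error(xs, ys, a, b))
```

The closed-form solution replaces the trial-and-error adjustment described above: any other choice of a and b gives a larger total squared error for the same points.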

Using the Algorithm

Use the Microsoft Tree Viewer to explore a linear regression mining model.

A linear regression model must contain a key column, input columns, and at least one predictable column.

The Microsoft Linear Regression algorithm supports specific input column content types, predictable column content types, and modeling flags, which are listed in the following table.

Input column content types: Continuous, Cyclical, Key, Table, and Ordered

Predictable column content types: Continuous, Cyclical, and Ordered

Modeling flags: NOT NULL and REGRESSOR

All Microsoft algorithms support a common set of functions. However, the Microsoft Linear Regression algorithm supports additional functions, listed in the following table.

IsDescendant
IsInNode
PredictHistogram
PredictNodeId
PredictStdev
PredictSupport
PredictVariance

For a list of the functions that are common to all Microsoft algorithms, see Data Mining Algorithms. For more information about how to use these functions, see Data Mining Extensions (DMX) Function Reference.

The Microsoft Linear Regression algorithm supports several parameters that affect the performance and accuracy of the resulting mining model. The following table describes each parameter.

Parameter
    Description

MAXIMUM_INPUT_ATTRIBUTES
    Defines the number of input attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255.

MAXIMUM_OUTPUT_ATTRIBUTES
    Defines the number of output attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255.

FORCED_REGRESSOR
    Forces the algorithm to use the indicated columns as regressors, regardless of the importance of the columns as calculated by the algorithm.

See Also

Concepts

Data Mining Algorithms
Data Mining Wizard
Feature Selection in Data Mining
Viewing a Mining Model with the Microsoft Tree Viewer

Other Resources

CREATE MINING MODEL (DMX)
