Microsoft Linear Regression Algorithm

12/03/2008

The Microsoft Linear Regression algorithm is a variation of the Microsoft Decision Trees algorithm in which the MINIMUM_LEAF_CASES parameter is set to a value greater than or equal to the total number of cases in the dataset that the algorithm uses to train the mining model. With the parameter set this way, the algorithm never creates a split, and therefore performs a linear regression.
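The reasoning behind "never creates a split" can be illustrated with a small Python sketch. It is not the Microsoft implementation. It assumes that MINIMUM_LEAF_CASES is the minimum number of cases each leaf must hold after a split, and the function name can_split is hypothetical.

```python
def can_split(num_cases: int, minimum_leaf_cases: int) -> bool:
    """Return True if some split leaves both children with enough cases.

    Assumption: a split divides the cases into two leaves, and each leaf
    must contain at least minimum_leaf_cases cases.
    """
    # Try every way of dividing num_cases into leaves of size k and num_cases - k.
    return any(
        k >= minimum_leaf_cases and num_cases - k >= minimum_leaf_cases
        for k in range(1, num_cases)
    )


total_cases = 100

# With a small threshold, the tree is free to split.
print(can_split(total_cases, 10))

# With the threshold at or above the total number of cases, no split is
# valid, so the tree stays a single node holding one regression formula.
print(can_split(total_cases, total_cases))
```

Under this assumption, any threshold above half the number of cases already rules out a split. Setting it to at least the total number of cases guarantees that outcome regardless of how the cases are distributed.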

You can use linear regression to determine a relationship between two continuous columns. The relationship takes the form of an equation for a line that best represents a series of data. For example, the line in the following diagram is the best possible linear representation of the data.

[Figure: A line that models a set of data]

The equation that represents the line in the diagram takes the general form y = ax + b, and is known as the regression equation. The variable y represents the output variable, x represents the input variable, and a and b are adjustable coefficients. Each data point in the diagram has an error associated with its distance from the regression line. The coefficient a sets the slope (angle) of the regression line, and b sets its intercept (location). You obtain the regression equation by adjusting a and b until the sum of the errors associated with the points reaches its minimum.
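As a minimal sketch of how a and b can be found, the following Python code uses the closed-form ordinary least squares solution. The text above does not say how the error is measured; this sketch assumes it is the squared vertical distance from each point to the line. The function names and sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) that minimize the sum of squared errors for y = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x with y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx            # slope of the regression line
    b = mean_y - a * mean_x  # intercept of the regression line
    return a, b


def sum_squared_error(xs, ys, a, b):
    """Total squared vertical distance between the points and y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data that lies close to a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
print(f"y = {a:.3f}x + {b:.3f}")
print(f"sum of squared errors: {sum_squared_error(xs, ys, a, b):.4f}")
```

Nudging either coefficient away from the fitted values increases the total error, which is the sense in which the fitted line is the best linear representation of the data.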

Using the Algorithm

Use the Microsoft Tree Viewer to explore a linear regression mining model.

A linear regression model must contain a key column, input columns, and at least one predictable column.

The Microsoft Linear Regression algorithm supports specific input column content types, predictable column content types, and modeling flags, which are listed in the following table.

Input column content types	Continuous, Cyclical, Key, Table, and Ordered
Predictable column content types	Continuous, Cyclical, and Ordered
Modeling flags	NOT NULL and REGRESSOR

All Microsoft algorithms support a common set of functions. However, the Microsoft Linear Regression algorithm supports additional functions, listed in the following table.

IsDescendant	PredictStdev
IsInNode	PredictSupport
PredictHistogram	PredictVariance
PredictNodeId

For a list of the functions that are common to all Microsoft algorithms, see Data Mining Algorithms. For more information about how to use these functions, see Data Mining Extensions (DMX) Function Reference.

The Microsoft Linear Regression algorithm supports several parameters that affect the performance and accuracy of the resulting mining model. The following table describes each parameter.

Parameter	Description
MAXIMUM_INPUT_ATTRIBUTES	Defines the number of input attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255.
MAXIMUM_OUTPUT_ATTRIBUTES	Defines the number of output attributes that the algorithm can handle before it invokes feature selection. Set this value to 0 to turn off feature selection. The default is 255.
FORCED_REGRESSOR	Forces the algorithm to use the indicated columns as regressors, regardless of the importance of the columns as calculated by the algorithm.
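The way these parameters interact can be sketched in Python as follows. This is an illustration only, not the algorithm's actual feature selection. It assumes that feature selection keeps the highest-scoring columns, that forced regressors always survive it, and that importance scores are available as plain numbers. The function choose_inputs, the column names, and the scores are all hypothetical.

```python
def choose_inputs(scores, maximum_input_attributes=255, forced_regressors=()):
    """Pick input columns in a way that mimics the documented parameters.

    scores: dict mapping column name -> hypothetical importance score.
    """
    columns = list(scores)
    # A value of 0 turns feature selection off. Feature selection is also
    # skipped while the number of inputs stays within the limit.
    if maximum_input_attributes == 0 or len(columns) <= maximum_input_attributes:
        return columns
    # Assumption: forced regressors are kept regardless of their score.
    forced = [c for c in columns if c in forced_regressors]
    # Assumption: the remaining slots go to the highest-scoring columns.
    others = sorted(
        (c for c in columns if c not in forced_regressors),
        key=lambda c: scores[c],
        reverse=True,
    )
    remaining = max(maximum_input_attributes - len(forced), 0)
    return forced + others[:remaining]


# Hypothetical columns and importance scores.
scores = {"Age": 0.9, "Income": 0.7, "Commute": 0.2, "Cars": 0.4}

print(choose_inputs(scores))                              # within the default limit of 255
print(choose_inputs(scores, maximum_input_attributes=0))  # feature selection turned off
print(choose_inputs(scores, maximum_input_attributes=2,
                    forced_regressors=("Commute",)))      # limit exceeded, one column forced
```

In the last call, "Commute" has the lowest score but is retained because it is forced, which leaves one slot for the best remaining column.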

