Article
10/01/2010

Querying a Decision Trees Model (Analysis Services - Data Mining)

When you create a query against a data mining model, you can create a content query, which provides details about the patterns discovered in analysis, or you can create a prediction query, which uses the patterns in the model to make predictions for new data. For example, a content query for a decision trees model might provide statistics about the number of cases at each level of the tree, or the rules that differentiate between cases. Alternatively, a prediction query maps the model to new data in order to generate recommendations, classifications, and so forth. You can also retrieve metadata about the model by using a query.

This section explains how to create queries for models that are based on the Microsoft Decision Trees algorithm.

Content Queries

Retrieving Model Parameters from the Data Mining Schema Rowset

Returning Details about Trees in the Model by Using DMX

Returning Regression Nodes from a Decision Tree Model
Prediction Queries

Returning Predictions and Probabilities

Predicting Associations from a Decision Tree

Finding Information about a Decision Trees Model

To create meaningful queries on the content of a decision trees model, you should understand the structure of the model content, and which node types store what kind of information. For more information, see Mining Model Content for Decision Tree Models (Analysis Services - Data Mining).

Sample Query 1: Retrieving Model Parameters from the Data Mining Schema Rowset

By querying the data mining schema rowset, you can find metadata about the model, such as when it was created, when the model was last processed, the name of the mining structure that the model is based on, and the name of the column used as the predictable attribute. You can also return the parameters that were used when the model was first created.

select MINING_PARAMETERS 
from $system.DMSCHEMA_MINING_MODELS
WHERE MODEL_NAME = 'TM_Decision Tree'

Sample results:

MINING_PARAMETERS

COMPLEXITY_PENALTY=0.5, MAXIMUM_INPUT_ATTRIBUTES=255,MAXIMUM_OUTPUT_ATTRIBUTES=255,MINIMUM_SUPPORT=10,SCORE_METHOD=4,SPLIT_METHOD=3,FORCE_REGRESSOR=

Sample Query 2: Returning Details about Trees in the Model by Using DMX

The following query returns some basic information about the decision tree model that you created in the Basic Data Mining Tutorial. This query returns all the nodes of type 2, which are the top level nodes of a tree that represents a particular predictable attribute. Because this model contains a single predictable attribute, there is only one tree node. However, if you create an association model by using the Decision Trees algorithm, there might be hundreds of trees, one for each product.

Note

The column, CHILDREN_CARDINALITY, must be enclosed in brackets to distinguish it from the MDX reserved keyword of the same name.

SELECT MODEL_NAME, NODE_NAME, NODE_CAPTION, 
NODE_SUPPORT, [CHILDREN_CARDINALITY]
FROM TM_DecisionTrees.CONTENT
WHERE NODE_TYPE = 2

Example results:

MODEL_NAME	NODE_NAME	NODE_CAPTION	NODE_SUPPORT	CHILDREN_CARDINALITY
TM_DecisionTree	000000001	All	12939	5

In a decision trees model, the cardinality of a particular node tells you how many immediate children that node has. Therefore, the cardinality for the node returned by the previous query tells you that the model divided the target population of potential bike buyers into 5 subgroups. The following related query returns the children for these five subgroups, together with the distribution of attributes and values in the child nodes. Because statistics such as support, probability, and variance are stored in the nested table, NODE_DISTRIBUTION, this example uses the FLATTENED keyword to output the nested table columns.

Note

The nested table column, SUPPORT, must be enclosed in brackets to distinguish it from the reserved keyword of the same name.

SELECT FLATTENED NODE_NAME, NODE_CAPTION,
(SELECT ATTRIBUTE_NAME, ATTRIBUTE_VALUE, [SUPPORT]
FROM NODE_DISTRIBUTION) AS t
FROM TM_DecisionTree.CONTENT
WHERE [PARENT_UNIQUE_NAME] = '000000001'

Example results:

NODE_NAME	NODE_CAPTION	T.ATTRIBUTE_NAME	T.ATTRIBUTE_VALUE	SUPPORT
00000000100	Number Cars Owned = 0	Bike Buyer	Missing	0
00000000100	Number Cars Owned = 0	Bike Buyer	0	1067
00000000100	Number Cars Owned = 0	Bike Buyer	1	1875
00000000101	Number Cars Owned = 3	Bike Buyer	Missing	0
00000000101	Number Cars Owned = 3	Bike Buyer	0	678
00000000101	Number Cars Owned = 3	Bike Buyer	1	473

From these results, you can tell that customers who bought a bike ([Bike Buyer] = 1), 1067 customers had 0 cars and 473 customers had 3 cars.

Suppose you wanted to discover more about the customers who did buy a bike. You can view additional detail for any of the sub-trees by using the IsDescendant (DMX) function in the query, as shown in the following example. The query returns the count of bike purchasers by retrieving leaf nodes (NODE_TYPE = 4) from the tree that contains customers who are over 42 years of age. The query restricts rows from the nested table to those where Bike Buyer = 1.

SELECT FLATTENED NODE_NAME, NODE_CAPTION,NODE_TYPE,
(
SELECT [SUPPORT] FROM NODE_DISTRIBUTION WHERE ATTRIBUTE_NAME = 'Bike Buyer' AND ATTRIBUTE_VALUE = '1'
) AS t
FROM TM_DecisionTree.CONTENT
WHERE ISDESCENDANT('0000000010001')
AND NODE_TYPE = 4

Example results:

NODE_NAME	NODE_CAPTION	t.SUPPORT
000000001000100	Yearly Income >= 26000 and < 42000	266
00000000100010100	Total Children = 3	75
0000000010001010100	Number Children At Home = 1	75

Sample Query 3: Returning Regression Nodes from a Decision Tree Model

When you create a decision tree model that contains a regression on a continuous attribute, you can use the regression formula to make predictions, or you can extract information about the regression formula. For more information about queries on regression models, see Querying a Linear Regression Model (Analysis Services - Data Mining).

If the model contains a mixture of regression nodes and nodes that split on discrete attributes or ranges, you can create a query that returns only the regression node. The NODE_DISTRIBUTION table contains the details of the regression formula. In this example, the columns are flattened and the NODE_DISTRIBUTION table is aliased for easier viewing. However, in this model, no regressors were found to relate Income with other continuous attributes. In such cases, Analysis Services returns the mean value of the attribute and the total variance in the model for that attribute.

SELECT FLATTENED NODE_DISTRIBUTION AS t
FROM DT_Predict. CONTENT
WHERE NODE_TYPE = 25

Example results:

t.ATTRIBUTE_NAME	t.ATTRIBUTE_VALUE	t.SUPPORT	t.PROBABILITY	t.VARIANCE	t.VALUETYPE
Yearly Income	Missing	0	0.000457142857142857	0	1
Yearly Income	57220.8876687257	17484	0.999542857142857	1041275619.52776	3
	57220.8876687257	0	0	1041216662.54387	11

For more information about the value types and the statistics used in regression models, see Mining Model Content for Linear Regression Models (Analysis Services - Data Mining).

Making Predictions Using the Model

Because decision trees can be used for various tasks, including classification, regression, and even association, when you create a prediction query on a decision tree model you have many options available to you. You must understand the purpose for which the model was created to understand the results of prediction. The following query samples illustrate three different scenarios:

Returning a prediction for a classification model, together with the probability of the prediction being correct, and then filtering the results by the probability;
Creating a singleton query to predict associations;
Retrieving the regression formula for a part of a decision tree where the relationship between the input and output is linear.

Sample Query 4: Returning Predictions and Probabilities

The following sample query uses the decision tree model that was created in the Basic Data Mining Tutorial and passes the sample data table in AdventureWorks DW, dbo.ProspectiveBuyers into the model to predict which of the customers in a list will purchase a bike. The query includes the prediction function PredictHistogram (DMX), which returns a nested table that contains useful information about the probabilities discovered by the model. The predictions are filtered, by adding a WHERE clause, to return only those who are predicted as likely bike buyers with a probability greater than 50%.

SELECT
  [TM_DecisionTree].[Bike Buyer],
  PredictHistogram([Bike Buyer]) as Results
FROM
  [TM_DecisionTree]
PREDICTION JOIN
  OPENQUERY([Adventure Works DW],
    'SELECT
      [FirstName],
      [LastName],
      [MaritalStatus],
      [Gender],
      [YearlyIncome],
      [TotalChildren],
      [NumberChildrenAtHome],
      [HouseOwnerFlag],
      [NumberCarsOwned]
    FROM
      [dbo].[ProspectiveBuyer]
    ') AS t
ON
  [TM_DecisionTree].[First Name] = t.[FirstName] AND
  [TM_DecisionTree].[Last Name] = t.[LastName] AND
  [TM_DecisionTree].[Marital Status] = t.[MaritalStatus] AND
  [TM_DecisionTree].[Gender] = t.[Gender] AND
  [TM_DecisionTree].[Yearly Income] = t.[YearlyIncome] AND
  [TM_DecisionTree].[Total Children] = t.[TotalChildren] AND
  [TM_DecisionTree].[Number Children At Home] = t.[NumberChildrenAtHome] AND
  [TM_DecisionTree].[House Owner Flag] = t.[HouseOwnerFlag] AND
  [TM_DecisionTree].[Number Cars Owned] = t.[NumberCarsOwned]
WHERE [Bike Buyer] = 1
AND PredictProbability([Bike Buyer]) >'.05'

By default, Analysis Services returns nested tables with the column label Expression. You can change this label by aliasing the column that is returned. If you do this, the alias (in this case, Results) is used as both the column heading and as the value in the nested table. You must expand the nested table to see the results.

Example results:

Bike Buyer	Results
1	Results Bike Buyer$SUPPORT$PROBABILITY$ADJUSTEDPROBABILITY$VARIANCE$STDEV 125400.6348492420456440.01356216828156200 014600.3649841745793770.0066133693255091500 00.0001665833749791770.00016658337497917700

If your provider does not support hierarchical rowsets such as the results shown here, you can use the FLATTENED keyword in the query to return the results as a table that contains nulls in place of the repeated column values. For more information, see Nested Tables (Analysis Services - Data Mining) or Understanding the Select Statement (DMX).

Sample Query 5: Predicting Associations from a Decision Tree

To use the following query, you can create a copy of the association model in the tutorial, and change the algorithm to the Decision Trees algorithm. For more information, see Lesson 3: Building a Market Basket Scenario (Intermediate Data Mining Tutorial).

The following sample query is a singleton query, which you can create easily in Business Intelligence Development Studio by choosing fields and then selecting values for those fields from a drop-down list.

SELECT PredictAssociation([DT_Association].[v Assoc Seq Line Items],3)
FROM
  [DT_Association]
NATURAL PREDICTION JOIN
(SELECT (SELECT 'Patch kit' AS [Model]) AS [v Assoc Seq Line Items]) AS t

Expected results:

Model
Mountain-200
Mountain Tire Tube
Touring Tire Tube

The results tell you the three best products to recommend to customers who have purchased the Patch Kit product. You can also provide multiple products as input when you make recommendations, either by typing in values, or by using the Singleton Query Input dialog box and adding or removing values. The following sample query shows how the multiple values are provided, upon which to make a prediction. Values are connected by a UNION clause in the SELECT statement that defines the input values.

SELECT PredictAssociation([DT_Association].[v Assoc Seq Line Items],3)
From
  [DT_Association]
NATURAL PREDICTION JOIN
(SELECT (SELECT 'Racing Socks' AS [Model]
  UNION SELECT 'Women''s Mountain Shorts' AS [Model]) AS [v Assoc Seq Line Items]) AS t

Expected results:

Model
Long-Sleeve Logo Jersey
Mountain-400-W
Classic Vest

List of Prediction Functions

All Microsoft algorithms support a common set of functions. However, the Microsoft Decision Trees algorithm supports the additional functions listed in the following table.

IsDescendant (DMX)	PredictNodeId (DMX)
IsInNode (DMX)	PredictProbability (DMX)
PredictAdjustedProbability (DMX)	PredictStdev (DMX)
PredictAssociation (DMX)	PredictSupport (DMX)
PredictHistogram (DMX)	PredictVariance (DMX)

For a list of the functions that are common to all Microsoft algorithms, see Mapping Functions to Query Types (DMX). For the syntax of specific functions, see Data Mining Extensions (DMX) Function Reference.

Change History

Updated content
Added in-topic links to make it easier to review query samples.
Moved regression node sample to the appropriate section, on content queries.

Querying a Decision Trees Model (Analysis Services - Data Mining)

Finding Information about a Decision Trees Model

Sample Query 1: Retrieving Model Parameters from the Data Mining Schema Rowset

Sample Query 2: Returning Details about Trees in the Model by Using DMX

Sample Query 3: Returning Regression Nodes from a Decision Tree Model

Making Predictions Using the Model

Sample Query 4: Returning Predictions and Probabilities

Sample Query 5: Predicting Associations from a Decision Tree

List of Prediction Functions

See Also

Reference

Concepts

Change History

Additional resources