Article
10/01/2010

Querying a Naive Bayes Model (Analysis Services - Data Mining)

When you create a query against a data mining model, you can create either a content query, which provides details about the patterns discovered in analysis, or you can create a prediction query, which uses the patterns in the model to make predictions for new data. You can also retrieve metadata about the model by using a query against the data mining schema rowset. This section explains how to create these queries for models that are based on the Microsoft Naive Bayes algorithm.

Content Queries

Getting Model Metadata by Using DMX

Retrieving a Summary of Training Data

Finding More Information about Attributes

Using System Stored Procedures
Prediction Queries

Predicting Outcomes using a Singleton Query

Returning Predictions with Probability and Support

Predicting Associations

Finding Information about the Naive Bayes Model

The model content of a Naive Bayes model provides aggregated information about the distribution of values in the training data. You can also retrieve information about the metadata of the model by creating queries against the data mining schema rowsets.

Sample Query 1: Getting Model Metadata by Using DMX

By querying the data mining schema rowset, you can find metadata for the model. This might include when the model was created, when the model was last processed, the name of the mining structure that the model is based on, and the name of the columns used as the predictable attribute. You can also return the parameters that were used when the model was created.

SELECT MODEL_CATALOG, MODEL_NAME, DATE_CREATED, LAST_PROCESSED,
SERVICE_NAME, PREDICTION_ENTITY, FILTER
FROM $system.DMSCHEMA_MINING_MODELS
WHERE MODEL_NAME = 'TM_NaiveBayes_Filtered'

Sample results:

MODEL_CATALOG	AdventureWorks
MODEL_NAME	TM_NaiveBayes_Filtered
DATE_CREATED	3/1/2008 19:15
LAST_PROCESSED	3/2/2008 20:00
SERVICE_NAME	Microsoft_Naive_Bayes
PREDICTION_ENTITY	Bike Buyer,Yearly Income
FILTER	[Region] = 'Europe' OR [Region] = 'North America'

The model used for this example is based on the Naive Bayes model you create in the Basic Data Mining Tutorial, but was modified by adding a second predictable attribute and applying a filter to the training data.

Sample Query 2: Retrieving a Summary of Training Data

In a Naive Bayes model, the marginal statistics node stores aggregated information about the distribution of values in the training data. This summary is convenient and saves you from having to create SQL queries against the training data to find the same information.

The following example uses a DMX content query to retrieve the data from the node (NODE_TYPE = 24). Because the statistics are stored in a nested table, the FLATTENED keyword is used to make the results easier to view.

SELECT FLATTENED MODEL_NAME,
(SELECT ATTRIBUTE_NAME, ATTRIBUTE_VALUE, [SUPPORT], [PROBABILITY], VALUETYPE FROM NODE_DISTRIBUTION) AS t
FROM TM_NaiveBayes.CONTENT
WHERE NODE_TYPE = 26

Note

You must enclose the name of the columns, SUPPORT and PROBABILITY, in brackets to distinguish them from the Multidimensional Expressions (MDX) reserved keywords of the same names.

Partial results:

MODEL_NAME	t.ATTRIBUTE_NAME	t.ATTRIBUTE_VALUE	t.SUPPORT	t.PROBABILITY	t.VALUETYPE
TM_NaiveBayes	Bike Buyer	Missing	0	0	1
TM_NaiveBayes	Bike Buyer	0	8869	0.507263784	4
TM_NaiveBayes	Bike Buyer	1	8615	0.492736216	4
TM_NaiveBayes	Gender	Missing	0	0	1
TM_NaiveBayes	Gender	F	8656	0.495081217	4
TM_NaiveBayes	Gender	M	8828	0.504918783	4

For example, these results tell you the number of training cases for each discrete value (VALUETYPE = 4), together with the computed probability, adjusted for missing values (VALUETYPE = 1).

For a definition of the values provided in the NODE_DISTRIBUTION table in a Naive Bayes model, see Mining Model Content for Naive Bayes Models (Analysis Services - Data Mining). For more information about how support and probability calculations are affected by missing values, see Missing Values (Analysis Services - Data Mining).

Sample Query 3: Finding More Information about Attributes

Because a Naive Bayes model often contains complex information about the relationships among different attributes, the easiest way to view these relationships is to use the Microsoft Naive Bayes Viewer. However, you can create DMX queries to return the data.

The following example shows how to return information from the model about a particular attribute, Region.

SELECT NODE_TYPE, NODE_CAPTION, 
NODE_PROBABILITY, NODE_SUPPORT, MSOLAP_NODE_SCORE
FROM TM_NaiveBayes.CONTENT
WHERE ATTRIBUTE_NAME = 'Region'

This query returns two types of nodes: the node that represents the input attribute (NODE_TYPE = 10), and nodes for each value of the attribute (NODE_TYPE = 11). The node caption is used to identify the node, rather than the node name, because the caption shows both the attribute name and attribute value.

NODE_TYPE	NODE_CAPTION	NODE_PROBABILITY	NODE_SUPPORT	MSOLAP_NODE_SCORE	NODE_TYPE
10	Bike Buyer -> Region	1	17484	84.51555875	10
11	Bike Buyer -> Region = Missing	0	0	0	11
11	Bike Buyer -> Region = North America	0.508236102	8886	0	11
11	Bike Buyer -> Region = Pacific	0.193891558	3390	0	11
11	Bike Buyer -> Region = Europe	0.29787234	5208	0	11

Some of the columns stored in the nodes are the same that you can get from the marginal statistics nodes, such as the node probability score and the node support values. However, MSOLAP_NODE_SCORE is a special value provided only for the input attribute nodes, and indicates the relative importance of this attribute in the model. You can see much the same information in the Dependency Network pane of the viewer; however, the viewer does not provide scores.

The following query returns the importance scores of all attributes in the model:

SELECT NODE_CAPTION, MSOLAP_NODE_SCORE
FROM TM_NaiveBayes.CONTENT
WHERE NODE_TYPE = 10
ORDER BY MSOLAP_NODE_SCORE DESC

Sample results:

NODE_CAPTION	MSOLAP_NODE_SCORE
Bike Buyer -> Total Children	181.3654836
Bike Buyer -> Commute Distance	179.8419482
Bike Buyer -> English Education	156.9841928
Bike Buyer -> Number Children At Home	111.8122599
Bike Buyer -> Region	84.51555875
Bike Buyer -> Marital Status	23.13297354
Bike Buyer -> English Occupation	2.832069191

By browsing the model content in the Microsoft Generic Content Tree Viewer, you will get a better idea of what statistics might be interesting. Some simple examples were demonstrated here; more often you may need to execute multiple queries or store the results and process them on the client.

Sample Query 4: Using System Stored Procedures

In addition to writing your own content queries, you can use some Analysis Services system stored procedures to explore the results. To use a system stored procedure, prefix the stored procedure name with the CALL keyword:

CALL GetPredictableAttributes ('TM_NaiveBayes')

Partial results:

ATTRIBUTE_NAME	NODE_UNIQUE_NAME
Bike Buyer	100000001

Note

These system stored procedures are for internal communication between the Analysis Services server and the client and should only be used for convenience when developing and testing mining models. When you create queries for a production system, you should always write your own queries by using DMX, AMO, or XMLA.

For more information about Analysis Services system stored procedures, see Data Mining Stored Procedures (Analysis Services - Data Mining).

Making Predictions Using the Model

The Microsoft Naive Bayes algorithm is typically used less for prediction than it is for exploration of relationships among the input and predictable attributes. However, the model supports the use of prediction functions for both prediction and association.

Sample Query 5: Predicting Outcomes using a Singleton Query

The following query uses a singleton query to provide a new value and predict, based on the model, whether a customer with these characteristics is likely to buy a bike. The easiest way to create a singleton query on a regression model is by using the Singleton Query Input dialog box. For example, you can build the following DMX query by selecting the TM_NaiveBayes model, choosing Singleton Query, and selecting values from the dropdown lists for [Commute Distance] and Gender.

SELECT
  Predict([TM_NaiveBayes].[Bike Buyer])
FROM
  [TM_NaiveBayes]
NATURAL PREDICTION JOIN
(SELECT '5-10 Miles' AS [Commute Distance],
  'F' AS [Gender]) AS t

Example results:

Expression
0

The prediction function returns the most likely value, in this case, 0, which means this type of customer is unlikely to purchase a bike.

Sample Query 6: Returning Predictions with Probability and Support

In addition to predicting an outcome, you often want to know how strong the prediction is. The following query uses the same singleton query as the previous example, but adds the prediction function, PredictHistogram (DMX), to return a nested table that contains statistics in support of the prediction.

SELECT
  Predict([TM_NaiveBayes].[Bike Buyer]),
  PredictHistogram([TM_NaiveBayes].[Bike Buyer])
FROM
  [TM_NaiveBayes]
NATURAL PREDICTION JOIN
(SELECT '5-10 Miles' AS [Commute Distance],
  'F' AS [Gender]) AS t

Example results:

Bike Buyer	$SUPPORT	$PROBABILITY	$ADJUSTEDPROBABILITY
0	10161.5714	0.581192599	0.010530981
1	7321.428768	0.418750215	0.008945684
	0.999828444	5.72E-05	5.72E-05

The final row in the table shows the adjustments to support and probability for the missing value. Variance and standard deviation values are always 0, because Naive Bayes models cannot model continuous values.

Sample Query 7: Predicting Associations

The Microsoft Naive Bayes algorithm can be used for association analysis, if the mining structure contains a nested table with the predictable attribute as the key. For example, you could build a Naive Bayes model by using the mining structure created in Lesson 3: Building a Market Basket Scenario (Intermediate Data Mining Tutorial) of the data mining tutorial. The model used in this example was modified to add information about income and customer region in the case table.

The following query example shows a singleton query that predicts products that are related to purchases of the product, 'Road Tire Tube'. You might use this information to recommend products to a specific type of customer.

SELECT   PredictAssociation([Association].[v Assoc Seq Line Items])
FROM [Association_NB]
NATURAL PREDICTION JOIN
(SELECT 'High' AS [Income Group],
  'Europe' AS [Region],
  (SELECT 'Road Tire Tube' AS [Model]) 
AS [v Assoc Seq Line Items]) 
AS t

Partial results:

Model
Women's Mountain Shorts
Water Bottle
Touring-3000
Touring-2000
Touring-1000

Function List

All Microsoft algorithms support a common set of functions. However, the Microsoft Naive Bayes algorithm supports the additional functions that are listed in the following table.

Predict (DMX)	IsDescendant (DMX)
PredictAssociation (DMX)	PredictProbability (DMX)
PredictHistogram (DMX)	PredictNodeId (DMX)
PredictAdjustedProbability (DMX)	PredictSupport (DMX)

For a list of the functions that are common to all Microsoft algorithms, see Algorithm Reference (Analysis Services - Data Mining). For the syntax of specific functions, see Data Mining Extensions (DMX) Function Reference.

Change History

Updated content
Added navigation links to make it easier to review query samples.

Querying a Naive Bayes Model (Analysis Services - Data Mining)

Finding Information about the Naive Bayes Model

Sample Query 1: Getting Model Metadata by Using DMX

Sample Query 2: Retrieving a Summary of Training Data

Sample Query 3: Finding More Information about Attributes

Sample Query 4: Using System Stored Procedures

Making Predictions Using the Model

Sample Query 5: Predicting Outcomes using a Singleton Query

Sample Query 6: Returning Predictions with Probability and Support

Sample Query 7: Predicting Associations

Function List

See Also

Reference

Concepts

Change History

Additional resources