Querying an Association Model (Analysis Services - Data Mining)

When you create a query against a data mining model, you can create either a content query, which provides details about the rules and itemsets discovered during analysis, or you can create a prediction query, which uses the associations discovered in the data to make predictions. For an association model, predictions typically are based on rules, and can be used to make recommendations, whereas queries on content typically explore the relationship among itemsets. You can also retrieve metadata about the model.

This section explains how to create these kinds of queries for models that are based on the Microsoft Association Rules algorithm.

Content Queries

Getting model metadata data by using DMX

Getting metadata from the schema rowset

Retrieving the original parameters for the model

Retrieving a list of itemsets and products

Returning the top 10 itemsets

Prediction Queries

Predicting associated items

Determining confidence for related itemsets

Finding Information about the Model

All mining models expose the content learned by the algorithm according to a standardized schema, which is named the mining model schema rowset. You can create queries against the mining model schema rowset either by using Data Mining Extensions (DMX) statements, or by using Analysis Services stored procedures. In SQL Server 2008, you can also query the schema rowsets directly as system tables, by using a SQL-like syntax.

Sample Query 1: Getting Model Metadata by Using DMX

The following query returns basic metadata about the association model, Association, such as the name of the model, the database where the model is stored, and the number of child nodes in the model. This query uses a DMX content query to retrieve the metadata from the parent node of the model:

SELECT MODEL_CATALOG, MODEL_NAME, NODE_CAPTION, 
NODE_SUPPORT, [CHILDREN_CARDINALITY], NODE_DESCRIPTION
FROM Association.CONTENT
WHERE NODE_TYPE = 1

Note

You must enclose the name of the column, CHILDREN_CARDINALITY, in brackets to distinguish it from the MDX reserved keyword of the same name.

Example results:

MODEL_CATALOG

Association Test

MODEL_NAME

Association

NODE_CAPTION

Association Rules Model

NODE_SUPPORT

14879

CHILDREN_CARDINALITY

942

NODE_DESCRIPTION

Association Rules Model; ITEMSET_COUNT=679; RULE_COUNT=263; MIN_SUPPORT=14; MAX_SUPPORT=4334; MIN_ITEMSET_SIZE=0; MAX_ITEMSET_SIZE=3; MIN_PROBABILITY=0.400390625; MAX_PROBABILITY=1; MIN_LIFT=0.14309369632511; MAX_LIFT=1.95758227647523

For a definition of what these columns mean in an association model, see Mining Model Content for Association Models (Analysis Services - Data Mining).

Return to Top

Sample Query 2: Getting Additional Metadata from the Schema Rowset

By querying the data mining schema rowset, you can find the same information that is returned in a DMX content query. However, the schema rowset provides some additional columns, such as the date the model was last processed, the mining structure, and the name of the column used as the predictable attribute.

SELECT MODEL_CATALOG, MODEL_NAME, SERVICE_NAME, PREDICTION_ENTITY, 
MINING_STRUCTURE, LAST_PROCESSED
FROM $system.DMSCHEMA_MINING_MODELS
WHERE MODEL_NAME = 'Association'

Example results:

MODEL_CATALOG

AdventureWorks DW

MODEL_NAME

Association

SERVICE_NAME

Association Rules Model

PREDICTION_ENTITY

v Assoc Seq Line Items

MINING_STRUCTURE

Association

LAST_PROCESSED

9/29/2007 10:21:24 PM

Return to Top

Sample Query 3: Retrieving Original Parameters for Model

The following query returns a single column that contains details about the parameter settings that were used when the model was created.

SELECT MINING_PARAMETERS 
from $system.DMSCHEMA_MINING_MODELS
WHERE MODEL_NAME = 'Association'

Example results:

MAXIMUM_ITEMSET_COUNT=200000,MAXIMUM_ITEMSET_SIZE=3,MAXIMUM_SUPPORT=1,MINIMUM_SUPPORT=9.40923449156529E-04,MINIMUM_IMPORTANCE=-999999999,MINIMUM_ITEMSET_SIZE=0,MINIMUM_PROBABILITY=0.4

Return to Top

Finding Information about Rules and Itemsets

There are two common uses of an association model: to discover information about frequent itemsets, and to extract details about particular rules and itemsets. For example, you might want to extract a list of rules that were scored as being especially interesting, or create a list of the most common itemsets. You retrieve such information by using a DMX content query. You can also browse this information by using the Microsoft Association Viewer.

Sample Query 4: Retrieving List of Itemsets and Products

The following query retrieves all of the itemsets, together with a nested table that lists the products included in each itemset. The NODE_NAME column contains the unique ID of the itemset within the model, whereas the NODE_CAPTION provides a text description of the items. In this example, the nested table is flattened, so that an itemset that contains two products generates two rows in the results. You can omit the FLATTENED keyword if your client supports hierarchical data.

SELECT FLATTENED NODE_NAME, NODE_CAPTION,
NODE_PROBABILITY, NODE_SUPPORT,
(SELECT ATTRIBUTE_NAME FROM NODE_DISTRIBUTION) as PurchasedProducts
FROM Association.CONTENT
WHERE NODE_TYPE = 7

Example results:

NODE_NAME

37

NODE_CAPTION

Sport-100 = Existing

NODE_PROBABILITY

0.291283016331743

NODE_SUPPORT

4334

PURCHASEDPRODUCTS.ATTRIBUTE_NAME

v Assoc Seq Line Items(Sport-100)

Return to Top

Sample Query 5: Returning Top 10 Itemsets

This example demonstrates how to use some of the grouping and ordering functions that DMX provides by default. The query returns the top 10 itemsets when ordered by the support for each node. Note that you do not need to explicitly group the results, as you would in Transact-SQL; however, you can use only one aggregate function in each query.

SELECT TOP 10 (NODE_SUPPORT),NODE_NAME, NODE_CAPTION
FROM Association.CONTENT
WHERE NODE_TYPE = 7

Example results:

NODE_SUPPORT

4334

NODE_NAME

37

NODE_CAPTION

Sport-100 = Existing

Return to Top

Making Predictions using the Model

An association rules model is often used to generate recommendations, which are based on correlations discovered in the itemsets. Therefore, when you create a prediction query based on an association rules model, you are typically using the rules in the model to make guesses based on new data. PredictAssociation (DMX) is the function that returns recommendations, and has several arguments that you can use to customize the query results.

Another example of where queries on an association model might be useful is to return the confidence for various rules and itemsets so that you can compare the effectiveness of different cross-sell strategies. The following examples illustrate how to create such queries.

Sample Query 6: Predicting Associated Items

This example uses the Association model created in the Basic Data Mining Tutorial. It demonstrates how to create a prediction query that tells you what products to recommend to a customer who has purchased a particular product. This type of query, where you provide values to the model in a SELECT…UNION statement, is called a singleton query. Because the predictable model column that corresponds to the new values is a nested table, you must use one SELECT clause to map the new value to the nested table column, [Model], and another SELECT clause to map the nested table column to the case-level column, [v Assoc Seq Line Items]. Adding the keyword INCLUDE-STATISTICS to the query lets you see the probability and support for the recommendations.

SELECT PredictAssociation([Association].[vAssocSeqLineItems],INCLUDE_STATISTICS, 3)
FROM [Association]
NATURAL PREDICTION JOIN 
(SELECT
(SELECT 'Classic Vest' as [Model])
AS [v Assoc Seq Line Items])
AS t

Example results:

Model

$SUPPORT

$PROBABILITY

$ADJUSTEDPROBABILITY

Sport-100

4334

0.291283

0.252696

Water Bottle

2866

0.19262

0.175205

Patch kit

2113

0.142012

0.132389

Return to Top

Whereas rules are useful for generating recommendations, itemsets are more interesting for deeper analysis of the patterns in the data set. For example, if you were not satisfied with the recommendation that are returned by the previous sample query, you could examine other itemsets that contain Product A, to can get a better idea of whether Product A is an accessory that people tend to buy with all kinds of products, or whether A is strongly correlated with purchases of particular products. The easiest way to explore these relationships is by filtering the itemsets in the Microsoft Association Viewer; however, you can retrieve the same information with a query.

The following sample query returns all itemsets that include the Water Bottle item, including the single item Water bottle.

SELECT TOP 100 FROM 
(
SELECT FLATTENED NODE_CAPTION, NODE_SUPPORT, 
(SELECT ATTRIBUTE_NAME from NODE_DISTRIBUTION
WHERE ATTRIBUTE_NAME = 'v Assoc Seq Line Items(Water Bottle)') as D
FROM Association.CONTENT
WHERE NODE_TYPE = 7
) AS Items
WHERE [D.ATTRIBUTE_NAME] <> NULL
ORDER BY NODE_SUPPORT DESC

Example results:

NODE_CAPTION

NODE_SUPPORT

D.ATTRIBUTE_NAME

Water Bottle = Existing

2866

v Assoc Seq Line Items(Water Bottle)

Mountain Bottle Cage = Existing, Water Bottle = Existing

1136

v Assoc Seq Line Items(Water Bottle)

Road Bottle Cage = Existing, Water Bottle = Existing

1068

v Assoc Seq Line Items(Water Bottle)

Water Bottle = Existing, Sport-100 = Existing

734

v Assoc Seq Line Items(Water Bottle)

This query returns both the rows from the nested table that match the criteria, and all the rows from the outside or case table. Therefore, you must add a condition that eliminates the case table rows that have a null value for the target attribute name.

Return to Top

Function List

All Microsoft algorithms support a common set of functions. However, the Microsoft Association algorithm supports the additional functions listed in the following table.