Exploring the Market Basket Models (Data Mining Tutorial)

Now that you have built the Association model, you can explore it by using the Microsoft Association Viewer in the Mining Model Viewer tab of Data Mining Designer. When you explore the model, you can see which products tend to appear together and examine the relationships between items. You can also filter out weaker associations to get a general idea of the emerging patterns.

The Microsoft Association Viewer contains three tabs: Itemsets, Rules, and Dependency Network. For more information about this viewer, see Viewing a Mining Model with the Microsoft Association Rules Viewer.

Itemsets Tab

The Itemsets tab displays three important pieces of information about the itemsets that the Microsoft Association algorithm discovers: the support, which is the number of transactions in which the itemset occurs; the size, which is the number of items in the itemset; and the actual makeup of the itemset. Depending on how the algorithm parameters are set, the algorithm can generate a large number of itemsets. By using the controls at the top of the Itemsets tab, you can filter the viewer to show only itemsets that meet a specified minimum support and minimum itemset size.
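To make the support and size columns concrete, here is a minimal Python sketch that counts itemsets by brute force over a handful of made-up baskets and then applies minimum-support and minimum-size thresholds. The transactions and the helper names `count_itemsets` and `filter_itemsets` are hypothetical, and this is not how the Microsoft Association algorithm counts itemsets internally; it only illustrates what the numbers in the viewer mean.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets; each set is one transaction.
TRANSACTIONS = [
    {"Mountain-200", "Water Bottle", "Mountain Bottle Cage"},
    {"Mountain-200", "Sport-100"},
    {"Mountain-200", "Water Bottle", "Mountain Bottle Cage", "Sport-100"},
    {"Water Bottle", "Road Bottle Cage"},
    {"Sport-100"},
]


def count_itemsets(transactions, max_size=3):
    """Map each itemset (a frozenset) to its support: the number of
    transactions that contain every item in the itemset."""
    support = Counter()
    for basket in transactions:
        items = sorted(basket)
        for size in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, size):
                support[frozenset(combo)] += 1
    return support


def filter_itemsets(support, min_support=1, min_size=1):
    """Keep itemsets that meet both thresholds, similar in spirit to
    the controls at the top of the Itemsets tab."""
    return {
        itemset: count
        for itemset, count in support.items()
        if count >= min_support and len(itemset) >= min_size
    }


support = count_itemsets(TRANSACTIONS)
for itemset, count in filter_itemsets(support, min_support=2, min_size=2).items():
    # Print support, size, and makeup, the three pieces of information above.
    print(count, len(itemset), sorted(itemset))
```

With these baskets, a minimum support of 2 and a minimum size of 2 leave four pairs and one three-item set.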

You can also use the Filter Itemset box to filter the itemsets that are shown in the viewer. For example, to see only the itemsets that contain the Mountain-200 bicycle, enter Mountain-200 in Filter Itemset. The viewer then displays only the itemsets that contain the text "Mountain-200". Each of these itemsets describes transactions in which a Mountain-200 bicycle was sold. For example, the itemset that has the value 710 in the Support column indicates that, out of all the transactions, 710 included both the Mountain-200 bicycle and the Sport-100 helmet.
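The Filter Itemset box acts as a text filter on item names. The sketch below reproduces that idea over a dictionary of itemset supports; the counts are invented for illustration, `filter_by_text` is a hypothetical helper, and the case-insensitive match is an assumption rather than documented viewer behavior.

```python
# Hypothetical itemset supports (invented numbers, not model output).
ITEMSETS = {
    frozenset({"Mountain-200", "Sport-100"}): 12,
    frozenset({"Mountain-200", "Water Bottle"}): 9,
    frozenset({"Water Bottle", "Road Bottle Cage"}): 7,
    frozenset({"Water Bottle"}): 20,
}


def filter_by_text(itemsets, text):
    """Keep itemsets in which at least one item name contains `text`.
    Matching is case-insensitive here (an assumption)."""
    needle = text.lower()
    return {
        itemset: count
        for itemset, count in itemsets.items()
        if any(needle in item.lower() for item in itemset)
    }


for itemset, count in filter_by_text(ITEMSETS, "Mountain-200").items():
    # Each surviving support counts transactions that include a
    # Mountain-200 together with the other items in the set.
    print(count, sorted(itemset))
```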

Rules Tab

The Rules tab displays the following information that is related to the rules that the algorithm finds.

  • Probability
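    The likelihood of a rule: how often the items on the right-hand side of the rule appear in transactions that contain the items on the left-hand side.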
  • Importance
    A measure of the usefulness of a rule; a higher value means a better rule. Looking only at the probability can be misleading. For example, if every transaction contains an item x, the rule "y predicts x" has a probability of 1, meaning that x always occurs when y does. Even though the rule is always correct, it conveys very little information, because every transaction contains x regardless of y.
  • Rule
    The definition of the rule.
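The difference between probability and importance can be sketched for a single rule such as "y predicts x". Probability below is the conditional probability P(x | y). For importance, the sketch uses the log ratio of P(x | y) to P(x | not y); treat this as an assumed stand-in that captures the idea described above, not as the exact formula the Microsoft Association algorithm uses. `rule_stats` is a hypothetical helper.

```python
import math


def rule_stats(transactions, lhs, rhs):
    """Return (probability, importance) for the rule lhs -> rhs.

    probability = P(rhs | lhs)
    importance  = log10(P(rhs | lhs) / P(rhs | not lhs)), an assumed
                  stand-in for the viewer's score; None when undefined.
    """
    lhs, rhs = set(lhs), set(rhs)
    with_lhs = [t for t in transactions if lhs <= t]
    without_lhs = [t for t in transactions if not lhs <= t]
    if not with_lhs:
        raise ValueError("the left-hand side never occurs")
    p_given = sum(rhs <= t for t in with_lhs) / len(with_lhs)
    if not without_lhs:
        return p_given, None
    p_given_not = sum(rhs <= t for t in without_lhs) / len(without_lhs)
    if p_given == 0 or p_given_not == 0:
        return p_given, None
    return p_given, math.log10(p_given / p_given_not)


# x appears in every transaction, so "y predicts x" is always right,
# yet knowing y tells you nothing new about x.
baskets = [{"x", "y"}, {"x"}, {"x", "y", "z"}, {"x", "z"}]
probability, importance = rule_stats(baskets, {"y"}, {"x"})
print(probability, importance)  # prints 1.0 0.0
```

Under this formulation, a rule scores above zero only when the right-hand side is more common with the left-hand side than without it.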

As with the Itemsets tab, you can filter the rules so that only the most interesting rules are shown. For example, to see only the rules that include the Mountain-200 bicycle, enter Mountain-200 in the Filter Rule box. The viewer then displays only the rules that contain the text "Mountain-200". Each rule can be used to predict the presence of an item in a transaction based on the presence of other items. For example, the first rule tells you that when someone buys a Mountain-200 bicycle and a water bottle, there is a probability of 1 that this person will also buy a Mountain bottle cage.
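Rules like the one above can be derived from itemset counts: a rule {A, B} -> C has probability support({A, B, C}) / support({A, B}). The sketch below enumerates such single-consequent rules by brute force over made-up baskets and then applies a text filter in the spirit of the Filter Rule box. The data and the helper names (`support`, `single_consequent_rules`, `filter_rules`) are hypothetical, and this is not how the Microsoft Association algorithm generates rules.

```python
from itertools import combinations

# Hypothetical toy baskets.
TRANSACTIONS = [
    {"Mountain-200", "Water Bottle", "Mountain Bottle Cage"},
    {"Mountain-200", "Water Bottle", "Mountain Bottle Cage"},
    {"Mountain-200", "Sport-100"},
    {"Water Bottle"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions)


def single_consequent_rules(transactions, max_lhs=2):
    """Return (lhs, rhs, probability) triples for rules lhs -> rhs,
    where probability = support(lhs plus rhs) / support(lhs)."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(items, size):
            lhs_set = frozenset(lhs)
            lhs_count = support(lhs_set, transactions)
            if lhs_count == 0:
                continue
            for rhs in items:
                if rhs in lhs_set:
                    continue
                both = support(lhs_set | {rhs}, transactions)
                if both:
                    rules.append((lhs_set, rhs, both / lhs_count))
    return rules


def filter_rules(rules, text):
    """Keep rules that mention `text` on either side."""
    return [r for r in rules if text in r[1] or any(text in item for item in r[0])]


rules = single_consequent_rules(TRANSACTIONS)
# Show the filtered rules, strongest first.
for lhs, rhs, probability in sorted(filter_rules(rules, "Mountain-200"), key=lambda r: -r[2]):
    print(f"{sorted(lhs)} -> {rhs}: {probability:.2f}")
```

In this toy data, {Mountain-200, Water Bottle} -> Mountain Bottle Cage comes out with probability 1, mirroring the rule described above.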

Dependency Network Tab

With the Dependency Network tab, you can investigate how the different items in the model interact. Each node in the viewer represents an item; for example, the Mountain-200 = Existing node indicates that a Mountain-200 bicycle exists in a transaction. When you select a node, the color legend at the bottom of the tab shows which items the selected item predicts and which items predict it.
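One way to picture the dependency network is as a directed graph in which a link a -> b means that a predicts b. The sketch below derives such links from pairwise conditional probabilities over toy baskets and then lists, for a selected node, what it predicts and what predicts it. The data, the helper names, and the use of plain item names instead of labels such as "Mountain-200 = Existing" are simplifying assumptions.

```python
from itertools import permutations

# Hypothetical toy baskets.
TRANSACTIONS = [
    {"Mountain-200", "Water Bottle", "Mountain Bottle Cage"},
    {"Mountain-200", "Mountain Bottle Cage"},
    {"Water Bottle", "Mountain Bottle Cage"},
    {"Sport-100"},
]


def pairwise_links(transactions):
    """Return {(a, b): P(b | a)} for every ordered pair of items that
    appear together in at least one transaction."""
    items = sorted(set().union(*transactions))
    links = {}
    for a, b in permutations(items, 2):
        with_a = [t for t in transactions if a in t]
        both = sum(b in t for t in with_a)
        if both:
            links[(a, b)] = both / len(with_a)
    return links


def neighbors(links, node):
    """Split a node's links into the items it predicts and the items
    that predict it."""
    predicts = sorted(b for (a, b) in links if a == node)
    predicted_by = sorted(a for (a, b) in links if b == node)
    return predicts, predicted_by


links = pairwise_links(TRANSACTIONS)
predicts, predicted_by = neighbors(links, "Mountain Bottle Cage")
print("predicts:", predicts)
print("predicted by:", predicted_by)
```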

The slider is associated with the probability of a rule. Move the slider up or down to filter out weak associations. For example, in the Show box, select Show attribute name only, and then click the Mountain Bottle Cage node. The viewer shows that the Mountain bottle cage both predicts and is predicted by the water bottle and the Mountain-200 bicycle. This means that these items are likely to show up in a transaction together. In other words, a customer who buys a Mountain-200 bicycle is also likely to buy a Mountain bottle cage and a water bottle.
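The slider can be modeled as a probability threshold applied to the links. In the sketch below the link probabilities are invented for illustration and `filter_links` and `two_way_partners` are hypothetical helpers; the point is only that raising the threshold drops weak links while strong two-way links survive.

```python
# Hypothetical link probabilities P(b | a), keyed by (a, b).
LINKS = {
    ("Mountain Bottle Cage", "Water Bottle"): 0.9,
    ("Water Bottle", "Mountain Bottle Cage"): 0.6,
    ("Mountain Bottle Cage", "Mountain-200"): 0.8,
    ("Mountain-200", "Mountain Bottle Cage"): 0.7,
    ("Mountain-200", "Sport-100"): 0.3,
    ("Sport-100", "Mountain-200"): 0.2,
}


def filter_links(links, min_probability):
    """Drop links whose probability is below the slider threshold."""
    return {pair: p for pair, p in links.items() if p >= min_probability}


def two_way_partners(links, node):
    """Items that both predict `node` and are predicted by it."""
    out = {b for (a, b) in links if a == node}
    into = {a for (a, b) in links if b == node}
    return sorted(out & into)


for threshold in (0.0, 0.5):
    kept = filter_links(LINKS, threshold)
    print(threshold, len(kept), two_way_partners(kept, "Mountain-200"))
```

At a threshold of 0.5, the weak Sport-100 links disappear, while the Mountain Bottle Cage node keeps its two-way links to the water bottle and the Mountain-200.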

Next Lesson

Lesson 5: Building the Sequence Clustering Scenario