Exploring the Sequence Clustering Models (Data Mining Tutorial)

Now that you have built the Sequence Clustering model, you can explore it by using the Microsoft Sequence Clustering Viewer in the Mining Model Viewer tab of Data Mining Designer. The Microsoft Sequence Clustering Viewer contains five tabs: Cluster Diagram, Cluster Profiles, Cluster Characteristics, Cluster Discrimination, and State Transitions. For more information about how to use this viewer, see Viewing a Mining Model with the Microsoft Sequence Cluster Viewer.

Cluster Diagram Tab

The Cluster Diagram tab graphically displays the clusters that the algorithm discovered in the database. The layout of the diagram represents the relationships among the clusters: similar clusters are grouped close together. By default, the shade of a node represents the density of all cases in the cluster, so the darker the node, the more cases it contains. You can change the meaning of the shading so that it represents a specific attribute and state instead. For example, select Model in the Shading Variable list, and select Cycling Cap in the State list. The cluster diagram then shows that Cluster 9 contains the highest density of cycling caps.

Cluster Profiles Tab

The Cluster Profiles tab displays the sequences that exist in each cluster. The clusters are listed in individual columns to the right of the States column.

In the viewer, the Model.samples row represents sequence data, and the Model row describes the overall distribution of items in a cluster. Each line of colored segments in a cell of the Model.samples row represents the behavior of one randomly selected customer in that cluster. Each color in an individual sequence histogram represents a product model.

For example, the aqua color in Cluster 3 represents the Mountain-200 bicycle. Its presence as the first color in most of the sequences indicates that customers in this cluster are very likely to put the Mountain-200 bike in the shopping basket first.
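The reading of the Model.samples row described above amounts to counting which item appears first in each sampled sequence. The following is a minimal sketch of that count in Python. It is not how the viewer is implemented; the sequences are made-up sample data, and the function name first_item_shares is hypothetical.

```python
from collections import Counter

# Hypothetical shopping-basket sequences for one cluster (made-up data,
# not taken from the tutorial database).
cluster_sequences = [
    ["Mountain-200", "Mountain Bottle Cage", "Water Bottle"],
    ["Mountain-200", "Sport-100"],
    ["Mountain-200", "HL Mountain Tire"],
    ["Sport-100", "Mountain-200"],
]


def first_item_shares(sequences):
    """Return the share of non-empty sequences that start with each item."""
    firsts = Counter(seq[0] for seq in sequences if seq)
    total = sum(firsts.values())
    return {item: count / total for item, count in firsts.items()}


shares = first_item_shares(cluster_sequences)
# List the items from most to least common as a first purchase.
for item, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{item}: {share:.2f}")
```

In this made-up sample, three of the four sequences start with Mountain-200, which corresponds to seeing the same color at the start of most lines in a Model.samples cell.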

Cluster Characteristics Tab

The Cluster Characteristics tab summarizes the transitions between states in a cluster, with bars describing the importance of each attribute value for the selected cluster. For example, in Cluster 10, one of the most important characteristics is that customers tend to put an ML Mountain tire in their shopping basket first.

Cluster Discrimination Tab

With the Cluster Discrimination tab, you can compare two clusters to determine which models favor which cluster. The tab contains four columns: Variables, Values, Cluster 1, and Cluster 2. If a model favors one of the two clusters, a blue bar appears in the Cluster 1 or Cluster 2 column in the row for that model. The longer the blue bar, the more strongly the model favors that cluster.

For example, use the Cluster Discrimination tab in the viewer to compare Cluster 2 with Cluster 5 by selecting Cluster 2 in the Cluster 1 list and Cluster 5 in the Cluster 2 list. A customer who purchases a bottle cage for a mountain bike, as indicated by Mountain Bottle Cage in the Values column, is more likely to be in Cluster 5, and a customer who purchases a Touring tire, as indicated by Touring Tire in the Values column, is more likely to be grouped into Cluster 2.
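To make the idea of a model favoring a cluster more concrete, the following Python sketch compares how often each item occurs in two clusters. The scoring formula that the viewer uses is not described in this tutorial, so the sketch assumes a simple difference in shares. The data is made up, and the function names item_share and discrimination are hypothetical.

```python
def item_share(sequences, item):
    """Fraction of sequences that contain the item at least once."""
    if not sequences:
        return 0.0
    return sum(item in seq for seq in sequences) / len(sequences)


def discrimination(cluster_a, cluster_b):
    """Map each item to share_a - share_b; a positive value favors cluster_a.

    Assumption: a plain difference of shares stands in for the viewer's score.
    """
    items = {item for seq in cluster_a + cluster_b for item in seq}
    return {
        item: item_share(cluster_a, item) - item_share(cluster_b, item)
        for item in items
    }


# Made-up sequences standing in for Cluster 2 and Cluster 5.
cluster_2 = [["Touring Tire", "Touring Tire Tube"], ["Touring Tire"], ["Patch Kit"]]
cluster_5 = [
    ["Mountain Bottle Cage", "Water Bottle"],
    ["Mountain Bottle Cage"],
    ["Touring Tire"],
]

scores = discrimination(cluster_2, cluster_5)
for item, score in sorted(scores.items(), key=lambda kv: kv[1]):
    favored = "Cluster 2" if score > 0 else "Cluster 5" if score < 0 else "neither"
    print(f"{item:22s} {score:+.2f}  favors {favored}")
```

The sign of each score plays the role of the column in which the blue bar appears, and the magnitude plays the role of the bar length.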

State Transitions Tab

On the State Transitions tab, you can select a cluster and browse through its state transitions. Each node represents a state of the model, such as Mountain-200. A line between two nodes represents a transition between those states and is associated with the probability of that transition. The background color of a node represents the frequency of that state in the cluster.

For example, select Cluster 3 from Cluster, select the Touring-3000 node, and lower the All Links slider several spaces. As you can see in the viewer, if a customer puts a Touring Tire into the shopping basket, there is a probability of 0.63, as indicated by the blue arrow, that the customer will next put a Touring Tire Tube into the basket, and there is a probability of 0.26 that the customer will instead place a Sport-100 helmet into the shopping basket next.
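The probabilities on the arrows can be read as first-order transition probabilities: of all the times a state occurs and is followed by something, the fraction in which a particular state comes next. The following Python sketch estimates such probabilities by counting adjacent pairs. It is an illustration under that assumption rather than the algorithm's actual training procedure; the sequences are made up, so the results do not reproduce the 0.63 and 0.26 shown in the viewer, and the function name transition_probabilities is hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next state | current state) by counting adjacent pairs."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # zip pairs each state with the state that immediately follows it.
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for state, nexts in counts.items()
    }


# Made-up shopping-basket sequences for a single cluster.
sequences = [
    ["Touring Tire", "Touring Tire Tube"],
    ["Touring Tire", "Touring Tire Tube", "Patch Kit"],
    ["Touring Tire", "Sport-100"],
    ["Touring Tire", "Touring Tire Tube"],
]

probs = transition_probabilities(sequences)
for nxt, p in sorted(probs["Touring Tire"].items(), key=lambda kv: -kv[1]):
    print(f"Touring Tire -> {nxt}: {p:.2f}")
```

The probabilities of all transitions out of one state sum to 1, which is why the values on the arrows leaving a node describe alternative next steps rather than items that are bought together.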