Exploring the Decision Tree Model (Basic Data Mining Tutorial)
The Microsoft Decision Trees algorithm predicts which columns influence the decision to purchase a bike based upon the remaining columns in the training set.
The Microsoft Decision Tree Viewer provides two tabs for exploring decision tree mining models: Decision Tree and Dependency Network.
The following sections describe how to select the appropriate viewer and explore the other mining models.
Decision Tree Tab
On the Decision Tree tab, you can examine all the tree models that make up a mining model.
Because the targeted mailing model in this tutorial project contains only a single predictable attribute, Bike Buyer, there is only one tree to view. If there were more trees, you could use the Tree box to choose another tree.
Reviewing the TM_Decision_Tree model in the Decision Tree viewer reveals that age is the single most important factor in predicting bike buying. Interestingly, once you group the customers by age, the next branch of the tree is different for each age node. By exploring the Decision Tree tab, you can conclude that customers aged 34 to 40 with one or no cars are very likely to purchase a bike, and that single, younger customers who live in the Pacific region and have one or no cars are also very likely to purchase a bike.
To explore the model in the Decision Tree tab
Select the Mining Model Viewer tab in Data Mining Designer.
By default, the designer opens to the first model that was added to the structure, in this case TM_Decision_Tree.
Use the magnifying glass buttons to adjust the size of the tree display.
By default, the Microsoft Tree Viewer shows only the first three levels of the tree. If the tree contains fewer than three levels, the viewer shows only the existing levels. You can view more levels by using the Show Level slider or the Default Expansion list.
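The level limit can be pictured as a depth-bounded walk of the tree. The following Python sketch only illustrates that idea and is not the viewer's implementation; the `visible_nodes` helper, the nested-dictionary layout, and the node labels are all assumptions made for the example.

```python
def visible_nodes(node, max_level, level=1):
    """Collect node labels from the root (level 1) down to max_level."""
    if level > max_level:
        return []
    labels = [node["label"]]
    for child in node.get("children", []):
        labels.extend(visible_nodes(child, max_level, level + 1))
    return labels


# Hypothetical tree; the labels are placeholders, not the model's real splits.
tree = {
    "label": "All",
    "children": [
        {
            "label": "Age >= 34 and < 41",
            "children": [
                {
                    "label": "Number Cars Owned <= 1",
                    "children": [{"label": "Age >= 38 and < 41"}],
                }
            ],
        },
        {"label": "Age >= 67"},
    ],
}

default_view = visible_nodes(tree, max_level=3)   # first three levels only
expanded_view = visible_nodes(tree, max_level=4)  # like moving Show Level to 4
```

A tree with fewer levels than the limit simply returns every node it has, which mirrors the behavior described above.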
Slide Show Level to the fourth bar.
Change the Background value to 1.
By changing the Background setting, you can quickly see the number of cases in each node that have the target value of 1 for [Bike Buyer]. Remember that in this particular scenario, each case represents a customer. The value 1 indicates that the customer previously purchased a bike; the value 0 indicates that the customer has not purchased a bike. The darker the shading of the node, the higher the percentage of cases in the node that have the target value.
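As a rough mental model, the shading can be thought of as the share of cases in a node that have the selected Background value. The Python sketch below assumes a simple linear relationship between that share and the shade; the `shade_intensity` function and the case counts are hypothetical and are not taken from the viewer or from the tutorial data.

```python
def shade_intensity(target_cases: int, total_cases: int) -> float:
    """Return a value from 0.0 to 1.0: the share of cases with the target value."""
    if total_cases == 0:
        return 0.0  # an empty node gets no shading in this sketch
    return target_cases / total_cases


# Hypothetical nodes: (cases where [Bike Buyer] = 1, total cases in the node)
nodes = {
    "All": (500, 1000),
    "Age >= 34 and < 41": (300, 400),
    "Age >= 67": (20, 100),
}

intensities = {name: shade_intensity(t, n) for name, (t, n) in nodes.items()}

# The node with the highest share would be drawn darkest.
darkest = max(intensities, key=intensities.get)
```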
Place your cursor over the node labeled All. A tooltip displays the following information:
Total number of cases
Number of non-bike-buyer cases
Number of bike buyer cases
Number of cases with missing values for [Bike Buyer]
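The four tooltip figures amount to a count of the [Bike Buyer] values in the node. A minimal Python sketch of that tally follows; the `summarize_node` helper and the sample values are invented for illustration, and `None` is assumed to stand for a missing value.

```python
from collections import Counter


def summarize_node(bike_buyer_values):
    """Tally the [Bike Buyer] values for the cases in one node."""
    counts = Counter(bike_buyer_values)
    return {
        "total": len(bike_buyer_values),
        "non_buyers": counts[0],   # value 0: has not purchased a bike
        "buyers": counts[1],       # value 1: previously purchased a bike
        "missing": counts[None],   # assumed marker for a missing value
    }


# Made-up [Bike Buyer] values for seven cases.
cases = [1, 0, 0, 1, None, 1, 0]
summary = summarize_node(cases)
```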
Alternatively, place your cursor over any node in the tree to see the condition that is required to reach that node from the node that comes before it. You can also view this same information in the Mining Legend.
Click the node for Age >= 34 and < 41. The histogram is displayed as a thin horizontal bar across the node and represents the distribution of customers in this age range who previously did (pink) and did not (blue) purchase a bike. The viewer shows that customers between the ages of 34 and 40 with one or no cars are likely to purchase a bike. Taking it one step further, the likelihood of purchasing a bike increases if the customer is age 38 to 40.
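Each node is reached by adding one condition to those of its parent, so a path through the tree behaves like a chain of filters. The Python sketch below shows that chaining on made-up customers; the `Customer` class, the `buyer_share` helper, and every data value are assumptions for illustration, not the tutorial's training data.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    age: int
    cars: int
    bike_buyer: int  # 1 = bought a bike, 0 = did not


def buyer_share(customers):
    """Fraction of customers with bike_buyer == 1, or None for an empty group."""
    if not customers:
        return None
    return sum(c.bike_buyer for c in customers) / len(customers)


# Made-up cases for illustration only.
customers = [
    Customer(35, 0, 1), Customer(36, 1, 0), Customer(39, 1, 1),
    Customer(40, 0, 1), Customer(37, 2, 0), Customer(52, 1, 0),
]

# Each split narrows the set of cases reached from the parent node.
age_node = [c for c in customers if 34 <= c.age < 41]
few_cars_node = [c for c in age_node if c.cars <= 1]
older_node = [c for c in few_cars_node if c.age >= 38]
```

Comparing `buyer_share` across `age_node`, `few_cars_node`, and `older_node` corresponds to reading the histogram bar on successive nodes along one branch.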
Because you enabled drillthrough when you created the structure and model, you can retrieve detailed information from the model cases and mining structure, including those columns that were not included in the mining model (e.g., emailAddress, FirstName).
For more information, see Drillthrough Queries (Data Mining).
To drill through to case data
Right-click a node, and select Drill Through, then Model Columns Only.
The details for each training case are displayed in spreadsheet format. These details come from the vTargetMail view that you selected as the case table when building the mining structure.
Right-click a node, and select Drill Through, then Model and Structure Columns.
The same spreadsheet is displayed, with the structure columns appended to the end.
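The difference between the two drillthrough options comes down to which columns are returned for each case. The Python sketch below mimics that column selection on a single invented row; the `drill_through` function, the choice of model columns, and the sample values are hypothetical, and the code does not query Analysis Services.

```python
# Hypothetical subset of the columns used by the mining model.
MODEL_COLUMNS = ["Age", "Region", "Bike Buyer"]
# Structure columns that are not part of the model (named earlier in this section).
STRUCTURE_ONLY_COLUMNS = ["emailAddress", "FirstName"]


def drill_through(cases, include_structure_columns=False):
    """Return each case limited to model columns, optionally followed by structure columns."""
    columns = list(MODEL_COLUMNS)
    if include_structure_columns:
        columns += STRUCTURE_ONLY_COLUMNS  # appended at the end
    return [{col: case[col] for col in columns} for case in cases]


# One made-up training case.
cases = [
    {
        "Age": 36,
        "Region": "Pacific",
        "Bike Buyer": 1,
        "emailAddress": "pat@example.com",
        "FirstName": "Pat",
    }
]

model_only = drill_through(cases)
with_structure = drill_through(cases, include_structure_columns=True)
```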
Dependency Network Tab
The Dependency Network tab displays the relationships between the attributes that contribute to the predictive ability of the mining model. The Dependency Network viewer reinforces our findings that Age and Region are important factors in predicting bike buying.
To explore the model in the Dependency Network tab
Click the Bike Buyer node to identify its dependencies.
The center node for the dependency network, Bike Buyer, represents the predictable attribute in the mining model. The pink shading indicates that all of the attributes have an effect on bike buying.
Adjust the All Links slider to identify the most influential attribute.
As you lower the slider, only the attributes that have the greatest effect on the [Bike Buyer] column remain. By adjusting the slider, you can discover that age and region are the greatest factors in predicting whether someone is a bike buyer.
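Lowering the slider can be pictured as keeping only the strongest links to the predictable attribute. The Python sketch below ranks attributes by a strength score and keeps the top few; the `visible_links` helper and the numeric scores are invented for the example and are not values produced by the model.

```python
def visible_links(link_strengths, keep):
    """Return the `keep` attributes with the strongest links, strongest first."""
    ranked = sorted(link_strengths.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:keep]]


# Invented strength scores for links into [Bike Buyer]; illustration only.
strengths = {
    "Commute Distance": 0.3,
    "Age": 0.9,
    "Number Cars Owned": 0.5,
    "Region": 0.7,
}

all_links = visible_links(strengths, keep=len(strengths))  # slider at the top
strongest_two = visible_links(strengths, keep=2)           # slider lowered
```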