Exploring the Naive Bayes Model (Basic Data Mining Tutorial)
Topic Status: Some information in this topic is preview and subject to change in future releases. Preview information describes new features or changes to existing features in Microsoft SQL Server 2016 Community Technology Preview 2 (CTP2).
The Microsoft Naive Bayes algorithm provides several methods for displaying the interaction between bike buying and the input attributes.
The Microsoft Naive Bayes Viewer provides the following tabs for exploring Naive Bayes mining models: Dependency Network, Attribute Profiles, Attribute Characteristics, and Attribute Discrimination.
The Dependency Network tab works in the same way as the Dependency Network tab for the Microsoft Tree Viewer. Each node in the viewer represents an attribute, and the lines between nodes represent relationships. In the viewer, you can see all the attributes that affect the state of the predictable attribute, Bike Buyer.
To explore the model in the Dependency Network tab
Use the Mining Model list at the top of the Mining Model Viewer tab to switch to the TM_NaiveBayes model.
Use the Viewer list to switch to Microsoft Naive Bayes Viewer.
Click the Bike Buyer node to identify its dependencies.
The pink shading indicates that all of the attributes have an effect on bike buying.
Adjust the slider to identify the most influential attribute.
As you lower the slider, only the attributes that have the greatest effect on the [Bike Buyer] column remain. By adjusting the slider, you can discover that the most influential attributes include number of cars owned, commute distance, and total number of children.
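The slider behavior described above can be imitated outside the viewer. The following is a minimal sketch, not the viewer's own method: this topic does not say how the viewer scores link strength, so mutual information is used here as a stand-in. The rows are made-up toy data, and `mutual_information`, `rank_attributes`, and `attributes_above` are hypothetical helper names.

```python
from collections import Counter
from math import log2

# Made-up toy rows: (number_cars_owned, region, bike_buyer).
# These are illustrative values, not the tutorial's data set.
ROWS = [
    (0, "Europe", 1),
    (0, "Pacific", 1),
    (1, "Europe", 1),
    (1, "Pacific", 1),
    (2, "Europe", 0),
    (2, "Pacific", 0),
    (2, "Europe", 0),
    (1, "Pacific", 0),
]
ATTRIBUTES = {"Number Cars Owned": 0, "Region": 1}  # attribute name -> column index
TARGET = 2  # column index of the predictable attribute


def mutual_information(rows, attr_idx, target_idx):
    """Mutual information (in bits) between one attribute and the target."""
    n = len(rows)
    joint = Counter((r[attr_idx], r[target_idx]) for r in rows)
    attr = Counter(r[attr_idx] for r in rows)
    target = Counter(r[target_idx] for r in rows)
    mi = 0.0
    for (a, t), count in joint.items():
        p_joint = count / n
        mi += p_joint * log2(p_joint / ((attr[a] / n) * (target[t] / n)))
    return mi


def rank_attributes(rows, attributes, target_idx):
    """Return (name, score) pairs, strongest association first."""
    scores = {
        name: mutual_information(rows, idx, target_idx)
        for name, idx in attributes.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def attributes_above(ranked, threshold):
    """Imitate the slider: keep only attributes scoring at or above threshold."""
    return [name for name, score in ranked if score >= threshold]


ranked = rank_attributes(ROWS, ATTRIBUTES, TARGET)
for name, score in ranked:
    print(f"{name}: {score:.3f} bits")
print("Strongest links only:", attributes_above(ranked, 0.1))
```

In this toy data, Region is split evenly between buyers and non-buyers, so it drops out as soon as the threshold rises above zero, while Number Cars Owned remains.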
The Attribute Profiles tab describes how different states of the input attributes affect the outcome of the predictable attribute.
To explore the model in the Attribute Profiles tab
In the Predictable box, verify that Bike Buyer is selected.
If the Mining Legend blocks the display of the attribute profiles, move it out of the way.
In the Histogram bars box, select 5.
In our model, 5 is the maximum number of states for any one variable.
The viewer lists the attributes that affect the state of this predictable attribute, together with the states of each input attribute and the distribution of those states within each state of the predictable attribute.
In the Attributes column, find Number Cars Owned. Notice the differences in the histograms for bike buyers (column labeled 1) and non-buyers (column labeled 0). A person with zero or one car is much more likely to buy a bike.
Double-click the Number Cars Owned cell in the bike buyer column (labeled 1).
The Mining Legend displays a more detailed view.
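The histograms on this tab amount to the distribution of an input attribute's states within each state of the predictable attribute. The sketch below shows that calculation for a single attribute. It assumes made-up toy cases rather than the tutorial's data, and `attribute_profile` is a hypothetical helper name, not part of any Microsoft API.

```python
from collections import Counter, defaultdict

# Made-up toy cases: (number_cars_owned, bike_buyer). Illustrative only.
CASES = [(0, 1), (0, 1), (1, 1), (2, 1), (1, 0), (2, 0), (2, 0), (2, 0)]


def attribute_profile(cases):
    """Return {bike_buyer_state: {attribute_state: share of cases}}."""
    counts = defaultdict(Counter)
    for value, buyer in cases:
        counts[buyer][value] += 1
    profile = {}
    for buyer, counter in counts.items():
        total = sum(counter.values())
        profile[buyer] = {value: n / total for value, n in sorted(counter.items())}
    return profile


profile = attribute_profile(CASES)
for buyer in sorted(profile):
    for value, share in profile[buyer].items():
        bar = "#" * round(share * 20)  # crude text histogram bar
        print(f"Bike Buyer={buyer}  Cars={value}  {share:.2f}  {bar}")
```

Comparing the two groups of bars side by side is the text equivalent of comparing the columns labeled 1 and 0 in the viewer.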
With the Attribute Characteristics tab, you can select an attribute and value to see how frequently values for other attributes appear in the selected value cases.
To explore the model in the Attribute Characteristics tab
In the Attribute list, verify that Bike Buyer is selected.
Set the Value to 1.
In the viewer, you will see that customers who have no children at home, have short commutes, and live in the North America region are more likely to buy a bike.
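Conceptually, this tab keeps only the cases that have the selected attribute value and then reports how often each other attribute value appears among them. A minimal sketch of that idea follows. The cases are made-up toy data, and `characteristics` is a hypothetical helper name. The viewer's exact calculation is not described in this topic.

```python
from collections import Counter

# Made-up toy cases. Illustrative only.
CASES = [
    {"Children At Home": 0, "Commute Distance": "0-1 Miles", "Bike Buyer": 1},
    {"Children At Home": 0, "Commute Distance": "0-1 Miles", "Bike Buyer": 1},
    {"Children At Home": 0, "Commute Distance": "5-10 Miles", "Bike Buyer": 1},
    {"Children At Home": 2, "Commute Distance": "0-1 Miles", "Bike Buyer": 1},
    {"Children At Home": 2, "Commute Distance": "10+ Miles", "Bike Buyer": 0},
    {"Children At Home": 1, "Commute Distance": "5-10 Miles", "Bike Buyer": 0},
]


def characteristics(cases, attribute, value):
    """List (other_attribute, other_value, frequency) among the selected cases,
    most frequent first."""
    selected = [c for c in cases if c[attribute] == value]
    counts = Counter(
        (name, v)
        for c in selected
        for name, v in c.items()
        if name != attribute
    )
    return [(name, v, n / len(selected)) for (name, v), n in counts.most_common()]


result = characteristics(CASES, "Bike Buyer", 1)
for name, v, freq in result:
    print(f"{name} = {v}: {freq:.2f}")
```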
With the Attribute Discrimination tab, you can investigate the relationship between two discrete values of bike buying and other attribute values. Because the Bike Buyer attribute in the TM_NaiveBayes model has only two states, 1 and 0, you do not have to make any changes to the viewer.
In the viewer, you can see that people who do not own cars tend to buy bicycles, and people who own two cars tend not to buy bicycles.
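One simple way to think about discrimination is to compare how often an attribute value occurs in each of the two states. The sketch below scores each value as the difference between its share among buyers and its share among non-buyers. This is an assumed illustrative measure, not the scoring the viewer uses, and both the toy cases and the `discrimination` helper are hypothetical.

```python
from collections import Counter

# Made-up toy cases: (number_cars_owned, bike_buyer). Illustrative only.
CASES = [(0, 1), (0, 1), (1, 1), (2, 1), (1, 0), (2, 0), (2, 0), (2, 0)]


def discrimination(cases, state_a=1, state_b=0):
    """Score each attribute value as P(value | state_a) - P(value | state_b).

    Positive scores favor state_a, negative scores favor state_b.
    """
    by_state = {state_a: Counter(), state_b: Counter()}
    for value, state in cases:
        if state in by_state:
            by_state[state][value] += 1
    total_a = sum(by_state[state_a].values())
    total_b = sum(by_state[state_b].values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both states need at least one case")
    values = sorted(set(by_state[state_a]) | set(by_state[state_b]))
    return {
        v: by_state[state_a][v] / total_a - by_state[state_b][v] / total_b
        for v in values
    }


scores = discrimination(CASES)
for value, score in scores.items():
    favors = "1" if score > 0 else "0" if score < 0 else "neither"
    print(f"Cars={value}: {score:+.2f} (favors Bike Buyer = {favors})")
```

With this toy data, owning no cars scores in favor of state 1 and owning two cars scores in favor of state 0, which mirrors the pattern the tab shows.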