Exploring the Call Center Model (Intermediate Data Mining Tutorial)
Applies To: SQL Server 2016 Preview
Now that you have built the exploratory model, you can use it to learn more about your data by using the following tools provided in SQL Server Data Tools (SSDT).
Microsoft Neural Network Viewer: This viewer is available in the Mining Model Viewer tab of Data Mining Designer, and is designed to help you experiment with interactions in the data.
Microsoft Generic Content Tree Viewer: This standard viewer provides in-depth detail about the patterns and statistics discovered by the algorithm when it generated the model.
The viewer has three panes — Input, Output, and Variables.
By using the Output pane, you can select different values for the predictable attribute, or dependent variable. If your model contains multiple predictable attributes, you can select the attribute from the Output Attribute list.
The Variables pane compares the two outcomes that you chose in terms of contributing attributes, or variables. The colored bars visually represent how strongly the variable affects the target outcomes. You can also view lift scores for the variables. A lift score is calculated differently depending on which mining model type you are using, but generally tells you the improvement in the model when using this attribute for prediction.
The Input pane lets you add influencers to the model to try out various what-if scenarios.
In this initial model, you are interested in seeing how various factors affect the grade of service. To do this, you can select Service Grade from the list of output attributes, and then compare different levels of service by selecting ranges from the dropdown lists for Value 1 and Value 2.
To compare lowest and highest service grades
For Value 1, select the range with the lowest values. For example, the range 0-0-0.7 represents the lowest abandon rates, and therefore the best level of service.
The exact values in this range may vary depending on how you configured the model.
For Value 2, select the range with the highest values. For example, the range with the value >=0.12 represents the highest abandon rates, and therefore the worst service grade. In other words, 12% of the customers who phoned during this shift hung up before speaking to a representative.
The contents of the Variables pane are updated to compare attributes that contribute to the outcome values. Therefore, the left column shows you the attributes that are associated with the best grade of service, and the right column shows you the attributes associated with the worst grade of service.
In this model, it appears that Average Time Per Issue is an important factor. This variable indicates the average time that it takes for a call to be answered, regardless of call type.
To view and copy probability and lift scores for an attribute
In the Variables pane, pause the mouse over the colored bar in the first row.
This colored bar shows you how strongly Average Time Per Issue contributes toward the service grade. The tooltip shows an overall score, probabilities, and lift scores for each combination of a variable and a target outcome.
In the Variables pane, right-click any colored bar and select Copy.
In an Excel worksheet, right-click any cell and select Paste.
The report is pasted as an HTML table, and shows only the scores for each bar.
In a different Excel worksheet, right-click any cell and select Paste Special.
The report is pasted as text format, and includes the related statistics described in the next section.
Suppose that you are interested in looking at the effect of a particular factor, such as the shift, or number of operators. You can select a particular variable by using the Input pane, and the Variables pane is automatically updated to compare the two previously selected groups given the specified variable.
To review the effect on service grade by changing input attributes
In the Input pane, for attribute, select Shift.
For Value, select AM.
The Variables pane updates to show the impact on the model when the shift is AM. All other selections remain the same — you are still comparing the lowest and highest service grades.
For Value, select PM1.
The Variables pane updates to show the impact on the model when the shift changes.
In the Input pane, click the next blank row under Attribute, and select Calls. For Value, select the range that indicates the greatest number of calls.
A new input condition is added to the list. The Variables pane updates to show the impact on the model for a particular shift when the call volume is highest.
Continue to change the values for Shift and Calls to find any interesting correlations between shift, call volume, and service grade.
To clear the Input pane so that you can use different attributes, click Refresh viewer content.
Longer waiting times are a strong predictor of a high abandon rate, meaning a poor service grade. This may seem an obvious conclusion; however, the mining model provides you with some additional statistical data to help you interpret these trends.
Score: Value that indicates the overall importance of this variable for discriminating between outcomes. The higher the score, the stronger the effect the variable has on the outcome.
Probability of value 1: Percentage that represents the probability of this value for this outcome.
Probability of value 2: Percentage that represents the probability of this value for this outcome.
Lift for Value 1 and Lift for Value 2: Scores that represents the impact of using this particular variable for predicting the Value 1 and Value 2 outcomes. The higher the score, the better the variable is at predicting the outcomes.
The following table contains some example values for the top influencers. For example, the Probability of value 1 is 60.6% and Probability of value 2 is 8.30%, meaning that when the Average Time Per Issue was in the range of 44-70 minutes, 60.6% of cases were in the shift with the highest service grades (Value 1), and 8.30% of cases were in the shift with the worse service grades (Value 2).
From this information, you can draw some conclusions. Shorter call response time (the range of 44-70) strongly influences better service grade (the range 0.00-0.07). The score (92.35) tells you that this variable is very important.
However, as you look down the list of contributing factors, you see some other factors with effects that are more subtle and more difficult to interpret. For example, shift appears to influence service, but the lift scores and the relative probabilities indicate that shift is not a major factor.
Favors < 0.07
Favors >= 0.12
Average Time Per Issue
89.087 - 120.000
Average Time Per Issue
44.000 - 70.597
This viewer can be used to view even more detailed information created by the algorithm when the model is processed. The Microsoft Generic Content Tree Viewer represents the mining model as a series of nodes, wherein each node represents learned knowledge about the training data. This viewer can be used with all models, but the contents of the nodes are different depending in the model type.
For neural network models or logistic regression models, you might find the marginal statistics node particularly useful. This node contains derived statistics about the distribution of values in your data. This information can be useful if you want to get a summary of the data without having to write many T-SQL queries. The chart of binning values in the previous topic was derived from the marginal statistics node.
To obtain a summary of data values from the mining model
In Data Mining Designer, in the Mining Model Viewer tab, select <mining model name>.
From the Viewer list, select Microsoft Generic Content Tree Viewer.
The view of the mining model refreshes to show a node hierarchy in the left-hand pane and an HTML table in the right-hand pane.
In the Node Caption pane, click the node that has the name 10000000000000000.
The topmost node in any model is always the model root node. In a neural network or logistic regression model, the node immediately under that is the marginal statistics node.
In the Node Details pane, scroll down until you find the row, NODE_DISTRIBUTION.
Scroll down through the NODE_DISTRIBUTION table to view the distribution of values as calculated by the neural network algorithm.
To use this data in a report, you could select and then copy the information for specific rows, or you can use the following Data Mining Extensions (DMX) query to extract the complete contents of the node.
SELECT * FROM [Call Center EQ4].CONTENT WHERE NODE_NAME = '10000000000000000'
You can also use the node hierarchy and the details in the NODE_DISTRIBUTION table to traverse individual paths in the neural network and view statistics from the hidden layer. For more information, see Neural Network Model Query Examples.