Exploring the Call Center Model (Intermediate Data Mining Tutorial)
Now that you have built the exploratory model, you can use it to learn more about your data by using the following tools provided in Business Intelligence Development Studio.
Microsoft Neural Network Viewer: This viewer is available in the Mining Model Viewer tab of Data Mining Designer, and is designed to help you experiment with interactions in the data.
Microsoft Generic Content Tree Viewer: This standard viewer provides in-depth detail about the patterns and statistics discovered by the algorithm when it generated the model.
Data Source View designer: This viewer provides tables, charts, and PivotCharts and PivotTables to help you explore the source data. Returning to the source data can be very helpful for understanding the trends that are highlighted by the model.
The viewer has three panes — Input, Output, and Variables.
By using the Output pane, you can select different values for the predictable attribute, or dependent variable. If your model contains multiple predictable attributes, you can select the attribute from the Output Attribute list.
The Variables pane compares the two outcomes that you chose in terms of contributing attributes, or variables. The colored bars visually represent how strongly the variable affects the target outcomes. You can also view lift scores for the variables. A lift score is calculated differently depending on which mining model type you are using, but generally tells you the improvement in the model when using this attribute for prediction.
The Input pane lets you add influencers to the model to try out various what-if scenarios.
Using the Output Pane
In this initial model, you are interested in seeing how various factors affect the grade of service. To do this, you can select Service Grade from the list of output attributes, and then compare different levels of service by selecting ranges from the dropdown lists for Value 1 and Value 2.
To compare lowest and highest service grades
For Value 1, select the range with the lowest values. For example, a range such as 0.0-0.7 represents the lowest abandon rates, and therefore the best level of service.
The exact values in this range may vary depending on how you configured the model.
For Value 2, select the range with the highest values. For example, a range with values such as >=0.12 represents the highest abandon rates, and therefore the worst service grade.
The contents of the Variables pane are updated to compare attributes that contribute to the outcome values. Therefore, the left column shows you the attributes that are associated with the best grade of service, and the right column shows you the attributes associated with the worst grade of service.
Using the Variables Pane
After you have configured the results that you wish to compare by using the Output pane, you can use the Variables list to see how each variable contributes to that result. In this model, the top three factors differentiating between the highest and lowest service grades are Average Time Per Issue, Orders, and Day Of Week. The Variables pane also provides detailed statistics about the importance of each contributing variable.
To view and copy probability and lift scores for an attribute
In the Variables pane, pause the mouse over the colored bar in the first row.
This colored bar shows you how strongly Average Time Per Issue contributes toward the service grade. The tooltip shows an overall score, probabilities, and lift scores for each combination of a variable and a target outcome.
In the Variables pane, right-click any colored bar and select Copy.
In an Excel worksheet, right-click any cell and select Paste.
The report is pasted as an HTML table, and shows only the scores for each bar.
In a different Excel worksheet, right-click any cell and select Paste Special.
The report is pasted as text format, and includes the related statistics described in the next section.
Using the Input Pane
Suppose that you are interested in looking at the effect of a particular factor, such as the day of the week, the shift, or the number of orders. You can select a particular variable by using the Input pane, and the Variables pane is automatically updated to compare the two previously selected groups given the specified variable. You can use this information to explore factors that might be influencing the target outcome. For example, the initial review of the model indicated that service was better on Monday and Tuesday and tended to be poor on Thursday, so you decide to look at contributing factors.
To review the effect on service grade by changing input attributes
In the Input pane, for attribute, select Day Of Week.
For Value, select Thursday.
The Variables pane updates to show the influence of all other variables given that the day of the week is Thursday. All other selections remain the same — you are still comparing the lowest and highest service grades.
For Value, change the day of the week to Saturday.
The Variables pane updates to show the variables that most influence service grade on Saturdays.
In the Input pane, click the next blank row under Attribute, and select Shift. For Value, select AM.
A new input condition is added to the list. The Variables pane updates to show the impact on the model for a particular shift on the weekday already selected.
Continue to change the values for Shift and Day Of Week to find any interesting correlations between service grade and these variables.
To clear the Input pane so that you can use different attributes, click Refresh viewer content.
Interpreting the Statistics Provided in the Viewer
Longer waiting times are a strong predictor of a high abandon rate, meaning a poor service grade. This may seem an obvious conclusion; however, the mining model provides you with some additional statistical data to help you interpret these trends.
Score: Value that indicates the overall importance of this variable for discriminating between outcomes. The higher the score, the stronger the effect the variable has on the outcome.
Probability of value 1: Percentage that represents the probability of this value for this outcome.
Probability of value 2: Percentage that represents the probability of this value for this outcome.
Lift for Value 1 and Lift for Value 2: Scores that represents the impact of using this particular variable for predicting the Value 1 and Value 2 outcomes. The higher the score, the better the variable is at predicting the outcomes.
How should you interpret this information? Say, for example, that we are comparing the highest and lowest service grades, and looking at the influence of Average Time Per Issue. The model lists some ranges of values for Average Time Per Issue, and provides these statistics: Probability of value 1 is 60.6% and Probability of value 2 is 8.30%. This means that when the Average Time Per Issue was in the range of 44-70 minutes, 60.6% of cases were in the shift with the highest service grades (Value 1), and 8.30% of cases were in the shift with the worse service grades (Value 2).
From this information, you can draw some conclusions. Shorter call response time (the range of 44-70) strongly influences better service grade (the range 0.00-0.07). The score (92.35) tells you that this variable is very important.
However, other factors have effects that are more subtle and more difficult to interpret. For example, shift appears to influence service, but it varies by day of week; also, the lift scores and the relative probabilities indicate that shift is not an important factor.
The value of a neural network model is that it exposes very subtle relationships that would be difficult to detect by standard analytics. You must take some time to explore the model and interpret the relationships given your own understanding of the business problem.
This viewer can be used to view even more detailed information created by the algorithm when the model is processed. The Microsoft Generic Content Tree Viewer represents the mining model as a series of nodes, wherein each node represents learned knowledge about the training data. This viewer can be used with all models, but the contents of the nodes are different depending in the model type.
For neural network models or logistic regression models, you might find the marginal statistics node particularly useful. This node contains derived statistics about the distribution of values in your data. This information can be useful if you want to get a summary of the data without having to write many T-SQL queries. The chart of binning values in the previous topic was derived from the marginal statistics node.
To obtain a summary of data values from the mining model
In Data Mining Designer, in the Mining Model Viewer tab, select <mining model name>.
From the Viewer list, select Microsoft Generic Content Tree Viewer.
The view of the mining model refreshes to show a node hierarchy in the left-hand pane and an HTML table in the right-hand pane.
In the Node Caption pane, click the node that has the name 10000000000000000.
The topmost node in any model is always the model root node. In a neural network or logistic regression model, the node immediately under that is the marginal statistics node.
In the Node Details pane, scroll down until you find the row, NODE_DISTRIBUTION.
Scroll through the NODE_DISTRIBUTION table to view the distribution of values as calculated by the neural network algorithm.
To use this data in a report, you could select and then copy the information for specific rows, or you can use the following Data Mining Extensions (DMX) query to extract the complete contents of the node.
SELECT * FROM [Call Center Binned NN].CONTENT WHERE NODE_NAME = '10000000000000000'
You can also use the node hierarchy and the details in the NODE_DISTRIBUTION table to traverse individual paths in the neural network and view statistics from the hidden layer. For more information, see Querying a Neural Network Model (Analysis Services- Data Mining).
You use this viewer when building a mining structure or a cube, but this viewer also provides an assortment of tools that are useful for better understanding the source data. For example, if the model has found a trend that you do not fully understand, you might want to view individual rows in the underlying data or create summaries or charts that help you understand correlations.
This section provides one example of how you can use the Data Source View designer to explore trends revealed by the model without having to copy data to Excel or run multiple T-SQL queries against the data source.
In this scenario, you will create some charts that graphically show the correlation between response time and service grade that were found by the model.
To create a pivot chart that illustrates a trend from the mining model
In Solution Explorer, under Data Source Views, double-click Call Center.dsv.
On the tab, Call Center.dsv, right-click the table, FactCallCenter, and select Explore Data.
A new tab opens, titled Explore FactCallCenter Table. This tab contains four sections on different tabs: Table, Pivot Table, Chart, and Pivot Chart.
Click the Pivot Chart tab.
In the Chart Field List, select AverageTimePerIssue, and then drag it to the chart area, into the box titled Drop Category Fields Here.
Because the source data comes from a flat table, the hierarchy in the Chart Field List contains the same information at the hierarchy level and at the field level. However, if you are working with a cube or dimension, the hierarchy might contain multiple members. For example, a Date hierarchy might contain fields such as Quarter, Month, or Day. You can drag either the entire hierarchy or a single member of the hierarchy into the chart.
In the Chart Field List, find ServiceGrade, and drag it into the center of the chart area.
The chart updates to add a box titled Sum of ServiceGrade at the top of the chart.
In the toolbar, click the sigma icon, and select Average.
The title is updated to Average of ServiceGrade.
In the Chart Field List, select Shift, and then drag it to the chart area, into the box titled Drop Filter Fields Here. Drag WageType from the Chart Field List and drop it next to Shift.
Now you can filter by shift to see if the trend is any different depending on the shift, or if the day is a holiday or working day.
Select AverageTimePerIssue at the bottom of the chart, and drag it back to the Chart Field List.
In the Chart Field List, select AverageTimePerLevelTwoOperators, and drag it to the chart area, into the box titled Drop Category Fields Here.
The chart updates to show the correlation between an increase in operators and the average service grade. There appears to be no linear relationship. You can continue to experiment by dropping new fields into the chart, or by changing the chart type.
However, notice that these charts can typically display only a few attributes at a time, whereas the neural network algorithm analyzes many complex interactions among multiple inputs. Also, the neural network model detects many correlations that are too subtle for expression in a chart.
If you want to export the dependency network diagram of a neural network model, you can also use the Data Mining Templates for Visio. This free add-in for Visio 2007 provides sophisticated, customizable diagrams for data mining models that you can use for presentations or reports. For more information, see Data Mining Add-ins for Office 2007.