Predictor Schema Example

This example shows all the related tables and entries for a model configuration named PurchaseCfg1.

PredictorModelCfgs

ModelCfgName SiteName
PurchaseCfg1 Retail

This example assumes that the PurchaseCfg1 model configuration consists of three sources of data:

Demographic information from the User table (additional rows not shown).

UserID Age Gender Education
jilluser@roguecellars.com 23 F 19
barneyuser@arborshoes.com 49 M 3
jackuser@frogkick.com 28 M 13

Product purchase information from the Purchases table (additional rows not shown).

UserID SKU QTY
jilluser@roguecellars.com wine_fine_au_zin_19 3
jilluser@roguecellars.com tv_big_36_m3 1
jilluser@roguecellars.com pumps_bl_6 1

Ad click information from the AdClicks table (additional rows not shown). A Click value of 1 means the ad was seen but not clicked. A Click value of 2 means the ad was seen and clicked.

UserID Ad Click
jilluser@roguecellars.com www.thewinecellar.com/ad_zin12 1
jilluser@roguecellars.com www.thewinecellar.com/ad_cab3 2
jilluser@roguecellars.com www.arborshoes.com/ad4c 1
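
As a minimal sketch of how the Click encoding above could be read, the following Python snippet separates ads that were seen from ads that were clicked. The constant names and the summarize_clicks helper are hypothetical and are not part of the Predictor schema; only the three rows and the meaning of the values 1 and 2 come from this example.

```python
# Rows copied from the AdClicks example above: (UserID, Ad, Click).
AD_CLICKS = [
    ("jilluser@roguecellars.com", "www.thewinecellar.com/ad_zin12", 1),
    ("jilluser@roguecellars.com", "www.thewinecellar.com/ad_cab3", 2),
    ("jilluser@roguecellars.com", "www.arborshoes.com/ad4c", 1),
]

# Hypothetical names for the two Click values described in the text.
SEEN_NOT_CLICKED = 1  # the ad was seen but not clicked
SEEN_AND_CLICKED = 2  # the ad was seen and clicked


def summarize_clicks(rows):
    """Return (ads_seen, ads_clicked) for a list of (UserID, Ad, Click) rows."""
    seen = [ad for _, ad, click in rows
            if click in (SEEN_NOT_CLICKED, SEEN_AND_CLICKED)]
    clicked = [ad for _, ad, click in rows if click == SEEN_AND_CLICKED]
    return seen, clicked


seen, clicked = summarize_clicks(AD_CLICKS)
print(len(seen), "ads seen;", len(clicked), "clicked:", clicked)
```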

PredictorDataTables

ModelCfgName TableName Type CaseColumn PivotColumn AggregateColumn AggregateType
PurchaseCfg1 User 0 UserID <NULL> <NULL> <NULL>
PurchaseCfg1 Purchases 1 UserID SKU QTY 0
PurchaseCfg1 AdClicks 1 UserID Ad Click 1
PurchaseCfg1 PurchaseCfg1_Attributes 2 "N/A" <NULL> <NULL> <NULL>

When data from the three tables (User, Purchases, and AdClicks) is combined for analysis, a left join is performed on the UserID column with User as the master table. That is, each user in the User table defines a case. Users who appear in the Purchases or AdClicks table but not in the User table are not included in the analysis. In general, if there is a dense table (there can be at most one), it is the master table in the left join. If all tables are sparse, the first table listed is the master table. Case column names do not have to be identical across tables, but their values must correspond.
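
The join rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Predictor's implementation: it assumes that Type 0 marks the dense table and Type 1 marks sparse tables (as the example rows suggest), it skips the Type 2 attributes table, and it makes no attempt to aggregate values because the meaning of AggregateType is not described in this example. The helper names and the stranger@example.com row are hypothetical.

```python
DENSE, SPARSE = 0, 1  # assumed meaning of the Type column

DATA_TABLES = [
    {"TableName": "User", "Type": 0},
    {"TableName": "Purchases", "Type": 1},
    {"TableName": "AdClicks", "Type": 1},
    {"TableName": "PurchaseCfg1_Attributes", "Type": 2},
]

USER = [
    {"UserID": "jilluser@roguecellars.com", "Age": 23, "Gender": "F", "Education": 19},
    {"UserID": "barneyuser@arborshoes.com", "Age": 49, "Gender": "M", "Education": 3},
]
PURCHASES = [
    {"UserID": "jilluser@roguecellars.com", "SKU": "wine_fine_au_zin_19", "QTY": 3},
    {"UserID": "jilluser@roguecellars.com", "SKU": "tv_big_36_m3", "QTY": 1},
    # Hypothetical row: this user is not in USER, so the join drops it.
    {"UserID": "stranger@example.com", "SKU": "pumps_bl_6", "QTY": 1},
]


def choose_master(data_tables):
    """Pick the dense table if one exists, else the first table listed."""
    candidates = [t for t in data_tables if t["Type"] in (DENSE, SPARSE)]
    for table in candidates:
        if table["Type"] == DENSE:
            return table["TableName"]
    return candidates[0]["TableName"]


def left_join_cases(master_rows, master_key, sparse_tables):
    """Build one case per master row and attach matching sparse values.

    sparse_tables is a list of (rows, case_column, pivot_column, value_column).
    Values are attached as-is; no aggregation is performed in this sketch.
    """
    cases = {row[master_key]: dict(row) for row in master_rows}
    for rows, case_col, pivot_col, value_col in sparse_tables:
        for row in rows:
            case = cases.get(row[case_col])
            if case is None:
                continue  # no matching master row: excluded from the analysis
            case[f"{pivot_col}:{row[pivot_col]}"] = row[value_col]
    return cases


master = choose_master(DATA_TABLES)
cases = left_join_cases(USER, "UserID", [(PURCHASES, "UserID", "SKU", "QTY")])
print(master, sorted(cases))
```

In this sketch a user with no purchases, such as barneyuser@arborshoes.com, still appears as a case, while the hypothetical user who is missing from the master table does not.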

The first row in the following attributes table gives all of the SKUs a Discrete distribution that is Modeled As Binary, and specifies that SKUs are used to predict other properties but are not themselves predicted. The second row specifies that the Age property is likewise used to predict other properties but is not predicted, and gives it a Continuous, Lognormal distribution that is not Modeled As Binary. Because neither property is predicted, no Decision Tree that predicts the Age or SKU properties will be built. The third row gives all of the ads a distribution of type Advertisement, and specifies that ads are predicted but are not used for prediction.

The following table is an example of a model configuration that will predict ads based on SKU and Age.

PredictorAttributes_PurchaseCfg1

PropID ParentID Name TableName ColumnName Distribution UseToPredict Predict
1 -1 SKU <NULL> <NULL> 4 True False
2 -1 Age <NULL> <NULL> 2 True False
3 -1 AD <NULL> <NULL> 8 False True
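
To make the roles in the attributes table concrete, the following Python sketch separates the properties used as inputs from the properties that are predicted. The rows mirror the table above (with the NULL columns omitted); the split_roles helper is hypothetical and only illustrates how the UseToPredict and Predict flags might be read.

```python
# Rows copied from PredictorAttributes_PurchaseCfg1 (NULL columns omitted).
ATTRIBUTES = [
    {"PropID": 1, "Name": "SKU", "Distribution": 4, "UseToPredict": True, "Predict": False},
    {"PropID": 2, "Name": "Age", "Distribution": 2, "UseToPredict": True, "Predict": False},
    {"PropID": 3, "Name": "AD", "Distribution": 8, "UseToPredict": False, "Predict": True},
]


def split_roles(attributes):
    """Return (input_names, predicted_names) based on the two flags."""
    inputs = [a["Name"] for a in attributes if a["UseToPredict"]]
    predicted = [a["Name"] for a in attributes if a["Predict"]]
    return inputs, predicted


inputs, predicted = split_roles(ATTRIBUTES)
print("used to predict:", inputs)
print("predicted:", predicted)
```

With these rows, SKU and Age end up as inputs and AD is the only predicted property, which matches the description of a configuration that predicts ads based on SKU and Age.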

The following table shows a model built from the PurchaseCfg1 model configuration.

PredictorModels

ModelName ModelCfgName ModelType DateCreated BuildTime MeasuredAccuracySampleFraction MeasuredAccuracyMaxPredictions K
Purchase1 PurchaseCfg1 0 getdate() 30000 .05 10 <NULL>

(PredictorModels continued)

MaxBufferSize InputAttributeFraction OutputAttributeFraction SampleSize ComplexityPenalty MinimumCasesToSplit DataFitScore RecommendScore Data
<NULL> 1 1 -1 100 5 11.08 0.3946 <Binary>

Copyright © 2005 Microsoft Corporation.
All rights reserved.