Attributes Table
This table indicates the attributes of each high-level property used in the model, including the distribution type for the property and whether the property is used to build the model, is predicted by the model, or both. For example, SKU and Age are high-level properties, while QTY(sku1) is a low-level property. Entries in the attributes table are required only when the default behavior needs to be overridden. By default, each property is both used for prediction and predicted, and its distribution is Autodetect. A blank attributes table is created if one is not specified.
Note
- There is no table named Attributes. Each model configuration specifies the name of its attribute table as an entry in the PredictorDataTables table.
Column Name | Type | Description | Required? |
PropID | DBTYPE_UI4 | Unique key, system generated. Leave blank. | Yes |
ParentID | DBTYPE_UI4 | Must be -1. | Yes |
Name | DBTYPE_WSTR | Name of property if referring to a Dense table, or name of Pivot Column if referring to a Sparse table. The capitalization of this string determines the capitalization of the property in the model. | Yes |
TableName | DBTYPE_WSTR | Must be Null. | No |
ColumnName | DBTYPE_WSTR | Must be Null. | No |
Distribution | DBTYPE_UI2 | Distribution type (B = Model As Binary, NB = Not Model As Binary). NULL implies Autodetect. | No |
Predict | DBTYPE_BOOL | Indicates whether this is an output property. NULL implies True. | No |
UseToPredict | DBTYPE_BOOL | Indicates whether this is an input property. NULL implies True. | No |
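The defaults described for the table can be illustrated with a minimal sketch. It assumes a row is held as a Python dictionary in which None stands for a database NULL. The function name resolve_defaults and the string label "Autodetect" are hypothetical: the real Distribution column is a DBTYPE_UI2 code whose numeric values are not listed here.

```python
def resolve_defaults(row):
    """Return a copy of an attributes-table row with NULLs replaced by defaults.

    None stands for a database NULL. The "Autodetect" label is a placeholder
    (an assumption) for the actual numeric Distribution code.
    """
    resolved = dict(row)
    if resolved.get("Distribution") is None:
        resolved["Distribution"] = "Autodetect"  # NULL implies Autodetect
    if resolved.get("Predict") is None:
        resolved["Predict"] = True               # NULL implies True
    if resolved.get("UseToPredict") is None:
        resolved["UseToPredict"] = True          # NULL implies True
    return resolved


# Two illustrative rows: Age relies entirely on the defaults, while SKU
# overrides Predict so that it is used as an input only.
rows = [
    {"ParentID": -1, "Name": "Age"},
    {"ParentID": -1, "Name": "SKU", "Predict": False},
]
resolved_rows = [resolve_defaults(r) for r in rows]
```

In this sketch, only the SKU row needs an explicit entry, because it is the only one that overrides a default.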
Distribution Attribute
- Discrete vs. Continuous. A discrete property has values with no specific relationship between sequential values. For example, the two-letter state abbreviations are discrete. All non-numeric properties are treated as discrete. Numerical properties may or may not be discrete.
- Normal vs. LogNormal. Certain continuous data, such as Income, is better represented as a Normal (Gaussian) distribution, while other data, such as the number of products purchased, better fits a LogNormal distribution, which is skewed towards 0. A property with a LogNormal distribution is one whose logarithm has a Normal distribution.
- Autodetect. An algorithm can be used to detect the above attributes automatically. For discrete properties, the algorithm also eliminates properties that have only one distinct value or too many distinct values (over 500) to be useful. Note that "missing" counts as a distinct value.
- Model As Binary. When the Model As Binary attribute of a property is set to True, the model is only concerned with whether the property exists, not the value of the property. For example, consider the case of a property that corresponds to the number of products purchased. It may be more useful to model whether the product was purchased or not, rather than modeling the quantity purchased.
- Advertisement. This distribution type applies to properties corresponding to online advertisements. It is most often used in prediction models for the ad pipeline. Properties with this distribution must have three possible states: missing (not seen), seen but not clicked (usually indicated by the value 1), and seen and clicked (usually indicated by the value 2). When a property has this distribution, no prediction is generated for the missing state. This distribution type can be used only with prediction models (not segment models).
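Several of the distribution rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the product's actual algorithm: all function names are hypothetical, None stands for a missing value, and the ad state codes 1 and 2 follow the "usually indicated" convention mentioned in the text.

```python
import math

MAX_DISTINCT = 500  # threshold given in the Autodetect description


def is_useful_discrete(values):
    """Keep a discrete property only if it has more than one distinct value
    and no more than MAX_DISTINCT. None (missing) counts as a distinct value."""
    distinct = len(set(values))
    return 1 < distinct <= MAX_DISTINCT


def model_as_binary(values):
    """Model As Binary: record only whether the property exists, not its value."""
    return [v is not None for v in values]


def log_values(values):
    """For a LogNormal property, the logarithm of the value is Normal.
    Missing values are skipped; the remaining values must be positive."""
    return [math.log(v) for v in values if v is not None]


def ad_state(value):
    """Map an Advertisement property value to its state (assumed coding)."""
    states = {None: "not seen", 1: "seen, not clicked", 2: "seen and clicked"}
    return states[value]


# Quantity purchased per user; None means the user has no such property.
quantities = [3, None, 1, None, 12]
purchased = model_as_binary(quantities)
```

Treating a present value as True regardless of its size is one reading of "whether the property exists"; whether an explicit quantity of 0 should count as present is left open here.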
Copyright © 2005 Microsoft Corporation.
All rights reserved.