Creating a Targeted Mailing Mining Model Structure (Data Mining Tutorial)

The first step in creating a targeted mailing scenario is to use the Data Mining Wizard in Business Intelligence Development Studio to create a new mining structure and decision tree mining model.

For More Information

Data Mining Wizard, Data Mining Designer, Microsoft Decision Trees Algorithm

To create a mining structure for a targeted mailing scenario

  1. In Solution Explorer, right-click Mining Structures and select New Mining Structure.

    The Data Mining Wizard opens.

  2. On the Welcome to the Data Mining Wizard page, click Next.

  3. On the Select the Definition Method page, verify that From existing relational database or data warehouse is selected, and then click Next.

  4. On the Select the Data Mining Technique page, under Which data mining technique do you want to use?, select Microsoft Decision Trees.

    In this tutorial, you will create several models based on this initial mining structure. The wizard creates the first model together with the structure when you complete it, and that model is based on the Microsoft Decision Trees algorithm.

  5. Click Next.

  6. On the Select Data Source View page, notice that Adventure Works DW is selected by default. Click Browse to view the tables in the data source view, and then click Close to return to the wizard.

  7. Click Next.

  8. On the Specify Table Types page, select the check box in the Case column next to the vTargetMail table, and then click Next.

  9. On the Specify the Training Data page, verify that the check box in the Key column is selected next to the CustomerKey column.

    If the source table from the data source view indicates a key, the Data Mining Wizard automatically chooses that column as a key for the model.

  10. Select the Input and Predictable check boxes next to the BikeBuyer column.

    When you indicate that a column is predictable, the Suggest button is enabled. Clicking Suggest opens the Suggest Related Columns dialog box, which lists the columns that are most closely related to the predictable column.

    The Suggest Related Columns dialog box orders the attributes by their correlation with the predictable attribute. Columns with a correlation score greater than 0.05 are automatically selected for inclusion in the model. If you agree with the suggestions, click OK, which marks the selected columns as input columns in the wizard. For this tutorial, ignore the suggestions by clicking Cancel.

  11. Select the Input check boxes next to the following columns:

    • Age
    • CommuteDistance
    • EnglishEducation
    • EnglishOccupation
    • FirstName
    • Gender
    • GeographyKey
    • HouseOwnerFlag
    • LastName
    • MaritalStatus
    • NumberCarsOwned
    • NumberChildrenAtHome
    • Region
    • TotalChildren
    • YearlyIncome

    You can select multiple columns by using the SHIFT key.

  12. Click Next.

  13. On the Specify Columns' Content and Data Type page, click Detect.

    The wizard runs an algorithm that samples the numeric data and determines whether each numeric column contains continuous or discrete values. For example, a column can store salary information as actual salary values, which are continuous, or as integers that represent encoded salary ranges (such as 1 = less than $25,000; 2 = $25,000 to $50,000), which are discrete.

  14. After clicking Detect, make sure that the entries in the Content Type and Data Type columns have the settings listed in the following table.

    Column                Content Type  Data Type
    Age                   Continuous    Long
    BikeBuyer             Discrete      Long
    CommuteDistance       Discrete      Text
    CustomerKey           Key           Long
    EnglishEducation      Discrete      Text
    EnglishOccupation     Discrete      Text
    FirstName             Discrete      Text
    Gender                Discrete      Text
    GeographyKey          Discrete      Text
    HouseOwnerFlag        Discrete      Text
    LastName              Discrete      Text
    MaritalStatus         Discrete      Text
    NumberCarsOwned       Discrete      Long
    NumberChildrenAtHome  Discrete      Long
    Region                Discrete      Text
    TotalChildren         Discrete      Long
    YearlyIncome          Continuous    Double

Note

Based solely on the numeric values, the detection algorithm suggests that the GeographyKey column contains continuous numbers. However, numbers that act as identifiers, such as keys and postal codes, should typically be treated as discrete rather than continuous, because mathematical operations on them are meaningless.
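The continuous-versus-discrete distinction from step 13, and the GeographyKey caveat in the note above, can be illustrated with a small Python sketch. This is not the algorithm that the Detect button runs; the function name guess_content_type, the max_distinct threshold, and the sample data are all hypothetical, chosen only to show why a purely numeric heuristic can mislabel an identifier column.

```python
from typing import Iterable


def guess_content_type(values: Iterable[float], max_distinct: int = 20) -> str:
    """Guess "Discrete" or "Continuous" for a sample of numeric values.

    Hypothetical rule (an assumption, not the wizard's logic): a column
    with only a few distinct values is treated as a set of codes, and
    anything with more distinct values is treated as a measurement.
    """
    distinct = set(values)
    return "Discrete" if len(distinct) <= max_distinct else "Continuous"


# Actual salary values: nearly every row differs.
salaries = [23_500.0 + 137.25 * i for i in range(200)]

# Encoded salary ranges: 1 = less than $25,000, 2 = $25,000 to $50,000, ...
salary_codes = [1 + (i % 4) for i in range(200)]

# Surrogate keys are numeric and highly varied, so a value-based
# heuristic cannot tell them apart from a real measurement.
geography_keys = list(range(1, 201))

print(guess_content_type(salaries))        # Continuous
print(guess_content_type(salary_codes))    # Discrete
print(guess_content_type(geography_keys))  # Continuous
```

The last result is the situation the note describes: the values alone look continuous, so the content type has to be corrected by someone who knows that the column is an identifier.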

  15. Click Next.

  16. On the Completing the Wizard page, in Mining structure name, type Targeted Mailing.

  17. In Mining model name, type TM_Decision_Tree.

  18. Select the Allow drill through check box.

  19. Click Finish.
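To summarize what the procedure above defines, the following Python sketch renders the column settings from step 14 as a pair of DMX statements (CREATE MINING STRUCTURE and ALTER MINING STRUCTURE ... ADD MINING MODEL). It is only an approximation under stated assumptions: the wizard saves an Analysis Services project object rather than running DMX, the column names are copied from the table as shown and may not match the names the wizard generates, and the helper names structure_dmx and model_dmx are hypothetical.

```python
# (column, content type, data type) as listed in step 14, key column first.
COLUMNS = [
    ("CustomerKey", "Key", "Long"),
    ("Age", "Continuous", "Long"),
    ("BikeBuyer", "Discrete", "Long"),
    ("CommuteDistance", "Discrete", "Text"),
    ("EnglishEducation", "Discrete", "Text"),
    ("EnglishOccupation", "Discrete", "Text"),
    ("FirstName", "Discrete", "Text"),
    ("Gender", "Discrete", "Text"),
    ("GeographyKey", "Discrete", "Text"),
    ("HouseOwnerFlag", "Discrete", "Text"),
    ("LastName", "Discrete", "Text"),
    ("MaritalStatus", "Discrete", "Text"),
    ("NumberCarsOwned", "Discrete", "Long"),
    ("NumberChildrenAtHome", "Discrete", "Long"),
    ("Region", "Discrete", "Text"),
    ("TotalChildren", "Discrete", "Long"),
    ("YearlyIncome", "Continuous", "Double"),
]
PREDICTABLE = "BikeBuyer"  # marked Input and Predictable in step 10


def structure_dmx(name, columns):
    """Render a CREATE MINING STRUCTURE statement: [column] TYPE CONTENT."""
    lines = [
        f"    [{col}] {dtype.upper()} {ctype.upper()}"
        for col, ctype, dtype in columns
    ]
    return f"CREATE MINING STRUCTURE [{name}] (\n" + ",\n".join(lines) + "\n)"


def model_dmx(structure, model, columns, predictable):
    """Render ADD MINING MODEL; PREDICT marks an input-and-predictable column."""
    lines = [
        f"    [{col}]" + (" PREDICT" if col == predictable else "")
        for col, _, _ in columns
    ]
    return (
        f"ALTER MINING STRUCTURE [{structure}]\n"
        f"ADD MINING MODEL [{model}] (\n"
        + ",\n".join(lines)
        + "\n) USING Microsoft_Decision_Trees\n"
        "WITH DRILLTHROUGH"  # corresponds to the Allow drill through check box
    )


structure_stmt = structure_dmx("Targeted Mailing", COLUMNS)
model_stmt = model_dmx("Targeted Mailing", "TM_Decision_Tree", COLUMNS, PREDICTABLE)
print(structure_stmt)
print()
print(model_stmt)
```

Keeping the column list as data makes it easy to compare against the Content Type and Data Type settings in the wizard before clicking Finish.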

Next Task in Lesson

Modifying the Targeted Mailing Model (Data Mining Tutorial)