Building Data Mining Models

Note

  This feature will be removed in the next version of Microsoft SQL Server. Do not use this feature in new development work, and modify applications that currently use this feature as soon as possible.

To create a new relational or OLAP data mining model programmatically using Decision Support Objects (DSO), follow these basic steps:

  1. Connect to the target Analysis server and select a database from the MDStores collection of the Server object.

  2. Create a new data mining model object using the MiningModels collection of the Database object, with the appropriate SubClassType for the relational or OLAP data mining model.

  3. If needed, create and assign mining model roles to the new relational or OLAP mining model object.

  4. Set the properties needed for the relational or OLAP mining model object. The following table displays the differences between the needed properties for relational and OLAP data mining models.

    Property

    OLAP mining model

    Relational mining model

    CaseDimension

    Defines the case dimension used by the data mining model.

    Not used.

    CaseLevel

    Defines the case level within the case dimension used by the data mining model. A read-only property, it identifies the lowest level in the dimension whose data mining model column has its IsDisabled property set to False.

    Not used.

    Description

    Contains a user-friendly description of the data mining model.

    Contains a user-friendly description of the data mining model.

    FromClause

    Not used.

    Defines the case table, in the form of a FROM clause, used by the data mining model.

    JoinClause

    Not used.

    Defines any supporting tables, in the form of a JOIN clause, used by the data mining model.

    MiningAlgorithm

    Defines the data mining algorithm provider, such as Microsoft_Decision_Trees or Microsoft_Clustering, used by both types of data mining models.

    Defines the data mining algorithm provider, such as Microsoft_Decision_Trees or Microsoft_Clustering, used by both types of data mining models.

    SourceCube

    Defines the OLAP cube used by the data mining model for training data.

    Not used.

    SubClassType

    Is set to sbclsOlap when the MiningModel object is created.

    Is set to sbclsRelational when the MiningModel object is created.

    TrainingQuery

    Defines the Multidimensional Expressions (MDX) query used to insert training data into the data mining model. In most instances, this property is left blank; DSO will construct an appropriate training query if this property is not used.

    Defines the Multidimensional Expressions (MDX) query used to insert training data into the data mining model. In most instances, this property is left blank; DSO will construct an appropriate training query if this property is not used.

  5. Create a new data mining model column in the Columns collection of the MiningModel object.

  6. Set the properties needed for the new data mining model column. The following table displays the differences between the needed column properties for relational and OLAP data mining models.

    Property

    OLAP mining model

    Relational mining model

    DataType

    Defines the expected data type of the data mining column.

    Defines the expected data type of the data mining column.

    Description

    Contains a user-friendly description of the data mining model column.

    Contains a user-friendly description of the data mining model column.

    ContentType

    Should contain a value from the SUPPORTED_CONTENT_TYPES column of the MINING_SERVICES schema rowset. For example, if the column contained text data that corresponded to income ranges for customers, the ContentType property would be set to DISCRETE to reflect the discrete valuations of the data. If, on the other hand, the column contained actual salaries, the property would be set to either CONTINUOUS or DISCRETIZED, depending on the capabilities of the data mining algorithm provider.

    Should contain a value from the SUPPORTED_CONTENT_TYPES column of the MINING_SERVICES schema rowset. For example, if the column contained text data that corresponded to income ranges for customers, the ContentType property would be set to DISCRETE to reflect the discrete valuations of the data. If, on the other hand, the column contained actual salaries, the property would be set to either CONTINUOUS or DISCRETIZED, depending on the capabilities of the data mining algorithm provider.

    IsKey

    Not used. This property is read-only, and is automatically set to True for the lowest enabled level in the case dimension specified in the CaseDimension property of the mining model.

    Defines the key columns for the data mining model. Set to True to specify a key column in the case set.

    IsInput

    Defines the input columns for the data mining model. For a set of related columns, changing the IsInput property for one of the columns automatically changes the property for the other related columns.

    Defines the input columns for the data mining model. For a set of related columns, changing the IsInput property for one of the columns automatically changes the property for the other related columns.

    IsPredictable

    Defines the predictable columns for the data mining model. A column can have both IsInput and IsPredictable set to True. For a set of related columns, changing the IsPredictable property for one of the columns automatically changes the property for the other related columns.

    Defines the predictable columns for the data mining model. A column can have both IsInput and IsPredictable set to True. For a set of related columns, changing the IsPredictable property for one of the columns automatically changes the property for the other related columns.

    IsDisabled

    Defines the columns to be used in analysis for the data mining model.

    Defines the columns to be used in analysis for the data mining model.

    Distribution

    This property is used to optimize the mining model by giving the mining model algorithm some indication of the statistical nature of the data in the column. The values for this property should come from the SUPPORTED_DISTRIBUTION_FLAGS of the MINING_SERVICES schema rowset.

    This property is used to optimize the mining model by giving the mining model algorithm some indication of the statistical nature of the data in the column. The values for this property should come from the SUPPORTED_DISTRIBUTION_FLAGS of the MINING_SERVICES schema rowset.

    SourceOlapObject

    The value of this property is an object within the OLAP cube. For instance, this property might contain a DSO level object or a DSO member property object.

    Not used.

    SourceColumn

    Not used.

    The value of this property is the fully qualified name of a field in the case or supporting table for the data mining model.

    There are other differences in how column properties are handled between OLAP and relational models. For more information about data mining model columns, see clsColumn.

  7. Save the mining model object using the Update method.

    To optionally train the newly created data mining model, the following additional steps should be used. Although a new data mining model does not need to be processed, the data mining model cannot be browsed until processing is completed.

  8. Lock the mining model object using the olapLockProcess flag.

  9. Train the mining model object using the Process method.

  10. Unlock the mining model object.

Locking the data mining model during processing prevents access by other users until the mining model is unlocked, improving performance during the training of the mining model and ensuring that repository integrity is maintained.

Creating an OLAP Data Mining Model

The following code example creates an OLAP data mining model, following the steps outlined earlier in this topic, that attempts to predict the salary range of a customer in the Sales cube of the FoodMart 2000 database based on gender, marital status and education.

Unlike the process of creating a relational data mining model, the column structure is directly drawn from the source cube specified in the SourceCube property of the mining model object. To determine which columns are to be processed by the data mining model, the column objects stored in the Columns collection of the mining model object can be changed. The IsDisabled property determines which columns are to be used as part of the analysis, while the IsInput and IsPredictive properties of each column object can be set to determine the behavior of the column, including whether it will serve as an input, predictive, or input and predictive column in the data mining model.

Because the structure of the OLAP data mining model is drawn from the structure of the source cube, all source OLAP objects used by the mining model must be visible to the mining model. The following requirements must be met:

  • The source cube must be visible.

  • The case dimension must be visible.

  • The SourceOlapObject property for each data mining column must contain a visible source OLAP object.

This code example creates and processes an OLAP data mining model named CustSalesModelOLAP, based on the Sales cube of the FoodMart 2000 database, that analyzes salaries for customers based on gender, marital status and education:

Public Sub CreateOLAPMiningModel()
   Dim dsoServer As New DSO.Server
   Dim dsoDB As DSO.MDStore
   Dim dsoDMM As DSO.MiningModel
   Dim dsoColumn As DSO.Column
   Dim dsoRole As DSO.Role

   ' Constants used for DataType property
   ' of the DSO.Column object.
   ' Note that these constants are identical to
   ' those used in ADO in the DataTypeEnum enumeration.
   Const adInteger = 3
   Const adWChar = 130
   
   ' Connect to the server on this computer.
   dsoServer.Connect "LocalHost"
   
   ' Select the FoodMart 2000 database.
   Set dsoDB = dsoServer.MDStores("FoodMart 2000")
   
   ' Check for the existence of the model on this computer.
   If Not dsoDB.MiningModels("CustSalesModelOLAP") Is Nothing Then
       ' If this model exists, delete it.
       dsoDB.MiningModels.Remove "CustSalesModelOLAP"
   End If
   
   ' Create a new OLAP mining model
   ' called CustSalesModelOLAP.
   Set dsoDMM = dsoDB.MiningModels.AddNew("CustSalesModelOLAP", _
      sbclsOlap)
   
   ' Create a new mining model role called All Users
   Set dsoRole = dsoDMM.Roles.AddNew("All Users")
   
   ' Set the needed properties for the new mining model.
   With dsoDMM
      .DataSources.AddNew "FoodMart", sbclsRegular
      ' Set the description of the model.
      .Description = "Analyzes the salaries " & _
         "of customers"
      ' Select the algorithm provider for the model.
      .MiningAlgorithm = "Microsoft_Decision_Trees"
      ' Set the source cube for the model to the Sales cube.
      .SourceCube = "Sales"
      ' Set the case dimension for the model to the
      ' Customers shared dimension.
      .CaseDimension = "Customers"
      ' Let DSO define the training query.
      .TrainingQuery = ""
      ' Let DSO add the cube structure to the
      ' data mining model structure, automatically
      ' creating needed data mining model columns.
      .Update
   End With
   
   ' Set the column properties pertinent to the new model.
   ' Note that, when columns are automatically added to
   ' the model in this fashion, the are disabled. You
   ' must choose which columns are to be enabled
   ' before you can process the
   ' model, and at least one column must be enabled,
   ' or an error will result.
   
   ' Enable the Name column. As this column is the
   ' lowest enabled level on the Customers case dimension,
   ' it becomes the case level for the data mining model.
   Set dsoColumn = dsoDMM.Columns("Name")
   dsoColumn.IsDisabled = False
   
   ' Enable the Gender column as an input column.
   Set dsoColumn = dsoDMM.Columns("Gender")
   dsoColumn.IsInput = True
   dsoColumn.IsDisabled = False
   
   ' Enable the Marital Status column as an input column.
   Set dsoColumn = dsoDMM.Columns("Marital Status")
   dsoColumn.IsInput = True
   dsoColumn.IsDisabled = False
   
   ' Enable the Education column as an input column.
   Set dsoColumn = dsoDMM.Columns("Education")
   dsoColumn.IsInput = True
   dsoColumn.IsDisabled = False
   
   ' Enable the Unit Sales column as a predictable column.
   Set dsoColumn = dsoDMM.Columns("Yearly Income")
   dsoColumn.IsPredictable = True
   dsoColumn.IsDisabled = False
   
   ' Save the data mining model.
   With dsoDMM
      ' Set the LastUpdated property of the new mining model
      ' to the present date and time.
      .LastUpdated = Now
      ' Save the model definition.
      .Update
   End With
   
   ' Process the data mining model.
   With dsoDMM
      ' Lock the mining model for processing
      .LockObject olapLockProcess, _
         "Processing the data mining model in sample code"
      ' Fully process the new mining model.
      ' This may take up to several minutes.
      .Process processFull
      ' Unlock the model after processing is complete.
      .UnlockObject
   End With
   
   ' Clean up objects and close server connection
   Set dsoRole = Nothing
   Set dsoColumn = Nothing
   Set dsoDMM = Nothing
   
   dsoServer.CloseServer
   Set dsoServer = Nothing
   
End Sub

Creating a Relational Data Mining Model

The process of creating a relational data mining model is similar to the process of creating an OLAP data mining model, covered earlier in this topic. The major difference between the two, other than the type of data the model will process, is the handling of data mining model columns. Unlike an OLAP data mining model, a relational data mining model does not draw its structure directly from its case and supporting tables. Instead, each column is manually created and defined. The following code example demonstrates the difference in creating a relational data mining model by creating a mining model that duplicates the analysis of the OLAP data mining model created earlier.

This duplication is by design, to give you a direct comparison in structural differences between an OLAP and a relational data mining model.

The following code example creates the CustSalesModelRel relational data mining model that analyzes salaries for customers in the Customer table based on gender, marital status, and education in the FoodMart 2000 database:

Public Sub CreateRelMiningModel()
   Dim dsoServer As New DSO.Server
   Dim dsoDB As DSO.MDStore
   Dim dsoDS As DSO.DataSource
   Dim dsoDMM As DSO.MiningModel
   Dim dsoColumn As DSO.Column
   Dim dsoRole As DSO.Role

   Dim strLQuote As String, strRQuote As String
   Dim strFromClause As String
   
   ' Constants used for DataType property
   ' of the DSO.Column object.
   ' Note that these constants are identical to
   ' those used in ADO in the DataTypeEnum enumeration.
   Const adInteger = 3
   Const adWChar = 130
   
   ' Connect to the server on this computer.
   dsoServer.Connect "LocalHost"
   
   ' Select the FoodMart 2000 database.
   Set dsoDB = dsoServer.MDStores("FoodMart 2000")
   
   ' Retrieve the open and close quote characters for
   ' the FoodMart data source.
   strLQuote = dsoDB.DataSources("FoodMart").OpenQuoteChar
   strRQuote = dsoDB.DataSources("FoodMart").CloseQuoteChar
   
   ' The Customer table is the fact table for this
   ' relational data mining model; this variable will
   ' make it easier to understand the code that
   ' follows.
   strFromClause = strLQuote & "customer" & strRQuote
   
   ' Check for the existence of the model on this computer.
   If Not dsoDB.MiningModels("CustSalesModelRel") Is Nothing Then
       ' If this model exists, delete it.
       dsoDB.MiningModels.Remove "CustSalesModelRel"
   End If
   
   ' Create a new relational mining model
   ' called CustSalesModelRel.
   Set dsoDMM = dsoDB.MiningModels.AddNew("CustSalesModelRel", _
      sbclsRelational)
   
   ' Create a new mining model role called All Users
   Set dsoRole = dsoDMM.Roles.AddNew("All Users")
   
   ' Set the needed properties for the new mining model.
   With dsoDMM
      .DataSources.AddNew "FoodMart", sbclsRegular
      ' Set the description of the model.
      .Description = "Analyzes the salaries " & _
         "of customers"
      ' Set the case table for the model to the
      ' Customer table.
      .FromClause = strFromClause
      ' Select the algorithm provider for the model.
      .MiningAlgorithm = "Microsoft_Decision_Trees"
      ' Let DSO define the training query.
      .TrainingQuery = ""
      ' Save the existing structure.
      .Update
   End With
   
   ' Create the columns pertinent to the new model.
   
   ' Create the CustomerID column as a key column.
   Set dsoColumn = dsoDMM.Columns.AddNew("CustomerID", _
      sbclsRegular)
   ' Set the column properties for the new column.
   With dsoColumn
      ' Set the source field from the case table for
      ' the column.
      .SourceColumn = strFromClause & "." & strLQuote & _
         "customer_id" & strRQuote
      .DataType = adInteger
      .IsKey = True
      .IsDisabled = False
   End With
   
   ' Create the Gender column as an attribute column.
   Set dsoColumn = dsoDMM.Columns.AddNew("Gender", _
      sbclsRegular)
   With dsoColumn
      .ContentType = "DISCRETE"
      .SourceColumn = strFromClause & "." & strLQuote & _
         "gender" & strRQuote
      .DataType = adWChar
      .IsDisabled = False
   End With
   
   ' Create the Marital Status column as an attribute column.
   Set dsoColumn = dsoDMM.Columns.AddNew("Marital Status", _
      sbclsRegular)
   With dsoColumn
      .ContentType = "DISCRETE"
      .SourceColumn = strFromClause & "." & strLQuote & _
         "marital_status" & strRQuote
      .DataType = adWChar
      .IsDisabled = False
   End With
   
   ' Create the Education column as an attribute column.
   Set dsoColumn = dsoDMM.Columns.AddNew("Education", _
      sbclsRegular)
   With dsoColumn
      .ContentType = "DISCRETE"
      .SourceColumn = strFromClause & "." & strLQuote & _
         "education" & strRQuote
      .DataType = adWChar
      .IsDisabled = False
   End With
   
   ' Create the Yearly Income column as an predictable column.
   Set dsoColumn = dsoDMM.Columns.AddNew("Yearly Income", _
      sbclsRegular)
   With dsoColumn
      .ContentType = "DISCRETE"
      .SourceColumn = strFromClause & "." & strLQuote & _
         "yearly_income" & strRQuote
      .DataType = adWChar
      .IsInput = False
      .IsPredictable = True
      .IsDisabled = False
   End With
   
   ' Save the data mining model.
   With dsoDMM
      ' Set the LastUpdated property of the new mining model
      ' to the present date and time.
      .LastUpdated = Now
      ' Save the model definition.
      .Update
   End With
   
   ' Process the data mining model.
   With dsoDMM
      ' Lock the mining model for processing
      .LockObject olapLockProcess, _
         "Processing the data mining model in sample code"
      ' Fully process the new mining model.
      ' This may take up to several minutes.
      .Process processFull
      ' Unlock the model after processing is complete.
      .UnlockObject
   End With
   
   ' Clean up objects and close server connection
   Set dsoRole = Nothing
   Set dsoColumn = Nothing
   Set dsoDMM = Nothing
   
   dsoServer.CloseServer
   Set dsoServer = Nothing
   
End Sub