Export (0) Print
Expand All

ClusterDistance (DMX)

Topic Status: Some information in this topic is preview and subject to change in future releases. Preview information describes new features or changes to existing features in Microsoft SQL Server 2016 Community Technology Preview 2 (CTP2).

The ClusterDistance function returns the distance of the input case from the specified cluster, or if no cluster is specified, the distance of the input case from the most likely cluster.

ClusterDistance([<ClusterID expression>])

This function can be used only if the underlying data mining model supports clustering. The function can be used with any kind of clustering model (EM, K-Means, etc.), but the results differ depending on the algorithm.

A scalar value.

The ClusterDistance function returns the distance between the input case and the cluster that has the highest probability for that input case.

In case of K-Means clustering, since any case can belong to only one cluster, with a membership weight of 1.0, the cluster distance is always 0. However, in K-Means, each cluster is assumed to have a centroid. You can obtain the value of the centroid by querying or browsing the NODE_DISTRIBUTION nested table in the mining model content. For more information, see Mining Model Content for Clustering Models (Analysis Services - Data Mining).

In the case of the default EM clustering method, all the points inside the cluster are considered equally likely; therefore, by design there is no centroid for the cluster. The value of ClusterDistance between a particular case and a particular cluster N is calculated as follows:

ClusterDistance(N) = 1–(membershipWeight(N))


ClusterDistance(N) = 1–ClusterProbability (N))

Related Prediction Functions

Analysis Services provides the following additional functions for querying clustering models:

  • Use the Cluster (DMX) function to return the most likely cluster.

  • Use the ClusterProbability (DMX) function to get the probability that a case belongs to a particular cluster. This value serves as the inverse of the cluster distance.

  • Use the PredictHistogram (DMX) function to return a histogram of the likelihood of the input case existing in each of the model’s clusters.

  • Use the PredictCaseLikelihood (DMX) function to return a measure from 0 to 1 that indicates how likely an input case is to exist considering the model learned by the algorithm.

The following example returns the distance from the specified case to the cluster that the case most likely belongs to.

    [TM Clustering]
(SELECT 28 AS [Age],
    '2-5 Miles' AS [Commute Distance],
    'Graduate Degree' AS [Education],
    0 AS [Number Cars Owned],
    0 AS [Number Children At Home]) AS t

Example results:



To find out which cluster this is, you can substitute Cluster for ClusterDistance in the preceding sample.

Example results:


Cluster 6

The following syntax uses the mining model content schema rowset to return the list of node IDs and node captions for the clusters in the mining model. You can then use the node caption as the cluster identifier argument in the ClusterDistance function.


Example results:




Cluster 1


Cluster 2

The following syntax example returns the distance of the specified case from the cluster labeled Cluster 2.

    ClusterDistance('Cluster 2')
AS [Cluster 2 Distance]
FROM [TM Clustering]
(SELECT 28 AS [Age],
    '2-5 Miles' AS [Commute Distance],
    'Graduate Degree' AS [Education],
    0 AS [Number Cars Owned],
    0 AS [Number Children At Home]) AS t

Example results:

Cluster 2 Distance


Was this page helpful?
(1500 characters remaining)
Thank you for your feedback

Community Additions

© 2015 Microsoft