Resource failure

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2

The Cluster service considers a resource to be failed when the resource is not operational and cannot be restarted on the current host node. The Cluster service detects resource failure as follows:

  • At periodic intervals, the Cluster service checks whether the resource appears operational by invoking the Resource Monitor. The Resource Monitor, in turn, relies on the resource dynamic-link library (DLL) for each resource to implement a procedure that detects whether the resource is functioning properly. The resource DLL communicates the results back through the Resource Monitor to the Cluster service. You can specify how frequently the Cluster service checks for failed resources by setting the Looks Alive and Is Alive poll intervals. The Cluster service requests a more thorough check of the resource's state at each Is Alive interval than it does at each Looks Alive interval; therefore, the Is Alive poll interval is typically longer than the Looks Alive poll interval. For more information on how to set poll intervals, see Specify the restart policy for a resource.

    For more information on Resource Monitors, see Resource Monitors.

  • If the resource DLL reports that the resource is not operational, the Cluster service attempts to restart the resource. You can specify the number of times the Cluster service can attempt to restart a resource in a given time interval. If the Cluster service exceeds the maximum number of restart attempts within the specified time period, and the resource is still not operational, the Cluster service considers the resource to be failed. For more information, see Set group failover policy.
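The restart rule in the second bullet can be sketched in a few lines of Python. This is a minimal illustration, not the Cluster service's actual implementation: the class name `RestartPolicy`, its parameters, and the sliding-window counting are all assumptions made for the example.

```python
from collections import deque


class RestartPolicy:
    """Hypothetical model of a per-resource restart policy (illustration only)."""

    def __init__(self, threshold, period_ms):
        self.threshold = threshold    # maximum restart attempts allowed per period
        self.period_ms = period_ms    # length of the counting window, in milliseconds
        self._attempts = deque()      # timestamps (ms) of recent restart attempts

    def on_not_operational(self, now_ms):
        """Return "restart" if another attempt is allowed, otherwise "failed"."""
        # Assumption: attempts older than the period no longer count.
        while self._attempts and now_ms - self._attempts[0] >= self.period_ms:
            self._attempts.popleft()
        if len(self._attempts) < self.threshold:
            self._attempts.append(now_ms)
            return "restart"
        return "failed"


# Example: at most 3 restart attempts in any 15-minute window.
policy = RestartPolicy(threshold=3, period_ms=15 * 60 * 1000)
decisions = [policy.on_not_operational(t) for t in (0, 1000, 2000, 3000)]
```

In this sketch, the fourth report inside the same window is the point at which the resource would be considered failed.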

You can configure whether a failed resource causes the group that contains the resource to be failed over to another node. For more information, see the section about setting advanced resource properties in Setting resource properties.

For more information on cluster resources, see Server Cluster Resources.

If the failed resource is configured to cause the group that contains the resource to fail over to another node, the Cluster service attempts a failover. If the number of failover attempts exceeds the group's threshold and the resource is still in a failed state, the Cluster service attempts to restart the resource. The restart attempt is made after the period of time specified by the resource's Retry Period On Failure property, a property common to all resources. In every case, the Cluster service first tries to restart a resource and only then tries to fail it over.
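The order of recovery steps described above can be summarized as a small decision function. This is a hedged sketch only: the function name, its parameters, the returned labels, and the exact boundary at which the failover threshold is treated as exceeded are assumptions for illustration, not part of any Cluster service API.

```python
def next_action(restarts_exhausted, affects_group, failover_attempts, failover_threshold):
    """Pick the next recovery step for a resource that is not operational.

    All names here are hypothetical; the function only mirrors the order
    of steps described in the text.
    """
    if not restarts_exhausted:
        # Restart on the current node is always tried first.
        return "restart"
    if not affects_group:
        # Assumption: a failed resource that does not affect its group
        # simply remains failed.
        return "stay_failed"
    if failover_attempts < failover_threshold:
        return "failover"
    # Failover threshold reached: wait for Retry Period On Failure,
    # then try to restart the resource again.
    return "wait_retry_period_then_restart"


# Example: restarts are used up and the group has already failed over 10 times.
action = next_action(True, True, failover_attempts=10, failover_threshold=10)
```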

Even though the Retry Period On Failure property is expressed in milliseconds, choose a value on the order of minutes. Also, choose a value that is greater than or equal to the value of the resource's restart period property, and make sure this rule continues to hold whenever you change either value.
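Because the value is entered in milliseconds but should be chosen in minutes, it can help to do the conversion and the comparison explicitly. The following is a minimal sketch; the helper names `minutes_to_ms` and `check_retry_period` are hypothetical and exist only for this example.

```python
MS_PER_MINUTE = 60 * 1000


def minutes_to_ms(minutes):
    """Convert a duration in minutes to whole milliseconds."""
    return int(minutes * MS_PER_MINUTE)


def check_retry_period(retry_period_ms, restart_period_ms):
    """Return the retry period if it is at least as long as the restart period."""
    if retry_period_ms < restart_period_ms:
        raise ValueError(
            "Retry period (%d ms) must be greater than or equal to "
            "the restart period (%d ms)" % (retry_period_ms, restart_period_ms)
        )
    return retry_period_ms


# Example: a 60-minute retry period checked against a 15-minute restart period.
retry_ms = check_retry_period(minutes_to_ms(60), minutes_to_ms(15))
```

Here `retry_ms` is 3600000, the millisecond value that corresponds to 60 minutes.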