Troubleshooting DataProtection Health Set

 

Applies to: Exchange Server 2013

Topic Last Modified: 2015-03-09

Explanation

The DataProtection health set monitors the redundancy of databases in a database availability group (DAG).

If you receive an alert that specifies that DataProtection is unhealthy, this indicates an issue that may affect the replication or cluster components and that can prevent access to the Exchange databases.

The DataProtection health set is monitored by using the following probes and monitors.

 

Probe                                      Health Set      Dependencies      Associated Monitors
ClusterEndpointProbe                       DataProtection  Active Directory  ClusterEndpointMonitor
ClusterGroupProbe                          DataProtection  Active Directory  ClusterGroupMonitor
ClusterNetworkProbe                        DataProtection  Active Directory  ClusterNetworkMonitor
ClusterServiceCrashProbe                   DataProtection  Active Directory  ClusterServiceCrashMonitor
ServerOneCopyProbe                         DataProtection  Active Directory  ServerOneCopyMonitor
ServerOneCopyInternalMonitorProbe          DataProtection  Active Directory  ServerOneCopyInternalMonitorMonitor
ServiceHealthMSExchangeReplEndpointProbe   DataProtection  Active Directory  ServiceHealthMSExchangeReplEndpointMonitor
ServiceHealthMSExchangeReplCrashProbe      DataProtection  Active Directory  ServiceHealthMSExchangeReplCrashMonitor
ServerSiteFailureProbe                     DataProtection  Active Directory  ServerSiteFailureMonitor
StorageApparentControllerIssuesProbe       DataProtection  Active Directory  StorageApparentControllerIssuesMonitor
DatabaseHealthTooManyMountedDatabaseProbe  DataProtection  Active Directory  DatabaseHealthTooManyMountedDatabaseMonitor

For more information about probes and monitors, see Server health and performance.

It's possible that the service recovered after it issued the alert. Therefore, when you receive an alert that specifies that the health set is unhealthy, first verify that the issue still exists. If the issue does exist, perform the appropriate recovery actions outlined in the following sections.

Verifying the issue still exists

  1. Identify the health set name and the server name in the alert.

  2. The message details provide information about the exact cause of the alert. In most cases, the message details provide sufficient troubleshooting information to identify the root cause. If the message details are not clear, do the following:

    1. Open the Exchange Management Shell, and then run the following command to retrieve the details of the health set that issued the alert:

      Get-ServerHealth <server name> | ?{$_.HealthSetName -eq "<health set name>"}
      

      For example, to retrieve the Autodiscover.Protocol health set details about server1.contoso.com, run the following command:

      Get-ServerHealth server1.contoso.com | ?{$_.HealthSetName -eq "Autodiscover.Protocol"}
      

      Review the command output to determine which monitor reported the error. The AlertValue value for the monitor that issued the alert will be Unhealthy.

    2. Identify the probe that the monitor is based on. Note that a probe and its monitor usually share the same name prefix. For example, if ClusterNetworkMonitor in the DataProtection health set is unhealthy, search for “ClusterNetwork*”:

      Get-MonitoringItemIdentity -Identity DataProtection -Server server1.contoso.com | ?{$_.Name -like "ClusterNetwork*"}
      

      The returned results should resemble the following.

      ItemType  HealthSetName   Name                 TargetResource
      Probe     DataProtection  ClusterNetworkProbe  MSExchangeRepl

    3. Rerun the associated probe for the monitor that’s in an unhealthy state. To find the associated probe, refer to the table in the Explanation section. To rerun the probe, run the following command:

      Invoke-MonitoringProbe <health set name>\<probe name> -Server <server name> | Format-List
      

      For example, assume that the failing monitor is AutodiscoverSelfTestMonitor. The probe associated with that monitor is AutodiscoverSelfTestProbe. To run that probe on server1.contoso.com, run the following command:

      Invoke-MonitoringProbe Autodiscover.Protocol\AutodiscoverSelfTestProbe -Server server1.contoso.com | Format-List
      
    4. In the command output, review the Result value of the probe. If the value is Succeeded, the issue was a transient error, and it no longer exists. Otherwise, refer to the recovery steps outlined in the following sections.
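
The verification steps above can be sketched for the DataProtection health set as follows. This is a minimal sketch, not an official script: Get-UnhealthyMonitor is a hypothetical helper (it is not an Exchange cmdlet), server1.contoso.com is a placeholder, and the commented calls assume an Exchange Management Shell session. The helper relies only on the HealthSetName and AlertValue properties described in step 2.

```powershell
# Hypothetical helper (not an Exchange cmdlet): keeps only the monitors of one
# health set whose AlertValue is Unhealthy. It accepts any objects that expose
# HealthSetName and AlertValue properties, such as Get-ServerHealth output.
function Get-UnhealthyMonitor {
    param(
        [Parameter(Mandatory = $true)]
        [string] $HealthSetName,

        [Parameter(ValueFromPipeline = $true)]
        $Monitor
    )
    process {
        # Compare string forms so the filter works for strings and enum-like values.
        if ("$($Monitor.HealthSetName)" -eq $HealthSetName -and
            "$($Monitor.AlertValue)" -eq 'Unhealthy') {
            $Monitor
        }
    }
}

# Usage in the Exchange Management Shell (server name is a placeholder):
# Get-ServerHealth server1.contoso.com |
#     Get-UnhealthyMonitor -HealthSetName 'DataProtection' |
#     Format-Table Name, AlertValue
#
# If, for example, ClusterNetworkMonitor is listed, rerun its probe:
# Invoke-MonitoringProbe DataProtection\ClusterNetworkProbe -Server server1.contoso.com | Format-List
```

Keeping the filter in a small function separates the comparison logic from the Exchange cmdlets, so it can be tried against sample objects before it is run on a DAG member.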

When you receive an alert from a health set, the email message contains the following information:

  • Name of the server that sent the alert

  • Time and date when the alert occurred

  • Authentication mechanism that was used, and credential information

  • Full exception trace of the last error, including diagnostic data and specific HTTP header information

    You can use the information in the full exception trace to help troubleshoot the issue. The exception generated by the probe contains a failure Reason that describes why the probe failed.

For most issues that occur in high availability environments, you can run the Test-ReplicationHealth cmdlet to help troubleshoot the cluster, networking, Active Manager, and service components. Other health sets and components have different Test-* cmdlets.

For example:

Test-ReplicationHealth <ServerName>

The returned results will resemble the following:

Server        Check                      Result
<ServerName>  ClusterService             Passed
<ServerName>  ReplayService              Passed
<ServerName>  ActiveManager              Passed
<ServerName>  TasksRpcListener           Passed
<ServerName>  TcpListener                Passed
<ServerName>  ServerLocatorService       Passed
<ServerName>  DagMembersUp               Passed
<ServerName>  ClusterNetwork             Passed
<ServerName>  QuorumGroup                Passed
<ServerName>  FileShareQuorum            Passed
<ServerName>  DatabaseRedundancyCheck    Passed
<ServerName>  DatabaseAvailabilityCheck  Passed
<ServerName>  DBCopySuspended            Passed
<ServerName>  DBCopyFailed               Passed
<ServerName>  DBInitializing             Passed
<ServerName>  DBDisconnected             Passed
<ServerName>  DBLogCopyKeepingUp         Passed
<ServerName>  DBLogReplayKeepingUp       Passed
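
With this many checks, it can help to list only the ones that did not pass. The following is a minimal sketch: Get-FailedReplicationCheck is a hypothetical helper (not an Exchange cmdlet), and it assumes that the Result value of a successful check renders as Passed, as in the results above.

```powershell
# Hypothetical helper (not an Exchange cmdlet): returns every check whose
# Result is anything other than Passed.
function Get-FailedReplicationCheck {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Check
    )
    process {
        # Compare the string form so this works whether Result is a plain
        # string or a richer object (assumption: it renders as "Passed").
        if ("$($Check.Result)" -ne 'Passed') {
            $Check
        }
    }
}

# Usage in the Exchange Management Shell:
# Test-ReplicationHealth <ServerName> | Get-FailedReplicationCheck | Format-List
```

An empty result means that every check reported Passed for that server.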

If all components display Passed in the Result column, rerun the associated probe as shown in substep 3 of step 2 in the Verifying the issue still exists section.

If the issue still exists, restart the server. After the server restarts, rerun the associated probe again as shown in the same substep.

If the probe continues to fail, you may need assistance from a Microsoft Support professional. To contact one, visit the Exchange Server Solutions Center. In the navigation pane, click Support options and resources, and then use one of the options listed under Get technical support. Because your organization may have a specific procedure for directly contacting Microsoft Product Support Services, be sure to review your organization's guidelines first.
