Impact of Failure in Operations Manager 2007
Applies To: Operations Manager 2007 R2, Operations Manager 2007 SP1
Servers and components in a Microsoft System Center Operations Manager 2007 deployment can fail, which affects Operations Manager functionality.
The amount of data and functionality lost during a failure is different in each failure scenario. It depends on the role of the failing component, on the Operations Manager deployment, on the length of time it takes to restore the failing component, and on the availability of backups.
Impact of Failure
The impact of failure is smallest when the Operations Manager deployment includes failover servers or clustering. Without clustering and failover management servers, a failed component takes longer to restore. The longer the functions provided by a failed component stay unavailable, the greater the risk of data loss, and the more data is lost when loss does occur. For more information about minimizing the impact of failure, see Reduce the Impact of Failure below.
In some failure scenarios, Operations Manager can continue to function properly for a short period of time without losing data. After you repair the failing component, complete functionality is restored automatically without any further intervention.
The following table lists the impact of failure of various Operations Manager components. The table assumes that each server listed performs only the single role specified.
Failed Component | Impact: Best-Case Scenario | Impact: Worst-Case Scenario |
---|---|---|
Management server | Workload on additional management servers in the management group is increased until the failed management server is restored. | |
Root management server | With at least one server in the cluster functioning, there is no impact. | |
Operations Manager reporting server (OperationsManager database is intact) | | |
OperationsManager database | If the OperationsManager database has been installed in a failover cluster, and as long as one of the cluster nodes is functioning, there is no impact. If log shipping is implemented, services might be reduced until the database is rebuilt. | |
Data warehouse server (OperationsManagerDW database is intact) | With at least one server in the cluster functioning, there is no impact. If the OperationsManagerDW database has failed, clustering does not reduce the impact of failure. (See the next column for impact.) | |
Gateway server | With multiple gateway servers deployed, agents can fail over to another gateway server, and communication with management servers is not interrupted. | |
Audit Collection Database | If the Audit Collection Database is intact, with at least one server in the cluster functioning, there is no impact. If the Audit Collection Database has failed, clustering does not reduce the impact of failure. (See the next column for impact.) | |
Computer hosting the Operations console | Not applicable. | |
ACS Collector Server | Not applicable. | |
Reduce the Impact of Failure
The effects of some server failures can be reduced significantly by adding redundancy or implementing a failover solution, such as clustering. This also reduces the urgency of restoration.
The following list includes configuration options that add redundancy and clustering to the Operations Manager deployment. Implementing any of these options reduces the impact of failure and contributes to the high availability of Operations Manager in your organization:
Add management servers.
Install the root management server into a Cluster service failover cluster.
Place the databases in a Cluster service failover cluster.
Configure gateway servers for failover.
Configure log shipping.
Configure multihoming of agents across management groups.
Each option is further described below. For further information about deployment options that help ensure high availability and help reduce the impact of failure, see the Operations Manager 2007 Deployment Guide (https://go.microsoft.com/fwlink/?LinkId=93785).
Add Management Servers
Deploy more than one management server in a management group. This allows agents to fail over if a management server has failed.
If a management server has failed, the agents that report to that management server automatically start reporting to another management server in the same management group. After the failed management server is restored, agents can resume reporting to the original management server.
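The failover behavior described above can be sketched as a small model. This is only an illustration, not Operations Manager code: the `ManagementServer` and `Agent` classes, the `healthy` flag, and the server names are all hypothetical, and the sketch assumes an agent prefers its original server whenever that server is available.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementServer:
    name: str
    healthy: bool = True  # hypothetical flag standing in for real health checks


@dataclass
class Agent:
    name: str
    primary: ManagementServer
    # Other management servers in the same management group.
    failover: list = field(default_factory=list)

    def reporting_target(self):
        """Return the server this agent reports to right now, or None."""
        if self.primary.healthy:
            return self.primary
        for server in self.failover:
            if server.healthy:
                return server
        return None


ms1 = ManagementServer("MS1")
ms2 = ManagementServer("MS2")
agent = Agent("agent01", primary=ms1, failover=[ms2])

print(agent.reporting_target().name)  # MS1
ms1.healthy = False                   # MS1 fails
print(agent.reporting_target().name)  # MS2
ms1.healthy = True                    # MS1 is restored
print(agent.reporting_target().name)  # MS1
```

With only one management server in the group, `failover` is empty and `reporting_target()` returns `None` during an outage, which is the situation that adding management servers is meant to avoid.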
If the root management server fails, you can promote an existing management server to the root management server role. After the failed root management server is restored, you can demote the temporary root management server and promote the restored server back to its original root management server role.
Install the Root Management Server into a Cluster Service Failover Cluster
Install the root management server (RMS) into a Cluster service failover cluster. If a node in the RMS cluster fails, the RMS role moves to another cluster node, so the RMS continues to function normally.
After you restore the failed cluster node, you can move the RMS back to the original node or leave it running on another node in the failover cluster.
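As a rough illustration of the role movement described above, the following toy model tracks which node owns a clustered role. It is not the Cluster service API: the `FailoverCluster` class, its methods, and the node names are hypothetical, and the model assumes that a restored node does not reclaim the role unless you move it back.

```python
class FailoverCluster:
    """Toy model of a failover cluster hosting one clustered role."""

    def __init__(self, nodes):
        self.nodes = {name: True for name in nodes}  # node name -> is up
        self.owner = nodes[0]  # node currently hosting the role

    def fail_node(self, name):
        """Mark a node as down; move the role if that node owned it."""
        self.nodes[name] = False
        if name == self.owner:
            survivors = [n for n, up in self.nodes.items() if up]
            self.owner = survivors[0] if survivors else None

    def restore_node(self, name):
        """Mark a node as up again. Assumption: the role stays where it is."""
        self.nodes[name] = True

    def move_role(self, name):
        """Manually move the role to a specific node that is up."""
        if not self.nodes.get(name, False):
            raise ValueError(f"node {name!r} is not available")
        self.owner = name


cluster = FailoverCluster(["NODE1", "NODE2"])
cluster.fail_node("NODE1")
print(cluster.owner)  # NODE2
cluster.restore_node("NODE1")
print(cluster.owner)  # NODE2 (role is left on the surviving node)
cluster.move_role("NODE1")
print(cluster.owner)  # NODE1 (role moved back by the operator)
```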
Place Databases in a Cluster Service Failover Cluster
Place the OperationsManager, OperationsManagerDW, and OperationsManagerAC databases in a Cluster service failover cluster. As with the RMS cluster, if a node fails, the databases move to another node in the cluster and continue to function normally. If a database becomes corrupted, however, you may need to restore it from your most recent backup.
Configure Gateway Server Failover
Deploy multiple gateway servers to allow agents to fail over between gateway servers and to distribute the management workload.
Gateway servers can also be configured for failover between collection management servers in a management group if multiple collection management servers are available.
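The two levels of failover described in this section (agents between gateway servers, and gateway servers between management servers) can be sketched as a path lookup. This is a simplified model under stated assumptions, not how Operations Manager resolves connections: the function names, the `is_up` map, and the server names are hypothetical, and the sketch assumes each list is tried in order of preference.

```python
def first_available(candidates, is_up):
    """Return the first candidate that is up, or None."""
    for name in candidates:
        if is_up.get(name, False):
            return name
    return None


def resolve_path(agent_gateways, gateway_servers, is_up):
    """Return (gateway, management_server) for an agent, or None.

    agent_gateways: gateway servers the agent may use, in preference order.
    gateway_servers: gateway name -> management servers it may report to.
    """
    for gateway in agent_gateways:
        if not is_up.get(gateway, False):
            continue
        server = first_available(gateway_servers[gateway], is_up)
        if server is not None:
            return gateway, server
    return None


is_up = {"GW1": True, "GW2": True, "MS1": True, "MS2": True}
gateway_servers = {"GW1": ["MS1", "MS2"], "GW2": ["MS1", "MS2"]}
gateways = ["GW1", "GW2"]

print(resolve_path(gateways, gateway_servers, is_up))  # ('GW1', 'MS1')
is_up["MS1"] = False   # a management server fails
print(resolve_path(gateways, gateway_servers, is_up))  # ('GW1', 'MS2')
is_up["GW1"] = False   # a gateway server fails
print(resolve_path(gateways, gateway_servers, is_up))  # ('GW2', 'MS2')
```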
Configure Log Shipping
Log shipping maintains a copy of an Operations Manager database on a separate Microsoft SQL Server 2005 or SQL Server 2008 server. Log shipping keeps the copy of the database up to date by sending the transaction logs from the source database in the active management group to the destination database in the standby management group.
If a database becomes corrupted, you can configure Operations Manager to temporarily use the standby database. After the original database is restored, you can reconfigure Operations Manager to use that database.
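To make the mechanism concrete, the following sketch models log shipping as replaying transaction log records on a second database. It is a conceptual toy, not SQL Server code: the `Database` and `LogShipper` classes, the `ship` method, and the sample keys are hypothetical. It assumes shipping runs periodically, so the standby copy lags the source by any transactions written since the last shipment.

```python
class Database:
    """Toy database: key-value state plus an append-only transaction log."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.log = []  # list of (key, value) transactions

    def write(self, key, value):
        self.log.append((key, value))
        self.state[key] = value


class LogShipper:
    """Replays source log records not yet shipped on the destination."""

    def __init__(self, source, destination):
        self.source = source
        self.destination = destination
        self.shipped = 0  # number of source log records already applied

    def ship(self):
        """Apply pending records to the destination; return how many."""
        pending = self.source.log[self.shipped:]
        for key, value in pending:
            self.destination.write(key, value)
        self.shipped += len(pending)
        return len(pending)


primary = Database("OperationsManager")
standby = Database("OperationsManager-standby")
shipper = LogShipper(primary, standby)

primary.write("alert:1", "open")
primary.write("alert:2", "open")
shipper.ship()
primary.write("alert:1", "closed")  # written after the last shipment

print(standby.state)  # {'alert:1': 'open', 'alert:2': 'open'}
```

In this model, switching to the standby at this point would lose the last write, which is why the interval between shipments bounds how much data a failover can lose.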