Case Study: How Microsoft Deploys Disaster Recovery for FIM 2010

Applies To: Forefront Identity Manager 2010

Disaster recovery involves restoring your systems and data in the event of partial or complete failure of computers due to natural or technical causes. Backing up the critical data in your Forefront Identity Manager (FIM) 2010 deployment is a necessary operational task for all organizations.

As an example, this document describes how Microsoft IT (MSIT) deploys FIM 2010 internally, how it designs for disaster recovery, and how it recovers from hardware failures. It covers the following topics:

  • MSIT Environment and Topology

  • MSIT Hardware Specifications

  • Supported Resources

  • Planning for Disaster Recovery

  • Recovering From Hardware Failures

  • Backup, Copy and Restore

MSIT Environment and Topology

The following illustration shows the current topology used by MSIT to deploy FIM 2010.

Topology of MSIT FIM deployment

As illustrated above, MSIT uses two servers, each hosting an instance of the FIM Portal and the FIM Service. To help with load balancing, one server is dedicated to responding to client requests (for example, employee requests to create or join groups) and the other is dedicated to administration. The FIM Service database and the FIM Synchronization database are deployed to separate servers. For more information about topology planning for FIM 2010, see the Pre-Planning and Topology Configuration Guide.

MSIT Hardware Specifications

The following table lists the hardware MSIT uses for its FIM 2010 deployment. For more information about hardware planning for FIM 2010, see the Capacity Planning Guide.

Server Role | Details
FIM Portal and FIM Service (Production) | 8 CPU cores, 8 GB RAM
FIM Portal and FIM Service for Administration and Migration (Production) | 8 CPU cores, 8 GB RAM
FIM Database (Production) | 24 CPU cores, 64 GB RAM
FIM Synchronization Database and FIM Service (Production) | 24 CPU cores, 64 GB RAM
FIM Portal and FIM Service, FIM Database (Disaster Recovery) | 24 CPU cores, 64 GB RAM
FIM Synchronization Database and FIM Service (Disaster Recovery) | 24 CPU cores, 64 GB RAM

Supported Resources

The following table displays the resources that are currently supported by the MSIT FIM 2010 deployment.

Resource | Approximate Size or Count
Number of users | 200,000
Number of groups | 465,000 (includes both distribution and security groups)
Number of distribution groups | 275,000
FIM Service database | 380,540 MB (database size, not the size of the backup file)
FIM Synchronization database | 83,968 MB (database size, not the size of the backup file)
Number of management agents | 10

Planning for Disaster Recovery

Each organization defines its own business requirements for recovery in a service level agreement (SLA). MSIT has a business requirement to recover FIM 2010 within 24 hours in the event of a disaster. To meet this requirement, MSIT performs the following steps every night:

  1. Full backups are performed.

  2. The full backups are copied to the disaster recovery site.

  3. The full backups are restored at the disaster recovery site.

In addition to these steps, it may be necessary to perform one or more full synchronizations, depending on which data source is authoritative.

Recovering From Hardware Failures

Disaster recovery does not always involve a failure of the entire deployment. Hardware failures can affect individual servers in the production environment. MSIT has a number of tools for dealing with individual hardware failures, such as server imaging tools. However, because MSIT participates in the internal testing of FIM 2010 and receives new updates on an ongoing basis, when necessary it simply reinstalls the affected FIM 2010 components. The following table lists the average time to reinstall or update a FIM 2010 component.

FIM Component | Time to Reinstall or Update
FIM Portal and FIM Service | 20 minutes (assuming that user and group accounts already exist)
FIM Synchronization Service | 20 minutes (assuming that user and group accounts already exist)

Backup, Copy and Restore

The following table lists the average time it takes to copy the MSIT FIM 2010 database backups from the backup location, restore them, and perform a full synchronization of a management agent with a large number of affected resources.

Operation | Average Time to Perform
Copy from backup location | 1 hour
Restore FIM Synchronization database | 15 minutes
Restore FIM Service database | 30 minutes
Full synchronization of a management agent with a large number of affected resources (approximately 500,000–750,000) | 20 hours
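The averages above can be combined into a rough recovery-time estimate and checked against the 24-hour SLA. The following Python sketch is only an illustration, not part of MSIT's documented process. It assumes that the copy, both restores, and any full synchronizations run one after another with no overlap, and it applies the 20-hour figure to every management agent that needs a full synchronization, which is pessimistic for smaller management agents. The names `recovery_estimate` and `fits_sla` are hypothetical.

```python
from datetime import timedelta

# Average durations taken from the Backup, Copy and Restore table.
COPY_FROM_BACKUP = timedelta(hours=1)
RESTORE_SYNC_DB = timedelta(minutes=15)
RESTORE_SERVICE_DB = timedelta(minutes=30)
FULL_SYNC_LARGE_MA = timedelta(hours=20)

# MSIT's business requirement: recover FIM 2010 within 24 hours.
SLA = timedelta(hours=24)


def recovery_estimate(full_syncs: int = 1) -> timedelta:
    """Estimate total recovery time.

    Assumption: every step runs sequentially (no overlap), and each
    full synchronization takes as long as the large management agent
    in the table.
    """
    if full_syncs < 0:
        raise ValueError("full_syncs must be non-negative")
    return (COPY_FROM_BACKUP
            + RESTORE_SYNC_DB
            + RESTORE_SERVICE_DB
            + full_syncs * FULL_SYNC_LARGE_MA)


def fits_sla(full_syncs: int = 1) -> bool:
    """Return True if the sequential estimate stays within the SLA."""
    return recovery_estimate(full_syncs) <= SLA


if __name__ == "__main__":
    print(recovery_estimate())        # 21:45:00
    print(SLA - recovery_estimate())  # 2:15:00
    print(fits_sla(1), fits_sla(2))   # True False
```

Under these assumptions, one large full synchronization leaves about 2 hours 15 minutes of margin, while a second one would exceed the 24-hour window. The number of full synchronizations required is therefore the dominant factor in whether the SLA can be met.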