Plan for disaster recovery (SharePoint Server 2010)
Published: May 12, 2010
This article describes key decisions in choosing disaster recovery strategies for a Microsoft SharePoint Server 2010 environment.
Disaster recovery overview
For the purposes of this article, we define disaster recovery as the ability to recover from a situation in which a data center that hosts SharePoint Server becomes unavailable.
The disaster recovery strategy that you use for SharePoint Server must be coordinated with the disaster recovery strategy for the related infrastructure, including Active Directory domains, Exchange Server, and Microsoft SQL Server. Work with the administrators of the infrastructure that you rely on to design a coordinated disaster recovery strategy and plan.
A standby data center is often described as hot, warm, or cold, depending on the time and immediate effort required to get another farm up and running in a different location. Our definitions for these terms are as follows:
Hot standby: A second data center that can provide availability within seconds or minutes.
Warm standby: A second data center that can provide availability within minutes or hours.
Cold standby: A second data center that can provide availability within hours or days.
Disaster recovery can be one of the more expensive requirements for a system. The shorter the interval between failure and availability and the more systems you protect, the more complex and costly a disaster recovery solution is likely to be. When you invest in hot or warm standby data centers, costs include:
Additional hardware and software, which often increase the complexity of operations between software applications, such as custom scripts for failover and recovery.
Additional operational complexity.
The costs of maintaining hot or warm standby data centers should be evaluated based on your business needs. Not all solutions within an organization are likely to require the same level of availability after a disaster. You can offer different levels of disaster recovery for different content, services, or farms; for example, for content that has high impact on your business, for search services, or for an Internet publishing farm.
Disaster recovery is a key area in which information technology (IT) groups offer service level agreements (SLAs) to set expectations with customer groups. Many IT organizations offer a variety of SLAs that are associated with different chargeback levels.
When you implement failover between server farms, we recommend that you first deploy and tune the core solution within a farm, and then implement and test disaster recovery.
Choose a disaster recovery strategy
You can choose among many approaches to provide disaster recovery for a SharePoint Server environment, depending on your business needs. The following examples show why companies might choose cold, warm, or hot standby disaster recovery strategies.
Cold standby disaster recovery strategy: A business ships backups to support bare metal recovery to local and regional offsite storage on a regular basis, and has contracts in place for emergency server rentals in another region.
Pros: Often the cheapest option to maintain, operationally.
Cons: Often an expensive option to recover, because it requires that physical servers be configured correctly after a disaster has occurred. Also the slowest option to recover.
Warm standby disaster recovery strategy: A business ships virtual server images to local and regional disaster recovery farms.
Pros: Often relatively inexpensive to recover, because a virtual server farm can require little configuration upon recovery.
Cons: Can be very expensive and time consuming to maintain.
Hot standby disaster recovery strategy: A business runs multiple data centers, but serves content and services through only one data center.
Pros: Often relatively fast to recover.
Cons: Can be quite expensive to configure and maintain.
No matter which disaster recovery solution you decide to implement for your environment, you are likely to incur some data loss.
Planning for cold standby data centers
In a cold standby disaster recovery scenario, you can recover by setting up a new farm in a new location (preferably by using a scripted deployment) and restoring backups. Alternatively, you can recover by restoring a farm from a backup solution, such as Microsoft System Center Data Protection Manager 2007, that protects your data at the computer level and lets you restore each server individually. This article does not contain detailed instructions for how to create and recover in cold standby scenarios.
Planning for warm standby data centers
In a warm standby disaster recovery scenario, you consistently and frequently create virtual images of the servers in your farm and ship them to a secondary location. At the secondary location, you must have an environment available in which you can easily configure and connect the images to re-create your farm environment.
This article does not contain detailed instructions for creating warm standby solutions. For more information about how to plan to deploy farms by using virtual solutions, see Create a virtualization plan (SharePoint Server 2010).
Planning for hot standby data centers
In a hot standby disaster recovery scenario, you can set up a failover farm to provide disaster recovery in a separate data center from the primary farm. An environment that has a separate failover farm has the following characteristics:
A separate configuration database and Central Administration content database must be maintained on the failover farm.
All customizations must be deployed on both farms.
We recommend that you use scripted deployment to create the primary and failover farm by using the same configuration settings and customizations. For more information, see Install SharePoint Server 2010 by using Windows PowerShell.
Updates must be applied to both farms, individually.
SharePoint Server content databases can be asynchronously mirrored or log-shipped to the failover farm.
SQL Server mirroring can only be used to copy databases to a single mirror server, but you can log-ship to multiple secondary servers.
Service applications vary in whether they can be log-shipped to a farm. For more information, see Service application redundancy across data centers later in this article.
This topology can be repeated across many data centers, if you configure SQL Server log shipping to one or more additional data centers.
Consult with your SAN vendor to determine whether you can use SAN replication or another supported mechanism to provide availability across data centers.
The following illustration shows primary and failover farms before failover.
Figure: Primary and failover farms before failover
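As a rough illustration of what log-shipping a content database to the failover farm involves, the following Python sketch builds the two T-SQL statements behind a single shipment: a log backup on the primary instance and a restore on the failover instance. The database name and the share path are hypothetical, and a production deployment would more likely rely on SQL Server's built-in log shipping jobs than on hand-built statements.

```python
from dataclasses import dataclass


def quote_name(name: str) -> str:
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_text(value: str) -> str:
    """Quote a value as a T-SQL Unicode string literal."""
    return "N'" + value.replace("'", "''") + "'"


@dataclass(frozen=True)
class LogShipmentStep:
    database: str     # hypothetical content database name
    backup_file: str  # hypothetical path that both data centers can reach

    def backup_statement(self) -> str:
        # Intended to run on the primary farm's SQL Server instance.
        return (f"BACKUP LOG {quote_name(self.database)} "
                f"TO DISK = {quote_text(self.backup_file)};")

    def restore_statement(self) -> str:
        # Intended to run on the failover farm's instance. NORECOVERY leaves
        # the database restoring so that later log backups can still be applied.
        return (f"RESTORE LOG {quote_name(self.database)} "
                f"FROM DISK = {quote_text(self.backup_file)} WITH NORECOVERY;")


step = LogShipmentStep("WSS_Content", r"\\drshare\logs\WSS_Content_0001.trn")
print(step.backup_statement())
print(step.restore_statement())
```

The sketch only prints the statements; executing them, copying the backup file between data centers, and scheduling the cycle are left to whatever tooling the farm already uses.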
Service application redundancy across data centers
To provide availability across data centers for service applications, we recommend that for the services that can be run cross-farm, you run a separate services farm that can be accessed from both the primary and the secondary data centers.
For services that cannot be run cross-farm, and to provide availability for the services farm itself, the strategy for providing redundancy across data centers for a service application varies. The strategy employed depends on whether:
There is business value in running the service application in the disaster recovery farm when it is not in use.
The databases associated with the service application can be log-shipped or asynchronously mirrored.
The service application can run against read-only databases.
The following sections describe the disaster recovery strategies that we recommend for each service application. The service applications are grouped by strategy.
Databases that can be log-shipped or asynchronously mirrored
After a service application has been initially deployed on a secondary farm, the databases that support the following service applications can be asynchronously mirrored or log-shipped across farms:
Managed Metadata service application
Databases: Managed Metadata service
If tagging is in use, to successfully use the Managed Metadata service application in the disaster recovery farm, you must run the User Profile Replication Engine that is included in the SharePoint Administration Toolkit. For more information, see User Profile Replication Engine overview (SharePoint Server 2010).
PerformancePoint service application
Databases: PerformancePoint Service application
Project Server service application
Databases: Draft, Published, Archive, Reporting
Project Server 2010 requires synchronization between its databases. Project Server can be replicated between farms by using an asynchronous replication mechanism (asynchronous database mirroring, log shipping, or asynchronous SAN replication), but, for recovery, you must ensure that the Project database logs are synchronized as you restore.
Although we recommend that you log-ship or mirror the Project Server databases to the disaster recovery farm, the Project Server service application cannot run against read-only databases. Therefore, we recommend that you do not run the Project Server service application on the disaster recovery farm until after failover. To successfully synchronize the Project Server databases on the disaster recovery farm, you must configure either time stamps or log marking for the databases.
Secure Store service application
Databases: Secure Store
Usage and Health Data Collection service application
Databases: Logging
It is possible to log-ship or mirror the Logging database. However, we recommend that you do not run the Usage and Health Data Collection service on the disaster recovery farm, and that you do not mirror or log-ship the Logging database.
Web Analytics service application
Databases: Staging, Reporting
We recommend that you log-ship or mirror the Web Analytics Staging and Reporting databases. However, we recommend that you not run the Web Analytics service application on the disaster recovery farm until after failover.
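The log marking mentioned for the Project Server databases can be pictured with a short sketch. Assuming a marked transaction has already been written to the logs of all four databases on the primary farm, the Python below builds one RESTORE LOG statement per database that stops at that same mark, so the databases come online at a consistent point. The database names, the backup paths, and the mark name are all hypothetical.

```python
from typing import Dict, List


def restore_statements(log_backups: Dict[str, str], mark_name: str) -> List[str]:
    """Build one RESTORE LOG statement per database, all stopping at one mark.

    log_backups maps a database name to the log backup file to restore from.
    """
    safe_mark = mark_name.replace("'", "''")
    statements = []
    for database in sorted(log_backups):
        backup_file = log_backups[database].replace("'", "''")
        statements.append(
            f"RESTORE LOG [{database}] FROM DISK = '{backup_file}' "
            f"WITH STOPATMARK = '{safe_mark}', RECOVERY;"
        )
    return statements


# Hypothetical names for the Draft, Published, Archive, and Reporting databases.
project_databases = [
    "ProjectServer_Draft",
    "ProjectServer_Published",
    "ProjectServer_Archive",
    "ProjectServer_Reporting",
]
backups = {name: "D:\\dr\\" + name + ".trn" for name in project_databases}

for statement in restore_statements(backups, "pre_failover_mark"):
    print(statement)
```

If time stamps are used instead of log marking, the same idea applies with a shared point in time in place of the mark.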
Service applications and databases that cannot be log-shipped or asynchronously mirrored
The following service applications must be deployed on both the primary and failover farms, and cannot be log-shipped or asynchronously mirrored. For most of these service applications, we recommend that you deploy them and then verify that the failover farm has the same configuration settings as the primary farm. If configuration changes that affect the service are made on the primary farm, you must update the failover farm.
Application Registry service application
Databases: Application Registry service
Log-shipping the Application Registry service database is not supported.
Business Data Connectivity service application
Databases: Business Data Connectivity
User Profile service application
Databases: Profile, Synchronization, Social Tagging
The Profile, Synchronization, and Social Tagging databases cannot be log-shipped.
To provide redundancy for the User Profile service application, you must first deploy the service application in both the primary and secondary data centers.
To set up the Profile and Synchronization databases, we recommend that you recover a backup of the databases to the secondary data center and attach them to the User Profile service application in that data center.
To keep the profiles synchronized, you must run the User Profile Replication Engine that is included in the SharePoint Administration Toolkit after profile data has been updated on the primary farm. For more information, see User Profile Replication Engine overview (SharePoint Server 2010).
Microsoft SharePoint Foundation Subscription Settings service application
Databases: Subscription Settings
Log-shipping the Subscription Settings database is not supported.
Search service application
Databases: Crawl, Property, Search Administration
Search requires complete synchronization between its databases and index. Because of this requirement, search cannot be replicated between farms by using an asynchronous replication mechanism (asynchronous database mirroring, log shipping, or asynchronous SAN replication).
To provide up-to-date search on a failover farm, you must run search on the secondary farm.
The Search service application on the failover farm must be set to actively crawl the secondary farm. On failover, you must configure the Web application association to use the failover Search service application.
State service application
Databases: State
Log-shipping the State database is not supported.
Word Automation Services
Databases: Word Automation Services
Log-shipping the Word Automation Services database is not supported.
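For use in a failover runbook or checklist script, the guidance in the two preceding groups can be captured as a small lookup table. The Python sketch below is only a restatement of this article's recommendations; the short service names and the three status labels are conventions chosen for the example, not product terminology.

```python
from enum import Enum
from typing import List


class Replication(Enum):
    SUPPORTED = "databases can be log-shipped or asynchronously mirrored"
    DISCOURAGED = "possible, but not recommended"
    UNSUPPORTED = "deploy on both farms and keep settings aligned manually"


# Status of each service application's databases, as grouped in this article.
SERVICE_APPS = {
    "Managed Metadata": Replication.SUPPORTED,
    "PerformancePoint": Replication.SUPPORTED,
    "Project Server": Replication.SUPPORTED,
    "Secure Store": Replication.SUPPORTED,
    "Usage and Health Data Collection": Replication.DISCOURAGED,
    "Web Analytics": Replication.SUPPORTED,
    "Application Registry": Replication.UNSUPPORTED,
    "Business Data Connectivity": Replication.UNSUPPORTED,
    "User Profile": Replication.UNSUPPORTED,
    "Subscription Settings": Replication.UNSUPPORTED,
    "Search": Replication.UNSUPPORTED,
    "State": Replication.UNSUPPORTED,
    "Word Automation Services": Replication.UNSUPPORTED,
}


def apps_with(status: Replication) -> List[str]:
    """Return the service applications that have the given status, sorted."""
    return sorted(name for name, s in SERVICE_APPS.items() if s is status)


for status in Replication:
    print(f"{status.name}: {', '.join(apps_with(status))}")
```

A table like this does not capture the per-service caveats, such as running the User Profile Replication Engine or delaying the start of Project Server and Web Analytics until after failover, so it complements the notes above rather than replacing them.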
System requirements for disaster recovery
In an ideal scenario, the failover components and systems match the primary components and systems in all ways: platform, hardware, and number of servers. At a minimum, the failover environment must be able to handle the traffic that you expect during a failover. Keep in mind that only a subset of users may be served by the failover site. The systems must match in at least the following:
Operating system version and all updates
SQL Server versions and all updates
SharePoint 2010 Products versions and all updates
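One way to keep this requirement visible is to compare a version inventory of the two farms on a regular basis. The Python sketch below assumes that the inventories have already been collected into dictionaries by some other means; the component names and version strings are placeholders, not real build numbers.

```python
from typing import Dict, List


def find_mismatches(primary: Dict[str, str], failover: Dict[str, str]) -> List[str]:
    """Report every component whose version differs or is missing on one side."""
    problems = []
    for component in sorted(set(primary) | set(failover)):
        primary_version = primary.get(component)
        failover_version = failover.get(component)
        if primary_version != failover_version:
            problems.append(
                f"{component}: primary={primary_version}, failover={failover_version}"
            )
    return problems


# Placeholder inventories; real values would come from each farm's servers.
primary_farm = {"Operating system": "os-build-7", "SQL Server": "sql-build-3",
                "SharePoint": "sp-build-12"}
failover_farm = {"Operating system": "os-build-7", "SQL Server": "sql-build-3",
                 "SharePoint": "sp-build-11"}

for problem in find_mismatches(primary_farm, failover_farm):
    print(problem)
```

An empty result means the inventories match; any line of output points to an update that has been applied to only one farm.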
Although this article primarily discusses the availability of SharePoint 2010 Products, the system uptime will also be affected by the other components in the system. In particular, make sure that you do the following:
Ensure that infrastructure dependencies such as power, cooling, network, directory, and SMTP are fully redundant.
Choose a switching mechanism, whether DNS or hardware load balancing, that meets your needs.