Creating a Backup and Recovery Plan
Updated: March 28, 2003
Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2
The backup and recovery plan establishes guidelines and procedures to prevent problems that might cause data loss or interruptions to your organization’s operations, and to allow recovery as quickly as possible if such events do occur.
Consider scheduling planned downtime or outages during the pilot to test rollback procedures and, if applicable, disaster recovery and business continuity plans.
Creating a Backup Plan
A backup plan is essential. When you begin rolling out Windows Server 2003 in the business environment, problems might arise that even the most thorough testing could not reveal. By making regular and reliable backups, you ensure that the team can restore the system to its original state if your pilot rollout process changes or fails.
The backup plan should define procedures for:
Backing up baseline configurations (the state of a computer before it is upgraded) so a computer can quickly be restored to its prior state.
Backing up servers before they are upgraded.
Backing up the most recent system and user data before you begin switching systems.
Testing the restore process for each of the above backups, using the actual backup files.
In addition, the backup plan should identify who is responsible for performing backups, and should include the schedule for all periodic backups and periodic testing of backups, as well as instructions for labeling and storing all backup files.
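The ownership, schedule, and labeling details described above can be recorded in a structured form so they are easy to review and check. The following Python sketch is one possible way to do that, not a prescribed format: the `BackupJob` class, its field names, the server name, and the `server_kind_date` label convention are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class BackupJob:
    """One entry in a backup plan. All names here are illustrative assumptions."""
    server: str             # computer being backed up
    kind: str               # for example "baseline", "pre-upgrade", or "data"
    owner: str              # person or role responsible for performing the backup
    interval_days: int      # how often the backup runs
    restore_test_days: int  # how often a test restore is performed

    def label(self, run_date: date) -> str:
        """Build a label for the backup file from server, kind, and date."""
        return f"{self.server}_{self.kind}_{run_date:%Y%m%d}"

    def next_backup(self, last_run: date) -> date:
        """Return the date on which the next periodic backup is due."""
        return last_run + timedelta(days=self.interval_days)

    def restore_test_due(self, last_test: date, today: date) -> bool:
        """Return True if the periodic restore test is due or overdue."""
        return today - last_test >= timedelta(days=self.restore_test_days)


# Hypothetical example entry: a daily pre-upgrade backup, restore-tested monthly.
job = BackupJob("FILESRV01", "pre-upgrade", "backup-operator",
                interval_days=1, restore_test_days=30)
print(job.label(date(2003, 3, 28)))        # FILESRV01_pre-upgrade_20030328
print(job.next_backup(date(2003, 3, 28)))  # 2003-03-29
```

Keeping the label derivable from the plan entry means that anyone restoring a file can tell which server, backup type, and date it belongs to without consulting a separate log.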
For more information about creating backup plans, see the Storage Technologies Collection of the Windows Server 2003 Technical Reference (or see the Storage Technologies Collection on the Web at http://www.microsoft.com/reskit).
Creating a Recovery Plan
The recovery plan describes the recovery and rollback process, which allows you to return your production system to whatever earlier state you require. Depending on the severity of a problem encountered in the pilot, you might need to return your production system to a baseline configuration or just roll it back to the state it was in at a particular point in time.
Include the following elements in the recovery plan:
A list of scenarios. Analyze all of the systems involved in the rollout and identify the situations, or scenarios, under which problems are likely to occur. Determine which systems might be affected and the functional dependencies among them so you have a clear understanding of the larger impact that a single failure might have. Use these scenarios to create strategies that identify when and how to run backups and the types of recovery for which you need to plan.
A definition of acceptable downtime. Define how much downtime your organization can accommodate. If your organization cannot afford for systems to go down during normal business hours, you might plan to roll out the pilot, or parts of it, at night or over a weekend. If systems must be operational at all times, you might plan to deploy servers and desktops on new computers and then quickly replace the old ones, instead of upgrading computers.
A list of critical systems and processes. In the event that a failure does occur, you need to know which systems are the most critical and must be brought back online first. If resources such as bandwidth are limited, you need to know which systems have the highest priority and which should yield network bandwidth to them. When evaluating how critical a system or process is, consider factors such as its effect on human health and safety, the legal liability it exposes, the risk to corporate confidentiality, and the cost of replacement.
A recovery strategy. Your recovery strategy should define how you will recover data or systems in each of the scenarios you define in the recovery plan. This might include restoring data from backup tapes, switching over to redundant systems, rolling back to previous configurations, or other strategies. Include an additional procedure for recovering from severe data corruption in your directory service if that becomes necessary. By having your recovery strategy in place, you can quickly restore your production environment to the required state so that work can continue with minimal interruption.
A rollback strategy. The rollback strategy defines how you plan to use backup and recovery procedures to return your pilot or production environment to the state it was in before changes were made. Specify the criteria that a problem should meet to warrant rolling the environment back to its previous state. For example, you might establish a system for classifying the severity of problems and describe which type of response is warranted by certain levels of severity. Also decide whether you need to have different rollback strategies for different types of problems. For example, you might develop one procedure for backing out the entire pilot if the problem is pervasive and another procedure for backing out specific components if the problem is isolated.
The roles and responsibilities for team members. Make sure that every task in the plan is assigned to an appropriate team member, and that the person has the information needed to perform the required tasks. Consider including training in the plan.
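The rollback criteria described in the list above can be written down as an explicit decision rule so that the team applies them consistently. The following Python sketch shows one way this might look. The three severity levels, the `ROLLBACK_THRESHOLD` value, and the response strings are assumptions made for illustration; your own plan defines the actual levels and responses.

```python
from enum import Enum


class Severity(Enum):
    """Hypothetical severity classes; define your own in the rollback strategy."""
    LOW = 1     # cosmetic problem or one with an easy workaround
    MEDIUM = 2  # a function is degraded but work can continue
    HIGH = 3    # work is blocked or data is at risk


# Assumed criterion: problems at or above this level warrant a rollback.
ROLLBACK_THRESHOLD = Severity.MEDIUM


def rollback_response(severity: Severity, pervasive: bool) -> str:
    """Choose a response from the problem's severity and how widespread it is."""
    if severity.value < ROLLBACK_THRESHOLD.value:
        return "no rollback: fix in place"
    if pervasive:
        return "back out the entire pilot"
    return "back out the affected components"


print(rollback_response(Severity.HIGH, pervasive=True))
print(rollback_response(Severity.MEDIUM, pervasive=False))
print(rollback_response(Severity.LOW, pervasive=True))
```

Separating the threshold from the scope check mirrors the two decisions in the rollback strategy: first whether a problem warrants a rollback at all, and then which back-out procedure applies.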
Have the backup and recovery plan reviewed by the project team and by those responsible for potentially affected systems. After the plan has been approved, test it to ensure that the processes you put in place work as expected.