Overview of Site Maintenance, Backup, and Recovery
To ensure continued support for clients, you should have an effective maintenance, backup, and recovery plan for the sites in the hierarchy. Incorporate the planning and implementation of maintenance, backup, and recovery into the overall hierarchy deployment phase, as illustrated in Figure 1.1.
After you have completed the SMS hierarchy deployment phase, SMS 2003 sites require regular maintenance to provide services effectively and continuously. Regular maintenance ensures that the hardware, software, and the SMS database in your sites function properly and efficiently. When the site performance is optimal, the risk of site failure is greatly reduced.
Various maintenance and monitoring resources are available. SMS provides several predefined site maintenance tasks that you can use to regularly maintain the SMS site database and keep it healthy. An important predefined maintenance task is the Backup SMS Site Server task, which automates site backup. You can also develop maintenance tasks customized to your organization.
During the hierarchy deployment planning phase, you should develop a maintenance and monitoring plan for each site in the hierarchy. After installing and setting up your SMS sites and hierarchy, you should start implementing that plan.
For detailed information about site maintenance, see Maintaining and Monitoring Sites later in this document.
Site Failure and Site Recovery
Even with the best maintenance plan and practices, sites can fail. A failure in an SMS site can happen for various reasons, such as hardware failure, operating system failure, or data corruption. When a failure occurs at a site, that site can no longer provide some or all of the functionality it usually provides. The site also loses some or all of its data. Appendix A: The Effect of a Site Failure provides detailed information about how clients are affected by a site failure.
It is possible to repair some problems that cause failure in a site without reinstalling the site, but when the site cannot be repaired, you must reinstall the site using the site code of the failed site.
SMS has a hierarchical structure, and it requires that connected sites be synchronized at all times so that they can communicate properly and manage their clients. When a site fails, it affects other sites in the hierarchy. You cannot recover a failed site merely by running SMS Setup, because the Setup program cannot restore or synchronize data.
When you reinstall a site with a site code that was previously used in the hierarchy, you must perform additional preparation steps before running Setup, and you must repair and synchronize the data after running Setup. These steps ensure that the new site reconnects to its parent and child sites without corrupting data at the new site or at any other site in the hierarchy. Without them, the reinstallation will most likely result in data corruption throughout the hierarchy that is nearly impossible to repair. This whole operation is referred to as site recovery.
Recovering a failed site involves restoring the site’s functionality, and then recovering and resynchronizing as much data as possible. To assist in a recovery operation, SMS provides Recovery and Repair tools, which use reference sites to recover data.
For information about recovering a site, see Recovering a Site later in this document.
Backing Up Sites
When an SMS site fails, it is important that you are able to quickly recover that site with as little data loss as possible.
Backing up the sites in your hierarchy is the most important step toward minimizing data loss and ensuring a successful recovery if a site fails. Although it is possible to recover a site without a backup snapshot, recovering with a backup snapshot results in the least data loss and a less complex recovery process.
Having a recent site backup snapshot does the following:
Simplifies a recovery operation
Shortens the non-operational time of the site
Helps reduce the amount of data lost
SMS stores most of the site data in the registry, in system files, and in Microsoft SQL Server™ databases. For SMS to function properly, data integrity and synchronization among all of these data stores are absolutely necessary. Therefore, when backing up a site, you must back up all of those data stores together as a single snapshot.
Backing up only one of these data stores (such as the SMS site database) is not sufficient as a backup strategy. You cannot recover a site using a partial site backup because the site’s data will be out of sync. Also, you cannot use the System Restore feature in the Microsoft Windows Server™ 2003 family to recover a site because it does not restore all necessary data.
Backup SMS Site Server task: To ensure that backing up a site is as easy as possible, SMS provides the Backup SMS Site Server task, referred to as the SMS backup task. This is a predefined maintenance task, and you can enable and configure it from the SMS Administrator console.
Backup snapshot: The Backup SMS Site Server task creates a backup snapshot, which is a snapshot of all of the site data from the site server, the SMS site database server, and the site provider.
Backup destination: The site’s backup snapshot is stored at a location that you specify, referred to as the backup destination.
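Because a partial backup cannot be used for recovery, it can be useful to confirm that a snapshot contains data from all three sources before relying on it. The following Python sketch is illustrative only and is not part of SMS. The function name and the subfolder names (SiteServer, SiteDBServer, ProviderServer) are assumptions made for this example, so adjust them to the layout actually found at your backup destination.

```python
from pathlib import Path

# Hypothetical subfolder names, one per data source (site server, SMS site
# database server, site provider). The real snapshot layout may differ.
EXPECTED_PARTS = ("SiteServer", "SiteDBServer", "ProviderServer")


def missing_parts(snapshot_dir, expected=EXPECTED_PARTS):
    """Return the expected subfolders that are absent or empty."""
    root = Path(snapshot_dir)
    missing = []
    for name in expected:
        part = root / name
        # A part counts as present only if it is a directory with at least one entry.
        if not part.is_dir() or not any(part.iterdir()):
            missing.append(name)
    return missing
```

An empty list returned by missing_parts means that every expected part is present and non-empty. It does not prove that the data stores are synchronized with each other, which only a snapshot taken by the backup task can provide.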
For detailed information about site backup, see Backing up a Site later in this document.
Archiving the Site Backup Snapshot
The first time the SMS backup task runs, it produces a backup snapshot, which you can use to recover your system in the event of a failure. When the backup task runs again during subsequent cycles, it creates a new backup snapshot that overwrites the previous snapshot. As a result, the site has only a single backup snapshot. This can be risky because an earlier backup snapshot won’t be available if you need it.
It is therefore recommended that you back up the backup snapshot itself. This copy is referred to as an archive. As a best practice, keep multiple archives of the backup snapshots for the following reasons:
It is common for media to fail, get misplaced, or have only a partial backup stored on it. Recovering a failed site from an older backup is better than recovering with no backup at all.
A corruption in the site can go undetected for several backup cycles. The SMS administrator must be able to go back several cycles and use the backup snapshot from before the site became corrupted.
The site might have no backup snapshot at all if, for example, the Backup SMS Site Server task fails. Because the backup task removes the previous backup snapshot before it starts to back up the current data, a failed run leaves the site without a valid backup snapshot.
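The reasons above suggest copying each new snapshot to a separate location and keeping several generations. The following Python sketch shows one possible way to do that. It is not an SMS tool, and the function name, the "snapshot-" folder prefix, and the default of five retained archives are assumptions chosen for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_snapshot(snapshot_dir, archive_root, keep=5, now=None):
    """Copy the snapshot to a timestamped archive folder, then prune old archives."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    snapshot = Path(snapshot_dir)
    root = Path(archive_root)
    # Refuse to archive a missing or empty snapshot (for example, after a failed backup).
    if not snapshot.is_dir() or not any(snapshot.iterdir()):
        raise ValueError(f"no usable backup snapshot at {snapshot}")
    root.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = root / f"snapshot-{stamp}"  # hypothetical naming scheme
    shutil.copytree(snapshot, target)    # raises if the target already exists
    # The timestamp format sorts chronologically, so the oldest archives come first.
    archives = sorted(
        p for p in root.iterdir() if p.is_dir() and p.name.startswith("snapshot-")
    )
    for old in archives[:-keep]:
        shutil.rmtree(old)
    return target
```

You could run a script like this after each backup cycle completes. Because it raises an error instead of archiving an empty folder, a failed backup run does not cause an older, valid archive to be pruned.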
For more information about archiving the backup snapshot, see Develop a Backup and Archive Strategy later in this document.
Why Plan for Site Maintenance, Backup, Archive, and Recovery?
To avoid the loss of critical data, and to ensure the least impact on the site, you must plan and prepare for both backup and recovery operations. To back up sites effectively and efficiently after they are deployed, it is essential that you plan for backup and recovery during the site hierarchy planning phase.
If your site fails and you do not have a backup and recovery plan, the impact of the failure is greater. The site server remains out of service longer, which reduces the level and quality of service to clients.
If you do not have a backup and recovery plan, then:
The risk of site failure is higher.
The impact of failure on your site is greater.
A recovery process is more complex.
A recovery process takes longer.
More data is lost during a recovery process.
For detailed information about planning for backup, archive, and recovery, see Planning for Site Maintenance, Backup, Archive, and Recovery later in this document.