Recovering a Site

Published: September 1, 2004

A site can experience different problems. Some problems might be easy to repair, and others might be more serious, requiring a total recovery operation to regain the site’s functionality. This section describes the SMS site recovery operation. Before you decide to perform a site recovery operation, you must troubleshoot the site and determine whether a recovery operation is the appropriate remedy.

Important

Due to the complexity of the procedures involved in recovery, the accuracy that is required to carry out these procedures, and the critical importance of a successful recovery, the information that follows is intended for experienced SMS administrators, Microsoft Consulting Services consultants, solution providers, and technical support engineers who have a deep technical knowledge of SMS and the environment in which it is used.

On This Page

Determining Whether a Site Recovery Operation Is Necessary
Supported Configurations and Recovery Scenarios
The Recovery Operation
Recovering a Secondary Site
Recovering a Site Configured With SQL Replication
Recovering a Site Installed on a Server Running Terminal Services

Determining Whether a Site Recovery Operation Is Necessary

An SMS site may be exhibiting various failure symptoms. The severity of the failure symptoms is not always a good indicator of the severity of the underlying problem. It is important to correctly diagnose the problems that the site is experiencing, and then to recover the site if appropriate. This section provides some guidelines that can help you determine whether a recovery operation is necessary.

SMS stores most of the site data in the registry, system files, and Microsoft SQL Server™ databases. For SMS to function properly, data integrity and synchronization among all data stores are essential. Therefore, even if only one data store is corrupted, a snapshot of all data stores must be restored to prevent data corruption. Restoring all of the site’s data must be performed as part of a recovery operation.

Based on the data integrity requirement of SMS, if your site is failing due to any of the following reasons, you must perform a recovery operation:

  • The computer that the site server or the SMS site database server is running on has a failing operating system.

  • The drive that the operating system, SQL Server, or SMS is installed on is failing.

  • The file system of the site server has become corrupted.

  • SQL Server is failing and it must be restored.

  • The SMS site database has become corrupted.

Supported Configurations and Recovery Scenarios

You can recover a site with a recent or an old backup snapshot, or even without a backup snapshot. The amount of data restored depends on whether there is a recent site backup snapshot and whether SMS can obtain data from other sites in the hierarchy.

When recovering a site without a recent backup snapshot, expect significant data loss. Without a recent backup snapshot, the SMS site database is not restored, and all inventory, status message, and object definition data are lost. Using a recent backup snapshot dramatically increases the amount of data recovered, although some data loss is still expected.

Restoring a site by using a recent backup snapshot is supported only if:

  • The backup snapshot is restored to the original site that was backed up.

  • The version and service pack of SMS are the same for both the original site and the restored site.

  • The version and service pack of SQL Server are the same for both the original site and the restored site.

  • The recovered site name is identical to the site’s original name.

  • The recovered site belongs to the same domain as the original site.

  • No site accounts were changed between backup and restore.

  • The operating system of the original site and the recovered site are identical.
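The conditions above can be treated as a pre-restore checklist. The following Python sketch compares a hand-recorded description of the original site with the site that is about to be restored. The `SiteProfile` class, its field names, and the `restore_blockers` function are hypothetical and are not part of SMS; the values would have to be gathered manually, for example from the hierarchy configuration document. Recording site accounts as a set of names is a simplification that does not detect changed passwords.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SiteProfile:
    """Hypothetical, hand-recorded description of a site (not an SMS API)."""
    site_server: str          # server that hosts the site
    site_name: str
    domain: str
    sms_version: str          # SMS version and service pack, as free text
    sql_version: str          # SQL Server version and service pack
    operating_system: str
    site_accounts: frozenset  # account names only; a simplification


def restore_blockers(original: SiteProfile, target: SiteProfile) -> list:
    """Return the names of the recorded properties that differ.

    An empty list means that none of the recorded conditions is violated.
    The comparison is an exact, case-sensitive match on each field.
    """
    return [
        f.name
        for f in fields(SiteProfile)
        if getattr(original, f.name) != getattr(target, f.name)
    ]
```

Any name returned by `restore_blockers` points to a condition in the list above that should be resolved before the backup snapshot is restored.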

If you plan to recover a site without restoring a backup snapshot, then SMS supports:

  • Recovering the site with the same SMS service pack as the failed site had, or with a later SMS service pack.

  • Recovering the site with the same version or service pack of SQL Server as the failed site had, or with a later SQL Server service pack.

  • Recovering the site with the same Microsoft Windows NT® service pack as the failed site had, or with a later Windows NT service pack.

SMS does not support the following recovery scenarios:

  • Restoring a backup snapshot that was created before an SMS upgrade, to the upgraded site.

  • Restoring a backup snapshot to a site server on which the operating system has been upgraded.

    Note

    If you need to recover a site on which the operating system has been upgraded since the last backup, then reinstall the operating system that existed prior to the upgrade before restoring the backup snapshot.

  • SMS site systems installed on Windows 2000 servers running Terminal Services.

  • SMS site systems installed on Windows 2000 servers running Terminal Services Client.

The Recovery Operation

To successfully recover a site, you must prepare for the recovery operation and perform post-recovery tasks. This section describes the phases of a complete recovery operation. You can use the RecoveryOperationSteps.xls spreadsheet to track the progress of the recovery operation.

For specific recovery information for any SMS feature packs or non-Microsoft add-ins that were installed on the failing site, see the respective product documentation.

In This Section:

  • Preparing for a recovery operation

  • Recovering a site

  • Managing the site after recovery

To prepare for a recovery operation:

  1. Notify other SMS administrators and SMS users about the site failure.

  2. Analyze the SMS data traffic issues during site recovery and reduce the traffic load as much as possible.

  3. Analyze security issues.

  4. If you have been regularly backing up the site, then ensure that:

    1. You can access the most recent backup snapshot.

    2. The log file of the most recent site backup operation indicates that the site was backed up successfully.

    3. If any integrity tests were performed to ensure the integrity of the site’s backup snapshot (such as a DBCC test), the log files indicate that these tests passed successfully.

    4. The Backup SMS Site Server task is not scheduled to run during the recovery operation. If it runs, it will interfere with the recovery operation.

  5. If a valid site backup snapshot is not available for the site recovery operation, determine the date that the site was originally installed or upgraded. Later, use that date as the site backup date when you are prompted by the SMS Site Repair Wizard.

  6. Obtain the most recent copy of the hierarchy configuration document.

  7. Ensure that the Recovery Expert Web site is set up and that you can connect to that site and run the Recovery Expert. For information about how to set up and run the Recovery Expert, see the “Setting Up a Recovery Expert Web Site and Running the Recovery Expert” section earlier in this document.

  8. Ensure that you can run the rest of the recovery and repair tools from the failing site, or from a remote server.
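One of the checks above is that the Backup SMS Site Server task does not run during the recovery operation. As a planning aid, the following Python sketch lists the backup start times that would fall inside a planned recovery window. The function name and parameters are hypothetical, and the sketch assumes that the task recurs at a fixed interval; it does not read the actual task schedule from SMS.

```python
from datetime import datetime, timedelta


def backups_in_window(first_run: datetime, interval: timedelta,
                      window_start: datetime, window_end: datetime) -> list:
    """List scheduled backup start times inside [window_start, window_end)."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    run = first_run
    if run < window_start:
        # Skip ahead to the first run at or after window_start
        # (ceiling division of the gap by the interval).
        skipped = -(-(window_start - run) // interval)
        run += skipped * interval
    hits = []
    while run < window_end:
        hits.append(run)
        run += interval
    return hits
```

If the returned list is not empty, the backup task should be rescheduled or disabled until the recovery operation is complete.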

To recover a site

After performing all preparation steps, you are ready to start the actual site recovery operation, as follows:

  1. Designate reference sites that the SMS Site Repair Wizard can use.

  2. Run the Recovery Expert from the Recovery Expert Web site and print the site recovery task list.

  3. Recover the site by performing all the tasks prescribed by the Recovery Expert, in the order that they are listed. Use other recovery and repair tools as indicated by the recovery tasks.

  4. Verify that the site is successfully recovered by following the respective Recovery Expert tasks.

  5. Restore all custom files that were manually backed up, such as custom SMS Administrator console files (.msc files), custom MOF files (such as SMS_def.mof), and Supplemental Reports.

  6. If the software update management feature is used, then restore the Definitive Software Library to its original folder. For information about restoring software update management packages without the Definitive Software Library or without the related objects, see Managing the Site After Recovery.

  7. Investigate the cause of the site failure, and make any necessary adjustments to ensure that this failure will not repeat.

  8. Schedule recurring backups on the recovered site.

Managing the Site After Recovery

After you have successfully recovered the site, there might be issues with large amounts of data that accumulated while the site was offline. Also, there might be data that could not be recovered during the recovery operation that you might still be able to manually recover.

To manage the site after recovery:

  1. Estimate the Amount of Pending Data

  2. Mitigate Data Loss:

    • Recover collections with direct membership rules

    • Recover software update management packages

    • Restore distribution points that were recovered from a reference site

Estimating the Amount of Pending Data

After the site is fully recovered, even if you attempted to reduce the amount of data that accumulates, the amount of data waiting to be processed might still be more than the site’s usual amount of data.

It might be helpful to estimate the amount of pending data on the recovered site. On a parent or child site server, check the SMS\inboxes\schedule.box folder for files with a .job extension. These files represent pending jobs to send data of various types, including packages, site control changes, and status messages. A large number of files in this folder indicates a large amount of pending data. Some of this data is contained in subfolders of SMS\inboxes\schedule.box. Other jobs, such as package replication, transfer packages from the compressed package folder \SMSPKG, and that data might be much larger.
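The check described above can be scripted. The following Python sketch counts the .job files in a schedule.box folder and totals the size of the files in its subfolders. The function name is hypothetical, and the folder path must be supplied by the caller because the SMS installation drive varies by site. The result is only a rough lower bound, because it does not include package data waiting in \SMSPKG.

```python
from pathlib import Path


def pending_job_summary(schedule_box: Path) -> tuple:
    """Return (number of .job files, bytes held in subfolders).

    Only .job files directly inside schedule_box are counted; the byte
    total covers files in any subfolder below it.
    """
    job_count = sum(
        1 for p in schedule_box.iterdir()
        if p.is_file() and p.suffix.lower() == ".job"
    )
    subfolder_bytes = sum(
        p.stat().st_size
        for p in schedule_box.rglob("*")
        if p.is_file() and p.parent != schedule_box
    )
    return job_count, subfolder_bytes
```

Running the function several times over a few hours shows whether the backlog is shrinking, which is often more informative than a single count.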

Mitigating Data Loss

Even in a successful recovery operation, some site data might be lost. When the site is fully functional after the recovery operation is completed, you can mitigate some data loss by updating the site configuration, regenerating inventory, and redistributing packages. Although you can mitigate most of the data loss after a site failure, it is better to invest efforts in providing valid, recent backups.

Recovering collections with direct membership rules

The Site Repair Wizard does not properly restore collections with direct membership rules that did not exist in the site backup snapshot. After the wizard restores those collections, they contain data that is not valid. After recovering the site, you can mitigate this by doing the following:

  1. Delete the collection that is not valid on the recovered site and all child sites to which the collection has propagated.

  2. Recreate the collection.

  3. If any programs were advertised to this collection, delete the advertisements to the collection that is not valid, and then advertise the programs again to the new collection.

Recovering software update management packages

Software update management-related objects, such as package and advertisement objects, are restored during the site recovery operation. However, if the Definitive Software Library was not backed up, then these objects are useless and you must remove them. After removing those objects, you can use the Distribute Software Updates Wizard to recreate those packages.

If a backup of the Definitive Software Library exists, but the related objects are missing, you can restore the Definitive Software Library and then recreate those objects. Those objects might be missing if, for example, they cannot be recovered during a site recovery operation, or if they are accidentally deleted.

To recreate package objects for existing software update management package source files

  1. If necessary, restore the Definitive Software Library to its original folder.

  2. Create a new software update management package.

  3. Import the XML file from the original package source folder.

  4. Specify the folder name of the package source files so it is identical to the original folder name.

Restoring distribution points that were recovered from a reference site

The SMS Site Repair Wizard can use reference sites to recover package definitions that were created after the last site backup. However, the SMS Site Repair Wizard does not recover the distribution points associated with those packages. If you select Update the distribution point on the site server on the Package Recovery page in the wizard, the wizard updates only the distribution points for packages that are recovered from the backup snapshot.

Therefore, after the SMS Site Repair Wizard completes the site repair phase, you must manually update, and then add, distribution points for each package that was recovered from a reference site.

Note

Although the package initially has no distribution points, you must update the package before adding distribution points to the package.

To restore distribution points that were recovered from a reference site

  1. In the SMS Administrator console, navigate to Site Database, Packages.

  2. Select a package.

  3. On the Action menu, point to All Tasks, and then click Update Distribution Points.

  4. From <package name>, navigate to Distribution Points.

  5. On the Action menu, point to New, and then click Distribution Points.

For recovered packages that were created at another site, you need to perform these steps at the site where the package was created.

Recovering a Secondary Site

You can recover a secondary site in the same manner that you recover a primary site, by using the SMS Site Repair Wizard. However, there are some differences between recovering a primary site and recovering a secondary site, as follows:

  • When recovering a secondary site, the wizard is not optimized to recover data from a parent reference site. Therefore, when using the wizard to recover a secondary site, do not select Recover data from parent site on the Parent Site Connection page.

  • If you made any changes to the site settings on the secondary site after the last site backup, perform the following steps before you recover a secondary site:

    Important

    If your recovery scenario requires you to reinstall the secondary site, then perform these steps before reinstalling the site.

    • To recover the site control file of the secondary site, on the parent of the secondary site, run the following command from the SMS language folder (such as \SMS\bin\i386\00000409). This command writes the secondary site's site control file (Sitectrl_SiteCode.ct0) to the root of the SMS drive on the primary site server:

      preinst /dump <secondary site code>
      
    • Replace the sitectrl.ct0 file in the backup snapshot, located in SiteCodeBackup\SiteServer\SMSServer\inboxes\sitectrl.box, with the site control file recovered from the parent site.

    • Rename the recovered site control file to sitectrl.ct0.

  • The wizard might incorrectly display the address to the parent site on the Parent Site Connection page. Even if the address is displayed incorrectly, the address information will be recovered from the backup snapshot.

  • The SMS Site Repair Wizard does not repair client access points of secondary sites. If the secondary site has a remote client access point, see the Recovery Expert task “Delete Remote CAPs.”
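The replace-and-rename steps for the site control file, described earlier in this list, can be combined into a single copy operation. The following Python sketch copies the file written by preinst /dump into the backup snapshot under the name sitectrl.ct0. The function name and parameters are hypothetical; the caller supplies both the location of the dumped file and the root folder of the backup snapshot. The sketch overwrites the snapshot's existing sitectrl.ct0, so it assumes that a separate copy of the snapshot is kept elsewhere.

```python
import shutil
from pathlib import Path


def place_site_control_file(dumped_file: Path, snapshot_root: Path) -> Path:
    """Copy the dumped site control file into the snapshot as sitectrl.ct0.

    dumped_file   -- the Sitectrl_<SiteCode>.ct0 file written by preinst /dump
    snapshot_root -- the <SiteCode>Backup folder of the backup snapshot
    """
    target_dir = (snapshot_root / "SiteServer" / "SMSServer"
                  / "inboxes" / "sitectrl.box")
    if not dumped_file.is_file():
        raise FileNotFoundError(f"dumped file not found: {dumped_file}")
    if not target_dir.is_dir():
        raise NotADirectoryError(f"snapshot folder not found: {target_dir}")
    target = target_dir / "sitectrl.ct0"
    # Copying under the new name replaces and renames in one step.
    shutil.copy2(dumped_file, target)
    return target
```

The function refuses to run if either path is missing, which guards against pointing it at the wrong snapshot folder.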

Recovering a Site Configured With SQL Replication

When recovering a site that was configured with SQL replication, you must perform additional steps as follows.

Warning

The following procedure is complex. It is recommended that only experienced SQL Server users perform it.

To recover a site configured with SQL replication:

  1. Use the Disabling Publishing and Distribution Wizard to stop SQL replication as follows (some steps do not apply if the database is corrupted):

    1. Delete the subscription.

    2. Delete the subscription database.

    3. Delete the publication.

  2. Recover the site.

  3. Re-configure SQL replication as follows:

    1. If it is enabled, disable the site database server as a publisher.

    2. Re-enable the site database server as a publisher.

    3. Recreate the publication, subscription, and subscription databases.

For more information about configuring SQL replication, see Scenarios and Procedures for Microsoft Systems Management Server 2003: Planning and Deployment on Microsoft TechNet.

For more information, see the Microsoft SQL Server™ Help.

Recovering a Site Installed on a Server Running Terminal Services

  • If you recover an SMS site that is installed on a server running Terminal Services in application server mode, you must set the Terminal Server to install mode for the duration of the recovery process.

  • If you use the SMS Site Repair Wizard on a server running Terminal Services in application server mode, and you are recovering a remote site server, the wizard is unable to display the Help for the wizard. To view Help for the wizard, set Terminal Server to install mode before starting the wizard, as follows:

    1. Open a command prompt window.

    2. Type the following and then press ENTER:

      change user /install