Appendix F: Recovery and Repair Tools
To assist you in a successful site recovery operation, Microsoft® Systems Management Server (SMS) provides various recovery and repair tools, the main one being the Recovery Expert. These tools are automatically installed with SMS, with the exception of the Recovery Expert, which administrators must set up before they can use it.
During a site recovery operation, recovery and repair tools greatly simplify some recovery tasks, reduce the risk associated with editing low-level data, and perform tasks that are impossible to perform by using any other method.
Failing to use recovery tools appropriately can significantly interrupt site operations, or cause unrecoverable loss of data.
The recovery and repair tools set include the following:
Recovery Expert: Guides you through the recovery process by generating a recovery task list based on the site’s specific failure scenario and site configuration.
SMS Site Repair Wizard: Automates some of the Recovery Expert’s tasks, and helps recover some of the data that was not backed up. Using the SMS Site Repair Wizard eliminates user errors that might occur when performing complex tasks.
ACL Reset (ACLreset.exe): Resets access control lists (ACLs) used by the SMS Server Connection account and remote site systems to access the site server.
Hierarchy Maintenance tool (PreInst): Passes commands, such as site repair or site diagnostics commands, to the SMS Hierarchy Manager while the SMS Hierarchy Manager is running.
Unenforce Software Metering tool (Unenforce.exe): Overrides software metering enforcement rules. This utility is needed only when recovering an SMS 2.0 site, and it is included on the SMS 2003 CD for compatibility reasons. For more information about the Unenforce Software Metering tool, see SMS 2003 Help.
Recovery Expert
The Recovery Expert is a Web-based recovery tool that guides you through a site recovery operation. When you run the Recovery Expert, it scrolls through a series of Web pages with questions about the site failure scenario and the site configuration. The Recovery Expert then evaluates your answers and presents a recovery task list. Perform these tasks, in the order that they are prescribed, to recover the failed site.
Recovery tasks vary from site to site and from one failure scenario to another, but most recovery scenarios consist of the following phases:
Rebuilding the failed servers.
Restoring the site data.
Repairing and re-synchronizing data. These are the core tasks of a site recovery, and they are required to prevent interruption of operations and corruption of data.
Verifying the success of the recovery by testing the functionality of the recovered site.
Each recovery task belongs to one of these phases. A recovery task list produced by the Recovery Expert typically contains tasks from all phases.
Setting Up, Using, and Running the Recovery Expert
To use the Recovery Expert, you must first set up the Recovery Expert Web Site, which hosts the Recovery Expert.
To set up and run the Recovery Expert
Allocate a server to set up the Recovery Expert Web Site.
Set up a Recovery Expert Web Site.
Run the Recovery Expert.
SMS Site Repair Wizard
The SMS Site Repair Wizard automates complicated recovery tasks and tasks that would be impossible to perform otherwise. Using this tool simplifies site recovery, increases the amount of data recovered, saves time, and reduces the risks associated with recovery.
The SMS Site Repair Wizard is used in conjunction with the Recovery Expert during a site recovery operation. Using the SMS Site Repair Wizard during site recovery is strongly recommended. Each recovery task in the Recovery Expert indicates whether it can be automated by using the SMS Site Repair Wizard.
Running the SMS Site Repair Wizard independently is not recommended. Always run the Recovery Expert first, and then run the SMS Site Repair Wizard as directed by the Recovery Expert.
Depending on the site backup schedule and the activity at the site, the latest site backup snapshot might not include the most recent modifications to the site. Any changes made after the most recent site backup are not included in the site’s backup snapshot. As a result, after restoring the site backup snapshot, the site can be out of synchronization with the rest of the hierarchy. For example, the site’s backup snapshot might contain information about a child site that has since moved to a different parent site.
After restoring the site backup snapshot, the SMS Site Repair Wizard attempts to restore as much as possible of the data that was not backed up. The wizard can restore objects such as collections based on query rules, packages, programs and advertisements, but cannot restore data such as software metering rules, reports, and custom queries. The SMS Site Repair Wizard restores data by restoring site settings and synchronizing site objects with parent and child sites.
The SMS Site Repair Wizard can use reference sites to recover data, and if possible, it uses the site’s site control file to recover site configuration data. Because each parent site contains a copy of site control files of all its lower level sites, the wizard obtains a copy of the failing site’s site control file from its parent site. The wizard then uses the configuration information from the file to reconfigure the site exactly as it was configured before it failed.
The SMS Site Repair Wizard can use reference sites to recover package definitions that were created after the last site backup. However, the SMS Site Repair Wizard does not recover the distribution points associated with those packages. If you select Update the distribution point on the site server on the Package Recovery page in the wizard, it updates only the distribution points for packages which are recovered from the backup snapshot. To mitigate that data loss, see Restore Distribution Points Which Were Recovered from a Reference Site.
Restoring Site Settings
When running the SMS Site Repair Wizard, the user is prompted to enter any changes to site settings that occurred after the most recent site backup. The wizard then restores site settings to what they were before the failure, according to the user input. For example, the administrator can specify that a child site no longer reports to the recovering site. The SMS Site Repair Wizard then deletes all objects associated with that child site from the recovering site.
To restore site settings, the wizard also uses the parent site, if one exists. The wizard obtains the most recent copy of the recovering site’s site control file. It then uses this file to configure the recovering site.
The SMS Site Repair Wizard synchronizes objects between the recovering site and other sites in the hierarchy, as follows:
The wizard restores control to objects, such as collections based on query rules, packages, programs, and advertisements, that were created on the failing site after the latest site backup was completed, but before the site failed. After restoring the site’s backup snapshot, the recovering site does not contain those objects because they are missing from the site’s backup snapshot.
Objects are regularly replicated from one site to other sites in the SMS hierarchy. This allows the wizard to use designated reference sites to replicate these objects from other sites to a recovering site. After these objects are restored to the recovering site, the recovering site has full control over these objects and they are synchronized between lower sites in the hierarchy and the recovering site.
The wizard deletes objects at the recovering site that were inherited from upper level sites, but were then deleted at the originating site. Objects that were created at upper level sites might have been deleted, while the site’s most recent backup snapshot still contains them. After restoring the site’s backup snapshot, the recovering site contains these objects. The wizard checks all inherited objects that exist on the recovering site. It then checks if these objects exist at the parent site. Inherited objects that exist on the recovering site, but no longer exist on the parent site, are deleted.
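The cleanup of inherited objects described above amounts to a set difference. The following is only an illustrative sketch of that comparison, not the wizard's actual implementation; the function name and the object IDs are hypothetical.

```python
def find_stale_inherited(recovering_inherited, parent_objects):
    """Return inherited object IDs that exist on the recovering site
    but no longer exist on the parent site."""
    return sorted(set(recovering_inherited) - set(parent_objects))


# Hypothetical object IDs, for illustration only.
restored_inherited = ["PKG-0001", "PKG-0002", "ADV-0007"]  # from the backup snapshot
parent_current = ["PKG-0001", "ADV-0007", "PKG-0009"]      # what the parent holds now

stale = find_stale_inherited(restored_inherited, parent_current)
print(stale)  # objects the recovering site should no longer keep
```

In this example, PKG-0002 was deleted at an upper level site after the backup was taken, so it is the only object flagged for deletion.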
During regular site management you add, modify, and delete configuration data, such as software distribution objects. Any changes to these objects, such as changes to software distribution-related object definitions, are replicated down the hierarchy. Thus, a record of these changes exists on lower level sites. Any of those lower level sites that is also a primary site can be a reference site during a recovery operation of the originating site. A reference site helps recover object definitions, such as software distribution-related object definitions.
In the above diagram, Site A and Site C can be reference sites when recovering the central site. Site C can be a reference site to help recover Site A or the central site. No site can be a reference site when recovering Site B or Site D. Site B and Site D, which are secondary sites, cannot be reference sites.
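The eligibility rule in this example can be sketched as a small hierarchy walk. This is a minimal illustration under stated assumptions: the Site type and the reference_candidates function are hypothetical, and the placement of the secondary sites (Site B under Site A, Site D under Site C) is assumed, because the text does not specify where they attach.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Site:
    code: str
    is_primary: bool
    parent: Optional[str]  # None for the central site


def reference_candidates(sites, failed_code):
    """Return the codes of primary sites that sit below the failed site."""
    by_code = {site.code: site for site in sites}
    candidates = []
    for site in sites:
        if not site.is_primary:
            continue  # secondary sites cannot be reference sites
        ancestor = site.parent
        while ancestor is not None:
            if ancestor == failed_code:
                candidates.append(site.code)
                break
            ancestor = by_code[ancestor].parent
    return candidates


# Hierarchy from the example; parents of B and D are assumptions.
hierarchy = [
    Site("CENTRAL", True, None),
    Site("A", True, "CENTRAL"),
    Site("B", False, "A"),
    Site("C", True, "A"),
    Site("D", False, "C"),
]

for failed in ("CENTRAL", "A", "B", "D"):
    print(failed, reference_candidates(hierarchy, failed))
```

A site that has no primary site beneath it gets an empty list, which is the planning gap that a dedicated reference site is meant to close.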
Role of Reference Sites During Recovery
On large, busy sites, where changes to the site data are constant, there can be many changes between the time of the last site backup and the time the site fails. When recovering such sites, it might be impossible to repeat the configuration changes that were not backed up.
If such a site fails, all the objects, such as software distribution-related objects, that you created since the last backup (in addition to a few possible last minute changes that did not have a chance to replicate before the site failed), exist on child sites. However, you can no longer manage these objects — you cannot delete or modify them, and they are considered orphaned.
In this case, a recovery operation can be simplified if the SMS Site Repair Wizard can use designated reference sites to regain control of these SMS objects. During a recovery operation, you can designate any child primary site under the failed site as a reference site. Recovery tools use the data at the reference site to clone objects (including serial numbers and object IDs). The recovering site regains control of these objects after they are stored in the SMS site database.
The SMS Site Repair Wizard is not optimized to recover a large number of collections from reference sites, and therefore this operation can take a significant amount of time. For example, when recovering sites that have more than 50 such collections, it can take up to seven minutes to recover each collection.
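To put that figure in perspective, a rough worst-case estimate can be worked out as follows. This is a back-of-the-envelope sketch only: it assumes the seven-minute upper bound applies uniformly to every collection, and the collection count of 60 is a hypothetical example.

```python
MAX_MINUTES_PER_COLLECTION = 7  # upper bound quoted in the text


def worst_case_minutes(collection_count):
    """Worst-case time to recover the given number of collections."""
    return collection_count * MAX_MINUTES_PER_COLLECTION


minutes = worst_case_minutes(60)  # hypothetical site with 60 collections
print(f"{minutes} minutes (about {minutes / 60:.1f} hours)")
```

At this rate, a site with 60 collections to recover from reference sites could need up to 420 minutes, or about seven hours, for this step alone.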
Planning for Reference Sites
Any primary site that is lower in the hierarchy than a failed site can be a reference site. If all important sites, including the central site, in your hierarchy have child primary sites, then there is no need for any additional planning. Lower-tier primary sites are especially useful, because they can serve as reference sites when recovering any site above them.
If your hierarchy plan includes important sites to which only secondary child sites are connected, these important sites will not have a reference site that recovery tools can use. In this case, especially if the site is the central site, it is recommended that you set up an additional child primary site to serve as a dedicated reference site. Alternatively, you can designate sites at the lowest tier of the hierarchy exclusively as reference sites. Do not use these sites for management purposes; use them only as a repository for replicated data from higher level sites.
Setting up a site to serve strictly as a reference site can be relatively inexpensive because it is not necessary that a dedicated reference site manage any clients, or run any SMS features.
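During hierarchy planning, the gap described above can be checked mechanically. The sketch below is an illustration under assumptions: the site codes, the dictionary layout, and the function name are hypothetical, and the check flags every primary site that has no child primary site, leaving it to the planner to decide which of those sites are important enough to warrant a dedicated reference site.

```python
# Hypothetical hierarchy plan: site code -> (site type, parent site code).
planned_sites = {
    "CEN": ("primary", None),
    "P01": ("primary", "CEN"),
    "S01": ("secondary", "P01"),
    "S02": ("secondary", "P01"),
}


def sites_without_reference(sites):
    """Return primary sites that have no child primary site."""
    parents_of_primaries = {
        parent
        for kind, parent in sites.values()
        if kind == "primary" and parent is not None
    }
    return sorted(
        code
        for code, (kind, _parent) in sites.items()
        if kind == "primary" and code not in parents_of_primaries
    )


print(sites_without_reference(planned_sites))
```

Here P01 is flagged because only secondary sites report to it, while CEN is covered by P01.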
Designating Reference Sites
When planning for reference sites during the hierarchy planning phase, and when designating reference sites during a recovery operation, follow these guidelines:
A reference site must be a child primary site.
Plan to have a reference site at each tier of the hierarchy.
The number of objects that can be replicated down the hierarchy before a site failure depends on network speed and timing. A reliable, high quality network connection between the recovering site and its reference site ensures that:
Definitions are replicated quickly. If a site fails, chances are higher that objects created immediately before failure are replicated to lower level sites.
During a recovery operation, the wizard can quickly obtain object definitions from the reference site.
Multiple reference sites may increase the number of objects recovered. Designate reference sites as follows:
One to two reference sites are sufficient with a high quality, reliable connection.
Three to five reference sites are needed if the connections are not of the highest quality.
Designate reference sites from different tiers in the SMS hierarchy.
It is helpful to have a reference site that is physically close to the recovering site.
Running the SMS Site Repair Wizard
Before running the SMS Site Repair Wizard, you must ensure that there are no open Administrator console windows. Ensure that the current user has at least Read permission to objects such as collections, packages, and programs on the designated reference sites and on the parent site, and has administrative credentials on the recovering site.
When you run the Recovery Expert, it asks whether you intend to use the SMS Site Repair Wizard. If you choose to use the wizard, then the Recovery Expert produces the recovery task list, with the following differences:
All tasks that can be automated by the SMS Site Repair Wizard are unavailable.
The task list contains the Run the SMS Site Repair Wizard task.
As you start to perform the recovery tasks in the order prescribed by the Recovery Expert, do not perform the tasks that are unavailable. When you reach and run the Run the SMS Site Repair Wizard task, the wizard completes all the tasks that are unavailable. When the wizard finishes, continue to perform the remaining tasks in the list.
All tasks that can be automated by using the wizard are treated as a set. When the wizard runs, it performs all the tasks in that set.
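The interplay between manual tasks and the wizard's task set can be modeled roughly as follows. This is only a conceptual sketch: the task names and the dictionary fields are hypothetical and do not reflect the Recovery Expert's real task list or data format.

```python
# Hypothetical task list; "automated" marks tasks shown as unavailable.
tasks = [
    {"name": "Rebuild the site server", "automated": False},
    {"name": "Restore the site database", "automated": True},
    {"name": "Run the SMS Site Repair Wizard", "automated": False, "runs_wizard": True},
    {"name": "Synchronize objects with reference sites", "automated": True},
    {"name": "Verify site functionality", "automated": False},
]


def execution_log(task_list):
    """Walk the list in order, skipping unavailable tasks; the wizard task
    performs every automated task as one set."""
    log = []
    for task in task_list:
        if task["automated"]:
            continue  # unavailable: left for the wizard
        log.append("manual: " + task["name"])
        if task.get("runs_wizard"):
            log.extend("wizard: " + t["name"] for t in task_list if t["automated"])
    return log


for entry in execution_log(tasks):
    print(entry)
```

Note that the wizard performs the whole automated set when its task is reached, including automated tasks that appear later in the list.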
The SMS Site Repair Wizard operates in two stages. During the first stage, the wizard restores the site backup snapshot to the recovering site. During the second stage, the wizard determines what modifications were not included in the site backup snapshot, and attempts to reapply as many of these modifications as possible.
The wizard logs its activity to C:\SMS\Logs\sms_srw.log.
Depending on the size of the database, it might take a considerable amount of time for the wizard to restore it. As soon as the wizard submits the database restore SQL command, it logs a message stating that the database restore operation has started. If the wizard seems inactive, check the log file. The wizard might be busy restoring a large database.
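One way to check the log without opening it in an editor is a small script such as the following. It is a sketch under assumptions: the exact wording of the wizard's log message is not given here, so the search keyword "restore" is a guess, and the log encoding is unknown, so undecodable bytes are replaced rather than treated as errors.

```python
from pathlib import Path

LOG_PATH = Path(r"C:\SMS\Logs\sms_srw.log")  # log location given in the text
KEYWORD = "restore"  # assumption: the real message text may differ


def matching_lines(lines, keyword=KEYWORD):
    """Return the lines that mention the keyword, ignoring case."""
    needle = keyword.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


if LOG_PATH.exists():
    with LOG_PATH.open(encoding="utf-8", errors="replace") as log:
        for line in matching_lines(log):
            print(line)
else:
    print(f"Log file not found: {LOG_PATH}")
```

If a line announcing the start of the database restore appears and nothing follows it, the wizard is most likely still restoring the database.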
ACL Reset Tool
ACL Reset is a command-line tool that resets the access control lists used by the SMS Server Connection account and by remote site systems to access the site server. ACL Reset does not reset access permissions to non-SMS objects. You can find the ACL Reset tool (ACLreset.exe) in the SMS\bin\i386\<language> folder.
ACL Reset is a repair tool and an important recovery tool. During a recovery operation, Recovery Expert tasks direct you to use this tool primarily to perform the following tasks:
Create a new SMS Server Connection account, even if it is recreated with the same name. This ensures that SMS processes that rely on the SMS Server Connection account have the correct permissions to objects on the site server.
Restore the SMS or NAL registry key, or any subkeys under them.
Restore the SMS folder tree, or any files or subdirectories under it.
You also must use ACL Reset when performing operations such as:
Changing the SMS Server Connection account
Resetting the SMS Server Connection account
For more information about using ACL Reset and ACL Reset syntax, see SMS Help.
The ACL Reset tool, which is included on the SMS 2003 product CD, is designed to be used only for SMS site backup and recovery tasks.
If you need to use the ACL Reset tool for tasks which are unrelated to backup or recovery, then you must use the ACL Reset tool from the SMS 2003 Toolkit 1. For more information about the ACL Reset tool, see Microsoft Knowledge Base article 829889 at the Microsoft Knowledge Base Web site.
Hierarchy Maintenance Tool
The Hierarchy Maintenance tool passes commands to the site’s Hierarchy Manager while the Hierarchy Manager is running. You can use the Hierarchy Maintenance tool to diagnose problems in a site, to repair sites, to dump site control images, to distribute public keys, or to stop all SMS services at a site. You can find the Hierarchy Maintenance tool (PreInst.exe) in the SMS\bin\i386\<language> folder.
To run the Hierarchy Maintenance tool, the logged-on user must have administrative privileges on the computer itself. The logged-on user must also be granted the Site - Administer security right explicitly; it is not sufficient to inherit this right through membership in a group that has it.
The Hierarchy Maintenance tool is both a repair tool and an important recovery tool. During a recovery operation, Recovery Expert directs you to use this tool primarily to perform the following tasks:
Update package definitions at the recovering site with updates from the originating site, if the recovering site has packages inherited from upper level sites. After updating the recovering site, these changes propagate to lower level sites.
Restore any changes to feature configuration that were not backed up. Feature configurations are stored in the site control file, which is then forwarded to the parent site. You can use the Hierarchy Maintenance tool to dump the site control file from the parent site to the recovering site, so that the recovering site can be reconfigured the way it was.
Restore objects that were created after backup and that were lost when the site failed.
Dump site control files.
Securely exchange public keys between the recovering site, its parent and its child sites.
For more information about using the Hierarchy Maintenance tool, see SMS Help.