Published : September 1, 2004
While you recover the data of a failed site, data in other parts of the hierarchy continues to flow. To simplify the recovery operation and the post-recovery data mitigation, it is important to reduce the amount of data that continues to flow into the failing site. The following sections describe Microsoft® Systems Management Server (SMS) data, and how to reduce the amount of data generated by SMS during a recovery operation.
SMS Data Types
When describing SMS-related backup and recovery operations, it is important to know that SMS generates and uses two different types of data:
Configuration data
This data includes information about how the site is configured. It includes details such as site boundaries, feature configuration, and object definitions. All configuration data is critical when recovering a site. If you do not configure the site that you are recovering exactly the way it was configured before it failed, then the recovery operation might fail and the site might not function properly. Object definitions, such as software distribution objects, replicate down the hierarchy. The rest of the configuration data replicates up the hierarchy.
Administrative data
This data includes software and hardware inventory data from clients and status messages generated by clients and site systems. This data can be regenerated easily. This data is important, but not critical when recovering a site. The recovered site can function properly without having all administrative data. Administrative data replicates up the hierarchy.
SMS Data Traffic Types
While the failing site cannot maintain its regular communication with the rest of the hierarchy, the site clients and other sites in the hierarchy continue to generate data such as discovery records, inventory data, and status messages. While the failing site is disconnected, backlogs of pending data are queued on various client and server systems waiting for the next opportunity to transmit. The size of the data backlog depends on many factors, but the critical factor that affects all types of traffic is how long the site is out of service.
As soon as the site server is reconnected to the rest of the hierarchy, the backlog of data is processed and, if appropriate, forwarded up the hierarchy. If the site server was offline for a long time, a large amount of data might suddenly flood the network as it all tries to reach the site server at once. The site server might not have the capacity to handle such an extreme amount of data. That is why it is important to minimize the backlog as much as possible and to complete the recovery operation as quickly as possible. Speed must not come at the expense of accuracy, however: perform the recovery carefully and precisely to ensure a successful operation.
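The relationship between outage length and backlog size can be illustrated with some rough arithmetic. The following Python sketch is not part of SMS. The function names and the example rates (files per client per day, average file size) are assumptions chosen only for illustration, and real values vary with inventory schedules, discovery methods, and status filter rules.

```python
def backlog_files(clients: int, files_per_client_per_day: float,
                  outage_hours: float) -> float:
    """Estimate how many client files queue up while the site is offline.

    Assumes every client keeps producing files at a constant average rate.
    """
    return clients * files_per_client_per_day * outage_hours / 24


def backlog_mb(clients: int, files_per_client_per_day: float,
               avg_file_kb: float, outage_hours: float) -> float:
    """Convert the estimated file count into megabytes of queued data."""
    files = backlog_files(clients, files_per_client_per_day, outage_hours)
    return files * avg_file_kb / 1024


if __name__ == "__main__":
    # Hypothetical site: 2,000 clients, 6 files per client per day,
    # 20 KB per file, 12 hours offline.
    print(backlog_files(2000, 6, 12))
    print(backlog_mb(2000, 6, 20, 12))
```

Because the estimate scales linearly with `outage_hours`, halving the time the site is offline halves the backlog under these assumptions, which is why the duration of the outage dominates the other factors.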
The data traffic that you need to be aware of is intrasite data traffic (traffic within the site, between clients and site systems, and between site systems and the site server), and traffic between sites. Intrasite traffic consists primarily of inventory data, discovery records and client status messages. Traffic between sites includes object definitions such as package and collection definitions, site status messages, and site configuration changes.
Intrasite data traffic
Clients continue to generate hardware and software inventory data, discovery data records and status messages. This data is usually forwarded to CAPs or management points, and then to the site server. Even while the site server is offline, clients forward this data to CAPs and management points. When the site server regains functionality, the data is forwarded from the CAPs and management points to the site server to be processed. This can flood the site systems with objects that must be processed all at once.
Intrasite data traffic can originate from the site’s clients, from the site’s CAPs or management points, or from child sites. The traffic load depends on the number of clients at the failing site and at lower sites and on other factors as follows:
Site-to-site traffic
Sites accumulate large amounts of data and transactions that need to be forwarded up and down the hierarchy, and then transmit them when the inbound share becomes available to receive files. This traffic includes some client traffic, such as discovery data records (DDRs) that are queued for transmission up the hierarchy.
The amount of site-to-site traffic depends on the number of sites below the failing site, the number of clients in those sites, and how long the failing site is offline. It also depends on the following:
Reducing the Traffic Load During a Site Recovery Operation
Much of the data generated while the failing site is offline will no longer be relevant after the failing site is recovered. For example, status messages reporting the inability to connect to the offline site will be irrelevant after the site is recovered. Other data, such as hardware and software inventory, will be gathered again after the recovery operation is completed. It is recommended that you minimize the data accumulated while recovering the failing site by doing any of the following:
- Configure the status system:
  - At the failing site’s child sites, configure the status system not to replicate status messages to the parent site.
  - Configure status filter rules at lower level sites to discard as many status messages as possible.
  For more information about status filter rules, see Chapter 14, “Using the SMS Status System,” in the Microsoft Systems Management Server 2003 Operations Guide.
- Adjust the software and hardware inventory interval at lower level sites, so that inventory is not collected until the recovery operation is completed. Depending on the inventory schedule and on how long the recovery operation takes, this might help reduce the traffic. You need to consider the current inventory interval and the time it takes for the schedule change to take effect.
- Use the Hierarchy Maintenance tool (PreInst.exe) to remove pending jobs entirely.
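Whether adjusting the inventory interval is worthwhile depends on timing: a schedule change can only prevent inventory cycles that fall after the change takes effect and before the recovery is finished. The following Python sketch shows one way to reason about this. It is an illustration only and does not call any SMS interface. The function names, the fixed-interval schedule model, and the propagation delay are assumptions.

```python
from datetime import datetime, timedelta


def runs_before(last_run: datetime, interval: timedelta,
                recovery_end: datetime) -> list[datetime]:
    """List the inventory cycles scheduled before the recovery is expected to end."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    runs = []
    next_run = last_run + interval
    while next_run < recovery_end:
        runs.append(next_run)
        next_run += interval
    return runs


def avoidable_runs(last_run: datetime, interval: timedelta, now: datetime,
                   change_delay: timedelta,
                   recovery_end: datetime) -> list[datetime]:
    """List the cycles that a schedule change made now could still prevent.

    change_delay is the assumed time for the new schedule to reach clients.
    """
    effective = now + change_delay
    return [t for t in runs_before(last_run, interval, recovery_end)
            if t >= effective]


if __name__ == "__main__":
    # Hypothetical daily inventory, last run at 08:00, change made at 20:00.
    cycles = avoidable_runs(
        last_run=datetime(2004, 9, 1, 8, 0),
        interval=timedelta(days=1),
        now=datetime(2004, 9, 1, 20, 0),
        change_delay=timedelta(hours=6),
        recovery_end=datetime(2004, 9, 3, 12, 0),
    )
    print(len(cycles), "inventory cycle(s) could be avoided")
```

If the returned list is empty, the schedule change would take effect too late, or the recovery would finish before the next cycle, and adjusting the interval would not reduce the traffic.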