Updated: September 1, 2003
Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2
This section discusses the tools that the administrators of the Microsoft.com Web site use to monitor the condition and configuration of their clusters and cluster hosts. Microsoft Operations Manager (MOM) is installed on each server running IIS and is used to monitor event logs and run scripts remotely. In addition to MOM, Microsoft.com uses a variety of other tools for monitoring and reporting, including ClusterSentinel, HTTP Monitoring Tool, Data Center Warehouse, and RMS. Microsoft developed these four monitoring tools for use within the company. However, commercially available products, such as Microsoft Application Center, provide similar functionality.
ClusterSentinel
ClusterSentinel is a tool used to monitor and control the Network Load Balancing cluster hosts. Microsoft.com administrators use ClusterSentinel to:
Determine and report the state of the Network Load Balancing cluster hosts, such as converged or suspended.
Manually control cluster hosts by using Network Load Balancing commands, such as Stop, Start, Drain, and Drainstop.
Monitor and report on application health. If an application is found to be failing on a particular host, that host is automatically removed from the cluster.
ClusterSentinel traffic is handled by the dedicated IP address on the FE network adapter on the cluster hosts. For more information about the network adapters configured in this cluster, see Network Load Balancing Host Diagram earlier in this document.
Network Load Balancing Manager, a new feature in the Windows Server 2003 family of products, can perform the first two of these ClusterSentinel tasks: reporting host state and manually controlling hosts. Network Load Balancing Manager cannot, however, monitor application health.
HTTP Monitoring Tool
HTTP Monitoring Tool (HTTPMon) is a utility that is included in the Microsoft Windows NT 4.0 and Microsoft Windows 2000 Resource Kits. HTTPMon is a multithreaded tool that monitors HTTP activity on your servers and can notify you if the amount of activity changes. You can also configure it to automatically remove a host from a Network Load Balancing cluster based on the availability of specific applications, such as IIS, or of a particular Web site.
Every 30 seconds, HTTPMon runs a series of remote application-level tests against each server and collects the results. The tests look for failures in the application layer, which are more common than system failures but much harder to detect. Application-layer problems range from IIS being overloaded by requests to an application that stops responding and requires a server reboot. HTTPMon interprets the HTTP status codes defined in RFC 1945. For example, a status code of 200 indicates that the request succeeded, while a status code of 500 indicates a server error. The results of the tests are stored in a SQL database.
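The status-code check described above can be illustrated with a short Python sketch. This is not HTTPMon's implementation, which the source does not show. The function names `classify_status` and `probe`, the verdict strings, and the five-second timeout are assumptions made for illustration. Only the status-code classes (2xx success, 3xx redirection, 4xx client error, 5xx server error) come from RFC 1945.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse health verdict (hypothetical labels)."""
    if 200 <= status < 400:
        return "ok"            # 2xx success or 3xx redirection
    if 400 <= status < 500:
        return "client_error"  # problem with the request, not necessarily the host
    if 500 <= status < 600:
        return "server_error"  # the server failed to fulfil a valid request
    return "unknown"


def probe(url: str, timeout: float = 5.0) -> str:
    """Send one GET request and classify the outcome (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return "unreachable"
```

A monitor in this style would call `probe` for each server on a fixed interval, such as the 30 seconds mentioned above, and record each verdict.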
If HTTPMon determines that a server is having problems at the application layer (for example, with IIS), HTTPMon directs Network Load Balancing to remove the host from the cluster. One minute later, HTTPMon checks again to see if the server has recovered. If it has, the server is returned to service. If the server still does not respond, a technician is alerted to diagnose and correct the problem and bring the server back online.
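The remove, wait, and recheck sequence above can be sketched as a small Python function. This is a minimal illustration of the decision flow only, not HTTPMon's code. Every callable passed in (`check`, `remove`, `restore`, `alert`, `wait`) is a hypothetical hook that a real deployment would have to supply, for example by wrapping Network Load Balancing commands. The 60-second default mirrors the one-minute recheck in the text.

```python
from typing import Callable


def handle_failure(
    host: str,
    check: Callable[[str], bool],     # hypothetical: True if the host passes its tests
    remove: Callable[[str], None],    # hypothetical: take the host out of the cluster
    restore: Callable[[str], None],   # hypothetical: return the host to service
    alert: Callable[[str], None],     # hypothetical: notify a technician
    wait: Callable[[float], None],    # hypothetical: sleep, injectable for testing
    recheck_delay: float = 60.0,
) -> str:
    """Run one pass of the remove, wait, recheck flow and report the outcome."""
    if check(host):
        return "healthy"
    remove(host)
    wait(recheck_delay)
    if check(host):
        restore(host)
        return "restored"
    alert(host)
    return "alerted"
```

Passing `time.sleep` as `wait` gives real delays, while a stub keeps the flow fast to exercise in isolation.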
Data Center Warehouse
The Data Center Warehouse (DCW) is a tool that compiles the information from ClusterSentinel, HTTPMon, and a variety of performance counters, such as ASP requests per second, and presents the data on a single Web page. It displays a global view of each server and cluster. Administrators can review all of the collected information, or they can limit their view to specific conditions, such as servers with failing applications or well-functioning servers that are not active in the cluster.
Figure 1 shows the type of information available in DCW.
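The two filtered views mentioned above (servers with failing applications, and healthy servers that are not active in the cluster) amount to simple predicates over per-server records. The following Python sketch illustrates that idea. The `ServerStatus` record and its field names are assumptions made for illustration and do not reflect DCW's actual schema, which the source does not describe.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ServerStatus:
    """Hypothetical per-server record combining health and cluster state."""
    name: str
    cluster: str
    app_healthy: bool   # e.g. result of the application-layer tests
    in_cluster: bool    # e.g. host is converged and handling traffic


def failing_applications(servers: Iterable[ServerStatus]) -> List[ServerStatus]:
    """Servers whose applications are failing."""
    return [s for s in servers if not s.app_healthy]


def healthy_but_inactive(servers: Iterable[ServerStatus]) -> List[ServerStatus]:
    """Well-functioning servers that are not active in their cluster."""
    return [s for s in servers if s.app_healthy and not s.in_cluster]
```

The second view is the more useful of the two for capacity: it lists hosts that could be serving traffic but are not.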
RMS
RMS is another application, used to monitor software configuration. RMS compares the configuration of the cluster and each cluster host to a predefined, standard configuration. The standard configuration includes information such as registry settings, program versions, IIS metadata, operating system versions, and installed service packs. If RMS finds differences between the actual cluster or host configuration and the predefined configuration, Microsoft Operations Manager (MOM) is used to push that information to a SQL database.
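Comparing a host against a standard configuration, as described above, is essentially a keyed diff. The Python sketch below shows one way to compute it. It is an illustration under stated assumptions, not RMS's implementation: the function name `config_drift`, the `MISSING` marker, and the flat key-to-value layout of the configuration are all hypothetical.

```python
from typing import Dict, Tuple

MISSING = "<missing>"  # hypothetical marker for a setting absent on one side


def config_drift(
    baseline: Dict[str, str], actual: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return {setting: (expected, found)} for every setting that differs.

    Settings present on only one side are reported with MISSING on the other.
    """
    drift = {}
    for key in sorted(baseline.keys() | actual.keys()):
        expected = baseline.get(key, MISSING)
        found = actual.get(key, MISSING)
        if expected != found:
            drift[key] = (expected, found)
    return drift
```

An empty result means the host matches the standard configuration. A non-empty result is the kind of difference record that would then be forwarded for storage and reporting.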
By retaining and monitoring the data collected by all of these tools, Microsoft.com can quickly address current problems and also use the historical availability reports to monitor and track trends that might indicate future problems.