Chapter 7 - Monitoring

Article
08/31/2009

This chapter covers monitoring, the last major feature area of Microsoft Application Center 2000 (Application Center). Because monitoring is such a broad topic, and is a feature that meets the diverse needs of users with differing goals and objectives, this chapter focuses on the elements that make up Application Center monitoring. In addition to showing you how Application Center interacts with operating system tools such as the Microsoft Windows 2000 Event Viewer, this chapter provides detailed information about the unique, single-console view of a cluster and its members that is provided by Application Center.

The Role of Monitoring

Regardless of how well a network, computer system, or an application is configured and maintained, hardware and software elements can quit functioning or perform poorly. Situations such as server crashes, memory leaks, and disk failures are a reality in today's computing environments.

The main purposes of monitoring are to:

Flag a failed server, or one nearing a pre-defined critical threshold.
Flag a successful (or failed) operation.
Provide a tool for identifying performance bottlenecks.
Eliminate the guesswork in capacity planning by providing tangible metrics.

Member and cluster-wide monitoring capabilities are essential to managing complex and dynamic systems that support mission-critical applications. Tools for monitoring different components in a production environment have been available for several years now, and although they can be adapted to handle server clusters, the overhead involved in juggling multiple tools and sets of data grows exponentially with the size and complexity of the clusters.

The guiding principles for the Application Center monitoring feature are:

Instrument a member based on tangible and measurable events.
Provide metrics based on real-time data from any cluster member.
Enable automated actions, such as e-mail notifications and smart recovery.
Provide flexible access to real-time and historical data.

In Chapter 6, "Synchronization and Deployment," you saw how Application Center provides real-time and results-based information for the synchronization service and application deployment. Other chapters—in particular, Chapter 10, "Working with Performance Counters"—deal with the performance aspect of cluster monitoring. This chapter focuses on the tools that you can use to monitor computer and cluster health.

High-Level Architecture

Application Center provides a single point for cluster monitoring that combines existing tools (Microsoft Health Monitor 2.1 and Event Viewer, for example) with its own tools for monitoring a cluster. This enables you to deal proactively with health and performance issues and obtain tangible metrics for capacity planning.

Application Center monitoring uses existing Windows technology extensively to provide access to the standard operating system monitoring features, as well as to give you a unique view of Application Center–specific activities by extending these core technologies.

The user interface, shown in Figure 7.1, provides access to a broad range of monitoring information that's captured by the operating system, Health Monitor, and Application Center at the cluster-wide level or at the individual-member level. Through this user interface you can create or delete monitors, enable or disable monitors, gather event information, configure thresholds, and configure event- or property-based actions for a single cluster member or the entire cluster.

The various monitoring views and their capabilities are covered in more detail later in this chapter. It's worthwhile noting that the Performance view of a cluster or member, which we've seen in other user interface illustrations, uses dynamically generated and refreshed icons to provide real-time information about server state—health, availability, and activity. Table 7.1 summarizes the information that this graphical representation of server status captures.

Figure 7.1 Cluster and member monitoring by using the Application Center snap-in

Table 7.1 Server State Indicators

Status indicator	States
Server heartbeat	Alive (heartbeat successful)
	Failed (heartbeat unsuccessful)
	Unknown
Load balancing	Online (getting load balancing requests)
	Offline (not getting load balancing requests—user directed)
	Suspended (not getting load balancing requests—system directed)
	Draining (request queue is getting drained)
	Unknown
	Not installed (cluster is not using load balancing)
Health	Normal
	Critical
	Warning
	Unknown
	Not installed
Replication loop	In replication loop
	Out of replication loop
	Unknown
Synchronization/Deployment	Synchronizing/deploying
	Not synchronizing/deploying
	Unknown

The underlying architecture that supports the monitoring user interface shown in Figure 7.1 is illustrated in Figure 7.2, which provides a high-level view of the Application Center monitoring architecture.

Figure 7.2 Application Center monitoring architecture

As you can see in Figure 7.2, Application Center interacts with the Windows Management Instrumentation (WMI) service to access:

Built-in WMI providers, such as the Event Log and Perfmon.
Health Monitor and its provider collection.
The Application Center event provider.

By understanding how all these pieces fit together, you'll be able to tailor your own monitoring views and performance counters much more easily. This topic is covered in detail in Chapter 9, "Working with Monitors and Events." That chapter also examines monitoring goals and objectives, and gives examples of what to monitor and when to monitor it.

Note As with any cause-and-effect relationship, the distinction between health and performance monitoring isn't always that clear. It's safe to make the assumption that an unhealthy system will have an impact on performance—but you can't conclude that poor performance is necessarily caused by an unhealthy system.

Let's begin our walk-through of the Application Center monitoring architecture by examining the other Microsoft tools that Application Center utilizes, starting with WMI.

Windows Management Instrumentation

As noted in Chapter 3, "Application Center Architecture," WMI is at the core of the Application Center architecture. The monitoring feature, in particular, uses WMI extensively. This set of extensions to the Windows Driver Model (WDM) provides the interface through which components can provide information and notification. WMI consolidates data from the hardware platform, drivers, and applications, and passes it to a management information store. This data store uses the Common Information Model (CIM) to expose and interact with the data it holds. Working in combination, WMI and CIM provide a mechanism that enables management applications, platforms, and consoles to perform a variety of tasks, including monitoring and logging events.

Architecturally, WMI consists of four main elements:

Management applications
Managed objects
WMI providers
Management infrastructure

Management Applications

Management applications are applications based on services provided by the Windows or Microsoft Windows NT/Windows 2000 operating system that process or display data from managed objects. A management application can perform a variety of tasks, such as:

Respond to events.
Start or stop services.
Measure performance.
Report outages.
Correlate data.

Several strategies are used to implement management applications. Applications can access WMI directly through the COM or scripting APIs, or indirectly with one of the following access methods:

Web browsers can use a set of Microsoft ActiveX controls to control the appearance, relationships, and behavior of data relating to managed objects. The controls are customizable and can be included in a standard schema definition.
Web browsers also can use HTML, which is supported through an ISAPI layer that interacts with WMI.
Database applications can use the WMI ODBC adapter to merge ODBC's database capabilities with WMI's management capabilities. With the ODBC adapter, an application can use a wide range of ODBC-based reporting packages and tools, such as Microsoft Excel and Microsoft Access.
Directory services applications can use the WMI Active Directory Service Interface (ADSI) extension to integrate directory service and management data.

Management applications can be written in any programming language that can communicate with the WMI Service by using a WMI-supported API. Among the supported APIs are the COM-based API for C/C++, and the scripting API for the Microsoft Visual Basic development system, DHTML, Active Server Pages (ASP), and Windows Script Host (WSH).

Managed Objects

Managed objects are logical or physical enterprise components. They are modeled by using the CIM and accessed by management applications through WMI. A managed object can be any system component—from a small piece of hardware, such as a cable or disk drive, to a large software application, such as a database system.

Example

The Win32_LogicalDisk WMI class represents a data source that resolves to an actual local storage device on a system that is running the Microsoft Win32 API. It is derived from CIM_LogicalDisk, an industry-standard definition for disk drives, and adds Windows extensions that enable the platform to manage drives. Table 7.2, taken from the Platform SDK in the MSDN Library, lists some representative properties of this object whose values can be queried or altered through WMI.

Table 7.2 Property Examples for the Win32_LogicalDisk Object

Property	Description
Access	Type of media access available.
BlockSize	Size, in bytes, of the blocks that form this storage extent. If the size is unknown or if a block concept is not valid (for example, for aggregate extents, memory, or logical disks), enter a 1.
Caption	Short description (one-line string) of the object.
Compressed	Indicates whether the logical volume exists as a single compressed entity, such as a DoubleSpace volume. If file-based compression is supported (such as on NTFS), this property will be FALSE.
DriveType	Numeric value corresponding to the type of disk drive that this logical disk represents.
DeviceID	Uniquely identifies the logical disk from other devices on the system.
FreeSpace	Space available on the logical disk.
VolumeName	The volume name of the logical disk.
Size	The size of the disk drive.
PNPDeviceID	Win32 Plug and Play device identifier of the logical device.
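To make the property list more concrete, the following minimal Python sketch models a few of the Table 7.2 properties as a plain record and derives a percent-free figure from them. The LogicalDiskSample type, the percent_free helper, and the sample values are hypothetical names and data introduced only for illustration; the code does not call WMI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalDiskSample:
    """Hypothetical record holding a few Win32_LogicalDisk properties."""
    device_id: str    # DeviceID, for example "C:"
    volume_name: str  # VolumeName
    size: int         # Size, in bytes
    free_space: int   # FreeSpace, in bytes


def percent_free(disk: LogicalDiskSample) -> float:
    """Return free space as a percentage of total size (0.0 if size is unknown)."""
    if disk.size <= 0:
        return 0.0
    return 100.0 * disk.free_space / disk.size


# Illustrative values only, not taken from a real system.
sample = LogicalDiskSample("C:", "SYSTEM", size=4_000_000_000, free_space=1_000_000_000)
```

A monitor threshold is typically expressed against a derived value of this kind rather than against the raw byte counts.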

WMI Providers

Providers are used to supply WMI with data from managed objects, to handle requests on behalf of management applications, and to generate notifications of events.

Windows Management Instrumentation defines two types of providers:

Built-in providers
Application-specific providers (also called custom providers)

Built-in providers are providers that are included with WMI. These providers supply WMI with information from various logical and physical sources such as the operating system registry, the Win32 subsystem, and Simple Network Management Protocol (SNMP) devices.

WMI's built-in providers are summarized in Table 7.3.

Table 7.3 WMI Built-In Providers

Provider	Description
Directory Services	Makes the classes and objects in Microsoft Active Directory available to WMI management applications.
Event Log	Provides access to data and notifications of events from the Windows NT or Windows 2000 Event Log.
Microsoft Windows Installer	Provides access to information about applications that are installed with the Windows Installer.
Performance Counters	Provides access to raw performance counter data.
Performance Monitor	Provides access to data from the Windows NT or Windows 2000 Performance Monitor.
Power Management Event	Represents power management events resulting from power state changes.
Registry Event	Sends an event whenever a change occurs to a key, a value, or an entire tree in the registry.
Registry	Provides access to data from the registry.
Security	Provides access to security settings that control ownership, auditing, and access rights to NTFS.
SNMP	Provides access to data and events from SNMP devices.
View	Creates new classes made of properties from different source classes, namespaces, or computers.
WDM	Provides access to data and events from device drivers that conform to the WMI interface.
Win32	Provides access to data from the Win32 subsystem.

Developers create application-specific providers to communicate information from objects in their domain to WMI.

Providers communicate with the WMI service by using the COM/DCOM API and are typically written in C or C++.

Figure 7.3, which is based on the WMI architecture diagram (Figure 3.2 in Chapter 3), illustrates the use of providers to provide a link between managed objects, the CIM Object Manager (CIMOM), and the CIM Repository.

Figure 7.3 Providers and managed objects in the WMI architecture

Providers serve one of two purposes: they either provide data or provide events. The distinction between these two purposes is categorized further in Table 7.4.

Table 7.4 Provider Types

Type	Description
Class	Retrieves, modifies, deletes, and/or enumerates provider-specific classes. It can also support query processing.
Instance	Retrieves, modifies, deletes, and/or enumerates the instances of provider-specific classes. It can also support query processing.
Property	Retrieves and/or modifies individual property values.
Method	Invokes methods for a provider-specific class.
Event	Generates notifications of events.
Event consumer	Supports event notification by mapping a physical consumer with a logical consumer.

Management Infrastructure

The management infrastructure consists of the WMI Service and the CIM repository. The WMI Service handles communications between management applications and providers. Applications and providers communicate through WMI by using a common programming interface, the COM API. The COM API, which supplies event notification and query processing services, is available in the C and C++ programming languages. The CIM repository holds static management data, that is, data that changes infrequently.

WMI Service

The file WinMgmt.exe implements the WMI Service.

WinMgmt.exe starts when the first management application makes a call to connect. It is activated when the first client application successfully connects, and it runs continuously for as long as management applications actively seek its services.

WMI supports the following programming interfaces:

The COM API for WMI—Components, written in C or C++, use this API for communicating between management applications, providers, and schema extensions and the WMI service.
Scripting API for WMI—WMI client applications based on this API are created by using various scripting languages and Visual Basic.

When an application makes a request by calling a method in either the COM or Scripting API, the WMI Service determines whether the request involves static data stored in the CIM repository or dynamic data supplied by a provider.

Note Static data can be handled by WMI, whereas dynamic data always involves a provider. Providers register their location and support for particular operations, such as data retrieval, modification, deletion, enumeration, or query processing. WMI uses this registration information to match application requests with the appropriate providers—and to locate and load the providers when necessary. When a provider finishes processing a request, it returns the results back to WMI, which, in turn, forwards the result to the application.
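The routing described in this note can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the WMI Service itself: the WmiServiceSketch class and its method names are hypothetical, the repository is a plain dictionary, and a provider is any callable that returns data.

```python
from typing import Callable, Dict


class WmiServiceSketch:
    """Toy model of request routing; not the real WMI Service."""

    def __init__(self) -> None:
        self.repository: Dict[str, dict] = {}               # static data
        self.providers: Dict[str, Callable[[], dict]] = {}  # dynamic data

    def store_static(self, class_name: str, data: dict) -> None:
        """Place static data directly in the (simulated) repository."""
        self.repository[class_name] = data

    def register_provider(self, class_name: str, provider: Callable[[], dict]) -> None:
        """Record which provider supports requests for a class."""
        self.providers[class_name] = provider

    def get(self, class_name: str) -> dict:
        # Static data is answered from the repository without a provider.
        if class_name in self.repository:
            return self.repository[class_name]
        # Dynamic data is always delegated to the registered provider,
        # and the provider's result is forwarded to the caller.
        if class_name in self.providers:
            return self.providers[class_name]()
        raise KeyError(f"no static data or provider for {class_name}")
```

The point of the sketch is the order of the lookup: the registration table is consulted only when the repository cannot answer the request.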

In addition to implementing the operations associated with application requests, WMI also supplies support for the following:

Event notification
Query language
Security

Event Notification

Events are occurrences of interest in the managed world; for the Application Center administrator, events related to a cluster member, or cluster operations as a whole, can be used to manage, maintain, and troubleshoot cluster operations.

WMI supports the detection of events and their delivery to interested subscribers, or event consumers. Events are represented by instances of classes derived from the system class __Event. Although WMI can detect some events by itself, such as changes to the CIM repository, WMI event providers are used to detect most events. Event providers are WMI providers that monitor a source of events and notify WMI when events occur. An example is the Registry Event Provider, which notifies WMI when a registry entry changes.

WMI supports the registration for, and distribution of, event notifications to event consumers. Event consumers register to receive particular types of notifications. Event providers register to supply particular types of notifications. WMI acts as an intermediary between event consumers and providers, which enables them to operate independently.

WMI supports both temporary and permanent event consumers. Temporary event consumers receive notifications only as long as they are active, and their registration is removed when they terminate. Permanent consumers, on the other hand, will receive a notification whenever one occurs. WMI must be available at all times to deliver these event notifications.
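The intermediary role can be illustrated with a small Python sketch. It is a simplified model only: EventBrokerSketch and its methods are hypothetical names, event classes are plain strings, and a consumer is any callable. A temporary consumer corresponds to a handler that is unsubscribed when its owner terminates; a permanent consumer corresponds to a registration that is left in place.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBrokerSketch:
    """Toy intermediary between event providers and event consumers."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_class: str, handler: Handler) -> None:
        """Register a consumer for one event class."""
        self._subs[event_class].append(handler)

    def unsubscribe(self, event_class: str, handler: Handler) -> None:
        """Remove a registration, as happens when a temporary consumer ends."""
        self._subs[event_class].remove(handler)

    def publish(self, event_class: str, event: dict) -> int:
        """Deliver an event from a provider; return the number of deliveries."""
        handlers = list(self._subs[event_class])
        for handler in handlers:
            handler(event)
        return len(handlers)
```

Because providers call only publish and consumers call only subscribe, neither side needs to know about the other, which is the independence the text describes.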

In addition to the events generated by event providers, referred to as extrinsic events, WMI also produces two of its own events: timer events and intrinsic events. Timer events occur either periodically, according to a specified time interval, or once at a specified time. Intrinsic events are driven by changes to the data that is stored in the CIM repository.

WMI forwards the notifications of all types of events to applications that have registered to receive them.

Query Language Support

WMI supports the Windows Management Instrumentation Query Language (WQL), a subset of the ANSI standard Structured Query Language (SQL), with WMI-specific extensions.

WQL supports the following types of queries:

Data queries are used to retrieve class instances and data associations.
Event queries enable consumers to register for event notification and enable providers to register to support events.
Schema queries are used to retrieve class definitions and schema associations. Class providers use these queries to specify the classes they support when they register. The following example illustrates how you can use WQL's query extensions to discover and traverse a schema:

IWbemServices::ExecQuery ("Associators of {Win32_Service.Name='DHCP'}")

You can query any CIM object, such as the logical disk that was provided as an example of a managed object. An example of such a query is:

IWbemServices::ExecQuery
("select * from Win32_LogicalDisk
where FreeSpace < 2000000")
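As a rough illustration of what such a data query selects, the following Python sketch builds the WQL text for a given threshold and applies the same comparison to in-memory rows. The free_space_query and matches helpers and the sample rows are hypothetical; nothing here connects to WMI, and a real query would return WMI objects rather than dictionaries.

```python
def free_space_query(threshold_bytes: int) -> str:
    """Build a WQL data query for logical disks below a free-space threshold."""
    if not isinstance(threshold_bytes, int) or threshold_bytes < 0:
        raise ValueError("threshold_bytes must be a non-negative integer")
    return f"SELECT * FROM Win32_LogicalDisk WHERE FreeSpace < {threshold_bytes}"


def matches(disk: dict, threshold_bytes: int) -> bool:
    """Local stand-in for the WHERE clause, applied to a plain dict."""
    return disk["FreeSpace"] < threshold_bytes


# Hypothetical sample rows used only to show which disks the filter keeps.
disks = [
    {"DeviceID": "C:", "FreeSpace": 1_500_000},
    {"DeviceID": "D:", "FreeSpace": 9_000_000},
]
low = [d["DeviceID"] for d in disks if matches(d, 2_000_000)]
```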

By using WMI's rich query language support, you can execute sophisticated event filtering (thresholds, aggregation, and inheritance). The following code sample demonstrates this:

//* within nnn – for specifying the tolerance for event delay
//* isa – registration applies to all events from class (including derived classes)
//* within nnn (within group by clause) – specifies aggregation interval
select * from __InstanceModificationEvent within 5 where TargetInstance isa "Win32_LogicalDisk"
group within 10 by TargetInstance.DeviceID
having NumberOfEvents > 25
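The aggregation idea in this event query (group events by a property within an interval, and report only groups that exceed a count) can be modeled in Python as follows. This is a simplified sketch under stated assumptions: the aggregate_events function is hypothetical, events are (timestamp, key) pairs, and windows are fixed-length buckets measured from the first event, which may differ from how WMI actually schedules its aggregation intervals.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def aggregate_events(
    events: Iterable[Tuple[float, str]],
    interval: float,
    min_count: int,
) -> List[Tuple[str, int]]:
    """Report (key, count) for each group whose count in a window exceeds min_count.

    events must be (timestamp_seconds, group_key) pairs sorted by timestamp.
    """
    ordered = list(events)
    if not ordered:
        return []
    start = ordered[0][0]
    windows: Dict[int, Counter] = {}
    for timestamp, key in ordered:
        # Assign each event to a consecutive fixed-length window.
        window_index = int((timestamp - start) // interval)
        windows.setdefault(window_index, Counter())[key] += 1
    results: List[Tuple[str, int]] = []
    for window_index in sorted(windows):
        for key, count in sorted(windows[window_index].items()):
            if count > min_count:  # mirrors the "having" threshold
                results.append((key, count))
    return results
```

With an interval of 10 seconds and a threshold of 25, a consumer would hear about a drive only when more than 25 modification events for that drive fall in one window, instead of receiving every individual event.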

Note WQL does not support cross-namespace queries or associations. You cannot query for all instances of a specified class residing in all the namespaces on the target computer, and you can't associate two objects across a namespace boundary and retrieve or query for the associations.

Security Support

WMI provides some security support, in that it validates user credentials before a user is allowed to connect to WMI. WMI does not itself protect individual classes or instances. However, individual classes and instances of dynamic data can be protected through the use of impersonation. WMI also supports security for individual namespaces.

Because WMI is implemented by using DCOM, it is important to understand how to use the DCOM security settings. The following extracts from the WMI SDK documentation summarize these settings.

Using DCOM security from WMI

DCOM security settings fall into two categories: authentication and impersonation. Authentication is the means by which one process identifies itself to another. Impersonation indicates how much authority a client grants a server to call other processes on its behalf.

Impersonation

DCOM impersonation levels range from no identification to full-blown delegation of authority. DCOM provides default security levels, which it reads from the system registry. Unfortunately, unless specifically modified, these registry settings set the impersonation level too low for WMI to function. The default impersonation level is typically RPC_C_IMP_LEVEL_IDENTIFY, while WMI needs at least RPC_C_IMP_LEVEL_IMPERSONATE to function with most providers. This impersonation level, or higher, must be explicitly set before calling into WMI.
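The ordering of the impersonation levels, and the minimum that WMI needs, can be captured in a short Python sketch. The ImpersonationLevel enum mirrors the numeric values of the RPC_C_IMP_LEVEL_* constants; the sufficient_for_wmi and wmi_moniker helpers are hypothetical names added for illustration. The moniker string follows the form that scripting clients pass when they request an impersonation level explicitly; the sketch only builds the string and does not connect to WMI.

```python
from enum import IntEnum


class ImpersonationLevel(IntEnum):
    ANONYMOUS = 1    # RPC_C_IMP_LEVEL_ANONYMOUS
    IDENTIFY = 2     # RPC_C_IMP_LEVEL_IDENTIFY
    IMPERSONATE = 3  # RPC_C_IMP_LEVEL_IMPERSONATE
    DELEGATE = 4     # RPC_C_IMP_LEVEL_DELEGATE


def sufficient_for_wmi(level: ImpersonationLevel) -> bool:
    """True if the level is at least IMPERSONATE, the minimum for most providers."""
    return level >= ImpersonationLevel.IMPERSONATE


def wmi_moniker(namespace: str,
                level: ImpersonationLevel = ImpersonationLevel.IMPERSONATE) -> str:
    """Build a local-computer moniker string that sets the impersonation level."""
    return f"winmgmts:{{impersonationLevel={level.name.lower()}}}!\\\\.\\{namespace}"
```

Setting the level in the connection string is one way a script can avoid depending on the registry default.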

Authentication

DCOM authentication levels range from no authentication to per-packet encrypted authentication.

User identification involves verifying the user name and password, the responsibility of system security packages such as NTLM. Security packages have no information about access rights; their only responsibility is to positively identify valid users.

Depending on the operating system, WMI supports varying levels of authentication. For the Windows 95 and Windows 98 operating systems, all local users are assumed to be authentic, regardless of the user and password. There is no local authentication, and true authentication only occurs over remote connections. For the Windows NT and Windows 2000 operating systems, all users are authenticated. After authentication, the user is still subject to permissions settings, which are strictly enforced.

Permission assignment involves granting access to valid users and is the responsibility of WMI. Because Windows 95 and Windows 98 are not secure operating systems, granting and denying permissions have less effect on these systems than on other systems. For example, for the Windows NT and Windows 2000 operating systems, file operations may be denied to a user even if WMI considers the user as having full access.

For Windows NT version 4.0, the only authentication service is NTLM. Under Windows 2000, two additional authentication services are available: Kerberos V5 authentication and the Negotiate authentication service. Negotiate is recommended for use with code that needs to work in domains that are not using the Kerberos V5 protocol. If the client and server are on two computers running Windows 2000, the default authentication service will be Negotiate. In all other cases, the default authentication service is NTLM. For more information on DCOM impersonation and authentication levels, see the COM reference documentation in the Platform SDK.
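The default-selection rule in the preceding paragraph reduces to a single condition, sketched here in Python. The default_authentication_service function and its string arguments are hypothetical; the sketch encodes only the rule as stated in this chapter.

```python
def default_authentication_service(client_os: str, server_os: str) -> str:
    """Return the default authentication service for a client/server pair.

    Negotiate is the default only when both computers run Windows 2000;
    in all other cases the default is NTLM.
    """
    both_windows_2000 = client_os == "Windows 2000" and server_os == "Windows 2000"
    return "Negotiate" if both_windows_2000 else "NTLM"
```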

Application Center and WMI Security

In Application Center, the Application Center and Health Monitor namespaces can be read by any authenticated user, but can be written (that is to say, an instance of existing classes or new classes can be created) only by an administrator and by the cluster user group account, which is identified by ACA_ computername.

On the Windows 2000 platform, a remote connection to a given WMI namespace is a separate user right that may or may not be granted by the system administrator.

Warning With a remote connection, a user can specify a user name and password as a substitute for his or her current user name and password. If authenticated, the user can access the target namespace. (With a local connection, you cannot override the current name and password.) If you want to control access to a namespace, you have to do it via user rights.

The CIM Repository

The CIM repository is a central storage area managed by the CIM Object Manager. The repository uses a simple schema, consisting of namespaces, classes, and class instances, for storing and accessing data.

Namespace

The namespace is the top-level node, or root, of the CIM schema. A namespace is a logical unit for grouping classes and class instances, and for controlling their scope and visibility. Typically, a namespace contains a set of classes and instances that represent managed objects in a particular environment.

A namespace is represented in its parent namespace by an instance of the __Namespace system class or a class that derives from the __Namespace class. The __Namespace class has a single property: Name. It is the name of the namespace that distinguishes it from all other namespaces.

All WMI installations have these predefined namespaces:

root
root\default
root\cimv2

The root namespace is primarily designed to contain other namespaces. The WMI installation places the other predefined namespaces under the root namespace. The root\default namespace holds most of the system classes. The root\cimv2 namespace contains the classes and instances that represent a Win32 environment, such as Win32_LogicalDisk and Win32_OperatingSystem. Namespaces can be nested, which is the case with the Application Center environment. The Application Center namespace structure in the CIM repository is as follows:

root\MicrosoftApplicationCenter

root\cimv2\MicrosoftHealthMonitor
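Because each namespace is represented in its parent by its Name, a nested path can be read as a chain of parent and child namespaces. The following Python sketch shows that decomposition; the namespace_chain and namespace_name helpers are hypothetical and operate on path strings only.

```python
from typing import List


def namespace_chain(path: str) -> List[str]:
    """Return every namespace on the way from the root to the given path."""
    parts = [part for part in path.split("\\") if part]
    return ["\\".join(parts[: i + 1]) for i in range(len(parts))]


def namespace_name(path: str) -> str:
    """Return the Name under which a namespace appears in its parent."""
    parts = [part for part in path.split("\\") if part]
    return parts[-1] if parts else ""
```

For example, the Health Monitor namespace is reached through root and then root\cimv2, and it appears in root\cimv2 under the Name MicrosoftHealthMonitor.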

Classes and Instances

A WMI class defines a template for describing a type of managed object. An instance of a WMI class represents a specific managed object of the type described by the class. Where the class generally models the real-world device or component in the general sense, each instance represents a specific occurrence of the device or component. For example, a general class called FloppyDisk might exist, with the instances on a particular host computer representing drives A and B.

Class definitions can be either static or dynamic. A static class has a definition that is persistent and is stored in the CIM repository until it is explicitly deleted. WMI can provide definitions of static classes without the help of a provider.

A class provider supplies a dynamic class definition at run time when the class definition is required. Dynamic class definitions are not persisted. Whenever the definition of a dynamic class is requested, WMI makes a call to the appropriate class provider to get the definition of the dynamic class.

All instances of a class exist within the namespace to which the class belongs. Within a namespace, instances can be either static or dynamic.

The following limits apply to storing data in the repository:

Static classes with static instances store the class definitions and the instances in the CIM repository.
Static classes with dynamic instances store only the definitions.
Dynamic classes store neither the class definitions nor the instances. This information has to be handled by a provider.
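These three storage rules can be written as a small decision function. The sketch below is illustrative only; RepositoryStorage and repository_storage are hypothetical names, and the function simply restates the list above.

```python
from typing import NamedTuple


class RepositoryStorage(NamedTuple):
    stores_definition: bool  # class definition kept in the CIM repository
    stores_instances: bool   # instances kept in the CIM repository


def repository_storage(static_class: bool, static_instances: bool) -> RepositoryStorage:
    """Say what the CIM repository holds for a class/instance combination."""
    if not static_class:
        # Dynamic classes: a provider supplies both definition and instances.
        return RepositoryStorage(False, False)
    # Static classes: the definition is stored; instances only if static too.
    return RepositoryStorage(True, static_instances)
```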

Figure 7.4 shows how the example counter, Physical Disk Queue Length, is stored in the CIM repository. In this example, the object's namespace is root\cimv2\MicrosoftHealthMonitor\PerfMon.

Note There are several tools available that you can use to access WMI and the CIM repository. First, there's wbemtest, which is a Win32 application; then there's the tool collection that ships with the WMI SDK. The SDK includes online documentation, WMI CIM Studio (an HTTP-based browser used to provide the view shown in Figure 7.4), the WMI Event Registration and the WMI Object Browser (HTTP-based), and the WMI Event Viewer, which is Win32-based.

Figure 7.4 The Physical Disk Queue Length counter, as it is stored in the CIM repository

WMI or network administrators can either place data directly into the repository or add it programmatically. Developers can use the Managed Object Format (MOF) language or the COM API to write information, such as a new performance counter, to the repository. In Chapter 10, "Working with Performance Counters," you'll see how to create and add performance counters by using .mof files and the MOF compiler.

The following example shows the .mof source that is used to create the Physical Disk Queue Length counter. This source was created by using the CIM Studio MOF Generator Wizard, one of several tools included in the WMI SDK. The wizard automates the process of generating .mof source for class definitions and/or instances. The WMI SDK is available from Microsoft at https://msdn.microsoft.com/library/en-us/wmisdk/wmi/wmi_start_page.asp.

//*************************************************************************
//* File: SampleMOF.mof
//*************************************************************************
//*************************************************************************
//* This MOF was generated from the "\\.\ROOT\CIMV2\MicrosoftHealthMonitor\PerfMon"
//* namespace on machine "ACDW822AS".
//* To compile this MOF on another machine you should edit this pragma.
//*************************************************************************
#pragma namespace("\\\\.\\ROOT\\CIMV2\\MicrosoftHealthMonitor\\PerfMon")
//*************************************************************************
//* Class: PhysicalDisk
//* Derived from: Win32_PerfFormattedData
//*************************************************************************
[dynamic: ToInstance, provider("PerfProv"), DisplayName("PhysicalDisk"), Description("The 
Physical Disk performance object consists of counters that monitor hard or fixed disk 
drive on a computer. Disks are used to store file, program, and paging data and are read 
to retrieve these items, and written to record changes to them. The values of physical 
disk counters are sums of the values of the logical disks (or partitions) into which they 
are divided."), ClassContext("local|PhysicalDisk")]
class PhysicalDisk : Win32_PerfFormattedData
{
[read, key] string Name;
[read, CounterType(65536), PropertyContext("Current Disk Queue Length"), 
DisplayName("Current Disk Queue Length"), Description("Current Disk Queue Length is the 
number of requests outstanding on the disk at the time the performance data is collected. 
It includes requests in service at the time of the snapshot. This is an instantaneous 
length, not an average over the time interval. Multi-spindle disk devices can have 
multiple requests active at one time, but other concurrent requests are awaiting service. 
This counter might reflect a transitory high or low queue length, but if there is a 
sustained load on the disk drive, it is likely that this will be consistently high. 
Requests are experiencing delays proportional to the length of this queue minus the number 
of spindles on the disks. This difference should average less than 2 for good 
performance.")] real32 CurrentDiskQueueLength;
[read, CounterType(542573824), PropertyContext("% Disk Time"), DisplayName("% Disk 
Time"), Description("% Disk Time is the percentage of elapsed time that the selected disk 
drive is busy servicing read or write requests.")] real32 PercentDiskTime;
[read, CounterType(5571840), PropertyContext("Avg. Disk Queue Length"), 
DisplayName("Avg. Disk Queue Length"), Description("Avg. Disk Queue Length is the average 
number of both read and write requests that were queued for the selected disk during the 
sample interval.")] real32 AvgDiskQueueLength;
[read, CounterType(542573824), PropertyContext("% Disk Read Time"), DisplayName("% 
Disk Read Time"), Description("% Disk Read Time is the percentage of elapsed time that the 
selected disk drive is busy servicing read requests.")] real32 PercentDiskReadTime;
[read, CounterType(5571840), PropertyContext("Avg. Disk Read Queue Length"), 
DisplayName("Avg. Disk Read Queue Length"), Description("Avg. Disk Read Queue Length is 
the average number of read requests that were queued for the selected disk during the 
sample interval.")] real32 AvgDiskReadQueueLength;
[read, CounterType(542573824), PropertyContext("% Disk Write Time"), DisplayName("% 
Disk Write Time"), Description("% Disk Write Time is the percentage of elapsed time that 
the selected disk drive is busy servicing write requests.")] real32 PercentDiskWriteTime;
[read, CounterType(5571840), PropertyContext("Avg. Disk Write Queue Length"), 
DisplayName("Avg. Disk Write Queue Length"), Description("Avg. Disk Write Queue Length is 
the average number of write requests that were queued for the selected disk during the 
sample interval.")] real32 AvgDiskWriteQueueLength;
[read, CounterType(805438464), PropertyContext("Avg. Disk sec/Transfer"), 
DisplayName("Avg. Disk sec/Transfer"), Description("Avg. Disk sec/Transfer is the time in 
seconds of the average disk transfer.")] real32 AvgDiskSecPerTransfer;
[read, CounterType(805438464), PropertyContext("Avg. Disk sec/Read"), 
DisplayName("Avg. Disk sec/Read"), Description("Avg. Disk sec/Read is the average time in 
seconds of a read of data from the disk.")] real32 AvgDiskSecPerRead;
[read, CounterType(805438464), PropertyContext("Avg. Disk sec/Write"), 
DisplayName("Avg. Disk sec/Write"), Description("Avg. Disk sec/Write is the average time 
in seconds of a write of data to the disk.")] real32 AvgDiskSecPerWrite;
[read, CounterType(272696320), PropertyContext("Disk Transfers/sec"), 
DisplayName("Disk Transfers/sec"), Description("Disk Transfers/sec is the rate of read and 
write operations on the disk.")] real32 DiskTransfersPerSec;
[read, CounterType(272696320), PropertyContext("Disk Reads/sec"), DisplayName("Disk 
Reads/sec"), Description("Disk Reads/sec is the rate of read operations on the disk.")] 
real32 DiskReadsPerSec;
[read, CounterType(272696320), PropertyContext("Disk Writes/sec"), DisplayName("Disk 
Writes/sec"), Description("Disk Writes/sec is the rate of write operations on the disk.")] 
real32 DiskWritesPerSec;
[read, CounterType(272696576), PropertyContext("Disk Bytes/sec"), DisplayName("Disk 
Bytes/sec"), Description("Disk Bytes/sec is the rate bytes are transferred to or from the 
disk during write or read operations.")] real32 DiskBytesPerSec;
[read, CounterType(272696576), PropertyContext("Disk Read Bytes/sec"), 
DisplayName("Disk Read Bytes/sec"), Description("Disk Read Bytes/sec is the rate bytes are 
transferred from the disk during read operations.")] real32 DiskReadBytesPerSec;
[read, CounterType(272696576), PropertyContext("Disk Write Bytes/sec"), 
DisplayName("Disk Write Bytes/sec"), Description("Disk Write Bytes is rate bytes are 
transferred to the disk during write operations.")] real32 DiskWriteBytesPerSec;
[read, CounterType(542573824), PropertyContext("% Idle Time"), DisplayName("% Idle 
Time"), Description("% Idle Time reports the percentage of time during the sample interval 
that the disk was idle.")] real32 PercentIdleTime;
[read, CounterType(272696320), PropertyContext("Split IO/Sec"), DisplayName("Split 
IO/Sec"), Description("Split IO/Sec reports the rate that I/Os to the disk were split into 
multiple I/Os. A split I/O may result from requesting data in a size that is too large to 
fit into a single I/O or that the disk is fragmented.")] real32 SplitIOPerSec;
};
//*************************************************************************
//* Instances of: PhysicalDisk
//*************************************************************************
instance of PhysicalDisk
{
AvgDiskQueueLength = 0.1049272;
AvgDiskReadQueueLength = 0;
AvgDiskSecPerRead = 0;
AvgDiskSecPerTransfer = 5.24653E-03;
AvgDiskSecPerWrite = 5.24653E-03;
AvgDiskWriteQueueLength = 0.1049272;
CurrentDiskQueueLength = 0;
DiskBytesPerSec = 10350.89;
DiskReadBytesPerSec = 0;
DiskReadsPerSec = 0;
DiskTransfersPerSec = 20.21658;
DiskWriteBytesPerSec = 10350.89;
DiskWritesPerSec = 20.21658;
Name = "_Total";
PercentDiskReadTime = 0;
PercentDiskTime = 10.49272;
PercentDiskWriteTime = 10.49272;
PercentIdleTime = 88.41473;
SplitIOPerSec = 0;
};
//* EOF SampleMOF.mof

Performance Counters

Because you may want to access other performance counters in addition to those exposed by the Application Center user interface, this is a good point to examine performance counters.

Note Performance counters are accessed and manipulated via direct communications between Application Center and the Performance Data Helper (PDH) interface in order to achieve performance gains.

The Performance Data Helper

In order for a program to use the Windows performance feature set and library, it has to call the functions that the registry interface provides. These functions retrieve blobs of data from the key HKEY_PERFORMANCE_DATA, which contains performance information. To use this data (that is, to convert it to a human-usable form), it's necessary to traverse the existing data structure and then apply calculations to the raw data to produce usable counter information.

The Performance Data Helper (PDH) library is built on top of the standard performance monitoring features provided by Windows. Through its APIs the PDH Library supplies an interface that is essentially a higher-level abstraction of the registry interface's functionality. However, the PDH functions can use either the registry interface or WMI. In the latter case the PDH functions obtain data through providers that use performance extension DLLs or the high-performance data provider object.

Note The PDH is oriented more towards operations on single counters rather than groups of counters.

The PDH packages the data in a form that doesn't require any traversal of the data structure. Additionally, it applies the appropriate statistical calculations to each counter.

Windows Performance Data Collection

The performance data that Windows 2000 collects is described in terms of objects, counters, and instances. A performance object is any resource, application, or service that can be measured.

Each performance object has performance counters that are used to measure various aspects of performance, such as transfer rates for disks or the amount of processor time consumed for processors. The object may also have an instance, which is a unique copy of a particular object type (not all object types support multiple instances).

An instance called _Total, which is available on most objects, represents the sum of the values for all instances of the object for a specific counter.
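
As a small illustration of the _Total instance, the sketch below sums one counter across its instances, following the description above literally. The instance names and values are invented for the example, and the function name is hypothetical; counters that are aggregated in some other way would need a different calculation.

```python
# Hypothetical per-instance samples for a single counter (for example, Disk Writes/sec).
# The instance names and values are made up for illustration.
writes_per_sec = {"0 C:": 12.5, "1 D:": 7.5}


def total_instance(per_instance):
    """Return a _Total-style value: the sum of the counter over all instances."""
    return sum(per_instance.values())


total = total_instance(writes_per_sec)
```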

Counter Types

Every counter is assigned a counter type, which determines how counter data is calculated, averaged, and displayed.

Note The Windows performance console supports more than 30 counter types, but many of the available counter types are not implemented in the counters installed with Windows 2000.

Counter types are described in the following manner:

Counter type name—the name of the counter type.
Description—a brief description of the counter type, including a description of the formula used to calculate and display counters of the specified type.

Generic type—the general category that describes how the counter is displayed. Generic types include:
- Average. These counters measure a value over time and display the average of the last two measurements.
- Difference. These counters subtract the previous measurement from the most recent one and, if the result is positive, they display the difference; if negative, they display a zero.
- Instantaneous. These counters display the most recent measurement.
- Percentage. These counters display calculated values as a percentage.
- Rate. Similar to an average counter, these counters sample an increasing count of events over time and divide the change in count values by the change in time to display a rate of activity.
Formula—describes how the raw data and other components, such as performance frequency, are converted to arrive at the formatted counter value for display in the console.
Average—the mathematical formula used to calculate averages of the formatted counter data.
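
The generic types listed above can be sketched as simple functions over a series of samples. This is a simplified illustration rather than the formulas the performance console actually uses: the function names are hypothetical, and the Difference type is read here as the most recent measurement minus the previous one, floored at zero.

```python
def instantaneous(samples):
    """Instantaneous: display the most recent measurement."""
    return samples[-1]


def average_of_last_two(samples):
    """Average: display the average of the last two measurements."""
    return (samples[-2] + samples[-1]) / 2


def difference(samples):
    """Difference: latest minus previous, shown as zero when negative (assumed direction)."""
    return max(samples[-1] - samples[-2], 0)


def percentage(part, whole):
    """Percentage: display a calculated value as a percentage."""
    return 100.0 * part / whole


def rate(count_old, count_new, time_old, time_new):
    """Rate: change in an increasing event count divided by the change in time."""
    return (count_new - count_old) / (time_new - time_old)
```

For example, a count that grows from 100 to 160 events over two seconds gives rate(100, 160, 0.0, 2.0), which is 30 events per second.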

Operating System Counters

The Windows 2000 operating system provides a collection of more than 70 performance objects, each with its own set of counters. The Memory object, which is described in Table 7.5, provides one of the counters that you can access through the Application Center user interface.

The Memory performance object consists of 29 counters that describe the behavior of physical and virtual memory on the computer. Physical memory is the amount of RAM on the computer, whereas virtual memory consists of space in physical memory and on disk. Table 7.5, extracted from the Windows 2000 SDK documentation, provides a summary of the Available Bytes counter for the Memory object.

Table 7.5 Example of a Windows 2000 Performance Counter

Counter name	Description	Counter type
Available Bytes	Shows the amount of physical memory, in bytes, available to processes running on the computer. It is calculated by adding the amount of space on the zeroed, free, and standby memory lists. Free memory is ready for use. Zeroed memory consists of pages of memory filled with zeros to prevent later processes from seeing data used by a previous process. Standby memory is memory that has been removed from a process's working set (its physical memory) en route to disk, but is still available to be recalled.	PERF_COUNTER_RAWCOUNT

The counter type for the Available Bytes counter is PERF_COUNTER_RAWCOUNT, which is described in Table 7.6.

Table 7.6 Performance Counter Type Description

Element	Comment
Description	Shows the last observed value only. It does not display an average.
Generic type	Instantaneous
Formula	None; shows raw data as collected.
Average	SUM (n) / x
Example	Memory\available bytes
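
A rough sketch of what Table 7.6 describes follows. It assumes that SUM (n) / x means the sum of the observed values divided by the number of samples, which the table does not spell out; the function names and sample values are hypothetical.

```python
def rawcount_display(raw_samples):
    """PERF_COUNTER_RAWCOUNT display value: the last observed raw value, with no formula."""
    return raw_samples[-1]


def rawcount_average(raw_samples):
    """Average over the samples, reading SUM (n) / x as sum of values / sample count."""
    return sum(raw_samples) / len(raw_samples)


# Invented Memory\Available Bytes samples, in bytes.
available_bytes = [600, 300, 300]
shown = rawcount_display(available_bytes)
mean = rawcount_average(available_bytes)
```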

Feature Counters

Some Windows 2000 features or services install one or more performance objects to measure the activity of the feature or service. Table 7.7 summarizes the list of features or services and their corresponding performance objects.

Table 7.7 Windows 2000 Feature/Service Performance Objects

Feature or service	Performance object
Internet Information Services 5.0 (IIS)	Active Server Pages, FTP Service, Web Service, Internet Information Services Global
Indexing Service	Indexing Service, Indexing Service Filter, HTTP Indexing Service
Message Queuing	MSMQ Session, MSMQ IS, MSMQ Queue, MSMQ Service
Quality of Service (QoS) Admission Control	ACS/RSVP Service, ACS/RSVP Interfaces, ACS/RSVP Policy
Routing and Remote Access (RRAS)	RAS Port, RAS Total
File Replication Service	FileReplicaConn, FileReplicaSet
Terminal Services	Terminal Services Session
Active Directory	NTDS

Health Monitor 2.1

Before examining Health Monitor's architecture and features, let's cover the basic monitoring terminology that's used by Health Monitor and Application Center.

Data collector—A data collector receives and stores WMI data. Its state reflects the worst state of its child thresholds. Through the Health Monitor snap-in, you can create and configure data collectors to specify which data to collect, when, and from which server. You can group related data collectors into a data group.
Data group—You can use the user interface to create a data group, which provides a means for grouping related data collectors into a category.
Threshold—A threshold is a boundary that you can set to establish criteria for generating alerts. When the threshold is crossed, a data collector's state changes and the appropriate alert reflects this change. For example, the CPU utilization state can change from OK to Critical.
Event—An event is any occurrence of interest related to managing computer hardware, software, and applications. Typically, an event is tied to an action or an alert. For example, the W3svc Service fails to start on a cluster member. This event results in two actions: first, a WMI event is sent, and second, an alert notification is displayed in the Alert view.
Alert—An alert is the interpretation of an event or collection of events that results in a message being sent to the Health Monitor snap-in. For example, when the threshold for CPU utilization is exceeded, the system interprets this information and generates an alert.
Action—An action is the monitoring system's automated response to a specified condition. An action is in response to an alert and can range in severity from displaying a message on the console to shutting down the entire system. For example, an e-mail notification is sent to a specified user when a threshold is exceeded.
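
To make the relationship between thresholds and data collectors concrete, here is a minimal model of the terms defined above. It is not Health Monitor's object model: the class names, the three-level state ordering, and the "greater than" comparison are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class State(IntEnum):
    """Assumed severity ordering; a higher value is a worse state."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class Threshold:
    name: str
    limit: float
    violated_state: State

    def evaluate(self, value: float) -> State:
        # The threshold is crossed when the value exceeds the limit.
        return self.violated_state if value > self.limit else State.OK


@dataclass
class DataCollector:
    name: str
    thresholds: list = field(default_factory=list)

    def state(self, value: float) -> State:
        # A data collector takes on the worst state of its child thresholds.
        return max((t.evaluate(value) for t in self.thresholds), default=State.OK)


cpu = DataCollector("CPU utilization", [
    Threshold("warn above 80", 80, State.WARNING),
    Threshold("critical above 95", 95, State.CRITICAL),
])
```

With these hypothetical thresholds, a CPU value of 99 crosses both limits and the collector reports the worse of the two states, Critical.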

High-Level Architecture and Operation

At the highest level, Health Monitor consists of two components: the Health Monitor snap-in and the Health Monitor agent. During installation, you have the option of installing either or both of these components on the local server.

Note During installation Application Center installs both the Health Monitor snap-in and agent on the server.

The monitoring snap-in is installed in client-only mode during setup. Through this snap-in you can add computers and edit their monitoring configuration settings, provided that you are logged on as a user with administrative privileges on the target computer. Application Center requires that configuration settings be changed only by a user account that has administrative privileges. All other logons function in operator-only mode, which allows them to view monitoring information and enable or disable a monitor.

The agent gathers data through its data collectors, tests for threshold violations, and generates alerts. Figure 7.5 provides a rudimentary diagram of the Health Monitor architecture as it's implemented by Application Center.

Figure 7.5 The Health Monitor console and agent architecture

Health monitoring is set up in two steps by using .mof files. The first .mof file defines the namespace and sets up the agent. This .mof file gets compiled and placed into WMI when Health Monitor is installed on a server that's going to be monitored. Next, Application Center compiles a second .mof file that contains the default monitoring rules and policies.

Each agent runs independently on a single server and is unaware that a console is monitoring its activities. The agent continues collecting data, monitoring thresholds, generating events, and responding with actions. The Health Monitor design is such that a minimal amount of code is required for the agent. The console handles general communications between itself and the agent and provides support for features, such as the heartbeat. The console's Connection Manager (Figure 7.6) is responsible for handling Health Monitor communications between servers.

Figure 7.6 provides a more detailed view of the Health Monitor architecture. As you can see in this diagram, Health Monitor implements several of its own custom providers to supplement those supplied by WMI.

Figure 7.6 The Health Monitor 2.1 architecture

The Health Monitor Agent

The agent is a provider and consumer of WMI data. The agent runs on monitored computers and collects data as well as evaluates thresholds. It also generates alerts and manages actions when thresholds are crossed.

The Health Monitor agent utilizes several providers that ship with the product, including the following:

Core Agent
Win32
HTTP
COM+
Ping
TCP/IP Port Connect

Health Monitor Classes

There are three distinct types of classes in Health Monitor: configuration classes, status classes, and event classes. Figure 7.7 illustrates this hierarchy of classes and how they are interrelated. Chapter 9, "Working with Monitors and Events," describes these classes and their associations in detail.

Figure 7.7 An illustration of class relationships for a monitor with data collectors, thresholds, and actions

Configuration classes are used for configuring the agent provider by telling it what data to collect and which thresholds to evaluate. The primary classes are MicrosoftHM_DataCollectorConfiguration and MicrosoftHM_ThresholdConfiguration, and their properties encompass:

When to poll the WMI class or register for an event.
What to look at.
The threshold value.
The duration for which the value must remain.
Which state to change to.
Associated actions.

Since these configuration classes are stored statically in WMI, the agent is a consumer of instances rather than a provider.

With the status and event classes, the agent is an instance and event provider, respectively. For each configuration class there is a corresponding status class. For example, in the MicrosoftHM_SystemConfiguration class, you can enable or disable monitoring. The agent provides an event from the MicrosoftHM_SystemStatusEvent class when the state of the computer changes. This state is also reflected in the MicrosoftHM_SystemStatus class. The console acts as a consumer for these events to display the correct icon in the user interface.

Core Agent Provider

The best way to understand how the Health Monitor agent works is to examine the workings of the Core Agent Provider, which handles the bulk of the Health Monitor agent's processing activities.

When the provider starts, it reads in the information that it requires from instances of the following classes: MicrosoftHM_SystemConfiguration, MicrosoftHM_DataGroupConfiguration, MicrosoftHM_DataCollectorConfiguration, MicrosoftHM_ThresholdConfiguration, and some association classes.

The Core Agent Provider collects instances in three ways:

Via the GetObject WMI API call—instances of the MicrosoftHM_PolledGetObjectDataCollectorConfiguration class.
By executing a query that returns an instance as a response—instances of the MicrosoftHM_PolledQueryDataCollectorConfiguration class.
By registering a query to receive events (limited only by the length of time the query is active)—instances of the MicrosoftHM_EventQueryDataCollectorConfiguration class.

After this information is obtained, the provider is fully initialized and ready for operation.

Note Because the Core Agent Provider is also registered as a temporary consumer of the events raised when instances of the configuration classes are modified, deleted, or configured, it can alter its behavior while running. These events occur when the console or a third-party tool needs to alter the work of the provider.

The Core Agent Provider, operating on a polling interval, loops through all the HMDataCollector instances and determines which ones need to collect their data. Those that have reached their time interval execute the appropriate query, method, or GetObject and collect their data.
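
The polling pass described above can be sketched as follows. This is an illustration of the idea only, not the agent's code: the Collector class, its fields, and the use of plain seconds for time are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Collector:
    name: str
    interval: float        # seconds between collections (hypothetical field)
    next_due: float = 0.0  # time at which this collector should next collect


def due_collectors(collectors, now):
    """Return the collectors whose interval has elapsed and reschedule them."""
    due = []
    for collector in collectors:
        if now >= collector.next_due:
            due.append(collector)
            collector.next_due = now + collector.interval
    return due


collectors = [Collector("cpu", 10), Collector("disk", 60)]
first_pass = [c.name for c in due_collectors(collectors, now=0)]
second_pass = [c.name for c in due_collectors(collectors, now=10)]
```

In this example both collectors are due on the first pass, but only the 10-second collector is due again at the 10-second mark.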

Each instance is then evaluated to see whether or not a threshold on a property was crossed.

Note In cases where the threshold is based on a time period (duration), threshold violation must occur over successive collection intervals for the specified duration in order to be flagged as a valid violation.
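
One way to picture a duration-based threshold is as a counter of consecutive violating passes. The sketch below assumes that the duration is expressed as a number of successive collection passes and that a violation means exceeding a limit; both are simplifications, and the class name is hypothetical.

```python
class DurationThreshold:
    """Flags a violation only after the limit is exceeded on enough consecutive passes."""

    def __init__(self, limit, required_passes):
        self.limit = limit
        self.required_passes = required_passes
        self._streak = 0  # number of consecutive violating passes so far

    def observe(self, value):
        """Record one collection pass and report whether the violation is now valid."""
        if value > self.limit:
            self._streak += 1
        else:
            self._streak = 0  # any passing sample resets the duration
        return self._streak >= self.required_passes


threshold = DurationThreshold(limit=90, required_passes=3)
flags = [threshold.observe(v) for v in [95, 96, 50, 91, 92, 93, 94]]
```

Here the first two high readings are not flagged because the third reading drops below the limit; the violation is flagged only once three high readings occur in a row.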

Threshold tests against the data may be for different values: the current property value, the average property value, or the number of instances returned. An additional test, Difference, can test for the difference between the current value of a counter and the value from a previous collection pass. However, only a single property may be evaluated in a threshold.
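
The kinds of values a threshold can test might be derived from the collected instances roughly as shown below. The function name, the mode strings, and the dictionary representation of an instance are all hypothetical; the sketch only mirrors the point that one property is evaluated per threshold.

```python
def threshold_input(instances, prop, mode, previous=None):
    """Derive the value a threshold compares against, for a single property."""
    values = [instance[prop] for instance in instances]
    if mode == "current":
        return values[-1]                  # current property value
    if mode == "average":
        return sum(values) / len(values)   # average property value
    if mode == "count":
        return len(instances)              # number of instances returned
    if mode == "difference":
        return values[-1] - previous       # current value minus the previous pass
    raise ValueError(f"unknown mode: {mode}")


samples = [{"cpu": 40.0}, {"cpu": 60.0}]
```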

For thresholds that are crossed, the Core Agent Provider creates a status event (whose message is contained in the MicrosoftHM_ThresholdStatus class). If this threshold causes a state change in a parent data collector, data group, or the system, an event is fired from their event class as well. Status events are sent only when there is a state change, and only for the classes that had a change. This information can be pushed to the Windows Event Log by using an action, where it can then be accessed by the console. In addition, data collector state changes are logged to the Application Center event log.
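
The rule that status events are sent only for the classes whose state changed can be sketched as a comparison of two snapshots. The state names, their ordering, and the flat two-level hierarchy (collectors under one system node) are assumptions made to keep the example short.

```python
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}  # assumed ordering


def worst(states):
    """Return the most severe state in a collection, or OK if it is empty."""
    return max(states, key=SEVERITY.__getitem__, default="OK")


def roll_up(threshold_states):
    """Map {collector: [threshold states]} to a state per collector plus the system."""
    snapshot = {name: worst(states) for name, states in threshold_states.items()}
    snapshot["system"] = worst(list(snapshot.values()))
    return snapshot


def status_events(before, after):
    """Return (node, old_state, new_state) only for nodes whose state changed."""
    return [(node, before.get(node), state)
            for node, state in after.items()
            if before.get(node) != state]


before = roll_up({"cpu": ["OK"], "disk": ["WARNING"]})
after = roll_up({"cpu": ["CRITICAL"], "disk": ["WARNING"]})
events = status_events(before, after)
```

In this example the disk collector stays at Warning, so only the cpu collector and the system node produce events.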

The event-based instance collection works in much the same fashion, except that instances can come in at any time. Regardless of when these instances are received, they are evaluated only at the end of a specified collection interval.
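
The difference between receiving event-based instances and evaluating them can be sketched with a simple buffer. The class and method names are hypothetical, and the evaluation callback stands in for whatever threshold test applies.

```python
class EventCollector:
    """Buffers instances as they arrive and evaluates them at the end of the interval."""

    def __init__(self):
        self._pending = []

    def on_event(self, instance):
        # Instances can arrive at any time; they are only stored here.
        self._pending.append(instance)

    def end_of_interval(self, evaluate):
        # Evaluate everything received during the interval, then start a new buffer.
        batch, self._pending = self._pending, []
        return [evaluate(instance) for instance in batch]


collector = EventCollector()
collector.on_event({"code": 500})
collector.on_event({"code": 200})
results = collector.end_of_interval(lambda instance: instance["code"] >= 400)
```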

Other Providers

Among the providers that Health Monitor uses, the HTTP and COM+ providers are important for monitoring Web servers and clusters.

HTTP Provider

The HTTP Provider is a WMI Instance Provider that supports the required interfaces for exposing the WMI Instance Provider services. The HTTP Provider monitors HTTP requests and responses, using WMI, and provides statistics to a monitoring tool—such as Health Monitor—on the status of Web application availability and performance.

Through the HTTP Provider, Application Center can use Health Monitor to execute HTTP requests and receive responses. This enables you to programmatically monitor Web application performance and availability. You can then direct the server to perform specific actions based on the information that's received.

Note Because the HTTP provider class does not use WinInet, it is safe for server-side use.

COM+ Provider

The COM+ Provider is a WMI Instance Provider that supports the required interfaces for exposing the WMI Instance Provider services. You can use the COM+ Provider to collect and monitor COM+ data by using WMI. It provides statistics on the status of COM+ application availability and performance.

In addition to providing a statistical view of COM+ server behavior, the provider can be configured to provide notifications when defined thresholds are met or exceeded. The provider also gives you access to information that is not easily available, such as failure shutdowns, object activations, or committed transactions. Because the provider enables you to select specific COM+ applications to monitor (as well as customize data that's collected), the processing overhead needed to gather all of the COM+ objects and events information for an application is minimal.

The Health Monitor Snap-in

The Health Monitor snap-in is the graphical user interface that you use to administer Health Monitor and view the state of a configured object. The Health Monitor snap-in is like other Microsoft Management Console (MMC) snap-ins; the console tree enables you to administer objects—monitors and groups in this case—and the details pane displays corresponding status information. Health Monitor splits the details pane into two sections for presenting information: the upper part displays details and statistics, and the lower part displays alerts, as shown in Figure 7.8.

Figure 7.8 The Health Monitor snap-in and its views for displaying information

Monitor Statistics and Alerts

The details pane for a monitor shows statistical information and alerts for a monitored object that you highlight in the console tree. In the example shown in Figure 7.8, the monitor is one that checks for the presence of a default home page at https://127.0.0.1 (ACDW516\Synchronized Monitors\Web Site Monitors) when one of two thresholds is passed. The Details view displays the following data:

Status—Disabled. The monitor is not running.
Threshold Name—A violation will occur if either of two thresholds is crossed: the response time is greater than 30 seconds, or the status code returned is greater than or equal to 400.
Last Alert—Not applicable, because no alerts have been generated.
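
The two thresholds in this example monitor can be expressed as a small check on the result of one HTTP test. The function name and the message strings are hypothetical; only the limits (30 seconds and status code 400) come from the example above.

```python
def http_monitor_violations(status_code, response_seconds):
    """Return the thresholds crossed by one HTTP test; an empty list means no violation."""
    violations = []
    if response_seconds > 30:
        violations.append("response time > 30 s")
    if status_code >= 400:
        violations.append("status code >= 400")
    return violations


healthy = http_monitor_violations(200, 1.2)
not_found = http_monitor_violations(404, 0.5)
```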

Statistics View

Figure 7.9 shows the statistical information that is available for the monitor. Gathered by its data collector, this information includes Property and Instance information, and if desired, values (such as Current, Minimum, Maximum, and Average) returned for the last test, which is date and time stamped (Last Update). Statistics are shown for all properties selected in the data collector configuration and used in thresholds. Statistics are useful for viewing information such as the current value of performance counters or the headers returned by a Web server.

Figure 7.9 Statistical information available on the Statistics view

Alerts View

The Alerts view, shown in Figure 7.10, displays Alert notifications that are generated for the monitor. The Alerts view shows:

The Severity of the alert (Reset, Warning, Critical, Disabled, and so on).
The Date/Time of the alert.
The name of the Data Collector.
The name of the Computer the monitor is running against.
An Alert message, if provided.

Figure 7.10 The Alerts view for a monitor

In addition to customizing the Alert view to display selected information, you can sort alerts on each of the fields that are displayed, by severity or by date and time, for example.

Console Tree

The console tree provides the primary administrative interface for specifying which computers to monitor as well as creating and modifying the monitors for a system. Figure 7.11 provides a graphical representation of the monitoring functions that you can access from the console tree in the Application Center implementation of Health Monitor. The console tree that's illustrated is based on a standard Application Center installation—it does not include elements that are added if you decide to do a custom setup and install all the sample actions and monitors that are available.

Figure 7.11 Graphical representation of the Application Center Health Monitor console tree showing the major nodes and sub-nodes

The four major nodes for a monitored computer are:

Actions—This node is used to store and manage actions. The default actions installed by Application Center are: take a server online, take a server offline, e-mail administrator, log to Websitefailures.log, and log to Offline.log.
Non-Synchronized Monitors—This node contains monitors that you can configure for use on individual members. These monitors are not replicated across a cluster.
Sample Monitors—If you chose the default Application Center installation, this node is not created. With a custom installation, however, you can install a collection of sample monitors that are provided with Application Center. You can customize these samples to suit your particular cluster environment. You can always add these samples later by running the setup program again. The Program Maintenance dialog box, in Setup, has a Modify option that lets you change the features that are currently installed. Additionally, the file Samples.mof can be copied from the installation CD and compiled by using Mofcomp.exe.
Synchronized Monitors—Application Center installs a collection of synchronized monitors by default. These monitors are grouped into the following categories: Application Center monitors, Online/offline monitors, System monitors, and Web Site monitors. This collection of monitors is synchronized across the cluster, and their configuration is replicated to every member. If this data group is deleted, it will be re-created the next time the system runs a full synchronization.

SQL Server Desktop Engine and ACLog

The Application Center Events and Performance Logging feature uses the Microsoft SQL Server 2000 Desktop Engine (also known as MSDE 8.0), which provides a small footprint data store—installed as a named instance—for logging monitoring data that is generated by each cluster member. This data includes events, performance, and page-level statistics.

Installation is optional (performance logging is enabled and the SQL desktop engine is installed by default), but before you decide not to install this option, you should carefully weigh the benefits of using this feature against the performance impact it will have on your servers. The Application Center implementation of the SQL desktop engine is tuned to minimize the impact that monitoring will have on a system. In addition to memory tuning, the stored procedures and queries that this service uses are optimized as well.

If you decide not to install the SQL desktop engine, several Application Center monitoring features will not work (they still appear in the user interface, but are not functional) because they depend on the SQL desktop engine. These features and possible workarounds for them are summarized in Table 7.8.

Note The workarounds suggested in Table 7.8 have to be installed and configured on each cluster member, and they provide information on a per-machine basis, but not for the cluster as a whole, as does the integrated Application Center log feature set.

Table 7.8 Disabled Monitoring Features and Workarounds for an Installation Without Application Center Events and Performance Logging

Feature	Workaround
Performance view	Use Windows 2000 Performance Monitor.
Event view	Use Windows 2000 Event Viewer.
Historical view	Use Windows 2000 Event Viewer.
Health Monitor events	Use a standalone installation of Health Monitor.
Archiving data	Develop your own method for archiving data.
Reporting	Develop your own method for reporting.

The Application Center Log

With logging enabled, the SQL desktop engine and the Application Center monitoring database, ACLog, are created on each cluster member. In addition to the standard SQL system tables, this database consists of 11 Application Center–specific tables that are used to store event and performance information for each server. ACLog, normalized and indexed for optimum performance, serves two purposes.

First, it supplies the real-time and short-term historical event and performance data that is collected and displayed in the user interface.
Second, it provides an interim repository for performance data that can be extracted and archived in another database. The information in this database can be accumulated over a long period of time and used later for trend analysis and capacity planning.

Note This SQL database runs as a named instance, which allows multiple copies of SQL Server 2000 to run on the same server. This, along with the fact that Application Center uses a different port number than SQL Server 2000, isolates the monitoring database from other installations of SQL Server. As a result, potential performance and security issues are eliminated when running the SQL desktop engine and Microsoft SQL Server on the same computer.

The database tables, their relationships, and primary keys are shown in the table diagram provided in Figure 7.12.

Figure 7.12 Database table diagram for the Application Center Log (ACLog)

Table 7.9 summarizes the data that is stored in each of the ACLog database tables.

Table 7.9 ACLog Database

Table name	Used to
Servers	Store identifying information (for example, a globally unique identifier [GUID]) for each cluster member.
Events	Store unique event information, such as the server identifier and GUID, event class identifier, event generation time, and event data.
EventClasses	Store common event information such as: event severity, the event log, category or subcategory, the name displayed, short message, long message, and event description.
EventHelpMessages	Display the help message associated with a given event.
Counters	Store counter information such as: the counter name, counter status (active or inactive), the scale used, type of server or cluster data aggregation, and units of measurement.
PerfHistory	Store counter information that is collected every 10 seconds for active counters. The information stored includes the server and counter identifier, the time the data was collected, and the actual data.
PerfHistory2	Store PerfHistory data that has been rolled up into one-minute intervals.
PerfHistory3	Store PerfHistory2 data that has been rolled up into fifteen-minute intervals.
PerfHistory4	Store PerfHistory3 data that has been rolled up into two-hour intervals.
PerfHistory5	Store PerfHistory4 data that has been rolled up into twenty-four hour intervals.
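
The chain of PerfHistory tables suggests a repeated roll-up of finer samples into coarser intervals. The sketch below illustrates that idea in memory, outside of any database. It assumes that a roll-up is a plain mean per interval, which the source does not state (the Counters table carries aggregation settings that may select other methods), and the function name is hypothetical.

```python
from collections import defaultdict


def roll_up_history(samples, bucket_seconds):
    """Average (timestamp_seconds, value) samples into fixed-width intervals.

    Returns {interval_start_seconds: mean_value}. Using the mean is an assumption.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[timestamp - timestamp % bucket_seconds].append(value)
    return {start: sum(values) / len(values)
            for start, values in sorted(buckets.items())}


# Two minutes of invented 10-second samples, as PerfHistory would hold them.
ten_second = [(t, float(t)) for t in range(0, 120, 10)]
one_minute = roll_up_history(ten_second, 60)                    # like PerfHistory2
fifteen_minute = roll_up_history(list(one_minute.items()), 900)  # like PerfHistory3
```

The same function is applied at each level, with the output of one level feeding the next, which mirrors how each table is described as a roll-up of the one before it.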

Application Center monitoring utilizes SQL stored procedures as well as table views. The stored procedures are used for maintaining database table information, and the table views are used for displaying information in the user interface.

Table Layouts

The table layouts for each of the tables described in the preceding section are shown in the following tables (Table 7.10 through Table 7.19).

Table 7.10 Servers

Column/field name	Data type	Length	Allow nulls—default is no
ServerId	smallint	2
ServerGUID	uniqueidentifier	16	Yes
__Server	nvarchar	255

Table 7.11 Events

Column/field name	Data type	Length	Allow nulls—default is no
Id	int	4
GUID	uniqueidentifier	16
ServerId	smallint	2
EventClassId	smallint	2
TimeGenerated	datetime	8
Data	nvarchar	2000	Yes

Table 7.12 EventClasses

Column/field name	Data type	Length	Allow nulls—default is no
EventClassId	smallint	2
EventId	int	4
Severity	smallint	2
[log]	varchar	100
Category	nvarchar	255
SubCategory	nvarchar	255	Yes
__Class	varchar	255	Yes
DisplayName	nvarchar	255	Yes
ShortMessage	nvarchar	255	Yes
LongMessage	nvarchar	1000	Yes
Description	nvarchar	1000	Yes

Table 7.13 EventHelpMessages

Column/field name	Data type	Length	Allow nulls—default is no
EventHelpMessageId	int	4
EventClassId	smallint	2	Yes
HelpMessage	nvarchar	300	Yes

Table 7.14 Counters

Column/field name	Data type	Length	Allow nulls—default is no
CounterId	smallint	2
Status	tinyint	1
AccessType	varchar	100
Scale	int	4
ServerAggregation	tinyint	1	Yes
ClusterAggregation	tinyint	1	Yes
Name	nvarchar	255
Units	nvarchar	10	Yes

Table 7.15 PerfHistory

Column/field name	Data type	Length
ServerId	smallint	2
CounterId	smallint	2
TimeMeasured	datetime	8
Data	float	8

Table 7.16 PerfHistory2

Column/field name	Data type	Length	Allow nulls—default is no
ServerId	smallint	2
CounterId	smallint	2
TimeMeasured	datetime	8
Data	float	8	Yes

Table 7.17 PerfHistory3

Column/field name	Data type	Length	Allow nulls—default is no
ServerId	smallint	2
CounterId	smallint	2
TimeMeasured	datetime	8
Data	float	8	Yes

Table 7.18 PerfHistory4

Column/field name	Data type	Length	Allow nulls—default is no
ServerId	smallint	2
CounterId	smallint	2
TimeMeasured	datetime	8
Data	float	8	Yes

Table 7.19 PerfHistory5

Column/field name	Data type	Length	Allow nulls—default is no
ServerId	smallint	2
CounterId	smallint	2
TimeMeasured	datetime	8
Data	float	8	Yes

Table Views

Application Center uses a collection of table views (Figure 7.13) to retrieve and display real-time event and performance data and short-term historical data in the user interface.

Figure 7.13 The PerfHistory table view with SELECT statement and SELECT statement output

The PerfHistory table view that's shown in Figure 7.13 illustrates how Application Center uses SQL SELECT statements and inner/outer joins to consolidate and provide information that can be displayed in the monitoring console. The table views that Application Center uses are summarized in Table 7.20.

Table 7.20 Table Views Used to Display Event and Counter Information

View name	Tables used for view
EventView	Events, EventClasses, Servers
EventDetailView	Events, EventClasses, EventHelpMessages, Servers
PerfHistoryView	PerfHistory, Counters, Servers
PerfHistory2View	PerfHistory2, Counters, Servers
PerfHistory3View	PerfHistory3, Counters, Servers
PerfHistory4View	PerfHistory4, Counters, Servers
PerfHistory5View	PerfHistory5, Counters, Servers

Tip If you're running SQL Server 2000, you can attach a cluster member's database to a SQL Server Group (New SQL Server Registration under the SQL Server Group node) and view the table views that Application Center uses. Examining the code for these views will give you some ideas for creating your own views of the data that Application Center stores in ACLog.
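
As a rough illustration of how a view such as PerfHistoryView might join the three tables it draws on, the following Python sketch uses an in-memory SQLite database as a stand-in for the SQL desktop engine. The column names come from Tables 7.10, 7.14, and 7.15 (trimmed to the columns needed here). The view definition, the name DemoPerfHistoryView, the counter_history helper, and the sample rows are all assumptions made for illustration; this is not the code that Application Center ships.

```python
import sqlite3

# In-memory SQLite stands in for the ACLog database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Servers (ServerId INTEGER PRIMARY KEY, ServerGUID TEXT,
                      __Server TEXT NOT NULL);
CREATE TABLE Counters (CounterId INTEGER PRIMARY KEY, Status INTEGER NOT NULL,
                       Name TEXT NOT NULL, Units TEXT);
CREATE TABLE PerfHistory (ServerId INTEGER, CounterId INTEGER,
                          TimeMeasured TEXT, Data REAL);

-- Hypothetical view; the real PerfHistoryView definition is not reproduced here.
CREATE VIEW DemoPerfHistoryView AS
SELECT s.__Server AS Server, c.Name AS Counter, c.Units,
       p.TimeMeasured, p.Data
FROM PerfHistory AS p
INNER JOIN Counters AS c ON c.CounterId = p.CounterId
INNER JOIN Servers  AS s ON s.ServerId  = p.ServerId;
""")

# Made-up sample data: one member, one counter, two 10-second samples.
conn.execute("INSERT INTO Servers VALUES (1, NULL, 'WEB01')")
conn.execute("INSERT INTO Counters VALUES (1, 1, 'Processor Utilization', '%')")
conn.executemany(
    "INSERT INTO PerfHistory VALUES (1, 1, ?, ?)",
    [("2000-10-01 12:00:00", 12.5), ("2000-10-01 12:00:10", 14.0)],
)


def counter_history(server_name):
    """Run a parameterized query against the view for one member."""
    return conn.execute(
        "SELECT Counter, TimeMeasured, Data FROM DemoPerfHistoryView "
        "WHERE Server = ? ORDER BY TimeMeasured",
        (server_name,),
    ).fetchall()


rows = counter_history("WEB01")
```

The join replaces the numeric ServerId and CounterId values with readable names, which is the kind of consolidation a display view has to perform before data reaches the monitoring console.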

ACLog Capacity Requirements

Because the SQL desktop engine has a capacity limitation of 2 GB, storage requirements for event and performance logging over a given period need to be considered. The following information, based on estimates, provides some guidelines for the storage requirements for event and performance counter data.

Event Logging

Two factors influence the size of the event log:

The number of events received per hour.
The size of the event data field, which is determined by the values that are used for substitution and other event data.

The estimated record size for an event is 32 bytes for the identification field and a variable length for the data field.

Note The number of days that event log information is retained is stored in WMI, and that number is configurable through the user interface. The Cleanup stored procedure reads this information (set at 15 days as the default clean-up interval) and deletes records from the Events table whose TimeGenerated value is older than the clean-up interval.
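
To make the two sizing factors concrete, here is a minimal Python sketch of an event storage estimate. It uses the 32-byte identification size and the 15-day default retention from the text; the function names, the average data-field size, and the events-per-hour figure are hypothetical inputs you would replace with your own measurements. The estimate ignores index and page overhead.

```python
from datetime import datetime, timedelta

EVENT_FIXED_BYTES = 32  # identification field size given in the text


def estimate_event_log_bytes(events_per_hour, avg_data_bytes, retention_days=15):
    """Rough size of the Events table once the retention window is full."""
    records = events_per_hour * 24 * retention_days
    return records * (EVENT_FIXED_BYTES + avg_data_bytes)


def cleanup_cutoff(now, retention_days=15):
    """Events generated before this time are candidates for clean-up."""
    return now - timedelta(days=retention_days)


# Hypothetical workload: 100 events per hour, 200 bytes of event data each.
size_bytes = estimate_event_log_bytes(100, 200)
cutoff = cleanup_cutoff(datetime(2000, 10, 16))
```

With these assumed inputs the Events table settles at roughly 8 MB, well under the 2 GB limit; the estimate grows linearly with either the event rate or the retention period.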

Performance Counter Logging

Each performance counter record is 20 bytes. Counter storage requirements are determined by:

The number of active counters in use.
The length of time that counter data is stored.

The primary purpose of performance counters is to provide information that has some immediacy, that is to say, what is happening on a server now. This requires high-resolution counters plotted over a short period of time (from 10 through 15 minutes). Once this requirement is fulfilled, lower resolution counters—counters in which performance data is rolled up and aggregated to a less granular level—can be used to plot performance over longer periods. These graphs can help you identify trends by showing relative performance (high and low points), as well as showing day-to-day changes in server/cluster performance.

Note Since the performance chart can display only a limited number of plot points, aggregation needs to take place in order to plot performance data over time periods greater than 10-15 minutes.

Table 7.21 summarizes the roll up frequency and storage periods for the performance data that's stored in the various performance history tables. The period of time that the monitoring user interface requires this data for is also provided.

Table 7.21 Counter Retention Periods and Roll Up Frequency

Table name	Counter interval	Roll up frequency	Retention period	Time used by user interface
PerfHistory	10 seconds	Every minute	1440 minutes (24 hours)	15 min
PerfHistory2	1 minute	Every 15 minutes	1440 minutes (24 hours)	2 hours
PerfHistory3	15 minutes	Every 2 hours	1800 minutes (30 hours)	1.25 days
PerfHistory4	2 hours	Every day	14400 minutes (10 days)	9 days
PerfHistory5	1 day	No rollup	180000 minutes (125 days or approximately 4 months)	17 weeks
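
The storage cost of a single counter can be approximated from these figures. The sketch below is a back-of-the-envelope calculation in Python that assumes the sample intervals described in Table 7.9, the retention periods in Table 7.21, and the 20-byte record size; it ignores index and page overhead, and the names used are illustrative only.

```python
RECORD_BYTES = 20  # size of one performance counter record, from the text

# (table, sample interval in seconds, retention period in minutes)
PERF_TABLES = [
    ("PerfHistory", 10, 1440),
    ("PerfHistory2", 60, 1440),
    ("PerfHistory3", 15 * 60, 1800),
    ("PerfHistory4", 2 * 60 * 60, 14400),
    ("PerfHistory5", 24 * 60 * 60, 180000),
]


def rows_per_counter():
    """Rows held per counter, per member, in each table at steady state."""
    return {
        name: (retention_min * 60) // interval_s
        for name, interval_s, retention_min in PERF_TABLES
    }


def bytes_for_counters(active_counters):
    """Approximate bytes used on one member for the given number of counters."""
    return sum(rows_per_counter().values()) * RECORD_BYTES * active_counters


per_table = rows_per_counter()
one_counter_bytes = bytes_for_counters(1)
```

Under these assumptions the high-resolution PerfHistory table dominates: it holds 8,640 of the roughly 10,400 rows kept for each counter, so the number of active counters matters far more than the long retention of the rolled-up tables.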

The type of rollup, or aggregation, that's used for each counter is specified in the ServerAggregation field of the Counters table. The following list summarizes the types of aggregation (and corresponding field code) that are used by Application Center.

Average of Values—1
Sum of Values—2
Last Value—3
Min Value—4
Max Value—5

Note The type of aggregation that should be used for a performance counter depends on several factors, which are covered in Chapter 10, "Working with Performance Counters."
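
The aggregation codes listed above map naturally onto simple reduction functions. The following Python sketch shows one way a batch of samples could be rolled up into a single value for the next table; the roll_up function and the choice to return None for an empty batch are assumptions, not a description of the actual stored procedures.

```python
# Aggregation codes from the ServerAggregation field, mapped to reductions.
AGGREGATIONS = {
    1: lambda values: sum(values) / len(values),  # Average of Values
    2: sum,                                       # Sum of Values
    3: lambda values: values[-1],                 # Last Value
    4: min,                                       # Min Value
    5: max,                                       # Max Value
}


def roll_up(samples, aggregation_code):
    """Reduce one interval's samples (oldest first) to a single data point."""
    if not samples:
        return None  # assumption: no samples yields a null data value
    return AGGREGATIONS[aggregation_code](samples)


# Six hypothetical 10-second samples rolled up into one 1-minute point.
minute_samples = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
averaged = roll_up(minute_samples, 1)
peak = roll_up(minute_samples, 5)
```

A utilization counter would typically be averaged, whereas a counter that records a running total is usually better served by Last Value; choosing the wrong code produces a plausible-looking but misleading chart.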

After performance counter data is no longer required by the monitoring user interface, the data is purged from the performance log tables by the Cleanup stored procedure.

Now that you possess the necessary background information, let's examine Application Center monitoring in its entirety, starting with the major steps in the monitoring process.

Monitoring: a Four-Step Process

The easiest way to approach the Application Center monitoring process is to break it down into the four major steps shown in Figure 7.14.

Figure 7.14 The major areas of Application Center monitoring activity and process flow

Generating Data

The first step in the monitoring process is creating data that will provide monitoring information. Enabled by WMI, Application Center uses the following major data sources to obtain information:

Application Center events
Health Monitor data collectors
Windows events
Performance counters

Logging Data

The next step in the monitoring process is logging the data that is generated. As you've seen already, Application Center uses the SQL desktop engine to store data for a cluster and its members, thereby extending the existing Windows event and performance logs—which also store certain information by default.

Querying Data

The third step involves querying the data store by using built-in components provided by Application Center. The various information views that are available through the user interface are obtained by using parameterized SQL queries—handled transparently by the user interface—that run against the ACLog database.

Viewing Data

The final step is presenting the information to the monitoring screens, which provides member-wide and cluster-wide views of events and performance.

Let's step through the first three steps in the monitoring process, starting with generating the data.

Generating Data

In the WMI section, you saw how events are fired by the operating system or an application. Application Center uses its own custom provider to generate events that supplement the data provided by Windows events and Health Monitor events. In addition to this event data, the Performance Monitor obtains counter-based performance data from the Performance Data Helper (PDH).

Application Center generates events for the following core services:

Cluster services
Replication service
Request forwarding
Monitoring

Figure 7.15 illustrates the architecture that's used to provide event information to WMI. The core services in the preceding list function as clients for the Passive Provider, which is a key element in sending Application Center events to WMI.

Note The provider is a decoupled WMI event provider that sends Application Center event notifications to WMI. This provider is an in-process COM component created by the provider's clients, such as the cluster and replication services.

Figure 7.15 Eventing architecture

As you can see in Figure 7.15, the Application Center Event Provider writes a subset of errors and warnings to the Windows Event Log and it sends all event information to WMI.

Event Schema

The Application Center event schema is a hierarchy of WMI classes with a common root, MicrosoftAC_Base_Event, which is the Base class. The following .mof code shows how MicrosoftAC_Base_Event is defined:

class MicrosoftAC_Base_Event : __ExtrinsicEvent
{
    // Identifies the event. This is specific to the source that generated the event log entry
    // and is used, together with SourceName, to uniquely identify an NT event type.
    [Key]
    uint32 EventId;
    // This uniquely identifies each instance of an event so that we can refer to the event
    // later. We will be automatically generating these events in the provider.
    [Key]
    string GUID;
    // Error code
    uint32 Status;
    // Error message
    string StatusMessage;
    // Specifies the time at which the source generated the event.
    datetime TimeGenerated;
    // The severity level
    [Values {"Error", "Warning", "Information"}, ValueMap {1, 2, 4}]
    uint32 Type;
};
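
For readers who consume these events from a script, the base class can be mirrored as a small data structure. The Python sketch below copies the field names and the severity value map from the .mof definition above; the BaseEvent and new_event names, the sample event identifier, and the use of a random UUID for the GUID field are assumptions for illustration.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime

# Value map taken from the Type property in the .mof definition.
SEVERITY_NAMES = {1: "Error", 2: "Warning", 4: "Information"}


@dataclass(frozen=True)
class BaseEvent:
    """Hypothetical mirror of the MicrosoftAC_Base_Event properties."""
    event_id: int
    guid: str
    status: int
    status_message: str
    time_generated: datetime
    type_code: int

    @property
    def severity(self):
        return SEVERITY_NAMES.get(self.type_code, "Unknown")


def new_event(event_id, status, status_message, type_code, now=None):
    """Create an event with a generated GUID, as the provider is said to do."""
    return BaseEvent(event_id, str(uuid.uuid4()), status, status_message,
                     now or datetime.now(), type_code)


# The event identifier 1234 is made up for this example.
event = new_event(1234, 0, "Synchronization enabled successfully", 4)
```

Note that the severity codes are bit-style values (1, 2, 4) rather than a consecutive sequence, so a lookup table is safer than indexing into a list.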

There are two types of event classes: Containers and Events. Containers are higher-level classes that serve as categories for events (Replication Session and Request Forwarding Initialization, for example). Every event generated at run time is an instance of a series of these hierarchical classes. A container query will return events for any of its children.

All the classes that derive from the base class use the following naming convention: MicrosoftAC_class1_class2_class3_name_Event, where classn is the name of each of the parent classes, not including Base. Underscores indicate the class hierarchy. To reduce the length of class names, the parent classes can be abbreviated.

All classes end with Event to indicate that they are events per WMI convention. This extended class naming is done to make the event namespace more usable from monitoring tools, such as Health Monitor.

Schema Example

The class structure for Replication Service is:

Base
  Replication
    Engine
      General
        Events

This class structure is represented in WMI as:

MicrosoftAC_Base_Event
  MicrosoftAC_Replication_Event
    MicrosoftAC_Replication_Engine_Event
      MicrosoftAC_Replication_Engine_General_Event

If you enumerate MicrosoftAC_RepEngGeneral, you'll find several events represented by the following classes:

MicrosoftAC_Replication_Engine_General_SetReplAttrFailed_Event
MicrosoftAC_Replication_Engine_General_SetDriverAttrFailed_Event
MicrosoftAC_Replication_Engine_General_StartReplFailed_Event
MicrosoftAC_Replication_Engine_General_SetDriverAttr_Event
MicrosoftAC_Replication_Engine_General_SetReplAttr_Event
MicrosoftAC_Replication_Engine_General_StartRepl_Event
MicrosoftAC_Replication_Engine_General_StopRepl_Event
MicrosoftAC_Replication_Engine_General_DirChangeNotifyFailed_Event
MicrosoftAC_Replication_Engine_General_RemovedirectoryFailed_Event
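
The naming convention and the Replication Service hierarchy above can be sketched in a few lines of Python. Both helper functions are hypothetical and assume unabbreviated parent names with no underscores inside an individual name; WMI itself resolves the hierarchy through class inheritance, not through string handling.

```python
ROOT_CLASS = "MicrosoftAC_Base_Event"


def event_class_name(parents, name=None):
    """Build a class name from its parent classes (excluding Base)."""
    parts = ["MicrosoftAC", *parents]
    if name:
        parts.append(name)
    parts.append("Event")
    return "_".join(parts)


def parent_class_name(class_name):
    """Derive the parent (container) class name, or None for the root."""
    if class_name == ROOT_CLASS:
        return None
    middle = class_name.split("_")[1:-1]  # drop "MicrosoftAC" and "Event"
    if len(middle) <= 1:
        return ROOT_CLASS  # first-level containers derive from the base class
    return event_class_name(middle[:-1])


start_repl = event_class_name(["Replication", "Engine", "General"], "StartRepl")
container = parent_class_name(start_repl)
```

Walking parent_class_name repeatedly from an event class reproduces the chain of containers up to MicrosoftAC_Base_Event, which is why a query against any container also returns the events beneath it.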

The other clients that use the event provider implement a class structure and schema that follows the example given for the Replication Service.

Figure 7.16 shows how event information is generated from an instance of the Replication Service's class MicrosoftAC_Replication_Engine_General_StartRepl_Event. The event's status ("Synchronization enabled successfully") is passed to the event provider, which in turn forwards the information to WMI. Once this information is stored in WMI, the appropriate event consumer can access the data and write it to one or more event logs.

Figure 7.16 Architectural elements and process flow when an Application Center service generates an event

Two items should be noted in Figure 7.16. First, nothing is written to the Windows Event Log unless an error occurs. Second, errors are written to the Windows Event Log, and all events are sent to WMI. WMI, in turn, writes information to the Application Center log and any user-defined logs.

Logging Data

As you may have already gathered, Application Center does not collect data in a central cluster database. Instead, it persists data in a SQL desktop engine database that's installed on each member. Queries related to monitoring are run against the individual data stores to provide member-wide and cluster-wide reporting.

After data is generated, it has to be logged. This is accomplished by using the architecture illustrated in Figure 7.17.

The central element in this monitoring model is the Log Agent component, which functions as an intermediary between data consumers and the local instance of the SQL desktop engine database. (The consumers subscribe to the providers described in the preceding section.) The agent runs in-process with the logging clients (the consumers) and provides its services whenever a client has data that needs to be logged. Each local instance of the Log Agent maintains an OLE DB connection to the data store and provides an interface for structured logging. Each time the agent writes to the log, it combines the log data it receives (represented as a variant containing an array of variants) with the log parameters that the client passes (server information and time stamp) to generate a log record that's written to the database.

Figure 7.17 Logging architecture
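
The Log Agent's job of merging client parameters with log data can be pictured with a small Python sketch. Here an in-memory SQLite connection stands in for the OLE DB connection to the SQL desktop engine, and a list of (counter identifier, value) pairs stands in for the array of variants. The LogAgent class, its log method, and the sample values are hypothetical; only the PerfHistory column names come from Table 7.15.

```python
import sqlite3
from datetime import datetime


class LogAgent:
    """Hypothetical stand-in: holds one connection and writes structured records."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS PerfHistory "
            "(ServerId INTEGER, CounterId INTEGER, TimeMeasured TEXT, Data REAL)"
        )

    def log(self, server_id, timestamp, values):
        """Combine the client's parameters with its data and write the records."""
        stamp = timestamp.isoformat(sep=" ")
        records = [(server_id, counter_id, stamp, data)
                   for counter_id, data in values]
        with self.conn:  # commit all records for this call together
            self.conn.executemany(
                "INSERT INTO PerfHistory VALUES (?, ?, ?, ?)", records)
        return len(records)


agent = LogAgent(sqlite3.connect(":memory:"))
written = agent.log(1, datetime(2000, 10, 1, 12, 0, 0), [(1, 12.5), (2, 300.0)])
stored = agent.conn.execute(
    "SELECT ServerId, CounterId, TimeMeasured, Data "
    "FROM PerfHistory ORDER BY CounterId").fetchall()
```

Keeping the connection inside the agent means each logging client only supplies data and parameters, and never has to know how or where the record is stored.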

The Event Logging Consumer

The Event Logging consumer is a permanent consumer that subscribes to events from the following sources:

Application Center
Health Monitor
Windows Event Log

WMI activates this component based on permanent consumer registration before delivering events. The Event Logging consumer is used by the user interface to configure the event query filters that determine which events to collect according to their level of severity.

Note This consumer runs as a COM+ application with the process identity of a cluster user. COM+ performs an access check when events are delivered by WMI via calls to the Event Logging consumer. During cluster creation, the user and/or password for the server may get changed. If this happens, you have to remember to alter the process identity of the consumer COM+ application accordingly.

Although WMI throttles the delivery of events from the provider to the consumer, there may be cases in which the consumer can't keep up with the incoming data, in which case an event buffer overflow occurs. If this happens, WMI will drop events. While there is no guarantee that data won't get lost, Application Center uses additional buffering to ensure that event data loss is minimal.
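
The overflow behavior described here is that of a bounded buffer: when the consumer falls behind and the buffer is full, new events are dropped instead of blocking the producer. The following Python sketch models that behavior in general terms. The EventBuffer class and its capacity are hypothetical, and the sketch does not reflect the actual buffer sizes or internals of WMI or Application Center.

```python
import queue


class EventBuffer:
    """Bounded buffer that drops (and counts) events when it is full."""

    def __init__(self, capacity):
        # capacity must be positive; queue.Queue treats 0 as unbounded
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def deliver(self, event):
        """Called by the producer; returns False if the event was dropped."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self):
        """Called by the consumer; returns all buffered events in order."""
        events = []
        while True:
            try:
                events.append(self._queue.get_nowait())
            except queue.Empty:
                return events


buffer = EventBuffer(capacity=2)
delivered = [buffer.deliver(n) for n in range(3)]  # third event overflows
```

A larger buffer absorbs longer bursts but cannot remove the possibility of loss, which is why the text speaks of minimizing dropped events and not of preventing them.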

The Performance Counter Logging Consumer

Application Center uses the Performance Counter Logging consumer to log performance metrics that are used for historical performance charts. This consumer is a permanent event consumer that links directly to the PDH to obtain data. The Performance Counter Logging component is implemented as a COM automation server and runs out-of-process to WMI. In the event that the consumer isn't running, WMI activates it before delivering the events that the consumer uses.

This consumer is configured through WMI with instances of the following configuration classes:

MicrosoftAC_CapacityLoggingConfig
MicrosoftAC_CapacityCounterConfig

Because changes to these configuration class instances are made on the cluster controller and replicated by the replication engine to every cluster member, each member picks up configuration changes to the Performance Counter Logging consumer.

Querying and Preparing Data

It's hard to say which is more difficult, getting data out of the log or putting it in. On the query side of the argument, the system has to locate the fields that are required for a specific view of the data, and then the data has to be formatted for display on the screen. Figure 7.18 shows the architecture that Application Center uses to access and query the data store, retrieve the data, format the data, and display the data in the user interface.

Figure 7.18 The event querying and viewing architecture

The key new elements in this architecture are the user interface (Web browser and MMC snap-in) and the Log Query Helper (LQH) Service.

The User Interface

Either user interface can send requests for information to the LQH Service. These actions are triggered by setting focus on a node in the console tree or by clicking a button (for example, Refresh) in the details pane of the snap-in.

The LQH Service

The role of the LQH Service is to provide the log data needed to populate the Event Viewer and Performance Viewer pages. It runs as a service (Application Center Log Query Helper) under the local system account and depends on the RPC Service.

In its rollup role, the LQH Service performs these basic tasks:

Accepts requests from the user interface.
Passes a query to each server.
Returns results and status to the user interface.
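
The three tasks above amount to a fan-out query: send the same request to every member, gather the rows, and report which members answered. A minimal Python sketch of that pattern follows. The query_cluster function, the run_query callback, and the member names are hypothetical, and querying members in parallel is an assumption; the text does not say how the LQH Service schedules its queries.

```python
from concurrent.futures import ThreadPoolExecutor


def query_cluster(members, run_query):
    """Run run_query(member) on every member; return (rows, status by member)."""

    def query_one(member):
        try:
            return member, run_query(member), "ok"
        except Exception as exc:  # a failed member must not sink the request
            return member, [], f"failed: {exc}"

    rows, status = [], {}
    with ThreadPoolExecutor(max_workers=max(1, len(members))) as pool:
        for member, member_rows, state in pool.map(query_one, members):
            rows.extend(member_rows)
            status[member] = state
    return rows, status


def fake_query(member):
    """Stand-in for a query against one member's ACLog database."""
    if member == "WEB02":
        raise ConnectionError("unreachable")
    return [(member, "Processor Utilization", 10.0)]


cluster_rows, cluster_status = query_cluster(["WEB01", "WEB02", "WEB03"], fake_query)
```

Returning a status entry for every member alongside the rows lets the user interface show partial results and still flag the members that could not be reached.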

The background information that we've provided about the different elements of Application Center health monitoring and how it's implemented should help you with the decisions that you'll have to make when modifying or creating new monitors—a topic that is covered in detail in Chapter 9, "Working with Monitors and Events."
