Chapter 4 - Enterprise Monitoring

02/20/2014

Archived content. No warranty is made as to technical accuracy. Content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.

Updated : August 22, 2002

This chapter is part of the Exchange 2000 Server Operations Guide.

To track problems and to ensure that your server running Microsoft® Exchange 2000 Server is running efficiently, you need to monitor it effectively. Monitoring should not be limited to times when there are problems; it should run continuously as part of your maintenance program. This chapter discusses how to monitor Exchange 2000 Server computers effectively in your organization. This is a preliminary document and may be changed substantially prior to final commercial release of the software described herein.

Introduction

Monitoring is an essential part of successful Microsoft® Exchange 2000 Server operations. Through effective monitoring, you are able to determine if you are meeting service level agreements, and if you are not, which areas are causing problems (known as reactive monitoring). You can even use a trend analysis of the data you have collected to predict future problems for your organization and to obtain a global picture of your Exchange 2000 Server environment (known as proactive monitoring). Good reactive and proactive monitoring will help you to maintain high availability for your servers running Exchange.

In this chapter, you will learn how to monitor at the server level and the client level, the key areas to monitor, and what benefits you will gain from thorough monitoring. You can take two approaches to monitoring Exchange 2000 Server operations: using basic tools such as System Monitor and Event Viewer, or using more advanced monitoring tools such as the Exchange 2000 Management Pack for Microsoft Operations Manager (Operations Manager).

Prerequisites

Before beginning this chapter, you should be familiar with service level agreements and basic operations procedures (covered in Chapter 1, "Introduction").

Chapter Sections

This chapter covers the following procedures:

Performance monitoring
Event monitoring
Availability monitoring
Client monitoring
Operation personnel notification

At the end of this chapter, you will be able to monitor your Exchange 2000 Server environment effectively.

Performance Monitoring

Performance Monitoring is the monitoring of existing system(s) to ensure that optimum use is made of the hardware resources, and that agreed performance levels can be maintained.

Performance Monitoring allows you to determine if your server running Exchange 2000 is meeting the performance standards you have defined in your service level agreements (SLAs). Over time, you can use Performance Monitoring to generate data that can be used in trend analysis. This alerts you to possible future performance and availability issues, and allows you to solve problems before they arise.

One of the first tasks involved in performance monitoring is to generate a baseline. This baseline is a measure of what figures you expect to see when measuring a healthy system. This can then be compared to the figures you gather in day-to-day monitoring, allowing you to track problems easily.

In this section, you will look at the objects and counters that you may want to monitor using System Monitor. These parameters will form the basis of your baseline. You will also examine centralized monitoring techniques for remote servers.
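As a rough illustration of comparing day-to-day figures against a baseline, the following Python sketch summarizes readings taken from a healthy system and flags later readings that stray far from them. The function names, the sample values, and the three-standard-deviation tolerance are assumptions made for this example; they are not part of System Monitor or Exchange.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize healthy-period samples as (mean, standard deviation)."""
    return mean(samples), stdev(samples)

def deviates(value, baseline, tolerance=3.0):
    """Return True when value is more than `tolerance` standard
    deviations away from the baseline mean (tolerance is an assumption)."""
    avg, spread = baseline
    return abs(value - avg) > tolerance * spread

# Hypothetical readings of one counter, taken while the server was healthy.
healthy = [10, 12, 11, 9, 13, 10, 11, 12]
baseline = build_baseline(healthy)

for reading in (11, 40):
    print(reading, "deviates from baseline:", deviates(reading, baseline))
```

In practice the baseline would be built per counter and per server, from data logged over a representative period.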

System Monitor

If your e-mail system was Exchange Server 5.5, you are probably accustomed to using Microsoft Windows NT 4.0 Performance Monitor to analyze the performance of your Exchange 5.5 server. Exchange Server 5.5 includes a series of Performance Monitor Workspaces to allow you to quickly see in graph form a series of key counters.

The Microsoft Windows® 2000 operating system includes System Monitor (which consists of Performance Monitor and Network Monitor) for analyzing the performance of your system. When you install Exchange 2000 Server, a large number of objects are installed and counters are associated with those objects.

It is worth noting that while real-time graphs created in System Monitor often look very pretty, they are only of limited use, particularly if no one is looking at them. If you continually monitor 500 different counters on your server running Exchange, the self-monitoring uses CPU cycles, and you undermine the performance of that server just by monitoring it. So monitor only what you need to, and consider using Performance Logs and Alerts, which can produce much more useful information with less of a load on the server. Reducing the frequency of monitoring also lightens the load on the server and in many cases produces a more accurate picture, depending on the counters in question.

Note: Remote monitoring is almost always better than self-monitoring, because performance is not tainted by the load caused by monitoring. For more information about remote monitoring, see articles 243283, Creating a Log File to Send to Customers for Remote Monitoring and 240389, Error Message: Event ID: 2028 "The Service was Unable to Add the Counter \\Server_Name\Counter_Name" in the Microsoft Knowledge Base.

Exchange 2000 Objects and Counters to Monitor

Every Exchange 2000 performance object has at least one counter associated with it. For information on particular counters, in Performance Monitor, click Select Counters from List, select a counter, and then click Explain.

Table 4.1 shows the various Exchange services and resources and the associated performance objects that you can monitor.

Table 4.1 Services, Resources, and Associated Performance Objects

Service or Resource	Performance Object
Active Directory™ DXA Connector	MSExchangeADDXA
Address List	MSExchangeAL
Chat Communities	MSExchange Chat Communities
Chat Service	MSExchange Chat Service
Directory Service Access Caches	MSExchangeDSAccess Caches
Directory Service Access Contexts	MSExchangeDSAccess Contexts
Directory Service Access Processes	MSExchangeDSAccess Processes
Document Conferences	MSExchangeCONF
Document Conferencing Manager	MSExchangeDcsMgr
Document Conferencing Protocol (Multipoint Control Unit)	MSExchangeT.120
Epoxy Queues and Activity	EXIPC
Event Store	MSExchangeES
File Replication Connector	FileReplicaConn
File Replication Settings	FileRepSet
HTTP Extension	Exchange Server HTTP Extension
Internet Information Server Store Driver	Exchange Store Driver (IIS)
IMAP4	MSExchangeIMAP4
Web Storage System	MSExchangeIS
Mailbox Store	MSExchangeIS Mailbox
Public Folder Store	MSExchangeIS Public
System Information Store	MSExchangeIS
Lotus CC Mail	MSExchangeCCMC
Lotus Notes Message Center	MSExchangeNMC
Message Transfer Agent	MSExchangeMTA
Message Transfer Agent Connections	MSExchangeMTA Connections
MS Mail Connector Interchange	MSExchangeMSMI
Exchange Referral Service	MSExchangeSA-RFR
MS Mail Connector Message Transfer Agent	MSExchangePCMTA
Name Service Provider Interface (AD Integration)	MSExchangeSA-NSPI Proxy
Network News Transfer Protocol Commands	NNTP Commands
Network News Transfer Protocol Server	NNTP Server
Novell Groupwise Connector	MSExchangeGWC
Object Linking and Embedding database events	MSExchangeOledb Events
Object Linking and Embedding database resources	MSExchangeOledb Resources
Post Office Protocol Version 3	MSExchangePOP3
Service Account	MSExchangeSA
Site Replication Service	MSExchangeSRS
Simple Mail Transfer Protocol	SMTP
Store Driver	Exchange Store Driver (Store)
Video Conferencing	MSExchangeIPConf
Web Mail	MSExchangeWebMail

The following sections describe the counters that are the most important to monitor, categorized by object.

Note: In the following sections, a number of queues are mentioned. Large queue buildup on any server usually indicates a problem, generally in routing. If you see queues that are unusually large for your environment, check your connectors.

Information Store Counters

MSExchangeIS

For this object, monitor the following counters:

User Count – This displays the number of people currently using the Information Store (not the number of connections). It is impossible to properly judge the performance of a server running Exchange unless you know how many people are using it.
RPC Requests – This shows the number of client requests currently being processed by the store. You should expect this figure to be fairly small, typically below 25. If it is consistently higher than this, your server is overloaded.

MSExchangeIS Mailbox and MSExchangeIS Public

For these objects, monitor the following counters:

Send Queue Size – This shows the queue of messages outbound from the Information Store. In situations where the SMTP service is down or there is a reduction in performance, you will see a nonzero value for this queue. On large busy systems (2000 users or more) you may never see this value at zero, but on smaller ones (500 or so medium users) you would not expect to see nonzero values for any significant period of time.
Messages Sent/Min – This shows the rate at which messages are sent to the transport. This figure being low is not a problem in itself, but if the Send Queue Size is nonzero and the value is still low compared to your baseline, then there are performance issues that need to be resolved (you will only be able to tell what these are by monitoring other Exchange 2000 Server and Windows 2000 counters).
Receive Queue Size – This shows the queue of messages inbound to the Information Store. Unlike the Send Queue Size, this is often nonzero, except on a bridgehead server with no local mailboxes. However, if the value is consistently high compared to your baseline, it could indicate a problem.
Messages Received/Min – Again a low value here could simply indicate a quiet server; however if the Receive Queue Size value is high and this value is low, it indicates that you are receiving messages that are stacking up and are not being processed.
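The way a queue counter and its matching rate counter are read together can be sketched in Python as follows. This is only an illustration: the function name, the state labels, and the idea of passing in a single baseline figure for each counter are assumptions, and real baselines will vary by server and time of day.

```python
def assess_queue(queue_size, messages_per_min, baseline_queue, baseline_rate):
    """Classify one pair of readings against baseline figures.

    All four arguments are plain numbers supplied by the caller;
    the labels returned are hypothetical.
    """
    queue_high = queue_size > baseline_queue
    rate_low = messages_per_min < baseline_rate
    if queue_high and rate_low:
        return "backlog"   # messages are stacking up and not being processed
    if queue_high:
        return "busy"      # queue is elevated, but messages are still flowing
    return "normal"        # a low rate alone may just mean a quiet server

# Hypothetical readings: a large queue with a low processing rate.
print(assess_queue(queue_size=500, messages_per_min=5,
                   baseline_queue=50, baseline_rate=100))
```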

SMTP Server

SMTP traffic can be from SMTP Servers, such as other servers running Exchange, or it can be from POP3 or IMAP4 Clients such as Microsoft Outlook Express. When monitoring SMTP parameters, remember that your client base will affect these figures:

Local Queue Length – This shows the number of messages in the local queue (this queue contains messages that are queued for local delivery on the server running Exchange to an Exchange mailbox). Under normal operating conditions, this number is rarely greater than zero. A reading of greater than zero shows that the server is receiving more messages than it can process. If this number increases steadily over time, there is probably a problem with the Exchange Store you are trying to deliver to.
Categorizer Queue Length – This shows the number of messages waiting for advanced address resolution. After this, the messages either go to the local queue or are sent to the routing engine to be delivered elsewhere. A high figure here compared to your baseline can indicate message flow problems.
Inbound Connections Current – Shows the number of current inbound connections. If this reading remains zero over time, then there may be network problems.
Message Bytes Sent/Second – Examine this figure in conjunction with other counters and your baseline to determine if your SMTP Server is passing messages as quickly as it should. If, for example, this figure is low, but queues leading to this transport are high, then there is a problem with the SMTP transport.
Message Bytes Received/Second – Again, use this in conjunction with other counters and your baseline to determine overall health. For example, there may be a problem with the SMTP transport if a queue going into this transport is high while the Message Bytes Received/Second is low.
Avg. retries/Msg delivered – When Exchange fails to deliver messages, those messages enter a retry queue. The SMTP server is configured with a retry interval showing how long the server will wait before a first retry, second retry, and so forth. This counter shows how many messages are going into retry as a fraction of the overall messages delivered. You should expect the figure to be close to zero. If a large number of messages are being retried, the figure will approach 1. This counter is therefore a good indicator of general message delivery problems on your network.
Avg. retries/message sent – This counter is the same as the previous counter, except it applies to outgoing messages as opposed to incoming ones.

MSExchangeMTA and MSExchangeMTA Connections

In a pure Exchange 2000 Server environment running in native mode, the MSExchangeMTA and MSExchangeMTA Connections objects are not particularly important. However, in cases of coexistence with Exchange Server 5.5, or where messages are being relayed to and from X.400 recipients, you may want to measure the Messages/Sec and Work Queue Length of the MSExchangeMTA object and the Queue Length of the MSExchangeMTA Connections object.

In Exchange 2000 Server, you may find the Message Transfer Agent (MTA) shutting down fairly frequently, especially if it cannot find a domain controller temporarily. To resolve this problem, you may want to use the recovery actions option in services to restart the service in the event of it being stopped.

MSExchangeIM Virtual Servers

If you are running Instant Messaging in your organization, you may find that the organization quickly becomes as reliant on Instant Messaging as it is on e-mail. It is therefore important that you monitor Instant Messaging Counters. You should examine the following:

Current Online Users – This shows the number of users logged on to the server. Examining this parameter over time helps you to determine the actual uptake of Instant Messaging in your organization, and therefore helps you to scale it properly across multiple servers.
Current Subscriptions – This shows the number of subscription notifications sent to the server by the Instant Messaging client. A subscription notification occurs when a user is added to the contact list. This gives an indication of how heavily clients are using Instant Messaging.
Inbound Subscribes/sec – This shows the average number of subscribes/second. If this figure is low but the usage of Instant Messaging is high, it could indicate an overworked Instant Messaging server.

MSExchangeAL

The Recipient Update Service (RUS) plays a crucial role in the day-to-day operations of Exchange 2000 because it is responsible for keeping e-mail addresses and membership of address lists up to date. You should measure the Address List Queue Length when examining the RUS:

The Address List Queue Length shows the load the Recipient Update Service is under. If this value is consistently high compared to your baseline, you should seriously consider upgrading the server that has this role, or transferring the role from a weak or overloaded server to a more powerful one.

Windows 2000 Objects and Counters to Monitor

A heavily used Exchange 2000 server may have a number of bottlenecks. Simply monitoring Exchange 2000 Server performance objects and counters in isolation will not give you information about the condition of the server itself. You will need to monitor for bottlenecks in the Disk Subsystem, Memory, Processor, and the Network Subsystem. In many cases there will be multiple instances of disks and processors, so make sure that you monitor all instances (that is, each disk and each processor). Table 4.2 shows which objects and counters it would be most useful to monitor, along with any specific notes regarding Exchange.

Note: When monitoring disk counters, you need to enable them to start at boot, using the diskperf -y command.

Table 4.2 Subjects and Associated Objects and Counters

Subsystem	Object	Counter	Exchange Comments
Disk	Logical Disk	% Free Disk Space
	Physical Disk	% Disk Time	Usually unreliable for RAID systems, so rarely applicable
	Physical Disk	Disk Reads/sec
	Physical Disk	Disk Writes/sec
	Physical Disk	Current Disk Queue Length	Should occasionally dip to zero
	Physical Disk	Avg secs per read	Should be analogous to published disk speed
	Physical Disk	Avg secs per write	Should be analogous to published disk speed or 1-2ms if you have write back caching enabled on your RAID controller
Memory	Memory	Committed Bytes
	Memory	Pages/sec	Exchange 2000 makes heavy use of a pagefile. A large amount of paging is not in itself an indication of a problem.
	Memory	Page Reads/sec	Value should generally be below 100. If the value is consistently high, you may need to increase system memory.
	Memory	Page Writes/sec	Value should generally be below 100. If the value is consistently high, you may need to increase system memory.
	Paging File	% Usage	You may need to increase the size of your pagefile for Exchange. Try to keep this counter below 70%.
	Process	Page Faults/sec
Processor	Processor	Interrupts/sec
	Processor	% Processor Time	The creation of indexes by Full Text Indexing generally uses a great deal of processor time. However, a low priority thread is used, so it does not necessarily cause performance issues.
	Process	% Processor Time	Measure the following instances: store (Information Store), inetinfo (IIS), lsass (security system including AD), and mad (System Attendant)
Process	System	Processor Queue Length
	System	Context Switches/sec
Network	Network Segment	% Net Utilization
	Redirector	Bytes Total/sec
	Redirector	Network Errors/sec
	Server	Bytes Total/sec
	Server	Work Item Shortages
	Server	Pool Paged Peak
	Server Work Queues	Queue Length

For more information about monitoring Windows 2000 objects, see the Windows 2000 Server Resource Kit.
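The numeric guidelines in Table 4.2 (paging reads and writes generally below 100 per second, paging file usage below 70%) can be checked mechanically once the counter values have been collected. The sketch below assumes the readings are already available as averaged numbers in a dictionary; the dictionary layout and the function name are illustrative choices, not a Windows API.

```python
# Guideline ceilings quoted in Table 4.2. The (object, counter) key layout
# is an assumption made for this example.
GUIDELINES = {
    ("Memory", "Page Reads/sec"): 100,
    ("Memory", "Page Writes/sec"): 100,
    ("Paging File", "% Usage"): 70,
}

def over_guideline(readings):
    """Return the (object, counter) pairs whose averaged reading is at or
    above its guideline ceiling. Counters without a guideline are ignored."""
    return sorted(
        key for key, value in readings.items()
        if key in GUIDELINES and value >= GUIDELINES[key]
    )

# Hypothetical averaged readings from one server.
readings = {
    ("Memory", "Page Reads/sec"): 140,
    ("Memory", "Page Writes/sec"): 35,
    ("Paging File", "% Usage"): 82,
    ("System", "Processor Queue Length"): 1,
}
print(over_guideline(readings))
```

Because the table speaks of values that are consistently high, it makes sense to feed this check averages over a period rather than single samples.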

Centralized Monitoring

In an enterprise environment, you can reduce operations costs dramatically if you can capture performance data in a central location. Doing so moves the load of monitoring from the monitored server to the centralized server and also allows you to compare the performance of similarly configured servers and ensure a consistent response in the event of a problem with a server running Exchange.

An example of a centralized monitoring tool for Exchange 2000 Server is Microsoft Operations Manager with the Exchange 2000 Management Pack.

The centralized monitoring provided by Operations Manager scales to a large number of servers and provides an Exchange administrator with a single place to monitor all aspects of server health. Operations Manager can also be used to provide SLA monitoring, by tracking the number of alerts that were not handled within the period specified in the SLA.
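The SLA measure described above, counting alerts that were not handled within the agreed period, reduces to a simple comparison of timestamps. The following Python sketch is an illustration only: the representation of an alert as a (raised, handled) pair and the function name are assumptions, and it does not reflect how Operations Manager stores or reports alerts.

```python
from datetime import datetime, timedelta

def count_sla_breaches(alerts, sla_period, now):
    """Count alerts that took longer than sla_period to handle.

    alerts is a list of (raised, handled) datetime pairs; handled is None
    for an alert that is still open, in which case its age is measured
    up to `now`.
    """
    breaches = 0
    for raised, handled in alerts:
        end = handled if handled is not None else now
        if end - raised > sla_period:
            breaches += 1
    return breaches

# Hypothetical alert history with a one-hour SLA.
start = datetime(2002, 8, 22, 9, 0)
history = [
    (start, start + timedelta(minutes=10)),  # handled in time
    (start, start + timedelta(hours=2)),     # handled late
    (start, None),                           # still open
]
print(count_sla_breaches(history, timedelta(hours=1),
                         now=start + timedelta(hours=3)))
```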

The Microsoft Operations Manager can collect information about Exchange 2000 Server performance objects and counters, storing them in the repository where they can be used for long-term analysis. In addition, the Exchange 2000 Management Pack includes a set of scripts that provide detailed monitoring of the health and performance of an Exchange 2000 Server computer. Examples of the monitoring done in these scripts include:

Monitoring of service availability by verifying that it is possible to log on to a test mailbox on each Exchange server. This confirms that Exchange and the services it depends on are functioning properly. The Exchange administrator is alerted if the test mailbox cannot be accessed.
Verifying and monitoring the flow of mail between test mailboxes on servers, including mail delivery latency. The administrator is alerted if a given number of successive messages are not received.
Watching for dismounted Exchange databases and critical Exchange services which should be running on a server, and alerting the administrator of any problems.
Monitoring free disk space for Exchange, given the different usage patterns of different types of Exchange disks.
Collecting data on mail flow to and from servers, on the individuals and domains sending and receiving the most mail, and on the largest mailboxes and public folders.
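The mail flow check in the list above alerts only after a given number of successive test messages fail to arrive, which avoids raising an alert for a single delayed message. A minimal Python sketch of that rule follows; it is not the Management Pack's script, and the function name and the list-of-booleans input are assumptions.

```python
def should_alert(results, max_failures):
    """Return True once max_failures consecutive test messages were missed.

    results is a sequence of booleans in arrival order:
    True means the test message was received, False means it was not.
    """
    streak = 0
    for received in results:
        streak = 0 if received else streak + 1
        if streak >= max_failures:
            return True
    return False

# Hypothetical probe history: three misses in a row trigger the alert.
print(should_alert([True, False, False, False], max_failures=3))
```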

Using a tool such as Microsoft Operations Manager is not the only way to gather centralized performance monitoring information. Exchange is a Windows Management Instrumentation (WMI) provider, so it is possible to create your own Web interface for gathering information from other servers on the network.

Event Monitoring

When Exchange 2000 Server is running smoothly, event monitoring does not seem especially important. However, when performance is poor, you will quickly see the benefits of event monitoring. Event Viewer is a useful source of information about Exchange 2000 Server, along with log files that you may choose to generate. Large organizations may require an application such as Microsoft Operations Manager for reporting on Exchange 2000 Server events.

Event Viewer

Exchange reports to the Application event log. By default, it logs all critical events to the Application log. By increasing the logging on particular Exchange services, you can ensure that more data is available.

To enable logging for a particular Exchange service, right-click the server in Exchange System Manager, select Properties, and then select the Diagnostics Logging tab.

The logging levels are:

None – Only error messages are logged (the default setting on all the services)
Minimum – Warning messages and error messages are logged
Medium – Informational, warning, and error messages are logged
Maximum – Troubleshooting (extra detail), informational, warning, and error messages are logged
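The four logging levels are cumulative, each adding one message type to the level before it. The small Python lookup below restates the list above in that form; the lowercase type names and the function name are labels chosen for this example, not identifiers used by Exchange.

```python
# Message types recorded at each diagnostics logging level, as listed above.
LEVELS = {
    "None": {"error"},
    "Minimum": {"error", "warning"},
    "Medium": {"error", "warning", "informational"},
    "Maximum": {"error", "warning", "informational", "troubleshooting"},
}

def is_logged(level, message_type):
    """Return True if message_type is recorded at the given logging level."""
    return message_type in LEVELS[level]

print(is_logged("Minimum", "warning"))
print(is_logged("Minimum", "informational"))
```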

You can log the following services in Exchange 2000 Server:

IMAP4Svc (IMAP4 Protocol)
MSExchangeAL (Address List)
MSExchangeIS\System (Information Store System)
MSExchangeIS\Mailbox (Information Store Mailbox)
MSExchangeIS\Public Folder (Information Store Public Folders)
MSExchangeSRS (Site Replication Service)
MSExchangeTransport (SMTP Routing Engine and Transport)
MSExchangeMTA (MTA Service)
MSExchangeSA (System Attendant Service)
POP3SVC (POP3 Protocol)

Under normal operating conditions, it is not necessary to set logging levels any higher than minimum, because increasing logging rapidly fills your event log with a great deal of unnecessary information. When issues arise, you can increase the level of logging to allow you to diagnose the problem, reducing it again after the issue has been resolved.

The Windows 2000 Resource Kit includes elogdmp.exe, a utility which allows you to dump the information in any Event Viewer log to a file for analysis elsewhere.

One of the difficulties of viewing event logs is knowing which events are more worrisome than others. In some cases, Exchange 2000 Server issues Stop events, which record temporary issues that resolve themselves in the course of time. In other cases it records warning events, which are indicative of more substantial problems.

In general terms, the errors and warnings that are likely to cause the most problems are Store errors, because they can affect the ability to access e-mail. 1018 and 1019 errors can indicate major problems for Exchange, typically caused by faulty hardware. You should watch for these two explicitly, and for Store errors and warnings in general.

You should also be careful to watch for errors indicating that domain controllers/global catalog (GC) servers cannot be found. If a GC cannot be found, the store will automatically dismount. Similarly, if the MTA service is temporarily unable to contact a domain controller, it will shut down. Watching for these errors allows you to diagnose quickly why services are being lost in the event of a problem.

One of the main problems with event viewing in Exchange 2000 Server is the sheer volume of information Exchange produces when you increase the logging level. It is often beneficial to use filters in the Event Log to produce only warning and critical events, or to use utilities that only display the more significant events.
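If event log entries have been dumped to a file and parsed, the filtering described above can be sketched in a few lines of Python. The Event record, its field names, and the severity labels are assumptions for illustration, and the event IDs other than 1018 and 1019 are made up; this does not read the Windows event log directly.

```python
from collections import namedtuple

# Hypothetical parsed event record; not a Windows API structure.
Event = namedtuple("Event", "event_id severity text")

# Store errors the chapter singles out as needing explicit attention.
STORE_CORRUPTION_IDS = {1018, 1019}

def significant(events):
    """Keep only warning and error events, with 1018/1019 listed first.

    sorted() is stable, so the remaining events keep their original order.
    """
    kept = [e for e in events if e.severity in {"warning", "error"}]
    return sorted(kept, key=lambda e: e.event_id not in STORE_CORRUPTION_IDS)

log = [
    Event(9000, "information", "routine message"),   # made-up ID
    Event(2000, "warning", "something to review"),   # made-up ID
    Event(1018, "error", "store error"),
]
for event in significant(log):
    print(event.event_id, event.severity)
```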

Log Files

As well as logging events to Event Viewer directly, Exchange 2000 Server also produces a series of log files that can prove useful in troubleshooting problems. The Protocol Logging tool generates specific information about the commands being sent and received by SMTP and NNTP.

To enable logging for SMTP or NNTP, select the properties of the appropriate virtual server and enable logging. You can then alter the logging frequency and the name and location of the log file.

To enable logging for HTTP on the default Web site, use IIS administrative tools.

Centralized Event Monitoring

As with performance monitoring, monitoring events centrally provides distinct benefits to many organizations. A number of tools help you to do this efficiently, including Microsoft Operations Manager.

Operations Manager pulls information from a variety of locations, including event logs, WMI events, SNMP traps, and transaction logs. It consolidates these events from multiple sources to give you an overall picture of the Exchange 2000 Server environment. You can script responses to particular events, issuing notifications or taking predefined actions in response to particular events. One particularly useful feature is the ability to integrate events with a knowledge base, ensuring that useful explanations and recommended actions are issued to operators when particular events occur. The Exchange 2000 Management Pack gathers information on specific events required by the operations staff, sends alerts about outages, and provides early detection of problems before they result in an outage.

Availability Monitoring

To meet your availability SLAs, you need to ensure that, as much as possible, you protect against downtime. Because it is impossible to guarantee that there will be no unexpected downtime in your organization, you need to ensure that you are notified quickly in the event of unexpected downtime.

Whenever you are monitoring and measuring Exchange 2000 Server availability, it is important to consider domain controllers as well as servers running Exchange. You may do all you can to ensure high reliability of servers running Exchange, but they will not have high availability if there are no domain controllers available for Exchange or the clients to use. Therefore, you should monitor domain controller/global catalog server availability and network availability as well.

Monitoring and Status Tool

The monitoring and status tool is available in Exchange System Manager. This tool is used to monitor Exchange services and perform actions if the services fail. For Exchange to run as it should, a set of default services should be running. These services are:

Microsoft Exchange Information Store Service
Microsoft Exchange MTA Stacks
Microsoft Exchange Routing Engine
Microsoft Exchange System Attendant
Simple Mail Transport Protocol
World Wide Web Publishing Service

If any of these services are not running, Exchange 2000 logs a critical state warning in Event Viewer.

Note: The Monitoring and Status tool does not notify you if a store has become dismounted. To ensure that you are notified of a dismounted store, you will need to use other monitoring tools such as the Exchange 2000 Management Pack for Microsoft Operations Manager.

Adding Services to the Default Configuration

You can add additional services to the default Microsoft Exchange services that are monitored by the Monitoring and Status tool. If any of these additional services fail, they log a critical state warning, just as the default services do. This is particularly useful if you have other Exchange services that are vital to the user experience in your environment (for example, if Instant Messaging is used heavily in your organization).

Monitoring Resources

You can monitor other resources using the Monitoring and Status tool. To do so, click Add on the Monitoring tab and select the resources you want to monitor. These resources are monitored to see if they pass two thresholds. Resources that pass the first threshold enter a "warning" state; those that pass the second threshold enter a "critical" state. The following resources can be monitored:

Available Virtual Memory – You can set minimum availability thresholds for memory and a minimum period of time for which available virtual memory must be above a particular threshold.
CPU Utilization – You can set maximum CPU utilization thresholds for the CPU(s) in your server running Exchange.
Free disk space – You can set minimum drive space thresholds for the disk drives in your server running Exchange.
SMTP queue growth – SMTP queues should not continue to grow. You can issue notifications if they continue to grow for longer than a specified period of time.
X.400 queue growth – X.400 queues should not continue to grow. You can issue notifications if they continue to grow for longer than a specified period of time.
Windows 2000 service – You can add additional Windows 2000 services to monitor. These services can be added to the default configuration just as you can add other services.
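The two-threshold model and the queue growth rule described above can be illustrated with the following Python sketch. It is a simplified model, not the Monitoring and Status tool's implementation: the function names, the "ok" label for the normal state, and the use of equally spaced samples to stand in for a period of time are all assumptions.

```python
def resource_state(value, warning, critical, higher_is_worse=True):
    """Map a reading to "ok", "warning", or "critical".

    For resources with minimum thresholds (free disk space, available
    virtual memory), pass higher_is_worse=False.
    """
    if not higher_is_worse:
        # Negating all three values turns a minimum check into a maximum check.
        value, warning, critical = -value, -warning, -critical
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"

def grew_continuously(samples, min_intervals):
    """Return True if the queue length rose on at least min_intervals
    consecutive sampling intervals."""
    streak = 0
    for previous, current in zip(samples, samples[1:]):
        streak = streak + 1 if current > previous else 0
        if streak >= min_intervals:
            return True
    return False

# Hypothetical readings and thresholds.
print(resource_state(95, warning=80, critical=90))                           # CPU %
print(resource_state(700, warning=1000, critical=500, higher_is_worse=False))  # free MB
print(grew_continuously([1, 2, 3, 4], min_intervals=3))
```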

Although passing these thresholds does not necessarily affect availability directly, you will often find that your server is close to being unavailable in these circumstances, so it is very important to monitor them. A good example is free disk space. If you run out of disk space on the disk containing the transaction logs, the latest transactions will be written to res1.log and res2.log and the Exchange services will be shut down, resulting in a loss of availability.

Notifications

When services or resources enter a warning state or a critical state, it is important that operations staff is notified, so they can react accordingly. The configuration objects in the Notification Container allow you to determine which server does the monitoring, which servers, services and resources are being monitored, at what point a notification is being sent out (at the warning state or the critical state), and what to do in the event of entering a warning state or a critical state. You can either launch a script, or send an e-mail notification.

Note: Be very careful about how you configure e-mail notifications. If you are notifying users of a failure in the e-mail service, there is a possibility that the notification may never be received.

Status

The details pane of the Status container allows you to view the status of servers and connectors in your organization.

The Status container shows the following server states:

Available – This shows that the server is online and all the main services are running normally.
Unreachable – This shows that one of the main services on the server is down.
In Maintenance Mode – This shows that monitoring is disabled on this server for maintenance.
Unknown – This shows that the system attendant on the monitoring server cannot communicate with the monitored server.

When looking at connectors in the Status container, you will see the following possible states:

Available – This shows that the connector is functioning normally.
Unavailable – This shows that something is not functioning properly on the connector and that someone will need to investigate further.

Disabling Server Monitoring

In some circumstances it is necessary to take a server down for scheduled maintenance, or to rebuild a server that has failed. In cases where you are already aware of the problem, you can prevent a series of alerts from being issued by choosing the properties of the server in the Status details pane and selecting Disable all monitoring on this server. When your maintenance is complete, you can return to this dialog box and clear the option.

Centralized Availability Monitoring

In many environments, it is particularly important to have some sort of centralized availability monitoring if you are to meet your SLAs. Trend analysis is also very important, so you can avoid losing availability. In particular, if you monitor and find a degradation in performance over time, it may be an indicator of impending availability problems.

Microsoft Operations Manager with the Exchange 2000 Management Pack has a number of features that assist in availability monitoring. A lengthy response time in sending and receiving mail to another server running Exchange may indicate a loss of availability somewhere in the path the message would normally follow. Microsoft Operations Manager with the Exchange 2000 Management Pack also tells you when services are down, when queue lengths are abnormally high, or when public folders are inactive (perhaps because of a problem with replication to that folder).

Whether you use Microsoft Operations Manager with the Exchange 2000 Management Pack or similar third-party tools in your organization, you should find a way to gather information about your Exchange environment centrally. It is very important that information about existing or impending availability problems quickly reaches a person who can act on it.

Client Monitoring

While it is very important to monitor the availability and performance of servers running Exchange, domain controllers, and the network, none of these directly cover one critical area – the experience of the Exchange end user. This area can be very challenging because your clients can differ greatly. They may be HTTP, POP3, or IMAP4 clients running over an intranet or the Internet. They could be MAPI clients connecting over an internal network, or using a VPN to tunnel in. While this makes client monitoring more difficult, it also makes it more important. After all, the main reason you monitor at the server level is to ensure better performance and availability for the end users of Exchange. Without monitoring at the client level, you cannot prove that your improved server performance is reaching the client. Furthermore, in many cases, you will be required to deliver particular levels of performance at the client level. You will need to be in a position to prove that you are meeting the client expectations. Client monitoring tools give you the ability to prove that you are meeting your target levels of performance and availability.

Monitoring at the client level differs dramatically from monitoring at the server level in that you will almost certainly not want to monitor all clients. Monitoring affects the performance of the client, but more significantly, if you monitor all workstations, you will generate a significant amount of network traffic, which could affect the overall performance of Exchange. Furthermore, if a server running Exchange is unavailable, you do not need to be told this by 5000 clients. Being told by one is usually sufficient.

There are a number of third-party tools on the market for monitoring clients. These tools generally work by having an agent installed on the client, simulating typical Exchange client activities (starting up Outlook, performing an address book lookup, accessing public folders, sending e-mail, and so forth). Agents report to a central management server, which collates their information and issues reports, notifications, and alerts in the event of problems.
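The agent behavior described above can be sketched in a few lines of Python: run one simulated client activity, time it, and build a report that could be sent to a central management server. This is only an assumption about how such an agent might be structured, not a description of any particular third-party product. The `run_probe` function, the report fields, and the placeholder activity are all hypothetical.

```python
import time


def run_probe(name, action, clock=time.perf_counter):
    """Run one simulated client activity and return a report dictionary.

    `action` is any callable that performs the activity (for example,
    an address book lookup) and raises an exception on failure.
    """
    start = clock()
    try:
        action()
        ok, error = True, None
    except Exception as exc:  # any failure counts as a failed probe
        ok, error = False, str(exc)
    return {
        "probe": name,
        "ok": ok,
        "seconds": clock() - start,
        "error": error,
    }


def address_book_lookup():
    """Placeholder for a real client operation (hypothetical)."""
    time.sleep(0.01)


report = run_probe("address book lookup", address_book_lookup)
print(report["probe"], "succeeded" if report["ok"] else "failed")
```

A real agent would run several such probes on a schedule and forward each report to the management server, which decides whether to raise a notification or an alert.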

For more information about how to handle problems when they arise, see Chapter 6, "Support."

Think of client monitoring not as something that examines the performance levels of each client, but rather as something that you use to verify that your server performance and availability levels are being reflected in appropriate client performance and availability. It is generally a good idea to ensure that you have at least one agent running per subnet because this will help you to identify problems at the client due to lack of network connectivity to a server running Exchange or to the domain controller/global catalog server. You should also have at least one agent running for each type of client. If, for example, the clients differ in operating system or in the Exchange client software they use, they could be affected differently, and so should be monitored separately. If you give users some freedom over the configuration of their computers, it is usually a good idea to run the agent on computers that users do not directly interact with. (It is important not to confuse a loss of service on the client due to Exchange 2000 Server issues with loss of service on the client due to user error.)
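The placement guidance above can be expressed as a small selection routine. The following Python sketch picks one agent host for each combination of subnet and client type, and prefers computers that users do not interact with directly. Covering every combination goes slightly beyond the minimum stated above (one agent per subnet and one per client type), and the inventory format and field names are assumptions made for this example.

```python
def pick_agent_hosts(computers):
    """Choose one agent host per (subnet, client type) pair.

    Each computer is a dict with the keys "name", "subnet",
    "client_type", and "interactive" (True if a user works on it).
    Non-interactive computers are preferred when one is available.
    """
    chosen = {}
    for computer in computers:
        key = (computer["subnet"], computer["client_type"])
        current = chosen.get(key)
        if current is None or (current["interactive"] and not computer["interactive"]):
            chosen[key] = computer
    return sorted(computer["name"] for computer in chosen.values())


# Hypothetical inventory.
inventory = [
    {"name": "WS-101", "subnet": "10.1.1.0/24", "client_type": "MAPI", "interactive": True},
    {"name": "MON-01", "subnet": "10.1.1.0/24", "client_type": "MAPI", "interactive": False},
    {"name": "WS-102", "subnet": "10.1.1.0/24", "client_type": "POP3", "interactive": True},
    {"name": "MON-02", "subnet": "10.1.2.0/24", "client_type": "MAPI", "interactive": False},
]
print(pick_agent_hosts(inventory))
```

Where a subnet has no dedicated monitoring computer, the sketch falls back to a user workstation, so failures reported from that host should be checked for user error before they are attributed to Exchange.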

Summary

It is impossible to operate servers running Exchange efficiently if you do not know what they are doing. It is very important to ensure that you always have enough information about your Exchange environment to predict problems and to verify that you are meeting your service level agreements. However, there is such a thing as too much information. Servers running Exchange can produce a huge amount of information, much of which is unnecessary in a healthy Exchange environment. If your monitoring is to be useful and efficient, you need to ensure that you collate useful data, have an understanding of what it means, and are prepared to increase or decrease logging levels according to what is required at that time.

When monitoring Exchange, do not restrict yourself to real-time monitoring. Use recorded data to perform trend analysis. Doing so allows you to prove that you are meeting your SLAs and alerts you to potential problems in the future.

In larger scale environments, seriously consider a centralized approach to monitoring. This helps to ensure that information about problems is available in the data centers where more expertise is available. It also allows you to compare similar servers running Exchange for performance and to get a consolidated picture of your Exchange environment.

More Information

The Microsoft Operations Framework provides technical guidance and industry best practices that encompass the complete IT service management environment, including service monitoring and control, availability management, and service level management.

For more information about the Microsoft Operations Framework, go to the following Web site:

https://www.microsoft.com/technet/itsolutions/cits/mo/mof/default.mspx

For prescriptive MOF information about service monitoring and control, availability management, and service level management, please review the detailed operations guides at https://www.microsoft.com/technet/prodtechnol/windows2000serv/default.mspx

For more information about the Exchange 2000 Management Pack for Microsoft Operations Manager, see the following: https://www.microsoft.com/technet/prodtechnol/exchange/2000/maintain/mom.mspx