System Monitoring with the Exchange Management Pack

When monitoring a collaboration and communication system, you need to consider the components with which the system is integrated. The Microsoft® Exchange Server 2003 architecture includes not only Exchange-specific components, such as the Microsoft Exchange Information Store service and system attendant, but also Active Directory® directory service, Simple Mail Transfer Protocol (SMTP), and underlying hardware. The Domain Name System (DNS) is also a critical element, without which neither Active Directory nor an Exchange 2003 organization can operate reliably.

Tracking and monitoring the system, and preventing and resolving errors, require an understanding of the possible points of failure and the corresponding troubleshooting strategies. For example, a problem might be that messages are not being delivered to recipients. The cause might lie in the SMTP transport engine or in DNS name resolution, or it might be a physical connectivity issue. The Exchange Management Pack accelerates the diagnostic process by providing alerts, event filters, rules for tracking common problems, reports and statistics, and a wealth of technical background information in knowledge base articles. By combining these tools, you can implement a comprehensive, centralized approach to monitoring an Exchange 2003 organization.

To understand how to use the Exchange Management Pack and monitor your Exchange organization, familiarize yourself with the following:

  • Monitoring Important Exchange 2003 Components   Exchange is a structured but complex system. Understanding its components can help you isolate problem areas.

  • Monitoring System Availability   The Exchange Management Pack includes rules and scripts to help monitor database operations, service availability, mail flow, and general functionality of the servers.

  • Monitoring Health and Performance   To evaluate growth and health, you can use the existing rules or add custom ones. Tracking performance data and system health is important for planning for growth and for anticipating points of failure.

  • Monitoring Exchange Events   The Exchange Management Pack provides complex filtering and viewing tools that help monitor events related to an Exchange organization.

  • System Monitoring Best Practices   By following established best practices, you can optimize your monitoring efforts with the Exchange Management Pack.

Generating System Reports and Statistics

The Exchange Management Pack includes reports that summarize system performance and data over time. The reports are particularly useful for comparing current data against a baseline in a centralized server environment, and for getting an overview of the current Exchange organization.

MOM reporting works by querying the data warehouse for the data that you want, summarizing that data, and then creating a report of formatted output. The Exchange Management Pack includes predefined reports, and you can create custom reports according to your needs. In MOM 2005, data is transferred daily from the operational database to the data warehouse database. Because reporting relies on the data warehouse, you must deploy the full version of MOM 2005.

The reports are generated by SQL Server Reporting Services. To view them, either select Operation in the left pane of the MOM 2005 Administrator Console and click Start Reporting Console, or open Reporting Console from the Microsoft Operations Manager 2005 program group on the Start menu. You can run reports for a specified time period and select which servers supply the data used to generate summaries and reports. To further customize reporting for your organization, you can specify chart options such as the preferred scale, chart type, and range.

Search for Knowledge

MOM 2005 includes product knowledge related to alerts. You can use this information to help identify the root cause of problems that cause alerts to be generated and to prevent these problems from recurring.

To view the product knowledge

  1. Click Start, point to Programs, point to Microsoft Operations Manager 2005, and then click Operator Console.

  2. In Microsoft Operations Manager 2005 - Operator Console, in the left pane, expand Microsoft Operations Manager.

  3. Under Microsoft Operations Manager, in the left pane, click Alerts.

  4. In the right pane, select the alert that you want to learn more about.

  5. At the bottom of the right pane, in the Alert Details pane, click the Product Knowledge tab. If MOM contains additional product knowledge related to the alert, a check mark will appear on the Product Knowledge tab (see Figure 4.1).

  6. Read the information on the Product Knowledge tab to learn more about the alert.

    Figure 4.1   Microsoft Operations Manager 2005 Operator Console - Alert Views

Search for Events

The MOM 2005 Operator Console (Figure 4.2) lets you quickly view and interpret events that have occurred on the servers that you are monitoring. From the Events pane, you can customize your view to highlight only specific types of events, and you can identify any alerts that may have been generated because of the event.

Figure 4.2   Microsoft Operations Manager 2005 Operator Console - Event Views

Monitoring System Availability

The Exchange Management Pack includes rule groups to monitor system availability. These rules can be enabled and disabled according to your requirements. The rules monitor the following components:

  • Mail Flow   Scripts periodically send test messages from sending servers to receiving servers. This is one of the fastest ways to monitor availability: if test messages are sent and delivered, the Exchange servers and the components on the message path are functioning.

  • Exchange Services   The key services that make up Exchange servers were discussed earlier in the chapter. The Exchange Management Pack checks these services and generates alerts if they fail.

  • MAPI   A MAPI client, such as Microsoft Office Outlook® 2003, accesses the databases that store Exchange data. This rule group can verify that these operations are successful.

  • Database   The rules in this group report when a database is connected or disconnected. Alerts are generated only when a database is disconnected.

  • Outlook Web Access   Errors, logins, and test verifications of functionality are part of the Outlook Web Access rule group. With the rules and scripts included, you can monitor Outlook Web Access.

  • Outlook Mobile Access   Scripts in this rule group synthetically log on as a client to make sure that Outlook Mobile Access functions. Different types of alerts are generated for different failures.

  • Exchange ActiveSync   Scripts in this rule group perform synthetic Exchange ActiveSync® logons and monitor the results to determine the availability of Exchange ActiveSync.

Each component and its capabilities are discussed in the following sections.

Mail Flow

These rules verify mail flow between Exchange servers. The mail flow verification rules work only when the OnePoint service runs as the Local System account. You can use the rules to set up each Exchange 2003 server to send mail to other servers running Exchange 2003 and to receive mail from another server. You can also configure a server to send mail back to itself. If mail flow between servers is interrupted, a notification is sent.

Information from these rules is used to generate Message Traffic reports. The Message Traffic reports provide data gleaned from message tracking logs. Because messages sent between mailboxes that are located on the same server are not logged in the message tracking logs, information about messages sent and received between mailboxes on the same server is not reported in the Message Traffic reports.

This group includes the following event rules:

  • Receive mail flow messages   This rule uses the Exchange 2003 – Mail Flow Receiver script and periodically checks mail flow. It generates an alert if message delivery latency exceeds a specified threshold.

  • Mail flow script cannot resolve recipient's address   This rule generates an alert if a recipient's address cannot be resolved.

  • An invalid parameter was sent to the Received Mail script   This rule generates an alert when a malformed or unacceptable parameter is passed to the script. The alert contains a description of acceptable values.

  • Mail flow latency exceeded the specified threshold   This rule generates an alert when mail flow latency exceeds the defined threshold.

  • General errors in the mail flow scripts   This rule generates an alert when a mail flow script stops running.

  • Mail flow message not received   This rule generates an alert when messages sent by the mail flow verification scripts are not received.

  • Send mail flow messages   This rule is triggered by a timed event and periodically runs a script that sends a message to one Exchange server to verify that mail is being sent without problems.

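  • Clock synchronization problem   This rule generates an alert when negative latency beyond the defined threshold is reported, that is, when a test message appears to be received before it was sent. This indicates that the system clocks on the Exchange servers are not synchronized.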
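The latency rules above amount to comparing the time a test message was sent with the time it was received. The following Python sketch illustrates that comparison. It is not the management pack's Mail Flow Receiver script. The function name and both thresholds are hypothetical values chosen for illustration: 15 minutes for latency and 5 minutes of tolerated clock skew.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical thresholds for illustration only.
LATENCY_THRESHOLD = timedelta(minutes=15)
CLOCK_SKEW_THRESHOLD = timedelta(minutes=5)


def classify_mail_flow(sent: datetime, received: Optional[datetime]) -> str:
    """Classify one test message in the spirit of the mail flow rules."""
    if received is None:
        # Corresponds to "Mail flow message not received".
        return "message not received"
    latency = received - sent
    if latency < -CLOCK_SKEW_THRESHOLD:
        # Received before it was sent: the server clocks disagree.
        return "clock synchronization problem"
    if latency > LATENCY_THRESHOLD:
        # Corresponds to "Mail flow latency exceeded the specified threshold".
        return "latency exceeded threshold"
    return "ok"


sent_at = datetime(2005, 1, 1, 12, 0)
print(classify_mail_flow(sent_at, sent_at + timedelta(minutes=20)))  # prints latency exceeded threshold
```

A small negative latency within the skew tolerance is treated as normal, so minor clock drift does not raise an alert.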

The MOM DTS (Data Transformation Services) tool transfers data from the MOM operational database to a separate, offline database. This separate database can be used for long-term data retention, long-term trending, and additional reporting, and it helps keep the MOM operational database well maintained. Data from the message tracking logs is collected once each day and depends on the DTS package completing. Therefore, there is a delay of approximately one day between activity in the message tracking logs and updated information in the Message Traffic reports.

Mailbox Access Account

The Mailbox Access account must be able to log on for mail flow verification to function correctly, so it is granted the specific rights that the mail flow scripts require. An access control entry (ACE) that specifies the following rights is added:

  • ADS_RIGHT_READ_CONTROL   The right to read data from the security descriptor of the object, not including the data in the system access control list (SACL).

  • ADS_RIGHT_DS_READ_PROP   The right to read properties of the object. The ObjectType member of an ACE can contain a GUID that identifies a property set or property. If ObjectType does not contain a GUID, the ACE controls the right to read all the object properties.

  • ADS_RIGHT_DS_LIST_OBJECT   The right to list a particular object. If the user is not granted this right, and the user does not have ADS_RIGHT_ACTRL_DS_LIST set on the object's parent, the object is hidden from the user. This right is ignored if the third character of the dSHeuristics property is '0' or not set.

  • ADS_RIGHT_ACTRL_DS_LIST   The right to list child objects of this object.

Note   For more information about ACE properties, see the Active Directory Service Interfaces (ADSI) enumerations (https://go.microsoft.com/fwlink/?LinkId=25449).
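The four rights combine into the single access mask that the ACE stores. The following Python sketch shows this by combining their ADS_RIGHTS_ENUM values with a bitwise OR. The class and helper names are illustrative. Only the numeric flag values come from the ADSI enumeration.

```python
from enum import IntFlag


class AdsRights(IntFlag):
    """ADS_RIGHTS_ENUM values for the four rights in the Mailbox Access ACE."""
    ADS_RIGHT_ACTRL_DS_LIST = 0x4
    ADS_RIGHT_DS_READ_PROP = 0x10
    ADS_RIGHT_DS_LIST_OBJECT = 0x80
    ADS_RIGHT_READ_CONTROL = 0x20000


# OR the individual rights together to form the ACE access mask.
MAILBOX_ACCESS_MASK = (
    AdsRights.ADS_RIGHT_READ_CONTROL
    | AdsRights.ADS_RIGHT_DS_READ_PROP
    | AdsRights.ADS_RIGHT_DS_LIST_OBJECT
    | AdsRights.ADS_RIGHT_ACTRL_DS_LIST
)


def grants_all(mask: int, required: int) -> bool:
    """Return True if every bit in `required` is also set in `mask`."""
    return (mask & required) == required


print(hex(MAILBOX_ACCESS_MASK))  # prints 0x20094
```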

This ACE is added directly to the locations listed in Table 4.1.

Table 4.1   Mailbox Access account ACE locations

LDAP object                       Inherited in the LDAP tree?    ViewStoreStatus granted?
Configuration container           No                             No
Exchange organization             No                             No
Address lists container           Yes                            No
Addressing container              Yes                            No
Admin groups container            No                             No
Selected admin group container    Yes                            Yes
Global settings container         Yes                            No
Recipients policies container     Yes                            No
System policies container         Yes                            No

ViewStoreStatus is an Exchange-specific permission that enables the account to view database information. The security ID (SID) of the Mailbox Access account is added to the msExchAdmins property so that the account appears in the Delegation Wizard. The value added to this property is the SID followed by ",30" (SID + ",30").
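The following Python sketch builds an msExchAdmins value in the SID + ",30" format. The helper name and the SID validation pattern are assumptions made for illustration, and the example SID is made up. The sketch reproduces only the stated format and does not interpret the suffix.

```python
import re

# Loose pattern for a string-form SID such as S-1-5-21-...-1105 (assumption).
SID_PATTERN = re.compile(r"^S-1-\d+(-\d+)+$")


def ms_exch_admins_value(sid: str) -> str:
    """Return the msExchAdmins entry for a SID: the SID followed by ",30"."""
    if not SID_PATTERN.match(sid):
        raise ValueError(f"not a string-form SID: {sid!r}")
    return sid + ",30"


# Made-up example SID for illustration.
print(ms_exch_admins_value("S-1-5-21-1111111111-2222222222-3333333333-1105"))
```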

For each test mailbox, the Mailbox Access account has the following rights:

  • Delete mailbox storage

  • Read permissions

  • Full mailbox access

Exchange Services

The Exchange Services rule group uses a timed rule that runs a script to periodically determine whether key Exchange services are running. The Configuration Wizard identifies the services to monitor by default, and you can customize this list. If a service stops and the severity level is Error or higher, a notification is sent. The State view in the Operator Console also shows that the service has stopped on your server. After the script detects that the service has restarted, the State view shows that the server is back up.
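The stopped and restarted transitions can be sketched as a comparison of two successive polls of service state. This Python example is a hypothetical illustration, not the management pack's script. It assumes that each poll yields a mapping from service name to a running flag. The Exchange 2003 service names MSExchangeIS, MSExchangeSA, and SMTPSVC appear only as sample input.

```python
from typing import Dict, List


def detect_transitions(previous: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Compare two polls of service state (True means running) and report changes."""
    changes = []
    for service, running in sorted(current.items()):
        # Treat a service absent from the previous poll as having been running.
        was_running = previous.get(service, True)
        if was_running and not running:
            changes.append(f"{service}: stopped; raise an alert and mark the State view")
        elif not was_running and running:
            changes.append(f"{service}: restarted; return the State view to healthy")
    return changes


previous_poll = {"MSExchangeIS": True, "MSExchangeSA": True, "SMTPSVC": False}
current_poll = {"MSExchangeIS": False, "MSExchangeSA": True, "SMTPSVC": True}
for line in detect_transitions(previous_poll, current_poll):
    print(line)
```

Reporting only transitions, rather than every poll in which a service is down, avoids repeating the same alert on each run of the timed rule.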

MAPI

The MAPI rule group checks whether a MAPI client can log on to an Exchange database. Implementation of this rule group enables verification of both the Exchange database and Active Directory availability. This data is used for the Exchange server availability report. The rules in this group require the MOM agent action account to run as the Local System account. Notification is sent when the severity level is Critical Error or higher.

Database

Two event rules and an alert rule make up the Database rule group. These rules determine which Exchange databases are not connected. An alert is generated when a database fails to connect or is disconnected. When the severity level is Error or higher, a notification is sent.

Outlook Web Access

These rules verify the availability of Outlook Web Access on a front-end Exchange server. These rules perform synthetic Outlook Web Access logons and check the results to determine the availability of Outlook Web Access. A notification is sent when the severity level is Error or higher.

This group includes the following event rules:

  • Outlook Web Access logon failure: Webexception   This rule generates an alert when MOM event 20003 occurs, which signifies that a synthetic Outlook Web Access logon attempt failed because of an exception.

  • Outlook Web Access logon failure: (HTTP error 401) Unauthorized   This rule generates an alert when MOM event 20015 occurs. The logon failure is caused by a rejected user name and password combination.

  • General error during synthetic OWA logon   This rule generates an alert when services or components on which the synthetic logon object relies are not running, are having problems, or refuse connection.

  • Outlook Web Access logon failure: (HTTP error 400) Bad Request   This rule generates an alert when MOM event 20014 occurs. The logon fails because the server cannot understand the request, which has malformed syntax. This is frequently caused by interrupted communications.

  • Outlook Web Access logon failure: (HTTP error 404) Server not found   This rule generates an alert when MOM event 20017 occurs. This event signifies that a connection to the Outlook Web Access server cannot be established.

  • Outlook Web Access logon failure: Authentication error. Logon request was redirected back to logon page   An alert is generated when a logon attempt fails because of an authentication error. The credentials for the Mailbox Access account may be incorrect or may have changed after the initial deployment.

  • Outlook Web Access logon failure: (HTTP error 503) Service Unavailable   An alert is generated when MOM event 20013 occurs. This event signifies that the server cannot handle the request because of temporary overloading or maintenance of the server.

  • Outlook Web Access logon failure: (HTTP error 407) Proxy Authorization Required   An alert is generated when MOM event 20018 occurs. This event signifies that a proxy is required. If a proxy server is installed, it might not be relaying connections correctly.

  • Outlook Web Access logon failure: (HTTP error 408) Request Time Out   When a client request times out waiting for a response, an alert is generated. This rule generates an alert when MOM event 20019 occurs.

  • Outlook Web Access logon failure: (HTTP error 403) Access forbidden   An alert is generated when MOM event 20016 occurs. This event signifies that too many users are connected to the server.

  • Outlook Web Access logon failure: General HTTP error   When the Outlook Web Access server returns an error during a logon attempt, this rule uses the related MOM event 20011 to generate an alert.

  • Unexpected error during synthetic Outlook Web Access logon   When an error that is not covered by a specific error type occurs during a logon attempt, MOM event 19999 is written to the log. This rule generates an alert when the event occurs.

  • Outlook Web Access logon failure: (HTTP error 500) Server returned an unknown error   An alert is generated when MOM event 20012 occurs. This event signifies that Outlook Web Access has returned an error related to ASP.NET, to Kerberos, or to a general server malfunction.

  • Synthetic Outlook Web Access logon   This rule is a timed event that runs every 15 minutes and uses the Exchange 2003 - Outlook Web Access logon verification script. The script logs on to the front-end Outlook Web Access server and verifies that it is functional. This test requires Exchange Server 2003 Service Pack 1 (SP1).
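The rules in this group effectively map the outcome of each synthetic logon to a MOM event ID. The following Python sketch expresses part of that mapping as a lookup table, using the event IDs listed above. It covers only some of the rules. Statuses without a dedicated entry fall back to event 20011 (General HTTP error), and failures that are not HTTP status codes, such as the Webexception case (20003) or the redirect back to the logon page, are not modeled. The function name is hypothetical.

```python
from typing import Optional

# Event IDs taken from the rule descriptions in this section (subset only).
OWA_HTTP_EVENTS = {
    400: 20014,  # Bad Request
    401: 20015,  # Unauthorized
    403: 20016,  # Access forbidden
    404: 20017,  # Server not found
    407: 20018,  # Proxy Authorization Required
    408: 20019,  # Request Time Out
    500: 20012,  # Server returned an unknown error
}
GENERAL_HTTP_ERROR_EVENT = 20011  # General HTTP error


def owa_event_for_status(status: int) -> Optional[int]:
    """Return the MOM event ID for an HTTP status code, or None if it is not an error."""
    if status < 400:
        # Non-error statuses raise no event in this sketch (redirect checks not modeled).
        return None
    return OWA_HTTP_EVENTS.get(status, GENERAL_HTTP_ERROR_EVENT)
```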

Outlook Mobile Access

These rules verify the availability of Outlook Mobile Access on a front-end Exchange 2003 server. These rules perform synthetic Outlook Mobile Access logons and monitor the results to determine Outlook Mobile Access availability. A notification is sent when the severity level is Error or higher.

This group includes the following event rules:

  • Outlook Mobile Access logon failure: ASP.net errors   An alert is generated when MOM event 22008 occurs. This event signifies that ASP.NET or the Exchange server is configured incorrectly.

  • Outlook Mobile Access logon failure: OMA configuration errors   An alert is generated when MOM event 22007 occurs. This event signifies that there is a configuration problem with Outlook Mobile Access and that the IIS metabase might have been corrupted.

  • Outlook Mobile Access logon failure: Mailbox hosted on an Exchange Server version earlier than 2003   An alert is generated when MOM event 22002 occurs. This event signifies that the logon script tried to log on to a mailbox on a server running a version of Exchange earlier than Exchange 2003.

  • Outlook Mobile Access logon failure: Unable to connect   An alert is generated when MOM event 22001 occurs. This event signifies that a connection cannot be established to the back-end Exchange mailbox.

  • Outlook Mobile Access logon failure: Network problem   An alert is generated when MOM event 22005 occurs. This event signifies that network problems are preventing Outlook Mobile Access operations.

  • Synthetic Outlook Mobile Access logon   This timed event rule runs every 15 minutes and launches the Exchange 2003 - OMA logon verification script, which verifies front-end server availability by performing a synthetic Outlook Mobile Access logon. This test requires Exchange Server 2003 SP1.

  • Outlook Mobile Access logon failure: Wireless access is not enabled for the account   An alert is generated when MOM event 22004 occurs. This event signifies that the account is not enabled to use Outlook Mobile Access.

  • General error during synthetic Outlook Mobile Access logon   An alert is generated when MOM events 20907 and 20908 occur. These events signify that the underlying components and services on which the synthetic logon relies are not operational.

  • Outlook Mobile Access logon failure: Unexpected errors   An alert is generated when MOM event 22010 occurs. This event signifies that an unexpected error or exception has occurred when Outlook Mobile Access is processing the logon request.

  • Outlook Mobile Access logon failure: Device type not supported (Web.config file is modified)   An alert is generated when MOM event 22009 occurs. This event signifies that the device is unsupported. A possible cause is that the Web.config file has been modified.

  • Outlook Mobile Access logon failure: Invalid password or mailbox not created   An alert is generated when MOM event 22003 occurs. This event signifies that the entered password is incorrect or that the mailbox has not been created.

Exchange ActiveSync

The rules in the Exchange ActiveSync front-end Availability group verify the availability of Exchange ActiveSync on a front-end Exchange 2003 server. These rules perform synthetic Exchange ActiveSync logons and check the results to determine the availability of Exchange ActiveSync. A notification is sent when the severity level is Error or higher.

This group includes the following event rules:

  • EAS logon failure: Forbidden   This rule generates an alert when the Mailbox Access account is not enabled for Exchange ActiveSync, or when Exchange ActiveSync is disabled. The alert is generated when the logon scripts cannot log on.

  • Synthetic EAS logon   This rule runs a script every 15 minutes to perform synthetic Exchange ActiveSync logons. The other rules in this rule group rely on the script for event data. This test requires Exchange Server 2003 SP1.

  • EAS logon failure: Internal Server Error   This rule generates an alert when the logon scripts cannot log on because of a server error.

  • EAS logon failure: Bad Request   This rule generates an alert when the Exchange ActiveSync logon function does not receive acceptable parameters.

  • EAS logon failure: General Error   The Exchange ActiveSync synthetic logon scripts rely on underlying components to function. This rule generates an alert when a component is unavailable.

  • EAS logon failure: Server Busy   This rule generates an alert when synthetic logon fails because of a busy or overloaded server. This alert is also generated when the Active Directory domain controller cannot return data to the logon scripts because it is overloaded.

Monitoring Health and Performance

Exchange servers expose several indicators of health and performance problems. The Exchange Management Pack includes rules that monitor performance indicators such as mail queues, disk use, CPU load, and other thresholds. These rules can be enabled and disabled according to your requirements. The rules check the following:

  • Free Disk Space Thresholds   Disk space use must be checked to ensure availability and to help plan for upgrades.

  • Mail Queue Thresholds   The queue size and latency thresholds in this rule group help alert you to potential failures in SMTP transfer.

  • Server Configuration and Security   Security and settings can be checked with the rules in this group.

  • Server Performance Thresholds   Overall server performance dealing with CPU use, latencies, and so on can be checked with rules in this group.

  • SMTP Remote Queues Thresholds   Outbound queues, growth, and sizes can be checked with this group.

  • Windows Updates   To apply Windows updates uniformly, you can specify the required updates and have each server checked to verify that they are installed. This helps maintain a consistent, centralized update policy.

Each component and its capabilities are discussed in the following sections.

Free Disk Space Thresholds

The rules in the Free Disk Space Thresholds group provide alerts based on disk space usage. When free space is below the defined threshold, an alert is generated. The rule that runs the script is the Check Free Disk Space rule. The Check Free Disk Space script classifies each local volume in one of the following categories:

  • Volumes with Exchange 2003 transaction log files

  • Volumes with Exchange 2003 SMTP queue directories

  • Volumes with both SMTP queue directories and transaction log files

  • All other volumes, which contain neither transaction log files nor SMTP queue directories

The script generates a warning event or an error event, as appropriate, based on the category of the volume. Each category has four thresholds: an absolute threshold and a percentage threshold for the warning event, and an absolute threshold and a percentage threshold for the error event. If a volume fits more than one category and different thresholds are set for those categories, the most conservative threshold is used. A notification is sent if the severity level is Error or higher. If you customize thresholds, decide what each percentage threshold should be relative to its absolute threshold, because an event is generated only when free space falls below both.
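The threshold logic can be sketched in Python as follows. The threshold values in the example are hypothetical. The sketch interprets "most conservative" as taking the larger value of each threshold, because a higher free-space threshold raises an event sooner. As the event descriptions in this section state, an event fires only when free space is below both the percentage threshold and the absolute threshold.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Thresholds:
    warn_pct: float  # warning if free percentage is below this...
    warn_mb: int     # ...and free megabytes are below this
    error_pct: float
    error_mb: int


def most_conservative(candidates: Iterable[Thresholds]) -> Thresholds:
    """Higher free-space thresholds raise events sooner, so keep the maximum of each."""
    items = list(candidates)
    return Thresholds(
        warn_pct=max(t.warn_pct for t in items),
        warn_mb=max(t.warn_mb for t in items),
        error_pct=max(t.error_pct for t in items),
        error_mb=max(t.error_mb for t in items),
    )


def evaluate(free_mb: int, total_mb: int, t: Thresholds) -> Optional[str]:
    """Return "error", "warning", or None for one volume."""
    free_pct = 100.0 * free_mb / total_mb
    # An event is raised only when BOTH the percentage and absolute values are low.
    if free_pct < t.error_pct and free_mb < t.error_mb:
        return "error"
    if free_pct < t.warn_pct and free_mb < t.warn_mb:
        return "warning"
    return None


# Hypothetical thresholds for two categories that apply to the same volume.
LOG_VOLUME = Thresholds(warn_pct=10, warn_mb=2000, error_pct=5, error_mb=1000)
QUEUE_VOLUME = Thresholds(warn_pct=15, warn_mb=1500, error_pct=8, error_mb=500)
combined = most_conservative([LOG_VOLUME, QUEUE_VOLUME])
print(evaluate(free_mb=1800, total_mb=20000, t=combined))  # prints warning
```

With these combined thresholds, a 20,000-MB volume with 1,800 MB free (9 percent) produces a warning. A 10,000-MB volume with the same 1,800 MB free (18 percent) produces no event, because only the absolute threshold is crossed.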

This group includes the following event rules:

  • Exchange 2003 Transaction log drive is low on disk space   An alert is generated when MOM event 9976 occurs. This event signifies that both the percentage and absolute amount of free disk space are below the current warning thresholds for a volume containing Exchange transaction log files.

  • Exchange 2003 Simple Mail Transfer Protocol (SMTP) Queue and Transaction log drive is low on disk space   An alert is generated when MOM event 9978 occurs. This event signifies that both the percentage and absolute amount of disk free space are below the current warning thresholds for a volume containing both Exchange transaction log files and queues.

  • Exchange 2003 Simple Mail Transfer Protocol (SMTP) Queue drive is low on disk space   An alert is generated when MOM event 9974 occurs. This event signifies that both the percentage and absolute amount of disk free space are below the current warning thresholds for a volume containing SMTP queues.

  • Exchange 2003 Simple Mail Transfer Protocol (SMTP) Queue drive is very low on disk space   An alert is generated when MOM event 9973 occurs. This event signifies that both the percentage and absolute amount of free disk space are below the current error thresholds for a volume containing SMTP queues. This alert is more severe than the corresponding low-disk-space alert and indicates that free space is critically low.

  • Low free disk space   An alert is generated when MOM event 9972 occurs. This event signifies that both the percentage and absolute amount of disk free space are below the current warning thresholds for a local disk. For Exchange servers, this event refers to volumes other than those containing Exchange transaction log files or Exchange queue files.

  • Exchange 2003 Simple Mail Transfer Protocol (SMTP) Queue and Transaction log drive is very low on disk space   An alert is generated when MOM event 9977 occurs. This event signifies that both the percentage and absolute amount of free disk space are below the current Critical Error thresholds for a volume containing both Exchange transaction log files and queues. Resolve this situation immediately, because recovering from running out of space on the transaction log volume is time-consuming.

  • Check free disk space   This timed rule runs the underlying script, which checks the free space (both percentage and absolute amount) on each local disk. By default, it runs every 30 minutes.

  • Very low free disk space   An alert is generated when MOM event 9971 occurs. This event signifies that both the percentage and absolute amount of free disk space are below the current error thresholds for a local disk. For Exchange servers, this event refers to volumes other than those containing Exchange transaction log files or Exchange queue files.

  • Exchange 2003 Transaction log drive is very low on disk space   An alert is generated when MOM event 9975 occurs. This event signifies that both the percentage and absolute amount of free disk space are below the current error thresholds for a volume containing Exchange transaction log files.

Mail Queue Thresholds

The rules in the Mail Queue Thresholds rule group check mail flow. They generate an alert when mail flow is disrupted, and a notification is sent when the severity level is Error or higher. These rules look at the length of all mail queues that are available as performance data. The two major classes of queues are Simple Mail Transfer Protocol (SMTP) queues and message transfer agent (MTA) queues. The relevant queues and their operation are described in the section "Exchange Message Flow" earlier in this chapter.

Depending on the level of mail flow on the computers being checked, you might have to adjust the thresholds to make them more or less sensitive to mail flow interruptions. To help determine appropriate thresholds for a particular deployment, check the lengths of these queues by using the views provided with this management pack.

This group includes the following performance rules:

  • Exchange Information Store service Queue of Messages to MTA > 50   This rule tracks the current number of messages in transit to MSExchangeMTA. It uses the MSExchangeIS Transport Driver performance object.

  • Exchange 2003: SMTP: Local Retry Queue > 50   This rule tracks messages that are waiting to be delivered to the database after a previous delivery attempt failed. It uses the SMTP Server object and its Total Retry Queue Length counter.

  • Exchange 2003: SMTP: Messages Pending Routing > 50   This rule tracks the number of messages that are categorized but are not routed. It uses the SMTP Server object and its Messages Pending Routing counter.

  • Public Folder Replication: PF Receive Queue consistently > 10 deep   This rule tracks the Public Folder Replication Receive queue. It uses the MSExchangeIS Public object and the Receive Queue Size counter. Most of the time, this value should be close to zero. When the queue depth is consistently greater than ten, the public folders are not synchronizing with other servers.

  • Mailbox Store: Receive Queue > 25   This rule tracks the MSExchangeIS Mailbox object and its Receive Queue Size counter. Receive Queue Size is the number of messages in the mailbox store receive queue.

  • Information Store Transport Temp Table Entries > 600   This rule tracks the current number of entries in the Microsoft Exchange Information Store service Temp Table that is used by Exchange Transport. It uses the MSExchangeIS Transport Driver object and the TempTable Current counter.

  • MTA Queue Length per Connection > 50   This rule uses the MSExchangeMTA Connections object and the Queue Length counter. This counter tracks the outstanding messages queued for transfer to the database and the Pending Reroute queue.

  • Exchange 2003: SMTP: Remote Queue > 500   This rule uses the SMTP Server object and the Remote Queue Length counter. It tracks the remote queues, which send messages to other servers. This is a total number for all remote queues.

  • Mailbox Store: Send Queue > 25   This rule uses the MSExchangeIS Mailbox object and the Send Queue Size counter. It tracks the number of messages awaiting transfer from the Microsoft Exchange Information Store service to IIS.

  • Exchange 2003: SMTP: Remote Retry Queue > 500   This rule uses the SMTP Server object and Remote Retry Queue Length counter to track the number of messages in the remote queue that cannot be sent to a destination server.

  • Exchange 2003: SMTP: Messages in SMTP Queue Directory > 500   This rule tracks the number of messages in the SMTP queue directory on the physical disk. It uses the SMTP NTFS Store Driver object and the Messages In Queue Directory counter.

  • MTA Work Queue > 50   This rule tracks the number of messages not yet processed to completion by the MTA. It uses the MSExchangeMTA object and the Work Queue Length counter.

  • Exchange 2003: SMTP: Local Queue > 50   This rule uses the SMTP Server object and Local Queue Length counter to track the queue of messages awaiting delivery to the Microsoft Exchange Information Store service.

  • Information Store Queue of Messages from MTA > 25   This rule tracks the number of messages in transit from the MTA to the Exchange store. It uses the MSExchangeIS Transport Driver object and the Current Message From MSExchangeMTA counter.

  • Exchange 2003: SMTP: Categorizer Queue > 50   This rule tracks the Categorizer queue through the SMTP Server object and Categorizer Queue Length counter. This queue is discussed earlier in the chapter.
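Each queue rule above follows the same pattern: read one performance counter and compare it with a fixed limit. The following Python sketch illustrates that comparison. It assumes the counter values have already been collected into a dictionary keyed by (performance object, counter) pairs; the function name `exceeded_queue_rules` and the sample values are hypothetical, and this is not how MOM evaluates the rules internally. Per-instance counters, such as one Queue Length value per MTA connection, are assumed to be checked one instance at a time.

```python
# Alert limits taken from the queue rules listed above.
# Keys are (performance object, counter) pairs.
QUEUE_THRESHOLDS = {
    ("MSExchangeIS Mailbox", "Receive Queue Size"): 25,
    ("MSExchangeIS Mailbox", "Send Queue Size"): 25,
    ("MSExchangeIS Transport Driver", "TempTable Current"): 600,
    ("MSExchangeIS Transport Driver", "Current Message From MSExchangeMTA"): 25,
    ("MSExchangeMTA Connections", "Queue Length"): 50,
    ("MSExchangeMTA", "Work Queue Length"): 50,
    ("SMTP Server", "Remote Queue Length"): 500,
    ("SMTP Server", "Remote Retry Queue Length"): 500,
    ("SMTP Server", "Local Queue Length"): 50,
    ("SMTP Server", "Categorizer Queue Length"): 50,
    ("SMTP NTFS Store Driver", "Messages In Queue Directory"): 500,
}


def exceeded_queue_rules(samples):
    """Return (object, counter, value, limit) for every sample above its limit.

    `samples` maps (object, counter) to a current counter value (hypothetical
    input format). Counters without a threshold are ignored. A value equal to
    the limit does not trigger an alert, because every rule uses ">".
    """
    alerts = []
    for key, value in samples.items():
        limit = QUEUE_THRESHOLDS.get(key)
        if limit is not None and value > limit:
            alerts.append((key[0], key[1], value, limit))
    return alerts


# Invented sample readings for illustration.
sample = {
    ("SMTP Server", "Local Queue Length"): 73,
    ("SMTP Server", "Remote Queue Length"): 120,
    ("MSExchangeIS Mailbox", "Send Queue Size"): 25,  # equal to the limit: no alert
}
for obj, counter, value, limit in exceeded_queue_rules(sample):
    print(f"{obj}: {counter} = {value} (limit {limit})")
```

With the sample readings, only the SMTP local queue is reported, because 73 exceeds its limit of 50.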

Server Configuration and Security

The rules in the Server Configuration and Security rule group check the Exchange server for configuration and security errors. This group includes rules to verify issues such as circular logging, SMTP anonymous relay, mailboxes on front-end servers, and log file truncation.

Rules in other rule groups also relate to server configuration, such as the rule that checks whether the /3GB switch is enabled on appropriate servers. For the rules in this group, a notification is sent when the severity level is Error or higher.

This group includes the following rules:

  • Verify that the IIS lockdown wizard started   This rule runs the Microsoft Operations Manager\Rules\Advanced\Scripts\Exchange 2003 - Verify IIS Lockdown script, which checks registry keys to determine whether the IIS Lockdown Tool has been started. IIS Lockdown applies only to computers running Microsoft Windows® 2000 Server; on servers running later versions of Windows, the script does not run. The script generates event 8144 when the IIS Lockdown Tool has not been started.

  • SMTP Virtual Server that relays anonymously   An SMTP virtual server can be configured to relay anonymously. When you allow anonymous access to your SMTP virtual server and allow all IP addresses to relay through it, an alert is generated.

  • URLScan ISAPI filter is disabled   When the URLScan Internet Server Application Programming Interface (ISAPI) filter is not running, an alert is generated, together with event ID 8164. This filter is relevant only on Windows 2000. It helps protect the Web server by examining HTTP header information and filtering requests based on the URLScan.ini configuration file.

  • Verify that the URLScan ISAPI filter is installed and running   This rule runs a script to determine if the URLScan ISAPI filter is running.

  • Verify that SMTP Virtual Server cannot anonymously relay (spam prevention)   This rule runs a script that uses Active Directory Service Interfaces (ADSI) and Collaboration Data Objects (CDO) to determine whether each SMTP virtual server allows anonymous relay. The script generates event 8083 for each virtual server that allows anonymous relay.

  • Check for existence of mailboxes on Front-End Servers   This rule runs a script to look for mailboxes on front-end servers. Event 8203 is generated for each front-end server that has a mailbox.

  • Exchange Transaction Log files are equal to or older than the maximum days allowed   The Microsoft Operations Manager\Rules\Rule Groups\Microsoft Exchange 2003 Server\Server Health Monitoring\Server Configuration Monitoring\Verify that the Log Files are being truncated by backup (by age modified) script generates an alert when log files are as old as or older than the maximum number of days configured in the settings.

  • SSL should be required to secure HTTP access to the Exchange server   The rule generates an alert when the server configuration allows sensitive data to be transmitted without SSL. Configure Secure Sockets Layer (SSL) for any back-end HTTP virtual server that accepts anonymous and basic authentication, and always configure SSL for any front-end server.

  • Message Tracking Logs have 'Everyone' group listed in the ACL permission   To prevent unauthorized users from reading the message tracking logs, remove the Everyone group from the access control list (ACL). If the Everyone group has this permission, an alert is generated.

  • Verify Circular Logging setting for each Storage Group   The Microsoft Operations Manager\Rules\Advanced\Scripts\Exchange 2003 - Verify Circular Logging settings are correct for each Storage Group script used by this rule determines whether the circular logging setting is correct for each storage group. The script generates one event per storage group that does not have the circular logging state set correctly.

  • Value of the HeapDeCommitFreeBlockThreshold Registry Key is incorrect   On servers with one gigabyte of physical memory, the HeapDeCommitFreeBlockThreshold value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager should be set to 262144 (0x40000) to help reduce heap fragmentation. This rule generates an alert when the registry value is different.

  • Verify that Message Tracking is enabled   This rule runs a script to determine if Message Tracking is enabled and generates an alert when it is not.

  • SMTP directories are not on an NTFS formatted drive   This rule runs a script to determine whether the Queue, Pickup, and BadMail SMTP directories are on an NTFS file system drive.

  • Message Tracking is not enabled   You must enable Message Tracking to track undelivered messages and troubleshoot mail flow problems. Event 8043 and an alert are generated when message tracking is disabled.

  • IIS Lockdown was not found on a server   On Windows 2000 servers running Exchange, the IIS Lockdown Tool should be run. When it has not been run, an alert is generated.
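Several of the checks above evaluate each object separately and raise one event per failing object; for example, event 8083 is generated for each SMTP virtual server that allows anonymous relay, which happens when anonymous access is allowed and all IP addresses may relay. The following Python sketch shows only that decision logic. It assumes the two settings have already been read (the real script uses ADSI and CDO, which are not reproduced here); the `VirtualServer` type, its field names, and the sample servers are hypothetical.

```python
from dataclasses import dataclass

ANONYMOUS_RELAY_EVENT_ID = 8083  # event ID cited for the anonymous relay check


@dataclass
class VirtualServer:
    # Hypothetical snapshot of the settings the real script reads via ADSI/CDO.
    name: str
    anonymous_access: bool  # anonymous authentication is allowed
    relay_all_ips: bool     # relay restrictions permit all IP addresses


def relay_events(servers):
    """Return one (event_id, server_name) pair per open-relay virtual server."""
    return [
        (ANONYMOUS_RELAY_EVENT_ID, s.name)
        for s in servers
        if s.anonymous_access and s.relay_all_ips
    ]


servers = [
    VirtualServer("Default SMTP Virtual Server", True, False),
    VirtualServer("Internet Relay", True, True),
]
print(relay_events(servers))  # [(8083, 'Internet Relay')]
```

Emitting one event per offending object, rather than a single summary event, lets each alert identify exactly which virtual server needs to be reconfigured.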

Server Performance Thresholds

The Server Performance Thresholds rule group checks performance counters that can indicate poor performance. These counters include RPC requests, disk reads and writes, and CPU use. Notification is sent if the severity level is Error or higher.

The following performance rules are included for both Exchange 2000 Server and Exchange Server 2003 unless otherwise indicated:

  • MSExchangeIS: RPC latency > 200 ms   This rule checks the latency of RPC requests every minute. If the average latency over five minutes exceeds 200 milliseconds (ms), an alert is generated.

  • MSExchangeIS: RPC Requests > 25   This rule tracks the number of RPC requests that the Microsoft Exchange Information Store service is servicing at a given time. Up to 100 RPC requests can be handled simultaneously. When the server is functioning normally, however, the value is typically low, fewer than ten.

  • Disk Write Latencies > 50 ms   When disk write latencies are above 50 milliseconds, an alert is generated.

  • ESE Log Generation Checkpoint Depth > 800   The startup time of the Microsoft Exchange Information Store service depends on the log generation checkpoint depth. When this value exceeds 1000, all databases in the affected storage group are dismounted. When the value rises above the 800 threshold, an alert is generated.

  • Average CPU > 90% for 15 minutes   When the CPU is idle less than ten percent of the time over a 15-minute period, an alert is generated. Sustained CPU execution of non-idle threads can indicate a hung thread or an overall increase in server load.

  • Information Store Private Bytes > 1 GB   Private bytes are bytes that a process has allocated and that cannot be shared with other processes. When the Microsoft Exchange Information Store service runs under heavy load for prolonged periods, its private bytes can exceed the threshold, which generates an alert.

  • Information Store Virtual Bytes > 2.9 GB   Virtual Bytes is the current size in bytes of the virtual address space the process is using. Use of virtual address space does not necessarily imply corresponding use of either disk or main memory pages. Virtual space is finite, and by using too much, the process can limit its ability to load libraries. When the virtual bytes are greater than the 2.9-gigabyte threshold, an alert is generated.

  • DSAccess:LDAP Search Time > 50 ms avg. over 5 minutes   If the DSAccess search time, averaged over five minutes, is above 50 milliseconds, an alert is generated. This counter measures only the search time for LDAP queries that originate from DSAccess. Long search times for these queries do not necessarily mean that Active Directory itself is experiencing excessive latency. High values for this counter warrant attention mainly when other problems are detected on the server, such as growing SMTP queues.

  • Disk Read Latencies > 50 ms   When disk read latencies are above 50 milliseconds, an alert is generated.

  • Outlook Mobile Access: Last response time > 60 sec   When the Outlook Mobile Access server response time value is greater than 60 seconds, an alert is generated.

  • Pool Nonpaged Bytes > 90 MB   This rule is available only for Exchange Server 2003. When the performance counter for Memory-Pool Nonpaged Bytes exceeds 90 MB, an alert is generated.
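Some of these rules compare an average over a time window, rather than a single sample, with the threshold: RPC latency is checked every minute and averaged over five minutes, and average CPU use must exceed 90 percent for 15 minutes. The following Python sketch shows one way to evaluate such a rule over a series of equally spaced one-minute samples. The function name `windowed_alerts` and the sample data are hypothetical, and MOM's own sampling and averaging may differ.

```python
from collections import deque


def windowed_alerts(samples, window, threshold):
    """Return the indexes at which the trailing average of the last `window`
    samples exceeds `threshold`.

    `samples` is assumed to be a sequence of equally spaced readings
    (for example, one per minute). No alert is possible until a full
    window of samples has been collected.
    """
    recent = deque(maxlen=window)
    alerts = []
    for minute, value in enumerate(samples):
        recent.append(value)  # the deque discards the oldest sample automatically
        if len(recent) == window and sum(recent) / window > threshold:
            alerts.append(minute)
    return alerts


# Invented RPC latency samples in ms, one per minute; rule: 5-minute average > 200 ms.
rpc_latency_ms = [120, 180, 260, 240, 230, 250, 90]
print(windowed_alerts(rpc_latency_ms, window=5, threshold=200))  # [4, 5, 6]
```

Averaging over a window keeps a single slow minute from raising an alert. For the CPU rule, the same function would be called with `window=15` and `threshold=90` on per-minute CPU utilization samples.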

SMTP Remote Queues Thresholds

The SMTP Remote Queues Thresholds rule group checks the state and health of the Exchange SMTP remote queues. An alert is generated if a significant amount of mail is queuing for one specific destination. The number of queued messages that triggers an alert is set by the NumberOfMessages parameter of the script run by the Verify Remote SMTP Queues timed event. A notification is sent if the severity level is Error or higher.

This rule group includes the following event rules:

  • Alert for problems in remote Simple Mail Transfer Protocol (SMTP) queues   When the number of messages in a remote queue exceeds the NumberOfMessages value (200 by default), an alert is generated. To modify this value, open the Properties dialog box for the rule, click the Responses tab, click the script, click Edit, and then click Edit Parameter.

  • Verify remote Simple Mail Transfer Protocol (SMTP) queues   This rule runs a script every hour to determine the state of the remote SMTP queues. The script generates an event when the number of messages in a remote queue exceeds the value of the NumberOfMessages parameter. By default, the NumberOfMessages value is 200.
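The remote-queue check reduces to comparing each remote queue's message count with the NumberOfMessages value. The following Python sketch assumes the queue names and message counts have already been retrieved; the actual script's data source is not reproduced here, and the function name `congested_remote_queues` and the sample queues are hypothetical.

```python
DEFAULT_NUMBER_OF_MESSAGES = 200  # default value of the NumberOfMessages parameter


def congested_remote_queues(queue_counts, number_of_messages=DEFAULT_NUMBER_OF_MESSAGES):
    """Return (queue, count) for each remote queue whose message count exceeds
    number_of_messages, sorted from the largest backlog to the smallest."""
    over = [(name, count) for name, count in queue_counts.items()
            if count > number_of_messages]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Invented remote queues, keyed by destination domain.
queues = {"example.com": 35, "example.net": 412, "example.org": 201}
print(congested_remote_queues(queues))  # [('example.net', 412), ('example.org', 201)]
```

Because the comparison is made per queue rather than on the total, a single destination that stops accepting mail is detected even when the overall Remote Queue Length counter is still below its own limit.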

Windows Updates

The rules in the Verify Windows Hotfixes rule group verify whether all specified Windows updates are installed on servers running Exchange 2003. If a specified hotfix is not installed, an alert is generated. A notification is sent if the severity level is Error or higher.

This rule group includes the following event rules:

  • Verify required Windows hotfixes   This rule runs a script every day to check for updates. To specify the updates for which the script searches, open the Properties dialog box for this rule, click the Responses tab, select the script, and then click Edit. Click HotfixIDs, and then click Edit Parameter. In the Value box, type a comma-delimited list of all update IDs that you require on your Exchange servers. The script generates event 9017, listing all required updates that are not installed.

  • The required Windows hotfix is not installed   When a required update is not installed, this rule generates an alert.
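The HotfixIDs parameter holds a comma-delimited list, and event 9017 lists every required update that is missing. The following Python sketch shows that comparison. It assumes the set of installed update IDs has already been collected from the server; the function name `missing_hotfixes` is hypothetical, the update IDs are placeholders rather than real required updates, and the case-insensitive matching is a design assumption of the sketch.

```python
def missing_hotfixes(hotfix_ids_parameter, installed):
    """Parse a comma-delimited HotfixIDs value and return the IDs that are
    not in `installed`, preserving the order in which they were listed.

    Whitespace around IDs and empty entries are ignored; matching is
    case-insensitive (an assumption of this sketch).
    """
    required = [h.strip() for h in hotfix_ids_parameter.split(",") if h.strip()]
    installed_upper = {h.upper() for h in installed}
    return [h for h in required if h.upper() not in installed_upper]


# Placeholder IDs for illustration only.
param = "KB111111, KB222222,KB333333"
print(missing_hotfixes(param, {"kb222222"}))  # ['KB111111', 'KB333333']
```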

Monitoring Exchange Events

The Exchange Management Pack, combined with Microsoft Operations Manager (MOM), provides sophisticated filtering and viewing tools to help you monitor events related to your Exchange organization. MOM includes two default event views that you can open from the Microsoft Operations Manager 2005 - Operator Console:

  • Events   Events collected from monitored servers are listed here. They include informational events as well as warnings and errors. To view Exchange-specific information, you can sort by Source or Event ID. This view is a quick way to discover events that occur across many servers in the organization. For example, suppose you notice mail flow errors on a server. Sorting and further inspection reveal that the incident is isolated to a specific geographical site and does not affect performance for other users. Because this view gathers data from all monitored servers in one central location, you can immediately focus your resources on correcting the problem.

  • Task Status   MOM events related to scheduled tasks are listed in this view. Of these, mail flow events are among the most important to monitor. This view lists general information, warnings, and errors. You can filter events in this view based on criteria such as matching words, category, or severity.
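The Events and Task Status views let you sort by Source or Event ID and filter on criteria such as matching words or severity. To illustrate that kind of narrowing, the following Python sketch filters and sorts a hypothetical list of exported event records. The `Event` fields, the severity ranking, and the sample records (including their event IDs) are invented for illustration; the Operator Console does not expose this Python interface.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical exported event record.
    server: str
    source: str
    event_id: int
    severity: str  # "Information", "Warning", or "Error"
    description: str


SEVERITY_RANK = {"Information": 0, "Warning": 1, "Error": 2}


def filter_events(events, min_severity="Warning", word=None):
    """Keep events at or above min_severity whose description contains
    `word` (case-insensitive, if given), sorted by Source, then Event ID."""
    floor = SEVERITY_RANK[min_severity]
    kept = [
        e for e in events
        if SEVERITY_RANK.get(e.severity, 0) >= floor
        and (word is None or word.lower() in e.description.lower())
    ]
    return sorted(kept, key=lambda e: (e.source, e.event_id))


# Invented sample records.
events = [
    Event("EX01", "MSExchangeTransport", 200, "Error", "Mail flow delayed"),
    Event("EX02", "MSExchangeIS", 100, "Information", "Store started"),
    Event("EX01", "MSExchangeIS", 150, "Warning", "Mail flow slow"),
]
for e in filter_events(events, word="mail flow"):
    print(e.server, e.source, e.event_id, e.severity)
```

Sorting on a (Source, Event ID) key groups related events together, which is what makes a pattern such as errors confined to one site easy to spot.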

System Monitoring Best Practices

The Exchange Management Pack gives you significant flexibility in choosing which messaging functions to monitor. At a minimum, you should monitor the items listed in Table 4.2.

Table 4.2   Minimum messaging functions to monitor

Test

Details

Server availability

  • Server heartbeat.

  • Required services are running.

  • Databases are mounted.

  • MAPI logon check verification is running without errors.

  • Mail flow verification is running without errors.

  • No unexpected service termination.

  • Front-end server monitoring test is running without errors.

Services running

  • Verify that all required services are running on each server. Note that you can configure the list of monitored services for each server.

  • Generate an alert when a service is not running.

Databases mounted

  • Verify that all databases are mounted.

  • Generate an alert if any database becomes dismounted.

MAPI Logon check

  • Verify that the Server Availability Report shows no errors. This test verifies that each store can be accessed by a MAPI client, and implicitly verifies both Exchange and Active Directory functionality.

Log on to the mailbox of a test account

  • Verify client-to-server connectivity, including verification that Exchange is running, the database is mounted, and Active Directory is functioning correctly.

  • Use this data to compile server availability statistics.

Front-end Server Monitoring

After you edit the registry to enable front-end server monitoring, the following tests are performed:

  • Verify that services are running on the front-end server.

  • Verify that Internet clients can connect, including Outlook Web Access, Outlook Mobile Access, and Exchange ActiveSync (for computers running Exchange Server 2003).

  • Verify connectivity on localhost; localhost monitoring occurs by default.

  • Verify that the public URL is resolvable and successfully connects to your front-end servers.

  • Verify that connectivity through your firewall and/or proxy server is functioning.

  • Verify that load balancing is occurring.

Mail flow verification

  • Verify mail flow between selected servers by sending periodic e-mails to test mailboxes on each server.

  • Generate an alert for successive failures.

  • Record mail delivery latency.

Server Health Monitoring

Scripts and rules are configured by default to monitor key health indicators. These indicators include:

  • Free Disk Space

  • Mail Queue Thresholds

  • Configuration and Security

  • Performance Thresholds

  • SMTP Queues

Free disk space

Running out of disk space is a common, preventable source of Exchange failures. This test checks free space against thresholds that you specify for the following disks:

  • All disks

  • Log disks

  • SMTP queue disks

The free disk space test is cluster-aware and IFS-aware, and it uses WMI to collect information rather than performance data.

Mail Queues

  • Verify that all mail queues (SMTP, MTA, internal mail delivery queues) are processing messages according to your thresholds.

  • Verify that mail is flowing properly.

  • Identify queue length problems that may lead to slow e-mail delivery, as well as issues in your infrastructure that require attention.

  • These checks are based on performance data and Exchange WMI classes.

Server Configuration and Security Monitoring

  • Verify that the IIS Lockdown Tool started.

  • Verify that Message Tracking Log shares are locked down.

  • Verify that the URLScan ISAPI filter is installed and running.

  • Verify that SMTP Virtual Server cannot anonymously relay (spam prevention).

  • Check for the existence of mailboxes on Front-End Servers.

  • Determine if SSL should be required.

  • Verify that the log files are being truncated after backup.

  • Verify that the SMTP directories are on an NTFS-formatted drive.

  • Verify that circular logging is disabled for each Storage Group.

  • Verify that the HeapDeCommitFreeBlockThreshold registry value is correct.

  • Verify that Message Tracking is enabled.

Server performance

  • Generate an alert if thresholds for disk response are exceeded, indicating a slow disk.

  • Generate an alert if the RPC requests queue length exceeds expected thresholds. A consistently high value can indicate a resource bottleneck.

  • Monitor the average RPC latency of all RPC requests submitted to the server.

  • Monitor the Outlook Mobile Access response time.

Server performance issues quickly become user response time issues. You can resolve these problems promptly if you monitor the correct objects and act on the issues that MOM brings to your attention.

Database checkpoint depth and memory usage

An alert is generated by default if any of the following counters exceed the identified threshold:

  • Disk Read Latencies: 50 ms

  • Disk Write Latencies: 50 ms

  • ESE Log Checkpoint Depth: 800

  • Information Store Private Bytes: 1 GB

  • Information Store Virtual Bytes: 2.9 GB

  • MSExchangeIS: RPC Requests: 25

  • MSExchangeIS: RPC latency: 200 ms

  • Outlook Mobile Access: Last response time: 60 sec
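The mail flow verification row of Table 4.2 asks for an alert on successive failures and a record of mail delivery latency. The following Python sketch tracks both for a single pair of servers exchanging periodic test messages. The class name `MailFlowMonitor`, the `failures_before_alert` value of 3, and the probe results are hypothetical; the table does not state how many consecutive failures should trigger an alert.

```python
class MailFlowMonitor:
    """Track periodic test-message results for one source/destination pair."""

    def __init__(self, failures_before_alert=3):  # assumed value, not a documented default
        self.failures_before_alert = failures_before_alert
        self.consecutive_failures = 0
        self.latencies = []  # delivery latency in seconds for each successful test

    def record(self, delivered, latency_seconds=None):
        """Record one test message; return True when an alert should be raised."""
        if delivered:
            self.consecutive_failures = 0  # any success resets the failure streak
            self.latencies.append(latency_seconds)
            return False
        self.consecutive_failures += 1
        # Alert once, on the test that reaches the threshold, not on every later failure.
        return self.consecutive_failures == self.failures_before_alert

    def average_latency(self):
        """Mean delivery latency of successful tests, or None if there were none."""
        return sum(self.latencies) / len(self.latencies) if self.latencies else None


monitor = MailFlowMonitor()
results = [(True, 12.0), (False, None), (False, None), (False, None), (True, 30.0)]
alerts = [monitor.record(ok, latency) for ok, latency in results]
print(alerts)                     # [False, False, False, True, False]
print(monitor.average_latency())  # 21.0
```

Alerting on successive failures rather than on a single lost test message avoids raising alerts for transient delays, while the recorded latencies provide the delivery statistics the table calls for.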