Monitoring Health and Performance

 

Exchange servers expose indicators of health and performance problems. The Exchange Management Pack includes rules that monitor performance indicators such as mail queues, disk use, CPU load, and other thresholds. You can enable and disable these rules according to your requirements. The rules check the following:

  • Free Disk Space Thresholds   Disk space use must be checked to ensure availability and to help plan for upgrades.

  • Mail Queue Thresholds   The queue-size and latency thresholds in this rule group help alert you to potential failures in SMTP transfer.

  • Server Configuration and Security   The rules in this group check server settings and security configuration.

  • Server Performance Thresholds   The rules in this group check overall server performance, including CPU use and latencies.

  • SMTP Remote Queues Thresholds   The rules in this group check the size and growth of outbound queues.

  • Windows Updates   To apply Windows updates uniformly, you can specify the required updates and have each server checked to verify that they are installed. This helps maintain a consistent, centralized update policy.

Each component and its capabilities are discussed in the following sections.

Free Disk Space Thresholds

The rules in the Free Disk Space Thresholds group generate alerts based on disk space usage. When free space falls below the defined threshold, an alert is generated. The Check Free Disk Space rule runs a script that classifies each local volume into one of the following categories:

  • Volumes that contain Exchange 2003 transaction log files

  • Volumes that contain Exchange 2003 SMTP queue directories

  • Volumes that contain both SMTP queue directories and transaction log files

  • All other volumes, which contain neither transaction log files nor SMTP queue directories

Depending on a volume's category, the script generates a Warning event or an Error event when appropriate. For each category, the script uses four thresholds: an absolute threshold and a percentage threshold for the Warning event, and an absolute threshold and a percentage threshold for the Error event. A notification is sent if the severity level is Error or higher. If a volume fits more than one category and different thresholds are set for the various types, the most conservative threshold is used. Because an event is generated only when free space is below both the absolute threshold and the percentage threshold, if you customize thresholds you must decide what the percentage threshold should be relative to the absolute threshold.

This group includes the following event rules:

  • Exchange 2003 Transaction log drive is low on disk space   An alert is generated when MOM event 9976 occurs. This event signifies that both the percentage and absolute amount of free disk space are below the current warning thresholds for a volume containing Exchange transaction log files.

  • Exchange 2003 Simple Mail Transfer Protocol (SMTP) Queue and Transaction log drive is low on disk space   An alert is generated when MOM event 9978 occurs. This event signifies that both the percentage and absolute amount of disk free space are below the current warning thresholds for a volume containing both Exchange transaction log files and queues.

  • Exchange 2003 Simple Mail Transfer Protocol (SMTP) Queue drive is low on disk space   An alert is generated when MOM event 9974 occurs. This event signifies that both the percentage and absolute amount of disk free space are below the current warning thresholds for a volume containing SMTP queues.

  • Exchange 2003 Simple Mail Transfer Protocol (SMTP) Queue drive is very low on disk space   An alert is generated when MOM event 9973 occurs. This event signifies that both the percentage and absolute amount of disk free space are below the current error thresholds for a volume containing SMTP queues. This is a more severe alert that notifies you when free space is critically below the threshold.

  • Low free disk space   An alert is generated when MOM event 9972 occurs. This event signifies that both the percentage and absolute amount of disk free space are below the current warning thresholds for a local disk. For Exchange servers, this event refers to volumes other than those containing Exchange transaction log files or Exchange queue files.

  • Exchange 2003 Simple Mail Transfer Protocol (SMTP) Queue and Transaction log drive is very low on disk space   An alert is generated when MOM event 9977 occurs. This event signifies that both the percentage and absolute amount of disk free space are below the current Critical Error thresholds for a volume containing both Exchange transaction log files and queues. This situation should be resolved immediately, because it is time-consuming to recover from running out of space on the transaction log volume.

  • Check free disk space   This rule runs the underlying script that checks the free space of each local disk, both as a percentage and as an absolute amount. By default, it runs every 30 minutes.

  • Very low free disk space   An alert is generated when MOM event 9971 occurs. This event signifies that both the percentage and absolute amount of disk free space are below the current error thresholds for a local disk. For Exchange servers, this event refers to volumes other than those containing Exchange transaction log files or Exchange queue files.

  • Exchange 2003 Transaction log drive is very low on disk space   An alert is generated when MOM event 9975 occurs. This event signifies that both the percentage and absolute amount of disk free space are below the current error thresholds for a volume containing Exchange transaction log files.
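The threshold logic described in this section can be modeled in a short Python sketch. It is a hypothetical illustration, not the management pack's actual script: it assumes free and total space are already known in megabytes, treats "below" as strictly less than, and places a volume that holds both log files and queues in the combined category. The names `Threshold`, `CategoryThresholds`, and `check_volume` are invented for the example; only the event IDs come from the list above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (warning event, error event) per volume category, using the event IDs listed above.
EVENT_IDS: Dict[str, Tuple[int, int]] = {
    "log": (9976, 9975),            # transaction log files only
    "queue": (9974, 9973),          # SMTP queue directories only
    "log_and_queue": (9978, 9977),  # both logs and queues
    "other": (9972, 9971),          # neither
}


@dataclass(frozen=True)
class Threshold:
    min_free_mb: float       # absolute threshold (MB is an assumed unit)
    min_free_percent: float  # percentage threshold


@dataclass(frozen=True)
class CategoryThresholds:
    warning: Threshold
    error: Threshold


def categorize(has_logs: bool, has_queues: bool) -> str:
    """Map a volume's contents to one of the four categories."""
    if has_logs and has_queues:
        return "log_and_queue"
    if has_logs:
        return "log"
    if has_queues:
        return "queue"
    return "other"


def is_below(limit: Threshold, free_mb: float, total_mb: float) -> bool:
    """True only if BOTH the absolute and the percentage free space are below the limit."""
    percent_free = 100.0 * free_mb / total_mb
    return free_mb < limit.min_free_mb and percent_free < limit.min_free_percent


def check_volume(free_mb: float, total_mb: float, has_logs: bool, has_queues: bool,
                 thresholds: Dict[str, CategoryThresholds]) -> Optional[int]:
    """Return the event ID to raise for a volume, or None if no event is needed."""
    category = categorize(has_logs, has_queues)
    warning_id, error_id = EVENT_IDS[category]
    limits = thresholds[category]
    if is_below(limits.error, free_mb, total_mb):  # check the more severe level first
        return error_id
    if is_below(limits.warning, free_mb, total_mb):
        return warning_id
    return None
```

With made-up thresholds of 1,000 MB / 10 percent for warnings and 500 MB / 5 percent for errors, a 20,000-MB log volume with 400 MB free falls below both error limits and maps to event 9975, while a 5,000-MB volume with 800 MB free (16 percent) raises nothing, because only the absolute limit is crossed.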

Mail Queue Thresholds

The rules in the Mail Queue Thresholds rule group check mail flow. They generate an alert when mail flow is disrupted, and a notification is sent when the severity level is Error or higher. These rules monitor the length of every mail queue that is available as performance data. The two major classes of queues are Simple Mail Transfer Protocol (SMTP) and message transfer agent (MTA).

Depending on the volume of mail flow on the monitored servers, you might have to adjust the thresholds to make them more or less sensitive to mail flow interruptions. To determine the appropriate thresholds for a particular deployment, check the lengths of these queues by using the views provided with this management pack.

This group includes the following performance rules:

  • Exchange Information Store service Queue of Messages to MTA > 50   This rule tracks the current number of messages in transit to MSExchangeMTA. It uses the MSExchangeIS Transport Driver performance object.

  • Exchange 2003: SMTP: Local Retry Queue > 50   This rule tracks messages that are waiting to be delivered to the database after a previous delivery attempt failed. It uses the SMTP Server object and its Total Retry Queue Length counter.

  • Exchange 2003: SMTP: Messages Pending Routing > 50   This rule tracks the number of messages that have been categorized but not yet routed. It uses the SMTP Server object and its Messages Pending Routing counter.

  • Public Folder Replication: PF Receive Queue consistently > 10 deep   This rule tracks the Public Folder Replication Receive queue. It uses the MSExchangeIS Public object and the Receive Queue Size counter. Most of the time, this value should be close to zero. When the queue depth is consistently greater than ten, the public folders are not synchronizing with other servers.

  • Mailbox Store: Receive Queue > 25   This rule tracks the MSExchangeIS Mailbox object and its Receive Queue Size counter. Receive Queue Size is the number of messages in the mailbox store receive queue.

  • Information Store Transport Temp Table Entries > 600   This rule tracks the current number of entries in the Microsoft Exchange Information Store service Temp Table that is used by Exchange Transport. It uses the MSExchangeIS Transport Driver object and the TempTable Current counter.

  • MTA Queue Length per Connection > 50   This rule uses the MSExchangeMTA Connections object and the Queue Length counter. This counter tracks the outstanding messages queued for transfer to the database and the Pending Reroute queue.

  • Exchange 2003: SMTP: Remote Queue > 500   This rule uses the SMTP Server object and the Remote Queue Length counter. It tracks the remote queues, which hold messages to be sent to other servers. The counter reports the total for all remote queues.

  • Mailbox Store: Send Queue > 25   This rule uses the MSExchangeIS Mailbox object and the Send Queue Size counter. It tracks the number of messages awaiting transfer from the Microsoft Exchange Information Store service to IIS.

  • Exchange 2003: SMTP: Remote Retry Queue > 500   This rule uses the SMTP Server object and Remote Retry Queue Length counter to track the number of messages in the remote queue that cannot be sent to a destination server.

  • Exchange 2003: SMTP: Messages in SMTP Queue Directory > 500   This rule tracks the number of messages in the queue directory on the physical disk. It uses the SMTP NTFS Store Driver object and the Messages In Queue Directory counter.

  • MTA Work Queue > 50   This rule tracks the number of messages not yet processed to completion by the MTA. It uses the MSExchangeMTA object and the Work Queue Length counter.

  • Exchange 2003: SMTP: Local Queue > 50   This rule uses the SMTP Server object and Local Queue Length counter to track the queue of messages awaiting delivery to the Microsoft Exchange Information Store service.

  • Information Store Queue of Messages from MTA > 25   This rule tracks the number of messages in transit from the MTA to the Exchange store. It uses the MSExchangeIS Transport Driver object and the Current Message From MSExchangeMTA counter.

  • Exchange 2003: SMTP: Categorizer Queue > 50   This rule tracks the Categorizer queue through the SMTP Server object and Categorizer Queue Length counter. This queue is discussed in "Mail Flow Thresholds" earlier in this topic.
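When tuning these thresholds against the queue lengths you observe, it can help to see the rules as a single table of counters and limits. The following Python sketch is a hypothetical model, not how MOM evaluates rules: it includes only the rules above whose counter is named, omits counter instance names, and assumes that "consistently" (used by the public folder rule) means every sample in the supplied window exceeds the limit. The sampling window and the `breached_rules` function are assumptions.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (performance object, counter)

# (threshold, must-be-sustained flag) for each rule above whose counter is named.
QUEUE_RULES: Dict[Key, Tuple[float, bool]] = {
    ("SMTP Server", "Total Retry Queue Length"): (50, False),
    ("SMTP Server", "Messages Pending Routing"): (50, False),
    ("MSExchangeIS Public", "Receive Queue Size"): (10, True),  # "consistently > 10"
    ("MSExchangeIS Mailbox", "Receive Queue Size"): (25, False),
    ("MSExchangeIS Transport Driver", "TempTable Current"): (600, False),
    ("MSExchangeMTA Connections", "Queue Length"): (50, False),
    ("SMTP Server", "Remote Queue Length"): (500, False),
    ("MSExchangeIS Mailbox", "Send Queue Size"): (25, False),
    ("SMTP Server", "Remote Retry Queue Length"): (500, False),
    ("SMTP NTFS Store Driver", "Messages In Queue Directory"): (500, False),
    ("MSExchangeMTA", "Work Queue Length"): (50, False),
    ("SMTP Server", "Local Queue Length"): (50, False),
    ("MSExchangeIS Transport Driver", "Current Message From MSExchangeMTA"): (25, False),
    ("SMTP Server", "Categorizer Queue Length"): (50, False),
}


def breached_rules(samples: Dict[Key, List[float]]) -> List[Key]:
    """Return the counters whose threshold is exceeded.

    `samples` maps each counter to its recent values, oldest first.
    """
    breached = []
    for key, (threshold, sustained) in QUEUE_RULES.items():
        values = samples.get(key)
        if not values:
            continue  # counter not collected on this server
        if sustained:
            exceeded = all(v > threshold for v in values)  # every sample over the limit
        else:
            exceeded = values[-1] > threshold  # latest sample over the limit
        if exceeded:
            breached.append(key)
    return breached
```

Keeping the limits in one table makes it easy to raise or lower them per deployment without touching the evaluation logic.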

Large Mailboxes

To collect information for a report, you can either parse the data directly from the logs or query the MOM database. Note that the MOM database collects data for only the top 100 users: by default, a script parameter in the Exchange Management Pack limits collection to the top 100 Message Tracking Log entries. You can modify the script to collect more entries, or all of them, but this is not recommended. Because parsing the Message Tracking logs, reading from the MOM database, and modifying the script can all affect the performance of your Exchange server, test your solution in a lab before you deploy it on your production servers.
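If you parse the logs yourself, the "top 100 users" idea amounts to totaling per user and keeping the largest entries. The following Python sketch assumes the Message Tracking log records have already been parsed into hypothetical (user, message size in bytes) pairs; the log's field layout is not covered here, and `top_n` is an illustrative name, not the management pack's script parameter.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_users(records: Iterable[Tuple[str, int]], top_n: int = 100) -> List[Tuple[str, int]]:
    """Total the message bytes per user and return the top_n users, largest first."""
    totals: Counter = Counter()
    for user, size_bytes in records:
        totals[user] += size_bytes
    return totals.most_common(top_n)
```

Because the totals are held in memory until the end, running this over large log sets has the same performance caveat as the text describes, so try it against a copy of the logs first.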

Server Configuration and Security

The rules in the Server Configuration and Security rule group check the Exchange server for configuration and security errors. This group includes rules that check for issues such as circular logging, SMTP anonymous relay, mailboxes on front-end servers, and log file truncation.

Some rules in other rule groups also relate to server configuration, such as the rule that checks whether the /3GB switch is enabled on appropriate servers. For this rule group, a notification is sent when the severity level is Error or higher.

This group includes the following rules:

  • Verify that the IIS lockdown wizard started   This rule runs the Microsoft Operations Manager\Rules\Advanced\Scripts\Exchange 2003 - Verify IIS Lockdown script, which checks registry keys to determine whether the IIS Lockdown Tool has been run. IIS Lockdown applies only to computers running Microsoft Windows® 2000 Server; on later versions of Windows, the script does not run. The script generates event 8144 when the IIS Lockdown Tool has not been run.

  • SMTP Virtual Server that relays anonymously   An SMTP virtual server can be configured to relay mail anonymously. If you allow anonymous access to an SMTP virtual server and allow all IP addresses to relay through it, an alert is generated.

  • URLScan ISAPI filter is disabled   When the URLScan Internet Server Application Programming Interface (ISAPI) filter is not running, an alert is generated, together with event ID 8164. This filter applies only to Windows 2000. It helps protect the Web server from being compromised by examining HTTP header information and filtering requests based on the URLScan.ini configuration file.

  • Verify that the URLScan ISAPI filter is installed and running   This rule runs a script to determine if the URLScan ISAPI filter is running.

  • Verify that SMTP Virtual Server cannot anonymously relay (spam prevention)   This rule runs a script that uses Active Directory Service Interfaces (ADSI) and Collaboration Data Objects (CDO) to determine whether each SMTP virtual server allows anonymous relay. The script generates event 8083 for each virtual server that allows anonymous relay.

  • Check for existence of mailboxes on Front-End Servers   This rule runs a script to look for mailboxes on front-end servers. Event 8203 is generated for each front-end server that has a mailbox.

  • Exchange Transaction Log files are equal to or older than the maximum days allowed   The Microsoft Operations Manager\Rules\Rule Groups\Microsoft Exchange 2003 Server\Server Health Monitoring\Server Configuration Monitoring\Verify that the Log Files are being truncated by backup (by age modified) script generates an alert when log files are as old as or older than the maximum number of days configured in the settings.

  • SSL should be required to secure HTTP access to the Exchange server   This rule generates an alert when the server configuration allows sensitive data to be transmitted without SSL. Configure Secure Sockets Layer (SSL) for any back-end HTTP virtual server that accepts anonymous and basic authentication, and always configure SSL for front-end servers.

  • Message Tracking Logs have 'Everyone' group listed in the ACL permission   To prevent unauthorized users from reading the Message Tracking Log, remove the Everyone group from its access control list (ACL). If the Everyone group has this permission, an alert is generated.

  • Verify Circular Logging setting for each Storage Group   The Microsoft Operations Manager\Rules\Advanced\Scripts\Exchange 2003 - Verify Circular Logging settings are correct for each Storage Group script used by this rule determines whether the circular logging setting is correct for each storage group. The script generates one event per storage group that does not have the circular logging state set correctly.

  • Value of the HeapDeCommitFreeBlockThreshold Registry Key is incorrect   On servers with one gigabyte of physical memory, the HeapDeCommitFreeBlockThreshold value under the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager registry key should be set to 262144 (0x40000) to help reduce heap fragmentation. This rule generates an alert when the registry value is different.

  • Verify that Message Tracking is enabled   This rule runs a script to determine if Message Tracking is enabled and generates an alert when it is not.

  • SMTP directories are not on an NTFS formatted drive   This rule runs a script to determine whether any of the Queue, Pickup, and BadMail SMTP directories are on a drive that is not formatted with the NTFS file system.

  • Message Tracking is not enabled   You must enable Message Tracking to track undelivered messages and troubleshoot mail flow problems. Event 8043 and an alert are generated when message tracking is disabled.

  • IIS Lockdown was not found on a server   The IIS Lockdown Tool should be run on Exchange servers that are running Windows 2000. If it has not been run, an alert is generated.
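The transaction log age rule above can be approximated with a short Python sketch. Its script name says it checks whether log files are being truncated by backup "by age modified", so .log files that are still present after the configured number of days suggest truncation is not happening. This is a hypothetical model: the log directory, `max_days`, and the function names are assumptions, and age is measured from each file's last-modified time.

```python
import os
import time
from typing import Iterable, List, Optional, Tuple

SECONDS_PER_DAY = 86400


def stale_entries(entries: Iterable[Tuple[str, float]], max_days: float,
                  now: float) -> List[str]:
    """Return names of .log files modified max_days or more before `now`."""
    cutoff = now - max_days * SECONDS_PER_DAY
    # "Equal to or older than" the limit, so the comparison is <=.
    return sorted(name for name, mtime in entries
                  if name.lower().endswith(".log") and mtime <= cutoff)


def stale_log_files(log_dir: str, max_days: float,
                    now: Optional[float] = None) -> List[str]:
    """Apply stale_entries to the regular files in a transaction log directory."""
    now = time.time() if now is None else now
    with os.scandir(log_dir) as it:
        entries = [(e.name, e.stat().st_mtime) for e in it if e.is_file()]
    return stale_entries(entries, max_days, now)
```

Separating the pure age comparison from the directory scan keeps the threshold logic easy to verify without touching a real log volume.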

Server Performance Thresholds

The Server Performance Thresholds rule group checks performance counters that can indicate poor performance, such as RPC requests, disk reads and writes, and CPU use. A notification is sent if the severity level is Error or higher.

The following performance rules are included for both Exchange 2000 Server and Exchange Server 2003 unless otherwise indicated:

  • MSExchangeIS: RPC latency > 200 ms   This rule checks the latency of RPC requests every minute. If the average latency over five minutes exceeds 200 milliseconds (ms), an alert is generated.

  • MSExchangeIS: RPC Requests > 25   This rule tracks the number of RPC requests serviced by the Microsoft Exchange Information Store service at a particular time. Up to 100 RPC requests can be handled at the same time. However, the value is typically quite low, less than ten, when the server is functioning normally.

  • Disk Write Latencies > 50 ms   When disk write latencies are above 50 milliseconds, an alert is generated.

  • ESE Log Generation Checkpoint Depth > 800   The time the Microsoft Exchange Information Store service needs to start depends on the log generation checkpoint depth. When this value exceeds 1000, all databases in the affected storage group are disconnected. An alert is generated when the value rises above the 800 threshold.

  • Information Store Virtual Bytes > 2.9 GB   Virtual Bytes is the current size in bytes of the virtual address space the process is using. Use of virtual address space does not necessarily imply corresponding use of either disk or main memory pages. Virtual space is finite, and by using too much, the process can limit its ability to load libraries. When the virtual bytes are greater than the 2.9-gigabyte threshold, an alert is generated.

  • Disk Read Latencies > 50 ms   When disk read latencies are above 50 milliseconds, an alert is generated.

  • Outlook Mobile Access: Last response time > 60 sec   When the Outlook Mobile Access server response time value is greater than 60 seconds, an alert is generated.

  • Pool Nonpaged Bytes > 90 MB   This rule is available only for Exchange Server 2003. When the performance counter for Memory-Pool Nonpaged Bytes exceeds 90 MB, an alert is generated.
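The RPC latency rule samples once a minute and alerts on the five-minute average. A minimal Python sketch of that averaging, assuming one sample per minute and a window of five samples, is shown below. `RollingAverageRule` is a hypothetical name, and MOM's own sampling and averaging may differ in detail.

```python
from collections import deque
from typing import Deque


class RollingAverageRule:
    """Alert when the mean of the last `window` samples exceeds `threshold`."""

    def __init__(self, threshold: float, window: int = 5) -> None:
        self.threshold = threshold
        self.samples: Deque[float] = deque(maxlen=window)  # oldest sample drops out

    def add_sample(self, value: float) -> bool:
        """Record one sample and report whether the rule should alert."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return False  # wait until a full window has been collected
        return sum(self.samples) / len(self.samples) > self.threshold
```

For example, `RollingAverageRule(threshold=200)` fed the per-minute latencies 150, 180, 400, 190, and 170 ms alerts on the fifth sample (average 218 ms), even though only one sample exceeds 200 ms. Averaging in this way keeps a single slow request from raising an alert on its own while still catching a sustained rise.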

SMTP Remote Queues Thresholds

The SMTP Remote Queues Thresholds rule group checks the state and health of the Exchange SMTP remote queues. An alert is generated if a significant amount of mail is queuing for one specific location. The number of queued messages that triggers an alert is set by the NumberOfMessages parameter of the script that the Verify Remote SMTP Queues timed event runs. A notification is sent if the severity level is Error or higher.

This rule group includes the following event rules:

  • Alert for problems in remote Simple Mail Transfer Protocol (SMTP) queues   An alert is generated when the number of messages in a remote SMTP queue exceeds the NumberOfMessages value, which is 200 by default. To modify this value, open the rule's Properties dialog box, click the Responses tab, click the script, click Edit, and then click Edit Parameter.

  • Verify remote Simple Mail Transfer Protocol (SMTP) queues   This rule runs a script every hour to determine the state of the remote SMTP queues. The script generates an event when the number of messages in a remote queue exceeds the value of the NumberOfMessages parameter. By default, the NumberOfMessages value is 200.
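Unlike the Remote Queue Length performance rule, which looks at the total across all remote queues, this check is per queue, so a single backed-up destination is flagged even when overall volume is normal. The Python sketch below illustrates that comparison. It assumes the queue lengths have already been collected into a mapping of queue name to message count; how the real script gathers them is not shown, and `queues_over_limit` and the queue names are hypothetical.

```python
from typing import Dict, List

DEFAULT_NUMBER_OF_MESSAGES = 200  # default value of the NumberOfMessages parameter


def queues_over_limit(queue_lengths: Dict[str, int],
                      number_of_messages: int = DEFAULT_NUMBER_OF_MESSAGES) -> List[str]:
    """Return the remote queues holding more than number_of_messages messages."""
    return sorted(name for name, count in queue_lengths.items()
                  if count > number_of_messages)
```

With queues of 250, 200, and 5 messages, only the first is reported at the default value, because a queue must exceed the limit, not merely reach it.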

Windows Updates

The rules in the Verify Windows Hotfixes rule group verify whether all specified Windows updates are installed on servers that are running Exchange Server 2003. If a specified hotfix is not installed, an alert is generated. A notification is sent if the severity level is Error or higher.

This rule group includes the following event rules:

  • Verify required Windows hotfixes   This rule runs a script every day to check for updates. To specify the updates that the script checks for, open the Properties dialog box for this rule, click the Responses tab, select the script, and then click Edit. Click HotfixIDs, and then click Edit Parameter. In the Value box, type a comma-delimited list of the IDs of all updates that your Exchange servers require. The script generates event 9017, which lists all required updates that are not installed.

  • The required Windows hotfix is not installed   When a required update is not installed, this rule generates an alert.
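The comparison behind event 9017 can be sketched in Python: split the comma-delimited HotfixIDs value and report every required ID that is missing from the server's installed list. This is an illustrative model, not the management pack's script. It assumes the installed update IDs are already available (for example, from the Win32_QuickFixEngineering WMI class), that IDs compare case-insensitively, and that blank or repeated entries in the list should be ignored. The function names and the update IDs in the example are hypothetical.

```python
from typing import Iterable, List


def parse_hotfix_ids(value: str) -> List[str]:
    """Split a comma-delimited HotfixIDs value, dropping blanks and duplicates."""
    ids = (part.strip().upper() for part in value.split(","))
    return list(dict.fromkeys(i for i in ids if i))  # preserves the listed order


def missing_hotfixes(hotfix_ids_value: str, installed: Iterable[str]) -> List[str]:
    """Return the required update IDs that are not in the installed list."""
    installed_set = {h.strip().upper() for h in installed}
    return [h for h in parse_hotfix_ids(hotfix_ids_value) if h not in installed_set]
```

For example, `missing_hotfixes("KB123456, kb234567,,KB345678", ["KB123456", "kb345678"])` returns `["KB234567"]`, the single update that the alert would report.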