Troubleshooting Microsoft Exchange 2000 Server Performance

Published: August 1, 2002 | Updated: October 3, 2002

Exchange Core Documentation

Produced by Exchange User Education

For the latest information, see https://www.microsoft.com/exchange

Writer: Dylan Miller

Technical Reviewers: Dale Koetke, KC Lemson, Jim Lucey, Nick Rosenfeld, Jason Hill, Michael Palermiti, Charles McDaniels, Sameer Patel, Scott Landry

Project Editor: Susan Bradley

Designer: Kristie Smith

Applies To: Exchange 2000 Server SP3

On This Page

Introduction
Performance Troubleshooting Tools
Establishing a Baseline
Troubleshooting Performance
Appendix
Additional Resources

Introduction

This technical article introduces the tools, concepts, and recommendations you need in order to troubleshoot Microsoft Exchange 2000 Server performance. It also provides information on how to monitor the health of your Exchange 2000 servers and how to establish a baseline of normal server performance to measure against when troubleshooting performance.

Performance Troubleshooting Tools

The following tools can be used to monitor and troubleshoot Exchange 2000 Server performance:

  • System Monitor

  • Performance Logs and Alerts

  • Microsoft Operations Manager 2000

  • Event Viewer

  • Network Monitor

  • File Monitor Tool

System Monitor

System Monitor is part of the Performance Microsoft Management Console (MMC) snap-in administrative tool. Using System Monitor, you can measure the performance of your own computer or other computers on a network.

Note System Monitor may also be referred to as Performance Monitor or perfmon, which is the name of the executable.

The following figure shows System Monitor in action.

Figure 1: System Monitor

System Monitor can do the following:

  • Collect and view real-time performance data on a local computer or on several remote computers.

  • View current or past data collected in a counter log.

  • Present data in a printable graph, histogram, or report view.

  • Create HTML pages from performance views.

  • Create reusable monitoring configurations that can be installed on other computers using Microsoft Management Console.

Using System Monitor, you can collect and view extensive data about the usage of hardware resources and the activity of system services on computers you administer. You can define the data you want the graph to collect in the following ways:

  • Type of data: System Monitor lets you select the data you want collected by specifying performance objects, performance counters, and object instances. Some objects provide data on system resources (such as memory); others provide data on the operation of applications (for example, Exchange 2000).

  • Source of data: System Monitor can collect data from your local computer or from other computers on the network on which you have permissions. In addition, it can display real-time data or past data collected in counter logs.

  • Sampling parameters: System Monitor supports manual, on-demand sampling or automatic sampling based on a time interval you specify. When viewing logged data, you can also choose starting and stopping times so that you can view data spanning a specific time range.

In addition to options for defining data content, you have considerable flexibility in designing System Monitor views:

  • Type of display: System Monitor supports graph, histogram, and report views. The graph view is the default view; it offers the widest variety of optional settings.

  • Display characteristics: For any of these views, you can define the colors and fonts for the display. In graph and histogram views, you can select from many different options to view performance data, such as:

    • Provide a title for your graph or histogram and label the vertical axis.

    • Set the range of values depicted in your graph or histogram.

    • Adjust the characteristics of lines or bars plotted to indicate counter values, including color, width, style, and so on.

For more information about System Monitor, see Microsoft Windows 2000 Server Help.

Performance Logs and Alerts

Performance Logs and Alerts is part of the Performance Microsoft Management Console (MMC) snap-in administrative tool. With Performance Logs and Alerts, you can collect performance data automatically from local or remote computers. You can view logged counter data using System Monitor or export the data to a spreadsheet or database for analysis and report generation.

Performance Logs and Alerts does the following:

  • It collects data in a comma-separated or tab-separated format for easy import to a spreadsheet. A binary log-file format is also provided for circular logging or for logging instances such as threads or processes that begin after the log starts collecting data. (Circular logging is the process of continuously logging data to a single file, overwriting previous data with new data.)

  • It collects counter data that can be viewed during collection, as well as after collection stops.

  • It runs as a service and collects data even if no one is logged on to the computer being monitored.

  • It allows you to define start and stop times, file names, file sizes, and other parameters for automatic log generation.

  • It allows you to manage multiple logging sessions from a single console window.

  • It allows you to set an alert on a counter, so that a message is sent, a program is run, or a log is started when the selected counter's value exceeds or falls below a specified setting.
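
To illustrate working with the comma-separated format outside a spreadsheet, the following Python sketch reads an exported counter log into one list of values per counter. It assumes the layout such logs commonly use: a header row whose first cell describes the timestamp column and whose other cells are full counter paths, followed by one row per sample. The function name and the sample server name EX1 are hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional


def read_counter_log(text: str) -> Dict[str, List[Optional[float]]]:
    r"""Parse a CSV counter log into {counter path: [sample values]}.

    Assumes column 0 holds the sample timestamp and every other column
    header is a full counter path such as
    \\SERVER\Processor(_Total)\% Processor Time.
    Blank or non-numeric cells (for example, a counter with no data yet)
    are stored as None.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, samples = rows[0], rows[1:]
    series: Dict[str, List[Optional[float]]] = {name: [] for name in header[1:]}
    for row in samples:
        # Skip the timestamp cell and pair each value with its counter path.
        for name, cell in zip(header[1:], row[1:]):
            try:
                series[name].append(float(cell))
            except ValueError:
                series[name].append(None)
    return series


if __name__ == "__main__":
    # Hypothetical two-sample log for a server named EX1.
    sample = (
        '"(PDH-CSV 4.0)","\\\\EX1\\Processor(_Total)\\% Processor Time"\n'
        '"08/01/2002 09:00:00.000","42.5"\n'
        '"08/01/2002 09:00:15.000"," "\n'
    )
    for path, values in read_counter_log(sample).items():
        print(path, values)
```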

Similar to System Monitor, Performance Logs and Alerts supports defining performance objects, performance counters, and object instances, and setting sampling intervals for monitoring data about hardware resources and system services. In addition, Performance Logs and Alerts offers the following options related to recording performance data:

  • It starts and stops logging, either manually on demand or automatically based on a user-defined schedule.

  • It configures additional settings for automatic logging, such as automatic file renaming, and sets parameters for stopping and starting a log based on the elapsed time or the file size.

  • It creates trace logs. Using the default system data provider or another provider, trace logs record data when certain activities such as a disk I/O operation or a page fault occur. When the event occurs, the provider sends the data to the Performance Logs and Alerts service. This differs from the operation of counter logs; when counter logs are in use, the service obtains data from the system when the update interval has elapsed, rather than waiting for a specific event. A parsing tool is required to interpret the trace log output. Developers can create such a tool using application programming interfaces (APIs) provided on the Microsoft Web site (https://msdn.microsoft.com/).

  • It defines a program to run when a log is stopped.

For more information about Performance Logs and Alerts, see Windows 2000 Server Help.

Microsoft Operations Manager 2000

Microsoft Operations Manager 2000 provides comprehensive event management, proactive monitoring and alerting, reporting, and trend analysis. Application Management Pack, the extensive product support knowledge base included in Microsoft Operations Manager, helps reduce day-to-day support costs associated with running applications and services in a Microsoft Windows–based IT infrastructure. Microsoft Operations Manager 2000 management packs provide necessary operational knowledge about Windows 2000 Server and Exchange 2000 Server.

The following figure shows typical information available from Microsoft Operations Manager.

Figure 2: Microsoft Operations Manager

Using Microsoft Operations Manager 2000, you can:

  • Check system status from a Web console

  • Create sophisticated rules to respond to events

  • Generate custom reports

  • Handle basic operational tasks using one of the add-on management packs

Microsoft Operations Manager 2000 has a full set of features that help administrators monitor and manage the events and performance of Windows 2000–based server systems.

For more information on Microsoft Operations Manager 2000, see the product Web site at https://www.microsoft.com/mom/.

Event Viewer

Using the event logs in Event Viewer, you can gather information about hardware, software, and system problems, and you can monitor Windows 2000 security events.

The EventLog service starts automatically when you start Windows 2000 and records events in three kinds of logs as outlined in the following table.

Table 1: Logs used by Event Viewer

Log | Description
Application log | The application log contains events logged by Exchange 2000 and other applications. Most Exchange 2000 events are logged in the application log.
System log | The system log contains events logged by Windows 2000 system components. For example, the failure of a driver or other system component to load during startup is recorded in the system log. The event types logged by system components are predetermined by Windows 2000.
Security log | The security log can record security events such as valid and invalid logon attempts, as well as events related to resource use, such as creating, opening, or deleting files. An administrator can specify what events are recorded in the security log. For example, if you enable logon auditing, attempts to log on to the system are recorded in the security log.

Event Viewer displays the types of events outlined in the following table:

Table 2: Events displayed by Event Viewer

Event | Description
Error | Indicates a significant problem, such as loss of data or loss of functionality. For example, if a service fails to load during startup, an error is logged.
Warning | Indicates a potentially significant problem. For example, when disk space is low, a warning is logged.
Information | Indicates the successful operation of an application, driver, or service. For example, when a network driver loads successfully, an information event is logged.
Success Audit | Indicates a successful audited security access attempt. For example, if a user's attempt to log on to the system is successful, a success audit event is logged.
Failure Audit | Indicates that an audited security access attempt has failed. For example, if a user's attempt to access a network drive fails, a failure audit event is logged.

For more information about Event Viewer, see Windows 2000 Server Help.

Network Monitor

Network Monitor enables you to detect and troubleshoot problems on LANs. Using Network Monitor, you can:

  • Identify network traffic patterns and network problems. For example, you can locate client-to-server connection problems, find a computer that makes a disproportionate number of work requests, and identify unauthorized users on your network.

  • Capture frames (packets) directly from the network.

  • Display, filter, save, and print the captured frames.

Instructions for using Network Monitor to troubleshoot performance can be found in the Troubleshooting Performance section later in this document.

For more information about Network Monitor, see the following knowledge base articles:

File Monitor Tool

The Sysinternals File Monitor tool, available at https://www.sysinternals.com, monitors and displays file system activity on a system in real time. Its advanced capabilities make it a powerful tool for exploring the way Windows works, seeing how applications use files and DLLs, or assessing problems in system or application file configurations. File Monitor's time-stamping feature precisely indicates when every open, read, write, or delete occurs, and its status column indicates the outcome. File Monitor begins monitoring when you start it, and its output window can be saved to a file for offline viewing. It has full search and filtering capabilities.

For more information on the Sysinternals File Monitor tool, see the Troubleshooting Performance section later in this document and the Sysinternals Web site at https://www.sysinternals.com.

Note This third-party contact information is provided to help you find the technical support you need. This contact information is subject to change without notice. Microsoft in no way guarantees the accuracy of this third-party contact information.

Notations Used In This Article

This article covers many performance counters. Performance counters are made up of the following three parts:

  • Performance Object: This is the part of the computer being monitored. Some of the most commonly used objects are Processor, Memory, and PhysicalDisk. When Exchange 2000 is installed, new objects such as MSExchangeIS are added to the performance object list.

  • Counters: The counters available for a performance object are the parts of the object you can monitor. For example, on the Memory object, you can monitor the available bytes, kilobytes, and megabytes of memory, as well as the page faults per second or total pages per second.

  • Instances: An object can have multiple instances on a computer. For example, when you look at counters under the Processor object on a multiprocessor computer, you will see as many instances as there are processors. You can choose to monitor only a specific processor or all processors.

When performance counters are referenced in this article, they will be listed in this format:

Performance Object(Instance)\Counter

Note The instance is not a requirement. For example:

PhysicalDisk\% Disk Time
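
Because every counter in this article is written as Performance Object(Instance)\Counter, splitting a path into its three parts is a small, mechanical step when you process logged data. The following Python sketch does that with a regular expression. It is an illustration under simple assumptions, not the parser Windows uses: it expects no leading \\computer prefix and no parentheses inside instance names.

```python
import re
from typing import NamedTuple, Optional


class CounterPath(NamedTuple):
    obj: str                 # performance object, e.g. "PhysicalDisk"
    instance: Optional[str]  # None when the path has no instance
    counter: str             # counter name, e.g. "% Disk Time"


# Object name, an optional "(instance)", then a backslash and the counter.
# Counter names may contain spaces, "%" and "/".
_PATH = re.compile(r"^(?P<obj>[^\\(]+?)(?:\((?P<inst>[^)]*)\))?\\(?P<counter>.+)$")


def parse_counter_path(path: str) -> CounterPath:
    """Split 'Performance Object(Instance)\\Counter' into its three parts."""
    m = _PATH.match(path.strip())
    if m is None:
        raise ValueError(f"not a counter path: {path!r}")
    return CounterPath(m.group("obj").strip(), m.group("inst"), m.group("counter"))


if __name__ == "__main__":
    for p in (r"PhysicalDisk\% Disk Time",
              r"Process(store)\% Processor Time",
              r"MSExchangeIS Mailbox\Message Opens/sec"):
        print(parse_counter_path(p))
```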

Establishing a Baseline

To recognize a performance problem, you must monitor your Exchange 2000 servers consistently and establish a baseline of normal server performance. Immediately after setting up an Exchange 2000 server, use the counters below to monitor the server's performance and establish a baseline.

Minimal Set of Counters

The following counters are the minimal set of counters you should use to establish a baseline and monitor overall server health. A description and recommended value are provided for each counter. Use the recommended value for each counter to monitor performance.

Note There are many counters you can use to establish a baseline specific to your organization and to monitor your Exchange 2000 servers' performance. See the Appendix section later in this document for a complete list of counters, with a description and recommended value for each.

Table 3: Minimal Set of Counters

Counter | Description | Recommended Value
MSExchangeIS Mailbox\Message Opens/sec | Message Opens/sec indicates the rate at which requests to open messages are submitted to the Exchange store. | The value of this counter is specific to your organization. Use this counter to establish a baseline of normal server performance.
MSExchangeIS Mailbox\Folder Opens/sec | Folder Opens/sec indicates the rate at which requests to open folders are submitted to the Exchange store. | The value of this counter is specific to your organization. Use this counter to establish a baseline of normal server performance.
MSExchangeIS Mailbox\Local Delivery Rate | Local Delivery Rate indicates the rate at which messages are being delivered locally. | The value of this counter is specific to your organization. Use this counter to establish a baseline of normal server performance.
MSExchangeIS\RPC Operations/sec | RPC Operations/sec indicates the rate at which RPC operations occur. This counter tells you how many RPC requests are outstanding. If Outlook is notifying users that it cannot contact their Exchange server, this counter will likely show significant spikes. | The value of this counter will be specific to your organization, but in standard operation this counter should remain at 0 on 4-processor machines. Use this counter to establish a baseline of normal server performance.
MSExchangeIS\RPC Requests | RPC Requests indicates the number of client requests that are currently being processed by the Exchange store. | This counter should not exceed 100. You should also use this counter to establish a baseline of normal server performance.
PhysicalDisk(_Total)\Disk Transfers/sec | Disk Transfers/sec indicates the number of completed read and write operations per second. This counter measures disk utilization and is expressed as a percentage. Values over 50 percent might indicate that the disk is becoming a bottleneck. | This counter should remain below 50 percent. You should also use this counter to establish a baseline of normal server performance.
Process(store)\% Processor Time | For a process, % Processor Time indicates the percentage of elapsed time that the threads of that process spend running on the processor. By selecting the instance for each Exchange service, you can monitor how much processor time each service uses. | An average value that is below 20 percent indicates the server is underused or services are down. An average value that is consistently above 75 to 80 percent indicates that the server is overburdened. Use this counter to establish a baseline of normal server performance.
Processor(_Total)\% Processor Time | % Processor Time indicates the percentage of time the processor is running non-idle threads. The _Total instance averages this value across all processors on the server. | An average value that is below 20 percent indicates the server is underused or services are down. An average value that is consistently above 75 to 80 percent indicates that the server is overburdened and you should consider moving users to another server. Use this counter to establish a baseline of normal server performance.
SMTP Server\Local Queue Length | Local Queue Length indicates the number of messages in the local SMTP queue. | The value of this counter is specific to your organization. Use this counter to establish a baseline of normal server performance.
SMTP Server\Messages Delivered/sec | Messages Delivered/sec indicates the rate at which messages are being delivered to local mailboxes. | The value of this counter is specific to your organization. Use this counter to establish a baseline of normal server performance.
SMTP Server\Messages Received/sec | Messages Received/sec indicates the rate at which messages are being received. | The value of this counter is specific to your organization. Use this counter to establish a baseline of normal server performance.
SMTP Server\Messages Sent/sec | Messages Sent/sec indicates the rate at which messages are being sent. | The value of this counter is specific to your organization. Use this counter to establish a baseline of normal server performance.

Note Before troubleshooting disk problems, at the command prompt, run diskperf -y to activate logical disk counters as well as physical disk counters.

Example Baseline

After you begin monitoring your Exchange 2000 servers, you can use the data you capture to establish your baseline. The following sections provide questions you should answer about your normal server performance, as well as System Monitor capture examples.

Questions to Answer

When establishing your baseline, it is important that you answer questions like the following. Answers to these questions will help you interpret current performance data and investigate performance problems.

  • What is the average number of messages users receive per day?

  • How many messages do users open, and how often do they open folders?

  • What is the peak delivery rate, the peak period during the day, and the peak day of the week?

  • Are there monthly or quarterly peaks?

  • How many more users can your servers support?

Your goal is to compare baseline data gathered during typical load periods with current performance data. By comparing the two, you can determine whether the server is operating normally or has a performance problem.
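
A minimal sketch of that comparison follows, assuming you have samples of one counter from a typical load period (the baseline) and from the period you are investigating. It summarizes each set as an average and a peak. The 25 percent tolerance and the function names are hypothetical choices for illustration, not recommendations from this article.

```python
from statistics import mean
from typing import Dict, List


def summarize(samples: List[float]) -> Dict[str, float]:
    """Reduce one counter's samples to an average and a peak."""
    return {"avg": mean(samples), "peak": max(samples)}


def compare_to_baseline(baseline: List[float], current: List[float],
                        tolerance: float = 0.25) -> str:
    """Classify current samples of a counter against its baseline.

    tolerance is a hypothetical allowance (25 percent by default) above
    the baseline values before the current data is flagged.
    """
    base = summarize(baseline)
    now = summarize(current)
    if now["peak"] > base["peak"] * (1 + tolerance):
        return "above baseline peak"
    if now["avg"] > base["avg"] * (1 + tolerance):
        return "average above baseline"
    return "within baseline"


if __name__ == "__main__":
    # Hypothetical MSExchangeIS Mailbox\Folder Opens/sec samples.
    typical = [10, 20, 30]
    today = [26, 26, 26]
    print(compare_to_baseline(typical, today))
```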

System Monitor Examples

The following are example System Monitor performance data captures. Consider leaving System Monitor running all the time for easy access, sampling at the following intervals:

  • 900 seconds for a 24-hour view

  • 60 seconds for a 1- to 2-hour view

  • 10 seconds to catch short-lived spikes
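
The three intervals make sense if you assume the graph view holds about 100 samples before it starts overwriting old data. That figure is an assumption here, not something this article states. Under it, the time span covered by a full graph is simply the interval multiplied by the sample count, as the following Python sketch shows.

```python
from datetime import timedelta

# Assumption: the graph view holds about 100 samples before it wraps.
SAMPLES_PER_GRAPH = 100


def graph_window(interval_seconds: int,
                 samples: int = SAMPLES_PER_GRAPH) -> timedelta:
    """Return the span of time one full graph covers at a given interval."""
    return timedelta(seconds=interval_seconds * samples)


if __name__ == "__main__":
    for interval in (900, 60, 10):
        print(f"{interval:>4} s interval -> {graph_window(interval)} on screen")
```

With 100 samples, a 900-second interval covers 25 hours, a 60-second interval covers 1 hour 40 minutes, and a 10-second interval covers about 17 minutes.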

The following System Monitor illustrations were captured while monitoring a production Exchange 2000 Service Pack 3 server. The first illustration shows performance data in a 24-hour view and represents both peak and non-peak operation. The second and third illustrations show a 1- to 2-hour view and a short-term view, both captured during business hours.

The following figure illustrates System Monitor capturing data with a 24-hour view.

Figure 3: System Monitor data with a 24-hour view.

The following figure illustrates System Monitor capturing data with a 1- to 2-hour view.

Figure 4: System Monitor data with a 1- to 2-hour view.

The following figure illustrates System Monitor capturing data every 10 seconds to catch short-lived spikes.

Figure 5: System Monitor data with short-lived spikes

Monitoring performance using the views illustrated above and the minimal set of counters allows you to establish a baseline, as well as monitor your servers for performance problems. Using the recommended values for the minimal set of counters as a guide, you can see that the server represented in Figures 3 through 5 is generally healthy. The RPC Operations/sec counter generally remains at 0, the RPC Requests counter remains below 100, the Disk Transfers/sec counter generally remains below 50%, and % Processor Time (_Total) and % Processor Time (STORE.EXE) remain below 80%.

These figures also illustrate that spikes in server performance are normal. The Folder Opens/sec, % Processor Time (STORE.EXE), and % Processor Time (_Total) counters temporarily spike and then return to lower levels.
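
The same check can be automated against logged data. The following Python sketch flags a counter only when it stays above its limit for several consecutive samples, so that short spikes like those described above are not reported. The limits come from the recommended values for RPC Requests (100) and Processor(_Total)\% Processor Time (80 percent); the three-sample window and the function names are hypothetical.

```python
from typing import Dict, List

# Upper limits from the recommended values in Table 3. Counters whose
# recommended value is specific to your organization are left out; add
# others here as your baseline suggests.
LIMITS: Dict[str, float] = {
    r"MSExchangeIS\RPC Requests": 100,
    r"Processor(_Total)\% Processor Time": 80,
}


def longest_run_above(values: List[float], limit: float) -> int:
    """Length of the longest run of consecutive samples above limit."""
    run = best = 0
    for value in values:
        run = run + 1 if value > limit else 0
        best = max(best, run)
    return best


def sustained_breaches(samples: Dict[str, List[float]],
                       window: int = 3) -> List[str]:
    """Return counters that stay above their limit for `window` samples in a row.

    A single spike is treated as normal, so only sustained excursions
    are reported. The default window of 3 samples is a hypothetical choice.
    """
    return [
        counter
        for counter, limit in LIMITS.items()
        if longest_run_above(samples.get(counter, []), limit) >= window
    ]


if __name__ == "__main__":
    logged = {
        r"MSExchangeIS\RPC Requests": [5, 120, 8],            # one spike
        r"Processor(_Total)\% Processor Time": [85, 90, 95, 40],
    }
    print(sustained_breaches(logged))
```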

Note You can use the Performance Logs and Alerts tool to save performance data in log files, which lets you compare performance data saved during typical load times with current performance data. You can then view the data in the log files using System Monitor.

Troubleshooting Performance

After you have established your baseline of normal server performance and established monitoring of your Exchange 2000 servers, you will be able to detect performance problems. The following sections help you isolate performance problems.

Is the Problem with Exchange, Before Exchange, or After Exchange?

When investigating Exchange performance problems, you may have indications of performance problems from monitoring data or from users who simply say their mail is slow. The first step in isolating an Exchange performance problem is to determine whether the problem is with Exchange itself, before Exchange (such as a network problem), or after Exchange (such as a disk problem).

The following performance counters help you determine if the requests made by clients are even reaching the Exchange server.

MSExchangeIS\RPC Requests
MSExchangeIS\RPC Operations/sec

The MSExchangeIS\RPC Requests counter indicates the number of MAPI RPC requests presently being serviced by the Exchange store. The Exchange store can only service 100 requests simultaneously. The MSExchangeIS\RPC Operations/sec counter indicates how many RPC operations are being asked of the Exchange store per second, and how many it is actually responding to per second.

The performance problem is occurring before Exchange if RPC Requests is low and RPC Operations/sec (outstanding requests) is zero. All other combinations point to a problem with Exchange 2000 or after Exchange 2000.
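
The rule above reduces to a single test on two readings. The following Python sketch encodes it, assuming the readings are taken while users are reporting slow mail. The article does not say what counts as a low RPC Requests value, so the cutoff of 10 is a hypothetical placeholder you would replace with a value from your own baseline.

```python
def locate_problem(rpc_requests: float, rpc_ops_per_sec: float,
                   low_requests: float = 10) -> str:
    """Apply the before/with/after rule of thumb to one pair of readings.

    rpc_requests: MSExchangeIS\\RPC Requests
    rpc_ops_per_sec: MSExchangeIS\\RPC Operations/sec
    low_requests: hypothetical cutoff for a "low" RPC Requests value.
    """
    if rpc_requests < low_requests and rpc_ops_per_sec == 0:
        # Requests do not appear to be reaching the store at all.
        return "before Exchange (for example, the network)"
    return "Exchange itself or after Exchange (for example, a disk)"


if __name__ == "__main__":
    print(locate_problem(2, 0))
    print(locate_problem(85, 0))
```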

The following figure illustrates an issue with Exchange performance that was identified using the MSExchangeIS\RPC Requests and MSExchangeIS\RPC Operations/sec counters.

Figure 6: Example of an Exchange 2000 performance issue

Figure 6 illustrates an Exchange performance issue. No operations are executing for a 3-minute period, but the Exchange store has outstanding requests.
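
Spotting the pattern in Figure 6 by eye is easy on one graph but tedious across long logs. The following Python sketch scans paired samples of MSExchangeIS\RPC Requests and MSExchangeIS\RPC Operations/sec for runs in which requests are outstanding but no operations complete. The sample interval, the 60-second minimum duration, and the function name are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def stalled_periods(requests: List[float], ops: List[float], interval_s: int,
                    min_duration_s: int = 60) -> List[Tuple[int, int]]:
    """Find runs of samples where requests are outstanding but no operations complete.

    requests: MSExchangeIS\\RPC Requests samples
    ops: MSExchangeIS\\RPC Operations/sec samples taken at the same times
    Returns (start, end) sample indexes, end exclusive, for runs lasting at
    least min_duration_s seconds (a hypothetical cutoff).
    """
    n = min(len(requests), len(ops))
    periods: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i in range(n + 1):
        # Treat the position past the last sample as "not stalled" so an
        # open run is closed at the end of the data.
        stalled = i < n and requests[i] > 0 and ops[i] == 0
        if stalled and start is None:
            start = i
        elif not stalled and start is not None:
            if (i - start) * interval_s >= min_duration_s:
                periods.append((start, i))
            start = None
    return periods


if __name__ == "__main__":
    # Hypothetical samples taken every 60 seconds.
    reqs = [5, 5, 20, 30, 40, 5]
    ops = [10, 10, 0, 0, 0, 10]
    print(stalled_periods(reqs, ops, interval_s=60))
```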

The following figure illustrates another performance problem with Exchange that was identified using the MSExchangeIS\RPC Requests and MSExchangeIS\RPC Operations/sec counters.

Figure 7: Example of an Exchange 2000 performance problem

Figure 7 illustrates four periods in which outstanding requests increase while throughput drops. In these periods, the RPC Requests counter increases but the RPC Operations/sec counter does not keep up. In the third RPC Requests spike (illustrated in green), the number of RPC requests goes up, but the RPC Operations/sec rate (illustrated in red) is zero for the duration of the spike.

The following figure illustrates a client problem identified using the MSExchangeIS\RPC Requests and MSExchangeIS\RPC Operations/sec counters.

Figure 8: Example of a client performance issue

Figure 8 illustrates a client performance issue. The RPC Operations/sec and RPC Requests counters are growing simultaneously. A client may be running a utility or script that is making many requests of the Exchange store, and the Exchange store is struggling to keep up. In this situation, you can use Network Monitor to find the computer from which the requests are coming.

The following figure illustrates a network problem identified using the MSExchangeIS\RPC Requests and MSExchangeIS\RPC Operations/sec counters.

Figure 9: Example of a network performance issue

Figure 9 illustrates a network performance problem. In two cases, the RPC Operations/sec and the RPC Requests are both zero. In this situation, something is preventing the requests from arriving at the Exchange store. You can use the Network Monitor tool to determine whether requests are arriving at the server.

Determine the Type of Problem

After determining whether the problem is with Exchange, before Exchange, or after Exchange, gather answers to the following questions about the clients and the hardware on the server where the problem is occurring. The answers will help you determine your next step.

  • Are clients acting sluggish or have they stopped responding?

  • Is the problem happening with a particular client operation?

  • Do all clients experience the problem at the same time?

  • At what frequency does the problem occur?

  • What hardware does the server have?

  • How many processors are there on the server?

  • How much memory is there on the server?

  • For each physical disk volume, how many disks exist and how are they configured (such as RAID-0, RAID-1, RAID-5)?

  • What versions of Exchange, Windows, and their Service Packs are installed, and are the correct versions installed?

  • Does the server's hardware meet the hardware requirements for the installed software?

  • Will the bandwidth support what is being attempted (for example, using a Site Connector over a 56-Kbps line)?

  • Is the scenario being attempted a supported scenario?

  • Could the network be the problem? Confirm all IP information (WINS, DNS, and Global Catalog/domain controller communication).

CPU Performance Issues

CPU bottlenecks are the easiest bottlenecks to detect. If the Processor(_Total)\% Processor Time counter is approaching 100%, that indicates a CPU bottleneck.

Important If you are running Content Indexing on the server, disable it while you investigate a performance problem. Content Indexing appropriates all idle CPU processing power, so CPU usage can appear high even though the content indexing engine relinquishes the CPU whenever another process requests additional CPU power from the system.

If the Processor(_Total)\% Processor Time counter is high, check whether the MSExchangeIS\RPC Requests counter is increasing. If the MSExchangeIS\RPC Requests counter reaches the maximum of 100, clients will time out, because the Exchange store can handle only 100 simultaneous RPC requests.

The following figure illustrates a CPU performance issue.

Figure 10: An example CPU performance issue

Figure 10 illustrates a sudden increase in the local delivery rate. As a result, CPU usage rose to 100%. In this situation, the CPU is working at capacity delivering local messages.

What is Consuming the CPU?

After you have determined that the problem is with the CPU, determine what is consuming the CPU. The processes measured by the counters below are the most likely suspects, listed from most likely to least likely. Together, these four processes normally account for about 90% of the CPU in use.

Process(store)\% Processor Time
Process(inetinfo)\% Processor Time
Process(emsmta)\% Processor Time
Process(system)\% Processor Time

Note Process counters count 100 percent for each CPU on the server. On an 8-processor machine, the value of each of the process counters above would be between 0 percent and 800 percent.
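
To compare the process counters with Processor(_Total)\% Processor Time, you can divide each one by the number of processors. The following Python sketch does that and sorts the result, which is the same ranking a histogram view gives you visually. The process names and values in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_cpu_consumers(process_cpu: Dict[str, float],
                       cpu_count: int) -> List[Tuple[str, float]]:
    r"""Rank processes by their share of total CPU capacity.

    Process(...)\% Processor Time can reach 100 percent per processor, so
    each value is divided by cpu_count to put it on the same 0-100 scale
    as Processor(_Total)\% Processor Time.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    shares = {name: value / cpu_count for name, value in process_cpu.items()}
    # Highest consumer first, as in a histogram sorted by bar height.
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical readings from an 8-processor server.
    readings = {"store": 560.0, "inetinfo": 80.0, "emsmta": 40.0, "System": 16.0}
    for name, share in rank_cpu_consumers(readings, cpu_count=8):
        print(f"{name:<10}{share:5.1f}% of total CPU")
```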

The following figure illustrates a histogram view of the processes that are most likely to consume the CPU.

Figure 11: A histogram view of processes most likely to consume the CPU

Figure 11 illustrates that the Exchange store process is consuming most of the CPU. If you suspect that other processes, besides the four most likely, may be consuming the CPU, you should include them in this histogram view.

Note Viewing multiple counters in histogram view in System Monitor is a quick way to isolate the counter indicating a problem.

The following are other common processes that may consume the CPU:

  • Backup utilities

  • Monitoring utilities

  • Remote access tools

Isolating Threads

As an advanced step, you can monitor the individual threads of a process to isolate the thread or threads that are consuming the CPU.

Use the same histogram technique in System Monitor to isolate the thread consuming the CPU as you used to isolate the process. Add all Thread(process/threadnumber)\% Processor Time counters for the target process to a histogram view in System Monitor. You can identify the thread using the Thread(process/threadnumber)\ID Thread counter.

Disk Performance Issues

Unlike CPU performance issues, disk performance issues cannot be diagnosed with a single counter that indicates that you have a disk bottleneck.

Note A disk bottleneck can also be the result of memory problems; in that case, it cannot be solved simply by adding more spindles.

When you size your Exchange 2000 disk configurations, size for I/O capacity, not for disk space alone. Microsoft recommends RAID-0+1 because this configuration tends to provide more I/O capacity than RAID-5.

Note Before troubleshooting disk issues, at the command prompt, run diskperf -y to activate logical, as well as physical, disk counters.

Disk Performance Issues: Approach One

The first approach to determining if you are encountering a disk bottleneck is to monitor the following counters for each of your physical drives.

PhysicalDisk(drive:)\Disk Writes/sec
PhysicalDisk(drive:)\Disk Reads/sec


Compare each drive with the _Total instance to isolate where the I/O is going. Use the following guidelines to help with the comparison and to determine whether you have a bottleneck.

  • RAID-0: Reads/sec + Writes/sec < # Spindles x 100

  • RAID-1: Reads/sec + 2 * Writes/sec < # Spindles x 100 (each write has to go to each mirror on the array)

  • RAID-5: Reads/sec + 4 * Writes/sec < # Spindles x 100 (each write requires two reads and two writes)

Note This assumes disk throughput is equal to 100 random I/O per spindle.
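The three guidelines above can be written as a simple capacity check. This Python sketch assumes the figure of about 100 random I/Os per second per spindle from the note. The function names are hypothetical, and the RAID-0+1 entry is an assumption (it applies the RAID-1 write penalty because every write is mirrored) rather than one of the listed guidelines.

```python
# A minimal sketch of the RAID I/O guidelines above. SPINDLE_IOPS follows the
# note (about 100 random I/Os per second per spindle); adjust it for your disks.
SPINDLE_IOPS = 100

# Physical I/Os generated by one logical write. The RAID-0+1 entry is an
# assumption (every write is mirrored, as in RAID-1); it is not in the guidelines.
WRITE_PENALTY = {"RAID-0": 1, "RAID-1": 2, "RAID-0+1": 2, "RAID-5": 4}


def required_iops(level, reads_per_sec, writes_per_sec):
    """Physical I/Os per second implied by the PhysicalDisk read and write rates."""
    return reads_per_sec + WRITE_PENALTY[level] * writes_per_sec


def is_disk_bottleneck(level, reads_per_sec, writes_per_sec, spindles,
                       per_spindle=SPINDLE_IOPS):
    """True when the load reaches the array's estimated random I/O capacity."""
    return required_iops(level, reads_per_sec, writes_per_sec) >= spindles * per_spindle


if __name__ == "__main__":
    # Hypothetical six-spindle array: 300 reads/sec and 100 writes/sec.
    for level in ("RAID-0+1", "RAID-5"):
        print(level, required_iops(level, 300, 100), is_disk_bottleneck(level, 300, 100, 6))
```

With the same workload, the six-spindle RAID-5 array needs 700 I/Os per second against an estimated capacity of 600, while RAID-0+1 needs 500. This illustrates why RAID-0+1 tends to provide more I/O capacity.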

For more information about RAID, see the following RAID Levels section.

RAID Levels

Although there are many different implementations of RAID technologies, they all share two characteristics: they all use multiple physical disks to distribute data, and they all store data according to a logic that is independent of the application for which they are storing data.

This section discusses four primary implementations of RAID: RAID-0, RAID-1, RAID-0+1, and RAID-5. Although there are many other RAID implementations, these four types serve as a representation of the overall scope of RAID solutions.

RAID-0

RAID-0 is a striped disk array; each disk is logically partitioned in such a way that a "stripe" runs across all the disks in the array to create a single logical partition. For example, if a file is saved to a RAID-0 array, and the application that is saving the file saves it to drive D, the RAID-0 array distributes the file across logical drive D (see Figure 12). In this example, it spans all six disks.


Figure 12: RAID-0 disk array

From a performance perspective, RAID-0 is the most efficient RAID technology because it can write to all six disks at once. When all disks store the application data, the most efficient use of the disks occurs.

The drawback to RAID-0 is its lack of reliability. If the Exchange mailbox databases are stored across a RAID-0 array and a single disk fails, you must restore the mailbox databases to a functional disk array and restore the transaction log files. In addition, if you store the transaction log files on this array and you lose a disk, you can perform only a point-in-time restoration of the mailbox databases from the last backup.

RAID-1

RAID-1 is a mirrored disk array in which two disks are mirrored (see Figure 13).


Figure 13: RAID-1 disk array

RAID-1 is the most reliable of the RAID levels described in this section because all data is mirrored as it is written. However, you can use only half of the storage space on the disks. Although this may seem inefficient, RAID-1 is the preferred choice for data that requires the highest possible reliability.

RAID-0+1

A RAID-0+1 disk array allows for the highest performance while ensuring redundancy by combining elements of RAID-0 and RAID-1 (see Figure 14).


Figure 14: RAID-0+1 disk array

In a RAID-0+1 disk array, data is mirrored to both sets of disks (RAID-1), and then striped across the drives (RAID-0). Each physical disk is duplicated in the array. If you have a six-disk RAID-0+1 disk array, three disks are available for data storage.

RAID-5

RAID-5 is a striped disk array, similar to RAID-0 in that data is distributed across the array; however, RAID-5 also includes parity. This means that there is a mechanism that maintains the integrity of the data stored in the array, so that if one disk in the array fails, the data can be reconstructed from the remaining disks (see Figure 15). Thus, RAID-5 is a reliable storage solution.


Figure 15: RAID-5 disk array

However, to maintain parity among the disks, 1/n of the total disk space (the capacity of one disk) is sacrificed, where n equals the number of drives in the array. For example, if you have six 9-GB disks, you have 45 GB of usable storage space. To maintain parity, one write of data is translated into two writes and two reads in the RAID-5 array; thus, overall performance is degraded.

The advantage of a RAID-5 solution is that it is reliable and uses disk space more efficiently than RAID-1 and RAID-0+1.
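The capacity trade-offs described in this section reduce to simple arithmetic. This Python sketch (the function name is hypothetical) reproduces the six 9-GB disk example. It ignores formatting overhead, hot spares, and controller-specific layouts.

```python
# A minimal sketch of usable capacity for the RAID levels in this section.
# It ignores formatting overhead, hot spares, and controller-specific layouts.

def usable_gb(level, disks, disk_gb):
    """Usable capacity in GB for `disks` identical disks of `disk_gb` GB each."""
    if level == "RAID-0":
        return disks * disk_gb                 # striping only, no redundancy
    if level in ("RAID-1", "RAID-0+1"):
        if disks < 2 or disks % 2:
            raise ValueError("mirroring needs an even number of disks")
        return disks // 2 * disk_gb            # half the disks hold the mirror copy
    if level == "RAID-5":
        if disks < 3:
            raise ValueError("RAID-5 needs at least three disks")
        return (disks - 1) * disk_gb           # one disk's worth of space holds parity
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    for level in ("RAID-0", "RAID-0+1", "RAID-5"):
        print(level, usable_gb(level, 6, 9))   # six 9-GB disks: 54, 27, 45
```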

For more information on comparing RAID solutions and RAID levels, as well as Storage Area Network (SAN) and Network Attached Storage (NAS) solutions, see the Storage Solutions for Microsoft Exchange 2000 Server white paper.

Disk Performance Issues: Approach Two

The second approach to determining if you are encountering a disk bottleneck requires looking at the I/O requests waiting to be completed using the following disk queue counters.

PhysicalDisk(drive:)\Avg. Disk Queue Length
PhysicalDisk(drive:)\Current Disk Queue Length

The PhysicalDisk(drive:)\Avg. Disk Queue Length counter indicates the average queue length over the sampling interval. The PhysicalDisk(drive:)\Current Disk Queue Length counter reports the queue length value at the instant of sampling.

You are encountering a disk bottleneck if the average disk queue length is greater than the number of spindles on the array and the current disk queue length never equals zero. Short spikes in the queue length can drive up the queue length average artificially, so you must monitor the current disk queue length. If it drops to zero periodically, the queue is being cleared and you probably do not have a disk bottleneck.

Note When using this approach, correlate the queue length spikes with the MSExchangeIS\RPC Requests counter to confirm the effect on clients.
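The Approach Two rule combines two conditions that can be checked together. This Python sketch assumes you have exported the Avg. Disk Queue Length value and a series of Current Disk Queue Length samples for the same drive and period. The function name is hypothetical.

```python
# A minimal sketch of the Approach Two rule. `avg_queue` is the
# Avg. Disk Queue Length value and `current_samples` are Current Disk Queue
# Length samples taken over the same period (assumed exported from a counter log).

def queue_indicates_bottleneck(avg_queue, current_samples, spindles):
    """True when the average queue exceeds the spindle count AND the current
    queue never drains to zero during the period."""
    if not current_samples:
        raise ValueError("need at least one Current Disk Queue Length sample")
    never_drains = min(current_samples) > 0
    return avg_queue > spindles and never_drains


if __name__ == "__main__":
    # Hypothetical six-spindle array: the second series drains to zero once.
    print(queue_indicates_bottleneck(8.0, [5, 9, 12, 7], 6))
    print(queue_indicates_bottleneck(8.0, [5, 0, 12, 7], 6))
```

Checking the minimum of the current samples reflects the guidance that a queue which periodically drops to zero is being cleared, even if short spikes have inflated the average.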

Disk Performance Issues: Approach Three

For the third approach to determining if you are encountering a disk bottleneck, look at the I/O latency, which can give you an indication of the health of your disks:

PhysicalDisk(drive:)\Avg. Disk sec/Read
PhysicalDisk(drive:)\Avg. Disk sec/Write

A typical range is 0.005 to 0.020 seconds for random I/O. If write-back caching is enabled in the array controller, the PhysicalDisk(drive:)\Avg. Disk sec/Write counter should be less than 0.002 seconds.

If these counters are between 0.020 and 0.050 seconds, there is the possibility of a disk bottleneck. If the counters are above 0.050 seconds, there is definitely a disk bottleneck.
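The latency ranges above can be applied directly to exported Avg. Disk sec/Read or Avg. Disk sec/Write values. This is an illustrative Python sketch; the function names and labels are hypothetical.

```python
# A minimal sketch that applies the latency ranges above to
# Avg. Disk sec/Read or Avg. Disk sec/Write values (in seconds).

def classify_latency(seconds):
    if seconds > 0.050:
        return "bottleneck"
    if seconds > 0.020:
        return "possible bottleneck"
    return "typical"   # at or below 0.020 s (0.005-0.020 s is typical for random I/O)


def write_back_cache_ok(write_seconds):
    """With write-back caching enabled, Avg. Disk sec/Write should be under 0.002 s."""
    return write_seconds < 0.002


if __name__ == "__main__":
    for sample in (0.010, 0.035, 0.080):
        print(sample, classify_latency(sample))
```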

What is Causing the I/O?

After determining that you have a disk problem, you may want to determine what is causing the I/O. First, identify the drive on which the I/O is occurring. If you place the various Exchange files on separate volumes, you can more easily identify whether the paging file, the .edb (Exchange database) file, the .stm (Exchange streaming database) file, the .log (transaction log) files, or the routing queue files are causing the I/O.

In Windows 2000, you can use these counters to help determine which process is causing the disk I/O:

Process(process name)\IO Read Operations/sec
Process(process name)\IO Write Operations/sec

Second, you can use the Sysinternals File Monitor tool to determine which file or files are showing I/O activity. Select the logical disks that need investigation and show all disk reads and writes. This is particularly useful for multi-use disks, such as C:\, which may hold several major files that are used by the system or by applications.

The following figure illustrates the Sysinternals File Monitor tool.

Figure 16: Sysinternals File Monitor tool output showing the I/O going to priv1.stm and priv1.edb


Memory Problems

When investigating memory problems, the first counter to use to monitor physical memory usage is Memory\Available MBytes. If this counter goes below 4 megabytes, Windows aggressively starts trimming the working sets of running processes. The server is generally healthy if Memory\Available MBytes is greater than 4 megabytes.

Primary Counters

The following counters are the primary counters to use when investigating memory problems. They help you determine whether there are paging problems. These counters provide information about hard page faults: faults that require pages to be read from or written to the disk.

Memory\Pages/sec
Memory\Page Reads/sec
Memory\Page Writes/sec

Note Some paging I/O is normal because Exchange 2000 uses the Windows system cache to back the .stm file.

Additional Counters

There are additional counters you can use to further investigate memory problems:

Memory\Page Faults/sec
Memory\Cache Faults/sec
Memory\Transition Faults/sec
Process(process)\Page Faults/sec

The Memory\Page Faults/sec counter is often not an indication of a memory problem because it also includes the Memory\Cache Faults/sec counter, and cache faults are a normal part of Exchange 2000 operation due to the .stm file. Also, both the Memory\Page Faults/sec counter and the Memory\Cache Faults/sec counter include transition faults, which are indicated by the Memory\Transition Faults/sec counter. Transition faults are faults that do not go to the disk because the memory manager has the pages on the standby list.

The Process(process)\Page Faults/sec counter can be useful to identify processes with high page faults. Using System Monitor, add processes in a histogram view to quickly identify the process with high page faults. This is similar to the technique used to identify processes consuming the CPU, which is described in the CPU problems section earlier in this article.

Note This counter should be used as a guide. Page faults do not necessarily indicate a memory problem. However, a process with high page faults is probably also generating a lot of page read and write operations.

Where Did The Memory Go?

To determine where memory is being used, monitor the following counters, which are the most likely suspects (starting at the top with the most likely) for memory consumption.

Process(store)\Working Set
Process(inetinfo)\Working Set
Process(emsmta)\Working Set
Memory\Cache Bytes

The Exchange store process, indicated by the Process(store)\Working Set counter, tends to consume most of the committed bytes because the Exchange store maintains a large cache. You can use the Memory\Committed Bytes counter to confirm this.

You can also use the histogram view technique in System Monitor to identify the processes with large working sets. This technique is discussed in the CPU problems section earlier in this article.

Virtual Memory

One of the most problematic areas of Exchange scaling is the lack of virtual memory in the store.exe process. As you scale a server to accommodate more users and more usage, the server may run low on virtual memory. This problem is signified by the presence of 9582 events in the Application log. In some cases, these events are informational or routine, and can be ignored. In other cases, the lack of virtual memory can cause severe performance degradation and message processing errors (signaled by 12800 events). The following are examples of 9582 events.

The Information Store service logs the following events if the virtual memory for your Exchange 2000 server becomes excessively fragmented:

EventID=9582
Severity=Warning
Facility=Perfmon
Language=English
The virtual memory necessary to run your Exchange server is fragmented in
such a way that performance may be affected. It is highly recommended that
you restart all Exchange services to correct this issue.

Note This warning is logged if the largest free block is smaller than 32 MB.

EventID=9582
Severity=Error
Facility=Perfmon
Language=English
The virtual memory necessary to run your Exchange server is fragmented in
such a way that normal operation may begin to fail. It is highly
recommended that you restart all Exchange services to correct this issue.

Note This error is logged if the largest free block is smaller than 16 MB.

There is virtually no correlation between physical memory and virtual memory. Errors indicating that you are out of virtual memory cannot be solved by adding more physical memory. Additionally, errors indicating that you are out of virtual memory and virtual memory fragmentation are not just a feature of active/active clustering; active/passive clusters and even standalone machines can suffer from virtual memory issues as well. However, you will notice virtual memory issues more frequently on clusters because standalone servers are not usually scaled to multiple thousands of users.

To troubleshoot virtual memory problems

  1. Determine if your server is running Windows 2000 Server or Windows 2000 Advanced Server. If your server is running Windows 2000 Server, ensure that the /3GB switch is not in the boot.ini file. If your server is running Windows 2000 Advanced Server and there is more than 1GB of physical RAM installed, ensure that the /3GB switch does appear in the boot.ini file.

    Note Because Windows 2000 Server does not support the /3GB tuning switch, do not attempt to scale individual servers to host multiple thousands of users.

  2. Check the Application log for 9582 warnings (largest free virtual memory block smaller than 32 MB) or 9582 errors (largest free block smaller than 16 MB). On some large systems, it is usual to drop below the 32 MB threshold during peak activity; however, the available virtual memory should rise significantly during non-peak activity.

  3. Check the Application log for other errors that indicate that you are out of memory (such as 12800 MIME processing errors) in addition to 9582 warnings. If the warnings are accompanied by other errors indicating that you are out of memory, users may be unable to access mail. If no other processing errors occur and users are able to access their mail, the 9582 warnings may be relatively harmless. However, 9582 warnings should still be investigated for possible action.

  4. Monitor the MSExchangeIS\VM Largest Block Size counter. This counter is the best way to investigate virtual memory issues. You can monitor this counter in real time, or log it at 1-minute intervals. Collect 18 to 24 hours of data to determine whether a trend indicates that memory is being released. Monitor the minimum value to see how far it drops. On large servers, a minimum value of around 55 MB can be normal.

  5. Be aware that other store-related processes, such as virus scanning, can tip the threshold. As long as user performance is not affected and the largest virtual memory block grows again during non-peak activity, corrective action may not be necessary. However, if you expect user load to increase, you may want to reduce overall virtual memory consumption so that the server can accommodate a greater load.

  6. To reduce virtual memory consumption, first ensure that the Exchange server is running Exchange 2000 Server Service Pack 3 (SP3); Exchange SP3 has specific virtual memory optimizations.

  7. If 9582 warnings are still being logged, make the following registry change. This change is acceptable as long as there is an adequate amount of RAM available on the server. Monitor the Memory\Available Bytes counter and make sure that it indicates more than 200 MB. Then set HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\HeapDeCommitFreeBlockThreshold to 262144.
  8. At this point, if you are still experiencing virtual memory issues, you have either a heavily loaded system or a memory leak. If you suspect a memory leak, monitor the Process(store)\Private Bytes counter to determine whether it grows over time. If you suspect the system is overloaded, you will likely also see other indications, such as high CPU utilization.

  9. If the 9582 warnings have not stopped, look at the PhysicalDisk(drive:)\Avg. Disk Queue Length counter for the database and transaction log drives (run diskperf -y at the command prompt to enable these counters). The disk queue length should not be consistently above the number of spindles in the array. You will see peaks, and peaks into the low hundreds are acceptable. If the disk queue length is consistently at 300 or higher, you may have a disk bottleneck.

  10. If you reach this point and are still encountering virtual memory issues, you must further reduce virtual memory consumption. For example, review the storage group and database configuration; reducing it to three storage groups may further reduce virtual memory consumption.
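The 9582 thresholds and the Private Bytes trend check described in the steps above can be partly automated if you log MSExchangeIS\VM Largest Block Size and Process(store)\Private Bytes to a file. The following Python sketch is illustrative only. It assumes VM Largest Block Size is reported in bytes, the function names are hypothetical, and a positive growth trend suggests a possible leak but does not prove one.

```python
# A minimal sketch for reviewing logged virtual memory counters.
# Assumption: MSExchangeIS\VM Largest Block Size samples are in bytes.
MB = 1024 * 1024


def classify_vm_block(largest_block_bytes):
    """Map one VM Largest Block Size sample to the 9582 thresholds."""
    if largest_block_bytes < 16 * MB:
        return "error"    # 9582 error: largest free block below 16 MB
    if largest_block_bytes < 32 * MB:
        return "warning"  # 9582 warning: largest free block below 32 MB
    return "ok"


def worst_vm_block(samples):
    """Return (minimum sample, its classification) for a series such as
    18-24 hours of 1-minute samples."""
    low = min(samples)
    return low, classify_vm_block(low)


def growth_per_hour(samples, interval_minutes=1.0):
    """Least-squares slope of a counter series (for example,
    Process(store)\\Private Bytes) in counter units per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [i * interval_minutes / 60.0 for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    vm = [60 * MB, 40 * MB, 20 * MB, 50 * MB]  # hypothetical samples
    print(worst_vm_block(vm))
    print(growth_per_hour([100 + 10 * i for i in range(61)]))
```

A least-squares slope is used rather than comparing the first and last samples so that a single peak or dip near either end of the log does not dominate the result.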

Other Problems

If you have investigated Exchange, the CPU, the physical disks, and memory, but the server is still running slowly, you can also investigate the following items.

Active Directory

Exchange 2000 Server is dependent on Active Directory. You can investigate CPU, disk, and memory bottlenecks on your Active Directory servers. Most techniques used to identify and investigate problems with Exchange 2000 servers are equally applicable to Windows 2000 Active Directory servers.

DSAccess

DSAccess is the cache on the Exchange server that stores the results of frequent Active Directory queries made by the Exchange server. Because this information is cached, the Exchange server does not have to contact an Active Directory server each time a query is needed. The following counters are useful for investigating problems with DSAccess:

MSExchangeDSAccess Caches\Cache Hits/Sec
MSExchangeDSAccess Caches\LDAP Searches/Sec

You should compare the current data from these counters with baseline data from other servers that are operating normally.
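You can compare current DSAccess values with a baseline by computing the relative change of each counter. In this Python sketch, the 25 percent tolerance, the function name, and the sample values are hypothetical. Choose a tolerance based on how much the counters vary on your healthy servers.

```python
# A minimal sketch: flag counters whose current value differs from a
# baseline by more than a tolerance. Values and tolerance are hypothetical.

def compare_to_baseline(current, baseline, tolerance=0.25):
    """Return {counter: relative change} for counters outside the tolerance."""
    flagged = {}
    for name, base in baseline.items():
        if name not in current or base == 0:
            continue  # nothing to compare, or no meaningful relative change
        change = (current[name] - base) / base
        if abs(change) > tolerance:
            flagged[name] = round(change, 3)
    return flagged


if __name__ == "__main__":
    baseline = {"MSExchangeDSAccess Caches\\Cache Hits/Sec": 400.0,
                "MSExchangeDSAccess Caches\\LDAP Searches/Sec": 20.0}
    current = {"MSExchangeDSAccess Caches\\Cache Hits/Sec": 380.0,
               "MSExchangeDSAccess Caches\\LDAP Searches/Sec": 60.0}
    print(compare_to_baseline(current, baseline))
```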

Network Problems

Network problems can result in information not getting to the Exchange server. The following counters are useful for investigating network problems:

Network Interface(netcard)\Bytes Received/sec
Network Interface(netcard)\Bytes Sent/sec

In datacenter environments, or in environments in which there are high bandwidth connections, network problems are rare. However, you could possibly create a network problem by, for example, scheduling backup operations during the day when you should have scheduled them at night.
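To judge whether the byte counters above indicate a saturated link, you can convert them to a utilization percentage. This Python sketch assumes the link speed is taken from the Network Interface(netcard)\Current Bandwidth counter, which is reported in bits per second. The function name and the duplex handling are illustrative assumptions.

```python
# A minimal sketch: estimate link utilization from Network Interface counters.
# Assumption: on a full-duplex link each direction has the full bandwidth;
# on a shared half-duplex segment both directions share it.

def utilization_percent(received_bytes_per_sec, sent_bytes_per_sec,
                        link_bits_per_sec, full_duplex=True):
    if link_bits_per_sec <= 0:
        raise ValueError("link bandwidth must be positive")
    if full_duplex:
        busiest = max(received_bytes_per_sec, sent_bytes_per_sec)
    else:
        busiest = received_bytes_per_sec + sent_bytes_per_sec
    return 100.0 * busiest * 8 / link_bits_per_sec  # bytes -> bits


if __name__ == "__main__":
    # Hypothetical 100-Mbps adapter receiving 6,250,000 bytes/sec.
    print(utilization_percent(6_250_000, 0, 100_000_000))
```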

Using Network Monitor

If client traffic is not getting to your Exchange server, you can use the Network Monitor tool to examine the traffic. Network Monitor is a network diagnostic tool that monitors local area networks and provides a graphical display of network statistics. While collecting information from the network's data stream, Network Monitor displays the following types of information:

  • The source address of the computer that sent a frame to the network (this address is a unique hexadecimal (or base-16) number that identifies that computer on the network)

  • The destination address of the computer that received the frame

  • The protocols used to send the frame

  • The data, or a portion of the message being sent

The process by which Network Monitor collects this information is called capturing. By default, Network Monitor gathers statistics on all the frames it detects on the network into a capture buffer, which is a reserved storage area in memory. To capture statistics on only a specific subset of frames, you can single out these frames by designing a capture filter. When you have finished capturing information, you can design a display filter to specify how much of the captured information is displayed in Network Monitor's Frame Viewer window.

To use Network Monitor, your computer must have a network card that supports promiscuous mode. If you are using Network Monitor on a remote machine, the local workstation does not need a network adapter card that supports promiscuous mode, but the remote computer does.

Once data has been captured either locally or remotely, the data can be saved to a text or a capture file, and can be opened and examined later.

Note To fully troubleshoot possible network issues using Network Monitor, consider configuring Network Monitor to capture not only what the client sends and receives, but also to capture what the server is sending and receiving. Performing both a client and server-side trace of network traffic further helps you troubleshoot network issues.

Creating an Address List

To use address pairs in a capture filter, you should first build an address database. After this database is built, you can use the addresses listed in the database to specify address pairs in a capture filter.

To create an address list

  1. From the Capture menu, select Start. Optionally, open a .cap file in the Frame Viewer window.

  2. When you finish capturing information, select Stop and View from the Capture menu to display the Frame Viewer window.

  3. From the Display menu, select Find All Names. Network Monitor processes the frames and then adds them to the address database.

  4. Close the Frame Viewer window, and display the Capture window.

  5. From the Capture menu, select Filter to display the Capture Filter dialog box.

  6. In the Capture Filter dialog box, double-click Address Pairs. Or, click Address in the Add box.

  7. Network Monitor displays the address database you created. You can use the names in this database to specify address pairs in the capture filter.

To monitor traffic between two computers

  1. From the Capture menu, select Filter to display the Capture Filter dialog box.

  2. Double-click ANY<->ANY to display the Address Expression dialog box.

  3. In the left window of the Address Expression dialog box, select the address of a computer.

  4. In the right window of the Address Expression dialog box, select the address of a computer.

  5. In Direction, select one of the symbols:

    • Select the <--> symbol to monitor the traffic that passes in either direction between the addresses that you selected.

    • Select the --> symbol to monitor only the traffic that passes from the address selected in the left window to the address selected in the right window.

    • Choose the <-- symbol to monitor only the traffic that passes from the address selected in the right window to the address selected in the left window.

  6. Click OK.

  7. In the Capture Filter dialog box, click OK.

  8. From the Capture menu, choose Start.

Tracing in a WAN Environment

When troubleshooting network problems, you may need to create a capture of network traffic between two specific computers that are separated by one or more routers. In this case, you may want to analyze all network traffic between the first computer and its nearest router, and all network traffic between the second computer and its nearest router. Most of the time, this analysis is done to check whether network packets are being lost or corrupted somewhere between the routers. To make these traces consistent and to be able to read these traces simultaneously, the system clocks must be synchronized between the two computers before making the trace.

To synchronize time between two computers

  1. From the computer against which you want to synchronize the time, at the command prompt, type net time \\ComputerName /set /yes, where ComputerName is the name of the computer to which you want to synchronize.

  2. Verify the computers have the same time by typing TIME at the command prompt for each computer.

  3. Proceed with the trace.

Measuring Non-MAPI Requests

In the same way that you used the RPC counters to examine the use of the Exchange store by MAPI clients, such as Outlook, you can use another set of queue counters to examine the use of the Exchange store by POP3, IMAP4, SMTP, DAV, and NNTP clients. These counters are contained in the Epoxy performance object. They measure the queues through which information is passed from IIS to the Exchange store and then returned from the Exchange store to IIS.

Epoxy(protocol)\Client Out Que Len
Epoxy(protocol)\Store Out Que Len

The Epoxy(protocol)\Client Out Que Len counter indicates the number of requests waiting to be processed by the Exchange store, and the Epoxy(protocol)\Store Out Que Len counter indicates the number of requests waiting to be processed by the IIS protocol handlers. You can use these counters to investigate whether information is being successfully passed between IIS and the Exchange store.

Message Delivery Counters

The Exchange store gives user requests priority over mail delivery. If your servers begin to build delivery queues, this is a sign that the server is overloaded: user requests are arriving at such a high rate that the server cannot efficiently deliver e-mail. Use the following counters to monitor message delivery.

SMTP Server\Local Queue Length
SMTP Server\Messages Delivered/sec

The SMTP Server\Local Queue Length counter should not grow continuously. It grows during peak load periods, and anywhere from 0 to 1000 is a reasonable length. The SMTP Server\Messages Delivered/sec counter should show continuous delivery. Gaps of zero delivery followed by spikes of delivery indicate a bottleneck.
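You can screen exported counter samples for both delivery symptoms described above: a local queue that keeps growing, and zero-delivery gaps followed by spikes. The following Python sketch is a heuristic under stated assumptions. The spike threshold (three times the mean non-zero delivery rate), the rising-trend test, and the function names are hypothetical choices, not Exchange guidance.

```python
# A minimal heuristic sketch for SMTP delivery counters (hypothetical thresholds).

def delivery_gaps(delivered_per_sec, spike_factor=3.0):
    """Return (start, end) index ranges of zero-delivery runs that are
    immediately followed by a sample above spike_factor times the mean of
    the non-zero samples."""
    nonzero = [v for v in delivered_per_sec if v > 0]
    if not nonzero:
        return []
    threshold = spike_factor * sum(nonzero) / len(nonzero)
    gaps, start = [], None
    for i, value in enumerate(delivered_per_sec):
        if value == 0:
            if start is None:
                start = i          # a zero-delivery run begins
        else:
            if start is not None and value > threshold:
                gaps.append((start, i - 1))
            start = None           # any delivery ends the run
    return gaps


def queue_trend_rising(queue_lengths):
    """Crude check: the Local Queue Length never falls and ends higher than it started."""
    return (len(queue_lengths) >= 2
            and all(b >= a for a, b in zip(queue_lengths, queue_lengths[1:]))
            and queue_lengths[-1] > queue_lengths[0])


if __name__ == "__main__":
    print(delivery_gaps([5, 6, 5, 0, 0, 0, 40, 5, 6]))
    print(queue_trend_rising([10, 50, 200, 900]))
```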

Appendix

Performance Counters

The following are additional performance counters that can be used to monitor the health of your Exchange 2000 servers or to establish a baseline. They are grouped by their performance object area. When investigating a performance problem, you can use these counters to gather more information or add them to the minimum list of counters to use when establishing a baseline.

Note Some of the counters do not have recommended values, either because the values are specific to your organization or because the counters provide additional information only.

Database Counters

The following are Database (Exchange store) performance object counters. These counters are monitored using the Information Store instance.

Table 4 Database (Exchange store) counters

Counter

Description

Recommended Value

Database Cache Size

The Database Cache Size counter shows the amount of system memory used by the database cache manager to hold commonly used information from the database file(s) in order to prevent file operations. If the database cache size seems too small for optimal performance and there is very little available memory on the system (see Memory\Available Bytes), adding more memory to the system may increase performance. If there is a lot of available memory on the system and the database cache size is not growing beyond a certain point, the database cache size may be capped at an artificially low limit. Increasing this limit may increase performance.

The Jet DBA will grow to 900 megabytes by default. Monitor this counter and make sure that the Jet DBA does not exceed 900 megabytes.

Log Record Stalls/sec

Log Record Stalls/sec is the number of log records that cannot be added to the log buffers per second because they are full. If this counter is non-zero most of the time, the log buffer size may be a bottleneck.

This counter should remain at zero.

Log Threads Waiting

Log Threads Waiting is the number of threads waiting for their data to be written to the log in order to complete an update of the database. If this number is too high, the log may be a bottleneck.

This counter should not be too high.

Log Writes/sec

Log Writes/sec is the number of times the log buffers are written to the log file(s) per second. If this number approaches the maximum write rate for the media holding the log file(s), the log may be a bottleneck.

This counter should remain below the write rate for the media holding the log file(s).

Table Opens/sec

Table Opens/sec is the number of database tables opened per second.

This counter is a good indicator of how busy JET is.

Epoxy Counters

The following are Epoxy performance object counters.

Table 5 Epoxy counters

Counter

Description

Recommended Value

Client Out Que Len

Client Out Que Len indicates the number of requests waiting to be processed by the Exchange store.

This counter should be zero.

Store Out Que Len

Store Out Que Len indicates the number of requests waiting to be picked up by the IIS protocol handlers.

This counter should be zero.

LogicalDisk Counters

The following are LogicalDisk performance object counters.

Table 6 Logical Disk Counters

Counter

Description

Recommended Value

% Disk Time

% Disk Time is the percentage of elapsed time that the selected disk drive is busy servicing read or write requests. A sustained value above 90 percent indicates that the hard drive is a performance bottleneck.

This counter should remain below 90%.

% Free Space

% Free Space is the ratio of the free space available on the logical disk unit to the total usable space provided by the selected logical disk drive. A recommended threshold for % Free Space is 15%.

This counter should remain above 15%.

Avg. Disk Queue Length

Avg. Disk Queue Length is the average number of both read and write requests that were queued for the selected disk during the sample interval.

This counter should remain below 2 in normal operating conditions.

Avg. Disk sec/Read

Avg. Disk sec/Read is the average time in seconds of a read of data from the disk.

This counter should remain below the average read time specified by the manufacturer for the disk.

Avg. Disk sec/Write

Avg. Disk sec/Write is the average time in seconds of a write of data to the disk.

This counter should remain below the average write time specified by the manufacturer for the disk.

Avg. Disk sec/Transfer

Avg. Disk sec/Transfer is the time in seconds of the average disk transfer.

This counter should remain below the average transfer time specified for the disk.

Current Disk Queue Length

Interpreting this counter depends on the function of the logical disk being monitored. On most Exchange servers, there are two key logical disks, one for the databases and the other for the transaction logs. The Current Disk Queue Length must be interpreted differently for each.

The database volume can be subject to a burst of write operations every 30 seconds, with a maximum of 64 operations. Between two bursts, the only I/O activity is read operations. You will see peaks above the acceptable queue length, which is generally the number of spindles divided by 2, every 30 seconds. If the queue length between the peaks is larger than half the number of spindles, you are short on read I/O capacity and should add more spindles. To shorten the duration of the peak queue length, use write-back caching, increase the number of spindles, and, if the RAID array controller is not very powerful, consider changing from RAID-5 to RAID-0+1.

The transaction log volume should never have a queue length above 1, because the I/Os are synchronous and single-threaded. Do not assume that you do not have a disk performance problem if the queue length is not above 1. The queue length will never be above 1 in normal operations (not including backup operations). If a performance problem is detected on the log volume, you should employ a write-back cache.

This counter should remain below the number of spindles divided by 2 for Exchange database volumes, and below 1 for transaction log volumes.

Free Megabytes

Free Megabytes displays the unallocated space on the disk drive in megabytes.

Configure alerts on disks that contain Exchange databases or log files so that you are notified as soon as the disks approach capacity. Exchange will shut down if its log files or databases have no more space to grow.

Memory Counters

The following are Memory performance object counters.

Table 7 Memory Counters

Counter

Description

Recommended Value

Available Bytes

Available Bytes shows the amount of physical memory, in bytes, available to processes running on the computer.

Microsoft recommends keeping this counter above 4,000 KB (about 4 MB).

Committed Bytes

Committed Bytes displays the size of virtual memory, in bytes, that has been committed (as opposed to simply reserved). Committed memory must have backing (disk) storage available, or must be assured never to need disk storage (because main memory is large enough to hold it). This is an instantaneous count, not an average over the time interval. The acceptable average range is less than the amount of physical RAM on the server. However, before drawing conclusions from this counter, check Memory\Pages/sec and Memory\Page Faults/sec. If Memory\Pages/sec is greater than 10 (a reasonable guideline that varies with disk hardware) and Memory\Page Faults/sec is greater than Memory\Cache Faults/sec, there is too much paging.

This counter should remain below the amount of physical RAM on the server.
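One reading of the guidance above, sketched in Python: Committed Bytes exceeding physical RAM is treated as evidence of a memory shortage only when the paging test also fails. The function names are hypothetical, and the inputs are assumed to be averages taken over the same sample interval.

```python
def too_much_paging(pages_per_sec: float,
                    page_faults_per_sec: float,
                    cache_faults_per_sec: float,
                    pages_limit: float = 10.0) -> bool:
    """Paging test described for Committed Bytes.

    pages_limit defaults to 10, the guideline in the text; adjust it for
    your disk hardware.
    """
    return pages_per_sec > pages_limit and page_faults_per_sec > cache_faults_per_sec


def memory_shortage_confirmed(committed_bytes: int,
                              physical_ram_bytes: int,
                              pages_per_sec: float,
                              page_faults_per_sec: float,
                              cache_faults_per_sec: float) -> bool:
    """Committed Bytes above RAM counts as a problem only if paging confirms it."""
    over_ram = committed_bytes > physical_ram_bytes
    return over_ram and too_much_paging(pages_per_sec, page_faults_per_sec,
                                        cache_faults_per_sec)
```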

Page Faults/sec

Page Faults/sec is the overall rate at which faulted pages are handled by the processor. A page fault occurs when a process requires code or data that is not in its working set (its space in physical memory). This counter includes both hard faults (those that require disk access) and soft faults (in which the faulted page is found elsewhere in physical memory). Most processors can handle large numbers of soft faults without consequence. However, hard faults can cause significant delays.

This counter should never show a consistently high value.

Pages/sec

Pages/sec is the number of pages read from or written to disk to resolve hard page faults. (Hard page faults occur when a process requires code or data that is not in its working set, or elsewhere in physical memory, and must be retrieved from disk.) This counter was designed as a primary indicator of the kinds of faults that cause system-wide delays. It is the sum of Memory\Pages Input/sec and Memory\Pages Output/sec. It is counted in numbers of pages, so it can be compared to other counts of pages, such as Memory\Page Faults/sec, without conversion. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and in non-cached mapped memory files.

Microsoft recommends keeping this counter below 20; the ideal target is 5 pages/sec. Once this counter averages consistently at 10 or above, performance is significantly degraded and disk thrashing is probably occurring.
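The relationship between Pages/sec and its two components, together with the thresholds above, can be sketched as follows. The band names ("ideal", "acceptable", and so on) are labels invented for this example, and the input is assumed to be a list of Pages/sec samples taken from a counter log.

```python
from statistics import mean


def pages_per_sec(pages_input_per_sec: float, pages_output_per_sec: float) -> float:
    """Memory\\Pages/sec is the sum of Pages Input/sec and Pages Output/sec."""
    return pages_input_per_sec + pages_output_per_sec


def classify_pages_per_sec(samples: list[float]) -> str:
    """Classify the average Pages/sec value against the guidelines in the text."""
    avg = mean(samples)
    if avg <= 5:
        return "ideal"
    if avg < 10:
        return "acceptable"
    if avg < 20:
        return "degraded"  # below the recommended limit, but thrashing is likely
    return "over limit"
```

Classifying the average rather than individual samples follows the text's emphasis on a value that averages consistently at 10 or above; brief spikes alone do not move the result much.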

Pool Nonpaged Bytes

Pool Nonpaged Bytes is the number of bytes in the nonpaged pool, an area of system memory (physical memory used by the operating system) for objects that cannot be written to disk, but must remain in physical memory as long as they are allocated.

This counter should remain level. If this counter is steadily increasing, look for a memory leak.
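Steady growth is easier to judge from a trend line than from individual samples. A minimal sketch, assuming evenly spaced samples and a hypothetical growth threshold of 1,000,000 bytes per hour (tune it against your baseline). The same check could be applied to the Process\Handle Count counters. The function names are hypothetical.

```python
def slope_per_second(samples: list[float], interval_sec: float) -> float:
    """Least-squares slope of evenly spaced counter samples, in units per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [i * interval_sec for i in range(n)]
    x_mean = sum(xs) / n
    y_mean = sum(samples) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, samples))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def looks_like_leak(samples: list[float], interval_sec: float,
                    min_growth_per_hour: float = 1_000_000) -> bool:
    """Flag a counter (for example, Pool Nonpaged Bytes) whose trend keeps rising."""
    return slope_per_second(samples, interval_sec) * 3600 > min_growth_per_hour
```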

Pool Paged Bytes

Pool Paged Bytes is the number of bytes in the paged pool, an area of system memory (physical memory used by the operating system) for objects that can be written to disk when they are not being used.

On a server with more than 1,024 MB of RAM and the /3GB switch set, this counter usually stops increasing at 196 MB (270 MB without the /3GB switch). When this counter stops increasing, the server can become unresponsive. Growth in this value can indicate handle leaks (check the Process\Handle Count counters) or a growing SMTP queue.

MSExchangeIS Counters

The following are MSExchangeIS performance object counters.

Table 8 MSExchangeIS Counters

Counter

Description

Recommended Value

Active Connection Count

Active Connection Count indicates the number of connections to the Exchange store that have shown activity in the last 10 minutes.

The value of this counter will be specific to your organization.

Active User Count

Active User Count indicates the number of user connections that have shown activity in the last 10 minutes.

The value of this counter will be specific to your organization.

Connection Count

Connection Count indicates the number of client processes connected to the Exchange store.

The value of this counter will be specific to your organization.

RPC Averaged Latency

RPC Averaged Latency is the RPC latency in milliseconds averaged for the past 1024 packets.

This counter should remain in the 10 to 20 millisecond range during normal operations.
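The counter's description, an average over the most recent 1,024 packets, is a fixed-window moving average. The sketch below only illustrates that idea; it is not the Exchange store's internal implementation, the class and function names are hypothetical, and it treats 20 ms as the upper end of the normal range.

```python
from collections import deque


class MovingAverage:
    """Average of the most recent `window` samples (for example, RPC latencies in ms)."""

    def __init__(self, window: int = 1024) -> None:
        self._samples = deque(maxlen=window)
        self._total = 0.0

    def add(self, value: float) -> None:
        # When the window is full, the oldest sample is about to be evicted.
        if len(self._samples) == self._samples.maxlen:
            self._total -= self._samples[0]
        self._samples.append(value)
        self._total += value

    @property
    def average(self) -> float:
        return self._total / len(self._samples) if self._samples else 0.0


def within_guideline(avg_latency_ms: float, upper_ms: float = 20.0) -> bool:
    """True if the averaged latency is at or below the 20 ms upper end of the range."""
    return avg_latency_ms <= upper_ms
```

Because each new sample displaces the oldest one, a sudden latency spike moves a 1,024-sample average only slowly; a sustained rise is what pushes it out of range.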

RPC Operations/sec

RPC Operations/sec indicates the rate at which RPC operations occur. If Outlook is prompting users, this counter is likely to show significant spikes.

The value of this counter will be specific to your organization, but generally this counter should remain at 0 on 4-processor servers in normal operations.

RPC Requests

RPC Requests indicates the number of client requests that are currently being processed by the Exchange store.

This counter should not exceed 100.

User Count

User Count is the actual number of users (not connections) that are currently using the Exchange store. When interpreting performance measurements, always correlate them with the current number of users.

The value of this counter will be specific to your organization.

Virus Scan Queue Length

Virus Scan Queue Length indicates the current number of outstanding requests that are queued for virus scanning.

The value of this counter will be specific to your organization.

VM Largest Block Size

VM Largest Block Size displays the size, in bytes, of the largest free block of virtual memory. This counter appears as a line that slopes down as virtual memory is consumed. When this counter drops below 32 MB, Exchange 2000 logs a warning in the event log (Event ID 9582), and it logs an error if the value drops below 16 MB.

This counter should remain above 32 MB.
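A minimal sketch that maps a VM Largest Block Size sample onto the two event-log thresholds above, assuming that 1 MB means 1,048,576 bytes (the counter itself reports bytes). It only mirrors the documented thresholds; it does not read the event log, and the function name is hypothetical.

```python
MB = 1024 * 1024

# Thresholds described in the text for VM Largest Block Size.
WARNING_BYTES = 32 * MB  # below this, Exchange 2000 logs a warning (event 9582)
ERROR_BYTES = 16 * MB    # below this, Exchange 2000 logs an error


def vm_largest_block_status(largest_block_bytes: int) -> str:
    """Map a VM Largest Block Size sample to "ok", "warning", or "error"."""
    if largest_block_bytes < ERROR_BYTES:
        return "error"
    if largest_block_bytes < WARNING_BYTES:
        return "warning"
    return "ok"
```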

VM Total 16MB Free Blocks

VM Total 16MB Free Blocks displays the total number of free virtual memory blocks that are greater than or equal to 16 MB. This line forms a pyramid as you monitor it: it starts with one block of virtual memory greater than 16 MB and progresses to smaller blocks that are still greater than 16 MB. Monitoring the trend on this counter should allow a system administrator to predict when the number of 16 MB blocks is likely to drop below 3, at which point restarting all the services on the node is recommended.

This counter should remain above three 16 MB blocks.
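The prediction described above can be made with a simple linear fit. This sketch assumes the decline is roughly linear over the window you sample, which will not hold for the whole pyramid-shaped curve, so fit only the declining part and treat the result as a rough estimate. The function name and the (hours, block count) sample format are hypothetical.

```python
from typing import Optional


def hours_until_threshold(samples: list[tuple[float, float]],
                          threshold: float = 3.0) -> Optional[float]:
    """Estimate hours from the last sample until the fitted trend reaches `threshold`.

    samples: (hours_since_start, VM Total 16MB Free Blocks) pairs.
    Returns None when the trend is flat or rising (no crossing predicted).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = sum(t for t, _ in samples) / n
    c_mean = sum(c for _, c in samples) / n
    num = sum((t - t_mean) * (c - c_mean) for t, c in samples)
    den = sum((t - t_mean) ** 2 for t, _ in samples)
    slope = num / den
    if slope >= 0:
        return None
    intercept = c_mean - slope * t_mean
    t_cross = (threshold - intercept) / slope
    # If the trend line has already crossed, report zero hours remaining.
    return max(0.0, t_cross - samples[-1][0])
```

For example, counts of 10, 8, and 6 taken 24 hours apart give a trend that reaches 3 about 36 hours after the last sample.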

VM Total Free Blocks

VM Total Free Blocks displays the total number of free virtual memory blocks, regardless of size. This line forms a pyramid as you monitor it. This counter can be used to measure the degree to which available virtual memory is being fragmented. The average block size is the STORE instance of Process\Virtual Bytes divided by MSExchangeIS\VM Total Free Blocks.

 

VM Total Large Free Block Bytes

VM Total Large Free Block Bytes displays the sum, in bytes, of all the free virtual memory blocks that are greater than or equal to 16 MB. This line slopes down as memory is consumed. This counter monitors store memory fragmentation.

This counter should stay above 50 MB.

MSExchangeIS Mailbox Counters

The following are MSExchangeIS Mailbox performance object counters.

Table 9 MSExchangeIS Mailbox Counters

Counter

Description

Recommended Value

Active Client Logons

Active Client Logons indicates the number of clients that performed any action within the last 10 minutes.

The value of this counter will be specific to your organization.

Average Delivery Time

Average Delivery Time is the average time between the submission of a message to the mailbox store and submission to other storage providers for the last 10 messages.

Use this counter to record the delay time when load on the server is low. A high value could indicate a performance problem with the MTA.

Average Local Delivery Time

Average Local Delivery Time is the average time between the submission of a message to the mailbox store and the delivery to all local recipients (recipients on the same server) for the last 10 messages.

Use this counter to record delay time when server load is low. A high value could indicate a performance problem with the mailbox store. This counter should never remain at a non-zero value for longer than a few seconds.

Message Opens/sec

Message Opens/sec indicates the rate at which requests to open messages are submitted to the Exchange store.

The value of this counter will be specific to your organization.

Receive Queue Size

Receive Queue Size is the number of messages in the mailbox store's receive queue.

This counter should remain at zero during normal operations.

Send Queue Size

Send Queue Size is the number of messages in the mailbox store's send queue.

This counter should remain at zero during normal operations.

Local Delivery Rate

Local Delivery Rate indicates the rate at which messages are being delivered locally.

The value of this counter will be specific to your organization.

MSExchangeIS Public Counters

The following are MSExchangeIS Public performance object counters.

Table 10 MSExchangeIS Public Counters

Counter

Description

Recommended Value

Average Delivery Time

Average Delivery Time is the average time between the submission of a message to the public store and submission to other storage providers for the last 10 messages.

Use this counter to record the delay time when load on the server is low. A high value could indicate a performance problem with the MTA.

Average Local Delivery Time

Average Local Delivery Time is the average time between the submission of a message to the public store and the delivery to all local recipients (recipients on the same server) for the last 10 messages.

Use this counter to record delay time when server load is low. A high value could indicate a performance problem with the public store. This counter should never remain at a non-zero value for longer than a few seconds.

Folder Opens/sec

Folder Opens/sec indicates the rate at which requests to open folders are submitted to the Exchange store.

The value of this counter will be specific to your organization.

Message Opens/sec

Message Opens/sec is the rate at which requests to open messages are submitted to the Exchange store.

The value of this counter will be specific to your organization.

Receive Queue Size

Receive Queue Size is the number of messages in the public store's receive queue.

This counter should remain at zero during normal operations.

Send Queue Size

Send Queue Size is the number of messages in the public store's send queue.

This counter should remain at zero during normal operations.

Network Interface Counters

The following are Network Interface performance object counters. Monitor these counters for all instances.

Table 11 Network Interface Counters

Counter

Description

Recommended Value

Bytes Received/sec

Bytes Received/sec is the rate at which bytes are received on the interface, including framing characters.

The value of this counter will be specific to your organization.

Bytes Sent/sec

Bytes Sent/sec is the rate at which bytes are sent on the interface, including framing characters.

The value of this counter will be specific to your organization.

Bytes Total/sec

Bytes Total/sec is the rate at which bytes are sent and received on the interface, including framing characters.

The value of this counter will be specific to your organization.

Output Queue Length

Output Queue Length indicates the length of the output packet queue. A queue length of 1 or 2 is often satisfactory. Longer queues indicate that the adapter is waiting for the network and thus cannot keep pace with the server.

This counter should remain at 2 or less.

Paging File Counters

The following are Paging File performance object counters.

Table 12 Paging File Counters

Counter

Description

Recommended Value

% Usage

% Usage indicates the amount of the paging file that is in use during the sample interval, as a percentage. A high value indicates that you may need to increase the size of your Pagefile.sys file or add more RAM.

Microsoft recommends keeping this value below 75 percent.

PhysicalDisk Counters

The following are PhysicalDisk performance object counters.

Table 13 Physical Disk Counters

Counter

Description

Recommended Value

% Disk Time

% Disk Time is the percentage of elapsed time that the selected disk drive is busy servicing read or write requests.

This counter should remain below 50%.

Avg. Disk sec/Read

Avg. Disk sec/Read is the average time in seconds of a read of data from the disk. Check the specified transfer rate for your hard disks to verify that this rate does not exceed the specifications. Some SCSI disks can handle 50 to 70 I/O operations per second.

This counter should remain below the manufacturer's specifications. A general threshold is below 20 milliseconds.

Avg. Disk sec/Transfer

Avg. Disk sec/Transfer indicates the average time, in seconds, that each disk transfer takes. A high value might indicate that the system is retrying requests due to lengthy queuing or, less commonly, a disk failure.

Watch this counter for significant variances from baseline data.

Avg. Disk sec/Write

Avg. Disk sec/Write is the average time in seconds of a write of data to the disk. Check the specified write rate for your hard disks to verify that this rate does not exceed specifications. Some SCSI disks can handle 50 to 70 I/O operations per second.

This counter should remain below the manufacturer's specifications. A general threshold is below 20 milliseconds.
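Because Avg. Disk sec/Read and Avg. Disk sec/Write report seconds, System Monitor shows a 20 millisecond read as 0.020. A small sketch of the conversion and the 20 ms check, assuming you have exported the samples to a list; the function names are hypothetical, and the threshold parameter lets you substitute your manufacturer's specification.

```python
from statistics import mean

# General threshold from the text; substitute your manufacturer's
# specification if it is stricter.
THRESHOLD_MS = 20.0


def to_milliseconds(avg_disk_sec: float) -> float:
    """Convert an Avg. Disk sec/Read or sec/Write value from seconds to milliseconds."""
    return avg_disk_sec * 1000.0


def latency_too_high(samples_sec: list[float], threshold_ms: float = THRESHOLD_MS) -> bool:
    """True if the average of the sampled latencies exceeds the threshold."""
    return to_milliseconds(mean(samples_sec)) > threshold_ms
```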

Current Disk Queue Length

For more information about this counter, see the description for LogicalDisk\Current Disk Queue Length.

 

Disk Transfers/sec

Disk Transfers/sec indicates the number of completed read and write operations per second. Use it to measure disk utilization by comparing it with the number of I/O operations per second that the disk can sustain; values over 50 percent of that capacity might indicate that the disk is becoming a bottleneck.

This counter should remain below 50 percent of the disk's rated I/O capacity.

Process Counters

The following are Process performance object counters. For each counter, select the Exchange processes that you want to monitor as the instances.

Table 14 Process Counters

Counter

Description

Recommended Value

% Processor Time

% Processor Time indicates the percentage of time the processor is running non-idle threads. You can use this counter to monitor the percentage of processor time that each Exchange service is using.

An average value that is below 20 percent indicates that the server is underused or services are down. An average value that is consistently above 75-80 percent indicates that the server is overburdened.

% User Time

% User Time is the percentage of elapsed time that this process's threads have spent executing code in user mode. Code executing in user mode cannot damage the integrity of the Windows NT Executive, Kernel, and device drivers.

 

Elapsed Time

Elapsed Time indicates the number of seconds a process has been running. It gives you a quick way to see whether a server or service has recently been restarted without looking through the event log. A zero value indicates a non-active process.

 

Handle Count

Handle Count is the total number of handles currently open by this process. This number is the sum of the handles currently open by each thread in this process.

The handles open by MAD, MTA, and Store should remain fairly constant. Inetinfo handles can grow radically during queue buildup.

Page Faults/sec

Page Faults/sec is the rate at which page faults occur in the threads executing in this process. A page fault occurs when a thread refers to a virtual memory page that is not in its working set in main memory.

Use this counter to monitor for processes lacking virtual memory.

Page File Bytes

Page File Bytes is the current number of bytes this process has used in the paging file(s). Paging files are used to store pages of memory used by the process that are not contained in other files. Paging files are shared by all processes, and lack of space in paging files can prevent other processes from allocating memory.

 

Pool Nonpaged Bytes

Pool Nonpaged Bytes is the number of bytes in the nonpaged pool, an area of system memory (physical memory used by the operating system) for objects that cannot be written to disk, but must remain in physical memory as long as they are allocated.

 

Private Bytes

Private Bytes is the current number of bytes this process has allocated that cannot be shared with other processes.

MAD, MTA and Store private bytes should remain fairly constant except when background tasks run. Inetinfo private bytes can grow radically during queue buildup.

Virtual Bytes

Virtual Bytes is the current size in bytes of the virtual address space the process is using. Use of virtual address space does not necessarily imply corresponding use of either disk or main memory pages. Virtual space is finite, and by using too much, the process can limit its ability to load libraries.

Virtual bytes should remain fairly constant for each process. This counter is most important for the Exchange store process, which has only 2 GB of virtual address space to work with, or 3 GB when running with the /3GB switch.

On a large server with the /3GB switch, this counter should stay below 2.8 GB.

By default, each process running on Windows 2000 Server has 2 GB of virtual address space available. If the Exchange store's virtual memory use approaches this limit, the store may encounter an out-of-memory condition.
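A minimal sketch of the headroom arithmetic, assuming GB means 1,073,741,824 bytes and that you read the STORE instance of Process\Virtual Bytes from a counter log. The 2 GB and 3 GB limits and the 2.8 GB guideline come from the text; the function names are hypothetical.

```python
GB = 1024 ** 3

# Guideline from the text for a large server running with the /3GB switch.
THREE_GB_GUIDELINE_BYTES = 2.8 * GB


def store_address_space_limit(three_gb_switch: bool) -> int:
    """Virtual address space available to the store process."""
    return 3 * GB if three_gb_switch else 2 * GB


def store_headroom_bytes(virtual_bytes: int, three_gb_switch: bool) -> int:
    """Remaining virtual address space for the STORE instance of Process\\Virtual Bytes."""
    return store_address_space_limit(three_gb_switch) - virtual_bytes


def exceeds_3gb_guideline(virtual_bytes: int) -> bool:
    """True when a /3GB server's store is at or above the 2.8 GB guideline."""
    return virtual_bytes >= THREE_GB_GUIDELINE_BYTES
```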

Working Set

Working Set is the current number of bytes in the Working Set of this process. The Working Set is the set of memory pages touched recently by the threads in the process. If free memory in the computer is above a threshold, pages are left in the Working Set of a process even if they are not in use. When free memory falls below a threshold, pages are trimmed from Working Sets. If they are needed they will then be soft-faulted back into the Working Set before they leave main memory.

MAD, MTA and Store working set should remain fairly constant except when background tasks run. Inetinfo working set can grow radically during queue buildup.

Processor Counters

The following are Processor performance object counters.

Table 15 Processor Counters

Counter

Description

Recommended Value

% Privileged Time

% Privileged Time is the percentage of non-idle processor time spent in privileged mode. % Privileged Time includes time spent servicing interrupts and deferred procedure calls (DPCs). A high rate of privileged time might be attributable to a large number of interrupts generated by a failing device. This counter displays the average busy time as a percentage of the sample time.

This counter should remain below 75%.

% Processor Time

% Processor Time indicates the percentage of time the processor is running non-idle threads. You can use this counter to monitor the percentage of processor time that each Exchange service is using.

An average value that is below 20 percent indicates the server is underused or services are down. An average value that is consistently above 75-80 percent indicates that the server is overburdened and you should consider moving users to another server.

% User Time

% User Time is the percentage of non-idle processor time spent in user mode.

This counter should remain below 75%.

Redirector Counters

The following are Redirector performance object counters.

Table 16 Redirector Counters

Counter

Description

Recommended Value

Bytes Total/sec

Bytes Total/sec is the rate at which the Redirector is processing data bytes. This includes all application and file data in addition to protocol information such as packet headers.

Compare the maximum throughput of your network card with the maximum value of this counter to see if network traffic is a bottleneck in your system.
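Network adapters are rated in bits per second, while Bytes Total/sec reports bytes, so the comparison above needs a factor of 8. A minimal sketch with a hypothetical function name; it ignores duplex mode and any overhead that the counter does not already include.

```python
def network_utilization_percent(bytes_total_per_sec: float, nic_bits_per_sec: float) -> float:
    """Percentage of the adapter's rated bandwidth consumed.

    Bytes Total/sec is reported in bytes, while adapters are rated in bits
    per second, so multiply by 8 before comparing.
    """
    return bytes_total_per_sec * 8 / nic_bits_per_sec * 100
```

For example, a peak of 6,250,000 bytes per second on a 100 Mbps adapter is 50 percent of its rated throughput.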

Network Errors/sec

Network Errors/sec measures the number of unexpected errors the redirector receives. If you suspect network problems, check to see whether this counter is above zero. If it is above zero, check the system event log for details on the network error.

This counter should remain zero.

Server Counters

The following are Server performance object counters.

Table 17 Server Counters

Counter

Description

Recommended Value

Bytes Total/sec

Bytes Total/sec is the number of bytes per second that the server has sent to and received from the network. This value provides an overall indication of how busy the server is.

If Bytes Total/sec is roughly equal to the maximum transfer rates of your network, you might need to segment the network.

Pool Nonpaged Bytes

Pool Nonpaged Bytes indicates the number of bytes of non-pageable computer memory the server is using.

 

Pool Nonpaged Failures

Pool Nonpaged Failures indicates the number of times allocations from the nonpaged pool have failed. If this number is high, there is too little RAM, the paging file is too small, or both. If this number is consistently increasing, increase the physical RAM and the size of the paging file.

 

Work Item Shortages

Work Item Shortages indicates the number of times STATUS_DATA_NOT_ACCEPTED was returned at receive indication time. This occurs when no work item is available or can be allocated to service the incoming request. This counter indicates whether the InitWorkItems or MaxWorkItems parameters might need to be adjusted.

If the value reaches the recommended threshold of 3, consider tuning the InitWorkItems or MaxWorkItems entries in the registry (in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\Parameters).

Server Work Queues Counters

The following are Server Work Queues performance object counters.

Table 18 Server Work Queues Counters

Counter

Description

Recommended Value

Active Threads

Active Threads is the number of threads currently working on a request from the server client for this CPU. The system keeps this number as low as possible to minimize unnecessary context switching. This is an instantaneous count for the CPU, not an average over time.

The value of this counter will be specific to your organization.

Bytes Sent/sec

Bytes Sent/sec is the rate at which the Server is sending bytes to the network clients on this CPU. This value is a measure of how busy the Server is.

The value of this counter will be specific to your organization.

Queue Length

Queue Length is the current length of the server work queue for this CPU. A sustained queue length greater than four might indicate processor congestion. This is an instantaneous counter; observe its value over several intervals.

This counter should remain below 4.
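Because Queue Length is an instantaneous counter, a single high sample means little. A minimal sketch that reports a sustained condition only when several consecutive samples exceed 4; the default of five consecutive samples is an assumption, not a documented value, and the function name is hypothetical. The same helper could be applied to System\Processor Queue Length with a threshold of 2.

```python
def sustained_above(samples: list[float], threshold: float = 4,
                    min_consecutive: int = 5) -> bool:
    """True if at least `min_consecutive` consecutive samples exceed `threshold`."""
    run = 0
    for value in samples:
        # Extend the current run of high samples, or reset it on a normal sample.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False
```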

Read Bytes/sec

Read Bytes/sec is the rate at which the server is reading data from files for the clients on this CPU. This value is a measure of how busy the Server is.

The value of this counter will be specific to your organization.

Write Bytes/sec

Write Bytes/sec is the rate at which the server is writing data to files for the clients on this CPU. This value is a measure of how busy the Server is.

The value of this counter will be specific to your organization.

Write Operations/sec

Write Operations/sec is the rate at which the server is performing file write operations for the clients on this CPU. This value is a measure of how busy the Server is. This value will always be 0 in the Blocking Queue instance.

The value of this counter will be specific to your organization.

SMTP Server Counters

The following are SMTP Server performance object counters.

Table 19 SMTP Server Counters

Counter

Description

Recommended Value

Categorizer Queue Length

Categorizer Queue Length indicates how well SMTP is processing LDAP lookups against global catalog servers. This value should be at or near zero, except while distribution lists (DLs) are being expanded, when it can occasionally go higher. This counter is an excellent indicator of the health of your global catalog servers (GCs): if your GCs are slow, this counter rises.

This counter should remain at or around zero.

Local Queue Length

Local Queue Length indicates the number of messages in the local SMTP queue.

The value of this counter will be specific to your organization.

Messages Delivered/sec

Messages Delivered/sec indicates the rate that messages are being delivered to local mailboxes.

The value of this counter will be specific to your organization.

Messages Received/sec

Messages Received/sec indicates the rate that messages are being received.

The value of this counter will be specific to your organization.

Messages Sent/sec

Messages Sent/sec indicates the rate that messages are being sent.

The value of this counter will be specific to your organization.

System Counters

The following are System performance object counters.

Table 20 System Counters

Counter

Description

Recommended Value

Processor Queue Length

Processor Queue Length indicates the number of threads in the processor queue. There is a single queue for processor time, even on computers with multiple processors. This counter shows ready threads only, not threads that are currently running. Microsoft recommends keeping this value at 2 or less.

This counter should remain at or below 2.

System Up Time

System Up Time is the elapsed time (in seconds) that the computer has been running since it was last started.

 

TCP Counters

The following are TCP performance object counters.

Table 21 TCP Counters

Counter

Description

Recommended Value

Segments Received/sec

Segments Received/sec indicates the rate at which segments are received, including those received in error. This count includes segments received on currently established connections.

A low value means that you have too much broadcast traffic.

Segments Retransmitted/Sec

Segments Retransmitted/sec indicates the rate at which segments containing one or more previously transmitted bytes are retransmitted.

A high value might indicate either a saturated network or a hardware problem.

Thread Counters

The following are Thread performance object counters.

Table 22 Thread Counters

Counter

Description

Recommended Value

% Processor Time

% Processor Time is the percentage of elapsed time that this thread used the processor to execute instructions.

Watch for threads that consume a high amount of processor time.

ID Thread

ID Thread is the unique identifier of this thread. ID Thread numbers are reused, so they only identify a thread for the lifetime of that thread.

 

Thread State

Thread State is the current state of the thread. It is 0 for Initialized, 1 for Ready, 2 for Running, 3 for Standby, 4 for Terminated, 5 for Wait, 6 for Transition, 7 for Unknown. A Running thread is using a processor; a Standby thread is about to use one. A Ready thread wants to use a processor, but is waiting for a processor because none are free. A thread in Transition is waiting for a resource in order to execute, such as waiting for its execution stack to be paged in from disk. A Waiting thread has no use for the processor because it is waiting for a peripheral operation to complete or a resource to become free.

 

Thread Wait Reason

Thread Wait Reason is only applicable when the thread is in the Wait state (see Thread State). It is 0 or 7 when the thread is waiting for the Executive, 1 or 8 for a Free Page, 2 or 9 for a Page In, 3 or 10 for a Pool Allocation, 4 or 11 for an Execution Delay, 5 or 12 for a Suspended condition, 6 or 13 for a User Request, 14 for an Event Pair High, 15 for an Event Pair Low, 16 for an LPC Receive, 17 for an LPC Reply, 18 for Virtual Memory, 19 for a Page Out; 20 and higher are not assigned at the time of this writing. Event Pairs are used to communicate with protected subsystems (see Context Switches).
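The numeric codes for Thread State and Thread Wait Reason are easier to read when translated. A minimal lookup sketch built directly from the values listed above; the names are hypothetical, and codes 20 and higher are reported as unassigned, as the text notes.

```python
THREAD_STATES = {
    0: "Initialized", 1: "Ready", 2: "Running", 3: "Standby",
    4: "Terminated", 5: "Wait", 6: "Transition", 7: "Unknown",
}

# Codes 0-6 and 7-13 carry the same meanings, per the description above.
_PAIRED_REASONS = ["Executive", "Free Page", "Page In", "Pool Allocation",
                   "Execution Delay", "Suspended", "User Request"]
WAIT_REASONS = {}
for code, reason in enumerate(_PAIRED_REASONS):
    WAIT_REASONS[code] = reason
    WAIT_REASONS[code + 7] = reason
WAIT_REASONS.update({
    14: "Event Pair High", 15: "Event Pair Low", 16: "LPC Receive",
    17: "LPC Reply", 18: "Virtual Memory", 19: "Page Out",
})

WAIT_STATE = 5


def describe_thread(state: int, wait_reason: int) -> str:
    """Translate Thread State and, for waiting threads, Thread Wait Reason."""
    name = THREAD_STATES.get(state, f"Unrecognized state {state}")
    if state != WAIT_STATE:
        return name  # Thread Wait Reason applies only to the Wait state
    return f"{name}: {WAIT_REASONS.get(wait_reason, 'Unassigned')}"
```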

 

Additional Resources

The following technical papers and Microsoft Knowledge Base articles provide valuable information about troubleshooting Exchange 2000 performance.

Technical Papers

The following technical papers are available on the Web in the Exchange 2000 section of TechNet:

Microsoft Knowledge Base Articles

The following Microsoft Knowledge Base articles are available on the Web at https://support.microsoft.com/:

For more information: https://www.microsoft.com/exchange/
