Ruling Out Memory-Bound Problems

07/25/2014

By design, Exchange Server is an aggressive memory user, being able to use up to 3 GB of physical memory. On a production server, it is common to see the Store.exe process taking 1.5 GB of virtual memory, because this process maintains large memory caches.

In addition to the memory used by various processes within Exchange, Exchange's ExIFS kernel driver also uses kernel memory. Although less visible, high kernel memory utilization can cause severe performance degradation and instability.

Looking at User Space Memory

As the server uses memory and free memory becomes scarce, the operating system starts trimming process working sets and using the page file more aggressively. Using the page file affects overall performance because disk operations take longer than memory operations.

Additionally, when the paging to and from disk gets high enough, eventually a disk bottleneck occurs and performance suffers. In this case, the real problem is memory, and the disk bottleneck is only a symptom.

Use the counters listed in the following table to determine the current state of the user space memory.

Performance Counters for User Space Memory

Counter	Description	Expected value
Memory\Available MBytes	Amount of physical memory, in MB, immediately available for allocation to a process or for system use. It equals the sum of the memory on the standby (cached), free, and zero page lists.	At least 50 MB available at all times during the test.
Memory\Pages/sec	Rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays. It includes pages retrieved to satisfy page faults in the file system cache, which are usually requested by applications.	Below 1,000 at all times.
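As a minimal sketch, the following Python function checks sampled values of the two counters against the expected values in the table above. It assumes the samples have already been collected (for example, exported from a Performance Monitor log) into plain Python lists; the function name and input shape are hypothetical.

```python
def check_user_space_memory(available_mb, pages_per_sec):
    r"""Return a list of problems found in sampled user space memory counters.

    available_mb  -- samples of Memory\Available MBytes, in MB
    pages_per_sec -- samples of Memory\Pages/sec
    Thresholds are the expected values from the table above.
    """
    problems = []
    low = min(available_mb)
    if low < 50:  # at least 50 MB must be available at all times
        problems.append(f"Available MBytes dropped to {low} MB (expected at least 50 MB)")
    high = max(pages_per_sec)
    if high >= 1000:  # Pages/sec must stay below 1,000
        problems.append(f"Pages/sec peaked at {high} (expected below 1,000)")
    return problems


# Example: one low-memory sample and one paging spike.
for problem in check_user_space_memory([120, 64, 48], [200, 1250, 300]):
    print(problem)
```

Checking the minimum and maximum is enough here because both expected values are stated as "at all times" limits.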

Improving User Space Memory

The following list describes how you can improve the performance of user space memory:

Remove superfluous software

To free up resources for Exchange, remove any third-party remote monitoring tools and any non-essential services from the server. Use the Performance snap-in to understand how much memory each application consumes.

Run maintenance tasks during off-peak hours

Running maintenance tools (such as eseutil) or tasks (such as mailbox management) during peak times can consume memory that Exchange would otherwise need. It is good practice to run these tools and tasks during off-peak hours or low-use periods (such as weekends).

Looking at Kernel Memory Usage

Windows kernel memory, which consists of several memory structures that are used by the core operating system, or kernel, is another area that must be monitored to ensure a healthy Exchange Server deployment. This section describes how to monitor and troubleshoot the kernel memory structures that affect Exchange server performance and reliability.

Three key kernel memory structures should be monitored on servers that are running Exchange:

Paged pool: The portion of shared system memory that can be paged to the disk paging file. Paged pool is created during system initialization and is used by kernel-mode components to allocate system memory.
Nonpaged pool: System virtual addresses that are guaranteed to be resident in physical memory at all times and can thus be accessed from any address space without incurring paging input/output (I/O). Like paged pool, nonpaged pool is created during system initialization and is used by kernel-mode components to allocate system memory.
System PTEs: Virtual memory addresses are mapped to physical memory addresses by means of page tables. Windows uses a pool of system Page Table Entries (PTEs) to map system pages such as I/O space, kernel stacks, and memory descriptor lists.

Boot.ini File Settings Affect the Size of Kernel Memory Spaces

The maximum size of the kernel memory spaces on an Exchange server can be affected by settings in the Microsoft Windows Server™ 2003 boot.ini file. For example, if you use the /3GB switch in the boot.ini file, 3 GB of virtual address space is allotted to each user-mode process, and only 1 GB of virtual address space is allotted to the operating system kernel.

For more information about using the /3GB switch, see the following Microsoft Knowledge Base articles:

823440, "Use of the /3GB switch in Exchange Server 2003 on a Windows Server 2003-based system" (https://go.microsoft.com/fwlink/?linkid=3052&kbid=823440)
316739, "How to use the /userva switch with the /3GB switch to tune the User-mode space to a value between 2 GB and 3 GB" (https://go.microsoft.com/fwlink/?linkid=3052&kbid=316739)
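The arithmetic behind these switches can be sketched in Python. This is a simplified model, assuming a 32-bit system with a 4 GB (4,096 MB) virtual address space and treating /USERVA as effective only together with /3GB, as described in article 316739; the function name is hypothetical. It shows why /USERVA=3030 gives the kernel 42 MB more address space than /3GB alone.

```python
TOTAL_VA_MB = 4096  # 32-bit virtual address space: 4 GB


def address_space_split(use_3gb=False, userva_mb=None):
    """Return (user_mode_mb, kernel_mode_mb) for the given boot.ini switches.

    Simplified model (assumption): without /3GB the space is split 2 GB / 2 GB;
    /3GB gives user mode 3 GB unless /USERVA lowers it to a value between
    2 GB and 3 GB. /USERVA is ignored here when /3GB is not set.
    """
    if not use_3gb:
        user = 2048
    elif userva_mb is None:
        user = 3072
    else:
        if not 2048 <= userva_mb <= 3072:
            raise ValueError("/USERVA must be between 2048 and 3072 MB")
        user = userva_mb
    return user, TOTAL_VA_MB - user


print(address_space_split())                               # (2048, 2048)
print(address_space_split(use_3gb=True))                   # (3072, 1024)
print(address_space_split(use_3gb=True, userva_mb=3030))   # (3030, 1066)
```

The extra kernel address space is what allows more system PTEs to be created, which is why the next tables show different kernel limits for each boot.ini configuration.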

The following table shows how the approximate maximum size of each kernel memory space on a server running Exchange varies with boot.ini settings. In this example, Exchange Server 2003 is running on a multiprocessor server running Windows Server 2003 with 4 GB of RAM.

Boot.ini Settings and Maximum Kernel Memory Space Sizes

Kernel memory space	Maximum size with default boot.ini options	Maximum size with boot.ini options /3GB and /USERVA=3030
Paged Pool	356 MB	245 MB
Nonpaged Pool	256 MB	128 MB
System PTEs	300,000 page table entries available	24,000 page table entries available

The following table shows Performance Monitor alert settings for Exchange Server running on a multiprocessor server with 4 GB of RAM and running Windows Server 2003. At the “warning” level, the server is stable, but memory allocations should be investigated for potential leaks. At the “critical” level, the server is in danger of becoming unstable, especially during load spikes.

Performance Monitor Alert Settings with Different Boot.ini File Settings

Kernel memory space	Performance Monitor counter	Triggers with default boot.ini	Triggers with boot.ini options /3GB and /USERVA=3030
Paged Pool	Memory\Pool Paged Bytes	Warning: above 300 MB; Critical: above 320 MB	Warning: above 200 MB; Critical: above 220 MB
Nonpaged Pool	Memory\Pool Nonpaged Bytes	Warning: above 200 MB; Critical: above 220 MB	Warning: above 100 MB; Critical: above 110 MB
System PTEs	Memory\Free System Page Table Entries *	Warning: below 8,000; Critical: below 5,000	Warning: below 8,000; Critical: below 5,000

* The Performance Monitor “Memory\Free System Page Table Entries” counter is inaccurate on installations of Windows Server 2003 without Service Pack 1. For more information about this counter, see Microsoft Knowledge Base article 894067 “The Performance tool does not accurately show the available Free System Page Table entries in Windows Server 2003” (https://go.microsoft.com/fwlink/?linkid=3052&kbid=894067).
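The triggers above can be encoded as a small lookup table. The following Python sketch classifies one counter sample as ok, warning, or critical; the configuration keys and function name are hypothetical, and it assumes 1 MB = 1,048,576 bytes when converting the byte-valued pool counters.

```python
MB = 1024 * 1024  # assumption: thresholds in MB are binary megabytes

# (warning, critical) triggers from the alert settings table.
# Pool values are in MB; free PTE values are entry counts.
TRIGGERS = {
    "default": {
        "Pool Paged Bytes": (300, 320),
        "Pool Nonpaged Bytes": (200, 220),
        "Free System Page Table Entries": (8000, 5000),
    },
    "3gb_userva3030": {
        "Pool Paged Bytes": (200, 220),
        "Pool Nonpaged Bytes": (100, 110),
        "Free System Page Table Entries": (8000, 5000),
    },
}


def classify(counter, value, config="default"):
    """Return 'ok', 'warning', or 'critical' for one counter sample.

    Pool counters are sampled in bytes and trigger when they exceed the
    threshold; free PTEs trigger when they fall below it.
    """
    warning, critical = TRIGGERS[config][counter]
    if counter == "Free System Page Table Entries":
        if value < critical:
            return "critical"
        if value < warning:
            return "warning"
        return "ok"
    mb = value / MB
    if mb > critical:
        return "critical"
    if mb > warning:
        return "warning"
    return "ok"


print(classify("Pool Paged Bytes", 310 * MB))                          # warning
print(classify("Pool Nonpaged Bytes", 105 * MB, "3gb_userva3030"))     # warning
print(classify("Free System Page Table Entries", 4000))                # critical
```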

Symptoms of Kernel Memory Exhaustion on Servers Running Exchange

Symptoms of kernel memory exhaustion on servers running Exchange range from sluggish response to outright failures.

Symptoms of Kernel Memory Exhaustion on Servers Running Exchange

Kernel Memory Space	Exhaustion Symptoms
Paged Pool	Sluggish or unresponsive user interface; message or client processing failures; paged pool allocation failures (Event ID 2020: “The server was unable to allocate from the system paged pool because the pool was empty.”). For more information, see Microsoft Knowledge Base article 312362, “Server is unable to allocate memory from the system paged pool” (https://go.microsoft.com/fwlink/?linkid=3052&kbid=312362).
Nonpaged Pool	Sluggish or unresponsive user interface; message or client processing failures; server fails to respond to network requests; nonpaged pool allocation failures (Event ID 2019: “The server was unable to allocate from the system nonpaged pool because the pool was empty.”)
System PTEs	Server fails to respond to I/O requests; server fails to respond to network requests; message or client processing failures

Troubleshooting Kernel Memory on Servers Running Exchange

If the Performance Monitor kernel memory counters listed in the table "Performance Monitor Alert Settings with Different Boot.ini File Settings" are outside the recommended values, or if you see the symptoms described in the table "Symptoms of Kernel Memory Exhaustion on Servers Running Exchange," use the following troubleshooting approach to determine the cause of the kernel memory exhaustion.

Run the Microsoft Exchange Server Best Practices Analyzer Tool to determine whether the Exchange server is correctly configured.

A variety of configuration settings affect Exchange Server kernel memory spaces (for example, the /3GB and /USERVA boot.ini switches discussed earlier, the SystemPages registry value, and others). Before investigating kernel memory exhaustion symptoms, it is critically important to ensure that the server's kernel memory is correctly configured. Run the Microsoft Exchange Best Practices Analyzer Tool and carefully review its output to ensure that the server configuration is correct. For more information about the tool, see "Exchange Server Best Practices Analyzer Tool" in the Microsoft Download Center.

Determine which kernel memory space is being exhausted.
- Analyze events: Review the Event Viewer logs for paged pool and nonpaged pool allocation failure events (Event IDs 2020 and 2019, respectively).
- Analyze performance: Use Performance Monitor to create a log of paged pool, nonpaged pool, and free system PTEs, with samples taken every 60 seconds for a period of 24 hours. Compare the logged values with the warning and critical triggers in the table "Performance Monitor Alert Settings with Different Boot.ini File Settings."
  
  When you analyze the Event Viewer and Performance Monitor logs using the three tables shown previously, the kernel memory space that is being exhausted should be readily apparent.
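A 24-hour log sampled every 60 seconds contains 1,440 samples per counter, which is easier to summarize programmatically than by eye. The following Python sketch assumes the log has been saved or converted to comma-separated (CSV) format, with the timestamp in the first column and one counter path per remaining column (the layout assumed here for Performance Monitor and typeperf CSV output). The function name is hypothetical. It reports the lowest and highest value of each counter so that you can compare them with the warning and critical triggers.

```python
import csv
import io


def summarize_log(csv_text):
    """Return {counter_path: (min, max)} for every counter column in the log.

    Assumes the first row holds the column headers (timestamp first, then one
    counter path per column) and each later row holds one sample. Blank or
    non-numeric cells, such as missed samples, are skipped.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    summary = {}
    for col, path in enumerate(header[1:], start=1):
        values = []
        for row in samples:
            try:
                values.append(float(row[col]))
            except (IndexError, ValueError):
                continue  # missing or blank sample
        if values:
            summary[path] = (min(values), max(values))
    return summary


# Hypothetical three-sample excerpt from a log for a server named EXCH01.
log = r'''"(PDH-CSV 4.0)","\\EXCH01\Memory\Pool Paged Bytes","\\EXCH01\Memory\Free System Page Table Entries"
"07/25/2014 10:00:00.000","251658240","9500"
"07/25/2014 10:01:00.000","262144000","7800"
"07/25/2014 10:02:00.000"," ","8100"
'''
for path, (low, high) in summarize_log(log).items():
    print(path, low, high)
```

For the pool counters, compare the maximum with the triggers; for free system PTEs, compare the minimum.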

Determine what is causing the kernel memory space exhaustion.

Determine which tag or tags in paged pool and/or nonpaged pool memory are causing the memory exhaustion condition.

The operating system uses tags to keep track of paged pool and nonpaged pool kernel memory allocations. Windows Server 2003 enables pool tagging by default; on Microsoft Windows 2000 Server, you must enable it with the Global Flags Editor utility (gflags.exe). For detailed information about using pool tags to troubleshoot memory, see Microsoft Knowledge Base article 177415, “How to Use Memory Pool Monitor (Poolmon.exe) to Troubleshoot Kernel Mode Memory Leaks” (https://go.microsoft.com/fwlink/?linkid=3052&kbid=177415) and Knowledge Base article 298102, “How to Find Pool Tags That Are Used By Third-Party Drivers” (https://go.microsoft.com/fwlink/?linkid=3052&kbid=298102).

Run Memory Pool Monitor (poolmon.exe) or the MemSnap memory profiling tool (memsnap.exe, with the /p switch) to dump the tags to a file, and then determine which tag is consuming the most paged pool or nonpaged pool memory. Running the MemSnap memory profiling tool at scheduled intervals can help to isolate kernel memory leaks and shows how allocations change over time. For more information about the MemSnap memory profiling tool, see "Memsnap Overview" (https://go.microsoft.com/fwlink/?LinkId=50167).

Example MemSnap Memory Snapshot

Tag Type	Allocs	Frees	Diff	Bytes	Per Alloc
AGP Nonp	1	0	1	344	344
AGP Paged	7	5	2	384	192
AcdN Nonp	2	0	2	1,072	536
AcpA Nonp	39	36	3	192	64
AcpA Paged	1	0	1	504	504
AcpB Paged	42	38	4	576	144
AcpD Nonp	315	170	45	15,080	335
AcpF Nonp	493	485	8	320	40

In the example shown in the table above, use the “Bytes” column to isolate which tag is consuming the most memory; in this example, it is AcpD Nonp. Use the “Allocs” and “Frees” columns to track potential memory leaks: a large difference between Allocs and Frees, especially one that keeps growing across successive snapshots, can indicate a memory leak.
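Both checks can be sketched in Python, assuming the MemSnap rows have already been parsed into simple records (the parsing step, the Row type, and the function names are hypothetical). top_consumer picks the tag with the largest Bytes value; growing_tags compares two snapshots taken at different times and reports tags whose outstanding allocations (Allocs minus Frees) grew by at least a chosen amount.

```python
from collections import namedtuple

# One MemSnap row: tag, pool type ("Nonp" or "Paged"), and counters.
Row = namedtuple("Row", "tag pool allocs frees bytes")


def top_consumer(rows):
    """Return the (tag, pool type) pair with the largest Bytes value."""
    top = max(rows, key=lambda r: r.bytes)
    return top.tag, top.pool


def growing_tags(earlier, later, min_growth):
    """Return [((tag, pool), growth)] for tags whose outstanding allocations
    (Allocs - Frees) grew by at least min_growth between two snapshots,
    largest growth first."""
    before = {(r.tag, r.pool): r.allocs - r.frees for r in earlier}
    grown = []
    for r in later:
        key = (r.tag, r.pool)
        growth = (r.allocs - r.frees) - before.get(key, 0)
        if growth >= min_growth:
            grown.append((key, growth))
    return sorted(grown, key=lambda item: item[1], reverse=True)


# Rows taken from the example snapshot above (Bytes column only matters here).
snapshot = [Row("AGP", "Nonp", 1, 0, 344), Row("AGP", "Paged", 7, 5, 384),
            Row("AcpD", "Nonp", 315, 170, 15080), Row("AcpF", "Nonp", 493, 485, 320)]
print(top_consumer(snapshot))  # ('AcpD', 'Nonp')

# Hypothetical tags Xyz1 and Xyz2 in two snapshots taken hours apart.
earlier = [Row("Xyz1", "Nonp", 1000, 900, 6400), Row("Xyz2", "Paged", 500, 450, 3200)]
later = [Row("Xyz1", "Nonp", 5000, 4100, 57600), Row("Xyz2", "Paged", 900, 848, 3328)]
print(growing_tags(earlier, later, min_growth=100))  # [(('Xyz1', 'Nonp'), 800)]
```

The growth threshold is a judgment call; a tag whose outstanding count rises in every snapshot is a stronger leak signal than one large jump.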

Match the paged pool or nonpaged pool tag to the application or driver. Use the following resources to match the tag in question to the calling application or driver:

Download and install the Microsoft Debugging Tools for Windows (https://go.microsoft.com/fwlink/?LinkId=50168). The installation places the file pooltag.txt in the %ProgramFiles%\Debugging Tools for Windows\triage directory; this file maps tags to the Microsoft applications or drivers that use them. For example:

Ntf? - ntfs.sys - NTFS specific allocation tags

Ntf0 - ntfs.sys - General pool allocation

Ntf9 - ntfs.sys - Large temporary buffer
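Looking up a tag in pooltag.txt can be sketched in Python. This assumes each entry is a single "Tag - driver - description" line like the examples above, and it treats "?" in a tag as a single-character wildcard, which is how the Ntf? entry appears to be used; both assumptions, and the function names, are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_pooltag(text):
    """Parse 'Tag - driver - description' lines into (tag, driver, description).

    Lines that do not have all three parts (comments, blank lines) are skipped.
    """
    entries = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(" - ", 2)]
        if len(parts) == 3 and parts[0]:
            entries.append(tuple(parts))
    return entries


def lookup(tag, entries):
    """Return entries for tag: exact matches first, else '?' wildcard matches.

    Pool tags are compared case-sensitively.
    """
    exact = [e for e in entries if e[0] == tag]
    if exact:
        return exact
    return [e for e in entries if "?" in e[0] and fnmatchcase(tag, e[0])]


sample = """Ntf? - ntfs.sys - NTFS specific allocation tags
Ntf0 - ntfs.sys - General pool allocation
Ntf9 - ntfs.sys - Large temporary buffer
"""
entries = parse_pooltag(sample)
print(lookup("Ntf0", entries))  # exact match
print(lookup("Ntf5", entries))  # falls back to the Ntf? entry
```

A tag with no match in pooltag.txt is often a third-party driver; article 298102 describes how to trace those.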

For more information about tags, see Microsoft Knowledge Base article 298102 "How to Find Pool tags That Are Used By Third-Party Drivers" (https://go.microsoft.com/fwlink/?linkid=3052&kbid=298102).

Take corrective action for tag allocation leaks:

Contact Microsoft Product Support Services if the leaking tag is an application or driver developed by Microsoft (as indicated in the pooltag.txt file mentioned previously).

Contact the third-party manufacturer’s support services if the leaking tag is an application or driver developed by a third party.

When appropriate, disable or uninstall the offending applications and/or drivers to maintain system stability until the leak can be fixed.

Take corrective action for tags with high memory allocations but no evidence of leaking.

There are cases in which paged pool or nonpaged pool tags consume considerable amounts of kernel memory (approximately 10 MB per tag for nonpaged pool and 40 MB per tag for paged pool) but are not leaking. Generally, these cases occur when scaling up an Exchange mailbox server to 4,000 mailboxes or more, or when message delivery queues exceed 10,000 messages.

The following are specific cases of high memory usage based on tag allocation:

TOKE paged pool tag: Windows uses this tag to cache security information for every user session opened against the server. For example, a token (with memory allocated using the TOKE paged pool tag) is created for every user session generated by an e-mail client such as Microsoft Office Outlook. Depending on the client, multiple sessions may be opened for each e-mail client, causing multiple tokens to be cached on the server for each client. The paged pool footprint of each token generally depends on the number of security groups to which the user belongs: the more security groups a user is a member of, the more paged pool memory the tokens for that user's sessions consume.

In the MemSnap output in the following table, the average token size is approximately 3 KB.

Example MemSnap Memory Snapshot

Tag Type	Allocs	Frees	Diff	Bytes	Per Alloc
Toke Paged	4,856,027	4,855,591	436	12,093,848	2,967

Take the following steps to correct paged pool memory exhaustion in which the TOKE tag is the primary consumer of paged pool memory:

If the paged pool memory footprint per token (TOKE Per Alloc) is greater than 8 KB:

•   Reduce the number of security groups in the organization.

•   Consolidate security groups and eliminate deep nesting.

If the paged pool memory footprint per token (TOKE Per Alloc) is less than 8 KB:

•   If paged pool exhaustion occurs on a mailbox server, reduce the number of mailboxes on the server and/or remove the public folder role from the mailbox server.

•   If paged pool exhaustion occurs on a public folder server, reduce the number of mailbox stores that use the public folder server as their default public store. This will reduce the number of clients (and, as a result, the number of user sessions) on the public folder server.

•   If calendar sharing for Outlook is used extensively, ensure that all clients are running Outlook 2003 or later. Calendar sharing creates an additional token load on the server by creating additional user sessions. Outlook 2003 performs this task more efficiently (with fewer sessions) than earlier versions.
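The decision above can be sketched as a small Python function that takes the TOKE Per Alloc value from MemSnap and the server's role. The role labels and function name are hypothetical, and it assumes 8 KB means 8,192 bytes; a footprint of exactly 8 KB is treated as the server-load case because the guidance above does not cover it.

```python
KB = 1024  # assumption: 8 KB = 8,192 bytes


def toke_actions(per_alloc_bytes, server_role, outlook_pre_2003=False):
    """Suggest corrective actions when TOKE dominates paged pool.

    per_alloc_bytes  -- the TOKE "Per Alloc" value from MemSnap
    server_role      -- "mailbox" or "public_folder" (hypothetical labels)
    outlook_pre_2003 -- True if calendar sharing clients run Outlook earlier than 2003
    """
    if per_alloc_bytes > 8 * KB:
        # Large tokens: the user's group memberships are the problem.
        return ["Reduce the number of security groups",
                "Consolidate security groups and eliminate deep nesting"]
    # Small tokens: too many sessions, so reduce the load on this server.
    actions = []
    if server_role == "mailbox":
        actions.append("Reduce mailboxes on the server and/or remove the public folder role")
    elif server_role == "public_folder":
        actions.append("Reduce mailbox stores that use this server as their default public store")
    if outlook_pre_2003:
        actions.append("Upgrade calendar sharing clients to Outlook 2003 or later")
    return actions


# The example snapshot above: about 3 KB per token on a mailbox server.
print(toke_actions(2967, "mailbox"))
```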

MMST paged pool tag: Windows Cache Manager uses this tag for file caching. Windows Cache Manager automatically reduces file caching to free paged pool memory if the pool becomes depleted. No corrective action is required if the Exchange Best Practices Analyzer Tool does not report any warnings or errors regarding SMTP configuration. Non-default or incorrect SMTP settings can place an additional load on Windows Cache Manager, which may cause more paged pool memory to be consumed.

Contact Microsoft Product Support Services if you encounter paged pool exhaustion symptoms and/or events related to the MMST paged pool tag.

AUXL and FLST nonpaged pool tags: These tags are used by exifs.sys, the kernel-mode driver that the Exchange store uses to read and write items to and from the messaging databases.

If either of these tags is the primary consumer of nonpaged pool memory, contact Microsoft Product Support Services.

IpSA nonpaged pool tag: IPSec.sys, the main IPSec (IP Security) device driver, uses this Windows tag to access security associations that are stored in nonpaged pool memory. Using IPSec.sys on Exchange servers therefore adds nonpaged pool overhead. If this tag is the primary consumer of nonpaged pool memory in a nonpaged pool exhaustion case, consider the following corrective actions:

If nonpaged pool exhaustion occurs on a mailbox server, reduce the number of mailboxes on the server and/or remove the public folder role from the mailbox server.

If nonpaged pool exhaustion occurs on a public folder server, reduce the number of mailbox stores that have the public folder server set as their default public store. This will reduce the number of clients (and, as a result, the number of user sessions) on the public folder server.

Isolate the causes of PTE exhaustion. This is not as straightforward as isolating the cause of paged or nonpaged pool exhaustion: there is no tool comparable to MemSnap for tracking PTE consumption, so only general corrective actions can be taken to resolve a PTE exhaustion scenario.

PTE leak cases: PTE exhaustion due to a leak can be isolated using Performance Monitor and the Memory\Free System Page Table Entries counter described earlier. A PTE leak shows itself as a continuous decline in Memory\Free System Page Table Entries over several days. Contact Microsoft Product Support Services for assistance with diagnosing PTE leak conditions.
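One way to spot a continuous decline is to reduce the samples to one value per day and check the trend. The following Python sketch uses a hypothetical heuristic, not a documented rule: it flags a possible leak when the daily minimum of free system PTEs is lower every day for at least a chosen number of consecutive days. Input is assumed to be (datetime, free PTEs) pairs, for example from a multi-day Performance Monitor log.

```python
from collections import defaultdict
from datetime import datetime


def daily_minimums(samples):
    """Group (datetime, free_ptes) samples by calendar day; keep each day's minimum."""
    per_day = defaultdict(list)
    for when, free in samples:
        per_day[when.date()].append(free)
    return [min(per_day[day]) for day in sorted(per_day)]


def looks_like_pte_leak(samples, min_days=3):
    """Hypothetical heuristic: True if the daily minimum of free system PTEs
    strictly declines over the last min_days days."""
    mins = daily_minimums(samples)
    if len(mins) < min_days:
        return False  # not enough history to judge a multi-day trend
    recent = mins[-min_days:]
    return all(later < earlier for earlier, later in zip(recent, recent[1:]))


# Free PTEs drop by about 2,000 per day over four days (hypothetical data).
leaking = [(datetime(2014, 7, d, h), 20000 - 2000 * d - h)
           for d in range(1, 5) for h in (0, 12)]
print(looks_like_pte_leak(leaking))  # True
```

Using daily minimums rather than raw samples keeps normal load peaks during the day from masking or mimicking a long-term decline.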

PTE exhaustion cases with no evidence of leaking: PTE exhaustion can also occur without a leak. Generally, these cases involve scaling up an Exchange mailbox server to 4,000 mailboxes or more, or consumption of PTEs by third-party drivers. Take the following corrective actions to address PTE exhaustion:

•   Remove any unnecessary third-party drivers.

•   Use /BASEVIDEO or a generic video driver to free up system page table entries. Video boards use the system page table entries to map their buffers in kernel space. This usage competes with the need for system page table entries by Microsoft Exchange.

•   Reduce the /USERVA setting in boot.ini to add more available PTEs. To do this, follow the instructions that are in Microsoft Knowledge Base article 823440 "Use of the /3GB switch in Exchange Server 2003 on a Windows Server 2003-based system" (https://go.microsoft.com/fwlink/?linkid=3052&kbid=823440).

•   If PTE exhaustion occurs on a mailbox server, reduce the number of mailboxes on the server and/or remove the public folder role from the mailbox server.

•   If PTE exhaustion occurs on a public folder server, reduce the number of mailbox stores that have the public folder server set as their default public store. This action will reduce the number of clients (and, as a result, the number of user sessions) on the public folder server.

Looking at the Exchange Store Virtual Memory

The Store.exe process on a server can address only a limited amount of memory, called the store virtual memory. As you scale a server to accommodate more users and more usage, the server may run low on virtual memory. Because this limit applies to virtual address space rather than to physical memory, adding more physical memory cannot solve errors that indicate that you are out of virtual memory; and when a server already has 4 GB of RAM, you cannot expand its memory any further.

When a server is low on virtual memory, the server's overall performance degrades as the low memory situation forces the Store.exe process to use the page file, and the Store.exe process starts paging rapidly.

You can use the performance counters listed in the following table to determine the current state of the store's virtual memory.

Performance Counters for Exchange Store Virtual Memory

Counter	Description	Expected value
MSExchangeIS\VM Largest Block Size	Size, in bytes, of the largest free block of virtual memory. When graphed, this counter forms a line that slopes downward as virtual memory is consumed. When it drops below 32 MB, Exchange 2003 logs a warning (Event ID 9582) in the event log; when it drops below 16 MB, Exchange logs an error.	Never below 32 MB.
MSExchangeIS\VM Total 16 MB Free Blocks	Total number of free virtual memory blocks that are 16 MB or larger. The line may rise at first and then fall as free memory becomes more fragmented: it starts with a few large blocks of virtual memory, which may break up into a greater number of separate, smaller blocks. When these blocks become smaller than 16 MB, the line begins to fall.	Never below 1.
MSExchangeIS\VM Total Free Blocks	Total number of free virtual memory blocks, regardless of size. The line may rise at first, as free memory is fragmented into smaller blocks, and then fall as those blocks are consumed. Use this counter to measure how fragmented the available virtual memory is. To estimate the average free block size, subtract the Process\Virtual Bytes value for the STORE instance from the size of the user-mode virtual address space, and divide the result by MSExchangeIS\VM Total Free Blocks.	Never below 1.
MSExchangeIS\VM Total Large Free Block Bytes	Sum, in bytes, of all free virtual memory blocks that are 16 MB or larger. This counter monitors store memory fragmentation and forms a line that slopes downward as memory is consumed.	Never below 50 MB on a healthy server.
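The following Python sketch estimates the average free block size from the store's virtual bytes and free block count, and maps VM Largest Block Size to the warning and error levels described in the table. The function names are hypothetical, and the user-mode address space size (for example, 3,072 MB with /3GB alone, or 3,030 MB with /USERVA=3030) is an input you supply from the server's boot.ini configuration.

```python
MB = 1024 * 1024


def average_free_block_mb(user_va_mb, store_virtual_bytes, total_free_blocks):
    r"""Estimate the average free block size, in MB, in the Store.exe address space.

    user_va_mb          -- size of the user-mode address space, in MB
    store_virtual_bytes -- Process\Virtual Bytes for the STORE instance
    total_free_blocks   -- MSExchangeIS\VM Total Free Blocks
    """
    free_mb = user_va_mb - store_virtual_bytes / MB
    return free_mb / total_free_blocks


def largest_block_status(largest_block_bytes):
    r"""Map MSExchangeIS\VM Largest Block Size to the level Exchange 2003 logs."""
    if largest_block_bytes < 16 * MB:
        return "error"
    if largest_block_bytes < 32 * MB:
        return "warning (Event ID 9582)"
    return "ok"


# Hypothetical server: /3GB, store using 2 GB, 64 free blocks remaining.
print(average_free_block_mb(3072, 2048 * MB, 64))  # 16.0
print(largest_block_status(20 * MB))               # warning (Event ID 9582)
```

A falling average block size while VM Total Free Blocks rises is the fragmentation pattern the table describes.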

The Store.exe process also uses its own heap allocation mechanism and structures, called exchmem. The Store.exe process creates several exchmem heaps at startup and does not add heaps unless the existing heaps are either fully utilized or so fragmented that an allocation cannot find enough contiguous memory to succeed.

If there is a memory utilization problem or internal fragmentation (fragmentation inside the exchmem heaps, which themselves reside inside the store's virtual memory space), the Store.exe process creates new exchmem heaps.

Generally, if the Store.exe process must repeatedly create additional heaps, the overall store virtual memory becomes fragmented or depleted. By tracking the counters listed in the following table, it is possible to determine whether or not the exchmem heaps are a source of problems or performance degradation as the heaps become fragmented.

Performance Counters for exchmem Heaps

Counter	Description	Expected value
MSExchangeIS\Exchmem: Number of heaps with memory errors	Total number of exchmem heaps that failed allocations because of insufficient available memory.	0 (zero) at all times.
MSExchangeIS\Exchmem: Number of memory errors	Total number of exchmem allocations that could not be satisfied by available memory.	0 (zero) at all times.
MSExchangeIS\Exchmem: Number of Additional Heaps	Number of exchmem heaps that the Store.exe process has created since startup.	Never more than 3.
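A minimal Python sketch of these checks follows. Besides the fixed limits in the table, it flags a store that keeps adding heaps over successive samples; the "two or more increases" rule for that is a hypothetical heuristic, as is the function name.

```python
def exchmem_problems(heaps_with_errors, memory_errors, additional_heaps_series):
    """Check exchmem counters against the expected values in the table.

    additional_heaps_series -- successive samples of
    'Exchmem: Number of Additional Heaps', oldest first.
    """
    problems = []
    if heaps_with_errors:
        problems.append("heaps with memory errors")
    if memory_errors:
        problems.append("memory errors")
    if max(additional_heaps_series) > 3:
        problems.append("more than 3 additional heaps")
    # Hypothetical heuristic: two or more increases suggests repeated heap creation.
    increases = sum(1 for a, b in zip(additional_heaps_series, additional_heaps_series[1:])
                    if b > a)
    if increases >= 2:
        problems.append("store repeatedly creating additional heaps")
    return problems


print(exchmem_problems(0, 5, [0, 1, 2, 4]))
```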

Improving Exchange Store Virtual Memory

The following list describes how you can improve the performance of Exchange store virtual memory:

Consolidate storage groups

For each storage group, the Store.exe process must allocate structures and consume memory. If possible, use the minimum number of storage groups that satisfies your service level agreement (SLA). This matters much more in Exchange 2000 than in Exchange 2003, which significantly reduces the additional virtual memory consumed by each additional storage group. As a result, it is highly unlikely that the storage group or database configuration is the root cause of virtual memory fragmentation on Exchange Server 2003. For more information about configuring storage groups in Exchange 2003, see Microsoft Knowledge Base article 890699, "How to configure storage groups in Exchange Server 2003" (https://go.microsoft.com/fwlink/?linkid=3052&kbid=890699).

Offload server roles

If memory utilization increases because the server is performing multiple roles (such as being both a public folder server and a mailbox server), it is a good idea to offload roles to dedicated servers.

Read Microsoft Knowledge Base Article 815372

For more information about how to optimize virtual memory usage, see Microsoft Knowledge Base article 815372, "How to Optimize Memory Usage in Exchange Server 2003" (https://go.microsoft.com/fwlink/?linkid=3052&kbid=815372).
