Troubleshooting Version Store Issues
Topic Last Modified: 2012-01-18
This topic describes how to troubleshoot issues that may occur in the version store in Microsoft Exchange Server 2007.
You receive an ESE Event ID 623 that has the following description: The version store for instance 0 <GUID> has reached its maximum size of <Size>Mb. It is likely that a long-running transaction is preventing cleanup of the version store and causing it to build up in size. Updates will be rejected until the long-running transaction has been completely committed or rolled back. Possible long-running transaction: [transaction].
You may also receive an MSExchangeIS Event ID 1022 that has the following description: Logon Failure on database [Database name - Account] Error -1069.
What can you do about this issue? First, let’s take a closer look at what these events mean.
The ESE version store is where the Exchange Information Store service keeps records of transactions that are not yet finished. This gives ESE the ability to track and manage current transactions. The version store has a list of operations that are performed by active transactions. This is an in-memory list of modifications that are made to the database.
A transaction is a series of operations that are treated as atomic (indivisible). Either all the operations in a transaction are done and permanently saved, or none of them are. For example, consider the operations that are involved in something as simple as moving a message from the Inbox folder to the Deleted Items folder: the message must be removed from one folder and added to the other, and both changes must succeed or fail together. The version store gives ESE the ability to track and manage current transactions, thereby enabling ESE to implement isolated and consistent transactions.
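The all-or-nothing behavior can be sketched in a few lines of Python. This is a toy in-memory model, not how ESE is implemented; the class name, folder layout, and the fail_midway switch are hypothetical and exist only for illustration. The undo log loosely plays the role that version store records play for an unfinished transaction.

```python
class ToyStore:
    """In-memory mailbox with all-or-nothing updates (illustration only)."""

    def __init__(self):
        self.folders = {"Inbox": ["msg-1"], "Deleted Items": []}

    def move_message(self, msg, src, dst, fail_midway=False):
        # Record how to undo each completed operation.
        undo_log = []
        try:
            self.folders[src].remove(msg)
            undo_log.append(lambda: self.folders[src].append(msg))
            if fail_midway:
                raise RuntimeError("simulated failure between operations")
            self.folders[dst].append(msg)
            undo_log.append(lambda: self.folders[dst].remove(msg))
        except Exception:
            # Roll back: undo the completed operations in reverse order.
            for undo in reversed(undo_log):
                undo()
            return False
        return True  # commit: the undo records are no longer needed


store = ToyStore()
failed = store.move_message("msg-1", "Inbox", "Deleted Items", fail_midway=True)
after_rollback = {name: list(msgs) for name, msgs in store.folders.items()}
committed = store.move_message("msg-1", "Inbox", "Deleted Items")
after_commit = {name: list(msgs) for name, msgs in store.folders.items()}
```

After the simulated failure, the message is back in the Inbox; after the second call, it appears only in Deleted Items.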
A hung transaction causes the version store to get very large. If there are very long-running or hung transactions, the version store can grow quite quickly and, eventually, generate out-of-memory errors. A transaction that takes a very long time to run can cause the Exchange store to run out of version store space because the program cannot flush more recent transactions from the version store. When the version store is full, any updates to the database are rejected until the long-running transaction is completely committed or rolled back. This causes a service interruption for users. Additionally, an ESE Event ID 623 may be logged in the Application log, and an associated MSExchangeIS Event ID 1022 that includes Error -1069 (JET_errVersionStoreOutOfMemory) may also occur.
Error -1069 indicates that the version store space has been consumed and has reached its defined size. No additional transactions can proceed until this condition is resolved. Because the version store is where transactions are held in memory until they can be written to disk, ESE will consume this cache if something is preventing ESE from completing the transaction or writing to disk. At this point, the store will stop responding to requests until there is room in the cache again.
Note: This error is not the result of the system running out of memory. If there is a failure to allocate more memory, and NT refuses to provide it, a failure that has a different error occurs. You cannot resolve this error by increasing the RAM in the server.
Typically, when Event ID 623 is logged on the server that is running Exchange Server 2007, the following symptoms may occur:
Users cannot access their Inbox.
E-mail messages are backing up in the local delivery queue.
E-mail messages are awaiting directory lookup.
Client e-mail messages may be stuck in the Outbox.
You can work around this problem by restarting the server. But to resolve this problem, and to help prevent it from occurring again, you must troubleshoot the cause.
Event 623 and Event 1022 occur because version store entries are not being cleaned up at all or are not cleaned up within the expected time. This can occur for either of the following reasons:
A long transaction is interfering with the cleanup process. To correctly reconcile write conflicts and support repeatable reads, a given entry in the version store cannot be cleaned until it is older than the oldest active transaction. This means that if the store opens a transaction, and keeps the transaction open indefinitely or otherwise orphans it, any version store cleanup would be precluded because of this active transaction. Therefore, the version store continues to grow until it reaches the maximum size.
Version store cleanup cannot keep pace with the load on the machine. The cleanup of Version Store entries is performed by an asynchronous background thread. As transactions commit or roll back, this asynchronous background thread cleans up the version store entries that are older than the oldest of the remaining active transactions. However, if there is so much write activity that it outpaces this clean-up thread, you will reach a state in which the version store continues to grow but the version clean-up thread can't keep pace with the cleaning of old entries. Eventually you reach the maximum version store size.
Of these two reasons, a long transaction is the more common.
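The cleanup rule described above can be illustrated with a small simulation. The following Python sketch is a simplified model, not ESE code: it assumes each version store entry is tagged with the ID of the transaction that created it, that IDs increase in start order, and that the store holds a fixed number of entries. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionStore:
    max_entries: int
    entries: list = field(default_factory=list)  # owning transaction ID per entry
    active: set = field(default_factory=set)     # IDs of open transactions
    next_id: int = 1

    def begin(self):
        trx = self.next_id
        self.next_id += 1
        self.active.add(trx)
        return trx

    def update(self, trx):
        if len(self.entries) >= self.max_entries:
            return False  # store is full; analogous to Error -1069
        self.entries.append(trx)
        return True

    def commit(self, trx):
        self.active.discard(trx)

    def cleanup(self):
        # An entry can be removed only if it is older than the oldest
        # active transaction.
        oldest = min(self.active) if self.active else self.next_id
        self.entries = [t for t in self.entries if t >= oldest]


def run(hold_first_open):
    vs = ToyVersionStore(max_entries=5)
    long_trx = vs.begin()
    vs.update(long_trx)
    if not hold_first_open:
        vs.commit(long_trx)
    rejected = 0
    for _ in range(10):  # ten short transactions, each with one update
        trx = vs.begin()
        if not vs.update(trx):
            rejected += 1
        vs.commit(trx)
        vs.cleanup()
    return len(vs.entries), rejected


healthy = run(hold_first_open=False)
stuck = run(hold_first_open=True)
```

In this model, when the first transaction commits promptly, cleanup empties the store after every short transaction. When it is left open, nothing can be cleaned, the store fills, and later updates are rejected.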
The size of the version store is determined by the msExchESEParamMaxVerPages parameter on the "Storage Group" object in Active Directory. This setting is in units of 16 KB. If the attribute is not set, the default is 9280 in decimal. The attribute can be increased to 10280 and then to 11280 to help prevent Event 623. However, less memory will be available for other functions in the store, and this may cause other performance issues.
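As a rough illustration of the sizing arithmetic, the following Python sketch converts the attribute value to megabytes. It takes the 16 KB unit stated above at face value, so the size that Event ID 623 reports on a given server may differ. The function and constant names are made up for this example and are not part of Exchange or Active Directory.

```python
PAGE_UNIT_KB = 16  # unit of msExchESEParamMaxVerPages, as stated in the text


def version_store_mb(max_ver_pages, unit_kb=PAGE_UNIT_KB):
    """Convert a page count to megabytes."""
    return max_ver_pages * unit_kb / 1024


# Default value and the two suggested increases from the text.
sizes = {pages: version_store_mb(pages) for pages in (9280, 10280, 11280)}
for pages, mb in sizes.items():
    print(f"{pages} pages = {mb} MB")
```

Under that assumption, the default of 9280 corresponds to 145 MB, and each increase of 1000 pages adds a little under 16 MB.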
In addition to the occurrence of Event 623, write operations to the database will start to fail and generate Error -1069 (JET_errVersionStoreOutOfMemory) because no more version store space is available in which to record the operation. Also, when the version store approaches its maximum size, Event ID 602 is logged. For more information about this event, see Event ID 602.
Event 602 indicates that any expensive clean-up operations will be skipped in an effort to get the asynchronous background Version Store clean-up thread to run as fast as possible. Event 602 is generated for every 100 operations that are skipped. If many 602 events occur immediately after a 623 event, it is likely that the -1069 (version-store-out-of-memory) conditions have caused the long-running transaction to roll back. Because this transaction no longer exists, version cleanup is not held back. Therefore, the process can try to clean all the entries in the very large version store. Because the version store is so large, all expensive clean-up operations are skipped until the store size is reduced. You may see many 602 events occur because many operations are skipped.
It is possible that the Event ID 602 messages, which indicate that on-line compaction is not running to completion, can cause indexes that are so inefficient that transactions take longer and longer to process. This could cause Event ID 623 to occur. To help prevent this from occurring, try the following methods, as recommended in Event ID 602:
Widen the on-line defragmentation window.
Use the Exchange Server Database Utilities (Eseutil.exe) defragmentation command to defragment and compact an Exchange database offline. For more information, see How to Run Eseutil /D (Defragmentation).
Determine whether the affected servers are heavily loaded with regard to their hardware when compared to servers that are unaffected.
Off-line defragmentation can dramatically improve the database efficiency so that Event 623 does not occur. But Event 623 could reoccur if the database becomes fragmented again because on-line defragmentation cannot be completed. Make sure that on-line defragmentation can be completed. For more information, see How to Monitor Online Defragmentation.
Increasing the Version Store size may not be the correct solution here. If Event ID 623 is caused by another problem, try to debug the problem to figure out why the version store got so big in the first place. The 623 event reports that the Jet session holds the oldest transaction (and the thread that it was on) at the time that the version-store-out-of-memory condition occurred. Ideally, you want to set a breakpoint at the location in Jet that returns a JET_errVersionStoreOutOfMemory message, and use the information in the 623 event to identify the thread that holds the long-running transaction. Then, you can debug what it is that the store is doing during that transaction, and why that process is taking so long to complete.
Many different issues can cause Event ID 623. These include database corruption, huge messages, and third-party software integration, among others. For the steps to troubleshoot Event ID 623, see Troubleshoot Event ID 623.
To help determine the cause of Event ID 623, collect a user dump of the store process while the long-running transaction is occurring. Unfortunately, this doesn't always catch the culprit. By the time the dump is grabbed, the session or thread that owns the transaction that caused the version store to build up may have stopped running because of an error. Typically, such errors can be difficult to catch manually.
Jet's Version Buckets Allocated counter is displayed when you enable the Show Advanced Counters value in the registry. Enable the Show Advanced Counters value for the ESE performance counters, and then monitor the Database > Version Buckets Allocated counter. The next time that Event ID 623 occurs, collect a dump file when this counter reaches 70 percent of its maximum value.
For more information about how to enable the Show Advanced Counters value in the registry, see Enable "Show Advanced Counters".
The Show Advanced Counters value is set in the registry so that an alert for the Version Buckets Allocated counter can be configured to troubleshoot Event ID 623.
The text of Event ID 623 resembles the following:
Event Type: Error
Event Source: ESE
Event Category: Transaction Manager
Event ID: 623
Description: The version store for this instance (0) has reached its maximum size of y MB. It is likely that a long-running transaction is preventing cleanup of the version store and causing it to build up in size. Updates will be rejected until the long-running transaction has been completely committed or rolled back.
A version bucket in the Perfmon counter is in units of 32 KB on 64-bit Exchange Server 2007. To calculate the maximum number of version buckets that can be allocated, use the following equation: (x * 32) / 1024 = y
In this equation, x is the maximum number of version buckets that can be allocated, and y is the total version store memory, in MB, that is provided in the description of Event ID 623.
For example, if the maximum version store memory (y) is 155 MB, the maximum number of version buckets is calculated as follows: x = (155 * 1024) / 32. Therefore, x = 4960.
When you see the Version Buckets Allocated value reach 70 percent of this maximum amount, you are likely experiencing a long-running transaction. At this point, you can start to collect dumps accordingly. The threshold is calculated as follows: 70% * 4960 = 3472 buckets
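The two calculations above can be combined in a short Python sketch so that the threshold can be recomputed for whatever maximum size your own Event ID 623 reports. The function names are illustrative only. The 32 KB bucket size and the 70 percent figure come from the text above.

```python
BUCKET_KB = 32  # bucket size on 64-bit Exchange Server 2007, per the text


def max_version_buckets(max_store_mb, bucket_kb=BUCKET_KB):
    """Solve (x * 32) / 1024 = y for x, given y in MB."""
    return max_store_mb * 1024 // bucket_kb


def dump_threshold(max_buckets, percent=70):
    """Bucket count at which to start collecting dumps."""
    return max_buckets * percent // 100


max_buckets = max_version_buckets(155)   # y = 155 MB from the example
threshold = dump_threshold(max_buckets)
print(max_buckets, threshold)
```

For the 155 MB example this reproduces the values used in this topic: 4960 buckets and a threshold of 3472.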
To capture performance data, obtain the Experfwiz.ps1 script from the following MSDN Archive Web site: ExPerfwiz
Follow the directions to run the Experfwiz.ps1 script to start capturing performance data. Then, obtain the Sysinternals Procdump utility from the following Microsoft Web site: ProcDump v4.01
Procdump has excellent features that let a dump be created based on a Performance counter threshold. Download and extract Procdump to the c:\procdump folder.
The performance counter for version bucket usage was changed in Exchange Server 2007 to "\MSExchange Database(Information store)\Version buckets allocated". Therefore, the Procdump syntax is as follows:
c:\procdump>procdump -mp store.exe -p "\MSExchange Database(Information store)\Version buckets allocated" 3472 -s 30 -n 3 -accepteula c:\procdump\store_623.dmp
The following list describes the switches that are used together with Procdump:
-p: Performance counter to monitor
-s: Consecutive seconds that the threshold must be reached before the dump is written
-n: Number of dumps to write before exiting
-mp: Write a dump file that has thread and handle information and all read/write process memory. (To minimize dump size, Procdump searches for memory areas that are larger than 512 MB. If any are found, the largest area is excluded. A memory area is a collection of same-sized memory allocation areas. Removing this cache memory reduces Exchange Server and SQL Server dumps by more than 90 percent.)
The arguments configure Procdump to do the following:
Generate Miniplus (-mp) dumps of the Store.exe process when the Version Buckets value exceeds a specific calculated value for 30 seconds (-s 30)
Generate up to 3 dumps (-n 3)
Save the dumps in the c:\procdump folder by using names that begin with “store_623”
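If the threshold has to be recalculated for several servers, it may help to assemble the command from its parts. The following Python sketch only builds the argument list in the same order as the command shown earlier, and it does not run Procdump. The function name and its defaults are hypothetical. The switches, counter path, and dump path are copied from this topic.

```python
def build_procdump_args(threshold, seconds=30, dumps=3,
                        dump_path=r"c:\procdump\store_623.dmp"):
    """Return the Procdump command as a list of arguments."""
    counter = r"\MSExchange Database(Information store)\Version buckets allocated"
    return [
        "procdump", "-mp", "store.exe",
        "-p", counter, str(threshold),  # counter to watch and its threshold
        "-s", str(seconds),             # consecutive seconds above threshold
        "-n", str(dumps),               # number of dumps to write
        "-accepteula",
        dump_path,
    ]


args = build_procdump_args(3472)
print(" ".join(args))  # for display only; the counter path needs quoting in a shell
```

The resulting list could be passed to subprocess.run on the Exchange server, which avoids manual quoting of the counter path. That step is left out here because it depends on the local environment.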
Send the dump file, the application log, and the performance monitor log that were running when the dump was collected to Microsoft Customer Support Services for additional analysis.
If you are running Microsoft Windows Server 2008, you can also use an alternative method to gather data for version store issues. For more information, see Alternative Method for Gathering Data for Version Store Issues on Exchange Server 2007 running on Windows Server 2008.