Exchange Transactions and the Exchange Database Overview


Topic Last Modified: 2006-04-07

By Colby Holland

This article contains information on how Microsoft® Exchange Extensible Storage Engine (ESE) handles database transactions, including how data is stored in the database files. It also covers the Exchange database files.

There are three places that data can exist on an Exchange server: memory, log files, and database stores. However, there is often confusion about the underlying steps that Exchange and ESE take to move data through these three places.

In part, the purpose of this article is to provide an easy-to-understand explanation of the process as well as a quick-reference guide for future disaster recoveries. However, it also explains the details of the .edb and .stm database files, and more specifically, how and why data moves between the two.

ESE is a DLL used by the Exchange store process to store records and create indexes within the Exchange database. It is the intermediary technology between the Microsoft Exchange Information Store service and the actual database.

ESE allows applications to store records and create indexes to access those records in different ways. Although ESE does not accept structured query language (SQL) requests directly, it takes transactions passed to it from the client, in this case Exchange, and is the component that is responsible for the actual manipulation of the data. Because the engine is multithreaded and based on JET technology, it is optimized for fast data storage and retrieval.

A transaction is a set of operations that all must be completed together. If one of these operations fails, the entire transaction is void. Similar to most database transactions, the last operation is followed up with a COMMIT statement indicating the transaction is complete. So if one or more of the operations within a transaction fails, the COMMIT statement will never process and the database will never receive the transaction.

An example is moving a message from one folder to another. The action may seem effortless to the end user, but a series of transactions accompanies that seemingly simple action in Microsoft Outlook®. Whenever something like this occurs in an Exchange environment, ESE typically processes multiple transactions to fulfill the request.

Transactions originate through a client request. In this context, the client is not the end user working in Outlook. Requests come directly from Exchange, so the client is Exchange, or more specifically the Microsoft Exchange Information Store service.

For example, when an action is performed in Outlook, a remote procedure call (RPC) is made to Exchange, which then builds the transaction and passes it on to ESE for processing. Then a series of subcomponents makes sure Exchange commits a transaction only when it can guarantee that the data is durable or persistent, and protected from failures. These same components ensure that the transactions are as follows:

  • Atomic   Either all the operations occur or none of them occur.

  • Consistent   The database is transformed from one correct state to another.

  • Isolated   Changes are not visible until they are committed.

  • Durable   Committed transactions are preserved in the database even if the system fails.

To understand exactly how ESE accomplishes these transactions, you need to explore what happens to the transactions after they are passed from the Microsoft Exchange Information Store service to ESE for processing.

The following five subcomponents of ESE work together to move the data into the database and to its static form. It is important to understand the way the data flows through ESE to properly troubleshoot events such as disaster recovery.

  • Log buffers   When ESE first receives a transaction, it stores it in log buffers. These log buffers hold information in memory before it is written to the transaction logs. By default, each buffer unit is the size of a disk sector, which is 512 bytes. JET sanitizes the configured number of buffers so that it is a minimum of 128 sectors, a maximum of 10,240 sectors, and aligned down to the nearest 64 KB boundary (a multiple of 128 sectors). So, for Exchange 2000 Server (and all service packs), the default number of log buffers is 84, which JET sanitizes to 128, so the actual buffer area is 64 KB. For Exchange Server 2003, the default number of log buffers is 500, which JET sanitizes to 384, so the actual buffer area is 192 KB.

    Microsoft recommends that you manually change the default for both Exchange 2000 Server and Exchange Server 2003 to 512 buffers, which requires no sanitation and results in a 256 KB area. Where disks are borderline slow, Microsoft recommends setting the number of buffers as high as 9,000 (that is, greater than 4 MB).

  • Log writer   As the buffers fill up, ESE moves the data from the buffers to the log files on disk. In this operation, the transactions are written to the logs synchronously. This process is fast, because it is crucial to move the data from memory into the transaction logs quickly in case of a system failure.

  • IS buffers   The IS or cache buffers are the first step toward turning a transaction into actual data. The IS buffers are a group of 4-kilobyte (KB) pages allocated from memory by Exchange for the purpose of caching the database pages before they are written to disk. When first created, these pages are clean, because no transactions have yet been written to them. ESE then plays the transactions from the logs into these empty pages in memory, thereby changing their status to dirty. The default maximum size of these buffers is 900 MB in Exchange 2000 Server SP 3.

  • Version store   ESE writes multiple, different transactions to a single page in memory. The version store keeps track of and manages these transactions. It also structures the pages as the transactions occur.

  • Lazy writer   At this point, ESE must flush the dirty pages out of memory. The lazy writer is responsible for moving the pages from the cache buffers to disk. Because there are so many transactions coming in at once and so many pages getting dirtied, the job of the lazy writer is to prioritize them and subsequently handle the task of moving them there without overloading the disk I/O subsystem. This is the last phase and the point at which the transactions have officially become static data. It is also at this point that the dirty pages are cleaned and ready for use again.
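
The log buffer arithmetic in the first item of this list can be checked with a short sketch. The function names are hypothetical, and the order of the steps (clamp to the minimum and maximum first, then round down to a 64 KB multiple) is an assumption chosen because it reproduces the figures quoted above; it is not taken from JET source code.

```python
SECTOR_BYTES = 512                          # buffer unit size stated above
MIN_SECTORS = 128                           # stated minimum (64 KB)
MAX_SECTORS = 10_240                        # stated maximum
ALIGN_SECTORS = 64 * 1024 // SECTOR_BYTES   # 64 KB boundary = 128 sectors


def sanitize_log_buffers(requested):
    """Return the buffer count after clamping and 64 KB alignment (assumed order)."""
    clamped = max(MIN_SECTORS, min(requested, MAX_SECTORS))
    return (clamped // ALIGN_SECTORS) * ALIGN_SECTORS  # round down


def buffer_area_kb(requested):
    """Size in KB of the buffer area that results from a requested setting."""
    return sanitize_log_buffers(requested) * SECTOR_BYTES // 1024


for setting in (84, 500, 512, 9000):
    print(setting, sanitize_log_buffers(setting), buffer_area_kb(setting))
```

Under these assumptions, a setting of 84 becomes 128 buffers (64 KB), 500 becomes 384 (192 KB), 512 stays at 512 (256 KB), and 9,000 becomes 8,960 (4,480 KB, which is a little over 4 MB).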

An online backup changes little about the process described previously. As soon as the backup begins, the checkpoint stops advancing, because the backup process needs to back up all the log files created after the frozen checkpoint. Transactions still move through the five stages of ESE. After the backup process finishes copying the database files and necessary log files to tape, the checkpoint is allowed to catch up.

In some cases, the backup stops responding while ESE continues to process transactions. Because the checkpoint remains frozen, the checkpoint depth (the number of log files created since the checkpoint) keeps growing. Exchange 2000 Server SP 3 and later are hard coded to limit this checkpoint depth to just above 1,000 log files. If the backup stops responding and enough transactions are processed that just above 1,000 log files are created in the interim, Exchange dismounts the databases for that storage group and logs the error JET_errCheckpointDepthTooDeep.
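
The relationship between a frozen checkpoint and the dismount condition can be sketched as follows. The function names are hypothetical, and the limit of 1,000 is a placeholder: the text gives the limit only as "just above 1,000", so the exact value is not modeled here.

```python
ASSUMED_DEPTH_LIMIT = 1000  # placeholder; the text says "just above 1,000"


def checkpoint_depth(current_log_generation, checkpoint_generation):
    """Number of log files created since the (frozen) checkpoint."""
    return current_log_generation - checkpoint_generation


def must_dismount(current_log_generation, checkpoint_generation,
                  limit=ASSUMED_DEPTH_LIMIT):
    """True when the depth exceeds the limit, the JET_errCheckpointDepthTooDeep case."""
    return checkpoint_depth(current_log_generation, checkpoint_generation) > limit


frozen_at = 5000  # hypothetical log generation when the backup began
healthy = must_dismount(5400, frozen_at)   # 400 logs behind: keep running
stalled = must_dismount(6001, frozen_at)   # 1,001 logs behind: dismount
```

The point of the sketch is that the depth grows only because the checkpoint is frozen while new log files keep being created; once the backup finishes and the checkpoint catches up, the depth shrinks again.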

The .edb file is the main repository for the mailbox data. The fundamental construct of the .edb file is the b-tree structure, which is only present in this file, and not in the .stm file. The b-tree is designed for quick access to many pages at once. The .edb file design permits a top level node and many child nodes.

In a b-tree, each child node can have only a single parent. Although the typical b-tree allows unlimited depth, Microsoft restricts the depth of the b-trees in most of its applications to facilitate quick access by whatever engine happens to be working with it. By allowing for such a high spread and low tree depth, Exchange and ESE can guarantee that users can access any page of data, called a leaf node, within four I/Os.

Tree depth has the greatest effect on performance. A uniform tree depth across the entire structure, where every leaf node or data page is equidistant from the root node, means database performance is consistent and predictable. In this way, the ESE 4 KB pages are arranged into tables that form a large database file containing Exchange data.
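
To see why a high spread keeps the tree shallow, the following sketch counts how many levels a uniform tree needs to reach a given number of leaf pages. The fan-out of 200 children per page is purely an assumed figure for illustration, as is the simplification that reading each level costs one I/O with nothing cached. The function name is hypothetical.

```python
def levels_needed(leaf_pages, fan_out):
    """Smallest number of levels (root included) that can hold leaf_pages leaves."""
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    levels, capacity = 1, 1  # a single root page that is also the only leaf
    while capacity < leaf_pages:
        capacity *= fan_out  # each extra level multiplies the leaf capacity
        levels += 1
    return levels


ASSUMED_FAN_OUT = 200  # hypothetical children per page

# Four levels (root, two internal levels, leaves) reach 200**3 leaf pages.
four_level_capacity = ASSUMED_FAN_OUT ** 3
depth_at_capacity = levels_needed(four_level_capacity, ASSUMED_FAN_OUT)
depth_one_more = levels_needed(four_level_capacity + 1, ASSUMED_FAN_OUT)
```

With this assumed fan-out, four levels cover 8,000,000 leaf pages, which at 4 KB per page is roughly 30 GB of data reachable in four reads. One more leaf page than that would force a fifth level.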

The database is actually made up of multiple b-trees. These other ancillary trees hold indexing and views that work with the main tree.

The .edb file is accessed by ESE directly.

The .stm or streaming media file is used in conjunction with the .edb file to comprise the Exchange database. Both files together make up the database, and as such, they should always be treated as a single entity. Typically, if you perform an action on the .edb file, such as with Exchange Server Database Utilities (Eseutil), the .stm file is automatically included.

The purpose of the .stm file is to store streamed native Internet content. To understand what that means, you should first understand the way in which legacy Exchange products handled data with only a single file.

In Exchange Server 5.5, for example, the Internet Mail Connector accepts inbound Multipurpose Internet Mail Extensions (MIME) messages and writes them to a disk queue where Exchange then converts them to the native MAPI content or MDBEF for use by the Information Store and MAPI clients. Then if an Internet API, such as Post Office Protocol version 3 (POP3) or IMAP4 requests the data, it is converted back again before being sent out. This back-and-forth conversion process can cause overhead and performance issues.

The streaming media file helps to alleviate some of this conversion.

Unlike the .edb file mentioned previously, the .stm file does not store data in a b-tree structure. When a message arrives through the Internet or Simple Mail Transfer Protocol (SMTP), it always arrives as a stream of bytes. In Exchange Server 2003 and Exchange 2000 Server, these messages are streamed directly to the .stm file, where they are held, unconverted, until a MAPI client accesses them. That way, if the end user consistently accesses mail through POP3, the mail items are pulled directly from the .stm file and are already in the proper state for delivery. If a MAPI client accesses the message, however, the message is moved to the .edb file and converted to Exchange native form, and it is never moved back to the .stm file.
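
The routing rules in the preceding paragraph can be sketched as a small model. The class and method names are hypothetical, the two dictionaries merely stand in for the .stm and .edb files, and the "conversion" is a toy decode step. The sketch does not model what happens when an Internet client requests a message that has already been promoted, because the paragraph above does not describe that path.

```python
class ToyMailDatabase:
    """Hypothetical model of .stm and .edb routing; not an Exchange API."""

    def __init__(self):
        self.stm = {}          # msg_id -> raw bytes as streamed in over SMTP
        self.edb = {}          # msg_id -> content in a toy "native" form
        self.conversions = 0   # how many times content was converted

    def deliver_smtp(self, msg_id, raw_bytes):
        self.stm[msg_id] = raw_bytes  # streamed in as-is, no conversion

    def read_pop3(self, msg_id):
        # Served directly from the stream file, already in deliverable form.
        # Promoted messages are outside the scope of this sketch (KeyError).
        return self.stm[msg_id]

    def read_mapi(self, msg_id):
        if msg_id in self.stm:
            # First MAPI access: convert and promote, never move back.
            self.edb[msg_id] = {"native": self.stm.pop(msg_id).decode("ascii")}
            self.conversions += 1
        return self.edb[msg_id]

    def location(self, msg_id):
        return ".stm" if msg_id in self.stm else ".edb"


db = ToyMailDatabase()
db.deliver_smtp("a", b"Subject: hi")
db.deliver_smtp("b", b"Subject: yo")
pop_copy = db.read_pop3("a")   # no conversion needed
db.read_mapi("b")              # promoted to the .edb side, converted once
db.read_mapi("b")              # already native, no further conversion
```

A mailbox read only through POP3 never triggers a conversion in this model, which is the overhead saving the stream file is meant to provide.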

If the .stm file is missing or corrupt, it can be reconstructed, but without its original content. The actual content is stored in the .stm file, whereas the pointers and header information for the messages are stored in the .edb file. The /createstm switch of the Eseutil tool uses that information to rebuild the .stm file, but without the content. Depending on the situation, this may be a catastrophic loss: if end users work in a non-MAPI environment, they may lose a great deal of data.

There are three ESE components in memory: transaction log buffers, data cache, and version store. There are two components on disk: log files and database files. Transactions move through the components as follows whether or not a backup is taking place:

  1. Log buffers

  2. Log files

  3. Cache buffers and version store

  4. Database files
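
The four-step flow above can be sketched as a toy write-ahead logging engine. All names are hypothetical and nothing here is ESE code. The version store is omitted, and because no checkpoint is modeled, recovery simply replays the whole log. The sketch only shows why a transaction that has reached the log files survives the loss of everything held in memory.

```python
class ToyEngine:
    """Hypothetical model of the log-then-database flow; not ESE code."""

    def __init__(self):
        self.log_buffer = []   # memory: step 1
        self.log_file = []     # disk:   step 2
        self.cache = {}        # memory: step 3, dirty pages by page number
        self.database = {}     # disk:   step 4

    def commit(self, page_no, value):
        self.log_buffer.append((page_no, value))  # 1. into the log buffers
        self.flush_log()                          # 2. onto disk in the log
        self.cache[page_no] = value               # 3. page dirtied in cache

    def flush_log(self):
        self.log_file.extend(self.log_buffer)
        self.log_buffer.clear()

    def lazy_write(self):
        self.database.update(self.cache)          # 4. dirty pages to disk
        self.cache.clear()                        # pages are clean again

    def crash_and_recover(self):
        self.log_buffer.clear()                   # memory contents are lost
        self.cache.clear()
        for page_no, value in self.log_file:      # replay the log into cache
            self.cache[page_no] = value
        self.lazy_write()


engine = ToyEngine()
engine.commit(1, "message header")
on_disk_before = dict(engine.database)   # lazy writer has not run yet
engine.crash_and_recover()
on_disk_after = dict(engine.database)    # rebuilt from the log files
```

Before the simulated crash the database file holds nothing for page 1, yet after replay the committed change is present, because step 2 had already put it on disk in the log.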

The .edb and .stm files combine to make up the Exchange database. The .edb file stores all the data accessed by MAPI clients, and after a piece of data is moved to the .edb file and transformed into the Exchange native format, it stays that way. The .stm file houses all the content streamed over the Internet. Internet mail destined for review by a MAPI client first arrives through SMTP to the .stm file before it is promoted to the .edb file. If a message originates from another Exchange server, but is transported by SMTP, it is streamed into the .stm file and is immediately promoted to the .edb file. If the mail is never accessed by a MAPI client, it stays in its native form in the .stm file. The splitting of these two files and the roles they play significantly reduce the hefty conversion overhead that legacy Exchange products endured.
