Exchange Storage Architecture
Topic Last Modified: 2005-05-23
Exchange servers store data in two files: an .edb file and an .stm file. Together, the .edb file and the .stm file form an Exchange store repository. For example, the default mailbox store on an Exchange server uses files named Priv1.edb and Priv1.stm. The default public folder store uses the files Pub1.edb and Pub1.stm. The .edb file contains many tables that hold metadata for all e-mail messages and other items in the Exchange store, in addition to the contents of MAPI messages. The .edb file is an ESE database, and because it is used primarily to store MAPI messages and attachments, it is also referred to as the MAPI-based database. The .stm file, in contrast, stores native Internet content. Because Internet content is written in native format, there is no need to convert messages and other items to Exchange format (as in Exchange 5.5 and earlier). The .stm file is also an ESE database, referred to as the streaming database. The .edb and .stm files function as a pair, and the database signature (a 32-bit random number combined with the time that the database was created) is stored as a header in both files. The internal schema for the .stm pages is stored in the .edb file.
Note: You can rename the .edb and .stm databases and move them to different directories in Exchange System Manager. Because the .edb and .stm files together create a complete Exchange store repository, you should keep them together and assign them a common name with different extensions (that is, .edb and .stm).
Exchange Server 2003 uses transactions to control changes in storage groups. These transactions are recorded in a transaction log, similar to the way transactions are stored in traditional databases. Changes are committed or rolled back based on the success of the transaction. If there is a failure, you use transaction logs (together with the database files and, in some cases, the checkpoint file) to restore a database. The facility that manages transactions is the Microsoft Exchange Information Store service (Store.exe). Any uncommitted transaction log entries are also considered part of a current Exchange database, as illustrated in the following figure.
Current Exchange Server 2003 database
The following two types of databases are available in Exchange Server 2003:
Private store databases These databases store mailboxes and message queues for MAPI-based messaging connectors.
Public store databases These databases store public folder hierarchies and public folder contents.
The following figure illustrates the internal Exchange store architecture. The Microsoft Exchange Information Store service (Store.exe) uses Extensible Storage Engine (ESE) to access the database files in the file system, and provides access to the data through various interfaces, such as MAPIsvr, ExPOP, ExIMAP, ExSMTP, and ExOLEDB. Client applications and application programming interfaces, such as Collaboration Data Objects for Exchange (CDOEX), can use these interfaces or communicate with the messaging database (MDB) module.
Exchange store architecture
Each storage group is made up of a set of log files and auxiliary files (internal temporary databases, the checkpoint file, and reserve logs) for all the databases (.edb files, .stm files) in the storage group. Exchange Server 2003 supports multiple storage groups and multiple databases in each storage group. In Exchange Server 2003, a single server supports up to four storage groups and a single storage group supports up to five databases. Support for multiple databases enables you to distribute numerous mailboxes and public folders across numerous, smaller databases, thus making database management easier. Exchange 2000 Server and Exchange Server 2003 can support up to 20 mailbox and public folder databases on a single server.
As illustrated in the following figure, all storage groups are hosted from the same Store.exe process. Each storage group is represented by an ESE instance.
Storage group architecture
Within each storage group, each .edb and .stm database pair represents a mailbox store or a public folder store. As shown in the preceding figure, all mailbox and public folder stores in a particular storage group share a common set of log files and other system files. These files enable transaction-oriented processing.
The log files and other system files in each storage group have the following purposes:
<Log Prefix>xxx.chk This is the checkpoint file (for example, E00.chk) that determines which transactions require processing to move them from the transaction log files to the databases. Checkpoint files are updated when ESE writes a particular transaction to a database file on a disk. This update always points the checkpoint file to the last transaction that was transferred successfully to the database. This update provides a fast recovery mechanism. However, checkpoint files are not required to commit transactions to databases. ESE has the ability to process transaction log files directly and to determine for itself which transactions have not yet been transferred. This process takes significantly more time than using checkpoints.
Note: Extensible Storage Engine guarantees that transactions are not written to a database multiple times.
Exx.log This is the current transaction log file for the storage group. Transaction log files enable ESE to manage data storage quickly and efficiently. ESE stores new transactions, such as the delivery of a message, in a memory cache and in the transaction log concurrently. The data is written sequentially: new data is appended to existing data without the need for complex database operations. At a later time, the transactions are transferred in a group from the memory cache to the actual databases, which brings the databases up to date.
By default, the default storage group, named First Storage Group, uses the prefix E00, which results in a transaction log file name of E00.log. The E00.log is used for all mailbox and public stores in this storage group. If you create additional storage groups, the prefix number is incremented to E01, E02, and E03.
<Log Prefix>XXXXX.log These are transaction log files that have no room remaining for further data. By default, transaction log files are always exactly 5,242,880 bytes (five megabytes) in size. It is theoretically possible to change the log file size, but this is not recommended. When a log is full, it is renamed to allow the creation of a new, empty transaction log file. Renamed transaction log files are called previous log files. The naming format of previous log files is <Log Prefix>XXXXX.log (such as E00XXXXX.log), where XXXXX represents a five-digit hexadecimal number from 00000 to FFFFF. Previous log files reside in the same directory as the current transaction log file.
Res1.log and Res2.log These are reserved transaction log files for the storage group. Reserved log files are an emergency repository for transactions. They provide enough disk space to write a transaction from memory to the hard disk, even if a server's disk is too full to admit new transactions to a log file. The reserved log files can be found in the transaction log directory. They are created automatically when the databases are initialized. They cannot be created later.
ESE uses reserved transaction log files only to complete a current transaction process. It then sends an error notification to Store.exe to dismount the Exchange store safely. In the application event log, there is an entry that indicates the issue. In this situation, you should create additional free hard disk space (for example, add a new hard disk) before you mount the database again.
Tmp.edb This is a temporary workspace for processing transactions. Tmp.edb contains temporary information that is deleted when all stores in the storage group are dismounted or the Exchange Information Store service is stopped.
Note: Tmp.edb is not included in online backups.
<file name>.edb These are the rich-text database files for individual private or public stores. The rich-text database file for the default private store is named Priv1.edb. The file for the default public store is named Pub1.edb.
<file name>.stm These are the streaming Internet content files for individual databases. The streaming database file for the default private store is named Priv1.stm. The file for the default public store is named Pub1.stm.
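The naming rules in the list above can be sketched in a few lines of Python. This is only an illustration of the conventions described in this topic, not a tool that ships with Exchange; the function names are hypothetical, and the restriction to indexes 0 through 3 simply mirrors the prefixes E00 to E03 mentioned earlier.

```python
LOG_FILE_SIZE_BYTES = 5 * 1024 * 1024  # 5,242,880 bytes, the default log file size
MAX_GENERATION = 0xFFFFF               # largest five-digit hexadecimal number


def log_prefix(storage_group_index):
    """Return the log prefix for a storage group (0 -> E00, 1 -> E01, ...)."""
    if not 0 <= storage_group_index <= 3:
        raise ValueError("this sketch only covers the prefixes E00 to E03")
    return f"E{storage_group_index:02d}"


def current_log_name(prefix):
    """Name of the current transaction log file, for example E00.log."""
    return f"{prefix}.log"


def previous_log_name(prefix, generation):
    """Name of a full, renamed log file, for example E000001A.log."""
    if not 0 <= generation <= MAX_GENERATION:
        raise ValueError("generation must fit in five hexadecimal digits")
    return f"{prefix}{generation:05X}.log"
```

For example, `previous_log_name(log_prefix(0), 26)` yields `E000001A.log`, because 26 is 1A in hexadecimal and the number is padded to five digits.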
You can determine the path to a storage group's transaction log file and the log file's name in Exchange System Manager. Right-click the desired storage group, select Properties, and from the General tab, look at the information in the Transaction Log Location and the Log File Prefix fields. Using the Browse buttons, you can move the transaction log and system files to a new location, such as a separate physical drive.
The configuration settings for a storage group are stored in Active Directory. If you want to use ADSI Edit to locate the directory object for a storage group, you must open the configuration naming context, expand the Services node, then CN=Microsoft Exchange, and then expand the Exchange organization object, administrative group, and server container. Underneath it, you can find a container named CN=InformationStore, which contains the storage groups, such as CN=First Storage Group. The object class for storage group objects is msExchStorageGroup. If you plan to use custom scripts to manage Exchange store resources, you can access msExchStorageGroup objects by using Active Directory Service Interfaces (ADSI).
The following code example demonstrates how to access the default storage group on a server called SERVER01 in an Exchange organization called Contoso. It displays the current path to the transaction log files of that storage group.
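```vbscript
' Build the distinguished name of the default storage group on SERVER01.
strStorageGroupDN = "CN=First Storage Group," _
    & "CN=InformationStore," _
    & "CN=SERVER01,CN=Servers," _
    & "CN=First Administrative Group," _
    & "CN=Administrative Groups," _
    & "CN=Contoso,CN=Microsoft Exchange," _
    & "CN=Services,CN=Configuration," _
    & "DC=Contoso,DC=com"

' Bind to the storage group object through ADSI.
Set oStorageGroup = GetObject("LDAP://" & strStorageGroupDN)

' Display the current path to the transaction log files.
MsgBox oStorageGroup.Get("msExchESEParamLogFilePath")
```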
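If a script has to address several servers or storage groups, the distinguished name can be assembled from its parts instead of being typed by hand. The following Python sketch only builds the string, following the container order described above. The function name and parameters are hypothetical, and the sketch assumes that none of the names contain characters that require escaping in a distinguished name.

```python
def storage_group_dn(storage_group, server, admin_group, organization, domain_components):
    """Assemble the distinguished name of a storage group object.

    Hypothetical helper: it only concatenates the relative names in the
    order described in this topic and performs no DN escaping.
    """
    parts = [
        f"CN={storage_group}",
        "CN=InformationStore",
        f"CN={server}",
        "CN=Servers",
        f"CN={admin_group}",
        "CN=Administrative Groups",
        f"CN={organization}",
        "CN=Microsoft Exchange",
        "CN=Services",
        "CN=Configuration",
    ]
    # Append the domain components, for example DC=Contoso,DC=com.
    parts.extend(f"DC={dc}" for dc in domain_components)
    return ",".join(parts)


dn = storage_group_dn(
    "First Storage Group", "SERVER01", "First Administrative Group",
    "Contoso", ["Contoso", "com"],
)
ldap_path = "LDAP://" + dn  # the path an ADSI bind would use
```

With the values shown, `dn` matches the distinguished name of the default storage group on SERVER01 in the Contoso organization.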
The following are important Exchange attributes of msExchStorageGroup objects that you can use in custom scripts based on ADSI:
msExchESEParamCircularLog This is a Boolean flag that determines whether circular logging is enabled or disabled. A value of 0 indicates that circular logging is disabled; a value of 1 indicates that circular logging is enabled.
Circular logging causes ESE to discard transactions when the committed changes are transmitted to the database file on disk. The checkpoint file indicates which log files and transaction entries are successfully committed to the database. Any existing previous logs are deleted, while transactions in the current transaction log file are marked as obsolete. New transactions eventually overwrite the obsolete entries in the current transaction log before a new log file is created.
Note: Through purging of transactions, circular logging reduces consumption of disk space. However, circular logging is not compatible with sophisticated fault-tolerant configurations and several online backup types that rely on the existence of transaction logs. When circular logging is enabled, you can only perform full backups. You cannot perform backups that rely on transaction log files, such as differential or incremental backups. When you recover data, you cannot replay transaction log files, thus you cannot restore data beyond the most recent backup. In contrast, if transactions are not automatically deleted through circular logging, you might be able to recover beyond the most recent backup by replaying transactions that still exist on a hard disk. Although circular logging is enabled by default in Exchange Server 5.5, it is disabled by default in Exchange 2000 Server and Exchange Server 2003.
msExchESEParamEventSource This is a language-independent process descriptor string that points to the Microsoft Exchange Information Store service key (MsExchangeIS) in the registry under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services.
msExchESEParamLogFilePath This attribute determines the path to a storage group's transaction log files, such as C:\Program Files\Exchsrvr\mdbdata.
msExchESEParamLogFileSize This attribute specifies the log file size in kilobytes (KB). The default value is 5120. This value should never be changed.
msExchESEParamSystemPath This attribute specifies the path to the checkpoint file, such as C:\Program Files\Exchsrvr\mdbdata, in addition to the path to any temporary databases that might be present.
msExchESEParamZeroDatabaseDuringBackup This is a Boolean flag that determines whether deleted records and long values are overwritten with zeros during backup operations. A value of 0 indicates that records are not overwritten. A value of 1 indicates that databases are overwritten with zeros.
msExchESEParamEnableOnlineDefrag This is a Boolean flag that determines whether the Microsoft Exchange Information Store service should perform online defragmentation of databases. A value of 0 indicates no online defragmentation should be performed. A value of 1 indicates online defragmentation should be performed during scheduled maintenance cycles.
Note: Online defragmentation frees space in the databases but does not reduce the size of the database files. Database inconsistencies are corrected during every start and shutdown of the server in a process referred to as soft recovery.
msExchESEParamEnableIndexChecking This is a Boolean flag that determines whether the operating system version is checked for Unicode indexes. A value of 0 indicates that index checking is not performed. A value of 1 indicates that index checking is performed. This parameter detects changes in the operating system that result from upgrading to a newer version or from applying a service pack. This flag determines whether the sort order for Unicode has changed. Whenever the operating system is changed in this manner, re-indexing occurs automatically.
msExchESEParamBaseName This attribute specifies the base name for the log files in this storage group. For example, a base name of E00 results in a transaction log file name of E00.log.
msExchESEParamDbExtensionSize This attribute specifies the database extension size, in pages. The default value is 2 megabytes (MB).
msExchESEParamPageTempDBMin This attribute specifies the minimum size of the temporary database, in pages. The default value is 0.
msExchESEParamCheckpointDepthMax This attribute specifies the preferred (not hard) maximum checkpoint depth, in bytes.
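When a script reads these attributes, the raw values usually need a small amount of interpretation. The following Python sketch shows the unit arithmetic implied by the descriptions above. The helper names are hypothetical, the form in which a directory client returns the values (integer or string) is an assumption, and the page conversion assumes the four-kilobyte page size that this topic describes for ESE.

```python
PAGE_SIZE_BYTES = 4 * 1024  # four-kilobyte ESE pages, as described in this topic


def log_file_size_bytes(size_kb=5120):
    """Convert an msExchESEParamLogFileSize value (in KB) to bytes."""
    return size_kb * 1024


def flag_enabled(raw_value):
    """Interpret a 0/1 flag such as msExchESEParamCircularLog."""
    value = int(raw_value)  # assumption: the value arrives as an int or a numeric string
    if value not in (0, 1):
        raise ValueError(f"expected 0 or 1, got {raw_value!r}")
    return value == 1


def megabytes_to_pages(megabytes):
    """Express a size in megabytes as a number of four-kilobyte pages."""
    return megabytes * 1024 * 1024 // PAGE_SIZE_BYTES
```

The default log file size of 5120 KB corresponds to the 5,242,880 bytes mentioned earlier, and under the four-kilobyte page assumption a 2 MB extension size corresponds to 512 pages.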
Each storage group consumes about 50 MB of free disk space. The files listed above that are required by the storage group use a minimum of 11 MB of disk space. The minimum disk space for private and public stores is 5 MB and 8 MB, respectively. Although the total disk space used is about 24 MB, extra disk space is also needed for the actual creation of the storage group and for read and write operations.
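The figures in the preceding paragraph add up as follows. This short Python sketch reproduces the arithmetic; scaling the per-store minimums linearly for additional stores is an assumption made for illustration, not a sizing rule stated in this topic.

```python
STORAGE_GROUP_FILES_MB = 11    # log and system files required by the storage group
PRIVATE_STORE_MIN_MB = 5       # minimum for a private store
PUBLIC_STORE_MIN_MB = 8        # minimum for a public store
RECOMMENDED_FREE_MB = 50       # approximate free space per storage group


def minimum_footprint_mb(private_stores=1, public_stores=1):
    """Minimum disk space used by a storage group and its stores, in MB."""
    return (STORAGE_GROUP_FILES_MB
            + private_stores * PRIVATE_STORE_MIN_MB
            + public_stores * PUBLIC_STORE_MIN_MB)


def headroom_mb(private_stores=1, public_stores=1):
    """Space left for creation and read/write operations within the 50 MB guideline."""
    return RECOMMENDED_FREE_MB - minimum_footprint_mb(private_stores, public_stores)
```

With one private and one public store, the minimum footprint is 11 + 5 + 8 = 24 MB, which leaves about 26 MB of the 50 MB guideline for creating the storage group and for read and write operations.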
When working with storage groups, remember the following:
A server running Exchange Server 2003 can have up to five storage groups. Because one of the storage groups is reserved for database recovery operations, only four storage groups can be used to hold databases that are accessible by clients. Attempts to create more than four storage groups result in an error message.
You can create only five databases in a storage group. Attempts to create more databases result in an error message.
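A provisioning script can check a planned layout against these limits before it attempts to create anything. The following Python sketch is a hypothetical pre-flight check; the dictionary format (storage group name mapped to a list of database names) is an assumption chosen for illustration, and the recovery storage group is deliberately left out of the count.

```python
MAX_STORAGE_GROUPS = 4          # client-accessible storage groups per server
MAX_DATABASES_PER_GROUP = 5     # databases per storage group


def validate_layout(layout):
    """Return a list of problems found in a planned storage layout.

    `layout` maps each storage group name to a list of database names.
    An empty list means the plan fits within the limits.
    """
    problems = []
    if len(layout) > MAX_STORAGE_GROUPS:
        problems.append(
            f"{len(layout)} storage groups planned; the limit is {MAX_STORAGE_GROUPS}"
        )
    for group, databases in layout.items():
        if len(databases) > MAX_DATABASES_PER_GROUP:
            problems.append(
                f"{group}: {len(databases)} databases planned; "
                f"the limit is {MAX_DATABASES_PER_GROUP}"
            )
    return problems
```

The two limits multiply to the maximum of 20 databases per server mentioned earlier in this topic.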
Exchange Server uses ESE as an embedded database engine that determines the structure of the databases and manages memory. The database engine caches the databases in memory by transferring four-kilobyte chunks of data (pages) in and out of memory. It updates the pages in memory and writes new or updated pages back to the disk. When requests come to the system, the database engine can buffer data in memory, so that it does not have to access the disk constantly. This makes the system more efficient, because writing to memory is approximately 200,000 times faster than writing to disk. When users make requests, the database engine starts loading the requests to memory and marks the pages as dirty. A dirty page is a page in memory that contains data that has not yet been written to disk. These dirty pages are later written to the Microsoft Exchange Information Store service databases on disk.
Although caching data in memory is the fastest and most efficient way to process data, it means that while Exchange is running, the information on disk is never completely up-to-date. The latest version of the database is in memory, and because many changes in memory are not on disk yet, the database and memory are not synchronized. If there are any dirty pages in memory that have not been transferred and written to disk, the databases are flagged as inconsistent. Exchange databases are synchronized only when all dirty pages in memory are transferred to disk. This happens when you properly shut down the Microsoft Exchange Information Store service. During the shutdown process, the Microsoft Exchange Information Store service flushes all pages to disk.
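The interplay between the transaction log, the memory cache, dirty pages, and the checkpoint can be illustrated with a toy model. The following Python sketch is a deliberately simplified illustration and not how ESE is implemented: the class and method names are hypothetical, pages are plain dictionary entries, and the checkpoint is just a count of log entries that are already reflected on disk.

```python
class ToyStore:
    """Toy model: log first, update the cached page, flush to disk later."""

    def __init__(self):
        self.log = []        # sequential transaction log (append only)
        self.cache = {}      # page number -> latest contents in memory
        self.dirty = set()   # pages changed in memory but not yet on disk
        self.disk = {}       # page number -> contents in the database file
        self.checkpoint = 0  # number of log entries already reflected on disk

    def write(self, page, data):
        """Record the change in the log and in the cache at the same time."""
        self.log.append((page, data))
        self.cache[page] = data
        self.dirty.add(page)

    def is_consistent(self):
        """The database on disk is consistent only when no dirty pages remain."""
        return not self.dirty

    def flush(self):
        """Write all dirty pages to disk, as happens during a clean shutdown."""
        for page in sorted(self.dirty):
            self.disk[page] = self.cache[page]
        self.dirty.clear()
        self.checkpoint = len(self.log)

    def recover(self):
        """Simulate a crash: the cache is lost, and the log entries after the
        checkpoint are replayed against the database on disk."""
        self.cache.clear()
        self.dirty.clear()
        for page, data in self.log[self.checkpoint:]:
            self.disk[page] = data
        self.checkpoint = len(self.log)
```

In this model, a write leaves the store inconsistent until `flush` runs, and `recover` shows why the log is sufficient to bring the disk copy up to date after a failure.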
The Exchange Server 2003 MAPI database file contains the tables that hold the metadata for all e-mail messages, other objects in the database, and the contents of MAPI messages. Every folder displayed in Microsoft Office Outlook is a separate database table in the Exchange store. Every sort order used to view these folders is represented by a separate index on that table. The Store.exe process manages these sort orders.
Messages from MAPI clients, such as Outlook, are stored in the MAPI database, just as they were stored in previous versions of Exchange Server. MAPI-based clients can then access these messages without conversion. However, if an Internet protocol-based client attempts to read a message in this database, the message is converted to the requested format.
The traditional .edb file and its accompanying .stm file are a single unit. One of these files is of little use without the other file. It is important to understand that a single database in the Microsoft Exchange Information Store service contains two files, the .edb file and the .stm file.
A record in the .edb file contains a column (of data type JET_coltypSLV) that references a list of pages in the streaming file that contains the raw data. Space usage (maximum of four kilobytes of page numbers) and checksum data for the data in the streaming file is stored in the .edb file.
Exchange Server 5.5 and earlier store messages in message database encapsulated format (MDBEF). This is the native format for Outlook clients. When a non-MAPI client requests a message, the Microsoft Exchange Information Store service converts the contents from MDBEF to the appropriate format, based on what the client requests. This conversion consumes processor bandwidth and slows server performance.
Later versions of ESE enable Internet messaging clients to store raw data in native format. The repository for this raw data is referred to as the streaming database, or simply the streaming file. The streaming file has no balanced tree (B-tree) overhead. Instead, it contains two four-kilobyte pages of header information and then raw data in four-kilobyte pages. This flat data structure is designed for binary large objects (BLOBs) of data that are unlikely to need content conversion and that can be received and transmitted very quickly.
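The flat layout described above lends itself to simple arithmetic. The following Python sketch assumes that data pages follow directly after the two header pages and are counted from zero; both points are assumptions made for illustration, because this topic does not specify the on-disk layout in that detail.

```python
PAGE_SIZE = 4096    # four-kilobyte pages
HEADER_PAGES = 2    # two pages of header information at the start of the file


def pages_needed(content_bytes):
    """Number of four-kilobyte pages required to hold a block of raw data."""
    return -(-content_bytes // PAGE_SIZE)  # ceiling division


def data_page_offset(index):
    """Byte offset of the data page with the given zero-based index (assumed layout)."""
    return (HEADER_PAGES + index) * PAGE_SIZE
```

Under these assumptions, a 10,000-byte message body occupies three pages, and the first data page starts at byte offset 8,192.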
Property promotion determines where data is stored in an ESE database and is therefore an important concept to understand. The Microsoft Exchange Information Store service supports the property promotion of data held in the .stm file to the .edb file. Property promotion enables folder views and indexes to be maintained efficiently. For example, a message streamed to the .stm file has its properties, such as sender, subject, and date sent and received, promoted to the records representing the message in the .edb file.
When a MAPI client, such as Microsoft Outlook, submits a message to the Microsoft Exchange Information Store service, the contents of that message are stored in the .edb file. If a non-MAPI client opens the message, the Microsoft Exchange Information Store service does an immediate conversion of the MAPI content to Internet format by performing some of the conversion and calling IMAIL, which in turn calls RTFHTML, to complete the conversion. None of this conversion is persistent, meaning that data is not moved out of the .edb file and written to the .stm file.
If an Internet client submits a message to the Microsoft Exchange Information Store service, the contents of that message are stored in the .stm file. Certain headers from the Internet message are duplicated to the .edb file, so the Microsoft Exchange Information Store service can find the message. This is referred to as a state 0 conversion.
If any client asks for a property, such as PR_Subject, or one of its many aliases, then the Microsoft Exchange Information Store service promotes all of the Internet message's header information to Properties. This is referred to as a state 1 conversion.
If any client asks for attachment information, then the Microsoft Exchange Information Store service creates a near duplicate (in MAPI form) of the Internet message. At first, the message is still in the .stm file. However, much of the data needed for MAPI access is in the .edb file. If a client alters the message in a way that changes the Multipurpose Internet Mail Extensions (MIME), then the .stm file version of the message is discarded and the .edb file of the message is preserved. This is referred to as a state 2 conversion.
Regardless of how a message is submitted to the Microsoft Exchange Information Store service, if Exchange Server receives Internet content that includes Application/ms-tnef content, the message initially goes to the .stm file, but it is then immediately decoded and moved to the .edb file. The same applies to messages with a Winmail.dat attachment, encoded using UUEncode. Transport neutral encapsulation format (TNEF) and Winmail.dat are encapsulation methods for MAPI messages to preserve MAPI properties on transports that do not support MAPI. Therefore, the general principle that MAPI messages reside in the .edb file and Internet messages reside in the .stm file is correct. The current functionality has the TNEF decoded before any one of the MAPI properties is read.
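The storage and promotion rules described in this section can be summarized as a small state model. The following Python sketch is a simplified illustration and not the Store.exe logic: the class, the function names, and the string labels are hypothetical, and cases that this topic does not describe (for example, altering the MIME content before state 2 is reached) are left unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    location: str          # ".edb" or ".stm": where the message content resides
    state: Optional[int]   # promotion state for Internet content; None for MAPI content


def submit(client, has_tnef=False):
    """MAPI and TNEF-encoded content ends up in the .edb file; other Internet
    content is stored in the .stm file with certain headers duplicated (state 0)."""
    if client == "mapi" or has_tnef:
        return Message(".edb", None)
    return Message(".stm", 0)


def request_property(msg):
    """A property request promotes all header information (state 1)."""
    if msg.location == ".stm" and msg.state < 1:
        msg.state = 1


def request_attachment_info(msg):
    """An attachment request creates a near duplicate in MAPI form (state 2)."""
    if msg.location == ".stm":
        msg.state = 2


def alter_mime(msg):
    """After state 2, a change to the MIME content discards the .stm version."""
    if msg.location == ".stm" and msg.state == 2:
        msg.location = ".edb"
```

For example, a message submitted by an Internet client starts in the .stm file in state 0, moves to state 1 when a client requests a property, and moves to state 2 when a client requests attachment information.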