Understanding Mailbox Database and Log Capacity Factors
Applies to: Exchange Server 2010 SP3, Exchange Server 2010 SP2
Topic Last Modified: 2012-02-24
This topic explains the factors that you should consider when you plan mailbox database and log capacity as part of your mailbox server storage design in Microsoft Exchange Server 2010.
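Many factors influence a sizing capacity plan for Exchange Server 2010 Mailbox databases. This section discusses the mailbox storage quota, database white space, recoverable items (the dumpster), actual mailbox size, content indexing, maintenance, recovery databases, database size, and overhead.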
The first metric to understand is the storage size limit, known as the mailbox storage quota, that's in effect in your organization. Knowing the amount of data that an end user is allowed to store in his or her mailbox allows you to determine how many user mailboxes can be housed on the server. Although mailbox storage quotas can change in response to changing organizational requirements, having a goal for the mailbox storage quota is the first step in determining your needed mailbox database capacity.
For example, if you have a server with 5,000 250-MB user mailboxes on it, you need at least 1.25 TB of disk space, excluding space requirements for recoverable items. If a limit isn't set for mailbox storage quotas, you'll find it difficult to estimate database capacity. Mailbox storage quotas for Exchange 2010 need to include the space for both the primary mailbox and personal archive mailbox (when used). For more information, see Managing Mailbox Servers and Managing Archives.
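The arithmetic in this example can be sketched in Python. This is a minimal illustration, not a sizing tool: the function name and the optional `archive_quota_mb` parameter are hypothetical, and the conversion assumes decimal units (1 TB = 1,000,000 MB), which is what the 1.25 TB figure implies.

```python
def quota_capacity_mb(mailbox_count: int, quota_mb: float,
                      archive_quota_mb: float = 0.0) -> float:
    """Space needed if every mailbox reaches its storage quota.

    Excludes recoverable items, white space, and other overhead.
    archive_quota_mb is a hypothetical parameter for the personal archive quota.
    """
    return mailbox_count * (quota_mb + archive_quota_mb)


# 5,000 mailboxes with a 250-MB quota, no personal archive
total_mb = quota_capacity_mb(5000, 250)
print(f"{total_mb / 1_000_000:.2f} TB")  # assumes 1 TB = 1,000,000 MB
```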
The database size on the physical disk isn't just the number of users multiplied by the mailbox storage quota. When the majority of users aren't approaching their mailbox storage quota, the databases consume less space and white space isn't a capacity concern. The database itself will always have free pages, or white space, spread throughout. During background database maintenance, items marked for removal from the database are removed, which frees these pages. The percentage of white space is constantly changing due to the efforts of the 24x7 online defragmentation process.
You can estimate the amount of white space in the database by knowing the amount of mail sent and received by the users with mailboxes in the database. For example, if you have 100 2-GB mailboxes (total of 200 GB) in a database where users send and receive an average of 10 MB of mail per day, the amount of white space is approximately 1 GB (100 mailboxes × 10 MB per mailbox). The amount of white space can exceed this approximation if background database maintenance isn't able to complete a full pass.
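As a rough sketch of this estimate (the function name is illustrative and not part of any Exchange tooling), the white space is approximately one day of mail for every mailbox in the database:

```python
def estimated_white_space_mb(mailbox_count: int, daily_mail_mb: float) -> float:
    """Approximate database white space as one day of sent/received mail.

    This is a lower-bound style estimate: white space can be larger when
    background database maintenance cannot complete a full pass.
    """
    return mailbox_count * daily_mail_mb


# 100 mailboxes, each sending and receiving about 10 MB per day
white_space_mb = estimated_white_space_mb(100, 10)
print(white_space_mb)  # 1000 (MB), roughly 1 GB
```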
Each database has a dumpster that stores soft-deleted items. By default, soft-deleted items are stored for 14 days and calendar items are stored for 120 days in Exchange 2010.
Exchange 2010 also includes the ability to prevent the purging of data before the deleted item retention window has passed. This functionality is known as single item recovery, and it is disabled by default. When single item recovery is enabled, the size of the mailbox increases by an additional 1.2 percent for a 14-day deleted item retention window. Calendar version logging, which is enabled by default, increases the size of the mailbox by an additional 3 percent.
The formula for determining the dumpster space requirements for 14 days of deleted item retention with single item recovery and calendar version logging enabled is:
Dumpster Size = (Daily Incoming/Outgoing Mail × Average Message Size × Deleted Item Retention Window) + (Mailbox Quota Size × 0.012) + (Mailbox Quota Size × 0.03)
For example, if the mailbox size is 2 GB, enabling single item recovery for 14 days of deleted item retention requires an additional 25 MB of space, and the calendar logging feature requires an additional 61 MB.
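The dumpster formula can be sketched as follows. The function and key names are illustrative. The message profile in the usage example (100 messages per day at 75 KB) is an assumption chosen for illustration, and the sketch assumes binary units (1 GB = 1,024 MB), which reproduces the 25 MB and 61 MB figures above.

```python
def dumpster_components_mb(daily_messages: int, avg_message_kb: float,
                           retention_days: int, quota_mb: float) -> dict:
    """Break the dumpster size formula into its three terms (all in MB).

    The 0.012 factor applies to a 14-day deleted item retention window
    with single item recovery enabled; 0.03 covers calendar version logging.
    """
    deleted_items = daily_messages * avg_message_kb / 1024 * retention_days
    single_item_recovery = quota_mb * 0.012
    calendar_logging = quota_mb * 0.03
    return {
        "deleted_items": deleted_items,
        "single_item_recovery": single_item_recovery,
        "calendar_logging": calendar_logging,
        "total": deleted_items + single_item_recovery + calendar_logging,
    }


# 2-GB mailbox quota, 14-day retention; the message profile is an assumption
parts = dumpster_components_mb(daily_messages=100, avg_message_kb=75,
                               retention_days=14, quota_mb=2048)
for name, size_mb in parts.items():
    print(f"{name}: {size_mb:.1f} MB")
```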
Over time, user mailboxes will reach the mailbox storage quota, so an amount of mail equivalent to the incoming mail will need to be deleted to remain under the mailbox storage quota. This requirement means that the dumpster will increase to a maximum size equivalent to the amount of e-mail sent and received each day multiplied by the number of days within the deleted item retention window. If the majority of users haven't reached the storage quota, only some of the incoming/outgoing mail is deleted. Therefore, the growth is split between the dumpster and the increase in mailbox size.
To determine database size using a 2-GB mailbox without using the personal archive feature, see the "Mailbox Capacity Requirements" section in the Exchange 2010 Mailbox Server Role Design Example topic.
After you have determined the projected actual mailbox size, you can use that value to determine the maximum number of users per database: divide the recommended database size by the projected mailbox size. This value also helps you determine how many databases you need to handle the projected user count, assuming fully populated databases. Be aware that, because of non-transactional input/output (I/O) or hardware limitations, you may have to modify the number of users placed on a single server. Some administrators prefer to use more databases to further reduce the database size. This approach can shorten backup and restore windows at the cost of added complexity in managing more databases per server.
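A minimal sketch of this calculation follows. The maximum number of mailboxes per database is the database size divided by the projected mailbox size, rounded down, and the database count is the user count divided by that value, rounded up. All numbers in the usage example are hypothetical.

```python
import math


def max_mailboxes_per_database(database_size_gb: float,
                               projected_mailbox_gb: float) -> int:
    """Largest whole number of mailboxes that fit in one database."""
    return math.floor(database_size_gb / projected_mailbox_gb)


def databases_needed(user_count: int, mailboxes_per_database: int) -> int:
    """Number of fully populated databases needed for all users."""
    return math.ceil(user_count / mailboxes_per_database)


# Hypothetical inputs: 2,000-GB databases, 2.5-GB projected mailboxes, 5,000 users
per_db = max_mailboxes_per_database(2000, 2.5)
print(per_db)                          # 800
print(databases_needed(5000, per_db))  # 7
```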
Content indexing creates an index, or catalog, that allows users to easily and quickly search through their mail items rather than manually search through the mailbox. Exchange 2010 creates an index that is about 10 percent of the total database size, which is placed on the same LUN as the database. Therefore, an additional 10 percent needs to be factored into the database LUN size for content indexing.
A database that needs to be compacted offline requires capacity equal to the size of the target database plus 10 percent. Whether you allocate enough space for a single database, or a backup set, additional space must be available to perform these operations.
Note: Offline maintenance procedures should be implemented only at the request of Microsoft Customer Service and Support, because they invalidate all database copies and require a full reseed of the database.
If you plan to use a recovery database as part of your disaster recovery plans, sufficient capacity must be available to handle all the databases you want to be able to simultaneously restore on that server. For more information, see Recovery Databases.
The database size ultimately determines how many mailboxes you deploy within each database and how many databases you deploy. The database size you deploy depends on several factors:
- Backup/restore service level agreements (SLAs): The database size determines how long it takes to back up and restore the data, so it must be small enough to fit within your backup and restore windows.
- High availability architecture: If you plan to have multiple database copies, you can design your databases to be 2 TB in size because your copies become your first line of defense in terms of recovery operations.
- Storage architecture: If you plan to deploy on JBOD storage (one disk houses both the database and its corresponding transaction logs), the size of the disk you use dictates the maximum database size. For example, on a 1 TB disk (with a formatted capacity of about 917 GB), you also need to include space for transaction logs and the content index, and ensure you don't consume all available space.
After all factors have been considered and calculated, we recommend that you include an additional overhead factor of 20 percent for the database logical unit number (LUN). This value accounts for the other data that resides in the database that isn't necessarily seen when calculating mailbox sizes and white space.
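One way to combine the content index and overhead factors is sketched below. How the two percentages are applied is an assumption made for illustration: the sketch treats both as multipliers on the database size (10 percent for the content index, 20 percent for overhead), so the order in which they are applied does not change the result. The function name and the sample inputs are hypothetical.

```python
def database_lun_data_gb(mailbox_data_gb: float, white_space_gb: float = 0.0,
                         dumpster_gb: float = 0.0,
                         index_fraction: float = 0.10,
                         overhead_fraction: float = 0.20) -> float:
    """Data size to plan for on the database LUN, before free-space headroom.

    Assumption: the content index (10 percent) and the overhead factor
    (20 percent) are both applied as multipliers on the database size.
    """
    database_gb = mailbox_data_gb + white_space_gb + dumpster_gb
    return database_gb * (1 + index_fraction) * (1 + overhead_fraction)


# Hypothetical inputs: 200 GB of mailbox data plus 1 GB of white space
print(f"{database_lun_data_gb(200, white_space_gb=1):.1f} GB")
```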
The transaction log files are a record of every transaction performed by the database engine. All transactions are written to the log first, and then lazily written to the database. Compared with Exchange Server 2003, the transaction log files in Exchange 2010 have been reduced in size from 5 MB to 1 MB. This change was made to support the continuous replication features and to minimize the amount of data loss if primary storage fails.
You can use the following table to estimate the number of transaction logs that are generated on an Exchange 2010 Mailbox server where the average message size is 75 KB.
The value for Number of transaction logs generated per day is based on the message profile selected and the average message size. It indicates how many transaction logs will be generated per mailbox per day. The log generation numbers per message profile account for:
- Message size impact
- Amount of data sent/received
- Database health maintenance operations
- Records Management operations
- Data stored in a mailbox that is not a message (tasks, local calendar appointments, contacts)
- Forced log rollover (a mechanism that periodically closes the current transaction log file)
Number of transaction logs generated per mailbox profile
|Message profile (75 KB average message size)|Number of transaction logs generated per day|
You can use the following guidelines to understand how message size affects the generation rate of transaction logs:
- If the average message size doubles to 150 KB, the number of logs generated per mailbox increases by a factor of 1.9. This factor reflects the proportion of the database that is made up of the attachment and message tables (message bodies and attachments).
- Thereafter, each time the message size doubles beyond 150 KB, the log generation rate per mailbox also doubles, for example from a factor of 1.9 to 3.8.
For example, if you have a message profile of 100 messages per day (20 logs per mailbox per day at 75 KB) and:
- An average message size of 150 KB, the logs generated per mailbox are 20 × 1.9 = 38.
- An average message size of 300 KB, the logs generated per mailbox are 20 × 3.8 = 76.
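The scaling rule can be sketched as a small function. The guidelines only describe what happens when the message size doubles, so this sketch deliberately rejects sizes that are not 75 KB multiplied by a power of two rather than guessing at intermediate values. The function names are illustrative.

```python
import math


def log_scaling_factor(avg_message_kb: float, baseline_kb: float = 75.0) -> float:
    """Multiplier on the 75-KB log generation rate for larger messages.

    Factor is 1.0 at the baseline, 1.9 after the first doubling (150 KB),
    and doubles again for each further doubling (3.8 at 300 KB, and so on).
    """
    doublings = math.log2(avg_message_kb / baseline_kb)
    if doublings < 0 or not math.isclose(doublings, round(doublings)):
        raise ValueError("only sizes of baseline * 2**n are covered by the guidelines")
    n = round(doublings)
    return 1.0 if n == 0 else 1.9 * 2 ** (n - 1)


def logs_per_mailbox(base_logs_per_day: float, avg_message_kb: float) -> float:
    """Logs generated per mailbox per day at the given average message size."""
    return base_logs_per_day * log_scaling_factor(avg_message_kb)


# 20 logs per mailbox per day at 75 KB, as in the example above
for size_kb in (75, 150, 300):
    print(size_kb, round(logs_per_mailbox(20, size_kb), 1))
```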
The following sections discuss factors that affect your log sizing capacity:
- Backup and restore factors
- Move mailbox operations
- Log growth overhead
- High availability factors
- LUN capacity planning
Log LUN sizing is partly dependent on your backup and restore design. For example, if your design allows you to go back two weeks and replay all the logs generated since then, you will need two weeks of log file space. If your backup design includes weekly full and daily differential backups, the log LUN needs to be larger than an entire week of log file space to allow both backup and replay during restore. Most enterprises that perform a nightly full backup allocate two to three times the required daily log generation capacity. This approach is taken to prevent a backup failure from causing the log drive to fill, which would dismount the database.
If you plan to use the mailbox resiliency and single item recovery features within Exchange 2010 as your backup infrastructure (and thus enable circular logging), as a best practice, you should allocate three times the required daily log generation capacity. This ensures that the databases don't dismount because of truncation failures when replication is suspended or isn't functioning normally.
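The relationship between daily log generation and log capacity can be sketched as follows. The 1-MB log file size comes from this topic. The function names, the mailbox count, and the 20 logs per mailbox per day are illustrative assumptions.

```python
LOG_FILE_MB = 1  # Exchange 2010 transaction log file size


def daily_log_mb(mailbox_count: int, logs_per_mailbox_per_day: float) -> float:
    """Log volume generated per day on the server, in MB."""
    return mailbox_count * logs_per_mailbox_per_day * LOG_FILE_MB


def log_capacity_mb(mailbox_count: int, logs_per_mailbox_per_day: float,
                    days_retained: int = 3) -> float:
    """Log space needed to hold the given number of days of logs.

    days_retained=3 reflects the "three times the daily log generation"
    guidance; use 14 for a design that replays two weeks of logs.
    """
    return daily_log_mb(mailbox_count, logs_per_mailbox_per_day) * days_retained


# Hypothetical server: 5,000 mailboxes generating 20 logs each per day
print(log_capacity_mb(5000, 20))                    # 300000 (MB) for three days
print(log_capacity_mb(5000, 20, days_retained=14))  # 1400000 (MB) for two weeks
```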
Moving mailboxes is a primary capacity factor for large mailbox deployments. Many large companies move a percentage of their user mailboxes on a nightly or weekly basis to different databases, servers, or sites. If your organization does this, you may find it necessary to provide extra capacity to the log LUN to accommodate mailbox moves.
Although the source server logs the record deletions, which are small, the target server must write all transferred data first to transaction logs. If you generate 10 GB of log files in one day, and keep a three-day buffer of 30 GB, moving 50 2-GB mailboxes (100 GB) would fill your target log LUN and cause downtime. In cases such as these, you may have to allocate additional capacity for the log LUNs to accommodate your move mailbox practices.
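The example above can be checked with a short sketch. It assumes, for illustration, that the logs written by the move and one normal day of logging accumulate on the target log LUN before any truncation occurs. The function name is hypothetical.

```python
def move_log_shortfall_gb(daily_log_gb: float, buffer_days: int,
                          moved_mailboxes: int, mailbox_gb: float) -> float:
    """Extra log capacity needed on the target server for a mailbox move.

    Returns 0 when the existing buffer already covers the move.
    Assumption: move logs plus one day of normal logs pile up before truncation.
    """
    capacity_gb = daily_log_gb * buffer_days
    demand_gb = daily_log_gb + moved_mailboxes * mailbox_gb
    return max(0.0, demand_gb - capacity_gb)


# 10 GB of logs per day, three-day buffer, moving 50 mailboxes of 2 GB each
print(move_log_shortfall_gb(10, 3, 50, 2))  # 80 (GB) beyond the 30-GB buffer
```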
For most deployments, we recommend that you add an overhead factor of 20 percent to the log size (after all other factors have been considered) when creating the log LUN to ensure necessary capacity exists in moments of unexpected log generation.
High availability influences log capacity requirements in three significant ways:
- Database copy count: The log capacity of the entire system increases with the number of database copies chosen in the high availability deployment. If you have three database copies spread across three servers, you need to provision log capacity for each copy on each server.
- Log truncation mechanism: With up to 16 copies of each mailbox database, high availability in Exchange 2010 provides the foundation for using continuous replication circular logging as the log truncation mechanism, instead of running full or incremental backups to truncate older logs. For more information, see the "Log Truncation without Backups" section in Understanding Backup, Restore and Disaster Recovery and High Availability and Site Resilience.
- Database copy replay lag: High availability in Exchange 2010 provides the option to lag log replay on passive database copies (configured on a per-copy basis), which delays when logs are replayed into the lagged copy. This delay can protect against events that would otherwise replicate undesirable content to all database copies: you can suspend replay before the logs that contain the undesired content are replayed into the lagged copy.
When replay lag is enabled for a database copy, the log capacity requirements increase accordingly. For example, if you have a 14-day lag configured, you need to provision for 17 days' worth of logs: the 14 days of lagged logs plus the three days of log capacity recommended for a copy without lag. The additional log capacity is required only for the database copy that has the lag configured; other copies of that database, which don't have a lag, have normal (non-lagged) log capacity requirements.
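A sketch of the per-copy calculation follows. The three-day base is an assumption inferred from the guidance to allocate three times the daily log generation capacity. The function names and the daily log volume in the usage example are hypothetical.

```python
def log_days_to_provision(replay_lag_days: int = 0, base_days: int = 3) -> int:
    """Days of logs to provision for one database copy.

    base_days=3 is an assumed baseline for a copy without replay lag.
    """
    return base_days + replay_lag_days


def copy_log_capacity_gb(daily_log_gb: float, replay_lag_days: int = 0) -> float:
    """Log capacity for one copy, given the database's daily log volume."""
    return daily_log_gb * log_days_to_provision(replay_lag_days)


# Hypothetical database generating 10 GB of logs per day
print(log_days_to_provision(14))     # 17 days for the lagged copy
print(copy_log_capacity_gb(10, 14))  # 170 (GB) for the lagged copy
print(copy_log_capacity_gb(10))      # 30 (GB) for each non-lagged copy
```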
For more information, see Understanding High Availability Factors.
The capacity requirements for the LUN will be based on the size of the data set (database, transaction logs, content index, and recovery space) and some additional free space. Most operations management programs have capacity thresholds that provide an alert when a LUN is more than 80 percent utilized.
You can use the following formula to determine the appropriate size of the LUN:
LUN Capacity = Data Size / (1 - Free Space Percentage Requirement)
For example, if you had a data size requirement of 3000 MB and a free space requirement of 20 percent, then the LUN that hosts this data must be 3750 MB in size.
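The LUN capacity formula translates directly into code. The function name is illustrative, and the free space requirement is passed as a fraction (0.20 for 20 percent).

```python
def lun_capacity(data_size: float, free_space_fraction: float) -> float:
    """LUN size that holds data_size while keeping the required free space.

    The result is in the same unit as data_size.
    """
    if not 0 <= free_space_fraction < 1:
        raise ValueError("free_space_fraction must be at least 0 and less than 1")
    return data_size / (1 - free_space_fraction)


# 3,000 MB of data with a 20 percent free space requirement
print(f"{lun_capacity(3000, 0.20):.0f} MB")  # 3750 MB
```

Dividing by (1 - free space) rather than multiplying by (1 + free space) matters: the free space requirement is a percentage of the LUN, not of the data, so adding 20 percent to 3,000 MB would give only 3,600 MB and leave less than 20 percent free.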
To avoid having all your transaction log disk space be consumed, you must first calculate a baseline of your environment to determine the typical log generation rate per day. Second, you must set up monitoring, and take action regarding any alerts that are generated. You should monitor for the following items:
- Transaction log LUN disk space. Set up several thresholds and different alert mechanisms. For example, if you know your typical log generation baseline, you can set up a threshold to report when you are 20 percent over the baseline.
- Successful completion of your backups (if you aren’t leveraging Exchange Native Data Protection).
- Log truncation events in the Application log.
- Your database copy replication health.
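The baseline threshold described in the first item can be sketched as a simple check. This is only an illustration of the comparison; it isn't tied to any particular monitoring product, and the function name and sample values are hypothetical.

```python
def log_growth_alert(observed_mb: float, baseline_mb: float,
                     threshold: float = 0.20) -> bool:
    """True when daily log generation exceeds the baseline by more than threshold."""
    return observed_mb > baseline_mb * (1 + threshold)


# Hypothetical baseline of 100,000 MB of logs per day
print(log_growth_alert(130_000, 100_000))  # True: 30 percent over baseline
print(log_growth_alert(110_000, 100_000))  # False: within 20 percent of baseline
```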
To help troubleshoot unexplained growth in transaction logs, see Manage Database Log Growth by Using the Troubleshoot-DatabaseSpace.ps1 Script in the Shell.