You will use several data points to determine how to size a database logical unit number (LUN), and there are other factors to consider as well. After all factors have been considered and calculated, we recommend that you add an overhead factor of 20 percent to the database LUN. This value accounts for other data that resides in the database and that is not necessarily seen when calculating mailbox sizes and white space. For example, the data structures (tables, views, and internal indexes) within the database add to its overall size. If, after reading the following subsections, you determine that you need 120 gigabytes (GB), we recommend that you provision 144 GB, which includes a 20 percent safety overhead for that storage group's database LUN.
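The 20 percent overhead recommendation can be sketched as a small calculation. The function name and parameter names below are illustrative only and are not part of any Exchange tool.

```python
def provisioned_lun_gb(calculated_gb, overhead=0.20):
    """Return the LUN size to provision, in GB, after adding the safety overhead.

    calculated_gb: capacity derived from the factors in the following subsections.
    overhead: fractional safety overhead (0.20 = 20 percent, per the guidance above).
    """
    return calculated_gb * (1 + overhead)


# The example from the text: 120 GB calculated, 144 GB provisioned.
print(round(provisioned_lun_gb(120), 1))
```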
Mailbox Quota
The first metric to understand is mailbox size. Knowing the amount of data that an end user is allowed to store in his or her mailbox allows you to determine how many users can be housed on the server. Although final mailbox sizes and quotas change, having a goal is the first step in determining your needed capacity. For example, if you have 5,000 users on a server with a 250 megabyte (MB) mailbox quota, you need at least 1.25 terabytes of disk space. If a hard limit is not set on mailbox quotas, it will be difficult to estimate how much capacity you will need.
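As a rough sketch, the quota-based minimum is the user count multiplied by the quota. The helper name is hypothetical, and the conversion assumes decimal units (1 terabyte = 1,000,000 MB), which is what the 1.25-terabyte figure above implies.

```python
def quota_capacity_mb(users, quota_mb):
    """Minimum disk space, in MB, if every user fills the mailbox quota."""
    return users * quota_mb


total_mb = quota_capacity_mb(5000, 250)
# Assumption: decimal units, 1 TB = 1,000,000 MB.
print(total_mb / 1_000_000, "TB")  # 1.25 TB
```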
Database White Space
The database size on the physical disk is not just the number of users multiplied by the user quota. When the majority of users are not near their mailbox quota, the databases will consume less space, and white space is not a capacity concern. The database itself will always have free pages, or white space, spread throughout. During online maintenance, items marked for removal from the database are removed, which frees these pages. The percentage of white space changes constantly: it is highest immediately after online maintenance and lowest immediately before online maintenance.
The size of white space in the database can be approximated by the amount of mail sent and received each day by the users with mailboxes in the database. For example, if you have 100 2-GB mailboxes (a total of 200 GB) in a database where users send and receive an average of 10 MB of mail per day, the white space is approximately 1 GB (100 mailboxes × 10 MB per mailbox = 1,000 MB).
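The white space approximation above amounts to one multiplication. This is a minimal sketch with a hypothetical function name; it assumes online maintenance completes its passes, so white space does not accumulate beyond about one day of mail.

```python
def white_space_mb(mailboxes, daily_mail_mb):
    """Approximate database white space, in MB, as one day of sent and received mail."""
    return mailboxes * daily_mail_mb


# 100 mailboxes at 10 MB of mail per day each.
print(white_space_mb(100, 10))  # 1000 (MB), roughly 1 GB
```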
White space can grow beyond this approximation if online maintenance is not able to complete a full pass. It is important that your operational activities include enough time for online maintenance to run each night, so that a full pass can complete within one week or less.
Database Dumpster
Each database has a dumpster that stores soft-deleted items, including items that have been removed from the Deleted Items folder. By default, these items are stored for 14 days in Microsoft Exchange Server 2007. Compared with Exchange Server 2003, the Exchange 2007 default increases the overhead consumed by the database dumpster because deleted items are now stored for twice as long. The actual amount in the dumpster will depend on the size of each item and your organization's specific retention settings.
After the retention period has passed, these items will be removed from the database during an online maintenance cycle. Eventually, a steady state will be reached where your dumpster size will be equivalent to two weeks of incoming/outgoing mail, as a percentage of your database size. The exact percentage depends on the amount of mail deleted and on individual mailbox sizes.
The dumpster adds a percentage of overhead to the database dependent upon the mailbox size and the message delivery rate for that mailbox. For example, with a constant message delivery rate of 52 MB per week, a 250-MB very heavy profile mailbox would store approximately 104 MB in the dumpster, which adds 41 percent overhead. A 1-GB mailbox storing the same 104 MB in the dumpster adds 10 percent overhead.
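The dumpster overhead figures above can be reproduced with the following sketch. It assumes a steady state in which the dumpster holds exactly one retention period of mail, and the function names are illustrative only.

```python
def dumpster_mb(weekly_mail_mb, retention_days=14):
    """Steady-state dumpster size, in MB, for one mailbox.

    Assumes mail is deleted at the same rate it arrives, so the dumpster
    holds retention_days worth of mail.
    """
    return weekly_mail_mb * retention_days / 7


def dumpster_overhead_pct(mailbox_mb, weekly_mail_mb, retention_days=14):
    """Dumpster size as a percentage of the mailbox size."""
    return 100 * dumpster_mb(weekly_mail_mb, retention_days) / mailbox_mb


print(dumpster_mb(52))                           # 104.0
print(f"{dumpster_overhead_pct(250, 52):.1f}")   # 41.6
print(f"{dumpster_overhead_pct(1024, 52):.1f}")  # 10.2
```

The text above states these percentages as whole numbers (41 percent and 10 percent).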
Actual Mailbox Size
Over time, user mailboxes will reach the mailbox quota, so an amount of mail equivalent to the incoming mail will need to be deleted to remain under the mailbox quota. This means that the dumpster will increase to a maximum size equivalent to two weeks of incoming/outgoing mail. If the majority of users have not reached the mailbox quota, only some of the incoming/outgoing mail will be deleted, so the growth will be split between the dumpster and the increase in mailbox size. For example, a 250-MB very heavy message profile mailbox that receives 52 MB of mail per week (with an average message size of 50 kilobytes (KB)) would result in 104 MB in the dumpster (41 percent) and 7.3 MB in white space, for a total mailbox size of approximately 361 MB. Another example is a 2-GB very heavy message profile mailbox that receives 52 MB of mail per week, which results in 104 MB in the dumpster (5 percent) and 7.3 MB in white space, for a total mailbox size of 2.11 GB. Fifty such 2-GB mailboxes in a storage group total approximately 105 GB.
The following is the formula for actual mailbox size, worked through for a mailbox with a 2-GB quota:
Mailbox Size = Mailbox Quota + White Space + (Weekly Incoming Mail × 2)
Mailbox Size = 2,048 MB + (7.3 MB) + (52 MB × 2)
2,159 MB = 2,048 MB + 7.3 MB + 104 MB (5 percent larger than the quota)
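The same formula can be expressed as a short function, which makes it easy to try other quotas and message profiles. The function and parameter names are assumptions made for this sketch.

```python
def actual_mailbox_size_mb(quota_mb, white_space_mb, weekly_mail_mb, retention_weeks=2):
    """Mailbox Size = Mailbox Quota + White Space + (Weekly Incoming Mail x retention weeks)."""
    return quota_mb + white_space_mb + weekly_mail_mb * retention_weeks


size_2gb = actual_mailbox_size_mb(2048, 7.3, 52)
size_250mb = actual_mailbox_size_mb(250, 7.3, 52)

print(round(size_2gb, 1))    # 2159.3
print(round(size_250mb, 1))  # 361.3
```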
After you have determined the projected actual mailbox size, you can use that value to determine the maximum number of users per database. Take the maximum recommended database size, and divide it by the projected mailbox size. This will also help you determine how many databases you will need to handle the projected user count, assuming fully populated databases. Be aware that, because of non-transactional input/output (I/O) or hardware limitations, you may have to modify the number of users placed on a single server. Some administrators prefer to use more databases to further reduce the size of each database. This approach can shorten backup and restore windows at the cost of more complexity in managing more databases per server.
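As a sketch of this step, the number of mailboxes that fit in one database is the maximum database size divided by the projected mailbox size, rounded down. The 100-GB maximum database size and the 5,000-user count below are placeholders chosen for illustration, not values given in this section; substitute the figures for your deployment.

```python
import math


def users_per_database(max_db_size_mb, mailbox_size_mb):
    """Whole number of mailboxes that fit in one fully populated database."""
    return math.floor(max_db_size_mb / mailbox_size_mb)


def databases_needed(total_users, users_per_db):
    """Number of databases required to host all users."""
    return math.ceil(total_users / users_per_db)


MAX_DB_SIZE_MB = 100 * 1024  # assumed placeholder: 100 GB per database
MAILBOX_SIZE_MB = 2159.3     # projected size of a 2-GB quota mailbox

per_db = users_per_database(MAX_DB_SIZE_MB, MAILBOX_SIZE_MB)
print(per_db)                          # 47
print(databases_needed(5000, per_db))  # 107
```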
Content Indexing
Content indexing creates an index, or catalog, that allows users to easily and quickly search through their mail items rather than manually search through the mailbox. Exchange 2007 creates an index that is about 5 percent of the total database size, which is placed on the same LUN as the database. An additional 5 percent capacity needs to be factored into the database LUN size for content indexing.
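A minimal sketch of how the content index might be folded into the database LUN size follows. It assumes the 20 percent safety overhead is applied last, after all other factors have been added, and the function name is hypothetical.

```python
def database_lun_gb(database_gb, index_fraction=0.05, overhead=0.20):
    """Database LUN size in GB: database plus content index, then the safety overhead.

    Assumption: the safety overhead is applied after the index is added.
    """
    with_index = database_gb * (1 + index_fraction)
    return with_index * (1 + overhead)


# A 100-GB database needs about 5 GB for the index, then 20 percent on top.
print(round(database_lun_gb(100), 1))
```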
Maintenance
A database that needs to be repaired or compacted offline will need capacity equal to the size of the target database plus 10 percent. Whether you allocate enough space for a single database, a storage group, or a backup set, additional space will need to be available to perform these operations.
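The working space for an offline repair or compaction of a single database can be estimated as follows. The function name is illustrative, and the sketch covers only the single-database case.

```python
def offline_maintenance_gb(target_db_gb, extra_fraction=0.10):
    """Free capacity, in GB, needed to repair or compact one database offline."""
    return target_db_gb * (1 + extra_fraction)


# A 100-GB target database needs about 110 GB of available space.
print(round(offline_maintenance_gb(100), 1))
```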
Recovery Storage Group
If you plan to use a recovery storage group as part of your disaster recovery plans, enough capacity will need to be available to handle all of the databases you want to be able to simultaneously restore on that server.
Backup to Disk
Many administrators perform streaming online backups to a disk target. If your backup and restore design involves backup to disk, enough capacity needs to be available on the server to house the backup. Depending on the backup type you use, this capacity can range from the size of the database and its logs to the size of the database and all logs generated since the last full backup.