Exchange Q & A: Site Resiliency with SCR, Recommended Root Volume Size, and More
Q I'm currently architecting an Exchange Server 2007-based messaging infrastructure for a large enterprise requiring that the solution include site resiliency. So I'm considering deploying cluster continuous replication (CCR)-based Mailbox servers on Windows Server 2008 failover clusters with the active node in the primary (active) datacenter and the passive node in the secondary (backup) datacenter.
The datacenters are on different subnets but belong to the same Active Directory site, which means this is a supported scenario when installing the CCR-based Mailbox servers on Windows Server 2008. Would you recommend deploying CCR-based Mailbox servers in a multi-subnet environment?
A You're correct that this scenario is supported as long as you stretch the Active Directory site between the datacenters, as Exchange 2007 doesn't support deploying the active and passive CCR nodes in different Active Directory sites. Nevertheless, it's important to bear in mind that CCR was really designed for high availability within a datacenter, not for site resiliency (which is more about disaster recovery than high availability).
For this specific reason, the Exchange Product group introduced standby continuous replication (SCR) with Exchange 2007 SP1. SCR uses the same log-shipping and replay technology as CCR but, unlike CCR, does not rely on Windows failover clustering. This means that you can enable replication for both clustered and non-clustered Mailbox servers to an SCR target. That target can consist of a non-clustered Mailbox server or a standby cluster on which the passive Mailbox server role has been installed.
This is illustrated in Figure 1, which is part of the Standby Continuous Replication section in the Exchange 2007 documentation.
I don't recommend implementing a geographically dispersed cluster (geo-cluster) based on CCR, but it is, as already mentioned, fully supported by Microsoft. However, you need to be aware of the disadvantages of following this approach.
Figure 1 Standby continuous replication model
First, you must use a stretched Active Directory site. It seems that you already have this one settled, but it's worth mentioning for the rest of the readers out there who may not know this. Next, you should be aware that because the CCR nodes are located on different subnets, the clustered Mailbox server (CMS) will change its IP address during a failover to the other site. This change will need to be replicated to all DNS servers used by Microsoft Outlook clients and—just as important—all machines with Outlook clients connecting to the CMS must have their DNS cache flushed.
The Outlook clients will be disconnected from the respective mailboxes on the CMS during the period that this takes place. For this reason, you should change the DNS time-to-live (TTL) value for the CMS network name resource to five minutes (300 seconds) or less. For more on how this is done, refer to technet.microsoft.com/library/bb676687.aspx.
You must also take into consideration where you want to place the file share witness (FSW). The recommendation is to create the FSW on a Hub Transport server, but in which site? If you locate it on a Hub Transport server in the primary datacenter and the primary datacenter is lost, will an automatic failover happen? No, this is not possible if one of the CCR nodes and the FSW are down at the same time. On the other hand, it's likely that many would prefer the failover not to be automatic in a geo-CCR scenario. Be sure to read the guidance related to placement of the FSW in a geo-CCR scenario.
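The vote counting behind the FSW placement question can be illustrated with a minimal Python sketch. It assumes a simplified model in which the two CCR nodes and the FSW each hold one vote and the cluster needs a strict majority of the three; the function names are hypothetical and are not part of any Exchange or Windows API.

```python
# Simplified model of a two-node CCR cluster with a file share witness (FSW).
# Assumption: each node and the FSW hold one vote, and the cluster needs a
# strict majority of all three votes to stay online.

TOTAL_VOTES = 3  # active node + passive node + FSW


def has_quorum(votes_online: int, total_votes: int = TOTAL_VOTES) -> bool:
    """Return True when a strict majority of the votes is reachable."""
    return votes_online > total_votes // 2


def automatic_failover_possible(active_up: bool, passive_up: bool, fsw_up: bool) -> bool:
    """The passive node can take over only if it is up and quorum is kept."""
    votes_online = sum([active_up, passive_up, fsw_up])
    return passive_up and has_quorum(votes_online)


# FSW hosted in the primary datacenter, primary datacenter lost:
fsw_in_primary = automatic_failover_possible(active_up=False, passive_up=True, fsw_up=False)

# FSW hosted in the backup datacenter, primary datacenter lost:
fsw_in_backup = automatic_failover_possible(active_up=False, passive_up=True, fsw_up=True)

print(fsw_in_primary, fsw_in_backup)
```

With the FSW in the lost primary datacenter, only one of three votes remains, so the sketch reports that no automatic failover is possible. It models vote counting only and is not a placement recommendation.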
Finally, you should be aware that if a failover to the passive node in the backup datacenter occurs, any backfill from the Transport Dumpster on Hub Transport servers in the primary datacenter will not happen since these servers most likely are down. The net result is missing e-mail.
As you can see, there are many things to consider before you deploy a geo-CCR in your environment, far more than if you instead choose to use SCR for site resiliency.
Q Our organization consists of multiple physical sites spread across the U.S. as well as Europe, the Middle East, and Africa. Today all user mailboxes are hosted on Mailbox servers deployed locally at each physical site, but we want to consolidate those servers to one datacenter located in the U.S. Given this plan, can you tell us the maximum latency for Outlook 2003/2007 clients running in cached mode?
A It's good to hear all your clients are either Outlook 2003 or 2007 and run in cached mode, because cached mode is typically your lifesaver in consolidations such as the one you describe.
In the past, Microsoft recommended 1,000 ms or less end-to-end latency to the home Mailbox server for Outlook 2003 cached-mode clients. For non-cached-mode Outlook 2003 clients, the recommendation was 200 ms or less.
In my experience, 1,000 ms is a little high even for Outlook 2007 cached-mode clients. Above 500 ms, Outlook starts to hang and generally becomes sluggish. My recommendation is to strive for latency of less than 500 ms between the Outlook client and the home Mailbox server. In Figure 2, you can see the average response time for my Outlook 2007 client connected to a mailbox in Redmond.
Here's a tip: to open the Connection Status window shown in Figure 2, hold down the CTRL key while right-clicking the Outlook icon in the system tray, and then select Connection Status. Alternatively, you can launch Outlook by clicking Start | Run and entering Outlook.exe /RPCDIAG.
Figure 2 Average response time in Outlook 2007
Considering that I'm located in Denmark, the average response times are acceptable. You can see the average response times from the global catalog server are drastically higher, but since these are used less frequently (for address book lookups and so forth), it doesn't matter that much.
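The latency figures in this answer can be collected into a small Python sketch for checking measurements from remote sites. The thresholds come from the text above (the 500 ms target is my own rule of thumb, the 1,000 ms and 200 ms values are the older Microsoft guidance for Outlook 2003); the function name and the returned labels are hypothetical.

```python
# Thresholds taken from the answer above; labels are illustrative only.
CACHED_TARGET_MS = 500          # suggested target for cached-mode clients
CACHED_HISTORIC_MAX_MS = 1000   # older guidance, Outlook 2003 cached mode
ONLINE_HISTORIC_MAX_MS = 200    # older guidance, Outlook 2003 non-cached mode


def latency_guidance(latency_ms: float, cached_mode: bool = True) -> str:
    """Classify an end-to-end latency to the home Mailbox server."""
    if latency_ms < 0:
        raise ValueError("latency cannot be negative")
    if not cached_mode:
        return "ok" if latency_ms <= ONLINE_HISTORIC_MAX_MS else "too high"
    if latency_ms < CACHED_TARGET_MS:
        return "ok"
    if latency_ms <= CACHED_HISTORIC_MAX_MS:
        return "sluggish"  # within the older guidance, but expect hangs
    return "too high"


# Example measurements (made-up values, in milliseconds)
for site, ms in [("site-a", 180), ("site-b", 650), ("site-c", 1200)]:
    print(site, latency_guidance(ms))
```

Feeding in the average response times from the Connection Status window for a few pilot users at each site gives a quick first impression of which locations may struggle after the consolidation.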
Q We have just deployed Exchange 2007 SP1 in our organization. The Active Directory topology consists of two sites. In the first site, we have deployed a CCR-based Mailbox server and two servers each with the Hub Transport and Client Access server roles installed. In the second site, we have deployed a single copy cluster (SCC)-based Mailbox server and two servers with the Hub Transport and Client Access server roles just as in the first Active Directory site.
The majority of clients in each Active Directory site still run Outlook 2003, which means a public folder store is required for free/busy information and the offline address book (public folders will be used only for this purpose, not as a data repository). Our initial plan was to mount a public folder store on both the CCR- and the SCC-based Mailbox servers so public folder changes could be replicated between the Active Directory sites. But then we heard that Microsoft doesn't recommend having more than one public folder store in an Exchange organization if one is hosted on a CCR-based Mailbox server, because public folder replication and CCR replication don't behave well together.
So with all of this in mind, what would you recommend that we do? We don't want to go down a path not recommended by Microsoft. But at the same time, we don't want Outlook 2003 clients on each Active Directory site to contact the public folder store on the other Active Directory site.
A This is a very good question. You're correct that combining public folder replication and CCR replication for a public folder store should be avoided (details on why you should avoid it can be found in my January 2009 Exchange Queue & A column). Since you will use public folder functionality only to enable legacy Outlook clients to perform free/busy lookups and download the offline address book (OAB), I recommend you install the Mailbox server role on one of the combined Hub Transport and Client Access servers in the Active Directory site where the CCR-based Mailbox server resides. Then move the public folder store from the CCR-based Mailbox server to this server.
Bear in mind, though, that for public folder replication to work properly, you must also have a mailbox database running on the server. Using the combined Hub Transport and Client Access server for free/busy lookups and the OAB will not put a big performance burden on the box, and it is what most Exchange architects do in such a scenario.
Q We are currently planning the storage layout for the Mailbox servers that will be part of the Exchange 2007 messaging environment we plan to transition to within the next six months. Since our enterprise consists of thousands of users, we plan to have 48 storage groups on each Mailbox server, all of which will be based on CCR technology.
Because of the number of storage groups, we will, of course, make use of mount points. Since we will create a mount point for each LUN on each of the CCR-based Mailbox servers, we are interested in hearing what the recommended minimum size is for the anchor drives where the mount points will be created.
A The Knowledge Base article "How to Configure Volume Mount Points on a Server Cluster in Windows Server 2008" states that if the root (host) volume, also known as the anchor LUN, is used extensively for mount points, it must be at least 5 MB. But the recommendation for root volumes is different when it comes to Exchange.
In Exchange 2007, as well as previous versions of Exchange Server, it is a best practice to use a root volume between 100 MB and 500 MB. When running Exchange 2007 on Windows Server 2003-based servers, you can have problems creating a new database if the root volume has less than 20 MB of free space. See "Event 104 is logged after you create a new database in Exchange Server 2007" for more information about this specific issue.
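As a rough illustration, the sizing guidance above can be expressed as a small Python check. The 100 MB to 500 MB range and the 20 MB free-space floor come from this answer; the function name and the wording of the findings are hypothetical.

```python
from typing import List

MB = 1024 * 1024
MIN_ROOT_MB = 100   # lower bound of the best-practice range
MAX_ROOT_MB = 500   # upper bound of the best-practice range
MIN_FREE_MB = 20    # below this, database creation can fail on Windows Server 2003


def check_root_volume(total_bytes: int, free_bytes: int) -> List[str]:
    """Return a list of findings; an empty list means the volume looks fine."""
    findings = []
    total_mb = total_bytes / MB
    free_mb = free_bytes / MB
    if total_mb < MIN_ROOT_MB:
        findings.append(f"root volume is {total_mb:.0f} MB, below {MIN_ROOT_MB} MB")
    elif total_mb > MAX_ROOT_MB:
        findings.append(f"root volume is {total_mb:.0f} MB, above {MAX_ROOT_MB} MB")
    if free_mb < MIN_FREE_MB:
        findings.append(f"only {free_mb:.0f} MB free, below {MIN_FREE_MB} MB")
    return findings


# Example with made-up numbers: a 200 MB anchor volume with 10 MB free.
print(check_root_volume(200 * MB, 10 * MB))
```

On a live server, the two arguments could be taken from the `total` and `free` fields returned by Python's `shutil.disk_usage()` for the anchor drive's path.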
Q Our organization's messaging infrastructure is based on Exchange 2007 SP1. All Exchange 2007 servers are located in the same datacenter and are based on CCR clustering technology, so we have local redundancy. But we have just set up a second datacenter to act as the backup in case our primary datacenter is lost during a disaster.
For Exchange, we want to deliver site resiliency by deploying one Hub Transport, one Client Access, and one Mailbox server in the backup datacenter. We then want to enable SCR from the CCR-based Mailbox servers in the primary datacenter to the Mailbox server in the backup datacenter. Before we start to build the Exchange 2007 servers in the backup datacenter, we have a question we hope you can answer. We want to know whether it's supported to enable SCR between SCR sources consisting of CCR-based Mailbox servers and an SCR target consisting of a standalone Mailbox server. And if this is the case, would you recommend that we follow this approach?
A The short answer is yes, this is fully supported. The long answer, however, is also yes, but you must remember that you can't recover the CMS located on the SCR source server(s) using the /RecoverCMS switch. You can only recover a CMS using the /RecoverCMS switch when the SCR target is a Windows Server 2003/2008 failover cluster on which only the passive Mailbox server role has been installed.
If you deploy a standalone Mailbox server as the SCR target, you'll have to recover the databases using database portability, a procedure that's more cumbersome than the RecoverCMS method. You can find the necessary steps in the Exchange 2007 documentation in the article "Standby Continuous Replication: Database Portability." If you have Windows Server 2003/2008 Enterprise edition installed on the SCR target, you could also remove the Mailbox server role from it, form a failover cluster, install the passive Mailbox server role on one of the failover cluster nodes, and then recover your CMS using the RecoverCMS option.
If you have only one physical machine available right now but will have an extra machine (plus an additional license for Windows Server 2003/2008 Enterprise edition and Exchange Server 2007 Enterprise edition) in the near future, you could also start by forming a Windows Server 2003/2008 failover cluster consisting of a single node, and then install the passive Mailbox server role on this node. When you have that extra machine ready, you could install Windows Server 2003/2008 on it and configure the network settings and the like and then finally add it to the Windows failover cluster.
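The recovery options described in this answer boil down to a simple decision, sketched here in Python. The enum, the function name, and the `has_enterprise_edition` flag are hypothetical and serve only to summarize the text; they do not correspond to any Exchange cmdlet or setup parameter.

```python
from enum import Enum


class ScrTargetType(Enum):
    STANDALONE_MAILBOX_SERVER = "standalone Mailbox server"
    STANDBY_CLUSTER = "failover cluster with passive Mailbox role"


def recovery_method(target: ScrTargetType, has_enterprise_edition: bool = False) -> str:
    """Summarize which recovery path is available for a given SCR target."""
    if target is ScrTargetType.STANDBY_CLUSTER:
        return "RecoverCMS"
    if has_enterprise_edition:
        # Remove the Mailbox role, form a failover cluster, install the
        # passive Mailbox role, and then recover the CMS.
        return "rebuild as standby cluster, then RecoverCMS"
    return "database portability"


print(recovery_method(ScrTargetType.STANDALONE_MAILBOX_SERVER))
print(recovery_method(ScrTargetType.STANDBY_CLUSTER))
```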
Henrik Walther is a Microsoft Certified Master: Exchange 2007 and Exchange MVP with more than 15 years of experience in the IT business. He works as a Technology Architect for Trifork Infrastructure Consulting (a Microsoft Gold partner based in Denmark) and as a Technical Writer for Biblioso Corporation (a U.S.-based company specializing in managed documentation and localization services).