Exchange Queue & A: Recovering a Clustered Mailbox Server, Offline Address Book Issues, and More

Henrik Walther

Q: Our messaging infrastructure is based on Exchange 2007 SP1. All Exchange 2007 SP1 servers have been installed on Windows Server 2008. We have two datacenters: the primary datacenter and a backup to which we can fail over should a disaster strike the primary. In our primary datacenter, all Mailbox servers are based on cluster continuous replication (CCR) in order to provide a local high availability solution. For Mailbox server failovers from the primary datacenter to the backup datacenter, we use standby continuous replication (SCR). This means all the CCR-based clustered Mailbox servers (CMSs) in the primary datacenter also act as SCR sources. Each SCR source has corresponding SCR targets in the backup datacenter in the form of standby clusters on which only the passive Mailbox server role has been installed.

Recently we did a site failover test between the two datacenters and, unfortunately, we ran into an issue when we tried to recover the CMSs to the standby clusters. When running Setup.com with the /RecoverCMS switch, we got the error message shown in Figure 1.

Figure 1 Setup error when recovering CMS to a standby cluster

I was wondering if you have seen this error while recovering a CMS to a standby cluster and, more importantly, whether you have a resolution for it.

A: Yes, I had the misfortune of encountering this issue while trying to recover a CMS to a standby cluster. Luckily, this was also during a site-level failover test. (Do I need to explain why testing your failover solutions is important?)

One thing that got me thinking was that I had tested the same setup many times before without issues. However, all the previous recovery tests were with Exchange 2007 SP1 installed on Windows Server 2003 and not Windows Server 2008 as was the case when I hit this issue.

This led me to look at how Windows Server 2008 failover clusters work compared to Windows Server 2003-based clusters. In Windows Server 2003, you created a dedicated cluster service account for the cluster. In Windows Server 2008, you no longer do this; instead, the failover cluster runs under the Local System account. After examining the application and system logs on the standby cluster on which I tried to recover the CMS, I found the error shown in Figure 2.

Figure 2 Recovery error due to inadequate permissions

This event explains that the Windows failover cluster doesn't have the permissions necessary to update the CMS computer account in Active Directory. It also lists three possible reasons. Since we're recovering an existing CMS on a standby cluster, we can ignore the first one. Since we haven't reached any quotas for the number of computer objects, we can ignore the second as well. The last item, however, is quite interesting. It tells us to verify that the Windows failover cluster on which we recover the CMS has "Full Control" permissions to the CMS computer account object.

A look at the Security tab on the property page of the CMS computer object in Active Directory Users and Computers reveals that the standby cluster does not have "Full Control" permissions (Figure 3).

Figure 3 The standby cluster does not have “Full Control” permissions

Adding the standby cluster with "Full Control" permissions to the CMS computer object resolved the issue for me and it should do the same in your environment.
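
If you would rather script the permission change than use Active Directory Users and Computers, the following is a minimal sketch for the Exchange Management Shell. It assumes that the Add-ADPermission cmdlet is an acceptable way to set this permission in your environment. The distinguished name and the CONTOSO\STANDBYCLUS$ account (standing in for the standby cluster's computer account) are hypothetical placeholders, not values from the scenario above.

```powershell
# Hypothetical names: replace with the CMS computer object and the
# standby cluster's computer account in your own forest.
$cmsObject      = 'CN=CMS01,CN=Computers,DC=contoso,DC=com'
$standbyCluster = 'CONTOSO\STANDBYCLUS$'

# Grant the standby cluster Full Control (GenericAll) on the CMS computer object.
Add-ADPermission -Identity $cmsObject -User $standbyCluster -AccessRights GenericAll

# List the resulting access control entries for that account as a sanity check.
Get-ADPermission -Identity $cmsObject -User $standbyCluster | Format-List
```

Allow time for Active Directory replication to reach the domain controller used by the standby cluster before rerunning the Setup.com /RecoverCMS command that failed.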

At the time of this writing (the end of February), there's no information about this issue in public sources such as TechNet or in any Knowledge Base articles. However, my good friend Tim McMichael from Microsoft Customer Support Services has written a blog post on this topic that goes into far more detail than I'm able to do here. So please go check out Tim's blog for more information ("Permissions recommended for the CNO (Cluster Name Object) in Windows 2008 for Exchange 2007 SP1 setup operations.").

Q: We're currently in the process of crafting a site-level failover solution. For our Exchange 2007 SP1-based messaging infrastructure, we're going to use standby continuous replication (SCR) as the disaster recovery solution between our primary and backup datacenters. Since only some of our end users have been upgraded to Office Outlook 2007 with the rest still on Outlook 2003, we've got a question. When a failover of the Exchange 2007 SP1 servers occurs from the primary datacenter to the backup datacenter, will both Outlook versions simply pick up the changes after we perform the required SCR site failover steps?

A: Very good question. The answer depends on whether you're using RecoverCMS or database portability to fail over your Mailbox servers to the backup datacenter. If you have standalone Mailbox servers in the primary datacenter and replicate these to standalone Mailbox servers in the backup datacenter using SCR, you would use database portability to fail over the Mailbox databases. If you have single copy cluster (SCC) or CCR Mailbox servers in your primary datacenter and standby clusters in your backup datacenter, you would use the RecoverCMS switch to recover the whole CMS to the backup datacenter. When using RecoverCMS as the failover mechanism, you typically don't need to worry about Outlook client connectivity after the failover. Do bear in mind that the IP address of the CMS will change. Even if you have configured the DNS Time to Live (TTL) value to five minutes according to best practice recommendations, there will be a slight delay before the Outlook clients are able to reconnect to the CMS.

If you're using database portability as the recovery mechanism, the situation is a bit different, depending on the Outlook client version. Outlook 2007 clients will reflect the changes automatically via the Autodiscover service that runs on the Client Access servers. This means you don't have to make any manual changes for this Outlook version. However, that's not necessarily the case with Outlook 2003 clients. When a mailbox has been recovered on another server, the name of the server storing the Mailbox database(s) will obviously be different.

You might wonder, does this matter when you use the Move-Mailbox cmdlet with the -ConfigurationOnly switch after the failover? Yes, it still matters because Outlook 2003 doesn't support the Autodiscover service. This means that the original server where the mailboxes were stored before the failover must be online so that the server name in the Outlook MAPI profile can be updated. If the original server is offline, the server name can't be updated automatically.
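
For reference, a configuration-only move of this kind typically looks something like the sketch below. The server, storage group, and database names are hypothetical, and the command assumes the database on the SCR target has already been made mountable and mounted.

```powershell
# Hypothetical database identities in Server\StorageGroup\Database form.
$sourceDb = 'MBX01\SG1\MDB1'   # database on the failed source server
$targetDb = 'MBX02\SG1\MDB1'   # database mounted on the SCR target

# Re-home the mailbox objects in Active Directory without moving any data.
# The two system mailbox types are filtered out so they are not re-homed.
Get-Mailbox -Database $sourceDb |
    Where-Object { $_.ObjectClass -notmatch '(SystemAttendantMailbox|ExOleDbSystemMailbox)' } |
    Move-Mailbox -ConfigurationOnly -TargetDatabase $targetDb
```

This only updates the directory. It does not touch the server name stored in an Outlook 2003 MAPI profile.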

So, if you're facing a disaster where all servers in your primary datacenter are offline, you must reconfigure the Outlook 2003 MAPI profiles using a tool such as the Microsoft Exchange Server Profile Redirector (ExProfre) in combination with a login script to reflect the changes. It's worth noting that if all your clients were located in the primary datacenter, you would need to rebuild them anyway.

Q: In our Exchange 2007 SP1-based messaging infrastructure, all our Mailbox servers are cluster continuous replication (CCR)-enabled. We have installed four network interface cards (NICs) in each cluster node. Two NICs have been teamed and are connected to the public network, which accepts Outlook client requests and so forth. The third NIC is used for the heartbeat network between the two cluster nodes in the CCR. The fourth NIC is there specifically for log shipping purposes. Using the Enable-ContinuousReplicationHostName cmdlet introduced in Exchange 2007 SP1, we have (in order to achieve log shipping redundancy) specified that both the heartbeat and the dedicated log shipping network can be used to ship log files from the active to the passive node. This works great and really reduces the traffic on the public network, especially in situations where a reseed of one or more Mailbox databases is required (though this should be pretty rare).

We also have SCR enabled between these CCR-based Mailbox servers and multiple SCR targets in our backup datacenter. This leads to our question. Is it possible to use the Enable-ContinuousReplicationHostName cmdlet with SCR?

A: I'm glad that the Enable-ContinuousReplicationHostName cmdlet has been helpful to you. However, since this cmdlet was specifically created for CCR solutions, the answer to your question is, unfortunately, no. Currently this is not supported in an SCR solution.
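
For readers who haven't used the cmdlet in a CCR deployment, here is a minimal sketch of the kind of command described in the question. The node name, the additional host name, and the IP address are all hypothetical. The command would be repeated for each cluster node and for each additional network you want to use for log shipping.

```powershell
# Hypothetical values: NODE1 is a CCR cluster node, NODE1-REPL is the extra
# host name to create for log shipping, and the address belongs to the
# dedicated replication network on that node.
Enable-ContinuousReplicationHostName -Identity 'NODE1' -TargetName 'NODE1-REPL' -TargetIPAddress 10.0.1.11
```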

Q: We have just transitioned from Exchange 2003 to Exchange 2007 SP1. All Exchange 2007 SP1 server roles are running on Windows Server 2008 and our Exchange 2007 Mailbox servers are based on CCR.

Things have worked very well so far, but we have observed an issue with the Offline Address Book (OAB). When it's updated with new mail objects, the updates aren't reflected in Outlook 2007 for the end users. We have been troubleshooting the issue and have found Event ID 1021 in the Application log on the Client Access Servers with the following description:

Process MSExchangeFDS.exe (PID=xxxx). Could not find directory <OAB share location> This is normal if the directory has never been generated. Otherwise, make sure this directory and share has read permission for the "Exchange Servers" group.

We have tried to copy the OAB manually from the CCR-based Mailbox server where it is hosted to the Client Access Server. This results in updates in Outlook, but we would like to get the issue fixed permanently. Do you have the recipe?

A: I've been down that road, too. The problem is caused by the way Windows Server 2008 failover clusters behave. Windows Server 2008 failover clusters introduce a new concept called file share scoping. Basically, share scoping means that a file share is specific to either the node name or to one of the cluster name objects hosted on the cluster. When a share is scoped to the node name, it cannot be accessed by the Clustered Mailbox Server (CMS) name. For more geeky details about file share scoping, see this post on the Ask the Core Team blog ("File Share 'Scoping' in Windows Server 2008 Failover Clusters").

To resolve the issue, you need to install Exchange 2007 SP1 Rollup Update 5 or later, which includes the required bug fix. Also see the article "Exchange 2007 CAS cannot copy the OAB from the OAB share on Windows Server 2008-based Exchange 2007 CCR clusters." Because this Rollup Update brings some regressions with it, it's important you read the Rollup Update 5 KB article closely before using this solution.
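
Once the rollup is installed, you may not want to wait for the next scheduled cycle to see whether the fix has taken effect. As a rough sketch, assuming an OAB named "Default Offline Address List" and a Client Access server named CAS01 (both hypothetical), the OAB can be regenerated and the Client Access server asked to fetch it again from the Exchange Management Shell:

```powershell
# Regenerate the OAB on the Mailbox server that hosts OAB generation.
Update-OfflineAddressBook -Identity 'Default Offline Address List'

# Ask the File Distribution Service on the Client Access server to poll for
# the updated OAB files instead of waiting for its regular polling interval.
Update-FileDistributionService -Identity 'CAS01' -Type OAB
```

If no new Event ID 1021 entries appear in the Application log on the Client Access server afterward, that suggests the OAB share is now reachable.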

Henrik Walther is a Microsoft Certified Master: Exchange 2007 and Exchange MVP with more than 15 years of experience in the IT business. He works as a Technology Architect for Trifork Infrastructure Consulting (a Microsoft Gold partner based in Denmark) and as a technical writer for Biblioso Corporation (a US-based company that specializes in managed documentation and localization services).