When migrating from one group to another, updates aren’t always automatic, nor are they always desired.
Q. We’re planning to deploy Exchange Server 2010. Company policy states that all new servers must be virtual machines (VMs) running under Hyper-V. In order to increase application and service availability, all Hyper-V root servers participate in Hyper-V Failover Clusters.
We recently saw this blog post on the Microsoft Exchange Team blog announcing enhanced hardware virtualization support for Exchange 2010. We especially found the following statement interesting:
“Combining Exchange 2010 high availability solutions (database availability groups [DAGs]) with hypervisor-based clustering, high availability, or migration solutions that will move or automatically failover mailbox servers that are members of a DAG between clustered root servers, is now supported.”
This is good news for us, as it means we now can deploy Exchange 2010 Mailbox servers participating in a DAG as VMs on a Hyper-V Failover Cluster. Is there anything else we should be aware of in doing this?
A. You certainly timed your Exchange 2010 deployment well. You’re absolutely correct that Microsoft now supports combining DAGs with host-based failover clustering and migration technology. This not only applies to Hyper-V, but any hardware virtualization vendor participating in the Windows Server Virtualization Validation Program (SVVP).
However, there are a couple of important points to keep in mind if you plan on live migrating DAG members. You need Exchange 2010 SP1 or later for this to be supported. It’s also highly recommended to use cluster shared volumes and not pass-through disks. When the Exchange Product group and Hyper-V team tested each method, they noticed the offline time associated with moving the storage resources was cut in half by using cluster shared volumes.
If the server offline time exceeds five seconds, the DAG node will be evicted from the cluster. So it’s important to make sure that the Hyper-V Failover Cluster can migrate resources in less than five seconds. If it can’t, adjust the cluster heartbeat timeout to a higher value. You shouldn’t set it higher than 10 seconds, though.
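The timing rule above comes down to simple arithmetic. The following is a minimal sketch, not a supported tool. It assumes the effective heartbeat timeout is the heartbeat interval multiplied by the number of missed heartbeats tolerated (on Windows Server 2008 R2 failover clusters these are the SameSubnetDelay and SameSubnetThreshold properties, with defaults of 1,000 ms and 5). The function names and the measured offline times are hypothetical.

```python
DEFAULT_DELAY_MS = 1000   # assumed default heartbeat interval (SameSubnetDelay)
DEFAULT_THRESHOLD = 5     # assumed default missed-heartbeat count (SameSubnetThreshold)
MAX_TIMEOUT_MS = 10_000   # upper limit recommended in the answer above


def heartbeat_timeout_ms(delay_ms=DEFAULT_DELAY_MS, threshold=DEFAULT_THRESHOLD):
    """Effective timeout before a silent node is evicted, in milliseconds."""
    return delay_ms * threshold


def recommend_threshold(offline_ms, delay_ms=DEFAULT_DELAY_MS):
    """Smallest threshold whose timeout is longer than the measured offline time.

    Never goes below the default, and returns None when the required timeout
    would exceed the 10-second limit, which means the migration itself has
    to be made faster instead.
    """
    needed = max(DEFAULT_THRESHOLD, offline_ms // delay_ms + 1)
    if needed * delay_ms > MAX_TIMEOUT_MS:
        return None
    return needed


# Hypothetical measured offline times during a live migration, in milliseconds
for measured in (3000, 7200, 10000):
    print(measured, recommend_threshold(measured))
```

A result of None is the signal to look at storage (cluster shared volumes) and the live migration network rather than at the timeout.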
You should also make sure you’ve applied the latest Hyper-V-related patches to the Hyper-V host servers. On the live migration network, make sure jumbo frames are enabled and that the switches involved support them. It’s also recommended that you change the receive buffer on each Hyper-V host to 8192. Finally, deploy as much bandwidth as possible for the live migration network (5 Gbps or greater is preferable).
It’s important to configure each VM so that it doesn’t save and restore state on disk when moved or taken offline. Any failover must result in a cold boot when the VM is activated on the target node. A planned migration must either include a shutdown and cold boot or use an online migration technology such as Hyper-V live migration. The cluster-controlled offline action is set to “Save State” by default. Instead, it must be set to “Shut down” (see Figure 1) so that the server cold boots, rather than resuming from a saved state, when it’s taken offline and brought up on another Hyper-V node.
Figure 1: Cluster-controlled offline action set to “Shut down.”
The recently released “Best Practices for Virtualizing Exchange Server 2010 with Windows Server 2008 R2 Hyper-V” white paper includes a lot of good information about Exchange 2010 virtualization using Hyper-V. And, last but not least, the Exchange 2010 documentation on TechNet has been updated to reflect this new support stance.
Q. We’ve deployed Exchange 2010, and have a total of 12 Client Access Server (CAS) roles. We use an active/passive user distribution datacenter model. This means we have a primary datacenter and a failover datacenter. There are six CAS roles in each datacenter. We use a hardware-based load balancer solution to load balance client access traffic in each datacenter. We have approximately 50,000 users, with most connecting to their mailbox using internal Outlook 2007/2010 clients.
We recently saw this blog post on the Exchange Team Blog, which recommends all customers enable Kerberos authentication for internal Outlook clients and explains why. Because of this recommendation, we’re planning to enable Kerberos authentication for the internal Outlook clients in our organization.
Although we’ve read the blog post in detail and checked out the relevant Exchange 2010 documentation on TechNet, we’re still not sure which fully qualified domain names (FQDNs) we need to register as service principal names (SPNs). What about the Autodiscover FQDN? Would we need to register this FQDN as well?
A. I can understand why you’re a little confused about the Autodiscover FQDN. Most customers have an Autodiscover FQDN, usually named autodiscover.(companyname).com. This FQDN is used by external clients (mainly Outlook Anywhere and Exchange ActiveSync devices) for automatic profile creation.
Several Outlook 2007/2010 features rely on the Autodiscover service (Out of Office Assistant [OOF], Offline Address Book [OAB] and Unified Messaging [UM]). Internal Outlook 2007/2010 clients don’t use this Autodiscover FQDN, though. They look up the service connection point in Active Directory using the internal Autodiscover service Uniform Resource Identifier (URI), which, by default, points to the CAS FQDN (see Figure 2).
When you use a load-balancer solution to distribute client traffic among CASs in a CAS array, you usually set the internal Autodiscover service URI to point to the virtual IP address (VIP) associated with the respective virtual service on the load balancer. Some customers use a dedicated FQDN for this. Others just use the same FQDN used for Outlook Web App (OWA), Exchange Control Panel (ECP), OAB and Exchange Web Services (EWS).
Figure 2: FQDN for the internal Autodiscover service URI.
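To make the FQDN question concrete, here is a minimal sketch that collects the host names clients actually use and turns them into candidate SPN strings. It rests on assumptions that aren’t stated in this column: that the http service class covers the web-based services (Autodiscover, EWS, OAB), and that exchangeMDB, exchangeRFR and exchangeAB cover RPC Client Access and the Address Book service on the CAS array name. Verify these against the TechNet Kerberos documentation before registering anything. The function name and all example.com host names are hypothetical.

```python
from urllib.parse import urlsplit

# Assumed service classes for the CAS array name; confirm against TechNet.
RPC_SERVICE_CLASSES = ("exchangeMDB", "exchangeRFR", "exchangeAB")


def spn_candidates(internal_urls, cas_array_fqdn):
    """Return an ordered, de-duplicated list of candidate SPN strings."""
    spns = []
    for url in internal_urls:
        host = urlsplit(url).hostname  # None when the URL has no host part
        if host:
            spn = f"http/{host.lower()}"
            if spn not in spns:
                spns.append(spn)
    for service in RPC_SERVICE_CLASSES:
        spns.append(f"{service}/{cas_array_fqdn.lower()}")
    return spns


# Hypothetical internal URLs, including the internal Autodiscover service URI
urls = [
    "https://autodiscover.corp.example.com/Autodiscover/Autodiscover.xml",
    "https://mail.corp.example.com/EWS/Exchange.asmx",
    "https://mail.corp.example.com/OAB",
]
for spn in spn_candidates(urls, "outlook.corp.example.com"):
    print(spn)
```

If the internal Autodiscover service URI shares its FQDN with OWA, ECP, OAB and EWS, the list collapses to a single http entry, which is why the choice of a dedicated or shared FQDN matters here.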
Q. We have a site-resilient Exchange 2010 solution with both a primary and failover datacenter. We recently did a planned datacenter switchover to verify the steps in our datacenter disaster recovery plan. During the switchover, we noticed something. Although internal Outlook clients could connect just fine after having changed the DNS record for the CAS array in the primary datacenter so it pointed to the CAS array in the failover datacenter, the connection endpoint was never updated on the clients.
When the CAS array is unavailable in the primary datacenter, and the DNS record for this CAS array is updated to point to the CAS array in the failover datacenter, isn’t it expected behavior that the connection endpoint is updated in the Outlook client as well?
A. What you see is actually the expected behavior. The connection endpoint won’t be updated, and shouldn’t be, when you change the DNS record for the CAS array in the primary datacenter to point to the IP address associated with the CAS array in the failover datacenter. As you saw yourself, the Outlook clients will connect just fine as long as the CAS array name resolves to a reachable IP address.
This behavior makes things less painful when doing the datacenter switchover. You only need to take care of DNS replication—there’s no need to worry about whether an Outlook client had its profile updated.
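After the DNS change has replicated, you can confirm from a client subnet that the CAS array name resolves and answers. The following is a minimal sketch using only the Python standard library. It assumes TCP port 135 (the RPC endpoint mapper) is a reasonable port to probe for MAPI/RPC connectivity, and the function name is hypothetical.

```python
import socket


def cas_array_reachable(name, port=135, timeout=3.0):
    """Return True if name resolves and any resolved address accepts a TCP connection."""
    try:
        infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # the name does not resolve at all
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            continue  # try the next resolved address
    return False
```

Calling `cas_array_reachable("outlook.corp.example.com")` with your own CAS array FQDN (the name shown is a placeholder) before and after the switchover shows whether clients can rely on name resolution alone. A successful connection only proves the port answers, not that mailbox access works end to end.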
Q. We have multiple Active Directory sites, all with Exchange 2010 SP1 servers. Each site has three CAS roles. In order to distribute client traffic between the three CAS roles at each site, we created CAS arrays and DAGs to provide mailbox resiliency.
Sometimes, a user permanently moves from one physical site to another. In this event, we move the user’s mailbox to a mailbox database in the new site. Because we move the user mailbox from a mailbox database with an RPC Client Access Server value different from the value of the target database, we’d expect the user’s Outlook profile would be updated to reflect the RPC CAS value of the target mailbox database (see Figure 3).
Figure 3: RPC Client Access Server value for a mailbox database.
We don’t have any Outlook 2003 clients, only 2007 and 2010. So far, we’ve only been able to have the Outlook profile updated by running a profile repair.
A. I understand how you’d expect the Outlook profile to automatically update during a cross-site mailbox move from a mailbox database with one RPC CAS value to a target mailbox database with another RPC CAS value. Actually, though, what you’re experiencing is the expected behavior.
The reason for this behavior has to do with the fact that the source CAS (array) determines which mailbox it should access based on Active Directory properties. If Active Directory is up-to-date, it will never talk to the wrong mailbox database and never receive the ecWrongServer response, which is required in order to trigger an Outlook profile update. The Exchange Product group is fully aware of this far-from-ideal behavior. There’s no word yet on when—or even if—this will be fixed.
Q. We’re trying to set up federation between our Exchange 2010 organization and another. I seem to recall that you need to use a trusted certificate from a third-party certificate authority (CA) to set up federation trusts between Exchange 2010 organizations. I wanted to verify if this is true.
A. You probably read this before Exchange 2010 SP1 had been released. With Exchange 2010 RTM, you did require a trusted certificate from a third-party CA to get a federation trust working between two Exchange 2010 organizations.
However, this changed with the release of Exchange 2010 SP1. With Exchange 2010 SP1, you can use a self-signed certificate or a certificate issued from an internal PKI. Actually, the “New Federation Trust” wizard automatically creates and installs a self-signed certificate specifically for federation trust purposes (see Figure 4).
Figure 4: The new self-signed certificate created by the “New Federation Trust” wizard.
Henrik Walther is a Microsoft Certified Master: Exchange 2007 and Exchange MVP with more than 15 years of experience in the IT business. He works as a technology architect for Timengo Consulting (a Microsoft Gold Certified Partner in Denmark) and as a technical writer for Biblioso Corp. (a U.S.-based company specializing in managed documentation and localization services).