Microsoft Information Technology (Microsoft
IT) uses failover clustering in the Windows Server® 2008 operating system
to support users worldwide. Microsoft IT found the solution easy to plan and deploy,
especially because of built-in migration tools. The result is a set of Windows Server 2008
clusters that support more users through increased reliability and features.
Situation
The Microsoft IT division supports the daily IT operations of a large global corporation
that has demands similar to those of many other organizations of the same size.
These demands include keeping the network, servers, and applications available
with very little time set aside for maintenance. Supporting
a global infrastructure means that administrators work within limited maintenance
windows, with the infrastructure in use around the clock in many locations worldwide.
Failover clustering in Windows Server 2008 provides the ability to meet these
demands while improving on previous cluster technologies.
Within Microsoft IT, several groups support the daily operations of Microsoft Corporation
worldwide. One key group is the File Services Utility (FSU) group, which manages
resources and provides services from four sites worldwide. With Windows Server 2008
clustered file services operating from Redmond, Dublin, Singapore, and Kawaguchi,
FSU gives thousands of users worldwide access to approximately 200 terabytes
of data. This data includes all types of file share resources, which typically range
in size from 20 gigabytes (GB) to 300 GB.
Each FSU site consists of several Windows Server 2008 cluster nodes that work
together to provide redundant access to files for each region. The regions share a
similar structure and server composition, which simplifies management of the cluster
nodes. One of the sites has a single pair of servers in an
active/passive configuration, as shown in Figure 1. In this configuration, both
servers are online, but only one server actively services requests. If a hardware
or software failure occurs on the active node, the passive node takes control and
resumes file services. This design provides highly available file services.
Figure 1. Active/passive cluster configuration
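The failover behavior described above can be modeled in a few lines. The following Python sketch is a hypothetical toy model, not a Windows clustering API: it tracks which node owns each resource group and, when a node fails, hands that node's groups to a surviving node. The node names `NODE1` and `NODE2` and the group name `FileShares` are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class FailoverCluster:
    """Toy model of a failover cluster (hypothetical; calls no Windows API)."""

    online: Set[str]                                      # nodes currently running
    owner: Dict[str, str] = field(default_factory=dict)   # resource group -> node

    def fail_node(self, node: str) -> None:
        """Take `node` offline and move every group it owned to a surviving node."""
        self.online.discard(node)
        survivors = sorted(self.online)
        for group, current in list(self.owner.items()):
            if current == node:
                if not survivors:
                    raise RuntimeError(f"no surviving node can host {group!r}")
                # In an active/passive pair the only survivor is the passive node.
                self.owner[group] = survivors[0]


# Active/passive pair: NODE1 serves the file shares while NODE2 waits.
cluster = FailoverCluster(online={"NODE1", "NODE2"}, owner={"FileShares": "NODE1"})
cluster.fail_node("NODE1")
print(cluster.owner)  # {'FileShares': 'NODE2'}
```

In a pair there is only one possible target, which is part of what makes the active/passive layout simple to reason about.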
The other FSU sites use two active nodes that share a single passive node, also
known as an active/active/passive configuration, as shown in Figure 2. This design
provides more active resources to users, while a single failover node supports the
active nodes. This configuration reduces the cost of server deployments, but it
also increases complexity through additional administrative overhead and design
requirements.
Figure 2. Active/active/passive cluster configuration
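One way to see the extra design requirements of a shared passive node is to ask how much load that node may have to carry. The sketch below is a simplified model under stated assumptions: the names are hypothetical, each active node owns exactly one resource group, and a failed active node's group always moves to the shared passive node. Real clusters let administrators configure preferred and possible owners, so actual placement can differ.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


def place_groups(actives: List[str], passive: str, failed: Set[str]) -> Dict[str, Optional[str]]:
    """Return resource group -> hosting node for an active/active/passive cluster.

    Simplifying assumptions (hypothetical): each active node owns one group named
    "<node>-group", and a failed active node's group always moves to the shared
    passive node if that node is still running; otherwise the group has no host.
    """
    placement: Dict[str, Optional[str]] = {}
    for node in actives:
        group = f"{node}-group"
        if node not in failed:
            placement[group] = node           # still served by its usual node
        elif passive not in failed:
            placement[group] = passive        # taken over by the shared passive node
        else:
            placement[group] = None           # nowhere to run in this model
    return placement


def peak_groups_per_node(placement: Dict[str, Optional[str]]) -> int:
    """Largest number of groups that any single node hosts (a rough load indicator)."""
    load = Counter(node for node in placement.values() if node is not None)
    return max(load.values(), default=0)


one_down = place_groups(["NODE1", "NODE2"], "NODE3", failed={"NODE1"})
both_down = place_groups(["NODE1", "NODE2"], "NODE3", failed={"NODE1", "NODE2"})
print(one_down)                         # {'NODE1-group': 'NODE3', 'NODE2-group': 'NODE2'}
print(peak_groups_per_node(both_down))  # 2
```

With one active node down, no node hosts more than one group. If both active nodes fail, the single passive node hosts both groups, so it would need capacity for the combined load if that case is in scope.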
Solution
Microsoft IT chose to use Windows Server 2008 failover clusters for global
file services for a number of reasons. As with most IT departments, the major considerations
for Microsoft IT are availability, flexibility, and performance; the costs of procurement,
migration, and maintenance are also important. The Windows Server 2008 clusters
host mission-critical file services. These file services are crucial to the work
of thousands of users worldwide, so they must be available around the clock, every
day of the year. Migration is a major concern for many IT groups, with a focus on
minimizing impact to users and reducing risk when moving users from one cluster
technology to another. Maintenance is also a key issue; a system that is designed
to be easily maintained requires less maintenance effort, which in turn lowers the
total cost of ownership (TCO) for the file services.
Windows Server 2008 offers the ability to meet all of these challenges and
requirements. Some of the key benefits of Windows Server 2008 failover clusters
are:
- Volumes under Windows Server 2008 clusters never stay in an unprotected
state, because clusters use SCSI-3 persistent reservations.
- The Cluster service no longer requires a domain user account, which eliminates
password updates and associated account maintenance.
- The node names, cluster name, and network names are Active Directory®
objects.
- Kerberos-only authentication is the default authentication mechanism.
- Full support is included for Volume Shadow Copy Service (VSS) for
easier backups.
- Windows Server 2008 clusters use SCSI-3 persistent reservations
but respect the cluster disk signatures that Windows Server 2003 clusters use,
enabling easier and safer migrations from previous cluster releases.
The migration to Windows Server 2008 clusters involved managing key areas that
are essential to maintaining service. These key areas help to ensure that the process,
software, server hardware, and storage hardware are migrated safely. The key areas
are:
- Protection of disk resources during migration
- Availability of service during migration
- A clearly defined, reliable rollback plan, achieved by keeping one or more
Windows Server 2003 cluster nodes in reserve
- A clearly communicated and managed migration process
- Migrations scheduled during hours of least impact to users
The deployment of Windows Server 2008 clusters for file services gives users
distributed access to multiple file shares through clusters with a Distributed File
System (DFS) namespace. The deployment of clusters in Redmond, Dublin, Singapore,
and Kawaguchi provides services globally, around the clock. Providing global, centralized
file services is more demanding than serving a typical computing environment, and it
places particular emphasis on storage, networking, server, and application configuration.
Microsoft IT upgraded the FSU sites from Windows Server 2003 clusters one at
a time, and it easily applied lessons learned from the early deployments to the
successive migrations. Microsoft IT also reinforced existing best practices during
the process.
Multiple vendors support all FSU clusters through Fibre Channel connectivity to
both traditional Fibre Channel storage and storage area network (SAN) storage. Each
site includes several storage arrays that the FSU team presents to the cluster
nodes through Fibre Channel switches from multiple manufacturers. Notably,
Microsoft IT did not require any new storage capacity or storage switch technology
for the upgrade and deployment to the Windows Server 2008 clusters. Each server
takes advantage of Multipath Input/Output (MPIO) for multipath access to the storage
arrays. Although Windows Server 2008 fully supports 4-gigabit (Gb) Fibre Channel cards,
this deployment primarily used 2-Gb cards because all servers were re-used from
the previous Windows Server 2003 clusters.
The Windows Server 2008 clusters use built-in Gigabit Ethernet ports for user
access and private network connections for cluster communications. The result is
a cluster platform that, in Microsoft IT's experience, was simple to manage and
provision and did not require complex networking configuration. It should be noted,
however, that as part of the migration to Windows Server 2008 clusters, Microsoft
IT used a phased in-place upgrade process that re-used existing physical network
connections in addition to existing IP and namespace resources.
Windows Server 2008 clustering allows file services to be provided to disjoint
network namespaces, which means that DFS replicas can be hosted on servers that
are in a different domain. This allows the file service clusters to provide root-level
shares to DFS, which can then provide file-level and folder-level access to users
in different domains. This is a key advantage for large, global deployments that
span multiple Active Directory domains: users can access resources that are based
in one domain while administrators manage the servers from a different domain.
This capability is especially valuable for the Windows Server 2008 clusters
that the FSU group deploys.
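The effect for users is that one logical namespace path can lead to a share on a cluster in another domain. The Python sketch below models that mapping as a simple longest-prefix lookup. It is an illustrative assumption, not the DFS referral protocol, and every server and domain name in it (`corp.example.com`, `fs-redmond.na.example.com`, and so on) is hypothetical.

```python
from typing import Dict, Optional


def resolve(path: str, links: Dict[str, str]) -> str:
    """Rewrite a namespace path to the share that hosts it (toy longest-prefix lookup).

    `links` maps namespace folder paths to target shares. Matching is
    case-insensitive, as Windows paths are. This is not the DFS referral protocol.
    """
    lowered = path.lower()
    best: Optional[str] = None
    for link in links:
        prefix = link.lower()
        if lowered == prefix or lowered.startswith(prefix + "\\"):
            if best is None or len(link) > len(best):
                best = link
    if best is None:
        raise KeyError(f"no namespace folder covers {path}")
    # Keep everything after the matched folder and append it to the target share.
    return links[best] + path[len(best):]


# Hypothetical namespace whose folders point at clusters in two different domains.
links = {
    r"\\corp.example.com\files\Finance": r"\\fs-redmond.na.example.com\Finance",
    r"\\corp.example.com\files\Sales": r"\\fs-dublin.eu.example.com\Sales",
}
print(resolve(r"\\corp.example.com\files\Sales\Q3\report.xlsx", links))
# \\fs-dublin.eu.example.com\Sales\Q3\report.xlsx
```

Users keep a single, stable path even though the share that answers it sits in a different domain.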
Reasons for Choosing Windows Server 2008
The migration to Windows Server 2008 did not require the purchase of any new
server, network, or Fibre Channel infrastructure, because of the increased performance
and extensive driver support in Windows Server 2008. Microsoft IT migrated
clustered file servers to Windows Server 2008 one FSU site at a time around
the world. An in-place upgrade helped to speed the migration process by removing
the need to procure and deliver new hardware worldwide.
Windows Server 2008 clustering improves on the high-availability capabilities
of Windows Server 2003 clustering by adding support for additional features
such as dynamic hardware partitioning on certain hardware platforms approved for
Windows Server 2008. These features also help organizations make use of advanced
hardware for mission-critical applications.
The FSU group typically does not host server applications with extremely high
disk demands, but its users worldwide are just as demanding about performance.
Users of the file share services expect fast access to their files and documents
whenever they need them. Although this performance is not typically measured in
disk I/O metrics, users and enterprise applications place high demands on file
resources. The Windows Server 2008 clusters enable DFS to serve multiple
namespaces and domains, enabling Microsoft IT to provide highly available file services
regardless of internal namespace.
Servers have increased in power and reliability in recent years, providing more
features for redundancy, such as multiple power supplies and large arrays of redundant
disks. Although these features have reduced the effort that administrators spend
on certain kinds of maintenance, planned outages are still necessary for
software and firmware updates and routine hardware replacement. This deployment
focused on the re-use of existing server hardware; however, Windows Server 2008
provides support for dynamic hardware partitioning, which enables the replacement
of disk, processor, and random access memory (RAM) without the need to take the
server offline. Windows Server 2008 also places an emphasis on reducing the
installation footprint for the operating system, installing only the components
required for its role. The result is increased performance and reduced maintenance
costs for servers on the most basic hardware platforms.
The FSU group uses failover clusters to reduce the downtime that administrators
require for server maintenance. In the failover cluster model, one node provides
clustered services such as file, print, messaging, or database services, while at
least one additional node waits in a passive state to take over if a failure occurs.
This model has been highly effective against server failures in past releases
of Windows Server, and Windows Server 2008 continues to use these proven methods
while providing new features and optimizations. One key benefit to failover clustering
is that administrators can take the passive node offline momentarily for updates
without affecting users. In this approach, administrators take the passive node
offline and update it with new drivers or application updates. That server is not
currently serving users, and it can be safely restarted.
Figure 3 illustrates the update process for the active/passive configuration.
Figure 3. Update process for active/passive configuration
After the passive node returns from being restarted, administrators can check it
for errors and return it to service in the cluster group. During a maintenance window,
the next step is to move the cluster resources from the current active node to the
current passive node. In this step, the roles are reversed, with the typical standby
node now hosting active connections to cluster resources. The downtime for file
share resources is measured in seconds, and many users are not even aware that the
change has occurred.
After administrators complete this process, they can update the remaining server
without affecting users, because the server is now the passive node. After administrators
complete all updates, they can reverse the process to move cluster resources
back to their original node. Although this step is not required, it is considered
a best practice to help clearly define the active and passive nodes during normal
production hours. The move of cluster resources back to the original active node
still takes only a matter of seconds.
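The update sequence above can be expressed as a short simulation that enforces its key rule: only the node that is not serving users is ever updated. The Python sketch below is a hypothetical model with illustrative node names; it touches no real cluster and simply records the steps and counts the group moves, which are the only moments users might notice.

```python
from typing import List, Tuple


def rolling_update(active: str, passive: str, move_back: bool = True) -> Tuple[List[str], int]:
    """Simulate the update sequence for an active/passive pair.

    Returns the ordered steps and the number of group moves, which are the only
    user-visible interruptions. Hypothetical model; no real cluster is touched.
    """
    owner = active                  # node currently hosting the clustered groups
    steps: List[str] = []
    moves = 0

    def update(node: str) -> None:
        # Invariant from the text: only a node that is not serving users is updated.
        if node == owner:
            raise RuntimeError(f"{node} is still serving users")
        steps.append(f"update and restart {node}, check for errors, return to cluster")

    def move(target: str) -> None:
        nonlocal owner, moves
        steps.append(f"move groups {owner} -> {target}")  # downtime measured in seconds
        owner = target
        moves += 1

    update(passive)                 # 1. update the idle node
    move(passive)                   # 2. swap roles during the maintenance window
    update(active)                  # 3. update the former active node, now idle
    if move_back:
        move(active)                # 4. optional: restore the usual active node
    return steps, moves


steps, moves = rolling_update("NODE1", "NODE2")
print(moves)  # 2
```

Skipping the optional move back (`move_back=False`) reduces the user-visible interruptions to one, at the cost of the active and passive roles no longer matching their usual assignment.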
The end result is that downtime for users is limited to the time required
to move cluster resources from one node to the other, which enables administrators
to perform in-depth server upgrades and updates without affecting users or uptime
metrics. This approach also provides a key opportunity for performing in-place
upgrades; Microsoft IT used this phased in-place update process for the deployment
of the Windows Server 2008 failover clusters.
Windows Server 2008 cluster nodes cannot coexist in the same cluster with Windows
Server 2003 nodes, so an organization must use a migration approach that focuses
on moving cluster resources rather than a prolonged coexistence. Windows Server 2008
clusters support 16 nodes, an increase beyond the previous maximum of eight nodes
under Windows Server 2003. The clusters that Microsoft IT deployed did not
require additional nodes. Furthermore, an in-place upgrade approach proved to be
highly effective and efficient, and it provided a safe and reliable path for rollback
to Windows Server 2003 clusters in the event of a problem.
Microsoft IT's general approach was to first move all cluster resources to a single
node. Microsoft IT completed this move in a matter of seconds, and users did not
notice the change. For example, on a Windows Server 2003 cluster with two active
nodes and a single passive node, Microsoft IT moved all resources from one active
node to the other. This momentarily created an active/passive/passive configuration.
After Microsoft IT successfully moved all resources to a single active node, it shut
down the two passive nodes. To reduce the risk of any changes to shared Fibre Channel
storage, Microsoft IT masked the Fibre Channel storage from the two shut-down nodes,
preventing them from accessing any of the cluster's shared storage. Then, the team
installed Windows Server 2008 Enterprise on the two shut-down nodes and configured
them as a new active/passive cluster.
As mentioned earlier, Windows Server 2008 and Windows Server 2003 nodes
cannot participate in the same cluster, so Microsoft IT built the new cluster
separately. After building the Windows Server 2008 cluster, the team unmasked the
Fibre Channel storage to the new nodes and ran the cluster validation tests. After
Microsoft IT confirmed that the new nodes could access all storage and that the full
set of validation tests passed, it migrated the Windows Server 2003 cluster resources
and applications by using the migration wizard on the Windows Server 2008 nodes.
After Microsoft IT migrated all resources to the new Windows Server 2008 active/passive
cluster, it either shut down and upgraded the remaining Windows Server 2003 node
just as it had the other two nodes, or it retained that node until administrators
were comfortable that rollback was unnecessary. It is important to note that if
Microsoft IT had encountered an error at any time during the migration, it could
have easily reverted resources to the Windows Server 2003 cluster. The clear
ability to roll back to the previous environment enables a confident approach to
migrating critical company file shares.
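The in-place migration described above follows a fixed order, and its safety comes from one invariant: until the new cluster is validated, every resource group is still served by a Windows Server 2003 node. The Python sketch below builds that step list for an arbitrary set of nodes and checks the invariant. It is a hypothetical planning aid; node and group names are illustrative, and it changes nothing on a real cluster.

```python
from typing import Dict, List


def plan_migration(nodes: List[str], groups: Dict[str, str], keep: str) -> List[str]:
    """Build the step list for the in-place migration described above.

    `groups` maps resource group -> owning node on the Windows Server 2003 cluster,
    and `keep` is the node that stays on Windows Server 2003 as the rollback path.
    All names are hypothetical; this only produces a plan and changes nothing.
    """
    groups = dict(groups)           # work on a copy; do not mutate the caller's map
    steps: List[str] = []

    # 1. Consolidate every group onto the node that will stay behind.
    for group, owner in groups.items():
        if owner != keep:
            steps.append(f"move {group}: {owner} -> {keep}")
            groups[group] = keep

    # 2. Shut down the now-passive nodes and hide shared storage from them.
    others = [n for n in nodes if n != keep]
    for n in others:
        steps.append(f"shut down {n}")
        steps.append(f"mask shared storage from {n}")

    # 3. Rebuild them as a new cluster, then expose storage and validate.
    steps.append(f"install Windows Server 2008 on {', '.join(others)} and form a new cluster")
    steps.append("unmask shared storage to the new cluster and run cluster validation")

    # Rollback invariant: up to this point every group is still served by `keep`.
    assert all(owner == keep for owner in groups.values())

    # 4. Migrate with the wizard; retire `keep` only when rollback is no longer needed.
    steps.append("migrate resource groups with the migration wizard")
    steps.append(f"upgrade or retire {keep} once rollback is no longer needed")
    return steps


plan = plan_migration(["NODE1", "NODE2", "NODE3"], {"GroupA": "NODE1", "GroupB": "NODE2"}, keep="NODE1")
for step in plan:
    print(step)
```

For the active/active/passive example in the text, the plan starts with a single group move and ends with the last Windows Server 2003 node, which is retired only after the migration succeeds.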
Figure 4 illustrates the failover process for the active/passive configuration.
Figure 4. Failover process for active/passive configuration
There are two approaches to migrating clusters from previous releases to Windows
Server 2008. Previously, the only viable approach for the demanding environments
that clusters serve was to build a complete new set of clustered servers and to
provision new storage. This approach required the procurement of additional hardware,
even if the current hardware was satisfactory for use. In addition, limited options
existed for migrating resources, so administrators had to manually re-create each
cluster resource on the new servers. Although this approach is still valid, and
many customers continue to use this approach as an opportunity to consolidate or
refine what services are offered on Windows Server clusters, Microsoft IT was able
to use an in-place upgrade that did not require the purchase of new hardware or
storage. The migration also allowed a complete rollback to Windows Server 2003
clusters, although Microsoft IT never needed to roll back.
As mentioned earlier, Windows Server 2008 clusters have the capability to provide
services to multiple namespaces at the same time, which further extends the business
value to multiple business groups. In addition, DFS namespaces can be exported,
deleted, and re-imported when all domain controllers are running Windows Server 2008
and the Active Directory domain is at the Windows Server 2008 domain functional level. This
capability provides for future scalability and supports Access-Based Enumeration
(ABE), which shows users only the shares that they have permission to access.
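The idea behind ABE can be illustrated as a filter applied when a folder list is enumerated. The Python sketch below is a deliberately simplified assumption: each folder carries a plain set of group names with read access, whereas Windows evaluates the folders' actual permissions. Folder and group names are hypothetical.

```python
from typing import Dict, List, Set


def visible_folders(folders: Dict[str, Set[str]], user_groups: Set[str]) -> List[str]:
    """Simplified access-based enumeration: list only folders the user can read.

    `folders` maps folder name -> groups granted read access. Real ABE evaluates
    the folders' permissions; a plain set of group names stands in for them here.
    """
    return sorted(name for name, readers in folders.items() if readers & user_groups)


folders = {
    "Finance": {"FIN-Readers"},
    "Public": {"Everyone"},
    "Sales": {"Sales-Team"},
}
print(visible_folders(folders, {"Everyone", "Sales-Team"}))  # ['Public', 'Sales']
```

Hiding folders does not replace permissions; it only keeps users from seeing entries they could not open anyway.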
Management of the Microsoft cluster resources worldwide is very efficient; only
three operations staff manage all of the clusters and the provisioning process for
file share requests from users and groups. These three administrators not only handle
daily permissions and new provisioning requests, but also handle all server maintenance
issues. Through the use of Windows Server 2008 clusters, they can perform updates
with little to no impact to users, by means of proven cluster update methodologies.
The relatively small management team manages servers in a highly distributed environment,
with servers and storage in the United States, Ireland, Singapore, and Japan. In
an infrastructure this large, remote management of resources requires server
applications that are robust and flexible. The management console for Windows Server 2008 clusters
is an excellent fit for these requirements; it enables a small team of administrators
to remotely manage many servers worldwide.
The management interface for Windows Server 2008 clusters includes many additional
features, from new cluster capabilities to the Microsoft Management Console
(MMC) version 3.0 interface. This integrated interface enables not only management
of cluster groups and resources, but also backups by means of VSS and management
of DFS. This capability enables administrators to manage a cluster and related resources
in a single console rather than opening multiple consoles for different components,
as was previously needed. In addition, the administrators can easily access cluster
dependency reports and cluster events from the MMC. This greatly improved interface
offers a more robust and detailed view for daily cluster administration.
Lessons Learned
Through the deployment of Windows Server 2008 failover clusters for the Microsoft
data centers worldwide, Microsoft IT learned several key lessons.
Active/Passive Pairing
Windows Server 2008 supports a configuration with multiple active nodes and
a single passive node. However, Microsoft IT discovered that migrations required
far less planning and effort overall when it deployed simple active/passive pairs,
each consisting of one active node and one passive node. The later cluster
deployments reflect this finding.
Although this configuration offers key benefits in simplicity and support, the
previous configurations had no drawbacks, so Microsoft IT left them in place.
In addition, active/passive pairs provide a standard, repeatable building block
that can be added as needed to scale the service.
Migration Wizard
The migration wizard for Windows Server 2008 clustering proved to be essential
to the smooth migration from Windows Server 2003 clusters to Windows Server 2008.
This wizard was the only migration tool that Microsoft IT used for all FSU cluster
servers for the migration worldwide, and it proved to be very effective.
One limitation of the cluster migration wizard is that it is unable to migrate resources
that are in the core cluster group (Cluster Name, Cluster IP, and Quorum disk) or
that have dependencies on the cluster resource name. Because deploying cluster applications
that are configured with dependencies on cluster group resources is not a best practice,
the migration team accepted this limitation.
Before an organization uses the migration wizard, it should resolve dependencies
on cluster resources so that the application resources do not rely on core cluster
group resources such as the cluster network name. In addition, the core cluster group
should contain only resources that the cluster nodes themselves use; any other
resources that might have been placed in the core cluster group must be reconfigured
prior to migration.
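Finding the resources that would run into this limitation amounts to walking each resource's dependency chain and flagging any chain that reaches a core cluster group resource. The Python sketch below does that over a dependency map. The core resource labels and all resource names are hypothetical, and the sketch assumes the administrator already has the dependency data from an existing inventory; it does not read anything from a cluster.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical labels for the core cluster group resources named in the text.
CORE_RESOURCES = {"Cluster Name", "Cluster IP Address", "Quorum Disk"}


def blocking_resources(depends_on: Dict[str, Set[str]],
                       core: Iterable[str] = CORE_RESOURCES) -> List[str]:
    """Return non-core resources that depend, directly or indirectly, on core resources.

    `depends_on` maps each resource to the resources it depends on.
    """
    core = set(core)
    blocked = []
    for resource in depends_on:
        if resource in core:
            continue
        seen: Set[str] = set()
        stack = list(depends_on.get(resource, ()))
        while stack:                        # walk the whole dependency chain
            dep = stack.pop()
            if dep not in seen:
                seen.add(dep)
                stack.extend(depends_on.get(dep, ()))
        if seen & core:
            blocked.append(resource)
    return sorted(blocked)


deps = {
    "Cluster Name": {"Cluster IP Address"},
    "Cluster IP Address": set(),
    "FS1 Name": {"FS1 IP"},
    "FS1 IP": set(),
    "FS1 Share": {"FS1 Name"},
    "Legacy Share": {"Cluster Name"},
}
print(blocking_resources(deps))  # ['Legacy Share']
```

Walking the full chain matters: a resource that depends on a flagged resource is also blocked, even though it never references the core group directly.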
Migration Technique
The migration technique that the migration team used for the migration from Windows
Server 2003 to Windows Server 2008 proved to be valuable for the team.
This technique enabled the in-place upgrade of Windows Server 2003 cluster
nodes to Windows Server 2008.
For this process, the migration team moved all cluster resource groups to
a single server. Depending on the environment, this relocation left one or more
servers in a passive state. The team then shut down these passive nodes and masked
the shared storage from the nodes' Fibre Channel ports. The team then installed
Windows Server 2008 on the servers and configured them as a new cluster.
After the team completed the server builds, it re-presented (unmasked) the storage
to the new cluster, and it used the migration wizard. This approach enabled a full
rollback to Windows Server 2003 if needed, because the team took the last Windows
Server 2003 node offline only after it successfully migrated all cluster resources
to the Windows Server 2008 cluster.
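Masking works because it controls which hosts can see which shared logical units (LUNs), so nodes being rebuilt cannot touch disks that the remaining Windows Server 2003 node is still serving. The Python sketch below is a toy model of such a masking table, an assumption standing in for whatever the storage array or Fibre Channel switch zoning actually enforces; all node and LUN names are hypothetical.

```python
from typing import Dict, Set


class MaskingTable:
    """Toy LUN-masking table: which hosts may see which shared LUNs.

    Hypothetical model of what array-side masking or switch zoning enforces.
    """

    def __init__(self, luns: Dict[str, Set[str]]):
        # Copy each host set so callers' sets are never shared or mutated.
        self.luns = {lun: set(hosts) for lun, hosts in luns.items()}

    def mask(self, host: str) -> None:
        """Hide every shared LUN from `host`."""
        for hosts in self.luns.values():
            hosts.discard(host)

    def unmask(self, host: str) -> None:
        """Present every shared LUN to `host` again."""
        for hosts in self.luns.values():
            hosts.add(host)

    def visible_to(self, host: str) -> Set[str]:
        return {lun for lun, hosts in self.luns.items() if host in hosts}


everyone = {"NODE1", "NODE2", "NODE3"}
table = MaskingTable({"LUN-Quorum": everyone, "LUN-Data1": everyone})
for node in ("NODE2", "NODE3"):
    table.mask(node)            # before installing Windows Server 2008 on these nodes
assert table.visible_to("NODE2") == set()                         # safe to install
assert table.visible_to("NODE1") == {"LUN-Quorum", "LUN-Data1"}   # still serving users
for node in ("NODE2", "NODE3"):
    table.unmask(node)          # before validation and the migration wizard
```

The two checks in the middle capture the point of the technique: the rebuilt nodes see nothing, while the node still serving users keeps full access.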
Storage Signatures
Windows Server 2003 and Windows Server 2008 use different methods for labeling
the shared storage that clustered resources use. Windows Server 2003 uses cluster
disk signatures, whereas Windows Server 2008 uses SCSI-3 persistent reservations.
As a result, Windows Server 2008 clusters do not interfere with Windows Server 2003
signatures when the disks are imported to a Windows Server 2008 cluster. Therefore,
the disk signatures are at very little risk of change if a need arises to revert
to a Windows Server 2003 cluster during a migration.
Benefits
Windows Server 2008 clustering offers many of the same
services as previous releases, but the key differentiator for Microsoft IT was the
ability to provide services to multiple namespaces. This feature has enabled Microsoft
IT to extend services and resources to groups that were previously unable to use
the clusters under Windows Server 2003. As a result, the Windows Server 2008
clusters can serve more users and lower the TCO for the organization. Using centralized,
multi-domain shares along with Windows Server 2008 clusters also helps to further
centralize server management, reducing the number of resources that are required
for monitoring and daily administration.
Windows Server 2008 clusters also provide numerous additional features that
extend beyond the file services that FSU provides, so that other groups can use
the same benefits.
Best Practices
An organization can use several best practices for deploying Windows Server 2008
clusters. The following practices can help to ease migration efforts and also to
maintain a fully supported Windows Server 2008 cluster:
- Storage Masking: As part of the migration from Windows Server 2003 clusters to
Windows Server 2008 clusters, an organization should mask the shared storage from
the Windows Server 2008–based servers during setup and initial configuration. This
practice prevents an administrator from accidentally installing the operating system
on a cluster disk resource, or even accessing it and creating a disk lock. Masking
and unmasking can easily be performed on the storage array or with zoning at the
Fibre Channel switch. An administrator must unmask the storage before running the
Windows Server 2008 cluster validation wizard and before using the Windows Server 2008
cluster migration wizard to migrate resources from the Windows Server 2003 cluster node.
- Resource Dependencies: A key component of a successful cluster is clearly mapped and
properly configured resource dependencies on Windows Server 2003 prior to migration.
When correctly defined, dependencies tell the cluster the order in which resources
should be brought online and which resource failures require attention or failover.
- Phased Migrations: The approach of phased migrations incorporates incremental upgrades
or additions to the environment, rather than moving all resources at once. The migration
team at Microsoft benefited from migrating each site's file server cluster to
Windows Server 2008 one data center at a time. Through this approach, the migration
team learned valuable lessons about migration techniques and applied them to the next
site, resulting in a more gradual and informed migration.
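The Resource Dependencies practice describes a directed graph, and the order in which resources come online is a topological order of that graph. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later) to derive online and offline orders for a hypothetical file server group; the resource names are illustrative, and the sketch only computes an order rather than controlling a cluster.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def online_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order resources so that each comes online only after everything it depends on.

    Raises graphlib.CycleError for circular dependencies, which would also be a
    configuration error on a real cluster.
    """
    return list(TopologicalSorter(depends_on).static_order())


def offline_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Take dependents offline before the resources they rely on."""
    return list(reversed(online_order(depends_on)))


# Hypothetical file server group: the share needs its disk and network name,
# and the network name needs its IP address.
deps = {
    "Disk F:": set(),
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "File Share": {"Network Name", "Disk F:"},
}
order = online_order(deps)
assert order.index("IP Address") < order.index("Network Name") < order.index("File Share")
assert order.index("Disk F:") < order.index("File Share")

try:
    online_order({"A": {"B"}, "B": {"A"}})
except CycleError:
    print("circular dependency detected")
```

Independent resources such as the disk and the IP address may appear in either order; only the dependency edges constrain the sequence, which is why mapping them clearly before migration matters.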
Conclusion
Microsoft IT's FSU group migrated to Windows Server 2008 clusters, meeting
the group's key business requirements. The group met the requirement of increased
availability through a more robust cluster solution that benefits from new features
and optimizations. The business also places a high value on performance, which Windows
Server 2008 offers through support for larger file systems, larger amounts
of physical memory, more cluster nodes, and improved network performance via Server
Message Block (SMB) version 2.0. Finally, the ability of Windows Server 2008
to support multiple namespaces addresses the issue of TCO by extending cluster services
to a larger group of users and customers than was possible with the previous solution.
The result is 20 servers worldwide that provide almost 200 terabytes of data to
thousands of users while maintaining a highly efficient and virtually transparent
upgrade path from previous releases.
For More Information
For more information about Microsoft products or services, call the Microsoft Sales
Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information
Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact
your local Microsoft subsidiary. To access information via the World Wide Web, visit
any of the following sites:
http://www.microsoft.com
http://www.microsoft.com/technet/itshowcase
http://www.microsoft.com/windowsserver2008
© 2008 Microsoft Corporation. All rights reserved.
This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES,
EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft, Active Directory, and Windows Server
are either registered trademarks or trademarks of Microsoft Corporation in the United
States and/or other countries. The names of actual companies and products mentioned
herein may be the trademarks of their respective owners.