Windows Server 2008 File Server Performance at Microsoft IT

Technical Case Study

Published: July 2009

Microsoft Information Technology (Microsoft IT) deployed the Windows Server® 2008 operating system to support the file server workload for 50,000 users worldwide. This paper details the performance and operational improvements realized from the upgrade.

Situation

Microsoft has established a File Services Utility (FSU) that provides high-performance, highly centralized file services to users worldwide. Microsoft wanted to extend the reach of the service, improve performance and availability, and reduce the cost of the service by further minimizing operational complexity.

Solution

Microsoft upgraded its existing Windows Server 2003 clusters supporting the FSU to Windows Server 2008. The solution minimizes operational complexity and cost while offering higher levels of service to end users worldwide. The migration was quick and simple, and required no new hardware.

The solution allows for further consolidation of local data servers into FSU clusters and a reduction in backup infrastructure. Included in the migration is an upgrade in the file transfer protocol from SMB 1.0 to SMB 2.0, which resulted in measurable performance improvements. The simple design enables a small group of administrators to manage the service remotely.

Benefits

  • A near doubling of server performance and reduction in total number of servers
  • Higher performance for file transfers across WAN links
  • Faster end-user response time for browsing
  • Faster recovery of failed clustered file share resources, including faster chkdsk performance
  • Reduced operational and backup infrastructure costs
  • Minimal upgrade and migration effort from Windows Server 2003 clusters via Windows Server 2008 migration wizards

Products & Technologies

  • Windows Server 2008 Enterprise
  • Windows Server 2008 failover clustering
  • Distributed File System services
  • Server Message Block 2.0 protocol
  • File Server Resource Manager
  • Microsoft System Center Data Protection Manager 2007
  • GPT disks

Situation

The Microsoft IT division supports the daily IT operations of a large global corporation that has demands similar to those of many other organizations of the same size. These demands include the requirement to provide services for file storage and sharing for more than 130,000 users and 340,000 computers in hundreds of locations worldwide. File services are a crucial component to the daily business activities of Microsoft, and users demand an enterprise class of service. The required service performance and availability levels can be provided in a cost-effective manner with a solution built on Windows Server technologies.

For several years, Microsoft has provided enterprise-class, centralized, utility-style file services for corporate users based on Windows Server technologies. The utility service is now called the File Services Utility (FSU) and consists of 10 two-node server clusters strategically located to provide high-performance file services to users worldwide. The FSU started as a single, seven-node server cluster that supported users in the largest domain for the Microsoft corporate headquarters. The original service was named Clustered File Services (CFS) and was offered as a free service to users, because no mechanism was in place to charge back the cost of the services based on actual usage.

The CFS service evolved by consolidating 94 stand-alone file servers running legacy software and hardware into a seven-node cluster on Windows Server 2003. This has since been further consolidated to six two-node clusters and has become part of the FSU. There are two-node FSU clustered server installations in Redmond, Dublin, Singapore, and Kawaguchi. The four FSU locations currently provide more than 50,000 users worldwide access to more than 100 terabytes of data combined. This data includes all sizes of file share resources, which historically ranged in size from 20 GB to 300 GB. Each FSU site consists of multiple clusters that provide redundant, highly available access to file share resources in each region. The server clusters across and within the geographic sites are similar in their structure and server composition to provide a consistent method for management and operation of the utility.

Historically, to achieve a complete backup, the total drive partition size was limited to 300 GB, and a partition could contain one or more shares. Since adopting Microsoft System Center Data Protection Manager 2007 (DPM), however, Microsoft has benefited from reduced backup times. As a result, Microsoft now implements 500 GB partitions as a general standard, with additional 1 TB partitions when appropriate. Microsoft also uses the GUID Partition Table (GPT) partition style, which allows partition sizes beyond 2 TB.

In the past, the FSU team was concerned about implementing volumes larger than 2 TB. Microsoft IT is working directly with the Microsoft Windows product group to provide data and help improve chkdsk performance. The time required for chkdsk to complete depends on the number of files on the volume, not the size of the volume. In Windows Server 2008 R2, performance improvements to chkdsk decrease the time required for chkdsk to complete by 50 percent on average, and by up to 80 percent, depending on the number of files on the system and the amount of system RAM. Based on these improvements, Microsoft IT plans to increase the number of files stored on volumes.
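The 2 TB boundary mentioned above comes from the older master boot record (MBR) partition style, which stores sector addresses in 32 bits, whereas GPT uses 64-bit addresses. The following is a minimal sketch of that arithmetic. It assumes 512-byte logical sectors, which this paper does not state.

```python
# Assumption: 512-byte logical sectors (not stated in the paper).
SECTOR_BYTES = 512
TIB = 2 ** 40


def max_addressable_bytes(address_bits: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest capacity reachable with a sector address of the given width."""
    return (2 ** address_bits) * sector_bytes


mbr_limit = max_addressable_bytes(32)  # MBR: 32-bit sector addresses
gpt_limit = max_addressable_bytes(64)  # GPT: 64-bit sector addresses

print(mbr_limit / TIB)    # 2.0 (TiB)
print(gpt_limit // TIB)   # 8589934592 (TiB)
```

With larger logical sectors the MBR ceiling rises proportionally, but GPT removes the constraint for any practical volume size.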

DFS-Namespace is also used to present the file shares to the end users. Microsoft can now present multiple data partitions under the same logical share and thus overcome any limitations of partition size from a client perspective.

Solution

Although the original seven-node CFS cluster enabled Microsoft IT to consolidate dozens of stand-alone file servers into a single cluster, Microsoft IT continued to seek opportunities to streamline the day-to-day operational tasks. With the move from Windows Server 2003 to Windows Server 2008 and expansion of the FSU service to other data centers, a two-node active/passive cluster model was chosen to simplify operations and to mitigate impact if an outage occurs. The two-node cluster configuration consists of a single pair of servers in an active/passive configuration, as shown in Figure 1.

Figure 1. Two-node active/passive FSU cluster configuration

In this configuration, one server actively provides file services as the primary, or active, node. The second server is always online but in a standby state. In the event of a hardware or software failure on the active node, the passive node takes control and resumes file services, making the service highly available for that site. Each node in a cluster has potential access to all of the data partitions in the shared cluster storage, but only one node at a time actively mounts particular data partitions contained within a cluster resource group. The nodes are all connected by a dedicated private network called a heartbeat connection. If regular heartbeat communications for an active node are interrupted, the passive node will assume that the active node has failed. The passive node will then mount the data partitions on the shared storage to provide continued access to the resources to end users over the public network, by using the same virtual network identity.
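The heartbeat logic described above can be illustrated with a deliberately simplified sketch. It is not the Windows Server failover clustering implementation, which is considerably more involved; the class name, the timeout value, and the method names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PassiveNode:
    """Standby node that watches heartbeats from the active node (illustrative only)."""
    timeout_s: float               # hypothetical: silence tolerated before takeover
    last_heartbeat_s: float = 0.0
    is_active: bool = False

    def on_heartbeat(self, now_s: float) -> None:
        """Record the arrival time of a heartbeat from the active node."""
        self.last_heartbeat_s = now_s

    def check(self, now_s: float) -> bool:
        """Take over if the active node has been silent longer than the timeout."""
        if not self.is_active and now_s - self.last_heartbeat_s > self.timeout_s:
            # A real node would now mount the shared data partitions and
            # bring the virtual network identity online.
            self.is_active = True
        return self.is_active


node = PassiveNode(timeout_s=5.0)
node.on_heartbeat(now_s=10.0)
print(node.check(now_s=12.0))  # False: last heartbeat was 2 seconds ago
print(node.check(now_s=16.0))  # True: 6 seconds of silence exceeds the timeout
```

Because clients connect to the virtual network identity rather than to a physical node, the takeover does not require any change on the client side.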

All of the servers in the FSU clusters are standard hardware models similarly configured with the following specifications:

  • 64-bit CPUs
  • 8 GB of random access memory (RAM)
  • 52 terabytes of Fibre Channel storage

Table 1 depicts the current distribution of clusters and the users they service.

Table 1. Distribution of Clusters

| Location | Number of clusters | Number of shares | Number of users | Terabytes of data |
|---|---|---|---|---|
| Redmond | 6 | 2,816 | 38,000 | 28.98 |
| Dublin | 1 | 180 | 5,000 | 4.32 |
| Singapore | 1 | 50 | 2,000 | 1.83 |
| Kawaguchi | 1 | 80 | 5,000 | 1.27 |
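As a quick aid to reading Table 1, the short sketch below totals each column and derives users per cluster for each location. The rows are transcribed from the table; the totals cover only what the table lists.

```python
# Rows transcribed from Table 1: (location, clusters, shares, users, terabytes)
TABLE_1 = [
    ("Redmond",   6, 2816, 38000, 28.98),
    ("Dublin",    1,  180,  5000,  4.32),
    ("Singapore", 1,   50,  2000,  1.83),
    ("Kawaguchi", 1,   80,  5000,  1.27),
]

total_clusters = sum(row[1] for row in TABLE_1)
total_shares = sum(row[2] for row in TABLE_1)
total_users = sum(row[3] for row in TABLE_1)
total_tb = round(sum(row[4] for row in TABLE_1), 2)

print(total_clusters, total_shares, total_users, total_tb)

for location, clusters, shares, users, tb in TABLE_1:
    print(f"{location}: {users / clusters:,.0f} users per cluster")
```

The user total matches the 50,000 users cited elsewhere in this paper.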

The FSU uses Distributed File System Namespaces (DFS-N) to consolidate the namespace for all of the file shares across different domains. DFS-N technology enables FSU administrators to group shared folders located on different servers and present them to users as a virtual tree of folders known as a namespace. DFS-N was formerly known as Distributed File System in Microsoft® Windows® 2000 Server and Windows Server 2003. The DFS-N namespace servers are physically located in a single location in Redmond. The use of DFS-N has enabled the FSU to extend services and resources to groups that were previously unable to use the clustered file services. Using centralized, multi-domain shares along with Windows Server 2008 clusters also helps to further centralize server management, reducing the number of resources that are required for monitoring and daily administration. The FSU also utilizes a central form-based portal for file services requests that includes automated ticket generation to engage the provisioning process. That process includes creation and population of security groups specific to the file share that allow users to be granted either read or change access to the share.
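Conceptually, a namespace maps virtual folder paths to shares hosted on different servers. The sketch below illustrates that idea with a longest-prefix lookup. It does not model how DFS-N actually works (clients obtain referrals from namespace servers), and every server, share, and namespace name in it is made up.

```python
# Hypothetical namespace: virtual folder paths mapped to share targets
# on different (made-up) file server clusters.
NAMESPACE = {
    r"\\corp\shares\finance": r"\\fsu-cluster01\finance$",
    r"\\corp\shares\engineering": r"\\fsu-cluster02\eng$",
}


def resolve(path: str) -> str:
    """Translate a namespace path to its target using longest-prefix match."""
    lowered = path.lower()
    for link in sorted(NAMESPACE, key=len, reverse=True):
        if lowered == link or lowered.startswith(link + "\\"):
            # Keep whatever follows the link and append it to the target.
            return NAMESPACE[link] + path[len(link):]
    raise KeyError(f"no namespace link covers {path}")


print(resolve(r"\\corp\shares\finance\q3\budget.xlsx"))
```

Users see only the `\\corp\shares\...` paths, so a share can move between clusters by updating the mapping rather than by telling users a new server name.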

Whereas the original CFS service was free, the FSU has evolved to take advantage of additional Windows Server technologies to allocate and track usage of resources so that users are charged fees for the service commensurate with their consumption of the service. Implementation of file share quotas limits the amount of space that a file share can consume, and File Server Resource Manager (FSRM) is used to provide monthly reports of actual usage. In addition to space usage, FSRM provides detailed reporting on file access frequency, largest files, data aging, duplicate files, number of files by owner, and quota usage. It also performs file screening audits to look for prohibited file types, such as .pst files.
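The idea behind a file screening audit can be shown with a few lines of code. This sketch is not FSRM and does not use any FSRM interface; it only checks file extensions against a prohibited list, and the sample paths are invented.

```python
from pathlib import PureWindowsPath

# The paper names .pst as a prohibited type; a real policy would likely list more.
PROHIBITED_EXTENSIONS = {".pst"}


def screen(paths):
    """Return the paths whose extension is on the prohibited list."""
    return [
        p for p in paths
        if PureWindowsPath(p).suffix.lower() in PROHIBITED_EXTENSIONS
    ]


# Hypothetical sample paths for illustration.
sample = [r"D:\shares\team\plan.docx", r"D:\shares\team\Archive.PST"]
print(screen(sample))
```

The comparison is case-insensitive because Windows file names are, so `Archive.PST` is flagged alongside any lowercase variant.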

Yet another Windows Server technology used by FSU, Shadow Copies of Shared Folders, enables end users to recover files from a previous state by using the Previous Versions client included with the Windows Vista® operating system. Many end users can perform these recovery tasks entirely on their own, while other users are assisted by first level help desk technicians, particularly if it is their first time using the feature. This drastically reduces the operational costs of file recovery.

Management of the FSU cluster resources worldwide is quite efficient, with only three operations staff managing all of the clusters and the provisioning process for file shares. These three administrators not only handle daily permission change and new provisioning requests, but also handle all server maintenance issues. A total of five personnel design, manage, and support the overall FSU worldwide service, as shown in Figure 2.

Figure 2. FSU organizational structure

The FSU based on Windows Server 2003 long provided the Microsoft worldwide user base with a highly scalable, highly available, full-featured file services utility with a low total cost of ownership (TCO). However, Microsoft IT wanted to improve the performance and availability of the service even further, and to reduce the cost of the service by continuing to minimize operational complexity. New features of Windows Server 2008 provided ample opportunity to achieve those goals.

Microsoft IT upgraded its existing Windows Server 2003 clusters supporting the FSU to Windows Server 2008 to take advantage of significant improvements in file services. The solution minimizes operational complexity and cost while offering higher levels of service to end users worldwide. The migration was quick and simple, requiring no additional hardware.

Note: For more information about how Microsoft IT upgraded the FSU clusters, refer to the technical case study "Microsoft IT Deploys Windows Server 2008 Failover Clusters for File Services" and the TechNet webcast "How Microsoft IT Deploys Windows 2008 Clusters for File Services."

Benefits

Beyond a simple, low-risk migration path, the FSU team has recognized several important benefits in upgrading from 32-bit Windows Server 2003 to 64-bit Windows Server 2008. The most important outcome of these cumulative benefits is an improvement in service level for end users, including a faster, more reliable experience that can be extended to more users in distant locations. Those increased service levels, coupled with reductions in server count and improved overall server performance, make for a powerful combination of benefits to Microsoft. Specific benefits include:

  • Improved end-user experience. Several improvements to the end-user experience have been observed, including much faster resumption of service upon cluster resource failover. Previously, resources could take several minutes to fail over, whereas the failover is so fast now as to be virtually undetectable by most users. Response time for browsing of file share resources has also improved noticeably since the upgrade.
  • Improved server performance with fewer servers. Large gains in server performance have been observed, including a reduction in average disk utilization from 70 percent to 30 percent, even after doubling the average amount of storage space served by a single server and increasing the user load by 150 percent. The most impressive example of increased server performance is exhibited by the original CFS cluster, which has gone from four active nodes before the upgrade to only one active node while supporting an even larger user base. The results of before and after performance testing on the original CFS cluster show performance gains when migrating from Windows Server 2003 R2 with Service Pack 1 (SP1) to Windows Server 2008 with SP1. That migration resulted in a reduction in average CPU utilization from 7.6 percent to 3.4 percent.

    With these gains in server performance, Microsoft IT has also been able to decommission dozens of additional legacy file servers without increasing the number of clusters. There is ample headroom available on the FSU clusters outside Redmond to allow for further reduction in stand-alone file servers in remote locations.

  • Faster network file transfer performance with Server Message Block (SMB) 2.0. Microsoft IT has observed between a 50 percent and 100 percent increase in file transfer performance since the upgrade with no negative impact on bandwidth consumption or server performance.
  • Improved backup performance. As the FSU team migrated FSU servers from traditional backup methods to Microsoft System Center Data Protection Manager 2007, the team observed drastically reduced resource consumption on the servers during backup activities, and reduced duration of backup activities. Also, the backups consume only a fraction of the storage space previously required.
  • Simplified operational processes. The huge reduction in failover time benefits the FSU administrators because fewer users notice the failover, so fewer users contact the Helpdesk when a failover event occurs. The health of the overall cluster is noticeably better with Windows Server 2008 with no issues that require restarting an entire cluster. The improved user interfaces for the Cluster Microsoft Management Console (MMC) snap-in and Computer Management MMC snap-in enable the FSU administrators to work more efficiently and accurately. This means that the existing team will be able to continue to provide a high level of support for the FSU while expanding the size and reach of the service.
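To put the utilization figures quoted in the list above on a common footing, the following sketch converts them to relative reductions. It only restates arithmetic on numbers given in this paper; the function name is illustrative.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop going from `before` to `after`."""
    return (before - after) / before


disk = relative_reduction(70.0, 30.0)  # average disk utilization, in percent
cpu = relative_reduction(7.6, 3.4)     # average CPU utilization, in percent
load_multiplier = 1 + 150 / 100        # a 150 percent increase in user load

print(f"disk utilization down {disk:.0%}")           # 57%
print(f"CPU utilization down {cpu:.0%}")             # 55%
print(f"user load multiplied by {load_multiplier}")  # 2.5
```

Both utilization measures fell by more than half, while the reported user load grew to 2.5 times its earlier level.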

Future Design Improvements

Beyond the benefits already being realized from the FSU migration to Windows Server 2008, the FSU team plans to take further advantage of the technology to gain even more service improvements and operational efficiencies in the near future.

Further FSU Cluster Consolidation

Because of the significant increase in server performance with Windows Server 2008 and the improvements in SMB 2.0 file transfer performance over wide area network (WAN) links, the FSU team plans to further reduce the number of legacy file servers in several locations. Fourteen legacy file server clusters in Japan, Dublin, and Singapore will be consolidated to existing FSU clusters, without the need to add any new servers or hardware upgrades.

Windows Server 2008 R2

With Windows Server 2008 R2, the FSU will begin virtualizing the cluster nodes by using Hyper-V® technology, providing greater flexibility for making changes to the physical server hardware. Also in Windows Server 2008 R2, the new version of FSRM will provide the ability to create a file classification infrastructure based on metadata for files created with the Microsoft Office suite of products. FSRM will also provide automated enforcement of document retention policies.

Server Core Installation Option of Windows Server 2008

The Server Core installation option of Windows Server 2008 is expected to provide long-term operational benefits in addition to performance improvements that result from a decreased operating system footprint. A smaller operating system footprint with fewer moving parts should also result in a reduction in patching frequency and an even more stable, reliable platform. However, a Server Core installation will require changes in certain administrative processes that the FSU team currently performs, and such changes require careful planning, testing, and training prior to implementation.

Tiered FSU Service Level Offerings

Currently, Microsoft IT's FSU offers a single level of service for file services, and continues to support the users of the original free, legacy CFS service. However, there are varying business requirements for file services throughout Microsoft, and the FSU team plans to evolve to meet those needs by establishing different tiers of service. The first and least expensive tier will provide a basic level of file services with limited or no backup service. The second tier will be most similar to the current offering, including high availability and backup service. The third and most expensive service tier will provide even higher availability in the event of operational or natural disaster by using DFS Replication to duplicate data in a second data center location.

DFS Namespace Restructure

The FSU team plans to separate the DFS namespaces from the FSU data clusters, giving each data center its own DFS clusters. Separating the DFS load from the FSU servers allows the two services to be operated independently while expanding capacity for both DFS and file services.

Best Practices

Over the years, Microsoft IT's FSU team has developed or used several best practices for both designing and operating high-volume file service clusters. Some of those practices include:

  • Establish a formal service offering for enterprise file services. Plan for continuous service improvement by defining a service offering with reasonable service level targets, a well-defined customer base, and a complete process for service request, modification, and chargeback. The service should be supported by a team that includes a service manager who understands business requirements of the customer base, a systems engineer who can translate those requirements into a sound technical design, and a well-trained operations staff that can manage all of the service resources remotely.
  • Simplify cluster design as much as possible. Administering a cluster involves some of the more complex activities that are performed on any server platform, and increased complexity can potentially lead to operational failures. To reduce operational complexity, minimize the number of cluster nodes, cluster groups, and cluster resources. This approach will result in achieving scale by creating more small clusters of similar design as opposed to fewer, large, and more complex clusters. Using DFS-N to simplify the namespace across multiple clusters and domains will give users the experience of a single entry point of service.
  • Provision drive sizes to meet downtime requirements. Determine the maximum amount of allowable downtime for a particular set of data before establishing a drive size. In the event of planned or unplanned chkdsk execution, the data will be offline until chkdsk finishes. Chkdsk times are affected by the number of files on a volume and not by the size of the volume. Provision volumes with this in mind so that chkdsk and backup run times stay within the allowable downtime.
  • Do not allow drives to become full. File services performance degrades as a drive becomes full. The FSU team begins to move data from drives when they exceed 60 percent to 70 percent space utilization to preserve automatic quota growth ability and to allow for System Center Data Protection Manager backup activities.
  • Use quotas to establish limits on space usage by specific users. Also use FSRM to track usage statistics. Plan for and configure automatic growth for quotas to minimize manual operations tasks. With the implementation of automatic quota growth, FSU customers no longer buy fixed amounts of space. Rather, their quotas are rounded up to the nearest 10 GB and grow automatically in 10-GB increments as needed. Because most users typically request more space than they actually need, using automatic quotas also eliminates the need to predict the appropriate percentage of disk space to overallocate.
  • Use the out-of-the-box Windows Server 2008 file server role. The FSU team does not employ any special registry tweaks or tunings for the FSU clustered servers. The standard file server role is already optimized for delivering enterprise-class file services. To further optimize performance, use the Server Core role for file services if it is an appropriate fit for operational support teams.
  • Use an in-place migration approach when possible. As the FSU team experienced, the in-place migration method via the Migrate a Cluster wizard allows for an efficient upgrade with minimal risk and downtime.
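Two of the practices above reduce to simple rules: quotas grow in 10-GB steps, and data moves begin once a drive passes roughly 60 percent utilization. The sketch below expresses both. The function names are illustrative, and the 60 percent default is the lower end of the range given in the text.

```python
import math

QUOTA_STEP_GB = 10      # from the text: quotas grow in 10-GB increments
MOVE_THRESHOLD = 0.60   # from the text: data moves begin at 60 to 70 percent


def quota_for_usage(used_gb: float, step_gb: int = QUOTA_STEP_GB) -> int:
    """Round usage up to the next multiple of the step (at least one step)."""
    return max(1, math.ceil(used_gb / step_gb)) * step_gb


def should_rebalance(used_gb: float, capacity_gb: float,
                     threshold: float = MOVE_THRESHOLD) -> bool:
    """True once a drive exceeds the utilization threshold."""
    return used_gb / capacity_gb > threshold


print(quota_for_usage(23.5))        # 30
print(quota_for_usage(40))          # 40
print(should_rebalance(320, 500))   # True: 64 percent used
print(should_rebalance(250, 500))   # False: 50 percent used
```

Keeping drives below the threshold leaves room for quotas to grow automatically without an administrator having to intervene.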

Conclusion

For several years, Microsoft IT's FSU group has provided a highly scalable, highly available, and high-performance centralized file services utility based on Windows Server technologies to more than 50,000 users worldwide. The service, which includes only 20 servers and more than 100 terabytes of data, has been provided at a low TCO, with a minimal staff administering the entire service utility remotely from a single location.

By upgrading the FSU to Windows Server 2008, Microsoft IT was able to provide even higher performance and availability service levels while significantly reducing server counts and further reducing operational complexity and costs. In conjunction with Windows Vista on the desktop, a new version of the file transfer protocol SMB 2.0 provides significant file transfer performance improvements, especially across WAN links, resulting in greatly improved customer experience. Improvements to failover clustering and 64-bit support have resulted in drastic improvements in failover time, with almost no interruption in user connections upon failover. And the implementation of System Center Data Protection Manager 2007 to replace traditional backup methods greatly reduces the storage requirements and associated costs for providing backup services, which had previously been the single most expensive component of the FSU.

Additional References

TechNet Webcast: How Microsoft IT Deploys Windows 2008 Clusters for File Services (Level 300)

TechNet Webcast Presentation: How Microsoft IT Deploys Windows Server 2008 Failover Clusters for File Services (Level 300)

Technical Case Study: Microsoft IT Deploys Windows Server 2008 Failover Clusters for File Services (Level 300)

TechNet Webcast: How Microsoft Designs the Virtualization Host and Network Infrastructure (Level 300)

Related Material

Clustering and High Availability Blog

File Cabinet

The Storage Blog at Microsoft

Data Protection Manager Blog

Active Directory® Blog

Ask the Directory Services Team

Windows Server Division Weblog

Server Core

Engineering Windows 7

TechNet Webcast: How Microsoft Does IT: Improving the Sustainability and Use of SQL Server at Microsoft (Level 300)

Green IT in Practice: SQL Server Consolidation in Microsoft IT

For More Information

For more information about Microsoft products or services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information via the World Wide Web, visit any of the following sites:

http://www.microsoft.com

http://www.microsoft.com/technet/itshowcase

http://www.microsoft.com/windowsserver2008

© 2009 Microsoft Corporation. All rights reserved.

This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft, Active Directory, Hyper-V, Windows, Windows Server, and Windows Vista are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
