Windows Server 2003 R2
Take Back Your Bandwidth With New Replication Features In Windows Server 2003 R2
Alan von Weltin
At a Glance:
- File replication basics
- Benefits of DFS Replication
- Implementing file and print services
- Fine-tuning file replication
While the Distributed File System (DFS) has been in existence since Windows NT 4.0, DFS namespaces have rarely been used to their full potential beyond software repositories and user home shares. When administrators are asked why, they often say that the DFS solution has lacked a scalable, WAN-friendly replication engine. So while DFS namespaces can help administrators solve real-world problems that bear directly on cost savings and administrative efficiency, the lack of a more robust replication engine may have limited their adoption. With the release of Windows Server™ 2003 R2, though, DFS becomes an umbrella term for both DFS Namespaces and DFS Replication, creating a true distributed system that combines the best of virtual namespaces and scalable replication. Given these major changes, it’s time to take a look at the improvements made to this important set of technologies.
In a nutshell, DFS Namespaces present a logical data structure to users, while DFS Replication efficiently moves that data among the servers involved. Together they provide high availability through failover and fault tolerance, in a manner that is transparent to users. The namespace abstracts actual server and share names, mapping them to friendly names specified by the administrator. For example, rather than connecting to meaningless names such as share 0U812 on server R06-SRVF-ADC01, DFS enables users to connect to something like \\Company\Finance. This functionality is easy to set up and manage, allows for greater flexibility, and minimizes user impact during scheduled server maintenance.
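To make the mapping concrete, here is a minimal Python sketch of how a friendly namespace path might translate into physical folder targets. It is a toy model, not the DFS client's actual referral logic: the NAMESPACE table, the resolve function, and the HUB-FS01 and HUB-FS02 servers are hypothetical; only \\Company\Finance, share 0U812, and server R06-SRVF-ADC01 come from the text above.

```python
# Hypothetical namespace: each logical folder maps to one or more folder targets.
# Apart from the article's example share, these server names are made up.
NAMESPACE = {
    r"\\Company\Finance": [r"\\R06-SRVF-ADC01\0U812", r"\\HUB-FS01\Finance"],
    r"\\Company\HR": [r"\\HUB-FS02\HR"],
}


def resolve(logical_path: str) -> list[str]:
    """Translate a friendly namespace path into candidate physical UNC paths."""
    lowered = logical_path.lower()  # Windows paths are case-insensitive
    # Check longer folder paths first so the most specific folder wins.
    for folder in sorted(NAMESPACE, key=len, reverse=True):
        prefix = folder.lower()
        if lowered == prefix or lowered.startswith(prefix + "\\"):
            remainder = logical_path[len(folder):]  # e.g. \Budget\2006.xls
            return [target + remainder for target in NAMESPACE[folder]]
    raise KeyError(f"{logical_path} is not in the namespace")


for path in resolve(r"\\Company\Finance\Budget\2006.xls"):
    print(path)
```

The user only ever sees the \\Company path; which physical server actually answers is a decision the namespace makes on the user's behalf.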
DFS Replication Features
These are some of the key improvements to DFS Replication within R2:
- New Scenario-Driven Management: Built on Microsoft Management Console (MMC) 3.0 and the .NET Framework 2.0, the new DFS management console simplifies common administrative tasks and shortens the learning curve, so administrators can put these new technologies to work quickly.
- Remote Backup and Administrative Tasks: DFS Replication was designed for branch office deployments where bandwidth is limited and there may be no local staff to perform backups. Rather than running backup jobs at the remote site, files are replicated to a hub site and archived there.
- Separate Replication from the Namespace: DFS Replication in R2 is a true general-purpose replication system. Replication is independent of the DFS namespace (formerly known as a DFS root): admins can use DFS Replication between any two folders on any two servers without sharing the folders or creating a DFS namespace. The same granularity extends to delegation, so that only specific admins can control certain replication groups.
- Efficient Replication: DFS Replication is very efficient when replicating changes to files over a WAN link. With Remote Differential Compression (RDC), files are broken into chunks so that only the changed chunks are replicated. When Windows Server 2003 R2 Enterprise Edition is running on at least one end of a replication partnership, efficiency improves further because up to five existing similar files on the receiving end can be used to build the incoming file.
- Bandwidth Throttling: DFS Replication can be tuned for a range of network speeds through bandwidth throttling, and each replication connection between servers can be scheduled in 15-minute increments across a seven-day week. DFS Replication tolerates network outages and resumes file transfers that are interrupted before completion.
- Service Health Monitoring: DFS Replication includes a comprehensive reporting mechanism that alerts admins to potential problems and summarizes key metrics such as bytes saved over the WAN. This built-in mechanism may not be suitable for more than about 50 servers; for larger deployments, Microsoft is planning to make a DFS Replication management pack available for Microsoft Operations Manager (MOM).
- State-Based Replication: DFS Replication replicates only the most current state of a file, rather than every individual update to it as FRS does.
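The 15-minute scheduling granularity mentioned under Bandwidth Throttling works out to 7 x 24 x 4 = 672 slots per week. The following Python sketch models such a schedule as one bandwidth cap per slot. The data layout, the function names, the business-hours window, and the Kbps values are illustrative assumptions, not the format DFS Replication actually stores.

```python
from datetime import datetime
from typing import Optional

SLOTS_PER_HOUR = 4                     # 15-minute granularity
SLOTS_PER_DAY = 24 * SLOTS_PER_HOUR    # 96
SLOTS_PER_WEEK = 7 * SLOTS_PER_DAY     # 672


def slot_index(when: datetime) -> int:
    """Map a timestamp to its 15-minute slot in a Monday-based week."""
    return (when.weekday() * SLOTS_PER_DAY
            + when.hour * SLOTS_PER_HOUR
            + when.minute // 15)


def build_schedule(business_kbps: Optional[int], off_hours_kbps: Optional[int],
                   start_hour: int = 8, end_hour: int = 18) -> list[Optional[int]]:
    """One entry per slot: a bandwidth cap in Kbps, or None if replication is off."""
    schedule = []
    for day in range(7):                          # 0 = Monday ... 6 = Sunday
        for slot in range(SLOTS_PER_DAY):
            hour = slot // SLOTS_PER_HOUR
            business = day < 5 and start_hour <= hour < end_hour
            schedule.append(business_kbps if business else off_hours_kbps)
    return schedule


def allowed_kbps(schedule: list[Optional[int]], when: datetime) -> Optional[int]:
    """Bandwidth cap (or None for 'no replication') in effect at `when`."""
    return schedule[slot_index(when)]


# Throttle hard during weekday business hours, open up the link otherwise.
sched = build_schedule(business_kbps=256, off_hours_kbps=1024)
print(len(sched), allowed_kbps(sched, datetime(2024, 1, 1, 9, 30)))  # 672 256
```

Passing None for business_kbps would model a connection that replicates only outside working hours.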
Historically, when data replication between servers hosting shares within a DFS namespace was required, administrators often turned to the File Replication Service (FRS), the Robocopy utility, homegrown scripts, or third-party solutions to control the flow of data between the folder targets within the DFS structure. At the time, DFS Namespaces were limited to using FRS as their automatic replication solution. Also, FRS could only be used for domain-based DFS namespaces. While FRS has improved over the years, it remains a complex service that administrators must carefully plan and monitor to prevent common problems, such as journal wraps and excessive replication.
At the end of the day, neither manual scripts nor FRS addresses situations where data must traverse slow or high-latency links to keep DFS targets synchronized. With either Robocopy scripts or FRS, the smallest unit of replication is a file, which means that even a small change to a large file requires the entire file to be retransmitted across the network. Initial testing with DFS Replication and Remote Differential Compression (RDC) technology has shown significant bandwidth savings for the file types most commonly used by knowledge workers (such as Office documents). For example, changing the title in a 5MB Microsoft® PowerPoint® file resulted in less than 80KB of traffic on the wire.
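Why chunk-level replication saves so much bandwidth can be illustrated with a generic content-defined chunking sketch in Python. Chunk boundaries are chosen by a rolling hash over the data itself, so an edit disturbs only the chunks around it, and the receiver needs only the chunks it does not already hold. This is not RDC's actual algorithm, which uses its own chunking and signature scheme; the window size, mask, chunk limits, and function names are assumptions chosen for readability.

```python
import hashlib
import random

WINDOW = 48                    # bytes covered by the rolling hash
MASK = (1 << 11) - 1           # cut when low 11 bits are zero: ~1 in 2048 positions
MIN_CHUNK = 256                # never cut chunks smaller than this...
MAX_CHUNK = 16 * 1024          # ...or let them grow larger than this
BASE = 257
MOD = (1 << 61) - 1
DROP = pow(BASE, WINDOW, MOD)  # weight of the byte sliding out of the window


def chunk_boundaries(data: bytes) -> list[tuple[int, int]]:
    """Cut data wherever a rolling hash of the last WINDOW bytes matches MASK."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = h * BASE + byte
        if i >= WINDOW:
            h -= data[i - WINDOW] * DROP   # remove the byte leaving the window
        h %= MOD
        length = i + 1 - start
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append((start, i + 1))
            start = i + 1
    if start < len(data):
        chunks.append((start, len(data)))
    return chunks


def signatures(data: bytes) -> list[tuple[bytes, int]]:
    """Strong hash and size of every chunk."""
    return [(hashlib.sha256(data[s:e]).digest(), e - s)
            for s, e in chunk_boundaries(data)]


def bytes_to_send(old: bytes, new: bytes) -> int:
    """Payload bytes of `new` in chunks the receiver (holding `old`) lacks."""
    known = {digest for digest, _ in signatures(old)}
    return sum(size for digest, size in signatures(new) if digest not in known)


old = random.Random(42).randbytes(300_000)
new = old[:150_000] + b"New slide title" + old[150_000:]
print(bytes_to_send(old, new), "of", len(new), "bytes need to travel")
```

The sketch counts only chunk payload; a real protocol must also exchange chunk signatures so each side knows what the other already has.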
Windows Server 2003 R2 provides a new set of tools and technologies to address the needs of several typical distributed computing scenarios. Over the past few years, Microsoft has been working on architecting a new replication engine to replace FRS. That new replication technology is now available in the form of DFS Replication. To complement the improvements made to DFS Replication that are listed in the "DFS Replication Features" sidebar, R2 also includes all of the enhancements to DFS Namespaces that were introduced with Windows Server 2003 SP1 and exposes these enhancements within the administration console. The new DFS console also makes it easier to manage both namespaces and replication simultaneously.
File and Print Scenarios
One of my clients needs to support thousands of users in diverse roles who are distributed across the United States. While some users are located at regional datacenters, many others are in small field offices. For file and print services, this company has implemented a variety of logical silos aligned to their organization. These silos contain emulated Windows NT® 4.0 domains to support users mapping drives from Windows®-based workstations through Server Message Block (SMB) or Common Internet File System (CIFS). As a result, there are severe restrictions on the ability of users to share files and access resources across these logical boundaries.
The vision for a new file services infrastructure in this organization is to implement a single namespace that encompasses all company locations and that enables users to find data, work with files, and collaborate in a seamless fashion while minimizing the cost and complexity of deploying servers outside of the datacenter.
To provide a new file and print services architecture, my client is implementing a two-tier file and print service model: tier 1 is the regional datacenter, and tier 2 is a local site server. This model provides high availability and efficient WAN utilization. It also eliminates local administration and backup requirements at remote sites and enables users to easily navigate the entire namespace through a single root structure. Note, though, that this client had already implemented Active Directory® (a single domain) and started migrating both user accounts—via Microsoft Identity Integration Server (MIIS) synchronization from Lotus Notes—and workstation computer objects to the domain.
In tier 1, the regional datacenters concentrate high-bandwidth enterprise functions such as backup and administration. The company will place two servers running Windows Server 2003 R2 Enterprise Edition at the datacenter locations. Each server participating in the DFS namespace and replication architecture will use external storage, such as a Storage Area Network (SAN). Only tier 1 servers will host DFS namespaces. This is a deliberate design trade-off: there will be too many tier 2 servers for every server to host the namespace. Users will receive a DFS referral from a tier 1 server and will cache the referral for a configurable period of time.
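Caching the referral means that repeated access to the same folder does not require another round trip to a tier 1 server until the cached entry expires. A minimal Python sketch of such a time-to-live cache follows; the ReferralCache class, the fetch callback, the injectable clock, and the 300-second TTL in the usage are hypothetical stand-ins for client behavior, not the Windows implementation.

```python
import time
from typing import Callable


class ReferralCache:
    """Hypothetical client-side cache of DFS referrals with a fixed time-to-live."""

    def __init__(self, fetch: Callable[[str], list[str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # asks a namespace (tier 1) server for a referral
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, list[str]]] = {}

    def targets(self, folder: str) -> list[str]:
        now = self._clock()
        cached = self._entries.get(folder)
        if cached is not None and now < cached[0]:
            return cached[1]         # still fresh: no round trip needed
        referral = self._fetch(folder)
        self._entries[folder] = (now + self._ttl, referral)
        return referral


# Usage with a fake clock so expiry can be shown without waiting.
calls = []


def fake_fetch(folder: str) -> list[str]:
    calls.append(folder)
    return [r"\\BR-TPA-FS01\Finance"]    # hypothetical local target


now = [0.0]
cache = ReferralCache(fake_fetch, ttl_seconds=300, clock=lambda: now[0])
cache.targets("Finance")
cache.targets("Finance")
now[0] = 301.0
cache.targets("Finance")
print(len(calls))  # 2
```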
In tier 2, specific remote sites will receive a Windows Server 2003 R2 server for local file and print services, and that server will participate in DFS Replication to eliminate remote backup (an immediate savings in backup hardware, tapes, and administration). Based on the Active Directory site topology, clients at remote sites automatically receive a DFS referral for folder targets located on their local server. Because domain controllers are not deployed at remote locations, "empty" Active Directory sites had to be created to associate the remote subnets with a logical site entry. If a local server becomes unavailable, the DFS namespace will be configured to automatically use a server at the tier 1 level. Note that an update is available for Windows XP SP2 and Windows Server 2003 to enable automatic failback when the local server becomes available again.
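Site-aware referral ordering and failover can be sketched in a few lines of Python: targets in the client's own site come first, other targets follow in order of site-link cost, and the client uses the first one that responds. The Target type, the site names, and the cost values are hypothetical; the real ordering is derived from the Active Directory site topology. Because this sketch re-evaluates the ordering on every call, it returns to the local server as soon as that server is back online, which corresponds to the failback behavior that requires the update noted above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Target:
    path: str
    site: str


def order_targets(targets: list[Target], client_site: str,
                  site_cost: dict[frozenset, int]) -> list[Target]:
    """Same-site targets first, then by ascending site-link cost; unknown costs last."""
    def cost(target: Target) -> float:
        if target.site == client_site:
            return 0
        return site_cost.get(frozenset((client_site, target.site)), float("inf"))
    return sorted(targets, key=cost)  # stable: equal-cost targets keep their order


def pick_target(targets: list[Target], client_site: str,
                site_cost: dict[frozenset, int],
                is_online: Callable[[Target], bool]) -> Optional[Target]:
    """Fail over down the ordered list; return None if nothing responds."""
    for target in order_targets(targets, client_site, site_cost):
        if is_online(target):
            return target
    return None


# Hypothetical branch server plus two hub servers with made-up site-link costs.
targets = [
    Target(r"\\HUB-CHI-FS01\Finance", "Hub-Chicago"),
    Target(r"\\HUB-ATL-FS01\Finance", "Hub-Atlanta"),
    Target(r"\\BR-TPA-FS01\Finance", "Branch-Tampa"),
]
costs = {frozenset(("Branch-Tampa", "Hub-Atlanta")): 100,
         frozenset(("Branch-Tampa", "Hub-Chicago")): 200}
down = {r"\\BR-TPA-FS01\Finance"}      # simulate the local server being offline


def online(target: Target) -> bool:
    return target.path not in down


print(pick_target(targets, "Branch-Tampa", costs, online).path)
```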
Figure 1 illustrates the new file server architecture based on DFS Replication.
Figure 1 DFS File Server Infrastructure
All file servers participating in the DFS Namespace architecture will use DFS Replication groups to provide granular folder and file replication. In general, the company will use a replication group for each DFS folder target, and will customize the connection topology so that replication from remote servers is balanced between the two datacenter servers. Additionally, this configuration does not replicate files twice over the WAN and provides fault tolerance if the primary connection is unavailable.
To balance the replication traffic and provide fault tolerance, the replication group uses a custom schedule that enables primary and secondary connections from the remote server to each datacenter server. The primary connection is enabled for part of each day, while the secondary connection conducts replication over the remaining hours. Each deployed remote server alternates the datacenter server used as the primary connection. Bandwidth throttling will be used and adjusted for the WAN connection. Adjusting the replication hours and implementing throttling is a trade-off between providing replication during working hours and respecting other applications and services using the same link to the datacenter.
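The alternating primary/secondary layout can be expressed as a small planning function. This Python sketch works at hour-of-week granularity for simplicity, splits the week into a primary window and the complementary secondary window, and alternates which datacenter server is primary from one remote server to the next. The server names, the 06:00-18:00 primary window, and the Connection type are illustrative assumptions, not settings taken from the client's deployment.

```python
from dataclasses import dataclass

HOURS_PER_WEEK = 7 * 24


@dataclass
class Connection:
    remote: str
    hub: str
    role: str               # "primary" or "secondary"
    hours: frozenset[int]   # hours of the week (0-167) when this connection replicates


def plan_connections(remotes: list[str], hubs: tuple[str, str],
                     primary_hours: set[int]) -> list[Connection]:
    """Give every remote server two connections with complementary schedules."""
    primary = frozenset(primary_hours)
    secondary = frozenset(range(HOURS_PER_WEEK)) - primary
    plan = []
    for i, remote in enumerate(remotes):
        # Alternate the primary hub so replication load is balanced across hubs.
        first, second = (hubs[0], hubs[1]) if i % 2 == 0 else (hubs[1], hubs[0])
        plan.append(Connection(remote, first, "primary", primary))
        plan.append(Connection(remote, second, "secondary", secondary))
    return plan


daytime = {day * 24 + hour for day in range(7) for hour in range(6, 18)}
plan = plan_connections(["BR-TPA-FS01", "BR-ORL-FS01", "BR-MIA-FS01"],
                        ("HUB-FS01", "HUB-FS02"), daytime)
for conn in plan:
    print(conn.remote, "->", conn.hub, conn.role, len(conn.hours), "hours/week")
```

Because the two windows are complementary, every remote server always has exactly one open path to a datacenter, and no file is sent to both hubs over the WAN.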
Figure 2 illustrates the replication configuration between tier 1 and tier 2 locations.
Figure 2 DFS Replication Routing
DFS Replication uses a staging directory when replicating files to partners, so that users can continue to update the source files. If file change rates are very high, however, the staging folder may not empty within the time allotted for replication. The primary and secondary connection strategy also helps ensure that any files left in the staging directory on the tier 2 servers are eventually replicated.
For convenience and to eliminate configuration errors, all of the replication groups will be set up by script using the new DFSRAdmin utility that ships with R2. DFSRAdmin can establish a complete DFS Replication structure from the command line, including schedule, bandwidth, servers, RDC threshold, primary member, and other necessary settings.
Resource Access Strategy
To control which folders different users may access, the client will use a resource access model built on a common set of global groups for each shared folder (providing read, change, and full control). A matching set of domain local groups applies the permissions at the NTFS folder level. The model provides a fair degree of flexibility, because groups from any region can be nested into the global groups to grant access to files easily.
There are no implicit permissions in the model; to receive access to files, users (or other groups) are placed into global groups, and the global groups are made members of domain local groups. This model makes clear the distinction between granting access to a resource (membership in a global group) and applying permissions (domain local groups). A smaller, less complex organization could simplify this model by using only one set of groups.
DFS Replication enhances the resource access strategy by replicating the permissions applied through the domain local groups. Permissions for the domain local groups need to be applied to only one server’s set of folders; the next replication cycle replicates the access control list (ACL) to all other servers in the replication group. Figure 3 illustrates the resource access strategy.
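The separation between granting access and applying permissions can be modeled as a graph walk: find every group a user belongs to, directly or through nesting, and keep the strongest permission that any domain local group on the folder's ACL grants. The Python sketch below is a simplification (no deny entries, no inheritance, a single domain), and the group names and naming convention are hypothetical.

```python
from enum import IntEnum


class Access(IntEnum):
    NONE = 0
    READ = 1
    CHANGE = 2
    FULL_CONTROL = 3


# Hypothetical memberships for one shared folder: users go into global groups (GG-),
# global groups go into domain local groups (DL-).
MEMBERS = {
    "GG-Finance-Read": {"GG-Audit-Region-East", "carol"},
    "GG-Finance-Change": {"alice"},
    "GG-Audit-Region-East": {"bob"},       # regional group nested into a global group
    "DL-Finance-Read": {"GG-Finance-Read"},
    "DL-Finance-Change": {"GG-Finance-Change"},
}
# Only domain local groups appear on the NTFS ACL.
FOLDER_ACL = {"DL-Finance-Read": Access.READ, "DL-Finance-Change": Access.CHANGE}


def groups_of(principal: str) -> set[str]:
    """All groups that contain `principal`, directly or through nesting."""
    found, frontier = set(), [principal]
    while frontier:
        current = frontier.pop()
        for group, members in MEMBERS.items():
            if current in members and group not in found:
                found.add(group)
                frontier.append(group)
    return found


def effective_access(user: str) -> Access:
    """Strongest permission granted to the user through any ACL'd group."""
    granted = [FOLDER_ACL[g] for g in groups_of(user) if g in FOLDER_ACL]
    return max(granted, default=Access.NONE)


print(effective_access("bob").name)  # READ
```

Granting bob access required only a membership change; the ACL itself, which DFS Replication carries to the other servers, never had to be touched.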
Figure 3 Resource Access Strategy
Considerations for Using DFS in R2
This article only skims the surface of the new DFS capabilities available in Windows Server 2003 R2.
This section covers a few things that admins need to be aware of with DFS Replication and DFS Namespaces in R2. First, DFS Replication requires a schema update to Active Directory. Also note that files encrypted with EFS will not replicate through DFS Replication in the current release of the product.
DFS Replication replicates a file only after it is closed (either by the user or by the application, for example through auto-save). As a result, when users share the same files on two different physical servers that are configured to replicate with each other, they may overwrite each other’s changes. (DFS Replication does not proxy or forward file locks between servers.) However, by default DFS Replication saves the conflicting file locally and writes an event log entry. For scenarios where tight consistency is required, Microsoft recommends using SharePoint® (which provides document check-out) or a product from partners specializing in Windows-based wide area file sharing enhancement technology.
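Because locks are not forwarded, two people can edit the same file on different servers, and replication must then keep one version. The Python sketch below models this with a simplified last-writer-wins rule: the most recently closed copy is kept, the other copy is preserved, and an event is recorded. The Version type, the tie-break by server name, and the in-memory conflicts and events lists are illustrative assumptions rather than the service's internals.

```python
from dataclasses import dataclass


@dataclass
class Version:
    server: str
    closed_at: float   # time the file was closed (seconds since epoch)
    content: bytes


def resolve_conflict(a: Version, b: Version,
                     conflicts: list[Version], events: list[str]) -> Version:
    """Keep the most recently closed version; preserve the other one."""
    # Compare close times; exact ties are broken by server name (sketch-only rule).
    if (a.closed_at, a.server) >= (b.closed_at, b.server):
        winner, loser = a, b
    else:
        winner, loser = b, a
    conflicts.append(loser)   # kept locally instead of being silently lost
    events.append(f"conflict: kept copy from {winner.server}, "
                  f"preserved copy from {loser.server}")
    return winner


conflicts, events = [], []
east = Version("BRANCH-EAST", 1_000.0, b"east edits")
west = Version("BRANCH-WEST", 1_060.0, b"west edits")
kept = resolve_conflict(east, west, conflicts, events)
print(kept.content)  # b'west edits'
```

The user on BRANCH-EAST still "loses" from their point of view: their changes survive only in the preserved copy, which is why check-out based tools are recommended when consistency matters.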
The DFS Namespace interface now permits the creation of empty folders to better establish a hierarchy, but limits folders with targets to the leaf node (last link). This means a DFS folder with a target may not have an empty folder below it. Microsoft recommends using DFS interlinks (links to additional namespaces) to establish a complex hierarchy that appears as a single namespace to users.
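The leaf-node rule can be checked mechanically before a namespace is built. This Python sketch walks a hypothetical folder tree and reports every folder that has targets but also has subfolders; the Folder type and the server names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    name: str
    targets: list[str] = field(default_factory=list)   # empty = organizational folder
    children: list["Folder"] = field(default_factory=list)


def violations(folder: Folder, path: str = "") -> list[str]:
    """Paths of folders that have targets but also contain subfolders."""
    here = f"{path}\\{folder.name}" if path else folder.name
    found = [here] if folder.targets and folder.children else []
    for child in folder.children:
        found.extend(violations(child, here))
    return found


# Allowed: an empty folder used for hierarchy, with the target on the leaf.
good = Folder(r"\\Company", children=[
    Folder("Finance", children=[Folder("Reports", targets=[r"\\HUB-FS01\Reports"])]),
])
# Not allowed: a folder with a target that also has a folder below it.
bad = Folder(r"\\Company", children=[
    Folder("HR", targets=[r"\\HUB-FS02\HR"], children=[Folder("Policies")]),
])
for path in violations(good) + violations(bad):
    print("folder with target is not a leaf:", path)
```

Where a deeper hierarchy is needed under a folder that has targets, the interlink approach recommended above replaces the nested folder with a link to another namespace.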
DFS Replication cannot be used with Microsoft server clusters, because the DFS Replication service does not support failover to a virtual server resource. DFS automatic failback requires an update for Windows XP clients.
DFS Replication in R2 cannot be used to replicate the Active Directory %SystemRoot%\Sysvol folder, though you can use DFS Replication on a domain controller for folders other than SYSVOL. For DFS to use lowest-cost referral ordering, the Active Directory site topology must be configured properly, with the default "Bridge all site links" setting enabled.
A new admin pack is required to manage the R2 components; it is included on the R2 CD and is also available online. Because DFS Replication is RPC-based, it may require special configuration with the Dfsrdiag.exe utility to work across firewalls.
Alan von Weltin, Microsoft Certified Systems Engineer, is an Infrastructure Architect with Microsoft Services in the US Eastern Region. Alan has worked with Active Directory and DFS since the inception of both technologies and has assisted many large organizations in planning and deployment.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.