Windows Server 2003 R2
Your Bandwidth With New Replication Features In Windows Server 2003 R2
Alan von Weltin
At a Glance:
- File replication basics
- Benefits of DFS Replication
- Implementing file and print services
- Fine-tuning file replication
While the Distributed File System (DFS) has been in existence since Windows NT 4.0, DFS namespaces haven’t often been used to their full potential beyond software repositories and user home shares. When administrators are asked why that is the case, they often answer that a scalable, WAN-friendly replication engine has been missing from the DFS solution. So while DFS namespaces can help administrators solve real-world problems that have a direct relationship to cost savings and administrative efficiency, the lack of a more robust replication engine may have limited general adoption. With the release of Windows Server™ 2003 R2, though, DFS becomes an umbrella term for both DFS Namespaces and DFS Replication, creating a true distributed system that combines the best of virtual namespaces and scalable replication. Given these major changes, it’s time to take a look at the improvements made to this important set of technologies.
In a nutshell, DFS Namespaces present a logical data structure to users, while DFS Replication efficiently moves that data among the different servers involved. Doing so provides high availability by leveraging support for failover and fault-tolerance capabilities, but in a manner that is transparent to users. This structure abstracts actual server and share names, mapping them to friendly names specified by the administrator. For example, rather than connecting to meaningless names such as share 0U812 on server R06-SRVF-ADC01, DFS enables users to connect to something like \\Company\Finance. This functionality is easy to set up and manage, allows for greater flexibility, and minimizes user impact during scheduled server maintenance.
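To make the abstraction concrete, here is a minimal sketch of how a friendly path might map to folder targets. It is a toy model, not the DFS referral protocol: the `NAMESPACE` table, the `resolve` function, and the second server name (R07-SRVF-ADC02) are assumptions made up for illustration.

```python
# Hypothetical namespace table: friendly DFS path -> ordered folder targets.
# Only the first target comes from the article's example; the second is invented.
NAMESPACE = {
    r"\\Company\Finance": [
        r"\\R06-SRVF-ADC01\0U812",
        r"\\R07-SRVF-ADC02\0U812",
    ],
}


def resolve(path, unavailable=frozenset()):
    """Translate a friendly path into the first reachable folder target."""
    for root, targets in NAMESPACE.items():
        if not path.lower().startswith(root.lower()):
            continue
        rest = path[len(root):]
        # Only match whole path components, so \\Company\Finance2 is not a hit.
        if rest and not rest.startswith("\\"):
            continue
        for target in targets:
            server = target.split("\\")[2]  # ['', '', server, share]
            if server not in unavailable:
                return target + rest
        raise LookupError(f"no folder target available for {root}")
    raise KeyError(path)
```

Calling `resolve(r"\\Company\Finance\Q1.xls")` yields a path on the first server; passing `unavailable={"R06-SRVF-ADC01"}` (for example, during scheduled maintenance) yields the same file on the second target, and the user never sees the difference.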
Historically, when data replication between servers hosting shares within a DFS namespace was required, administrators often turned to the File Replication Service (FRS), the Robocopy utility, homegrown scripts, or third-party solutions to control the flow of data between the folder targets within the DFS structure. At the time, DFS Namespaces were limited to using FRS as their automatic replication solution. Also, FRS could only be used for domain-based DFS namespaces. While FRS has improved over the years, it remains a complex service that administrators must carefully plan and monitor to prevent common problems, such as journal wraps and excessive replication.
At the end of the day, using manual scripts or FRS does not address situations where the data needs to traverse slow or high-latency links to keep DFS targets synchronized. With either Robocopy scripts or FRS, the smallest unit of replication is a file, which means that even small changes to large files require that the entire file be retransmitted across the network. Initial testing with DFS Replication and Remote Differential Compression (RDC) technology has shown significant bandwidth savings for the file types most commonly used by knowledge workers (such as Office documents). For example, changing the title in a 5MB Microsoft® PowerPoint® file resulted in less than 80KB of traffic on the wire.
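The intuition behind this kind of saving can be sketched with a toy chunk-and-compare scheme. This is not the RDC algorithm itself; it is a simplified stand-in that cuts a file into chunks at content-defined points, hashes each chunk, and counts only the bytes in chunks the receiver does not already hold. The window size, divisor, and function names are illustrative assumptions.

```python
import hashlib

WINDOW = 16    # bytes examined when deciding on a cut point (illustrative)
DIVISOR = 64   # roughly one cut candidate per 64 bytes (illustrative)


def chunks(data: bytes):
    """Split data at cut points chosen from the content, not from fixed offsets.

    Because a cut depends on the bytes near it, inserting a few bytes
    only disturbs the chunks around the change; later cuts line up again.
    """
    out, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # keep a sum over the last WINDOW bytes
        if i + 1 - start >= WINDOW and rolling % DIVISOR == DIVISOR - 1:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out


def bytes_to_send(old: bytes, new: bytes) -> int:
    """Count bytes in chunks of `new` that the holder of `old` lacks."""
    known = {hashlib.sha256(c).digest() for c in chunks(old)}
    return sum(len(c) for c in chunks(new)
               if hashlib.sha256(c).digest() not in known)
```

If `new` is `old` with a short string inserted in the middle, `bytes_to_send(old, new)` covers only the chunks around the insertion, whereas a file-level replicator such as Robocopy would resend all of `len(new)`.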
Windows Server 2003 R2 provides a new set of tools and technologies to address the needs of several typical distributed computing scenarios. Over the past few years, Microsoft has been working on architecting a new replication engine to replace FRS. That new replication technology is now available in the form of DFS Replication. To complement the improvements made to DFS Replication that are listed in the "DFS Replication Features" sidebar, R2 also includes all of the enhancements to DFS Namespaces that were introduced with Windows Server 2003 SP1 and exposes these enhancements within the administration console. The new DFS console also makes it easier to manage both namespaces and replication simultaneously.
File and Print Scenarios
One of my clients needs to support thousands of users in diverse roles who are distributed across the United States. While some users are located at regional datacenters, many others are in small field offices. For file and print services, this company has implemented a variety of logical silos aligned to their organization. These silos contain emulated Windows NT® 4.0 domains to support users mapping drives from Windows®-based workstations through Server Message Block (SMB) or Common Internet File System (CIFS). As a result, there are severe restrictions on the ability of users to share files and access resources across these logical boundaries.
The vision for a new file services infrastructure in this organization is to implement a single namespace that encompasses all company locations and that enables users to find data, work with files, and collaborate in a seamless fashion while minimizing the cost and complexity of deploying servers outside of the datacenter.
To provide a new file and print services architecture, my client is implementing a two-tier file and print service model: tier 1 is the regional datacenter, and tier 2 is a local site server. This model provides high availability and efficient WAN utilization. It also eliminates local administration and backup requirements at remote sites and enables users to easily navigate the entire namespace through a single root structure. Note, though, that this client had already implemented Active Directory® (a single domain) and started migrating both user accounts—via Microsoft Identity Integration Server (MIIS) synchronization from Lotus Notes—and workstation computer objects to the domain.
In tier 1, the regional datacenters provide concentrations of high bandwidth enterprise functions such as backup and administration. The company will place two servers running Windows Server 2003 R2 Enterprise Edition at the datacenter locations. Each server participating in the DFS namespace and replication architecture will utilize external storage, such as a Storage Area Network (SAN) configuration. Only tier 1 servers will host DFS namespaces. This is a design trade-off because there will be too many tier 2 servers for all of the servers to host the namespace. Users will receive a DFS referral from the tier 1 servers and will cache the referral for a configurable period of time.
In tier 2, specific remote sites will receive a Windows Server 2003 R2 server for local file and print services, and will participate in DFS Replication to eliminate remote backup (this represents an immediate savings in backup hardware, tapes, and administration). Based on the Active Directory site topology, clients at remote sites automatically receive a DFS referral for folder targets located on their local server. Because domain controllers are not deployed at remote locations, "empty" Active Directory sites had to be created to associate the remote subnets with a logical site entry. If a local server becomes unavailable, the DFS namespace will be configured to automatically utilize a server at the tier 1 level. Note that an update is available for Windows XP SP2 and Windows Server 2003 to enable automatic failback when the local server becomes available again.
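The site-aware referral and failover behavior described above can be modeled in a few lines. This is a conceptual sketch only: the `Target` class, the site and server names, and the choice to drop offline targets from the list are all assumptions for illustration, not how the DFS client and namespace server actually exchange referrals.

```python
from dataclasses import dataclass


@dataclass
class Target:
    server: str
    site: str          # Active Directory site the server's subnet maps to
    online: bool = True


def order_referrals(targets, client_site):
    """Put targets in the client's own site first, then everything else.

    sorted() is stable, so targets keep their relative order within
    each group. Offline targets are filtered out here for simplicity.
    """
    live = [t for t in targets if t.online]
    return sorted(live, key=lambda t: t.site != client_site)


# Hypothetical deployment: one tier 2 server and two tier 1 servers.
targets = [
    Target("DC-FS1", "Datacenter-East"),
    Target("DC-FS2", "Datacenter-East"),
    Target("FIELD12-FS", "FieldOffice-12"),
]
```

A client in FieldOffice-12 gets FIELD12-FS at the top of the list. If that server is marked offline, the first entry becomes a tier 1 server; once it is back online, the local server is again listed first, which is the effect the failback update provides.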
Figure 1 illustrates the new file server architecture based on DFS Replication.
Figure 1 DFS File Server Infrastructure
All file servers participating in the DFS Namespace architecture will use DFS Replication groups to provide granular folder and file replication. In general, the company will use a replication group for each DFS folder target, and will customize the connection topology so that replication from remote servers is balanced between the two datacenter servers. Additionally, this configuration does not replicate files twice over the WAN and provides fault tolerance if the primary connection is unavailable.
To balance the replication traffic and provide fault tolerance, the replication group uses a custom schedule that enables primary and secondary connections from the remote server to each datacenter server. The primary connection is enabled for part of each day, while the secondary connection conducts replication over the remaining hours. Each deployed remote server alternates the datacenter server used as the primary connection. Bandwidth throttling will be used and adjusted for the WAN connection. Adjusting the replication hours and implementing throttling is a trade-off between providing replication during working hours and respecting other applications and services using the same link to the datacenter.
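As a rough sketch of that schedule, the following assigns each remote server a primary and a secondary datacenter partner and alternates the assignment from one remote server to the next. The server names and the 12-hour split are assumptions; the client's actual hours are not given here, and a real deployment would express this through the replication group's connection schedules.

```python
DATACENTER = ["DC-FS1", "DC-FS2"]   # hypothetical tier 1 server names
PRIMARY_HOURS = range(0, 12)        # assumed split: primary 00:00-11:59


def connection_plan(remote_servers):
    """Map each remote server to the datacenter partner active in each hour.

    Exactly one connection is active per hour, so a file is not sent
    over the WAN twice, and alternating the primary spreads the load
    across both datacenter servers.
    """
    plan = {}
    for i, remote in enumerate(remote_servers):
        primary = DATACENTER[i % 2]
        secondary = DATACENTER[(i + 1) % 2]
        plan[remote] = {
            hour: primary if hour in PRIMARY_HOURS else secondary
            for hour in range(24)
        }
    return plan
```

For `connection_plan(["FIELD01-FS", "FIELD02-FS"])`, the first remote server replicates with DC-FS1 in the morning and DC-FS2 in the afternoon, while the second does the reverse. If one datacenter server is down, each remote server still has a window with the other.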
Figure 2 illustrates the replication configuration between tier 1 and tier 2 locations.
Figure 2 DFS Replication Routing
DFS Replication copies files to a staging directory before sending them to replication partners, so that the source files can continue to be updated. If file change levels are very high, the staging folder may not empty within the time allocated for replication. The primary and secondary connection strategy also helps ensure that any files remaining in the staging directory on the tier 2 servers are eventually replicated.
Both for convenience and to eliminate configuration errors, setting up all of the replication groups will be scripted using the new DFSRAdmin utility that ships with R2. DFSRAdmin enables a complete DFS Replication structure to be established from the command line, including schedule, bandwidth, servers, RDC threshold, primary member, and other necessary features.
Resource Access Strategy
To control which folders may be accessed by different users, the client will use a resource access model that utilizes a common set of global groups for each shared folder (providing read, change, and full control). A set of matching domain local groups actually apply the permissions at the NTFS folder level. The model provides a fair degree of flexibility by enabling groups from any region to be nested into the global groups to easily grant access to files.
There are no implicit permissions in the model; to receive access to files, users (or other groups) will be placed into Global Groups and the Global Groups made members of domain local groups. This model makes clear the distinction between granting access to a resource (membership in a global group) and the application of permissions (domain local groups). For a smaller, less complex organization, this model could be simplified by using only one set of groups.
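The two-layer model can be sketched as a pair of lookups: users belong to global groups, global groups are members of domain local groups, and only the domain local groups appear on the folder's ACL. All group, user, and folder names below are hypothetical, and the sketch ignores real-world details such as deny entries and inheritance.

```python
# Who has been granted access: users placed in global groups.
GLOBAL_GROUPS = {
    "GG-Finance-Read": {"alice"},
    "GG-Finance-Change": {"bob"},
}

# Bridge between the two layers: global groups nested in domain local groups.
DOMAIN_LOCAL_GROUPS = {
    "DL-Finance-Read": {"GG-Finance-Read"},
    "DL-Finance-Change": {"GG-Finance-Change"},
}

# Where permissions are applied: domain local groups on the NTFS folder.
FOLDER_ACL = {
    r"\\Company\Finance": {
        "DL-Finance-Read": "read",
        "DL-Finance-Change": "change",
    },
}


def access(user, folder):
    """Return the permissions a user holds on a folder via group nesting."""
    granted = set()
    for local_group, permission in FOLDER_ACL.get(folder, {}).items():
        for global_group in DOMAIN_LOCAL_GROUPS.get(local_group, ()):
            if user in GLOBAL_GROUPS.get(global_group, ()):
                granted.add(permission)
    return granted
```

Granting a new user read access means adding the user to `GG-Finance-Read` and nothing else; the folder ACL is never touched. A user who is in no global group gets an empty set, reflecting the absence of implicit permissions.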
DFS Replication enhances the resource access strategy by replicating the application of permissions through the domain local groups. The domain local groups need only be applied permissions on a single server’s set of folders as the next replication cycle will replicate the access control list (ACL) out to all other servers within the replication group. Figure 3 illustrates the resource access strategy.
Figure 3 Resource Access Strategy
Considerations for Using DFS in R2
This section covers a few things that administrators need to be aware of when using DFS Replication and DFS Namespaces in R2. First, DFS Replication requires an Active Directory schema update. Also note that files that are encrypted with EFS will not replicate through DFS Replication in the current release of the product.
DFS Replication only works for files once they are closed (either by the user or by the application, such as auto-save functionality). This means, in a scenario where users in an organization may share the same files on two different physical servers that are configured to replicate files, users may overwrite each other’s changes. (DFS Replication does not proxy or forward file locks between servers.) However, by default DFS Replication saves the conflicting file locally and will write an event log entry. For scenarios where tight consistency is required, Microsoft recommends using SharePoint® (which provides for document check-out) or a product from partners specializing in Windows-based wide area file sharing enhancement technology.
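The overwrite scenario is easier to see in a small simulation. The sketch below assumes a last-writer-wins rule, with the losing version set aside and an event recorded; the classes, field names, and version identifiers are invented for illustration and are not the DFS Replication data model.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    vid: str          # identifier of this saved version
    parent: str       # version it was edited from
    closed_at: float  # when the file was closed (replication waits for this)


@dataclass
class Replica:
    name: str
    files: dict = field(default_factory=dict)
    conflicts: list = field(default_factory=list)  # losing versions kept locally
    events: list = field(default_factory=list)     # stand-in for the event log

    def receive(self, path, incoming):
        local = self.files.get(path)
        if local is None or incoming.parent == local.vid:
            self.files[path] = incoming            # ordinary update
            return
        if incoming.vid == local.vid:
            return                                 # already up to date
        # Both sides changed the same base version: keep the later close.
        if incoming.closed_at > local.closed_at:
            winner, loser = incoming, local
        else:
            winner, loser = local, incoming
        self.files[path] = winner
        self.conflicts.append((path, loser))
        self.events.append(
            f"conflict on {path}: kept {winner.vid}, set aside {loser.vid}"
        )
```

If two users edit the same base version on different servers and the second closes the file later, both replicas converge on the second user's version after exchanging updates. The first user's edits survive only in the conflict list, which is why check-out style locking is recommended where tight consistency matters.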
The DFS Namespace interface now permits the creation of empty folders to better establish a hierarchy, but limits folders with targets to the leaf node (last link). This means a DFS folder with a target may not have an empty folder below it. Microsoft recommends using DFS interlinks (links to additional namespaces) to establish a complex hierarchy that appears as a single namespace to users.
DFS Replication may not be used together with Microsoft server hardware clustering as the DFS Replication service does not support failover to a virtual server resource. DFS automatic failback requires an update for Windows XP clients.
DFS Replication in R2 cannot be used for Active Directory %SystemRoot%\Sysvol folder replication. You can, however, use DFS Replication on a domain controller for folders other than SYSVOL. For DFS to utilize lowest-cost referral ordering, you must have the Active Directory site topology configured properly with the default Bridge all site links setting enabled.
Alan von Weltin, Microsoft Certified Systems Engineer, is an Infrastructure Architect with Microsoft Services in the US Eastern Region. Alan has worked with Active Directory and DFS since the inception of both technologies and has assisted many large organizations in planning and deployment.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited
A new admin pack is required to manage the R2 components and is included on the R2 CD or online. DFS Replication may require special configuration with the Dfsrdiag.exe utility to function across firewalls as it is RPC-based.