Choosing an Availability Strategy for Business-Critical Data
Updated: March 28, 2003
Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2
If a file server contains business-critical data, you need to make certain that the data is highly available. Windows Server 2003 provides two primary strategies for increasing data availability: FRS and clustering.
FRS This strategy involves creating one or more domain-based DFS namespaces, using link targets that point to multiple file servers, and using File Replication service (FRS) to synchronize the data in the link targets. This chapter describes the design and deployment process for FRS, although you can also synchronize data manually by using tools such as Robocopy or by using third-party replication tools.
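As an illustration of the manual approach, a one-way mirror from one file server to another might look like the following Robocopy command (the server and share names are hypothetical):

```
robocopy \\ServerA\Data \\ServerB\Data /MIR /R:3 /W:10
```

The /MIR option mirrors the source tree, including deleting destination files that no longer exist in the source, so run it from a single designated master to avoid conflicting updates.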
Clustering A server cluster is a group of individual computer systems working together cooperatively to provide increased computing power and to provide continuous availability of business-critical applications or resources. This group of computers appears to network clients as if it were a single system, by virtue of a common cluster name. A cluster can be configured so that the workload is distributed among the group, and if one of the cluster members fails, another cluster member automatically assumes its duties.
Both of these strategies involve using multiple file servers to ensure data availability. If for some reason you cannot use multiple file servers, follow the guidelines in "Planning for File Server Uptime" earlier in this chapter to increase the availability of the physical server.
When evaluating these two strategies, you must keep in mind your organization’s tolerance for inconsistent data. FRS can cause temporary data inconsistency as data is replicated across multiple servers. Clustered file servers maintain only one copy of the data; therefore, data inconsistency does not occur.
If your organization plans to implement geographically dispersed clusters for disaster tolerance, you need to understand your data consistency needs in different failure and recovery scenarios and work with the solution vendors to match your requirements. Different geographically dispersed cluster solutions provide different replication and redundancy strategies, ranging from synchronous mirroring across sites to asynchronous replication. For more information about geographically dispersed clusters, see "Designing and Deploying Server Clusters" in this book.
Using FRS as an Availability Strategy
You can use FRS to replicate data in domain-based DFS namespaces on file servers running a Windows 2000 Server or Windows Server 2003 operating system. When evaluating FRS, you must determine whether your organization can tolerate periods of inconsistent data that can occur within a replica set. Data inconsistency can occur at the file and folder level as follows:
FRS uses a "last writer wins" algorithm for files. This algorithm is applied in two situations: when the same file is changed on two or more servers, and when two or more different files with the same name are added to the replica tree on different servers. The most recent update to a file in a replica set becomes the version of the file that replicates to the other members of the replica set, which might result in data loss if multiple masters have updated the file. In addition, FRS cannot enforce file-sharing restrictions or file locking between two users who are working on the same file on two different replica set members.
FRS uses a "last writer wins" algorithm when a folder on two or more servers is changed, such as by changing folder attributes. However, FRS uses a "first writer wins" algorithm when two or more identically named folders on different servers are added to the replica tree. When this occurs, FRS identifies the conflict during replication, and the receiving member protects the original copy of the folder and renames (morphs) the later inbound copy of the folder. The morphed folder names have a suffix of "_NTFRS_xxxxxxxx," where "xxxxxxxx" represents eight random hexadecimal digits. The folders are replicated to all servers in the replica set, and administrators can later merge the contents of the folders or take some other measure to reestablish the single folder.
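The two conflict rules described above can be sketched in Python. This is an illustrative model only, not FRS's actual implementation; the morphed-name format follows the "_NTFRS_xxxxxxxx" pattern described above.

```python
import random

def morph_name(folder_name):
    # Append "_NTFRS_" plus eight random hexadecimal digits,
    # mimicking how FRS renames a conflicting inbound folder.
    suffix = "".join(random.choice("0123456789abcdef") for _ in range(8))
    return folder_name + "_NTFRS_" + suffix

def resolve_file_conflict(local, inbound):
    # Files: last writer wins. The copy with the later change time
    # becomes the version replicated to all members.
    return inbound if inbound["mtime"] > local["mtime"] else local

def resolve_folder_add_conflict(existing_names, inbound_name):
    # Folder additions: first writer wins. The receiving member keeps
    # its original folder and morphs the name of the inbound copy.
    if inbound_name in existing_names:
        return morph_name(inbound_name)
    return inbound_name

# The later edit wins, even if it overwrites another member's changes.
winner = resolve_file_conflict({"mtime": 100, "server": "A"},
                               {"mtime": 200, "server": "B"})
print(winner["server"])  # B

# Two members independently created "Reports"; the inbound copy is morphed.
print(resolve_folder_add_conflict({"Reports"}, "Reports"))
```

Administrators can later merge a morphed folder's contents back into the original, as noted above.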
Temporary data inconsistency due to replication latency is more likely to occur in geographically dispersed sites with infrequent replication across slow WAN links. For servers in the same site, consistency is usually not an issue, because replication can occur quickly after a file changes, provided that only one user changes the data. If two users change the same file, a replication conflict occurs and one user's changes are lost.
Replication works well in the following scenarios.
When the data is read-only or changes infrequently
Because changes occur infrequently, the data is usually consistent. In addition, FRS has less data to replicate, so network bandwidth is not heavily affected.
When the sites are geographically dispersed and consistency is not an issue
Geographically dispersed sites might have slower bandwidth connections, but if your organization does not require the data in those sites to always be consistent with each other, you can configure replication in those sites on a schedule that makes sense for your organization. For example, if your organization has sites in Los Angeles and Zimbabwe, you can place one or more replicas of the data in servers in those sites and schedule replication to occur at night or during periods of low bandwidth use. Because in this scenario replication could take hours or days to update every member, the delay must be acceptable to your organization.
When each file is changed by only one person from one location
Replication conflicts rarely occur if only a single user changes a given file from a single location. Common scenarios for single authorship include redirected My Documents folders and other home directories. However, if users roam between sites, replication latency can cause a file to be temporarily inconsistent between sites.
When replication takes place among a small number of servers in the same site
Replication latency is reduced by frequently replicating data using high-speed connections. As a result, data tends to be more consistent.
Replication should not be used in the following scenarios.
In organizations with no operations group or dedicated administrators
Organizations that do not have the staff or the time to monitor FRS event logs on each replica member should not implement FRS. Organizations must also have well-defined procedures in place to prevent the accidental deletion of data in the replica set, because deleting a file or folder from one replica member causes the file or folder (and its contents) to be deleted from all replica members. In addition, if a folder is moved out of the replica tree, FRS deletes the folder and its contents on the remaining replica members. To avoid having to restore the files or folders from backup, you can enable shadow copies on some of the replica members so that you can easily restore a file or folder that was accidentally deleted. For more information about shadow copies, see "Designing a Shadow Copy Strategy" later in this chapter. For more information about FRS logs, see the Windows Security Collection of the Windows Server 2003 Technical Reference (or see the Windows Security Collection on the Web at http://www.microsoft.com/reskit).
In organizations that do not update virus signatures or closely manage folder permissions
A virus in FRS-replicated content can spread rapidly to replica members and to clients that access the replicated data. Viruses are especially damaging in environments where the Everyone group has share permissions or NTFS permissions to modify content. To prevent the spread of viruses, it is essential that replica members have FRS-compatible, up-to-date virus scanners installed on the servers and on clients that access replicated data. For more information about preventing the spread of viruses, see "Planning Virus Protection for File Servers" and "Planning DFS and FRS Security" later in this chapter.
When the rate of change exceeds what FRS can replicate
If you plan to schedule replication to occur during a specified replication window, verify that FRS can replicate all the changed files within the window. Replication throughput is determined by a number of factors:
The number and size of changed files
The speed of the disk subsystem
The speed of the network
Whether you have optimized the servers by placing the replica tree, the staging directory, and the FRS data on separate disks
Each organization will have different FRS throughput rates, depending on these factors. In addition, if your data compresses extremely well, your file throughput will be higher. To determine the replication rate, perform testing in a lab environment that resembles your production environment.
If the amount of changed data exceeds what FRS can replicate in a given period of time, you need to change one of these factors, such as increasing the speed of the disk subsystem (number of disks, mechanical speed, or disk cache) or of the network. If no change is possible, FRS is not recommended for your organization.
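The feasibility check described above reduces to simple arithmetic, sketched here in Python (the throughput figure is a hypothetical lab measurement, not a published FRS rate):

```python
def fits_replication_window(changed_mb, throughput_mb_per_hour, window_hours):
    # Estimate the hours FRS would need to replicate the changed data,
    # and compare that against the scheduled replication window.
    hours_needed = changed_mb / throughput_mb_per_hour
    return hours_needed <= window_hours, hours_needed

# Example: 6 GB (6,144 MB) of changed files, a measured rate of
# 1,200 MB/hour, and a 4-hour overnight window.
ok, hours = fits_replication_window(6144, 1200, 4)
print(ok, round(hours, 2))  # False 5.12 -- the window is too short
```

If the check fails, either widen the window, reduce the rate of change, or improve one of the throughput factors listed above.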
In organizations that always use clustered file servers
Some organizations use clustered file servers regardless of whether the server contains business-critical data. Although storing FRS-replicated content on the cluster storage of a clustered file server might imply increased availability of the data, combining clustering and FRS is not recommended. Data might become inconsistent among the members of the replica set, thus defeating the purpose of clustering, which is to have highly available data that remains consistent because only one copy of the data exists. In addition, Windows Server 2003 does not support configuring FRS to replicate data on cluster storage.
In organizations that use Remote Storage
Remote Storage is a feature in Windows Server 2003 that automatically copies infrequently used files on local volumes to a library of magnetic tapes or magneto-optical disks. Organizations that use Remote Storage must not use FRS on the same volume. Specifically, do not perform any of the following tasks:
Do not create a replica set on a volume that is managed by Remote Storage.
Do not add a volume that contains folders that are part of an FRS replica set to Remote Storage.
If you use Remote Storage for volumes that contain FRS replica sets, backup tapes might be damaged or destroyed if FRS recalls a large number of files from Remote Storage. The damage occurs because FRS does not recall files in media order. As a result, files are extracted in random order; the recall can take days to complete and might damage or destroy the tape. Random extraction from magneto-optical disks can also be extremely time consuming.
Windows Server 2003 does not prevent you from using Remote Storage and FRS replica sets on the same volumes, so take extra precautions to avoid using these two features on the same volume.
When locks by users or processes prevent updates to files and folders
FRS does not replicate locked files or folders to other replica members, nor does FRS update a file on a replica member if the local file is open. If users or processes frequently leave files open for extended periods, consider using clustering instead of FRS.
When the data to be replicated is on mounted drives
If a mounted drive exists in a replica tree, FRS does not replicate the data in the mounted drive.
When the data to be replicated is encrypted by using EFS
FRS does not replicate files encrypted by using EFS, nor does FRS warn you that EFS-encrypted files are present in the replica set.
When the FRS jet database, FRS logs, and staging directory are stored on volumes where NTFS disk quotas are enabled
If you plan to store a replica set on a volume where disk quotas are enabled, you must move the staging directory, FRS jet database, and FRS logs to a volume where disk quotas are disabled. For more information, see "Planning the Staging Directory" later in this chapter.
Using Clustering as an Availability Strategy
If the data changes frequently and your organization requires consistent data that is highly available, use clustered file servers. Clustered file servers allow client access to file services during unplanned and planned outages. When one of the servers in the cluster is unavailable, cluster resources and applications move to other available cluster nodes. Server clusters do not guarantee nonstop operation, but they do provide sufficient availability for most business-critical applications, including file services. The Cluster service can monitor applications and resources and automatically recognize and recover from many failure conditions. This ability provides flexibility in managing the workload within a cluster and improves overall system availability.
Server cluster benefits include the following:
High availability Ownership of resources, such as disks and IP addresses, is automatically transferred from a failed server to a surviving server. When a system or application in the cluster fails, the cluster software restarts the failed application on a surviving server, or it disperses the work from the failed node to the remaining nodes. As a result, users experience only a momentary pause in service.
Manageability You can use the Cluster Administrator snap-in to manage a cluster as a single system and to manage applications as if they were running on a single server. You can move applications to different servers within the cluster, and you can manually balance server workloads and free servers for planned maintenance. You can also monitor the status of the cluster, all nodes, and resources from anywhere on the network.
Scalability Server clusters can grow to meet increased demand. When the overall client load for a clustered file server exceeds the cluster’s capabilities, you can add additional nodes.
Clustered file servers work well in the following scenarios.
When multiple users access and change the files
Because only one copy of the file exists, Windows Server 2003 can enforce file locking so that only one user can make changes at a time. As a result, data is always consistent.
When large numbers of users access data in the same site
Clustered file servers are useful for providing access to users in a single physical site. In this case, you do not need a replication method to provide data consistency among sites.
When files change frequently and data consistency is a must
Even with a large number of changes, data is always consistent and there is no need to replicate the changes to multiple servers.
When you want to reduce the administrative overhead associated with creating many shared folders
On clustered file servers, you can use the Share Subdirectories feature to automatically share any folders that are created within a folder that is configured as a File Share resource. This feature is useful if you need to create a large number of shared folders.
When you want to ensure the availability of a stand-alone DFS root
Creating stand-alone DFS roots on clustered file servers allows the namespaces to remain available, even if one of the nodes of the cluster fails.
When you want to make encrypted files highly available
Windows Server 2003 supports using EFS on clustered file servers. Using EFS in FRS replica sets is not supported. For more information about using EFS on clustered file servers, see "Planning Encrypted File Storage" later in this chapter.
Some issues to consider when using clustered file servers include the following.
Dynamic disks are not available
If you want to use dynamic disks, you must use them on nonclustered file servers or on the local storage devices of the cluster. If the node hosting the local storage devices fails, the data becomes unavailable until the node is repaired and brought back online.
If you need to extend the size of basic volumes used for shared cluster storage, you can do so by using DiskPart. For more information about extending basic volumes, see "Extend a basic volume" in Help and Support Center for Windows Server 2003.
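As a sketch, the DiskPart sequence for extending a basic volume into adjacent unallocated space looks like the following (the volume number is hypothetical; run list volume first to identify the correct volume):

```
diskpart
DISKPART> list volume
DISKPART> select volume 2
DISKPART> extend
```

With no parameters, the extend command uses all contiguous unallocated space that follows the selected volume.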
Clustered file servers must use complete cluster systems
For your server clusters to be supported by Microsoft, you must choose complete cluster systems from the Windows Server Catalog for the Windows Server 2003 family. For more information about support for server clusters, see article Q309395, "The Microsoft Support Policy for Server Clusters and the Hardware." To find this article, see the Microsoft Knowledge Base link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources.