Deploying Microsoft Cluster Server

Archived content. No warranty is made as to technical accuracy. Content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.

By The Enterprise Services Assets Team

To keep the network running, systems administrators conduct regular backups, monitor for potential service or server outages, and quickly resolve issues that affect system availability. Another method for ensuring the manageability, scalability, and availability of enterprise systems is clustering—connecting a group of independent systems, so they can work together as a single entity. This article explains what a large printing company found when it considered using Microsoft Cluster Server (MSCS) to provide server failover and increased system availability.

Company Details

Description: Large contract printer

Network: Two Windows NT servers, 2,500 Windows NT clients, 300 print queues

Challenge: Network print servers experienced overloads and often went out of service.

Solution: Network administrators decided to cluster the print servers using Microsoft Windows NT Server 4.0, Enterprise Edition with Microsoft Cluster Server, configured in active/passive mode: the active server provides the printing services; the passive server is available as a backup.

Achieving Failover with Cluster Server

Microsoft Cluster Server (MSCS) connects two servers so that their data can be accessed and managed as a single system. The resulting cluster consists of a network link between two servers (nodes), and each node's link with a shared small computer system interface (SCSI) hard drive, where the shared cluster data is stored.

Figure 1: An MSCS configuration

To perform failover, MSCS:

  • Automatically detects a node failure and moves the resources it was hosting to the other node. Users experience only a momentary pause in service.

  • Allows resources, including network applications and services, to remain available even when a server is taken offline for maintenance or other reasons.

  • Enables cluster-aware applications to load-balance and scale across multiple servers within a cluster. To be cluster-aware, an application must use cluster application programming interface (API) calls and resource dynamic-link library (DLL) functions to access cluster features.

Cluster resources can use three failover techniques: mirrored disk, shared device, and shared nothing. Mirrored-disk failover allows each server to maintain its own disks and run software that copies the data from one server onto another. Shared device failover permits nodes to access data on any device. Because access must be synchronized, this method requires specialized software called a Distributed Lock Manager (DLM), which tracks references to cluster hardware resources. Shared nothing failover requires that each server own its own disk resources; the cluster software transfers ownership of a disk from one server to the other when a failure occurs.

MSCS supports the shared nothing model; it supports the shared device model as long as applications supply DLM. The latter arrangement can affect performance because DLM generates some additional traffic between nodes and serializes access to hardware resources.

A fault-tolerant system (of which MSCS is a component) can be described as one with less than 30 seconds of downtime in a year. Fault tolerance requires a range of technologies:

  • Uninterruptible power supply (UPS)

  • Redundant power supply

  • Error correction code (ECC) memory

  • Redundant array of inexpensive disks (RAID) storage

  • Fault-tolerant network interface cards (NICs)

  • Redundant network fabric

  • High-availability software (in this case, MSCS)
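
As a rough sense of scale for the 30-seconds-a-year figure above, the arithmetic below converts that downtime budget into an availability percentage. It is a minimal Python sketch; the only assumption is a 365-day year.

    # Convert an annual downtime budget into an availability percentage.
    SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000 seconds in a 365-day year

    downtime_seconds = 30
    availability = 1 - downtime_seconds / SECONDS_PER_YEAR

    # Prints roughly 99.99990%, about "six nines" of availability.
    print(f"Availability: {availability:.5%}")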

This table highlights the failover scenarios in which you can use MSCS:

Point of failure | Microsoft Cluster Server solution | Other possible solutions
Network component, such as a hub, router, etc. | None | Spare components, redundant routes, etc.
Power supply | None | UPS
Server hardware, such as CPU, memory, network card, etc. | Failover | None
Non-shared disk | Failover | None
Shared disk | None | RAID
Server connection | Failover | None
Server software, such as the operating system, a service, or an application | Failover | None

Planning Failover with MSCS at the Printing Company

Here are the steps that the printing company used to evaluate MSCS and plan for its own implementation.

Select Applications to Cluster

The first step involves selecting the types of system applications and services that need to be highly available, such as mission-critical applications and file shares. One way to determine the level of criticality for each service is to rank them according to their value to end users and how much they cost your business when they are down.

Figure 2: Examples of server applications for clustering

Three types of server applications benefit from MSCS:

  • Core Windows NT Server, Enterprise Edition services. File shares, print queues, Internet/intranet sites managed by Microsoft Internet Information Server (IIS), Microsoft Message Queue Server services, and Microsoft Transaction Server services.

  • Generic applications and services. Resources that you want to cluster for basic error detection, automatic recovery, and management.

  • Cluster-aware applications. Applications that use clustering APIs to access cluster features.

Choose a Cluster Model for Failover

How your system handles failover is determined by the function and performance requirements of the applications and services you select for clustering. This section explains the five cluster models (A through E, below) you can choose from when implementing MSCS.

Model A: High-Availability Solution with Static Load Balancing

This model achieves failover by having each node run its own resources while standing ready to take over the resources of the other node. Use this model for resources that require high availability, such as file and print shares. Network performance depends on the types of resources you choose and the capacity of the nodes.

Figure 3: Cluster model for high availability and static load balancing

If a node fails, its file and print groups are transferred to the other node, which maintains them. When the failed node comes back online, the groups fail back to it and performance returns to normal.

Model B: Hot Spare Solution with Maximum Availability

This model achieves failover by having one cluster node provide all the resources, keeping another node (a hot spare) available in case the main node fails. Because this model requires a second server, you have to weigh its cost against the need for continuous high availability of critical resources.

Figure 4: Cluster model using hot spare for maximum availability

Use this model if you have resources for which high availability is critical (such as Web servers that customers use to place orders) and your business cannot tolerate degraded performance during failover.

Model C: Partial Cluster Server Solution

This model also achieves failover by using a hot spare for high-availability resources, but it uses the designated primary server to run non-cluster-aware applications as well. This means you don't have to purchase a server just to run non-cluster-aware applications, but it also means that if the primary fails, the non-clustered resources will be unavailable until it is brought back online.

Figure 5: Partial cluster model

This model is useful if you have non-cluster-aware applications you can afford to lose temporarily if the primary fails.

Model D: Virtual-Server-Only Solution (No Failover)

This model does not provide failover (because it uses only one server), but it increases the performance of specific resources and lets you manage them with the MSCS clustering strategy. The printing company used this model to group file and print share resources into virtual servers so that users could access them more easily. The section "Define Resource Groups" explains how to build groups.

Figure 6: Cluster model using only a virtual server (no failover)

One advantage to using this model is that if you do add another server to your system, the groups are already created—the only thing left to configure is the failover policies.

Model E: Hybrid Solution

This model incorporates the advantages of the other four by combining multiple failover scenarios into a single cluster. Use it if you have a variety of resources that require maximum failover and you have the hardware required to support them. While it is effective from the failover point of view, putting both database shares on a single node can reduce performance.

Figure 7: Hybrid cluster model

This model provides static load balancing for two database shares. File and print share resources are logically grouped into two virtual servers on Node A for user and administrative convenience, and a non-cluster-aware application group with no failover protection runs on Node B.

Identify Which Resources to Cluster

The next step is to decide which file shares, printers, and applications to cluster. Do this by evaluating their availability requirements—their importance to end users and your company's operation. For a resource to be failed over by MSCS it must use Transmission Control Protocol/Internet Protocol (TCP/IP) for network communication and be configured to store data on a shared SCSI drive.

Resource type | Description
Distributed Transaction Coordinator | Clustered installation of Microsoft Distributed Transaction Coordinator (DTC)
File share | File shares accessible by a network path, such as \\servername\sharename
Generic application | Network or desktop applications, such as a database program
Generic service | Windows NT services, such as a logon-authentication service
IIS virtual root | Microsoft IIS 3.0 (or later) virtual roots for World Wide Web (WWW), File Transfer Protocol (FTP), and Gopher
IP address | Internet Protocol (IP) network address
Microsoft Message Queuing Server | Clustered installation of Microsoft Message Queuing Server
Network name | The virtual-server computer name for a network device or service
Physical disk | Disk resources on the shared SCSI bus for shared folders or storage
Print spooler | Printer queues for network-attached printers
Time service | Special resource that maintains time consistency between cluster nodes

MSCS includes generic resource DLLs that allow these resources to be defined within a cluster and given basic failover functionality. You can also write your own resource DLLs so that non-cluster-aware applications or services can take better advantage of MSCS features.

List all server-based applications and sort them into cluster share groups, virtual servers, or non-cluster applications. The total of these three resource types determines the capacity your cluster must provide.

Build Dependency Trees for Resources to Define Resource Dependencies

Dependency trees show the relationships of the resources that reside in the same group on a node. You can refer to them to make sure you do not create share clusters that have dependencies between critical resources, such as file and print shares.

Refer to these rules as you build dependency trees:

  • Do not include the cluster name and cluster IP address in the dependency tree or group—they are created automatically during installation.

  • A resource can depend on any number of other resources. Use lines to link a resource to all of its dependencies.

  • Resources in the same dependency tree must all be online on the same node of a cluster. Make sure to include all resource dependencies.

  • A resource can be active or online on only one node in the cluster at a time.

  • Resources of a dependency tree can be contained in only one cluster group.

  • You can bring a resource online only after all of the resources that it depends on are online. The hierarchy in the tree should display dependent resources above the resources on which they depend.

  • You must take a resource offline before taking offline any resources on which it depends. Place resources that have other resources dependent on them at the bottom of the tree. (See the ordering sketch after this list.)

  • Do not link critical resources with high availability needs to other resources in the group. You cannot put multiple resources of these types within a single cluster: Microsoft Distributed Transaction Coordinator (MSDTC), Microsoft Message Queuing Server (MSMQ), and the Time Service resource.
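
The online/offline ordering rules above amount to a topological sort of the dependency tree. The Python sketch below illustrates the idea with a hypothetical tree (the resource names are examples, not taken from this article); a valid online order lists each resource after everything it depends on, and the offline order is the reverse.

    from graphlib import TopologicalSorter

    # Hypothetical dependency tree: each resource maps to the resources it depends on.
    dependencies = {
        "IP Address": [],
        "Network Name": ["IP Address"],
        "Physical Disk": [],
        "File Share": ["Network Name", "Physical Disk"],
        "Print Spooler": ["Network Name", "Physical Disk"],
    }

    # A resource can come online only after all of its dependencies are online.
    online_order = list(TopologicalSorter(dependencies).static_order())
    offline_order = list(reversed(online_order))

    print("Bring online in this order:", online_order)
    print("Take offline in this order:", offline_order)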

Figure 8 shows the dependency scheme for all resource types. Dashed lines show "typical" but not required links:

Figure 8: Dependency scheme for all resource types

Sample Dependency Trees

Figures 9 and 10 display the relationship between two file shares and an IIS virtual root. Tree A in Figure 9 represents a single dependency tree. Because the resources all belong to one group, dependencies cannot be divided across cluster nodes.

Figure 9: Sample dependency Tree A

With minor modifications to Tree A, you can build dependencies across the nodes to create static load balancing (see Figure 10). Defining a second IP address resource (IP-2) creates two independent dependency trees that can be defined in separate groups, allowing the file shares to be active on one node while the virtual root is active on another.

Figure 10: Sample dependency Tree B

Define Resource Properties

Once you have mapped your resources using dependency trees, you should define general, advanced, and resource-specific properties. These properties are set when you implement MSCS on your system. For a detailed description of these properties see "Setting Properties" in Chapter 4: Managing MSCS of the Microsoft Cluster Server Administrator's Guide.

Define Resource Groups

A group links the dependencies for a collection of dependent or related resources to be managed as a single unit—typically all the resources needed to run a specific application or service. Grouping is the primary means of achieving static load balancing. Careful lab testing and resource monitoring can help you determine an optimal-performance grouping for your system.

Resource group definition rules:

  • A group can be active on only one cluster node at a time.

  • Resources within the same group can be owned by only one node in the cluster.

  • A resource cannot span groups.

  • A dependency tree cannot span groups.
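
Because a dependency tree cannot span groups, one useful planning check is to confirm that every dependency stays within a single group. The Python sketch below uses hypothetical resource and group names to show the idea.

    # Hypothetical mapping of resources to the group that contains them.
    resource_group = {
        "IP-1": "FileShareGroup",
        "NetName-1": "FileShareGroup",
        "Disk-F": "FileShareGroup",
        "FileShare-Users": "FileShareGroup",
        "IP-2": "WebGroup",
        "IIS-Root": "WebGroup",
    }

    # Each pair links a resource to one of its dependencies.
    dependencies = [
        ("NetName-1", "IP-1"),
        ("FileShare-Users", "NetName-1"),
        ("FileShare-Users", "Disk-F"),
        ("IIS-Root", "IP-2"),
    ]

    violations = [
        (resource, dep)
        for resource, dep in dependencies
        if resource_group[resource] != resource_group[dep]
    ]

    if violations:
        for resource, dep in violations:
            print(f"Rule violation: {resource} depends on {dep} in a different group")
    else:
        print("Every dependency tree is contained within a single group")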

Define Group Properties

You must define group properties (general, failover, and failback) before you can set them during the implementation stage. Do this in conjunction with establishing a failover policy (described in the next section). For a detailed description of these properties, see the MSCS on-line help or the Microsoft Cluster Server Administrator's Guide. (Use the "Sync Contents" button [Ctrl+S] to view the chapters and articles in this guide.)

Establish Failover Policies

Failover policies determine how groups behave during failover. The online and offline transitions occur in a predefined order: a resource is brought online after all resources it depends on are brought online, and a resource is taken offline before any resource it depends on is taken offline. Failback is when a group returns to the node it was active on prior to a failover.

The cluster service automatically initiates failover when it detects a failure on one of the cluster nodes. Because each cluster node monitors its own processes and the other node processes, the need for failover is detected with minimal delay. In most cases, MSCS can detect a node has failed and begin failover in less than 10 seconds.

A failover policy is defined as the maximum number of times (the threshold) that a group is allowed to fail over in a specified number of hours (the period) before it is taken offline. When a group exceeds its failover policy, MSCS leaves it offline. Assign an appropriate failover policy to each resource group in MSCS.
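
To make the threshold and period rule concrete, the sketch below decides whether a group has exceeded its failover policy and should be left offline. The failover history and policy values are invented for illustration.

    from datetime import datetime, timedelta

    def exceeds_failover_policy(failover_times, threshold, period_hours):
        """True if the group failed over more than `threshold` times within
        the last `period_hours` hours and should therefore be left offline."""
        latest = max(failover_times)
        window_start = latest - timedelta(hours=period_hours)
        recent = [t for t in failover_times if t >= window_start]
        return len(recent) > threshold

    # Illustrative history: four failovers within two hours.
    history = [
        datetime(1998, 6, 1, 9, 0),
        datetime(1998, 6, 1, 9, 40),
        datetime(1998, 6, 1, 10, 15),
        datetime(1998, 6, 1, 11, 0),
    ]

    # Policy: at most 3 failovers in any 6-hour period.
    print(exceeds_failover_policy(history, threshold=3, period_hours=6))   # True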

Implementing Failover Using MSCS

With the information you have gathered in the planning stage, you can configure MSCS according to your failover design. Implementing failover involves configuring servers, software, and clusters on MSCS.

Configuring Hardware

You have to make configuration changes to your system hardware to ensure proper failover.

Standard Server Hardware

Although you can configure MSCS using the minimum requirements (two nodes configured similarly, one SCSI disk on a shared bus, and one network interface card for each node), you can achieve more effective high availability by analyzing your system and eliminating all single points of failure. This table summarizes single points of failure and possible options for addressing them.

Point of failure: Cluster power source
  • Typical solution: Use an uninterruptible power supply (UPS) with power conditioning capabilities to protect the entire cluster from AC power problems (brownouts, surges, power loss).
  • Preferable solution: Use a UPS for each node of the cluster and any external drive cabinets so that there are backup UPS units in case of failure.
  • Optimal solution: Use UPS systems with redundant power paths: AC power to the UPS systems from different circuits or power grids; generator backups for the AC power source.

Point of failure: Cabinet power supply
  • Typical solution: Use multiple power cords connected to independent power sources for each cabinet in the cluster.
  • Preferable solution: Use multiple internal power supplies and multiple power cords connected to independent power sources, for each cabinet in the cluster.
  • Optimal solution: Use multiple, hot-pluggable, internal power supplies and multiple power cords connected to independent power sources, for each cabinet in the cluster.

Point of failure: Cluster interconnect
  • Typical solution: Use a single network interface card (NIC) per cluster node. This interface is configured as both the public network for client-to-cluster connectivity and as the private network for node-to-node communications. Note: This configuration is not recommended because congestion on the public network could result in the delay or loss of cluster heartbeat messages, in turn causing unexpected resource failover conditions.
  • Preferable solution: Use dual NICs per cluster node, with one NIC dedicated to the private network and the other to the public network. 32-bit (or faster) PCI network cards are recommended to minimize failures due to adapter congestion.
  • Optimal solution: Use at least two NICs per cluster node—one dedicated to the private network and the other to the public network, configured as a backup route in the event that the private network fails. 32-bit (or faster) PCI network cards are recommended to minimize failures due to adapter congestion.

Point of failure: Client-to-cluster connectivity
  • Typical solution: Use a routed TCP/IP network with redundant routers on the cluster's public network subnet. Multiple gateways ensure connectivity in the event of a router failure.
  • Preferable solution: Use a routed TCP/IP network with multiple gateways on every server and client subnet.
  • Optimal solution: Use physically multi-homed clients and servers with fully redundant network paths.

Point of failure: Server memory
  • Typical solution: Use 8-bit, no parity.
  • Preferable solution: Use 9-bit, with 1-bit parity to allow detection of single-bit memory errors.
  • Optimal solution: Use error correction code (ECC) memory. ECC detects double-bit errors and corrects single-bit errors.

Point of failure: Private disk storage
  • Typical solution: Use one or more high-performance physical disks for each server's local storage. (SCSI disks are recommended for best performance but are not required.) Configure the disks to the operating system as either independent volumes or a non-fault-tolerant redundant array of inexpensive disks (RAID 0). Each physical disk represents a single point of failure.
  • Preferable solution: Use multiple high-performance physical disks for each server's local storage, and use the Windows NT fault-tolerant disk driver to configure the disks as RAID 1 or RAID 5 volumes. A single disk failure does not cause a node failure; however, the node must be brought down to replace a failed drive.
  • Optimal solution: Use multiple high-performance physical disks for each server's local storage, and use hardware RAID support to combine the disks into fault-tolerant volumes (for example, RAID 1 or 5). Hot-swap capability allows failed disks to be replaced without taking down the node.

Point of failure: Shared disk storage
  • Typical solution: On the shared bus, use one or more physical disks configured as either independent volumes or non-fault-tolerant RAID (RAID 0). Shared disks must be addressable through a SCSI interface. Note: This configuration is not recommended because failure of a disk or volume results in a loss of service for all resources hosted on it; they are not restartable on another node of the cluster and data loss is likely. You should make the shared disk hosting the quorum resource fault-tolerant, because the quorum contains the cluster configuration data and loss of this device results in the loss of the entire cluster.
  • Preferable solution: Use hardware RAID to create a fault-tolerant disk volume. This volume acts as the quorum device and hosts application data for a single resource group. Note that using the Windows NT fault-tolerant disk driver to create a volume is not supported on the shared bus.
  • Optimal solution: Use hardware RAID to create multiple fault-tolerant disk volumes. The availability of multiple volumes allows the cluster to be arranged in an active-active configuration.

Point of failure: SCSI connectivity
  • Typical solution: Each server connected to the local shared bus must contain a high-performance, single-ended SCSI controller card. Use multiple controller cards or a card that supports multiple SCSI buses if local disk storage is SCSI—local storage cannot be located on the shared bus.
  • Preferable solution: Each server contains one or more high-performance, differential SCSI controller cards. Each of these cards may be used to support multiple SCSI buses.
  • Optimal solution: Use fiber channel technology to create a shared bus.

Many configuration options meet the minimum requirements of MSCS but Microsoft provides technical support only for configurations that have passed cluster validation testing (use the Hardware Compatibility List (HCL) on TechNet or at https://winqual.microsoft.com/download/default.asp as a reference for assembling valid clusters).

To be valid, configurations must use the same hardware for both nodes of the cluster, a hardware RAID controller, and a PCI network interface card on the cluster interconnect. The HCL also lists various system components such as SCSI adapters, fiber channel adapters, and RAID devices that have passed Cluster Component Candidate testing.

Peripherals

You should connect peripherals (printers, fax devices, modems, tape drives) directly to the network (if applicable) or to another, non-clustered server—that is, treat peripherals as "local" resources rather than cluster resources. There is no facility for failing over peripherals if a node fails, and adding a local resource to a cluster can severely restrict some administrative operations. For instance, if you add a fax server to one of the cluster nodes, you can no longer perform scheduled maintenance on the fax server during the day, because taking down its node results in a loss of service.

RAID Array Configuration and Partitioning

Configure disks on the shared bus as fault-tolerant volumes so that when you create additional shared volumes the cluster can use an active-active configuration. (See the table above.)

Recommendations:

  • Format each volume for NTFS only.

  • Use a single partition on each disk (logical partitions cannot be failed over independently).

  • Permanently assign the same drive letter to a given shared disk on each node.

SCSI Conventions

Your system must have at least one shared SCSI bus formed by a PCI-based SCSI controller. To form a shared bus:

  • Change the SCSI ID of one of the controllers (the default is 7). The IDs must be different before the controllers can be connected to the same bus.

  • Disable each controller's boot-time SCSI bus reset operation (use the manufacturer's configuration utilities).

Capacity Requirements

Cluster configuration involves balancing each node's capacity so that resources perform optimally and either node can temporarily run the resources of the other during failover. The capacity requirements for cluster nodes are:

  • Hard-disk storage. Each node in a cluster must have enough hard-disk capacity to store permanent copies of all applications and other resources required to run all groups. Plan disk space allowances so that either node can efficiently run all resources during failover.

  • CPU. Failover can strain the CPU processing capacity of an MSCS server when the server takes control of the resources from a failed node. If you don't plan properly, the CPU of a surviving node can be pushed beyond its practical capacity during failover, slowing response time. Plan your CPU capacity on each node so that it can accommodate new resources without greatly affecting responsiveness.

  • RAM. Each node in your cluster should have enough RAM to handle all applications that run on either node. Also, make sure to set the Windows NT paging files appropriately for each node's physical memory.
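
One way to sanity-check CPU and RAM headroom is to confirm that either node could temporarily carry both nodes' workloads during failover. The sketch below uses made-up load figures; the 75 percent CPU ceiling echoes the Performance Monitor guidance later in this article.

    # Hypothetical steady-state load on each node.
    ram_in_use_mb = {"NodeA": 180, "NodeB": 140}
    cpu_percent = {"NodeA": 35, "NodeB": 30}

    installed_ram_mb = 512        # physical RAM per node
    cpu_ceiling_percent = 75      # keep the surviving node under this during failover

    combined_ram = sum(ram_in_use_mb.values())
    combined_cpu = sum(cpu_percent.values())

    print(f"RAM needed during failover: {combined_ram} MB of {installed_ram_mb} MB")
    print(f"CPU needed during failover: {combined_cpu}% (ceiling {cpu_ceiling_percent}%)")
    print("RAM headroom OK:", combined_ram <= installed_ram_mb)
    print("CPU headroom OK:", combined_cpu <= cpu_ceiling_percent)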

IP Addresses

MSCS does not support DHCP-assigned IP addresses for the cluster administration address (associated with the cluster name) or for any IP address resources. To configure Windows NT on each node, use either static IP addresses (which ensure the highest degree of availability) or permanently leased DHCP IP addresses (which carry a slight chance of failure).

Private Network Addressing

If you configure a cluster with a private interconnect it is good practice to assign addresses from one of the private networks defined by the Internet Assigned Numbers Authority (IANA). Refer to RFC 1597 "Address Allocation for Private Internets" and RFC 1631 "The IP Network Address Translator (NAT)" located at https://safety.net/rfc.html. RFC 1597 defines three network classes for private networks as listed below.

  • 10.0.0.0 – 10.255.255.255 (Class A), Subnet Mask: 255.0.0.0

  • 172.16.0.0 – 172.31.255.255 (Class B), Subnet Mask: 255.255.0.0

  • 192.168.0.0 – 192.168.255.255 (Class C), Subnet Mask 255.255.255.0
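
A quick way to confirm that an interconnect address falls inside one of the ranges above is Python's standard ipaddress module; the sample addresses below are purely illustrative.

    import ipaddress

    # The RFC 1597 private ranges listed above.
    PRIVATE_RANGES = [
        ipaddress.ip_network("10.0.0.0/8"),
        ipaddress.ip_network("172.16.0.0/12"),
        ipaddress.ip_network("192.168.0.0/16"),
    ]

    def is_rfc1597_private(address):
        ip = ipaddress.ip_address(address)
        return any(ip in network for network in PRIVATE_RANGES)

    print(is_rfc1597_private("10.0.0.1"))        # True  - suitable for the interconnect
    print(is_rfc1597_private("131.107.2.200"))   # False - a public address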

Do not assign default gateways or WINS servers to the adapters on the private interconnect: doing so causes the server name to be registered as a multi-homed value and registers both the public and private addresses in WINS. Clients select addresses randomly and will not be able to connect to the server if they select a private address.

Using a Public Network (Subnet)

To move from a private to the public network see RFC 1631, "The IP Network Address Translator (NAT)" (https://safety.net/rfc1631.txt). You should reserve IP addresses from the public network subnet for the cluster: one for each physical node, one for the cluster, and one for each resource group that includes an IP address resource.
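
A small worked example of that reservation rule, assuming a two-node cluster and three resource groups that each contain an IP address resource:

    nodes = 2                     # one address per physical node
    cluster_address = 1           # one address for the cluster itself
    groups_with_ip_resource = 3   # one address per group containing an IP address resource

    addresses_to_reserve = nodes + cluster_address + groups_with_ip_resource
    print(f"Reserve {addresses_to_reserve} addresses on the public subnet")   # 6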

Naming Conventions

For proper failover, clients must connect using the virtual server names instead of connecting directly to the cluster nodes. You should establish strict naming conventions for each server type as a means of differentiating them, because no automated method exists for hiding the physical server names from the browse list.

Configuring Software

Windows NT Server Configuration

Before installing MSCS, you must install Microsoft Windows NT Server, Enterprise Edition on the non-shared disk(s) of each node, and you must apply Service Pack 4. Make sure that no paging files or system files reside on the shared disk(s). For more detailed information on configuring servers, see your Microsoft Server Administrator's Guide.

Non-Clustered Applications

If you have resources that you do not want to run in the context of a resource group, install them on a local, non-shared disk.

Configuring Clusters

Use the information you gathered during the planning stage to implement resources, dependencies, and groups.

Determining Resource Parameters

To optimize performance on your system, observe a fully configured cluster running on hardware from the Hardware Compatibility List, then try changing the defaults for the polling intervals and for the Pending Timeout, RestartThreshold, and RestartPeriod parameters.

You can tune the values of the LooksAlive or IsAlive polling intervals to determine how quickly a cluster service becomes aware of a resource failure. For more information on setting polling intervals using Cluster Administrator, see "Setting Properties" in Chapter 4: Managing MSCS of the Microsoft Cluster Server Administrator's Guide or the Cluster Administrator Help.

Tune the Pending Timeout parameter for each resource using the worst case restart time. You should consider that restart times vary with the size of a resource's transaction logs (for example, Microsoft SQL Server and Microsoft Exchange). Calculate the timeout for the node under maximum load to avoid setting the parameter too low, which can cause the cluster service to put a resource in a failed or offline state even if a failure has not occurred.

The RestartThreshold and RestartPeriod parameters are calculated in combination (and in conjunction with the Pending Timeout value) to define how many attempts are made to restart a resource before the group is moved to another node. (Note: >= indicates greater than or equal to.)

	 RestartPeriod >= RestartThreshold x Pending Timeout

For example, a resource has a 10-second restart time and a worst case restart time of 30 seconds. If three restart attempts are allowed, you should set the RestartPeriod to no less than 90 seconds:

	 RestartPeriod >= 3 x 30 seconds

Setting RestartPeriod too low can create problems. Consider what happens if you set it to 45 seconds for the case above. At the nominal restart time (10 seconds), about 30 seconds is enough to make three restart attempts, after which the threshold is reached and the resource group is failed over. If the server is heavily loaded, however, three restarts can never complete in 45 seconds, so the RestartThreshold is never reached and the resource keeps restarting on the same node indefinitely.
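
The sketch below restates the RestartPeriod arithmetic, including the 45-second setting that can never fit three restart attempts on a heavily loaded node.

    def minimum_restart_period(restart_threshold, worst_case_restart_seconds):
        """Smallest RestartPeriod that still allows `restart_threshold` attempts
        at the worst-case restart time before the group is failed over."""
        return restart_threshold * worst_case_restart_seconds

    worst_case = 30   # seconds per restart attempt on a heavily loaded node
    threshold = 3     # restart attempts allowed before moving the group

    print("Minimum safe RestartPeriod:",
          minimum_restart_period(threshold, worst_case), "seconds")   # 90

    # With RestartPeriod set to 45 seconds, a loaded node fits only one
    # attempt in the window, so the threshold is never reached and the
    # resource keeps restarting on the same node.
    print("Attempts possible in a 45-second window under load:", 45 // worst_case)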

Implementing Dependencies

As you implement the resource dependencies you established during the planning stage, keep in mind that the relationships are transitive. For instance, if the print spooler resource is dependent on a network name that is in turn dependent on an IP address resource, you don't have to define a dependency relationship between the print spooler and the IP address. See "Setting Properties" in Chapter 4: Managing MSCS of the Microsoft Cluster Server Administrator's Guide.

Implementing Groups

You have to determine the FailoverThreshold and FailoverPeriod parameters for groups the same way you determined them for individual resources. Use the worst case failover time for the entire resource group.

	 FailoverPeriod >= FailoverThreshold x (worst-case Group Failover Time)

For example, don't set the FailoverPeriod lower than 180 seconds for a group with a worst case failover time of 45 seconds and a threshold of four restart attempts:

	 FailoverPeriod >= 4 x 45 seconds

See "Setting Properties" in Chapter 4: Managing MSCS of the Microsoft Cluster Server Administrator's Guide.

Establishing Administrative Procedures

Here are some recommended administrative procedures for MSCS.

Proper Startup/Shutdown Procedures

MSCS startup and shutdown procedures depend mostly on the hardware platform you are using. Because cluster nodes access information on SCSI disks, in most cases you should power up the shared SCSI drives before the cluster nodes and wait about a minute to give the drives adequate time to spin up and become fully operational. You should shut down cluster nodes completely before you power off the shared SCSI drives.

In addition, before you perform system maintenance on a node you should transfer all failover groups to the other node. Transfer them back to the original node when the maintenance is complete.

Maintaining Shared Applications

The proper maintenance procedures for applications installed on the cluster's shared SCSI drives depend largely on the application. If registry changes are required and the application resource is configured to replicate registry entries, installation on the active node may be all that is required. Cluster-aware applications are better equipped to support this type of installation method. If the application is not cluster-aware, you may have to install the software on both nodes.

Backup Procedures

For MSCS: back up the operating system for each cluster node, the data on the SCSI bus drive, and the data on each node's local drive.

Operating System

Windows NT Server 4.0, Enterprise Edition includes a separate utility, Cluster Configuration Backup (ClusConB), for backing up your cluster configuration. For more information, see the MSCS release notes (\MSCS\Readme.doc) on CD 2. Use the same process to back up and restore cluster nodes that you use for other Windows NT Server installations: use Windows NT Backup for the registry and the boot and system drives. You can also use RDISK.EXE to keep a current emergency repair disk (ERD) for both nodes.

Because the hardware settings and the disk signatures for the shared SCSI bus are stored in the registry, you cannot restore the Windows NT backup onto another computer. For instance, if a cluster node fails and you replace it with a new node, you must reinstall Windows NT Server, Enterprise Edition on the new node. If the other cluster node is still functional, you can run Cluster Administrator on it to evict the replaced node, then install MSCS on the new node, join the existing cluster, and restore applications and data.

Knowledge Base articles on restoring Windows NT after replacing hardware

Article ID | Title
112019 | Changing Primary Disk System After Installation
130928 | Restoring a Backup of Windows NT to Another Computer
139822 | How to Restore a Backup to Computer with Different Hardware
139820 | Moving or Removing Disks & Fault Tolerant Drive Configurations
113976 | Using Emergency Repair Disk With Fault Tolerant Partitions

Shared SCSI Bus Drive

You can back up data on the shared SCSI bus drives from the node that owns the disk resource you want to back up, or from a remote computer over a network connection to a hidden administrative share. For instance, you can use the New Resource wizard to create FBackup$, GBackup$, and HBackup$ file shares for the roots of drives F, G, and H. The shares are not displayed in the Windows NT browse list and can be configured so that only members of the Backup Operators group can access them.

For more information, see Chapter 6: Backing Up and Restoring Network Files in Windows NT Server Version 4.0 Concepts and Planning, and Chapter 5: Preparing for and Performing Recovery in the Windows NT Server Resource Guide. (To view sections and articles in those chapters, Use the "Sync Contents" button [Ctrl+S].)

Local Drives

It is also critical to back up data on each node's local drives. It is acceptable, but not the best approach, to install backup hardware and software on each node and let each node back up its own data. A better solution is to identify a non-clustered Windows NT Server and schedule it to regularly copy data from these drives to a backup server, from which you can make backups.

Using Cluster Administrator

Cluster Administrator performs most of the MSCS administrative functions. It is installed by default on both cluster nodes when you install MSCS. You can also install it on any Windows NT 4.0 Workstation or Server computer on the network. There are a few things to watch out for when using the Cluster Administrator.

Verify Resource Settings

Make sure you accurately enter directory and file names for the resources you create. Cluster Administrator does not verify file share or generic application names, so it will not catch a mistake.

Increase Size of Quorum Log File

When a node is unavailable, MSCS writes all configuration changes and management data to a file called the quorum log. If both nodes are offline, the first node back online checks this log for any configuration changes before it brings the cluster online.

Entries are removed from the log once all nodes have processed any changes. But if a node is down for a long time or there are a large number of resources in the cluster (such as a cluster print server with numerous printers) the log file can fill and data can be lost. You can increase the log file size (default 64 KB) to prevent this.

Additional clustering tasks can be performed by the Cluster Administrator executable program located in the %systemroot%\system32 folder. For more information, see the MSCS on-line help or the Microsoft Cluster Server Administrator's Guide. (Use the "Sync Contents" button [Ctrl+S] to view the chapters and articles in this guide.)

Disaster Recovery Procedures

MSCS is only one part of a fault-tolerant system. You should follow traditional disaster recovery guidelines when using it, such as:

  • Perform regular system backups (including registries) and store copies of backup tapes offsite.

  • Purchase and configure a spare cluster configuration to protect against failure of original equipment.

  • Implement cluster shared disks as hardware RAID devices (the first release of MSCS does not support software-based RAID). This is especially important for the quorum log because without this redundancy SCSI disks represent a single point of failure for the cluster.

  • Create a distributed system with components in different data centers. This is not possible with a traditional SCSI-based cluster because of the extremely restricted distance limitations imposed by the SCSI bus.

Fiber channel solutions are being developed to let MSCS remain fault resistant in building-wide (or larger) disaster scenarios, but current fiber channel implementations often depend on a fiber channel hub, which can itself become a new single point of failure. This issue should be addressed in future releases.

Maintenance Procedures

When maintenance is scheduled on one of the cluster nodes, transfer all resources and applications to the other node. When maintenance is complete, transfer them back to the original node and begin maintenance procedures on the second node. When you are done, load-balance resources manually or let them failback to their preferred node. You must synchronize firmware and software upgrades on both nodes of a cluster.

MSCS Mode Requirements

MSCS supports two modes of operation: active/active (both nodes provide services to users) and active/passive (one node does the work and the other is on standby). For capacity planning, use Performance Monitor to ensure that each node can accommodate the applications and services running on the other node. On a Windows NT 4.0 system you should at a minimum monitor these counters:

Memory

  • Pages/sec. Number of requested pages accessed from disk because RAM was not immediately available (acceptable range is 0 - 20).

  • Available Bytes. Amount of available physical memory (acceptable value is 4 MB or higher).

  • Committed Bytes. Amount of virtual memory that is committed to physical RAM or to pagefile space (should be less than the amount of physical RAM).

  • Pool Non-paged Bytes. Amount of RAM in the non-paged pool system memory area; space acquired by operating system components as they accomplish their tasks (value should not increase).

Processor

  • % Processor Time. Amount of time the processor is busy (should not exceed 75%).

  • % Privileged Time. Amount of time the processor spends performing operating system services (should not exceed 75%).

  • % User Time. Amount of time the processor spends on user services, such as running a word processor (should not exceed 75%).

  • Interrupts/sec. Number of interrupts the processor is servicing from applications or hardware devices (depends on the processor, but should be fairly low).

  • System:Processor Queue Length. Number of requests the processor has in its queue (should not exceed two).

Disk

  • % Disk Time. Amount of time the disk drive is busy servicing read and write requests (acceptable value is less than 50%).

  • Disk Queue Length. Number of pending disk I/O requests for the disk drive (acceptable range is 0-2).

  • Avg. Disk Bytes/Transfer. Average number of bytes transferred to or from the disk during read/write operations (depends on the disk subsystem, but value should be high).

  • Disk Bytes/sec. Rate at which bytes are transferred to or from disks during read/write operations (depends on the disk subsystem, but value should be high).
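
The sketch below compares sampled counter values against the thresholds listed above. The sample values are invented; collecting the counters themselves is done with Performance Monitor.

    # (counter, sampled value, acceptable-range test from the lists above)
    samples = [
        ("Memory: Pages/sec",              12,        lambda v: 0 <= v <= 20),
        ("Memory: Available Bytes",        6 * 2**20, lambda v: v >= 4 * 2**20),
        ("Processor: % Processor Time",    68,        lambda v: v <= 75),
        ("System: Processor Queue Length", 1,         lambda v: v <= 2),
        ("Disk: % Disk Time",              35,        lambda v: v < 50),
        ("Disk: Disk Queue Length",        1,         lambda v: 0 <= v <= 2),
    ]

    for name, value, within_range in samples:
        status = "OK" if within_range(value) else "investigate"
        print(f"{name}: {value} -> {status}")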

These counters provide a good measuring stick for system performance, but you may want to monitor other counters depending on the applications and services that are clustered. Collect performance data at peak and off-peak periods over a specified timeline to see how the system is faring. For more information on performance tuning, see the MS Windows NT Server 4.0 Networking Guide. (Use the "Sync Contents" button [Ctrl+S] to view the chapters and articles in this kit.)

Printing

If you use MSCS as a print server be aware of these limitations:

  • You must stop and restart the print spooler to add printer ports remotely on both Windows NT Server 4.0 and Windows NT Server 4.0, Enterprise Edition. If you have multiple remote locations and cannot interrupt user printing during peak business hours, you'll have to add ports during non-business hours.

  • You should place all printer ports and drivers on both cluster nodes so that they are available during system maintenance or failures.

Testing MSCS on Your System

As soon as a cluster is implemented, you should perform several tests.

Validating Installation and Configuration

Most support problems are related to initial installation and hardware configuration. To identify and eliminate these problems as early as possible, implement a basic acceptance test immediately after installing the cluster.

Testing Failover Scenarios

You also need to test for potential failover scenarios. For instance, you should verify that the disk and cluster resources are available on the correct node after a system failure. Test this before you install applications or configure other resources. The table below shows you how to use Cluster Administrator on a remote computer to test for failover.

Validating failover scenarios

Test category | Node A owns all resources, fail to Node B | Node A owns all resources, Node B fails | Node B owns all resources, fail to Node A | Node B owns all resources, Node A fails
Group move (administrative) | X |  | X |  
Resource failure (administrative) | X |  | X |  
Node failure—system restart | X | X | X | X
Node failure—Windows NT trap | X | X | X | X
Node failure—system reset | X | X | X | X
Node failure—power down | X | X | X | X

Note: The default configuration for resource failure is to restart upon failure. Unless you modify these parameters, the disk resources must be failed four times in succession for a failover to occur (using the Initiate Failure command).

Designate a node as the failing node, then perform an orderly operating system shutdown on it to test the system restart. To initiate a Windows NT blue screen trap test, use the "KILL" utility in the Windows NT Resource Kit to terminate the WINLOGON process.

Most server hardware includes a system-reset switch for rebooting without completely shutting off the power. The system reset and power down tests help determine if a complete loss of power to one node affects the SCSI bus termination.

This level of validation testing can effectively eliminate hardware configuration or software installation as problem sources.

Testing Units

You should perform unit tests on each virtual server you created in the planning stage before you fully implement MSCS on your system. Unit testing is the best source of information for determining static load balancing. Use these guidelines (tests vary depending on resource type):

  • Resource parameters. Verify that restarts function correctly and that failover occurs according to the period and threshold settings for each resource. Carefully validate the Pending Timeout setting for startup and shutdown of each resource—when resources exceed this parameter they are placed in a failed state and loss of service results. You should also account for the loss of other non-critical network services, such as name servers, that may coincide with a failover.

  • Resource dependencies. Ensure that all dependencies between resources are configured and functioning properly. Take each resource in the dependency tree offline and bring it back online to verify that it behaves appropriately during startup and shutdown.

  • Resource failures. Wherever possible, induce resource failures without using Cluster Administrator. For example, you can fail network name and IP address resources by introducing duplicates on the network. These are generally more meaningful test cases than artificially initiated failures.

  • Group parameters. Ensure that all group parameters are functioning as specified. Test to see that failback to a preferred node works even in abnormal startup conditions. For instance, check the low virtual-memory state during the saving of a system dump file that follows a blue-screen trap.

  • Failover scenarios. Repeat all of the failure scenarios described in the section on validating failover scenarios table above.

  • Checkpoint restart. Test the worst case restart times for applications (such as Microsoft SQL Server or Exchange) that perform checkpoints and use a transaction log.

  • Scalability boundary cases. Design scalability tests around worst-case growth predictions, because some virtual servers are designed to support a variable number of associated resources. For instance, a virtual server that hosts home directory shares uses one disk resource, one network name resource, one IP address resource, and a variable number of shares. On the other hand, a virtual server for print services uses fixed MSCS resources but the number of printers defined on the spooler varies greatly.

  • Client load. Some resources may perform differently when placed under a heavy client load. In most cases it is impractical to set up a lab with the hundreds (or thousands) of clients necessary to duplicate a worst-case load scenario. If available, you can use benchmarking or load-simulation tools to simulate the impact of a large number of clients with fewer machines. Another (less reliable) approach is to track carefully the load signature at varying, smaller client loads and extrapolate the worst-case conditions (a simple extrapolation sketch follows this list).

  • Client response. For testing, use the same client software that will connect to the cluster in production. Although you may not be able to change client behavior in response to a restart or failover, it is very important to understand and document it. For instance, at the first threshold, a loss of service may not be visible to the client. At a second threshold, a dialog box may appear giving the client the opportunity to retry an operation. But by the third threshold, retries may no longer be successful and the client may be required to restart. You can use this type of information to implement a failover scenario that minimizes impact to users.

  • Rolling upgrade. Before implementing MSCS, you should include a process for upgrading or applying patches to the hardware or applications without a loss of service—a rolling upgrade. Test this process as much as is practicable.

  • Symmetry. Because most cluster configuration is not node-specific (with the exception of preferred owners) you don't have to repeat unit testing scenarios with resources that are active on the alternate node.
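
The extrapolation approach mentioned in the client load item can be as simple as a linear fit of response time against client count. The sketch below uses invented measurements; a linear fit ignores saturation effects, which is part of why that approach is less reliable.

    # Measured response times (seconds) at smaller client loads: (clients, seconds).
    measurements = [(50, 0.8), (100, 1.1), (200, 1.9), (400, 3.4)]

    # Ordinary least-squares fit of response_time = slope * clients + intercept.
    n = len(measurements)
    sum_x = sum(c for c, _ in measurements)
    sum_y = sum(t for _, t in measurements)
    sum_xy = sum(c * t for c, t in measurements)
    sum_xx = sum(c * c for c, _ in measurements)

    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n

    worst_case_clients = 2000
    estimate = slope * worst_case_clients + intercept
    print(f"Estimated response time at {worst_case_clients} clients: {estimate:.1f} s")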

If, while you are executing your test plan, you think of other tests you would like to run, document them and perform them during integration testing.

Testing for Proper Integration

Usually, traditional test plans use integration testing to exercise and verify the functionality of modules, in combination, that were verified separately during unit testing. In a cluster environment, the situation is somewhat different because combining virtual servers on a cluster should not result in any functional changes from the client's perspective.

So integration testing in a cluster is designed to ensure that each virtual server functions the same while coexisting with other virtual servers as it did during unit testing. Some performance impact is to be expected, but there should be no functional changes as a result of grouping virtual servers.

To conduct an integration test, configure all virtual servers to a single node and re-verify the unit test cases in the worst-case scenario. To produce a system-wide, worst-case load signature, make sure that you execute scalability and client load scenarios in parallel.