Chapter 4 - Managing MSCS

This chapter describes the various tasks in administering MSCS. The following topics are discussed:

  • Using Cluster Administrator

  • Understanding how MSCS changes your environment

  • Creating resources and groups

  • Setting properties

  • Manipulating nodes, groups, and resources

  • Administering clusters from the command line

  • Installing new SCSI hardware

  • Managing security

  • Recovering disk-subsystem failures

Using Cluster Administrator

Cluster Administrator shows you information about the groups and resources on all of your clusters and specific information about the clusters themselves. A copy of Cluster Administrator is automatically installed on both cluster nodes when you install MSCS. For remote administration, you can install separate copies of Cluster Administrator on other computers on your network running Service Pack 3 and either Windows NT Workstation or Windows NT Server version 4.0. The remote and local copies of Cluster Administrator are identical.

This section describes the uses of Cluster Administrator, including the ways in which it changes procedures you might have been familiar with in a non-MSCS environment.

For more information on installing Cluster Administrator on a computer running Windows NT Workstation or Windows NT Server for remote administration of MSCS clusters, see "Installing Cluster Administrator on a Remote Computer" in Chapter 3, "Setting Up an MSCS Cluster."

Connecting to a Cluster

You can connect to a cluster using the network name or the IP address of either the cluster itself or one of its nodes. You can also simultaneously connect to multiple clusters.

When you start Cluster Administrator, any connections you left open at the end of your most recent Cluster Administrator session are automatically restored. If this is your first session, Cluster Administrator prompts you to enter a cluster name, a node, or an IP address to which to connect before it displays any information. Click Browse to view all the clusters in the same domain as the computer that is running Cluster Administrator.

Viewing the Default Groups

Every new MSCS cluster includes two types of default resource groups: Cluster Group and Disk Group. These groups contain the settings for the default cluster and some typical resources that provide generic information and failover policies.

  • Cluster Group

    The default Cluster Group contains an IP Address resource, a Cluster Name resource, and a Time Service resource. This group is essential for connectivity to the cluster.

  • Disk Group

    One Disk Group is created for each disk resource on the shared SCSI bus. A Physical Disk resource is included in each group.

Do not delete or rename the Cluster Group. Instead, model your new groups on the Cluster Group by modifying the Disk Groups. For example, you might add an IP Address resource, Network Name resource, and IIS Virtual Root resource to Disk Group 1 and then rename the group Web Group.

Note: Each cluster needs only one Time Service resource. You do not need, and should not create, a Time Service resource in each group.

Identifying Failovers

Several types of icons appear in Cluster Administrator beside the various listings of groups, resources, and other elements. The most important icon to recognize is the node down icon, which indicates that a failure has occurred on a node and that its groups and resources have been transferred to the surviving node.

Figure 4.1: The node down icon

This icon does not necessarily mean that you have lost functionality in any of your groups or resources. In most cases, the group will have failed over to the alternate node.

It is important to routinely monitor the status of all clusters and check for failover activities that diminish either the performance or the availability of resources. Failover should be easily tolerated, and your workflow should not suffer if all applications are working, failover policies have been correctly set, and you have provided sufficient capacity for all situations. If any of these three criteria is not met, you must take action to meet it.

Serious failover situations can affect performance and availability. If a node fails and its companion node is unable to serve clients efficiently, the situation must be resolved immediately. If groups persistently fail over and fail back without surviving on either node for very long, availability to clients is completely lost.

Responding to a Failover Event

When investigating possible problems with failovers, answer the following questions:

  • How serious is the problem?

    For example, other resources could be suffering due to the increased load placed on the alternate node.

  • What caused the failover?

    Many types of failure can cause failovers. Usually, these occur at the operating-system and hardware levels. For help with determining the exact type of a failure, see Chapter 5, "Troubleshooting."

    To review what happened prior to and during a group failure, run Windows NT Event Viewer and look at the Application log, System log, and the Security log (if applicable). This helps you determine what types of errors occurred.

  • Are any changes needed to prevent this kind of event in the future?

    If your performance has seriously suffered as a result of a failover, review the capacity-planning issues presented in Chapter 2, "Planning Your Cluster Environment."

    If you determine that the nodes did not efficiently transfer control of various groups and resources during failover, review the failover policies you have set. For example, MSCS could be attempting to bring a File Share resource online before the corresponding Physical Disk resource was brought online. This type of error is easily fixed by configuring dependencies.

For more information about diagnosing and resolving specific kinds of failures and errors, see Chapter 5, "Troubleshooting." For more information on resource dependencies, see "Resource Groups and Dependencies" in Chapter 1, "Microsoft Cluster Server Concepts."
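
The dependency fix described above can be pictured as an ordering problem: a resource can come online only after everything it depends on is online. The following Python sketch is illustrative only. The resource names and the online_order helper are hypothetical and are not part of MSCS or Cluster Administrator.

```python
from graphlib import TopologicalSorter

# Hypothetical group: each resource maps to the resources it depends on.
dependencies = {
    "Physical Disk": set(),
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "File Share": {"Physical Disk"},
}

def online_order(deps):
    """Return an order in which every resource follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())

order = online_order(dependencies)
print(order)
```

In any order this produces, Physical Disk precedes File Share, which is the relationship a correctly configured dependency enforces.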

Cluster Administrator Command-Line Options

Cluster Administrator supports command-line parameters. You can specify multiple cluster names, cluster node names, and cluster IP addresses. When you start Cluster Administrator with a command-line parameter, Cluster Administrator does not open any of your previous cluster connections.

How MSCS Changes Your Environment

MSCS changes some aspects of the network environment. Besides changing some views of the network, MSCS also changes some procedures that network administrators perform.

Viewing Your Network

Previously, you probably used various administrative tools to monitor the contents and activities of the servers running Windows NT Server on your network. These tools continue to help you administer your network by providing access to servers that do not act as MSCS nodes. For administering MSCS clusters, Cluster Administrator provides a way to view cluster network resources.

Figure 4.2: The Cluster Administrator window

With Windows NT Explorer and many other tools, you see only physical servers. Cluster Administrator shows you all the entities you define for your MSCS environment: clusters, resources, and groups. Of these, groups are the most significant because they are the virtual servers that clients see over the network and to which the clients connect.

As an administrator, you control all the entities that appear, such as those in the diagram of a typical cluster, shown in Figure 4.3.

Figure 4.3: A cluster, as viewed in Cluster Administrator

Your ability to recognize these entities, to name them appropriately, and to manage their relationships to one another is the key to successfully managing your MSCS environment.

When you deploy MSCS in your environment, assign appropriate names to all groups (virtual servers) so that these client connections are logical and easy to identify.

Assigning Network Names

When users view or browse for cluster resources, they see the following NetBIOS names:

  • The cluster name (for example, \\SalesCluster)

  • The cluster node names (for example, \\SalesNode1 and \\SalesNode2)

  • Any defined virtual server names (for example, \\Sales)

Users who browse or view the cluster name (\\SalesCluster in the example) can see all available shares on the cluster and also any local shares on the node that owns the resource group. Do not create any local shares on a node. If users connect to the local shares and the node fails or is taken offline, those users are disconnected.

Users should not connect to cluster resources through node names or the cluster name; they should always use virtual-server names to connect to cluster resources because this ensures that a failover will not stop their work. If users do view individual nodes, they see those resources currently owned by the node. For example, if \\Sales\FileShare is running on \\SalesNode1, users could browse and find \\SalesNode1\FileShare. However, if users connect to cluster shares through a node name (for example, \\SalesNode1\FileShare) instead of through a virtual-server name, those users will be disconnected if the node fails or is taken offline. Similarly, if users connect to cluster resources through the cluster name, those users are disconnected when the resource moves to the other node.

You can minimize the potential for problems by:

  • Using cluster and node names that are easily identifiable by administrators, but not users.

  • Defining virtual-server names that are easily identifiable by users.
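
The naming guidance above can be checked mechanically. The following Python sketch classifies the server part of a UNC path by using the example names from this section. The classify_unc helper is hypothetical and is not an MSCS tool; it only illustrates why a path through a virtual-server name is the one that survives failover.

```python
# Example names taken from this section; the helper itself is hypothetical.
CLUSTER_NAME = "SalesCluster"
NODE_NAMES = {"SalesNode1", "SalesNode2"}
VIRTUAL_SERVERS = {"Sales"}

def classify_unc(path):
    """Classify the server part of a UNC path as one of four kinds."""
    if not path.startswith("\\\\"):
        raise ValueError("not a UNC path: " + path)
    # NetBIOS names are not case sensitive, so compare in lowercase.
    server = path[2:].split("\\", 1)[0].lower()
    if server in {name.lower() for name in VIRTUAL_SERVERS}:
        return "virtual server"   # connection follows the group on failover
    if server in {name.lower() for name in NODE_NAMES}:
        return "node"             # connection is lost if the node fails
    if server == CLUSTER_NAME.lower():
        return "cluster"          # connection is lost when the resource moves
    return "unknown"

print(classify_unc(r"\\Sales\FileShare"))       # virtual server
print(classify_unc(r"\\SalesNode1\FileShare"))  # node
```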

Reevaluating Administrative Procedures

Typically, an administrator will follow a set of server operating procedures when performing administrative tasks, such as rebooting a server, backing up applications and data, and installing new hardware or software. These operating procedures should be reexamined and some will have to be modified for the MSCS nodes installed on your network.

Reexamining Procedures that Take Servers Offline

Because MSCS clusters consist of two servers, the jobs of administering servers and performing common server operating procedures are less disruptive to users. For example, when you must perform an administrative task that would make a server unavailable to users, an MSCS server has a second node that can continue to make the server resources available to users. Therefore, you may not have to wait for non-peak hours, when:

  • A server must be restarted.

  • New hardware must be installed.

  • New software must be installed.

Reexamining Backups

Your backup procedures should be modified to account for clustering issues. When using MSCS, you should back up the following items:

  • The operating-system installation on each node in the cluster.

  • Data on the shared SCSI bus drives.

  • Data on local drives on each node.

Microsoft provides a separate utility for backing up your cluster configuration. For more information, see the MSCS release notes.

Backing up and restoring cluster nodes is no different than backing up other installations of Windows NT Server: use Windows NT Backup to back up the boot and system drives, and the registry. Also, use Rdisk.exe to keep a current emergency repair disk (ERD) for both nodes.

Note: Because the hardware settings and the disk signatures for the disks on the shared SCSI bus are stored in the registry, you cannot restore your Windows NT backup on another computer. If a computer that is used as a cluster node fails, and you replace the computer, you must reinstall Windows NT Server, Enterprise Edition on that computer. If the other node of the cluster is still functional, run Cluster Administrator on the other node, and evict the replaced node. Then, install MSCS on the new node and join the existing cluster. You can then restore your applications and data.

For more information about replacing hardware, see "Installing New SCSI Hardware" later in this chapter. For more information on restoring Windows NT after replacing hardware, see the following Microsoft Knowledge Base articles (available on Microsoft TechNet and at https://www.microsoft.com/kb): 112019, 130928, 139822, 130962, 139820, and 113976.

The backup of data on the shared SCSI bus drives may be done from the node that owns the disk resource you want to back up. You can also back up this data from a remote computer through a network connection to a hidden administrative share. For example, you might use the New Resource wizard to create FBackup$, GBackup$, and HBackup$ file shares for the root of drives F, G, and H. These shares would not appear in the Windows NT browse list and could be configured to allow access only to members of the Backup Operators group.

For information on backing up and restoring Windows NT Server, see Chapter 6, "Backing Up and Restoring Network Files," in Windows NT Server Version 4.0 Concepts and Planning, and Chapter 5, "Preparing for and Performing Recovery" in the Windows NT Server Resource Kit Resource Guide.

Creating Resources and Groups

Most MSCS administration tasks involve managing groups and resources. This section describes the process for creating each of these. The process generally consists of:

  • Running the New Group wizard.

  • Bringing the group online.

  • Running the New Resource wizard.

  • Bringing the new resource online.

For step-by-step instructions, see Cluster Administrator Help.

Running the New Group Wizard

Use the New Group wizard in Cluster Administrator to add a new group to your cluster. To start the New Group wizard, on the File menu, click New, and then click Group.

When you add a new group, the New Group wizard guides you through the two-step process. Before running the New Group wizard, make sure you have all the information you need to complete the wizard. Use the following table to prepare to run the wizard.

Table 4.1 Information required to run the New Group wizard

Information required | What it is used for
The name you will assign to the group | The name that you give the group is used only for administrative purposes. It is not the same as the Network Name, which allows users to access resources through virtual servers.
The text you will use to describe the group | The group description appears in the right pane of Cluster Administrator when you select the Groups folder in the left pane.
The name of the node that will be the preferred owner of the new group | The preferred owner is the node on which you prefer each group to run.

Assigning a Preferred Node

The preferred owner is the node on which you prefer each group to run. For example, the "Static Load Balancing" model (discussed in Chapter 2) performs best when groups are appropriately balanced between two nodes. When a node fails, the remaining node takes over the groups from the failed node, but performance is diminished. By setting those groups to fail back to their preferred server (the failed node), you automatically restore maximum performance when failback occurs. A group does not fail back if a preferred owner is not selected.

You do not need to choose a preferred owner for every group. It may not matter where the group resides, as long as it is still running on one of the two nodes, or the nodes may be equally capable of handling the load that the group's resources require. Remember that a group without a preferred owner never fails back.

For more information on configuring a group to fail over and fail back, see "Setting Group Properties" later in this chapter.

Bringing a Group Online

By default, a new group appears in Cluster Administrator offline and without resources.

You bring a group online by selecting it, clicking the File menu, and then clicking Bring Online.

Running the New Resource Wizard

The New Resource wizard guides you through the necessary steps to successfully add new resources. You start the New Resource wizard by clicking the File menu, clicking New, and then clicking Resource.

When you add a resource, you are asked to specify the following:

  • The name of the resource

    The name of the resource is the name that appears in Cluster Administrator. You can also provide a more specific description of the resource for administrative purposes.

  • The type of resource

    Your selection of the resource type determines the next step in the wizard. After resources are configured, the resource type for each resource also appears in a column in Cluster Administrator.

    For more information on steps necessary to configure each resource type, see the following section, "Specific Resource-Type Settings."

  • The name of the group to which the resource will belong

    You must give each resource in your cluster a unique identity. It is a good idea to create a new group for each set of resources you create. Never use the default groups for your new resources.

  • Your preference regarding a separate Resource Monitor

    Resource Monitors are software components that allow a resource to run separately from other MSCS resources. You specify this preference by selecting the Run this resource in a separate Resource Monitor check box. A Resource Monitor does not affect the performance or availability of resources. Generally, running ill-behaved and hand-coded resources in separate Resource Monitors makes it easier to debug them.

  • A list of possible owners

    The list of possible owners specifies which nodes in the cluster are capable of running that resource. By default, both nodes appear as possible owners, so the resource can run on either node. In most cases, it is appropriate to use this default setting. If you want the resource to be able to fail over, both nodes must be designated as possible owners.

    If you need to restrict a resource (and thereby its whole group) to running only on a specified node, remove the other node from the possible-owners list. Usually, this should be done only if the alternate node does not have the ability to handle the group or if keeping that resource available is less important than maintaining a relatively high level of performance on the surviving node.

    All resources in a group must show at least one possible owner, and that node must be a possible owner for all resources in that group. If you want a group to fail back to a particular node after it is restored to service, that node must be configured as the preferred owner, which you can do in the group Properties dialog box, on the General tab.

  • Any dependencies

    When you create a resource for a group that already has resources in it, the New Resource wizard asks you to define resource dependencies for the resource you are creating. If the group does not have any resources within it, you are not prompted to provide this information.

    For more information on dependencies, see "Resource Groups and Dependencies" in Chapter 1, "Microsoft Cluster Server Concepts," and "Setting Resource Dependencies," later in this chapter.

  • Any parameters specific to the resource

    You must select the parameters for the specific Resource Type.
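
The possible-owner rules in the list above imply that a group can run only on nodes that are possible owners of every resource in it. The following Python sketch computes that set for a hypothetical group. The resource and node names and the group_candidate_nodes helper are illustrative assumptions, not MSCS interfaces.

```python
def group_candidate_nodes(possible_owners):
    """Return the nodes that are a possible owner of every resource."""
    owner_sets = [set(nodes) for nodes in possible_owners.values()]
    if not owner_sets:
        return set()
    return set.intersection(*owner_sets)

# Hypothetical group in which one resource is restricted to a single node.
web_group = {
    "Disk F:": {"SalesNode1", "SalesNode2"},
    "Web IP Address": {"SalesNode1", "SalesNode2"},
    "Web Virtual Root": {"SalesNode1"},
}

nodes = group_candidate_nodes(web_group)
print(sorted(nodes))    # ['SalesNode1']
print(len(nodes) > 1)   # False: this group has nowhere to fail over to
```

Restricting a single resource to one node therefore restricts the whole group, which matches the guidance that both nodes must be possible owners if the resource is to fail over.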

Specific Resource-Type Settings

The wizard options vary depending on what you specify in Resource Type on the first screen of the New Resource wizard. The following table shows the information that is needed for each resource type. Each of these types is discussed in greater detail in the sections that follow.

Table 4.2 Information required when adding resources

Resource Type | Specific information you must supply
DHCP Server | Path to the DHCP database
Distributed Transaction Coordinator | None
File Share | Name of the share; full network path to the share location
Generic Application | The command line for the application; the current directory
Generic Service | Name of the service
IIS Virtual Root | The type of Web service (FTP, Gopher, or WWW); the name of the root directory; the alias ("friendly name") for the directory
IP Address | The name of the network to use (intranet network or cluster network); the IP address (x.x.x.x) and subnet mask (x.x.x.x)
Microsoft Message Queue Server | None
Network Name | The computer name you want to create
Physical Disk | Drive
Print Spooler | The name of the spool folder
Time Service | None

Note: A cluster can contain no more than one resource of each of the following types: DHCP Server, Distributed Transaction Coordinator, Microsoft Message Queue Server, and Time Service.

DHCP Server

You can use the DHCP Server resource type to provide Dynamic Host Configuration Protocol (DHCP) services from an MSCS cluster. The only parameter that is specific to the DHCP Server resource is the path to the DHCP database.

DHCP Server resources have required dependencies on an IP Address resource and a storage class resource (typically a Physical Disk resource).

Distributed Transaction Coordinator

You can use the Distributed Transaction Coordinator resource type to use Microsoft Distributed Transaction Coordinator (MSDTC) in an MSCS cluster. Because there is no Parameters tab for Distributed Transaction Coordinator resources, you need only specify the Distributed Transaction Coordinator resource dependencies when you create a new Distributed Transaction Coordinator resource. Distributed Transaction Coordinator resources have required dependencies on:

  • A storage class device (typically a Physical Disk resource).

  • A Network Name resource.

File Share

When you add a new file share and specify the File Share resource type, the New Resource wizard prompts you to specify the following information:

  • The share name

  • The share path

  • A comment

  • The maximum number of users that can connect to the share at any one time

  • The share access permissions

The configuration of a file-share resource using Cluster Administrator is identical to the configuration of a file share using Windows NT Explorer or My Computer. However, if you change share permissions using Windows NT Explorer or My Computer instead of using Permissions on the Parameters tab in Cluster Administrator, the permissions are lost when the resource is taken offline.

For more information on sharing directories, see Chapter 4, "Managing Shared Resources and Resource Security" in Windows NT Server Version 4.0 Concepts and Planning.

Generic Application

When you select Generic Application as the resource type, you set up the possible owners and the dependencies of the application. You then configure the properties of the application. In the example shown in Figure 4.4, MSCS is configured to monitor the resource, Netmon.exe.

Figure 4.4: Configuring Netmon in the New Resource wizard

To configure the specific parameters for an application, in Command line, type the command line, including the executable filename. If you want clients connected to the MSCS nodes to have access to the application, select the Allow application to interact with desktop check box. Otherwise, the application does not appear on the desktop or toolbar.

Generic Service

When you add a service to the MSCS clustered environment, click Generic Service as the resource type.

After you configure the possible owners and, if necessary, the dependencies, you must supply the name of the service. For example, Figure 4.5 shows the Microsoft SQL Server service name: MSSQLServer. You must type the exact name of the service, because MSCS uses this name to identify the service it maintains. The service name is not case sensitive. If you need to specify a startup parameter for the service, specify the service executable file name and path in Command line.

Figure 4.5: Naming the service in the New Resource wizard

After you click Finish, a message appears, confirming that the resource has been created. By default, the resource is offline, so you must bring the service online before it successfully operates as an MSCS resource.

IIS Virtual Root

The IIS Virtual Root resource type is used to provide failover capabilities for IIS version 3.0 or later virtual root directories. If a virtual directory does not already exist, it is automatically created in IIS when an IIS Virtual Root resource is created in Cluster Administrator.

When you select IIS Virtual Root in Resource type, you specify the possible owners and configure other dependencies, as usual. You then add the IP Address resource for the group as a dependency. The New Resource wizard does not allow the virtual root to be created unless an IP Address resource is already present in the group and in the list of dependencies for the virtual root.

Next, you configure the IIS-specific parameters. Specify the service with which the virtual root will be used by clicking FTP, Gopher, or WWW. In Directory, type the path to the directory on the disk on which the files reside. In Alias, type the name you want to appear for the virtual root. Under Access, select check boxes for the types of access you want to give clients. The WWW service supports Read and Execute. The FTP service supports Read and Write. The Gopher service does not support configurable access.

Figure 4.6: Parameters of an IIS Virtual Root resource

IP Address

You can assign an IP address to only one of the network adapters in each node. Since each adapter is on a different network, you can assign the IP address by clicking the appropriate network in Network to use. If you want a group to have its own IP address, first get a static IP address and the subnet mask for your network from your network administrator. Then, type the static IP address in Address and the subnet mask in Subnet mask.

Figure 4.7: Parameters of an IP Address resource
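
Before typing a static address, it can help to confirm which of the two networks the address and subnet mask actually belong to. The following Python sketch uses the standard ipaddress module. The network labels follow Table 4.2, but the address ranges and the network_for helper are hypothetical; substitute the values you receive from your network administrator.

```python
import ipaddress

# Hypothetical address ranges for the two networks in a cluster.
networks = {
    "intranet network": ipaddress.IPv4Network("192.168.10.0/255.255.255.0"),
    "cluster network": ipaddress.IPv4Network("10.0.0.0/255.255.255.0"),
}

def network_for(address, subnet_mask):
    """Return the name of the network that matches address and mask, or None."""
    iface = ipaddress.IPv4Interface(f"{address}/{subnet_mask}")
    for name, net in networks.items():
        if iface.network == net:
            return name
    return None

print(network_for("192.168.10.50", "255.255.255.0"))  # intranet network
print(network_for("172.16.0.1", "255.255.0.0"))       # None
```

A result of None suggests that the address or the mask does not match either adapter's network and should be rechecked before you create the resource.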

Microsoft Message Queue Server

You can use the Message Queue Server resource type to use Microsoft Message Queue Server (MSMQ) in an MSCS cluster. Because there is no Parameters tab for MSMQ Server resources, you need only specify the MSMQ Server resource dependencies when you create a new MSMQ Server resource. MSMQ Server resources have required dependencies on:

  • A Distributed Transaction Coordinator resource.

  • A storage class device (typically a Physical Disk resource).

  • A Network Name resource.

Network Name

The Network Name resource gives identity to the group. With a Network Name resource, your group can become a virtual server. Clients can browse for it (for example, using Network Neighborhood), just like any other server.

When you configure the dependencies for this new resource, you must include an IP Address resource. The New Resource wizard does not allow a Network Name resource to be configured without an IP Address resource in the group. The Network Name allows a particular group in the cluster to be accessed remotely, using UNC names. The only parameter that is specific to the Network Name resource is the actual name that you want to be associated with the group. For example, if the network name is "Pygmalion," and the group in which this Network Name resource resides has a share, the share can be accessed as \\Pygmalion\Sharename.

Physical Disk

For MSCS to use a disk on the shared SCSI bus, the disk must be configured as a disk resource and reside within a group. After naming the resource and specifying that it is a Physical Disk resource, leave the dependencies page blank and go to the next screen.

Multiple resources can be dependent on a specific disk resource, but all the resources must be located within the same group. This ensures that if the group fails over to a different node, all of the resources that depend on the disk fail over at the same time.

Before using the New Resource wizard to add a new Physical Disk resource, review the procedures in the "Installing New SCSI Hardware" section later in this chapter.

Print Spooler

When you create a Print Spooler resource, you must ensure that the group has a storage class resource (typically a Physical Disk resource), an IP Address resource, and a Network Name resource. The Print Spooler must be dependent on both a Physical Disk and a Network Name (which, in turn, is dependent on an IP Address resource).

After you name the Print Spooler resource, identify its type, and specify the group in which it will reside, you must do the following:

  • Configure the possible owners.

  • Add the Physical Disk and Network Name resources as dependencies.

  • Provide the path to the print-spool folder.

    On the Parameters tab, type a fully qualified path for the spool folder (for example, G:\Spool). The spooler creates the folder if it does not already exist.

    Note: The folder must reside on a disk resource that is on the shared SCSI bus.

After you have created and configured the Print Spooler resource:

  • Install the printer ports and printer drivers on each node.

  • Add printers to the clustered spooler.

For more information on creating Print Spooler resources, see "Creating a Print Spooler Resource," later in this chapter.
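
The required dependencies stated for the resource types in this chapter can be collected into a simple lookup and checked before you run the wizard. The following Python sketch is a planning aid under stated assumptions: the missing_dependencies helper is hypothetical, and "Physical Disk" stands in for any storage class resource, which is a simplification.

```python
# Required dependencies as stated in this chapter. "Physical Disk" stands in
# for any storage class resource (an assumption made to keep the sketch short).
REQUIRED = {
    "DHCP Server": {"IP Address", "Physical Disk"},
    "Distributed Transaction Coordinator": {"Physical Disk", "Network Name"},
    "IIS Virtual Root": {"IP Address"},
    "Microsoft Message Queue Server": {
        "Distributed Transaction Coordinator", "Physical Disk", "Network Name",
    },
    "Network Name": {"IP Address"},
    "Print Spooler": {"Physical Disk", "Network Name"},
}

def missing_dependencies(resource_type, dependency_types):
    """Return the required dependency types absent from dependency_types."""
    return REQUIRED.get(resource_type, set()) - set(dependency_types)

print(sorted(missing_dependencies("Print Spooler", ["Physical Disk"])))
# ['Network Name']
print(sorted(missing_dependencies("Time Service", [])))
# []
```

Resource types that are not listed, such as Time Service, are treated as having no required dependencies.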

Time Service

The time service has no unique parameter screens. The time service should not be dependent on anything. It is a Windows NT service that coordinates the synchronization of the date and time on the cluster nodes.

Note: Each cluster needs only one Time Service resource. You do not need, and should not create, a Time Service resource in each group.

Creating a Print Spooler Resource

MSCS supports the clustering of print services using the Print Spooler resource. Multiple Print Spooler resources can exist in a cluster, but there can be no more than one Print Spooler resource per group. Clients access clustered printers in exactly the same way they access non-clustered printers. Clients can use either the network name or IP address as the server name.

When a group containing a Print Spooler resource fails over to another node, the document that is currently being spooled to the printer is restarted from the other node after failover. When you move a Print Spooler resource or take it offline, MSCS waits until all jobs that are spooling to the printers are finished (or until the configured wait time has elapsed). Documents that are spooling from an application to a Print Spooler resource are discarded and must be respooled (or reprinted) to the Print Spooler resource if the group containing the Print Spooler resource fails over before the application has finished spooling.

To create a Print Spooler resource

  1. Run the New Resource wizard to create and configure the Print Spooler resource.

  2. Install the printer ports and printer drivers on each node.

    Each node must have the necessary printer ports and printer drivers installed. Use the local print folder on each node to install all printers (which installs all the printer drivers) and create all the printer ports the cluster spooler will need. (You cannot run the Add Printer wizard from a remote computer when installing printer ports and printer drivers.) All printer ports used by the clustered spooler must have the same name on both nodes.

  3. Add the printer to the clustered spooler.

    After the printer ports and printer drivers have been installed on each node, the printers can be created and administered remotely (by connecting to a cluster over the network and double-clicking the Printers folder). To add a printer, run the Add Printer wizard on the remote computer. For example, click Start, click Run, enter the cluster name (such as "\\ClusterServe"), click OK, double-click the Printers folder, and then double-click the Add Printer wizard and install the printers. If you do not have Administrative privileges on both cluster nodes, you will not see the Add Printer wizard option. You should not share the printers installed on either node. To prevent misuse, you can delete the printers on each node; the printer drivers are not removed and continue to work with the Print Spooler resource.

For more information on setting the Print Spooler properties when running the New Resource wizard, see "Print Spooler," earlier in this chapter.

Setting Properties

After setting up your cluster, you may need to change the configuration of some resources and groups from time to time. To make changes to a selected group or resource, from the File menu, click Properties.

For detailed procedures on setting properties, see Cluster Administrator Help.

Setting Group Properties

Using the group Properties dialog box, you can:

  • View the State of the group (whether it is online or offline) and view which Node currently owns the group.

  • Change the group Name, Description, and Preferred owner.

  • Set the failover policy (using the Failover tab)

  • Set the failback policy (using the Failback tab)

Figure 4.8 shows a group Properties dialog box open to the General tab.

Figure 4.8: The General tab on a group Properties dialog box

Setting Failover Policy

Use the Failover tab to set failover policy for the group. You can set the failover threshold and the failover period. The failover threshold specifies the number of times the group can fail over within the number of hours specified by the failover period. For example, if a group's failover threshold is set to "5" and its failover period to "3," the group can fail over up to five times within a three-hour period. If the group fails again within that period, the clustering software stops attempting to bring the group online and leaves the resources within the group in their current state. For example, if the IP Address resource was brought online but the Network Name resource failed, the group is left offline, but the IP Address resource is left online.
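The threshold-and-period rule can be pictured as a counter over a sliding time window. The following Python sketch is only an illustration of that reading: the function name, the data layout, and the sliding-window interpretation are assumptions, not a description of how MSCS is implemented.

```python
from datetime import datetime, timedelta


def may_fail_over(previous_failovers, now, threshold, period_hours):
    """Return True if the group may fail over once more under the policy.

    previous_failovers: datetimes of earlier failovers of this group.
    threshold: failovers allowed within the period (hypothetical parameter
               mirroring the Threshold setting).
    period_hours: length of the period in hours (mirrors the Period setting).
    """
    window_start = now - timedelta(hours=period_hours)
    # Count only the failovers that fall inside the window (assumed model).
    recent = [t for t in previous_failovers if t > window_start]
    return len(recent) < threshold


now = datetime(1998, 1, 1, 12, 0)
four_recent = [now - timedelta(minutes=30 * i) for i in range(1, 5)]
five_recent = [now - timedelta(minutes=30 * i) for i in range(1, 6)]

print(may_fail_over(four_recent, now, threshold=5, period_hours=3))  # True
print(may_fail_over(five_recent, now, threshold=5, period_hours=3))  # False
```

With a threshold of 5 and a period of 3, the sketch allows a failover after four recent failovers and refuses one after five.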

Setting Failback Policy

By default, groups are set not to fail back. Unless you manually configure your group to fail back after failing over, it continues to run on the alternate node after the failed node comes back online.

When you configure a group to automatically fail back to the preferred node, you specify whether you want the group to fail back as soon as the preferred node is available or to fail back only during specific hours that you define. This option is useful if you want the failback to occur after peak business hours or if you want to make sure the preferred node is able to support the group when it does come back online.

The group must be configured to have a preferred owner to fail back. You can specify a preferred owner on the General tab of the group Properties dialog box.
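The failback conditions described above (a preferred owner, an available preferred node, and an optional window of hours) can be summarized in a small decision function. This is a minimal sketch under stated assumptions: the parameter names are hypothetical, hours are whole numbers from 0 through 23, and the window is treated as starting at the first hour and ending just before the second.

```python
def may_fail_back(preferred_owner, preferred_owner_is_up, failback_enabled,
                  hour_now, window=None):
    """Decide whether a group may fail back right now.

    window: None to fail back as soon as the preferred node is available,
            or a (start_hour, end_hour) tuple restricting failback hours.
    """
    # Groups do not fail back unless configured to, and they need a
    # preferred owner that is back online.
    if not failback_enabled or preferred_owner is None:
        return False
    if not preferred_owner_is_up:
        return False
    if window is None:
        return True
    start, end = window
    if start <= end:
        return start <= hour_now < end
    # The window wraps past midnight, for example 22 to 6.
    return hour_now >= start or hour_now < end


print(may_fail_back("NodeA", True, True, hour_now=14))                  # True
print(may_fail_back("NodeA", True, True, hour_now=14, window=(22, 6)))  # False
print(may_fail_back("NodeA", True, True, hour_now=23, window=(22, 6)))  # True
```

A window such as (22, 6) models failback that is deferred until after peak business hours.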

Setting Resource Properties

Using the resource Properties dialog box, you can:

  • View or change the resource name

  • View or change the resource description and possible owners

  • Specify whether the resource should run in a separate memory space

  • View the resource type, group ownership, resource state, and which node currently owns the resource

  • View preexisting dependencies and modify resource dependencies

  • Specify whether to restart a resource and, if so, what settings to use when restarting the resource

  • Specify how often you want MSCS to check whether the resource is in the online state (by setting the "Looks Alive" and "Is Alive" polling intervals)

  • Specify the amount of time that a resource in a pending state (Online Pending or Offline Pending) has to resolve its status before MSCS puts the resource in Offline or Failed status

  • Set specific resource parameters

Figure 4.9 shows a resource Properties dialog box open to the General tab.

Figure 4.9: A resource Properties dialog box

The General, Dependencies, and Advanced tabs are the same for every resource. However, the Parameters tab is different for each resource type that has a Parameters tab. Some resource types support additional tabs.

Setting Resource Dependencies

MSCS uses the dependencies list when bringing resources online and offline. For example, if a group in which a physical disk and a file share are located is brought online, it is futile to try to bring the file share online before bringing the physical disk (on which the file share is located) online. It is important to remember this when you configure resources because the groups function properly only if these dependencies are correctly configured.

Table 4.3 shows resources and their dependencies. The resources listed in bold type must be configured before you create the resource. The other dependencies (not in boldface type) are not required, but it is highly recommended that you include them as dependencies.

Table 4.3 Resources and resource dependencies

| Resources | Dependencies (Required or Recommended) |
| --- | --- |
| DHCP Server | Physical Disk or other storage class device (on which the files are located); IP Address (for client access to the DHCP Server) |
| Distributed Transaction Coordinator | Physical Disk or other storage class device; Network Name (so that remote clients can access it) |
| File Share | None |
| Generic Service | None |
| IIS Virtual Root | IP Address (that you want to correspond to the virtual root); Physical Disk or other storage class device; Network Name (so that remote clients can access it) |
| IP Address | None |
| Microsoft Message Queue Server | Physical Disk or other storage class device; Network Name (so that remote clients can access it) |
| Network Name | IP Address (that you want to correspond to the name) |
| Physical Disk | None |
| Print Spooler | Physical Disk or other storage class device (on which the spool is located); Network Name (so that remote clients can access it) |
| Time Service | None |

The New Resource wizard enforces the use of resource dependencies: You cannot finish creating a resource without first specifying each resource upon which it is dependent.

IIS Virtual Root

The IIS Virtual Root is dependent on an IP Address resource, Network Name resource, and a Physical Disk resource.

Network Name

The Network Name resource is dependent on an IP Address resource. Clients typically access network resources using a computer name, and that name must correspond to an IP address before clients can reach the resource.

Print Spooler

The Print Spooler resource needs both a Network Name resource and a Physical Disk resource. Because the Print Spooler is meant to be accessed over the network, it needs a point of reference which the Network Name provides. The spool itself should be located on a shared drive. That drive is a required dependency, so that when MSCS brings these resources online, the drive is up when the spooler looks for it.
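Bringing resources online in dependency order amounts to a topological sort of the dependency list. The sketch below illustrates the idea with the Print Spooler dependencies described above; it uses Python's standard graphlib module (Python 3.9 or later) and is not a description of the MSCS implementation. The function names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each resource maps to the set of resources it depends on.
dependencies = {
    "IP Address": set(),
    "Physical Disk": set(),
    "Network Name": {"IP Address"},
    "Print Spooler": {"Physical Disk", "Network Name"},
}


def online_order(deps):
    """Return an order in which every resource follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def offline_order(deps):
    """Take resources offline in the reverse of the online order."""
    return list(reversed(online_order(deps)))


print(online_order(dependencies))
print(offline_order(dependencies))
```

In any valid online order for this example, the Print Spooler comes last, because every other resource is one of its direct or indirect dependencies.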

Setting Advanced Resource Properties

Use the Advanced tab of the resource Properties dialog box to:

  • Specify whether MSCS should attempt to restart a resource or allow the resource to fail.

    If you want MSCS to restart the resource, you must specify whether to affect the group. If you want the group to be failed over when this resource fails, select the Affect the group check box and then enter the values you want in Threshold and Period. These values determine how many times MSCS attempts to restart the resource before failover and the amount of time in which the threshold number of restart attempts take place. If the Affect the group check box is not selected, a failure of this resource never causes the group to fail over.

  • Specify how often MSCS should make a cursory check ("Looks Alive") or in-depth check ("Is Alive") to determine whether the resource is in the online state.

    Click Use resource type value to use the default number for the resource type.

  • Specify the amount of time that a resource in a pending state (Online Pending or Offline Pending) has to resolve its status before MSCS puts the resource in Offline or Failed status.
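The interaction between the restart setting, the Affect the group check box, and the Threshold value can be sketched as a simple decision. The function and parameter names below are hypothetical, and the behavior after the threshold is reached when Affect the group is cleared (the resource is simply left failed) is an assumption made for illustration.

```python
def action_on_failure(restart_enabled, affect_group,
                      restarts_in_period, threshold):
    """Return the action to take when a resource fails.

    restarts_in_period: restart attempts already made within Period.
    threshold: restart attempts allowed before failover is considered.
    """
    if not restart_enabled:
        # The resource is allowed to fail.
        return "leave failed"
    if restarts_in_period < threshold:
        return "restart"
    # Threshold reached: only resources that affect the group
    # cause the group to fail over.
    return "fail over group" if affect_group else "leave failed"


print(action_on_failure(True, True, restarts_in_period=1, threshold=3))
print(action_on_failure(True, True, restarts_in_period=3, threshold=3))
print(action_on_failure(True, False, restarts_in_period=3, threshold=3))
```

The three calls return "restart", "fail over group", and "leave failed" respectively.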

Setting Resource Parameters

Most resources have a resource-specific Parameters tab in their Properties dialog box. Table 4.4 lists each resource and its configurable parameters.

Table 4.4 Resources and configurable parameters

| Resource | Configurable Parameters |
| --- | --- |
| DHCP Server | You can specify the path to the DHCP database. |
| Distributed Transaction Coordinator | None. |
| File Share | You can control which users and how many users have access to the share. You do this by setting permissions on the share and limiting the number of simultaneous users. You can also set the name of the share (because clients will detect it in their browse or explore lists), set the share comment, and set the path to the shared files. |
| Generic Application | You can use the Parameters tab for the Generic Application Properties dialog box to specify the application command line, current directory, whether it should use the Network Name for the computer name, and whether or not it can interact with the desktop. |
| Generic Service | You can use the Parameters tab for the Generic Service Properties dialog box to specify the service name, startup parameters, and whether it should use the Network Name for the computer name. |
| IIS Virtual Root | You can configure which service the virtual root will be for. You do this by clicking FTP, Gopher, or WWW. You can enter the path to the directory, on the disk resource on the shared SCSI bus, in which the files reside. You can also configure the alias, which is the name under which the virtual root appears in the particular service. You can configure the type of permissions users accessing the virtual root have for the WWW and FTP services. |
| IP Address | You can configure the IP address, subnet mask, and the network parameters for the IP Address resource. You must also specify which cluster network to use. |
| MSMQ Server | None. |
| Network Name | You can configure only the computer name parameter for the Network Name resource. |
| Physical Disk | You can specify only which drive to use for the Physical Disk resource. |
| Print Spooler | You can configure the path to the Print Spooler folder and the job completion timeout for the Print Spooler resource. |
| Time Service | You cannot configure any parameters for the Time Service resource. |

Setting Network Properties

After setting up your cluster, you may need to change the configuration of the networks that support client-to-cluster communication and node-to-node communication. To make changes, click a network, click the File menu, and then click Properties.

For detailed procedures on setting network properties, see Cluster Administrator Help.

Using the network Properties dialog box, you can:

  • View or change the network name

  • View or change the network description

  • Enable or disable the network for cluster use. If you enable the network for cluster use, you can specify whether to use the network for only client-to-cluster communication, only node-to-node communication, or both.

  • View the network's state

  • View the network's IP address

  • View the network's address mask

Setting Network Interface Properties

After setting up your cluster, you may need to change the configuration of the network interfaces. To make changes, click a network interface, click the File menu, and then click Properties.

For detailed procedures on setting network interface properties, see Cluster Administrator Help.

Using the network interface Properties dialog box, you can:

  • View the node and network name

  • View or change the network interface description

  • View the network interface's state

  • View the name of the network adapter for the network interface

  • View the network interface's IP address

Setting Network Priorities

MSCS nodes communicate over multiple networks. You can change a network's priority to specify the order in which the nodes will attempt to communicate over the networks. To view or change the network priorities, right-click the cluster name, click Properties, and then click the Network Priority tab.

For information on changing the networks over which MSCS nodes can communicate, see "Setting Network Properties" earlier in this chapter.

Manipulating Nodes, Groups, and Resources

You can use Cluster Administrator to:

  • Pause and resume a node

  • Evict a node

  • Stop and start the Cluster Service

  • Test failover policies

  • Change the online status of groups and resources

  • Delete a group or resource

  • Rename a group or resource

  • Move a resource from one group to another

For more specific information about Cluster Administrator procedures, see Cluster Administrator Help.

Pausing and Resuming a Node

When you pause a node, existing groups and resources stay online, but additional groups and resources cannot be brought online. You pause a node in Cluster Administrator by clicking the node, clicking the File menu, and then clicking Pause Node.

If you have paused a node, you can bring the node back online by clicking the node, clicking the File menu, and then clicking Resume Node.

Evicting a Node

When you evict a node, you prevent the node from participating in the cluster. To evict a node, click the node, click the File menu, and then click Evict Node.

Stopping and Starting the Cluster Service

When you stop the Cluster Service on a node, you prevent clients from accessing cluster resources through that node, and all groups move to the other node (if the failover policies allow it).

You stop the Cluster Service on a node by clicking the node, clicking the File menu, and then clicking Stop Cluster Service.

If you have stopped the Cluster Service on a node, you can restart the service by clicking the node, clicking the File menu, and then clicking Start Cluster Service.

Testing Failover Policies

You can test the failover policies you establish for a single group and its resources by manually failing over those elements.

To test the failover policy for a group, type 0 (zero) in Threshold in the group Properties dialog box. Then, right-click the resource and click Initiate Failure. MSCS immediately fails over the group to the alternate node.

To test dependencies and to make sure that applications will work on both nodes, right-click the group in the Cluster Administrator window, and then click Move Group. MSCS takes all of the resources offline, moves the group, and then attempts to bring the resources back online in the appropriate order.

In a test environment, you can fail over all groups from one node to another by using Cluster Administrator to stop the Cluster Service, pressing the reset button on the computer, or turning off the power on one node.

You initiate failure by clicking Initiate Failure on the File menu.

Moving All Groups on a Node for Server Maintenance

You can take advantage of MSCS when you perform maintenance on a node. Before you take the server offline, move each of its groups to the other node so that the resources remain available. To do this, right-click each group and click Move Group.

Changing the Online Status of Groups and Resources

You can use Cluster Administrator to manually bring individual resources online or take them offline. You change the status by selecting the resources you want and then, on the File menu, clicking either Bring Online, Take Offline, or Initiate Failure. For example, you can take a Print Spooler resource offline so that it stops accepting client jobs. When the spooler finishes the jobs in the queue, you can perform maintenance on the printer.

You can also use Cluster Administrator to manually bring groups online or take them offline. You change the status by selecting the group you want and then, on the File menu, clicking either Bring Online or Take Offline. Changing the state of a group also changes the state of all resources that are members of that group. The resources change to the appropriate state in the order dictated by their dependencies.

For more information on online and offline status, see "Resource States" in Chapter 1, "Microsoft Cluster Server Concepts." For more information on resource dependencies, see "Resource Groups and Dependencies," also in Chapter 1, and also see "Setting Resource Dependencies," earlier in this chapter.

Deleting a Group or Resource

You can use Cluster Administrator to delete individual resources or groups. When you delete a group, all resources in the group are also deleted. When you delete an individual resource, all other resources that depend on that resource are also deleted. You delete a selected group or resource by clicking the File menu, and then clicking Delete.
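Because deleting a resource also deletes every resource that depends on it, it can help to work out the full set before you click Delete. The sketch below computes that set from a dependency map; the function name and the example map are hypothetical and only illustrate the cascade described above.

```python
def resources_deleted_with(target, depends_on):
    """Return the target plus every resource that depends on it,
    directly or through other resources.

    depends_on maps each resource to the set of resources it depends on.
    """
    doomed = {target}
    changed = True
    while changed:
        changed = False
        for resource, deps in depends_on.items():
            # A resource goes too if anything it depends on is going.
            if resource not in doomed and deps & doomed:
                doomed.add(resource)
                changed = True
    return doomed


depends_on = {
    "IP Address": set(),
    "Physical Disk": set(),
    "Network Name": {"IP Address"},
    "Print Spooler": {"Physical Disk", "Network Name"},
}

print(sorted(resources_deleted_with("IP Address", depends_on)))
print(sorted(resources_deleted_with("Print Spooler", depends_on)))
```

In this example, deleting the IP Address resource also removes the Network Name and Print Spooler resources, while deleting the Print Spooler removes nothing else.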

Renaming a Group or Resource

You can rename any resource or resource group. Changing the name of a resource does not affect the failover policy for that resource or the group to which it belongs.

You rename a selected resource or group by clicking Rename on the File menu.

Changing the Group

If you want to move a resource from one group to another, right-click the resource, point to Change Group, and then click the destination group. If the resource is dependent on other resources, or other resources are dependent on it, you are given the option to move all resources as a unit, or to move none of them.

Forcing Node Time Synchronization

The MSCS Time Service resource coordinates the synchronization of the date and time on the cluster nodes.

Note: Each cluster needs only one Time Service resource. You do not need, and should not create, a Time Service resource in each group.

The Time Service resource synchronizes the date and time when the two nodes join to form a cluster. The date and time are then synchronized again within the next hour, 8 hours after that, and every 12 hours thereafter.

If you change the time on node A, node B eventually synchronizes to node A's time if node A owns the Time Service resource in the Cluster Group at the end of the time update interval. However, node A synchronizes to node B's time if node B owns the Time Service resource at the end of the time update interval.

To force node-time synchronization

  1. Change the time on node A.

    Note: This procedure assumes that node A owns the Time Service resource in the Cluster Group. This ensures that the Time Service resource does not automatically adjust the time before you do so manually.

  2. Run the net time command on node B, specifying the computer name of node A, to synchronize the time. For example, if the computer name for node A is \\NodeA, type the following at the Windows NT command prompt, and then press ENTER.

    net time /set /y \\NodeA
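The procedure above depends on which node owns the Time Service resource: the other node is the one that runs net time. The following sketch only builds the command text for each non-owning node, using the same command form shown in step 2; it does not run anything, and the function name and node names are hypothetical.

```python
def net_time_commands(time_service_owner, nodes):
    """Return (node, command) pairs for each node that should pull its
    time from the node owning the Time Service resource."""
    commands = []
    for node in nodes:
        if node == time_service_owner:
            continue  # The owner is the time source; it runs nothing.
        commands.append((node, f"net time /set /y \\\\{time_service_owner}"))
    return commands


for node, command in net_time_commands("NodeA", ["NodeA", "NodeB"]):
    print(f"On {node}, run: {command}")
```

For a two-node cluster in which NodeA owns the Time Service resource, the sketch yields a single command to run on NodeB.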

Administering Clusters from the Command Line

As an alternative to Cluster Administrator, you can use Cluster.exe to administer clusters from the Windows NT command prompt. You can also call Cluster.exe from command scripts to automate many cluster administration tasks.

You can use Cluster.exe to administer clusters from either node, from nodes of other MSCS clusters, or from other computers running Service Pack 3 with version 4.0 of either Windows NT Workstation or Windows NT Server. Cluster.exe is installed on all cluster nodes. It is also installed when you install only Cluster Administrator.

For information on Cluster.exe commands, see Appendix A, "Cluster.exe Commands."
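As an illustration of scripting with Cluster.exe, the sketch below generates one command line per group to move the groups to another node, for example before server maintenance. It only builds text and does not run anything. The command form (cluster name, the group keyword, and a /moveto option) is an assumption that should be checked against Appendix A before use, and the cluster, group, and node names are hypothetical.

```python
def move_group_commands(cluster_name, groups, target_node):
    """Build one Cluster.exe command line per group.

    The option spelling (/moveto) is an assumption; verify it against
    the Cluster.exe command reference before running the output.
    """
    return [
        f'cluster {cluster_name} group "{group}" /moveto:{target_node}'
        for group in groups
    ]


for line in move_group_commands("ClusterServe",
                                ["Cluster Group", "Print Group"],
                                "NodeB"):
    print(line)
```

Writing the generated lines to a command script is one way to automate the task; group names are quoted because they can contain spaces.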

Installing New SCSI Hardware

After your MSCS cluster is up and running, you will probably need to add hardware at some point. For example, you may need to:

  • Install an additional SCSI bus in one of the nodes to support a tape device or SCSI drives for local (unshared) storage

  • Install additional SCSI drives on the existing local SCSI bus on one of the nodes

  • Install additional drives on the shared SCSI bus

  • Replace failed drives on the shared SCSI bus

  • Install an additional shared SCSI bus between the two nodes

Specific procedures should be followed to perform each of these tasks to prevent SCSI conflicts and drive letter reassignments that can affect the operation of either node in the cluster.

For a review of SCSI hardware issues related to using a shared SCSI bus (such as proper transmission methods, the use of signal converters, wide and narrow data-path size, and proper termination) see "SCSI Concepts and Terminology" in Chapter 3, "Setting Up an MSCS Cluster."

Installing Local SCSI Buses and Devices

Installing additional local SCSI buses should not affect the drive-letter assignments of either node. A SCSI bus can be added to a node without turning off both nodes, as long as you do not unterminate the shared SCSI bus.

In the following procedure, node A refers to the node on which you are adding SCSI hardware. Node B refers to the node that continues supporting clients while node A is offline.

To install an additional local SCSI bus and local SCSI devices

  1. Move all groups to node B.

  2. Shut down Windows NT Server, Enterprise Edition on node A and turn off the computer.

  3. Install and configure the SCSI card and devices for the new local SCSI bus, and then install the appropriate SCSI drivers.

  4. Turn on the computer and restart Windows NT Server, Enterprise Edition on node A.

  5. If you installed a new drive, run Disk Administrator and then partition and format the new drive. Assign new drive letters as appropriate.

  6. Move groups from node B to node A, as appropriate for your cluster configuration.

Adding or Replacing Drives on the Shared SCSI Bus

Unless your drive cabinet supports adding or removing drives without turning off the cabinet (called hot swapping or hot power), you must shut down and turn off both nodes before adding or replacing drives on the shared SCSI bus. The following procedure applies to cluster configurations that do not support hot swapping.

To add or replace drives on the shared SCSI bus

  1. Shut down Windows NT Server, Enterprise Edition on both nodes, and turn off both computers.

  2. Add or replace drives on the shared SCSI bus.

  3. Restart Windows NT Server on one node. Turn on the second node and press the SPACEBAR when the OS Loader screen appears (not allowing Windows NT Server to start).

  4. On the first node, run Windows NT Disk Administrator and assign drive letters to the new or replaced disks on the shared SCSI bus.

  5. Shut down the first node, but do not turn off the computer.

  6. On the second node, select the Windows NT Server operating system, and then press ENTER.

  7. Run Windows NT Disk Administrator and assign drive letters to the disks on the shared SCSI bus, using the same letter assignments as on the first node.

  8. Restart the first node.

For instructions on reconstructing a mirror set or a stripe set with parity, see Chapter 5 in the Windows NT Server Resource Kit Resource Guide.

Installing an Additional Shared SCSI Bus

Adding an additional shared SCSI bus should not affect the drive letter assignments of either node.

To install an additional shared SCSI bus between two nodes

  1. Shut down Windows NT Server, Enterprise Edition and turn off one computer.

    Prepare one SCSI controller for the shared SCSI bus, referring to your SCSI bus owner's manual for instructions.

    • Install the SCSI controllers for the shared SCSI bus.

    • Ensure the SCSI controllers for the shared SCSI bus are using different SCSI IDs.

    Note: Do not connect the shared SCSI bus to both computers while configuring the two systems. Install the controllers but do not connect them to the shared SCSI bus.

  2. Connect the shared SCSI devices to the shared bus, and then connect the shared SCSI bus to both nodes.

    Important: After the shared SCSI bus is connected to both nodes, do not run Windows NT Server on both nodes at the same time until you have completed this procedure.

  3. Start Windows NT Server on one node. Turn on the second node, and press the SPACEBAR when the OS Loader screen appears (not allowing Windows NT Server to start).

  4. On the first node, run Windows NT Disk Administrator and assign drive letters to the disks available on the new shared SCSI bus.

  5. Shut down the first node, but do not turn off the computer.

  6. On the second node, select the Windows NT Server operating system, and then press ENTER.

  7. Run Windows NT Disk Administrator and assign drive letters to the disks on the shared SCSI bus, using the same letter assignments as on the first node.

  8. Restart the first node.
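Both shared-bus procedures end with the requirement that the two nodes use the same drive-letter assignments for the shared disks. If you record the assignments made on each node, a comparison like the following sketch can flag differences. The disk labels, the recorded data, and the function name are hypothetical; the sketch does not read anything from Disk Administrator.

```python
def drive_letter_mismatches(first_node, second_node):
    """Compare two {disk: drive_letter} records and return the disks
    whose letters differ or that appear on only one node."""
    mismatches = {}
    for disk in sorted(set(first_node) | set(second_node)):
        first = first_node.get(disk)
        second = second_node.get(disk)
        if first != second:
            mismatches[disk] = (first, second)
    return mismatches


node_a = {"Shared Disk 1": "Q:", "Shared Disk 2": "R:"}
node_b = {"Shared Disk 1": "Q:", "Shared Disk 2": "S:"}

print(drive_letter_mismatches(node_a, node_b))
```

An empty result means the recorded assignments match; here, Shared Disk 2 is reported because it was given R: on one node and S: on the other.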

Managing Security

After MSCS is installed and working properly, you may need to:

  • Change the account under which the Cluster Service runs

  • Change the password of the account under which MSCS runs

  • Specify which users can administer a cluster

  • Limit access to shared data

  • Audit access to shared data

  • Take ownership of files or folders

For a review of MSCS security concepts, see "MSCS and Security" in Chapter 1, "Microsoft Cluster Server Concepts."

Changing the Account Under Which the Cluster Service Runs

The Cluster Service runs under a domain user account, rather than the system account. The following restrictions apply to the account under which the Cluster Service runs:

  • The account must be a domain account; it cannot be a local account.

  • The Cluster Service on each node must run under the same account. (If the two nodes are running under different accounts, the two nodes cannot join.)

  • The account must belong to the Administrators group on both nodes.

    The account must have the following rights:

    • Back up files and directories

    • Increase quotas

    • Increase scheduling priority

    • Load and unload device drivers

    • Lock pages in memory

    • Log on as a service

    • Restore files and directories
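The restrictions above can be treated as a checklist. The sketch below checks a hand-written description of an account against that checklist; it does not query Windows NT, and the data layout, function name, and account values are hypothetical.

```python
REQUIRED_RIGHTS = {
    "Back up files and directories",
    "Increase quotas",
    "Increase scheduling priority",
    "Load and unload device drivers",
    "Lock pages in memory",
    "Log on as a service",
    "Restore files and directories",
}


def account_problems(account, nodes):
    """Return a list of reasons the account fails the checklist."""
    problems = []
    if not account["is_domain_account"]:
        problems.append("account is not a domain account")
    for node in nodes:
        if node not in account["administrator_on"]:
            problems.append(f"not in the Administrators group on {node}")
    for right in sorted(REQUIRED_RIGHTS - account["rights"]):
        problems.append(f"missing right: {right}")
    return problems


candidate = {
    "is_domain_account": True,
    "administrator_on": {"NodeA"},
    "rights": REQUIRED_RIGHTS - {"Lock pages in memory"},
}

for problem in account_problems(candidate, ["NodeA", "NodeB"]):
    print(problem)
```

For the example account, the sketch reports that it is not an administrator on NodeB and that it lacks the Lock pages in memory right.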

Use the following procedure to change the account under which the Cluster Service runs.

Note: The Cluster Service on both nodes must be stopped and restarted during this procedure. The Cluster Service must use the same account and password at all times on all nodes within the cluster.

To change the account under which the Cluster Service runs

  1. Run User Manager for Domains on both nodes, and make sure that the new account under which the Cluster Service will run has all restrictions cleared except the following:

    • User cannot change password

    • Password never expires

  2. Ensure that the account has membership in the local Administrators group on both nodes.

    Grant the following rights to the account, or to the local Administrators group, on both nodes:

    • Log on as a service

    • Lock pages in memory

  3. Run Control Panel on both nodes and double-click Services; then select the Cluster Service, and click Stop.

  4. On one node, click Startup.

  5. Type the account name in This Account, type the password in Password, and then confirm the password.

  6. Click OK.

  7. Repeat steps 4, 5, and 6 on the other node.

  8. In Services in Control Panel on both nodes, click Start to start the Cluster Service.

Changing the Password of the Account Under Which MSCS Runs

Because the Cluster Service runs under the same domain user account on both nodes, you must follow special steps when changing the account password, including stopping the Cluster Service on both nodes.

To change the Cluster Service account password

  1. Run Control Panel on both nodes and double-click Services; then select the Cluster Service, and click Stop.

  2. On the first node, press CTRL+ALT+DEL, and then click Change Password.

  3. Type the old password, the new password, confirm the new password, and then click OK.

  4. In Services in Control Panel on the first node, click Startup.

  5. Type the same new password you used in step 3, confirm the password, and then click OK.

  6. Repeat steps 4 and 5 on the second node. (The account is a domain account, so you change its password only once.)

  7. In Services in Control Panel on both nodes, click Start to start the Cluster Service.

Specifying Which Users Can Administer a Cluster

To administer a cluster, users must have either administrative permissions on both nodes or specific permissions to administer the cluster. By default, the local Administrators group on both nodes has permissions to administer the cluster. You can easily give a user permissions to administer a cluster by running User Manager for Domains and adding the user's account to the local Administrators group on both nodes.

To give a user permissions to administer a cluster without giving the user Administrative permissions on both nodes

  1. Run Cluster Administrator from either node.

  2. Right-click the cluster name, and then click Properties.

  3. Click Permissions.

  4. Specify which users and groups may administer the cluster.

    Access to the cluster can be granted or denied only to the local Administrators group on each node, domain users, and global groups. Do not attempt to grant or deny access to local users or local groups (other than the local Administrators group).

Limiting Access to Shared Data

Use Cluster Administrator and standard Windows NT security to limit access to files and folders that reside on drives on the shared SCSI bus.

When setting access permissions on nodes that are not BDCs or a PDC-BDC pair, do not specify permissions based on local user accounts or groups. On non-domain controllers, local users and local groups have security context only on the local computer; the security context of these accounts and groups is meaningless after a resource fails over from one node to the other. For this reason, the MSCS Permissions dialog box does not allow you to give local users or local groups permissions to administer the cluster. The single exception to this rule is the local Administrators group. This restriction does not apply on domain controllers (DCs), because the local accounts and groups have security context on all DCs in the domain.

Important: If you change share permissions using Windows NT Explorer or My Computer instead of using Permissions on the Parameters tab in Cluster Administrator, the permissions are lost when the resource is taken offline.

For more information on using Cluster Administrator to limit access to files and folders on disks that are on the shared SCSI bus, see Cluster Administrator Help.

Auditing Access to Shared Data

Use standard Windows NT security to audit access to files and folders that reside on drives on the shared SCSI bus. Use either My Computer or Windows NT Explorer to set file or directory auditing; Cluster Administrator does not provide a user interface to do this. Note that security events are written to the security log on the node that owns the File Share resource.

For more information on enabling auditing, or specifying file and directory auditing, see Windows NT Server Version 4.0 Concepts and Planning.

Taking Ownership of Files or Folders

Use either My Computer or Windows NT Explorer to take ownership of files or folders; Cluster Administrator does not provide a user interface to do this.

For more information on taking ownership of files and folders, see Windows NT Server Version 4.0 Concepts and Planning.

Recovering Disk-Subsystem Failures

Different types of disk failures are possible in a fault-tolerant-disk-set configuration. The most common type of failure involves a single disk, which is acceptable in most cases. Immediately after failure, a message appears on the Windows NT Server console of the node owning the resource. MSCS continues to operate in this case.

An unacceptable failure of a fault-tolerant resource is one that makes the data unavailable. MSCS treats the failure of a fault-tolerant resource as a standard resource failure and attempts to fail over. If the failed resource is the quorum resource, the other node cannot bring it online either, and the quorum resource enters an inconsistent state. Dealing with both of these problems is relatively straightforward.

The following sections discuss the basics of recovery. For complete instructions on these topics, see Chapter 5 in the Windows NT Server 4.0 Resource Kit Resource Guide.

For specific instructions on dealing with a failed quorum resource, see "Quorum Resource Fails" in Chapter 5, "Troubleshooting."

Mirrored-Drive Failures

When a drive of a mirror fails, it becomes an orphaned drive. Before replacing the drive, you must first use Disk Administrator to break the mirror-set relationship, which exposes the remaining secondary partition as a separate volume. This step prevents problems when restarting the system.

When you break the mirror set, the remaining working member of the mirror set receives the drive letter that was previously assigned to the complete mirror set. The orphaned partition receives the next available drive letter.

Replace the failed disk with any disk that is the same size or larger.

Disk Failure in a Stripe Set with Parity

When a member of a stripe set with parity is orphaned, you can reconstruct the data for the orphaned member from the remaining members. To determine which disk failed, look at the disks in Disk Administrator. Disk Administrator normally shows a failed disk as offline.

Bus Failures

Except in configurations that use duplexing or greater redundancy, a failure of the shared SCSI bus or controller results in all disk resources on the shared SCSI bus entering an inconsistent state.

Make sure that you replace the failed controller with the same make and model as the original. Ensure both controllers use the same firmware revisions. When the computer is restarted, the disk set should function normally.