Replacing a Head Node Configured in a Failover Cluster in Windows HPC Server 2008 R2

Updated: February 2011

Applies To: Windows HPC Server 2008 R2

This topic describes how to replace a head node that is configured in a failover cluster supporting the head node of a Windows® HPC Server 2008 R2 cluster. This replacement process applies mainly to the scenario in which one head node fails, but you might also need to replace a functioning head node in the failover cluster, for example, to perform a hardware upgrade. This topic applies only to a two-node failover cluster for the head node that has been configured according to the procedures in Configuring Windows HPC Server 2008 R2 for High Availability of the Head Node (https://go.microsoft.com/fwlink/?LinkId=194786) or Configuring Windows HPC Server 2008 R2 for High Availability with SOA Applications (https://go.microsoft.com/fwlink/?LinkId=198300).

In this topic:

  • Step 1: Evict the failed server from the failover cluster

  • Step 2: Prepare the server that you will use to replace the failed server

  • Step 3: Add the new server to the failover cluster

  • Step 4: Install Microsoft HPC Pack 2008 R2 on the new server

  • Step 5: Reconfigure services and network topology for the new server

Step 1: Evict the failed server from the failover cluster

Before you begin the replacement process, verify that head node services have failed over and are running on the second (functioning) head node in the failover cluster. Then, evict (remove) the failed server from the failover cluster.

Important
If you are replacing a head node that is still functioning, do not uninstall HPC Pack 2008 R2 from that node before you evict it. Doing so might affect the functionality of the other head node in the failover cluster.

To verify that head node services are running on the functioning head node

  1. Log on to the functioning server in the two-node failover cluster using a domain account that has administrator permissions on both nodes in the failover cluster.

  2. To open the Failover Cluster Manager snap-in, click Start, click Administrative Tools, and then click Failover Cluster Manager. (If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Yes.)

  3. In the Failover Cluster Manager snap-in, if the cluster you want to manage is not displayed, in the console tree, right-click Failover Cluster Manager, click Manage a Cluster, and then select or specify the cluster that you want.

  4. Expand Services and Applications, and then select the clustered instance of the head node.

  5. In the center pane, verify that Current Owner is the currently functioning node in the failover cluster.

  6. Verify that the clustered instance of the head node has a status of Online.
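The two checks above (owner and status) can also be made from the command line. The following Python function is a minimal sketch and is not part of HPC Pack: it assumes you have captured CSV text on the functioning node with a command such as Get-ClusterGroup | Select-Object Name,OwnerNode,State | ConvertTo-Csv -NoTypeInformation (FailoverClusters module), that OwnerNode is rendered as the node name, and that you know the name of the clustered head node instance. The group and node names used with it are hypothetical.

```python
import csv
import io


def head_node_ready(csv_text: str, group_name: str, functioning_node: str) -> bool:
    """Return True if the clustered head node instance is Online and owned
    by the functioning node.

    csv_text is assumed to come from:
      Get-ClusterGroup | Select-Object Name,OwnerNode,State | ConvertTo-Csv -NoTypeInformation
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Name"].strip().lower() == group_name.lower():
            # Computer names are case-insensitive on Windows.
            owner_ok = row["OwnerNode"].strip().lower() == functioning_node.lower()
            online = row["State"].strip() == "Online"
            return owner_ok and online
    # The clustered instance of the head node was not found at all.
    return False
```

For example, head_node_ready(csv_text, "HPCHN", "HN2") returns True only if a group named HPCHN is Online and currently owned by HN2.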

To evict the failed server from the failover cluster

  1. In the Failover Cluster Manager snap-in, if the cluster you want to manage is not displayed, in the console tree, right-click Failover Cluster Manager, click Manage a Cluster, and then select or specify the cluster that you want.

  2. If the console tree is collapsed, expand the tree under the cluster you want to manage.

  3. Expand the console tree under Nodes.

  4. Right-click the failed server, and then click More Actions.

  5. Click Evict, and when prompted, confirm your action.

  6. If the server that you evicted did not previously fail completely, to confirm that all failover cluster configuration information has been removed from the evicted node, perform the following steps:

    1. On the evicted node (only), click Start, click Administrative Tools, and then click Windows PowerShell Modules.

    2. At the Windows PowerShell command prompt, type:

      Clear-ClusterNode
      
    3. When prompted, confirm your action.

    For more information about Clear-ClusterNode, type Get-Help Clear-ClusterNode -Full, or see Clear-ClusterNode (https://go.microsoft.com/fwlink/?LinkId=143781).
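The eviction procedure runs commands on two different computers, which is easy to mix up. The sketch below only records which command belongs on which host; it is an illustration, not an HPC Pack tool. It assumes the FailoverClusters module cmdlets Remove-ClusterNode (the command-line counterpart of Evict) and Clear-ClusterNode, and the node names in the test are hypothetical.

```python
from typing import List, Tuple

MODULE = "Import-Module FailoverClusters"


def eviction_plan(failed_node: str, functioning_node: str,
                  failed_completely: bool) -> List[Tuple[str, str]]:
    """Return (host, PowerShell command) pairs in the order they should run."""
    # Evict the failed node; this runs on the functioning node.
    plan = [(functioning_node, f"{MODULE}; Remove-ClusterNode {failed_node}")]
    # Only a node that did not fail completely needs its leftover
    # cluster configuration cleared, and only on that node itself.
    if not failed_completely:
        plan.append((failed_node, f"{MODULE}; Clear-ClusterNode"))
    return plan
```

Both cmdlets prompt for confirmation unless told otherwise, which matches the "confirm your action" steps above.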

Step 2: Prepare the server that you will use to replace the failed server

In this procedure, you prepare the server that you will use to replace the failed server in the failover cluster.

Important
Before you prepare the replacement server, ensure that the failed server is shut down.

To prepare the server that you will use to replace the failed server

  1. If possible, use a server with hardware that is similar to the server that is currently functioning as the head node (within the failover cluster).

  2. On the new server, install the edition of the Windows Server 2008 R2 operating system (either Windows Server 2008 R2 Enterprise or Windows Server 2008 R2 Datacenter) that is installed on the server that is currently functioning as the head node.

  3. Apply the same software updates (and service packs, if relevant) that are applied to the server that is currently functioning as the head node.

    Note
    In a later step, you will run the Validate a Configuration Wizard from Failover Cluster Manager to confirm that the correct software updates (and service packs, if relevant) have been applied to the new server. You can correct any mismatches in software updates then.
  4. Install the File Services role by following the procedure in Set Up Failover Clustering and File Services for Servers that Will Run the Head Node (https://go.microsoft.com/fwlink/?LinkId=210568).

  5. Install the Failover Clustering feature by following the procedure in Set Up Failover Clustering and File Services for Servers that Will Run the Head Node (https://go.microsoft.com/fwlink/?LinkId=210568).

  6. Configure the network connections on all interfaces on the new server to match those on the failed server. For more information, see Prepare Hardware Before Validating a Failover Cluster (https://go.microsoft.com/fwlink/?LinkId=190316).

  7. Ensure that the disks (LUNs) that are used for storage in the failover cluster are exposed to the replacement server.

    Note
    You should use the same type of interface (for example, iSCSI, or an interface provided by the manufacturer of the storage) that is used on the currently functioning node in the failover cluster.
  8. Join the new server to the domain in which Windows HPC Server 2008 R2 is running, and give it the same name as the server that failed.

  9. Configure the local Administrators group on the new server so that it contains only the designated security group for HPC administrators (and, if desired, the local Administrator account). For more information about configuring the local Administrators group on the servers in the failover cluster, see “Prepare to run HPC Pack 2008 R2 Setup on a server in the failover cluster” in Install HPC Pack 2008 R2 on a Server that Will Run Head Node Services (https://go.microsoft.com/fwlink/?LinkId=210570).

  10. Log on to the currently functioning server in the failover cluster with an account that has administrator rights and permissions on the servers in the failover cluster.

    Important
    The account does not need to be a Domain Admins account; it can be a Domain Users account that is in a group that belongs to the Administrators group on the servers.
  11. On the currently functioning server in the failover cluster, in Failover Cluster Manager, in the console tree, select Failover Cluster Manager (do not select the failover cluster), and then under Management, click Validate a Configuration.

  12. Follow the instructions in the wizard to specify the servers (the currently functioning server and the new server) and to run all tests.

    Important
    If you have clustered disks that are online, those disks cannot be tested. We recommend that you choose, in the wizard, to take the clustered disks offline so that they can be tested. If you do this, the disks are brought online automatically after the tests. However, taking the clustered disks offline makes the HPC head node services unavailable for the duration of the tests.

    Running all tests may take several minutes. The storage tests are extensive, but they are important because they confirm that the storage configuration is correct.

  13. The Summary page appears after the tests run. To view Help topics that can help you interpret the results, click More about cluster validation tests.

  14. While still on the Summary page, click View Report and read the test results.

    Note
    To view the results of the tests after you close the wizard, see SystemRoot\Cluster\Reports\Validation Report DateTime.mht (where SystemRoot is the folder in which the operating system is installed, for example, C:\Windows).
  15. If necessary, make changes in the configuration of the new server, and then rerun the tests.

    For more information about the tests in the Validate a Configuration Wizard, see Understanding Cluster Validation Tests (https://go.microsoft.com/fwlink/?LinkId=201706).
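Several of the preparation steps above (same edition, same updates, same name and domain) can be pre-checked before you run the Validate a Configuration Wizard. The sketch below is a hypothetical pre-check and does not replace the wizard: it assumes you have already collected each server's name, domain, operating system edition, and installed update IDs (for example, from systeminfo or Get-HotFix output) into simple records. The record type and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ServerInventory:
    """Hypothetical record of the facts that must match between servers."""
    name: str
    domain: str
    os_edition: str  # for example "Windows Server 2008 R2 Enterprise"
    updates: Set[str] = field(default_factory=set)  # installed update IDs


def preparation_mismatches(functioning: ServerInventory,
                           new: ServerInventory,
                           failed_name: str) -> List[str]:
    """List differences that should be resolved on the new server."""
    problems: List[str] = []
    # The replacement must reuse the failed server's computer name.
    if new.name.lower() != failed_name.lower():
        problems.append(f"name {new.name} does not match failed server {failed_name}")
    # Both nodes must be members of the same domain.
    if new.domain.lower() != functioning.domain.lower():
        problems.append(f"domain {new.domain} differs from {functioning.domain}")
    # Same edition (Enterprise or Datacenter) as the functioning head node.
    if new.os_edition != functioning.os_edition:
        problems.append(f"edition {new.os_edition} differs from {functioning.os_edition}")
    # Same set of software updates, compared in both directions.
    missing = sorted(functioning.updates - new.updates)
    extra = sorted(new.updates - functioning.updates)
    if missing:
        problems.append("missing updates: " + ", ".join(missing))
    if extra:
        problems.append("extra updates: " + ", ".join(extra))
    return problems
```

An empty list means only that these particular facts match; the storage, network, and other tests in the wizard still need to pass.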

Step 3: Add the new server to the failover cluster

In this procedure, you add the new server to the failover cluster.

To add the new server to the failover cluster

  1. In the Failover Cluster Manager snap-in, if the cluster you want to manage is not displayed, in the console tree, right-click Failover Cluster Manager, click Manage a Cluster, and then select or specify the cluster that you want.

  2. In the Actions pane, click Add Node.

  3. Follow the instructions in the wizard to specify the new server to add to the failover cluster.

  4. After the wizard runs and the Summary page appears, if you want to view a report of the tasks the wizard performed, click View Report and read the report.

    Note
    To view the report after you close the wizard, see: SystemRoot\Cluster\Reports\AddNodesDateTime.mht where SystemRoot is the location of the operating system (for example, C:\Windows).
  5. To confirm that the new server was added to the failover cluster successfully, in the Failover Cluster Manager snap-in, do the following:

    • In the console tree, select the name of the failover cluster.

    • Expand the console tree under Nodes, and verify that the name of the replacement server appears.

    • Verify that the status of the replacement server is Up.

    Note
    If there is a problem adding the new server to the failover cluster, some configuration information might have been left behind when you evicted the failed node. To correct this, on the replacement server you can run the Clear-ClusterNode PowerShell cmdlet, as described in Step 1: Evict the failed server from the failover cluster, earlier in this topic. Then, try adding the node to the failover cluster again.
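Both the Validate a Configuration Wizard and the Add Node Wizard leave .mht reports under SystemRoot\Cluster\Reports. Because the DateTime part of the file name varies, a small helper that picks the newest report by prefix can save searching. This is a sketch under assumptions: it matches files by the "Validation Report" and "AddNodes" prefixes given above and uses file modification time rather than parsing the DateTime format, which this topic does not specify.

```python
import os
from pathlib import Path
from typing import Optional


def latest_report(prefix: str, system_root: Optional[str] = None) -> Optional[Path]:
    """Return the newest SystemRoot\\Cluster\\Reports\\<prefix>*.mht file, or None."""
    root = system_root or os.environ.get("SystemRoot", r"C:\Windows")
    reports = Path(root) / "Cluster" / "Reports"
    # glob() yields nothing if the Reports folder does not exist.
    candidates = [p for p in reports.glob(prefix + "*.mht") if p.is_file()]
    if not candidates:
        return None
    # Newest by modification time; the DateTime format in the name is not relied on.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

For example, latest_report("AddNodes") or latest_report("Validation Report") on the node where the wizard ran.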

Step 4: Install Microsoft HPC Pack 2008 R2 on the new server

In this procedure, you install HPC Pack 2008 R2 on the new server that you added to the failover cluster.

Important
  • If you previously applied a service pack to HPC Pack 2008 R2 on the servers in the failover cluster, you must ensure that you use HPC Pack 2008 R2 Setup files that install both HPC Pack 2008 R2 and the appropriate service pack. You cannot install the release to manufacturing (RTM) version of HPC Pack 2008 R2 on the new server and then apply the service pack. If you do this, the installation of HPC Pack 2008 R2 on the new server in the failover cluster can fail.

  • If you need to install HPC Pack 2008 R2 with service pack 1 (SP1) on the new server, be aware that an installation program for HPC Pack 2008 R2 with SP1 is not available from Microsoft. An installation program is available only for the service pack itself. You must create your own Setup files for HPC Pack 2008 R2 with SP1 by manually merging the Setup files for HPC Pack 2008 R2 and the Setup files for SP1, as described in the following procedure.

To create Setup files for HPC Pack 2008 R2 with SP1 (optional)

  1. On a computer that is different from the servers in the failover cluster, install Windows Server 2008 R2 HPC Edition, or another edition of Windows Server 2008 R2.

  2. Run setup.exe from the HPC Pack 2008 R2 RTM installation media or from a network location that contains the HPC Pack 2008 R2 RTM Setup files.

  3. On the Getting Started page, click Next.

  4. On the Select Installation Edition page, select the edition of HPC Pack 2008 R2 that corresponds to the edition that is currently installed on the HPC Pack 2008 R2 SP1 head node, and then click Next.

  5. On the Microsoft Software License Terms page, read or print the software license terms in the license agreement, and accept or reject the terms of that agreement. If you accept the terms, click Next.

  6. On the Select Installation Type page, click Create a new HPC cluster by creating a head node, and then click Next.

  7. Continue to follow the steps in the installation wizard.

  8. Apply HPC Pack 2008 R2 SP1 on the computer. For more information, see Release Notes for Microsoft HPC Pack 2008 R2 Service Pack 1.

  9. Copy and merge the installation files for HPC Pack 2008 R2 and the service pack in a folder on the computer:

    1. Create a folder on the computer with a name such as C:\HPC2008R2SP1.

    2. Copy the installation files from the installation media or shared folder for HPC Pack 2008 R2 to C:\HPC2008R2SP1.

    3. Copy all files under \\localhost\REMINST to C:\HPC2008R2SP1. Select to replace folders and files that have the same name.
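The merge in step 9 is an overlay: copy the RTM files first, then copy the service pack files on top so that files with the same name are replaced. The Python sketch below shows that ordering under assumptions: Python 3.8 or later (for dirs_exist_ok), and the source and destination paths are placeholders you supply, such as the RTM media, \\localhost\REMINST, and C:\HPC2008R2SP1.

```python
import shutil
from pathlib import Path


def merge_setup_files(rtm_source: str, sp_source: str, destination: str) -> Path:
    """Copy RTM Setup files, then overlay the service pack files on top."""
    dest = Path(destination)
    # 1. RTM files first (creates the destination folder if needed).
    shutil.copytree(rtm_source, dest, dirs_exist_ok=True)
    # 2. Service pack files second; same-name files and folders are overwritten,
    #    which matches "replace folders and files that have the same name".
    shutil.copytree(sp_source, dest, dirs_exist_ok=True)
    return dest
```

Reversing the two copies would leave RTM versions of the updated files in the merged folder, so the order matters.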

To install Microsoft HPC Pack 2008 R2

  1. Using the appropriate HPC Pack 2008 R2 Setup files, run HPC Pack 2008 R2 Setup on the new server in the failover cluster by using the following procedure:

    Install and Configure HPC Pack 2008 R2 on the Other Server that Will Run Head Node Services (https://go.microsoft.com/fwlink/?LinkId=201572)

  2. To verify that the new head node is functioning in the Windows HPC Server 2008 R2 cluster, do the following:

    • On either server, click Start, point to All Programs, click Microsoft HPC Pack 2008 R2, and then click HPC Cluster Manager.

    • In Node Management, in the Navigation Pane, click Nodes.

    • In the view pane, check that the node health of the new node is OK.

  3. To confirm that the services and shared folders that are used by Windows HPC Server 2008 R2 can fail over successfully to the other server in the failover cluster, see the procedures in Validate Installation of Head Node Services in the Context of the Failover Cluster (https://go.microsoft.com/fwlink/?LinkID=201573).

Step 5: Reconfigure services and network topology for the new server

In this step, you reconfigure two services on the new server (the DHCP Server service and the Windows Deployment Services Server service), and you reset the network topology so that the new server functions correctly with the existing server.

Important
  • Before you perform these procedures, make sure that both nodes in the failover cluster have a status of Up. To check this in Failover Cluster Manager, select the name of your failover cluster, expand the console tree, and then click Nodes.

  • You do not need to perform these procedures if your HPC cluster is configured to use network topology 5.

To reconfigure services for the new server

  1. On the new server, click Start, click Administrative Tools, and then click Services. (If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Yes.)

  2. In the list of services, right-click DHCP Server, click Properties, and then do one of the following to start (or restart) the service:

    • If the service is stopped, click Start.

    • If the service is running, click Stop and then click Start.

  3. In the list of services, right-click Windows Deployment Services Server, click Properties, and in the list for Startup type, click Automatic. Click Start, and then click OK.
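The Services snap-in steps above have a command-line equivalent. The sketch below only builds the command lines; it assumes the standard service short names DHCPServer and WDSServer, and that you determine whether each service is already running yourself (for example, from sc.exe query output).

```python
from typing import List

# Service short names on Windows Server 2008 R2.
DHCP_SERVICE = "DHCPServer"  # DHCP Server
WDS_SERVICE = "WDSServer"    # Windows Deployment Services Server


def service_commands(dhcp_running: bool, wds_running: bool) -> List[List[str]]:
    """Return command lines, in order, that mirror the Services snap-in steps."""
    commands: List[List[str]] = []
    # DHCP Server: restart it if it is running, otherwise just start it.
    if dhcp_running:
        commands.append(["net", "stop", DHCP_SERVICE])
    commands.append(["net", "start", DHCP_SERVICE])
    # WDS Server: set Startup type to Automatic ("start=" is its own token for sc.exe).
    commands.append(["sc.exe", "config", WDS_SERVICE, "start=", "auto"])
    # Start it unless it is already running (net start fails on a running service).
    if not wds_running:
        commands.append(["net", "start", WDS_SERVICE])
    return commands
```

On the new server, each entry could be passed to subprocess.run(cmd, check=True) from an elevated prompt. net stop and net start wait for the service to change state, which keeps the restart ordered.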

To reset the network topology for the new server

  1. On either server, click Start, point to All Programs, click Microsoft HPC Pack 2008 R2, and then click HPC Cluster Manager.

  2. In the Deployment To-do List, click Configure your network. The Network Configuration Wizard appears.

  3. On the Network Topology Selection page, note which option is currently selected (you will restore it later), and then click 5. All nodes only on an enterprise network. Then click Next.

  4. Continue to follow the Network Configuration Wizard to configure the network in topology 5.

  5. After the Network Configuration Wizard has finished configuring the network in topology 5, wait several minutes so that the change is propagated to both servers in the failover cluster.

    Note
    When the configuration change is complete, an entry similar to the following appears in the operations log: Updating the configuration of <DomainName>\<SecondHeadNodeName>
  6. After both servers in the failover cluster are updated, start the Network Configuration Wizard again.

  7. In the Network Configuration Wizard, select the option that was originally configured for your network, and then click Next.

  8. Continue to follow the Network Configuration Wizard to configure the cluster network as it was originally configured.
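The topology reset is a three-phase sequence: switch to topology 5, wait until both servers in the failover cluster have picked up the change, then switch back to the original topology. The Python sketch below models only that control flow. apply_topology and is_updated are hypothetical callables (for example, is_updated might check the operations log for the "Updating the configuration of" entry for a server); HPC Pack does not provide them, and the timeout and polling values are arbitrary.

```python
import time
from typing import Callable, Iterable


def wait_for_propagation(servers: Iterable[str], is_updated: Callable[[str], bool],
                         timeout_s: float = 600.0, poll_s: float = 30.0,
                         clock=time.monotonic, sleep=time.sleep) -> bool:
    """Poll until every server reports the change, or give up at the deadline."""
    pending = set(servers)
    deadline = clock() + timeout_s
    while True:
        # Keep only the servers that have not picked up the change yet.
        pending = {s for s in pending if not is_updated(s)}
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def reset_topology(original: int, servers: Iterable[str],
                   apply_topology: Callable[[int], None],
                   is_updated: Callable[[str], bool], **wait_kwargs) -> None:
    """Switch to topology 5, wait for both servers, then restore the original."""
    apply_topology(5)
    if not wait_for_propagation(servers, is_updated, **wait_kwargs):
        raise TimeoutError("topology 5 did not propagate to all servers")
    apply_topology(original)
```

Restoring the original topology before both servers are updated is the failure this ordering guards against.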