Windows 2000 Clustering: Performing a Rolling Upgrade

This paper discusses the benefits and mechanics of performing a rolling upgrade of clusters running the Microsoft Windows NT 4.0 Server, Enterprise Edition operating system to the Windows 2000 Advanced Server operating system. This document also provides a step-by-step walkthrough for a rolling upgrade of a two-node cluster, outlines some known issues, and provides some strategies for troubleshooting.

On This Page

Introduction
Benefits
Requirements
Limitations
How Rolling Upgrades Work
Rolling Upgrade Walkthrough
Known Issues
Troubleshooting
Conclusion

Introduction

One of the most exciting new features introduced in Service Pack 4 (SP4) for the Microsoft Windows NT 4.0 Server, Enterprise Edition operating system is the ability to perform a rolling upgrade of the operating system. A rolling upgrade is the process of upgrading cluster nodes one at a time, so that the services and resources offered by the cluster remain available even while individual nodes are offline for the upgrade. The system downtime associated with the upgrade is reduced to a few minutes (the time needed to move resources from one node to another), compared with the few hours that are usually needed to upgrade a Windows NT-based server.

Beginning with SP4, administrators have a choice of performing a rolling upgrade to move the system to a new service pack or to the Windows 2000 Server operating system. Cluster service (or Microsoft Cluster Server, as it is known in the Windows NT 4.0 context) is an optional component of Windows NT 4.0 Server, Enterprise Edition, Windows 2000 Advanced Server, and Windows 2000 Datacenter Server that implements server clustering. Cluster service has been redesigned to allow nodes running versions of Windows NT 4.0 or Windows 2000 to work together seamlessly in a server cluster. This means that a Windows 2000-based node can successfully join a Windows NT 4.0-based cluster, and a Windows NT 4.0-based node can successfully join a Windows 2000-based cluster.

A mixed-mode cluster, that is, a cluster composed of nodes running different versions of the Windows NT or Windows 2000 operating system, offers the same level of availability as a homogeneous cluster. Resources that support rolling upgrades can be moved, and can fail over and fail back, between nodes in a mixed-mode cluster.

The same procedure can be used to do a rolling upgrade of system hardware or applications if they support it. However, this paper focuses only on the rolling upgrade of the operating system. If you are interested in doing a rolling upgrade of an application, inquire with the application vendor and request that they implement support for rolling upgrades.

This paper assumes that the reader has a basic knowledge of clustering and is familiar with Microsoft Cluster Server (MSCS) in Windows NT 4.0 and Cluster service in Windows 2000 (Cluster service replaces MSCS in Windows 2000). For more information about these components, consult the resources available in Windows 2000 Advanced Server Online Help at https://windows.microsoft.com/windows2000/en/advanced/help/ or in the Windows 2000 Resource Kit.

Terminology

The node running an older version of the operating system, application, or hardware is called a down-level node.

The node running a newer version of the operating system, application, or hardware is called an up-level node.

The cluster composed of nodes running different revisions of the operating system, application, or hardware, is called a mixed-mode cluster.

Benefits

A rolling upgrade offers multiple benefits and should be considered the preferred option for any mission-critical system that requires high availability.

Consider two examples: Jack and Jill. Jack runs a mission-critical database application on a stand-alone Windows 2000-based server. He applies Windows 2000 service packs every quarter, upgrades his application once a year, and performs a maintenance hardware upgrade once a year as well. From past experience, he can tell that a service pack installation takes an average of 60 minutes. An application upgrade generally takes four hours. Hardware upgrades take anywhere from 30 minutes to four hours. He can also testify that once a year, something goes wrong and an upgrade takes four times as long as planned. Total downtime per year of this system averages:

4 x 60 + 240 + 240 + 240 = 960 minutes = 16 hours, which equals 99.8 percent uptime.

Jill runs a mission-critical e-mail application on a two-node cluster. She applies the same rules as Jack, but instead of upgrading both nodes at once, she performs rolling upgrades. She measured that it takes five minutes on average to move this application from one node to another; thus, the downtime associated with each upgrade is limited to five minutes. Total downtime per year for her system averages:

4 x 5 + 5 + 5 = 30 minutes, which equals 99.99 percent uptime.
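
The arithmetic in both examples can be checked with a short Python sketch. The function and variable names are illustrative only, and the calculation assumes a 365-day year; the downtime figures are the ones given for Jack and Jill above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a non-leap year


def uptime_percent(downtime_minutes: float) -> float:
    """Return yearly uptime as a percentage for a given yearly downtime."""
    return 100.0 * (1 - downtime_minutes / MINUTES_PER_YEAR)


# Jack: four 60-minute service packs plus the three 240-minute terms
# from the formula above.
jack_downtime = 4 * 60 + 240 + 240 + 240

# Jill: each upgrade costs only the five-minute move between nodes.
jill_downtime = 4 * 5 + 5 + 5

print(f"Jack: {jack_downtime / 60:.0f} hours down, "
      f"{uptime_percent(jack_downtime):.1f}% uptime")
print(f"Jill: {jill_downtime} minutes down, "
      f"{uptime_percent(jill_downtime):.2f}% uptime")
```

Run as written, this should reproduce the 16 hours (99.8 percent) and 30 minutes (99.99 percent) quoted above.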

Rolling upgrades are advantageous in that they:

  • Minimize downtime. Rolling upgrades minimize downtime associated with software or hardware upgrades.

  • Minimize risk. Rolling upgrades minimize the risk of losing the service in case the upgrade fails. When an upgrade of one node fails, the other node can still provide the service, giving the system administrator the choice to repair or replace a failed node without incurring any additional downtime.

  • Increase flexibility. The nearly negligible system downtime caused by a rolling upgrade means that administrators could decide to perform a rolling upgrade during a working day instead of performing it late at night or on weekends.

Requirements

  • To perform a rolling upgrade of the operating system on cluster nodes, you must start with Windows NT 4.0 Server, Enterprise Edition SP4. Earlier versions do not support rolling upgrades.

  • A rolling upgrade can be used to upgrade Windows NT 4.0 Server, Enterprise Edition SP4 to SP5, SP5 to SP6, and so forth. It can also be used to upgrade any Enterprise Edition service pack later than SP3 to Windows 2000 Advanced Server.

  • Upgrades to Windows 2000 Datacenter Server are not supported. Windows 2000 Datacenter Server can be installed in a clean install process only.

Limitations

A rolling upgrade is not as disruptive as a regular upgrade, but you should bear in mind that it requires the applications to be moved between nodes and therefore does cause some minimal disruption in services. When an application is moved from one node to another, it must be stopped. Once the application is stopped, Cluster service moves all the resources the application uses, such as disks, Internet Protocol (IP) addresses, or network names, to the other node, and restarts the application there. Any sessions between clients and the server application are cancelled during this process; database transactions are aborted and file handles are invalidated. Client applications can retry and eventually reconnect and recover once the server application is restarted on the second node. While the impact may be minimal, it should not be ignored.

The availability of a two-node cluster while one node is being upgraded is limited. Any failure of the second node while the other node is being upgraded will cause the cluster to fail.

During a rolling upgrade, all resource groups are moved to one node. In a cluster that is running in an active/active mode where resources are distributed among all nodes in the cluster, this means that the node hosting all resources can run at maximum capacity while the other node is being upgraded. This can affect the application response time.
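
A rough way to reason about this before the upgrade is to add up the load of both nodes and compare it with what one node can carry. The following Python sketch is only an illustration: the load figures are hypothetical, and it assumes that loads add linearly and that both nodes have the same capacity, which may not hold for a real workload.

```python
def combined_load(node_loads: dict[str, float]) -> float:
    """Load the remaining node carries once every group is moved onto it."""
    return sum(node_loads.values())


def can_host_everything(node_loads: dict[str, float],
                        capacity: float = 100.0) -> bool:
    """True if one node can carry all groups without exceeding capacity."""
    return combined_load(node_loads) <= capacity


# Hypothetical average CPU load (percent) per node in an active/active cluster.
loads = {"Node 1": 35.0, "Node 2": 45.0}
print(combined_load(loads))
print(can_host_everything(loads))
```

If the combined figure approaches the capacity of a single node, expect slower application response while the other node is being upgraded.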

Cluster service guarantees that an up-level node can always join a cluster formed by a down-level node. It also guarantees that resources can fail over and fail back to and from a down-level node. However, it does not guarantee that applications can also support rolling upgrades of the operating system. Table 1 below summarizes the behavior of cluster-aware resources supported by Cluster service.

Table 1 Resources supported/not supported during rolling upgrades.

Resource | Note
File Share | Supported during rolling upgrades.
IP Address | Supported during rolling upgrades.
Network Name | Supported during rolling upgrades.
Physical Disk | Supported during rolling upgrades.
Time Service | Supported during rolling upgrades.
Distributed Transaction Coordinator | Distributed Transaction Coordinator (DTC) is a part of Component Services that coordinates two-phase transactions. During the rolling upgrade, DTC will be unavailable while the first node is being upgraded. After that, failover to the second node will not be possible until that node has been upgraded.
Internet Information Services | Internet Information Services (IIS) version 4 is supported during rolling upgrades. IIS version 3, and the IIS Virtual Root resource type (used with IIS version 3 but not version 4) are not supported during rolling upgrades. However, the configuration information for an IIS Virtual Root resource is not lost during an upgrade. To complete the upgrade of an IIS Virtual Root resource, after completing the upgrade of your server, click Start, click Help, click the Search tab, select the Search titles only check box, and then type "create a new resource" (including the double quotation marks). Follow the procedure for creating a new resource. For a resource type, choose IIS Server Instance, and when prompted to choose an IIS server, choose the IP address that was used by your IIS Virtual Root resource. An IIS Server Instance resource will be created, using the configuration information from your IIS Virtual Root resource.
Message Queuing | Primary Enterprise Services, Primary Site Services, and Backup Site Services are not supported during rolling upgrades. All other Message Queuing Services configurations are supported during rolling upgrades.
Print Spooler | The only Print Spooler resources supported during a rolling upgrade are those on line printer remote (LPR) ports.
Exchange 5.5 EE | Supported during rolling upgrades.
SQL Server 6.5 EE | Supported during rolling upgrades.
SQL Server 7.0 EE | Supported during rolling upgrades. See the relevant Knowledge Base article at https://support.microsoft.com/default.aspx?scid=kb;en-us;239473&sd=tech for additional information.
Other resource types | See the product documentation supplied with the application or resource.

Before performing a rolling upgrade, identify the resources on your cluster that do not support rolling upgrades. This will help in determining which upgrade procedure you should use.
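
One way to carry out this identification step is to compare the resource inventory against Table 1. The Python sketch below is a simplified reading of the table, not an official compatibility list: the status labels are our own, "conditional" stands for rows whose support depends on version or configuration, DTC is treated as unsupported because it is unavailable for part of the upgrade, and the sample inventory is hypothetical.

```python
# Simplified view of Table 1 (status labels are illustrative).
TABLE_1 = {
    "File Share": "supported",
    "IP Address": "supported",
    "Network Name": "supported",
    "Physical Disk": "supported",
    "Time Service": "supported",
    "Distributed Transaction Coordinator": "unsupported",
    "Internet Information Services": "conditional",  # IIS 4 only
    "Message Queuing": "conditional",                # depends on role
    "Print Spooler": "conditional",                  # depends on port type
    "Exchange 5.5 EE": "supported",
    "SQL Server 6.5 EE": "supported",
    "SQL Server 7.0 EE": "supported",
}


def classify(resource_types):
    """Group resource types by their simplified Table 1 status.

    Types missing from the table are reported as "check vendor",
    mirroring the "Other resource types" row.
    """
    result = {"supported": [], "conditional": [],
              "unsupported": [], "check vendor": []}
    for name in resource_types:
        result[TABLE_1.get(name, "check vendor")].append(name)
    return result


# Hypothetical inventory of resource types found on a cluster.
inventory = ["Physical Disk", "IP Address", "Print Spooler",
             "Distributed Transaction Coordinator", "Generic Application"]
for status, names in classify(inventory).items():
    print(f"{status}: {names}")
```

Anything that lands outside the "supported" bucket needs a closer look at the corresponding table note or vendor documentation before you begin.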

Application Considerations

As stated earlier, Cluster service does not guarantee that applications can support rolling upgrades. Applications can, however, support rolling upgrades provided that they do not:

  • Store program files on the clustered disk.

  • Change the resource DLL name or location.

  • Delete application registry keys either in the system registry or in the cluster configuration database.

  • Change application on-disk data structures.

An application that does any of the above will not support rolling upgrades.
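
The four conditions above can be written down as a simple checklist. The following Python sketch uses field names of our own choosing and is meant only to make the rule explicit: a single violated condition is enough to rule out a rolling upgrade for that application.

```python
from dataclasses import dataclass


@dataclass
class UpgradeBehavior:
    """Properties of an application upgrade (field names are illustrative)."""
    stores_program_files_on_clustered_disk: bool = False
    changes_resource_dll_name_or_location: bool = False
    deletes_registry_keys: bool = False
    changes_on_disk_data_structures: bool = False


def blocking_reasons(app: UpgradeBehavior) -> list[str]:
    """Return the names of the conditions that the application violates."""
    return [name for name, violated in vars(app).items() if violated]


def supports_rolling_upgrade(app: UpgradeBehavior) -> bool:
    """An application qualifies only if it violates none of the conditions."""
    return not blocking_reasons(app)


# Hypothetical application whose new version changes its on-disk format.
app = UpgradeBehavior(changes_on_disk_data_structures=True)
print(supports_rolling_upgrade(app))
print(blocking_reasons(app))
```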

How Rolling Upgrades Work

Phase 1: Preliminary Configuration

In a common scenario, each node runs Windows NT 4.0 Server, Enterprise Edition, with the following software installed:

  • Microsoft Cluster Server (MSCS).

  • Internet Information Services (IIS) resource, IIS version 4.

  • The latest released service pack (SP4 or greater) for Windows NT 4.0 Server, Enterprise Edition. The service pack must be applied after installing IIS and MSCS, even if it was also applied earlier.

At this point, the cluster is configured as in Figure 1 below so that each node handles client requests (an active/active configuration).

Figure 1: Active/active cluster configuration

Phase 2: Upgrade Node 1

Node 1 is paused, as Figure 2 below shows. All resource groups on Node 1 are moved to Node 2. Because Node 1 is paused, no new groups can be created or moved to this node by users. At this point, Node 2 handles all cluster resource groups. Node 1 is idle and can be upgraded.

Figure 2: Node 1 idle/Node 2 active

At this point, you can start the installation of a service pack or an upgrade to Windows 2000 Advanced Server (see Figure 3 below). Once the upgrade of Node 1 is complete, you can perform a test to verify that the operating system is fully functional.

Figure 3: Node 1 upgrade

Cluster service maintains each node's version of the operating system and the Cluster service itself, as well as the aggregate version of the cluster. It uses these version numbers to determine if a node that runs a different operating system version can join a cluster.

Phase 3: Upgrade Node 2

Up-level Node 1 rejoins the cluster. Cluster service guarantees that the up-level node understands the down-level node's protocols and can join a down-level cluster.

Before upgrading Node 2, confirm that the cluster operates correctly. A simple test is to:

  1. Select a resource group that is not critical, and move it to Node 1. If this step succeeds, it proves that Cluster service is working.

  2. Move this resource group back to Node 2. This step proves that Node 1 did not tamper with the resources in this resource group, and that they will be able to fail back in case Node 1 fails.

Node 2 is paused, as we see in Figure 4 below. All resource groups on Node 2 are moved to Node 1. Because Node 2 is paused, no new groups can be created or moved to this node by other users. At this point Node 1 handles all cluster resource groups. Node 2 is idle and can be upgraded.

Figure 4: Node 1 upgraded/Node 2 idle

You can now start the installation of a service pack or an upgrade to Windows 2000 Advanced Server on Node 2 (Figure 5 below). Once the upgrade of Node 2 is complete, you can perform a test to verify that the operating system is fully functional. At this point, the up-level Node 2 should successfully join the up-level Node 1.

Figure 5: Node 2 upgraded

Phase 4: Final

Node 2 rejoins the cluster, and you redistribute the resource groups back to the active/active cluster configuration, as represented in Figure 6 below.

Figure 6: Active/active configuration with both nodes upgraded
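
The four phases can be traced with a small in-memory model. The Python sketch below is a simulation only: the class and function names are invented for illustration, it does not call Cluster service or any real administration API, and it leaves out the validation tests described in Phase 3.

```python
from dataclasses import dataclass, field

DOWN_LEVEL = "Windows NT 4.0 Server, Enterprise Edition"
UP_LEVEL = "Windows 2000 Advanced Server"


@dataclass
class Node:
    name: str
    os: str = DOWN_LEVEL
    paused: bool = False


@dataclass
class Cluster:
    nodes: dict[str, Node]
    owners: dict[str, str]  # resource group -> owning node
    log: list[str] = field(default_factory=list)

    def move_group(self, group: str, target: str) -> None:
        if self.nodes[target].paused:
            raise RuntimeError(f"{target} is paused and cannot accept groups")
        self.owners[group] = target
        self.log.append(f"move {group} -> {target}")

    def upgrade_node(self, name: str, partner: str) -> None:
        node = self.nodes[name]
        node.paused = True                               # pause the node
        for group, owner in list(self.owners.items()):   # drain it
            if owner == name:
                self.move_group(group, partner)
        node.os = UP_LEVEL                               # upgrade while idle
        node.paused = False                              # resume the node
        self.log.append(f"upgraded {name}")


def rolling_upgrade(cluster: Cluster, first: str, second: str) -> None:
    original = dict(cluster.owners)        # Phase 1: remember the layout
    cluster.upgrade_node(first, second)    # Phase 2: upgrade Node 1
    cluster.upgrade_node(second, first)    # Phase 3: upgrade Node 2
    for group, owner in original.items():  # Phase 4: redistribute groups
        if cluster.owners[group] != owner:
            cluster.move_group(group, owner)


cluster = Cluster(
    nodes={"Node 1": Node("Node 1"), "Node 2": Node("Node 2")},
    owners={"Group A": "Node 1", "Group B": "Node 2"},
)
rolling_upgrade(cluster, "Node 1", "Node 2")
print("\n".join(cluster.log))
```

In this model every group is always owned by a node that is not paused, which is the property that keeps the cluster's services available throughout the upgrade. Each logged move corresponds to one brief interruption for the group involved.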

Rolling Upgrade Walkthrough

This walkthrough covers the rolling upgrade of a Windows NT 4.0 Server, Enterprise Edition-based cluster to Windows 2000 Advanced Server. For the latest information on rolling upgrades, please refer to the release notes on the Windows 2000 Advanced Server CD.

Note: All machine names used in this section are those that would be created if you set up a test bed using the "Step-by-Step Guide to a Common Infrastructure for Windows 2000 Server Deployment - Part 1: Installing a Windows 2000 Server as a Domain Controller" found at https://www.microsoft.com/windows2000/techinfo/planning/server/serversteps.asp.

If you did not set up your machines using that guide, the names will be different. In our example, Node 1 is the machine named HQ-RES-SRV-01 and Node 2 is the machine named HQ-RES-SRV-02. See also the "Step-by-Step Guide to Installing Cluster Service" at https://www.microsoft.com/windows2000/techinfo/planning/server/clustersteps.asp.

  1. Synchronize the time on all cluster nodes with the time on the domain controller. To do so, click Start, click Run, and type cmd in the Open box. Click OK. At the command prompt, type:

    Net time /domain:reskit.com /set

  2. Prepare a list of all resources.

  3. Identify resources that do not support rolling upgrades.

    If you have a resource that does not support rolling upgrades, you have three choices:

    1. Take the resource offline and continue with the rolling upgrade. The resource will be unavailable during the entire rolling upgrade process. You will have to bring this resource online once all nodes have been upgraded.

    2. Remove the resource. The resource will be unavailable during the entire rolling upgrade process. You will have to reinstall this resource once all nodes have been upgraded.

    3. If the majority of the resources do not support rolling upgrades, you may consider upgrading all nodes at once, or performing a clean install. In both cases, all resources will be unavailable during the upgrade/install process.

  4. To start Cluster Administrator (see Figure 7 below), click Start, point to Programs, point to Administrative Tools, and click Cluster Administrator.

    Figure 7: Cluster Administrator

    All nodes and resource groups are up and online.

  5. Click the first node (in our example, HQ-RES-SRV-01).

  6. Click the File menu and click Pause Node. The status of HQ-RES-SRV-01 changes to Paused. (See Figure 8 below.)

    Figure 8: HQ-RES-SRV-01 is paused.

  7. In the left pane, click the + next to HQ-RES-SRV-01 to expand it. Double-click Active Groups. All groups currently hosted on HQ-RES-SRV-01 are displayed in the right pane.

  8. Click Disk Group 1 in the right pane. Click the File menu and click Move Group. Repeat this step for each group listed.

    The services will be interrupted during the time that the services in each group are being moved to the other node and restarted. After all groups are moved, HQ-RES-SRV-02 hosts all groups and handles all client requests. HQ-RES-SRV-01 is idle.

    You will now need to review all resource types installed on your cluster.

  9. Click Resource Types in the left pane (Figure 9 below).

    Figure 9: Resource types

  10. Determine which resource types do not support rolling upgrades and take offline all resources that are not supported. (Refer to Table 1 above.)

  11. After ensuring that the latest released Service Pack has been applied, use Windows 2000 Advanced Server Setup to upgrade HQ-RES-SRV-01. If you installed the Microsoft Cluster Server after applying a service pack, you have to apply the service pack again. (You must have SP4 or later installed.)

    Note: To make sure that Windows 2000 Setup does not place temporary files on one of the clustered disks, use the installation option /tempdrive:X. (For more information, refer to the Known Issues section of this paper.)

    Setup detects Cluster Server on HQ-RES-SRV-01 and automatically installs Cluster service for Windows 2000 Advanced Server. HQ-RES-SRV-01 automatically rejoins the cluster at the end of the upgrade process, but is still paused and does not handle any cluster-related work.

    You will now need to perform validation tests on HQ-RES-SRV-01 to confirm functionality.

  12. Start Cluster Administrator on HQ-RES-SRV-01. You should see both nodes: HQ-RES-SRV-01 in Paused state, and HQ-RES-SRV-02 in Up state.

  13. Click Resource Types. Note that new resource types are introduced during the rolling upgrade (see Figure 10 below).

    Figure 10: New resource types

    Note: You should not attempt to create resources of the new types until both nodes have been upgraded.

  14. Click HQ-RES-SRV-01 and then click Resume Node.

  15. Repeat steps 4, 5, 6, 7, 8, 11, 12, and 14 on the second node (HQ-RES-SRV-02).

  16. Once all nodes are upgraded, take the Time Service resource offline and delete it. In the left pane, click the group Cluster Group. In the right pane click the resource Time Service. Click the File menu, click Take Offline and finally click Delete. (See Figure 11 below.)

    Figure 11: Time Service

  17. If necessary, upgrade any applications that do not support rolling upgrades. For example, if you have a Distributed Transaction Coordinator resource, at this point use the tool Comclust.exe to configure DTC on all nodes.

Test the cluster by moving groups between nodes. All resources should come online on any node.
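
The choice described in step 3 of the walkthrough can be summarized as a small decision rule. The Python sketch below is one possible reading of that step, with a hypothetical resource list; "majority" is interpreted here as more than half of the resources, which the paper does not define precisely.

```python
def choose_upgrade_path(resources: dict[str, bool]) -> str:
    """Suggest an upgrade path.

    resources maps a resource name to True if it supports rolling upgrades.
    """
    unsupported = [name for name, ok in resources.items() if not ok]
    if not unsupported:
        return "rolling upgrade"
    if len(unsupported) > len(resources) / 2:
        return "upgrade all nodes at once or clean install"
    return "rolling upgrade with unsupported resources offline or removed"


# Hypothetical inventory: only one of four resources is unsupported.
print(choose_upgrade_path({
    "Disk Q:": True,
    "Cluster IP Address": True,
    "Cluster Name": True,
    "MSDTC": False,
}))
```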

Known Issues

Location of Temporary Files

The upgrade program, Winnt32.exe, selects a disk with the maximum free space for the storage of all temporary files. In a cluster, this will most likely be a clustered disk. Use the /tempdrive:X option to make sure that the upgrade program uses the appropriate drive (not a clustered disk) for the storage of temporary files.

Start the upgrade by running a command such as Winnt32 /unattend /tempdrive:C, where C is the drive letter of the drive where temporary files should be stored.
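
If the upgrade is scripted, the command line can be assembled with a guard against choosing a clustered disk. The Python sketch below only builds the command string shown above and does not run Setup; the helper name and the clustered drive letters Q and R are hypothetical.

```python
def build_upgrade_command(temp_drive: str, clustered_drives: set[str]) -> str:
    """Build the Winnt32 command line, refusing a clustered temporary drive."""
    drive = temp_drive.rstrip(":").upper()
    if len(drive) != 1 or not ("A" <= drive <= "Z"):
        raise ValueError(f"not a drive letter: {temp_drive!r}")
    clustered = {d.rstrip(":").upper() for d in clustered_drives}
    if drive in clustered:
        raise ValueError(f"drive {drive}: is a clustered disk")
    return f"Winnt32 /unattend /tempdrive:{drive}"


# Hypothetical layout: C is a local disk, Q and R are clustered disks.
print(build_upgrade_command("C", {"Q", "R"}))
```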

Print Spooler Resource

The Print Spooler resource supports rolling upgrades, but only for the LPR port and standard TCP/IP port types. Other port types are not supported on Windows 2000-based server clusters.

You should not attempt to modify the printer spooler or printer configuration while running in mixed mode.

The Windows 2000 print processor enhanced metafile (EMF) 1.008 is not compatible with Windows NT 4.0. If you use the EMF port processor, change it to the raw data format (see footnote 1).

Time Service Resource

Microsoft Cluster Server in Windows NT 4.0 uses its own time service to synchronize time between the nodes. In Windows 2000, cluster nodes should use the domain controller time service for that purpose. You must manually take the time service resource offline and delete it once all nodes have been successfully upgraded. Also run the net time /set /domain command on each node.

Changing Configuration in Mixed Mode

Changing the cluster configuration while in mixed mode is neither recommended nor supported. In particular, you should not attempt to create new resources of types that are only available on up-level Windows 2000-based nodes. Windows 2000 introduces new resource types for Windows Internet Naming Service (WINS), Dynamic Host Configuration Protocol (DHCP), distributed file system (Dfs) root, Simple Mail Transfer Protocol (SMTP), and Network News Transfer Protocol (NNTP) services.

Troubleshooting

Resource does not support rolling upgrades.

Follow the instructions in step 3 of the Rolling Upgrade Walkthrough above.

Resource does not come online after an upgrade of one node.

There are a few reasons for a resource to fail to come online or fail after it was brought online on the up-level node.

  1. The resource doesn't support rolling upgrades.

    Move the resource back to the down-level node and take it offline. You will not be able to bring it online until all nodes are upgraded.

  2. The resource is not compatible with Windows 2000.

    Visit https://www.microsoft.com/windows2000/professional/howtobuy/upgrading/compat/search/software.asp to consult a list of Windows 2000-certified applications and consult with your vendor.

  3. Another resource that your application depends on failed to come online.

    Make sure that the other resource does not fall into categories 1 or 2 above.

  4. The time difference between the node and the domain controller is too large.

    Synchronize the time on the node with the time service on the domain controller.

Node upgrade fails.

When a node upgrade fails, you have the following options:

  1. Repair the node.

  2. Restore the node from a backup.

  3. Perform a clean install of the node. In this case, you will have to evict the node from the cluster, and rejoin the cluster once the installation is complete. This process may affect other applications running on the cluster. You may need to reinstall these applications.

Conclusion

This document has presented information that should help prepare you to perform a rolling upgrade of your cluster. A rolling upgrade will save you time and money, will provide you with the comfort of knowing that even if something goes wrong you will still have a working server, and will increase the satisfaction of clients using services offered by your cluster.

A Final Note

We recommend that you install and use the Windows 2000 Resource Kit tool Uptime.exe version 2.0 to measure the uptime of your cluster. Version 2 is cluster-aware and will correctly calculate the uptime of each node and an entire cluster. Here is an example output generated by this tool:

Availability Statistics for the Cluster: Since 10/18/1999:

Cluster Availability: 100.0000%

Total Cluster Uptime: 120d 15h:14m:19s

Total Cluster Downtime: 0d 0h:0m:0s

Total Cluster Reboots: 0

(Note that in this example, Uptime.exe reported 100 percent uptime for the cluster even though we know that the resources were not available during the time needed to move them from one node to another. The Uptime tool reports the uptime of the cluster service only. Cluster service was available on one node or another during the entire rolling upgrade. Uptime does not provide any information about resources, resource groups, or virtual servers.)
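
The figures in this output can be cross-checked with a few lines of Python. The sketch below parses the duration format shown in the sample and recomputes the percentage; the assumption that availability equals uptime divided by uptime plus downtime is ours and is not documented behavior of Uptime.exe.

```python
import re

# Matches durations in the form used by the sample output, e.g. "120d 15h:14m:19s".
DURATION = re.compile(r"(\d+)d (\d+)h:(\d+)m:(\d+)s")


def to_seconds(text: str) -> int:
    """Convert a duration such as '120d 15h:14m:19s' to seconds."""
    match = DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def availability(uptime: str, downtime: str) -> float:
    """Availability in percent, assuming uptime / (uptime + downtime)."""
    up, down = to_seconds(uptime), to_seconds(downtime)
    return 100.0 * up / (up + down)


cluster_uptime = "120d 15h:14m:19s"
cluster_downtime = "0d 0h:0m:0s"
percent = availability(cluster_uptime, cluster_downtime)
print(to_seconds(cluster_uptime))
print(f"{percent:.4f}%")
```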

For More Information

For the latest information on Windows 2000 Server, check out our Web site at https://www.microsoft.com/windows2000.

Footnote 1: Raw is the default data type for clients other than Windows 2000-based programs. The raw data type tells the spooler not to alter the print job at all prior to printing. With this data type, the entire process of preparing the print job is done on the client computer.