Chapter 12 - Failover Clustering

Continuous uptime in a production environment, whether it is a database powering a mission-critical client/server application or an e-commerce Web site, is becoming a common business requirement. This chapter describes Microsoft® SQL Server™ 2000 failover clustering, one method of achieving high availability. Failover clustering is available only in SQL Server 2000 Enterprise Edition.

Failover clustering is a process in which the operating system and application software work together to provide continuous availability in the event of an application failure, hardware failure, or operating-system error. Failover clustering provides hardware redundancy through a configuration in which other servers essentially act as clones of the main production system. Failover clustering also allows system maintenance to be performed on one computer while another node does the work, minimizing system downtime due to normal maintenance.

Failover clustering can be part of an overall scale-up and scale-out solution that takes into account backups, redundancy, and performance. However, failover clustering does not protect against viruses, denial-of-service attacks, database corruption, logical corruption, or failed software installations, and it is not a load-balancing solution. Failover clustering should be used for redundancy, not for increasing performance. If software or hardware problems occur, failover clustering combined with other high availability methods (such as SQL Server 2000 log shipping) can enable a production environment to be up and running in a short amount of time. For more information about high availability and achieving five nines, see Chapter 16, "Five Nines: The Ultimate in High Availability."

Enhancements to Failover Clustering


SQL Server 2000 failover clustering offers improvements over the clustering functionality provided in SQL Server version 7.0, Enterprise Edition. Some of the enhancements in SQL Server 2000 failover clustering include:

  • SQL Server 2000 is built on multiple instances, which allows SQL Server to simultaneously support up to 16 instances of SQL Server on one server.

  • You create a failover cluster through SQL Server Setup; to remove failover clustering, you uninstall SQL Server 2000.

  • SQL Server 2000 has extensive support for recovering from a failure of a node in the cluster. If a node fails, it can be removed, reinstalled, and rejoined to the cluster. Adding the new server back into the virtual server definition is then a simple operation with SQL Server 2000 Setup.

  • SQL Server 2000 supports a one-node cluster. This enhancement best serves a two-node cluster—when one node is lost completely, the cluster will still function properly. A one-node cluster can also be used for testing purposes in a development environment to simulate a cluster environment without having two or more computers. Another potential use: if clustering is planned for later, it may be easier to set up a one-node cluster from the start and then add nodes over time. 

  • SQL Server 2000 running on Windows 2000 Datacenter Server supports up to four nodes in a cluster. 

  • All nodes now have local copies of the SQL Server tools as well as the executables. In SQL Server 7.0, the binaries were on the shared disk, and only the primary node had shortcuts to the tools and proper COM components registered. This meant that administration had to be performed from a client computer in the event of a failover.

  • SQL Server 2000 failover clustering supports the Microsoft Search service (full-text search).

  • SQL Server 2000 cluster configurations can be updated by rerunning the setup program; this is true even when applying a service pack.

  • SQL Server 2000 supports multiple network addresses. This enables SQL Server 2000 to use different networks for internal and external communications. It does not completely remove a single point of failure, because the clustered SQL Server resource depends on the IP address, and the SQL Server resource can fail while the IP address remains alive. It does provide some fault tolerance and, potentially, extra bandwidth so that one network connection will not be overworked. If there is a hardware failure (for example, a router) external to the cluster, Windows Clustering will still function properly, although external clients may not be able to access the cluster.

  • Database administrators can now use SQL Server Service Manager or SQL Server Enterprise Manager to start and stop SQL Server without having to use Cluster Administrator to start and stop SQL Server services. 

Windows Clustering


SQL Server 2000 failover clustering is built on top of Windows Clustering. There are two main types of clusters in a Windows environment:

  • Server cluster 

    SQL Server 2000 clustering is built on top of a server cluster. A server cluster provides high availability, scalability, and manageability for resources and applications by clustering as many as four servers to maintain client access to applications and server resources during unplanned outages due to hardware failures, natural and man-made disasters, software failures, and so on. Unlike the behavior of a Network Load Balancing cluster, when a server, resource, or cluster-aware application within a server cluster becomes unavailable, its resources are transferred to another server that is available. 

  • Network Load Balancing cluster 

    A Network Load Balancing cluster provides high availability and scalability for TCP/IP-based services, including Web servers, FTP servers, other mission-critical servers, and COM+ applications. In a Network Load Balancing scenario, multiple servers run independently and do not share any resources. Client requests are distributed among the servers, and in the event of a server failure, the Network Load Balancing cluster detects the problem and distributes the load to another server. SQL Server 2000 clustering does not fall into this category, but it can be part of an overall architecture in which a Web farm using a Network Load Balancing cluster connects to a backend SQL Server that is part of a server cluster. Because a Network Load Balancing cluster is employed to meet an application requirement, consider Network Load Balancing during the application planning and configuration stage.

For more information about the architecture, technology, and terminology of a Windows cluster, see "Windows Clustering Technologies: Cluster Service Architecture" at https://www.microsoft.com/technet/prodtechnol/windows2000serv/default.mspx.

Microsoft Cluster Service Components

Windows Clustering is the collective name for the operating-system components that comprise Windows-level clustering. This section describes the various components involved.

Hardware

The following is a list of hardware components used with Windows Clustering, available with Windows 2000, and Microsoft Cluster Service (MSCS), available with Windows NT 4.0:

  • Cluster Nodes 

    A node is a server within the cluster. Windows NT Server 4.0, Enterprise Edition and Windows 2000 Advanced Server both support two-node clustering, and Windows 2000 Datacenter Server supports up to four-node clustering. 

  • Heartbeat 

    The heartbeat is a private network set up between the nodes of the cluster that checks to see if a server is up and running. This occurs at regular intervals known as time slices. If the heartbeat is not functioning, a failover is initiated and another node in the cluster will take over the services. 

  • External Networking 

    Besides the heartbeat private network, at least one public network needs to be enabled so that external connections can be made to the cluster. 

  • Shared Disk Array 

    The shared disk array is a collection of physical disks (SCSI RAID or Fibre Channel) that will be accessed by the cluster. Windows Clustering supports shared-nothing disk arrays. A shared-nothing disk array is a configuration in which only one node can own a given resource at any given moment; all other nodes are denied access until they own the resource. This protects the data from being overwritten if two computers were to access the same drives concurrently.

  • Quorum Drive 

    The Quorum Drive is a logical drive designated on the shared disk array for Windows Clustering. This drive contains information about the state of the cluster. If this drive becomes corrupt or damaged, the cluster installation will also become corrupt or damaged. 

Operating System

The following is a list of components that are exposed at the operating-system level:

  • Cluster Name 

    This is the name that all Windows NT or Windows 2000 external connections will use to refer to the cluster; the individual nodes will never be referenced. 

  • Cluster IP Address 

    This is the IP address that all external connections will use to reach the failover cluster. SQL Server 2000 allows multiple IP addresses to be associated with a SQL Server virtual server. 

  • Cluster Administrator Account 

    This account is used to administer and own the failover cluster. A Cluster Administrator account must be created at the domain level.

  • Cluster Resource Types 

    Cluster resources include any services, software, or hardware that can be configured within a cluster. These include: DHCP, File Share, Generic Application, Generic Service, Internet Protocol, Network Name, Physical Disk, Print Spooler, and WINS. 

  • Cluster Group 

    A Cluster Group is a collection of logically grouped cluster resources, and may also contain cluster-aware application services such as SQL Server 2000. Conceptually, think of a cluster group as a folder on your hard drive that contains related information. 

For more information about Windows NT Server 4.0, Enterprise Edition, see https://www.microsoft.com/ntserver/, and for more information about Windows 2000 Advanced and Datacenter Servers, see https://www.microsoft.com/windows2000/.

Virtual Server

Understanding the concept of a virtual server is key to understanding clustering. To a client or application, a virtual server is simply the server name or IP address(es) used for access. A client connecting to the virtual server does not need to know which node within the cluster is currently hosting it. An instance of clustered SQL Server is known as a SQL Server virtual server.
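For example, clients and tools refer only to the virtual server name, never to a node name. A minimal sketch using the osql command-line utility, with SQLVS1 and INST1 as hypothetical virtual server and instance names:

    REM Connect to the default instance of a virtual server, exactly as if
    REM SQLVS1 were a physical server (-E uses Windows authentication).
    osql -S SQLVS1 -E -Q "SELECT @@SERVERNAME"

    REM Connect to a named instance hosted by the same virtual server.
    osql -S SQLVS1\INST1 -E -Q "SELECT @@SERVERNAME"

Whichever node currently owns the virtual server answers the connection; a failover changes the owning node but not the name the client uses.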

SQL Server 2000

SQL Server 2000 is built on top of the existing operating-system clustering, because it is a cluster-aware application. The SQL Server 2000 virtual server sits on top of the existing Windows Clustering installation, as shown in the following illustration.

[Illustration: the SQL Server 2000 virtual server layered on top of the Windows Clustering installation]

Components
  • SQL Server Virtual Server Name 

    This is the name that all applications and clients will use to refer to the SQL Server instance; the individual nodes will never be referenced. 

  • SQL Server Virtual Server IP Address(es) 

    These are the one or more IP addresses that all external connections will use to reach the SQL Server instance. 

  • SQL Server Virtual Server Administrator Account 

    This is the SQL Server service account. This account must be a domain administrator. For more information about creating this account, see "Setting up Windows Services Accounts" in SQL Server Books Online. 

Instances of SQL Server

An instance is an installation of SQL Server that is completely separate from any other, with a few underlying shared components that affect how SQL Server 2000 works in a clustered environment.

The following components are the underlying shared components:

  • Full-text search 

    Each instance gets its own clustered full-text search resource, but each resource relies on the underlying Microsoft Search service, which is shared by all instances and, potentially, by other applications. 

  • Microsoft Distributed Transaction Coordinator (MS DTC) 

    There is only one MS DTC resource per cluster. 

  • Microsoft Message Queuing (MSMQ) 

    Any application using MSMQ has the same limitation as with MS DTC; all instances share one resource. 

SQL Server 2000 supports multiple instances per server—one default instance, and up to 15 named instances. SQL Server can be installed as either a default instance or a named instance. If a SQL Server 7.0 installation exists and SQL Server 2000 is installed, the SQL Server 7.0 instance becomes the default instance, or it can be upgraded to SQL Server 2000. Only one SQL Server 7.0 instance can be running, but multiple SQL Server 2000 named instances can run alongside the single SQL Server 7.0 instance. In a clustered environment, SQL Server 6.5 and SQL Server 7.0 clustering is not supported on the same computer on which an instance of SQL Server 2000 exists, whether it is clustered or not. A SQL Server 2000 virtual server can also have local named instances or a local SQL Server 7.0 default instance, but these will not be visible to Windows Clustering; they are local to the particular server on which they are installed.

With instances come two new concepts for failover clustering: a single-instance cluster, which replaces the active/passive cluster, and a multiple-instance cluster, which replaces the active/active cluster.

Single-Instance Cluster

A single-instance cluster has only one active instance of SQL Server owned by a single node, and all other nodes of the cluster are in a wait state. Another node takes over in the event of a failure on the active node, or during a manual failover for maintenance. The following illustration shows a single-instance cluster.

[Illustration: a single-instance cluster]

Multiple-Instance Cluster

A multiple-instance cluster has up to four nodes, and although SQL Server can support up to the 16-instance limit, having more than four instances (a 1:1 ratio of virtual servers to nodes) is not recommended. Each virtual server requires its own shared disk resources. The physical disks may be in the same shared disk array, but the logical drives must be unique to the instance. SQL Server in a clustered environment also behaves differently from a stand-alone named instance when it comes to IP ports. A stand-alone named instance picks a dynamic port when it is first started and then keeps that port number. In a cluster, multiple instances can share the same port, because each instance listens only on the IP address(es) assigned to its virtual server and is not limited to a 1:1 port-to-instance ratio as on a single computer. It is recommended, however, that each SQL Server virtual server be assigned its own unique port, both to avoid potential problems during setup (when the SQL Server services are started and then clustered) and to ensure that each virtual server is completely unique. The following illustration shows a multiple-instance cluster.

[Illustration: a multiple-instance cluster]

How SQL Server 2000 Failover Clustering Works

The clustered nodes check over the heartbeat to see whether each node is alive, at both the operating-system level and the SQL Server level. At the operating-system level, the nodes in the cluster compete for the resources of the cluster: the primary node reserves a resource every 3 seconds, and the challenger attempts to reserve it every 5 seconds, with a full arbitration cycle lasting 25 seconds. If, for example, the primary node fails at second 19, the challenger detects it at the 20-second mark, and if it is determined that the primary node no longer has control, the challenger takes over the resource.

From a SQL Server perspective, the node hosting the SQL Server resource performs a looks-alive check every 5 seconds. This is a lightweight check to see whether the service is running, and it may succeed even if the instance of SQL Server is not fully operational. The is-alive check is more thorough; it runs a SELECT @@VERSION SQL query against the server to determine whether it is operational. If this query fails, the check retries five times and then attempts to reconnect to the instance of SQL Server. Should all five retries fail, the SQL Server resource fails. Depending on the failover threshold configuration of the SQL Server resource, Windows Clustering will either attempt to restart the resource on the same node or fail it over to another available node. The execution of the query tolerates a few errors, such as licensing issues or a paused SQL Server, but ultimately fails if its threshold is exceeded.
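The checks themselves are performed internally by the SQL Server resource DLL, but an administrator can run an equivalent lightweight query manually to confirm that an instance is responsive. A minimal sketch:

    -- Rough equivalent of the is-alive style check: a trivial query that
    -- succeeds only if the instance accepts connections and processes requests.
    SELECT @@SERVERNAME AS server_name, @@VERSION AS server_version;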

During the failover from one node to another, the SQL Server service on the other node starts and goes through the recovery process to bring the databases online. This means that any completed transactions in the transaction log are rolled forward, and any incomplete transactions are rolled back. In most cases, a failover takes no longer than 30 seconds to one minute.

End users and applications access a SQL Server virtual server with its name or IP address. The user or application does not need to worry about which node owns the resources. During the failover process, any active connections are broken. For Web browser users, a simple refresh of the Web page should create a new database connection. In a more traditional client/server application, or one that relies heavily on a middle tier, application designers may want to consider checking whether the connection exists and, if it does not, reconnecting. Therefore, whatever the user was working on when the server went down may be lost unless the transaction completed before the failure or the transaction is handled within the application.

For more information about failover clustering, see SQL Server Books Online.

Configuring SQL Server 2000 Failover Cluster Servers


Perhaps the most important aspect of a successful SQL Server 2000 failover cluster installation is ensuring that the right hardware and software are correctly deployed for the application designed to run on the failover cluster. The hardware should perform well and scale with the specific needs of the application(s) accessing SQL Server.

Software Requirements

SQL Server 2000 failover clustering requires SQL Server 2000 Enterprise Edition and one of the following operating systems:

  • Windows NT Server 4.0, Enterprise Edition (with a minimum of Service Pack 5) 

  • Windows 2000 Advanced Server 

  • Windows 2000 Datacenter Server 

Note Windows 2000 Datacenter Server is part of the Windows Datacenter Program (WDP). For more information, see https://www.microsoft.com/windows2000/.

Hardware Requirements

A SQL Server 2000 virtual server should run on a high-performing server configured with high availability in mind. Two main factors determine what hardware is needed:

  • What is the application or Web site's current workload, and what is the projected workload in six months, a year, or even two years from now? 

    This is the information that most people do not have prior to implementing a solution. Having benchmarks on how an application or Web site performs is critical in determining which operating system and what hardware to buy. The best way to benchmark an application prior to production is in a lab environment. Using tools such as System Monitor (Performance Monitor in Windows NT 4.0) can also establish performance trends; see the baseline query sketch after this list. Without a baseline, or some sort of performance documentation, it will be difficult to determine exactly what is needed. Also, any application issues affecting performance, either in current production versions or updates being planned, should be taken into account. SQL Server and its underlying hardware platform will only perform as well as the code that was written to access it. 

  • How much money is budgeted to the project? 

    A good rule of thumb is to plan for at least six months' worth of capacity, but also take into account the period (whether a year, two years, or more) during which the server being configured will still be deployed and expected to perform. Underestimating the power you will need in the future, or incomplete planning, will result in poor performance or system downtime, reducing how highly available the solution is. 

    Important All nodes in a cluster should be configured the same in all respects (or with greater capacity, if the node will own more than one instance) to ensure proper application performance after a failover.
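As an illustration of capturing a baseline from within SQL Server, the following sketch queries the sysperfinfo system table, which exposes the same counters that System Monitor reads; the counter names shown are a small, illustrative selection:

    -- Capture a few instance-level counters for a baseline document.
    -- sysperfinfo exposes SQL Server's System Monitor counters.
    SELECT object_name, counter_name, instance_name, cntr_value
    FROM master.dbo.sysperfinfo
    WHERE counter_name IN ('Buffer cache hit ratio',
                           'User Connections',
                           'Transactions/sec');

Running a query like this at regular intervals, and saving the results, builds the performance documentation described above.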

For more information about creating a highly available server, see Chapter 16, "Five Nines: The Ultimate in High Availability."

Memory

Depending on which operating system is employed, SQL Server can take advantage of different amounts of maximum memory. An installation of SQL Server 2000 Enterprise Edition supports up to 64 gigabytes (GB) of memory on Windows 2000 Datacenter Server (with AWE enabled). The following table shows the maximum amount of memory available to SQL Server per operating system.

Operating system                        Maximum memory
Windows NT 4.0, Enterprise Edition      3 GB
Windows 2000 Advanced Server            8 GB (with AWE enabled)
Windows 2000 Datacenter Server          64 GB (with AWE enabled)

Address Windowing Extensions Memory

With Address Windowing Extensions (AWE), a memory-intensive application such as SQL Server 2000 can run much more efficiently and increase performance. Windows 2000 introduced the enhanced AWE API, which allows applications to access large amounts of physical memory. Due to the limitations of 32-bit memory addressing, only 4 GB of memory can be addressed directly by Windows NT 4.0 and Windows 2000. By default, 2 GB of address space is dedicated to the operating system and 2 GB to the application. With the /3GB switch in the BOOT.INI used by the operating system, the application can access up to 3 GB, and the operating system is reduced to 1 GB. As a result, even if a server were configured with 8 GB of memory, anything past 4 GB would be virtually unusable. This limited the scalability of an application. AWE is the support built into the operating system for exposing extended memory to Win32-based applications.

AWE requires an application, like SQL Server 2000, to be coded specifically for AWE. AWE support within SQL Server 2000 must be configured using the sp_configure option awe enabled. This is set per instance. By default, awe enabled is set to 0, or off. Enabling AWE support in SQL Server 2000 also requires some additional operating-system configuration. For more information, see SQL Server Books Online.
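As an illustration, enabling AWE and capping an instance might look like the following sketch; the 3-GB cap is an illustrative value for a node with more physical memory than the instance should consume, not a recommendation:

    -- 'awe enabled' is an advanced option, so expose advanced options first.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- Enable AWE for this instance (takes effect when the instance restarts).
    EXEC sp_configure 'awe enabled', 1;
    RECONFIGURE;

    -- Cap the instance so it cannot starve the operating system or, in a
    -- multiple-instance cluster, other instances. Value is in megabytes.
    EXEC sp_configure 'max server memory', 3072;
    RECONFIGURE;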

Note If you are enabling AWE memory, it is highly recommended that the configuration be tested prior to bringing the server(s) online in a production capacity.

There are considerations to take into account when you implement AWE memory:

  • The instance of SQL Server does not dynamically manage the size of the memory address space used.

    When AWE is enabled in SQL Server 2000 and the max server memory configuration parameter is not set, the instance acquires almost all available memory (leaving 128 MB so the base operating system can function), potentially depriving the operating system and any other processes running on the same server. 

  • Once initialized, the instance holds all AWE memory acquired at startup until it is shut down. 

    If AWE is enabled and the instance is taking too much memory, SQL Server must be shut down to reconfigure it, causing downtime (which makes a high availability option like failover clustering less available). Because the memory pages used by the SQL Server instance are taken from the nonpageable pool of Windows memory, none of the memory can be paged out. This means that if physical memory fills up, the surplus cannot spill to the page file on disk. 

For more information about configuring AWE memory on your server, see "Using AWE Memory on Windows 2000" in SQL Server Books Online.

Disk Configuration and File Placement

The main component of any database system is its storage, which contains the valuable data used and inserted by an application. For high availability, disks should be part of an external, fault-tolerant array, and should be high speed and large capacity.

The disks can be configured with either a small computer system interface (SCSI) or Fibre Channel, with Fibre Channel the typical choice because it was specifically designed for high bandwidth and capacity. Storage Area Networks (SANs), disk arrays that use networking protocols over fibre for all I/O, may be supported in conjunction with failover clustering as a Cluster/Multi-Cluster Device. SCSI is not supported for Windows 2000 Datacenter Server, so take this into account when deciding on hardware and the operating system.

Note SCSI is not supported in a Windows Datacenter Server cluster. Fibre Channel must be used.

Data and log devices, as well as tempdb, should be placed on separate disks, and if possible on different channels. If your system is very large, or has hot spots, you may decide to use filegroups as a method of splitting up the disk I/O. Further segmenting this by putting filegroups on different disks, on different channels, can yield a performance boost. Keep file placement in mind when you analyze your high-availability design: performance issues caused by bottlenecks can be incorrectly viewed as availability problems.
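As an illustration, the following sketch creates a database whose primary data, a filegroup for hot tables, and log each live on a separate drive. The drive letters and names are hypothetical; in a cluster, each drive must be a clustered disk resource on which the SQL Server resource depends:

    -- Primary data, a filegroup for hot tables, and the log on separate drives.
    CREATE DATABASE Sales
    ON PRIMARY
        (NAME = Sales_Data, FILENAME = 'S:\SQLData\Sales_Data.mdf', SIZE = 500MB),
    FILEGROUP HotTables
        (NAME = Sales_Hot,  FILENAME = 'T:\SQLData\Sales_Hot.ndf',  SIZE = 500MB)
    LOG ON
        (NAME = Sales_Log,  FILENAME = 'U:\SQLLogs\Sales_Log.ldf',  SIZE = 200MB);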

Data should be configured with RAID 0+1 for high availability and performance, but RAID 5 is also acceptable (however, RAID 5 may result in slower restore times). Logs should be placed on RAID 1.

For more information on RAID levels and high availability, see Chapter 16, "Five Nines: The Ultimate in High Availability," in this book.

Quorum Disk

It is recommended that you do not install SQL Server database files on the quorum disk, and that you do not put SQL Server or any other applications in the same group as the quorum disk. SQL Server 2000 Setup will not use the quorum disk by default unless no other disks are available.

Controller Configuration

Choose a card with enough channels to split the logical grouping of disks (for example, data and logs) to reduce I/O contention. If the Fibre Channel/RAID controller is internal to the node, and not in the shared disk array, writeback caching should be disabled. Even with battery backup, items could remain in the cache after the resources fail over to another node; if the services are then failed back to that node, corruption could occur when the controller attempts to overwrite data on the disk. Cached but unprocessed transactions would also be lost in a failover.

Using more than one RAID controller will also increase performance and reduce I/O contention.

Making Sure Logical Disks Are Seen by the Virtual Server

A common problem occurs when a failover cluster is improperly configured and the virtual server cannot see a Physical Disk resource. This can be caused by one of two things:

  • The proper disk drivers might not be installed.

    Make sure that they are installed. In some cases, for example, an operating-system upgrade from Windows NT 4.0, Enterprise Edition to Windows 2000 Advanced Server, there may be specific drivers for Windows 2000, but the old drivers may still be on the system. 

  • The drive might not be a dependency of the SQL Server 2000 virtual server.

To check whether the drive is a dependency of the virtual server 

  1. Start Cluster Administrator, right-click the SQL Server 2000 virtual server, and then click Properties

  2. On the SQL Server (nodename) Properties dialog box, click the Dependencies tab, and then click Modify

  3. On the Modify Dependencies dialog box, the available resources for the cluster appear in the Available resources list. Select the drive to add, click the arrow to move the resource to the Dependencies list, and then click OK

  4. To verify that the resource is now a dependency, on the SQL Server (nodename) Properties dialog box, click the Dependencies tab. 

Networking

SQL Server 2000 supports multiple IP addresses and network cards. For configuration information, see "Best Practices" in this chapter. If more bandwidth is required, SQL Server 2000 supports higher-bandwidth networking with Giganet or Compaq's Servernet II technology on Compaq hardware. These technologies provide higher performance between multiple SQL Servers. Giganet support is built in, and the update to enable Servernet II is located at https://www.microsoft.com/sql/.

Location

In Windows Clustering, server nodes must be physically located near each other. That means that when an event such as a power failure occurs in the data center, no one will be able to connect to the cluster. One approach to solving this issue:

  • Configure log shipping to a server in a different location. 

    A log-shipped server is one in which the transaction logs from the primary server are applied to a secondary server. This server may not be clustered, but it will at least allow SQL connectivity for users. A disaster recovery plan will need to be in place. 

Hardware Compatibility List

Before deciding on all final hardware, consult the Microsoft Hardware Compatibility List. All hardware must appear on the HCL, or the cluster configuration will not be supported. There are specific cluster categories, including "Cluster," which contains complete server configurations. The HCL can be found at https://www.microsoft.com/whdc/hcl/default.mspx.

Configuration Worksheets

A SQL Server 2000 failover cluster configuration requires certain resources. At a minimum, you need a domain-level administrator account for the cluster installation (this can be an existing administrator account in the domain and does not need to be created specifically). Some environments have one cluster administrator account and a separate account for the SQL Server services; this is recommended because it makes the cluster's relationship with the domain easier to understand and troubleshoot. Other environments combine both into one account. Also needed are specific static IP addresses for the following: MS DTC, each SQL Server virtual server, and the cluster itself.

Two worksheets are provided below that can be copied and placed in a Run Book (for more information about a Run Book, see Chapter 16, "Five Nines: The Ultimate in High Availability") or a configuration document. These worksheets are designed for you to enter the values used in your environment for the configuration of failover clustering. Using these worksheets will not only help you reconfigure your clustering solution if servers must be rebuilt, but can serve as an excellent resource for anyone who wants to know how your SQL Server environment is configured.

Cluster Configuration Worksheet

Parameter                                                          Value

Cluster Domain Administration Account                              _______________
Cluster Domain Administration Password                             _______________
SQL Server Domain Administration Account                           _______________
SQL Server Domain Administration Password                          _______________
Cluster Name                                                       _______________
Cluster IP Address                                                 _______________
Number Of Nodes In Cluster                                         _______________
Disk Array Configuration (RAID level, size, logical disk names,
  number of disks per drive, channels, cache setting, and so on)   _______________

SQL Server Virtual Server Configuration Worksheet

There can be up to 16 instances of SQL Server per failover cluster. If more than 4 instances are used, copy the following chart with new virtual server numbers.

Parameter           SQL VS1          SQL VS2          SQL VS3          SQL VS4

Instance Name       _______          _______          _______          _______
IP Address(es)      _______          _______          _______          _______
IP Port(s)          _______          _______          _______          _______
Network Name(s)     _______          _______          _______          _______
Allocated drives    _______          _______          _______          _______

Implementing SQL Server 2000 Failover Clustering


This section describes the implementation considerations when configuring your failover cluster. For complete installation instructions, see "Installing Failover Clustering" in SQL Server Books Online. To help ensure a successful installation, restart the server before installing so that locked resources are released and any pending file renames are completed.

Prerequisites

Prior to installing SQL Server 2000, verify that only the services necessary for the operating system are running. All other services should be stopped, because they may interfere with the installation process. These services include SNMP, the World Wide Web Publishing service, and vendor-specific programs. The easiest way to stop and later restart multiple services is to create two batch files: one that contains multiple net stop commands, and one that contains the corresponding net start commands.
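A minimal sketch of such a pair of batch files follows. The two services shown (SNMP and the World Wide Web Publishing service, by their short service names) are illustrative; stop only services you have confirmed are safe to stop on your servers:

    REM stopsvc.cmd - stop services that can interfere with SQL Server Setup.
    net stop snmp
    net stop w3svc

    REM startsvc.cmd - restart the same services after Setup completes.
    net start snmp
    net start w3svc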

Below is a list of services that should be left running:

Windows NT 4.0 Server, Enterprise Edition

  • Alerter
  • Cluster Service
  • Computer Browser
  • Event Log
  • License Logging Service
  • Messenger
  • Net Logon
  • Windows NT LM Security Support Provider
  • Plug and Play
  • Remote Procedure Call (RPC) Locator
  • Remote Procedure Call (RPC) Service
  • Server
  • Spooler
  • TCP/IP NetBIOS Helper
  • Time Service
  • Workstation

Windows 2000 Advanced Server and Windows 2000 Datacenter Server

  • Alerter
  • Cluster Service
  • Computer Browser
  • Distributed File System
  • Distributed Link Tracking Client
  • Distributed Link Tracking Server
  • DNS Client
  • Event Log
  • IPSEC Policy Agent
  • License Logging Service
  • Logical Disk Manager
  • Messenger
  • Net Logon
  • Windows NT LM Security Support Provider
  • Plug and Play
  • Process Control
  • Remote Procedure Call (RPC) Locator
  • Remote Procedure Call (RPC) Service
  • Remote Registry Service
  • Removable Storage
  • Security Accounts Manager
  • Server
  • Spooler
  • TCP/IP NetBIOS Helper
  • Windows Management Instrumentation Driver Extensions
  • Windows Time Service
  • Workstation

Installation Order

This section provides the installation order for your specific operating system and SQL Server 2000.

Windows NT 4.0 Server, Enterprise Edition
  1. Install Windows NT 4.0 Server, Enterprise Edition (do not install Internet Information Server). 

  2. Create domain users. 

  3. Install Windows NT 4.0 Service Pack 3. 

  4. Install Microsoft Internet Explorer 5. 

  5. Disable NetBIOS on internal private networks. 

  6. Install Microsoft Cluster Server (MSCS) on both nodes. 

  7. Manually create the MS DTC cluster resources (see "Creating the MS DTC Resources (Windows NT 4.0, Enterprise Edition Only)" in this chapter). 

  8. Install the Windows NT 4.0 Option Pack if you want, but do not install MSMQ. 

  9. Install Windows NT 4.0 Service Pack 5 or later. 

  10. Stop unnecessary services. 

  11. Install SQL Server 2000. 

Windows 2000 Advanced/Datacenter Servers
  1. Install Windows 2000 Advanced Server or Datacenter Server. 

  2. Install the Microsoft Internet Explorer 5 update (if necessary). 

  3. Create domain users. 

  4. Disable NetBIOS on internal private networks.

  5. Install MSCS on all nodes. 

  6. Create the MS DTC cluster resource (for installation instructions, see "Failover Clustering Dependencies" in SQL Server Books Online). 

  7. Stop unnecessary services. 

  8. Install SQL Server 2000. 

Creating the MS DTC Resources (Windows NT 4.0, Enterprise Edition Only)

This section provides the instructions for configuring the MS DTC resources on servers running Windows NT 4.0, Enterprise Edition, which requires a more complex setup procedure than Windows 2000.

IP Address

To configure the MS DTC IP address for the cluster 

  1. In Cluster Administrator, select the disk group that contains the Quorum disk resource. Right-click the disk group, and then rename it.

  2. Select the disk group you want, and on the File menu, click New, and then click Resource. In the New Resource dialog box, in the Name box, enter MSDTC IP Address; in the Resource Type box, select IP Address; and in the Group box, select the group you want. Click Next to continue. 

  3. Both nodes of the cluster should appear as possible owners. If not, add the node(s), and then click Next. 

  4. In the Dependencies dialog box, select the disk resource in the group you selected from the Available Resources box, and then click Add. The disk resource appears in the Resource Dependencies box. Click Next to continue. 

  5. In the TCP/IP Address Parameters dialog box enter the TCP/IP information. In the Address box, enter the static IP address (for example, 10.1.14.131); in the Subnet mask box, enter the IP subnet (for example, 255.255.255.0); and in the Network to use box, select the cluster network you want. Click Finish to continue. 

  6. A message appears confirming that the IP address is successfully configured. 

  7. In the Cluster Administrator window, the newly created resource appears in the right pane. To start the resource (which is currently offline), right-click the resource, and click Bring Online

Network Name

To configure the MS DTC network name for the cluster running on Windows NT 4.0, Enterprise Edition 

  1. In Cluster Administrator, on the File menu, point to New, and then click Resource.

  2. In the New Resource dialog box, in the Name box, enter MSDTC Network Name; in the Resource type box, select Network Name; and in the Group box, select the group you want. Click Next to continue. 

  3. In the Possible Owner dialog box, both nodes of the cluster should appear as possible owners. If not, add the node(s), and click Next

  4. In the Dependencies dialog box, the MSDTC IP address resource you configured previously appears in the Available resources box. Select the resource, and then click Add. The resource appears in the Resource dependencies box. Click Next to continue. 

  5. In the Network Name Parameters dialog box, enter MSDTC, and then click Finish

  6. A message appears confirming that the network name was successfully configured. 

  7. In the Cluster Administrator window, the newly created resource appears in the right pane. To start the resource (which is currently offline), right-click the resource, and then click Bring Online

Best Practices

In addition to understanding the fundamentals of failover clustering, you may find it useful to keep the following tips and best practices in mind.

Using More IP Addresses

When configuring network cards for use within a cluster, consider the options available based on how many types of network traffic you must support, given the number of network cards available. A network card (or IP address) can be configured for one of the following uses:

  • All communications (both external client connections and internal node-to-node traffic).

  • Internal cluster communications only (a private network between the cluster nodes).

  • Client access only (a public network to allow client connectivity).

Although a single network card can handle all cluster network communications, doing so creates a single point of failure. The optimum configuration is a separate network card for each type of connectivity.

Ideally, three IP addresses and three network cards should be associated with any instance of SQL Server:

  • Heartbeat 

    This can be configured as "internal cluster communications only" or "all communications". If configured as "all communications", it serves a dual purpose but decreases the bandwidth available on the internal side. Choose "internal cluster communications only" for higher availability. 

  • Client Connectivity 

    This should be configured as "client access only". It, too, could be configured as "all communications", but intra-cluster communications would then compete with the client connectivity pipe. 

  • Separate Private Network 

    This is different from the heartbeat, and should be configured as "client access only", restricted so that only the servers in the cluster access this particular IP address. It enables files to be transferred or, optimally, log shipping to be configured in a way that does not affect the heartbeat or client network traffic. 

No two of these three IP addresses can be on the same subnet. Connectivity problems may be encountered if the same subnet is used for more than one IP address, even if one of them is not currently being used in the cluster. For example, the following server configuration will not work, because Network Card 1 and Network Card 2 share the 172.22.10.x subnet:

Network Card 1: 172.22.10.1

Network Card 2 (configured as Public Network): 172.22.10.2

Network Card 3 (configured as Private Network): 172.23.7.3

Network Card 4 (configured as Heartbeat): 172.24.2.5

Also, some network cards support binding multiple IP addresses. Although this would allow a failover cluster to talk over more than one network, it still creates a single point of failure, which must be avoided in a high availability solution.

Configuring Node Failover Preferences

When you use more than two nodes in a cluster, it is important to consider which node should own the SQL Server processes in the event of a failover. With up to four nodes available, there should be an order that makes logical sense for the production environment. Set the failover preferences on the group containing all the resources for the instance of SQL Server (not only on the virtual server resource) to ensure that all resources fail over together to the same node.

To configure the nodes available for failover and the preferred order 

  1. Start Cluster Administrator. Right-click the group containing the SQL Server 2000 virtual server, and then click Properties

  2. On the General tab, the Preferred owners list box displays all cluster nodes that can potentially own the processes in that group, and the current order in which they would fail over. To change these, click Modify

  3. In the Modify Preferred Owners dialog box, make any changes to which nodes can own the processes. All available nodes (if not already selected) appear in the left pane, and all nodes currently configured as potential owners appear in the right pane in their order of failover preference. For example, suppose there are four nodes in a cluster: DENNIS, TOMMY, ANN, and CHUCK. All four nodes can be potential owners, and the failover order if DENNIS goes down is set to ANN, then TOMMY, and finally CHUCK if both ANN and TOMMY are unavailable. 

Memory Configuration

This section presents the considerations for memory usage in a failover cluster.

Single-Instance Failover Cluster

In a single-instance SQL Server 2000 failover cluster, the failover scenario is simple: if the primary node fails, all processes go to the designated secondary node (see "Configuring Node Failover Preferences" in this chapter). The secondary node should always be configured exactly the same as the primary node in terms of hardware. If not, problems may occur when the failover node does not have the same capacity as the primary node, especially in terms of memory, as Example Two shows. Keep in mind any other processes that may be running on the node, as well as overhead for the operating system.

Example One: Two Nodes, Exact Configuration 

There are two nodes in the cluster, A and B. Both are configured exactly the same with 4 GB of memory, and SQL Server is configured to use 3 GB. Because it is a single-instance failover cluster, at least one node is waiting to have services failed over to it.

[Illustration: Nodes A and B, 4 GB of memory each; SQL Server active on Node A] 

MSCS detects failure on Node A, and fails over to Node B.

[Illustration: all services failed over to Node B] 

All services fail over successfully, and there is no memory contention on Node B.

Example Two: Nonexact Configuration 

There are two nodes in the failover cluster, A and B. Node A is configured with 4 GB of memory, of which SQL Server uses 3 GB. Node B is configured with 2 GB of memory.

[Illustration: Node A (4 GB) active, Node B (2 GB) standby] 

Node A has some sort of failure detected by Windows Clustering, and fails over to Node B.

[Illustration: failover from Node A to the smaller Node B] 

Node B does not have the physical memory to support the instance of SQL Server from Node A. During the failover, the node immediately starts paging to disk because SQL Server is looking for more memory than is physically available. The server is now short of resources, potentially causing the node to become unresponsive.

[Illustration: Node B paging to disk after the failover] 

Multiple-Instance Failover Cluster

In a multiple-instance SQL Server 2000 failover cluster, the scenario becomes more complex. With up to 16 instances potentially active at one time on one node, how does one effectively manage memory? First and foremost, ensure that all servers have the same amount of memory, and that it is enough to handle every instance that could potentially fail over to that node. Just as important, cap the memory usage of each instance of SQL Server with max server memory, as described previously for AWE. Especially if AWE memory is enabled, max server memory must be set in a multiple-instance cluster to prevent starving a node, as the following examples show. Also consider any other processes that may be running on the server, as well as overhead for the operating system.

Example One: Two Instances of SQL Server, Noncapped Memory 

Node A and Node B have the same configuration, with 4 GB of physical memory each, and each hosts a SQL Server virtual server using 3 GB of memory.

[Illustration: two nodes, each running an uncapped 3-GB instance] 

There is a failure on Node B, requiring the failover process to move the SQL Server resources to Node A.

[Illustration: both instances running on Node A after the failover] 

Because there is not enough memory on Node A, Node A cannot properly handle both instances of SQL Server.

[Illustration: memory contention on Node A] 

Example Two: Two Instances of SQL Server, Capped Memory 

In this example, there are two nodes, A and B. Both are configured with 8 GB of memory with AWE enabled, and each runs an instance of SQL Server with max server memory capped at 3 GB.

[Illustration: two 8-GB nodes, each instance capped at 3 GB] 

There is a failure on Node B, requiring the failover process to move the SQL Server resources to Node A.

[Illustration: both capped instances running on Node A after the failover] 

Because the system is configured properly, the two 3-GB instances fit comfortably within the 8 GB of physical memory on the node, leaving 2 GB for the operating system and any other running processes.

[Illustration: both instances within 8 GB, with memory to spare] 

Using More Than Two Nodes

When using more than two nodes on a SQL Server 2000 failover cluster, consider the following:

  • How much memory should be configured for each instance? 

  • What nodes are the failover cluster nodes for the particular instance? What is the preferred order? 

  • Is there enough disk space and memory to support every instance configured to fail over to a particular node? 

Because SQL Server 2000 can use four nodes when supported by the operating system (the number of virtual servers is limited only by the choice of operating system and the capacity of your hardware) and up to 16 instances, these considerations become more important as mission-critical systems grow.

Scenario One: Four-Node Multiple-Instance SQL Server 2000 Failover Cluster, Three Active Nodes, One Standby (N+1)

With four-node support, Windows 2000 Datacenter Server provides more flexibility in cluster configuration. The recommended way of using a four-node Windows 2000 Datacenter Server cluster in a SQL Server environment is to have three of the nodes each own an instance of SQL Server, with the fourth as a warm standby, not unlike a log shipping scenario or a single-instance failover cluster where at least one node is waiting for work. This scenario is known as N+1. Because simultaneous failures are unlikely, the standby node should be configured as the primary failover node for each instance, which reduces the risk of too many instances starving the resources of one node. AWE memory should be enabled in this scenario so that each instance of SQL Server does not have to be capped at or under 1 GB of memory. This also enables a highly scalable solution. The following illustration shows an example of an N+1 cluster.

[Illustration: an N+1 cluster with three active nodes and one standby]

Scenario Two: Four-Node Multiple-Instance SQL Server 2000 Failover Cluster, All Four Nodes Active

Running four instances of SQL Server on four active nodes requires planning, so that in the event of a failover one instance does not starve another of memory and processor resources. Memory is less of a problem than processor resources. For example, if the workload on the production OLTP system regularly pegs eight processors at 50-percent utilization, and all four active SQL Servers demonstrate similar behavior, memory can compensate for processors only so much; more processors would have to be added (if possible).

Miscellaneous Configuration Issues
  • If encryption is used in a clustered SQL Server environment, all nodes must have the certificate applied. 

  • It is not recommended to have both Microsoft Exchange 2000 and SQL Server 2000 on the same cluster. 

For additional considerations, see SQL Server Books Online.

Failover/Failback Strategies

An overall cluster failover/failback policy should be implemented. Failovers can be controlled in terms of a threshold, meaning that after a certain point a resource will no longer be failed over. In the event of a failover, the cluster can be configured to fail back to the primary node when and if it becomes available again. By default, this option is turned off, because there is usually no problem with continuing on the secondary node, and staying there gives you a chance to analyze and repair the problem on the failed node.

To configure automatic failback 

  1. Start Cluster Administrator. Right-click the group containing the SQL Server 2000 virtual server, and click Properties

  2. In the Properties dialog box, click the Failback tab.

    To prevent an automatic failback, select Prevent Failback. To allow automatic failback, select Allow Failback, and then one of these options:

    • Immediately 

      This means that the moment MSCS detects that the preferred cluster node is online, it fails back any resources. This is not advisable, because it could disrupt clients and applications, especially at peak times in the business day. 

    • Failback between n and m hours 

      This option allows a controlled failback to a preferred node (if it is online) during a certain time period. The hours are set using numbers from 0 through 23. 

To configure failover thresholds 

  1. Start Cluster Administrator. Right-click the group containing the SQL Server 2000 virtual server, and click Properties

  2. In the Properties dialog box, click the Failover tab.

  3. To configure the failover policy, in the Threshold box, enter the number of times the group is allowed to failover within a set span of hours. In the Period box, enter the set span of hours.

    For example, if Threshold is set to 10 and Period is set to 6, the Cluster service will fail the group over at most 10 times in a six-hour period. At the eleventh failure in that six-hour period, MSCS will leave the group offline. This affects only the resources that failed; therefore, if the SQL Server resource failed 11 times, it would be left offline, but the IP address resource could remain online.

Maintaining a SQL Server 2000 Failover Cluster


Maintaining a failover cluster can be challenging. For example, how do you create an environment that is seamless and works no matter which node owns the SQL Server processes? There are some unique considerations to take into account in a clustered environment.

Backing Up and Restoring

Because a SQL Server 2000 failover cluster appears to clients as a single server instance, the normal best practices for backing up and restoring the databases on that server apply, including:

  • Frequent backups. 

  • Rotation of backups, offline and on tape (or other media), to offsite storage. 

  • Testing restore of backups on another server to ensure that data can successfully and reliably be recovered in the event of an emergency. 

Backing Up to Disk

If a network drive that is not part of the cluster is selected as the backup target, all nodes need to have this drive mapped; alternatively, if the target is defined with the \\servername\sharename convention, every node will have access to the drive. Another consideration: if the shared disk array is used as a backup medium, it can be a single point of failure in the event of a drive failure or disk corruption unless allowances are made for another backup medium. Besides mirroring the disks with RAID, a third-party disk mirroring solution (possibly even a distance cluster) would solve the single-point-of-failure problem.

Another approach is to use two steps in the backup task. Set up two backup targets (either two network drives, or one local and one network drive). Set up your maintenance plan, and then alter the backup task: if the first backup succeeds, the task exits with a success code; if it fails for any reason, the second step is invoked, backing up to the alternate target. The second step should also send an e-mail notification that the primary backup failed.
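A sketch of the two backup steps as Transact-SQL, assuming a local drive X: and a hypothetical \\BACKUPSRV\SQLBackup share; in a SQL Server Agent job or maintenance plan, the second step would be configured to run only when the first step fails, and would also send the failure notification:

    -- Step 1: back up to the primary target (local or clustered disk).
    BACKUP DATABASE Sales
    TO DISK = 'X:\Backups\Sales.bak'
    WITH INIT;

    -- Step 2 (invoked only if step 1 fails): back up to the alternate target.
    BACKUP DATABASE Sales
    TO DISK = '\\BACKUPSRV\SQLBackup\Sales.bak'
    WITH INIT;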

To make this even more secure, part of the backup task should include copying the new backup file to a secondary storage area. Ideally, you want a local backup to restore from, a secondary backup on another server, and a tertiary backup on tape, in addition to any standby or clustering solution. Because clustering is a high availability solution that focuses on the hardware portion of the design, pay close attention to your disaster recovery plan. There is little point in spending money on the cluster if you do not adequately back up your data using the fastest and most secure methods available.

Backing Up to Tape

Using tape as a backup medium presents a larger challenge. The device needs to be shared between all nodes of the cluster, or an identical device configured in exactly the same way must exist on each node. A better solution is to back up the databases to a share or a remote mirror, and then back up the files to tape from that location, which simplifies the process.

Snapshot Backups

One way to back up and restore a clustered SQL Server is to use a snapshot backup. SQL Server 2000 supports snapshot backups, which involve mirroring your disks, then breaking a complete set of disks off the mirror and using it as a backup. This requires specialized hardware, but is fully supported within SQL Server, and can contribute to higher availability.

Backing Up an Entire Clustered System

One challenge in any environment is backing up the entire system (all files, operating system, databases, and so on). The biggest obstacle is that Windows locks files while they are in use. Third-party backup solutions exist that allow hot backups. In a clustered configuration the problem is compounded, because the backup solution must be cluster aware and work on each node. The earlier considerations for backing up to disk and tape still apply: ensure that there are no single points of failure and that all nodes have access to the same devices in the same way.

Ensuring a Virtual Server Will Not Fail Due to Other Service Failures

To prevent the failure of specific services from causing the SQL Server group to fail over, configure those services using Cluster Administrator in Windows NT 4.0 or Windows 2000. For example, to prevent a failure of the Full-Text Search service from causing a failover of SQL Server, clear the Affect the Group check box on the Advanced tab of the Full Text Properties dialog box. If SQL Server itself causes a failover, however, the Full-Text Search service will restart along with it.

Adding, Changing, or Updating a TCP/IP Address

In earlier versions of SQL Server failover clustering, changing the TCP/IP address required SQL Server to be unclustered. Now, you can change the TCP/IP address by rerunning the setup program. Also, with the multiple network card/IP address support built into SQL Server 2000, additional TCP/IP addresses can be configured for the instance. For example, if both internal and external customers access the instance, you can assign SQL Server two separate IP addresses to maximize network utilization and to more easily track the use of your instance of SQL Server.

To add, change, or update a TCP/IP address 

  1. Insert the SQL Server 2000 Enterprise Edition compact disc in your CD-ROM drive.

  2. Click Install SQL Server 2000 Components, click Install Database Server, and then click Next. 

  3. On the Computer Name dialog box, select Virtual Server, and enter the name of an existing clustered instance of SQL Server 2000. 

  4. On the Installation Selection dialog box, select Advanced options, and then click Next.

  5. On the Advanced Options dialog box, select Maintain a Virtual Server for Failover Clustering, and then click Next.

  6. On the Failover Clustering dialog box, a TCP/IP address can be added to or removed from the selected instance of SQL Server 2000.

    To remove a TCP/IP address, select the address, and click Remove.

    Note An instance of SQL Server 2000 in a failover cluster requires a TCP/IP address to function. Only remove a TCP/IP address if more than one exists and if this will not affect users or applications accessing SQL Server. 

    To add a TCP/IP address, enter the new TCP/IP address in the IP Address box, select the network to use, and then click Add. The new IP address appears after the existing IP address. 

  7. Click Next.

  8. On the Remote Information dialog box, enter the user name and password for the domain administrator account used for the clustered instance of SQL Server 2000, and then click Next.

  9. When the process is complete, click Finish.

Adding or Removing a Cluster Node from the Virtual Server Definition

Another new feature of SQL Server 2000 failover clustering is the ability to add or remove a cluster node from a SQL Server virtual server definition. Adding nodes to an existing virtual server definition performs all the necessary operations on the new nodes (including installing the binaries and system components and creating the services) and makes the necessary modifications to the cluster configuration.

To add or remove a node 

  1. Insert the SQL Server 2000 Enterprise Edition compact disc in your CD-ROM drive.

  2. Click Install SQL Server 2000 Components, click Install Database Server, and then click Next.

  3. On the Computer Name dialog box, select Virtual Server, and enter the name of an existing clustered instance of SQL Server 2000.

  4. On the Installation Selection dialog box, select Advanced options, and then click Next.

  5. On the Advanced Options dialog box, select Maintain a Virtual Server for Failover Clustering, and then click Next.

  6. On the Failover Clustering dialog box, TCP/IP addresses can be added to or removed from the selected instance of SQL Server 2000, as described in the previous procedure. No address changes are required when you are only adding or removing a node.

  7. Click Next.

  8. Select the appropriate nodes to add or remove from the cluster, and then click Next when you are finished. 

Troubleshooting SQL Server 2000 Failover Clusters

This section presents some of the common issues encountered when implementing failover clustering, and how to solve them. For more tips and tricks, see "Failover Cluster Troubleshooting" in SQL Server Books Online.

Q When the installation process attempts to install the SQL Server binaries on the other node(s), it fails (possibly with a failed logon request error). Why? 

A If you are installing from a network share, make sure that all nodes have connectivity to the share without specifying a network password (for example, you should be able to view \\sharecomputer\sharepath without supplying credentials). If you are installing from a CD-ROM drive on Node A, make sure the cluster nodes are configured to communicate properly, including making sure the proper accounts exist on each node and that the other nodes are set up for Windows Authentication. Mapping a drive letter, even the same letter on all nodes, will not work, because the installation process accesses the UNC path of the share.
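
A quick way to verify this is to run the following from a command prompt on every node; the share path here is a placeholder. The command should list the files without prompting for credentials:

rem Run on each node of the cluster.
dir \\sharecomputer\sharepath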

Q After installing and rebooting the server, the SQL Server install doesn't seem to complete. Why? 

A Sometimes file renames that are scheduled to occur at startup are blocked by file locks (MDAC files, for example); if such a file remains read-only, Setup never completes.

Q How do I troubleshoot my cluster in the event of a failure? 

A The best place to look from a SQL Server perspective is the directory containing the file specified by the %ClusterLog% variable (generally %windir%\cluster). These are the files:

  • Sqlstpn.log – log for the SQL Server setup, where n is the number of the setup attempt 

  • Sqlclstr.log – log for the clustered SQL Server(s) 

  • Cluster.log – the main cluster log file 

  • Sqlstp.log – the log from the user interface part of Setup 

Follow clustering guidelines for enabling cluster logging. Here are some sample settings:

set ClusterLog=D:\WINNT\cluster\cluster.log
set ClusterLogLevel=3
rem ClusterLogSize is specified in MB.
set ClusterLogSize=10
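
Because these are system environment variables, the Cluster service generally must be restarted before new values take effect (assuming the default service name, ClusSvc):

net stop clussvc
net start clussvc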

Q MSCS lost connectivity to my SQL Server virtual server. Why? 

A This may be because the BUILTIN\Administrators login was removed. This login is used to run the IsAlive check against SQL Server.
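
To confirm whether the login is still present, you can query the syslogins system table from a working connection; a minimal sketch:

-- Returns one row if the BUILTIN\Administrators login still exists.
SELECT name
FROM master.dbo.syslogins
WHERE name = 'BUILTIN\Administrators'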

Q Problems ensue when changing the network name of SQL Server after the install. How can this be fixed? 

A SQL Server binds itself to the network name used at installation time. If that name is changed afterward, a complete reinstall may be required in some cases.

Q After an installation of a new SQL Server virtual server, clients cannot connect to the server, especially using graphical user interface (GUI) tools. Why? 

A Sometimes a successful manual connection using a command-line tool (for example, the bcp or osql utility) must be initiated first.
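
For example, you might first attempt a trusted connection from a command prompt with the osql utility, where MYVSERVER is a placeholder for your virtual server name:

osql -S MYVSERVER -E -Q "SELECT @@SERVERNAME"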

Q Microsoft Exchange 2000 and SQL Server 2000 are both installed on my cluster; however, the full-text service in SQL Server seems to be failing. Why? 

A If both must exist on the same cluster, install Exchange 2000 first, and then install SQL Server 2000.

Q I am encountering problems with full-text setup in my failover clustering. What might be wrong? 

A In some cases, Setup may fail because the full-text resource type does not exist. The resource type is called Microsoft Search Service. If this is the case, you may have to create the resource type manually so that the installation can complete the next time it is run. To create it, execute the following from a command prompt:

cd %windir%\cluster
regsvr32 gathercl.dll

Register this file on all nodes. You can then rerun Setup, and the installation should complete.
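
To verify that the resource type is now visible to the cluster, you can list the installed resource types with cluster.exe; "Microsoft Search Service" should appear in the output (a sketch, assuming the Windows 2000 cluster.exe behavior):

rem List the resource types registered with the cluster.
cluster restype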

Finding More Information

For more information about SQL Server 2000 failover clustering, see the following resources:
