Windows 2000 Advanced Server: Clustering Microsoft ITG Infrastructure Services

On This Page

Executive Summary
Cluster Service Overview
ITG Clustered Infrastructure Services
ITG Cluster Management
Lessons Learned

Executive Summary

Microsoft Cluster Server (MSCS), first introduced in the Windows NT Server 4.0 Enterprise Edition operating system, is now called Cluster service in Windows 2000. Cluster service in Windows 2000 Advanced Server and Windows 2000 Datacenter Server provides high availability by allowing a server in a cluster to take over and run a service or application that was running on another server that has failed, a process referred to as failover. These services or applications are provided by means of "virtual servers". A virtual server appears as a single system to users. The cluster can provide any number of virtual servers, limited only by the capacity of the servers in the cluster and the storage available to deliver the required performance. Administrators control the cluster servers as a single unit, and can administer the cluster remotely.

As one of Microsoft's earliest adopters, Microsoft's Information Technology Group (ITG) clustered several critical infrastructure services on the Windows 2000 Advanced Server operating system before its release to manufacturing. As part of every major product release, ITG deploys new Microsoft software in production, a process known as "eating our own dogfood."

Cluster service provides many benefits to ITG, including:

  • Rolling upgrade support: A rolling upgrade entails taking a server (called a node) in a cluster offline and upgrading it, moving all the cluster resources to that node, and then doing the same with the other nodes. This provides the minimum planned downtime possible, and is especially important in ITG's environment, where frequent upgrades are required throughout the software development process.

  • Improved use of hardware resources: For example, a file-share server cluster can support thousands of users accessing terabytes of data located in thousands of file folders, while still maintaining fast response times.

  • Greater availability: Consolidating data on a server cluster extends failover protection to all of that data. Both planned and unplanned downtime are minimized, with many ITG failovers completing within a minute or two.

  • Ease of user access: Integrating shares from many separate file servers into one clustered file share directory structure results in easier and faster access to shares by users. Users access a virtual server, and are unaware of which physical server in the cluster is providing the data.

This paper describes the hardware and software specifications ITG used to cluster several critical production infrastructure services on Windows 2000 Advanced Server in its Windows 2000 native-mode production domain. These core infrastructure services form the basis of Microsoft's new worldwide Windows 2000-based network architecture.

The first part of the paper describes common components of the Cluster service for all infrastructure services ITG has implemented as of this writing. The second part of this paper provides details for the clustered ITG infrastructure services, including the Dynamic Host Configuration Protocol (DHCP) service, the Windows Internet Name Service (WINS), and the Distributed File System (DFS). Representative hardware specifications for the different servers are given, along with monitoring and maintenance tips. Note that, for security reasons, the names and IP addresses provided are for illustration only and do not necessarily reflect actual names in use.

The Microsoft enterprise network comprises both an inward-facing network (known as Corpnet) and a customer-facing network (Internet properties). This paper focuses primarily on the Corpnet servers located on the corporate main campus in Redmond, Washington. In some cases, server specifications vary slightly in other parts of Microsoft's global enterprise. This document is based on initial design efforts begun by ITG before the release of Windows 2000 Advanced Server. It is important to note that server designs and specifications have undergone frequent changes and will continue to be refined.

This paper is intended for IT architecture, engineering, and operations staff and consultants at the enterprise level. It is not intended to serve as a procedural guide. Each enterprise environment has unique circumstances; therefore, each organization should adapt the plans, specifications, and "lessons learned" described in this paper to meet its specific needs.

Cluster Service Overview

Cluster service minimizes downtime and support costs by providing high availability, using more than one server to provide services to clients. In the event that one server fails, another server in the cluster provides the services of the server that failed. The resources that provide services to clients appear to clients as virtual servers. These virtual servers are not tied to any particular physical server, and clients are unaware of which physical server is actually providing services; they are not even aware that they are communicating with a cluster rather than a stand-alone server. Clustering also allows ITG to perform rolling hardware and software upgrades on clustered nodes with minimal planned downtime (typically less than one or two minutes).

A cluster in this document is defined as a group of two independent computer systems, known as nodes, running the Windows 2000 Advanced Server Cluster service, all attached to common external storage, working together as a single system to ensure that mission-critical services and resources remain available to clients.

Cluster service is based on a shared-nothing model, in which each server owns and manages its local devices. Devices common to the cluster, such as a common disk drive and connection media, are selectively owned and managed by a single server at any given time.

The shared-nothing model makes it easier to manage disk devices, standard applications and services. Using the shared-nothing model enables Cluster service to support Windows 2000- and Windows NT-based applications and disk resources, as well as "cluster aware" applications and services, such as SQL Server 2000 and Microsoft Exchange 2000.

Cluster service clusters must not be confused with other cluster technologies in Windows 2000, such as Network Load Balancing (NLB). For example, ITG uses NLB to distribute incoming Web requests among a cluster of Internet server applications (such as those based on Microsoft Internet Information Services, the Web server built into Windows 2000 Server), as well as for Proxy and streaming media services. ITG uses Cluster service servers to provide infrastructure services, such as DHCP, WINS, and DFS, as well as back-end messaging and database applications.

Cluster service supports standard Windows 2000- and Windows NT Server-based drivers for local server storage devices and media connections. However, the external storage devices that are common to the cluster require small computer system interface (SCSI) devices. Cluster service supports standard PCI-based SCSI connections, including SCSI bus with multiple initiators and SCSI over Fibre Channel (FC). Fibre Channel connections are SCSI devices hosted on a Fibre Channel bus instead of a SCSI bus.

Virtual Servers

Applications and services running on nodes in the cluster are exposed to users and workstations as virtual servers. A virtual server is a group of one or more resources that provide an application or service. Each group has its own network name and IP address and is failed over as a unit. Because each virtual server has its own network name and IP address, the process of connecting to a virtual server appears to users and clients like connecting to a single, physical server. In fact, the connection is made to a virtual server, which may be hosted by any node in the cluster.

A cluster node can host multiple virtual servers representing multiple applications or services, as illustrated in Figure 1.

Figure 1: Virtual servers under Cluster service

Clients connect to the IP address published by Cluster service for the virtual server. In the event of an application or server failure, Cluster service moves the entire resource group providing the service or application to another node in the cluster. When the client detects a failure in its session, it attempts to reconnect in exactly the same manner as the original connection. Because Cluster service simply reassigns the IP address of the virtual server to a surviving node in the cluster, the client can reestablish the connection to the application or service without knowing that it is now hosted by a different node in the cluster. The client view of four virtual servers is illustrated in Figure 2.
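
Because a virtual server is addressed only by its published network name and IP address, simple client-side retry logic is enough to ride out a failover. The following minimal sketch (Python, with a hypothetical virtual server name and port; not ITG code) reconnects in exactly the same manner as the original connection, as described above.

    import socket
    import time

    VIRTUAL_SERVER = "vs-files01"   # hypothetical virtual server network name
    PORT = 445                      # hypothetical service port
    RETRY_DELAY_SECONDS = 5

    def connect_to_virtual_server():
        """Connect to the virtual server; after a failure, retry exactly as on first connect."""
        while True:
            try:
                # The client neither knows nor cares which physical node answers.
                return socket.create_connection((VIRTUAL_SERVER, PORT), timeout=10)
            except OSError:
                # During a failover the name and IP address move to the surviving node;
                # waiting briefly and retrying is sufficient to reestablish the session.
                time.sleep(RETRY_DELAY_SECONDS)

    if __name__ == "__main__":
        connection = connect_to_virtual_server()
        print("Connected to", VIRTUAL_SERVER)
        connection.close()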

Figure 2: Client view of Cluster service virtual servers.

Note: Microsoft Transaction Server (MTS) and Microsoft Message Queuing (MSMQ) are part of Windows 2000 Server.

Hardware Components

A cluster is composed of a number of hardware components, including nodes, a clustered disk, and network interfaces.

Nodes. A Windows 2000 Advanced Server cluster consists of two identical servers, called nodes. A node is a server using Windows 2000 Advanced Server as its operating system. Nodes must either be domain controllers or member servers authenticated by domain controllers. Nodes have their own resources, such as a hard disk, and a dedicated Network Interface Card (NIC) for private cluster network communication. Nodes in a cluster also share access to cluster resources, such as an external disk storage system, called a clustered disk.

Clustered disk. A clustered disk is an external disk system that connects to all nodes in a cluster. Each node in the cluster also contains its own separate system disk. ITG uses external disk arrays, called Storage Area Networks (SANs), as the clustered disk.

Network Interface Cards (NICs). ITG uses dual network adapters that provide two services in clusters: client communication and private cluster communication. Each NIC can be configured for one of three settings:

  • Client communication with the cluster

  • Cluster node communication within the cluster

  • Both functions.

One example of a cluster configuration with two network interfaces is shown in Figure 3. Both NICs are connected to the corporate network; one provides client connectivity, while the other is used only for internal cluster communication.

Figure 3: Example cluster configuration with two NICs

SANs

A Storage Area Network (SAN) is a high-speed network that establishes a direct connection between storage elements and host servers. A SAN can be local or remote, and its storage shared or dedicated. SANs offer ITG external disk access over greater distances by using serial Fibre Channel (FC) signaling at 1 Gb/sec.

ITG analysis revealed that SANs would improve performance for many applications that move large amounts of data between servers over the network: network resources are freed for other transactions, and bulk data transfers are performed on the SAN at a much higher rate by sharing common storage. For example, before implementing SANs, ITG maintained a large sales database that performed five 70-GB transfers over the network per weekend and incurred 24 hours of planned downtime. With the SAN architecture, the same operation takes only 2-3 hours and the network is not used.

As another example, ITG has also designed and implemented multiple large servers running SQL Server, using FC-attached external RAID disk storage. The bulk of this storage is private to each server, but the FC controllers connect to a common FC hub or switch. Multiple SQL Server-based servers use a shared clustered disk and bypass the copy phase of the dump-copy-load SQL maintenance procedure. ITG analysis showed that dump-copy-load delays were usually due to network speeds between adjacent servers; using a shared dump volume eliminated the most time-consuming "copy" part of the process.

Key elements common to different kinds of hardware-specific SANs include:

  • Externalized storage: storage that is not installed for private, single-server access.

  • Centralized storage: storage that can be centrally located, managed, and controlled.

  • Remote clustering: storage that enables both single-server and multi-server access.

The hardware components that make up a SAN are similar to those of a network with storage elements: host servers require FC interfaces, and the SAN includes storage components such as tape drives, disk drives, RAID controllers, hubs, and switches.

Point-to-point topologies, as well as hub-based FC Arbitrated Loop (FC-AL) and switch-based FC Fabric, are available. The current FC standard is based on a data transfer rate of 1 gigabit/sec, which can be carried over either copper or optical media across great distances. The node address limit of a single FC-AL is 127 nodes; the limit for FC switched fabrics is 16 million. Hubs and switches can be combined in a SAN.

The only software required for a server to participate in a SAN is an FC Peripheral Component Interconnect (PCI) Host Bus Adaptor (HBA) interface driver.

Quorum Disks

The most important disk in the SAN is the quorum disk, which is a single disk in the system designated as the quorum resource: a disk that provides persistent physical storage across system failures. The cluster configuration is kept on this disk, and all nodes in the cluster must be able to communicate with the node that owns it. It is possible (but not recommended) for one disk to store cluster application or service data and also to be the quorum disk.

When a cluster is created or when network communication between nodes in a cluster temporarily fails, the quorum resource is used to prevent the nodes from forming multiple clusters. To form a cluster, a node must arbitrate for and gain ownership of the quorum resource. For example, if a node cannot detect a cluster during the discovery process, the node will attempt to form its own cluster by taking control of the quorum resource. However, if the node does not succeed in taking control of the quorum resource, it cannot form a cluster.
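
The form-or-join rule described above can be expressed as a small decision routine. The sketch below is purely conceptual (Python; the discovery and arbitration callables are hypothetical stand-ins for what Cluster service does against the quorum disk): a node joins a cluster it can detect, and otherwise may form one only if it wins ownership of the quorum resource.

    def start_node(node_name, discover_cluster, arbitrate_for_quorum):
        """Conceptual form-or-join decision mirroring the quorum arbitration rule.

        discover_cluster():     returns the name of an existing cluster, or None.
        arbitrate_for_quorum(): returns True if this node gains ownership of the
                                quorum resource, otherwise False.
        Both callables are hypothetical placeholders for Cluster service internals.
        """
        existing = discover_cluster()
        if existing is not None:
            return f"{node_name}: joined existing cluster {existing}"

        # No cluster detected: try to form one by taking control of the quorum resource.
        if arbitrate_for_quorum():
            return f"{node_name}: formed a new cluster (owns the quorum resource)"

        # Losing arbitration prevents two partitions from each forming a cluster.
        return f"{node_name}: cannot form a cluster; waiting to rejoin"

    # Example: a node that detects no cluster but wins arbitration forms the cluster.
    print(start_node("NODE-A", lambda: None, lambda: True))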

The quorum resource stores the most current version of the cluster configuration database in the form of recovery logs and registry checkpoint files that contain node-independent storage of cluster configuration and state data. When a node joins or forms a cluster, Cluster service updates the node's private copy of the configuration database. When a node joins an existing cluster, the Cluster service can retrieve the data from the other active nodes. Cluster service uses the quorum resource's recovery logs to:

  • Guarantee that only one set of active, communicating nodes is allowed to operate as a cluster.

  • Enable a node to form a cluster only if it can gain control of the quorum resource.

  • Allow a node to join or remain in an existing cluster only if it can communicate with the node that controls the quorum resource.

A simple cluster arrangement showing ownership of the quorum disk is shown in Figure 4.

Figure 4: Simple cluster configuration

In this figure, two nodes are connected to one SAN, but there are multiple volumes (in this case, two) on the SAN.

Node A is providing a virtual server. This means it is providing the network name, IP address, service or application, and has exclusive access to the application or service data associated with that virtual server. Node B has no activity related to providing this virtual server to clients.

Node B has ownership of the quorum disk, illustrating that the node providing the virtual server does not have to be the node that also has control of the quorum resource. Node A could have ownership of the quorum disk in this cluster. As a best practice, ITG does not store data for a virtual server on the disk in the SAN serving as the quorum disk.

Failover

Cluster service uses the private network to detect node failures and status changes, and to manage the cluster as a single entity. "Cluster heartbeat" messages are sent at a preset interval (by default, every 5 seconds). If a node fails to respond a specified number of times (by default, 3), the surviving node initiates the predetermined failover process.
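
A failure detector based on this heartbeat scheme can be sketched in a few lines. The following simplified simulation (Python; not the Cluster service implementation) uses the defaults quoted above: a heartbeat every 5 seconds and failover after 3 consecutive missed responses.

    import time

    HEARTBEAT_INTERVAL_SECONDS = 5   # default interval quoted above
    MISSED_HEARTBEAT_LIMIT = 3       # default miss count quoted above

    def monitor_partner(send_heartbeat, initiate_failover):
        """Simplified heartbeat loop. send_heartbeat() returns True if the partner
        node responded; initiate_failover() is a hypothetical callback that would
        move the partner's resource groups to this node."""
        missed = 0
        while True:
            if send_heartbeat():
                missed = 0
            else:
                missed += 1
                if missed >= MISSED_HEARTBEAT_LIMIT:
                    initiate_failover()
                    missed = 0
            time.sleep(HEARTBEAT_INTERVAL_SECONDS)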

Hardware Requirements

Each Cluster service solution (including nodes, controllers and storage) must meet the hardware requirements for Windows 2000 Advanced Server. Hardware must be on the Cluster service Hardware Compatibility List (HCL). For example, each of the HCL-approved nodes requires:

  • Two PCI network adapters: one for connection to the public network and the other for the node-to-node private cluster network.

  • A separate PCI storage host adapter (SCSI or Fibre Channel) for the shared disks, in addition to the boot disk adapter for each individual node.

  • An HCL-approved SAN that connects to all nodes, used as the clustered disk. All shared disks, including the quorum disk, must be physically attached to a shared bus and accessible from all nodes. SCSI devices must be assigned unique SCSI identification numbers and properly terminated. The shared disks must be configured as Windows 2000 basic (not dynamic) disks, and all partitions on the disks formatted as NTFS. Although not required, the use of fault-tolerant RAID configurations is strongly recommended for all disks.

  • Storage cables to attach the shared storage device to all computers.

  • Identical hardware, slot for slot, card for card, for all nodes. This makes configuration easier and eliminates potential compatibility problems.

Software Components

Various software components make up a cluster, including the operating system, the Cluster service, cluster resources, and the clustered services or applications themselves. Note that some software components correspond directly to hardware components.

Operating System. Each node in the cluster must be running Windows 2000 Advanced Server. Windows 2000 Server does not support clustering. Windows 2000 Advanced Server supports two-node clusters.

Cluster Service. Each node in the cluster must have the Cluster service installed. This service must run under a domain user account so that both nodes can authenticate. The same account must be used on all nodes, and it must be an administrator on all nodes.

Resources. Resources represent individual items that serve functions within the cluster. These include disk resources, the quorum resource, IP addresses, network names, and clustered services.

  • Disk: A disk resource is one of the disks configured in the clustered disk system attached to all nodes; in this context, the term does not refer to the physical hardware itself. A disk is a resource within a cluster that is capable of storing service or application data and/or serving as the quorum resource. Note that multiple disk volumes may be configured on a single physical disk system. In a SAN, a disk holds data used either by services and server applications running on the cluster, or by applications managing the cluster. At any one time, only one node of the cluster owns, or can gain access to, any one disk on the cluster SAN. Ownership of a disk moves from one node to another when the disk group fails over or moves to another node.

    • Quorum disk: The quorum disk (also referred to as "quorum resource") stores the cluster configuration database, which contains information necessary to recreate the cluster in its current configuration. The database exists in physical storage so that the cluster can begin operation in the event of node failure. To form a cluster, a node must have access to the quorum device. To join a cluster, a node must be able to communicate with the node that has access to the quorum device. To ensure cluster configuration data is available to other nodes, in the event that the node owning the quorum device fails, the quorum resource must be stored on a physical storage device to which all nodes have a connection. As a best practice, ITG dedicates a single disk in the storage array solely as the quorum resource.
  • IP Resource: The IP address of the cluster, or the IP address clients use to communicate with a virtual server. It must be a static IP address on the same subnet as the IP addresses of the nodes, but different from the addresses configured on the nodes of the cluster. There must be one IP resource configured for each virtual server the cluster provides.

  • Name Resource: The cluster network name, or the network name clients use to communicate with a virtual server. This must be different from the names configured on the nodes of the cluster. There must be one unique name resource for each virtual server the cluster provides.

  • Clustered Services: The services the cluster provides to clients. Examples include file shares, print spooler, DFS server, DHCP server, and WINS server. A clustered service must first exist as a service within each of the individual servers. For example, to create a DHCP cluster, the DHCP service must be installed on each individual node. After that is done, the clustered DHCP server resource is created in the cluster, and the clustered service is managed using the cluster administration tool. To stop the service or application, the entire cluster resource is taken offline.

Not all services or applications can use Cluster service. Services and applications that can be clustered share the following characteristics:

  • Clients communicate with the services using IP addresses.

  • Data for a service can be moved to a location specified by the cluster administrator.

  • Clients can recover from losing connectivity to the server.

In addition, Cluster service supports resources labeled as "Generic Application" and "Generic Service" if the application or service meets these three criteria.

Cluster Service in Windows 2000 Advanced Server natively supports the following core infrastructure services:

  • DHCP

  • Distributed Transaction Coordinator (DTC)

  • File Sharing

  • Message Queuing (MSMQ), part of Windows 2000 Server

  • Network News Transfer Protocol (NNTP) server

  • Print Spooler

  • Simple Mail Transfer Protocol (SMTP) server

  • Time Server

  • WINS

Other services, such as database and messaging services, are also supported, but the resources for these are provided with the SQL Server 2000 and Exchange 2000 Server products.

Each collection of resources forming a failover unit is known as a resource group; all resources in the group move from one node to another together, and are restarted on the new node in the correct order. Groups are commonly used to create virtual servers. When using a group to create a virtual server, the group contains one IP address, one name, the resource or resources for the service or application being provided, and at least one disk resource.
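
The "correct order" follows resource dependencies: in a typical file-share group, for example, the network name depends on the IP address and the share depends on both the name and the disk. The sketch below (Python; the group contents and dependencies are illustrative assumptions, not an ITG configuration) shows dependency-ordered restart.

    def online_order(dependencies):
        """Return an order in which resources can be brought online so that every
        resource comes up after the resources it depends on (a simple topological sort)."""
        ordered, done = [], set()

        def bring_online(resource):
            if resource in done:
                return
            for dependency in dependencies.get(resource, []):
                bring_online(dependency)
            done.add(resource)
            ordered.append(resource)

        for resource in dependencies:
            bring_online(resource)
        return ordered

    # Hypothetical file-share virtual server group.
    group = {
        "Physical Disk": [],
        "IP Address": [],
        "Network Name": ["IP Address"],
        "File Share": ["Network Name", "Physical Disk"],
    }
    print(online_order(group))
    # ['Physical Disk', 'IP Address', 'Network Name', 'File Share']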

Network Requirements

In order to cluster an infrastructure network service, at least three network-related items are required (an illustrative example follows the list):

  • Unique NetBIOS names for the cluster and virtual server(s).

  • Six (or more) unique IP addresses: two for the network adapters on the private network, two for the network adapters on the public network, one for the cluster itself, and one for each virtual server.

  • A domain account for Cluster service (all nodes must be members of the same domain).
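
As a concrete illustration of the list above, the following sketch enumerates the names and addresses for a two-node cluster hosting a single virtual server. All names and addresses are invented for illustration, consistent with the earlier note that published values do not reflect actual names in use.

    # Illustrative naming and addressing plan for a two-node cluster with one virtual server.
    cluster_network_plan = {
        "netbios_names": {
            "cluster": "CLUSTER01",                       # the cluster itself
            "virtual_servers": ["VS-DHCP01"],             # one per clustered service
        },
        "ip_addresses": {
            "private_network_adapters": ["10.10.10.1", "10.10.10.2"],    # node-to-node only
            "public_network_adapters": ["172.16.1.11", "172.16.1.12"],   # client-facing NICs
            "cluster": "172.16.1.20",
            "virtual_servers": ["172.16.1.21"],           # same subnet as the node addresses
        },
        "service_account": "CORPDOMAIN\\clustersvc",      # domain account for Cluster service
    }

    # Six unique IP addresses in total for this minimal configuration.
    addresses = cluster_network_plan["ip_addresses"]
    total = (len(addresses["private_network_adapters"])
             + len(addresses["public_network_adapters"])
             + 1 + len(addresses["virtual_servers"]))
    assert total == 6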

Setup Procedure

After each node is set up, and the shared external disk storage is correctly cabled, cluster configuration procedures vary somewhat, depending on whether the cluster will be used for network-specific or server-specific information. However, the common steps for cluster configuration, regardless of intended use, include:

  • Partitioning cluster disks, as needed.

  • Creating the directory structure as appropriate.

  • Installing the Cluster service on the first node.

  • Starting Cluster Administrator.

  • Installing Cluster service on additional nodes, and joining those nodes to the cluster.

  • Setting the Cluster quorum-log size as appropriate.

  • Creating the Physical Disk, IP Address, Network Name, and File Share resources.

  • Assigning file-system permissions.

  • Starting the service (such as DHCP or WINS) on the virtual server.

ITG Clustered Infrastructure Services

As of this writing, ITG has a number of Windows 2000 Advanced Server-based clusters providing various services and applications. This section of the paper examines the details of the hardware and software elements ITG configures to provide clustered infrastructure services, such as DHCP, WINS, and DFS. Except where noted, all clustered infrastructure services discussed share the elements examined in the first section of this paper. Clustered applications (such as SQL Server 2000 and Exchange 2000 Server) are beyond the scope of this paper, because the clustering implementation in those products is specific to each product.

Three DHCP clusters supply IP addresses to Redmond area clients, providing ITG with proactive redundancy in an environment that previously had only reactive recovery procedures. The other option for DHCP was a "split scope" design, which greatly increases complexity and administrative overhead.

One WINS cluster serves as the replication hub of the corporate WINS matrix, reducing the number of replication hubs from two to one for each replication partner and simplifying the replication matrix.

A file server cluster serves as the product install point for encryption products that cannot be exported outside of the United States. A DFS root cluster serves as the point of entry for corporate campus area "products" installs. Clustering provides redundancy to these environments, which previously had none.

DHCP

Windows 2000 natively supports DHCP server clusters. In the event that a node providing a DHCP virtual server suffers a failure, the other node takes ownership of resources, including the IP address, network name, external disk system, and DHCP Server service, so that there is minimal interruption of service to clients. This failover process often completes in less than a minute.

ITG's DHCP implementation spans the globe, with over 150 servers located throughout Corpnet. Virtually all IP address allocation at Microsoft is done by means of DHCP. For example, in the Windows NT 4.0 environment, five DHCP servers managed roughly 80,000 address leases for corporate campus offices, development labs, remote access clients, and servers in the corporate data center, as shown in Figure 5. This is twice as many leases as in the rest of Microsoft's network combined. However, load balancing of this task was not efficient, and testing showed severe performance discrepancies between the servers. In the event of one DHCP server outage, roughly 35,000 clients would be unable to obtain or renew leases. While not always resulting in an instant outage to all clients, such an outage would still have a significant impact on users attempting to renew or obtain leases and could severely affect Microsoft's business operations.

Figure 5: DHCP topology before clustering

To relieve these problems and provide proactive redundancy, ITG formed three DHCP server clusters, using six servers and three FC external disk systems. Each DHCP cluster, composed of two servers and one external disk system, was sized for future growth, and to balance the active leases between them, so that each one would have approximately 25,000 active leases, as illustrated in Figure 6. Because of the large number of scopes and addresses used, any alternative solution that involved creating scopes after a failure would not meet ITG's response time goals.

Figure 6: DHCP Topology after clustering

In addition to anticipated future use, these servers were sized aggressively because DHCP servers in Windows 2000 perform more tasks than in Windows NT 4.0, such as:

  • DDNS registration, updates, and removal of both forward and reverse lookup records.

  • Allocating IP multicast addresses through Multicast Address Dynamic Client Allocation Protocol (MADCAP).

  • Client configurations based on class IDs, which allow specific configuration based on client type. For example, Windows 2000 DHCP clients can be configured to release their DHCP leases on shutdown.

In addition, DHCP in Windows 2000 supports Vendor and User classes, which dramatically reduces lease management overhead; for example, RRAS client lease times can be limited by class.

High-level hardware specifications for each node in ITG's DHCP clusters are shown in Table 1.

Table 1 High-level specifications for clustered DHCP server

Minimum CPU         Minimum RAM   EDS/cluster   RAID configuration
Dual PIII 500 MHz   256 MB        4 x 4 GB      2 x RAID 1 (4 GB)

The SAN disks for each cluster were configured as two RAID 1 drives: one housed the DHCP database, and the other housed the quorum disk. Because fault tolerance is provided by the RAID configuration, each drive also holds two subdirectories, one for backups and the other for audit log files.

After the DHCP clusters were created, ITG activated conflict detection and migrated scopes from the existing DHCP servers to the new DHCP servers eight at a time, using the following steps (a conceptual sketch of this loop follows the list):

  1. Create eight new scopes on a DHCP cluster.

  2. Activate the scopes on the DHCP cluster.

  3. Update the UDP port 67 broadcast forwarding of the router serving that subnet.

  4. Deactivate the scopes on the existing DHCP server.

  5. Wait until all clients of these scopes have had their leases removed in the "old" DHCP server.

  6. Begin the process again with the next set of eight scopes.

  7. Once the migration of all scopes has been completed, deactivate conflict detection.
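
A conceptual sketch of this migration loop follows (Python). Each step is represented by a placeholder that simply prints the action; in practice ITG performed these steps with the DHCP administrative tools, and none of the placeholders is a real API.

    def admin_step(description):
        # Placeholder for an action performed with the DHCP administrative tools.
        print(description)

    def migrate_scopes(all_scopes, batch_size=8):
        """Conceptual outline of ITG's scope-migration loop, eight scopes at a time."""
        admin_step("Enable conflict detection on the DHCP cluster")
        for start in range(0, len(all_scopes), batch_size):
            batch = all_scopes[start:start + batch_size]
            for scope in batch:
                admin_step(f"1. Create scope {scope} on the DHCP cluster")
                admin_step(f"2. Activate scope {scope} on the DHCP cluster")
                admin_step(f"3. Update UDP port 67 forwarding on the router serving {scope}")
                admin_step(f"4. Deactivate scope {scope} on the existing DHCP server")
            admin_step("5. Wait until all leases for this batch are removed from the old server")
            # 6. The loop then repeats with the next batch of eight scopes.
        admin_step("7. Deactivate conflict detection once all scopes are migrated")

    migrate_scopes([f"scope-{n}" for n in range(1, 25)])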

WINS

WINS solves the problems inherent in resolving NetBIOS names through IP broadcasts, and it frees ITG administrators from the demands of updating static mapping files such as LMHOSTS files. Dynamic DNS in Windows 2000 is the standard Microsoft corporate network name resolution method, and the majority of ITG-managed desktops run Windows 2000 Professional. However, legacy clients (Windows 3.x, Windows 9x, and Windows NT 4.0) and NetBIOS-dependent applications required for development and testing still exist. Consequently, ITG must maintain the WINS infrastructure so that NetBIOS-enabled clients can register themselves with a designated WINS server.

The ITG Windows NT 4.0-based WINS replication topology, one of the largest WINS infrastructures in the world, consisted of more than 40 WINS servers, which contained registrations for over 800,000 records. In order to maintain consistency and to provide accurate information to clients, WINS client records are replicated to all the WINS servers.

The WINS server architecture comprises two server types:

  • Name Server: A server that registers clients and resolves the majority of name resolution queries from clients. Each WINS client has two name servers (called a query pair) configured as primary and secondary, either statically or through DHCP. The primary is the server that registers the client and is the "owner" of its records. If a query fails on the primary server, or if the primary server itself fails, the secondary server is used.

  • Replication Hub/Backup: Two replication hubs were configured to perform regional or global replication between name servers. Each WINS server has a database holding records for all clients of the servers in the replication matrix. After initial replication of the database, client updates continue to be replicated throughout the WINS infrastructure. Replication times varied from 15 minutes between servers on the same subnet to up to 8 hours to servers across the WAN.

To simplify the replication matrix, provide redundancy, and more efficiently manage the WINS traffic load, two new servers and one SAN were installed to form a cluster serving as the WINS replication hub.

Figures 7 and 8 depict the replication matrix before and after the WINS cluster implementation. Note that although new regional WINS servers have been added in the "After" matrix, the complexity of the replication matrix is still greatly reduced.

Figure 7: WINS replication topology before clustering

Figure 8: WINS replication topology after clustering

High-level hardware specifications for each of these WINS replication hub nodes are detailed in Table 2.

Table 2 High-level specifications for clustered WINS replication hub

Minimum CPU         Minimum RAM   EDS/cluster             RAID configuration
Dual PIII 450 MHz   256 MB        7 x 9.1 GB, 2 x 4 GB    RAID 5 (database), RAID 1 (quorum)

After creating the cluster, ITG set several WINS parameters on each node, as indicated in Table 3, and then started the WINS hub replication service on the node.

Table 3 WINS parameter settings

Value                                        Data
Backup Path                                  D:\wins-db\backup
Backup dB on shutdown                        Yes
Renew Interval                               4 Days
Extinction Interval                          4 Days
Extinction Timeout                           6 Days
Verification Interval                        24 Days
Enable periodic dB consistency check         No
Log Database Changes                         Yes
Log Detailed Events                          No
Enable Burst Handling                        Medium
Database Path                                D:\wins-db\wins.mdb
Start Version Count                          0
Replicate Only with Partners                 Yes
Migrate                                      No
Trigger Pull replication on initial start    Yes
Persistent Connections Push/Pull             Enabled

DFS - the "Products" DFS Root

DFS provides redundancy and load balancing for file shares. ITG maintained a very active DFS root with Windows NT 4.0 that provided over 100 different links to a number of different file servers and shares. The customers of this DFS root were employees working in the main campus area who were installing Microsoft products from the shares.

As a first step in the plan for a global domain-based DFS architecture at Microsoft, the group owning product distribution implemented a Web page to serve as a front-end for product installations.

DFS roots can be hosted on a member server or domain controller, and can be implemented as stand-alone or domain-based. Unfortunately, with stand-alone DFS, the DFS root itself is a single point of failure. Domain-based DFS provides ITG with several advantages:

  • Windows 2000 automatically publishes the DFS topology in the Active Directory service, making it visible to users on all servers in the domain.

  • ITG administrators have the ability to replicate the DFS roots and shared folders to multiple servers in the domain, permitting users to access their files even if one of the physical servers on which the files reside becomes unavailable.

When more than one alternate path exists for a volume, DFS clients provide a degree of load balancing by randomizing the list of referrals returned by the DFS root server.

DFS servers store information about the DFS topology in a structure called the Partition Knowledge Table (PKT). The PKT data structure consists of the DFS directory name and the list of referral servers that DFS clients actually connect to. DFS clients "walk" the locally cached subset of the PKT from the top down when a directory in the DFS name space is used. They return to the DFS root or child replicas when a Time To Live (TTL) timer expires, the client is rebooted, or none of the servers in the client's PKT are available.
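
The client behavior described in the last two paragraphs (a randomized referral list, a cached PKT entry, and a return to the root when the TTL expires or no cached server responds) can be sketched as follows. This is a conceptual illustration only, in Python, assuming hypothetical query_root() and is_reachable() callables; it is not the DFS client implementation.

    import random
    import time

    class CachedReferral:
        """A cached PKT entry: the referral servers for one DFS link, plus a TTL."""
        def __init__(self, servers, ttl_seconds=300):
            self.servers = servers
            self.expires_at = time.time() + ttl_seconds

        def expired(self):
            return time.time() >= self.expires_at

    def choose_server(cache, link, query_root, is_reachable):
        """Pick a referral server for a DFS link, going back to the root when needed.

        query_root(link):     hypothetical call to the DFS root for a fresh referral list.
        is_reachable(server): hypothetical liveness check for a referral server.
        """
        entry = cache.get(link)
        if entry is None or entry.expired():
            # TTL expired (or nothing cached): ask the root again.
            servers = query_root(link)
            random.shuffle(servers)          # randomized order spreads client load
            entry = cache[link] = CachedReferral(servers)

        for server in entry.servers:
            if is_reachable(server):
                return server

        # None of the cached servers responded: refresh the referral list once more.
        servers = query_root(link)
        random.shuffle(servers)
        cache[link] = CachedReferral(servers)
        return cache[link].servers[0] if cache[link].servers else None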

Note that at the time of this writing, DFS supports Cluster service using machine-based DFS only. You cannot create fault tolerant DFS topologies on systems running Cluster service.

Representative high-level hardware specifications for this much-used DFS cluster are detailed in Table 4.

Table 4 DFS Cluster hardware specifications

Minimum CPU         Minimum RAM   EDS/cluster           RAID configuration
Dual P600 733 MHz   256 MB        4 x 9 GB, 2 x 4 GB    RAID 5 (data), RAID 1 (quorum)

ITG also upgraded an existing Windows NT Server 4.0 Enterprise Edition-based print cluster, which provided service to 155 printers in the main corporate campus area. This cluster was chosen to pilot rolling upgrades from Windows NT 4.0 Enterprise Edition clusters to Windows 2000 Advanced Server clusters without loss of service to heavily used services such as printing. No additional redundancy was gained. However, benefits from Windows 2000, such as publishing the printers in Active Directory, were realized with no loss of service.

Representative high-level hardware specifications for the clustered print servers are detailed in Table 5.

Table 5 Clustered print server specifications

Minimum CPU         Minimum RAM   EDS/cluster                                RAID configuration
Dual P600 733 MHz   256 MB        3 x 9 GB (spool data), 2 x 4 GB (quorum)   RAID 5 (spool), RAID 1 (quorum)

ITG Cluster Management

Clustered Service Administration

With a few exceptions, ITG manages services on a cluster with the same tools that are used to administer those services on a stand-alone server. Either the Microsoft Management Console (MMC) snap-in or command-line tools connect to the name or IP address of the virtual server that is providing the service, instead of connecting to an individual node in the cluster. For example, when configuring WINS replication, partners are configured to replicate with the IP address of the virtual server, not the individual IP addresses of the nodes, and administering a clustered DFS root is done by connecting to the name of the virtual server.

Monitoring

ITG manages clustered infrastructure services largely in the same manner as regular servers, and with mostly the same tools. An example of an exception is file share (not directory) permissions, which must be modified by means of the Cluster Administrator tool.

Monitoring cluster status involves monitoring the Cluster service on each individual node, monitoring the cluster resources themselves, and (as required) monitoring the service the cluster is providing. ITG uses various third-party scripting tools for the following tasks (a minimal node check is sketched after the list):

  • Monitoring individual nodes to ensure that nodes are available and participating in the cluster and that the Cluster service is running on each node.

  • Monitoring cluster resources, in case a cluster resource goes offline or fails.

  • Monitoring the services provided. Different scripts are used depending on the service.
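
As a simple illustration of the first of these tasks, the check below (a minimal sketch in Python, assuming the pywin32 package; the node names are hypothetical) queries the Cluster service, ClusSvc, on each node and reports any node where it is not running. ITG's own monitoring uses third-party scripting tools rather than this code.

    import win32service
    import win32serviceutil

    NODES = ["CLUSTERNODE1", "CLUSTERNODE2"]   # hypothetical node names
    CLUSTER_SERVICE = "ClusSvc"                # service name of the Cluster service

    def check_cluster_service(nodes):
        """Report any node where the Cluster service is not in the RUNNING state."""
        problems = []
        for node in nodes:
            try:
                status = win32serviceutil.QueryServiceStatus(CLUSTER_SERVICE, machine=node)
                if status[1] != win32service.SERVICE_RUNNING:
                    problems.append(f"{node}: ClusSvc present but not running")
            except Exception as exc:
                problems.append(f"{node}: could not query ClusSvc ({exc})")
        return problems

    if __name__ == "__main__":
        for line in check_cluster_service(NODES) or ["All nodes report ClusSvc running"]:
            print(line)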

Troubleshooting Tools

Event Viewer

One of the most effective tools for troubleshooting any Windows 2000-based problem is the Event Viewer. Event log replication is enabled on clusters by default, so each node registers events from all nodes in the cluster in its own event log. Examining the event log on any node therefore shows all node events in chronological order in one location. This feature can be turned off if desired.
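
A node's System log can also be scanned programmatically for the Cluster service events of interest (such as those summarized in Table 6 below). The sketch assumes the pywin32 package and a hypothetical node name, and simply prints recent ClusSvc events.

    import win32evtlog

    NODE = "CLUSTERNODE1"                          # hypothetical node name
    INTERESTING = {1009, 1062, 1122, 1123, 1135}   # ClusSvc events listed in Table 6

    def recent_clussvc_events(node, max_records=500):
        """Yield (time, event ID, insertion strings) for recent ClusSvc events."""
        handle = win32evtlog.OpenEventLog(node, "System")
        flags = win32evtlog.EVENTLOG_BACKWARDS_READ | win32evtlog.EVENTLOG_SEQUENTIAL_READ
        seen = 0
        while seen < max_records:
            records = win32evtlog.ReadEventLog(handle, flags, 0)
            if not records:
                break
            for record in records:
                seen += 1
                event_id = record.EventID & 0xFFFF   # strip severity and facility bits
                if record.SourceName == "ClusSvc" and event_id in INTERESTING:
                    yield record.TimeGenerated, event_id, record.StringInserts
        win32evtlog.CloseEventLog(handle)

    if __name__ == "__main__":
        for when, event_id, inserts in recent_clussvc_events(NODE):
            print(when, event_id, inserts)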

A summary of important events ITG routinely monitors on clusters is presented in Table 6.

Table 6 Important event log EventIDs

Event source: Disk. EventID 51 (Warning): "An error was detected on device <devicename> during a paging operation."
Meaning and actions: A common warning on server clusters, typically occurring when ownership of a disk is transferred from one node to another. Unless other events indicate a reason for alarm, this warning can be ignored.

Event source: ClusSvc. EventID 1009 (Error): "The Clustering service could not join an existing cluster and could not form a new cluster. The Clustering service has terminated."
Meaning and actions: This error includes a data word that can be used to troubleshoot the issue. Convert the data word from hexadecimal to decimal, then at a command prompt use "net helpmsg <data word in decimal>" to view the message details. For example, a data word of 000013de gives 5086 in decimal, and "net helpmsg 5086" displays the message "The quorum disk could not be located by the cluster service." (A scripted version of this lookup is sketched after the table.)

Event source: ClusSvc. EventID 1062 (Information): "Cluster service successfully joined the cluster."
Meaning and actions: The node issuing this message has joined an existing cluster. This message is not logged by the first node in the cluster to start.

Event source: ClusSvc. EventID 1122 (Information): "The node (re)established communication with cluster node '<node name>' on network '<interface name>'."
Meaning and actions: The node issuing this message has regained network connectivity to the specified node over the specified interface. As with ClusSvc warning 1123, this event does not indicate which node actually had the problem. Investigation may be warranted to determine the cause.

Event source: ClusSvc. EventID 1123 (Warning): "The node lost communication with cluster node '<node name>' on network '<interface name>'."
Meaning and actions: The node issuing this message has lost network connectivity to the specified node over the specified interface. The message does not indicate which node is actually experiencing the network problem. Examine the cluster nodes to determine the problem, and correct it.

Event source: ClusSvc. EventID 1135 (Warning): "Cluster node <node name> was removed from the active cluster membership. The Clustering service may have been stopped on the node, the node may have failed, or the node may have lost communication with the other active cluster nodes."
Meaning and actions: Some problem has caused the node to be removed from the cluster. Investigate the node to determine the cause.
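
Referring to the ClusSvc 1009 entry above, the hexadecimal-to-decimal conversion and the net helpmsg lookup can be combined in a few lines; a minimal sketch in Python (the data word shown is the example from the table):

    import subprocess

    data_word_hex = "000013de"          # example data word from a ClusSvc 1009 event
    code = int(data_word_hex, 16)       # 0x13de equals 5086 in decimal
    print(f"net helpmsg {code}:")
    # Expected output for this example: "The quorum disk could not be located
    # by the cluster service."
    subprocess.run(["net", "helpmsg", str(code)], check=False)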

Cluster Log

Cluster service allows detailed cluster activity information to be recorded at each node. Unlike the event logs, this information is not synchronized across nodes; each node's log presents its own unique perspective on the cluster. By default, a node's cluster log is enabled and a log (named cluster.log) is created in the %SystemRoot%\Cluster directory of each node in the cluster. Once enabled, the cluster log grows to a maximum size of 8 MB (configurable in 1-MB increments) and then clears entries in a first-in, first-out (FIFO) sequence. Diagnosing problems using the cluster log file is a complex subject, beyond the scope of this paper. Sources for information on cluster diagnosis can be found at the end of this paper in the section "For Further Information".

Windows 2000 DHCP Performance Counters

The Windows 2000 operating system includes new performance counters for monitoring existing DHCP server performance and capacity planning. ITG routinely monitors:

  • Nacks/sec: The rate of DHCP Nacks sent by the DHCP server.

  • Offers/sec: The rate of DHCP Offers sent out by the DHCP server.

  • Acks/sec: The rate of DHCP Acks sent by the DHCP server.

  • Active queue length: The number of packets in the processing queue of the DHCP server.

  • Conflict check queue length: The number of packets in the DHCP server queue waiting on conflict detection (ping).

  • Declines/sec: The rate of DHCP Declines received by the DHCP server.

  • Discovers/sec: The rate of DHCP Discovers received by the DHCP server.

  • Duplicates dropped/sec: The rate at which the DHCP server received duplicate packets.

  • Informs/sec: The rate of DHCP Informs received by the DHCP server.

  • Milliseconds per packet (avg.): The average time per packet taken by the DHCP server to send a response.

  • Packets expired/sec: The rate at which packets are expired in the DHCP server message queue.

  • Packets received/sec: The rate at which packets are received by the DHCP server.

  • Releases/sec: The rate of DHCP Releases received by the DHCP server.

  • Requests/sec: The rate of DHCP Requests received by the DHCP server.

DHCP System Log Events

The most important DHCP system events ITG monitors are:

  • Event ID 1011: The DHCP server issued a NACK to the client (MAC address) for the address (IP address) request. This indicates the DHCP server declined to issue the specified IP address to the client using the specified MAC address. The most common cause for this is the DHCP server not having a lease for the client, which is usually due to a router being configured to forward DHCP discoveries to a DHCP server that does not have a lease for that subnet.

  • Event ID 1014: The Jet database returned the following error: [number]. DHCP uses the Jet database engine for storing lease information. The number given in this error can be looked up in the list of Jet database error codes.

  • Event ID 20057: The DHCP/BINL server has determined that it is not authorized to service clients on this network for the Windows NT Domain: [domain name]. The DHCP server is not authorized to operate in its domain. Use the DHCP snap-in tool to authorize it.

WINS Performance Counters

WINS-specific objects and counters ITG uses to assess client activity are detailed in Table 7. A small sketch showing how several of these counters can be combined into a simple health check follows the table.

Table 7 WINS performance counters

Group Conflicts/sec: The rate at which group registrations received by the WINS server resulted in conflicts with records in the database.
Unique Conflicts/sec: The rate at which unique registrations and renewals received by the WINS server resulted in conflicts with records in the database.
Total Number of Conflicts/sec: The sum of the unique and group conflicts per second; the total rate at which conflicts were seen by the WINS server.
Group Registrations/sec: The rate at which group registrations are received by the WINS server.
Total Number of Registrations/sec: The sum of the unique and group registrations per second; the total rate at which registrations are received by the WINS server.
Unique Registrations/sec: The rate at which unique registrations are received by the WINS server.
Group Renewals/sec: The rate at which group renewals are received by the WINS server.
Unique Renewals/sec: The rate at which unique renewals are received by the WINS server.
Total Number of Renewals/sec: The sum of the unique and group renewals per second.
Queries/sec: The rate at which the WINS server receives queries.
Successful Queries/sec: The total number of successful queries per second.
Failed Queries/sec: The total number of failed queries per second.
Releases/sec: The rate at which the WINS server receives releases.
Successful Releases/sec: The total number of successful releases per second.
Failed Releases/sec: The total number of failed releases per second.
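
As noted before Table 7, several of these counters are most useful in combination. The sketch below (pure Python on sampled counter values; the sample numbers are invented) derives a query failure ratio and a conflict rate from counters named in the table.

    def wins_health(sample):
        """Derive simple health ratios from sampled per-second WINS counter values."""
        queries = sample["Queries/sec"]
        failed = sample["Failed Queries/sec"]
        conflicts = sample["Total Number of Conflicts/sec"]
        registrations = sample["Total Number of Registrations/sec"]

        return {
            "query_failure_ratio": failed / queries if queries else 0.0,
            "conflict_ratio": conflicts / registrations if registrations else 0.0,
        }

    # Invented sample values for illustration only.
    sample = {
        "Queries/sec": 250.0,
        "Failed Queries/sec": 5.0,
        "Total Number of Conflicts/sec": 2.0,
        "Total Number of Registrations/sec": 80.0,
    }
    print(wins_health(sample))   # {'query_failure_ratio': 0.02, 'conflict_ratio': 0.025}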

Backing Up Data

Only the node owning a disk will have access to that disk for performing backups. ITG routinely backs up the C$ drive of each node, which contains the system partition and operating system files. Specific disks on the SAN are backed up by specifying the virtual server name or cluster name, depending on which disk is to be backed up.

Changing Cluster Service Password

The Cluster service requires a user account to start the service. The same account must be used on all nodes and must be an administrator on each node. ITG's security model does not allow accounts to be set so that passwords never expire. As a result, a brief outage is planned for routine password changes to this account. The password is verified only when the Cluster service starts. However, the password must not only match the current password on the domain, it must also match the password used by the other nodes. ITG uses the following process to minimize the impact of required password changes for a two-node cluster with one resource group:

  • Determine which node the group is currently running on.

  • Change the password on the domain.

  • Stop the Cluster service on the node that does not own the group.

  • Change the password for the Cluster service on this node.

  • Change the password for the Cluster service on the node that owns the group.

  • Cycle the Cluster service on this node.

  • Start the Cluster service on the node that does not own the group.

For two-node clusters with two or more groups, ITG uses the following procedure (a sketch of the stop/cycle/start ordering appears after the list):

  • Move all groups to one node.

  • Change the password on the domain.

  • Stop the Cluster service on the node that does not own the groups.

  • Change the password for the Cluster service on this node.

  • Change the password for the Cluster service on the node that owns the groups.

  • Cycle the Cluster service on this node.

  • Start the Cluster service on the node that does not own the groups.

  • Move the groups back to their original locations.
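
The ordering in both procedures (stop the Cluster service on the node that does not own the groups, update passwords, cycle the owning node, then restart the idle node) can be scripted. The sketch below assumes the pywin32 package and hypothetical node names, and automates only the service stop/cycle/start sequence; the domain password change and the update of the Cluster service logon password on each node are represented by a callback, on the assumption that they are performed separately (for example, in the Services snap-in).

    import win32serviceutil

    CLUSTER_SERVICE = "ClusSvc"

    def cycle_cluster_service(owning_node, idle_node, update_service_password):
        """Stop/cycle/start ordering used when changing the Cluster service password.

        update_service_password(node) is a hypothetical callback representing the
        step of updating the service's logon password on that node."""
        # Stop the Cluster service on the node that does not own the group(s).
        win32serviceutil.StopService(CLUSTER_SERVICE, machine=idle_node)

        # Update the stored service password on both nodes (done out of band here).
        update_service_password(idle_node)
        update_service_password(owning_node)

        # Cycle the service on the owning node, then restart the idle node's service.
        win32serviceutil.RestartService(CLUSTER_SERVICE, machine=owning_node)
        win32serviceutil.StartService(CLUSTER_SERVICE, machine=idle_node)

    if __name__ == "__main__":
        cycle_cluster_service("CLUSTERNODE1", "CLUSTERNODE2",
                              lambda node: print(f"Update ClusSvc password on {node}"))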

Lessons Learned

ITG has found Cluster service useful for a variety of purposes, including consolidating servers and providing high availability. However, as in any high-availability environment, proper hardware configuration, monitoring, and maintenance are only part of the necessary framework. Proper training and operational processes, such as monitoring event logs, are important as well. Change control was critically important, as was becoming extremely familiar with each SAN hardware vendor's implementation.

In the process of implementing clustered infrastructure services ITG learned the following lessons:

  • Support for rolling upgrades is critical for maintaining service through frequent software and hardware upgrades. For example, ITG frequently upgraded the operating systems of Windows 2000-based servers during development and final testing prior to release. At a minimum of 30 minutes per upgrade, this represented a significant amount of downtime because of the very large number of servers. Clustering these servers reduced this downtime to less than one minute per upgrade. As new maintenance is required for both hardware and software, ITG expects this will continue to be one of the most significant benefits of Cluster service.

  • As a standard high-availability procedure, ITG ensures that an adequate supply of spare parts is on hand for redundancy, including spare parts for the Storage Area Networks.

  • The quorum disk in the SAN is a critical component. The disk designated as the quorum disk should not also store data for a virtual server.

  • Proper network configuration of both the private and public network connections is critical. Cluster network names, like other NetBIOS names, are subject to duplicate network name problems. Changing IP, subnet, or network adapter settings on a node can cause a cluster name offline error.

  • Power management features on cluster servers should be turned off. A cluster node that turns off disk drives or enters standby mode can trigger an unwanted failover, since it will appear to have failed to the other members of the cluster.

  • Educating support personnel is crucial. Stand-alone configurations for any given service are much more prevalent in ITG than clustered configurations. Many ITG support personnel who are familiar with supporting Windows NT 4.0 and its services were not familiar with supporting services in a clustered environment.

  • As with many ITG implementations, these clusters were among the first based on Windows 2000 Advanced Server to be implemented in a production environment. There was no precedent for predicting how the hardware platforms and SANs would perform under production load. Evaluating the environment gave much insight into how future environments should be scaled, as well as suggesting product improvements in Windows 2000 before it was released to market.

  • Monitoring availability typically involves looking at individual server availability. However, availability of an individual server in a cluster is not necessarily related to the availability of the virtual server the cluster provides. Standard availability monitoring also presents special challenges. Just ensuring the virtual server is present is not sufficient. If there are problems with a node, the virtual server may still be providing services, but redundancy may be lost. ITG is developing new methods of determining uptime.

Future Steps

Future development directions of Cluster service will focus on several areas of interest to ITG:

  • Certification and support for even larger multi-node cluster configurations.

  • Easier installation and verification of cluster configurations, including support for new types of hardware.

  • Simpler, more powerful management of cluster-based applications and services, including continued focus on scripted, remote, and "lights out" management.

  • Extension of cluster-based availability and scalability benefits to even more system services.

  • Tighter integration of the infrastructure and interfaces of all Windows-based clustering technologies to enhance performance, flexibility, and manageability.

  • Continued support for independent software vendors (ISVs) and corporate developers to simplify the development, installation, and support of cluster-aware applications, both for higher availability and scalability.

Conclusion

ITG's clustered infrastructure services projects highlighted a number of Cluster service advantages, including:

  • Reduced total cost of ownership through consolidating servers.

  • Increased availability of applications and services. Cluster service minimizes downtime through support for rolling upgrades and the ability to fail over resources during scheduled and unscheduled outages.

  • Improved performance and availability of high-end heavily used file and infrastructure servers.

ITG's experiences helped build standards for cluster topologies that will be used when implementing future clusters, and ITG is already moving forward with further standardization of cluster hardware and topologies.

ITG is sharing these "lessons learned" in hopes that, when applicable, customers can benefit from its experiences. As Microsoft continues to deploy Windows 2000 Advanced Server and Windows 2000 Datacenter Server, ITG will continue to share its experiences with customers.