Chapter 4 - Cluster Services

This chapter deals with the basic clustering services that Microsoft Application Center 2000 (Application Center) provides. These services encompass basic cluster creation and administration activities, such as creating a cluster and changing its topology by adding or removing servers. Detailed information is provided about each of the cluster service features as they are used, including set up and configuration tips. You'll also get an inside look at the sequence of events and processing activities that occur when you use a particular feature.

On This Page

Recommended Server Configuration
Default Accounts and Services
Deployment Infrastructure Example
Application Center Cluster Services
Connecting to a Cluster
Creating a Cluster
Cluster Administration
Background Services

Recommended Server Configuration

Before attempting to set up a cluster controller and create a cluster, you have to assess the processing capabilities of the server that you plan to use as the controller. This server has to have enough resources to comfortably run Microsoft Windows 2000 Server or Microsoft Windows 2000 Advanced Server, Application Center, Application Center Event and Performance Logging (optional), Internet Information Services version 5.0 (IIS), and the applications it's serving, whether they are Web-based or COM+ applications.

Note The Application Center Administrative client runs on any Windows 2000 operating system. Because it includes the Microsoft Management Console (MMC), you can run the client on a server outside a cluster and administer any server in a cluster, provided, of course, that you have the appropriate access permissions and can provide the necessary authentication information. In addition to supporting remote administration via a graphical client, Application Center provides a command-line tool that you can use for most cluster administration tasks. The command-line tool is covered in more detail in Chapter 11, "Working with the Command-Line Tool and Scripts," which includes information about using scripts and batch files.

You can use the following configuration for a server running Windows 2000 Server and IIS as a guideline for configuring a server for use on an Application Center cluster.

Memory

The official minimum memory for running Application Center on Windows 2000 Advanced Server is 256 MB of RAM on a 400 MHz system. However, as you probably know, there are several factors to consider when determining a server's memory requirements.

Windows 2000 Server and IIS require a minimum of 256 MB of RAM; however, 512 MB to 1 GB is recommended. The high end of the range should be considered if the site is hosting an e-commerce application, contains a large amount of content, uses dynamic pages extensively, uses COM+-based applications, or has a high volume of traffic. Remember that the IIS cache size defaults to half the available amount of physical memory.

Important You should monitor memory and cache settings on an ongoing basis. For more information about these settings, see Chapter 8, "Creating Clusters and Deploying Applications."

Fixed Disk

The disk partition must contain adequate space for all the installed programs, paging file space, and site content. You also have to factor in the space required for content replication. Because the replication engine copies all the content to a temporary directory on the destination server, and then moves these files to the appropriate folders during synchronization, the required disk space is approximately double the volume of the content to be replicated.

Note The Synchronization Service uses a two-phase commit process to ensure data integrity, which is why the replication engine uses a temporary directory on a target.

Finally, don't forget to allow enough free space to support disk defragmentation. A minimum of 15 percent of the disk should be free in order to support effective defragmentation. (A higher percentage of free disk space will improve disk defragmentation.)
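As a rough planning aid, you can combine these sizing rules (double the content volume for replication, plus a 15 percent free-space reserve) into a quick estimate. The following Python sketch is only illustrative; the program, paging file, and content figures are hypothetical placeholders that you should replace with your own measurements.

    # Rough disk-space estimate for an Application Center cluster member.
    # All figures are hypothetical placeholders; substitute your own values.
    programs_gb = 2.0      # operating system, IIS, Application Center, other software
    paging_gb = 1.0        # paging file
    content_gb = 4.0       # site content to be hosted and replicated

    # Replication stages content in a temporary directory on the target,
    # so allow roughly double the content volume during synchronization.
    replication_gb = content_gb * 2

    used_gb = programs_gb + paging_gb + replication_gb

    # Keep at least 15 percent of the partition free for effective defragmentation.
    partition_gb = used_gb / 0.85

    print(f"Estimated minimum partition size: {partition_gb:.1f} GB")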

Network Adapter

Each system in a load-balanced cluster should have at least two network adapters (and must have two if Network Load Balancing [NLB] is used). The front-end adapter (also called the load-balanced adapter) is used for front-end traffic such as NLB heartbeats, convergence, and load balancing. The back-end adapter (also called the management-traffic adapter) is used for back-end traffic generated by different cluster activities, notably content replication and synchronization.

Some form of name resolution should be enabled for the back-end network adapter, and NetBIOS should be bound to that adapter in order for the Application Center name resolution service to work (this is recommended, but not mandatory; see the following note). Typical options for providing name resolution are:

  • Custom host files (an example follows this list). 

  • Firewalls that support independent name resolution on each side of the firewall (this "split DNS" is supported by products such as Gauntlet and Sidewinder). 

  • Independent DNS servers. 
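If you use custom host files, the hosts file on each server can map the other members' back-end names to their back-end IP addresses. The entries below are purely hypothetical; substitute your own server names and back-end addresses.

    # %SystemRoot%\system32\drivers\etc\hosts  (hypothetical back-end entries)
    10.0.0.1    ACCONTROLLER
    10.0.0.2    ACMEMBER01
    10.0.0.3    ACMEMBER02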

Note There are two reasons for requiring two network adapters. First, a second adapter enables NLB to bypass a loopback condition that occurs when using unicast mode. Second, the replication engine that Application Center provides uses COM calls extensively, and replication will fail if a connection gets dropped or reset. IP address changes or deletions can cause connection drops or resets, which is why using the front-end network adapter for the high-volume data transfers that typify content replication is risky. This issue isn't exclusive to replication; it also affects other Application Center features, such as cluster services and monitoring.

IP Addresses

Application Center supports several IP address binding scenarios for the front- and back-end adapters. In most cases a single DHCP-assigned address is bound to the back-end adapter. The front-end adapter's IP address story isn't quite as straightforward.

Note After installing Application Center you should verify that NetBIOS over TCP/IP is enabled for each IP address. You can access this setting by opening the Internet Protocol (TCP/IP) Properties dialog box for the adapter. Next, click Advanced to open the Advanced TCP/IP Settings dialog box, and then click the WINS tab.
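If you prefer to script this check, the setting is reflected in the registry under the NetBT interface keys. The following sketch assumes the standard NetbiosOptions value (0 = use the DHCP-supplied setting, 1 = NetBIOS enabled, 2 = NetBIOS disabled); treat it as an illustration for verification, not as a supported Application Center tool.

    # Report the NetBIOS-over-TCP/IP setting for each network interface.
    # Assumes the standard NetBT registry layout on Windows.
    import winreg

    KEY = r"SYSTEM\CurrentControlSet\Services\NetBT\Parameters\Interfaces"

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY) as interfaces:
        index = 0
        while True:
            try:
                name = winreg.EnumKey(interfaces, index)
            except OSError:
                break
            try:
                with winreg.OpenKey(interfaces, name) as iface:
                    value, _ = winreg.QueryValueEx(iface, "NetbiosOptions")
            except FileNotFoundError:
                value = None
            print(f"{name}: NetbiosOptions = {value}")
            index += 1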

For the Controller

If your cluster uses NLB, the controller requires a minimum of one static IP address bound to the front-end adapter. This single IP address serves as the cluster IP address for the adapter and there is no dedicated IP address, which is restrictive from a network management perspective. For example, you can't ping the front-end adapter on a specific server by using the IP address because the address is common to all the cluster members in a load-balanced cluster. The various server and cluster configurations that we use in this book all use two IP addresses for this very reason. (For more information about load balancing, adapter configuration, and traffic implications, see Chapter 5, "Load Balancing.")

For a Member

In an NLB cluster, the front-end adapter requires a minimum of one IP address, which can either be static or assigned by DHCP. In the latter case, Application Center will automatically change the address setting from Obtain an IP address automatically to Use the following IP address. When the member is added to the cluster, Application Center binds the cluster IP address to the adapter. Once again, network management considerations should determine whether you want to use one or two IP addresses on the front-end adapter.

Figure 4.1 illustrates a server configuration for an Application Center cluster member that's using NLB. In this particular example, there are two static IP addresses bound to the front-end network adapter. The first is a dedicated static IP address that enables you to communicate directly with the front-end adapter. The second address is the cluster IP address that carries the load-balanced cluster traffic.


Figure 4.1 Cluster member configuration 

Subnets

Notice that the front-end and back-end adapters in Figure 4.1 are on separate subnets. There are two reasons for this. First, using separate subnets provides a more secure implementation by isolating the back-end (internal) traffic from the front-end, or external, traffic. (Another technique for isolating network traffic is to use network segments.) Second, using separate subnets improves traffic distribution over the network adapters when you're using NLB. This has to do with the way Application Center configures adapter interface metrics and the way NLB routes traffic. If you create a cluster that uses NLB, Application Center sets the interface metric for the load-balanced adapter to be one higher than the interface metric for the other adapters on the same computer.

For example, let's assume that you have a server configured with two adapters, both of which are on the same subnet. If the interface metric for the back-end adapter is 1, Application Center sets the interface metric for the load-balanced adapter (the front-end) at 2.

When a response is sent to the client, NLB routes it to the adapter on the same subnet that has the lowest interface metric. In this scenario, outgoing traffic is routed to the back-end adapter, which is where Application Center transmits all cluster management and synchronization traffic. Depending on the amount of traffic on the back end, this could have a negative impact on services, such as the Synchronization Service.
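The following sketch models this routing rule (the same-subnet adapter with the lowest metric carries the outbound traffic), using the metric values from the example above. It is only a model for illustration, with hypothetical subnet values, not how Application Center or NLB is actually implemented.

    # Model of outbound routing when both adapters share a subnet.
    # Application Center sets the load-balanced (front-end) adapter's metric to
    # one higher than the back-end adapter's, so the back-end adapter wins.
    adapters = {
        "back-end (management)": {"subnet": "192.168.1.0/24", "metric": 1},
        "front-end (load-balanced)": {"subnet": "192.168.1.0/24", "metric": 2},
    }

    def outbound_adapter(adapters, destination_subnet):
        # Responses leave through the same-subnet adapter with the lowest metric.
        candidates = [(cfg["metric"], name) for name, cfg in adapters.items()
                      if cfg["subnet"] == destination_subnet]
        return min(candidates)[1]

    print(outbound_adapter(adapters, "192.168.1.0/24"))  # back-end (management)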

Separate subnets aren't mandatory; however, they are supported if you decide to use them for your clusters.

Although Application Center will run on any server that meets the minimum requirements, homogeneous hardware is recommended. This is the best way to ensure balanced and consistent performance across a cluster, as well as to make it easier for you to tune your servers for optimum performance. If you're using NLB or one of the compatible third-party load balancers, you can adjust load-balancing weights to compensate to some extent for performance differences between servers.

However, you have to remember that each cluster member is synchronized to the controller. As a result, there is very little leeway in customizing individual configurations, especially IIS.

Note Multiple disk partitions can be used, but the idea of homogeneity extends to the file system structure on a disk partition. Identical file system structures are required (System Root, Program Files path, Application Center path), and we strongly recommend the use of NTFS rather than FAT32. For more information about the security and replication issues related to drive formats, refer to Chapter 6, "Synchronization and Deployment."

The main factors in selecting homogeneous systems for a cluster are the number of CPUs, CPU speed, disk partitioning/formatting, and memory on each server.

Default Accounts and Services


During installation, Application Center Setup:

  • Verifies that the computer on which you're installing the product meets certain mandatory requirements, such as the operating system version. 

  • Creates the specific groups and user accounts that it requires. 

  • Installs and starts the services that it uses. 

Table 4.1 summarizes the group and account information that Application Center creates and uses, as well as the existing IIS account information.

Table 4.1 Application Center User Groups and Accounts 

Group/account name

Description

Group: ACA_machinename

Application Center group. This account is used for logging and other administrative operations.

User: ACC_machinename

Member of ACA_machinename. This cluster controller account is created on each cluster member. This account is used to manage cluster communication: it authenticates across servers, replicates content, and administers the cluster servers. (1)

User: ACL_machinename

Member of ACA_machinename. This local utility account is used by an Application Center server. The server does not necessarily need to be in a cluster (for example, a server that will be used for staging). It is used for administrative work related to the server, such as writing event log information to Application Center Event and Performance Logging.

User: IUSR_machinename

IIS uses this Windows 2000 local account to authenticate anonymous users on the Web site. This account is not used in an Application Center cluster.

User: IUSR_controllername

This account is used by the cluster instead of IUSR_machinename. It is replicated from the cluster controller to each new cluster member and provides a single, cluster-wide account for anonymous access. (2)

1 When another server is promoted to cluster controller, the account name is not changed.

2 The account name does not change when another cluster member is promoted to cluster controller. Also, if a member is removed from the cluster, its IUSR_controllername account will continue to be the anonymous access account.

Table 4.2 describes the services that Application Center installs and launches during the set-up process.

Table 4.2 Application Center Services 

Service name

Description

Application Center Administration Service

Provides internal administration support for Application Center.

Application Center Cluster Service

Allows Application Center to configure the cluster.

Application Center Log Query Helper

Provides helper functions for querying Application Center performance and event data.

Application Center Name Resolution Service

Allows Application Center to resolve computer names to IP addresses; dependent on RPC.

Application Center Synchronization Service

Used to synchronize content across the cluster; dependent on RPC.

Service Dependencies and Failures

In addition to Application Center-specific services, there are other services that the product uses. If any of these services are configured incorrectly or fail, the features that rely on them are likely to fail.

The basic configuration for each of these services, unless otherwise noted, is as follows:

  • General – Startup type: Automatic 

  • Log On – Log on as: Local System account 

  • Log On – Allow service to interact with desktop: No 

Table 4.3 provides information about service configuration settings that you should verify if you receive a "Misconfigured Services" error message.

Table 4.3 Troubleshooting Service Configuration 

Service

Configuration information

Application Center Synchronization

On the Recovery tab, for each of the three lists, click Take No Action. In the Reset fail count after [x] days box, enter 0.

Application Center Cluster

On the Recovery tab, for each of the three lists, click Restart. In the Reset fail count after [x] days box, enter 49710. In the Restart service after [x] minutes box, enter 0.

Application Center Administration

On the General tab, in the Startup type list, click Manual. On the Log On tab, click Local System account. Select the Allow service to interact with desktop check box. On the Recovery tab, for each of the three lists, click Take No Action. In the Reset fail count after [x] days box, enter 0.

Application Center Log Query Helper

On the General tab, in the Startup type list, click Manual. On the Log On tab, click Local System account. Select the Allow service to interact with desktop check box. On the Recovery tab, for each of the three lists, click Take No Action. In the Reset fail count after [x] days box, enter 0.

Application Center Name Resolution

On the Recovery tab, for each of the three lists, click Restart. In the Reset fail count after [x] days box, enter 0. In the Restart service after [x] minutes box, enter 0.

IIS Admin

On the Recovery tab, for each of the three lists, click Run a File. In the Reset fail count after [x] days box, enter 0. Under Run File, in the File box, type %SystemRoot%\System32\Iisreset.exe, and then select the Append fail count to end of command line check box.

Remote Procedure Call

On the Recovery tab, for each of the three lists, click Take No Action. In the Reset fail count after [x] days box, enter 0.

Windows Management Instrumentation

On the Recovery tab, for each of the three lists, click Restart. In the Reset fail count after [x] days box, enter 1. In the Restart service after [x] minutes box, enter 1.

MSSQL$MSAC

On the Recovery tab, for each of the three lists, click Take No Action. In the Reset fail count after [x] days box, enter 0.

SQLAgent$MSAC

On the Recovery tab, for each of the three lists, click Take No Action. In the Reset fail count after [x] days box, enter 0.

Deployment Infrastructure Example


An example of an Application Center cluster deployment infrastructure is illustrated in Figure 4.2. As you can see, each Application Center cluster is in its own Windows domain. As an additional management and security measure, you could also establish organizational units within each domain. By positioning these clusters between the two firewalls that demarcate the demilitarized zone (DMZ), you can create a highly secure cluster environment.


Figure 4.2 Application Center deployment infrastructure 

The infrastructure illustrated in Figure 4.2 is only meant to serve as a starting point for designing and implementing a robust and secure cluster topology. Your business needs will dictate the requirements for back-end database servers or clusters, component servers or clusters, and multi-level testing/staging configurations.

Application Center Cluster Services


As indicated in Chapter 2, "Feature Overview," this group of tasks encompasses everything related to the structure of a cluster, ranging from creating to disbanding clusters. During the life of a cluster, ongoing tasks involve adding servers, removing servers, and changing the designated cluster controller.

Because virtually all cluster administration tasks involve access to a cluster, or a specific server in a cluster, the Connect to Server dialog box is used frequently.

Connecting to a Cluster


Launched from the console tree, or from a pop-up dialog box if the server you're working on isn't already part of a cluster, the Connect to Server dialog box prompts you for the name of a server in the cluster to which you want to connect. As shown in Figure 4.3, additional inputs include:

  • An option button that allows you to either manage the cluster for the server you specify (selected by default), or manage a single server, as would be the case if you wanted to work with only one cluster member. If you click Manage the specified server only and the specified server is a cluster controller, Application Center opens up a member-only view of the controller. 

  • A check box that, when selected, allows you to submit authentication information to the server to which you want to connect. 


Figure 4.3 The Connect to Server dialog box 

After you click the OK button, the user interface tool checks the local membership list to determine if the server whose name you provided is part of an existing cluster. (It also retrieves the name of the current cluster controller, if it exists.) If the server that you identified isn't already a cluster member, you're given the option of joining a cluster or creating a new cluster.

Note When you connect to localhost, the credentials of the logged-on user are used for any connections to the local host. If you enter localhost for a computer that is a member server and click Manage cluster for the specified server, the logged-on credentials are used for a local connection to obtain the controller name. After the controller name is obtained, the supplied credentials are used for a connection to the controller.

The Controller Discovery Protocol (CDP) 

The various Application Center services need a reliable way of discovering which server is the current controller in a cluster. The CDP provides the means for polling a cluster to determine which server is the controller, as well as whether the controller is available. Each member of the cluster executes the CDP every four minutes to verify which server is the cluster controller.

Without going into too much detail, the CDP works as follows.

Whenever a server (S0) needs to determine which machine is the current controller, it executes the CDP. First, S0 looks in its own configuration store to get a list of cluster members (for example, S1…Sn). Then, server S0 contacts each member in turn and requests two pieces of information that help identify the cluster controller: the controller name and the version number associated with that name.

The server executing the CDP determines which server is the cluster controller on the basis of the version numbers it receives. For example, let's say that server S2 identifies server S7, with a version number of 8, as the controller, and server S3 identifies server S5, with a version number of 9, as the controller. The CDP decides that the server with the higher version number, in this case S5, is the cluster controller.

In cases where a consistent controller can't be discovered—two different servers have the same version number, for example—the cluster goes into a controller-less state (which occurs rarely). If this happens, manual intervention is required to designate one of the servers as the cluster controller. The subsequent cluster synchronization will update the cluster configuration settings in the configuration store for each member. Then, the CDP can be run against a fresh list that contains the servers and their version numbers.
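The version-number comparison at the heart of the CDP can be sketched in a few lines. The following Python sketch is a simplified model of the protocol as described here; query_member() is a hypothetical stand-in for the actual remote call that asks a member which server it believes is the controller.

    # Simplified model of the Controller Discovery Protocol (CDP).
    # query_member() is a hypothetical stand-in for the real member-to-member call;
    # it returns (controller_name, version) or None if the member is unreachable.
    def query_member(member):
        responses = {
            "S2": ("S7", 8),
            "S3": ("S5", 9),
        }
        return responses.get(member)

    def discover_controller(members):
        best = None          # (version, controller_name) seen so far
        tie = False
        for member in members:
            response = query_member(member)
            if response is None:
                continue     # unreachable member; skip it
            controller, version = response
            if best is None or version > best[0]:
                best, tie = (version, controller), False
            elif version == best[0] and controller != best[1]:
                tie = True   # conflicting answers at the same version number
        if best is None or tie:
            return None      # controller-less state; manual intervention needed
        return best[1]

    print(discover_controller(["S2", "S3"]))   # S5 (higher version number wins)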

Now let's examine cluster creation in detail. This process is particularly interesting because it accomplishes two tasks at once: creating the cluster and configuring its load balancing.

Creating a Cluster


Through its New Cluster Wizard, Application Center achieves plug and play clustering. By using default settings that are dependent on user-supplied responses, the wizard masks the complexity of network adapter and load-balancing configuration. After you create the cluster, you can use various properties dialog boxes to modify the settings that were created by the wizard.

Note Several of the choices that you make while creating a cluster will determine the role of your cluster members as well as how load balancing is managed. If you're not familiar with how load balancing works in a cluster environment, you should read Chapter 5, "Load Balancing," before you create a cluster.

Processing Activities and Their Sequence

Let's revisit the New Cluster Wizard. In addition to getting a look at the behind-the-scenes processing that occurs, you'll see which default settings are used in response to the various user-supplied responses.

Analyzing Server Configuration

The wizard analyzes the current server configuration (for example, installed software, software versions, installed network adapters, and IP address configuration) and determines how many network adapters and static IP addresses are installed on the server. The New Cluster Wizard uses this information to either stop the cluster creation process or continue by using pre-determined settings. For example, if the configuration analysis shows that NLB is already bound to a network adapter, and there are two network adapters installed, the NLB configuration is flagged as an upgrade. This, in turn, triggers a page that gives you the options of either keeping the current NLB settings or changing them.

Note If there are two or more static IP addresses bound to the front-end adapter, the wizard selects the first as the dedicated IP address and the last as the cluster IP address.

Cluster Name and Description

This page lets you provide a name, as well as an optional description, for the cluster. The cluster name has to conform to standard 15-character machine name validation and be a valid DNS name.

Note By default, Application Center does not register the cluster name with DNS—this is left up to you.

If you choose not to provide a name, the wizard defaults to a combination of the machine name (truncated to 8 characters, if necessary) and the word "Cluster."
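The default-name rule is simple enough to express directly. The following sketch just illustrates the behavior described above; it is not the wizard's actual code, and the sample machine name is hypothetical.

    # Default cluster name: machine name truncated to 8 characters, plus "Cluster".
    def default_cluster_name(machine_name):
        return machine_name[:8] + "Cluster"

    print(default_cluster_name("ACSERVER01"))   # ACSERVERCluster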

Cluster Type

This page allows you to identify the primary role of your cluster, which is determined, for the most part, by the type of content and applications that it hosts. The following options are displayed on the wizard page:

  • General/Web cluster This cluster hosts Web sites and local COM+ applications, or will be used for general-purpose activities, such as server management or staging applications. (A fairly common server configuration is one that has COM+ components running on the same computer but out-of-process in their own COM+ processes.) 

    Note If NLB was detected during the server configuration analysis, the only available option displayed for the Cluster Type is General/Web cluster. In order to use the other options, you have to exit the wizard, unbind NLB, and then re-run the wizard. 

  • COM+ application cluster This cluster hosts only COM+ applications, which can be referenced either by servers in an Application Center Web or COM+ routing cluster, or by Windows-based applications. If your intent is to host distributed COM+ components that are called by either Web or Microsoft Win32 clients, you should configure the cluster as a COM+ application cluster. 

    Note If you choose COM+ Applications only, the next item you see will be a dialog box that asks you to identify one of two sources for client calls. The first option is applications running on other servers, such as Web servers that use Active Server Pages (ASP) and Component Load Balancing (CLB). The second option is desktop COM client applications, which are typically written in a Win32 development environment such as Microsoft Visual Basic. If the clients are Win32-based applications, two network adapters are required on each cluster member because NLB is used as the load-balancing technology. In scenarios where the clients are Web or routing clusters, CLB can be used. After you submit your choice, the wizard moves to the Monitoring Options page, ignoring the load balancing configuration pages that are displayed for other cluster types. 

  • COM+ routing cluster This cluster's primary role is routing requests to a COM+ application cluster, but it can also function as a Web server cluster. 

    Note If you want to fully exploit Application Center's CLB feature, your minimum configuration will consist of a COM+/Web routing cluster of one member and a COM+ application cluster with two members on the back tier. A single-server COM+ application cluster is fully functional for responding to component activation calls, but because a single server receives all the calls, the net effect is zero component load balancing. 

The information that you provide is used to determine whether NLB should be used on the cluster. NLB is used by default if either the General/Web cluster or the COM+ routing cluster option is selected.

Network Load Balancing Service (NLB) Upgrade

This page is displayed only if the server configuration analysis determines that NLB is already bound to the network adapter. At this point you have the option of retaining the existing load balancing settings or reconfiguring load balancing.

Note If you remove a member that was originally configured by using Keep existing settings, these settings will be lost when Application Center unbinds NLB on the member. This is done to ensure that cluster integrity—in terms of configuration and content synchronization—is preserved.

Load Balancing

If you select either General/Web cluster or COM+ routing cluster as the cluster type, the Load Balancing page is displayed. This page presents three load balancing options: Network Load Balancing (NLB), Other load balancing, or None. NLB is selected by default unless the server analysis indicated that there is only one network adapter present or that DHCP is enabled on both network adapters. If either of these conditions exists, NLB is disabled and the only available options are third-party load balancing or no load balancing.

Note If an existing NLB binding with single-host (fail-over) port rules is detected, the wizard will not allow you to proceed further with cluster setup.

Load Balancing Options

The wizard identifies the network adapters that will be used when you select NLB as your load balancing option for the cluster. Application Center selects the Management traffic and load-balanced network adapters by default. You have the option of changing which adapter/IP address combination to use for the cluster's management traffic.

Note By default, Application Center sets the NLB client affinity to Single if the cluster type you select is either a Web cluster or routing cluster. In most cases, this affinity setting provides the optimal load balancing for intranet-based clients. Internet clusters typically use Class C affinity. For more information about load balancing, adapter configuration, and traffic implications, see Chapter 5, "Load Balancing."

Monitoring Notifications

This page lets you set up the default notification e-mail address and the name of the SMTP server. The SMTP server defaults to the local server if SMTP is installed.

Completing the New Cluster Wizard

After the selection process is finished, the wizard launches a creation component that performs some final validation checks and sets up the cluster. This component performs the following tasks:

  • Checks whether all the following services and components are installed: the Cluster Service, the Replication Service, and Monitoring. 

  • Writes the appropriate controller identification information to its configuration store. 

  • Configures these settings: a globally unique identifier (GUID) to identify the cluster, cluster-related configuration storage paths and settings, NLB port rules for the virtual sites if NLB is used, and the default monitors. 

  • Starts the Cluster Service and Replication Service, and then flags their startup as automatic in the Service Control Manager. 

  • Polls the metabase to determine whether the operation was successful and sends an error message to the user interface if cluster creation fails. 

Assuming that your server is set up with the necessary hardware, software, and properly configured network adapters, the entire cluster creation process only takes a few minutes.

Cluster Administration


The primary administrative tasks on a cluster are:

  • Adding a server 

  • Removing a server 

  • Restarting a server 

  • Changing the cluster controller 

  • Disbanding a cluster 

Because of their scope and complexity, certain tasks that can be viewed as administrative—such as forcing cluster synchronization or modifying a member's load balancing configuration—are covered later in this chapter.

Adding a Server

When you want to add a server to a cluster, you can use the Add Cluster Member Wizard, which presents a dialog similar to the New Cluster Wizard's but with fewer steps. The wizard steps are used to identify the new member, provide credentials if required, analyze the server's configuration, and add the server to a cluster.

Note Before you add a server to the cluster, you need to assess its hardware configuration. You should do this for two reasons: first, to verify that it meets the minimum configuration requirements to be a cluster member; and second, to determine whether its processing capabilities are adequate. Don't forget, there is the potential for any member to be pressed into service as a cluster controller. Use the existing controller's configuration as a guideline for evaluating this server.

Processing Activities and Their Sequence

As trivial as a welcoming page may seem—users do tend to ignore them and hit the Next button—let's start with this page because it presents important information.

Note You can add a server to a cluster by invoking the wizard from the server that you want to add, from the cluster controller, or remotely from a computer outside the cluster. In any case, you must supply the appropriate administrative credentials for any server that you're not logged on to, whether that is the cluster controller or the potential new member.

Welcome to the Add Cluster Member Wizard

In addition to telling you what the wizard does, the opening page provides these setup warnings:

  • Two network adapters are required for NLB. 

  • Web content may be overwritten when the server is added to the cluster. 

Server Name and Credentials

With this page, you specify the server to add, either by browsing the network or by entering the server's name or IP address. To continue, you have to provide explicit credentials for an account that has administrative privileges.

Controller

Virtually identical to the Server Name and Credentials page, this is where you provide the name of the cluster controller for the target cluster. Unless you're working on the controller, you will have to provide administrative credentials.

Analyzing Server Configuration

During this analysis phase, the wizard checks the configuration of the server you want to add as well as the target cluster controller. The following information is gathered:

  • The number of network adapters installed on the server you want to add. A server with one network adapter can be added only if the target cluster is not using NLB. 

    The IP address configuration on the front-end network adapter is checked to determine whether:

    • NLB is already bound to the adapter, which triggers an upgrade case. 

    • The adapter has a DHCP-assigned IP address, in which case the IP assignment is set to static and the cluster IP address is assigned. 

    • There is a single static IP address, which becomes the dedicated IP address; the cluster IP address is then bound as the second IP address for the adapter. 

  • Whether Application Center is installed on the server that you want to add. (This really applies only if you launch the program from a server other than the one you want to add.) 

  • The cluster membership is checked to determine whether the new server is already part of the cluster you want to join. If it is, an error message is displayed that indicates that the server you're working with is already a cluster member. 

Load Balancing Options

If the cluster that you're joining already has NLB installed, the network adapter selection list appears dimmed. If not, you'll have to specify the network adapter that you want to use for load balancing. The load balancing cases described for the cluster creation process also apply in this case. The screen capture in Figure 4.4 illustrates the load balancing options that are available.


Figure 4.4 Available load balancing options when adding a cluster member 

Two items should be noted on the Cluster Member Options page shown in Figure 4.4: the settings for Automatically synchronize this cluster member and Bring this cluster member online, which are enabled by default. There are cases where you will not want the member to be synchronized to the controller and/or brought online for load balancing immediately, for example, after a staged deployment or when you want to test new content by using live users.

Finish

During this phase, the wizard generates setup XML for the new member, updates the cluster membership list on the controller and new member, generates member and cluster configuration settings, and returns a success or failure notification. The final step is synchronization, in which cluster controller content and settings are replicated to the new member. The new member is brought online for load balancing by default, but you can defer this step until later if you prefer.

Figure 4.5 illustrates the network-level configurations that occur when a server is added to an NLB cluster.


Figure 4.5 Network-level configurations as a result of adding a cluster member 

Two items are of particular interest in the illustration shown in Figure 4.5. First, the static IP address on the controller's front-end network adapter, which is used for load balancing, is bound to the front-end network adapter on the new member. This cluster IP address is used for servicing all incoming TCP/UDP requests according to the NLB port-rule settings for a given port (for example, HTTP requests on port 80). On a COM+ routing cluster, NLB uses this address to service incoming RPC activation requests. Second, if NLB is used for load balancing, the media access control address of the controller's front-end network adapter is assigned as the media access control address for the front-end network adapter on the new member. This is why, at the Ethernet level, all cluster members can "hear" inbound TCP/UDP requests that are sent to the cluster IP address.

NLB and network adapter media access control addresses 

In Unicast mode, NLB overwrites the network adapter's media access control address with its own virtual media access control address by using the registry. Some network adapter drivers do not allow their media access control address to be overwritten in the registry.

The work-around is to use multicast mode, which adds a virtual media access control address to the existing network adapter's media access control address, or to use a different network adapter that allows its media access control address to be overwritten in the registry. Because the Application Center user interface doesn't enable you to create a multicast cluster, you have to do this manually. The following steps are required:

  • Manually configure NLB on the controller before creating a cluster. 

  • Choose Keep existing settings when running the cluster creation wizard. 

When you add a member to the cluster, the multicast settings are replicated to the new member.
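For reference, NLB derives its virtual media access control address from the cluster IP address: by convention the unicast address begins with 02-BF and the multicast address with 03-BF, followed by the four octets of the cluster IP address in hexadecimal. The following sketch only illustrates that convention; verify it against your NLB documentation, and note that the sample address is hypothetical.

    # Illustration of how NLB's virtual MAC address is conventionally derived
    # from the cluster IP address (unicast: 02-BF-..., multicast: 03-BF-...).
    # This illustrates the convention only; it is not Application Center code.
    def nlb_virtual_mac(cluster_ip, multicast=False):
        prefix = "03-BF" if multicast else "02-BF"
        octets = [int(part) for part in cluster_ip.split(".")]
        return prefix + "".join(f"-{octet:02X}" for octet in octets)

    print(nlb_virtual_mac("192.168.1.200"))                  # 02-BF-C0-A8-01-C8
    print(nlb_virtual_mac("192.168.1.200", multicast=True))  # 03-BF-C0-A8-01-C8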

The IP addresses on the back end are dynamically allocated by DHCP and are used for transmitting cluster heartbeats as well as for content replication.

Note DHCP-assigned addresses are not mandatory on the back-end adapter; you can choose to use static IP addresses on the back-end.

Application Center cluster heartbeats 

The cluster controller sends an Internet Control Message Protocol (ICMP) ping to every member at 2-second intervals. The cluster controller makes a call to the name resolution service to determine the appropriate IP address to ping. If this fails or the service is turned off, the controller calls the Windows Socket API (Winsock) function GetHostByName() for each member to determine the IP address to ping. Each member has 1 second in which to respond to the ping. If a member fails to respond to more than 2 consecutive pings, it is assumed to be "Dead" from a networking perspective. Its status switches back to Alive if it starts responding again and does so for 3 consecutive pings. This heartbeat, transmitted over the back-end network adapters, doesn't do any health or performance checking at the application level; it simply verifies that a server can be reached at the network level.

For more information about ICMP and/or GetHostByName(), see the Platform Software Development Kit.
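The dead/alive transitions described above amount to a small state machine, sketched below. This is only a model of the documented thresholds; the ping results are supplied as sample data rather than by a real ICMP call.

    # Minimal model of the heartbeat state transitions: a member is marked Dead
    # after more than 2 consecutive missed pings, and marked Alive again after
    # 3 consecutive successful pings.
    class MemberHealth:
        def __init__(self):
            self.alive = True
            self.missed = 0      # consecutive missed pings
            self.answered = 0    # consecutive successful pings

        def record_ping(self, responded):
            if responded:
                self.missed = 0
                self.answered += 1
                if not self.alive and self.answered >= 3:
                    self.alive = True
            else:
                self.answered = 0
                self.missed += 1
                if self.alive and self.missed > 2:
                    self.alive = False
            return self.alive

    health = MemberHealth()
    for responded in [True, False, False, False, True, True, True]:
        print(health.record_ping(responded))
    # Prints: True, True, True, False, False, False, True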

Removing a Server

You can launch the Remove Cluster Member dialog box from the individual member's node in the MMC. If the member is still online, you'll be prompted with a warning to that effect. In the case of a Web-based cluster, the online members are actively servicing HTTP requests, so they should be set offline before you remove them from the cluster. You can, however, simply force a member's removal without any draining period.

Warning If you choose to remove a member without first setting it offline, there is a strong potential for terminating client connections in mid-session. Any work that these users are doing may be lost.

If you're initiating a member's removal from a different cluster member, you will have to connect to the target member by using an account that has administrative privileges on that member.

Processing Activities and Their Sequence

After the preliminary identification and validation is completed, a component is called that carries out the following tasks:

  • Checks to see if the member to be removed is the local server. If it is, and the member is the cluster controller, the entire cluster is disbanded as part of the removal process. 

    Note If there is more than one cluster member, Remove Cluster Member is unavailable for the cluster controller node in the member tree. 

  • If you didn't set the member offline before initiating its removal, the component notifies the other cluster members that new requests should not be directed at the member that is being removed. 

  • Updates the cluster configuration store on the controller. 

The final step in the removal process is the execution of a component that cleans up the member that was removed. This component:

  • Unbinds NLB if it was configured on the member, regardless of whether or not it was an NLB upgrade case when it was added to the cluster. 

  • Removes the load balancing IP address from the front-end network adapter. 

  • Deletes cluster-related configuration settings. 

  • Stops the Cluster Service and Synchronization Service. 

  • Deletes the member's cluster-wide account. 

  • Sends out a completion notification via a WMI event. 

Tip If you do end up in a situation where a server becomes unstable or inoperable, you should remove it from the cluster and use the command-line tool CLUSTER /CLEAN against the server to clean up all the cluster configuration settings. After you have a clean server, you can re-install Application Center and add the server back into the cluster.

Restarting a Member

You can force a restart of any cluster member whose node has focus in the console tree by using Restart Cluster Member (All Tasks). This action forces a warm restart of the specified member.

Whenever various Application Center services have to be restarted because of a Service Control Manager net start or a member restart, the following sequence of events occurs:

  • The Cluster Service uses the CDP to determine which member is the cluster controller. 

  • The Synchronization Service is started and initialized. 

  • The Cluster Service is reported as started. 

If the member being restarted isn't the controller but the controller was found, the next set of activities is added to the restart sequence:

  • The cluster configuration information for the member is synchronized from the controller to the member. 

  • The cluster membership list is checked to verify that the server is still part of the cluster. If it isn't, all the cluster-related settings are deleted, and the server is not brought into the cluster. 

  • Cluster configuration that may have changed, such as the load-balancing configuration, is checked. 

  • If the member is flagged for a full synchronization before coming online, a full synchronization is requested from the controller and the restart sequence is held until the synchronization finishes. 

If NLB is configured on the cluster, an additional set of start-up actions is triggered. These actions are:

  • The front-end network adapter is checked to verify that NLB is bound to the network adapter—this binding may not exist if the network adapter was replaced. If NLB isn't bound to the network adapter, an event is fired that generates the appropriate notification. 

  • The member starts listening for NLB events such as "NLB started" and "NLB converged" so that online/offline actions taken directly from the wlbs command, rather than through the Application Center user interface, are detected. 

At this point the restart sequence executes some final tasks before finishing the server restart:

  • The Web Service (W3SVC) is started. 

  • Any monitors that are flagged for checking are checked before the member is set online. 

  • Load balancing is started. 

  • If a full synchronization wasn't required before adding the member to the load-balancing loop, a work item is queued that will request a full synchronization of the member. 

Changing the Cluster Controller

Changing the designated controller for a cluster is a fairly simple process from the user's perspective—it consists of selecting a member node in the console tree (assuming that the user is connected to the controller) and launching the Designate as Controller command. Alternatively, if the cluster controller is down, you can connect directly to the member that you want to designate as the controller and invoke the preceding command.

Tip Prior to promoting a member to controller status, you should do a full synchronization of the member to the current controller.

Processing Activities and Their Sequence

There are two situations that can exist when you decide to explicitly change the designated cluster controller:

  • The controller is up and running. 

  • The controller is not available or is in an unstable condition. 

The Controller Is Up and Running

In this scenario you've decided to promote a member to controller status even though the current controller is up and running. (Reasons for making this change may be to add more memory or replace one of the network adapters.)

You should not change the controller if one of the following operations is in progress:

  • Synchronization. 

  • A cluster administrative activity, such as setting a server offline/online. 

Warning If you launch the Designate as Controller command while one of the previously described operations is in progress, the controller change will fail.

If the preceding conditions do not exist, the following sequence of events occurs involving the current controller (S1) and the member that will become the controller (S2).

In the first step, the administration program verifies that S2 can be contacted. If not, the operation is stopped. The next step involves a call to S1 to see if the cluster is in a controller-less state. If it is, the processing described in the following section, "The Controller Is Not Available," occurs.

If the current controller (S1) can be contacted and appears to be functioning normally, S1:

  • Notifies the user that he or she should perform a full synchronization on S2. 

  • Disallows all new requests for administrative changes and turns off automatic synchronization. If any of these activities are currently in progress, they are allowed to finish. 

  • Notifies cluster members that a controller change is about to occur. This gives the members a chance to stop any operations that reference S1, such as request forwarding. The members also cancel currently executing requests and set a flag indicating that the cluster controller is changing. When this notification is received, each member starts a local timer. 

If S1 fails before completing the preceding step, an error is returned to the administration program and no timers are started. If an S1 failure occurs after notifying some of the members, these members will have started their timers. As soon as these members are notified of the S1 failure, they expire their timers and wait to be notified of a controller recovery or controller change.

Assuming there isn't a failure, S1:

  • Fires a "Controller is changing" event. 

  • Makes a synchronous COM call to S2, telling it to take over as controller. If S2 fails during the time-out associated with this call, an error is returned to the administrator and the administrator may retry the command. 

If control is successfully transferred to the new controller, S2: 

  • Re-enables changes that require synchronization to the rest of the cluster. 

  • Informs all members that it has taken over as the new controller and that they can re-enable operations that reference the controller. In response, the members stop their timers and set their local pointers to reference S2 as the controller. 

  • Sets its own pointer in the local cluster members list to point to itself and fires a "new controller is S2" event. 

If S2 fails between responding to the COM call from S1 and the firing of the new controller event, the members expire their timers and revert to regarding S1 as the controller. If S2 fails after telling only a subset of the members that it's taking over as the controller, the timer expires on the members that haven't been told about the controller change, and they switch back to regarding S1 as the controller. Members on which the timer expires fire an event/alert that tells the administrator what has happened.

If controller transfer is successful to this point, the call from S1 to S2 returns and S1 changes its local pointer to reference S2 as the cluster controller. When the S1 cluster reference change is saved, the controller change is finished. (If S2 fails before this reference is changed, the administrator is notified and the controller change has to be redone.)

There are additional special cases that can happen during the course of a controller change:

  • S1 fails at any time before S1 changes its local pointer to reference S2. The cluster will enter the controller-less state and administrative action may be needed to recover from this state. 

  • A server other than S1 or S2 fails. No special processing is required; the new controller is picked up automatically when the failed member recovers. 
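From a member's perspective, the controller-change protocol described above reduces to a timer and a controller pointer. The following sketch models only that member-side behavior; it is not the actual Cluster Service implementation, and the method names are hypothetical.

    # Model of a member's behavior during a controller change (S1 -> S2).
    class Member:
        def __init__(self, controller):
            self.controller = controller
            self.timer_running = False

        def on_controller_changing(self):
            # S1 announced the change: suspend controller-bound work, start a timer.
            self.timer_running = True

        def on_new_controller(self, new_controller):
            # S2 announced that it has taken over: stop the timer and repoint.
            self.timer_running = False
            self.controller = new_controller

        def on_timer_expired(self):
            # Never heard from S2: revert to the previous controller and alert.
            if self.timer_running:
                self.timer_running = False
                print("alert: controller change did not complete; still using",
                      self.controller)

    member = Member("S1")
    member.on_controller_changing()
    member.on_new_controller("S2")   # successful change: member now points at S2
    print(member.controller)         # S2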

The Controller Is Not Available

When the controller is not available and the cluster is in a controller-less state, in your role as administrator you have to designate another cluster member (S) as the controller.

  • The user interface calls a method on S that checks its configuration to confirm that the cluster is, in fact, in a controller-less state. If it isn't, an error is generated indicating that there is currently a cluster controller. At this point you can decide whether or not you still want to promote a member to controller status. 

  • If the cluster is controller-less, S sets the local pointers on all the cluster members to point to S as the controller. 

  • S fires a "new controller is S" event and controller re-assignment is finished. 

Disbanding a Cluster

In order to disband a cluster, you have to remove each cluster member, leaving the cluster controller as the last member to remove. As noted earlier, the option to remove the cluster controller is not available unless it's the only cluster member.

Although this approach may seem tedious when faced with the task of disbanding a large cluster, it's actually an excellent feature from a production perspective. If a cluster, regardless of its size, could be disabled with a single command, the potential for wreaking havoc in a production cluster is frightening. (For a script example that illustrates how you can remove a group of members from a cluster by using a single batch file, see Chapter 11, "Working with the Command-Line Tool and Scripts.")

When you initiate the removal of the cluster controller from the cluster, the following activities occur:

  • The user interface goes through the same steps that were described in "Removing a Server" earlier in this chapter. 

  • All cluster-related configuration settings are deleted on the controller. 

  • An event notification of the success or failure of the cluster disbanding is sent. 

  • The MMC is refreshed to show the current state of the Application Center environment; there is no cluster node and no member node. 

Background Services


In addition to the services we've described, there are background services that Application Center runs, namely cluster time synchronization and reliable name resolution.

Cluster Time Synchronization Service

Application Center provides its own mechanism for ensuring that the internal clocks of all the cluster members are synchronized. The main reason for doing this is to ensure that monitoring and performance data that is logged is time stamped accurately. This is particularly important with performance data, which is collected across the cluster and aggregated.

Note This is not a native operating system service but is provided by the Application Center Cluster Service.

A cluster member's clock may get out of synchronization with the other members because:

  • A user modifies the time setting manually. 

  • An application updates the clock. 

  • There is an electrical problem, such as a CMOS battery failure, which prevents the clock from getting incremented correctly. 

The Cluster Time Synchronization Service keeps the cluster member clocks synchronized by replicating the controller's date and time setting:

  • At the startup of cluster services on the controller 

  • At the startup of cluster services on a member 

  • At 60-minute intervals 

  • Whenever there is a cluster controller change 

The design goal for the Cluster Time Synchronization Service is to keep all member clocks set to within +/- 5 seconds of the time on the controller. However, this level of accuracy may not always be possible because:

  • CPU utilization on the controller is so high that obtaining the system date and time may take an unusually long time. 

  • CPU utilization on a member may be so high that the time service "set" operation may take an unusually long time. 

  • Network latency may cause significant delays in propagating the time setting to all the cluster members. 

The time synchronization service is enabled and disabled in the metabase by setting MD_WEBCLUSTER_DO_TIME_SYNC to 1 or 0 (path /AppCenter/Cluster, property ID 5739, type DWORD).

This ability to toggle time synchronization is necessary because it allows you to take advantage of the Reliable Time Service (RTS) of a domain controller, which you should use if you're using Kerberos V5 authentication. (It's possible to break Kerberos V5 authentication because the Cluster Time Synchronization Service is unaware of time settings on the domain controller or ticket granter.) Another situation that may require disabling the Cluster Time Synchronization Service is the presence of conflicting software, such as some virus checkers.

At cluster creation time, Application Center checks to see whether the computer that will become the controller is part of a domain. If the controller is part of a domain, the configuration flag is set to 0 (time synchronization disabled); if it isn't, this flag is set to 1 (enabled).

Note The Cluster Time Synchronization Service is active under the same conditions as the System Application, which is to say date and time is synchronized even if a member is out of the synchronization loop. For more information, see Chapter 6, "Synchronization and Deployment."

Reliable Name Resolution Service

The Reliable Name Resolution Service ensures that cluster administration services, such as the Cluster Service and the Synchronization Service, use only the back-end network adapters for their traffic. The main reason for this requirement is that network connections on the front-end adapter can be torn down at any time, specifically when TCP/IP or NLB configuration settings are altered. A second, but equally important, consideration is the segregation of production and administration traffic, for reasons of either performance or security. The Reliable Name Resolution Service also helps to ensure that Application Center services do not try to communicate with unusable IP addresses.

The Reliable Name Resolution Service addresses traffic control issues in the following manner.

Note Both the time (in seconds) between host file updates and the update mechanism itself can be controlled via metabase entries. The update interval, which is set to 5 minutes by default, can be changed by editing the 57448 entry. The minimum legal value is greater than or equal to 60 seconds. You can disable the update mechanism by setting the 57449 entry to False.
