Server Clusters: Security Best Practices, Windows 2000 and Windows Server 2003, Deployment and Operations

Applies To: Windows Server 2003 with SP1

Published: January 1, 2003

Also In This White Paper

Server Clusters: Security Best Practices, Windows 2000 and Windows Server 2003, General Assumptions

In This Section

This section of the white paper contains the following subsections:

Cluster administrators

Remotely managing and configuring clusters

Cluster service account

Changing the Password

Using Kerberos Authentication in a Server cluster

Creating Computer Objects

Disabling Computer Objects

Renaming Computer Objects

Password Rotation

Caveats, Issues and Considerations

Network Security

NTLM V1 and NTLM V2

NetBIOS

IPSec

Cluster Disks

Managing file shares in a cluster

Cluster Server nodes as Domain Controllers

Summary of security attributes

Majority Node Set considerations

Upgrading to Windows Server 2003 considerations

Developing Cluster-aware Applications

Calling the Server cluster APIs

Backup/Restore APIs

Cluster administrators

Administrators can specify the groups or individual users that are allowed to manage the cluster. In current versions of Server cluster, there is no fine-grained access control: either a user has rights to administer the cluster or the user does not. To grant a user or group rights to administer the cluster, the user or group must be added to the cluster security descriptor. This can be done through Cluster Administrator or the cluster.exe command-line tool.

Note

Apart from the local Administrators group on a node, all other members of the cluster security configuration MUST be either domain user accounts or global groups. This ensures that the account is the same well-defined, authorized account on all nodes in the cluster.

By default, the local Administrators group is added to the cluster service security descriptor.

Adding a user or group to the cluster security descriptor means that the user can manage all aspects of the cluster configuration including (but not limited to):

  • Taking resources offline and bringing resources online

  • Shutting down the cluster service on nodes

  • Adding and removing nodes from the cluster

  • Adding and removing resources from the cluster

Because of the scope of impact on the applications and services running in the cluster, careful consideration must be taken when adding a user to the cluster security descriptor.

The cluster service runs the code associated with a resource under the cluster service domain user account (this should not be confused with the account used to administer the cluster). Since a cluster administrator can add new resources to a cluster and since those resources run as the cluster service account, a cluster administrator can install code that runs with local administrator rights on the machine.

Best Practices

  • Cluster administrators should use a different account than the cluster service account to administer the cluster. This allows different policies (such as password expiration) to be applied separately to the cluster service account and the domain account(s) used to administer the cluster.

  • You should only add users with local Administrator rights to the cluster service security descriptor.

    Note

    By adding a domain user or global group to the local Administrators group, you automatically make that account or group a cluster administrator.

  • Do not remove the local Administrators group from the cluster service security configuration.

Remotely managing and configuring clusters

Administration tools or other applications that call the Server cluster APIs (ClusAPI) can be run from remote workstations. The general assumption is that the cluster administrator must ensure that these applications are run from trusted computers. Any compromise of the computers on which these applications execute can, in turn, compromise the cluster.

When a cluster is created or the configuration is changed (such as adding a new cluster node), the Cluster Configuration Wizard will create a log file on the machine on which the wizard is run so that in the event of failures, the administrator can use the log for debugging and troubleshooting. This log file can contain cluster configuration data such as cluster IP addresses, network names etc. This data could be used to extend the attack surface if it is read by unauthorized users.

Cluster service account

The cluster service account is the account under which the cluster service is started. The credentials for this account are stored in the service control manager (SCM) which is the Windows component that is responsible for starting the cluster service when the cluster nodes are booted.

The cluster service account must be the same on all nodes in a cluster and it must be a domain-level account that also has local administrative rights on each node in the cluster. The domain account must exist before the cluster is created, and the Cluster Configuration Wizard will prompt you for an existing account to be used. If the account is not already a member of the local Administrators group, the Cluster Configuration Wizard will automatically add it to the local Administrators group when the cluster is created. Likewise, when nodes are added to the cluster, the cluster service account will be added to the local Administrators group. If a node is evicted from a cluster or the last node is removed, the cluster service account is not removed from the local Administrators group.

You need to be aware of these semantics to avoid unintentionally granting a domain account local administrator rights to a given set of nodes.

Note

Evicting a node from a cluster does NOT remove the cluster service account from the local Administrators group. When you have removed a node from the cluster, you should manually remove the cluster service account from the local Administrators group to avoid having stale accounts with local administrator rights to a machine.

The nodes in a Server cluster use authenticated communication mechanisms to ensure that only valid members of the cluster can participate in the intra-cluster protocols. It is essential that each node in the cluster has the same cluster service account in order to provide authentication consistency. It is also a requirement of the cluster service account password utility introduced in Microsoft Windows Server 2003.

Required Privileges

As well as being a member of the local Administrators group, the cluster service account requires a set of additional, locally granted privileges:

  • Act as part of the operating system (required for Windows 2000 and beyond).

  • Back up files and directories.

  • Increase quotas.

  • Increase scheduling priority.

  • Load and unload device drivers.

  • Lock pages in memory.

  • Log on as a service.

  • Restore files and directories.

You should NOT remove any of these privileges from the cluster service account. If you remove any of these required privileges, the cluster service may not start up or operate correctly.

During the cluster server setup process, these privileges are granted locally to the account. If you ever need to manually re-create the cluster service account, you must also grant these additional privileges. KB article 269229: How to Manually Re-Create the Cluster Service Account describes the steps necessary to re-create the cluster service account.
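If you do need to re-grant these privileges by hand, one possible sketch uses the ntrights.exe utility from the Windows Resource Kit tools; the privilege constants below correspond to the user rights listed above, and the account name CONTOSO\clussvc is a placeholder for your own cluster service account.

```shell
REM Grant the locally required user rights to the cluster service account.
REM ntrights.exe ships with the Windows Resource Kit tools.
REM CONTOSO\clussvc is a placeholder account name.
ntrights -u CONTOSO\clussvc +r SeTcbPrivilege
REM   "Act as part of the operating system"
ntrights -u CONTOSO\clussvc +r SeBackupPrivilege
REM   "Back up files and directories"
ntrights -u CONTOSO\clussvc +r SeIncreaseQuotaPrivilege
REM   "Increase quotas"
ntrights -u CONTOSO\clussvc +r SeIncreaseBasePriorityPrivilege
REM   "Increase scheduling priority"
ntrights -u CONTOSO\clussvc +r SeLoadDriverPrivilege
REM   "Load and unload device drivers"
ntrights -u CONTOSO\clussvc +r SeLockMemoryPrivilege
REM   "Lock pages in memory"
ntrights -u CONTOSO\clussvc +r SeServiceLogonRight
REM   "Log on as a service"
ntrights -u CONTOSO\clussvc +r SeRestorePrivilege
REM   "Restore files and directories"
```

These rights can equally be granted through the Local Security Policy snap-in; the commands above are simply easier to script across all nodes.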

Password Policies

The cluster service account is just like any other domain account: it has a password that can have expiration policies associated with it. If the password has an expiration policy assigned to it, then the cluster service account password must be changed before it expires. Failure to do so will cause the cluster to stop functioning when the password expires (since the intra-cluster communication can no longer be successfully authenticated).

Changing the Password

In most production deployments, domain accounts will have password expiration policies that force the password to be changed relatively frequently (for example every 30 days). Changing the cluster service account password requires careful planning.

Windows 2000

The cluster service account on ALL nodes in the cluster must match to ensure that the intra-cluster communication can be successfully authenticated. The cluster service itself sends messages between cluster nodes under a variety of conditions and if any of those communications fail, the cluster node will be removed from the cluster (i.e. the cluster service will be stopped). It is not possible to determine when the cluster service will establish communication and therefore there is no clear window that allows the cluster service account to be changed in a reliable way while ensuring that the cluster remains running.

On Windows 2000, the cluster account password can only be reliably changed using the following steps:

  1. Stop the cluster service on ALL nodes in the cluster

  2. Change the password of the cluster service account at the domain controller

  3. Update the service control manager password on ALL cluster nodes

  4. Re-start the cluster service on all of the cluster nodes
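The four steps above can be sketched from the command line as follows. This is only an illustration: the node names, account name, and password are placeholders, and sc.exe and net.exe must be run with sufficient rights against each node.

```shell
REM 1. Stop the cluster service on ALL nodes in the cluster.
sc \\Node1 stop ClusSvc
sc \\Node2 stop ClusSvc

REM 2. Change the account password at the domain controller.
net user clussvc NewP@ssw0rd /domain

REM 3. Update the password stored by the service control manager on
REM    ALL cluster nodes (note the space after "password=").
sc \\Node1 config ClusSvc password= "NewP@ssw0rd"
sc \\Node2 config ClusSvc password= "NewP@ssw0rd"

REM 4. Restart the cluster service on all of the cluster nodes.
sc \\Node1 start ClusSvc
sc \\Node2 start ClusSvc
```

The entire cluster is down between steps 1 and 4, which is why this procedure requires a planned maintenance window on Windows 2000.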

Windows Server 2003

The cluster.exe command on Windows Server 2003 has the ability to change the cluster account password dynamically without shutting down the cluster service on any of the nodes. The cluster.exe command changes the domain account password and updates the service control manager account information on all nodes in the cluster.

Note

This command will only work for Windows Server 2003 nodes. If there are any Windows 2000 nodes in the cluster, their cluster service account password will not be changed.

Cluster /cluster:cluster_name1[,cluster_name2,] /changepassword[:new_password[,old_password]] [/skipdc] [/force] [/options]

The command arguments have the following meanings:


cluster:cluster_name1 [,cluster_name2,]

Identifies the cluster(s) for the account password change. If multiple clusters are specified, they must use the same cluster service account. If some nodes are unavailable, the password is not changed on any of the nodes or at the domain controller.

/changepassword [:new_password [,old_password]]

Changes the cluster service account password on the domain controller and all cluster nodes from old_password to new_password. If the passwords are not supplied on the command line, you will be prompted to provide them.

/skipdc

Changes the cluster service account password only on the cluster nodes.

/force

Forces the execution of the password change on the available nodes even if some nodes are not available.


The details of the command are documented in the on-line help available from the Cluster Administrator tool.
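Using the syntax shown above, a password rotation across two clusters that share the same service account might look like the following sketch; the cluster names and passwords are placeholders.

```shell
REM Change the cluster service account password at the domain controller
REM and on every node of both clusters, without stopping the cluster
REM service. You are prompted for the passwords if they are omitted.
cluster /cluster:Cluster1,Cluster2 /changepassword:NewP@ssw0rd,OldP@ssw0rd

REM If the password has already been reset at the domain controller,
REM update only the service control manager on the cluster nodes:
cluster /cluster:Cluster1 /changepassword:NewP@ssw0rd,OldP@ssw0rd /skipdc
```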

Best Practices

  • The cluster service account should NOT be a domain administrator account.

    • All accounts should be given the minimal possible rights and privileges to avoid potential security issues if a given account is compromised.

    • Use delegation to give administration rights to specific accounts on ALL nodes of a cluster.

  • If there are multiple clusters in a single domain, using the same cluster service account on all nodes makes administration easier.

    • You need to balance the ease of management with the potential security risks associated with using a single account for many clusters. If the account is compromised, the scope of the impact depends on how many clusters are involved.

    • With Windows Server 2003, the cluster service account password can be changed on multiple clusters at the same time.

  • If you have set password expiration policies on your cluster service accounts you should NOT use this account for other services.

    • Do not use the cluster service account for SQL Server 2000 or Exchange 2000, for example, if you have set password expiration policies. If you have multiple services using the same account, coordinating the password change across the cluster service and other services is complex and will lead to the complete cluster and/or service being unavailable during a password rotation. It is better to have a dedicated account for each service, each of which can be maintained independently.

    • With Windows Server 2003, the cluster service account password can be changed online without taking the cluster down ONLY if there are no services using the same service account.

  • Changing the cluster service account password on Windows 2000 requires the cluster to be shut down completely before the account password is changed. Shutting down and re-starting the cluster may mean that the cluster cannot meet the availability requirements. For example 99.999% uptime requires less than five minutes downtime per year for the applications and services. This level of availability cannot be achieved if the cluster is shutdown for password rotation. In these high availability environments, with Windows 2000, the cluster service account should have a password expiration policy of never expiring.

  • With Windows 2000 SP3 and Windows Server 2003, the cluster service can publish virtual servers as computer objects in Active Directory. To ensure correct operation, the cluster service account should have appropriate access rights or privileges to be able to create and manipulate these objects in the Active Directory Computers container.

    See section Using Kerberos Authentication in a Server cluster.

  • If you intend to deploy multiple clusters using different cluster service accounts, you should create a Global or Universal group which implements all of the policies and has all of the privileges described above. Each cluster service account should then be placed into the group.

    • This eases management of the cluster service accounts by providing a single container for all cluster service accounts and a single point of management for changing account policies.

Using Kerberos Authentication in a Server cluster

Kerberos authentication was released as part of the Windows platform in Windows 2000. It is the primary (and default) security mechanism for the Windows platform moving forward and it has a number of benefits over the previous authentication mechanisms (such as NTLM):

  • Provides mutual authentication of client and server: The server ensures that the client has access to the service or applications provided at the server, and the client can be sure that the server it is communicating with really is the server it expects.

  • Allows delegation of authentication across multiple machines: Allows end-to-end impersonation based on the credentials associated with the original request in a multi-tiered application deployment. For example, a web site may contain an IIS web server front-end, business objects at the middle-tier and a database back-end. Kerberos allows the original client authentication to be carried across all of the tiers from the web server to the database for end-to-end authentication and authorization.

  • An industry standard allowing a common authentication mechanism across multiple platforms.

For more information about Kerberos and how it works, see the TechNet web site: https://www.microsoft.com/windows2000/techinfo/howitworks/security/kerberos.asp

In a Server cluster, applications are deployed in virtual servers. A virtual server is a Server cluster resource group that contains an IP Address resource and a Network Name resource as well as any resource required for a given service or application. Clients connect to the clustered service using the IP Address or Network Name associated with the service. When an application fails over from one node to another, so do the IP Address and Network Name resources, thus the client continues to use the same target location for a service or application regardless of which machine in the cluster is currently hosting it.

For Server clusters in Windows 2000 (up to SP2), the only mechanism available for authenticating a client against a virtual server name is NTLM, thus the benefits provided by Kerberos are not available to deployments that contain Server clusters.

In Windows 2000 SP3 and Windows Server 2003, Kerberos authentication will be available to cluster server applications, allowing end-to-end impersonation in a highly available deployment.

Virtual Server Computer Objects

Kerberos authentication requires a backing computer object (CO) to be published in Active Directory. In Windows 2000 SP3 and beyond, the network name cluster resource has the ability to publish a computer object in Active Directory for the virtual server. This is controlled through the new resource property RequireKerberos with the following values:

  • RequireKerberos = 0

    The Network Name resource does not create an object in Active Directory. This provides the same semantics as Windows 2000 (up to SP2) and is the default value for backward compatibility as well as ensuring correct behavior during rolling upgrade.

    (In the case of a network name that is required for MSMQ, RequireKerberos is enabled by default to ensure that MSMQ continues to work as expected after the upgrade is complete.)

  • RequireKerberos = 1

    The Network Name resource does create an object in Active Directory thus enabling Kerberos authentication against the virtual server computer name.

The full semantics for this property are described later in this section.
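As a sketch, the RequireKerberos property can be set from the command line as a private property of the Network Name resource; the resource name "SQL Network Name" below is a placeholder, and the resource must be taken offline and brought back online for the change to take effect.

```shell
REM Enable Kerberos authentication for a virtual server by setting the
REM RequireKerberos private property on its Network Name resource.
cluster res "SQL Network Name" /priv RequireKerberos=1

REM Inspect the current private property values to verify the change:
cluster res "SQL Network Name" /priv
```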

Note

Although the Network Name resource publishes a computer object in Active Directory, that computer object should NOT be used for administration tasks such as applying Group Policy. The ONLY role for the virtual server computer object in Windows 2000 and Windows Server 2003 is to allow Kerberos authentication and delegation and for cluster and Active Directory-aware services (such as MSMQ) to publish service provider information.

The Lifetime of a Computer Object

The Network Name resource controls the creation and deletion of computer objects as dictated by a number of factors, including resource property settings and the resource's own lifetime.

Creating Computer Objects

The Network Name resource creates a computer object when the RequireKerberos resource property is set to one (the default value is zero). Setting this value to one requires that the cluster service account has access to Active Directory, either by having the Add Workstations to the Domain privilege or by being granted specific access to Active Directory. If the account does not have access, the Network Name resource will fail to come online.

Domain administrators can pre-create computer objects for virtual servers (in a separate OU if required) to avoid giving the Add Workstations to the Domain privilege to the cluster service account. In this case, the computer object must have an access control list (ACL) that allows the cluster service account to reset the password and write to the DnsHostName and ServicePrincipalName attributes (which have individual access rights).

If an orphan computer object or a computer object created by the domain administrator exists in Active Directory corresponding to the virtual computer name, and the cluster service account has the appropriate access rights to the object and the object is not disabled, the network name resource will hijack the existing computer object. This is the same behavior that is visible if a new computer is added to a domain that contains an old and unused computer object with the same name.

The Network Name resource sets the DnsHostName attribute of the computer object to the fully qualified DNS name built from the Network Name resource's Name property and the primary DNS suffix of the node. (All nodes in a cluster must be in the same NT and DNS domain.)

Disabling Computer Objects

Computer objects are disabled when either Kerberos authentication is no longer required (the RequireKerberos property is changed from one to zero) or when the resource itself is deleted. If the computer object still exists, but the Network Name resource is brought online without Kerberos authentication enabled, client authentication will fail. When removing Kerberos authentication from a Network Name resource, the system administrator must make an explicit decision to delete the corresponding computer object. Note that Active Directory-aware applications hosted in the virtual server (such as Exchange server and MSMQ) may have attached their own information to the computer object. If the computer object is deleted, those properties will also be deleted and the applications may no longer function correctly. Extreme care should be taken when removing Kerberos authentication from a network name.

When the Network Name resource is deleted or Kerberos authentication is removed, the Network Name resource makes a one-time best effort to disable the computer object. All disable operations, along with their final status, are logged in the cluster and system event logs. If disabling the computer object fails for any reason, no other attempt is made to disable it.

Renaming Computer Objects

The computer object is intimately tied to the virtual network name exposed by the Network Name resource. If the Name property of the Network Name resource is changed, that change must be reflected in the computer object. These changes are tightly coupled and synchronized; for that reason, changing the Name property can occur only when the Network Name resource is offline. If renaming the computer object fails for any reason, the change to the Name property also fails.

There is one caveat to this: the Network Name resource allows property values to be changed while the Network Name resource is online; in this case, the rename operation is deferred until the name is taken offline. If that rename attempt fails, the operation is retried as part of the online process. The name will not be brought online until the rename operation is successful.

Note

Many applications and services such as MSMQ and SQL Server do NOT support changing the Name property on the Network Name resource.

Password Rotation

The computer object maintains a password used for authentication when issuing Kerberos tickets. In the current implementation for Windows 2000 SP3 and Windows Server 2003, the password on the computer object is not changed once the computer object is created.

Caveats, Issues and Considerations

Orphan Computer Objects

Many operating system services and applications use the Negotiate security package, which is a component provided by the Windows platform that allows a client to authenticate with a service. This package hides the particulars of any given authentication scheme and allows a client to try Kerberos authentication and, if that is unavailable, fall back to NTLM authentication. If there is a computer object in Active Directory that corresponds to the target name that the client is using, the Negotiate package will assume that Kerberos authentication is required. (In an all Windows Server 2003 domain this will work as expected.) If a virtual computer object exists in Active Directory (disabled or not), future client connections to that virtual server name will NOT fall back to NTLM.

If authentication fails against a virtual server that should be using NTLM, you should check whether there is an orphaned or old computer object in Active Directory corresponding to the virtual server name.

Cluster-Aware and Active Directory-Aware Services

There are some services that are both cluster-aware and Active Directory-aware, in particular, SQL Server 2000, Exchange 2000 and MSMQ. Some of these services attach service information to the computer objects. MSMQ in particular attaches service information to the virtual server computer object.

When removing Kerberos authentication from a Network Name resource, the computer object will be disabled, leaving the system administrator with an explicit decision to delete the computer object. If the computer object is deleted, properties attached by Active Directory-aware applications will also be deleted and the applications may no longer function correctly. Extreme care should be taken when removing Kerberos authentication from a Network Name resource.

Active Directory Replication Delays

Thus far, we have talked about Active Directory as though it were a single, highly consistent infrastructure. In a production environment, multiple domain controllers are deployed to ensure that the domain infrastructure is highly available. Changes to Active Directory are replicated across the domain controllers, and in some cases that replication delay can be large. This can cause some seemingly inconsistent effects that you need to be aware of:

  • Clients cannot see the computer object for a network name resource

    Different nodes may access different domain controllers. If a computer object is created on one domain controller, it may not be visible to the clients until it is replicated to other domain controllers.

Best Practices

  • Plan your use of Kerberos authentication in a cluster carefully:

    • Enable Kerberos authentication on those virtual servers that make sense for your deployment.

    • You should not disable Kerberos authentication on a virtual server unless you fully understand the services using that virtual server and the implications that will have on the service. The computer object and any service information attached to it WILL be deleted (assuming correct privileges) which may lead to a service being unavailable or unrecoverable.

  • The cluster service account should have the Add workstations to the domain privilege to allow it to create computer objects in Active Directory. If you do not wish to grant that privilege to the cluster service account, you MUST create the computer object by hand in Active Directory before enabling Kerberos authentication.

    Note

    The Add workstations to the domain privilege allows a default of 10 computer objects to be created from that account. In addition, the Add workstations to the domain by itself does NOT allow the account to delete or rename the computer object.

  • The cluster service account should have Write all properties access rights to allow it to rename the computer object.

  • Many services, including MSMQ and SQL Server 2000, do not support changing the Network Name resource's Name property. You should only ever change this property if you fully understand the implications. In some cases changing the Network Name resource's Name property can lead to loss of data or failure of a service.

  • You should ensure that the domain controllers have had a chance to replicate newly created computer objects associated with Network Name resources before allowing clients to access them or before failing over the resource to another node in the cluster. Failure to do this can lead to unpredictable results (see section Active Directory Replication Delays).
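Pre-creating a virtual server computer object, as recommended above, might be sketched as follows using the dsadd and dsacls directory service tools. The virtual server name, OU path, and account name are all placeholders; the grants correspond to the reset-password and write-attribute rights described earlier in this section.

```shell
REM Pre-create a computer object for the virtual server so that the
REM cluster service account does not need the "Add workstations to
REM domain" privilege. VS1, the OU, and CONTOSO\clussvc are placeholders.
dsadd computer "CN=VS1,OU=ClusterVirtualServers,DC=contoso,DC=com"

REM Grant the cluster service account the rights it needs on the object:
REM the Reset Password control access right, plus write access to the
REM dNSHostName and servicePrincipalName attributes.
dsacls "CN=VS1,OU=ClusterVirtualServers,DC=contoso,DC=com" /G "CONTOSO\clussvc:CA;Reset Password"
dsacls "CN=VS1,OU=ClusterVirtualServers,DC=contoso,DC=com" /G "CONTOSO\clussvc:WP;dNSHostName"
dsacls "CN=VS1,OU=ClusterVirtualServers,DC=contoso,DC=com" /G "CONTOSO\clussvc:WP;servicePrincipalName"
```

The same ACL changes can be made interactively through the Security tab of the computer object in Active Directory Users and Computers.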

Network Security

Network Flooding

The cluster service uses UDP port 3343 for intra-cluster communication. This communication includes heartbeat traffic to detect node failure and cluster control operations. Some of these operations (such as heartbeats) are time-sensitive. If there is significant load on the ports due to network flood attacks, it can result in false node failure detection and therefore cause unnecessary failover operations which will lead to application downtime.

Port Squatting

It is possible for a rogue application to hijack or squat on port 3343, which will stop the cluster service from starting up. In this case, the only option is to kill the processes using the port. Port 3343 is registered exclusively for the cluster service.

Rogue Servers

The cluster service provides remote management capabilities through the cluster APIs so that a cluster can be managed from a management station. The cluster APIs use NTLM to authenticate with the server. This allows the server to authenticate that the client has sufficient rights and privileges to manage the cluster, however, it does not provide mutual authentication. In other words, the client has no cast-iron guarantee that the node or cluster that the client connected to is the real cluster. If a rogue machine were to appear on the network with the same IP address or network name (if the DNS information were compromised), then the rogue machine could masquerade as the cluster. If that rogue machine also appeared to implement the cluster service APIs, any management commands sent to the cluster would be intercepted by the rogue machine. In many cases, this does not represent a threat since the administrator would simply receive a false positive or negative that an action was performed. However, there are sensitive operations (such as changing the cluster configuration) where an unauthorized recipient could potentially collect cluster configuration data which may help to extend the attack surface.

This type of attack requires a number of factors to occur:

  1. The rogue computer must be visible on the same subnet as the target cluster and respond to traffic to the cluster IP address (this may be the IP address of the physical computer or may be the IP address of virtual servers hosted by the computer).

  2. In a typical environment, the rogue computer can only take over the IP address if the cluster node or virtual server is not running.

  3. The rogue computer must appear to implement the cluster APIs. For a typical management operation, the client application makes several calls to the server and in some cases, returns handles for subsequent calls.

  4. The administrator/operational procedure must change the configuration that includes the sensitive data (e.g. the administrator must change the cluster account password).

Best practices

Attacks that compromise the cluster typically involve being on the same subnet as the cluster. To protect against these attacks, the subnets should be protected:

  • Client-access networks

    The subnet used by the cluster nodes should not extend beyond the set of nodes that can be physically secured or trusted. You must ensure that rogue or potentially insecure machines cannot be attached to the subnet containing the clusters. Since this network may contain domain controllers, DNS servers, WINS servers, DHCP servers and other network infrastructure, you must take steps to ensure that these infrastructure servers are also secure.

    Typical network security procedures should be put in place such as firewalls that only allow specific application requests etc. Securing networks and infrastructure servers is beyond the scope of this document.

  • Private networks

    Only nodes in the cluster should be visible on a private network (multiple clusters can use the same private network). No other network infrastructure servers or other application servers should be on the private subnet. This can be achieved by either:

    • Physically constraining the network itself (e.g. the private network is a LAN that is only connected to the cluster nodes).

    • Isolating the cluster private networks using VLAN-capable switches.

NTLM V1 and NTLM V2

Windows 2000

Windows 2000 clusters (up to and including SP2) must use NTLM V1 authentication between cluster nodes. There are several lockdown tools and policies (such as HiSecDC) that apply different policies to the nodes, forcing NTLM V2 as the default security profile. If these policies or tools are applied to a Windows 2000 cluster, the cluster service will fail. The default security profile must be reset to Send LM and NTLM responses. This is detailed in KB articles 295091 and 272129.

Windows Server 2003 and Windows 2000 SP3 (and above)

In Windows 2000 SP3 and above, as well as Windows Server 2003, the cluster service is capable of using NTLM V2. Security policies that enable NTLM V2 will not compromise the cluster.

NetBIOS

The security of a system depends in part on the number of entry points into the system. In Windows Server 2003, the cluster service does not require NetBIOS; however, a number of services are affected if NetBIOS is disabled. You should be aware of the following:

  • By default, when a cluster is configured, NetBIOS is enabled on the cluster IP Address resource. Once the cluster is created you should disable NetBIOS by clearing the check box on the parameters page of the Cluster IP Address resource property sheet.

  • When you create additional IP Address resources you should clear the NetBIOS checkbox.

  • With NetBIOS disabled, you will not be able to use the Browse function in Cluster Administrator when opening a connection to a cluster. Cluster Administrator uses NetBIOS to enumerate all clusters in a domain.

  • File and print services are affected: with NetBIOS disabled, no virtual names are added as redirector endpoints.

  • Cluster Administrator does not work if a cluster name is specified. Cluster Administrator calls GetNodeClusterState, which uses the remote registry APIs; these, in turn, use named pipes based on the virtual name.
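The NetBIOS setting can also be changed from the command line with cluster.exe by setting the EnableNetBIOS private property of each IP Address resource. A sketch, assuming the default resource name "Cluster IP Address" (substitute your own resource names):

```shell
REM Show the current private properties of the IP Address resource
cluster res "Cluster IP Address" /priv

REM Disable NetBIOS on the resource; the change takes effect the next
REM time the resource is brought online
cluster res "Cluster IP Address" /priv EnableNetBIOS=0
cluster res "Cluster IP Address" /off
cluster res "Cluster IP Address" /on
```

Repeat for each additional IP Address resource you create.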

IPSec

Although it is possible to use Internet Protocol security (IPSec) for applications that can failover in a Server cluster, IPSec was not designed for failover situations and we recommend that you do NOT use IPSec for applications in a Server cluster.

The primary issue is that Internet Key Exchange (IKE) Security Associations (SAs) are not transferred from one server to the other if a failover occurs because they are stored in a local database on each node.

In a connection that is protected by IPSec, an IKE SA is created in phase I negotiations and two IPSec SAs are created in phase II. A time-out value is associated with the IKE and IPSec SAs. If master key perfect forward secrecy (PFS) is not used, the IPSec SAs are created by using key material from the IKE SA. In this case, the client must wait for the default time-out or lifetime period of the inbound IPSec SA to expire, and then wait for the time-out or lifetime period that is associated with the IKE SA.

The default time-out for the Security Association Idle Timer is five minutes. In the event of a failover, clients using IPSec will not be able to reestablish connections until at least five minutes after all resources are online.

Although IPSec is not optimally designed for a clustered environment, it may be used if your business need for secure connectivity outweighs client downtime in the event of a failover.

Cluster Disks

In general, cluster disks (i.e. disks that have a corresponding resource in the cluster configuration) are just like any other disks hosted by Windows; however, there are some additional considerations that you need to understand:

General Best Practices

  • Server clusters support only the NTFS file system on cluster disks. This ensures that file protection can be used to protect data on the cluster disks. Because cluster disks can fail over between nodes, you must use only domain user accounts (or Local System, Network Service, or Local Service) to protect files. Local user accounts on one machine have no meaning on the other machines in the cluster.

  • Cluster disks are periodically checked to make sure that they are healthy. The cluster service account MUST have write access to the top level directory of all cluster disks. If the cluster account does not have write access, the disk may be declared as failed.

Quorum Disk

  • The health of the quorum disk determines the health of the entire cluster: if the quorum disk fails, the cluster service becomes unavailable on all cluster nodes. The cluster service checks the health of the quorum disk and arbitrates for exclusive access to the physical drive using standard I/O operations. These operations are queued to the device along with any other I/Os to that device. If the cluster service I/O operations are delayed by extremely heavy traffic, the cluster service will declare the quorum disk as failed and force a regroup to bring the quorum back online elsewhere in the cluster. To protect against malicious applications flooding the quorum disk with I/Os, the quorum disk should be protected: access should be restricted to the local Administrators group and the cluster service account.

  • If the quorum disk fills up, the cluster service may be unable to log required data. In this case, the cluster service will fail, potentially on all cluster nodes. To protect against malicious applications filling up the quorum disk, access should be restricted to the local Administrators group and the cluster service account.

  • For both reasons above, the quorum disk should NOT be used to store other application data.
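As a sketch, the restriction above can be applied from the command line with cacls, assuming the quorum disk is drive Q: and the cluster service account is DOMAIN\ClusterSvc (both hypothetical; substitute your own values). Note that cacls /G replaces the entire ACL, so list every account that should retain access:

```shell
REM Restrict the quorum volume to local Administrators, SYSTEM and the
REM cluster service account; /G REPLACES the existing ACL entirely
cacls Q:\ /G Administrators:F SYSTEM:F DOMAIN\ClusterSvc:F
```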

Cluster Data Disks

  • As with the quorum disk, other cluster disks are periodically checked using the same technique. If malicious applications flood the cluster application disks with I/Os, the cluster service health check may fail, thus causing the disk (and any applications dependent on the disk) to be failed over to another cluster node. To avoid denial of service attacks like this, access to the cluster disks should be restricted to those applications that store data on the specific disks.

EFS and Server clusters

With Windows Server 2003, the encrypting file system (EFS) is supported on clustered file shares. To enable EFS on a clustered file share, you must perform a number of tasks to configure the environment correctly:

  1. EFS can only be enabled on file shares when the virtual server has Kerberos enabled. By default, Kerberos is not enabled on a virtual server. To enable Kerberos you must check the Enable Kerberos Authentication check box on the network name resource that will be used to connect to the clustered file share.

    Note

    Enabling Kerberos on a network name has a number of implications that you should ensure you fully understand before checking the box.

  2. All cluster node computer accounts, as well as the virtual server computer account, must be trusted for delegation. See online help for how to do this.

  3. To ensure that the users' private keys are available to all nodes in the cluster, you must enable roaming profiles for users who want to store data using EFS. See online help for how to enable roaming profiles.

Once the cluster file shares have been created and the configuration steps above have been carried out, user data can be stored in encrypted files for added security.
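Step 1 can also be performed from the command line by setting the RequireKerberos private property of the Network Name resource. A sketch, assuming a Network Name resource called "FSNetName" (a hypothetical name); the resource must be offline when the property is changed:

```shell
REM Enable Kerberos authentication on the virtual server's network name
cluster res "FSNetName" /off
cluster res "FSNetName" /priv RequireKerberos=1
cluster res "FSNetName" /on
```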

Managing file shares in a cluster

Normal file shares

Normal file shares are the most flexible and easily understood in terms of security. The only real difference is that you administer share level security using the cluster user interface instead of Windows Explorer. You administer NTFS security settings on files on the cluster disks using standard Windows tools such as Windows Explorer. For more information about administering cluster file shares, see the online documentation for Server clusters.

File shares created through Cluster Administrator are created with the same default protection as file shares on a stand-alone server, unless the ACL is set using the cluster administration tools. See the file services documentation for the default ACL for a file share.

Note

Always use the cluster administration tools to change security of a clustered file share. If you use the file share management tool, any configuration will be lost when the file share is failed over to another node in the cluster.
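A clustered file share can also be created with cluster.exe. A sketch, with hypothetical resource, group, path, and share names; the File Share resource should depend on the Physical Disk resource that holds the data:

```shell
REM Create the File Share resource in the group that owns the disk
cluster res "Users Share" /create /group:"File Server Group" /type:"File Share"

REM Point it at the directory to share and give the share a name
cluster res "Users Share" /priv Path="S:\Users" ShareName="Users" Remark="User data"

REM Make the share depend on its disk, then bring it online
cluster res "Users Share" /adddep:"Disk S:"
cluster res "Users Share" /on
```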

Shared Subdirectories

Subdirectory shares are available in versions of Windows later than Windows NT 4.0 Service Pack 4. Shared subdirectories allow administrators to rapidly create large numbers of shares, such as home directories, with a single cluster resource. A root share is specified, and all of the subdirectories one level below the specified root are created as regular file shares.

There is no way to specify different security attributes for each shared subdirectory, so each share inherits the same share-level permissions as the root share. Because each share is typically used by a different user, the share-level permissions should be left open (Everyone), and security at the file system level should be used to control who has access to which files.
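Shared subdirectories are controlled by the ShareSubDirs private property of the File Share resource. A sketch, assuming an existing File Share resource called "Home Shares" (a hypothetical name) whose Path points at the root directory:

```shell
REM Share every first-level subdirectory under the root as its own share
cluster res "Home Shares" /priv ShareSubDirs=1

REM Optionally create the subdirectory shares as hidden ($) shares
cluster res "Home Shares" /priv HideSubDirShares=1
```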

DFS roots

DFS roots are available from Windows 2000 onward. Server clusters support only standalone DFS roots as cluster resources. (Note that this does NOT mean you cannot create domain roots on cluster nodes, since a cluster node is just like any other server; however, such roots are not cluster resources.) You can set share-level permissions for the root through the Cluster Administrator user interface, and you can administer each link through file share permissions on the appropriate server. However, this method of controlling access can be difficult for DFS trees spanning a large number of servers and links. We recommend you administer DFS trees by leaving file share permissions open and using NTFS file system permissions to restrict access.

Cluster Server nodes as Domain Controllers

To have Server clusters function properly (where the cluster service starts on each node), the node that is forming the cluster must be able to validate the cluster service domain account, which is the account that you configure when the cluster is configured. To accomplish this, each node must be able to establish a secure channel with a domain controller to validate this account. If the account cannot be validated, the cluster service does not start. This is also true for other clustered programs that must have account validation in order for services to start, such as Microsoft SQL Server and Microsoft Exchange.

If a cluster deployment has no link to a Windows NT 4.0, Windows 2000, or Windows Server 2003 domain, you have to configure the cluster nodes as domain controllers so that the cluster service account can always be validated, allowing the cluster to function properly.

If the connectivity between cluster nodes and domain controllers is such that the link is either slow or unreliable, consider having a domain controller co-located with the cluster, or configuring the cluster nodes as domain controllers. Microsoft does not recommend using cluster nodes as domain controllers.

If you must configure the cluster nodes as domain controllers, consider the following important notes:

  • If one cluster node in a 2-node cluster is a domain controller, all nodes must be domain controllers. It is recommended that you configure at least two of the nodes in a 4-node Datacenter cluster as domain controllers.

  • There is overhead associated with running a domain controller. An idle domain controller can use between 130 and 140 megabytes (MB) of RAM, which includes the running of Server clustering. There is also replication traffic if these domain controllers have to replicate with other domain controllers within the domain and across domains. Most corporate deployments of clusters include nodes with gigabytes (GB) of memory, so this is not generally an issue.

  • If the Windows 2000 cluster nodes are the only domain controllers, they each have to be DNS servers as well, and they should point to each other for primary DNS resolution and to themselves for secondary DNS resolution. You must also prevent the private (heartbeat) interface from being registered in DNS, especially if it is connected by way of a crossover cable (2-node clusters only). For information about how to configure the heartbeat interface, refer to KB article 258750. However, before you can perform step 12 in KB article 258750, you must first modify other configuration settings, which are outlined in KB article 275554.

  • If the cluster nodes are the only domain controllers, make them all Global Catalog servers. For background information about the placement of Global Catalog servers, see:

    https://go.microsoft.com/fwlink/?LinkId=91432

  • The first domain controller in the forest takes on all flexible single master operation (FSMO) roles (refer to KB article 197132). You can redistribute these roles to each node. However, if a node fails, the flexible single master operation roles that the node has taken on are no longer available. You can use Ntdsutil to forcibly take away the roles and assign them to the node that is still running (refer to KB article 223787). Review KB article 223346 for information about placement of flexible single master operation roles throughout the domain.

  • If a domain controller is so busy that the cluster service is unable to gain access to the quorum drive as needed, the cluster service may interpret this as a resource failure and cause the cluster group to fail over to the other node. If the quorum drive is in another group (although it should not be), and it is configured to affect the group, a failure may move all group resources to the other node, which may not be desirable. For more information regarding Quorum configurations, please refer to the KB article 280345 listed in the "Reference" section.

  • Clustering other programs, such as SQL Server or Exchange, in a scenario where the nodes are also domain controllers may not result in optimal performance due to resource constraints. You should thoroughly test this configuration in a lab environment prior to deployment.

  • You must promote a cluster node to a domain controller by using the Dcpromo tool prior to creating a Server cluster or adding a node to the cluster.

  • You must be extremely careful when demoting a domain controller that is also a cluster node. When a node is demoted from a domain controller, the security settings and the user accounts are radically changed (user accounts are demoted to local accounts for example).

Generic resources

The generic resource types, Generic Application, Generic Service, and Generic Script, allow existing non-cluster-aware applications to be monitored and failed over with little or no effort. (Generic Script is available on Windows Server 2003 only.) A file name path is used to identify the application or script. The cluster service runs the resource DLL under the cluster service account, and therefore these resources run with elevated (local administrator) privileges.

When using a generic resource you should ensure that the file containing the code/script and any registry keys used by the script, application or service are protected against attacks.

  • Generic Script

    • Make sure script files are secured using NTFS file system protection.

    • Specify the fully qualified path to the script to avoid spoofing issues associated with the PATH variable.

  • Generic Application

    • Make sure the application is trusted and that the files, any registry keys that need to be checkpointed, and any other resources needed for the application to run are secure.

    • Specify the fully qualified path to the application to avoid spoofing issues associated with the PATH variable.

  • Generic Service

    • Make sure the service is trusted and that any registry keys or other resources needed for the service to run are secure.
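As a sketch, a Generic Application resource with a fully qualified path might be created as follows (resource, group, and path names are hypothetical):

```shell
REM Create the Generic Application resource
cluster res "My App" /create /group:"App Group" /type:"Generic Application"

REM Always give the fully qualified path to the executable rather than
REM relying on the PATH variable
cluster res "My App" /priv CommandLine="C:\Apps\MyApp.exe" CurrentDirectory="C:\Apps"
```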

Note

Typically, parameters and configurations for Windows services are stored under HKEY_LOCAL_MACHINE in the registry. This document assumes that this registry hive is protected against attacks. The default security attributes of the HKLM hive ensure that the hive is secure.

Summary of security attributes

This section outlines the various objects and their associated security attributes needed to ensure that the cluster service runs successfully. The section lists the minimum security requirements. If the security attributes are made more restrictive, the cluster service may not run. If the security attributes are less restrictive, it may result in malicious attacks being able to compromise the cluster or corrupt end-user data. We recommend that the security attributes on the objects not be changed from the default settings.

Directory and file protection

Object (description): minimum recommended security attributes

%windir%\help (Cluster Help files):

BUILTIN\Users:R
BUILTIN\Administrators:F

%windir%\cluster (Cluster directory and subdirectories):

BUILTIN\Administrators:F
BUILTIN\Administrators:(OI)(CI)(IO)F
NT AUTHORITY\SYSTEM:F
NT AUTHORITY\SYSTEM:(OI)(CI)(IO)F
CREATOR OWNER:(OI)(CI)(IO)F

%windir%\cluster\* (Files in the cluster directory):

BUILTIN\Administrators:F
NT AUTHORITY\SYSTEM:F

<quorum_drive> (Volume holding quorum data):

BUILTIN\Administrators:F
BUILTIN\Administrators:(OI)(CI)(IO)F
NT AUTHORITY\SYSTEM:F
NT AUTHORITY\SYSTEM:(OI)(CI)(IO)F

<quorum_drive>:\MSCS (Quorum directory and subdirectories):

BUILTIN\Administrators:F
BUILTIN\Administrators:(OI)(CI)(IO)F
CREATOR OWNER:(OI)(CI)(IO)F

<quorum_drive>:\MSCS\* (Files in the quorum directory):

BUILTIN\Administrators:F

Volumes on cluster disks:

BUILTIN\Administrators:F
BUILTIN\Administrators:(OI)(CI)(IO)F
NT AUTHORITY\SYSTEM:F
NT AUTHORITY\SYSTEM:(OI)(CI)(IO)F

System resources

The cluster service uses other system resources that are protected by default.

Object (description): minimum recommended security attributes

HKLM\* (HKLM registry hive):

SYSTEM: Full control
BUILTIN\Administrators: Full control

Cluster Service Account Policies

  • Do not let the password expire. On Windows Server 2003 you can use the password change mechanism to ensure that the password is periodically cycled.

  • Use strong passwords for the cluster service account.
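On Windows Server 2003 the password can be rotated from the command line without taking the cluster offline. A sketch, with a hypothetical cluster name and passwords; check cluster /? for the exact syntax on your build:

```shell
REM Change the cluster service account password on every node of the
REM cluster in one operation (Windows Server 2003 only)
cluster /cluster:MyCluster /changepass:NewP@ssw0rd,OldP@ssw0rd
```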

Active Directory

  • To ensure Kerberos authentication functions as expected, the cluster service account must have access to Active Directory. This is covered in the section Using Kerberos Authentication in a Server cluster.

DNS

  • The cluster service account needs to be able to publish DNS records. In a secure dynamic update zone, the DNS administrator can choose to restrict the access rights for users. The cluster service account must be granted permission to create records; alternatively, the records can be pre-created. If the records are pre-created, you should not configure them for dynamic update.

Majority Node Set considerations

Windows Server 2003 provides a new quorum-capable resource known as Majority Node Set. The primary goal of this resource is to keep multiple copies of data spread across a cluster in sync at all times. It can be used to provide a quorum resource in a cluster without using a shared disk for the quorum data and is primarily targeted at the following scenarios:

  • Geographically dispersed clusters: Majority Node Set provides a single, common quorum mechanism for multi-site cluster configurations, allowing vendors to build solutions without having to be concerned about quorum requirements.

  • Clusters of appliances: sets of off-the-shelf machines that have no shared disks, bound into a single cluster using some other data replication technique for application and/or user data.

In a majority node set cluster there is a single majority node set resource. It is responsible for committing changes to the quorum data stored on each cluster node. Each cluster node has its own copy of the data. To access the data on the other nodes in a cluster (i.e. the ones other than the node that the resource is hosted on), the Majority Node Set resource uses the file share infrastructure. When a Majority Node Set resource is created, a file share is created on each cluster node (and on nodes that are added to the cluster).

This file share is a hidden file share that has a name constructed as follows:

\\<node_name>\<resource_GUID>$

The default security setting for this file share is:

BUILTIN\Administrators:F (this folder, subfolders and files)
CREATOR OWNER:F (Subfolders and files only)

To ensure the correct operation of a majority node set cluster:

  • Do NOT delete the file share.

  • Do NOT remove the cluster service account from the set of accounts that have access to the share.

  • Do NOT change the default access controls for the file share.

  • The file share target is a directory in the %windir%\Cluster\MSCS directory. You should not change the default access permissions on the files or the directories.

  • You should never put other files into the MNS target directory %windir%\Cluster\MSCS\MNS.<guid-representation>. If you put other files in this directory, they will be deleted on cleanup if the MNS resource is deleted.

Upgrading to Windows Server 2003 considerations

Windows 2000 and Windows NT 4.0 have default security attributes on the %windir%\Cluster directory and the quorum directory that allow any authenticated user to read the contents. In Windows Server 2003, the security of these directories has been tightened to prevent non-administrator access altogether, ensuring that unauthorized users cannot gain information about the cluster configuration. On an upgrade to Windows Server 2003, however, these security attributes are not modified; therefore, on an upgraded Windows Server 2003 machine, all authenticated users may have read access to these directories. You may want to consider manually setting the permissions to conform to the minimum access requirements.
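As a sketch, the upgraded directories can be tightened with cacls, assuming the quorum disk is drive Q: (substitute your own drive letter). Remember that /G replaces the existing ACL:

```shell
REM Restrict the cluster directory and the quorum directory to local
REM Administrators and SYSTEM; /T applies the change to subdirectories
cacls %windir%\Cluster /T /G Administrators:F SYSTEM:F
cacls Q:\MSCS /T /G Administrators:F SYSTEM:F
```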

Developing Cluster-aware Applications

Resource DLLs run within the context of the cluster service account. This account has a number of elevated privileges as well as being a member of the local Administrators group on every node in the cluster. When developing a cluster-aware application, you should consider how best to split the application between the resource DLL and an independent service or executable to ensure that the application runs with the minimal privileges and rights necessary.

Running applications and services at unnecessary privilege levels introduces a potential security risk if that application is compromised.

Calling the Server cluster APIs

The Server cluster APIs are protected so that arbitrary, untrusted users cannot affect the state of the cluster or the availability of applications. The cluster service maintains a security descriptor in order to control access. Only accounts that are part of the security descriptor are able to call cluster APIs. (Strictly, the security descriptor controls which accounts can open a handle to the cluster; because the other cluster APIs require a handle to the cluster, this effectively limits access.)

There is one caveat with this mechanism. The cluster configuration mechanism provided in Windows Server 2003 requires that a user who adds a node to a cluster has local administrator rights on every node in the cluster. If the user does not have local administrator rights on the existing cluster nodes, that user will not be able to successfully add a new node to the cluster. Thus, to give an account administrative rights to a cluster, you should:

  1. Add the account to the cluster service security descriptor.

  2. Make sure that the account is a member of the local Administrators group on ALL machines in the cluster.

These operations can be done programmatically using the cluster APIs and the security APIs. See the platform SDK for more details.

Security checks are done by the OpenCluster API. If the call returns successfully, the caller has a valid handle and can make any changes to the cluster configuration or enumerate the cluster configuration. There is no notion of open for read or open for write.

Backup/Restore APIs

The backup path specified to the backup API should be protected. The data returned by the API contains cluster configuration information that can be used to extend the attack surface (for example, it lists all cluster IP addresses).

During a restore operation, the restore API takes the cluster configuration to inject. This data must be protected, as it will be used to restore the cluster by overwriting any configuration that currently exists.

Installing resources

Cluster-aware applications may install cluster resources. Care should be taken when installing resources: because they execute with elevated privileges, they must be protected against malicious attack by default when installed:

  • Ensure that resource DLLs are protected at the file system level to allow access only to the local Administrators group and the cluster service account.

  • Be extremely careful about any paths that are specified. There are several well-known issues with paths, especially paths that contain spaces (follow the recommendations on page 419 of Writing Secure Code). When in doubt, specify a fully qualified path to the resource DLL when creating the resource type.
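When registering the resource type, the fully qualified DLL path can be supplied through cluster.exe. A sketch, with hypothetical type and path names; check cluster restype /? for the exact option names on your build:

```shell
REM Register a custom resource type, giving the fully qualified path to
REM the resource DLL rather than a bare file name
cluster restype "My Resource Type" /create /dllname:"C:\Program Files\MyApp\MyResType.dll"
```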