Troubleshooting Cluster Communications
Updated: December 5, 2005
Applies To: Windows Compute Cluster Server 2003
The following topics discuss loss of cluster communication due to incorrect setup or administrative actions.
Remote Desktop issues
In clusters that have both public and private network interfaces (topology scenarios 2, 4, and 5), losing connectivity on the public interface of a compute node can cause communication problems between the head node and that compute node. For example, you might not be able to create a Remote Desktop Connection from the head node to the compute node. This can occur when the head node resolves the DNS name of the compute node using a DNS server on the public network: the DNS host address record ("A" record) used to resolve the node name points to an address that is now inoperative. To investigate whether this is occurring, attempt a remote connection from the head node to the target compute node using the IP address of the compute node's private network interface. If this attempt succeeds, the head node is resolving the DNS name using a DNS host address record on the public network that is no longer valid.
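The check described above can be sketched in Python as follows. This is a minimal illustration, not part of Windows Compute Cluster Server 2003: the function names, the node name, and the private address used in the usage note are hypothetical, and the sketch assumes that Remote Desktop listens on its default TCP port, 3389.

```python
import socket

RDP_PORT = 3389  # assumption: Remote Desktop is listening on its default port


def resolve_name(hostname):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


def can_connect(target, port=RDP_PORT, timeout=3.0):
    """Return True if a TCP connection to target:port can be opened."""
    try:
        with socket.create_connection((target, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(node_name, private_ip, connect=can_connect):
    """Compare a connection by DNS name with a connection by private IP address."""
    if connect(node_name):
        return "ok"
    if connect(private_ip):
        # The node answers on its private interface but not by name, which
        # points at the DNS host address record rather than at the node.
        return "dns-suspect"
    return "node-unreachable"
```

Run from the head node, a call such as diagnose("node01", "10.0.0.5") that returns "dns-suspect" matches the situation described above, and resolve_name("node01") shows which address the name currently resolves to.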
Firewalls and Windows Compute Cluster Server 2003
When you create a head node, the Beta 2 version of Windows Compute Cluster Server 2003 does not activate firewalls on either the public or private interfaces by default. However, if an administrator activates a firewall on either the public or private network interfaces of the cluster nodes, all ports are monitored. This disrupts communication between the head node and compute nodes, as well as other cluster communications.
If you have enabled firewall services for the private network interface of the head node, PXE boot will fail, and adding compute nodes using the Automated Addition method will therefore fail as well. The Compute Cluster Administrator does not warn the user that PXE boot is failing. For this reason, do not enable firewall services on the private cluster network. Alternatively, turn off the firewall while adding compute nodes using the Automated Addition method.
Duplicate GUIDs cause problems with the Automated Addition method
Each computer is usually assigned a unique GUID by the original equipment manufacturer (OEM). However, in rare cases, OEMs assign the same GUID to multiple computers.
When compute nodes have duplicate GUIDs, the Automated Addition method of adding nodes will fail. This failure occurs because Remote Installation Services (RIS) uses the node's GUID when creating a new computer account in Active Directory. If any compute nodes have duplicate GUIDs, RIS will not be able to create unique computer accounts in Active Directory for each compute node. As a result, automated installation will fail.
The computer GUID is displayed during the PXE boot phase of computer startup. If you find duplicate GUIDs among the computers that you intend to use as nodes, access the head node and edit the registry. Add the duplicated GUID to a registry entry named BannedGuids located under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\BINLSVC\Parameters.
Note: Modify this registry setting (that is, add a GUID to BannedGuids) only while the Compute Cluster Management snap-in is closed. After you make the modification, you can open the management snap-in again and initiate automated deployments of compute nodes.
If RIS detects any of the GUIDs in this list during PXE boot, it automatically uses the MAC address of the private network adapter together with the last 12 digits of the GUID to create a unique identifier for the computer. RIS then creates the computer account in Active Directory using this identifier.
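If you record the GUID shown during PXE boot for each computer, you can check the list for duplicates before starting an automated deployment. The following Python sketch is one possible way to do this. The function names and the sample inventory are hypothetical, and the normalization (removing braces and hyphens, uppercasing) is used only to compare GUIDs with each other. This topic does not specify the format that the BannedGuids registry setting expects, so the sketch makes no assumption about it.

```python
from collections import defaultdict


def normalize_guid(guid):
    """Reduce a GUID string to uppercase hex digits so that variants compare equal."""
    return guid.strip().strip("{}").replace("-", "").upper()


def find_duplicate_guids(inventory):
    """Return {normalized_guid: [computer names]} for GUIDs seen more than once.

    inventory is an iterable of (computer_name, guid) pairs collected by hand
    from the PXE boot screen of each computer.
    """
    by_guid = defaultdict(list)
    for name, guid in inventory:
        by_guid[normalize_guid(guid)].append(name)
    return {guid: names for guid, names in by_guid.items() if len(names) > 1}


# Hypothetical sample data: NODE01 and NODE02 share a GUID.
sample_inventory = [
    ("NODE01", "{12345678-1234-1234-1234-1234567890AB}"),
    ("NODE02", "12345678-1234-1234-1234-1234567890ab"),
    ("NODE03", "87654321-4321-4321-4321-BA0987654321"),
]
duplicates = find_duplicate_guids(sample_inventory)
```

Any GUID that appears in the result is shared by more than one computer and is a candidate for the BannedGuids list described above.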