Troubleshooting Network Load Balancing Clusters

Applies To: Windows Server 2008, Windows Server 2008 R2, Windows Server 2012

This section lists some common issues that you might encounter when using Network Load Balancing (NLB) clusters.

Note

The NLB functionality in Windows Server 2012 is generally the same as in Windows Server 2008 R2. However, some task details are changed in Windows Server 2012. For information on new ways to do tasks in Windows Server 2012, see Common Management Tasks and Navigation.

What problem are you having?

  • After installing Network Load Balancing and restarting a cluster host, a message appears: "The system has detected an IP address conflict with another system on the network..."

  • There is no response when you use ping to access the cluster's IP address from an outside network.

  • There is no response when using ping to access a host's dedicated IP addresses from another cluster host.

  • When attempting to use Network Load Balancing Manager to connect to a host in your cluster, you receive the error "Host unreachable."

  • When using Telnet or attempting to browse a computer outside the cluster from a cluster host, there is no response.

  • When invoking the Network Load Balancing remote control commands from a computer outside the cluster, there is no response from one or more cluster hosts.

  • There is no reply when you use the dedicated IP address of a host to specify it as a target for a remote control command. However, specifying the host by its priority (ID) works.

  • Connectivity to the cluster is denied to some users, but not all.

  • You cannot view or change the Network Load Balancing properties by using net config and Windows Management Instrumentation (WMI).

  • An unusual number of TCP connections to the cluster's IP address are being reset by the server or the client.

  • Virtual Private Network (VPN) calls fail when you make a change that causes convergence (such as adding a host, removing a host, or draining a host).

  • After the cluster hosts start, they begin converging, but they never complete convergence.

  • The cluster moves in and out of a converged state.

  • After the cluster hosts start, Network Load Balancing reports that convergence has finished, but more than one host is a default host.

  • Network Load Balancing is not load balancing applications, and the default host handles all the network traffic.

  • Traffic alternates unexpectedly between the cluster hosts, and it breaks TCP connections.

  • Network traffic does not appear to load balance evenly among the cluster hosts.

  • When you are using Network Load Balancing with Microsoft Internet Security and Acceleration (ISA) Server, one cluster host logs blocked packets that are directed to the dedicated Internet Protocol (IP) address of another host.

  • You are unable to create a Network Load Balancing cluster in a 64-bit version environment.

After installing Network Load Balancing and restarting a cluster host, a message appears: "The system has detected an IP address conflict with another system on the network..."
  • Cause: The same IP address already exists on the network.

  • Solution: Choose a new IP address, or remove the duplicate address.

  • Cause: You have configured different cluster operation modes (Unicast or Multicast) on the hosts, which causes two different MAC addresses to map to the same IP address.

  • Solution: Ensure that all hosts are configured with the same cluster operation mode.

  • Cause: You configured the cluster's IP address before NLB was bound to the network adapter.

  • Solution: Remove the cluster's IP address from TCP/IP properties, enable NLB on the proper adapter, and then configure the cluster's IP address.

  • Cause: You added the cluster's IP address to a network adapter that has not been enabled for NLB.

  • Solution: Remove the cluster's IP address from the incorrect adapter's TCP/IP properties, enable NLB on the proper adapter, and then configure the cluster's IP address.

For more information about enabling NLB, see Installing Network Load Balancing.
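
The last two causes above can be checked against a list of the addresses configured on each adapter. The following Python sketch assumes you have already gathered that inventory yourself. The `Adapter` record, the `find_conflicts` helper, and the host and adapter names are hypothetical and are not part of NLB. The sketch flags adapters that carry the cluster's IP address without NLB enabled, and any other address that appears on more than one adapter.

```python
from collections import defaultdict
from ipaddress import ip_address
from typing import NamedTuple


class Adapter(NamedTuple):
    host: str          # hypothetical host name
    name: str          # hypothetical adapter label
    nlb_enabled: bool  # True if NLB is bound to this adapter
    addresses: tuple   # IP addresses configured in TCP/IP properties


def find_conflicts(adapters, cluster_ip):
    """Return (misplaced, duplicates) for a hand-collected adapter inventory."""
    cluster = ip_address(cluster_ip)
    misplaced = []              # adapters holding the cluster IP without NLB
    owners = defaultdict(list)  # non-cluster address -> adapters that use it
    for adapter in adapters:
        for text in adapter.addresses:
            address = ip_address(text)
            if address == cluster:
                # The cluster IP is expected on every NLB-enabled adapter.
                if not adapter.nlb_enabled:
                    misplaced.append((adapter.host, adapter.name))
            else:
                owners[address].append((adapter.host, adapter.name))
    duplicates = {str(a): o for a, o in owners.items() if len(o) > 1}
    return misplaced, duplicates
```

An empty result only means that the inventory you supplied is consistent. It cannot detect a conflicting device that is missing from the inventory.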

There is no response when you use ping to access the cluster's IP address from an outside network.

Verify that you can use ping to access the dedicated IP addresses for the cluster hosts from a computer outside the router. If this test fails, and you are using multiple network adapters, the issue is not related to NLB. If you are using a single network adapter for the dedicated and cluster IP addresses, consider the following causes:

  • Cause: If you are using multicast support, you might find that your router has difficulty resolving the primary IP address into a multicast media access control (MAC) address by using the Address Resolution Protocol (ARP).

  • Solution: Verify that you can use ping to access the cluster from a client on the cluster's subnet and to access the cluster hosts' dedicated IP addresses from a computer outside the router. If these tests work properly, the router is probably at fault. You should be able to add a static ARP entry to the router to circumvent the issue. You can also turn off NLB multicast support and use a unicast network address without a hub.

  • Cause: When using NLB in multicast or unicast mode, routers need to accept proxy ARP responses (IP-to-network address mappings that are received with a different network source address in the Ethernet frame).

  • Solution: Make sure that your router has proxy ARP support turned on. You can also set a static ARP entry to keep proxy ARP support disabled in the router.

  • Cause: Internet Control Message Protocol (ICMP) to the cluster is blocked by a router or firewall.

  • Solution: Allow ICMP traffic through the router or firewall. Be aware that this may expose your system to additional security risk.
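
The ping checks described in this section can be read as a small decision table. The Python sketch below is one possible encoding of that reasoning, not an NLB tool. The `triage_cluster_ping` function and the `Finding` names are hypothetical, and the three inputs are results that you record by hand after running the tests.

```python
from enum import Enum


class Finding(Enum):
    NOT_NLB = "Dedicated addresses are unreachable on a multi-adapter host: not an NLB issue."
    ROUTER = "Router is probably at fault: try a static ARP entry or check proxy ARP support."
    REVIEW = "Review the remaining causes: ARP resolution, proxy ARP, and ICMP filtering."


def triage_cluster_ping(dedicated_from_outside: bool,
                        cluster_from_subnet: bool,
                        multiple_adapters: bool) -> Finding:
    # Dedicated IP addresses fail from outside the router and the hosts use
    # multiple adapters: the text treats this as unrelated to NLB.
    if not dedicated_from_outside and multiple_adapters:
        return Finding.NOT_NLB
    # Both checks pass, yet the cluster address still fails from outside:
    # the router is the probable culprit.
    if dedicated_from_outside and cluster_from_subnet:
        return Finding.ROUTER
    # Anything else is not decided by these two tests alone (an assumption
    # of this sketch, because the text does not cover every combination).
    return Finding.REVIEW
```

For example, `triage_cluster_ping(True, True, False)` returns `Finding.ROUTER`, which matches the case in which both tests succeed and only the cluster's IP address is unreachable from outside.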

There is no response when using ping to access a host's dedicated IP addresses from another cluster host.
  • Cause: When using NLB in multicast or unicast mode, routers need to accept proxy ARP responses (IP-to-network address mappings that are received with a different network source address in the Ethernet frame).

  • Solution: Make sure that your router has proxy ARP support turned on. You can also set a static ARP entry to keep proxy ARP support disabled in the router.

  • Cause: Internet Control Message Protocol (ICMP) to the cluster is blocked by a router or firewall.

  • Solution: Allow ICMP traffic through the firewall or router. Be aware that this may expose your system to additional security risk.

When attempting to use Network Load Balancing Manager to connect to a host in your cluster, you receive the error "Host unreachable."
  • Cause: Internet Control Message Protocol (ICMP) to the host is either blocked by a router or firewall or disabled on the host's network adapter.

  • Solution: Enable ICMP on the host's network adapter or allow ICMP traffic through the firewall or router. Be aware that this may expose your system to additional security risk. You can also use NLB Manager's /noping option.

When using Telnet or attempting to browse a computer outside the cluster from a cluster host, there is no response.
  • Cause: The host's dedicated IP address might not be listed first in the TCP/IP properties. This is the likely cause if you can use ping to access the computer outside the cluster.

  • Solution: List the host's dedicated IP address first in the TCP/IP properties. If ping fails to access the computer outside of the cluster, refer to the following issues (described earlier in this Troubleshooting topic):

    • There is no response when you use ping to access the cluster's IP address from an outside network.

    • There is no response when using ping to access a host's dedicated IP addresses from another cluster host.

When invoking the Network Load Balancing remote control commands from a computer outside the cluster, there is no response from one or more cluster hosts.
  • Cause: Remote control commands are not being sent to the cluster's IP address.

  • Solution: Commands must be sent to the cluster's primary IP address, which was assigned in the Network Load Balancing Properties dialog box. Be sure that you send remote commands to the correct IP address.

  • Cause: The remote control traffic is being encrypted by Internet Protocol security (IPSec). NLB remote control commands do not work correctly when the sending computer is configured so that IPSec encrypts this traffic.

  • Solution: Disable IPSec.

    For more information, see the Internet Protocol Security (IPSec) Help content.

  • Cause: NLB UDP control ports are protected incorrectly by a firewall. By default, remote control commands are sent to UDP ports 1717 and 2504 at the cluster IP address.

  • Solution: Be sure that these ports have not been blocked incorrectly by a router or firewall. You can also change the port number by modifying the corresponding NLB parameter.

There is no reply when you use the dedicated IP address of a host to specify it as a target for a remote control command. However, specifying the host by its priority (ID) works.

Connectivity to the cluster is denied to some users, but not all.
  • Cause: An application that is being load balanced is not responding.

  • Solution: This is an application-specific issue that is not related to NLB. Refer to your application's documentation to correct this issue. You may need to stop and restart the application.

  • Cause: If your cluster is configured for unicast mode, a switch might have learned the NLB network adapter's MAC address.

  • Solution: Clear the switch's port-to-MAC address mapping.

  • Cause: The cluster's IP address was not added to TCP/IP on one or more of the hosts.

  • Solution: If you do not use NLB Manager to configure your cluster, you must manually configure TCP/IP with the cluster's IP address.

  • Cause: A host is leaving the cluster because of a drainstop or stop command, but convergence did not complete correctly.

  • Solution: Wait for the convergence to complete. If the convergence does not complete, see the following issue later in this Troubleshooting topic:

    After the cluster hosts start, they begin converging, but they never complete convergence.

You cannot view or change the Network Load Balancing properties by using net config and Windows Management Instrumentation (WMI).
  • Cause: To view or change Network Load Balancing properties, you must be a member of the Administrators group.

  • Solution: Log on as a user who is in the local Administrators group of the computer that is running NLB.

An unusual number of TCP connections to the cluster's IP address are being reset by the server or the client.
  • Cause: HTTP keep-alive values are enabled on the NLB hosts, and clients that also use keep-alive values are connecting to the cluster.

  • Solution: Disable HTTP keep-alive values. For more information about HTTP keep-alive values and Internet Information Services (IIS), refer to the IIS documentation set.

    To view the IIS documentation set from your desktop, install IIS, then click Start, click Run, and type the following command in the Open text box:

    %windir%\help\iisrv.chm

  • Cause: Low system resources on the server are causing TCP to reject the connections.

  • Solution: Free system resources by, for example, adding system memory or closing unnecessary applications.

  • Cause: The cluster has diverged into two separately converged clusters, which causes more than one node to claim ownership of every connection.

  • Solution: Remove the two clusters, then recreate a single cluster.

Virtual Private Network (VPN) calls fail when you make a change that causes convergence (such as adding a host, removing a host, or draining a host).
  • Cause: When using NLB to load balance VPN traffic, you must configure the port rules that govern the ports handling the VPN traffic (TCP port 1723 for PPTP/GRE and UDP port 500 for IPSec/L2TP) to use either Single or Network affinity.

  • Solution: Configure the port rules that govern ports 500 and 1723 to use Single or Network affinity. For more information, see Network Load Balancing Manager Properties.
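
As a rough illustration of this check, the Python sketch below scans a list of port rules for the two VPN ports named above and reports any rule whose affinity is not Single or Network. The `PortRule` record and the `vpn_affinity_problems` helper are hypothetical. They model port rules that you have transcribed from the Network Load Balancing Properties dialog box and do not call any NLB API. Reporting a port with no rule at all is an extra assumption of this sketch.

```python
from typing import NamedTuple

# Ports taken from the text above: PPTP control channel and IPSec key exchange.
VPN_PORTS = (("TCP", 1723), ("UDP", 500))
ALLOWED_AFFINITY = {"Single", "Network"}


class PortRule(NamedTuple):
    start: int     # first port in the rule's range
    end: int       # last port in the rule's range (inclusive)
    protocol: str  # "TCP", "UDP", or "Both"
    affinity: str  # "None", "Single", or "Network"


def vpn_affinity_problems(rules):
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []
    for protocol, port in VPN_PORTS:
        matching = [r for r in rules
                    if r.start <= port <= r.end and r.protocol in (protocol, "Both")]
        if not matching:
            problems.append(f"{protocol} {port}: no port rule")
        for rule in matching:
            if rule.affinity not in ALLOWED_AFFINITY:
                problems.append(f"{protocol} {port}: affinity is {rule.affinity}")
    return problems
```

Run the check against the rules from each host, because every host must carry the same port rules.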

After the cluster hosts start, they begin converging, but they never complete convergence.
  • Cause: A different number of port rules or incompatible port rules on different cluster hosts were entered. This will inhibit convergence.

  • Solution: Open the Network Load Balancing Properties dialog box on each cluster host and verify that all hosts have identical port rules.

  • Cause: You have a bad network adapter or cable.

  • Solution: Use the ping command to test connectivity. Enter the host's fully qualified domain name. To learn more about the issue, you can also use the ping command to reach your domain controller by IP address and other network servers by name and by IP address.

  • Cause: Duplex settings on a switch or hub are mismatched.

  • Solution: Confirm that the duplex settings in each of your switches and hubs are configured appropriately.

  • Cause: The dedicated IP address that you used for one of the hosts already exists on the network.

  • Solution: Choose a new IP address, or remove the duplicate address.

  • Cause: Your cluster contains hosts that are running Windows 2000.

  • Solution: Your cluster must be running Windows Server 2008 on all hosts. An NLB cluster environment that contains hosts with Windows Server 2003 and Windows Server 2008 is supported only when performing a rolling upgrade to Windows Server 2008. Mixing Windows Server 2003 and Windows Server 2008 in the same cluster is not supported for long periods of time.

  • Cause: You have configured different cluster operation modes (unicast and multicast) on the hosts.

  • Solution: Use NLB Manager to ensure that all hosts are configured with the same cluster operation mode.

Note

You can also view the Windows event logs to check for errors and warnings. For more information, see Installing Network Load Balancing.
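
Two of the causes in this section, mismatched port rules and mismatched cluster operation modes, come down to comparing settings across hosts. The Python sketch below shows one way to run that comparison on settings that you have written down for each host. The `convergence_blockers` helper, the dictionary layout, and the host names are hypothetical, and the sketch does not read anything from NLB itself.

```python
def convergence_blockers(hosts):
    """Compare hand-collected settings across hosts.

    `hosts` maps a host name to {"mode": str, "port_rules": iterable of tuples}.
    Each port rule tuple should hold only the fields you expect to match on
    every host, for example (start, end, protocol, affinity).
    """
    if not hosts:
        return []
    findings = []
    # All hosts must use the same cluster operation mode.
    modes = {config["mode"] for config in hosts.values()}
    if len(modes) > 1:
        findings.append("mixed cluster operation modes: " + ", ".join(sorted(modes)))
    # Use the alphabetically first host as the reference for port rules.
    rule_sets = {name: frozenset(config["port_rules"]) for name, config in hosts.items()}
    reference_name = min(rule_sets)
    for name in sorted(rule_sets):
        if rule_sets[name] != rule_sets[reference_name]:
            findings.append(f"{name}: port rules differ from {reference_name}")
    return findings
```

The reference host is an arbitrary choice. A reported difference means only that two hosts disagree, not which of them is correct.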

The cluster moves in and out of a converged state.
  • Cause: Heartbeats are being missed due to intermittent network connectivity caused by a bad network adapter or cable or other network problems.

  • Solution: Use the ping command to test connectivity. Enter the host's fully qualified domain name. To learn more about the issue, you can also use the ping command to reach your domain controller by IP address and other network servers by name and by IP address.

After the cluster hosts start, Network Load Balancing reports that convergence has finished, but more than one host is a default host.
  • Cause: The cluster hosts have become members of different subnets, so all the hosts are not accessible on the same network.

  • Solution: Be sure that all cluster hosts can communicate with each other.

  • Cause: A layer-three switch is being used.

  • Solution: Put a layer-two switch between the hosts and the layer-three switch.

  • Cause: A break in a redundant switch caused the cluster to separate into two clusters, creating two default hosts.

  • Solution: Remove the two clusters, then create a single cluster.

  • Cause: Your switch is configured to reject broadcast packets.

  • Solution: Configure your switch to accept broadcast packets (be aware that this might introduce certain security risks), or configure your NLB cluster to use multicast mode.

  • Cause: One host is unable to send or receive heartbeats.

  • Solution: Use the ping command to test connectivity to each of the hosts. Enter each host's fully qualified domain name.

  • Cause: A host is plugged into the wrong port on the switch.

  • Solution: Use the correct port on the switch.

Network Load Balancing is not load balancing applications, and the default host handles all the network traffic.
  • Cause: A port rule is missing. By default, NLB directs all incoming network traffic that is not governed by port rules to the default host—this ensures that applications that you do not want load balanced behave properly.

  • Solution: To load balance an application across the cluster, create a port rule on every cluster host for the TCP/IP ports that are serviced by the application.

  • Cause: You added a second host to a single host cluster, but the second host is not configured correctly. The cluster never converges and the original host continues to handle all of the traffic.

  • Solution: Carefully review (and if necessary, correct) each of the settings on the second host—for example, the cluster IP address, dedicated IP address, and port rules.

  • Cause: If your cluster is configured for unicast mode, a switch might have learned the NLB network adapter's MAC address.

  • Solution: Clear the switch's port-to-MAC address mapping.

  • Cause: A proxy server is sending all connections that are using a single IP address to your cluster in single affinity mode.

  • Solution: Configure your proxy server to use multiple IP addresses.
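
For the missing port rule case, it can help to list which application ports fall outside every configured rule, because traffic to those ports goes to the default host. The following Python sketch is a simplified illustration. The `ports_without_rules` helper is hypothetical, it works on port ranges that you supply by hand, and it ignores the protocol (TCP or UDP) of each rule.

```python
def ports_without_rules(app_ports, port_rules):
    """Return the application ports that no port rule range covers.

    `port_rules` is an iterable of (start, end) pairs with inclusive bounds.
    """
    rules = list(port_rules)
    return sorted(
        port for port in set(app_ports)
        if not any(start <= port <= end for start, end in rules)
    )
```

For instance, an application that listens on ports 80, 443, and 8080 with rules for only 80 and 443 yields `[8080]`, the port whose traffic stays on the default host.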

Traffic alternates unexpectedly between the cluster hosts, and it breaks TCP connections.
  • Cause: Unicast network addresses are causing issues with the switching hub. If you are using a switching hub to interconnect the cluster hosts, you must use NLB multicast support. Otherwise, the switch can behave erratically when the same unicast network address is used on multiple switch ports.

  • Solution: Check that you have selected multicast support in the Network Load Balancing Properties dialog box. If you do not want to use multicast support, you can interconnect the cluster hosts with a hub or coaxial cable instead of with a switch.

Network traffic does not appear to load balance evenly among the cluster hosts.
  • Cause: The network traffic is coming from a limited number of IP addresses, possibly due to the setting on a proxy server.

  • Solution: Configure your proxy server to use multiple IP addresses.
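
The effect of a small set of source addresses can be illustrated with a toy model. The Python sketch below maps each client IP address to a host with a generic hash. This is a stand-in chosen for illustration and is not the algorithm that NLB uses. The `pick_host` and `distribution` names are hypothetical.

```python
import hashlib
from collections import Counter


def pick_host(client_ip: str, host_count: int) -> int:
    """Map a client IP to a host index with a stand-in hash (not NLB's own)."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % host_count


def distribution(client_ips, host_count: int) -> Counter:
    """Count how many connections land on each host index."""
    return Counter(pick_host(ip, host_count) for ip in client_ips)


# One proxy address: every connection lands on the same host.
behind_proxy = distribution(["192.0.2.10"] * 1000, host_count=4)

# Many distinct client addresses: connections spread across the hosts.
direct = distribution([f"198.51.100.{n}" for n in range(1, 255)], host_count=4)
```

When every connection arrives from the same source address, any mapping that depends only on that address sends them all to one host. This is why the text suggests configuring the proxy server to use multiple IP addresses.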

When you are using Network Load Balancing with Microsoft Internet Security and Acceleration (ISA) Server, one cluster host logs blocked packets that are directed to the dedicated Internet Protocol (IP) address of another host.
  • Cause: One of the cluster hosts is configured with a host priority identifier equal to 1.

  • Solution: Do not configure any cluster host with a host priority identifier of 1. Use numbers that are greater than 1. For more information, see Configure Network Load Balancing Host Parameters.

You are unable to create a Network Load Balancing cluster in a 64-bit version environment.
  • Cause: You might not be running the appropriate NLB version for your environment. NLB cannot form a cluster when the 32-bit version of NLB is used on a 64-bit computer. This issue might have gone undetected because 32-bit NLB components (nlb.exe, wlbs.exe, and nlbmgr.exe) appear to run correctly in the 64-bit environment.

  • Solution: If you plan to use a 64-bit computer environment, you must use the 64-bit NLB version.
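
One way to tell whether a given copy of an NLB component is a 32-bit binary is to read the machine type from its executable header. The Python sketch below parses that field from the leading bytes of a Windows executable. The `pe_machine` helper is hypothetical and not an NLB tool, and it assumes that the bytes you pass in include the PE header.

```python
import struct

# Machine type values from the Windows executable (PE/COFF) header.
MACHINE_NAMES = {
    0x014C: "x86 (32-bit)",
    0x8664: "x64 (64-bit)",
    0x0200: "Itanium (64-bit)",
}


def pe_machine(header: bytes) -> str:
    """Return the target architecture recorded in an executable's header."""
    if header[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # Offset 0x3C of the DOS header stores the position of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", header, 0x3C)
    if header[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 2-byte machine field follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", header, pe_offset + 4)
    return MACHINE_NAMES.get(machine, f"unknown (0x{machine:04x})")
```

To use it, read the first few kilobytes of the file in binary mode and pass them to `pe_machine`. A result of `x86 (32-bit)` for nlb.exe, wlbs.exe, or nlbmgr.exe on a 64-bit computer would point to the cause described above.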

Note

The following topics describe several common issues that you might encounter when installing and initially using NLB. The topics describe the likely reasons for each issue and one or more suggested remedies. These topics assume that your system and applications meet the minimum requirements for NLB. For more information, see Overview of Network Load Balancing and Installing Network Load Balancing.

You should test your network and all network adapters for proper operation before installing NLB. Be sure to follow all installation steps, and check that the cluster parameters and port rules are identically set for all cluster hosts. If an issue occurs, always check the Windows event log for a message from the NLB driver. For more information, see the sections titled Cluster parameters, Host parameters, and Port rules in Network Load Balancing Manager Properties.