How Network Load Balancing works

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2

Network Load Balancing provides high availability and scalability of servers using a cluster of two or more host computers working together. Internet clients access the cluster using either a single IP address or a set of addresses. The clients are unable to distinguish the cluster from a single server, and server applications are not aware that they are running in a cluster. However, a Network Load Balancing cluster differs significantly from a single host running a single server application because it can provide uninterrupted service even if a cluster host fails. The cluster can also respond more quickly to client requests than a single host.

Network Load Balancing delivers high availability by redirecting incoming network traffic to working cluster hosts if a host fails or is offline. Existing connections to an offline host are lost, but the Internet services remain available. In most cases (for example, with Web servers), client software automatically retries the failed connections, and the clients experience a delay of only a few seconds in receiving a response.

Network Load Balancing delivers scaled performance by distributing among the cluster hosts the incoming network traffic addressed to one or more virtual IP addresses (the cluster IP addresses) assigned to the Network Load Balancing cluster. The hosts in the cluster then concurrently respond to different client requests, even multiple requests from the same client. For example, a Web browser might obtain each of the multiple images in a single Web page from different hosts within a Network Load Balancing cluster. This speeds up processing and shortens the response time to clients.

Network Load Balancing enables all cluster hosts on a single subnet to concurrently detect incoming network traffic for the cluster IP addresses. On each cluster host, the Network Load Balancing driver acts as a filter between the cluster adapter driver and the TCP/IP stack in order to distribute the traffic across the hosts.

Network Load Balancing employs a fully distributed algorithm to statistically map incoming clients to the cluster hosts based on their IP address and port. No communication between the hosts is necessary for this process to occur. When inspecting an arriving packet, all hosts simultaneously perform this mapping to quickly determine which host should handle the packet. The mapping remains invariant unless the number of cluster hosts changes. The Network Load Balancing filtering algorithm is much more efficient in its packet handling than centralized load-balancing applications, which must modify and retransmit packets.
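The idea that every host can reach the same decision without exchanging messages can be illustrated with a minimal Python sketch. It is not the actual Network Load Balancing algorithm, which this article does not specify. The hash function (SHA-256), the function names `owner_index` and `should_accept`, and the sample addresses are all assumptions chosen for illustration.

```python
import hashlib


def owner_index(client_ip: str, client_port: int, host_count: int) -> int:
    """Map a client (IP address, port) to a host slot in 0..host_count-1.

    Illustrative only: SHA-256 stands in for the real, unspecified mapping.
    """
    key = f"{client_ip}:{client_port}".encode("ascii")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % host_count


def should_accept(my_index: int, client_ip: str, client_port: int,
                  host_count: int) -> bool:
    """Run locally by each host on every arriving packet.

    A host keeps the packet only if the shared mapping names it as owner;
    all other hosts reach the same conclusion and discard the packet.
    """
    return owner_index(client_ip, client_port, host_count) == my_index


if __name__ == "__main__":
    hosts = 3
    samples = [("192.0.2.10", 50123), ("192.0.2.11", 50124), ("198.51.100.7", 443)]
    for ip, port in samples:
        accepted_by = [h for h in range(hosts) if should_accept(h, ip, port, hosts)]
        print(ip, port, "-> accepted by host slot", accepted_by[0])
```

Because the result depends only on the client address, the port, and the number of hosts, the mapping in this sketch stays fixed until cluster membership changes, which mirrors the invariance described above.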

Distribution of cluster traffic

Network Load Balancing controls the distribution of TCP and UDP traffic from the Internet clients to selected hosts within a cluster as follows: After Network Load Balancing has been configured, incoming client requests to the cluster IP addresses are received by all hosts within the cluster. Network Load Balancing filters incoming datagrams to specified TCP and UDP ports before these datagrams reach the TCP/IP protocol software. Network Load Balancing manages the TCP and UDP protocols within TCP/IP, controlling their actions on a per-port basis.

In multicast mode, Network Load Balancing can limit switch flooding by providing Internet Group Management Protocol (IGMP) support. Network Load Balancing does not control any incoming IP traffic other than TCP and UDP traffic for specified ports and IGMP traffic in multicast mode. It does not filter other IP protocols (for example, ICMP or ARP), except as described below. As a result, expect duplicate responses from certain point-to-point TCP/IP applications (such as ping) when they target the cluster IP address, because every host receives and answers the request. To avoid this behavior, these applications can use the dedicated IP address of an individual host.
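The filtering scope described above can be sketched as a small classifier. This is a simplified illustration, not the driver's real logic: the `PortRule` type, its field names, and the `is_controlled` function are hypothetical, and IGMP handling in multicast mode is left out.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PortRule:
    """Hypothetical port rule: a protocol plus an inclusive port range."""
    protocol: str      # "TCP" or "UDP"
    first_port: int
    last_port: int


def is_controlled(protocol: str, dest_port: Optional[int],
                  rules: Iterable[PortRule]) -> bool:
    """Return True if the datagram is TCP or UDP traffic for a specified port.

    False means the load balancer does not control the datagram, so every
    host passes it up to its own TCP/IP stack (which is why ping to the
    cluster address can produce duplicate replies).
    """
    if protocol not in ("TCP", "UDP") or dest_port is None:
        return False
    return any(
        rule.protocol == protocol and rule.first_port <= dest_port <= rule.last_port
        for rule in rules
    )


if __name__ == "__main__":
    rules = [PortRule("TCP", 80, 80), PortRule("TCP", 443, 443), PortRule("UDP", 5000, 5010)]
    print(is_controlled("TCP", 80, rules))     # Web traffic on a specified port
    print(is_controlled("ICMP", None, rules))  # ping is not filtered
```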

Convergence

To coordinate their actions, Network Load Balancing hosts periodically exchange heartbeats within the cluster (for more information, see Internet Group Management Protocol (IGMP)). IP multicasting allows the hosts to monitor the status of the cluster. When the state of the cluster changes (such as when hosts fail, leave, or join the cluster), Network Load Balancing invokes a process known as convergence, in which the hosts exchange a limited number of messages to determine a new, consistent state of the cluster and to designate the host with the highest host priority as the new default host. When all cluster hosts have reached consensus on the correct new state of the cluster, they record the completion of convergence in the Windows event log. This process typically takes less than 10 seconds to complete.

During convergence, the remaining hosts continue to handle incoming network traffic. Client requests to working hosts are unaffected. At the completion of convergence, the traffic destined for a failed host is redistributed to the remaining hosts. Load-balanced traffic is repartitioned among the remaining hosts to achieve the best possible new load balance for specific TCP or UDP ports.
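The outcome of convergence after a host failure can be sketched as follows. This is a toy model under stated assumptions: hosts are identified by their priority numbers, a lower number is assumed to mean a higher priority, and traffic is repartitioned with a simple hash-and-modulo scheme. The function names `default_host` and `partition` are hypothetical, and the message exchange between hosts is not modeled.

```python
import hashlib
from typing import Dict, Iterable, List


def default_host(live_priorities: Iterable[int]) -> int:
    """Pick the default host among the live hosts.

    Assumption: the lowest priority number is the highest host priority.
    """
    return min(live_priorities)


def partition(clients: Iterable[str], live_priorities: Iterable[int]) -> Dict[str, int]:
    """Assign each client address to one live host (illustrative hashing only)."""
    hosts: List[int] = sorted(live_priorities)
    assignment: Dict[str, int] = {}
    for client in clients:
        digest = hashlib.sha256(client.encode("ascii")).digest()
        assignment[client] = hosts[int.from_bytes(digest[:8], "big") % len(hosts)]
    return assignment


if __name__ == "__main__":
    clients = [f"192.0.2.{n}" for n in range(1, 9)]
    before = partition(clients, {1, 2, 3})
    after = partition(clients, {2, 3})  # host 1 has failed
    print("default host before:", default_host({1, 2, 3}),
          "after:", default_host({2, 3}))
    moved = [c for c in clients if before[c] != after[c]]
    print(len(moved), "of", len(clients), "clients were remapped")
```

After the failed host is removed from the live set, every client maps to a surviving host. A plain modulo scheme like this one may also move clients that were on healthy hosts, so it should be read as a model of the end state rather than of how the real algorithm limits remapping.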

If a host is added to the cluster, convergence allows this host to receive its share of the load-balanced traffic. Expansion of the cluster does not affect ongoing cluster operations and is achieved transparently to both Internet clients and to server applications. However, it might affect client sessions that span multiple TCP connections when client affinity is selected, because clients might be remapped to different cluster hosts between connections. For more information on affinity, see Network Load Balancing and stateful connections.

Network Load Balancing assumes that a host is functioning properly within the cluster as long as it exchanges heartbeats with other cluster hosts. If the other hosts do not receive a heartbeat from a member for several consecutive heartbeat exchange periods, they initiate convergence to redistribute the load that would have been handled by the failed host.

You can control both the message exchange period and the number of missed messages required to initiate convergence. The default values are 1,000 milliseconds (1 second) and 5 missed message exchange periods, respectively. Because these parameters are not usually modified, they are not configurable through the Network Load Balancing Properties dialog box. They can be adjusted manually in the registry as necessary. The procedures for this are described in Adjust convergence parameters.
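To show how the two parameters interact, here is a minimal sketch of heartbeat-based failure detection using the default values above. The `HeartbeatMonitor` class and its method names are hypothetical, and treating "5 missed periods" as 5 x 1,000 ms of silence is a simplifying assumption about how the real driver counts missed messages.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Tracks when each peer was last heard from (times in milliseconds)."""
    period_ms: int = 1000       # default message exchange period
    missed_limit: int = 5       # default number of missed periods tolerated
    last_seen_ms: Dict[str, int] = field(default_factory=dict)

    def record(self, host: str, now_ms: int) -> None:
        """Note that a heartbeat from `host` arrived at time `now_ms`."""
        self.last_seen_ms[host] = now_ms

    def detection_window_ms(self) -> int:
        """Silence longer than this marks a host as failed (assumed model)."""
        return self.period_ms * self.missed_limit

    def silent_hosts(self, now_ms: int) -> List[str]:
        """Return hosts whose silence would trigger convergence."""
        window = self.detection_window_ms()
        return sorted(
            host for host, seen in self.last_seen_ms.items()
            if now_ms - seen >= window
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor()
    monitor.record("host-a", 0)
    monitor.record("host-b", 0)
    monitor.record("host-a", 4000)       # host-b has gone quiet
    print(monitor.detection_window_ms())
    print(monitor.silent_hosts(5000))
```

Under this model, lowering either value shortens the time needed to notice a failed host, at the cost of more heartbeat traffic or a greater chance of starting convergence after a brief network hiccup.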