The Cable Guy
TCP Receive Window Auto-Tuning
Welcome to the first installment of The Cable Guy in TechNet Magazine. Fans of the column on the TechNet Web site already know we cover all manner of networking issues, and we'll continue that tradition here each month. If you're new and looking for an archive of previous columns, head over to the Cable Guy site.
Now let's get started with our first topic here: the TCP Receive Window.
Throughput over TCP connections can be limited by sending and receiving applications, sending and receiving implementations of TCP, and the transmission path between the TCP peers. In this column I'll describe the TCP receive window and its impact on TCP throughput, the use of TCP window scaling, and the new Receive Window Auto-Tuning feature in Windows Vista™ and Windows Server® 2008 that optimizes TCP throughput for received data.
The TCP Receive Window
TCP connections have a number of important characteristics. First, they are a logical point-to-point circuit between two Application Layer protocols. TCP does not supply a one-to-many delivery service; it provides only one-to-one delivery.
Second, TCP connections are connection-oriented. Before data can be transferred, two Application Layer processes must formally negotiate a TCP connection using the TCP connection establishment process. Similarly, TCP connections are formally closed after negotiation using the TCP connection termination process.
Third, reliable data sent on a TCP connection is sequenced and a positive acknowledgment is expected from the receiver. If a positive acknowledgment is not received, the segment is retransmitted. At the receiver, duplicate segments are discarded and segments arriving out of sequence are placed back in the proper order.
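The receiver-side behavior just described (discard duplicates, put out-of-order segments back in order, acknowledge cumulatively) can be sketched as a small reassembly buffer. This is a toy model, not how any real TCP stack stores segments; the class name `ReassemblyBuffer` is hypothetical, and it assumes buffered segments never partially overlap one another.

```python
class ReassemblyBuffer:
    """Toy receiver: buffers out-of-order segments, drops duplicates,
    and delivers bytes to the application strictly in sequence order.
    Assumption: buffered segments never partially overlap each other."""

    def __init__(self, initial_seq: int) -> None:
        self.next_seq = initial_seq           # next in-order byte expected
        self.pending: dict[int, bytes] = {}   # out-of-order segments keyed by seq
        self.delivered = bytearray()          # in-order byte stream for the app

    def receive(self, seq: int, data: bytes) -> int:
        """Accept one segment and return the cumulative ACK number."""
        end = seq + len(data)
        if end <= self.next_seq:
            return self.next_seq              # entirely duplicate: discard it
        if seq < self.next_seq:
            data = data[self.next_seq - seq:] # trim the already-received prefix
            seq = self.next_seq
        self.pending.setdefault(seq, data)    # ignore a duplicate buffered copy
        # Deliver every buffered segment that now lines up with next_seq.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            self.delivered += chunk
            self.next_seq += len(chunk)
        return self.next_seq
```

For example, a segment starting at 1005 that arrives before the one starting at 1000 is held back; the ACK number stays at 1000 until the gap is filled, and a retransmitted duplicate changes nothing.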
Fourth, TCP connections are full-duplex. For each TCP peer, the TCP connection consists of two logical pipes: an outgoing pipe and an incoming pipe. The TCP header contains both the sequence number of the outgoing data and an acknowledgment (ACK) of the incoming data.
In addition, TCP views the data sent over the incoming and outgoing logical pipes as a continuous stream of bytes. The sequence number and acknowledgment number in each TCP header are defined along byte boundaries. TCP is not aware of record or message boundaries within the byte stream. The Application Layer protocol must provide the proper parsing of the incoming byte stream.
To limit the amount of data that can be sent at any one time and to provide receiver-side flow control, TCP peers use a window. The window is the span of data on the byte stream that the receiver permits the sender to send. The sender can send only the bytes of the byte stream that lie within the window. The window slides along the sender's outbound byte stream and the receiver's inbound byte stream.
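From the sender's side, the window reduces to simple arithmetic on sequence numbers. The sketch below, with a hypothetical function name and parameters, computes how many new bytes still lie inside the window: the right edge is the lowest unacknowledged byte plus the advertised window, and anything already sent up to that edge is used up.

```python
def usable_window(last_ack: int, advertised_window: int, next_seq: int) -> int:
    """Bytes the sender may still transmit without waiting for another ACK.

    last_ack:          lowest unacknowledged sequence number (left edge)
    advertised_window: window size most recently advertised by the receiver
    next_seq:          sequence number of the next new byte to send
    """
    right_edge = last_ack + advertised_window  # last byte the receiver permits, plus one
    return max(0, right_edge - next_seq)       # never negative if the window shrank
```

With `last_ack=1000`, a 4,000-byte window, and bytes sent up to 3000, 2,000 bytes remain usable; when an ACK for 3000 arrives with the same window, the whole window slides right and 4,000 bytes become usable again.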
For a given logical pipe (one direction of the full-duplex TCP connection) the sender maintains a send window and the receiver maintains a receive window. When there are no data or ACK segments in transit, a logical pipe's send and receive windows are matched. In other words, the span of data in the outbound byte stream that the sender is allowed to send is matched to the span of data in the inbound byte stream that the receiver is able to receive. Figure 1 illustrates this send and receive relationship.
Figure 1 Matching Send and Receive Windows
To indicate the size of the receive window, the TCP header contains a 16-bit Window field. When the receiver gets data, it sends ACKs back to the sender indicating the successfully received bytes. In each ACK, the Window field notes the number of bytes remaining in the receive window. When data is sent, acknowledged, and retrieved by the application, both the send and receive windows slide to the right. The receive window is the window that controls how much unacknowledged data can be in flight from the sender to the receiver.
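To make the 16-bit Window field concrete, here is a minimal sketch that reads it from a raw 20-byte TCP header using the standard header layout (source port, destination port, sequence number, acknowledgment number, data offset and flags, Window, checksum, urgent pointer), all in network byte order. The function name is illustrative; it ignores TCP options and any window scaling.

```python
import struct

def tcp_window_field(header: bytes) -> int:
    """Return the raw 16-bit Window field from a TCP header (no scaling applied)."""
    if len(header) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    # !HHIIHHHH = src port, dst port, seq, ack, offset/flags, window, checksum, urgent
    _, _, _, _, _, window, _, _ = struct.unpack("!HHIIHHHH", header[:20])
    return window
```

Because the field is two bytes wide, the largest value it can carry is 65,535, which is the limit discussed later in this column.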
Because there can be data in the receive window that has not been retrieved by the application and data that has been received but not acknowledged, the TCP receive window has additional structure, as Figure 2 shows.
Figure 2 Types of Data in the TCP Receive Window
Notice the difference between the maximum and current receive windows. The maximum receive window is a fixed size. The current receive window is of variable size and corresponds to the remaining amount of data that the receiver is allowing the sender to send. The current receive window's size is the value of the Window field advertised in ACKs sent back to the sender, and is the difference between the maximum receive window size and the amount of data that has been received and acknowledged but not retrieved by the application.
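The relationship between the maximum and current receive windows can be written as a one-line calculation. This sketch follows the definition in the paragraph above; the function and parameter names are hypothetical.

```python
def current_receive_window(max_window: int, received_not_retrieved: int) -> int:
    """Window value to advertise in the next ACK: the maximum receive window
    minus data that is received and acknowledged but not yet retrieved."""
    if not 0 <= received_not_retrieved <= max_window:
        raise ValueError("buffered data must fit inside the maximum window")
    return max_window - received_not_retrieved
```

With a 65,535-byte maximum window and 15,535 bytes waiting for the application, the receiver advertises 50,000 bytes; once all 65,535 bytes are waiting, it advertises 0.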
The TCP Receive Window and TCP Throughput
To optimize TCP throughput (assuming a reasonably error-free transmission path), the sender should send enough packets to fill the logical pipe between the sender and receiver. The capacity of the logical pipe can be calculated by the following formula:
Capacity in bits = path bandwidth in bits per second * round-trip time (RTT) in seconds
The capacity is known as the bandwidth-delay product (BDP). The pipe can be fat (high bandwidth) or thin (low bandwidth) or short (low RTT) or long (high RTT). Pipes that are fat and long have the highest BDP. Examples of high BDP transmission paths are those across satellites or enterprise wide area networks (WANs) that include intercontinental optical fiber links.
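The bandwidth-delay product is the formula above evaluated directly. The path figures below (100 Mbps, 100 ms RTT) are hypothetical values chosen for illustration, not measurements.

```python
def bandwidth_delay_product_bits(bandwidth_bps: float, rtt_s: float) -> float:
    """Pipe capacity in bits: path bandwidth (bits/s) times round-trip time (s)."""
    return bandwidth_bps * rtt_s

# Hypothetical fat, long path: 100 Mbps with a 100 ms RTT.
bdp = bandwidth_delay_product_bits(100e6, 0.100)
bdp_bytes = bdp / 8  # about 1,250,000 bytes of data needed in flight to fill the pipe

# Hypothetical thin, short path: 1 Mbps with a 10 ms RTT.
small_bdp = bandwidth_delay_product_bits(1e6, 0.010)  # about 10,000 bits
```

Filling the fat, long pipe requires roughly 1.25 MB in flight, far more than the 16-bit Window field can describe, which is the problem the following sections address.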
The size of the Window field in the TCP header is 16 bits, allowing a TCP peer to advertise a maximum receive window size of 65,535 bytes. You can calculate the approximate throughput for a given TCP window size from the following formula:
Throughput (bits per second) = TCP maximum receive window size (bits) / RTT (seconds)
For example, with a 65,535 byte receive window you can only achieve an approximate throughput of 5.24 megabits per second (Mbps) on a path with a 100ms RTT, regardless of the transmission path's actual bandwidth. With today's high-BDP transmission paths, the originally designed TCP window size, even at its maximum value, becomes a throughput bottleneck.
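The 5.24 Mbps figure can be reproduced with a short sketch. It also takes the smaller of the window-limited rate and the path bandwidth, because a path slower than the window limit caps throughput on its own; that second bound is an addition for illustration, and the names are hypothetical.

```python
def window_limited_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Approximate throughput ceiling: window size in bits divided by RTT."""
    return window_bytes * 8 / rtt_s

def achievable_throughput_bps(window_bytes: int, rtt_s: float, path_bps: float) -> float:
    """The lower of the window-limited rate and the path's own bandwidth."""
    return min(window_limited_throughput_bps(window_bytes, rtt_s), path_bps)

# 65,535-byte window, 100 ms RTT: about 5.24 Mbps however fast the path is.
ceiling = window_limited_throughput_bps(65_535, 0.100)
```

On a hypothetical 1 Gbps path the result is still about 5.24 Mbps, while on a 1 Mbps path the bandwidth, not the window, becomes the limit.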
TCP Window Scaling
For larger window sizes to accommodate high-speed transmission paths, RFC 1323 (ietf.org/rfc/rfc1323.txt) defines window scaling that allows a receiver to advertise a window size larger than 65,535 bytes. A TCP Window Scale option includes a window scaling factor that, when combined with the 16-bit Window field in the TCP header, can increase the receive window size to a maximum of approximately 1 GB. The Window Scale option is sent only in synchronize (SYN) segments during the connection establishment process. Both TCP peers can indicate different window scaling factors to use for their receive window sizes. By allowing a sender to send more data on a connection, TCP window scaling allows TCP nodes to better utilize some types of transmission paths with high BDPs.
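In RFC 1323 the scaling factor is a shift count: the effective window is the Window field shifted left by that many bits, and the shift count is capped at 14, which is where the roughly 1 GB maximum comes from. The sketch below computes the effective window and the smallest shift that could represent a desired window; the function names are hypothetical, and real stacks may choose a larger shift than the minimum.

```python
MAX_WINDOW_FIELD = 0xFFFF  # largest value the 16-bit Window field can hold
MAX_SHIFT = 14             # RFC 1323 caps the window scale shift count at 14

def effective_window(window_field: int, shift: int) -> int:
    """Receive window in bytes represented by a Window field and shift count."""
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift

def scale_factor_for(desired_window: int) -> int:
    """Smallest shift count whose maximum window covers desired_window bytes."""
    shift = 0
    while (MAX_WINDOW_FIELD << shift) < desired_window:
        shift += 1
        if shift > MAX_SHIFT:
            raise ValueError("window exceeds the RFC 1323 maximum of about 1 GB")
    return shift
```

For the hypothetical 1.25 MB pipe from the BDP example, a shift of 5 is enough, and the largest possible window, 65,535 shifted by 14, is 1,073,725,440 bytes.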
Although the receive window size is important for TCP throughput, another important factor for determining the optimal TCP throughput is how fast the application retrieves the accumulated data in the receive window (the application retrieve rate). If the application does not retrieve the data, the receive window can begin to fill, causing the receiver to advertise a smaller current window size. In the extreme case, the entire maximum receive window is filled, causing the receiver to advertise a window size of 0 bytes. In this case, the sender must stop sending data until the receive window has been cleared. Therefore, to optimize TCP throughput, the TCP receive window for a connection should be set to a value that reflects both the BDP of the connection's transmission path and the application retrieve rate.
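The effect of the application retrieve rate can be shown with a tick-by-tick simulation. It is a simplified model under stated assumptions: in each tick the sender offers a fixed amount of data but can deliver no more than the current window, then the application reads what it can, and the receiver advertises what is left. The function name and tick model are hypothetical.

```python
def simulate_receive_window(max_window: int, arrivals: list[int], app_reads: list[int]) -> list[int]:
    """Return the window advertised after each tick.

    arrivals:  bytes the sender offers in each tick
    app_reads: bytes the application tries to retrieve in each tick
    """
    buffered = 0            # received but not yet retrieved by the application
    window = max_window     # window advertised to the sender
    advertised = []
    for offered, reading in zip(arrivals, app_reads):
        accepted = min(offered, window)     # sender may not exceed the advertised window
        buffered += accepted
        buffered -= min(reading, buffered)  # application drains what it can
        window = max_window - buffered
        advertised.append(window)
    return advertised
```

With a 10,000-byte maximum window, 4,000 bytes offered per tick, and an application that reads 1,000 bytes twice and then stalls, the advertised window falls to 0 and the sender must stop; it reopens only when the application retrieves data again.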
Even if you could correctly determine both the BDP and the application retrieve rate, they can change over time. The BDP can vary based on congestion in the transmission path, and the application retrieve rate can vary based on the number of connections on which the application is receiving data.
The Receive Window in Windows XP
For the TCP/IP stack in Windows XP (and Windows Server® 2003), the maximum receive window size has a number of significant attributes. First, the default value is based on the link speed of the sending interface. The actual value automatically adjusts to even increments of the maximum segment size (MSS) negotiated during TCP connection establishment.
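Adjusting a window to whole multiples of the MSS is simple arithmetic. The sketch below assumes the adjustment rounds up to the next whole segment and, when that would exceed the unscaled 65,535-byte limit, rounds down instead; this rounding policy is an illustrative assumption rather than a description of the Windows XP code, and the function name is hypothetical.

```python
def round_to_mss(window: int, mss: int, limit: int = 65_535) -> int:
    """Adjust a window size to a whole number of MSS-sized segments.

    Assumption: round up to the next whole segment, but stay within
    the unscaled 16-bit limit by rounding down when necessary.
    """
    segments = -(-window // mss)   # ceiling division
    if segments * mss > limit:
        segments = limit // mss
    return segments * mss
```

With a 1,460-byte MSS, 16,384 bytes becomes 12 segments (17,520 bytes), and 65,535 bytes becomes 44 segments (64,240 bytes) because 45 segments would not fit in the 16-bit field.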
Second, the maximum receive window size can be manually configured. The HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\TCPWindowSize and HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\Interface\InterfaceGUID\TCPWindowSize registry values can be set to a maximum of 65,535 bytes (without window scaling) or 1,073,741,823 bytes (with window scaling).
Third, the maximum receive window size can use window scaling. You can enable window scaling by setting the HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\Tcp1323Opts registry value to 1 or 3. By default, window scaling is only used on a connection if the received SYN segment happens to contain the Window Scale option.
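Tcp1323Opts is a bit mask, which is why both 1 and 3 enable window scaling: bit 0 controls window scaling and bit 1 controls RFC 1323 timestamps (timestamps are not otherwise discussed in this column). The decoder below is a hypothetical helper that only interprets a value; it does not read or write the registry.

```python
def decode_tcp1323opts(value: int) -> dict[str, bool]:
    """Interpret a Tcp1323Opts value as its two RFC 1323 option bits."""
    return {
        "window_scaling": bool(value & 0x1),  # bit 0: Window Scale option
        "timestamps": bool(value & 0x2),      # bit 1: Timestamps option
    }
```

Values 1 and 3 both report window scaling as enabled; 2 enables only timestamps, and 0 disables both.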
Finally, the maximum receive window size can be specified by an application by using the SO_RCVBUF Windows Sockets option when a connection is initiated. For window scaling, the application must specify a window size larger than 65,535 bytes.
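In application code, the important detail is ordering: the receive buffer is requested with SO_RCVBUF before connect(), so it is already in effect when the SYN, and any Window Scale option in it, is sent. A minimal sketch using Python's standard socket module follows; the helper name, the 256 KB size, and the host and port in the usage note are hypothetical, and the value the operating system reports back may differ from the request (Linux, for example, reports a doubled value).

```python
import socket

def make_socket_with_rcvbuf(size: int) -> socket.socket:
    """Create a TCP socket and request a receive buffer of `size` bytes
    before any connection is initiated."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must happen before connect() so it applies to the connection's SYN.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock
```

Typical use would be `s = make_socket_with_rcvbuf(256 * 1024)` followed by `s.connect(("server.example", 80))`. A size above 65,535 bytes is what makes window scaling necessary for the connection.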
Despite the support for scalable windows, the maximum receive window size in Windows XP can still limit throughput because it is a fixed maximum size for all TCP connections (unless specified by the application), so a value that increases throughput for some connections can decrease it for others. Additionally, the fixed maximum receive window size for a TCP connection does not vary with changes in the application retrieve rate or congestion in the transmission path.
Receive Window Auto-Tuning in Windows Vista
To optimize TCP throughput, especially for transmission paths with a high BDP, the Next Generation TCP/IP stack in Windows Vista and Windows Server 2008 supports Receive Window Auto-Tuning. This feature determines the optimal receive window size by measuring the BDP and the application retrieve rate and adapting the window size for ongoing transmission path and application conditions.
Receive Window Auto-Tuning enables TCP window scaling by default, allowing up to a 16MB maximum receive window size. As the data flows over the connection, the Next Generation TCP/IP stack monitors the connection, measures its current BDP and application retrieve rate, and adjusts the receive window size to optimize throughput. The Next Generation TCP/IP stack no longer uses the TCPWindowSize registry value.
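The general idea of auto-tuning, measure and then adjust, can be illustrated with a deliberately simple heuristic. This is not Microsoft's algorithm, which this column does not describe in detail; it is a generic grow-only sketch. Once per measured RTT it looks at how many bytes the application retrieved, a quantity bounded by the current window, the path's BDP, and the application retrieve rate, and grows the window to twice that amount, capped at 16 MB (read here as 16 MiB, an assumption). The class name is hypothetical.

```python
AUTOTUNE_CAP = 16 * 1024 * 1024  # assumption: "16MB" taken as 16 MiB

class ToyAutoTuner:
    """Grow-only receive window heuristic for illustration only."""

    def __init__(self, initial_window: int = 65_535) -> None:
        self.window = initial_window

    def on_rtt_sample(self, bytes_read_by_app: int) -> int:
        """Call once per measured RTT with the bytes the application retrieved."""
        # Twice the observed per-RTT consumption leaves headroom to keep growing.
        target = min(2 * bytes_read_by_app, AUTOTUNE_CAP)
        if target > self.window:  # never shrinks in this toy model
            self.window = target
        return self.window
```

A slow application that reads 10,000 bytes per RTT never pushes the window past its starting value, while a fast reader on a high-BDP path drives it up to the cap within a few round trips.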
Receive Window Auto-Tuning has a number of benefits. It automatically determines the optimal receive window size on a per-connection basis, whereas in Windows XP the TCPWindowSize registry value applies to all connections. Applications no longer need to specify TCP window sizes through Windows Sockets options. And IT administrators no longer need to manually configure a TCP receive window size for specific computers.
With Receive Window Auto-Tuning, a Windows Vista-based TCP peer will typically advertise much larger receive window sizes than a Windows XP-based TCP peer. This allows the other TCP peer to fill the pipe to the Windows Vista-based TCP peer by sending more TCP data segments without having to wait for an ACK (subject to TCP congestion control). For typical client-based networking traffic such as Web pages or e-mail, the Web server or e-mail server will be able to send more TCP data more quickly to the client computer, resulting in an overall increase in network performance. The higher the BDP and application retrieve rate for the connection, the greater the performance increase.
The impact on the network is that a stream of TCP data packets that would normally be sent at a lower, measured pace is sent much faster, resulting in a larger spike of network utilization during the data transfer. For Windows XP and Windows Vista-based computers performing the same data transfer over a long, fat pipe, the same amount of data is transferred. However, the data transfer for the Windows Vista-based client computer is faster due to the larger receive window size and the server's ability to fill the pipe from the server to the client.
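A rough steady-state comparison shows the trade-off: the same bytes cross the network, but the larger window drives the path at full rate for a much shorter time. The sketch ignores connection setup, slow start, loss, and congestion control, and the 100 MB transfer over a 100 Mbps, 100 ms RTT path is a hypothetical scenario, not a measurement.

```python
def transfer_time_s(total_bytes: int, window_bytes: int, rtt_s: float, path_bps: float) -> float:
    """Steady-state transfer time when throughput is limited by the smaller
    of the receive window (window / RTT) and the path bandwidth."""
    rate_bps = min(window_bytes * 8 / rtt_s, path_bps)
    return total_bytes * 8 / rate_bps

# Hypothetical 100 MB transfer over a 100 Mbps path with a 100 ms RTT.
xp_like = transfer_time_s(100_000_000, 65_535, 0.100, 100e6)              # about 152.6 s at ~5.24 Mbps
vista_like = transfer_time_s(100_000_000, 16 * 1024 * 1024, 0.100, 100e6)  # 8 s at the full 100 Mbps
```

The shorter transfer uses the full path bandwidth while it runs, which is the utilization spike described above.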
Because Receive Window Auto-Tuning will increase network utilization of high-BDP transmission paths, the use of Quality of Service (QoS) or application send rate throttling might become important for transmission paths that are operating at or near capacity. To address this possible need, Windows Vista supports Group Policy-based QoS settings that allow you to define throttling rates for sent traffic on an IP address or TCP port basis. For more information, see the resources on policy-based QoS.
Joseph Davies is a technical writer with Microsoft and has been teaching and writing about Windows networking topics since 1992. He has written eight books for Microsoft Press and is the author of the monthly TechNet Cable Guy column.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.