IPSec Implementation

10/27/2009

By Naganand Doraswamy and Dan Harkins

Chapter 9 of IPSec – The New Security Standard for the Internet, Intranets and Virtual Private Networks (Prentice Hall)

Introduction

This chapter discusses the implementation issues of IPSec. These include the interaction of the various components of IPSec, the interfaces that each of these components provides, and a walk-through of the packet processing for both inbound and outbound packets.

Because every implementation is specific to a particular platform, the discussion in this chapter is kept mostly platform-independent so that it can serve as a guideline for implementing IPSec on any given platform. Where discussing a specific OS helps the explanation, the choice is a BSD (Berkeley Software Distribution) variant.

We discuss the following components: IPSec base protocols, SADB, SPD, manual keying, ISAKMP/IKE, SA management, and policy management. The implementation and optimization issues that you, as an implementor of IPSec, should be aware of are highlighted in this chapter.

Implementation Architecture

Most IPSec implementations define the following components:

IPSec base protocols: This component implements ESP and AH. It processes the headers, interacts with the SPD and SADB to determine the security that is afforded to the packet, and handles network-layer issues such as fragmentation and PMTU.
SPD: The SPD is an important component because it determines the security afforded to a packet. The SPD is consulted for both outbound and inbound processing of the packet. For outbound processing, the SPD is consulted by the IPSec base protocol to decide whether the packet needs any security. For inbound packets, the SPD is consulted by the IPSec base protocol component to decide whether the security afforded to the packet matches the security configured in the policy.
SADB: The SADB maintains the list of active SAs for outbound and inbound processing. Outbound SAs are used to secure outgoing packets and inbound SAs are used to process inbound packets with IPSec headers. The SADB is populated with SAs either manually or via an automatic key management system such as IKE.
IKE: The Internet Key Exchange is normally a user-level process, except in embedded operating systems. Typically, in nodes such as routers that run embedded operating systems, there is no distinction between user space and kernel space. IKE is invoked by the policy engine when the policy mandates that an SA or SA bundle exist for two nodes to communicate securely but the SA(s) have not yet been established. IKE is also invoked by its peer when the remote node needs to communicate securely.
Policy and SA management: These are applications that manage the policy and the SAs.

Figure 9.1 indicates the various components of IPSec and the interactions among them.

Figure 9.1: IPSec implementation architecture

In the rest of this section, the interface and design requirements for each of the components are discussed.

IPSec Base Protocols

The IPSec base protocols interact closely with the transport and network layers. In fact, the IPSec base protocol module is part of the network layer. It should be designed to efficiently implement the following capabilities:

Ability to add, possibly multiple, IPSec headers to an outbound packet in both tunnel mode and transport mode.
Ability to process tunnel mode IPSec packets and pass the decapsulated packet onto IP forwarding.
Ability to process transport mode IPSec packets and pass them into the appropriate transport layer such as TCP, UDP, or ICMP, depending on the transport payload carried by the IPSec packet.

The IPSec protocol module provides two interface functions: input and output. The input interface function is called for inbound packets and the output interface function is called for outbound packets.

In almost all implementations of IPSec, IPSec is integrated with an existing TCP/IP implementation. The interface between the transport layer and the network layer for both incoming and outgoing traffic depends on the particular implementation. For example, in the popular BSD UNIX implementation, transport layer protocols such as UDP and TCP register with the network layer, indicating the input function the IP layer should invoke: TCP registers a function that IP should invoke when the IP layer encounters a TCP payload. An IPSec implementation should not violate the interface between the transport and the network layer; the interface and integration should be seamless. For example, the TCP layer should not be required to know whether it was invoked from the IPSec component or the IP component.
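As a minimal sketch of this kind of seamless integration, the Python below models a protocol dispatch table in the spirit of the BSD protocol switch: handlers register by IP protocol number, and a toy ESP handler re-enters the same table after decapsulation, so TCP cannot tell who called it. The function names are hypothetical, and the "next header in the first byte" ESP format is a simplification for illustration; real ESP carries the next-header value in its trailer.

```python
from typing import Callable, Dict

IPPROTO_TCP = 6    # IANA protocol numbers
IPPROTO_ESP = 50
IPPROTO_AH = 51

Handler = Callable[[bytes], str]

# Protocol switch: IP protocol number -> input function registered by that layer.
proto_input: Dict[int, Handler] = {}


def register_protocol(proto: int, handler: Handler) -> None:
    """A transport (or IPSec) module tells IP which function handles `proto`."""
    proto_input[proto] = handler


def ip_deliver(proto: int, payload: bytes) -> str:
    """IP layer: pass the payload (IP header already stripped) to its handler."""
    handler = proto_input.get(proto)
    if handler is None:
        return "dropped: no handler"
    return handler(payload)


def tcp_input(segment: bytes) -> str:
    # TCP has no idea whether IP or IPSec invoked it.
    return f"tcp got {len(segment)} bytes"


def esp_input(packet: bytes) -> str:
    # Toy decapsulation (hypothetical format): byte 0 is the next header.
    next_header, inner = packet[0], packet[1:]
    # Re-enter the same dispatch table, keeping the IP/transport interface intact.
    return ip_deliver(next_header, inner)


register_protocol(IPPROTO_TCP, tcp_input)
register_protocol(IPPROTO_ESP, esp_input)
```

With this table, `ip_deliver(IPPROTO_ESP, bytes([IPPROTO_TCP]) + b"abcd")` reaches `tcp_input` exactly as a clear TCP segment would.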

SPD and SADB

The choice of data structure for the SPD and the SADB is crucial to the performance of IPSec processing. The SPD and SADB implementation depends on the performance requirements and the capabilities of the system. Some of the factors that determine the design of the SPD and SADB are:

Number of expected entries in the SPD and the SADB
System throughput both for outbound and inbound packets.
Cost of allocating memory as needed versus the cost of maintaining large tables, portions of which may be unused.
Any optimizations the system provides to cache pointers to SAs or SPD entries.

The design of SPD and SADB should satisfy the following requirements:

Ability to efficiently look up the structure for an exact or most specific match based on the selectors: source address, destination address, protocol, and SPI.
Ability to store wildcards, range, or exact values for the selectors.
Synchronization mechanisms for the structure when the implementation is optimized by caching pointers to SADB and SPD entries.
Ordering of entries so that the match is always deterministic.

As mentioned earlier, the choice of the data structure for the SADB or SPD depends, among other things, on the performance requirements and the number of entries in the SADB or the SPD. The lookup for outbound processing is more complicated than for inbound processing. For outbound processing, the lookups are based on selectors in the SPD. There may not be an exact match, because the selectors for source and destination address can be network prefixes. However, this problem is already solved in routing: to find the route to a destination, one has to match the longest (most specific) network prefix whenever there is no host route (a host route is a routing table entry for the full destination address).
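The most-specific-match lookup can be sketched as follows. This assumes a small SPD held in a Python list with hypothetical selector fields (source and destination prefixes, protocol, and destination port, with None as a wildcard) and illustrative entries loosely based on Figure 9.3. It scans linearly for clarity, whereas a production SPD would use one of the tree structures discussed in this section. Specificity is ranked by destination prefix length, then source prefix length, then exact protocol and port, and list order breaks ties so the result is deterministic.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SPDEntry:
    src: ipaddress.IPv4Network
    dst: ipaddress.IPv4Network
    proto: Optional[int]   # None = wildcard
    dport: Optional[int]   # None = wildcard
    action: str

    def matches(self, src: ipaddress.IPv4Address, dst: ipaddress.IPv4Address,
                proto: int, dport: int) -> bool:
        return (src in self.src and dst in self.dst
                and self.proto in (None, proto) and self.dport in (None, dport))

    def specificity(self) -> tuple:
        # Longer prefixes beat shorter ones; exact values beat wildcards.
        return (self.dst.prefixlen, self.src.prefixlen,
                self.proto is not None, self.dport is not None)


def spd_lookup(spd: List[SPDEntry], src: str, dst: str,
               proto: int, dport: int) -> Optional[SPDEntry]:
    s, d = ipaddress.IPv4Address(src), ipaddress.IPv4Address(dst)
    best = None
    for entry in spd:  # strict '>' keeps the earliest entry on ties
        if entry.matches(s, d, proto, dport) and (
                best is None or entry.specificity() > best.specificity()):
            best = entry
    return best


net = ipaddress.IPv4Network
spd = [
    SPDEntry(net("0.0.0.0/0"), net("0.0.0.0/0"), None, None, "bypass"),
    SPDEntry(net("1.1.1.0/24"), net("2.2.2.0/24"), None, None, "esp-3des-tunnel"),
    SPDEntry(net("1.1.1.1/32"), net("2.2.2.2/32"), 6, 80, "ah-hmac-md5-transport"),
]
```

Here the tuple <1.1.1.1, 2.2.2.2, TCP, 80> selects the host entry, while other traffic between the two /24 networks falls back to the prefix entry.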

The processing can be optimized for SA lookup by caching SA pointers in the SPD entries. If the SA pointers are not cached in the SPD entry, then the lookup for outbound SA is as complicated as the SPD lookup.

The inbound processing of IPSec packets is simpler because the lookups are based on information in the received packet for which the matches are exact. The SPD entry that corresponds to the SA can be cached in the SADB entry. This optimizes the policy checking upon the conclusion of IPSec processing and avoids another lookup.
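Both caching optimizations can be sketched with simple hypothetical SPDEntry and SA records: inbound SAs are kept in a hash keyed on the exact <destination, protocol, SPI> triple, each SA carries a back pointer to its SPD entry, and an SPD entry can hold a cached pointer to its outbound SA. The cached pointers are excluded from comparison and repr because they form reference cycles.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class SPDEntry:
    name: str
    required: str
    # Cached outbound SA: saves a second, prefix-based lookup per packet.
    outbound_sa: Optional["SA"] = field(default=None, repr=False, compare=False)


@dataclass
class SA:
    dst: str
    proto: str          # "AH" or "ESP"
    spi: int
    transform: str
    # Cached back pointer: the post-processing policy check needs no SPD search.
    policy: Optional[SPDEntry] = field(default=None, repr=False, compare=False)


class SADB:
    def __init__(self) -> None:
        # Inbound matches are exact, so a hash on <dst, proto, SPI> is enough.
        self._inbound: Dict[Tuple[str, str, int], SA] = {}

    def add_inbound(self, sa: SA) -> None:
        self._inbound[(sa.dst, sa.proto, sa.spi)] = sa

    def lookup_inbound(self, dst: str, proto: str, spi: int) -> Optional[SA]:
        return self._inbound.get((dst, proto, spi))
```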

For inbound IPSec packets, after the IPSec headers are processed, the SPD is consulted to determine whether the packet can be admitted based on the security services that were offered. If the policy is not checked, the security of the system is compromised. Let us consider the example shown in Figure 9.2.

Figure 9.2: Flow-specific IPSec

Two hosts, A and B, with IP addresses 1.1.1.1 and 2.2.2.2, have established two SAs, SA1 and SA2. (The SADB shown in Figure 9.2 does not have all the fields; the sequence number, lifetime, and other fields have been left out.) SA1 is configured for ESP with 3DES and SA2 for ESP with DES. This information is stored in the SPD. SA1 is used for very secure applications, say banking, which runs over TCP port 1000. After IPSec processing, if the policy is not checked to determine whether the banking application did in fact use 3DES, then security is compromised. The policy can be checked only after IPSec processing, because before that the selectors (such as the TCP port) may be encrypted and therefore unavailable.

For inbound packets that are not IPSec packets, the SPD also has to be consulted to determine the type of processing afforded to the packet. This is because, if certain traffic is required to be IPSec-protected, any inbound packets that match the SPD definition of that traffic but do not have the proper IPSec protection must be dropped.
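Once the selectors are visible, the admission check for both IPSec-protected and clear packets reduces to one comparison. The sketch below keys a hypothetical SPD on the TCP destination port only and uses made-up transform names; port 1000 follows the banking example of Figure 9.2, while the other entries and the default-deny rule for unlisted ports are assumptions.

```python
from typing import Dict, Optional

# Hypothetical SPD for host B, keyed by TCP destination port.
# Value: transform that must have been applied, or None for "clear is fine".
SPD: Dict[int, Optional[str]] = {
    1000: "esp-3des",   # banking application (Figure 9.2)
    2000: "esp-des",    # hypothetical less sensitive service
    80: None,           # hypothetical: plain HTTP allowed in clear
}


def inbound_policy_check(dport: int, applied: Optional[str]) -> bool:
    """Admit a packet only if the protection it actually received matches policy.

    Called after IPSec processing, when the selectors are visible. `applied`
    is the transform of the SA that processed the packet, or None if the
    packet arrived without IPSec.
    """
    if dport not in SPD:
        return False          # assumption: default deny for unlisted traffic
    return SPD[dport] == applied
```

A banking packet that arrived over the DES SA, or in clear, fails this check even though it was processed without error.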

The inbound and outbound processing are completely independent. However, it is easier to use the same data structures for the SPD and the SADB for both inbound and outbound processing. There are various choices for the data structures: sorted hash lists, PATRICIA trees, and radix-4 trees, to name a few.

For connection-oriented sockets on end hosts, lookups can be optimized further by caching the pointer to the SPD entry in the socket data structure, if such a structure exists. This obviates the need to look up the SPD for each outbound packet generated by the application using the socket. Multiple sockets can cache the same SPD entry. However, caching pointers in end host or router implementations increases the implementation complexity. In multithreaded operating systems, this implies providing locks to synchronize access to these shared data structures. There is also the additional complexity of removing pointers to SAs and SPD entries when the entries themselves are removed from the SADB or SPD.
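One possible way to avoid hunting down every cached pointer when an SPD entry is removed (a technique assumed here, not taken from the text) is a generation counter: the SPD bumps a counter on every change, and a socket trusts its cached entry only if it was cached under the current generation. The class and method names are hypothetical, and a kernel version would still need locking or atomic reads around the counter.

```python
from typing import Dict, Optional, Tuple


class SPD:
    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}
        self.generation = 0      # bumped on every add/delete
        self.lookups = 0         # counts full lookups, for illustration

    def add(self, dst: str, action: str) -> None:
        self.entries[dst] = action
        self.generation += 1

    def delete(self, dst: str) -> None:
        self.entries.pop(dst, None)
        self.generation += 1

    def lookup(self, dst: str) -> Optional[str]:
        self.lookups += 1
        return self.entries.get(dst)


class Socket:
    def __init__(self, spd: SPD, dst: str) -> None:
        self.spd, self.dst = spd, dst
        self._cache: Optional[Tuple[int, Optional[str]]] = None  # (generation, entry)

    def policy(self) -> Optional[str]:
        gen = self.spd.generation
        if self._cache is None or self._cache[0] != gen:
            # Stale or empty cache: do one full lookup and remember the generation.
            self._cache = (gen, self.spd.lookup(self.dst))
        return self._cache[1]
```

The trade-off is that any SPD change invalidates every socket's cache, which is cheap when policy changes are rare compared with packet transmissions.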

The SADB and SPD may be implemented in the same module. In fact, it helps to implement them in the same module because of pointer references between the two entities. The module provides the following interfaces:

Add one or more SA or SPD entries
Delete one or more SA or SPD entries
Look up one or more SA or SPD entries

IKE

The Internet Key Exchange is responsible for dynamically populating the SADB. IKE is in a quiescent state until its services are needed. IKE's services can be requested in one of two ways: the SPD requests it to establish an SA, or a peer requests it to establish an SA.

IKE provides an interface to the SPD to inform it about establishing new SAs. The physical interface between IKE and the SPD and SADB is completely dependent on the Inter-Process Communication (IPC) capabilities of the platform; for example, it could be a socket or a message queue. More important is the information passed via the IPC mechanism. Through this interface must pass all the information necessary to begin negotiation: the peer and all transforms with all necessary attributes. Since IPSec policy can be complex, and IKE is capable of expressing complex policy, it is imperative that the information exchanged between IKE and the SPD contain all the conjunctions required to preserve that complexity. For example, if the policy states that an SA bundle of AH-HMAC-MD5, ESP-CAST-CBC (no authenticator), and IPPCP-LZS is needed, then the conjunctive nature of the request ("...and...and...") must be retained. This can be achieved either by sending each protocol as an individual IPC message that includes a logical operator field indicating the relationship of this message to the next, or by passing the entire bundle of information in a single message. Either way, the interface between the SPD and IKE must not hamstring policy declaration and negotiation.
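The per-message logical operator approach might look like the following sketch. The message layout, the Link values, and the helper names are hypothetical; the point is that the conjunction "AH and ESP and IPPCP" survives being split into separate IPC messages and can be rebuilt on the IKE side, and that alternatives (OR) can be expressed the same way.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple

Proposal = Tuple[str, str]   # (protocol, transform)


class Link(Enum):
    AND = "and"   # this message and the next belong to the same bundle
    OR = "or"     # the next message starts an alternative bundle
    END = "end"   # last message of the request


@dataclass
class ProposalMsg:
    peer: str
    protocol: str
    transform: str
    link: Link    # relationship of this message to the next one


def encode_bundle(peer: str, bundle: List[Proposal]) -> List[ProposalMsg]:
    """SPD side: one IPC message per protocol, chained with AND (bundle must be non-empty)."""
    msgs = [ProposalMsg(peer, proto, xform, Link.AND) for proto, xform in bundle]
    msgs[-1].link = Link.END
    return msgs


def decode_bundles(msgs: List[ProposalMsg]) -> List[List[Proposal]]:
    """IKE side: rebuild alternatives (OR) of conjunctions (AND)."""
    alternatives: List[List[Proposal]] = []
    current: List[Proposal] = []
    for m in msgs:
        current.append((m.protocol, m.transform))
        if m.link is not Link.AND:
            alternatives.append(current)
            current = []
    return alternatives
```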

IKE also provides an interface for the remote IKE peer to request SA establishment. The remote peer requests the establishment of a new SA through Phase II negotiation if an ISAKMP SA is already established, or through Phase I if no ISAKMP SA has been established.

The interface between IKE and the SPD is bidirectional. This is necessary for IKE to support both ways of establishing an SA: IKE receives policy information from the SPD to present to its peer (when IKE is the initiator) and receives offers from its peer (when IKE is the responder), which it must present to the SPD for a local policy check.

IKE communicates with the SADB either after it has received a message from the SPD (if it is the initiator of the protocol), or after it has received a message from a remote IKE peer (if it is the responder of the protocol). The SADB manages itself and therefore determines which SPIs are used, unused, or in a larval state. When IKE takes part in a quick mode exchange, it must request SPIs from the SADB to be reserved for its negotiation. These SPIs are for SAs that have not yet been fully created and are therefore larval. Upon completion of the quick mode, IKE has all the information necessary to instantiate the SA. This interface must be bidirectional since IKE sends messages to the SADB (SPI request and SA instantiation) and receives messages back from the SADB (SPI response).
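The SPI reservation handshake between IKE and the SADB can be sketched as below. It is loosely modeled on the GETSPI and UPDATE messages of the PF_KEY interface (RFC 2367), but the class and method names here are hypothetical. Allocation starts at 256 because SPI values 1 through 255 are reserved by IANA.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SAEntry:
    spi: int
    state: str                  # "larval" or "mature"
    keys: Optional[bytes] = None


class SADB:
    def __init__(self, first_spi: int = 256) -> None:
        self._next = itertools.count(first_spi)
        self.entries: Dict[int, SAEntry] = {}

    def get_spi(self) -> int:
        """IKE reserves an SPI before quick mode; the SADB records a larval SA."""
        spi = next(self._next)
        while spi in self.entries:
            spi = next(self._next)
        self.entries[spi] = SAEntry(spi, "larval")
        return spi

    def update(self, spi: int, keys: bytes) -> None:
        """After quick mode completes, IKE supplies the keys and the SA matures."""
        entry = self.entries.get(spi)
        if entry is None or entry.state != "larval":
            raise KeyError(f"no larval SA with SPI {spi}")
        entry.keys, entry.state = keys, "mature"
```

A real SADB would also expire larval entries whose negotiation never completes, so that reserved SPIs are eventually released.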

IKE provides the following interfaces:

A bi-directional interface for communicating with the SPD.
An interface for the SADB to communicate the SPIs IKE can use.

Policy Management System

In the previous chapter, policy requirements, capabilities, and design implications were extensively discussed. The policy management (PM) system module is implemented at the user level. The user interacts with this module for all policy-related processing. This module interacts with the kernel to update the kernel's SPD. This module is also responsible for handling manual keying. This PM module provides the following capabilities and interfaces:

Add, lookup, and delete policies. This may involve interfacing with a directory protocol to fetch policy from a central repository or provide a UI for the user to manage the policy.
Add, create, and delete SAs manually.

IPSec Protocol Processing

IPSec processing is broadly classified into outbound versus inbound processing, and AH versus ESP and their variants. Although the interface to the various components of IPSec remains the same, packet processing differs between input and output. Protocol processing can be classified into SPD processing, SA processing, header processing, and transform processing. SPD and SA processing are the same for both AH and ESP; transform and header processing differ between AH and ESP.

Outbound Processing

In most TCP/IP implementations, the transport layer invokes the IP layer to send a packet by calling a function ip_output. One of the parameters to this function is a pointer to an IP structure containing elements such as the source and destination addresses, which enable the IP layer to construct the IP header. The transport layer also passes the protocol value that the IP layer puts in the next header field of its header, and a flags field that allows the IP layer to support various socket options. It makes most sense to perform SPD processing (i.e., decide whether the packet needs any security) at this entry to ip_output.

Let us consider the example network diagram shown in Figure 9.3. For outbound processing, we will consider an HTTP packet (TCP, port 80) generated by host A destined to host B (a Web server) traversing routers RA and RB. The policy on A for packets destined to host B mandates using AH in transport mode using HMAC-MD5. The policy on router RA mandates that all packets destined to the network 2.2.2/24 be encrypted with ESP using 3DES and tunneled to RB.

Figure 9.3: Example of IPSec processing

SPD Processing

At the entry point into ip_output, the policy engine is consulted to determine the security services afforded to a packet. The input to the policy engine will be the selectors from the transport header and the source and destination addresses. In our example, the input to the policy engine will be the tuple <1.1.1.1, 2.2.2.2, TCP, 80> (the source port is a wildcard). The policy engine determines the security services and performs the following functions:

If the policy indicates the packet needs to be dropped, it returns to the ip_output function indicating that the packet has to be dropped. The ip_output function is expected to discard the packet.
If the policy indicates that this packet can be transmitted without any extra security processing, the policy engine returns with an indication that the packet can be transmitted in clear. The ip_output function transmits the packet in clear at this point.
If the policy indicates that the packet needs security, the policy engine checks if the SAs are already established. If the SAs are already established, the SA or SA bundle is passed to the IP output function. If the SAs are not yet established, the policy engine indicates security processing is required but the SAs are not established yet and notifies IKE to establish the required SA(s). At this point, it is up to the ip_output function to decide whether to queue the packet until the SAs are established or to drop the packet.
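The three outcomes of the policy engine, and the queue-or-drop choice left to ip_output, can be sketched as follows. The SPD and SADB are plain dictionaries keyed on the exact selector tuple (a simplification of the most-specific-match lookup), missing entries default to drop, and this ip_output chooses to queue rather than drop while IKE negotiates; all names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple

Selector = Tuple[str, str, str, int]   # <src, dst, protocol, dst port>


class Action(Enum):
    DROP = 1
    BYPASS = 2
    APPLY = 3


@dataclass
class Decision:
    action: Action
    sa_bundle: Optional[List[str]] = None   # present only if the SAs already exist


def policy_engine(sel: Selector, spd: Dict[Selector, Action],
                  sadb: Dict[Selector, List[str]], ike_queue: List[Selector]) -> Decision:
    action = spd.get(sel, Action.DROP)       # assumption: unknown traffic is dropped
    if action is not Action.APPLY:
        return Decision(action)
    bundle = sadb.get(sel)
    if bundle is None:
        ike_queue.append(sel)                # ask IKE to negotiate the SA(s)
    return Decision(Action.APPLY, bundle)


def ip_output(sel: Selector, spd: Dict[Selector, Action], sadb: Dict[Selector, List[str]],
              ike_queue: List[Selector], pending: List[Selector]) -> str:
    d = policy_engine(sel, spd, sadb, ike_queue)
    if d.action is Action.DROP:
        return "dropped"
    if d.action is Action.BYPASS:
        return "sent in clear"
    if d.sa_bundle is None:
        pending.append(sel)                  # this sketch queues; dropping is also legal
        return "queued until SA is established"
    return "secured with " + "+".join(d.sa_bundle)
```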

In our example, the policy engine determines that the packet needs to be secured using transport mode AH HMAC-MD5. The policy engine also determines that the SA is already established and hence does not invoke IKE to establish a new SA.

IKE Processing

IKE requires special processing. It is a requirement that all implementations recognize IKE packets that are locally generated and locally destined, and process them without securing them. Otherwise, we will end up having a chicken and egg problem. As IKE is used to establish the SAs, if the IP layer does not recognize IKE packets and instead "retriggers" IKE to establish SAs for the destination, no packets that require security will ever leave the node!

This remains true even after IPSec SAs are established. Recognizing IKE packets is possible because IKE always uses a well-known protocol (UDP) and a well-known port (500).
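A minimal sketch of the IKE exemption follows, assuming the caller already knows whether the packet is locally generated or locally destined. The check runs before any SPD lookup, so an IKE message can never re-trigger IKE. The function names are hypothetical, and this ip_output pretends that every non-IKE packet still lacks an SA.

```python
IPPROTO_UDP = 17
IKE_PORT = 500


def ike_bypass(proto: int, sport: int, dport: int, local_endpoint: bool) -> bool:
    """True for IKE packets this node itself sends or receives.

    Such packets skip the SPD entirely, even after IPSec SAs exist.
    """
    return local_endpoint and proto == IPPROTO_UDP and IKE_PORT in (sport, dport)


def ip_output(proto: int, sport: int, dport: int, ike_requests: list) -> str:
    if ike_bypass(proto, sport, dport, local_endpoint=True):
        return "sent in clear"
    # Hypothetical: any other packet to this peer needs an SA that is not yet there.
    ike_requests.append((proto, dport))
    return "queued; IKE triggered"
```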

SA Processing

The next step in outbound packet processing is SA processing. The SA identified by the policy lookup is fetched from the SADB. In our example, the SA A->B is fetched from the SADB on host A. The SA is processed as follows:

If the SA uses the number of bytes as its lifetime, the byte count field is increased by the size of the payload (not including the padding). For ESP, this is the number of bytes that are encrypted, not the number of bytes over which the hash is calculated.
If the SA's soft lifetime has expired, invoke IKE to establish a new SA.
If the SA's hard lifetime has expired, delete the SA.
Increment the sequence number field. The sequence number is always incremented before it is copied into the IPSec header. It is initialized to 0 during the SA setup and the first IPSec packet should have a sequence number of value 1. If the sequence number field overflows, it is audited. If the SA was not established using manual keying, IKE is invoked to negotiate a new SA.
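The per-packet SA bookkeeping above might be coded as follows, assuming byte-based soft and hard lifetimes and a 32-bit sequence number. The field names are hypothetical, and raising an exception stands in for whatever drop-and-audit path a real stack uses.

```python
from dataclasses import dataclass

SEQ_MAX = 2**32 - 1   # the sequence number field is 32 bits


@dataclass
class OutboundSA:
    soft_bytes: int
    hard_bytes: int
    manual: bool = False          # manually keyed SAs cannot call IKE
    bytes_used: int = 0
    seq: int = 0                  # initialized to 0 when the SA is set up
    rekey_requested: bool = False
    deleted: bool = False


def sa_output(sa: OutboundSA, payload_len: int) -> int:
    """Account for one outbound packet; return the sequence number to send.

    payload_len is what counts toward the lifetime (for ESP, the encrypted
    bytes, excluding padding).
    """
    if sa.deleted:
        raise RuntimeError("SA already deleted")
    sa.bytes_used += payload_len
    if sa.bytes_used > sa.hard_bytes:
        sa.deleted = True                     # hard lifetime: delete the SA
        raise RuntimeError("hard lifetime expired")
    if sa.bytes_used > sa.soft_bytes and not sa.manual:
        sa.rekey_requested = True             # soft lifetime: ask IKE for a new SA
    if sa.seq == SEQ_MAX:
        # Overflow would be audited here; renegotiate unless manually keyed.
        if not sa.manual:
            sa.rekey_requested = True
        raise OverflowError("sequence number would wrap")
    sa.seq += 1                               # increment before use: first packet gets 1
    return sa.seq
```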

After SA processing, the protocol-specific component is invoked. Let us consider both cases, transport mode and tunnel mode, for both AH and ESP.

Transport Mode Header Processing

So far, we have not distinguished between IPv4 and IPv6, because the functions performed are the same for both versions of IP. However, header construction differs between IPv4 and IPv6, which is inevitable given the different header formats. There are also some differences between AH and ESP header construction.

In earlier chapters it was mentioned that AH in transport mode protects some parts of the IP header, whereas ESP protects only the transport payload. This requires the AH header to be constructed after the partial IP header. We say partial because the length field in the IP header will change once the AH header is added to the payload, which implies that the checksum has to be recalculated. In IPv4, some fields such as the TTL and checksum are modified in transit. In IPv6, hop-by-hop options can change. If the hash were calculated over these fields, authentication would fail at the destination. To avoid this, AH zeros out all these fields when it calculates the hash. The rules for calculating the AH were described in the AH chapter.
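The sketch below shows the zeroing step for an IPv4 header without options, followed by an HMAC-MD5-96 computation over the zeroed header, the AH header (with its ICV field zeroed), and the payload. It assumes the mutable-field list that RFC 2402 gives for IPv4 (TOS, flags, fragment offset, TTL, and header checksum) and ignores IPv4 options; the function names are hypothetical.

```python
import hashlib
import hmac


def zero_mutable_ipv4(header: bytes) -> bytes:
    """Copy of a 20-byte IPv4 header (no options) with mutable fields zeroed."""
    h = bytearray(header[:20])
    h[1] = 0                  # TOS
    h[6:8] = b"\x00\x00"      # flags and fragment offset
    h[8] = 0                  # TTL
    h[10:12] = b"\x00\x00"    # header checksum
    return bytes(h)


def ah_icv_hmac_md5_96(key: bytes, ip_header: bytes,
                       ah_header_zero_icv: bytes, payload: bytes) -> bytes:
    """ICV over zeroed IP header + AH header (ICV field zero) + payload."""
    data = zero_mutable_ipv4(ip_header) + ah_header_zero_icv + payload
    return hmac.new(key, data, hashlib.md5).digest()[:12]   # truncate to 96 bits
```

Because TTL and checksum are zeroed before hashing, a router decrementing the TTL does not change the ICV, while tampering with an immutable field such as the destination address does.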

ESP Processing

The transport mode ESP involves encrypting the transport payload and calculating the hash. As was mentioned while describing ESP, either the encryption or the authentication can be NULL. This is specified in the SA. The transport mode ESP header is constructed before the IP header is added to the payload. This simplifies the implementation because it is not necessary to insert any headers in the middle of a buffer. After the ESP transform is applied to the transport payload and the ESP header is added, the ESP packet is forwarded to the IP layer for IP processing.
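Transport mode ESP construction, including the padding arithmetic, might look like this. To stay self-contained, the sketch uses NULL encryption in place of a real cipher (so no IV is carried) and HMAC-SHA1-96 for authentication; the 8-byte block size matches DES and 3DES. The function names are hypothetical.

```python
import hashlib
import hmac
import struct


def esp_pad(payload_len: int, block_size: int) -> bytes:
    """Default ESP padding 1, 2, 3, ... so that payload + pad + 2 trailer
    bytes (pad length, next header) is a multiple of block_size."""
    pad_len = -(payload_len + 2) % block_size
    return bytes(range(1, pad_len + 1))


def esp_transport_output(spi: int, seq: int, payload: bytes, next_header: int,
                         auth_key: bytes, block_size: int = 8) -> bytes:
    pad = esp_pad(len(payload), block_size)
    plaintext = payload + pad + bytes([len(pad), next_header])
    ciphertext = plaintext    # NULL encryption stand-in; a real SA would encrypt (and add an IV)
    esp = struct.pack("!II", spi, seq) + ciphertext            # SPI, sequence number
    icv = hmac.new(auth_key, esp, hashlib.sha1).digest()[:12]  # HMAC-SHA1-96
    return esp + icv
```

The result is handed to the IP layer, which prepends the IP header; no header has to be inserted into the middle of an existing buffer.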

AH Processing

AH processing is not as clean as ESP because it expects part of the IP header to be constructed before it calculates the hash. The AH header has to be inserted in the middle of a packet and this leads to inefficient implementations because this involves buffer traversals and potential memory copies that can be very expensive.

The rules for calculating the AH header were described in Chapter 6. Unlike ESP, AH processing cannot be performed at the entry of the ip_output function, because it requires part of the IP header to be constructed before it can calculate the hash. The ideal place to perform AH processing is just before the fragmentation check. In most IP implementations, the IP header with the mandatory fields for the AH calculation has been constructed by this point.
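To make the cost concrete, here is a sketch that inserts an AH header between an existing IPv4 header and its payload: the old protocol value moves into the AH next-header field, the protocol becomes 51, and the total length and checksum are recomputed. Building a new byte string is exactly the copy the text warns about. The ICV is left zero for a later step, and the function names are hypothetical.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def insert_ah(packet: bytes, spi: int, seq: int, icv_len: int = 12) -> bytes:
    """Insert an AH header (ICV zeroed) after the IPv4 header of `packet`."""
    ihl = (packet[0] & 0x0F) * 4
    ip_hdr, payload = bytearray(packet[:ihl]), packet[ihl:]
    next_header = ip_hdr[9]                      # old protocol moves into AH
    ah_len_words = (12 + icv_len) // 4 - 2       # AH payload length field
    ah = struct.pack("!BBHII", next_header, ah_len_words, 0, spi, seq) + bytes(icv_len)
    ip_hdr[9] = 51                               # protocol = AH
    struct.pack_into("!H", ip_hdr, 2, len(ip_hdr) + len(ah) + len(payload))
    ip_hdr[10:12] = b"\x00\x00"
    struct.pack_into("!H", ip_hdr, 10, ipv4_checksum(bytes(ip_hdr)))
    return bytes(ip_hdr) + ah + payload         # the memory copy the text warns about
```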

Tunnel Mode Processing

Tunnel mode processing involves construction of an extra IP header. Also, tunnels can be nested and can be of any depth as long as they are nested properly. In this section, tunnel implementation when the nesting is only one level deep is discussed.

In our example, when router RA receives the packet, the output of the policy lookup is a pointer to SA RA->RB. The SA indicates that the packet should be protected by tunnel mode ESP with 3DES. The source and destination fields in the SA record are the source and destination values for the outer IP header. Router RA constructs a tunnel mode ESP header and forwards the packet to the IP layer for IP processing. The rules for building the outer IP header follow.

Tunnel mode processing on hosts differs from that on routers because the host needs to add two IP headers, not one. One possible solution is to perform IPSec tunnel mode processing just before IP fragmentation; it is important that this processing happen before fragmentation. After IPSec processing, the packet is passed back to the IP layer (i.e., ip_output() is called again), but this time with a different destination address, the tunnel endpoint. The policy lookup for the tunnel destination indicates that the packet be forwarded without any additional processing. The packet is forwarded to the data link layer after the IP layer adds the IP header.

IPv4 Tunnel Header

The IPv4 tunnel header can carry either an IPv4 or an IPv6 packet. The outer header fields are constructed as follows:

Version: The value of this field is 4.
Header length: This value depends on what options are configured on the route. The options are never copied from the inner IP header to the outer IP header.
TOS: This value is always copied from the inner IP header. If the inner IP header is an IPv6 header, the class value is mapped to the IPv4 TOS value. The class values for IPv6 are not standardized. The IETF is in the process of redefining the usage of the TOS byte; depending on what is standardized, in the future the inner TOS byte may not be copied in its entirety.
Length: This value is constructed for the header after the entire datagram is constructed.
Identification: This field is constructed.
Flags: The value for the DF bit is determined based on the configuration. This is dependent on whether the PMTU is turned on or not. The value of MF depends on whether this datagram needs to be fragmented.
Fragment offset: This field is constructed as is normally done to construct an IP datagram.
TTL: If a node is forwarding a packet, then the TTL value of the inner header is decremented before the outer header is added. The TTL value of the outer header depends on the configuration for packets going over a particular interface.
Protocol: This value is the next immediate protocol IP is carrying. If the tunneled header is constructed to secure a packet, the probable values are 51 for AH or 50 for ESP.
Checksum: This field is constructed.
Source address: This is specified in the SA.
Destination address: This is specified in the SA.
Options: The options are never copied from the inner header to the outer header.
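Following the field rules above, an outer IPv4 header for a tunnel mode packet could be built as below. Assumptions: no outer options, the inner header is IPv4 so its TOS is copied directly, a simple per-node counter supplies the Identification field, and DF comes from a configuration flag. The function names are hypothetical and the addresses in the usage example are illustrative.

```python
import ipaddress
import itertools
import struct

_ip_id = itertools.count(1)   # hypothetical per-node Identification counter


def ipv4_checksum(header: bytes) -> int:
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_outer_ipv4(inner_tos: int, payload_len: int, sa_src: str, sa_dst: str,
                     protocol: int, ttl: int = 64, df: bool = False) -> bytes:
    """Outer header for tunnel mode; options are never copied from the inner header."""
    total_len = 20 + payload_len                 # known once the datagram is built
    flags_frag = 0x4000 if df else 0             # DF from configuration; MF/offset set if fragmented
    hdr = struct.pack("!BBHHHBBH4s4s",
                      0x45,                      # version 4, header length 5 words
                      inner_tos,                 # TOS copied from the inner IPv4 header
                      total_len, next(_ip_id) & 0xFFFF, flags_frag,
                      ttl,                       # configured per outgoing interface
                      protocol,                  # 50 for ESP, 51 for AH
                      0,                         # checksum placeholder
                      ipaddress.IPv4Address(sa_src).packed,
                      ipaddress.IPv4Address(sa_dst).packed)
    return hdr[:10] + struct.pack("!H", ipv4_checksum(hdr)) + hdr[12:]
```

For example, `build_outer_ipv4(0x10, 100, "5.5.5.5", "6.6.6.6", 50)` yields a 20-byte header whose total length is 120 and whose checksum verifies to zero.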

IPv6 Tunnel Header

The IPv6 tunnel header can carry either an IPv4 or IPv6 packet. The IPv6 tunnel header fields are derived as follows:

Version: This value is 6 (this represents IP version 6).
Class: This value is copied from the inner header if the inner header is IPv6, or the TOS byte is mapped to the class value if the inner header is IPv4.
Flow id: The flow id is copied from the inner header if the inner header is IPv6. If not, a configured value is copied into this field.
Length: This field is constructed after the packet has been constructed.
Next header: This value is the next immediate protocol IP is carrying. If the tunneled header is constructed to secure a packet, the probable values are AH or ESP. However, the tunneled header can be constructed to carry non-IPSec traffic as well. In this case the value will be a network protocol, such as routing header or IP header.
Hop limit: If a node is forwarding a packet, then the hop limit (TTL) of the inner header is decremented before the outer header is added. The hop limit of the outer header depends on the configuration for packets going over a particular interface.
Source address: This is either specified in the SA or it is the interface over which the tunneled packet is sent out.
Destination address: This is specified in the SA.
Extension headers: The extension headers are never copied.
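The fixed 40-byte IPv6 outer header can be sketched similarly. The caller supplies the class and flow id, either copied from an inner IPv6 header or mapped and configured for an inner IPv4 packet as described above; the function name and default hop limit are hypothetical.

```python
import ipaddress
import struct


def build_outer_ipv6(traffic_class: int, flow_id: int, payload_len: int,
                     src: str, dst: str, next_header: int, hop_limit: int = 64) -> bytes:
    """40-byte IPv6 outer header; extension headers are never copied."""
    first_word = (6 << 28) | ((traffic_class & 0xFF) << 20) | (flow_id & 0xFFFFF)
    return struct.pack("!IHBB16s16s", first_word, payload_len, next_header, hop_limit,
                       ipaddress.IPv6Address(src).packed,
                       ipaddress.IPv6Address(dst).packed)
```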

Figure 9.4: Nested IPSec tunnel headers

Multiple Header Processing

Multiple header processing is fairly complicated if multiple tunnels have to be constructed from the same node. Consider the example shown in Figure 9.4. Let us say that in order for host A to send a packet to the destination D, it has to authenticate it to firewall RC and send it encrypted to firewall RB.

If all these rules are to be represented in a single policy, the construction of the headers gets fairly complicated. One solution to this problem is to limit the number of tunnels to just one in each policy. In this example, the policy indicates that for host A to send a packet to host D, it has to first authenticate to router RC. The IPSec layer builds a tunneled AH packet to router RC. After constructing the AH header, the IP layer is invoked to add the tunnel header, using the router RC address (2.2.3.3) as the destination address in the selector field. The policy indicates that the packet has to be encrypted and tunneled to RB (6.6.6.6). IPSec processes the IP packet destined to 2.2.3.3, adding the ESP header and encrypting the payload. It then invokes the IP layer to add an additional header, using the router RB address (6.6.6.6) as the destination. At the entry, the IP layer checks the policy for the destination 6.6.6.6. The policy engine indicates that the packet can be sent in clear, and this packet with multiple tunnels is dispatched.

The disadvantage of this approach is that the ability to have per-destination policy for nested tunnels is lost. For example, if there are two hosts behind router RC, it is not possible to have a per-host encryption policy from A to D. This is because the policy lookup for encryption is based on the destination address, which is RC and not the true destination. Since this information is lost, per-host encryption from A to D cannot be defined in our example. However, adding support for this increases the complexity of the implementation without providing much additional capability.
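The one-tunnel-per-policy approach, where each IPSec step calls ip_output again with the tunnel endpoint as the new destination, can be sketched with header labels instead of bytes. The policy table mirrors Figure 9.4 ("D" is a stand-in for host D's address), and the nesting limit is an added safeguard against policy loops, not something the text prescribes. Note that the encryption rule is keyed on 2.2.3.3, which is exactly why a per-host policy toward D cannot be expressed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical policy table for host A: at most one tunnel per entry.
# destination -> (protocol to apply, tunnel endpoint), or None for "send in clear"
POLICY: Dict[str, Optional[Tuple[str, str]]] = {
    "D": ("AH", "2.2.3.3"),          # authenticate to RC
    "2.2.3.3": ("ESP", "6.6.6.6"),   # encrypt and tunnel to RB
    "6.6.6.6": None,
}

MAX_NESTING = 8   # safeguard against policy loops (an assumption, not from the text)


def ip_output(packet: List[str], dst: str, depth: int = 0) -> List[str]:
    """Return the finished header stack, outermost first."""
    if depth > MAX_NESTING:
        raise RuntimeError("tunnel nesting too deep")
    rule = POLICY.get(dst)
    if rule is None:
        return [f"IP(dst={dst})"] + packet          # clear: add IP header, transmit
    proto, endpoint = rule
    inner = [proto, f"IP(dst={dst})"] + packet      # IPSec over the inner IP packet
    return ip_output(inner, endpoint, depth + 1)    # re-enter with the tunnel endpoint
```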

Inbound Processing

Inbound processing is simpler than outbound processing mainly because header construction is more complicated than header checking. There is also no interaction with the key management system during inbound processing. The focus is more on the generic processing: things that are common to both AH and ESP. In terms of AH and ESP processing, there is not much difference except for the transforms and header processing.

The IP layer is invoked by layer 2 to process a packet that was received over an interface. The packet begins with the IP header. The IP layer processes the packet (including reassembly) and invokes the function that handles input for the particular protocol. Most implementations have a function for each protocol that the IP layer can call to process the packet.

The IPSec layer registers with the IP layer indicating the function that the IP layer has to invoke for both AH and ESP. An implementation may register different functions for AH and ESP or register the same function and perform AH or ESP specific processing in the same function. The IP layer strips the IP header and invokes the IPSec layer with either an AH or ESP header at the beginning of the packet.

The following is the sequence of steps performed by the IPSec layer. Let us continue with the example network shown in Figure 9.3 that we used for outbound processing, this time from the perspective of inbound processing. In Figure 9.5, the SADB and the SPD for the receivers are also shown (note that the SPD entries are symmetric). We will first walk through the processing at host B, the non-tunneled case.

Figure 9.5: Inbound IPSec processing

The IPSec layer extracts the SPI from the AH or the ESP header, and the source and the destination IP addresses and protocol from the IP header. In our example, the AH header has an SPI value of 10 with source and destination being 1.1.1.1 and 2.2.2.2 respectively.
The IPSec component then fetches the SA from the SADB using the destination (2.2.2.2), protocol (AH), and SPI (10).
If the SADB does not find the SA, an error is logged and the packet is dropped.
If the SADB returns the SA, which is true in our example, the IPSec layer processes the packet according to the rules defined in AH and ESP chapters.
The policy corresponding to the packet is checked to determine if the IPSec processing is applied appropriately. The policy is obtained either by a pointer in the SA or by querying the SPD using the selectors. In the example, the SPD has an entry that specifies that any packets from 1.1.1.1 should have AH in transport mode using HMAC-MD5. The policy engine checks if this is true. In this case, the security afforded to the packet is what was specified in the policy and hence the packet is accepted.

The following failures are possible. In all these cases, the packet is dropped.

The antireplay option is turned on and the packet fails the replay check.
The authentication fails.
The length is incorrect.
The lifetime for the SA has expired.
The packet is decrypted but the protocol field contains an invalid value or, if padding is present, the padding is incorrect.
If the packet is authenticated and/or decrypted correctly and anti-replay is enabled, the replay window is updated.
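The anti-replay check and window update can be sketched with a 64-bit bitmap, in the style of the sliding-window example that RFC 2401 provides. The check runs before the expensive authentication step, while the update runs only after the packet authenticates, matching the order above. The class and method names are hypothetical.

```python
class ReplayWindow:
    """Sliding anti-replay window; bit i set means sequence (highest - i) was seen."""

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.highest = 0      # highest sequence number accepted so far
        self.bitmap = 0

    def check(self, seq: int) -> bool:
        """True if seq is acceptable (new and not too old). Does not modify state."""
        if seq == 0:
            return False                      # 0 is never sent; the first packet is 1
        if seq > self.highest:
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False                      # too old: left of the window
        return ((self.bitmap >> offset) & 1) == 0

    def update(self, seq: int) -> None:
        """Call only after the packet has been authenticated."""
        if seq > self.highest:
            shift = seq - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
        else:
            self.bitmap |= 1 << (self.highest - seq)
```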

In case of tunneling, the IPSec layer has to perform an extra check to determine if the SA that was used to process the packet was in fact established to process the packet from the actual source. This is achieved by using the inner-header destination address for SPD lookup. This check is extremely important. If this check is not performed it is possible to induce a recipient to process and/or forward spoofed packets that may be in violation of its local security policy. IPSec invokes the upper layer to process the packet. In case of tunneled packets, the upper layer is the IP layer itself.

Let us consider the tunneled processing at router RB. RB receives a tunneled packet from RA. The processing of the packet is the same as in the nontunneled case until the policy is checked. In our example, RB receives a packet from source 5.5.5.5, protected by tunnel mode ESP with 3DES and an SPI value of 11. The lookup in the SADB yields an SA pointer. However, when the policy engine is invoked, the source and destination addresses used are those of the inner IP header, in this case 1.1.1.1 and 2.2.2.2. The lookup in the SPD matches the entry whose from and to fields are the network prefixes 2.2.2/24 and 1.1.1/24. The entry also indicates that the packet must have been tunneled by 5.5.5.5, which is true in this case. As the security services afforded to the packet match those in the SPD, the packet is forwarded to the actual destination.

For non-IPSec packets, the processing is limited to confirming that the packet can in fact be admitted without any security. This requires a lookup in the SPD to determine whether the policy requires the packet to be secured.
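For cleartext packets, the SPD lookup reduces to finding the first matching entry and checking whether it allows the packet to bypass IPSec. A minimal sketch, assuming a hypothetical ordered SPD of (prefix, action) pairs keyed on source address only:

```python
import ipaddress

# Hypothetical ordered SPD for inbound cleartext traffic; first match wins.
SPD = [
    ("1.1.1.0/24", "PROTECT"),   # packets from here must arrive IPSec-protected
    ("0.0.0.0/0", "BYPASS"),     # anything else may arrive without IPSec
]


def admit_cleartext(src: str, spd=SPD) -> bool:
    """Return True if a packet without IPSec headers from src may be admitted."""
    addr = ipaddress.ip_address(src)
    for prefix, action in spd:
        if addr in ipaddress.ip_network(prefix):
            return action == "BYPASS"
    return False                 # no entry matched: drop (assumed default)
```

Ordering matters here: placing the catch-all entry first would silently admit cleartext traffic that a later, more specific entry meant to protect.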

Fragmentation and PMTU

Generally, IPSec is not affected by fragmentation, because IP packets are fragmented after IPSec processing and the fragments are reassembled before the IP layer invokes the IPSec layer for further processing. This is true for all implementations of IPSec: hosts, routers, bump in the stack (BITS), and bump in the wire (BITW). The general exception to this rule is that IPSec implementations on gateways whose selectors have port and protocol granularity may have to reassemble enough of the fragments to determine whether the reconstructed packet is permitted.
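The reason port-granular selectors force reassembly is that only the first IPv4 fragment carries the transport header. The sketch below, a hypothetical helper assuming TCP or UDP over IPv4, shows when the ports are actually visible to a gateway; whenever it returns `None`, the gateway must wait for (or reassemble) the first fragment before it can evaluate a port selector.

```python
import struct
from typing import Optional, Tuple


def ports_from_fragment(frag_offset: int, payload: bytes) -> Optional[Tuple[int, int]]:
    """Return (src_port, dst_port) if this IPv4 fragment carries them, else None.

    frag_offset is the IP header's Fragment Offset field (in 8-byte units);
    payload is the data following the IP header. Assumes TCP or UDP, whose
    first four bytes are the source and destination ports.
    """
    if frag_offset != 0 or len(payload) < 4:
        return None     # ports not visible: reassemble before the SPD check
    return struct.unpack("!HH", payload[:4])
```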

However, IPSec does affect PMTU discovery. Hosts that generate packets avoid fragmentation by setting the DF (Don't Fragment) bit in the IP header. The DF bit asks any router that cannot forward the packet without fragmenting it to report its MTU back to the originating host. If a router receives a packet with DF set that is too big for the MTU of the outgoing link, it sends an ICMP message to the host that originated the packet, indicating the MTU in the ICMP message. It is up to the host to interpret these ICMP messages and store the information appropriately so that future packets generated on this connection do not need fragmentation. The host maintains this information either in the socket or on a per-route basis. The host should maintain PMTU information where possible, because doing so leads to optimal use of bandwidth.
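Host-side PMTU bookkeeping can be as simple as a per-destination table that is only ever lowered by ICMP reports. This is a minimal sketch under stated assumptions: the class name is hypothetical, the cache is keyed by destination address (one form of the per-route storage mentioned above), and entry aging, which real stacks use to rediscover a larger PMTU later, is omitted.

```python
class PmtuCache:
    """Per-destination PMTU estimates, updated from ICMP 'fragmentation needed'."""

    def __init__(self, default_mtu: int = 1500):
        self.default_mtu = default_mtu   # MTU of the local interface (assumed Ethernet)
        self._pmtu = {}

    def get(self, dst: str) -> int:
        """Current PMTU estimate for dst."""
        return self._pmtu.get(dst, self.default_mtu)

    def on_frag_needed(self, dst: str, next_hop_mtu: int) -> None:
        """Apply an ICMP report; the estimate is only ever lowered here."""
        if next_hop_mtu < self.get(dst):
            self._pmtu[dst] = next_hop_mtu
```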

Because the IPSec layer introduces extra headers, it must be involved in PMTU processing. How it is involved differs between host and router implementations.

Host Implementation

PMTU discovery can be initiated either by the transport layer or by the network layer in host implementations. As hosts do not maintain source routes, it is preferable to maintain the PMTU information at the transport layer. This enables PMTU to be maintained on an end-to-end basis.

When IPSec is enabled end to end, the IPSec implementation on the host should decrease the MTU that the network layer advertises to the transport layer for a particular connection. This value depends on the length and on the kind of IPSec processing afforded to the packet. The various parameters that affect the length are the IPSec protocols afforded to the connection, the transforms (different algorithms produce headers of different length), and the modes (tunnel mode adds an extra IP header).
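The size of the reduction can be worked out from the header formats. The sketch below assumes ESP over IPv4 without options, with 3DES-CBC (8-byte IV and block size) and HMAC-MD5-96 (12-byte ICV) as defaults. ESP adds an 8-byte header (SPI and sequence number), the IV, padding so that the payload plus the 2-byte trailer fills whole cipher blocks, and the ICV; tunnel mode also adds a 20-byte outer IP header. The function names are hypothetical.

```python
IPV4_HEADER = 20        # bytes, no options
AH_OVERHEAD = 12 + 12   # fixed AH header + 96-bit HMAC-MD5 ICV


def esp_overhead(payload_len: int, block: int = 8, iv: int = 8, icv: int = 12) -> int:
    """Bytes ESP adds to payload_len bytes of payload.

    Defaults assume 3DES-CBC (8-byte block and IV) with HMAC-MD5-96 (12-byte ICV).
    """
    trailer = 2                                            # pad length + next header
    padded = -(-(payload_len + trailer) // block) * block  # round up to the block size
    return 8 + iv + (padded - payload_len) + icv           # 8 = SPI + sequence number


def max_segment(link_mtu: int, tunnel: bool, **esp_params) -> int:
    """Largest transport segment that still fits in link_mtu after ESP.

    Transport mode: ESP protects the segment under the original IP header.
    Tunnel mode: ESP protects the inner IP header plus the segment, and a
    new outer IP header is added.
    """
    inner_header = IPV4_HEADER if tunnel else 0
    for seg in range(link_mtu, 0, -1):
        esp_payload = inner_header + seg
        if IPV4_HEADER + esp_payload + esp_overhead(esp_payload, **esp_params) <= link_mtu:
            return seg
    return 0


print(max_segment(1500, tunnel=False))  # 1446
print(max_segment(1500, tunnel=True))   # 1426
```

On a 1500-byte link, a plain IPv4 packet leaves 1480 bytes for the transport segment; transport-mode ESP with these algorithms leaves 1446, and tunnel mode leaves 1426. AH with HMAC-MD5-96 has no padding, so its cost is the constant `AH_OVERHEAD` of 24 bytes. Scanning downward is simple rather than fast; a closed-form bound would serve as well.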

Before the network layer passes the MTU up to the transport layer, the policy engine has to be consulted to determine how IPSec affects the MTU. The MTU computation must also handle nested tunnels, which involves invoking the policy engine once for each tunnel header that is added. This computation is expensive and complicated. Host implementations can optimize the process by precomputing the length of the headers that IPSec protocols add to a packet. Consider the following example, where packets from host A to host B are authenticated in transport mode and then tunneled (encrypted) from host A to router R. The host can precompute the length of the header from A to B and the length of the header from A to R, and store the total for packets destined from A to B. The drawback is that whenever the policy changes, the policy engine has to recompute all the affected header lengths.
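The precomputation described above amounts to caching the summed header lengths per source and destination and discarding the cache when the policy changes. In this minimal sketch, `OverheadCache` and `example_policy` are hypothetical. The example policy hard-codes the A-to-B case from the text, assumes the A-to-R tunnel uses 3DES without ESP authentication, and uses worst-case ESP padding (7 bytes of padding plus the 2-byte trailer), because the actual padding depends on the payload length.

```python
from typing import Callable, Dict, List, Tuple


class OverheadCache:
    """Caches the total IPSec header length added for a (src, dst) pair."""

    def __init__(self, lookup: Callable[[str, str], List[int]]):
        self._lookup = lookup          # walks the policy engine, one length per header
        self._cache: Dict[Tuple[str, str], int] = {}
        self.misses = 0                # number of expensive policy walks performed

    def overhead(self, src: str, dst: str) -> int:
        key = (src, dst)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = sum(self._lookup(src, dst))
        return self._cache[key]

    def invalidate(self) -> None:
        """Call on any policy change; lengths are recomputed on next use."""
        self._cache.clear()


def example_policy(src: str, dst: str) -> List[int]:
    """Hypothetical policy engine for the A -> B example in the text."""
    if (src, dst) == ("A", "B"):
        ah_transport = 12 + 12            # AH header + HMAC-MD5-96 ICV
        esp_tunnel = 20 + 8 + 8 + 9       # outer IP + SPI/seq + 3DES IV + worst-case pad/trailer
        return [ah_transport, esp_tunnel]
    return []


cache = OverheadCache(example_policy)
print(cache.overhead("A", "B"))  # 69
```

For simplicity `invalidate` drops every entry; an implementation could instead recompute only the pairs whose SPD entries changed.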

Router Implementation

Routers send an ICMP Destination Unreachable message (fragmentation needed and DF set) to the originating host (the source address in the IP packet) if the DF bit is set and the packet length exceeds the MTU of the interface over which the packet is to be forwarded. The message quotes the offending packet's IP header and at least the first 64 bits of its data. This procedure is essential for PMTU discovery. However, tunneling poses a challenge to the PMTU discovery process.
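The router's decision can be sketched directly from the IPv4 header fields. This minimal Python sketch assumes an IPv4 packet given as raw bytes and uses the RFC 1191 message layout, in which the ICMP Destination Unreachable message (code 4, fragmentation needed and DF set) carries the next-hop MTU and quotes the offending packet's IP header plus the first 64 bits of its data. The function name is hypothetical, and the ICMP checksum is left as zero for brevity.

```python
import struct

ICMP_DEST_UNREACH = 3
ICMP_FRAG_NEEDED = 4      # code: fragmentation needed and DF set


def forward_or_report(packet: bytes, out_mtu: int):
    """Decide what a router does with an IPv4 packet on a link of out_mtu bytes.

    Returns ("forward", packet), ("fragment", packet), or ("icmp", message).
    The ICMP message is: type, code, checksum (0 here), unused, next-hop MTU,
    then the original IP header plus the first 64 bits of its data.
    """
    total_len = struct.unpack("!H", packet[2:4])[0]
    df = bool(struct.unpack("!H", packet[6:8])[0] & 0x4000)   # DF flag bit
    if total_len <= out_mtu:
        return ("forward", packet)
    if not df:
        return ("fragment", packet)
    ihl = (packet[0] & 0x0F) * 4          # IP header length in bytes
    quoted = packet[:ihl + 8]             # IP header + first 64 bits of data
    header = struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, 0, 0, out_mtu)
    return ("icmp", header + quoted)
```

For instance, a 720-byte packet with DF set offered to a 700-byte link yields an ICMP message advertising an MTU of 700, whose quoted data for an ESP packet begins with the SPI.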

Figure 9.6: PMTU discovery

Let us consider the following scenario shown in Figure 9.6.

There two hosts, A1 and A2, attached to the router RA. A1, A2, and RA are on a network whose MTU is 1518 (Ethernet). RA is connected to RB over Ethernet as well, and hence the MTU is 1518. RB is connected to RC over some other link layer whose MTU is, say, 700. RA has a policy that requires it to tunnel all packets destined to the network 4.4.4/24 to RC (3.3.3.2). Host A generates a packet destined to host B and sets the DF bit to prevent fragmentation of the packet. Let us say that the size of this packet is 680 bytes. When the packet reaches RA, RA checks its policy and determines that the packet should be tunneled to RC. Let us say that after adding the tunnel and IPSec header, the packet size increases to 720. RB receives the packet over its interface 2.2.2.2 and determines it has to forward the packet over the interface 3.3.3.1. RB cannot forward the packet because the interface over which the packet should be forwarded has an MTU of 700 and the current packet size exceeds that. It cannot fragment the packet as the DF bit is set. It sends an ICMP message with, say, 64 bits of data. value. The packet that caused the ICMP message and the ICMP error message packet are shown in Figure 9.7.

Figure 9.7: PMTU discovery packet format

In the ICMP error packet, the first 64 bits of the data are the SPI and sequence number values of the ESP header. When RA receives the ICMP packet, it cannot determine the actual source because either A or A1 could have generated the packet. This is because the same SPI is used to secure all packets originating from network 1.1.1/24 and destined to 4.4.4/24.
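The limitation is easy to see when the quoted data is parsed. The sketch below, a hypothetical helper assuming the RFC 1191 layout (an 8-byte ICMP header followed by the quoted IPv4 header and 64 bits of data), extracts everything RA can learn from the error: the outer destination, the SPI, and the sequence number. Nothing in it identifies the inner source host.

```python
import struct


def spi_from_icmp(icmp_msg: bytes):
    """Return (outer_dst, spi, seq) of the ESP packet quoted in an ICMP error.

    Returns None if the quoted packet is not ESP (IP protocol 50).
    """
    quoted = icmp_msg[8:]                    # skip the 8-byte ICMP header
    ihl = (quoted[0] & 0x0F) * 4             # quoted IP header length
    if quoted[9] != 50:                      # protocol field: 50 = ESP
        return None
    dst = ".".join(str(b) for b in quoted[16:20])
    spi, seq = struct.unpack("!II", quoted[ihl:ihl + 8])   # the 64 bits of data
    return dst, spi, seq
```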

The following guidelines can be used to handle PMTU messages.

Routers can use the SPI to determine the source of the packet and forward the ICMP message to that host, provided they use a different SPI for each host. In our example, if RA used different SPIs for A and A1, it would know from the SPI to whom to direct the PMTU message it receives.
In cases where the router cannot determine the source, it should remember the SPI and, when it sees a subsequent packet matching that SPI, send the ICMP message indicating the MTU to the host that sent it.
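The two guidelines can be combined in a small amount of router state. In this hypothetical sketch, `spi_to_host` lists SPIs known to be used by exactly one host, and any other SPI's reduced MTU is parked until a packet that would be too large for it arrives. The MTU reported to the host is assumed to be the path MTU minus the tunnel overhead, since the host only sees the inner packet; the 40-byte overhead in the test mirrors the 680-to-720-byte growth in the example.

```python
class SpiPmtuTracker:
    """Router-side bookkeeping for ICMP 'fragmentation needed' on tunnel SAs."""

    def __init__(self, spi_to_host):
        self.spi_to_host = dict(spi_to_host)   # per-host SPIs: SPI -> host
        self.pending = {}                      # shared SPIs: SPI -> reported path MTU

    def on_icmp(self, spi: int, mtu: int):
        """Return the host to notify now, or None if notification is deferred."""
        host = self.spi_to_host.get(spi)
        if host is None:
            self.pending[spi] = mtu            # cannot tell the source: remember SPI
        return host

    def on_outbound(self, spi: int, inner_len: int, overhead: int):
        """Return the MTU to report to this packet's sender, or None.

        overhead is the number of bytes the tunnel adds (outer IP + IPSec headers).
        """
        mtu = self.pending.get(spi)
        if mtu is not None and inner_len + overhead > mtu:
            return mtu - overhead              # MTU as seen by the inner packet
        return None
```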

Routers should provide the ability to set the DF bit in the outer (tunnel) header if the DF bit of the inner header is set. However, if the amount of data carried per packet becomes too small, it may be more efficient to fragment the packet to make better use of the bandwidth. Hence, this behavior should be configurable in router implementations.
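A configurable DF policy can be expressed as a small enumeration. The three settings here (copy the inner bit, always set, always clear) are an assumption about how such a knob might look, and the names are hypothetical.

```python
from enum import Enum


class DfPolicy(Enum):
    COPY = "copy"     # outer DF mirrors the inner header's DF bit
    SET = "set"       # always set DF on the tunnel header
    CLEAR = "clear"   # never set DF; the tunnel packet may be fragmented


def outer_df(inner_df: bool, policy: DfPolicy = DfPolicy.COPY) -> bool:
    """DF value to place in the outer (tunnel) IP header."""
    if policy is DfPolicy.COPY:
        return inner_df
    return policy is DfPolicy.SET
```

Choosing `CLEAR` trades PMTU discovery for fragmentation of the tunnel packet, which is the bandwidth trade-off described above.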

The PMTU behavior of BITS and BITW implementations is similar to that of router implementations. Because BITS is implemented as a shim, it cannot interact with the stack directly and has to process ICMP messages as a router does. BITW implementations are specialized stand-alone security devices, and hence their PMTU behavior is no different from that of a router.

ICMP Processing

ICMP messages are critical to the operation of the network. Several types of ICMP messages are defined: some report errors, and some are queries used to check the status of the network. ICMP query messages are end to end, and the IPSec processing afforded to them is the same as for any other IP packet, using normal SA processing. ICMP error messages generated by end hosts can also be treated as normal IP packets with respect to IPSec processing, by performing the usual selector checks.

However, ICMP error messages generated by routers need to be handled differently, particularly when a router is tunneling ICMP packets generated by other routers. The routers at the tunnel origination and destination would have established a tunnel-mode SA for their communication, and this SA is used to forward the ICMP error messages. However, the source address of the inner header will be that of the router that generated the error, not the tunnel origination address, so the source address check at the tunnel destination will fail. The tunnel destination node therefore has to be configured to ignore source address checks for ICMP error packets.
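The relaxed source check at the tunnel destination can be sketched as follows, assuming IPv4 and the standard ICMPv4 error types (3, 4, 5, 11, 12). The function name and parameters are hypothetical: `sa_src_prefix` stands for the source selector of the tunnel-mode SA, for example the originating router's address as a /32 prefix, and `relax_icmp_errors` is the configuration switch. Because only ICMP errors are exempted, other traffic from unexpected sources is still rejected.

```python
import ipaddress
from typing import Optional

IPPROTO_ICMP = 1
# ICMPv4 error types: destination unreachable, source quench, redirect,
# time exceeded, parameter problem. Queries (e.g. echo, type 8) are excluded.
ICMP_ERROR_TYPES = {3, 4, 5, 11, 12}


def inner_source_ok(inner_src: str, inner_proto: int, icmp_type: Optional[int],
                    sa_src_prefix: str, relax_icmp_errors: bool = True) -> bool:
    """Source check for a packet decapsulated from a tunnel-mode SA."""
    if ipaddress.ip_address(inner_src) in ipaddress.ip_network(sa_src_prefix):
        return True
    # An ICMP error from an intermediate router carries that router's address.
    return (relax_icmp_errors and inner_proto == IPPROTO_ICMP
            and icmp_type in ICMP_ERROR_TYPES)
```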

A deployment issue with IPSec is whether to accept ICMP error packets that are not IPSec-protected. Many routers will not be IPSec-capable for some time to come, and if end hosts or routers were to drop these packets, it would be detrimental to the operation of the network. However, accepting them without IPSec protection exposes nodes to denial of service attacks. Unfortunately, there is no easy answer to this question. It is a matter of policy configuration; one could, for example, require IPSec processing for ICMP error packets from offending nodes.

About the Authors:

Naganand Doraswamy is a senior principal engineer at Nortel Networks in Billerica, MA., and an active participant in the IETF and key industry panels on VPNs and IP security. He was a network security architect at Bay Networks (currently Nortel Networks) and is currently working on next-generation router architectures and protocols. He was the technical lead for IP Security at FTP Software.

Dan Harkins, formerly a senior software engineer in the Network Protocol Security Group at Cisco Systems, is currently a Senior Scientist at Network-Alchemy in Santa Cruz, CA, and is active in several IETF working groups. He wrote IPSec's standard Internet Key Exchange (IKE) key management protocol.

We at Microsoft Corporation hope that the information in this work is valuable to you. Your use of the information contained in this work, however, is at your sole risk. All information in this work is provided "as-is", without any warranty, whether express or implied, of its accuracy, completeness, fitness for a particular purpose, title or non-infringement, and none of the third-party products or information mentioned in the work are authored, recommended, supported or guaranteed by Microsoft Corporation. Microsoft Corporation shall not be liable for any damages you may sustain by using this information, whether direct, indirect, special, incidental or consequential, even if it has been advised of the possibility of such damages.
