Performance Considerations

10/08/2009

Applies To: Windows Server 2003 with SP1

Using the SSL protocol for encryption protects data traveling over the Internet, but it also imposes a performance penalty. SSL can slow down an application or Web site considerably. The most obvious performance degradation when using SSL/TLS results from the time required to establish a session and then encrypt and decrypt the data, all of which heavily use processor cycles.

There are ways to minimize this impact. This section discusses the performance issues related to the cryptography involved and ways to accelerate the performance of the public key operations in SSL/TLS by using hardware solutions. It also covers considerations for scaling Web sites that use SSL/TLS, and performance tuning options for Schannel in Windows Server 2003.

Asymmetric Encryption

The process of SSL/TLS key exchange can exact a performance penalty. Nearly all commercially available Web servers and clients implement and support RSA as the asymmetric key exchange algorithm. Although there are other key exchange algorithms supported by SSL/TLS, RSA is the most widely used algorithm.

Offload Hardware Advantages

The high computational cost of the RSA public-key encryption algorithm makes it attractive to use encryption offload hardware to accelerate processing. Using offload hardware offers two advantages over performing the RSA encryption in software:

Optimizes processing. The general-purpose processors normally used in computers do most things very well. However, special processors optimized for RSA are able to perform specific mathematical operations much more efficiently.
Decreases system-wide load. Offloading the large number of clock cycles required for RSA decryption to a specialized processor frees the computer’s general purpose processor to do other things, such as serving Web pages to other clients, while the decryption completes.

Offload Devices

There are three types of offload devices: RSA offload devices, Hardware Security Modules (HSMs), and Internet Hardware Devices. The first two, RSA offload and HSM devices, are typically cards (such as PCI cards) that plug into the bus on the computer. They interact with the Windows Server 2003 operating system through CryptoAPI (CAPI). Internet Hardware Devices are network devices.

RSA-offload only devices. This type of device is designed to provide offloading of RSA operations from the host CPU. These devices are typically cards that are installed with a minimum of effort and little to no configuration. They perform the RSA operation only, and offer no additional features, such as improved key security.
Hardware Security Modules (HSM). These devices are primarily targeted to deployments where security of the private key is a priority. By default, private keys are stored in software. With HSMs, not only are the RSA operations offloaded, but private keys never leave the HSM. These devices are normally tamper-proof and meet varying levels of security certification.
Internet Hardware Devices. This loosely defined class of devices is generally targeted at solving various problems in deploying e-commerce and Internet solutions. These devices all offload SSL operations from the Web servers by terminating the SSL connections at the device and then proxying the requests as standard HTTP. Many of these devices have additional features such as content caching and load balancing.

Symmetric Encryption

Symmetric encryption is so efficient that offloading it to specialized hardware is unnecessary, and might even be counterproductive. For most symmetric algorithms, fewer clock cycles are required to process the data to be encrypted or decrypted on the main processor than to move it from main memory, through the I/O bus, onto the offload hardware, back through the bus and back into memory. Add in the necessary trip through the bus to the network adapter, and it becomes clear why this is commonly referred to as the triple-trip problem. Because all these trips would themselves require resources, and because most symmetric algorithms are optimized for computer processing, it is far more efficient to simply perform the operation on the processor.

Considerations for Scaling When Using SSL

Scaling out a Web site means adding servers and then sending client requests to these additional computers, thereby increasing the performance of the site. This is often done by using technologies such as software load balancing, hardware load balancing, and DNS round robin. The traditional methods for scaling out a Web site can still be utilized with SSL/TLS, but you must also deal with a specific requirement imposed by SSL/TLS.

Once an SSL/TLS session is established between a specific client and a specific server, only that server can encrypt and decrypt the requests and responses for that client. Consequently, during an SSL/TLS session a client cannot send an SSL/TLS request to a server different from the one with which it has already negotiated. Doing so would result in an error, and a new SSL/TLS session would have to be negotiated between the client and the new server.

Therefore, when scaling out an SSL/TLS site you must use a method that allows a client to maintain its session with the same SSL/TLS server. This requirement can be met when using Network Load Balancing in Windows Server 2003 by configuring affinity as Single or Class C. The Single option specifies that Network Load Balancing direct multiple requests from the same client IP address to the same cluster host. Class C affinity specifies that Network Load Balancing direct multiple requests from the same TCP/IP Class C address range to the same cluster host.
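The effect of the two affinity settings can be illustrated with a minimal sketch. It is not the algorithm that Network Load Balancing actually uses; the function names, the CRC32 hash, and the host names are assumptions chosen only to show that Single affinity keys on the full client IP address, while Class C affinity keys on the first three octets.

```python
import ipaddress
import zlib


def affinity_key(client_ip: str, mode: str) -> str:
    """Return the value that decides which cluster host serves the client."""
    addr = ipaddress.IPv4Address(client_ip)
    if mode == "single":
        # Single affinity: every request from one IP address maps together.
        return str(addr)
    if mode == "class_c":
        # Class C affinity: keep only the first three octets (the /24 range).
        network = ipaddress.IPv4Network(f"{addr}/24", strict=False)
        return str(network.network_address)
    raise ValueError(f"unknown affinity mode: {mode}")


def pick_host(client_ip: str, mode: str, hosts: list) -> str:
    """Map a client to a host with a stable hash (illustrative choice only)."""
    key = affinity_key(client_ip, mode)
    index = zlib.crc32(key.encode("ascii")) % len(hosts)
    return hosts[index]


if __name__ == "__main__":
    hosts = ["web1", "web2", "web3"]  # hypothetical cluster hosts
    for ip in ("192.0.2.10", "192.0.2.200", "198.51.100.7"):
        print(ip, pick_host(ip, "single", hosts), pick_host(ip, "class_c", hosts))
```

With Class C affinity, 192.0.2.10 and 192.0.2.200 always reach the same host, which helps when clients arrive through several proxy servers in one address range.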

Most load balancing solutions provide similar functionality. For more information about Network Load Balancing, see The Network Load Balancing Technical Overview link on the Web Resources page at https://go.microsoft.com/fwlink/?LinkID=291.

For example, you might have an SSL/TLS site that runs on a multiprocessor server and can process only 100 transactions per second (Tx/sec), but you require a processing rate of 300 Tx/sec. To meet this requirement, you could create a Network Load Balancing cluster using two additional servers of the same configuration, and configure the cluster for Single affinity or Class C affinity. In theory, this would increase the performance of your site to 300 Tx/sec. In practice, you would also need to take into account slight differences in configuration and possible overhead from using Network Load Balancing. However, it is clearly possible to scale out an SSL/TLS site so long as you observe the requirements imposed by SSL/TLS.
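The arithmetic in this example can be written as a short sketch. The function name and the efficiency parameter are assumptions introduced for illustration; the 0.9 efficiency figure is hypothetical and stands in for whatever load-balancing overhead you measure in your own environment.

```python
import math


def servers_needed(target_tps: float, per_server_tps: float,
                   efficiency: float = 1.0) -> int:
    """Estimate how many identical servers are needed for a target rate.

    efficiency is the assumed fraction of a server's standalone throughput
    that remains once it is part of a cluster (1.0 means no overhead).
    """
    if per_server_tps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("per_server_tps must be > 0 and 0 < efficiency <= 1")
    return math.ceil(target_tps / (per_server_tps * efficiency))


if __name__ == "__main__":
    # Ideal case from the text: 300 Tx/sec at 100 Tx/sec per server.
    print(servers_needed(300, 100))
    # Hypothetical 10 percent overhead: a fourth server becomes necessary.
    print(servers_needed(300, 100, efficiency=0.9))
```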

Another technique for increasing the performance of an SSL/TLS site is scaling up. Scaling up refers to adding larger servers, adding additional processors, or adding special SSL/TLS accelerators to your environment. Scaling up allows the increased hardware capacity to process more transactions. There are no special considerations for SSL/TLS when scaling up a server.

Performance Tuning Parameters for SSL/TLS

The administrator of a Windows Server 2003–based system can tune several Schannel parameters to control session reconnects. These parameters can be used to increase performance or save memory in some application scenarios.

The Schannel Cache

SSL/TLS supports a reconnect operation that enables a client and server to resume a previously negotiated session. This is desirable in many cases because the SSL/TLS reconnect does not require the time needed for a full RSA handshake. But there are cases in which reconnects can be troublesome, such as during performance testing, or when the site's connection pattern is such that a client never reconnects to the same Web server.

Reconnects are enabled by the Schannel cache, which keeps a list of previous SSL/TLS sessions that are established with the current credential handle. The sessions are referenced by the SSL/TLS session ID. Schannel currently maintains up to ten thousand cached sessions for a maximum of ten hours.
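As a rough illustration of how a session cache keyed by session ID makes reconnects possible, consider the following sketch. It is not Schannel's implementation: the class name, the method names, and the oldest-first eviction policy are assumptions, and only the two limits (ten thousand entries, ten hours) come from the text above.

```python
import time
from collections import OrderedDict


class SessionCache:
    """Toy session cache with a size limit and a per-entry lifetime."""

    def __init__(self, max_size=10_000, lifetime_seconds=10 * 60 * 60,
                 clock=time.monotonic):
        self.max_size = max_size
        self.lifetime = lifetime_seconds
        self._clock = clock
        self._entries = OrderedDict()  # session_id -> (expires_at, state)

    def store(self, session_id: bytes, state: object) -> None:
        """Remember negotiated state so that a later reconnect can reuse it."""
        if self.max_size == 0 or self.lifetime == 0:
            return  # caching disabled: every connection needs a full handshake
        self._entries.pop(session_id, None)
        self._entries[session_id] = (self._clock() + self.lifetime, state)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the oldest entry (assumed policy)

    def resume(self, session_id: bytes):
        """Return cached state, or None when a full handshake is required."""
        entry = self._entries.get(session_id)
        if entry is None:
            return None
        expires_at, state = entry
        if self._clock() >= expires_at:
            del self._entries[session_id]  # expired entries cannot be resumed
            return None
        return state


if __name__ == "__main__":
    cache = SessionCache()
    cache.store(b"\x01\x02", {"cipher": "example"})  # placeholder state
    print(cache.resume(b"\x01\x02") is not None)  # reconnect possible
    print(cache.resume(b"\xff") is None)          # unknown ID: full handshake
```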

Schannel cache values are stored in the following registry subkey:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL

You can add or modify these values by using the registry editor Regedit.exe. Table 2 describes the entries and their values.

Warning

Do not edit the registry unless you have no alternative. The registry editor bypasses standard safeguards, allowing settings that can damage your system, or even require you to reinstall Windows. If you must edit the registry, back it up first and see the Resource Kit Registry Reference for Windows Server 2003 at https://go.microsoft.com/fwlink/?LinkID=4543.

Table 2. Schannel Registry Settings

Entry	Datatype	Description
MaximumCacheSize	REG_DWORD	The maximum number of SSL/TLS sessions to maintain in the cache. The default value is 10,000.
ClientCacheTime	REG_DWORD	The time, in milliseconds, after which each client-side cache element expires. The default is 10 hours (36,000,000 milliseconds).
ServerCacheTime	REG_DWORD	The time, in milliseconds, after which each server-side cache element expires. The default is 10 hours (36,000,000 milliseconds).
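Because the cache times are expressed in milliseconds, it is easy to enter a value that is off by a factor of a thousand. The following sketch converts hours to milliseconds and renders the entries in the text format that Regedit.exe can import. It only builds a string and does not touch the registry; the function names and the sample values are assumptions for illustration, and the earlier warning about backing up the registry still applies before you import anything.

```python
# Subkey named in the text above.
SCHANNEL_KEY = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
                r"\Control\SecurityProviders\SCHANNEL")


def hours_to_ms(hours: float) -> int:
    """Convert a lifetime in hours to the milliseconds the entries expect."""
    return int(hours * 60 * 60 * 1000)


def render_reg_file(values: dict) -> str:
    """Render REG_DWORD entries as .reg file text (no registry access)."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{SCHANNEL_KEY}]"]
    for name, number in values.items():
        if not 0 <= number <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a REG_DWORD")
        # .reg files store DWORD values as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{number:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Sample values only: these simply restate the documented defaults.
    print(render_reg_file({
        "MaximumCacheSize": 10_000,
        "ServerCacheTime": hours_to_ms(10),
    }))
```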

Setting either MaximumCacheSize or ServerCacheTime to zero disables the server-side session cache and prevents reconnects. Increasing MaximumCacheSize or ServerCacheTime above the default values causes LSASS.EXE to consume additional memory. Each session cache element typically requires 2 KB to 4 KB of memory.
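The memory cost of a given cache size can be estimated from the per-element figure above. The sketch below is only a back-of-the-envelope calculation: the function name is an assumption, and it treats the per-element size as 2 to 4 units of 1,024 bytes.

```python
def cache_memory_range_bytes(max_cache_size: int,
                             low_kb: int = 2, high_kb: int = 4) -> tuple:
    """Return (low, high) estimates in bytes for a completely full cache."""
    if max_cache_size < 0:
        raise ValueError("max_cache_size must not be negative")
    return (max_cache_size * low_kb * 1024, max_cache_size * high_kb * 1024)


if __name__ == "__main__":
    low, high = cache_memory_range_bytes(10_000)  # default MaximumCacheSize
    print(f"{low:,} to {high:,} bytes")
```

At the default of 10,000 sessions, a full cache therefore occupies roughly 20 to 40 MB, and the estimate grows linearly with MaximumCacheSize.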

Additional Reading

See the following resources for further information:

RFC 2246: The TLS Protocol Version 1.0.

ITU-T Recommendation X.509, available on the International Telecommunications Union Web site at https://go.microsoft.com/fwlink/?LinkId=3799.