ISA Server 2000 Performance Best Practices

02/20/2014

Archived content. No warranty is made as to technical accuracy. Content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.

Internet Security and Acceleration Server 2000 Performance Best Practices White Paper

Last Updated: November 2002

Introduction

The goal of ISA Server capacity planning is to determine the hardware and software configuration of an ISA Server deployment that meets customer-specific performance and capacity requirements.

A typical question about ISA Server capacity could be: "What hardware do I need to support ISA Server in my organization with n users?" The following is a closer look at this question, with an explanation of each of its parts:

hardware – or resources, refers to the hardware components on the ISA Server computer that ISA Server uses, for example, CPU, RAM, network, disks, and so on. There are several possible connotations:

Single/Multiple CPUs in a computer (scale-up).
Single/Multiple computers (scale-out).
An organization's actual or required Internet connection bandwidth.
Special purpose hardware, such as SSL accelerators, RAID disks, and so on.

do I need – refers to customer requirements that determine the ISA Server configuration to be used. Here are some examples of customer requirements that may be important for determining the correct capacity:

What kind of Internet access policy is recommended?
Must a user be authenticated by ISA Server?
Must ISA Server activity be logged?
Are there any third party Web/application filters that are recommended? (For example, antivirus, content filtering, and so on.)

support ISA Server - ISA Server has several scenarios, each with its own performance/capacity characteristics: firewall, Web caching, Web publishing, server publishing, perimeter network. Which of these functions does the customer need?

organization – Different organizations have different needs. The Internet access patterns of a bank differ from those of an Internet service provider (ISP). For example, a group of homogeneous users tends to consume fewer ISA Server resources than a non-homogeneous group, because their requests can be served from cache more often.

n users – refers to the expected workload. The number of users may be too vague for accurate capacity planning, because the average Internet usage pattern of these users is rarely known. A more informative metric is the bandwidth of the current Internet connection that the organization uses. ISA Server is more commonly deployed in situations where some firewall or cache is already used by an organization, and an Internet link with sufficient bandwidth is available.

Rephrasing the question in more general terms: "What Resources are necessary to support a given ISA Server Scenario and Configuration loaded by some given Workload?" This document provides the necessary guidelines for answering this question.

It starts by describing three simple Capacity Strategies, each fit for a broad range of customers. A capacity strategy identifies the customer capacity requirements, and provides tailored guidelines for capacity-sensitive deployment. This section is recommended reading for all ISA Server users.

Next comes Scaling ISA Server, which describes ways to grow from current requirements to future capacity needs. This section is recommended for the Large Enterprise class of customers.

Next come several sections on ISA Server performance tuning: Tuning ISA Server Resources, Tuning ISA Server Scenarios, and Tuning ISA Server Features. These sections are recommended for more experienced technical personnel who want to increase the capacity of an ISA Server deployment. Learning the Workload is another section that aims to increase capacity by addressing specific usage patterns.

Capacity Strategies

Learning a customer's capacity requirements is the first step in determining the necessary resources for an ISA Server deployment. To do this, there are several strategies, each fit for a broad range of customers.

In general, a customer is likely to have the following metrics:

The network bandwidth of the Internet links.
The number of Internet users within an organization.
The number of access hits (a day) on a Web service (common in publishing scenarios).

Of these, the most valuable metric for ISA Server capacity planning is the bandwidth of the Internet connection, because in the majority of cases it represents the true Internet capacity needs of the customer. This is because it is an expensive resource that is used efficiently.

The number of Internet users in an organization is a poor metric for capacity planning, because it varies widely between organizations according to usage patterns, Internet usage policy, and so on.

On its own, the number of access hits is also useless, but in many cases it indicates that other more valuable metrics are available from the customer, such as peak request rate, request type distribution, and so on.

All ISA Server capacity planning cases fall into one of the following categories:

The Internet connection bandwidth is low enough to become the bottleneck of the ISA Server system. This leads to the Single Entry-Level ISA Server Computer strategy.
The Internet connection bandwidth is larger than what a single machine can fill, and ISA Server is used for forward/outbound scenarios. This leads to the Large Enterprise strategy.
The Internet connection bandwidth is larger than what a single machine can fill, and ISA Server is used for publishing scenarios. This leads to the Publishing strategy.

The following sections describe these strategies in more detail.

Single Entry-Level ISA Server Computer

The single ISA Server computer capacity strategy is fit for outbound firewall and forward caching in organizations that have low Internet connection bandwidth. According to recent market research reports on Internet usage, most U.S. corporate Internet links are in the 1 to 10 Mbps bandwidth range. This fact alone is enough to indicate that an entry-level computer with a single or dual processor will suffice for most ISA Server deployments.

According to outbound firewall test results, the ISA Server firewall service running on a single Pentium III 733-MHz processor can provide a throughput of ~25 Mbps (mix of HTTP and streaming media) at 70 percent CPU utilization. (This is the throughput measured for worst-case SecureNAT clients.) This means that for a 1.5 Mbps T1 Internet link, the firewall service will utilize only ~5 percent CPU.

According to Forward Cache test results, an entry-level ISA Server computer with a single Pentium III 733-MHz processor can sustain ~660 HTTP requests per second, with 35 percent hit ratio, 50 percent non-cacheable responses, and 7.6 KB average response size, at 75 percent CPU. This results in a throughput of about 39 Mbps through the server but only 25 Mbps on the Internet link, assuming that the system hardware resources are designed for maximizing CPU utilization (see more details in System Design for Maximizing CPU utilization):

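660 x 65% x (7.6 x 1,024) x 8/1,024² ≈ 25 Mbps

where:

660 – requests per second
65% – percent of requests that are not cache hits (cache misses and non-cacheable)
7.6 x 1,024 – 7.6 KB average response size, in bytes
8/1,024² – conversion from bytes per second to Mbits per second (Mbps)

Omitting the 65% factor gives the throughput through the server: 660 x (7.6 x 1,024) x 8/1,024² ≈ 39 Mbps.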

If the Internet link is a private dedicated T1 connection with 1.5 Mbps bandwidth, then the system will be able to sustain at most 39 requests per second, because:

1.5 x 1,024²/8 / (65% x 7.6 x 1,024) ≈ 39 requests per second
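The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration of the conversion used in this section; the function and constant names are assumptions introduced here and are not part of ISA Server.

```python
BYTES_PER_KB = 1024
BITS_PER_MBIT = 1024 ** 2  # this paper converts with a factor of 8/1024^2


def throughput_mbps(requests_per_sec, traffic_fraction, avg_response_kb):
    """Mbps generated by a request rate.

    traffic_fraction is the share of responses that cross the link being
    measured: 1.0 for the server itself, 0.65 for the Internet link when
    35 percent of requests are cache hits.
    """
    bytes_per_sec = requests_per_sec * traffic_fraction * avg_response_kb * BYTES_PER_KB
    return bytes_per_sec * 8 / BITS_PER_MBIT


def max_request_rate(link_mbps, traffic_fraction, avg_response_kb):
    """Highest request rate that a link of the given bandwidth can carry."""
    bytes_per_sec = link_mbps * BITS_PER_MBIT / 8
    return bytes_per_sec / (traffic_fraction * avg_response_kb * BYTES_PER_KB)


server_mbps = throughput_mbps(660, 1.0, 7.6)     # about 39 Mbps through the server
internet_mbps = throughput_mbps(660, 0.65, 7.6)  # about 25 Mbps on the Internet link
t1_rate = max_request_rate(1.5, 0.65, 7.6)       # about 39 requests per second on a T1

print(round(server_mbps), round(internet_mbps), round(t1_rate))
```

The same two functions can be reused with a measured hit ratio and response size to check other link speeds.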

At this request rate, ISA Server will utilize only 5 to 10 percent CPU, and the rest can be used for other processes running on the same machine. At this low level of CPU utilization, it is even possible to enable transparent caching in ISA Server integrated mode with a reasonable CPU utilization level of 50 percent to 70 percent when feeding a T1 link with a 733-MHz Pentium III processor (see Tuning Clients for more details on transparent caching performance).

The single entry-level computer strategy is fit for all situations where the Internet connection bandwidth is small enough to be filled by the power of a single entry-level computer. This strategy states a simple course of action: "install and configure ISA Server on an entry-level computer of your choice, and you are ready to go!"

This strategy is also fit for large enterprises spread over several branches, where each has its independent low bandwidth Internet link. In this case, each branch would be considered a small organization, where a single entry-level ISA Server computer is sufficient to fully utilize its own separate Internet connection.

What remains to be determined is the relationship between a computer's performance capabilities and the Internet connection bandwidth it supports. The following table shows the CPU utilization of a Pentium III 733-MHz processor when the Internet connection is 100 percent utilized in each scenario (note that the throughput is on the Internet link).

Table 1 CPU Utilization on Network Bound Systems

	Internet Mbps at 75% CPU	%CPU at 1.5 Mbps (T1)	%CPU at 4.5 Mbps	%CPU at 15 Mbps	%CPU at 45 Mbps (T3)
Outbound/Inbound Firewall (SecureNAT clients)	29	4	12	38	115
Outbound Firewall (Firewall Clients)	61	2	6	19	56
Forward Caching	25	4	13	45	134
Reverse Caching	71	2	5	16	48
Integrated Firewall/Cache	32	4	11	36	107

The Internet Mbps at 75% CPU column states the Internet bandwidth that a Pentium III 733-MHz processor can support. The next four columns indicate the estimated CPU utilization levels when fully utilizing an Internet connection with the specified bandwidth. According to this table, all scenarios can run on such hardware for most Internet links. At the high end (T3 link), any Pentium 4 machine will be sufficient.
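As a rough sketch, the CPU columns of Table 1 can be approximated by scaling the 75 percent CPU figure linearly with link bandwidth. The linear model and the names below are assumptions made for illustration; the published values differ from this approximation by up to about two percentage points.

```python
# Internet Mbps supported at 75% CPU on a Pentium III 733-MHz processor (Table 1).
MBPS_AT_75_PERCENT_CPU = {
    "Outbound/Inbound Firewall (SecureNAT clients)": 29,
    "Outbound Firewall (Firewall Clients)": 61,
    "Forward Caching": 25,
    "Reverse Caching": 71,
    "Integrated Firewall/Cache": 32,
}


def estimated_cpu_percent(scenario, link_mbps):
    """Estimate %CPU when the Internet link is fully utilized.

    Assumes CPU utilization grows linearly with throughput, which is a
    simplification and not a measured result.
    """
    return 75.0 * link_mbps / MBPS_AT_75_PERCENT_CPU[scenario]


for scenario in MBPS_AT_75_PERCENT_CPU:
    estimates = [round(estimated_cpu_percent(scenario, mbps)) for mbps in (1.5, 4.5, 15, 45)]
    print(scenario, estimates)
```

An estimate above 100 percent means that a single processor of this class cannot fill the link in that scenario.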

Large Enterprise

For large enterprise-scale sites, like Microsoft's Redmond site, the situation is more complex. The ISA Server capacity strategy for this case requires more elaborate planning, because Internet bandwidth is large enough to shift the performance bottleneck to the system's CPU resource. This strategy is fit for caching, both forward and reverse, as well as for firewall scenarios.

Still, Internet connection bandwidth imposes a maximal limit on the number of machines that are needed to fully utilize the connection, and this maximum may be sufficient for most capacity estimations. Moreover, capacity requirements tend to grow over time, so planning for maximum network capacity is a good conservative estimate. This also calls for planning for future growth, by enabling easy processing power upgrades. Scaling ISA Server describes hardware scaling techniques, their performance characteristics, and other scaling benefits.

For planning ISA Server capacity more accurately, learning the workload is key. The section Learning the Workload describes several ways to do that. Still, in many cases, learning the workload may be unfeasible due to limited resources or unavailable information (for example when building a new large-scale ISA Server system). For forward scenarios (outbound firewall and forward caching) where workloads do not vary too much between organizations, learning the workload may not prove to be beneficial.

Publishing

Publishing scenarios are different from forward scenarios in that the Internet link serves as the connection between clients and the server. A large Web site serving millions of hits per day uses an Internet link that can serve several thousand requests per second. ISA Server is not used here for acceleration, but rather for offloading cacheable responses from the hosted Web site. Still, if the Internet link is less than 70 Mbps, a publishing scenario can be treated using the single computer strategy. On the other hand, a large Web site (even though serving a link of up to 70 Mbps) can benefit from the improved fault tolerance enabled when ISA Server is deployed on a 2-machine array. For the firewall server publishing scenario, the maximal throughput for a single Pentium III 733-MHz computer is 29 Mbps.

For large publishing sites, however, the capacity strategy is similar to that of the large enterprise. Again, starting with a conservative estimate taking into account the maximal Internet bandwidth, along with a hardware scaling strategy, would probably be sufficient for most large-scale customers.

However, unlike in forward scenarios, the variability of workloads in publishing scenarios is much larger due to differences in content characteristics. This leads to large differences in capacity between Web sites serving about the same request rate. If the hardware price for a conservative estimate is too high, having a more accurate estimate, based on learning the workload, can probably reduce the hardware price considerably.

Scaling ISA Server

There are two ways to increase the CPU power of any system. One way is to add more processors to a computer (provided that the computer is already a multiprocessor box), which is called scaling-up the processing power. Another way is to add another computer (provided that computers are already arranged in a cluster), which is called scaling-out.

ISA Server supports both scaling-up and scaling-out, and the following sections describe each with more detail.

Scaling-up

The Windows Server System is designed to implement symmetric multiprocessing (SMP). With symmetric multiprocessing, the operating system can run threads on any available processor. As a result, SMP makes it possible for applications to use multiple processors when additional processing power is required to increase the throughput capability of a system.

However, because other hardware resources, such as memory and I/O devices, are shared by all the processors, doubling the number of processors in a multiprocessor computer system does not necessarily double the number of transactions that an application can process¹. In fact, few applications reach linear scaling. ISA Server has a scale factor of 1.6 to 1.7, depending on the scenario.
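To make the scale factor concrete, the following sketch estimates throughput after adding processors. It assumes that the scale factor is the throughput multiplier gained each time the processor count doubles, and that it compounds across doublings; the paper does not state how the factor behaves beyond one doubling, so treat the compounding as an assumption. The function name is hypothetical.

```python
import math


def scaled_throughput(base_throughput, processors, scale_factor):
    """Estimate throughput on a multiprocessor machine.

    base_throughput is the measured throughput with one processor.
    scale_factor is the multiplier per doubling of processors (1.6 to 1.7
    for ISA Server scale-up according to this paper).
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    doublings = math.log2(processors)
    return base_throughput * scale_factor ** doublings


# A single processor sustaining 25 Mbps, scaled up to two processors.
print(scaled_throughput(25, 2, 1.6))
print(scaled_throughput(25, 2, 1.7))
```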

Scaling-out

There are several ways to scale-out an ISA Server system:

Using high-level network switching hardware gear. These switches are often called L4/7 switches (layer-4 to layer-7) because they can provide packet switching based on the network layer information (TCP), or even the application layer information (HTTP). The information available at these levels can provide sophisticated load-balancing, according to IP source/destination, URL, content type, and so on. Because they are implemented as hardware appliances, they have a relatively high throughput, and are highly available and reliable, but also expensive. Most switches can detect server-down conditions, enabling fault tolerance.
Using DNS round robin name resolution, a cluster of servers can be assigned the same name in the DNS. The DNS responds to queries for that name by cycling through the list. This is an inexpensive solution (actually it costs nothing), but has many drawbacks. One major problem is that the load is not necessarily distributed evenly between servers in the cluster. Another problem is that it provides no fault tolerance.
Using Windows Network Load Balancing. Network Load Balancing works by sharing an IP address with all the servers in a cluster, and every packet sent to this IP is viewed by all servers. However, each packet is served by only one of the servers, according to some shared hash function. Network Load Balancing is implemented at the operating system level. It provides evenly distributed load balancing and supports fault tolerance. (Other servers in the cluster can detect a failing server and distribute its load between them.) However, it requires some CPU processing overhead (about 10 to 15 percent for common ISA Server scenarios), and has a limit to the number of members in the cluster. (About 12 machines is the recommended maximum.) Network Load Balancing is available only on Windows Advanced Server.
For the caching scenarios, ISA Server supports the cache array protocol (CARP), which is a cache load balancing protocol. It not only distributes the load between the servers, it also distributes the cached content. Each request is sent to a specific machine in the cluster, so that subsequent hits will be served from that machine. CARP is available only in ISA Server Enterprise Edition.
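The idea of sending each request to one specific machine, so that later hits for the same URL land on the same cache, can be sketched with a highest-score hashing scheme. This is a minimal illustration only: it uses SHA-256 rather than the hash function that CARP actually defines, and the member names are hypothetical.

```python
import hashlib


def score(member, url):
    """Combine a member name and a URL into a deterministic numeric score."""
    digest = hashlib.sha256(f"{member}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_member(members, url):
    """Route a URL to the array member with the highest combined score."""
    return max(members, key=lambda member: score(member, url))


array_members = ["isa1", "isa2", "isa3"]  # hypothetical server names
for url in ("http://example.com/", "http://example.com/logo.gif"):
    print(url, "->", pick_member(array_members, url))
```

With this scheme, removing a member only remaps the URLs that were assigned to that member, so the content cached on the remaining machines stays useful.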

Scale-up vs. Scale-out

Scaling is used for increasing the capacity of a system. Each scaling method has its benefits and drawbacks, and for ISA Server it also depends on the scenario. When deciding on what scale method to use, the following considerations should be taken into account:

Performance Factor – how much more throughput can be gained by adding another processor/machine?
System Cost - refers to the initial cost of buying the system, and not to be confused with "cost-of-ownership."
System Administration – refers to the level of complication in administering the system. This has a direct implication on the system's cost-of-ownership.
Fault-Tolerance – is the method used by the system to enable high availability and reliability.
System Upgrade – is the method used to increase the processing power of the system. The cost of upgrade is also an important consideration.

The following lists several tradeoffs to take into account when deciding between scale-up and scale-out.

Performance

The scale factor of a single multiprocessor machine is in general lower than the scale factor of a cluster of machines. This means that a single machine may be preferable up to several processors, but above a certain scale it does not provide the performance boost that the higher scale factor of most clusters provides.
System Cost vs. Cost-of-Ownership

In general, a single box, even with many processors, is easier to configure and maintain than a cluster of several computers. On the other hand, a computer with multiple processors is much more expensive than a single processor computer (in general, it is more expensive even on a per processor basis).
Single-Point-of-Failure vs. Fault-Tolerance

The availability of a single machine deployment is much more susceptible to hardware failures than a multi-machine cluster. A failure in the system board, or disk controller will cause the whole system to go down for repair. This is also true for a hardware load-balancing solution.
Upgrade

Upgrading a single-machine solution is simple, provided that there are empty processor slots in the machine (or available ports in the hardware load-balancing switch). Once they are full, the box must be replaced, making the upgrade very expensive. In multiple-machine clusters, adding another machine is more complicated, but on average a lot cheaper.

The following table summarizes the above.

Table 2 ISA Server Scaling Options

	Scale-up	Scale-out
		L4/7 Switch	DNS	Network Load Balancing	CARP
Performance (scale-factor)	1.6 – 1.7, depending on the machine architecture (bigger L2 cache increases scale factor)	2	2	1.8 (for 1-8 machines in a cluster)	Starting from 1.5, and asymptotically approaching 2
System Cost	MP machines cost much more, especially 4P and up².	Expensive	None	Need Windows 2000 Advanced Server or Windows Server 2003 for each machine	Requires ISA Server Enterprise Edition (EE)
System Administration	Single machine admin.	Need admin. of several machines plus extra for switch	Admin. of several machines with ISA Server centralized array admin (EE)	Admin. of several machines with ISA Server centralized array admin (EE)	Admin. of several machines with ISA Server centralized array admin (EE)
Fault Tolerance	Native hardware	Switch detects failing machine and loads the others	None	By mutual detection of failing machine	By mutual detection of failing machine
System Upgrade	Add another processor – must have an empty slot available	Add another machine – need an empty port and extra bandwidth on switch	Add another machine and add an entry to the DNS	Add another machine – note that too many may degrade scale-factor	Add another machine – practically unlimited scale.
Scenario	All	All	All	All	Forward caching only

Tuning ISA Server Resources

The first step in capacity planning is to determine the type of computer system that is needed to fulfill the performance requirements. The next step, once a system is configured and running, is to get the most performance out of the system. For ISA Server, this means designing adequate hardware resources to make the system bound on its CPU power. (The single entry-level ISA Server computer strategy is an exception to this concept, because the Internet bandwidth is much lower than the slowest CPU can fill.) The next sections describe this concept in more detail.

Tuning resources, as well as Tuning Scenarios, and Tuning Features are advanced topics necessary for Large Enterprise scale deployments. In the vast majority of single entry-level computer deployments, tuning ISA Server for performance is unnecessary.

System Design for Maximizing CPU Utilization

ISA Server capacity depends on CPU, network, disk, and memory hardware resources. Each resource has a capacity limit, and as long as all resources are consumed below their limits, the system as a whole functions properly, fulfilling its performance objectives. Performance drops considerably when one of these limits is reached, causing a bottleneck. In this case, the system is said to be bound on that resource. Each bottleneck has its symptoms in overall system performance that can help detect the resource that has inadequate capacity. This shortage calls for increasing the failing resource's capacity.

Each resource has its price, and in today's computer market, the CPU is the most expensive resource of a computer. Therefore, it is the last resource to be increased. This logic explains the need to design a system for maximizing CPU utilization: making sure that a system will have no performance bottlenecks before reaching full CPU utilization. In other words, if the CPU is powerful enough to sustain the expected load, the bottleneck will never surface. To do this, all other resources must have adequate capacity. The following sections describe how to design a CPU-maximized system with adequate capacity in each resource, how to monitor each resource, and how to troubleshoot a bottleneck in each.

Determining CPU Capacity

Like most server applications serving numerous client requests, ISA Server performance also benefits from bigger processor cache, increased CPU speed, and improved processor architecture. Here are some guidelines for deciding on the right CPU for an ISA Server system (ordered by their effect on performance):

L2/L3 Cache Size. Dealing with huge amounts of data requires frequent memory access. An L2/L3 cache improves the performance on large memory fetches.
CPU Speed. As in most applications, ISA Server benefits from faster CPUs. Still, increasing CPU speed does not ensure a linear increase in performance. Because memory accesses are large and frequent, a faster CPU may waste more idle cycles waiting for memory. This phenomenon was actually observed when speeding up from an Intel 733-MHz Pentium III to a 933-MHz processor.
Improved Architecture. As in most other applications, ISA Server benefits from CPUs with improved architecture. Intel Pentium 4 systems provide better performance than Pentium III. ISA Server capacity tests show an improvement of 20 percent in firewall tests, and 45 percent in cache tests, comparing a 1.7-GHz Pentium 4 processor to a 733-MHz Pentium III processor.

CPU bottlenecks are characterized by situations in which \Processor\% Processor Time performance counter numbers are high while the network adapter card and disk I/O remain well below capacity. In this case (which is the ideal CPU-maximized system), reaching 100 percent means that the CPU must be upgraded. Apart from the preceding upgrade guidelines for a single CPU, it is also possible to add power using more CPUs. CPU scaling options are discussed in the section Scale-up vs. Scale-out. If ISA Server seems stuck with high response times but low CPU percentages, then the bottleneck is elsewhere.

Determining Network Capacity

Every network peripheral that exists on a connection has its capacity limit. These include the client and server NICs, and the routers, switches and hubs that interconnect them. Adequate network capacity means that none of these network devices are saturated. Monitoring network activity is essential for assuring that actual loads on all network devices are below their maximal capacity.

Adequate network capacity is a function of workload parameters. For example, consider a forward cache, where clients are connected to the ISA Server computer on a 100 Mbps Fast Ethernet link. When utilized at 75 percent, which is the recommended peak network utilization, this link enables a peak rate of 1,200 HTTP responses per second, with an average size of 8 KB, according to:

(100 x 75%) x 1,024²/8 / (8 x 1,024) = 1,200 responses per second

If the average request rate at peak load is six requests per minute per user, then a total of 1,200 HTTP requests per second serves 12,000 users:

1,200 / (6/60) = 12,000 users

The preceding results are a function of the average response size (8 KB) and the average user request rate (6 requests per minute), which can vary with different workloads.

The bandwidth of the network link to the Internet in forward cache scenarios is typically much lower, but fortunately only a portion of the response traffic is transferred through this line. For example, if the hit ratio is 40 percent, then the 12,000 simultaneous users in the preceding example create a throughput on the Internet link of:

1,200 x (100% - 40%) x (8 x 1,024) x 8/1,024² = 45 Mbps

For transferring 45 Mbps at 75 percent network capacity, the actual bandwidth of the Internet link is recommended to be 60 Mbps. In most cases, the Internet link bandwidth is a given, because it is an expensive resource, which an organization would utilize efficiently. In the previous example, a T3 link (45 Mbps), with 75 percent utilization at peak throughput, would enable a peak request rate of 900 requests per second:

(45 x 75%) x 1,024²/8 / ((100% - 40%) x 8 x 1,024) = 900 requests per second
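The network capacity examples in this section follow one pattern, which the sketch below captures. It reuses the same unit conversion as the rest of this paper (1 Mbps = 1,024² bits per second); the function names are assumptions chosen for illustration.

```python
def peak_requests_per_sec(link_mbps, utilization, avg_response_kb, miss_fraction=1.0):
    """Request rate that fills a link at the given peak utilization.

    miss_fraction is the share of responses that cross this link:
    1.0 on the client-side LAN, (1 - hit ratio) on the Internet link.
    """
    usable_bytes_per_sec = link_mbps * utilization * 1024 ** 2 / 8
    return usable_bytes_per_sec / (avg_response_kb * 1024 * miss_fraction)


def users_supported(requests_per_sec, requests_per_min_per_user):
    """Number of users that generate the given aggregate request rate."""
    return requests_per_sec * 60 / requests_per_min_per_user


def internet_link_mbps(requests_per_sec, hit_ratio, avg_response_kb):
    """Throughput on the Internet link after cache hits are served locally."""
    bytes_per_sec = requests_per_sec * (1 - hit_ratio) * avg_response_kb * 1024
    return bytes_per_sec * 8 / 1024 ** 2


lan_rate = peak_requests_per_sec(100, 0.75, 8)      # about 1,200 responses per second
users = users_supported(lan_rate, 6)                # about 12,000 users
link = internet_link_mbps(lan_rate, 0.40, 8)        # about 45 Mbps
t3_rate = peak_requests_per_sec(45, 0.75, 8, 0.60)  # about 900 requests per second

print(round(lan_rate), round(users), round(link), round(t3_rate))
```

Because the results depend directly on average response size, hit ratio, and per-user request rate, the inputs should come from the measured workload whenever it is available.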

Network activity is monitored with the \Network Interface\Bytes Total/sec performance counter. If its value is more than 75 percent of the maximal bandwidth of the card, then you will have to consider increasing the Internet link bandwidth.

Determining Disk Capacity

Disk performance is determined by several factors: the scenario, hit ratio, peak request rate, and working set size.

Disk capacity is irrelevant for firewall scenarios. (They use disk only for small logging activity.) For caching, the workload factors for forward and reverse are different, creating totally different cache tuning principles.

Tuning Disk for Forward Cache

In forward caching, hit ratio and peak request rate are used to determine the number of necessary disks. For example, if the ISA Server computer is using 15,000 RPM disk drives (150 I/O per second), then 3 disks are needed to support 900 requests per second at peak load with a hit ratio of 40 percent, according to:

900 x 40% / 150 = 2.4, rounded up to 3 disks
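The disk count calculation can be sketched as follows. It assumes, as the example in this section implies, that each cache hit costs about one disk I/O operation; the function name is hypothetical.

```python
import math


def disks_needed(peak_requests_per_sec, hit_ratio, disk_io_per_sec):
    """Number of cache disks needed to serve cache hits at peak load.

    Assumes roughly one disk I/O per cache hit and rounds up to a
    whole disk.
    """
    cache_reads_per_sec = peak_requests_per_sec * hit_ratio
    return math.ceil(cache_reads_per_sec / disk_io_per_sec)


# 900 requests per second, 40 percent hit ratio, 150 I/O per second per disk.
print(disks_needed(900, 0.40, 150))  # 3
```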

It is recommended to use disks of the same type and of equal capacity, and not to use the disks for anything else. If a RAID storage subsystem is used, it should be configured as RAID-0 (no fault tolerance). If you want to save a disk, and use one disk for both system files and ISA Server cache, you can do so, provided that no other application is running on your system that heavily utilizes disk resources.

Once the number of disks has been determined, the total size of the disk cache can be tuned to get the highest possible hit ratio. This can be done according to the following procedure:

Start with a cache size that is about 50 percent of the total (formatted) disk space. Allocate an equal share of the total cache size to every disk.
Monitor hit ratio at peak load hours.
Increase the disk cache size by some factor. If hit ratio is very low, choose a large factor (50 percent), and if it is low, choose 20 percent.
Monitor hit ratio again. (Be sure to do it on the next day.) If it increased considerably (not just by a few percent), then return to step 3. Otherwise, you are done, and the hit ratio you have is asymptotically close to the true hit ratio.

Tuning Disk for Reverse Cache

In reverse caching, working set size is so small compared to the working set size of the forward cache, that it becomes relevant to try to put it all in memory. Still, ISA Server uses the disk cache in the same manner that it is used in the forward cache scenario, but hopefully, if the memory cache is big enough to hold the whole working set, then a disk cache object is read into memory only once.

The size of the working set is the total amount of cacheable objects in the Web site that the cache hosts. The size of the disk cache is recommended to be about twice the size of the working set to hold all cacheable objects, and account for fragmentation in disk allocation and cache refresh policy³. For a working set of 1 GB, a 2 GB disk cache would be enough.

Because most cache fetches are served from the memory cache (if it is big enough – see the next section, Determining Memory Capacity describing how to tune memory cache capacity), the disk read rate is small. In most cases, a single disk is sufficient to serve these fetches without becoming a bottleneck.

Determining Memory Capacity

ISA Server memory is used for:

Storing network sockets (mostly from nonpaged pool)
Internal data structures
Pending request objects
The disk cache directory structure
Memory caching

(The last two are relevant only in cache scenarios.)

Tuning Firewall Memory

In firewall scenarios, the limiting memory factor is the size of the nonpaged pool, which is a function of total memory size. Typical minimal and maximal values of nonpaged pool size are:

Table 3 Minimum and Maximum Nonpaged Pool Size According to Physical Memory Size

Physical Memory (MB)	128	256	512	1024	2048	4096
Min Nonpaged Pool Size	4	8	16	32	64	128
Max Nonpaged Pool Size	50	100	200	256	256	256

Testing shows that 256 MB is enough for entry-level machines, and 1024 MB proves sufficient for high-end machines. These amounts are also enough to hold the full memory working set.

Tuning Cache Memory

In cache scenarios, memory is used for:

Pending request objects. The number of pending request objects is actually the number of connections to the ISA Server computer, and each connection requires about 15 KB. So for 10,240 connections, the Web proxy memory working set has 10,240 x 15 KB = 150 MB allocated for pending request objects.
Cache Directory, containing a 32-byte entry for each cached object. The size of the cache directory is directly determined by the size of the cache and the average response size. For example, a 50 GB cache holding 5,000,000 objects (10 KB each) requires 32 bytes x 5,000,000 = ~153 MB.
Memory Caching. The purpose of memory caching is to serve requests for popular cached objects directly from memory lowering disk cache fetches. The effect of memory caching is very different between forward and reverse caching. In reverse caching, ISA Server hosts a web site, with a limited amount of data. In this case it is feasible to have all the cacheable content loaded in the memory cache, totally eliminating the need to access the disk, which in turn has a dramatic effect on performance. In forward caching, the cacheable content is practically unlimited, so however big the memory cache is, there will always be a considerable portion of hits that will be served from the disk cache. Therefore, in forward caching the memory cache size has a limited effect on performance.

By default, the memory cache is 50 percent of total physical memory, and is configurable. Tuning the size of the memory cache is a process of trial-and-error. First, it must be large enough to maximize hits from memory. (Performance counter \ISA Server Cache\Memory Usage Ratio Percent (%) measures the ratio between memory fetches and total cache fetches.) However, allocating too much memory for the ISA Server memory cache may cause unwanted hard page faults, because there is not enough physical memory left to hold all the rest.

Hard page faults cause severe performance degradation. This situation can be fixed by adding more memory or by decreasing the size of the memory cache.

Considering the preceding information, the following process for tuning cache memory size is recommended:

Tune disk cache size (Determining Disk Capacity).
Estimate required memory as the total of:
1. pending request objects (15 KB * peak-established-connections).
2. cache directory size (32 * URLs-in-cache).
3. memory cache size, which should be set to one to two times the working set size in the case of reverse caching. In forward caching, set the memory cache to be as big as possible without causing hard page faults. When determining required memory for a new ISA Server system, reserve between 10 and 50 percent of physical memory for the memory cache.
4. system memory will require about 2 KB per connection, with an extra 50 MB to start with (50 MB + 2 KB * peak-established-connections).
5. At least 100 MB for other processes running in the system.
Set the memory cache to its estimated percentage of the total memory computed in step 2. If the hardware system has more memory than required by step 2, it is possible to increase the memory cache percentage even more. The goal is to have all physical memory used at peak load without incurring hard page faults.
Monitor memory usage and change memory cache size accordingly. The informative performance counters are:

\ISA Server Cache\Memory Cache Allocated Space (KB)
\ISA Server Cache\Memory URL Retrieve Rate (URL/sec)
\ISA Server Cache\Memory Usage Ratio Percent (%)
\ISA Server Cache\URLs in Cache
\Memory\Pages/sec
\Memory\Pool Nonpaged Bytes
\Memory\Pool Paged Bytes
\Process(W3PROXY)\Working Set
\TCP\Established Connections

As explained before, the most critical evidence that memory is not tuned correctly is a large value of \Memory\Pages/sec (above 10) during peak loads. If this happens, first lower the memory cache size, and then determine whether more memory is needed.
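
The memory estimate in step 2 of the tuning process can be sketched as a small calculation. This is only an illustration of the arithmetic given above: the function names are hypothetical, and the 256 MB memory cache and 1024 MB of physical memory in the example are assumed values, not recommendations from the tests.

```python
KB = 1024
MB = 1024 * KB


def estimate_required_memory(peak_connections, urls_in_cache, memory_cache_bytes):
    """Return the per-component memory estimate in bytes, plus the total."""
    parts = {
        # 15 KB per pending request object (one per established connection)
        "pending_requests": 15 * KB * peak_connections,
        # 32-byte cache directory entry per cached URL
        "cache_directory": 32 * urls_in_cache,
        # chosen by the administrator (assumed input, see the guidance in item 3)
        "memory_cache": memory_cache_bytes,
        # 50 MB base plus 2 KB of system memory per connection
        "system": 50 * MB + 2 * KB * peak_connections,
        # at least 100 MB for other processes
        "other_processes": 100 * MB,
    }
    parts["total"] = sum(parts.values())
    return parts


def memory_cache_percent(parts, physical_memory_bytes):
    """Memory cache size as a percentage of physical memory."""
    return 100.0 * parts["memory_cache"] / physical_memory_bytes


# Example: 10,240 peak connections, 5,000,000 cached URLs, assumed 256 MB memory cache
estimate = estimate_required_memory(10_240, 5_000_000, 256 * MB)
for name, size in estimate.items():
    print(f"{name:18s} {size / MB:8.1f} MB")
print(f"memory cache share of 1024 MB: {memory_cache_percent(estimate, 1024 * MB):.0f}%")
```

If the total exceeds the physical memory available, either the memory cache input has to shrink or more memory is needed, which is the same trade-off the \Memory\Pages/sec counter exposes at run time.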

Using the /3GB boot.ini Switch

For large systems with over 2 GB of memory, Windows 2000 Advanced Server and Windows Server 2003 offer the 4GT RAM Tuning feature, which divides a process memory space into 3 GB for application memory and 1 GB for system memory. This feature enables processes to benefit from more than 2 GB of RAM in user space, and is enabled by adding the switch /3GB to the boot.ini file. (For details, see KB article 291988.)

This feature may be beneficial for ISA Server, especially for reverse caching that hosts a big Web site (that is, one with a large working set). However, using this feature reduces the maximal size of the nonpaged pool (128 MB instead of 256 MB), and hence the maximal number of concurrent TCP connections.

Tuning ISA Server Scenarios

This section describes performance tuning aspects that are specific to a scenario.

Tuning Firewall Scenarios

The primary performance booster in all firewall scenarios is kernel-mode data pumping (KMDP). KMDP is enabled by selecting IP routing (which is disabled by default). Still, enabling KMDP does not guarantee that it will be used for every protocol. Moreover, activating KMDP for some protocols does not guarantee the full expected performance boost over user-mode data pumping. To get the most performance out of KMDP, it is recommended to activate it for the most capacity-intensive protocols in the workload.

Here are examples of some capacity-intensive protocols that are used today:

HTTP – HTTP is the most widely used protocol, and accounts for 70 to 80 percent of Internet traffic. Since HTTP is a single-channel protocol where data and control headers pass through the same TCP connection, KMDP can be used for HTTP only:
1. for outbound Firewall Clients (also called Remote WinSock, or RWS for short), and
2. if no application filter is attached to the HTTP port. (This requires disabling the HTTP redirector filter, which is enabled by default.) Disabling the HTTP redirector filter is recommended also for improving cache performance because it disables transparent caching. (See the next section on tuning forward cache performance.)
FTP – FTP is common for large file transfers. In ISA Server, the FTP application filter (enabled by default) data channel always uses KMDP if IP routing is enabled.
Streaming media – a collection of protocols for transferring audio and video. The ISA Server built-in streaming media application filter supports MMS (Microsoft Media Streaming), PNM (Progressive Network Media by Real Media) and RTSP (Real Time Streaming Protocol – the Internet standard). All of these protocols use secondary data channels over UDP (or TCP for fail over in case of poor transmission lines), and therefore can utilize KMDP.
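
The conditions listed above for HTTP and FTP can be summarized as a small decision sketch. The function and parameter names are hypothetical and exist only for illustration; streaming media is left out because its behavior depends on the transport, client, and filter combination.

```python
def http_uses_kmdp(ip_routing_enabled, is_firewall_client, filter_on_http_port):
    """HTTP can use KMDP only for outbound Firewall Client (RWS) traffic,
    and only if no application filter is attached to the HTTP port."""
    return ip_routing_enabled and is_firewall_client and not filter_on_http_port


def ftp_data_uses_kmdp(ip_routing_enabled):
    """The FTP application filter data channel uses KMDP whenever IP routing is enabled."""
    return ip_routing_enabled


# Defaults described in the text: IP routing disabled, HTTP redirector filter enabled
print(http_uses_kmdp(False, True, True))
# After enabling IP routing and disabling the HTTP redirector filter
print(http_uses_kmdp(True, True, False))
```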

The following table summarizes the performance test results of ISA Server in all transport/client/filter configurations.

Table 4 ISA Server Throughput for Streaming Media

According to this table, KMDP provides superior performance over UMDP, but the configurations exploiting KMDP are used less frequently: UDP is the preferred transport for servers and clients, but it requires the streaming media application filter to be active, which in turn causes most other cases (TCP, RWS) to run with UMDP. The fortunate exception is streaming media data over HTTP, in which case it can run in KMDP for RWS clients (See the preceding explanation on HTTP).

All other protocols, even though some are popular (such as mail protocols, DNS, ICMP, and so on), have low capacity compared to the protocols in the table, so improving their performance does not have a distinguishable effect.

Tuning Cache Scenarios

Forward and reverse caching have different workloads, and are used for different reasons. Therefore, the performance tuning considerations are different, as explained in the following sections.

Tuning Forward Cache

Forward caching is used for decreasing response times and reducing bandwidth use on the connection to the Internet (that is, improving quality of service while lowering network access cost). Therefore, the ultimate performance goal of a forward cache is to maximize hit ratio. Fortunately, this goal is in line with the ISA Server design, where hits (served from cache) cost less in overall resource utilization than misses (served from the network).

According to this rationale, the best way to get the most performance benefit is to provide enough disk storage to maximize hit ratio (see footnote 4). (For details, see the section Determining Disk Capacity.) Determining the disk cache configuration includes both the total size (for maximizing hit ratio) and the number of disks (to make sure that the number of hits per disk does not exceed the maximum I/O rate that the disk can serve).

Once the disk configuration is determined, the next step is to define the size of the memory cache. By default, ISA Server sets a memory cache to 50 percent of the total physical memory. This value is sufficient for most cases, dangerous in some cases, and may prove to be useless in others. For details, see Determining Memory Capacity.

One important tip for cache performance: if your system is not bound by Internet network bandwidth (see Single Entry-Level ISA Server Computer), and the ISA Server system is designed according to System Design for Maximizing CPU utilization, then it is highly recommended to use explicit proxy caching, by configuring user browsers to use the corporate Web proxy (the ISA Server). In Microsoft Internet Explorer 5 and later, the Web proxy can be set manually or automatically by corporate policy. For details on how this can be achieved, see Advantages of Using MS Internet Explorer 5 in Your Business.

Tuning Web Publishing

Web publishing (reverse caching) is used for decreasing Web access response times and reducing the load on backend Web servers. (Again, improving quality of service while lowering CPU cost.) As in forward caching, this means maximizing hit ratio. The fundamental difference here is that the working set is much smaller, and can fit into physical memory in most cases.

So, in this case the first focus is on Determining Memory Capacity. Disk capacity is not critical, because even large Web sites can hold all their static data on a single disk with several gigabytes.

Another thing to consider is the bandwidth of the network card that connects to the internal network. Although the incoming connection bandwidth may be large (because it is serving the Internet), the internal bandwidth may be small. This is because common hit ratios are in the 80 to 90 percent range, meaning that only 10 to 20 percent of response traffic is served from the backend Web servers (see footnote 5). All this means that the internal network card can have a smaller maximal bandwidth, which could make the network infrastructure less expensive.
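
The relationship between hit ratio and internal bandwidth can be written as a one-line estimate. The sketch below assumes, as footnote 5 does, that hits and misses have about the same response size distribution. The function name and the 100 Mbps external figure are hypothetical.

```python
def internal_bandwidth_mbps(external_mbps, hit_ratio):
    """Estimate the bandwidth needed toward the backend Web servers.

    Only cache misses (1 - hit_ratio) have to be fetched from the backend.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return external_mbps * (1.0 - hit_ratio)


# Assumed example: 100 Mbps of response traffic served to the Internet
for hit_ratio in (0.80, 0.90):
    needed = internal_bandwidth_mbps(100.0, hit_ratio)
    print(f"hit ratio {hit_ratio:.0%}: about {needed:.0f} Mbps internal")
```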

Tuning ISA Server Features

Tuning Clients

Tuning the clients has the most dramatic effect on ISA Server Performance.

Follow these guidelines for improving performance:

Outbound firewall performance may benefit from RWS firewall clients (as compared to SecureNAT clients), especially on high-traffic protocols like HTTP, FTP, and streaming media. (A two- to threefold performance increase can be expected with KMDP.)
Forward caching is much faster than transparent caching. The reason for this is that in the former case clients explicitly connect to the ISA Server providing a much higher connection keep-alive rate while lowering the total number of simultaneous connections. Therefore, it is recommended to make all internal Web browsers interact directly with the ISA Server (either by configuring a central browser installation according to enterprise-wide policy, or by using the Internet automatic settings detection in the Internet browser).
The load of streaming media clients on an ISA Server will be lower in these cases:
1. If the streaming media filter is enabled, then configure the player to use media over HTTP for clients with RWS, or use UDP for SecureNAT clients.
2. If the streaming media filter is disabled, then only RWS clients will be able to play media, in which case it is advisable to configure media over TCP transport.

Tuning Policy

The performance guidelines for tuning ISA Server policy are:

Minimize the number of rules (because rules processing grows linearly with the number of rules). For example, it is more efficient to use a single rule specifying a large destination set than to use a rule for each destination.
Favor allow rules over deny rules (because all deny rules are checked before any allow rule, and rules checking terminates when the first applicable allow rule is found).
Firewall policy is evaluated when a connection is established, and therefore has a negligible overhead relative to the data transferred on the connection. Cache policy is evaluated on every request, so its overhead is much larger compared to firewall policy evaluation.
Avoid possible DNS lookup problems. Make sure all destination names in destination sets are resolvable, and set up a nearby DNS if possible.
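
The first two guidelines can be illustrated with a toy model of rule evaluation. This is not ISA Server's actual implementation: it only mirrors the order described above (all deny rules first, then allow rules until the first match) and counts how many rules are examined. Representing a rule as a Python set of destinations, denying on the first matching deny rule, and denying when no rule matches are assumptions made for the sketch.

```python
def evaluate(destination, deny_rules, allow_rules):
    """Return (allowed, rules_checked) for one request.

    Each rule is modeled as a set of destination names (a destination set).
    """
    checked = 0
    # Deny rules are checked before any allow rule.
    for rule in deny_rules:
        checked += 1
        if destination in rule:
            return False, checked
    # Checking stops at the first applicable allow rule.
    for rule in allow_rules:
        checked += 1
        if destination in rule:
            return True, checked
    # Assumption: a request that matches no rule is denied.
    return False, checked


destinations = [f"site{i}.example.com" for i in range(100)]
one_large_rule = [set(destinations)]          # single rule, large destination set
one_rule_each = [{d} for d in destinations]   # one rule per destination

print(evaluate("site99.example.com", [], one_large_rule))
print(evaluate("site99.example.com", [], one_rule_each))
```

In this model the single large rule is resolved after one check, while the per-destination layout needs up to 100 checks for the same request, which is the linear growth the first guideline warns about.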

Tuning Authentication

For tuning authentication performance, take the following issues into account:

Firewall authentication is relevant for RWS only, in which case it has a negligible overhead, because it is checked once per session (in which many connections are opened and many more packets are transferred).
Cache authentication is checked more often, and re-authentication rate depends on the authentication scheme used. The authentication scheme also determines the amount of actual processing performed when applying authentication. The following table indicates the possibilities.

Table 5 ISA Server Authentication Performance

Authentication Scheme	Strength	When is authentication performed ?	Overhead per Request	Overhead per Batch
Basic	low	per request	low	none
Digest	medium	per time/count	none	high
NTLM	high	per connection	none	high
Kerberos	high	per connection	none	medium

Another thing to remember about the performance of cache authentication is that it can be configured at the Web proxy listener level (by selecting Ask unauthenticated users for identification in the Outgoing/Incoming Web Requests tabs of the array properties), or at the rule level. Choose the former only if authentication is required for all Web access; otherwise choose the latter (which means that authentication will be performed only when the rules require it).

Tuning Filters

Filters use extra CPU cycles, so they have a direct effect on performance. In extreme cases, a filter may require so many resources that the performance of the ISA Server is totally different from its performance without the filter. Therefore, the following is recommended:

Measure the performance of the filters you use, and tune them to be as efficient as possible. In cases where applicable, consider using ISA Server rules instead of a filter. (For example, site blocking using access rule destination sets may prove to be much more efficient than an ISAPI filter doing the same thing).
If you develop a filter, optimize it for best performance. This is true for any piece of software, especially for a mission critical firewall/proxy server.
Try to avoid downgrading to UMDP by adding an application filter to a protocol that uses KMDP without an application filter. (For example, try to keep the secondary data channel in KMDP.)

Tuning Encryption

For firewall scenarios, tuning encryption options actually means tuning Windows VPN. For information regarding VPN performance, see the Windows 2000 VPN: Enterprise Performance with Server-Based Flexibility report.

For cache scenarios, SSL (also called HTTPS) is utilized differently in forward caching and in Web Publishing. In Web publishing, ISA Server serves as an SSL terminator, meaning that it performs SSL handshakes and encryption with clients, and can do so with back-end servers as well.

In forward caching, ISA Server serves only as a tunnel, moving packets from one side to the other, where no encryption/handshake is involved. Therefore, the performance of tunneling is comparable to dealing with unencrypted data that is fetched from the Internet.

To tune SSL in Web publishing on ISA Server, follow these guidelines:

The encryption of an HTTPS connection requires about twice the CPU power compared to an HTTP connection transmitting the same data. Creating an HTTPS connection requires an order of magnitude more CPU power compared to the creation of an HTTP connection. Literature and advertisements on SSL acceleration hardware report a 2- to 10-fold performance improvement compared to software PKI implementations. How does this affect your workload? It depends heavily on how many requests are transmitted per connection (this tip refers only to Web publishing).
SSL in reverse caching encrypts and decrypts data just like SSL in a real Web site. In ISA Server, there are two possibilities for SSL connections:
1. 1-way SSL – Clients and ISA Server communicate using SSL, but ISA Server and backend Web servers communicate using ordinary HTTP.
2. 2-way SSL – Both communication channels use SSL, meaning that the ISA Server will decrypt responses coming from the backend Web site and encrypt it again for sending the response to the client (using a different encryption key).
2-way SSL performs twice the amount of processing that 1-way SSL performs for every request, making 2-way SSL twice as expensive in CPU utilization as 1-way SSL.

Therefore:
- Use 2-way SSL only if necessary. Remember that the second SSL channel between an ISA Server and the backend Web server secures communications on an internal protected network.
- Do not restrict all SSL content to be uncacheable. The HTTPS stream, like HTTP, contains many objects that can be shared as hits in private Web pages (images, icons, and so on). If public data passing through HTTPS connections is made cacheable, then the overhead of 2-way SSL as compared to 1-way SSL will have only a small performance impact, because it affects only cache misses, which will be a small percentage of total traffic.
- SSL in forward caching scenarios is a tunneling protocol. Therefore, content is not cached. This increases Internet throughput (as compared to non-encrypted transmission), but the processing overhead for tunneling the SSL protocol is much smaller than decrypting the data.
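
The SSL guidelines above can be turned into two rough cost ratios. This is a back-of-the-envelope sketch, not a measurement: the unit costs are hypothetical, "an order of magnitude" is taken as a factor of 10, and connection setup and per-request transfer are assumed to cost the same over plain HTTP unless other values are supplied.

```python
def https_to_http_cost_ratio(requests_per_connection, setup_cost=1.0, request_cost=1.0):
    """Relative CPU cost of one HTTPS connection versus one HTTP connection.

    Assumes HTTPS setup costs 10x the HTTP setup and HTTPS data transfer
    costs 2x the HTTP transfer (both factors taken from the text).
    """
    http = setup_cost + requests_per_connection * request_cost
    https = 10 * setup_cost + requests_per_connection * 2 * request_cost
    return https / http


def two_way_to_one_way_cost_ratio(hit_ratio):
    """Relative SSL cost of 2-way versus 1-way SSL when HTTPS content is cacheable.

    The second SSL channel is only exercised on cache misses.
    """
    return 1.0 + (1.0 - hit_ratio)


for n in (1, 10, 100):
    print(f"{n:3d} requests/connection: HTTPS costs {https_to_http_cost_ratio(n):.2f}x HTTP")
for hit_ratio in (0.0, 0.9):
    print(f"hit ratio {hit_ratio:.0%}: 2-way costs {two_way_to_one_way_cost_ratio(hit_ratio):.2f}x 1-way")
```

Under these assumptions the ratio approaches 2 as more requests share a connection, which is why keep-alive behavior dominates the answer to "How does this affect your workload?".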

Learning the Workload

This section describes ways to learn the workload when it is required (large enterprise and publishing strategies). In general, there are two ways to do this, depending on whether ISA Server is deployed as a replacement or add-on of an existing system, or whether it is deployed for a new system. In the former case, an existing system can provide valuable information sources, such as Web and firewall logs, and performance counters for estimating the workload. In the latter case, learning the workload may be a difficult task, perhaps requiring building and running a scaled-down prototype.

Learning from Existing Logs and Performance Counters

Logs provide the best possible source for computing workload parameters. Web cache servers often record an entry for each Web transaction, detailing the client and server name, the IP, requested URL, response size, time and status, whether the response was served from cache or from Internet, and so on. Firewalls do the same for connections, and record each connection's destination port, which is actually the protocol that was used.

Some parameters are easy to compute: For example, average response size, average response time, hit ratio, cacheable ratio, and protocol distribution. Others are more complicated to compute, and here are a few examples:

Working set size (forward cache): Look at all requests that were served from the cache, and count how many times each URL was hit. Ordering URLs by decreasing popularity will create a Zipf-like distribution with a noticeable cut-off point where a few URLs to its left have many hits, and many URLs to its right have a few hits. (Plotting the distribution provides a good view for locating the cut-off point.) The working set size is the accumulated size of all the popular objects to the left of the cut-off point.

The following graph is an example (taken from one of the Microsoft Corporate proxy logs in July 2001):

This example shows how the working set behaves on two consecutive days. On the first day the working set is about 4 GB, while on the second day it is only 3.3 GB (the cut-off point is at 150,000 URLs).
Requests per connection (outbound HTTP in firewall scenario): Look at the Web log, and count the number of requests between each client and server pair. Using the time of each request, it is possible to determine the time that passed between two consecutive requests. Because IE closes a connection after it is idle for a minute, time differences that are larger than one minute indicate that, in between, a connection was reestablished. Consecutive requests sent with time differences lower than a minute were probably sent on the same connection. Also, IE opens two connections per target. Hence, the average number of requests per connection is the average number of requests in a group of consecutive requests separated by less than one minute of idle time, divided by two.
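
The requests-per-connection estimate just described can be sketched as follows. The input format (timestamp in seconds, client, server) is a hypothetical simplification of a Web log, and the one-minute idle timeout and two connections per target are the IE behaviors stated above.

```python
from collections import defaultdict

IDLE_TIMEOUT_SECONDS = 60.0   # IE closes a connection after one idle minute
CONNECTIONS_PER_TARGET = 2    # IE opens two connections per target


def requests_per_connection(log_entries):
    """Estimate average requests per connection from (timestamp, client, server) entries."""
    times_by_pair = defaultdict(list)
    for timestamp, client, server in log_entries:
        times_by_pair[(client, server)].append(timestamp)

    group_sizes = []
    for times in times_by_pair.values():
        times.sort()
        size = 1
        for previous, current in zip(times, times[1:]):
            if current - previous > IDLE_TIMEOUT_SECONDS:
                # Gap longer than the idle timeout: connections were reestablished.
                group_sizes.append(size)
                size = 1
            else:
                size += 1
        group_sizes.append(size)

    if not group_sizes:
        return 0.0
    average_group = sum(group_sizes) / len(group_sizes)
    return average_group / CONNECTIONS_PER_TARGET


sample_log = (
    [(t, "client1", "server1") for t in (0, 5, 10, 20, 200, 210)]
    + [(t, "client2", "server1") for t in (0, 10, 20, 30, 40, 50)]
)
print(requests_per_connection(sample_log))  # groups of 4, 2 and 6 requests -> 2.0
```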

Because Internet usage varies during a day, and during a week, it is important to look at several logs, preferably within normal work days.
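
The working set computation described earlier in this section can be sketched in the same way. The input is assumed to be the (URL, response size) pairs of requests that were served from the cache, and the cut-off rank is assumed to have been chosen by inspecting the plotted popularity distribution (150,000 URLs in the example above). The function name is hypothetical.

```python
from collections import Counter


def working_set_size(cache_hits, cutoff_rank):
    """Sum the sizes of the cutoff_rank most popular cached objects.

    cache_hits is an iterable of (url, size_bytes) for requests served from cache.
    """
    hit_counts = Counter()
    object_size = {}
    for url, size_bytes in cache_hits:
        hit_counts[url] += 1
        object_size[url] = size_bytes  # keep the most recently logged size

    # most_common() orders URLs by decreasing popularity.
    popular_urls = [url for url, _ in hit_counts.most_common(cutoff_rank)]
    return sum(object_size[url] for url in popular_urls)


sample_hits = (
    [("/a", 1000)] * 5
    + [("/b", 2000)] * 4
    + [("/c", 500)]
    + [("/d", 700)]
)
print(working_set_size(sample_hits, cutoff_rank=2))  # /a and /b -> 3000 bytes
```

Running this over logs from several normal work days gives a range for the working set rather than a single figure.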

Another good source of information is real-time monitoring counters (also known as performance counters on Windows systems). Existing firewall and cache servers often provide useful counters that directly measure relevant events.

Learning from Prototype

There are also cases where ISA Server is to be deployed for systems that do not exist yet. Most of these cases fall into the publishing strategy category, where a new Web site or a new Web service is planned to be offered in the future. The scope of this document is too limited to describe the possibilities for simulating and modeling the performance of a Web site. For more information on Web capacity planning, see https://www.microsoft.com/technet/archive/itsolutions/ecommerce/default.mspx.

References and More Information

The Market research report used for this paper is: Internet Connectivity Services: A Demand-Side View, 2001, report 26023 by Steven T. Harris, Dec. 2001, IDC.

Microsoft will continue to share information with customers in hopes that it will help you deploy and manage ISA Server more successfully.

For the latest information on ISA Server, go to https://www.microsoft.com/isaserver/.

For the latest information on Microsoft .NET, go to https://www.microsoft.com/net/.

For the latest information on Windows, go to https://www.microsoft.com/windows/.

For support information and self-help tools for Microsoft products on the Microsoft Knowledge Base, go to https://support.microsoft.com/default.aspx?scid=fh;EN-US;KBHOWTO.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Microsoft, Active Directory, Outlook, and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

1 Doubling the number of transactions an application can process by doubling the number of processors is linear scaling. The actual ratio between the number of transactions an application can process after and before doubling the number of processors is the scaling factor. The scaling factor of a linear-scale system is therefore 2.

2 In the dual processor arena there are two classes of machines – workstations based on desktop class processors (e.g. Pentium), and servers based on server class processors (e.g. Xeon). The former are much cheaper and considered high-end entry-level, while the latter are tuned to provide more performance/capacity for server applications.

3 When a cached object is refreshed, both the new and old copies of the same object consume disk space for some time. The space consumed by the old version of the cached object is de-allocated but is not returned immediately; it remains unavailable for some time (depending on the average refresh rate) until it is reused by another object.

4 Actually, the first thing to do is to make sure that there is enough CPU power and internal network bandwidth to hold the peak request rate, otherwise response times will increase to unacceptable levels due to CPU or Network starvation.

5 This is under the assumption that the hit, miss, and uncacheable response size distribution is about the same.