Programming Considerations (Web and Application Server Infrastructure - Performance and Scalability)

Applies To: Windows Server 2003 with SP1

This section is not intended to be a comprehensive guide to performance programming for Microsoft Windows Server environments. Rather, it identifies changes and highlights important considerations for building Web applications or Web services on Windows Server 2003.

General Application Design Points

This section contains some general points that are useful to be aware of when designing and developing an application.

Design for Multiple Instances of an Application

This is an important attribute for scaling applications. If an application can have multiple instances of itself running on a single server or in a farm scenario, it stands a better chance of being scaled up should the need arise, for example, to get additional throughput out of the application due to changing business needs.

Windows Server 2003 also offers the capability to easily run multiple instances of Web applications and Web services through a basic configuration parameter. The considerations when building and designing for multiple instances are:

  • State management

  • Centralized content or a strategy for replicating and synchronizing content
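For reference, the configuration parameter mentioned above is the application pool's worker-process count (a "web garden"). As a sketch, assuming the default pool name and the standard adsutil.vbs location, multiple instances might be enabled like this (verify the property path against your metabase before use):

```
cscript %SystemDrive%\Inetpub\AdminScripts\adsutil.vbs set W3SVC/AppPools/DefaultAppPool/MaxProcesses 4
```

Each worker process then hosts its own instance of the application, which is why the state management and content strategy considerations above apply.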

Understand the Paths a Request Will Take Throughout the System

It is important to understand, from a physical perspective, which processes and subsystems a specific request or message will propagate through on Windows. If you notice that a request or message is bouncing from process to process (or across a server) while it is being processed, the application needs further scrutiny from a performance and scalability perspective. The best case is to keep a unit of work within the process in which it started. This is often an impractical goal, but any reduction of process hops and minimization of network hops is good for overall performance and response time latency.

With the IIS 6.0 re-architecture, some of the IIS programming interfaces behave a little differently, and the core (base) native code API for IIS is augmented with new performance APIs and features.

ExecuteURL

Normally, an HTTP redirect is accomplished when a server responds to a client's request for a resource with an HTTP 302 response containing the new location of the resource. The client application then issues a new HTTP request to access the content (sometimes on the same server).

The ExecuteURL (HSE_REQ_EXEC_URL) ServerSupportFunction now allows an ISAPI extension to easily redirect a request to another URL. It answers growing demand by ISAPI extension developers to chain together requests of different types. From a performance perspective, it is very useful, because an application can achieve redirection of a request to a new piece of content without the additional network hop, and associated latency, of delegating a redirection to the client browser or rich client application. ExecuteURL also provides functionality to replace almost all ISAPI read raw data filters.

Important

If your applications require read raw data filters, your server must be running in IIS 5.0 isolation mode.

Note that the most common scenario for developing read raw data filters is the need to examine or modify the request entity body before the target URL processes it. Currently, the only way to see the entity body of a request (if you are not the target URL) is through read raw data notifications. Unfortunately, writing an ISAPI filter to accomplish this goal can be exceedingly difficult, or even impossible in some configurations. ISAPI extensions, on the other hand, provide functionality for easy retrieval and manipulation of the entity body. ExecuteURL allows an ISAPI extension to process the request entity body and pass it to a child request, meeting the needs of nearly all read raw data filter developers.

Generally speaking, a great deal of care needs to be taken when writing ISAPI filters, because the filters are totally synchronous and the code in an ISAPI filter executes on a core IIS thread. For operations that involve blocking or calling off into other computers (network latency), ISAPI filters can cause problems. Therefore, when ExecuteURL is used in combination with the wildcard ScriptMap feature, a developer has enough programming environment infrastructure to replace the majority of ISAPI filters with ISAPI extensions.
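A minimal sketch of an ExecuteURL-style child request. Because the real declarations are in the Windows-only httpext.h, the constant, structure, and ServerSupportFunction below are simplified stand-ins, not the SDK definitions:

```cpp
#include <cassert>
#include <string>

// Stand-ins for the ISAPI declarations in httpext.h (assumption:
// simplified shapes; the constant value is illustrative only).
const unsigned HSE_REQ_EXEC_URL = 1059;

struct HSE_EXEC_URL_INFO {
    const char* pszUrl;          // target URL for the child request
    const char* pszMethod;       // NULL = inherit the parent's verb
    const char* pszChildHeaders; // NULL = inherit the parent's headers
    void*       pUserInfo;
    void*       pEntity;         // NULL = pass the parent entity body through
    unsigned    dwExecUrlFlags;
};

// Fake ServerSupportFunction that records the child URL, standing in
// for the real callback on the EXTENSION_CONTROL_BLOCK.
static std::string g_lastChildUrl;
bool ServerSupportFunction(unsigned request, void* buffer) {
    if (request == HSE_REQ_EXEC_URL) {
        g_lastChildUrl = static_cast<HSE_EXEC_URL_INFO*>(buffer)->pszUrl;
        return true;
    }
    return false;
}

// Redirect the current request to another URL on the server side,
// avoiding the extra round trip of an HTTP 302 to the client.
bool ExecuteChildRequest(const char* url) {
    HSE_EXEC_URL_INFO info = {};
    info.pszUrl = url;  // the rewritten or chained target
    return ServerSupportFunction(HSE_REQ_EXEC_URL, &info);
}
```

Because the parent entity body is passed through by default, this is also the mechanism that lets an extension inspect or modify a POST body before handing it to the target URL.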

Wildcard Application Map

What is an application map?

When IIS receives a request from a client, the Web server looks at the extension of the file that is named in the request to determine which ISAPI or Common Gateway Interface (CGI) application handles that file. By using wildcard application maps, you can intercept every request before the requested page is sent to its mapped application. The effect is like having an application mapping that handles every file name extension, which is why the term wildcard is used to name this feature. Applications using wildcard application mapping can only be ISAPI applications, which is an advantage because they can have their own thread pool, work asynchronously, and have full access to entity bodies (important when processing an HTTP POST as part of a Web service).

With IIS 6.0, it is possible to specify one (or more) wildcard (*) application maps. As implied, these wildcard application maps get executed for every extension or request coming in to a site or virtual directory. Therefore, these wildcard application maps start to take on the behavior of an ISAPI filter; they get to see every request. Combined with the ExecuteURL function, they can perform some logic to determine what should be done with the request, and then optionally call in to other types of content.

The IIS 6.0 configuration file (%SYSTEMROOT%\system32\inetsrv\metabase.xml) contains a section entitled ScriptMaps. This section describes the application mapping of extensions (.aspx, .asp, .shtm, and so on) to the physical components that handle those request types. The ScriptMaps section under the /LM/W3SVC node of the metabase is the parent configuration; that is, all sites and their virtual directories inherit from it if they do not override the value at their level.

To configure a wildcard application map, please see Installing Wildcard Application Mappings in the IIS 6.0 Help.

VectorSend

Without VectorSend, ISAPI developers have only two options if a response is contained in multiple buffers. They can either call the ISAPI WriteClient() API multiple times, or they can assemble the response in one big buffer and send it in a single call. The first approach is a performance bottleneck, because there is one kernel-mode transition per buffer and potential network latency. The second approach affects scalability, in that it puts more contention on the process heap and wastes memory and CPU cycles. For example, to assemble a 64K response from multiple response parts, the heap would have to search for a 64K contiguous block for your program (possibly extending the process heap), and you would have to spend the CPU cycles to copy 64K of data into the newly allocated memory. The VectorSend (HSE_REQ_VECTOR_SEND) ServerSupportFunction in IIS 6.0 provides a solution to this problem.

VectorSend allows developers to put together a list of buffers and file handles to send, in order, and then hand the list off to IIS 6.0 to compile the final response. HTTP.sys sends the buffers and file handles as one response, assembling it on the fly from the original data buffers. This frees the ISAPI from doing any buffer construction or making multiple WriteClient calls, yielding higher performance and better scalability. If possible, existing applications should change from using WriteClient to using VectorSend.

FinalSend

The VectorSend ServerSupportFunction also accepts an optional flag that aids the performance of a high-throughput site: FinalSend (HSE_IO_FINAL_SEND).

Specifying the FinalSend flag tells IIS that it can clean up its request context, create the log file entry, and so on, while it completes the send in kernel mode. Without an ISAPI application specifying FinalSend, IIS only knows that the application has finished sending for the current request when control returns from HttpExtensionProc. The downside is that if IIS has to wait for HttpExtensionProc to return, it must make an extra system call (a transition from user mode to kernel mode and back) to perform the request cleanup and logging. With FinalSend, all of this happens at the same time the actual response is sent to the client, saving one kernel transition per request. That is not a massive amount of CPU per request, but it adds up if you are trying to sustain tens of thousands of requests per second.

Caching Responses from Dynamic Content in the Kernel

One of the key performance features of IIS 6.0 is that it will allow an ISAPI component to cache the dynamically-generated response to a request in the kernel, called the kernel-mode response cache. This feature has the potential to dramatically improve the performance of a server if the content being served lends itself to this form of caching.

Why would I cache the output of a dynamic request?

There are many scenarios where it is useful to dynamically generate a response, and allow the response to be a little stale. An online store scenario is a classic example. The product catalog is a relatively expensive page to create, because it needs to call in to a database, perform a query that is likely to involve some form of database join, and return and render the results for the specific caller. In terms of CPU cycles, creating that page would have cost many millions of CPU cycles across multiple servers. Now, imagine you’re a large e-commerce application on the Internet, getting hundreds or thousands of requests per second; what is the value in regenerating this same page a few hundred times per second, when the actual content is only going to change maybe once every ten minutes? The answer is there is little value in doing this for every request. A better strategy is to generate the page once, have the server efficiently cache it, and then give the response a staleness limit (how long the page can be served from the cache without being regenerated). Once its staleness limit expires, have the server call your code again, in the standard way, to regenerate the page and store the new response for the staleness limit.

What performance gain do I get when putting responses in the kernel?

To provide an idea of the difference, a simple ISAPI extension was written that looped 10,000 times to simulate some processing and then responded to the client with approximately 1K of data. The ISAPI's performance was baselined on an 8-processor server. Then, the code to cache the response in the kernel was added to the VectorSend() call. The version of the ISAPI that used the kernel cache returned more than eight times the baseline throughput (846 percent of the non-cached case). Additionally, the latency of responses was dramatically better. When the ISAPI was executed each time, the average TTFB/TTLB (time to first byte / time to last byte) was 35.57 / 35.62 milliseconds; when the responses were served from the kernel, the TTFB/TTLB was 2.01 / 2.12 milliseconds.

Of course, results will vary according to the amount of processing done and the size of the response. The point is that leveraging the kernel cache can make a dramatic difference to the performance of an application.

What types of content can be cached in the kernel?

Static file requests, ISAPI components, and Microsoft ASP.NET pages can all cache responses in the kernel. A caveat to this is that the only responses that will be cached in the kernel are responses to HTTP GET requests.

How to Cache ISAPI Responses in the Kernel

To cache ISAPI responses in the kernel, you must use the VectorSend ServerSupportFunction to send your response. In preparation for sending your response, you must add the following response headers to go out with it:

  • Last-Modified: a required header; it must be provided in GMT time format.

  • Expires: controls the amount of time an item is held in the cache. It is recommended that this value be calculated as the current time on the server plus the amount of time you want the cached copy to be served from the cache. This header should also be provided to IIS in GMT time format.

To cache the response, you call the VectorSend ServerSupportFunction, specifying at least the following flags:

HSE_IO_FINAL_SEND | HSE_IO_CACHE_RESPONSE

By implication, the send that is cached has to be the one and only send for the response.
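To make the header requirements concrete, here is a small sketch (helper names hypothetical) that formats Last-Modified and Expires in the RFC 1123 GMT form that IIS expects:

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Format a time_t as an RFC 1123 date in GMT, the form IIS expects
// for the Last-Modified and Expires headers.
std::string HttpDate(time_t t) {
    char buf[64];
    tm gmt;
#if defined(_WIN32)
    gmtime_s(&gmt, &t);
#else
    gmtime_r(&t, &gmt);
#endif
    strftime(buf, sizeof buf, "%a, %d %b %Y %H:%M:%S GMT", &gmt);
    return buf;
}

// Build the two headers for a response cacheable for ttlSeconds,
// following the current-time-plus-lifetime recommendation above.
std::string CacheHeaders(time_t now, int ttlSeconds) {
    return "Last-Modified: " + HttpDate(now) + "\r\n"
           "Expires: " + HttpDate(now + ttlSeconds) + "\r\n";
}
```

These headers are then sent along with the response in the VectorSend call that carries the HSE_IO_FINAL_SEND | HSE_IO_CACHE_RESPONSE flags.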

The ASP.NET v1.1 release uses this facility in IIS 6.0 to host its output cache in the kernel. So, when using the output cache directive in ASP.NET, your responses are also being served from the kernel cache.

Synchronous or Asynchronous ISAPI Sends

This is a key area of architectural change for Windows Server 2003 compared to previous releases of Microsoft Server operating systems. In Windows Server 2003, the IIS 6.0 HTTP.sys interfaces with TCPIP.sys in the Windows kernel. Previous versions of IIS used the Winsock user mode library (a more general purpose TCP/IP sockets library).

Before Windows Server 2003, when an ISAPI application called a synchronous API to send a response to an end browser or application, the Winsock library would buffer the complete response in the kernel and free the ISAPI application to do more work. Meanwhile, if the caller getting the response back was on the end of a very slow link, the kernel memory allocated would be held for a relatively long time and the Windows kernel would grow its memory usage.

This functionality no longer exists in Windows Server 2003. If an ISAPI calls the WriteClient or VectorSend API synchronously (a blocking call), IIS and HTTP.sys will only buffer the last 2K of the response on the caller's behalf; the thread will wait until all but the last 2K bytes have been sent to that Web client (possibly halfway around the world, running over a 28.8 modem link). What eventually occurs is that the IIS thread pool reaches its maximum limit and stops growing, and the performance of the particular ISAPI is likely to suffer. To rectify this situation, it is recommended that existing ISAPI applications be changed to use the asynchronous version of the WriteClient API. VectorSend should always be called asynchronously.
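The asynchronous pattern looks roughly like the following sketch. The real declarations live in httpext.h; the constant, callback signature, and WriteClient implementation below are simplified stand-ins that only model the shape of the pattern: register a completion routine, issue the send with the async flag, return immediately, and continue work in the callback.

```cpp
#include <cassert>

// Stand-ins for ISAPI async I/O declarations (assumption: simplified;
// the constant value is illustrative only).
const unsigned HSE_IO_ASYNC = 2;

typedef void (*PFN_HSE_IO_COMPLETION)(void* context, unsigned long cbIO,
                                      unsigned long error);

// Fake registration and WriteClient standing in for IIS. The real
// server fires the callback later, when the client has drained the
// bytes; this sketch fires it inline so the flow can be followed.
static PFN_HSE_IO_COMPLETION g_callback = nullptr;
static void* g_context = nullptr;

void SetIoCompletion(PFN_HSE_IO_COMPLETION cb, void* ctx) {
    g_callback = cb;
    g_context = ctx;
}

bool WriteClient(const void* data, unsigned long cb, unsigned flags) {
    (void)data;
    if (flags & HSE_IO_ASYNC) {
        // Real IIS returns immediately; the send completes later on
        // an I/O completion thread instead of blocking this one.
        if (g_callback) g_callback(g_context, cb, 0);
        return true;
    }
    return false;  // this sketch models only the async path
}

// Completion routine: pick up where the send left off rather than
// holding a pool thread while a slow client receives the response.
static unsigned long g_bytesSent = 0;
void OnSendComplete(void*, unsigned long cbIO, unsigned long) {
    g_bytesSent += cbIO;
}
```

In a real extension, the completion routine would issue the next send or call HSE_REQ_DONE_WITH_SESSION once the response is fully written.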

COM+ Services in ASP

In IIS 4.0 and 5.0, Active Server Pages (ASP) applications are able to use COM+ services by configuring the application's Web Application Manager (WAM) object in the COM+ configuration store to use transactions and other COM+ features. In IIS 6.0, the IIS and COM+ teams have separated the COM+ services from physical components, which allows ASP applications to use COM+ services directly inline. For example, a component can enter a transaction context inline in its code, and then exit the transaction context while still processing a method. In addition to the services available in COM+ on Windows 2000, a few new services have been added and are supported in ASP.

Apartment Model Selection

ASP, through COM+, allows developers to determine which threading model to use when executing the pages in an application. By default, ASP uses the single-threaded apartment (STA) model. However, if the application uses poolable objects, it can be run in the multi-threaded apartment (MTA) model.

This is an important scalability gain for many existing ASP applications, because running the ASP objects in the multi-threaded apartment means fewer threads in the system, fewer context switches, and better resource usage. If your ASP application is only using COM+ objects marked as Both or Free threaded, you can safely turn on the multi-threaded apartment in IIS 6.0 for these applications.

To turn on the multi-threaded apartment for a particular site or application in IIS 6.0, navigate to the application's node in the MetaBase.xml file and add the value AspExecuteInMTA=1.
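If you prefer not to edit MetaBase.xml by hand, the same property should be settable with the adsutil.vbs administration script (a sketch; replace NumericSiteID with your site's ID and verify the metabase path in your environment):

```
cscript %SystemDrive%\Inetpub\AdminScripts\adsutil.vbs set W3SVC/NumericSiteID/Root/AspExecuteInMTA 1
```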

High Latency ASP Pages

The ASP programming model is fundamentally synchronous; a request comes in to an ASP script and the ASP script keeps processing until that request is completed and the response is sent. The simple nature of the programming model is such that a developer cannot start an operation that is going to take a long time, and expect the ASP page to exit, and when the operation completes, to restart from a given point.

Consequently, if an ASP page includes a COM+ object that has to perform an operation that is going to take a long time, what will likely occur is that the thread executing the page will block and idle until the I/O operation is complete. If you have a set of ASP scripts that all have high latency operations, you will likely need a large number of threads in the system to ensure good response times.

With any large multiprocessor system, it is better to minimize any surplus threads you have; more threads = more context switches = less scalability. In Windows Server 2003, COM+ has changed to take a gradual approach to starting up new threads. It looks at system parameters before making the decision to start more concurrent work. If there is an ASP-based site or application (running in the COM+ STA) that just needs to have a high number of threads started when it initializes, it is possible to make COM+ revert to the previous behavior with the following registry setting:

HKEY_LOCAL_MACHINE\Software\Microsoft\COM3\STAThreadPool\CPUMetricEnabled = 0 (REG_DWORD)

As with the COM+ parameter above, you may also want to increase the number of ASP thread pool threads when an ASP page experiences high latency but is running in the COM+ MTA, or uses ASP intrinsic objects that take a long time to execute. The IIS metabase parameter is AspProcessorThreadMax and is set to 25 by default. That default can be a little misleading: it means a maximum of 25 ASP threads per processor, so if you are running on a 2-processor system, you have a maximum of 50 threads by default. AspProcessorThreadMax can be changed by running the following command:

for a specific site: cscript %SystemDrive%\Inetpub\AdminScripts\adsutil.vbs set W3SVC/NumericSiteID/AspProcessorThreadMax 5

or, for all sites: cscript %SystemDrive%\Inetpub\AdminScripts\adsutil.vbs set W3SVC/AspProcessorThreadMax 5

COM+ General

With the new IIS 6.0 architecture, it is important to question some of the existing guidelines where COM+ is concerned. A major consideration is that, before Windows Server 2003, COM+ application components were configured (by default) to run out-of-process from the caller. The default for COM+ applications is to run as Server Applications, executing in a DLLHost.exe process that is called into at object instantiation, or over DCOM.

The performance downside of doing this for every method call is that there are extra threads running on the system, and every call to a method must be marshaled across process boundaries. This is not noticeable in a small implementation with low request/transaction rates, but on a high-volume, large multiprocessor system, this kind of overhead can greatly decrease the overall scalability of the system.

Therefore, on Windows Server 2003, it is best to change the default configuration for a COM+ Server Application to Library Application, so that the components load into the calling process and each use of the application avoids cross-process marshaling, which aids scalability.

ASP.NET Environment

There are already many existing documents on https://msdn.microsoft.com and https://www.asp.net/Default.aspx?tabindex=0&tabid=1 that discuss coding tips and techniques for building very fast applications using ASP.NET. Most of this information applies to running the ASP.NET v1.1 release on Windows Server 2003 as well. This section emphasizes the points that have particular relevance to Windows Server 2003.

Kernel-Mode Response Cache

When the Microsoft .NET Framework v1.1 is running on Windows Server 2003, ASP.NET detects the enhanced capabilities of IIS 6.0 and leverages some of the performance features available in the Windows Server 2003 platform. A key feature being leveraged is the kernel-mode response cache, which ASP.NET uses to store its output cache responses. The table below shows an example of an ASP.NET page that generates a 4K response. One set of figures is for the page being retrieved from the user-mode output cache in ASP.NET, and the other is for the page being retrieved from the kernel-mode cache.

Table 1 The difference between user-mode and kernel-mode output caches for ASP.NET pages

  ASP.NET                    User-Mode Output Cache    Kernel-Mode Output Cache
  Requests / Sec             1,394                     15,416
  TTFB / TTLB (msec)         70.82 / 70.97             3.39 / 4.02
  User Mode CPU %            76.57%                    0.78%
  Kernel Mode CPU %          22.69%                    99.22%
  System Calls / Sec         20,110                    2,101
  Network Util (KB / Sec)    6,153                     68,326
  Context Switches / Sec     2,621                     6,261

As you can see, the difference is substantial when the content is served directly from the kernel-mode response cache. When requests go to the user-mode output cache, every request transitions the processor from kernel mode to user mode and back, and changes process page pointers and other operating system context data. If the workload stays in the kernel, a great deal of that system overhead is avoided.

Additionally, with the ASP.NET kernel-mode response cache, there are no changes required from a developer's perspective. If the existing .aspx page already has the following directive:

<%@ OutputCache Duration="time in seconds" VaryByParam="none" %>

the page output is automatically served from the kernel-mode response cache.

On a high volume Web site, setting the output cache duration to as low as one second can also make an important difference to the overall throughput of a server.

Partial Page Caching

Another major consideration for a Web application is which sections of dynamically generated pages can be statically cached and which parts need to be regenerated. It can be a big performance boost if expensive-to-create portions of pages can be cached, and only the parts of a page that really need to be regenerated are dynamically generated per page access.
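In ASP.NET, partial-page (fragment) caching is done by moving the expensive markup into a user control that carries its own OutputCache directive. A sketch, with a hypothetical control file name:

```
<%-- ProductList.ascx (hypothetical): the expensive fragment, cached for ten minutes --%>
<%@ Control Language="vb" %>
<%@ OutputCache Duration="600" VaryByParam="none" %>
```

The hosting .aspx page is still generated on every request; only the control's rendered output is served from the cache until its duration expires.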

Exception Handling

Exception handling should be just that: code that is executed in the exceptional case, not code that is executed all of the time. Many programmers have developed a style in which they use an exception as a convenient way to execute cleanup logic for a standard, expected condition.

A few statistics might help to drive home how expensive exceptions can be: at the processor level, the cost of throwing an exception in the common language runtime (CLR) can be up to 1.8 million CPU cycles. That is 1.8 million CPU cycles gone when one comparison operation would have sufficed. Worse, throwing an exception invariably acquires a system lock for page frame manipulation. The section entitled Software Locks (Resource Contention) explains why this is a bad thing to do. The result is that an application that throws and catches a lot of exceptions will not scale on large multiprocessor servers, in addition to wasting CPU cycles.

To summarize, throwing exceptions should be done very sparingly and certainly not on your common code paths, if you want to have high performing, scalable code.
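The distinction can be illustrated with a small sketch (function names and table contents are hypothetical; shown in C++, though the same pattern applies in managed code): report the expected case with a return value, and throw only for the truly unexpected one.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical lookup table used to illustrate the two styles.
std::map<std::string, int> SampleTable() {
    return { {"widgets", 12}, {"gadgets", 7} };
}

// Hot path: a missing key is an expected outcome, so report it with a
// return value. The cost is one comparison, not a thrown exception.
bool TryLookup(const std::map<std::string, int>& table,
               const std::string& key, int* out) {
    auto it = table.find(key);
    if (it == table.end()) return false;
    *out = it->second;
    return true;
}

// Exceptional path only: reserve throwing for conditions the caller
// genuinely cannot anticipate; a throw can cost millions of cycles.
int LookupOrThrow(const std::map<std::string, int>& table,
                  const std::string& key) {
    int value = 0;
    if (!TryLookup(table, key, &value))
        throw std::out_of_range("missing key: " + key);
    return value;
}
```

Code that probes for many possibly-absent keys should call the Try-style function; the throwing variant is for cases where absence indicates a real fault.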

COM+ Interop

Many companies have an existing inventory of applications or components that have been built using COM+ over the years. When writing new ASP.NET applications, chances are that many of the existing COM+ components will be integrated into these ASP.NET applications (why fix something that isn’t broken?). Therefore, this section contains a set of guidelines regarding the use of COM+ interop in ASP.NET.

Infrequent, Functionally-Rich Calls

The key to efficiency when using COM+ interop with ASP.NET applications is to minimize the number of times you transition from one environment to the other. An application needs to minimize the number of housekeeping CPU cycles expended on transitioning between the world of managed code and unmanaged code. Therefore, it is better to try and have an interface that has substantial, but less frequent calls, rather than frequent, chatty calls.
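A toy sketch of the difference (the transition counter and the order types are hypothetical; imagine each method call crossing the managed/unmanaged boundary with a fixed marshalling cost):

```cpp
#include <cassert>
#include <string>

// Hypothetical boundary-crossing counter: pretend each method call
// below is one managed/unmanaged transition with fixed overhead.
static int g_transitions = 0;

struct OrderLine { std::string sku; int qty; double price; };

// Chatty interface: one transition per field set.
class ChattyOrder {
public:
    void SetSku(const std::string& s) { ++g_transitions; sku_ = s; }
    void SetQty(int q)                { ++g_transitions; qty_ = q; }
    void SetPrice(double p)           { ++g_transitions; price_ = p; }
private:
    std::string sku_; int qty_ = 0; double price_ = 0;
};

// Chunky interface: one transition for the whole unit of work.
class ChunkyOrder {
public:
    void SetLine(const OrderLine& l) { ++g_transitions; line_ = l; }
private:
    OrderLine line_;
};

// Count the transitions each style incurs for the same logical update.
int TransitionsFor(bool chunky) {
    g_transitions = 0;
    if (chunky) {
        ChunkyOrder o;
        o.SetLine({"sku-1", 2, 9.99});
    } else {
        ChattyOrder o;
        o.SetSku("sku-1");
        o.SetQty(2);
        o.SetPrice(9.99);
    }
    return g_transitions;
}
```

The chunky shape pays the interop cost once per unit of work instead of once per property, which is the guideline the paragraph above describes.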

Transactional Attribute on COM+

Two-phase commit is one of the key features of COM+. Many projects mark all of the objects involved in an application with the Required or Requires New transaction attribute. This can introduce a substantial processing and network latency cost to the application.

The reason for this is that when a new transaction is started, or when a component enlists itself in a transaction, there is a communication with the Microsoft Distributed Transaction Coordinator Service (MSDTC). If the MSDTC is running on the local server, this will involve extra context switches; if MSDTC is running on a remote server, it will also include additional network latency. Use the transaction attributes only when necessary. If performance is a critical concern, consider re-factoring your components appropriately.

Single-Threaded Apartment

If your ASP.NET solution uses COM+ components that were built using Microsoft Visual Basic 6.0 or COM+ components marked as apartment threaded, it is recommended that you mark your ASP.NET page with the AspCompat directive. At the top of the .aspx page, add the following to the directive line:

<%@ Page AspCompat="true" Language="vb" %>

The reason for this is that if your .aspx page detects an apartment-threaded COM+ object, then in addition to interop, it also has to run your component in its own apartment, which means additional threads and data structure marshalling; essentially, more operating system housekeeping. You can avoid some of this overhead with the AspCompat directive.