Application Pool Health

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1

Important

This feature of IIS 6.0 is available only when running in worker process isolation mode.

In worker process isolation mode, application pools can be configured to monitor the health of their worker processes as well as the health of the entire pool. Monitoring the health of a worker process includes detecting that the worker process is not able to serve requests and taking appropriate action. For example, if a worker process fails to respond to a ping request from the World Wide Web Publishing Service (WWW service), the worker process probably does not have threads available to process incoming requests. When this happens, the WWW service either terminates the worker process or releases it and leaves it running, and then starts a new worker process to replace it. An administrator can preconfigure an action to take when an unhealthy worker process is released, such as attaching a debugger to the worker process.

In addition to monitoring the health of worker processes, the WWW service can also detect ongoing problems with the entire application pool. For example, if worker processes are terminating abnormally every few seconds, the WWW service can determine that the application pool is unhealthy and stop it, preventing the unhealthy applications from affecting applications in other pools.

The following conditions cause an application pool to start or stop:

  • Initiation of rapid-fail protection causes an application pool to stop.

  • A job object hitting its time limit causes an application pool to stop, followed by a start when the time window expires.

  • A configuration error caused by trying to use a nonexistent identity causes an application pool to stop.

  • A Windows administrator performs a demand stop or start on an application pool.
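The start and stop conditions above can be modeled as a small state machine. The following Python sketch is illustrative only: the event names and the `Pool` and `apply_event` helpers are hypothetical and do not correspond to any IIS API. It captures the one asymmetry in the list: only a pool stopped by its job object starts again on its own.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical event names for the automatic stop conditions in the list.
AUTO_STOP_EVENTS = {"rapid_fail_protection", "job_object_time_limit", "nonexistent_identity"}


@dataclass(frozen=True)
class Pool:
    running: bool
    stop_reason: Optional[str] = None  # which event stopped the pool, if any


def apply_event(pool: Pool, event: str) -> Pool:
    """Return the pool's state after `event` (a simplified, hypothetical model)."""
    if event in AUTO_STOP_EVENTS or event == "admin_stop":
        return Pool(running=False, stop_reason=event)
    if event == "admin_start":
        return Pool(running=True)
    if event == "job_object_window_expired" and pool.stop_reason == "job_object_time_limit":
        # Only a pool stopped by its job object restarts when the time window expires.
        return Pool(running=True)
    return pool  # any other event leaves the pool unchanged
```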

Monitoring Application Pool Health

Worker process pinging enables the WWW service to detect that a worker process is unable to respond to requests (in other words, that the worker process is unhealthy). The ping is a message sent between the WWW service and the worker process. If the ping succeeds, the WWW service assumes that the worker process is healthy. If the ping fails (no response is received from the worker process), the WWW service assumes that a problem exists with the worker process. In that case, the WWW service either terminates the worker process or releases it, and then starts a new worker process when one is needed to serve new requests. If the worker process is released but still running, the WWW service runs the action configured by the administrator.
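The decision the WWW service makes after each ping can be sketched as follows. This is a simplified model, not IIS code; `HealthPolicy`, `handle_ping`, and the command string are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HealthPolicy:
    release_unhealthy: bool = False  # keep the unhealthy process running (for debugging)
    action_command: str = ""         # hypothetical command run for a released process


def handle_ping(pid: int, responded: bool, policy: HealthPolicy) -> List[str]:
    """Return the steps taken after one ping of worker process `pid`."""
    if responded:
        return ["healthy"]
    steps = []
    if policy.release_unhealthy:
        steps.append(f"release {pid}")
        if policy.action_command:
            # The configured action runs only for a released, still-running process.
            steps.append(f"run {policy.action_command} {pid}")
    else:
        steps.append(f"terminate {pid}")
    # A replacement is started when it is needed to serve new requests.
    steps.append("start replacement on next request")
    return steps
```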

Whether the WWW service pings worker processes, and how often it does so, is controlled by the PingingEnabled Metabase Property and the PingInterval Metabase Property.
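To see how the ping frequency affects how quickly a hung worker process is noticed, the sketch below computes the earliest detection time under simple assumptions: pings go out every `ping_interval` seconds starting at time 0, and a missed reply is declared a failure after a separate `response_timeout`. Treating PingInterval as seconds and modeling a fixed response timeout are assumptions here; check the metabase property reference for the actual units and related settings.

```python
import math


def detection_time(hang_start: float, ping_interval: float, response_timeout: float) -> float:
    """Earliest time a hang starting at `hang_start` is declared a ping failure.

    Simplified model: pings are sent at 0, ping_interval, 2 * ping_interval, ...;
    any ping sent at or after the hang gets no reply, and the failure is declared
    once `response_timeout` elapses without a reply.
    """
    if ping_interval <= 0:
        raise ValueError("ping_interval must be positive")
    first_failed_ping = math.ceil(hang_start / ping_interval) * ping_interval
    return first_failed_ping + response_timeout
```

For example, with a 30-second interval and a 90-second timeout, a hang that begins at second 31 is first caught by the ping sent at second 60 and is declared at second 150.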

ISAPI Extensions Can Declare Themselves Unhealthy

An ISAPI extension can be built to programmatically signal IIS that it needs to be recycled. It does this by calling ServerSupportFunction with the new request code HSE_REQ_REPORT_UNHEALTHY. For details, see ServerSupportFunction Extension Function at MSDN Online.

To use this function effectively, the IIS server running your application must have worker process pinging enabled (see Configuring Worker Process Pinging), because it is during the ping operation that the WWW service detects that the ISAPI extension is unhealthy and that the worker process should be recycled. The ISAPI extension needs an internal mechanism for determining that it is unhealthy, such as monitoring the status of its internal thread pool. Keep in mind that reporting an unhealthy state shuts down the worker process in which the ISAPI extension is running; therefore, all applications running in that worker process are restarted.
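The sketch below models why pinging must be enabled and why every application in the process is affected. `WorkerProcess`, `report_unhealthy`, and `ping_and_recycle` are hypothetical stand-ins; `report_unhealthy` only represents the effect of the HSE_REQ_REPORT_UNHEALTHY request, which a real extension makes through ServerSupportFunction in native code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkerProcess:
    pid: int
    applications: List[str]
    unhealthy_reported: bool = False  # hypothetical flag set by an ISAPI extension

    def report_unhealthy(self) -> None:
        # Stands in for HSE_REQ_REPORT_UNHEALTHY: nothing happens until the next ping.
        self.unhealthy_reported = True

    def answer_ping(self) -> bool:
        return not self.unhealthy_reported


def ping_and_recycle(worker: WorkerProcess, next_pid: int) -> WorkerProcess:
    """Ping `worker`; if it reported itself unhealthy, replace the whole process."""
    if worker.answer_ping():
        return worker
    # Every application hosted in the old process is restarted in the new one.
    return WorkerProcess(pid=next_pid, applications=list(worker.applications))
```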

The ASP ISAPI extension takes advantage of this feature by monitoring the status of its internal thread pool. If too many of its threads enter a blocking state, it signals a recycle.
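The blocked-thread check described for ASP might look roughly like the following. The class and its threshold are hypothetical: the text does not state the limit ASP uses, and a real ISAPI extension would be written in native code and would report its state through ServerSupportFunction rather than by setting a Python attribute.

```python
import threading


class BlockedThreadMonitor:
    """Counts blocked pool threads and latches an 'unhealthy' flag at a limit."""

    def __init__(self, pool_size: int, max_blocked: int) -> None:
        if not 0 < max_blocked <= pool_size:
            raise ValueError("max_blocked must be between 1 and pool_size")
        self.pool_size = pool_size
        self.max_blocked = max_blocked  # hypothetical threshold
        self._blocked = 0
        self._lock = threading.Lock()
        self.unhealthy = False  # once set, stays set until the process is recycled

    def enter_blocking_call(self) -> None:
        with self._lock:
            self._blocked += 1
            if self._blocked >= self.max_blocked:
                # The point at which a native extension would report itself unhealthy.
                self.unhealthy = True

    def leave_blocking_call(self) -> None:
        with self._lock:
            self._blocked -= 1
```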

Limitations of Health Detection

Health detection cannot identify application failures that neither crash the worker process nor block its available threads. For example, an application that returns error responses such as HTTP 500 (Internal Server Error), but otherwise functions normally, still responds to pings from the WWW service, unless the application is a custom ISAPI extension that implements specific code to report its unhealthy state.
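The limitation can be stated as a predicate over a worker process's observable state. In this hypothetical model, `http_500_rate` exists on the status object but is never consulted, which is why an application returning HTTP 500 errors goes unnoticed unless it reports itself unhealthy.

```python
from dataclasses import dataclass


@dataclass
class WorkerStatus:
    crashed: bool = False
    free_threads: int = 1
    reported_unhealthy: bool = False  # only a custom ISAPI extension sets this
    http_500_rate: float = 0.0        # fraction of responses that are HTTP 500


def health_detection_sees_problem(status: WorkerStatus) -> bool:
    """Simplified model of what health detection can see.

    Note that http_500_rate is deliberately ignored.
    """
    return status.crashed or status.free_threads == 0 or status.reported_unhealthy
```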

Enabling Debugging Action

When a worker process is determined to be unhealthy, instead of terminating it, you might want the WWW service to keep it running for the purpose of debugging it, while bringing up a new worker process to serve requests. By enabling debugging on an application pool, you signal the WWW service not to terminate the worker process, but to release it from serving an application pool, and leave it running.

In addition to leaving an unhealthy worker process running in a released state, you can configure the WWW service to launch an executable application or script. For example, an application that sends e-mail notifying administrators of the failure could be configured as the debugging action. The process ID of the unhealthy worker process is passed as the first argument to the executable application or script.
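A debugging-action script only needs to read the process ID from its first argument. The following Python sketch composes a notification message from that argument; the script name, message text, and `build_notification` function are hypothetical, and actually sending e-mail is left out.

```python
from typing import List


def build_notification(argv: List[str]) -> str:
    """Compose an alert from the arguments passed to the debugging action.

    argv[0] is the script name; argv[1] is the unhealthy worker's process ID.
    """
    if len(argv) < 2 or not argv[1].isdigit():
        raise SystemExit("usage: notify_unhealthy.py <pid>")
    pid = int(argv[1])
    return f"IIS released unhealthy worker process {pid}; attach a debugger or terminate it."
```

In a script configured as the action, the entry point would call `build_notification(sys.argv)` and hand the result to whatever mail tooling you use.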

Note

A worker process that has been released and left running may still terminate on its own. If the worker process recovers from its unhealthy state, it detects that it no longer has a relationship with the WWW service and terminates itself. You could then find a log entry stating that a worker process was released, but no evidence of that worker process running.

If you configure unhealthy worker processes to be released rather than terminated, be prepared to deal with blocked worker processes, because IIS does not remove them from memory. Large numbers of failed worker processes could accumulate on your computer if administrators do not properly handle worker processes that are kept alive for debugging. These worker processes may also tie up resources needed by other processes, so you may need to terminate them quickly to free those resources.
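Released worker processes have to be found and cleaned up by an administrator. As a sketch, assuming you keep your own record of when each worker process was released and whether it is still running, the following filters for processes that have stayed alive past an age limit. `ReleasedWorker` and `workers_to_clean_up` are hypothetical; the code does not query IIS or terminate anything.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ReleasedWorker:
    pid: int
    released_at: float   # seconds since the epoch when the worker was released
    still_running: bool  # a released worker may have terminated itself


def workers_to_clean_up(workers: Iterable[ReleasedWorker], now: float, max_age: float) -> List[int]:
    """Return PIDs of released workers still running longer than `max_age` seconds."""
    return [w.pid for w in workers if w.still_running and now - w.released_at > max_age]
```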

Rapid-Fail Protection

Rapid-fail protection stops an application pool when too many of the worker processes assigned to it are found to be unhealthy within a specified period of time.

When an application pool is stopped, HTTP.sys either returns an out-of-service message (503: Service Unavailable) or resets the connection, depending on the LoadBalancerCapabilities property of the application pool. Also, when an application pool is stopped automatically, you can configure an action (a debugging action, for example) to notify the administrator that the application pool has stopped.
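The choice HTTP.sys makes for a stopped pool can be summarized in a few lines. The value mapping used here (1 for a 503 response, 2 for a connection reset) is an assumption; confirm it against the LoadBalancerCapabilities entry in the metabase property reference before relying on it.

```python
def stopped_pool_response(load_balancer_capabilities: int) -> str:
    """Describe what HTTP.sys returns for a request to a stopped application pool.

    Assumed mapping: 1 -> HTTP 503 response, 2 -> connection reset.
    """
    if load_balancer_capabilities == 1:
        return "HTTP/1.1 503 Service Unavailable"
    if load_balancer_capabilities == 2:
        return "connection reset"
    raise ValueError(f"unexpected LoadBalancerCapabilities value: {load_balancer_capabilities}")
```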

Note

If you host multiple applications on a single computer, you should be careful to configure load balancers or switching hardware to reroute only the traffic intended for a failed application pool. Do not route requests away from healthy application pools; they are still able to receive and process requests.
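One way to express this advice is to compute the load balancer's rerouting rule per application pool rather than per computer. In this hypothetical sketch, only URL prefixes served by stopped pools are rerouted; the pool names, paths, and the `paths_to_reroute` function are made up for illustration.

```python
from typing import Dict, Set


def paths_to_reroute(pool_running: Dict[str, bool], path_to_pool: Dict[str, str]) -> Set[str]:
    """Return URL prefixes to send elsewhere: only those served by stopped pools.

    A pool missing from `pool_running` is treated as stopped.
    """
    return {path for path, pool in path_to_pool.items() if not pool_running.get(pool, False)}
```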

Because requests to a stopped application pool do not enter user-mode processing, rapid-fail protection reduces the processing overhead caused by problematic applications. This protects other application pools from the unhealthy application pool.

You can set rapid-fail protection in two ways:

  1. Configure IIS to place the application pool in a rapid-fail state based on the number of worker process failures in a given time period, measured in minutes.

  2. Place an application pool in a rapid-fail state manually.
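The automatic option amounts to counting worker process failures in a sliding time window. The following sketch assumes the pool is stopped once the count exceeds `max_failures` within `interval_minutes`; the class and its parameter names are illustrative rather than the actual metabase property names, and whether IIS triggers on reaching or on exceeding the limit is an assumption.

```python
from collections import deque


class RapidFailProtection:
    """Stops a pool when more than `max_failures` failures occur within the window."""

    def __init__(self, max_failures: int, interval_minutes: float) -> None:
        self.max_failures = max_failures
        self.window_seconds = interval_minutes * 60
        self.failures = deque()  # timestamps (seconds) of recent worker failures
        self.pool_stopped = False

    def record_failure(self, timestamp: float) -> bool:
        """Record a failure at `timestamp`; return True if the pool is now stopped."""
        self.failures.append(timestamp)
        # Drop failures that have fallen out of the sliding window.
        while self.failures and timestamp - self.failures[0] >= self.window_seconds:
            self.failures.popleft()
        if len(self.failures) > self.max_failures:
            self.pool_stopped = True
        return self.pool_stopped

    def place_in_rapid_fail_state(self) -> None:
        # The manual option: stop the pool regardless of the failure count.
        self.pool_stopped = True
```

With `max_failures=2` and a 5-minute window, failures at seconds 0, 10, and 20 stop the pool, while two failures followed by a third at second 400 do not.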

Scenario for Monitoring Application Pool Health

Monitoring application pool health and taking corrective actions requires, at a minimum, that you take the following steps:

  1. Allow the WWW service to detect unhealthy worker processes by enabling it to ping them. See Configuring Worker Process Pinging.

  2. Configure rapid-fail protection to allow the disabling of application pools when the worker processes assigned to them crash a set number of times in a specified period. See Configuring Rapid-Fail Protection in IIS 6.0.
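The two minimum steps can be checked against a pool's settings with a helper like the one below. PingingEnabled and PingInterval come from this article; the key name RapidFailProtection is an assumption about how the rapid-fail setting is stored, and the dictionary is a hypothetical stand-in for reading the metabase.

```python
from typing import Dict, List, Union

Setting = Union[bool, int]


def missing_health_steps(pool_config: Dict[str, Setting]) -> List[str]:
    """List which of the two minimum steps a pool's settings do not yet cover."""
    missing = []
    if not pool_config.get("PingingEnabled") or int(pool_config.get("PingInterval", 0)) <= 0:
        missing.append("enable worker process pinging")
    # "RapidFailProtection" is an assumed key name for the rapid-fail setting.
    if not pool_config.get("RapidFailProtection"):
        missing.append("configure rapid-fail protection")
    return missing
```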