Performance and Reliability Monitoring

Article
12/09/2009

Monitoring your Hardware and Applications

An important part of operations is monitoring the performance and reliability of your site. Through monitoring you gain insight into potential performance bottlenecks and establish baseline performance values. These baseline values can be used to assess the effectiveness of performance tuning and hardware upgrades.

Monitoring reliability helps you find problems before they cause loss of service. IIS can be set to restart automatically if an application causes the service to crash. By monitoring these restarts you can fix problems with errant applications at an early stage.

Tools to Monitor and Test Server Performance

To support your performance tuning and testing needs, Microsoft offers a number of tools: some included with Windows 2000 and IIS 5.0, others offered on the Windows 2000 Resource Kit CD, and still others downloadable from the Microsoft Web site.

The System Monitor is built in to Windows 2000 and is essential to monitoring nearly every aspect of server performance.
Process and Thread Status (pstat.exe) shows the status of all running processes and threads.
Process Tree (ptree.exe) allows you to query the process inheritance tree and kill processes on local or remote computers.
The HTTP Monitoring Tool monitors HTTP activity on your servers and can notify you if there are changes in the amount of activity.
Network Monitor is a Windows 2000 administrative tool you can use to keep tabs on network traffic.
NetStat is a command-line tool that displays information about your server's current network connections.
Windows Management Instrumentation exposes hardware and software diagnostics in a common API.
Microsoft Management Console accepts snap-ins that display network-wide diagnostics.

At the center of these tools are the Performance Counters that are built into IIS 5.0 and the Windows 2000 operating system. Developers can also include custom Performance Counters in the ISAPI DLLs or COM components that they write. These counters can be read directly by a number of tools, including System Monitor, the Web Application Stress Tool, and WCAT.

System Monitor is the single most important tool to establish a baseline of performance on your Web server and monitor the effects on performance of any changes you make to software or hardware. System Monitor provides a UI that allows you to see performance counter readings whether you are monitoring or logging them. It also allows you to graphically log counter activity and set alerts that will appear in Event Viewer. System Monitor provides documentation for each counter in your system.
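As a rough illustration of how a baseline can be used, the following sketch summarizes a set of logged counter readings and compares a later run against it. It assumes the readings have already been exported from a counter log into plain lists of numbers. The function names and sample values are hypothetical and are not part of System Monitor.

```python
from statistics import mean


def summarize(samples):
    """Return the mean and peak of a list of counter readings."""
    if not samples:
        raise ValueError("no samples logged")
    return {"mean": mean(samples), "peak": max(samples)}


def percent_change(baseline, current):
    """Percent change of the current mean relative to the baseline mean."""
    return (current["mean"] - baseline["mean"]) / baseline["mean"] * 100.0


# Hypothetical % Processor Time readings before and after a tuning change.
baseline = summarize([60.0, 70.0, 80.0])
after_tuning = summarize([50.0, 55.0, 60.0])
change = percent_change(baseline, after_tuning)
print(f"Mean changed by {change:.1f}% relative to baseline")
```

A negative change here means the counter dropped after tuning. Whether that is an improvement depends on the counter being compared.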

For more information on any of these tools, see the online IIS 5.0 documentation included in the Windows 2000 Resource Kit.

Monitoring Your Hardware

Memory

Problems caused by memory shortages can often appear to be problems in other parts of the system. You should monitor memory first to verify that your server has enough, and then move on to other components. To run Windows 2000 and IIS 5.0, the minimum amount of RAM a dedicated Web server needs is 128 MB, but 256 MB to 1 GB is often better. Additional memory is particularly beneficial to e-commerce sites, sites with a lot of content, and sites that experience a high volume of traffic. Since the IIS File Cache is set to use up to half of available memory by default, the more memory you have, the larger the IIS File Cache can be.

Note: Windows 2000 Advanced Server can support up to 8 GB of RAM, but the IIS File Cache will not take advantage of more than 4 GB.

To determine if the current amount of memory on your server will be sufficient for your needs, use the Performance tool that is built in to Windows 2000. The System Monitor, which is part of the Performance tool, graphically displays counter readings as they change over time.

Also, keep an eye on your cache settings—adding memory alone won't necessarily solve performance problems. You need to be aware of IIS cache settings and how they affect your server's performance. If these settings are inappropriate for the loads placed on your server, they, rather than a lack of memory, may cause performance bottlenecks. For more information about these cache settings, see the IIS Settings section and Appendix 1: Performance Settings of this document. For a discussion about caching with ASP and IIS, see Appendix 3: ASP Caching.

Note: When using Performance counters to monitor performance, you can see a description of any counter by selecting that counter in the Add Counters dialog and clicking Explain.

Log the following counters to determine if there are performance bottlenecks associated with memory:

Memory: Available Bytes. Try to reserve at least ten percent of memory available for peak use. Keep in mind that IIS 5.0 uses up to 50 percent of available memory for its file cache by default.
Memory: Page Faults/sec, Memory: Pages Input/sec, and Memory: Page Reads/sec. If a process requests a page in memory and the system cannot find it at the requested location, this constitutes a page fault. If the page is elsewhere in memory, the fault is called a soft page fault. If the page must be retrieved from disk, the fault is called a hard page fault. Most processors can handle large numbers of soft faults without consequence. However, hard faults can cause significant delays. Page Faults/sec is the overall rate at which the processor handles faulted pages, including both hard and soft page faults. Pages Input/sec is the total number of pages read from disk to resolve hard page faults. Page Reads/sec is the number of times the disk was read to resolve hard page faults. Pages Input/sec will be greater than or equal to Page Reads/sec and can give you a good idea of your hard page fault rate. If these numbers are low, your server should be responding to requests quickly. If they are high, it may be because you've dedicated too much memory to the caches, not leaving enough memory for the rest of the system. You may need to increase the amount of RAM on your server, though lowering cache sizes can also be effective.
Memory: Cache Bytes, Internet Information Services Global: File Cache Hits %, Internet Information Services Global: File Cache Flushes, and Internet Information Services Global: File Cache Hits. The first counter, Memory: Cache Bytes, reveals the size of the File System Cache, which is set to use up to 50 percent of available physical memory by default. Since IIS automatically trims the cache if it is running out of memory, keep an eye on the direction in which this counter trends. The second counter is the ratio of cache hits to total cache requests and reflects how well the settings for the IIS File Cache are working. For a site largely made up of static files, 80 percent or more cache hits is considered a good number. Compare logs for the last two counters, IIS Global: File Cache Flushes and IIS Global: File Cache Hits, to determine if you are flushing objects out of your cache at an appropriate rate. If flushes are occurring too quickly, objects may be flushed from cache more often than they need to be. If flushes are occurring too slowly, memory may be wasted. See the ObjectCacheTTL, MemCacheSize, and MaxCachedFileSize settings in Appendix 1: Performance Settings.
Page File Bytes: Total. This counter reflects the size of the paging file(s) on the system. The larger the paging file, the more memory the system commits to it. Windows 2000 itself creates a paging file on the system drive; you can create a paging file on each logical disk, and you can change the sizes of the existing files. In fact, striping a paging file across separate physical drives improves paging file performance (use drives that do not contain your site's content or log files). Remember that the paging file on the system drive should be at least twice the size of physical memory, so that the system can write the entire contents of RAM to disk if a crash occurs.
Memory: Pool Paged Bytes, Memory: Pool Nonpaged Bytes, Process: Pool Paged Bytes: Inetinfo, Process: Pool Nonpaged Bytes: Inetinfo, Process: Pool Paged Bytes: dllhost#n, and Process: Pool Nonpaged Bytes: dllhost#n. Memory: Pool Paged Bytes and Memory: Pool Nonpaged Bytes monitor the pool space for all processes on the server. The other counters listed here monitor the pool space used directly by IIS 5.0, either by the Inetinfo process (in which IIS runs) or by the Dllhost processes (in which isolated or pooled applications run) instantiated on your server. Be sure that you monitor counters for all instances of Dllhost on your server; otherwise, you will not get an accurate reading of pool space used by IIS. The system's memory pools hold objects created and used by applications and the operating system. The contents of the memory pools are accessible only in privileged mode. That is, only the kernel of the operating system can directly use the memory pools; user processes cannot. On servers running IIS 5.0, threads that service connections are stored in the nonpaged pool along with other objects used by the service, such as file handles and sockets.
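The rules of thumb given for the memory counters can be expressed as a small check. This is a minimal sketch, assuming the counter values have already been read from a log; the function name, parameter names, and sample values are hypothetical. The text gives no numeric threshold for a "high" hard page fault rate, so the sketch does not invent one and only verifies that the two paging counters are consistent with each other.

```python
def memory_warnings(total_bytes, available_bytes, pages_input_per_sec,
                    page_reads_per_sec, cache_hits_pct):
    """Return a list of warnings based on the rules of thumb in the text."""
    warnings = []
    # Keep at least ten percent of memory available for peak use.
    if available_bytes < 0.10 * total_bytes:
        warnings.append("less than 10% of memory available")
    # Pages Input/sec should be greater than or equal to Page Reads/sec.
    if pages_input_per_sec < page_reads_per_sec:
        warnings.append("inconsistent counters: Pages Input/sec < Page Reads/sec")
    # 80 percent or more cache hits is good for a largely static site.
    if cache_hits_pct < 80.0:
        warnings.append("IIS File Cache hit rate below 80%")
    return warnings


MB = 1024 * 1024
# Hypothetical readings from a server with 512 MB of RAM.
print(memory_warnings(512 * MB, 40 * MB, 12.0, 4.0, 72.5))
```

The 80 percent figure applies to sites made up largely of static files, so a lower hit rate on a mostly dynamic site is not necessarily a problem.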

Besides adding more RAM, try the following techniques to enhance memory performance: improve data organization, try disk mirroring or striping, replace CGI applications with ISAPI or ASP applications, enlarge paging files, retime the IIS File Cache, eliminate unnecessary features, and change the balance of the File System Cache to the IIS 5.0 Working Set. The last of these techniques is detailed later in this document.

Processor Capacity

With users demanding quick response time from Web sites and the increasing amount of dynamically generated content on these sites, a premium is placed on fast and efficient processor usage. Bottlenecks occur when one or more processes consume practically all of the processor time. This forces process threads that are ready to be executed to wait in a queue for processor time. Adding other hardware, whether memory, disks or network connections, to try to overcome a processor bottleneck will not be effective and will frequently only make matters worse.

IIS 5.0 on Windows 2000 Server scales effectively across two to four processors. Consider the business needs of your Web sites if you're thinking of adding more processors. For example, if you host primarily static content on your server, a two-processor computer is likely to be sufficient. If you host dynamically generated content, a four-processor setup may solve your problems. However, if the workload on your site is sufficiently CPU-intensive, no single computer will be able to keep up with requests. If this is the case for your site, you should scale it across multiple servers. If you already run your site on multiple servers, consider adding more.

You should be aware, however, that the biggest performance gains with Windows 2000 and IIS 5.0 result from resolving memory issues. Before you make any decisions about changing the number of processors on your Web servers, rule out memory problems and then monitor the following Performance Counters.

System: Processor Queue Length. This counter displays the number of threads waiting to be executed in the queue that is shared by all processors on the system. If this counter has a sustained value of two or more threads, you have a processor bottleneck on your hands.
Processor: % Processor Time. Processor bottlenecks are characterized by situations in which Processor: % Processor Time numbers are high while the network adapter card and disk I/O remain well below capacity. On a multi-processor computer, it's a good idea to examine the Processor: % Processor Time counter for each processor to pick up any imbalance.
Thread: Context Switches/sec: Dllhost#N=>Thread#, Thread: Context Switches/sec: Inetinfo=>Thread#, and System: Context Switches/sec. If you decide to increase the size of the thread pool, you should monitor the three counters listed here. Increasing the number of threads may increase the number of context switches to the point where performance decreases instead of increases. Ten context switches or more per request is quite high; if these numbers appear, consider reducing thread pool size. Balancing threads against overall performance as measured by connections and requests can be difficult. Any time you tune threads, follow up with overall performance monitoring to see if performance increases or decreases. To determine if you should adjust the thread count, compare the number of threads and the processor time for each thread in the process to the total processor time. If the threads are constantly busy, but are not fully using the processor time, performance may benefit from creating more threads. However, if all the threads are busy and the processors are close to their maximum capacity, you are better off distributing the load across more servers rather than increasing the number of threads. See the AspThreadGateEnabled and AspProcessorThreadMax metabase properties in Appendix 1: Performance Settings in this document.
Processor: Interrupts/sec and Processor: % DPC Time. Use these counters to determine how much time the processor is spending on interrupts and deferred procedure calls (DPCs). These two factors can be another source of load on the processor. Client requests can be a major source of each. Some new network adapter cards include interrupt moderation, which accumulates interrupts in a buffer when the level of interrupts becomes too high.
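Two of the processor checks described above lend themselves to a short sketch: detecting a sustained Processor Queue Length of two or more, and estimating context switches per request. The text does not say how long a value must persist to count as "sustained," so the five-sample window below is an assumption, as are the function names and sample values.

```python
def sustained_queue(queue_lengths, threshold=2, min_consecutive=5):
    """True if the queue length stays at or above the threshold for at
    least min_consecutive consecutive samples (window size is assumed)."""
    run = 0
    for value in queue_lengths:
        run = run + 1 if value >= threshold else 0
        if run >= min_consecutive:
            return True
    return False


def context_switches_per_request(switches_per_sec, requests_per_sec):
    """Approximate context switches per request from two rate counters."""
    if requests_per_sec <= 0:
        raise ValueError("requests_per_sec must be positive")
    return switches_per_sec / requests_per_sec


# Hypothetical logged samples.
print(sustained_queue([0, 3, 3, 1, 2, 2, 2, 2, 2]))
print(context_switches_per_request(6000, 400))
```

In the second call the result is above the ten-per-request level that the text describes as quite high, which would suggest reviewing the thread pool size.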

Network Capacity, Latency, and Bandwidth

Essentially, the network is the line through which clients send requests to your server. The time it takes for those requests and responses to travel to and from your server is one of the largest limiting factors in user-perceived server performance. This request-response cycle time is called latency, and latency is almost exclusively out of your control as a Web server administrator. For example, there is little you can do about a slow router on the Internet, or the physical distance between a client and your server.

On a site consisting primarily of static content, network bandwidth is the most likely source of a performance bottleneck. Even a fairly modest server can completely saturate a T3 connection (45 Mbps) or a 100 Mbps Fast Ethernet connection. You can mitigate some of these issues by tuning the connection you have to the network and maximizing your effective bandwidth as best you can.

The simplest way to measure effective bandwidth is to determine the rate at which your server sends and receives data. There are a number of Performance counters that measure data transmission in many components of your server. These include counters on the Web, FTP, and SMTP services, the TCP object, the IP object, and the Network Interface object. Each of these reflects a different Open Systems Interconnection (OSI) layer. For a detailed list of these counters and their analysis, see the Internet Information Services 5.0 Resource Guide, released with the Windows 2000 Server Resource Kit. In particular, see the Network I/O section of the Monitoring and Tuning Your Server chapter. To start, however, use the following counters:

Network Interface: Bytes Total/sec. To determine if your network connection is creating a bottleneck, compare the Network Interface: Bytes Total/sec counter to the total bandwidth of your network adapter card. To allow headroom for spikes in traffic, you should usually be using no more than 50 percent of capacity. If this number is very close to the capacity of the connection, and processor and memory use are moderate, then the connection may well be a problem.
Web Service: Maximum Connections and Web Service: Total Connection Attempts. If you are running other services on the computer that also use the network connection, you should monitor the Web Service: Maximum Connections and Web Service: Total Connection Attempts counters to see if your Web server can use as much of the connection as it needs. Remember to compare these numbers to memory and processor usage figures so that you can be sure that the connection is the problem, not one of the other components.
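The comparison of Network Interface: Bytes Total/sec against adapter capacity involves a unit conversion, because the counter reports bytes while link speeds are rated in bits. A minimal sketch follows; the function name and sample reading are hypothetical.

```python
def link_utilization(bytes_total_per_sec, link_bits_per_sec):
    """Fraction of link capacity in use (counter is bytes, link is bits)."""
    if link_bits_per_sec <= 0:
        raise ValueError("link_bits_per_sec must be positive")
    return bytes_total_per_sec * 8 / link_bits_per_sec


FAST_ETHERNET_BPS = 100_000_000  # 100 Mbps Fast Ethernet

# Hypothetical reading of 7.5 million bytes per second.
utilization = link_utilization(7_500_000, FAST_ETHERNET_BPS)
if utilization > 0.5:
    print(f"{utilization:.0%} of capacity: above the 50 percent guideline")
```

A reading above the 50 percent guideline does not by itself prove a network bottleneck; as noted above, compare it with processor and memory use first.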

Disk Optimization

Since IIS 5.0 writes logs to disk, there is regular disk activity even with 100 percent client cache hits. Generally speaking, if there is high disk read activity other than logging, this means that other areas of your system need to be tuned. For example, hard page faults cause large amounts of disk activity, but they are indicative of insufficient RAM.

Accessing memory is faster than disk seeks by a factor of roughly 1 million; clearly, searching the hard disk to fill requests will degrade performance. The type of site you host can have a significant impact on the frequency of disk seeks. If your site has a very large file set that is accessed randomly, if the files on your site tend to be very large, or if you have a very small amount of RAM, then IIS is unable to maintain copies of the files in RAM for faster access.

Typically, you will use the Physical Disk counters to watch for spikes in the number of disk reads when your server is busy. If you have enough RAM, most connections will result in cache hits unless you have a database stored on the same server, and clients are making dissimilar queries. This situation precludes caching. Be aware that logging can also cause disk bottlenecks. If there are no obvious disk-intensive issues on your server, but you see lots of disk activity anyway, you should check the amount of RAM on your server immediately to make sure you have enough memory.

To determine the frequency of disk access, log the following counters:

Processor: % Processor Time, Network Interface: Bytes Total/sec, and PhysicalDisk: % Disk Time. If all three of these counters have high values, then the hard disk is not causing a bottleneck for your site. However, if the % Disk Time is high and the processor and network connection are not saturated, then the hard disk may be creating a bottleneck. If the Physical Disk performance counters are not enabled on your server, open a command line and use the diskperf -yd command.
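The three-counter comparison can be sketched as a simple rule. The text does not define "high" numerically, so the 80 percent cutoff below is an assumption, and the network value is assumed to have been converted to a percentage of link capacity beforehand. The function name is hypothetical.

```python
def disk_may_be_bottleneck(cpu_pct, network_pct, disk_time_pct, high=80.0):
    """True only when % Disk Time is high while the processor and the
    network connection are both below the (assumed) saturation cutoff."""
    return disk_time_pct >= high and cpu_pct < high and network_pct < high


# Hypothetical readings, all expressed as percentages.
print(disk_may_be_bottleneck(cpu_pct=30.0, network_pct=25.0, disk_time_pct=95.0))
print(disk_may_be_bottleneck(cpu_pct=90.0, network_pct=85.0, disk_time_pct=95.0))
```

The first call flags the disk because it is busy while the other resources are idle. The second does not, because all three counters are high together.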

Security

Balancing performance with users' concerns about the security of your Web applications is one of the most important issues you will face, particularly if you run an e-commerce Web site. Since secure Web communication requires more resources than non-secure Web communications, it is important that you know when to use various security techniques, such as the SSL protocol or IP address checking, and when not to use them. For example, your home page or a search results page most likely doesn't need to be run through SSL. However, when a user goes to a checkout or purchase page, you will want to make sure that page is secure.

If you do use SSL, be aware that establishing the initial connection is five times as expensive as reconnecting using security information in the SSL session cache. The default timeout for the SSL session cache has been changed from two minutes in Windows NT 4.0 to five minutes in Windows 2000. Once this data is flushed, the client and server must establish a completely new connection. If you plan on supporting long SSL sessions you could consider lengthening this timeout with the ServerCacheTime registry setting. If you expect thousands of users to connect to your site using SSL, a safer approach is to estimate how long you expect SSL sessions to last, then set the ServerCacheTime parameter slightly longer than your estimate. Do not set the timeout much longer than this or else your server may leave stale data in the cache. Also, make sure that HTTP Keep-Alives are enabled. SSL sessions do not expire when used in conjunction with HTTP Keep-Alives unless the browser explicitly closes the connection.

In addition to all security techniques having performance costs, Windows 2000 and IIS 5.0 security services are integrated into a number of operating system services. This means that you can't monitor security features separately from other aspects of those services. Instead, the most common way to measure security overhead is to run tests comparing server performance with and without a security feature. The tests should be run with fixed workloads and a fixed server configuration, so that the security feature is the only variable. During the tests, you probably want to measure the following:

Processor Activity and the Processor Queue: Authentication, IP address checking, SSL protocol, and encryption schemes are security features that require significant processing. You are likely to see increased processor activity, both in privileged and user mode, and an increase in the rate of context switches and interrupts. If the processors in the server are not sufficient to handle the increased load, queues are likely to develop. Custom hardware, such as cryptographic accelerators, may help here.
If the SSL protocol is being used, lsass.exe may consume a surprising amount of CPU. This is because SSL processing occurs here. This means that administrators used to monitoring CPU usage in Windows NT may see less processor consumed by Inetinfo.exe and more consumed by lsass.exe.
Physical Memory Used: Security requires that the system store and retrieve more user information. Also, the SSL protocol uses long keys—40 bits to 1,024 bits long—for encrypting and decrypting the messages.
Network Traffic: You are also likely to see an increase in traffic between the IIS 5.0-based server and the domain controller used for authenticating logon passwords and verifying IP addresses.
Latency and Delays: The most obvious performance degradation resulting from complex security features like SSL is the time and effort involved in encryption and decryption, both of which use lots of processor cycles. Downloading files from servers using the SSL protocol can be 10 to 100 times slower than from servers that are not using SSL.
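Once the with-and-without tests have been run, the overhead of a security feature can be summarized per metric. The sketch below assumes two otherwise identical test runs whose results have been collected into dictionaries; the metric names and values are hypothetical, not measured results.

```python
def overhead_percent(baseline, with_feature):
    """Percent increase of a metric when the security feature is enabled."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (with_feature - baseline) / baseline * 100.0


# Hypothetical results from two runs with a fixed workload and configuration.
without_ssl = {"cpu_pct": 35.0, "response_ms": 120.0}
with_ssl = {"cpu_pct": 63.0, "response_ms": 300.0}

report = {name: overhead_percent(without_ssl[name], with_ssl[name])
          for name in without_ssl}
for name, pct in report.items():
    print(f"{name}: +{pct:.0f}%")
```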

If a server is used both for running IIS 5.0 and as a domain controller, the proportion of processor use, memory, and network and disk activity consumed by domain services is likely to increase the load on these resources significantly. The increased activity can be enough to prevent IIS 5.0 services from running efficiently. It is highly recommended that you refrain from running IIS 5.0 on a domain controller.

Monitoring Your Web Applications

Replacing a poorly written application with one that is well designed and has been thoroughly tested can improve performance dramatically (sometimes as much as thirtyfold). Keep in mind, however, that your Web applications may be affected by back-end latencies (for example, legacy systems such as AS/400). Remote data sources may cause performance problems for any number of reasons. If developers design applications to get data from another Web site, and that Web site crashes, it can cause a bottleneck on your server. If applications are accessing a remote SQL Server database, the database may have problems keeping up with requests sent to it. While you may be the administrator of your site's SQL database, it can be difficult to monitor these servers if they are remotely located. Worse, you may have no control over the database servers, or other back-end servers. If you can, monitor the back-end servers that work with your applications and keep them as well tuned as you do your Web server.

To determine if your Web applications are creating a bottleneck on your server, monitor the following performance counters:

Active Server Pages: Requests/Sec, Active Server Pages: Requests Executing, Active Server Pages: Request Wait Time, Active Server Pages: Request Execution Time, and Active Server Pages: Requests Queued. If you are running ASP applications on your server, these counters can provide you with a picture of how well the applications are performing. Active Server Pages: Requests/Sec does not include requests for static files or other dynamic content and will fluctuate considerably based on the complexity of the ASP pages and the capacity of your Web server. If this counter is low during spikes in traffic on your server, your applications may be causing a bottleneck. Requests Executing indicates the number of requests currently executing; Request Wait Time indicates the number of milliseconds the most recent request was waiting in the queue, and Request Execution Time indicates how many milliseconds the most recent request took to execute. Ideally, Requests Queued and Request Wait Time should remain close to 0, but they will go up and down under varying loads. The maximum number for Requests Queued is determined by the metabase setting for AspRequestQueueMax. If the limit is reached, client browsers will display "HTTP 500/Server Too Busy." If these numbers deviate a great deal from their expected range, your ASP applications will likely need to be rewritten to improve performance. Request Execution Time can be somewhat misleading because it is not an average. For example, if you regularly receive 30 requests for a page that executes in 10 milliseconds (ms) to every one request for a 500 ms page, the counter is likely to indicate 10 ms, although the average execution time is over 25 ms. It's hard to say what is a good value for Requests Executing. If pages execute quickly and don't wait for I/O (loading a file or making a database query), this number is likely to be low (little more than the number of processors when the machine is busy). If pages must wait for I/O, the number of pages executing is likely to be higher (close to AspProcessorThreadMax multiplied by the number of processors). If Requests Executing is high, Requests Queued is large and the CPU utilization is low, you may need to increase AspProcessorThreadMax. When enabled, thread gating seeks to optimize Requests Executing. The user's response time is proportional to Request Wait Time plus Request Execution Time plus network latency.
Web Service: CGI Requests/sec and Web Service: ISAPI Extension Requests/Sec report the rates at which your server is processing CGI and ISAPI application requests. If these values drop while under increasing loads, you may need to have the application developers revisit their code.

Note: ASP is an ISAPI Extension and is included in the second counter.
Web Service: Get Requests/sec and Web Service: Post Requests/Sec reflect the rate at which these two common HTTP request types are being made to your server. POST requests are generally used for forms and are sent to ISAPIs (including ASP) or CGIs. GET requests account for almost all other requests from browsers and include static files, requests for ASPs and other ISAPIs, and CGI requests.
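The Request Execution Time example in the ASP counter discussion above (30 requests at 10 ms for every one request at 500 ms) can be checked with a weighted average. The sketch below also adds up the three components that the text says drive user response time, treating them as a simple sum. The function names and the response-time inputs are hypothetical.

```python
def average_execution_ms(page_mix):
    """Weighted average execution time.

    page_mix is a list of (request_count, execution_ms) pairs.
    """
    total_requests = sum(count for count, _ in page_mix)
    if total_requests == 0:
        raise ValueError("no requests in page mix")
    total_ms = sum(count * ms for count, ms in page_mix)
    return total_ms / total_requests


def response_time_ms(wait_ms, execution_ms, network_latency_ms):
    """Sum of queue wait, execution time, and network latency."""
    return wait_ms + execution_ms + network_latency_ms


# The mix from the text: 30 requests at 10 ms to every 1 request at 500 ms.
avg = average_execution_ms([(30, 10), (1, 500)])
print(f"average execution time: {avg:.1f} ms")
```

The weighted average is 800 ms over 31 requests, which is just over 25 ms, even though the counter would usually show the 10 ms figure of the most recent request.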

Reliability Monitoring

The Microsoft® Windows® 2000 operating system contains tools to monitor various conditions of the operating system and the computer in general. This paper describes these tools, their metrics, and some of the commonly monitored conditions. This paper is not meant to be an in-depth study of all the capabilities of the tools, but is intended to be a source of reference for setting up and managing the most common measurement conditions.

Reliability and Availability Metrics

Operating System Stop Errors

As with all operating systems, Windows 2000 occasionally encounters serious error conditions, and stops responding. Windows stop errors display text on the console video screen with a blue background, and hence are often called blue screens. These conditions are also referred to as bug checks. Fortunately, operating system stoppages are relatively rare events. However, customers should still monitor these regularly.

A complete description of procedures for handling Windows stop errors is beyond the scope of this paper. However, customers can find additional information about these conditions in the Microsoft Support Knowledge Base at: https://support.microsoft.com/directory/default.asp?&SD=GN. In particular, refer to these articles:

192463. Gathering Blue Screen Information After Memory Dump
129845. Blue Screen Preparation Before Calling Microsoft

The Windows Event Log service is a useful tool for historical monitoring of operating system crashes. Stop errors are recorded in the Event Log when the system restarts, and the crash dump is saved in a permanent file (usually called Memory.dmp). For details, see the subsection "Save Dump" in the "Using the Event Log as a Data Source" section of this document.

Operating System Reboots

Windows 2000 reboots occur for a variety of reasons, including operating system upgrades, software installation, and hardware maintenance. Reboots are recorded in the System Event Log. A system's reboot frequency tends to drop when the system is stable. Thus, historical reboot frequencies are a long-term indicator of system and data center health. For more information, see the subsection "Startup Event" in the "Using the Event Log as a Data Source" section of this document.

Application Failures

Windows 2000 uses the Dr. Watson utility to record application failures. This utility appends information to the Drwtsn32.log file in the system root for each application failure. It also creates a User.dmp file that contains a memory dump of the user mode program that failed.

As with operating system crashes, complete procedures for handling application crashes are beyond the scope of this paper. For more information, refer to the following support articles in the Microsoft Support Knowledge Base:

94924. Postmortem Debugging Under Windows NT
141465. How to Install Symbols for Dr Watson Error Debugging

Application failures are recorded in the Application Event Log; therefore, the historical frequencies of these events are usually available for analysis. For details, see the subsection "Dr. Watson Event" in the "Using the Event Log as a Data Source" section of this document.

Operating System Availability

Most customers are very interested in the availability of the application services provided by the Windows operating systems. Each application generally requires different instrumentation; therefore, rather than measuring the application's availability directly, some customers find it useful to measure the operating system availability. The events needed to do this are contained in the Windows System Event Log.

There are several variations of availability, including planned availability and total availability. Total availability is defined as the percentage of up time over total run time, and can be computed easily using the information in the System Event Log.
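Total availability as defined here is a simple ratio. The following sketch assumes the up time and total run time have already been derived from the System Event Log; the function name and the sample period are hypothetical.

```python
def total_availability(uptime_hours, total_hours):
    """Up time as a percentage of total run time."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= uptime_hours <= total_hours:
        raise ValueError("uptime_hours must be between 0 and total_hours")
    return uptime_hours / total_hours * 100.0


# Hypothetical 30-day period with 3 hours of outages.
total = 30 * 24
availability_pct = total_availability(total - 3, total)
print(f"total availability: {availability_pct:.2f}%")
```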

Operating System Mean Time to Repair

There is a strong correlation between availability and recoverability of systems. System recoverability is measured as the length of time a system is unavailable following a system outage. Typically, this is reported as mean time to repair. It is easy to measure mean time to repair using the Windows 2000 operating system Event Log. An outage begins when a system is shut down and ends when the system is restarted. To understand how to capture the time stamps associated with these events, see the subsections "Startup Event," "Clean Shutdown Event," and "Dirty Shutdown Event," in the "Using the Event Log as a Data Source" section of this document.
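Mean time to repair follows directly from the outage definition above: each outage runs from a shutdown time stamp to the next startup time stamp. This is a minimal sketch, assuming those time stamps have already been extracted from the Event Log and paired up; the function name and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(outages):
    """Average outage length.

    outages is a list of (shutdown_time, startup_time) datetime pairs.
    """
    if not outages:
        raise ValueError("no outages recorded")
    durations = [startup - shutdown for shutdown, startup in outages]
    if any(d < timedelta(0) for d in durations):
        raise ValueError("startup precedes shutdown in at least one pair")
    return sum(durations, timedelta(0)) / len(durations)


# Hypothetical outages of 20 and 40 minutes.
outages = [
    (datetime(2000, 3, 1, 2, 0), datetime(2000, 3, 1, 2, 20)),
    (datetime(2000, 3, 9, 14, 0), datetime(2000, 3, 9, 14, 40)),
]
mttr = mean_time_to_repair(outages)
print(f"mean time to repair: {mttr}")
```

For a dirty shutdown, the recorded shutdown time is only as accurate as the Event Log entry it comes from, so the resulting figure is best treated as an estimate.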

Using the Event Log as a Data Source

Event Viewer Utility

You can use the Event Log service and Event Viewer to gather information about hardware, software, and system problems, and to monitor Windows security events.

Windows 2000 records events in three types of logs:

Application Log. The Application Log contains events logged by applications or programs. For example, a database program might record a file error in the Application Log. The program developer decides which events to record.
System Log. The System Log contains events logged by the Windows system components. For example, the failure of a driver or other system component to load during startup is recorded in the System Log. The event types logged by system components are predetermined for the operating system.
Security Log. The Security Log can record security events, such as valid and invalid logon attempts as well as events related to resource use, such as creating, opening, or deleting files. An administrator can specify what events are recorded in the Security Log. For example, if you have enabled logon auditing, attempts to log on to the system are recorded in the Security Log.

Event Viewer displays these types of events:

Error. A significant problem, such as loss of data or loss of functionality. For example, if a service fails to load during startup, an error is logged.
Warning. An event that is not necessarily significant, but may indicate a possible future problem. For example, when disk space is low, a warning is logged.
Information. An event that describes the successful operation of an application, driver, or service. For example, when a network driver loads successfully, an information event is logged.
Success Audit. An audited security access attempt that succeeds. For example, a user's successful attempt to log on to the system is logged as a Success Audit event.
Failure Audit. An audited security access attempt that fails. For example, if a user tries to access a network drive and fails, the attempt is logged as a Failure Audit event.

The Event Log service starts automatically when you start Windows. All users can view the Application and System Logs, but only administrators have access to Security Logs.

By default, security logging is turned off. You can use Windows 2000 Group Policy to enable security logging. The administrator can also set auditing policies in the registry that cause the system to halt when the Security Log is full. For more information about using Group Policy, refer to the Windows 2000 documentation and to the Group Policy white papers available at https://www.microsoft.com/windows2000/library.

To display the Event Viewer

On the Start menu, click Run.
Type Eventvwr
Click OK. The Event Viewer displays as follows:

Figure 1: Event Viewer

Exporting the Event List

You may want to export the event list to Microsoft Excel so that you can save and analyze the data.

To export the list

On the Event Viewer Action menu, click Export List. A Save As window displays.
Save the file with an .xls extension.

Figure 2: Exporting the event list

The opened Excel file appears similar to the following:

Figure 3: Opened Excel file with exported event data
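Exported event data can also be processed with a script. The Python sketch below assumes the list was saved as comma-delimited text with a header row. The column names and the sample rows are assumptions for illustration and should be checked against an actual export.

```python
import csv
import io

# Hypothetical sample of an exported event list. The column names and row
# values are assumptions for illustration, not output copied from a real log.
SAMPLE_EXPORT = """Type,Date,Time,Source,Category,Event,User,Computer
Information,5/6/2000,10:20:01 AM,EventLog,None,6005,N/A,WEB01
Information,5/6/2000,10:15:40 AM,EventLog,None,6006,N/A,WEB01
Warning,5/5/2000,9:02:11 PM,Srv,None,2013,N/A,WEB01
"""


def load_events(text):
    """Parse comma-delimited export text into a list of dictionaries."""
    events = []
    for row in csv.DictReader(io.StringIO(text)):
        row = dict(row)
        row["Event"] = int(row["Event"])  # numeric IDs are easier to compare
        events.append(row)
    return events


events = load_events(SAMPLE_EXPORT)
print(len(events), "events loaded")
```

To read a saved file instead of the sample string, the same csv.DictReader call can be given a file opened with open(path, newline="").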

Sorting Events

You can sort events to more easily review and analyze the data.

To specify sort order

On the View menu, click Newest First or Oldest First. The default is from newest to oldest.
(Optional) On the Options menu, check the Save Settings On Exit box to use the current sort order the next time you start Event Viewer.

Figure 4: Specifying sort order

Note: When a log is archived, the sort order affects files that you save in text format or comma-delimited text format. The sort order does not affect event records you save in log-file format.

Filtering Events

You can filter events so that you can easily see only those events that you wish to review and analyze.

To filter events

On the View menu, click Filter Events.
In the Filter dialog box, specify the characteristics for displayed events. To return to the default criteria, click Clear.

Figure 5: Filtering events

To turn off event filtering, click All Events in the View menu.
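The same kind of filtering can be applied to event data outside Event Viewer. The following Python sketch is not Event Viewer functionality. It assumes each event is a dictionary with hypothetical "Type", "Source", and "Event" keys, and it keeps only the events that match every criterion supplied.

```python
def filter_events(events, event_type=None, source=None, event_ids=None):
    """Return the events that match every criterion that is not None."""
    wanted_ids = set(event_ids) if event_ids is not None else None
    matches = []
    for event in events:
        if event_type is not None and event["Type"] != event_type:
            continue
        if source is not None and event["Source"].lower() != source.lower():
            continue
        if wanted_ids is not None and event["Event"] not in wanted_ids:
            continue
        matches.append(event)
    return matches


# Hypothetical events for illustration.
sample_events = [
    {"Type": "Information", "Source": "EventLog", "Event": 6005},
    {"Type": "Information", "Source": "EventLog", "Event": 6006},
    {"Type": "Warning", "Source": "Srv", "Event": 2013},
]

# Keep only the startup and shutdown events recorded by the Event Log service.
restarts = filter_events(sample_events, source="eventlog",
                         event_ids=[6005, 6006, 6008])
print(len(restarts), "startup or shutdown events")
```

Calling filter_events with no criteria returns every event, which corresponds to turning filtering off.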

Startup Event

Windows 2000 records startup events in the System Event Log, as shown below. The Event Log service itself is the source of this event, and the Event ID is 6005. The time of this event is approximately the time the operating system becomes available to applications.

Figure 6: Event Log, Startup Event

Clean Shutdown Event

Windows 2000 records a new event whenever an operating system shutdown is initiated. A clean shutdown can be initiated through several mechanisms.

Direct user interaction using a Shutdown screen as follows:
Shutdown or Restart using Ctrl+Alt+Delete
Shutdown or Restart using the Start menu
Shutdown or Restart using the Logon screen
Programmatically as follows:
InitiateSystemShutdown WIN32 API – local
InitiateSystemShutdown WIN32 API – remote

The Event Log service itself is the source of this event, and the Event ID is 6006. The time of this event is approximately the time the operating system becomes unavailable to applications.

Figure 7: Event Log, Clean Shutdown Event

Dirty Shutdown Event

Windows 2000 records a new event whenever the operating system is shut down using a mechanism other than a clean shutdown. The most common cause is the system being turned off. The Event Log service itself is the source of this event, and the Event ID is 6008. The event is recorded when the system restarts and Windows 2000 discovers that the previous shutdown was not clean.

Figure 8: Event Log, Dirty Shutdown Event

While Windows 2000 Server is running, the system periodically writes a time stamp to disk. This last alive time stamp is saved in the Windows 2000 registry, always overwriting the last alive time stamp from the previous interval. Whenever the last alive time stamp is written, it is also flushed to disk. In this way, if the computer crashes, the final two entries in the stream are a boot stamp and a last alive stamp. If the computer shuts down normally, the normal shutdown time stamp overwrites the last alive time stamp.

The time in the description portion of this event is the last alive time and is therefore shortly before the time the operating system became unavailable to applications.
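The startup, clean shutdown, and dirty shutdown events can be combined into a list of outages. The Python sketch below assumes the events have already been read into (time stamp, Event ID, last alive time) tuples sorted oldest first, with the last alive time extracted from the description of each 6008 event. It also assumes that the time of the 6008 event itself is close enough to the restart time to stand in for it. The tuple layout, the function name, and the sample data are hypothetical.

```python
from datetime import datetime

STARTUP = 6005
CLEAN_SHUTDOWN = 6006
DIRTY_SHUTDOWN = 6008


def find_outages(events):
    """Return (outage_start, outage_end, kind) tuples.

    events must be sorted oldest first. The third tuple field is used only
    for dirty shutdown events and holds the last alive time.
    """
    outages = []
    pending_shutdown = None
    for timestamp, event_id, last_alive in events:
        if event_id == CLEAN_SHUTDOWN:
            pending_shutdown = timestamp
        elif event_id == STARTUP and pending_shutdown is not None:
            outages.append((pending_shutdown, timestamp, "clean"))
            pending_shutdown = None
        elif event_id == DIRTY_SHUTDOWN:
            # Assumption: this event is logged during the restart, so its own
            # time stamp approximates the end of the outage.
            outages.append((last_alive, timestamp, "dirty"))
    return outages


# Hypothetical event sequence: one clean restart and one crash.
sample_log = [
    (datetime(2000, 5, 1, 8, 0, 0), STARTUP, None),
    (datetime(2000, 5, 2, 18, 0, 0), CLEAN_SHUTDOWN, None),
    (datetime(2000, 5, 2, 18, 5, 0), STARTUP, None),
    (datetime(2000, 5, 3, 9, 30, 0), STARTUP, None),
    (datetime(2000, 5, 3, 9, 30, 5), DIRTY_SHUTDOWN, datetime(2000, 5, 3, 9, 10, 0)),
]
for start, end, kind in find_outages(sample_log):
    print(kind, "outage lasting", end - start)
```

Because the last alive time precedes the actual failure, the duration computed for a dirty outage is a slight overestimate.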

The last alive time stamp is written only on Windows 2000 Server operating systems. The Windows 2000 Professional operating system does not maintain this time stamp, nor does it record dirty shutdown events.

The last alive time stamp is written to the registry at HKLM\Software\Microsoft\Windows\CurrentVersion\Reliability\LastAliveStamp.

The last alive time stamp interval defaults to 5 minutes. You can add the registry value TimeStampInterval to change the interval. This value is in units of minutes. Setting it to zero prevents any last alive time stamp logging; only the boot and normal shutdown stamps are written in that case.
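The interval also bounds how far the last alive time can be from the actual failure. The Python sketch below assumes the stamp is written once per interval, so a crash falls somewhere between the last alive time and one interval later. The function name and the sample time are hypothetical.

```python
from datetime import datetime, timedelta


def failure_window(last_alive, interval_minutes=5):
    """Return the (earliest, latest) times at which the failure may have occurred."""
    if interval_minutes <= 0:
        # An interval of zero disables last alive stamps, so no window exists.
        raise ValueError("interval_minutes must be positive")
    return last_alive, last_alive + timedelta(minutes=interval_minutes)


# Hypothetical last alive time taken from a dirty shutdown event description.
earliest, latest = failure_window(datetime(2000, 5, 3, 9, 10, 0))
print("Failure occurred between", earliest, "and", latest)
```

A shorter interval narrows this window at the cost of more frequent registry writes.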

System Version Event

Windows 2000 records a new event containing the operating system version information whenever the system is started. This makes it easier to post-process Windows 2000 Event Logs by operating system version. The Event Log service itself is the source of this event, and the Event ID is 6009.

Service Pack Installation

Windows 2000 now records service pack version details in the system Event Log. This makes it easier to post-process Windows 2000 system Event Logs by operating system version.

Figure 9: Event Log, Service Pack Installation Event

Save Dump

Save Dump events are always generated on Windows 2000 Server systems after an operating system stop error. They can still be disabled on Windows 2000 Professional systems.

Figure 10: Event Log, Save Dump Event

Dr. Watson Event

Windows 2000 records application failures in Dr. Watson log files, and the Dr. Watson utility records application failure events in the Windows 2000 Application Event Log as shown below.

Figure 11: Event Log, Dr. Watson Event

Using Performance Monitor

System Performance Monitor (PerfMon) is a tool that allows an administrator to monitor many types of conditions occurring on the local computer or on a remote computer.

PerfMon performs real-time and short-term historical monitoring of conditions called counters that are contained within categories of objects. One such counter, System Uptime, is described below.

System Uptime Counter

The System Uptime counter measures the time, in seconds, that the system has been "alive." PerfMon graphs the results on the screen as they are gathered, and allows you to export the results to Excel for reporting purposes.

Figure 12: PerfMon System Uptime counter
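Because the System Uptime counter reports seconds, exported values are easier to read in a report after conversion to days, hours, minutes, and seconds. The following Python sketch shows one way to do this. The function name and the sample value are hypothetical.

```python
def format_uptime(seconds):
    """Convert an uptime in seconds to a days/hours/minutes/seconds string."""
    if seconds < 0:
        raise ValueError("seconds must not be negative")
    days, remainder = divmod(int(seconds), 86400)
    hours, remainder = divmod(remainder, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{days}d {hours:02d}h {minutes:02d}m {secs:02d}s"


# Hypothetical counter value of 93,784 seconds.
print(format_uptime(93784))  # prints "1d 02h 03m 04s"
```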

Alerts

You can configure PerfMon to alert you when thresholds have been exceeded. You choose the criteria you want reported and the manner in which you want it reported to you. Figure 13 shows PerfMon set up to report when CPU performance exceeds 80 percent.

Figure 13: PerfMon Alert
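The threshold test behind such an alert can be sketched in a few lines of Python. This is not how PerfMon is implemented. It only illustrates the comparison, using hypothetical (time, value) samples of processor utilization.

```python
def find_alerts(samples, threshold=80.0):
    """Return the (label, value) samples whose value exceeds the threshold."""
    return [(label, value) for label, value in samples if value > threshold]


# Hypothetical processor utilization samples, in percent.
cpu_samples = [
    ("10:00", 35.0),
    ("10:01", 82.5),
    ("10:02", 80.0),
    ("10:03", 97.1),
]
for label, value in find_alerts(cpu_samples):
    print(f"Alert at {label}: CPU at {value:.1f} percent")
```

A sample equal to the threshold does not trigger an alert in this sketch, because the comparison is strictly greater than.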