New Tools for Event Management in Windows Vista
At a Glance:
- New event infrastructure
- Event types
- Structured event properties
- Using XPath expressions
Many people say eventing and tracing are boring. Others complain that, more often than not, traces and events are merely by-products of secondary activities (such as debugging and self-monitoring features), and that not enough importance is placed on eventing and tracing capabilities.
Microsoft® Windows Vista™ aims to change these perceptions, offering a giant step forward in enterprise management. Microsoft has overhauled some of the key components and their user interfaces. The Event Log service has been completely rewritten with the enterprise in mind, and tracing has been made much faster and more secure.
Imagine having a better set of tools for discovering and resolving problems with mission-critical systems. Or how about features that let a support technician easily collect tell-tale events and traces from a user's system? Or what if your developers could diagnose and fix bugs in a deployed system using traces sent by users on the fly? Imagine you have easier, more powerful ways to gather information and quickly perform troubleshooting—now that doesn't sound so boring, does it?
What Are Events?
A computer application is like a "black box" that performs a function (or functions). A lot goes on inside this box, but since you can't see into it, it is very difficult to understand its inner workings. However, applications communicate outwardly—with other programs and with users. These communication "events" give you a glimpse into the application.
When it comes to software, an event is typically defined as an occurrence within a software system that is communicated to the outside (the outside being users or other programs). Such an occurrence typically corresponds to a state or configuration change. It may communicate current state or configuration of the software system and the reasons for the change.
More loosely, the term event is also used to refer to the way these occurrences are exposed. There are many examples of such an exposed event:
- A Win32® object used for interprocess communication (IPC)
- A WMI event used for transient (non-persistent) notifications
- An Event Tracing for Windows (ETW) trace event saved into trace files
- An Event Log event saved into the Event Log live logs and possibly archived into Event Log files
- An event saved into files using custom infrastructure
The last three examples listed here are of the sort we'll be looking at in this article.
How Events Are Used
A trace or logged event is a record of an occurrence in a program or operating system. These traces and event logs aren't just for developers. They offer indispensable tools for IT and support personnel, providing a glimpse into the state and inner workings of the applications you are running.
You can use such logs to monitor overall system health. Using the Event Log, you can search for any events that indicate problems. For example, you might find error event 6 in the Application log from the CertificateServicesClient source with the following message: "Automatic certificate enrollment for local system failed (0x80070576). There is a time and/or date difference between the client and server." Likewise, you can detect the transition of components from the unhealthy state back to normal operation. For example, after the time and date difference has been corrected, the same CertificateServicesClient source would publish informational event 19 in the Application log indicating: "Certificate enrollment for <user name> successfully received an AutoenrolledWindowsSystemComponentVerification certificate from certification authority <authority name>." Such information is extremely valuable for finding and resolving conflicts and other problems with configuration.
This info is also handy for diagnosing problems. You can locate program and system actions that lead to a problem and find details that can help you determine the root cause. Similarly, you can use this information to evaluate and solve performance issues.
And the Event Logs offer valuable information that you can use to ensure a more secure environment. They can be used to detect intrusion attempts, audit system history, ensure non-repudiation, and find resources configured incorrectly.
Events in Windows Vista
Previous versions of Windows® have had many shortcomings when it comes to eventing and tracing. These include limited scalability of the Event Log (which limited the total size of all logs to the amount of available memory), event publishing performance (which, for example, limited the number of events that could be published on an active Domain Controller), and limited security of the trace events.
Windows Vista addresses many of these issues with a new infrastructure for eventing and tracing, called Windows Eventing 6.0. This extends upon ETW, which has been in use since Windows 2000, and replaces the Event Log service and the Event Viewer. The new Windows Eventing is designed to deal specifically with events that are persisted into log files for future examination. (It is not intended for transient events like IPC and notification mechanisms.)
Because the new infrastructure provides its own security and scalability solutions, custom eventing and tracing implementations should become less necessary. Note that the enhancements are provided while preserving full compatibility with the existing Event Log and ETW APIs, which means that all existing applications will continue to work without change.
New Ways to View Events
The existing Event Viewer is already one of the most popular Windows programs in the IT community. The new Event Viewer has been completely rewritten, and because it runs in Microsoft Management Console (MMC) 3.0, its appearance has also changed. It is still familiar enough that the transition should be fairly easy.
There is still a tree pane and a list of events. You can still access the familiar Application, System, and Security logs under the Windows Logs node. However, some new nodes have been added to the root, and the new ForwardedEvents log, which I will discuss shortly, has been added to the Windows Logs node.
The most obvious new feature is the preview pane, located under the event list. It contains the properties of the focused event. This means you no longer have to double-click an event to see the event properties, and you don't have to juggle windows in order to see both the list and the Event Properties dialog. It is still possible to display properties in the dialog by double-clicking on an event. But the new dialog is not modal, so you can display multiple event property dialogs at the same time.
The new views allow you to display all the events you may be interested in with just a few mouse clicks. A view can collect events from one or more log files and can focus on specific IDs, Levels (Severity), or time frames.
Notice that under the Custom Views node you'll find Administrative Events (shown in Figure 1). This provides a list of all the errors and warnings from various log files that are of interest to administrators.
Figure 1 Administrative Events
So how exactly did we determine which logs and events are of interest to administrators? We identified five separate event types and the users related to each type. These are detailed in Figure 2. This is a very general though effective division of all the Event Log and trace events that may be of interest to various groups.
Figure 2 Event Types and Their Users
Admin
Description: The Admin type will suffice for the majority of system administrators. These events are very high level and they often provide enough information to identify a problem and determine its solution. At the very least, Admin events should identify when an issue occurs or indicate when an application, a component, or the system as a whole is in or has recovered from an unhealthy state. Most Admin events are errors or warnings, and they are usually actionable.
Users: Administrators, support personnel, and monitoring and analysis programs
Operational
Description: Like Admin events, Operational events enable problem diagnosis. Operational events consist of more than just errors and warnings. They also inform users about normal operation of an application or OS component. The volume of these events is kept quite low so Operational events can be enabled without affecting system performance. The Operational events, along with the Admin events, are used by support personnel, monitoring utilities, and some sophisticated administrators.
Users: Advanced administrators, support personnel, and monitoring and analysis programs
Audit
Description: Audit events provide a historical record of any resource access or actions taken by the users. These events do not in themselves represent failure or success of the program, but indicate a failure or success of the action. Audit events can be completely disabled or selectively enabled with varying levels of granularity. Security auditing at the OS level is supported (the events can be found in the Security log of the Event Log).
Users: Advanced administrators, security auditors, and forensics specialists
Analytic
Description: Analytic events, which are not very different from Operational events, are logged during normal operation of applications and components. But the volume and detail of Analytic events is much greater than Operational events and therefore there is a potential of them having a negative effect on system performance. Thus, Analytic events are normally disabled. To make use of Analytic events, enable them before a diagnostic session and then disable them before examining the trace.
Users: Support personnel, monitoring and analysis programs
Debug
Description: Debug events are also high-volume events that are normally disabled. They are used mainly by developers and are seldom viewed by IT professionals.
Every log in Windows Eventing has a designated type. All events in that log share the type of that log. The view in Figure 1 was defined using this type information—it pulls all error and warning events from the logs of type Admin.
The Windows Eventing Architecture
The Windows Eventing infrastructure consists of software components that allow event objects to be published by programs and delivered to log files. ETW provides the transport used to transfer all types of events from their event publishers to their destinations. ETW has also been overhauled in Windows Vista, and now provides better performance and enhanced security. Publishing of events via ETW is asynchronous and therefore does not affect the performance of the publishing program. When a new event is received by the system, information about the current user context and the publishing process is collected and attached to the event.
After events are published, different types are handled in different ways. Since Analytic and Debug events usually have a high volume, they need to be saved into a file with minimal processing to avoid affecting system performance. Therefore, these events are immediately saved into a trace file.
The Admin and Operational events are infrequent enough to allow additional processing without affecting system performance. These events are delivered to the Event Log service, which saves them into the live event logs and can optionally deliver them to real-time subscribers. Subscribers can pick events to be delivered to them using a query language that I will discuss in a moment.
There are two subscribers, in particular, that are of special interest to the IT community. These are the much enhanced Windows Vista Task Scheduler and the event forwarder that can be used to send events to a remote event collector.
A Look at Structured Events
A common complaint about event logs is that they contain a lot of garbage. That is, they are cluttered with events that have little or no significance, and these tend to hide or obscure the important events. While some events do provide little information, what seems like garbage to one user is often a treasure to another.
Windows Vista offers a way to filter out uninteresting events, allowing you to zero in on the events that matter to you. This relies on a cross-log querying language that is supported by the Event Log service. For this to work, all events must follow a well-defined structure.
Previous versions of the Event Log provided some structure for events, but the structure was not well-defined and it was only visible to the Win32 API. In Windows Vista, events have a well-defined structure. In fact, events are represented externally using XML with a published schema. This makes it possible to create queries that collect the events interesting to you while filtering out extraneous events. Since XML is used, XPath was chosen as the base for the event query language. Of course, the use of structured events opens new doors for automation, as can be seen with the new Task Scheduler integration.
The event preview pane now includes a Details tab. The same tab is available on the Event Properties dialog. Selecting the Details tab on the Event Properties dialog reveals the XML representation of the event. The sample shown in Figure 3 is an Operational event from the Task Scheduler. It contains two parts. The System part consists of the general information common to every instance of this event, as well as some system parameters collected when the instance was published. The EventData section, which is extensible, contains structured information from the application.
Figure 3 XML Representation of an Event
Each event log file is treated as a sequence of such structured event elements. This presents a logical, readable view of the event log and event archive files. Internally the events are saved in a binary format that is designed to provide a balance of compactness, reliability, and search performance.
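To make the two-part structure concrete, here is a minimal Python sketch that reads the System and EventData sections of one event. The sample XML is hypothetical: the event ID, task name, computer name, and every other value are invented for illustration and are not taken from Figure 3. Only the overall System/EventData layout follows the description above.

```python
import xml.etree.ElementTree as ET

# Namespace of the published Windows event schema.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hypothetical sample event: all values are made up for illustration.
SAMPLE_EVENT = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-TaskScheduler"/>
    <EventID>100</EventID>
    <Version>0</Version>
    <Level>4</Level>
    <Task>100</Task>
    <Opcode>1</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2006-11-01T12:00:00.000Z"/>
    <Execution ProcessID="1234" ThreadID="5678"/>
    <Computer>example-pc</Computer>
    <Security UserID="S-1-5-18"/>
  </System>
  <EventData>
    <Data Name="TaskName">ExampleTask</Data>
    <Data Name="UserContext">example-user</Data>
  </EventData>
</Event>"""


def parse_event(xml_text):
    """Return the System properties and EventData values of one event."""
    event = ET.fromstring(xml_text)
    system = event.find("e:System", NS)
    execution = system.find("e:Execution", NS)
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.findtext("e:EventID", namespaces=NS)),
        "version": int(system.findtext("e:Version", namespaces=NS)),
        "level": int(system.findtext("e:Level", namespaces=NS)),
        "time_created": system.find("e:TimeCreated", NS).get("SystemTime"),
        "process_id": int(execution.get("ProcessID")),
        "thread_id": int(execution.get("ThreadID")),
        "computer": system.findtext("e:Computer", namespaces=NS),
        "user_sid": system.find("e:Security", NS).get("UserID"),
        # EventData is extensible, so collect whatever named values exist.
        "data": {
            d.get("Name"): d.text
            for d in event.findall("e:EventData/e:Data", NS)
        },
    }


if __name__ == "__main__":
    for key, value in parse_event(SAMPLE_EVENT).items():
        print(key, "=", value)
```

Because EventData differs from publisher to publisher, the sketch gathers its values into a dictionary instead of expecting fixed field names.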
The System section of the XML data provides the time at which the event occurred, the process ID, the thread ID, the computer name, and the Security Identifier (SID) of the user. In previous versions of Windows, events had only two attributes—EventID and Category. The XML provides other details, as well, including the EventID, Level, Task, Opcode, and Keywords properties. Let's take a closer look at these.
EventID and Version: An event is uniquely identified by the combination of its EventID (which is a two-byte number) and its Version (which is a one-byte number). All events from the same event provider that share EventID and Version share an identical structure.
Level: This value represents the severity or verbosity of an event. Predefined values of 1 (Critical), 2 (Error), 3 (Warning), 4 (Info), and 5 (Verbose) are commonly used, but a provider can define its own values up to a maximum value of 255. Higher numbers correspond to more verbose events.
Task: The Task property usually identifies a general area of the functionality of the event provider (such as printing, networking, or UI). It can also refer to a sub-component of a program. Security Audit events use these extensively. Each event publisher may define its own set of values for this two-byte number. Note that the meaning of the Task attribute often matches semantically, and is a superset of, the Category attribute in previous versions of Windows. For this reason, Event Viewer displays this value in a column called Task Category.
Opcode: This is a one-byte value that is typically used to represent a specific action or a part of an action performed by the software. This value is often used in tracing activity-based processes (such as Web services in which the activity is a specific request received by the Web service). There are a few predefined values, the most common of which are 1 (Start) and 2 (Stop).
Keywords: This is a mask with 56 flags that can be set by the program to enable easy grouping of similar events. Each event may have more than one flag set, indicating that the event belongs to multiple groups.
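The numeric properties above are easy to interpret in a script. The following Python sketch maps Level and Opcode values to the predefined names listed above and decodes a Keywords mask into its set flags. The function names are hypothetical, and the choice to ignore any bits above the 56 flags is an assumption of this sketch.

```python
# Predefined values as listed in the text; providers may define others.
LEVEL_NAMES = {1: "Critical", 2: "Error", 3: "Warning", 4: "Info", 5: "Verbose"}
OPCODE_NAMES = {1: "Start", 2: "Stop"}

# The text describes 56 flags; this sketch ignores any higher bits (assumption).
KEYWORD_FLAG_COUNT = 56


def level_name(level):
    """Return the predefined name for a Level, or mark it provider-defined."""
    if not 0 <= level <= 255:
        raise ValueError("Level must fit in the range 0..255")
    return LEVEL_NAMES.get(level, f"provider-defined level {level}")


def opcode_name(opcode):
    """Return the predefined name for an Opcode, or mark it provider-defined."""
    return OPCODE_NAMES.get(opcode, f"provider-defined opcode {opcode}")


def keyword_flags(mask):
    """Return the positions of the flags that are set in a Keywords mask."""
    return [bit for bit in range(KEYWORD_FLAG_COUNT) if (mask >> bit) & 1]


def at_least_as_severe(level, threshold):
    """Lower Level numbers are more severe, so compare with <=."""
    return level <= threshold


if __name__ == "__main__":
    print(level_name(2), opcode_name(1), keyword_flags(0b101))
```

Note that severity comparisons run "backwards": an event at Level 1 (Critical) passes a threshold of 2 (Error), while Level 4 (Info) does not.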
Understanding Publishers and Events
It is difficult to be proactive when you don't know in advance all the types of information that can possibly be found in the event logs and trace files. Therefore, one of our major goals for the new eventing infrastructure was to provide documentation for each event publisher, including information on the set of events and their destinations.
To accomplish this, each publisher must enumerate all the events it will ever publish along with the structure for these events. This information is compiled, encoded, and saved with the publisher's binary (a program or a DLL).
It is now possible to discover all the publishers that are registered with the eventing system, as well as the configuration of every publisher. This includes viewing a complete list of all the potential events a publisher might log, the structures of these events, and their associated messages—all before these events are even thrown. Figure 4 shows how a command-line utility, wevtutil, can be used to display provider configuration. The command being shown displays all of the information known to the system about the event publisher named Microsoft-Windows-Video For Windows.
Figure 4 Using wevtutil to Display Provider Configuration
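The exact command line shown in Figure 4 is not reproduced in this text. As a rough sketch, a script could request the same kind of publisher information through wevtutil's gp (get-publisher) command, using the /ge and /gm options to include the publisher's events and their messages. The Python helper names below are hypothetical, and the command only runs on a Windows system that has wevtutil.

```python
import subprocess
import sys


def build_get_publisher_command(publisher, include_events=True,
                                include_messages=True):
    """Build the argument list for 'wevtutil gp <publisher>'."""
    command = ["wevtutil", "gp", publisher]
    if include_events:
        command.append("/ge:true")   # list the events the publisher can raise
    if include_messages:
        command.append("/gm:true")   # include message text instead of IDs
    return command


def get_publisher_info(publisher):
    """Run wevtutil and return its text output (Windows only)."""
    if sys.platform != "win32":
        raise RuntimeError("wevtutil is only available on Windows")
    result = subprocess.run(
        build_get_publisher_command(publisher),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Publisher name as given in the text above.
    print(build_get_publisher_command("Microsoft-Windows-Video For Windows"))
```

Passing the arguments as a list avoids quoting problems with publisher names that contain spaces.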
How the Query Language Works
As mentioned earlier, the structured nature of the events in the new infrastructure has allowed us to include support for a query language, which is based on standard XPath expressions. Generally speaking, given a starting location (an XML element), an XPath expression can refer to any place within the element. (For a complete description of the XPath language, see the official reference at w3.org/TR/xpath.) More commonly, an XPath expression refers to another element or attribute contained within the starting element. Since the event log is essentially a sequence of Event elements, you can picture each log as a root element that contains one Event element after another, each with its own System and EventData sections.
The root element does not have a name; it is simply used as a context for all XPath expressions. Only forward axes are defined, which means an XPath expression can reference Event elements, their sub-elements, and their attributes. And if an XPath expression selects an element that exists, it evaluates to true. Consider a very simple XPath expression based on the event shown in Figure 3:
*/System[Provider/@Name='Microsoft-Windows-TaskScheduler' and Level <= 2]
This expression selects all Event elements ("all" is represented by the asterisk) from the Microsoft-Windows-TaskScheduler event provider with level 2 or smaller (meaning all error and critical events).
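The effect of this expression can be illustrated offline with a short Python sketch. Python's built-in ElementTree supports only a small XPath subset, without the and operator or <= comparisons, so the sketch applies the same two conditions in ordinary Python instead of evaluating the expression itself. The Events wrapper element and all sample events are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hypothetical log: the <Events> wrapper stands in for the unnamed root.
SAMPLE_LOG = """<Events xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <Event><System>
    <Provider Name="Microsoft-Windows-TaskScheduler"/>
    <EventID>101</EventID><Level>2</Level>
  </System></Event>
  <Event><System>
    <Provider Name="Microsoft-Windows-TaskScheduler"/>
    <EventID>100</EventID><Level>4</Level>
  </System></Event>
  <Event><System>
    <Provider Name="Example-Other-Provider"/>
    <EventID>7</EventID><Level>1</Level>
  </System></Event>
  <Event><System>
    <Provider Name="Microsoft-Windows-TaskScheduler"/>
    <EventID>102</EventID><Level>1</Level>
  </System></Event>
</Events>"""


def matches(event, provider_name, max_level):
    """Mirror the two conditions: provider name equals, and Level <= max."""
    system = event.find("e:System", NS)
    if system is None:
        return False
    provider = system.find("e:Provider", NS)
    level_text = system.findtext("e:Level", namespaces=NS)
    if provider is None or level_text is None:
        return False
    return provider.get("Name") == provider_name and int(level_text) <= max_level


def select_events(log_xml, provider_name, max_level):
    """Return the Event elements that satisfy both conditions."""
    root = ET.fromstring(log_xml)
    return [e for e in root.findall("e:Event", NS)
            if matches(e, provider_name, max_level)]


def event_ids(events):
    """Extract the EventID of each selected event."""
    return [int(e.findtext("e:System/e:EventID", namespaces=NS)) for e in events]


if __name__ == "__main__":
    selected = select_events(SAMPLE_LOG, "Microsoft-Windows-TaskScheduler", 2)
    print(event_ids(selected))
```

Only the Task Scheduler events at Level 2 or lower survive the filter; the informational event and the event from the other provider are dropped.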
While a simple XPath expression can be used to select events from a single log, a simple but powerful XML-based query language allows selection and suppression of events from any log or external event archive. In fact, the query language is pervasive in Windows Vista eventing. Custom Views, for example, are based on queries. And you can use the query language to specify which events from across specific logs should be shown in the Event Viewer's event list pane. You can use the language to subscribe to events, archive selected events, and trigger actions in the new Task Scheduler.
Actual query XML or XPath expressions must be supplied to the Event Log command-line utility. Defining such queries is not for everyone, so the Event Viewer provides a simple UI for creating common queries. The Create Custom View dialog shown in Figure 5, for example, can be used to create a view for all Task Scheduler events. Note that each query creation dialog has a tab titled XML, which displays the text of the query and lets you edit the query directly.
Figure 5 UI for Creating Common Queries
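For readers who prefer to generate query text in a script, the sketch below builds a query in the QueryList, Query, Select, and Suppress layout that the XML tab displays. The log name, the XPath expressions, and the build_query helper are illustrative assumptions, not the output of the dialog in Figure 5.

```python
import xml.etree.ElementTree as ET


def build_query(log_path, select_xpath, suppress_xpath=None):
    """Build a structured query with one Select and an optional Suppress."""
    query_list = ET.Element("QueryList")
    query = ET.SubElement(query_list, "Query", {"Id": "0", "Path": log_path})
    select = ET.SubElement(query, "Select", {"Path": log_path})
    select.text = select_xpath
    if suppress_xpath:
        suppress = ET.SubElement(query, "Suppress", {"Path": log_path})
        suppress.text = suppress_xpath
    # tostring escapes '<' as '&lt;', which XML text content requires.
    return ET.tostring(query_list, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical query: warnings and worse, minus one noisy event ID.
    print(build_query(
        "Microsoft-Windows-TaskScheduler/Operational",
        "*[System[Level <= 3]]",
        suppress_xpath="*[System[EventID=100]]",
    ))
```

Building the XML with a library instead of string concatenation matters here because comparison operators such as <= must be escaped inside the query text.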
Common Uses for Queries
Queries can be used in numerous ways. But some uses are naturally more common than others. For example, queries are often used to view select events in the Event Viewer or even the Event Log command-line utility.
You can attach a task to a query using the Windows Task Scheduler. Whenever an event is published that matches the query, the Task Scheduler will start the designated task. Note that this feature uses subscriptions and reacts to any newly arriving events. You can only subscribe to Admin and Operational events; Debug and Analytic events are written directly into the trace files and the system cannot examine them when they occur.
Queries can be used for archiving select events. You can select events from live logs of any type, down-level event archives (EVT files), external trace files, or Windows Vista event archive files. An archive file can be opened in the Event Viewer, backed up to secondary storage, or sent to support personnel for help with diagnosing a problem. All event descriptions and other strings associated with events can be attached to the archive in a language selected during the archive operation. If this is done, the events will be available with full descriptions in the language of choice on any machine.
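As one possible way to script the archiving steps just described, the sketch below assembles two wevtutil command lines: epl (export-log) with a /q query to save selected events to a file, and al (archive-log) with /l to attach the display strings for a chosen locale. The helper names, file path, query, and locale are hypothetical, and the commands themselves would only run on Windows.

```python
def build_export_command(source_log, archive_path, query=None):
    """Build 'wevtutil epl' to export (optionally filtered) events to a file."""
    command = ["wevtutil", "epl", source_log, archive_path]
    if query:
        command.append(f"/q:{query}")   # XPath query selecting the events
    return command


def build_archive_command(archive_path, locale=None):
    """Build 'wevtutil al' to attach localized strings to an exported file."""
    command = ["wevtutil", "al", archive_path]
    if locale:
        command.append(f"/l:{locale}")  # language for the attached strings
    return command


if __name__ == "__main__":
    # Hypothetical paths and query, for illustration only.
    print(build_export_command("Application", r"C:\Archive\app-errors.evtx",
                               query="*[System[Level <= 2]]"))
    print(build_archive_command(r"C:\Archive\app-errors.evtx", locale="en-US"))
```

The two steps are kept separate so that an export can be shared as-is, or localized afterward only when it is headed to a machine that lacks the publisher's message resources.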
Finally, queries can be used to forward events to a system that is dedicated to collecting events. This feature uses the new Event Collector service, which allows an administrator to create event subscriptions to remote computers. These subscriptions are persisted on the collector machine and can be retried using a configurable schedule. The event collector uses the industry standard WS-Management protocol to create subscriptions on the remote computers and WS-Eventing protocol to transfer events. (Both protocols are secure and firewall-friendly.) The events received by the Event Collector are saved in the local Event Log.
The changes to Windows Eventing are dramatic and far-reaching—this article only touches upon them. The event forwarding feature alone could fill its own entire article.
The event system boasts improvements in performance, scalability, reliability, and security. But it should not be forgotten that the main goal was to improve the manageability of Windows. The query language, along with the Event Viewer views, lets you more easily discover problems. Tight integration with the Task Scheduler opens the way for simple monitoring, automatic problem resolution, and fast notification when problems develop. And event forwarding enables event archiving and agentless monitoring of servers and desktops.
All these features are implemented without sacrificing backward compatibility, meaning existing solutions will continue to work. Ultimately, the improvements to eventing should help organizations manage their systems more efficiently.
Val Menn is with the Management Infrastructure Group at Microsoft where he is Program Manager for the Event Log. He previously worked as a software engineer and system administrator for a number of start-ups.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited