Using the StreamInsight Event Flow Debugger
Operational systems in domains such as manufacturing and process control, utilities, financial trading, the Web, and IT monitoring generate event streams with complex interrelationships and low-latency requirements. Given the temporal element of processing such event flows, a major challenge in these systems is determining the validity of the results under diverse and dynamic stream behavior, and troubleshooting a query in the event of failures.
This motivates the need for query analysis tools addressing the following requirements:
Handle large amounts of data and reduce the problem search space.
Handle rigorous consistency requirements.
Be intuitive enough for the user to quickly arrive at a diagnosis or solution.
Microsoft StreamInsight provides a stand-alone event flow debugger graphical user interface (GUI). The StreamInsight Event Flow Debugger enables you, as a developer or administrator of a complex event processing (CEP) application, to inspect, debug, and reason about the flow of events through a StreamInsight query. This topic describes the Event Flow Debugger features and provides the procedures you need to quickly start using this debugging and analysis tool.
Before discussing the Event Flow Debugger, it is important to note the fundamental difference between a control flow debugger (for example, a C# or C++ debugger) and the event flow debugger.
In control flow debugging, the developer "builds" a program written in a specific language in debug mode, enables breakpoints at specific statements or junctions in the control flow of the program, "runs" the program until it reaches those breakpoints, reasons about the code and the state of the system, steps into or over functions and procedures, watches variables, and so on, until execution completes. Temporal reasoning about "data" variables (that is, analyzing the transformation of these variables through the passage of time) is limited or non-existent.
In contrast, event flow debugging involves analyzing an event through the passage of time, as it proceeds from one stage of the CEP query to the next, and, within a query stage, from one operator to the next. Here, debugging involves understanding the effect an event has on a stream as it enters a given operator, and how new events are generated as a result of computations on the events input into an operator. The emphasis in event flow debugging is on how the operator's semantics (Filter, Project, Join, Aggregate, Multicast, and so on) affect the event, rather than on the (control flow) execution of the operators themselves. As a consequence, the debugger helps you understand the impact that a given event has on other events, and the impact of other events on the event being analyzed.
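To make the operator semantics concrete, here is a rough analogy in plain C# LINQ to Objects. This is not StreamInsight itself, and it omits the temporal metadata (start and end times) that a real event carries; the payload values are made up for illustration.

```csharp
using System;
using System.Linq;

public static class EventFlowAnalogy
{
    // Each integer stands in for the payload of one event (for example, a
    // VehicularCount reading). A real StreamInsight event would also carry
    // StartTime/EndTime metadata, omitted here for brevity.
    public static int FilterProjectAggregate(int[] vehicularCounts) =>
        vehicularCounts
            .Where(c => c > 20)   // Filter: events below the threshold are dropped
            .Select(c => c * 2)   // Project: a derived payload is computed per event
            .Sum();               // Aggregate: the surviving events collapse to one result

    public static void Main()
    {
        // 24 and 30 pass the filter, become 48 and 60, and aggregate to 108.
        Console.WriteLine(FilterProjectAggregate(new[] { 24, 12, 30 }));  // prints 108
    }
}
```

The point of the analogy: in event flow debugging you reason about what each of these stages did to a particular event, not about which line of the operator's implementation executed.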
The Event Flow Debugger is a stand-alone event trace-based debugging tool. The debugger serves two purposes:
Debugging an event flow trace. The debugging session can be based on:
A trace generated from a live recording of a specific operational query, with the debugger connected to a live server.
A trace file generated outside the debugger using a command-line utility, and the file later loaded into the debugger.
Monitoring the server. In this mode, the debugger provides an Object Explorer that lists system and application objects. You can obtain operational diagnostics about each of these entities. The Object Explorer is also the interface through which you enable and disable tracing for queries and, operationally, start and stop query execution.
To use the Event Flow Debugger, the user must be a member of the Windows Performance Log Users group. Membership enables the user to collect traces outside the debugger by using trace.cmd, or to record events from a query while working within the debugger. See the Windows Management and Operations section for the steps to do this.
If you have successfully installed StreamInsight, we recommend that you start the debugger and review the Start page to learn about the basic capabilities of the tool. To start the debugger, click the Start button, point to All Programs, click Microsoft StreamInsight 1.1, and then click StreamInsight Event Flow Debugger.
You can connect the debugger to the StreamInsight server as a local or remote client application and record and replay the event flow trace of one or more queries. Alternatively, you can use the debugger as a stand-alone client application detached from the server and analyze queries based on event traces collected offline. This gives you the flexibility of debugging a specific query while it is operational, or of back-testing the query based on historical traces of its runs.
Live Query Event Recording
To record events from a running query, you must connect the Event Flow Debugger to a live StreamInsight server. The following procedure describes how to connect to a live server, open a running query, and enable tracing on the query.
As a prerequisite, you (or the server administrator) must enable the Web service for the server. For more information about enabling the Web service, see Publishing and Connecting to the StreamInsight Server.
Confirm that the client user has permissions to connect to the server by being a member in the respective StreamInsight user group. For more information, see the section "StreamInsight Users Group" in Installation (StreamInsight).
In the debugger, click File and then click Connect to Server. Enter the server endpoint address. The default endpoint for an installed StreamInsight server is http://localhost/StreamInsight/<instance_name>.
If the server has been set up correctly for connections, the debugger will display an Object Explorer in the left pane.
In the Object Explorer, expand the hierarchy of objects until you see the query to debug. Double-click the query object. This opens the query graph as shown in the following illustration. The illustration shows that the sample query 'TrafficSensorQuery' is currently running.
Illustration 1 - Viewing a query in the query graph.
To enable the query for tracing, right-click the query and select Enable Tracing. Alternatively, you can enable tracing programmatically by using an API. For more information, see Monitoring the StreamInsight Server and Queries.
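The programmatic route mentioned above can be sketched as follows. This is an untested sketch against the StreamInsight 1.1 management API: the endpoint address, application name, and query name are hypothetical placeholders, and the DiagnosticAspects flags you pass should match what you actually want traced (see Monitoring the StreamInsight Server and Queries for the available settings).

```csharp
using System;
using System.ServiceModel;
using Microsoft.ComplexEventProcessing;

class EnableTracingSketch
{
    static void Main()
    {
        // Connect to the published Web service endpoint of the server.
        // The instance name here is a hypothetical placeholder.
        using (Server server = Server.Connect(
            new EndpointAddress("http://localhost/StreamInsight/MyInstance")))
        {
            // URI of the query to trace; application and query names are hypothetical.
            var queryUri = new Uri("cep:/Server/Application/MyApp/Query/TrafficSensorQuery");

            // Turn on diagnostics for that query. The specific aspect and level
            // chosen here are an example, not the only valid combination.
            server.SetDiagnosticSettings(
                queryUri,
                new DiagnosticSettings(DiagnosticAspects.GenerateErrorReports,
                                       DiagnosticLevel.Always));
        }
    }
}
```

This requires a running StreamInsight server with the Web service enabled, so it cannot be executed in isolation.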
To record the events being processed in the query, click the Start Recording button. This will start the recording process as shown in the following illustration. When you have recorded events for a few minutes, click Stop.
Illustration 2 - Recording events in a running query.
This will bring the debugger to the state shown in Illustration 1. At this point, you can use the query analysis tool provided in the debugger. The analysis features are described in the next section.
Loading an Event Flow Trace (EFT) or Event Trace Log (ETL) File
StreamInsight supports both stand-alone and embedded server deployments. If the application embeds the server and does not enable the Web service, a client application such as the debugger cannot connect to the server. You need a mechanism to debug and diagnose problems in queries running under such situations.
To collect the event trace logs for a running StreamInsight server, you can use the trace.cmd utility that is included in the StreamInsight installation. The following procedure describes how to create a trace file and load the file into the Event Flow Debugger.
With the server and the relevant query or queries running, at a Windows command prompt, type the following command. It is important to use the file extension .etl when naming the trace file.
trace.cmd start <filename>.etl
When you have allowed the query to run for a sufficient period of time, stop the trace by using:
trace.cmd stop <filename>.etl
To load the resulting file into the debugger, click File and then click Open. In the Open dialog, browse to the location of <filename>.etl, and click Open. The query graph will display as shown in the following illustration.
Illustration 3 - Viewing a query loaded from a trace file.
Compared to the objects shown in Illustration 1, this illustration shows fewer entities in the Object Explorer. This is because the debugger is not connected to the server. In particular, server-level diagnostic objects such as Schedulers cannot be displayed when the debugger is not connected to the server.
Notice that the illustration shows a progress indicator in the status bar to indicate the load progress of the ETL file. As part of the load, the debugger translates an ETL file into its proprietary and compressed EFT format.
It is important to note that the trace.cmd utility is a script based on the Windows Logman command. Logman, in turn, uses the Event Tracing for Windows (ETW) infrastructure for event collection. During the trace log load process, the debugger may warn you that a few events have been lost. This is typically caused by inadequate ETW buffer and session settings. To resolve the problem, edit the Logman command in the trace.cmd file and increase the buffer size specified in the -bs option (for example, -bs 3000), or increase the number of buffers specified in the -nb option. For more information and examples, see the Logman documentation.
The Event Flow Debugger provides the following key functionalities for query analysis.
Ability to view the query plan for a given query, that is, the query operators and the event streams. This helps you understand your Language Integrated Query (LINQ) query in terms of its underlying event flow and the processing nodes in that flow.
Ability to inspect all events on the inputs and outputs of a complex event flow, along with intermediate results at each stage of the computation. This includes the event metadata (the start and end timestamps) and the payload fields.
Ability to view complex event flows that are partitioned for scale-out. The debugger can show how events are partitioned by the Grouping operation of the Group and Apply operator, and how events are transformed in the Apply operator.
Ability to perform a set of global analyses that reduce the problem search space or correlate events across multiple stages of the event flow.
Ability to step through the trace of a query execution through the passage of time and understand how events propagate through a streaming query.
Ability to analyze events and understand how they reached a given state; that is, how other events or operators impacted their event times and payloads.
Ability to analyze the impact that any given event has on events that are downstream from the current operator. Essentially, to look ahead into the future processing of the events until the event finally affects the output.
To implement these functionalities, the debugger provides three features for analysis:
Replay - Using this feature, you can step through the event stream one event at a time and watch its progress from one operator to the next. Alternatively, you can set breakpoints at specific operators in the query graph, and "run" the debugger (that is, activate the event flow) until that operator is reached, or until a specific condition in that operator is met.
Root Cause Analysis - Using this feature, you can "look back" at the "root cause", that is, the sequence of operations or changes that caused the event to reach its present condition.
Event Propagation Analysis - Using this feature, you can analyze the effects of an event downstream, either in terms of the changes the event itself goes through, or in terms of how it impacts other events or causes the generation of new events. This feature is the reverse of Root Cause Analysis.
Once the events have been loaded into the debugger, either through live recording or by loading a trace log file, the next step is to discover those events. To do this, you replay the events by clicking the Clock icon. This displays an event player. You can step through the events by clicking the Step Forward icon. Alternatively, you can set a breakpoint at any of the operators by clicking the radio button on the left side of an operator, and then clicking Step to Next Breakpoint in the event player.
In the following illustration, a breakpoint has been set in the aggregation operator in the Apply branch, and the event flow has been activated. The line with the green highlight in the aggregation operator shows the event flow progression until this point.
Illustration 4 - Setting a breakpoint in the Aggregation operator.
Now that the query has events flowing through it, you can expand each operator by clicking the triangular icons on the right side of each operator. You can continue to step through the events and see the event progression.
Using Root Cause Analysis
Using Root Cause Analysis, you can analyze how the event got to its current state. From the event grid of any operator, you can start the Root Cause Analysis by right-clicking the event of concern and selecting Root Cause Analysis.
Choosing Root Cause Analysis causes the debugger to show an expanded view of all operators containing the events that potentially contributed to the current state of the event being analyzed. For example, the following illustration shows how the avgCount of 18, with the given start and end timestamps, came to be: right-click the highlighted event and choose Root Cause Analysis from the context menu. The debugger indicates that the Root Cause Analysis is stacked on top of the Replay by placing a second arrow at the header of the tabbed query canvas, as shown in the following illustration. A contributor to this event's state at this stage in the query processing is an Insert in the SensorInput operator with a VehicularCount value of 24.
Illustration 5 - Using Root Cause Analysis.
Using Event Propagation Analysis
While root cause analysis is about understanding the impact of other events or processing steps on an event, event propagation analysis is a forward-looking analysis to understand the impact that the current event has on events downstream. From the event grid of any operator, you can start Event Propagation Analysis by right-clicking the event of concern and selecting Event Propagation Analysis.
Stacking of Analysis
Analyses can be stacked on top of each other. You can start debugging by using Replay. At some point during this analysis, you can start a Root Cause Analysis for a particular event. From this view, you can pick another event for propagation analysis. In this manner, you can stack the analyses one on top of the other.
You can also open the same query in multiple tabs and have different analyses and views in the different tabs, giving you the flexibility to compare the same or different segments of the query under different analyses.
Each rectangular box in the query graph represents an operator, the computing node in a StreamInsight query. The query algebra supports several operators, such as Select (Filter), Project, Import, Export, Group-and-Apply, Join, Multicast, Union, Top-K, AlterLifetime, AdvanceTime, and Cleanse. Each operator is labeled with its given name in the server metadata, along with the kind of operator it is.
It is important to note that there may not be a 1:1 correspondence between the operators that you see in the query graph and the operations that compose a LINQ query. For example, an Import operator represents an input adapter instance, and Export represents the output adapter instance. Cleanse is an internal operator introduced by the query optimizer to handle unordered data, and it has no presence in the LINQ query. Similarly, AdvanceTime and AlterLifetime represent the core temporal operations in the query algebra resulting from the specification of the AlterEventDuration or AlterEventLifetime LINQ extensions, or from windowing operations in the query. However, you will be able to correlate a LINQ query with the resulting query graph without too much difficulty.
Each event grid has the following sections.
The title of the grid is the operator name provided by the query and the operator type.
A filter text box, in which you can specify a conditional C# expression to filter the events of interest. For example, you may want to examine only events that meet or exceed a certain value or time.
The rest of the event grid window shows the fields of an event. The fields include the event kinds specified by the user (Insert or Cti), along with internal event kinds (Retract and Expand). The field columns displayed by default are EventKind, StartTime, EndTime, and the payload fields of the event. You can add or remove fields from the event grid by right-clicking the header bar to see the context menu, or by clicking View, then Columns, and selecting or clearing the field names. Note that all timestamp fields are displayed in Coordinated Universal Time (UTC). The context menu provides the means to change time zones.
The available fields appear in the order listed below.
EventKind - Insert, Cti, Retract, or Expand.
StartTime - Start time of the event.
EndTime - End time of the event.
New end time - The modified end time of the event for some special system event kinds. You should ignore the values in this column.
Latency - The system latency of the event at the given point in time. This is the span of time between the moment the incoming event that caused this event to be produced entered the system and the moment this event was produced by the system.
Time produced - The system time when the event was produced by the operator.
One or more payload fields - The user-defined data fields available in the event.
The events in any given operator can be exported to a file by right-clicking the operator title bar and choosing the option 'Write events to file'. Typically, you would do this to export events for further processing by another program.
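The latency value described above reduces to a simple subtraction of two timestamps. A toy computation in plain C# (the timestamps here are made up for illustration):

```csharp
using System;

public static class LatencyExample
{
    // Latency as defined above: the time this event was produced by the system,
    // minus the time the causing input event entered the system.
    public static TimeSpan Latency(DateTime enteredSystemUtc, DateTime producedUtc) =>
        producedUtc - enteredSystemUtc;

    public static void Main()
    {
        var entered  = new DateTime(2010, 6, 1, 12, 0, 0, DateTimeKind.Utc);
        var produced = entered.AddMilliseconds(150);  // produced 150 ms later

        Console.WriteLine(Latency(entered, produced).TotalMilliseconds);  // prints 150
    }
}
```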
The Group-and-Apply operator is a special operator that consists of a Grouping operator as the entry point, a set of Apply branch operators, and a Group Union operator as the bounding operator at exit. Expanding the grouping node displays all of the Apply branches anchored on their respective grouping key values. You can unfold (or collapse) the Group-and-Apply operator by simply dragging the branch of interest from the grouping node onto the canvas. Each operator in the branch can then be expanded for further analysis. Clicking the X sign in a branch folds it back into the master Group-and-Apply flow sub-graph. The following illustration shows a Group-and-Apply node in the Event Flow Debugger.
Illustration 6 - Viewing a Group-and-Apply node.
The debugger offers the following usability features:
You can zoom the query graph canvas seamlessly, from individual operators out to the full query graph.
All functionality available through icons is also available by selecting options from the context menus.
When the debugger is connected to the server, any errors returned by the server are reported in the en-US locale, regardless of the installed locale. On non-English installations, these English error messages can still be used for additional support and diagnosis.
All operators can be expanded or collapsed by using the Expand All and Collapse All features.
The options available from the Tools menu allow you to set defaults for various debugger settings. You can choose to read the event flow trace file in segments of specific sizes. This will allow for a predictable and smoother debugging experience. You can specify the DateTime format and TimeZone across all the operators of the query graph. Additionally, you can specify the maximum and minimum event flow recording duration for the scenario where the debugger is connected to a server.
In addition to being a debugging tool, the Event Flow Debugger serves as a monitoring tool for the StreamInsight server. You can connect the debugger to a live server using the steps described earlier. The debugger displays an Object Explorer with all of the server entities. At the top level, it displays the summary statistics for the Event Manager, the Query Manager and Schedulers. To display the runtime diagnostics for one of these objects, right-click the icon for the object and select Diagnostics.
You can also click through the hierarchy of objects and choose a particular query that has been registered in the server. You can enable or disable tracing on a query and start or stop a query. In addition, if the query is running, you can obtain the runtime diagnostics of the query. Information about the events produced and consumed, the latency and throughput characteristics, and memory requirements of the query can be monitored using this interface.
These metrics can also be retrieved programmatically using diagnostic view APIs. For a description of the diagnostic information available for each entity, see Monitoring the StreamInsight Server and Queries.
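That programmatic route can be sketched as follows. This is an untested sketch against the StreamInsight 1.1 management API: the endpoint address, application name, and query name are hypothetical placeholders, and the assumption that a DiagnosticView can be enumerated as name/value pairs should be checked against Monitoring the StreamInsight Server and Queries.

```csharp
using System;
using System.ServiceModel;
using Microsoft.ComplexEventProcessing;

class DiagnosticViewSketch
{
    static void Main()
    {
        // Connect to the published Web service endpoint of the server.
        // The instance name here is a hypothetical placeholder.
        using (Server server = Server.Connect(
            new EndpointAddress("http://localhost/StreamInsight/MyInstance")))
        {
            // Retrieve the diagnostic view for a running query
            // (application and query names are hypothetical).
            DiagnosticView view = server.GetDiagnosticView(
                new Uri("cep:/Server/Application/MyApp/Query/TrafficSensorQuery"));

            // Assumption: the view exposes its metrics as name/value pairs,
            // covering event counts, latency, throughput, and memory use.
            foreach (var metric in view)
            {
                Console.WriteLine("{0} = {1}", metric.Key, metric.Value);
            }
        }
    }
}
```

This requires a running StreamInsight server with the Web service enabled, so it cannot be executed in isolation.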