StreamInsight Server Concepts

Article
02/17/2012

This topic describes the manner in which data is represented, operated on, brought into, and transferred out of the StreamInsight server. It is designed to give you familiarity with the basic concepts associated with complex event processing in Microsoft StreamInsight. The topic begins by describing data structures and then describes the StreamInsight server components that act on or process the data.

Streams

All data in StreamInsight is organized into streams. Each stream describes a potentially unending collection of data that changes over time. For instance, a stock ticker stream provides the price of different stocks in an exchange as they change over time, and a temperature sensor stream provides temperature values reported by the sensor over time.

Consider a power-monitoring scenario in which the goal is to monitor a collection of power meters that measure the power consumption of various devices. Periodically, these power meters transmit data that includes their power consumption in tenths of a watt and an associated time stamp of the reading. The following table shows the power meter readings from 3 meters and assumes that each power meter emits a power reading every second.

Time	MeterID	Consumption
2009-07-27 10:27:23	1	100
2009-07-27 10:27:24	1	200
2009-07-27 10:27:51	2	300
2009-07-27 10:28:52	2	100
2009-07-27 10:27:23	3	200

Because this information can be represented as values that change over time, the data can be represented in a stream. Given the data in this stream, a query against this stream could return the meter with the highest or lowest consumption values over a given time period, or the query could return a top-10 list of meters with the highest power consumption over time.

Events

The underlying data represented in the stream is packaged into events. An event is the basic unit of data processed by the StreamInsight server. Each event consists of the following parts:

Header. An event header contains metadata that defines the event kind and one or more timestamps that define the time interval for the event. The timestamps are application-based and supplied by the data source rather than a system time supplied by the StreamInsight server. Note that the timestamps use the DateTimeOffset data type, which has time zone awareness and is based on a 24-hour clock. The StreamInsight server normalizes all times to UTC datetime and verifies on input that the UTC flag is set on the timestamp fields.
Payload. A .NET data structure that holds the data associated with the event. The fields defined in the payload are user-defined. Their types are based on the .NET type system.

Events in the stream whose application timestamps correspond to their order of arrival into the query are said to be “in order”. When this is not the case, events are said to arrive “out of order”. The StreamInsight server guarantees that if events arrive out of order, the output of the query is the same as if the events arrived in order, unless the query writer explicitly specifies otherwise. Within a stream, typical event arrival patterns are:

A steady rate, such as records from files or tables.
An intermittent and random rate, such as data from a retail barcode scanner.
An intermittent rate with sudden bursts, such as Web clicks or weather telemetry.

Event Header

The header of an event defines the event kind as well as the temporal properties of the event.

Event Kind

The event kind indicates whether the event is a new event in the stream or it is declaring the completeness of the existing events in the stream. StreamInsight supports two event kinds: INSERT and CTI (current time increment).

The INSERT event kind adds an event with its payload into the event stream. In addition to the payload, the header of the INSERT event identifies the start and end time for the event. The following diagram shows the layout of an INSERT event kind.

Header	Payload
Event kind ::= INSERT StartTime ::= DateTimeOffset EndTime ::= DateTimeOffset	Field 1 … Field n as CLR types

Event kind ::= INSERT

StartTime ::= DateTimeOffset

EndTime ::= DateTimeOffset

Field 1 … Field n as CLR types

The CTI event kind is a special punctuation event that indicates the completeness of the existing events in the stream. The CTI event structure consists of a single field that provides a current timestamp. A CTI event serves two purposes:

First, it enables a query to accept and process events whose application timestamps do not correspond to their order of arrival into the query. When a CTI event is issued, it indicates to the StreamInsight server that no subsequent incoming INSERT events will revise the event history before the CTI timestamp. That is, after a CTI event has been issued, no INSERT event can have a start time earlier than the timestamp of the CTI event. This indication of "completeness" of a stream of events enables the StreamInsight server to release the results of windowing or other aggregating operators that have accumulated state, thus ensuring that events flow efficiently through the system.
The second purpose of CTI events is to maintain the low latency of the query. Frequent CTIs will make the query pump out the results at a higher frequency.

Important

Without the presence of CTI events in the input stream, no output will be generated from the query.

For more information, see Advancing Application Time.

The following diagram shows the layout of a CTI event kind.

Header
Event kind ::= CTI StartTime ::= DateTimeOffset

Event kind ::= CTI

StartTime ::= DateTimeOffset

Event Models

The event model defines the event shape based on its temporal characteristics. StreamInsight supports three event models: interval, point, and edge. Interval events can be seen as the most generic type, of which edge and point are special cases.

Interval

The interval event model represents an event whose payload is valid for a given period of time. The interval event model requires that both the start and end time of the event be provided in the event metadata. Interval events are valid only for this specific time interval. It is important to note that start times are inclusive, whereas end times are exclusive regarding the validity of the event's payload.

The following diagram shows the layout of an interval event model.

Metadata	Payload
Event kind ::= INSERT StartTime ::= DateTimeOffset EndTime ::= DateTimeOffset	Field 1 … Field n as CLR types

Event kind ::= INSERT

StartTime ::= DateTimeOffset

EndTime ::= DateTimeOffset

Field 1 … Field n as CLR types

Examples of interval events include the width of an electronic pulse, the duration of (validity of) an auction bid, or a stock ticker activity in which the bid price for the stock is valid for a specific time period. In the power monitoring example described above, the power meter event stream may be represented with the following interval events.

Event Kind	Start	End	Payload (Consumption)
INSERT	2009-07-15 09:13:33.317	2009-07-15 09:14:09.270	100
INSERT	2009-07-15 09:14:09.270	2009-07-15 09:14:22.255	200
INSERT	2009-07-15 09:14:22.255	2009-07-15 09:15:04.987	100

Point

A point event model represents an event occurrence as of a single point in time. The point event model requires only the start time for the event. The StreamInsight server infers the valid end time by adding a tick (the smallest unit of time in the underlying time data type) to the start time to set the valid time interval for the event. Considering that event end times are exclusive, point events are valid only for the single instant of their start time.

The following diagram shows the layout of a point event model.

Metadata	Payload
Event kind ::= INSERT StartTime ::= DateTimeOffset	Field 1 … Field n as CLR types

Event kind ::= INSERT

StartTime ::= DateTimeOffset

Field 1 … Field n as CLR types

Examples of point events include a meter reading, the arrival of an email, a user Web click, a stock tick, or an entry into the Windows Event Log. In the power monitoring example described above, the power meter event stream may be represented with the following point events. Note that the end time is calculated as the start time plus 1 tick (t).

Event Kind	Start	End	Payload (Consumption)
INSERT	2009-07-15 09:13:33.317	2009-07-15 09:13:33.317 + t	100
INSERT	2009-07-15 09:14:09.270	2009-07-15 09:14:09.270 + t	200
INSERT	2009-07-15 09:14:22.255	2009-07-15 09:14:22.255 + t	100

Edge

An edge event model represents an event occurrence whose payload is valid for a given interval of time, however, only the start time is known upon arrival to the StreamInsight server; so the end time is set to the maximum time into the future. The end time of the event is known later and updated. The edge event model contains two properties: occurrence time and an edge type. Together, these properties define either the start or end point of the edge event.

The following diagram shows the layout of an edge event model.

Metadata	Payload
Event kind ::= INSERT Edge time ::= DateTimeOffset Edge type ::= START \| END	Field 1 … Field n as CLR types

Event kind ::= INSERT

Edge time ::= DateTimeOffset

Edge type ::= START | END

Field 1 … Field n as CLR types

Examples of edge events are Windows processes, trace events from Event Tracing for Windows (ETW), a Web user session, or quantization of an analog signal. The valid time interval for the payload of an edge event is the difference between the timestamp of the Start event and the timestamp of the End event. In the following diagram, notice that the event with a payload value of 'c' does not have a known end date at this point in time.

Event Kind	Edge Type	Start Time	End Time	Payload
INSERT	Start	t0	DateTimeOffset.MaxValue	a
INSERT	End	t0	t1	a
INSERT	Start	t1	DateTimeOffset.MaxValue	b
INSERT	End	t1	t3	b
INSERT	Start	t3	DateTimeOffset.MaxValue	c
… and so on

The following illustration shows the quantization of an analog signal using edge events based on the start and end times defined in the table above. Such a continuous signal implies that for every new value, both an END as well as a START edge must be submitted. The described edges in the illustration refer to the event from time t1 to t3.

EdgeEvent

It is important to choose the right event model for your problem. For instance, if you have events that last for a period of time, and your application has the ability to determine both the start and end times of the event, it is better to use interval events to model this. If you have a scenario where you do not know the end time of an event at event arrival, you could consider modeling the event as a point event, alter its lifetime to extend for a period of time, and then use the Clip operation to modify the lifetime when that event’s end is recognized. The other alternative to consider is to model these events as edge events.

While edge events are a very convenient event model, there are a couple of performance implications you should be aware of. Processing edge events works best when these events arrive fully ordered – i.e. all start edges are ordered on start time and end edges on end time and the combined sequence of events is also ordered on time. For example, if you have a sequence of edge events as follows:

Event Kind	Edge Type	Start Time	End Time	Payload
INSERT	Start	1	DateTimeOffset.MaxValue	a
INSERT	End	1	10	a
INSERT	Start	3	DateTimeOffset.MaxValue	b
INSERT	End	3	6	b
INSERT	Start	5	DateTimeOffset.MaxValue	c
INSERT	End	5	20	c

This sequence is unordered on timestamps (1, 10, 3, 6, 5, 20). Instead if the edge events were fully ordered - as in (1, 3, 5, 6, 10, 20) – this will have a lesser performance impact on query processing. Enabling such an ordering followed by processing is easily achieved. Split the problem into two queries. The first query can be an empty query that receives edge events as input, fully orders them, and outputs these ordered edge events. The second query can take this input and perform the main logic. Note that these should be defined as two separate queries, and then joined together using dynamic query composition. For more information, see Composing Queries at Runtime.

Event Payload

The payload of an event is a .NET data structure that contains the data associated with the event. The fields in the payload are user-defined and their types are based on the .NET type system. Most CLR scalar and elementary types are supported for payload fields. Nested types are not supported.

Adapters

Adapters translate and deliver incoming and outgoing event streams to and from the StreamInsight server. StreamInsight provides a highly flexible adapter SDK that enables you to build adapters for your domain-specific event sources and output devices (sinks). Adapters are implemented in the C# programming language and stored as assemblies. The adapter classes are created as templates during design time, registered in the StreamInsight server, and instantiated in the server during run time as adapter instances.

Input Adapters

An input adapter instance accepts incoming event streams from external sources such as databases, files, ticker feeds, network ports, sensors, and so on. The input adapter reads the incoming events in the format in which they are supplied and translates this data into the event format that is consumable by the StreamInsight server.

You create an input adapter to handle the specific event sources for your data source. If the event source produces a single event type only, the adapter can be typed. That is, it can be implemented to emit events of one particular event type. With a typed adapter, all instances of the adapter produce the same fixed payload format in which the number of fields and their types are known in advance. Examples of such events are ticker feed data or sensor data emitted by a specific device. If your event source emits different types under different circumstances, that is, the events might contain different payload formats or the payload format might not be known in advance, implement an untyped adapter. With an untyped (generic) adapter, the event payload format is provided to the adapter as part of a configuration specification at query bind time. Examples of such sources include CSV files that contain a varying number of fields where the type of data stored in the file is not known until query instantiation time, or an adapter for SQL Server tables where the events produced depends on the schema of the table. It is important to note that, at runtime, a single adapter instance, whether typed or untyped, always emits events of one specific type. Untyped adapters provide a flexible implementation to accept the specification of event type at query bind time, rather than defining the event type at the time the adapter is implemented.

Output Adapters

You create an output adapter to receive the events processed by the StreamInsight server, translate the events into a format expected by the output device (sink), and emit the data to that device. Designing and creating an output adapter is similar to designing and creating an input adapter. Typed output adapters are designed against a specific event payload, whereas untyped output adapters are supplied with the event type only at runtime when the query is instantiated.

For more information, see Creating Input and Output Adapters. The core adapter API provides the maximum flexibility to implement against any event source or event sink. In addition, StreamInsight supports event sources and sinks at a higher level of abstraction that implement the IObservable or IEnumerable interfaces. For more information, see Using Observable and Enumerable Event Sources and Event Sinks (StreamInsight).

Processing and Analyzing Events

With StreamInsight, event processing is organized into queries based on query logic that you define. These queries take a potentially infinite feed of time-sensitive input data (either logged or real time), perform some computation on the data, and output the result in an appropriate manner.

Query Templates

A query template is the fundamental unit of query composition. It is the structure that defines the business logic required to continuously analyze and process events submitted to the StreamInsight server from the input adapter and generate an event stream that is consumed by the output adapter. For example, you may want to evaluate incoming power consumption events for maximum or minimum values over a given time period that exceed certain thresholds that you establish.

Query templates can be written to perform specific units of work and then composed into more complex query templates. Query templates are written in LINQ combined with the C# language. LINQ is a language platform that enables you to express declarative computation over sets in a manner that is fully integrated into the host language. This gives you the power to combine declarative processing of events with the flexibility of procedural programming in the same development platform, without the concern of impedance mismatch between these two programming paradigms.

The StreamInsight server provides the following functionality to write expressive queries and analytics:

Calculations to introduce additional event properties

Use cases such as unit conversions require you to perform calculations on top of the events that you receive. Using the projection operation in the StreamInsight server, you can add additional fields to the payload and perform calculations over the fields in the input event. For more information see, Projection.
Filtering events

In use cases such as alert notifications, you may want to check whether a certain payload field exceeds the operating thresholds for the piece of equipment that you are monitoring. In general, only a subset of events that satisfy certain characteristics is relevant for these use cases. Events that do not have these characteristics do not need to be processed and can be discarded. The filter operation allows you to express Boolean predicates over the event payload and discard events that do not satisfy the predicates. For more information, see Filtering.
Grouping events

Consider an event stream that gives you temperature readings from all of your temperature sensors. If all the events are provided through a single event stream, you may want to partition the incoming events based on the sensor location or the sensor ID. The StreamInsight server provides a grouping operation that allows you to partition the incoming stream based on event properties such as location or ID and then apply other operations or complete query fragments to each group separately. For more information, see Group and Apply.
Windows over time

Grouping events over time is a powerful concept that enables many scenarios. For instance, you may want to check the number of failures that occur during a fixed period of time and raise an alarm if they exceed a threshold. Hopping and sliding windows allow you to define windows over your event streams to perform this kind of analysis. For more information, see Using Event Windows.
Aggregation

When you do not care about each single event, you might want to look into aggregate values such as averages, sums, or counts instead. The StreamInsight server provides built-in aggregations for sum, count, min, max, and average that typically operate on time windows. For more information, see Aggregations.
Identifying TOP N candidates

A special kind of aggregation operation is needed in use cases where you want to identify the candidate events that rank highest according to a specific metric in an event stream. The TopK operations allows you to check for those based on an order that you establish over the event fields in the stream. For more information, see TopK.
Matching events from different streams

A common use case is the need to reason about events received from multiple streams. For example, because event sources provide timestamps in their event data, you may want to make sure that you only match events in one stream with an event in the second stream if they are closely related in time. In addition, you may have additional constraints on which events to match, and when to match them. The StreamInsight server provides a powerful join operation that performs both tasks: first, it matches events from the two sources if their times overlap and second, it executes the join predicate specified on the payload fields. The result of such a match contains both the payloads from the first and the second event. For more information, see Joins.
Combining events from different streams in one

Multiple data sources may provide events of the same type that you may want to feed into the same query. The union operation provided by the StreamInsight server allows you to multiplex several input streams into a single output stream. For more information, see Unions.
User defined extensions

The built-in query functionality of the StreamInsight server may not be sufficient in all cases. To allow for domain-specific extensions, queries in the StreamInsight server can invoke user-defined functionality that is provided through .NET assemblies. In addition to user-defined functions, you can define and implement custom aggregations or query operators. For more information, see User-Defined Functions (Stream Insight) and User-defined Aggregates and Operators.

For more information, see Writing Query Templates in LINQ. For detailed guidance on writing LINQ queries for StreamInsight, see A Hitchhiker’s Guide to StreamInsight Queries.

Query Instances

Binding a query template with specific input and output adapters registers a query instance in the StreamInsight server. Bound queries can be started, stopped, and managed in the StreamInsight server. Once data has been brought into the StreamInsight server via input adapters, computation may be continuously performed over the data. In other words, as individual events arrive in the server, these events are processed by standing queries, which emit output events in response to the arrival of input events. The following illustration shows the StreamInsight queries and adapters at runtime. The StreamInsight server consumes and processes the event when the instance of the input adapter is bound to an instance of a query. The processed data is then pushed to the instance of the output adapter that is bound to the same query instance.

CEP query and adapter ecosystem

StreamInsight Server Concepts

Streams

Events

Event Header

Event Kind

Event Models

Interval

Point

Edge

Event Payload

Adapters

Input Adapters

Output Adapters

Processing and Analyzing Events

Query Templates

Query Instances

See Also

Concepts

Additional resources

StreamInsight Server Concepts

Streams

Events

Event Header

Event Kind

Event Models

Interval

Point

Edge

Performance Considerations Related to the Choice of Event Model

Event Payload

Adapters

Input Adapters

Output Adapters

Processing and Analyzing Events

Query Templates

Query Instances

See Also

Concepts

Additional resources