Hopping Windows

 

A hopping window defines a subset of events that fall within some period of time and over which you can perform some set-based computation such as an aggregation. Hopping windows differ from count and snapshot windows in that they divide the timeline along regular intervals, independently of event start or end times. StreamInsight allows for overlapping hopping windows as well as leaving gaps between consecutive windows.

For a general description of event windows and their implementation and use in StreamInsight, see Using Event Windows.

Understanding Hopping Windows

Hopping windows are windows that "hop" forward in time by a fixed period. The window is defined by two time spans: the hop size H and the window size S. For every H time unit, a new window of size S is created.

The following illustration shows a stream with a series of point events. The vertical bars show a hopping window segmenting the timeline. Here, H equals S. This represents a gapless, non-overlapping hopping window and is also known as tumbling window. For convenience, there exists a separate extension method for this type of window. Each orange box represents the stream of windows and the events associated with that window.

Tumbling Windows

For each such window, the set-based operation is carried out and a result is produced, with timestamps that depend on the output policy specified with the window. For the output policy PointAlignToWindowEnd, the result looks as follows:

Tumbling Windows with PointAlignToWindowEnd

If an event spans a window boundary, it is contained in multiple windows. The next illustration shows a stream that contains three interval events where this is the case: e1, e2, and e3. If the hop size H is smaller than the window size S, the windows will overlap, such that events within the overlapping period will fall into more than one window, even if they are point events that do not span multiple windows.

HoppingWindowForEvents

Note that this illustration shows the events in the windows already clipped by the window input policy. The input policy, as for all StreamInsight windows, is to clip the events to the window size. A time-sensitive aggregate or operator will use these clipped event lifetimes in the windows, instead of the original ones, that is, it will not “see beyond” the window.

Optionally, an alignment parameter can be specified. For example, the default alignment for a 24-hour tumbling window is to start and end each window at midnight. If a custom alignment is specified (for example, a 24-hour window from 9:00 A.M. to 9:00 A.M.), the windowing is aligned according to this absolute point on the timeline (in this case, any datetime value having 9:00 A.M. as its time portion). The alignment parameter must be provided as a DateTime object of kind DateTimeKind.Utc.

It is important to understand that by using hopping windows, the applied set-based operation produces a result regardless of whether the input has changed with regard to the previous window. This is because hopping windows divide the timeline along fixed intervals. The following illustration shows the application of a hopping window that is much longer than its hop size. This is typical for scenarios such as "every 10 seconds, compute the average of all events within five minutes."

HoppingWindowForEvents2

The following illustration shows the result of such an aggregation on top of the windowed stream when the output policy is PointAlignToWindowEnd.

Aggregations with PointAlignToWindowEnd

For a time-insensitive aggregation like Sum, Avg, Count, and so forth, all of the aggregation results in this diagram carry the same value, because it is always the same set of payloads, e1 and e2, that contribute to the underlying windows. This behavior of repetitive results must be taken into account, especially if such a window is to be applied inside a group and apply operation with a high number of groups. With a window frequency higher than the original event frequency (for example, as shown in the previous diagram), the output event rate will be significant. If an aggregation result should be produced only if the input changes, a Snapshot window should be used instead.

Output Policies

PointAlignToWindowEnd

This output policy yields a point event whose start time is the end time of the window, as shown in the previous diagram. This new output policy is useful when you combine the result with another stream, since there is only a single valid result at each point in time, which expresses the most recent aggregation result at that point. A hopping window with this output policy can be combined with the point-to-signal design pattern to create a continuous stream of aggregation results, which, at each point in time, contains an interval event carrying the last known result.

PointAlignToWindowEnd is the default output policy for a hopping window if the output policy is not specified.

Note


For StreamInsight 2.0 and earlier, an additional output policy exists for CEPStream streams:

ClipToWindowEnd

This output policy yields a window size that corresponds to the lifetime of the set-based operation, as shown in the following diagram. Note that this has implications for the liveliness of the query.

ClipToWindowEnd output policy for hopping windows

New output policy for hopping windows

CTI Behavior

Note that hopping windows can have an effect on current time increment (CTI) events. When the output policy is ClipToWindowEnd, then each CTI event will be moved to the beginning of the respective window. The reason for this is that the window size is assigned to the result of the operation on top of the window. Hence, as long as events within the window are received, the entire timespan of the window is subject to change. However, when the output policy is NEW_POLICY, then CTI events are passed through without change. For more information about CTI events, see Advancing Application Time.

Defining Hopping Windows

A hopping window is defined by its window size and hop size, as shown in the following example.

var hoppingAgg = from w in inputStream.HoppingWindow(TimeSpan.FromHours(1),  
                                                     TimeSpan.FromMinutes(10))  
                 select new { sum = w.Sum(e => e.i) };  

The policy argument to the hopping window in the example above is a static property that returns an instance of the corresponding policy class.

If the hop size and window size are the same, an abbreviated version called a tumbling window can be used, as shown in the following example.

var tumblingAgg = from w in inputStream.TumblingWindow(TimeSpan.FromHours(1))  
                  select new { sum = w.Sum(e => e.i) };  

The hopping (or tumbling) window alignment is an optional parameter. In the following example, each window starts and ends at 9:00 A.M. Coordinated Universal Time (UTC).

var alignment = new DateTime(TimeSpan.FromHours(9).Ticks, DateTimeKind.Utc);  
var snapshotAgg = from w in inputStream.TumblingWindow(  
                         TimeSpan.FromHours(24),  
                         alignment)  
                  select new { sum = w.Sum(e => e.i) };  

See Also

Aggregations
TopK
User-defined Aggregates and Operators
Count Windows
Snapshot Windows
Using Event Windows