Troubleshooting Resilient StreamInsight Applications

This topic helps you troubleshoot resilient StreamInsight applications. It describes the prerequisites for resilient queries and the common errors and failures that can occur.

Resiliency is available only in the Premium edition of StreamInsight. For more information, see Choosing a StreamInsight Edition.

In This Topic

  • Prerequisites for a resilient query

  • Errors when creating and configuring a resilient server

  • Errors when defining a resilient query

  • Errors when saving a checkpoint

  • Errors when recovering from a failure

  • Errors when storing metadata and logs on a network share

Prerequisites for a resilient query

A query that is configured for checkpointing must satisfy the following requirements:

  • It must have an output adapter. That is, it cannot expose only a published stream.

  • It cannot consume a published stream.

  • It cannot use IObservable or IEnumerable inputs.

  • It cannot use streams that have been synchronized by using AdvanceTimeImportSettings to copy CTIs from another stream. This usage is described in Advancing Application Time.
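
The four requirements above can be expressed as a simple pre-flight check. The following is a minimal sketch in Python, not StreamInsight code: the `QueryShape` class, its field names, and the `checkpointing_violations` function are hypothetical names introduced only for illustration, and the real product performs this validation itself when the query is created.

```python
from dataclasses import dataclass


@dataclass
class QueryShape:
    """Hypothetical description of a query; not a StreamInsight type."""
    has_output_adapter: bool
    consumes_published_stream: bool
    uses_observable_or_enumerable_input: bool
    uses_imported_ctis: bool  # streams synchronized via AdvanceTimeImportSettings


def checkpointing_violations(q: QueryShape) -> list[str]:
    """Return the prerequisites from the list above that the query violates."""
    problems = []
    if not q.has_output_adapter:
        problems.append("no output adapter (exposes only a published stream)")
    if q.consumes_published_stream:
        problems.append("consumes a published stream")
    if q.uses_observable_or_enumerable_input:
        problems.append("uses IObservable or IEnumerable inputs")
    if q.uses_imported_ctis:
        problems.append("uses streams synchronized with AdvanceTimeImportSettings")
    return problems
```

An empty result means that the described query meets all four prerequisites. Each returned string names one requirement that must be fixed before the query can be configured for checkpointing.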

Errors when creating and configuring a resilient server

The call to the Server.Create method can raise an exception under the following conditions:

  • A resiliency configuration is provided through the Server.Create method or the app.config file, but the SQL Server Compact metadata provider is not specified.

  • The log path specified for resiliency does not exist, and the server is not configured to create it.

  • The server is configured to create the log path, but the log path cannot be created.

  • The server has insufficient privileges to write to and read from the specified log path.

  • The server is configured for resiliency, but the installed edition of StreamInsight does not support resiliency. Only the Premium edition supports resiliency. For more information about StreamInsight editions, see Choosing a StreamInsight Edition.
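
Several of the conditions above concern the log path, and they can be ruled out before the server is started. The following Python sketch is an illustration only and does not use any StreamInsight API: the `check_log_path` function and its `create_if_missing` parameter are hypothetical names. It assumes that the check runs under the same account as the server, because otherwise the permission probe says nothing about the server's privileges.

```python
import os
import tempfile


def check_log_path(log_path: str, create_if_missing: bool) -> None:
    """Raise OSError if log_path cannot be used for reading and writing."""
    if not os.path.isdir(log_path):
        if not create_if_missing:
            raise FileNotFoundError(f"log path does not exist: {log_path}")
        os.makedirs(log_path, exist_ok=True)  # raises OSError if creation fails
    # Probe write and read access by round-tripping a small temporary file.
    with tempfile.TemporaryFile(dir=log_path) as probe:
        probe.write(b"probe")
        probe.seek(0)
        if probe.read() != b"probe":
            raise OSError(f"log path is not readable: {log_path}")
```

The three failure modes map to the list above: a missing path with creation disabled, a path that cannot be created, and a path that the account cannot write to or read from.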

Errors when defining a resilient query

The call to create a resilient query through the CepStream.ToQuery method or the Application.CreateQuery method can raise an exception under the following conditions:

  • The server has not been configured for resiliency by providing a resiliency configuration and by specifying the SQL Server Compact metadata provider.

  • The query consumes events from IEnumerable or IObservable sources.

  • The query consumes events from a published stream.

  • The query uses synchronized streams.

  • The query writes only to a published stream.

Errors when saving a checkpoint

The call to save a checkpoint can fail, and the EndCheckpoint method can raise an exception, under the following conditions:

  • The server is not configured for resiliency.

  • The query is not configured for resiliency.

  • The query is not running.

  • A checkpoint is already in progress. In this case, the first checkpoint succeeds; all subsequent overlapping checkpoints fail.

  • The EndCheckpoint method is called with an IAsyncResult that does not correspond to an active checkpoint operation.

  • The EndCheckpoint method is called with an IAsyncResult that corresponds to a checkpoint for which EndCheckpoint has already been called.

If an I/O error occurs during checkpointing, all checkpoint operations in progress are terminated, and their EndCheckpoint operations raise an exception. However, you can continue to attempt subsequent checkpoint operations, because the I/O failure may be transient.
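
Because an I/O failure may be transient, a caller can reasonably retry a failed checkpoint a bounded number of times. The following Python sketch shows that pattern in a language-neutral way. It is not StreamInsight code: `take_checkpoint` is a hypothetical callable that starts one checkpoint, waits for it to complete, and raises `OSError` on an I/O failure, and the attempt count and delay are arbitrary assumptions.

```python
import time
from typing import Callable


def checkpoint_with_retry(take_checkpoint: Callable[[], None],
                          attempts: int = 3,
                          delay_seconds: float = 1.0) -> int:
    """Run take_checkpoint until it succeeds; return the number of attempts used.

    take_checkpoint is a hypothetical callable that starts one checkpoint,
    waits for it to finish, and raises OSError on an I/O failure.
    Calling it sequentially keeps at most one checkpoint in progress.
    """
    for attempt in range(1, attempts + 1):
        try:
            take_checkpoint()
            return attempt
        except OSError:
            if attempt == attempts:
                raise  # treat the failure as persistent
            time.sleep(delay_seconds)
    raise ValueError("attempts must be at least 1")
```

Running the attempts one after another, rather than concurrently, also avoids the overlapping-checkpoint failure described in the list above.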

Errors when recovering from a failure

If an I/O error occurs while reading a checkpoint file, all queries that depend on that file are suspended, and the cause of the failure is logged with the query. The queries are suspended rather than aborted in order to preserve the query metadata, because the I/O error may be transient.

If the recovery of a query fails, it is not possible to restart recovery. You can try one of the following options:

  • If it is possible that the failure is transient, shut down and restart the server to retry the recovery.

  • If the failure is not transient, you can stop the query.

If the query being recovered fails due to an exception in an operator or adapter, the query will be aborted.

If the attempt to recover a query causes the server to fail, then you can take the following steps:

  1. Restart the server without resiliency.

  2. Stop the query or queries that are causing the server to fail.

  3. Restart the server again with resiliency.

Errors when storing metadata and logs on a network share

When the checkpointing log is stored on a network share, transient I/O errors are not fatal to the checkpointing process.

When the SQL Server Compact database of metadata is stored on a network share, any I/O error is fatal, and causes the StreamInsight server to fail.

See Also

Concepts

StreamInsight Resiliency

Building Resilient StreamInsight Applications

Monitoring Resilient StreamInsight Applications