Messaging Delivery Guarantees and Safe Delivery

 

Updated: July 16, 2012

Message Delivery Guarantees and State Consistency

Workflows that represent stateful processes commonly use distributed transactions to process a message and update the state of the workflow instance atomically, so that the message is guaranteed to be processed exactly once. Workflow Manager 1.0 does not rely on MSDTC to maintain state consistency between a workflow instance and its messages. Instead, it implements an eventual consistency model that relies on Service Bus to enable exactly-once processing of inbound messages in a highly scalable environment. This does, however, result in some fundamental differences in how a workflow handles inbound and outbound messages.

Inbound messages (sent via the Notifications endpoint; see above) are guaranteed to be delivered to the workflow once the endpoint returns a 200 status code. A sender that does not receive a 200 response can retry the message idempotently as long as each retry specifies the same MessageId, so the unreliability of the HTTP transport does not weaken the overall delivery guarantee. Internally, once the Notifications endpoint receives the message, it is stored durably in Service Bus until it is delivered to the workflow(s).
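The retry rule above can be illustrated with a minimal sketch. It does not call the real Notifications endpoint: `FakeNotificationsEndpoint`, `send_with_retry`, and their parameters are hypothetical names introduced for illustration, and the fake endpoint only models the behavior described here (a message is stored once per MessageId, and a 200 means it was accepted).

```python
from typing import Callable, Dict


def send_with_retry(
    post: Callable[[str, dict], int],
    message_id: str,
    body: dict,
    max_attempts: int = 5,
) -> bool:
    """Retry until a 200 is returned, always reusing the same message_id."""
    for _ in range(max_attempts):
        try:
            if post(message_id, body) == 200:
                return True
        except ConnectionError:
            # Transport failure: the message may or may not have been stored,
            # so retry with the SAME id rather than generating a new one.
            continue
    return False


class FakeNotificationsEndpoint:
    """Hypothetical stand-in that stores each MessageId at most once."""

    def __init__(self, lose_first_responses: int = 0) -> None:
        self.stored: Dict[str, dict] = {}
        self._responses_to_lose = lose_first_responses

    def post(self, message_id: str, body: dict) -> int:
        # Store first, then possibly fail: this models a request that
        # succeeded on the server but whose response never reached the sender.
        self.stored.setdefault(message_id, body)
        if self._responses_to_lose > 0:
            self._responses_to_lose -= 1
            raise ConnectionError("response lost")
        return 200
```

In this sketch, a sender whose first two responses are lost still ends up with exactly one stored message, because every retry carries the same id.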

When a workflow receives a message, it is loaded and starts or resumes its execution. During an episode of workflow execution, any side-effecting work, such as outbound messages sent using HTTP activities, is collected and held until the updated state of the workflow has been successfully saved. At that time, the inbound message (which has now been processed) is removed from durable storage, and then the side-effecting work is executed. Outbound messages are sent, and are retried if they fail due to network issues or process failures. The end result is that outbound messages are guaranteed to be sent at least once, so web services may need to be designed to handle messages idempotently. Responses are returned to the workflow in order to resume workflow execution.
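The ordering described above (hold side effects, save state, remove the inbound message, then send with retries) can be sketched as follows. This is an illustrative model under stated assumptions, not Workflow Manager's implementation: `run_episode`, `IdempotentService`, and the `request_id` parameter are hypothetical names, and the retry loop is simplified to a fixed number of attempts.

```python
from typing import Callable, Dict, List

Effect = Callable[[], None]


def run_episode(
    execute: Callable[[Callable[[Effect], None]], dict],
    save_state: Callable[[dict], None],
    remove_inbound: Callable[[], None],
    max_attempts: int = 3,
) -> None:
    pending: List[Effect] = []
    # 1. Run the workflow; side effects are only recorded, not performed.
    new_state = execute(pending.append)
    # 2. Persist the updated state. If this raises, nothing has been sent.
    save_state(new_state)
    # 3. The inbound message is now processed; remove it from storage.
    remove_inbound()
    # 4. Perform the held side effects, retrying on transport failure.
    #    A retry after a lost response means the receiver may see duplicates.
    for effect in pending:
        for _ in range(max_attempts):
            try:
                effect()
                break
            except ConnectionError:
                continue


class IdempotentService:
    """Hypothetical receiving web service that ignores repeated request ids."""

    def __init__(self) -> None:
        self.results: Dict[str, str] = {}
        self.calls = 0

    def handle(self, request_id: str, order: str) -> str:
        self.calls += 1
        # A duplicate delivery returns the stored result without redoing work.
        if request_id not in self.results:
            self.results[request_id] = f"shipped {order}"
        return self.results[request_id]
```

Because the send happens only after the state is saved, a crash before step 2 sends nothing, while a failure during step 4 can cause a repeat delivery. That is why the receiving service in this sketch keys its work on a request id supplied by the caller.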