BizTalk Orchestration: Transactions, Exceptions, and Debugging
Summary: This article examines the transactional support available in Microsoft BizTalk Orchestration Services and looks at how to use the transactions and exception-handling support to handle errors that might occur in schedules. In addition, it looks at how to debug schedules and components in schedules. This article is targeted at designers and developers implementing long-running business processes using BizTalk Orchestration Services. (22 printed pages)
With Microsoft® BizTalk™ Orchestration Designer, users can design long-running business processes, specify an implementation for the individual actions that make up those processes, and compile this information into an executable XML representation, known as an XLANG schedule. The schedules created are distributed across time, organizations, and applications, in a loosely coupled and scalable manner. However, because of the highly distributed nature of these processes, the likelihood of errors and exceptions occurring during execution of schedules is even greater than for traditional short-lived business processes.
Orchestration Designer presents a visual design and development environment that separates the business process being developed from the implementation of that process. It also provides a rich set of programming constructs, including transactions and exception processing semantics.
Transactions are provided to group collections of actions into a single logical unit of work, to ensure that all the work done by the actions within the group is committed, or that all the work is undone. This grouping of actions provides the highest level of structure and reliability. There is support not only for short-lived transactions, but also for transactions spanning long-running business processes, and for timed transactions.
Exception processing provides additional logic to undo the results of transactions or to provide an alternate series of actions to take in the event of a processing error. This exception processing includes On Failure and Compensation processing to support error handling for long-running business processes.
Finally, this article discusses methodologies to debug and troubleshoot BizTalk Server Orchestration Services and BizTalk Messaging Services.
What Is a Transaction?
Mankind has engaged in transactions since the earliest times. In a typical scenario, a buyer and a seller negotiate a suitable price for some goods. Assuming an agreement is struck, the buyer hands over the money in exchange for the goods. The important point here is that either the whole transaction proceeds or none of the transaction proceeds. If the seller gets the money and the buyer gets the goods, everyone is happy. If the seller doesn't get the money, he doesn't give the goods to the buyer, which is still an acceptable outcome. However, if the buyer hands over the money and doesn't get the goods, the buyer is unhappy. Similarly, if the seller hands over the goods but doesn't get the money, the seller is unhappy.
If the buyer and the seller don't trust each other to uphold their end of the bargain, they might call on the services of a trusted intermediary. To carry out the transaction, the buyer hands the money to the intermediary, and the seller hands the goods to the intermediary. The intermediary then ensures that the seller receives the money and the buyer receives the goods.
What does all this have to do with transactions on computers? Consider a process to transfer money in a banking application: money is taken from one account (a value is decremented in a record in one database) and put into another account (a value is incremented by the same amount in a record in another database). This series of operations must act like a single atomic operation (that is, perform as a single indivisible operation). This is termed a transaction (the term is derived from the phrase "transformation action"). A transaction is an action or series of actions that transform a system from one consistent state to another.
Transactions adhere to a set of properties known as ACID properties:
- Atomicity. A transaction represents an atomic unit of work. Either all modifications within a transaction are performed or none of the modifications are performed.
- Consistency. When committed, a transaction must preserve the integrity of the data within the system. If a transaction performs a data modification on a database that was internally consistent before the transaction started, the database must still be internally consistent when the transaction is committed. Ensuring this property is largely the responsibility of the application developer.
- Isolation. Modifications made by concurrent transactions must be isolated from the modifications made by other concurrent transactions. Isolated transactions that run concurrently will perform modifications that preserve internal database consistency exactly as they would if the transactions were run serially.
- Durability. After a transaction has committed, all modifications are permanently in place in the system. The modifications persist even if a system failure occurs.
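The banking transfer described above can be sketched in a few lines. This is a minimal illustration, not BizTalk or MSDTC code: it assumes a single in-memory SQLite database rather than two separate databases, and the table name, account names, and the `transfer` helper are hypothetical. It shows atomicity only: either both updates are committed or neither is.

```python
import sqlite3


def transfer(conn, source, target, amount):
    """Move `amount` between two accounts as one atomic unit of work."""
    try:
        # The connection context manager commits on success and rolls
        # back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (source,)
            ).fetchone()
            if row is None or row[0] < 0:
                raise ValueError("source account missing or overdrawn")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
        return True
    except ValueError:
        # The debit has already been rolled back at this point.
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical starting state: 100 in checking, nothing in savings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("checking", 100), ("savings", 0)]
)
conn.commit()
```

Calling `transfer(conn, "checking", "savings", 500)` fails the balance check, so the debit is rolled back and both balances are left unchanged.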
Just like the analogy about transactions in a marketplace, often neither party involved in a transaction has control over the other one (for example, in a transaction involving updates in two separate databases), so neither party is able to guarantee the atomicity of the process. And just like in the marketplace analogy, the solution to this problem is to introduce a third party or intermediary to ensure that either both actions occur or neither action occurs.
On Microsoft® Windows NT® 4 and Microsoft Windows® 2000, this intermediary is known as the Microsoft Distributed Transaction Coordinator (MSDTC). MSDTC was first released together with Microsoft SQL Server™ 6 and provides an object-based programming model for creating, destroying, managing, and monitoring transactions. MSDTC works in conjunction with some helper services (known as resource managers) to ensure that the ACID properties for a transaction are maintained.
Resource managers own the objects affected by the transactions and are responsible for the persistent storage of the resource objects. A resource must have a resource manager to take part in a transaction with the Distributed Transaction Coordinator. Note also that the Distributed Transaction Coordinator and the resource managers can be distributed across multiple nodes on a network.
To coordinate the actions in a transaction, and to maintain the ACID properties, the Distributed Transaction Coordinator and the resource managers use a protocol known as the two-phase commit protocol. The algorithm for the two-phase commit protocol is a complex sequence of operations, which increases in complexity as the number of resources (and therefore resource managers) increases. The most significant feature of the protocol is that the records that will be updated must be locked during the two-phase commit. This lock on the database records remains until the transaction is either aborted or committed. This factor has an important bearing on transactions in long-running business processes.
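The general shape of a two-phase commit can be sketched as follows. This is a heavily simplified illustration under stated assumptions, not the MSDTC implementation: the `ResourceManager` class and `two_phase_commit` function are hypothetical, there is no logging or failure recovery, and a single flag stands in for record locking. It does show why locks are held from the prepare phase until the final commit or abort.

```python
class ResourceManager:
    """Hypothetical resource manager that owns one keyed data store."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulated vote in phase 1
        self.data = {}
        self.pending = None
        self.locked = False

    def prepare(self, changes):
        # Phase 1: lock the records and vote on whether the work can commit.
        self.locked = True
        self.pending = dict(changes)
        return self.can_commit

    def commit(self):
        # Phase 2 (commit): apply the pending changes and release the lock.
        self.data.update(self.pending)
        self.pending = None
        self.locked = False

    def abort(self):
        # Phase 2 (abort): discard the pending changes and release the lock.
        self.pending = None
        self.locked = False


def two_phase_commit(work):
    """Coordinate `work`, a list of (resource_manager, changes) pairs."""
    prepared = []
    for manager, changes in work:
        prepared.append(manager)
        if not manager.prepare(changes):
            # One "no" vote aborts every participant prepared so far.
            for participant in prepared:
                participant.abort()
            return "aborted"
    # Every participant voted "yes", so all of them commit.
    for participant in prepared:
        participant.commit()
    return "committed"
```

In this sketch each participant stays locked between `prepare` and the final `commit` or `abort`, which is the property that makes the protocol unsuitable for work that spans days or weeks.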
Despite the fact that the Distributed Transaction Coordinator greatly improves the ease with which programmers utilize transactions in their applications, this model still suffers from one big weakness. Transactions are typically implemented within components (initiated, and committed or aborted). When the actions that make up a complete transaction are spread across multiple components (that is, initiated in one component and either committed or aborted in another), it is difficult to reuse those components to implement new transactions composed of a different combination of components.
In the diagram, Transaction 1 is initiated inside the Customer component and committed in the Invoice component. There is also an Agent component, which initiates a second transaction. Now if for some business process another transaction (Transaction 2) is created that consists of the Customer and Agent components, the transaction can't be easily composed, because both components initiate a transaction but neither component completes the transactions (commits or aborts the transaction).
In 1997, Microsoft released the Microsoft Transaction Server (and also released a new version of the MSDTC). This product was revolutionary in providing a declarative model for transaction programming. Now, instead of programming the transaction semantics within a component (and thus essentially hard-coding the composition of the transaction), the programmer declares transaction properties for a component as a whole, and implements entire transactions by composing the transaction from individual COM components.
This new model allowed programmers to compose their transactions in a much simpler manner, greatly increasing reuse of transactional components. COM components became the building blocks for business transactions. All of the services provided by Microsoft Transaction Server have now migrated to Windows 2000 and have been significantly enhanced as COM+ services under Windows 2000.
COM+ provides five levels of transactional support:
- Disabled. This selection specifies that the component will ignore COM transaction management.
- Not Supported. This selection specifies that the component will not participate in a transaction, or propagate the transactions of other components.
- Supported. This selection specifies that if a transaction is currently running, the component will be included in the transaction. However, the component will not initiate a transaction.
- Required. This selection specifies that if a transaction is currently running, the component will be included in the transaction. If there is no transaction running, a new transaction will be created for the component.
- Requires New. This selection specifies that a new transaction will always be created for the component.
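The effect of these settings on a newly created component can be modeled as a small decision function. The sketch below is illustrative only: `resolve_transaction` and `new_transaction` are hypothetical names, transactions are represented by plain strings, and the Disabled setting is left out because the list above says only that such a component ignores COM transaction management.

```python
import itertools

_transaction_ids = itertools.count(1)


def new_transaction():
    """Create a fresh, uniquely named transaction (hypothetical helper)."""
    return f"txn-{next(_transaction_ids)}"


def resolve_transaction(setting, caller_transaction):
    """Return the transaction the component runs in, or None for no transaction.

    `caller_transaction` is the transaction already running in the caller,
    or None if the caller is not in a transaction.
    """
    if setting == "Not Supported":
        return None  # never participates, even if the caller has one
    if setting == "Supported":
        return caller_transaction  # joins if present, never initiates
    if setting == "Required":
        # Joins the running transaction, or starts one if there is none.
        return caller_transaction if caller_transaction is not None else new_transaction()
    if setting == "Requires New":
        return new_transaction()  # always a transaction of its own
    raise ValueError(f"unmodeled setting: {setting!r}")
```

For example, a component marked Required that is called from inside a running transaction joins it, while the same component marked Requires New runs in a separate transaction.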
BizTalk Orchestration builds on the existing COM+ services. It provides a sophisticated graphical programming paradigm for developing complex business processes, complete with transaction programming and exception handling semantics, and it brings the same declarative approach to transaction programming that COM+ services introduced for components.
There are multiple levels of transactional support within BizTalk Orchestration. The first level of that support comes from treating an entire schedule as a COM+ transactional component. Next, it is possible to specify transactional semantics for a collection of actions within that schedule by enclosing those actions within a transaction shape. This allows schedules to support short-lived DTC style transactions (transactions managed by the Distributed Transaction Coordinator and utilizing the underlying COM+ services), and to additionally support long-running transactions (which represent business processes that run over an extended time period) and timed transactions (which represent actions that might time out after an extended period). Schedules also support transaction compensation and exception processing semantics.
Business Process Diagrams as a Transaction Participant
The first level of transaction support provided by BizTalk Orchestration Services allows an entire schedule to be treated as a transactional component. The transactional support of the schedule is set declaratively in a manner similar to the way transactional support is declared for a COM+ component. The schedule is then initiated by a COM+ component, which might or might not already be running within a transactional context. In essence, the schedule provides the implementation of that transactional COM+ component.
The transaction model for a schedule can be set by opening the Properties dialog box for the Begin shape at the start of the schedule. By default this is set to Include transactions within the schedule. To treat the whole schedule as a transactional component, select Treat the XLANG Schedule as a COM+ Component. The level of transactional activation for the schedule can also be set:
- Select Not Supported if the XLANG schedule does not support transactions.
- Select Supports if the XLANG schedule participates in a COM+ transaction.
- Select Requires if the XLANG Scheduler Engine works with COM+ to ensure that all the COM components that are created by the schedule are transactional.
- Select Requires New if the XLANG schedule must participate in a new transaction. If this setting is enabled, COM+ services automatically initiate a new transaction that is distinct from the caller's transaction.
Using this mechanism, the orchestration engine effectively provides business process automation implemented within a single COM+ component. That is, the whole schedule functions as a single COM+ component, and that COM+ component can support transactions as described above. Note that a schedule used as a component in this way cannot itself contain any transaction shapes (transaction shapes can be drawn in the schedule, but the schedule then won't compile). There are also limitations on the use of concurrent streams of execution within the schedule: when the Fork shape is used, all transactional actions must occur in one stream of execution.
Note that the mechanism of using the schedule as a self-contained transactional component relies on the underlying COM+ services to manage transactions. If a transaction is aborted, only the actions implemented in terms of transactional components will be rolled back. Transactional components can be COM+ components, Script Components, or transactional Microsoft Message Queues.
In this example, the schedule has been configured as Treat the XLANG Schedule as a COM+ Component (in the Properties dialog box of the Begin shape). It has also been configured to require a transaction. The implementation of the schedule (not shown) reads a message from a transactional message queue (receive queue) and writes the message to another transactional message queue (send queue).
If this schedule is executed by instantiating it from a COM+ component, which is also configured to require a transaction, and that transaction is committed (the component calls SetComplete), a message will be read from the receive queue and written to the send queue. If, however, the COM+ component for some reason aborts the transaction (calls SetAbort), the message that was read from the receive queue will be replaced in the queue, and no message will be written to the send queue.
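The observable behavior of this example can be imitated with two in-memory queues. The sketch below is an illustration only and does not use Message Queuing or COM+: `TransactionalQueue` and `run_schedule` are hypothetical names, and the caller's decision to commit or abort is passed in as a simple flag.

```python
from collections import deque


class TransactionalQueue:
    """Hypothetical in-memory stand-in for a transactional message queue."""

    def __init__(self, messages=()):
        self.messages = deque(messages)
        self._received = []  # messages read inside the open transaction
        self._sent = []      # messages written inside the open transaction

    def receive(self):
        message = self.messages.popleft()
        self._received.append(message)
        return message

    def send(self, message):
        # The message only becomes visible when the transaction commits.
        self._sent.append(message)

    def commit(self):
        self.messages.extend(self._sent)
        self._received.clear()
        self._sent.clear()

    def abort(self):
        # Put read messages back at the front, in their original order,
        # and discard anything written during the transaction.
        self.messages.extendleft(reversed(self._received))
        self._received.clear()
        self._sent.clear()


def run_schedule(receive_queue, send_queue, commit):
    """Read one message and forward it, then commit or abort both queues."""
    message = receive_queue.receive()
    send_queue.send(message)
    for q in (receive_queue, send_queue):
        if commit:
            q.commit()
        else:
            q.abort()
```

With `commit=False` the message ends up back in the receive queue and the send queue stays empty; with `commit=True` the message moves from one queue to the other.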
Types of Transactions within Schedules
If the transaction model for a schedule is set to Include Transactions within the XLANG Schedule (the default setting), the schedule can contain transaction shapes. To add transactions to a schedule, drag the Transaction shape from the flowchart palette and position it to enclose all the actions that will take part in the transaction.
It is also possible to nest one or more transaction shapes within an outer transaction shape. A short-lived transaction groups a series of actions within its boundaries, but it cannot nest another transaction. Long-running transactions and timed transactions, however, can be used to group any combination of actions—short-lived transactions, long-running transactions, or timed transactions. Note, however, that transactions cannot be nested deeper than two levels.
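The nesting rules above can be expressed as a small checker. This is a sketch under stated assumptions: a transaction is modeled as a hypothetical `(kind, children)` tuple, the `validate` function is not part of any BizTalk tool, and the outermost transaction is counted as level one.

```python
def validate(transaction, depth=1):
    """Return a list of rule violations for a (kind, children) transaction tree.

    `kind` is "short-lived", "long-running", or "timed"; `children` is a
    list of nested transactions in the same form.
    """
    kind, children = transaction
    errors = []
    if depth > 2:
        errors.append(f"{kind} transaction is nested deeper than two levels")
    if kind == "short-lived" and children:
        errors.append("a short-lived transaction cannot nest another transaction")
    for child in children:
        errors.extend(validate(child, depth + 1))
    return errors


# A long-running transaction grouping a short-lived and a timed transaction.
valid = ("long-running", [("short-lived", []), ("timed", [])])

# Invalid: a short-lived transaction that tries to nest another one.
invalid = ("short-lived", [("short-lived", [])])
```

Here `validate(valid)` returns an empty list, while `validate(invalid)` reports the short-lived nesting violation.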
Properties for the transaction can be set by clicking Properties for the transaction shape, which displays the Transaction Properties dialog box. This allows the transaction to be named and the transaction type (timed, short-lived, or long-running) and other transaction properties to be set.
Additionally, On Failure code or Compensation code can be added to the schedule if appropriate. On Failure code creates a new page on the schedule (On Failure of Transaction page), which is used to design an alternate business process to handle the failure of the selected transaction. This option is available for all transactions (see "Transaction On Failure Processing" later in this article). Compensation code also creates a new page in the schedule (Compensation for Transaction page), which is used to design an alternate business process to undo the logical unit of work that was performed in a nested transaction that has already committed. This option is available only for nested transactions (see "Transaction Compensation Processing" later in this article).
The other transaction properties that can be set are:
- Timeout. This property sets the time a transaction is allowed to run before it will be automatically aborted or retried. This property cannot be set for long-running transactions.
- Retry count. This property determines the number of times a process within a short-lived transaction will be run if the process within the transaction does not complete. For each retry, the state of the application is reset to the starting point of the process within the transaction. This option is available only for short-lived transactions.
- Backoff time. This property determines the interval between each attempt to retry the transaction. The backoff time is used with the retry count value to determine how long to wait before the next transaction retry. The backoff value is exponential. A backoff value of 2 seconds results in intervals of 2, 4, 8, 16 seconds, and so on between each retry. The formula is B**R (B raised to the power of R), where B=backoff time and R=current retry count. If the backoff time of a specific transaction retry attempt is greater than 180 seconds, the XLANG schedule instance will be dehydrated to the persistence database immediately. This option is available only for short-lived transactions.
- Isolation level. The isolation level determines the degree to which data within concurrent transactions is accessible to each other. This option is available only for short-lived transactions. The choices are:
  - Serializable to prevent concurrent transactions from making data modifications until the selected transaction is complete. This is the most restrictive of the four isolation levels.
  - Read Uncommitted to allow concurrent transactions to make data modifications before the selected transaction is complete. This is the least restrictive of the four isolation levels.
  - Read Committed to prevent the selected transaction from accessing data modifications in concurrent transactions until they are committed. This option is the Microsoft SQL Server default setting.
  - Repeatable Read to require read locks until the selected transaction is complete.
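The backoff arithmetic described under Backoff time can be worked through in a few lines. This is only a sketch of the stated formula (B raised to the power of R) and the 180-second dehydration threshold; the `backoff_schedule` function and the constant name are hypothetical, and the retry count is assumed to start at 1.

```python
DEHYDRATION_THRESHOLD_SECONDS = 180  # threshold quoted in the text above


def backoff_schedule(backoff_seconds, retry_count):
    """Return (retry, wait_seconds, dehydrates) for each retry attempt."""
    schedule = []
    for retry in range(1, retry_count + 1):
        wait = backoff_seconds ** retry  # B**R
        dehydrates = wait > DEHYDRATION_THRESHOLD_SECONDS
        schedule.append((retry, wait, dehydrates))
    return schedule


for retry, wait, dehydrates in backoff_schedule(2, 8):
    note = " (schedule instance is dehydrated)" if dehydrates else ""
    print(f"retry {retry}: wait {wait} seconds{note}")
```

With a backoff time of 2 seconds, the eighth retry is the first whose wait (256 seconds) exceeds the threshold.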
Short-lived (DTC Style) Transactions
When a transaction shape is set up on a schedule, it defaults to being a short-lived transaction (transaction box is filled in gray). This transaction type is dependent on the underlying transaction support from COM+ and MSDTC. Short-lived transactions allow atomic (single, indivisible) units of work to be created from a number of discrete and independent units.
Although the properties for the transaction can be set in the Properties dialog box, and the boundaries of the transaction defined by the actions that are grouped within the transaction shape, short-lived transactions depend on the transaction properties set for the implementation port connected to that action, and the transaction properties of the components, message queues, or scripts referenced by that implementation port.
Specifically, this means that the implementation for the actions enclosed by a transaction shape should be COM+ components that support transactions, scripts that are marked as transactional, or reads and writes to transactional message queues if those actions are to be successfully aborted. If a transaction shape encloses an action connected to an implementation port that does not support transactions, the work done by that COM+ component, script, or queue will not be rolled back if the transaction is aborted. Taking this into account, nontransactional components can still be used to implement actions that are part of a transaction.
The schedule shows three actions that are enclosed in a short-lived transaction. In the implementation of this schedule (not shown), the first action is a message arriving in a transactional message queue (receive queue). A COM+ component is then instantiated, and a method is called on the component. The method displays a dialog box, which lets the user select either to commit or to abort the transaction (call SetComplete or SetAbort within the method on the COM+ component). The last action takes the original message and writes it to another transactional message queue (send queue). When this schedule is executed, if the user elects to call SetComplete, the message will be read from the receive queue and placed in the send queue. However, if the user elects to call SetAbort, the message will remain in the receive queue.
The last thing to note is that for every instance of this schedule, a new instance of the COM+ component (named Query Abort in this example) will be instantiated as the short-lived transaction starts, and that instance will be destroyed when the transaction terminates (either aborts or commits). This is the same just-in-time activation model first delivered with Microsoft Transaction Server. Any state held by the component will be lost.
Long-running Transactions
When looking at a business process that might execute over an indefinite time period, traditional short-lived transactions can't be used. This is because each short-lived transaction holds database locks and resources. Given that there can be thousands of business processes running on a computer at any particular time, the number of these resources held would be impractical. Instead, the transaction type is set to be long-running. A long-running transaction has all the ACID properties described previously except one, Isolation.
Isolation means that nothing outside a transaction can even see (let alone update) any of the data that is being used within a transaction. The reason for isolation is that the result of the transaction is unknown until it either commits or aborts, so the current data value might be valid or invalid. Since the data might be invalid, nothing else can be allowed to access the data, in case it is misused. Isolation is a property of short-lived transactions (one of the ACID properties) and is implemented by locking records in the database.
In a long-running distributed business process, records in a database can't be locked for extended periods of time, nor can records be locked in databases distributed across organizations (imagine trying to convince the database administrator of another organization to let you lock records in their database!). Long-running transactions are specifically designed to group collections of actions into more granular atomic units of work that can exist across time, organizations, and applications. In a long-running transaction, other transactions can see the data being used by the transaction. Of course, long-running transactions can also be composed of actions that are themselves short-lived transactions (short-lived transactions can be nested within long-running transactions).
For example, imagine a business process that is initiated when a purchase order request is received. The request is logged to a database and then sent to the request approver. It might take some time (weeks!) for the approval response to be received, at which point the response is also logged to a database and the purchase order is sent to the supplier. Receiving the initial request (and logging it) and receiving the response (and logging it) are themselves each composed of multiple actions (receiving and logging).
In this scenario, short-lived transactions are used to group related actions into a single atomic transaction (receiving a message and logging it to the database). However, the receipt of the purchase request message and the receipt of the approval message can't be grouped within a single short-lived transaction, because that would lock rows in the database for indefinite periods. Imagine if 5000 users all did that at the same time! Instead, a long-running transaction is used to group the two short-lived transactions, which might be separated by a significant time period.
Now imagine what happens when this business process is executed. First, the purchase request is received and the database is updated in a short-lived transaction. If anything goes wrong, the transaction will be aborted and all changes will be undone; otherwise, the transaction commits. Then the schedule waits for the arrival of the approval message. When the message arrives, the database is again updated transactionally.
If anything goes wrong, the To Supplier transaction will abort automatically. However, the Receive PO transaction can't be aborted, because it has already been committed. In this event, the first transaction needs to supply some code that can undo the actions it has performed, in the event of a transaction abort after the transaction has already committed. This is known as a compensating transaction (see "Transaction Compensation Processing" later in this article). In this scenario, if something causes the To Supplier transaction to abort, the resource managers and MSDTC will take care of undoing all work done by the To Supplier transaction. The Compensation code supplied by the Receive PO transaction will undo the already committed changes made by that transaction.
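The interplay between an aborting transaction and the compensation of an already committed one can be sketched as follows. This is an illustration of the idea only, not the XLANG Scheduler Engine's behavior: `run_long_running` is a hypothetical helper, each nested transaction is reduced to an action plus a compensation callable, and running compensations in reverse order of commitment is an assumption of this sketch.

```python
def run_long_running(steps):
    """Run nested transactions in order; compensate committed ones on failure.

    `steps` is a list of (name, action, compensation) tuples. An action
    that raises is treated as a nested transaction that aborted.
    """
    committed = []
    log = []
    for name, action, compensation in steps:
        try:
            action()
        except Exception as error:
            # The failing step rolls itself back; earlier steps have
            # already committed, so they must be compensated instead.
            log.append(f"{name} aborted: {error}")
            for done_name, undo in reversed(committed):
                undo()
                log.append(f"compensated {done_name}")
            return False, log
        committed.append((name, compensation))
        log.append(f"{name} committed")
    return True, log


# Hypothetical stand-ins for the two nested transactions in the example.
database = []


def receive_po():
    database.append("po-logged")


def undo_receive_po():
    database.remove("po-logged")


def to_supplier():
    raise RuntimeError("supplier unreachable")
```

Running the two steps in sequence leaves `database` empty again: the To Supplier step aborts, and the compensation for Receive PO removes the record that step had already committed.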
The overall grouping (composition) of short-lived transactions into a long-running transaction is controlled by the long-running transaction. Typically, a long-running transaction will contain several nested short-lived transactions. Depending on the requirements of the business process described by the XLANG schedule drawing, an entire business process (with the exception of the Begin and End shapes) can be enclosed within a long-running transaction as shown here.
Timed Transactions
Timed transactions are used to trigger an abort of a long-running transaction if it has not completed in a specified amount of time. Long-running transactions do not utilize the time-out property on the property page. It is typically very difficult to decide in advance how long a business process should take. However, it is possible to make a reasonable estimate of how long a specific action within a business process should take, for example, the arrival of a message.
Thus, a timed transaction can be used to group short-lived transactions and to wait for the arrival of a message within a specified time period. If the message arrives in time, the timed transaction commits; otherwise, the timed transaction aborts and causes the short-lived transactions to execute their Compensation code.
In the example, a short-lived transaction is used to Send Money. This transaction groups the Withdraw Money and Initiate Wire Transfer actions. When the Initiate Wire Transfer action has completed, the business process sequence flows out of the nested transaction. When this happens, the nested transaction is committed: the money is withdrawn from a bank account and sent to a destination. At this point, the business process sequence flows to the Wait for Acknowledgement action in the outer transaction.
In this scenario, the Wire Transfer transaction has been configured as a timed transaction. If the sender has not received an acknowledgement of receipt of the money within the specified amount of time, the outer transaction will abort. When this happens, the business process sequence flows to the Compensation for Send Money page for the nested transaction and to the On Failure of Wire Transfer page for the outer transaction (see "Transaction Compensation Processing" later in this article).
Timed transactions can also be modeled by having two flows of execution within a schedule, one of which waits for the arrival of the message, while the other has a timer that will time out within the specified period. Whichever event occurs first (arrival of the message or time-out of the timer) completes the transaction (causing a commit or abort, respectively). However, modeling a timed business process in this way would impose restrictions on the ability of the schedule to dehydrate itself, and, in any event, timed transactions are much more convenient.
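The race between the arrival of a message and the expiry of a timer can be sketched with a blocking read that has a time-out. This is an analogy in ordinary Python, not schedule code: `wait_for_acknowledgement` is a hypothetical function, and a standard in-process queue stands in for the message queue.

```python
import queue


def wait_for_acknowledgement(inbox, timeout_seconds):
    """Wait for a message; commit if it arrives in time, abort otherwise."""
    try:
        message = inbox.get(timeout=timeout_seconds)
    except queue.Empty:
        # The timer fired first, so the timed transaction aborts.
        return "aborted", None
    # The message arrived first, so the timed transaction commits.
    return "committed", message


inbox = queue.Queue()
print(wait_for_acknowledgement(inbox, 0.05))  # nothing has arrived yet
inbox.put("acknowledgement")
print(wait_for_acknowledgement(inbox, 0.05))  # message is already waiting
```

Whichever event happens first decides the outcome, which is the same decision a timed transaction makes on behalf of the schedule designer.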
Transaction Properties of Implementation Ports
As noted previously, there is a distinction between the action shapes used in a schedule and the implementation of those shapes in the implementation port. The transaction properties of the actions are dependent on the transactional properties of the underlying implementation. This means that only actions that are implemented using transactional components will actually take part in a transaction.
Specifically, this means that the implementation of the actions enclosed by a transaction shape must be COM+ components that support transactions, scripts that are marked as transactional, or reads and writes to transactional Message Queues if those actions are to be successfully aborted. When linking the binding of COM+ or script components to the port implementation, the transaction support of that implementation port can be set in the same way as transaction support for a COM+ application is set (disabled, not supported, supported, required, requires new).
It is perfectly acceptable to implement actions inside a transaction with nontransactional implementation ports, but any changes made by those implementations won't be rolled back in the event of a transaction abort. In any case, transactions won't be supported in implementation ports unless they deal with resources that are managed by resource managers that can work with the Distributed Transaction Coordinator, which in practice means most common databases, and Microsoft Message Queuing. Nontransactional cases are handled using On Failure processing (see "Transaction On Failure Processing" later in this article).
The last transaction property that can be set in the port implementation is the ability to abort a transaction if an error occurs during the processing of that component or script. Using this mechanism, the current transaction can be aborted by returning a COM+ error from a COM+ object or script.
What Causes Transactions to Abort?
How is a transaction potentially aborted? Transactions execute normally until either the process flows outside the transaction boundaries (the transaction commits and completes) or an abort occurs. An abort can occur for a number of reasons:
- Encountering the Abort shape within the process flow.
- A failure return code from a COM+ component (HRESULT) that is specified to cause an abort in a port binding.
- Any binding technology can, at a system level, introduce a failure event that aborts the transaction. For example, Message Queuing might fail to put a message on a queue.
- The XLANG Scheduler Engine (the COM+ application that executes instances of schedules) might encounter an error that causes it to abort a transaction within a given instance. For example, there might be a DTC error.
- Pausing a schedule might require all transactions within that schedule to abort.
- A transaction time-out within the transaction properties.
When an abort occurs, a transaction might retry from the beginning, depending on the value set in the Retry count property of the transaction group. If, after a transaction has retried the specified number of times, it continues to fail, the On Failure business process will be called. This On Failure code provides a structured place to handle the failure of a transaction.
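The retry-then-fail sequence can be sketched as a loop. This is a simplified illustration: `run_transaction` is a hypothetical function, an exception stands in for a transaction abort, the backoff delay between retries is omitted, and the retry count is assumed to mean the number of additional attempts after the first one.

```python
def run_transaction(body, retry_count, on_failure):
    """Run `body`, retrying on abort, then fall back to `on_failure`.

    Returns (outcome, attempts, result), where result is the value
    returned by `body` or by the On Failure handler.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return "committed", attempts, body()
        except Exception as error:
            if attempts > retry_count:
                # Retries are exhausted: hand control to On Failure processing.
                return "failed", attempts, on_failure(error)
            # Otherwise fall through and retry from the beginning.


def always_aborts():
    raise RuntimeError("abort")


print(run_transaction(always_aborts, 2, lambda error: f"handled {error}"))
```

With a retry count of 2, the body runs three times in total before the On Failure handler is invoked.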
As the previous section shows, short-lived transactions can be used to provide automatic rollback and recovery for some of the actions in schedules. However, many of the actions can't be implemented in a transactional manner, so to handle error conditions, other forms of error handling, such as exception processing, and compensating transactions must be used. This section focuses on how to build error handling into schedules.
Causes of Errors
Looking first at the possible cause of errors in a schedule, there are three levels of errors that can occur while the XLANG Scheduler Engine is running. In decreasing order of severity, these are:
- Errors that cause a failure. System errors that cannot be trapped by the XLANG Scheduler Engine can cause the engine to fail along with all schedule instances that are running in the same COM+ application. The most likely cause of such a failure is an in-process, badly written COM+ component. Such components should be well tested out-of-process and then placed in process.
- Errors that cause an abnormal termination, including an out-of-sync COM+ component, a message queue that does not exist, or a messaging channel that does not exist.
- Errors that can be trapped.
Naturally, during the processing of the schedule, errors need to be detected and handled appropriately. Errors that can be trapped within an XLANG schedule include COM components that return failure HRESULTs (this applies to COM+ components or scripts) and transaction aborts caused by enlisted services (such as if the connection to a database was lost).
As indicated in the previous section, the XLANG Scheduler Engine can trap application and system errors. XLANG schedules can be designed to react to errors at run time, either by testing explicitly for an error result using a decision rule or by using transaction failure processing.
To use logical branching to explicitly test for an error result, the value returned after calling a method on a COM+ component or script is tested. This value is stored within the __Status__ field of the _out message from the COM component (all actions in a schedule are implemented in terms of messages; in the case of COM+ components this means a message is sent in to the component and another message is sent out from the component).
To implement this, a Decision shape is added immediately after the action whose result needs to be tested, and a rule is added to test the output of the COM component (_out.__Status__ >= 0, where a negative HRESULT indicates failure and a zero or positive HRESULT indicates success). Specific failure codes can also be tested for, if this is appropriate. These codes are defined in the header file Winerror.h.
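The sign test used in the decision rule can be illustrated outside a schedule. The following sketch is Python rather than a rule expression, and the helper names to_signed32 and succeeded are hypothetical. The constants S_OK, S_FALSE, and E_FAIL are standard values from Winerror.h. HRESULTs are often written as unsigned hexadecimal, so the sketch first converts the value to a signed 32-bit integer, which is the form the >= 0 comparison relies on.

```python
S_OK = 0x00000000     # success
S_FALSE = 0x00000001  # success, with a "false" result
E_FAIL = 0x80004005   # unspecified failure


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def succeeded(hresult: int) -> bool:
    """Same test as the decision rule: a non-negative HRESULT means success."""
    return to_signed32(hresult) >= 0


status_ok = succeeded(S_OK)
status_false = succeeded(S_FALSE)
status_fail = succeeded(E_FAIL)
```

Here both S_OK and S_FALSE pass the test, while E_FAIL, whose high bit is set, is negative as a signed value and fails it.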
Errors can also be handled using transaction failure processing. If an action is enclosed within a transaction shape and the action is implemented as a COM+ component or script, and that component or script aborts the transaction, the work done by all components taking part in the transaction will be undone. If the component that triggers the abort is not transactional, the transaction abort needs to be triggered in some other way. Setting the error handling within the COM Component Binding Wizard to abort the transaction if the method returns a failure HRESULT does this.
This option will have an effect only if the communication action that uses this port is within the process flow of a transaction. When this is set for a COM+ component or script, and a bad HRESULT is returned, any transaction currently running will be aborted. The same functionality can be achieved in a schedule by testing for the bad HRESULT using a Decision shape, and then executing an Abort shape if a bad HRESULT is returned (but the error handling in the COM Component Binding Wizard is much more convenient).
Handling a failure in the Message Queuing or BizTalk Messaging implementation technologies can only be performed with transaction failure processes. Transactional support is specified in the Message Queuing Binding Wizard by indicating that transactions are required with this queue (this is done automatically for BizTalk Messaging). Note that a Message Queuing send action that returns successfully indicates that the message has been successfully placed onto the queue, but it does not indicate that the message has been delivered.
Transaction On Failure Processing
Grouping individual actions that use short-lived transactions into more granular business processes is obviously one very effective mechanism for safeguarding schedules against errors. However, with long-running business processes, a number of other mechanisms must be used to develop schedules that can handle errors appropriately.
With short-lived transactions, the boundaries of transactions are set declaratively using the Transaction shape, and then those transactions are aborted either by calling SetAbort within a transactional component or by having a component return a bad HRESULT, which can be trapped. If the actions within the transaction are bound to transactional resources, the Distributed Transaction Coordinator will handle the rollback of all the enlisted actions within the transaction. Any work done will then be completely undone.
However, there are many circumstances where traditional short-lived transactions are either inadequate or unable to perform as required. In these cases, On Failure processing can be used to add additional error handling semantics to schedules. On Failure processing is implemented as unique, separate flows within schedules, implemented on separate processing pages in BizTalk Orchestration Designer. When setting the properties for a Transaction shape, the business process designer can choose to add code for On Failure processing. This results in an additional page, On Failure of Transaction, being added to the schedule. The business process designer can add additional logic here to handle the failure of the transaction. This code will be invoked if the transaction aborts (after the transaction has aborted, and all the transactional components have undone their work).
Now, when a short-lived transaction aborts, any actions bound to nontransactional resources (for example, sending e-mail) will not be rolled back. Additional actions can be added to the On Failure of Transaction processing page to undo these nontransactional actions (for example, sending another e-mail that states that the first e-mail should be ignored).
Of course On Failure code can do literally anything, and it does not have to confine itself to undoing actions grouped within the transaction. As well as undoing nontransactional actions, other work might need to be done when a transaction aborts. In a typical business process, aborting a transaction and cleaning up the work done is seldom sufficient. At the very least, the transaction failure might need to be logged, but more typically it will also need to perform other actions, such as letting the user know the result of the transaction. Once again, On Failure processing can be used to implement these actions in the event of a transaction failure.
On Failure processing can also be applied to long-running transactions and timed transactions. For example, if a timed transaction is set up to await the receipt of a message, the On Failure processing can be used to alert the appropriate user when the message fails to arrive.
There is one further subtlety associated with On Failure processing, namely that any actions that occur after a transaction have no way to determine if that transaction committed or aborted, unless this information is passed to those actions in a message. For example, consider a schedule with a number of actions, some of which are grouped into a transaction. A message (an XML document perhaps) passes through the schedule from start to finish.
If the transaction commits, everything will operate correctly, and the message will be updated by the actions (the work done by the actions will be committed). However, imagine if one of the actions aborts the transaction. In this case, all the changes to the message will be undone. An action after the transaction will not be able to tell if the transaction has committed or aborted (that information is not passed on), and it won't have any idea which actions successfully processed the data and which action failed, since all changes to the message will be rolled back.
The On Failure of Transaction page again is the best way to implement this scenario. The On Failure code will be executed after the transaction aborts and, significantly, all changes made to messages will be available to the On Failure code. Additionally, the On Failure code can set fields in the message that indicate that a transaction abort has occurred, so that successive actions can take appropriate action.
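The idea of marking the message so that later actions can tell what happened can be sketched as follows. This is a conceptual model only, not XLANG or BizTalk code. The function name and the TransactionAborted and AbortReason fields are hypothetical, and a Python dictionary stands in for the XML message.

```python
import copy
from typing import Callable, Dict, List

Message = Dict[str, object]


def run_transaction_with_flag(message: Message,
                              actions: List[Callable[[Message], None]]) -> Message:
    """Apply actions to a working copy of the message.

    On commit the updated copy is returned. On abort the changes are
    discarded and the failure handler marks the original message.
    """
    working = copy.deepcopy(message)
    try:
        for action in actions:
            action(working)
    except RuntimeError as error:
        # Stand-in for On Failure processing: record the abort in the message.
        flagged = copy.deepcopy(message)
        flagged["TransactionAborted"] = True
        flagged["AbortReason"] = str(error)
        return flagged
    working["TransactionAborted"] = False
    return working


def reserve_stock(msg: Message) -> None:
    msg["StockReserved"] = True


def charge_card(msg: Message) -> None:
    raise RuntimeError("card declined")


order: Message = {"OrderId": 42}
result = run_transaction_with_flag(order, [reserve_stock, charge_card])
```

In this run the second action aborts, so the StockReserved change is discarded and the message that leaves the transaction carries TransactionAborted set to True. A later action can branch on that field instead of guessing the outcome.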
Transaction Compensation Processing
On Failure processing works in terms of a single transaction. Transaction abort processing becomes even more complex with long-running business processes and nested transactions (as discussed previously, transactions can be nested within long-running or timed transactions).
The example shows the timed transaction Wire Transfer, which groups the short-lived transaction Send Money. If this schedule is run, the Send Money transaction will execute (and presumably commit). The schedule will then wait for an acknowledgement to indicate that the wire transfer has occurred correctly.
If the acknowledgement does not arrive within the specified time period (whatever time-out was set for the timed transaction), the timed transaction (Wire Transfer) will abort. However, the inner transaction (Send Money) cannot be aborted, since it has already committed. Even if On Failure code were supplied for this transaction, it would not be called, because this inner transaction has not failed.
This scenario is handled by providing Compensation processing for the inner transaction. In the Transaction Properties dialog box for a nested transaction, Compensation processing code can be added, which (like On Failure processing) results in an additional page, Compensation for Transaction, being added to the schedule. Code can be added to this page to compensate for the (already committed) inner transaction.
With nested transactions, it is entirely feasible that multiple Compensation for Transaction and On Failure of Transaction processing pages will exist within the schedule, and that more than one of these will be executed to perform the error handling required. In the previous example, assume that the transaction Send Money has both an On Failure of Transaction page and a Compensation for Transaction page, and that the timed transaction Wire Transfer has an On Failure of Transaction page.
There are two likely scenarios for failure in this schedule. The first is that the Send Money transaction aborts, in which case the On Failure processing for the Send Money transaction will be executed. Normally, the On Failure processing for the Wire Transfer transaction would not execute, since the outcome of the inner transaction does not affect the outcome of the outer transaction. In this case, however, no acknowledgement will be sent, so the outer timed transaction will also eventually fail, and the On Failure code for Wire Transfer will be called. The second scenario is that Send Money will commit, but the acknowledgement message will not be received within the time-out period, causing the timed transaction to abort. In this case the Compensation processing for Send Money will execute first, followed by the On Failure processing for Wire Transfer.
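The two failure scenarios, and the order in which the processing pages run, can be summarized in a small model. The function below is a hypothetical illustration of the Wire Transfer example only. It does not call any BizTalk API, and it simply returns the names of the pages that would execute, in order.

```python
from typing import List


def wire_transfer_handlers(send_money_commits: bool, ack_arrives: bool) -> List[str]:
    """Return the processing pages that run, in order, for one schedule run."""
    handlers: List[str] = []
    if not send_money_commits:
        # Scenario 1: the inner transaction aborts.
        handlers.append("On Failure of Send Money")
        # No money was sent, so no acknowledgement arrives and the
        # outer timed transaction eventually times out as well.
        handlers.append("On Failure of Wire Transfer")
        return handlers
    if not ack_arrives:
        # Scenario 2: the inner transaction committed, the outer one timed out.
        handlers.append("Compensation for Send Money")
        handlers.append("On Failure of Wire Transfer")
    return handlers


inner_abort = wire_transfer_handlers(send_money_commits=False, ack_arrives=False)
timeout = wire_transfer_handlers(send_money_commits=True, ack_arrives=False)
success = wire_transfer_handlers(send_money_commits=True, ack_arrives=True)
```

The model makes the asymmetry explicit: the On Failure page for Send Money runs only when Send Money itself aborts, while its Compensation page runs only when it has already committed and the enclosing transaction fails.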
With traditional development systems, it is now commonplace to provide visual debugging facilities, such as those found in Microsoft Visual Basic® and Microsoft Visual C++®. Microsoft BizTalk™ Server does not provide a graphical debugging facility for orchestration schedules. However, remember that orchestration schedules represent a different kind of executable process from traditional short-lived synchronous processes, so the traditional debugging model alone is not effective.
When an orchestration schedule is designed using BizTalk Orchestration Designer, the schedule drawing is in effect a picture of a business process. To represent a process, three constructs are used:
- An Action, which is always either send or receive a message.
- A Message, which is data that is sent or received.
- A Port, where messages are sent to or received from.
In addition, because the schedule represents a process, there is the concept of sequencing from one action to another. When the orchestration schedule is compiled and then executed, typically multiple instances of the schedule will be initiated as individual (long-running) executable processes.
To debug these executables, a combination of tracing and conventional debugging proves most effective. To debug the sequencing of a schedule (the flow from one action to another), tracing is useful. To debug the implementation of an individual action, traditional debugging mechanisms can be employed. By combining the two techniques, schedules can be debugged most effectively.
When running a schedule, the schedule is executed under the control of the XLANG Scheduler Engine, which is a COM+ application. When BizTalk Server is installed, a single instance of the scheduler engine is created, named XLANG Scheduler (the default). It is also possible to create custom COM+ applications that host XLANG schedules.
When these COM+ applications (default or custom) execute a schedule, they generate various events that can be trapped and displayed. BizTalk Server provides a tool, called the XLANG Event Monitor, to trap and display these events. The XLANG Event Monitor can subscribe to events published by host applications on any number of distributed computers, and can store these events for later analysis.
When the XLANG Event Monitor starts, it subscribes to receive events from all XLANG schedule host applications on the local computer. The main window shows all the COM+ applications that host XLANG schedules and, for each host COM+ application, shows all schedule instances that are currently running or completed, coded according to the following scheme:
- Green dot. Represents a running XLANG schedule.
- Black dot. Represents a successfully completed XLANG schedule.
- Red dot. Represents an XLANG schedule that completed with an error.
- Blue snowflake. Represents a dehydrated XLANG schedule.
- Blue lines. Represents a suspended (or paused) XLANG schedule. The schedule stays in this state until it is resumed or terminated.
In addition, each instance is listed with its unique identifier (the instance GUID). Any of the listed running schedule instances can be suspended or terminated from the XLANG Event Monitor.
The XLANG Event Monitor can also be used to start a new instance of a schedule by selecting the COM+ application that is to host the schedule instance and selecting the appropriate schedule file (.skx file).
All the events of a specific schedule instance can be viewed by double-clicking a schedule instance within the XLANG Event Monitor. The events shown can also be filtered to show only certain classes of events, such as transactions, or errors; once events have been captured, they can be saved to disk and later reloaded for display.
Debugging Components in Schedules
While schedules themselves can't be loaded into a visual environment and debugged, COM+ components in those schedules can be debugged. This is done in exactly the same way as debugging a standard COM+ component that is being called from a client application (because in fact the schedule is implemented as a COM+ application, which instantiates and calls these custom COM+ objects).
To debug a Visual Basic component, the project is loaded into Visual Basic and built as usual. Note that the component must be compiled with the Compile to Native Code and Create Symbolic Debug Information options selected. The No Optimization check box should also be selected while debugging. Breakpoints can then be set, and the component run from the Visual Basic Integrated Development Environment. When the schedule is executed, it runs normally until it tries to instantiate the component and execute a method on that component. At this point, execution will stop at the breakpoint that was set. The component can be debugged as normal at this point.
Note: If the XLANG Scheduler Engine has already loaded the DLL, it will not be possible to compile the component. If this occurs, the XLANG Scheduler Engine must be shut down, using the Component Services application. To do this, start the Component Services application, find the XLANG Scheduler Engine COM+ application, and click Shut Down on the context menu (right mouse button). The component should then compile. Alternatively, if BizTalk Orchestration Designer is running, it has a menu option to Shut Down All Running XLANG Schedule Instances, which can be used instead. After selecting this, all XLANG schedules will be shut down, releasing the lock on the DLL so it can be compiled.
Other Debugging Tips
In addition to tracing the progress of a schedule using the XLANG Event Monitor, other system monitors can be used to detect errors in running schedules and to track execution of a schedule. Errors raised by the XLANG Scheduler Engine will appear on the Application tab of the Event Viewer. These events are labeled as XLANG Scheduler errors within the Event Viewer. If necessary, the events presented in the Event Viewer can also be filtered to show only this event type.
The WFBinding group of errors in the Event Viewer means that a problem has occurred in the per-instance message queue interface between BizTalk Messaging Services and BizTalk Orchestration Services. The Orchestration port setting in the BizTalk Messaging Services Messaging Port wizard requires you to enter the Orchestration port name manually. If you misspell the Orchestration port name, a WFBinding error will occur.
Another common error is a parsing validation error. These errors are most often caused when the instance document does not conform to the document specification created in the BizTalk Editor. In this case the document is delivered to the Suspended Queue and the error is logged in the Event Viewer. If you right-click the item in the Suspended Queue, you can examine the document contents and copy them to the clipboard. It is often easiest to solve these problems by pasting the clipboard contents into a text editor such as Notepad and saving the file. Now that you have a document instance on the file system, open the document specification in the BizTalk Editor and use the Tools-Validate Instance menu item to validate the specification against your document instance. Note that even though the dialog box defaults to *.xml, you can validate other file types, such as *.csv if you have a flat-file schema. Once the document instance validates successfully, save the document specification to WebDAV.
For performance reasons, Microsoft BizTalk Server 2000 does not read document definitions or maps from WebDAV at run time. While this significantly increases performance, it also results in more work for the developer and can cause versioning issues. In particular, BizTalk Messaging Services does not refresh the contents of any files saved in WebDAV into the run-time engine. When you change a document specification and save it to WebDAV, you must also open the Messaging Manager, open the appropriate document definition, and then click the Apply button. This causes the Messaging Manager to refresh its copy of the data from WebDAV. Similarly, you need to refresh envelopes and channels manually when the document specifications for envelopes, or the maps used in the channels, are changed.
Other commonly observed issues include:
Symptom: A file is dropped in a directory but the Receive function associated with it does not pick it up.
Possible causes of this are:
- The file has a read-only attribute. In this case there will be an event in the Event Log saying that the Receive function could not pick up a file with a certain name because it was read-only.
- The file name does not match the mask specified in the Receive function configuration. In this case, fix the configuration.
- An incorrect directory is specified in the Receive function configuration. If the directory exists, there will be no symptoms of something being wrong. If the directory does not exist, the Receive function will be disabled and an event will be logged. Fix the error and re-enable the Receive function in the properties page.
- BizTalk Server is stopped. Each Receive function is configured to run on a certain BizTalk server, and this server must be running.
- SQL Server is stopped. If the Receive function cannot put the document on the Work queue, it will not remove it from the directory.
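The first three causes in this list can be checked with a short script before looking further. The sketch below is a hypothetical helper, not part of BizTalk Server. It assumes the mask uses simple wildcard patterns such as *.xml and approximates the matching with Python's fnmatch module, which may differ in detail from the matching that the Receive function performs. It does not check whether BizTalk Server or SQL Server is running.

```python
import fnmatch
import os
import stat
import tempfile
from typing import List


def diagnose_pickup(directory: str, file_name: str, mask: str) -> List[str]:
    """Return the likely reasons a file is not picked up from directory."""
    problems: List[str] = []
    if not os.path.isdir(directory):
        return ["directory does not exist"]
    # Compare case-insensitively, as Windows file names are not case-sensitive.
    if not fnmatch.fnmatchcase(file_name.lower(), mask.lower()):
        problems.append("file name does not match the mask")
    path = os.path.join(directory, file_name)
    if os.path.isfile(path) and not (os.stat(path).st_mode & stat.S_IWRITE):
        problems.append("file is read-only")
    return problems


# Example run against a temporary drop directory.
with tempfile.TemporaryDirectory() as drop_dir:
    sample = os.path.join(drop_dir, "order1.xml")
    with open(sample, "w") as handle:
        handle.write("<PO/>")
    ok = diagnose_pickup(drop_dir, "order1.xml", "*.xml")
    bad_mask = diagnose_pickup(drop_dir, "order1.xml", "*.csv")
    os.chmod(sample, stat.S_IREAD)                 # make the file read-only
    read_only = diagnose_pickup(drop_dir, "order1.xml", "*.xml")
    os.chmod(sample, stat.S_IREAD | stat.S_IWRITE) # restore so cleanup succeeds
missing = diagnose_pickup(os.path.join(drop_dir, "nope"), "order1.xml", "*.xml")
```

An empty result means none of these three causes applies, which points toward the remaining causes in the list.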
Symptom: A file is removed from the pickup directory but the subsequent processing does not happen.
- Document could not be parsed or the Messaging Manager has not been refreshed from WebDAV.
- No channel matched the set of Source Org, Destination Org and Doc Def that was specified in receive function properties. Verify these properties to ensure a channel matches.
Symptom: The same schedule appears to be started multiple times after a single document submission.
- A channel can connect to one or more messaging ports. In this case multiple channels connect to a messaging port that instantiates an orchestration instance.
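The last two symptoms both come down to how many channels match a submitted document. The following toy model shows the idea. It is an assumption-laden illustration, not the matching algorithm that BizTalk Messaging Services uses, and the Channel type and its field names are hypothetical.

```python
from typing import List, NamedTuple


class Channel(NamedTuple):
    name: str
    source_org: str
    dest_org: str
    doc_def: str
    starts_schedule: bool  # True if the channel's port instantiates a schedule


def matching_channels(channels: List[Channel], source_org: str,
                      dest_org: str, doc_def: str) -> List[Channel]:
    """Return every channel whose three properties equal the submitted values."""
    wanted = (source_org, dest_org, doc_def)
    return [c for c in channels
            if (c.source_org, c.dest_org, c.doc_def) == wanted]


def schedule_starts(channels: List[Channel], source_org: str,
                    dest_org: str, doc_def: str) -> int:
    """Count how many schedule instances one submission would start."""
    matches = matching_channels(channels, source_org, dest_org, doc_def)
    return sum(1 for c in matches if c.starts_schedule)


channels = [
    Channel("PO to schedule", "Contoso", "Fabrikam", "PurchaseOrder", True),
    Channel("PO to schedule (copy)", "Contoso", "Fabrikam", "PurchaseOrder", True),
    Channel("Invoice to file", "Contoso", "Fabrikam", "Invoice", False),
]

no_match = matching_channels(channels, "Contoso", "Fabrikam", "Receipt")
duplicate_starts = schedule_starts(channels, "Contoso", "Fabrikam", "PurchaseOrder")
```

An empty match list corresponds to the case where no processing happens, and a count greater than one corresponds to the same schedule starting more than once for a single submission.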
Just as you break down a complex programming problem in Visual Basic into smaller, more manageable parts, when you use BizTalk Server you should isolate the part of the infrastructure that contains the issue you wish to resolve. For example, if you are uncertain whether the document being delivered from BizTalk Messaging Services to BizTalk Orchestration Services contains the correct instance data, change the messaging port to output to a file instead and examine the contents of the document.
To track execution of running schedules, the Performance Monitor can be used to display the effects of the implementation components of the schedule. These effects include, but aren't limited to:
- Monitoring messages in specific message queues (from the Microsoft Message Queuing Queue object).
- Microsoft Message Queuing incoming and outgoing messages (from the Microsoft Message Queuing Service object).
- BizTalk Messaging documents and interchanges processed (from the BizTalk Server object).
- Items in the BizTalk Messaging Suspended queue (from the BizTalk Server object).
- Active, aborted, and committed transactions (from the Distributed Transaction Coordinator object).
As well as the Performance Monitor, the Component Services MMC (Microsoft Management Console) can be used to monitor the transactions initiated and committed or aborted (this can also be done using the SQL Profiler application). The Component Services application can also be used to monitor the instantiation of any of the COM+ components that are installed as COM+ applications.
Unpublished work. © 2001 Microsoft Corporation. All rights reserved.