Transaction State Resolution After a System Failure
Updated: April 11, 2008
Applies To: Windows Server 2008
The Microsoft Distributed Transaction Coordinator (MS DTC) eases failure recovery for distributed applications where failure can occur on the client, on the server, or in the network connection between them. An important part of transaction management is resolving situations in which a transaction is left in an unresolved state after a system failure.
The MS DTC automatically resolves most transactions during recovery after a system failure as explained in How the MS DTC Handles Computer Failures. Occasionally, however, some computers cannot reconnect during recovery. When computers in a transaction management system are not connected, unresolved transactions occur. In addition, faulty applications can cause transactions to become unresolved. The following transaction states indicate that a distributed transaction is unresolved: In Doubt, Cannot Notify Abort, and Cannot Notify Commit.
You can use the Component Services administrative tool to resolve In Doubt, Cannot Notify Abort, and Cannot Notify Commit states. To do this, in the Transaction List dialog box, right-click the affected transaction, point to Resolve, and then click one of the following commands:
Commit: Forces the transaction to commit
Abort: Forces the transaction to abort and roll back to its original state
Forget: Deletes a committed or aborted transaction from the MS DTC log file
You can also use Component Services to specify a time-out period within which transactions are automatically aborted if they do not complete. (The default transaction time-out period is 60 seconds.) If you specify a time-out period, you can prevent incorrectly written transaction applications from acquiring and holding transactional resources indefinitely. For example, the records that a transaction updates in a SQL Server database are locked for the duration of the transaction. If the transaction program loops or deadlocks, the transaction time-out prevents the database records from being held too long.
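The effect of a time-out on held locks can be illustrated with a small toy model. The following Python sketch is not the MS DTC API. The names ToyCoordinator, ToyTransaction, begin, commit, and abort_expired are hypothetical, and the simulated clock (the now argument) is an assumption made so that the example stays deterministic. It only shows the idea that a transaction still active after the time-out period is aborted and its record locks are released.

```python
from dataclasses import dataclass

DEFAULT_TIMEOUT_SECONDS = 60  # the default time-out period stated in the text


@dataclass
class ToyTransaction:
    """Hypothetical stand-in for a transaction that holds record locks."""
    tx_id: str
    started_at: float
    locked_records: set
    state: str = "active"


class ToyCoordinator:
    """Toy model only; it does not call or imitate any real MS DTC interface."""

    def __init__(self, timeout_seconds=DEFAULT_TIMEOUT_SECONDS):
        self.timeout_seconds = timeout_seconds
        self.transactions = {}
        self.locks = {}  # record name -> id of the transaction holding it

    def begin(self, tx_id, now, records):
        # Refuse to start if any requested record is already locked.
        for record in records:
            if record in self.locks:
                raise RuntimeError(f"{record} is locked by {self.locks[record]}")
        for record in records:
            self.locks[record] = tx_id
        self.transactions[tx_id] = ToyTransaction(tx_id, now, set(records))

    def _release(self, tx):
        for record in tx.locked_records:
            del self.locks[record]
        tx.locked_records.clear()

    def commit(self, tx_id):
        tx = self.transactions[tx_id]
        if tx.state != "active":
            return False  # already aborted (for example, by the time-out)
        tx.state = "committed"
        self._release(tx)
        return True

    def abort_expired(self, now):
        """Abort every active transaction older than the time-out period."""
        expired = []
        for tx in self.transactions.values():
            if tx.state == "active" and now - tx.started_at >= self.timeout_seconds:
                tx.state = "aborted"
                self._release(tx)
                expired.append(tx.tx_id)
        return expired
```

In this model, a transaction started at time 0 that never completes still holds its locks when abort_expired is called at time 30, and loses them when abort_expired is called at time 60 or later. A later call to commit on that transaction returns False.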
When systems that are involved in transactions are restarted and their connections are restored after a system or connection failure, the MS DTC automatically resolves the transactions. However, the MS DTC cannot resolve transactions if the systems are not running or connections are not reestablished. In this case, first try to troubleshoot and resolve any connectivity issues. If resolving connectivity issues is not possible, you can use Component Services to manually resolve transactions that are in the In Doubt, Cannot Notify Abort, or Cannot Notify Commit states.
The following example shows how to resolve a transaction manually after a communication line fails between two computers on the network. After a transaction has been manually committed or aborted, it is often also necessary to manually force a computer to forget the transaction, which deletes the transaction from that computer's local MS DTC log file.
This example assumes the following conditions:
The MS DTC on computer A is the commit coordinator.
The two-phase commit protocol is conducted sequentially along the lines of communication from computer A to computer D.
The first phase of the commit protocol has concluded, and the commit coordinator has committed the transaction and written a COMMITTED record to the MS DTC log file on computer A.
Communication has failed between computer B and computer C during the second phase of the commit protocol. Computer C and computer D remain in doubt regarding the outcome of the transaction.
It is difficult or impossible to restore communication between computer B and computer C.
The following illustration shows that the transaction is in an unresolved state.
Because the line of communication between computer A and computer B is intact, both systems have committed the transaction. Computer A has forgotten the transaction because it was permitted to forget the transaction after computer B acknowledged receipt of the commit notification. Computer B has committed the transaction but must remember the transaction outcome until computer C knows the outcome. Computer C is in doubt about the outcome of the transaction. Computer D is prepared.
To resolve the transaction (and release the database locks on computer C and computer D), the system administrator forces computer C to commit the transaction. Because the line of communication between computer C and computer D is intact, the forced commit operation on computer C enables the transaction to commit on computer C and computer D. Computer D can now release its database locks and forget the transaction. When computer D confirms to computer C that it has committed and forgotten the transaction, computer C can also forget the transaction.
The transaction is now committed on all computers. However, computer B does not know that computer C has committed the transaction because there is still no connection between these two computers. Computer B continues to remember the transaction. To complete the transaction, the system administrator forces computer B to forget the transaction. The following illustration shows that the commit protocol is manually concluded, and that the transaction is complete.
Because of the outgoing-incoming communication pattern of the commit protocol, we recommend that you manually resolve transactions on the computers that are immediately adjacent to any break in communications. In the preceding example, the forced commit occurs on computer C (not computer D), and the forced forget occurs on computer B (not computer A). If you had instead forced computer B to forget first, and communication between computer B and computer C was then restored, the system might automatically abort the transaction on computer C before you could force it to commit. The result would be an inconsistent transaction outcome, which can in turn cause resource manager inconsistency.
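The sequence in the preceding example can be traced with a small simulation. The following Python sketch is a toy model under stated assumptions, not MS DTC code. The names ToyNode, build_chain, snapshot, and run_example are hypothetical, "prepared" stands for a computer that has finished the first phase and does not yet know the outcome, and an acknowledgement is simplified to "the downstream neighbor was reachable and committed."

```python
class ToyNode:
    """Hypothetical model of one computer in the A -> B -> C -> D chain."""

    def __init__(self, name):
        self.name = name
        self.outcome = "prepared"    # first phase done, outcome not yet known
        self.forgotten = False       # True once the log record is deleted
        self.next = None             # downstream neighbor, if any
        self.link_to_next_up = True  # state of the line to that neighbor

    def commit(self):
        """Commit locally, notify downstream, forget once it acknowledges."""
        self.outcome = "committed"
        if self.next is None:
            self.forgotten = True    # last in the chain: nobody to wait for
        elif self.link_to_next_up:
            self.next.commit()       # second-phase notification
            self.forgotten = True    # neighbor acknowledged, safe to forget
        # Otherwise the link is down: keep remembering the outcome.

    def force_commit(self):
        """Administrator action: force a prepared transaction to commit."""
        if self.outcome != "prepared":
            raise ValueError(f"{self.name} is not awaiting an outcome")
        self.commit()

    def force_forget(self):
        """Administrator action: delete a resolved transaction from the log."""
        if self.outcome == "prepared":
            raise ValueError(f"{self.name} has not committed or aborted")
        self.forgotten = True


def build_chain(names):
    nodes = [ToyNode(name) for name in names]
    for upstream, downstream in zip(nodes, nodes[1:]):
        upstream.next = downstream
    return nodes


def snapshot(nodes):
    return {node.name: (node.outcome, node.forgotten) for node in nodes}


def run_example():
    nodes = build_chain("ABCD")
    a, b, c, d = nodes
    b.link_to_next_up = False        # the line between B and C has failed
    a.commit()                       # coordinator starts the second phase
    before = snapshot(nodes)
    c.force_commit()                 # resolve next to the break, on C
    after_commit = snapshot(nodes)
    b.force_forget()                 # then clear the record on B
    after_forget = snapshot(nodes)
    return before, after_commit, after_forget


if __name__ == "__main__":
    for stage in run_example():
        print(stage)
```

The three snapshots correspond to the unresolved state, the state after the forced commit on computer C, and the state after the forced forget on computer B. The model does not simulate the automatic abort that the text warns about, so it cannot be used to explore the wrong ordering.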