Best Practices for the SQL Adapter
Authors: Dwaine Gilmer, John Taylor, Matthew Yip
Published: May 2008
The SQL adapter, which is integrated with Microsoft® BizTalk® Server, is designed for sharing data between BizTalk Server and Microsoft SQL Server™ databases. Although the SQL adapter is ideal for accessing SQL Server databases, it is important to understand a few of its design limitations that are related to performance and usage of resources.
This white paper addresses problem areas related to configuration and design, and other known issues involving the use of the SQL adapter. It provides best practices and workarounds for the IT Professional who deploys the SQL adapter in a production environment.
Relational databases involve defined data sets that are stored in a database. BizTalk Server, however, involves messages, and for BizTalk Server a “message” is defined and structured data. The adapter is the bridge between incoming and outgoing messages and the BizTalk Server MessageBox database. The SQL adapter eliminates the need for writing any C# or Microsoft Visual Basic® code for retrieving, formatting, and submitting stored SQL data to BizTalk Server.
As will be discussed, to deliver the best performance, you must:
- Design the mapping of data in the tables of the database to messages in BizTalk Server.
- Define how the mapping of the data to a BizTalk message takes place.
- Determine the message size and the level of concurrency when accessing the stored data.
- Carefully design, craft, and implement the SQL statements that provide and consume the data.
The remainder of this white paper will cover best practices associated with the following areas:
- Creating the receive side
- Issues with data types
- Microsoft Distributed Transaction Coordinator (MS DTC)-related failures
- Tool/schema wizard problems
|Except where noted, all information in this white paper applies to BizTalk Server 2006 and BizTalk Server 2004.|
This section offers recommendations, followed by conceptual information, about creating and configuring the receive location.
Recommendation 1: Always write the receive location query by using a fixed maximum message size. In SQL syntax, this can be done easily by using a “top n” statement after the “select” keyword (where “n” is the maximum number of rows to return in any given execution of that query)
The best design model for using the SQL adapter is that the developer writes a stored procedure, it runs, and the result becomes a single BizTalk message. This limits the application logic to the SQL Server and limits the data result set that is returned. Stored-procedure design is an intrinsic component in an overall BizTalk solution design. To improve performance, modularity, encapsulation, and security, it is strongly advised to encapsulate commands to the SQL Server in the form of stored procedures, as opposed to dynamic SQL statements. Although it is natural to start working in the Orchestration Designer in BizTalk Server, it is important to invest in writing optimized SQL code that the SQL adapter runs.
Here is the reason. It is easy to write a SQL statement in a test environment to return 100 data rows as a single 100-kilobyte (KB) message. However, on a production system, the same SQL statement might return 100 times that many rows resulting in 100 x 100 KB or 10 megabytes (MB) of messages. This can result in an OUT OF MEMORY error and execution failure. When writing a stored procedure to submit data to a BizTalk message by using the SQL adapter via a receive port, it is critical to explicitly control the maximum size of the returned result set.
As an example, here is a SQL query against a small sample database:
select * from Orders as o inner join [Order Details] as od on o.OrderId = od.OrderId
This example happens to return 2155 rows. If the additional option of “for xml auto” is added to the end of the SQL statement in this example, this results in 475,116 bytes. If the application code explicitly loads the result set into an XML document, the .NET garbage collector reports that an extra 2,009,564 bytes were allocated as the result of loading the results into an XML document.
In a production environment, the size of the Orders table might be significantly larger than in the example. For instance, an unrelated system outage could cause a significant accumulation of unprocessed orders because it is likely that the process that is adding records is still active.
Taking the preceding SQL query and adding a “top 100” immediately after the “select” keyword happens to result in 21,883 bytes. The SQL syntax “top 100” limits the result that is returned to a maximum of 100 rows, no matter how large the result set is. Therefore, in this example, this is the maximum size; it does not matter how many orders are backed up. You should always use the “top” keyword in a query, since it limits the potential message size.
Recommendation 2: When limiting the receive query size to a fixed maximum, consider enabling the Poll While Data Found option
Putting a fixed limit on the size of the result set might cause some data to be left out. This should not be a concern, however, because the adapter (while it is enabled) polls and executes the query regularly. In the SQL Adapter Receive Location property page, there is a Poll While Data Found option. When this is enabled, the adapter immediately returns to the SQL Server to acquire the additional data.
The SQL adapter does not actually wait for the next polling interval to read the data source of all the data from the result set. The default state is set to Disabled. However, the only reason why you might want to keep this as Disabled is to throttle or limit the data being fed into BizTalk Server. Throttling the adapter in this manner is not a particularly effective approach to throttling because there is no feedback loop.
Unfortunately, limiting the size of the result set and leaving the default configuration on the Poll While Data Found option as Disabled can lead to an inordinate amount of time consumed by reading the data. This default configuration ensures that all of the data is read into BizTalk Server, but this can lead to a large data result set. For example, suppose that, in the previous example of the Orders table, the row count is limited in the result set to100 rows. If BizTalk Server produced messages of approximately 20 KB, the time to read the data would be 2155 (total number of rows to process) / 100 (limited result set) * 30 (seconds), or over 10 minutes to read the table! Therefore, when limiting the receive query size to a fixed maximum, consider enabling the Poll While Data Found option.
Recommendation 3: When configuring the SQL adapter’s receive side, use the number of rows in the result set to control the size of the BizTalk message that is created. Use the XML pipeline to disassemble the multirow BizTalk message into a single-row BizTalk message
Up to this point in the white paper, the mapping of the database result set to the BizTalk message has been straight forward; in reality, however, this is unlikely to be the case. The application logic that is in the orchestration or send port is sensitive to whatever constitutes the message. The message that the SQL adapter creates might not be the final message that is inserted into the BizTalk MessageBox. Additional formatting of the message can be applied by using things such as the XML pipeline and mapping.
For more information about the BizTalk Server pipeline, see http://go.microsoft.com/fwlink/?LinkId=89780.
The most typical logical mapping of SQL data to a BizTalk message is to take a single row of the result set and turn it into a message in BizTalk Server. Typically, the SQL adapter returns multiple result rows and will make one composite BizTalk Server message out of the results. All the XML pipeline has to do is to take every first child of the XML document element in the message that comes from the adapter and turn it into its own message: in other words, this basically constitutes standard BizTalk message disassembly.
Because the executing SQL statement controls what data goes into a row or a result set, using the options discussed thus far is a simple and reasonable solution. It provides freedom to experiment with the “top n” rows in the SELECT statement and to tune the system appropriately.
Recommendation 4: Provide a SQL stored procedure for the SQL adapter to receive data into BizTalk Server
Thus far, this white paper has focused on the SELECT statement and on carefully crafting the SQL statements, such that data is received in a fast, efficient, and stable manner. However, the SQL code is also used in order to update or delete rows once BizTalk Server has successfully received them. This process of updating the database can be done along with the SELECT statement. This is because BizTalk Server executes the SQL code in the same distributed transaction that it uses to update its own internal database.
Before discussing the best way to perform this update, note that there is more than one SQL statement to execute and that these statements are tightly coupled. Therefore, it is natural to put them in a stored procedure.
This is probably the most important of all the recommendations. Using a stored procedure eliminates putting numerous dynamic SQL statements into the SQL adapter’s configuration property sheet. Furthermore, it leads to a more easily maintainable and long-term solution, which is easier to work with because the solution is tuned for performance.
Using a stored procedure introduces a level of “programming by contract.” This means that the implementation details of the database are abstracted from the SQL adapter configuration. Because stored procedures are precompiled and stored in the database, this provides better performance than a dynamic SQL statement, which is parsed and compiled every time it is executed.
A dynamic SQL statement is also more prone to programming errors each time that it is used in the client’s code. This is because the stored procedure has one central copy living on the SQL Server.
The general pattern of execution for the SQL adapter is a poll-based mechanism. As each polling interval occurs, the adapter executes the receive-side SQL statements to retrieve data. Once the adapter gets the data, it submits it as a message to BizTalk Server. The retrieve and submit actions are all done within the same distributed transaction. The retrieve step of this processing logic should also update the database, so that the same data is not fetched over and over. Commonly, the adapter moves forward through the data in the database to update or delete the rows that are being returned each time.
The following is an example of a stored procedure that does this:Sample Code
Create the table and an appropriate index:
CREATE TABLE [dbo].[tblOrders] ( [OrderID] [int] IDENTITY NOT NULL , [TextData] [varchar] (2000) , [HasBeenRead] [uniqueidentifier] NULL ) ON [PRIMARY] GO CREATE UNIQUE CLUSTERED INDEX [IX_Order_0] ON [dbo].[tblOrders]([OrderID]) ON [PRIMARY] GO CREATE NONCLUSTERED INDEX [IX_Order_1] ON [dbo].[tblOrders]([OrderID],[HasBeenRead]) ON [PRIMARY] GO CREATE INDEX [PK_Guid] ON [dbo].[tblOrders]([HasBeenRead]) ON [PRIMARY] GO
Create the stored procedure to extract data from this table:
CREATE PROCEDURE [dbo].[sp_GetOrder_2K] AS SET nocount on set xact_abort on SET TRANSACTION ISOLATION LEVEL READ COMMITTED declare @ReadID uniqueidentifier set @ReadID = NEWID(); UPDATE tblOrders WITH (ROWLOCK) SET HasBeenRead = @ReadID FROM ( SELECT TOP 1 OrderId FROM tblOrders WITH(UPDLOCK ROWLOCK READPAST ) WHERE HasBeenRead IS null ORDER BY [OrderID]) AS t1 WHERE ( tblOrders.OrderId = t1.OrderId) SELECT TOP 20 OrderID, TextData FROM tblOrders WITH(UPDLOCK ROWLOCK READPAST ) WHERE HasBeenRead = @ReadID FOR XML AUTO GO
The SQL adapter makes asynchronous calls to the stored procedure. If Poll While Data Found is set to Enabled, it is possible to have the configured stored procedure execute many times in a concurrent fashion. There can be multiple, concurrently executing, distributed transactions between the SQL database and the BizTalk internal MessageBox database.
In a concurrent scenario, BizTalk Server could be inserting the new message into its MessageBox database. The locks on the SQL database are held until the transaction is committed. This high level of concurrency, combined with the fact that the dequeuing style (removing data from a database) of stored procedures is awkward with the locks that they consume, can make this paradigm particularly prone to deadlocks.
Developing stored procedures with this concurrency in mind is important to minimizing deadlocks. The previous sample stored procedure avoids deadlocks by using SQL “hints.” (See the WITH statement in the previous code.)
For more information about hints, see http://go.microsoft.com/fwlink/?LinkId=88860.
For more information about WITH statements, see http://go.microsoft.com/fwlink/?LinkId=88861.
An alternative concurrency consideration is to use Application Lock in the stored procedure itself. As the name implies, a SQL Application Lock places a lock on an application resource. Using an Application Lock at the top of the stored procedure means that there is only one stored procedure that is executing at a time.
For more information about SQL Application Locks, see http://go.microsoft.com/fwlink/?LinkId=88862.
In a stored procedure, be careful to not have too many rows marked as read. This results in too many rows being unnecessarily loaded into memory, but never accessed.
Because there is no callback mechanism to do the actual deletion from within the SQL adapter, it is necessary to write a SQL agent job. This agent job deletes any row that it can lock and that has already been read.
Recommendation 5: Do not use Transact-SQL transactions in your stored procedure code
When using stored procedures with the SQL adapter, SQL Server deadlocks can occur under the following conditions:
- The SQL adapter is configured with multiple receive or multiple send host instances.
- The SQL adapter is configured to execute a stored procedure.
- Either the stored procedure uses Transact-SQL transactions or executes under the Serializable isolation level.
SQL stored procedure code that the SQL adapter executes is performed as a distributed transaction started by the BizTalk messaging engine. If you want to implement error-handling logic to stop the calling transaction in case of a fatal error, use the Transact-SQL RAISEERROR command in the stored procedure code. The BizTalk messaging engine will then stop the hosting transaction.
If the stored procedure code is running with a higher isolation level than the default isolation level of Read Committed, consider using a lower isolation level. You can change the isolation level with the SET TRANSACTION ISOLATION LEVEL command. To determine the transaction isolation level that is currently set, use the DBCC USEROPTIONS statement, as shown in the following example:
USE [database_name]; GO DBCC USEROPTIONS; GO
For more information about changing the transaction isolation level, see “SET TRANSACTION ISOLATION LEVEL (Transact-SQL)” available in SQL Server 2005 Books Online at http://go.microsoft.com/fwlink/?LinkId=88858.
For more in-depth troubleshooting of SQL Server deadlocks, follow the steps defined in Microsoft Knowledge Base article 832524 at http://support.microsoft.com/kb/832524.
Recommendation 1: For unsupported data types, pass the data to stored procedures as a string data type
Issues with unsupported data types could be a problem for data that needs conversion, especially when working with the money data type. Due to limitations in SQLXML 3.0 Service Pack 3 (SP3) (which enables XML support for SQL Server), updategrams do not support certain currency types, such as Euros. In this case, it is better to pass the data to stored procedures as a string data type because a string is a common format.
SQLXML exposes a fully XML-based data access application programming interface (API) to insert, update, and delete data from the database. When used in conjunction with the SQL adapter, it provides the ability to automatically generate an updategram schema based on the table that you want to manipulate. Because the SQL adapter relies on updategrams for Insert, Update, or Delete operations, it is subject to SQLXML’s limitations.
In the following updategram, the element “Info” is defined as a string data type and contains a dollar sign ($) as the first character. SQLXML will treat the string “Info” as a SQLXML parameter and not as part of the data, so the SQL adapter will not commit the updategram.
<?xml version="1.0"?> <ns0:TheRequest xmlns:ns0="<http://Txn.Transactions>"> <ns0:sync> <ns0:after> <ns0:TxnData TxnID="12345" Info="$ABCD"> </ns0:TxnData> </ns0:after> </ns0:sync> </ns0:TheRequest>
The failure that the SQL adapter returns will be “Invalid XML elements found inside sync block." This is a case where you should consider the data type that is used when designing the solution. In this case, the proper solution is to use a stored procedure to insert the data.
With SQL Server 2005, there is a new data type called xml. Unfortunately, because of its implementation in SQLXML, the adapter does not support this data type. The solution is to convert the data to a string data type in the code of the stored procedure.
Recommendation 1: Install the latest service packs and service rollups
Problems with a SQL adapter solution might originate in a supporting component of the SQL adapter. You can resolve certain known issues (such as security issues) by applying updates and service packs for the underlying technology on the computer that is running the SQL adapter.
BizTalk Server and its supporting components should be configured with the latest updates and security fixes. This applies to COM+ Rollups, SQLXML, and the .NET 2.0 runtime. These all affect the performance and reliability of the SQL adapter solution.
The following are known issues, along with their resolution:
- The SqlXmlCommand object in the Microsoft.Data.SQLXML namespace: "Out of memory." For more information and for a resolution, see Microsoft Knowledge Base article 897700 at http://support.microsoft.com/?id=897700.
- Incorrect native SQL error information when you use SQLXML to retrieve data. For more information and for a resolution, see Microsoft Knowledge Base article 826770 at http://support.microsoft.com/kb/826770/en-us.
The following is a known issue, along with its resolution:
The System.Threading.Timer class might not be signaled in the Microsoft .NET Framework 1.1 SP1. For more information and for a resolution, see Microsoft Knowledge Base article 900822 at http://support.microsoft.com/kb/900822/en-us.
BizTalk Server 2006 is built using the .NET Framework. A frequent problem with the SQL adapter relates to a fix for the .NET System.Threading.Timer objects. BizTalk Server relies on the System.Threading.Timer object to function correctly. When the object is not signaled properly, a SQL receive location might fail to process messages properly.
Other Related Updates
The following are known issues, along with their resolution:
- The SQL adapter generates a System.OutOfMemory exception when it polls a large-sized message (about 205 MB) in BizTalk Server 2004. For more information and for a resolution, see Microsoft Knowledge Base article 841612 at http://support.microsoft.com/?id=841612.
It is possible that when using a SQL request-response send port that BizTalk Server might throw an exception of type Microsoft.XLANGs.Core.WrongBodyPartException. In BizTalk Server 2004, the response XLANG message only has one part regardless of how many records are inserted. In BizTalk Server 2006,the response XLANG message has multiple parts when multiple records are inserted. However, the containing object is not changed, which causes the wrong Body part to be used in the first index.
- There is an error when converting data type nvarchar to decimal. For more information and for a resolution, see Microsoft Knowledge Base article 918316 at http://support.microsoft.com/kb/918316.
MS DTC Issues
Sometimes there are problems with Microsoft Distributed Transaction Coordinator (MS DTC) configurations. The following are common errors that are encountered on a system that is not configured correctly:
- Error: Definitely DTC-cannot enlist in a distributed transaction
- Error: [0x8004d00a] Unable to enlist in the transaction
Considerations for Using the Distributed Transaction Coordinator
There are a few important considerations when using the SQL adapter across multiple computers.
By default, network DTC access and network COM+ access are disabled on Windows Server® 2003 with Service Pack 1 (SP1). Before installing and configuring BizTalk Server 2006 for use across multiple computers, you must enable network DTC and COM+ access on all computers that are associated with the SQL adapter solution.To enable network DTC and COM+ access on Windows Server 2003
Click Start, point to Control Panel, and then click Add or Remove Programs.
Click Add/Remove Windows Components.
Select Application Server, and then click Details.
Select Enable network DTC access and Enable network COM+ access, and then click OK.
Stop and then restart the Distributed Transaction Coordinator service.
Stop and then restart Microsoft SQL Server and the other resource manager services that participate in the distributed transaction, such as Microsoft Message Queuing.
All BizTalk servers and SQL servers in a group must have the same remote procedure call (RPC) authentication level applied. The DTC proxy might not correctly authenticate DTC when the computers are running on different operating systems, are joined to workgroups, or are in different domains that do not trust each other.
For more information, see http://go.microsoft.com/fwlink/?LinkId=54805.
DTC uses RPC dynamic port allocation which, by default, randomly selects port numbers above 1024. It is important that the system that is running the SQL adapter be properly configured. By modifying the registry, you can control which ports RPC dynamically allocates for incoming communication. You can configure the firewall to confine incoming external communication to only those ports and to port 135 (the RPC Endpoint Mapper port).
For more information, see http://go.microsoft.com/fwlink/?LinkId=61914.
You can test for proper configuration of the DTC setting by using the following MSDTC support tools:
The SQL Adapter Schema Wizard is a tool for importing table metadata from SQL Server and creating a resultant BizTalk XSD schema. As a part of the process of creating a schema, the SQL Adapter Schema Wizard executes a SQL statement or stored procedure to generate the desired XSD schema. During the execution phase of the wizard, the SQL statement can take too long to return and generate a "Failed to execute SQL Statement" error.
For the wizard, all that is relevant are the names and data types of the data. The actual data in the result set is unimportant. Consequently, there is a simple way to work around this problem. When using the SQL Adapter Schema Wizard, change the dynamic SQL statement to return a smaller data set. If the schema is the result of a stored procedure call, the SQL code is fixed and cannot by changed dynamically. In this case, a possible solution is to create an empty stored procedure that does not contain any SQL code, then call it using the wizard to produce the schema. Once the wizard successfully creates the schema, the logic is placed back into the stored procedure.
Although this workaround allows for schema creation, a stored procedure that fails to execute or that times out indicates the actual performance of the SQL statement. Changing the SQL statement to improve the performance of the statement might be a preferable alternative to working around this issue.
This white paper has presented best practices, known issues, and information about resolutions that are related to using the SQL adapter. For more information about other known issues with the SQL adapter, see http://go.microsoft.com/fwlink/?LinkId=88859. For additional best practices when using the SQL Adapter, see http://go.microsoft.com/fwlink/?LinkId=89781.