
Optimizing SQLXML Performance


Carl Perry
Microsoft Corporation

April 2002

Applies to:
    Microsoft® SQL Server™ 2000

Summary: How to optimize the performance of SQLXML when used with Microsoft SQL Server 2000. (12 printed pages)

Download Optimizingsqlxmlsampleapp.exe.


SQLXML Best Practices for Performance
    ISAPI Filter
    The Application
    Templates and Caching
    Inserting/updating/deleting data
    XML Bulk Load
    Retrieving BLOB values
    Client-side compared with server-side
    SQL OLE DB compared with SQLXML OLE DB
    Our findings

SQLXML Best Practices for Performance

Much important data is stored in some type of RDBMS, whether it is Microsoft® SQL Server™ 2000 or a third-party database. If the data is stored in disparate databases, or if your application runs on a different platform, it becomes difficult to interoperate between these environments. Until recently, this was one of the greatest impediments to creating applications that span platforms. Approximately five years ago, a new standard began to transform the development world: XML. If your data is in an XML format, it can be read on any platform. With XML came other related technologies that expanded the developer's repertoire. However, most of your data was still stored in a relational database that all your current applications were using effectively. With the release of SQL Server 2000, exposing your data as XML became extremely easy. Your current applications can continue to access the data through their existing data access APIs, such as ADO, while you expose the same data as XML views for use on any platform that can access the Web and understand XML. Interoperability was one step closer. Since then, SQLXML has continued to build on and expand this functionality with XML Schema (XSD) support, updategrams, integrated Microsoft .NET support, and much more.

When developers begin using SQLXML, or any technology, the biggest challenge is mastering the API and semantics. Once developers have mastered the technology, they tend to focus on how to achieve the best performance in a particular application. This information is often hard to come by. For obvious reasons, it's difficult to say what a best practice is when talking about performance. Unless there are specific performance requirements, user needs often drive the design of a particular application, and every implementation is different. That's why the documentation tends to focus on how to use the technology and what the specific classes and objects do. Developers often ask, "What is the best way to do X?" or "What's the most efficient way to do Y?" This white paper discusses ways to use SQLXML so that you use the most efficient method to accomplish a given task while still meeting your design requirements.

A sample application is included with this white paper. We will use this application to illustrate and discuss best practices regarding performance and design. Keep in mind that this is simply a sample application. You can install it on your machine and, with a couple of changes, have it up and running. From there you can run your own performance numbers and see how they bear out. Note that this does not mean your application will see the same performance gains; almost everything depends on your application design. This white paper uses the sample application as a basis for discussing where to use which components, along with specific performance numbers. Also keep in mind that we do not address scalability in this application or white paper; this is a discussion about increasing performance in SQLXML.

This discussion focuses on using all the SQLXML components programmatically. However, using only Extensible Stylesheet Language Transformations (XSLT) documents and your XML data, you can use the SQLXML ISAPI listener to create a simple Web front end. We'll discuss some issues related to using the ISAPI functionality directly and how to increase its performance, but we'll focus mainly on using SQLXML programmatically.

ISAPI Filter

If your application is fairly simple and you don't require any business logic, using the ISAPI filter directly will give you the greatest throughput. Though the application discussed in this paper was not ported to a pure ISAPI solution (the reasons are discussed later in this paper), you can expect that the application, pared down, would generally be faster. A direct port of the basic functionality would require no changes to the XSLT or the queries.

You can increase the performance of your ISAPI application in many ways. Caching mechanisms, which can be implemented with both the ISAPI listener and ADO, provide some of the largest performance gains. Both implementations benefit from using these mechanisms, but there are some caveats that apply directly to ADO. We discuss caching later in this white paper.

As noted above, we chose to use ADO and ASP in our sample application instead of a pure ISAPI solution. Although ISAPI has been optimized for throughput, it lacks the ability to apply complex business logic. With ISAPI we can perform simple queries and transform the data for our presentation layer using XSLT stylesheets. However, if we need to perform any business logic, do consistency checking, or use other components, ADO with Active Server Pages (ASP), or ADO.NET with ASP.NET pages (ASPX) written in Microsoft Visual Basic® .NET or Microsoft Visual C#®, is a much more flexible and powerful approach. The following sections focus on the application.

The Application

Every company or group needs to manage appointments and meetings. There are some good programs available that do this for you. However, we thought it would be interesting to write an application that our group could use to manage our appointments and meetings. We'll briefly describe the different aspects of the application and what the user can accomplish in each section.

  • Check your current schedule

    This enables users to check their schedules for a specific day, week, or month. Users can review their schedule and see other people's schedules.

  • Create a new meeting request and add other users

    This enables users to add new meetings and to update meetings for others to attend.

  • Insert new users

    We have multiple ways to do this. We will discuss the performance implications of each option and reasons to use a particular option.


Caching is the easiest way to improve an application's performance. Mapping schemas (XSD or XDR), XSLT stylesheets, and templates can all be cached, and by default SQLXML caches all of these files. The requirements for ensuring that caching is working are discussed below. Caching is the best way to improve the performance of SQLXML, and once you know how to achieve it, it's very easy to implement.

First, let's define what caching means here. When speaking of caching, we are discussing four actions: loading the files, compiling them, storing the intermediate data structures, and building the queries. With caching enabled, these actions are executed once for each file; on subsequent executions, SQLXML bypasses this work. Now that we've defined caching, let's examine how to accomplish it.

By default, the cache size is 31 files. To turn caching off, use the IIS Virtual Directory Manager for SQLXML 3.0: caching of these files is disabled by selecting the appropriate check boxes on the Advanced tab for the virtual directory under which your Web application is running. To decrease or increase the number of files cached, set the values in your registry. These registry settings are not created when you install SQLXML 3.0; you must create the keys yourself, and they are specific to SQLXML 3.0. (For SQLXML 2.0, the key is located in …\MSSQLServer\Client\SQLXML2.) To set these values, add the following registry keys.

  • Template Caching


  • XSLT Stylesheet Caching


  • Schema Caching


    Note   The greatest value that you can set for the template and XSLT registry settings is 128. SQLXML will not cache more files than this.
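As an illustration only, a .reg file raising all three cache sizes to 128 might look like the following. The key path is inferred from the SQLXML 2.0 location mentioned above, and the value names (TemplateCacheSize, SchemaCacheSize, XSLCacheSize) are assumptions; verify both against the SQLXML 3.0 documentation before applying anything to a real machine.

```reg
Windows Registry Editor Version 5.00

; Hypothetical sketch: raise the SQLXML 3.0 cache sizes to 128 (0x80) files.
; Key path and value names are assumptions -- confirm against the SQLXML docs.
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSSQLServer\Client\SQLXML3]
"TemplateCacheSize"=dword:00000080
"SchemaCacheSize"=dword:00000080
"XSLCacheSize"=dword:00000080
```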

Always set the number of files to cache higher than the number of actual files. To see why, consider what happens behind the scenes. Suppose none of these registry keys has been set, and your application uses SQLXML extensively. A number of pages run queries against disparate XML views and then use XSLT stylesheets to transform the results for the presentation layer, with one stylesheet for each unique view. If the user base is larger than 50, we can easily exceed the default cache size of 31 for these stylesheets and queries. What happens when we've filled the cache with 31 compiled stylesheets and queries and try to cache the thirty-second? A percentage of the files is flushed from the cache, and the new stylesheet and queries, once compiled, are inserted, leaving a number of open positions in the cache. There is no way to control which files are flushed or when: the flushed files could be the least used, the most used, or a combination. When the cache fills to 31 files again and a thirty-second arrives, we flush again. On a Web site with a consistent user base, or one with peaks and valleys of use, this flushing can occur fairly often, and at the moments of highest usage. Churning isn't the behavior we want our users to experience. By increasing the cache size beyond the number of files we use, we avoid churning and achieve optimal performance.

A good way to identify which queries you perform often is to run SQL Server Profiler in a test environment. Have the end users exercise the application, performing the actions their jobs require. SQL Server Profiler has a number of predefined templates you can use to capture data. You can identify which queries are used most often and then create templates for those particular queries.

Keep in mind that cached files are stored in memory, so you'll consume more memory as the number of cached files increases. However, the memory footprint is not much larger than the file size itself, which is insignificant compared with the amount of RAM machines have today. As files become larger, memory consumption increases, but the growth is close to linear, so there should be no surprises.

Now that we've explained why caching is helpful, there are some things you must do to ensure that SQLXML actually caches the files. If there are queries that are used throughout the application, you should create templates for them; we'll go into more depth a little later in this paper. For mapping schemas and XSLT stylesheets, ADO provides a way (through extended properties of the Command object) to specify the file location. As we discuss later, this gives you all the caching functionality for XSLT stylesheets and mapping schemas with very little work. The more difficult case is templates. In short, you should use an object, like the ADO Stream object, to load these files, and then specify that stream as the CommandStream for your query. See the following example.

Dim cmd As ADODB.Command
Dim con As ADODB.Connection
Dim instrm As ADODB.Stream
Dim outstrm As ADODB.Stream

Set con = New ADODB.Connection
con.Open "provider=SQLXMLOLEDB;data provider=SQLOLEDB;" & _
         "integrated security=SSPI;server=(local);database=Northwind"

' Load the template through a Stream object so that SQLXML can cache it.
Set instrm = New ADODB.Stream
instrm.Open
instrm.LoadFromFile "d:\Schemas\SQLXMLWR3\myTemplate.xml" ' template name is illustrative

Set outstrm = New ADODB.Stream
outstrm.Open

Set cmd = New ADODB.Command
cmd.ActiveConnection = con
cmd.CommandStream = instrm
cmd.Dialect = "{5d531cb2-e6ed-11d2-b252-00c04f681b71}" ' XML template dialect
cmd.Properties("Output Stream").Value = outstrm
cmd.Properties("Base Path").Value = "d:\Schemas\SQLXMLWR3\"
cmd.Properties("Mapping Schema").Value = "mySchema.xml"
cmd.Properties("Output Encoding").Value = "utf-8"

cmd.Execute , , adExecuteStream

We'll go into more detail in the following sections as to why this needs to be done. Now all the files will be cached (Templates, Mapping Schemas, and XSLT Stylesheets).

SQLXML uses two values from the file to decide whether the file should be cached and, if it is already cached, whether the cached copy should be replaced with a newer version: 1) the file name and 2) the last modified date. If the object you use to load your file supports IStream, this information is available from the stream. The ADO Stream object implements IStream, so using it is fine. If you create your own object, it must implement IStream so that SQLXML can retrieve these values.

IStream has a particular method, Stat, that SQLXML calls to check the last modified date of the file; this is why you must implement IStream in any object you use to load your files. The last modified date obtained through Stat is checked against the date of the cached copy, and the file name is used as the key for each cached file. If either of these values cannot be ascertained, the file is not cached, and SQLXML loads directly from the stream it is handed, regardless of the settings you've specified in the registry or on the Advanced tab. If you simply specify a file location, or use an object that supports IStream, for your template, schema, or XSLT stylesheet, you'll pull from the cache.

This alone should, understandably, encourage you to cache your files. However, there are other reasons as well. If you use a template or a mapping schema, the resulting query is translated into an explicit mode (FOR XML EXPLICIT) query. We won't go into detail about how to write explicit mode queries. Consider the following query:

SELECT EmployeeID, FirstName, LastName FROM Employees FOR XML AUTO

The following mapping schema creates an XML view of this table exposing the three columns above:

<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'
            xmlns:sql='urn:schemas-microsoft-com:mapping-schema'>
  <xsd:element name='root' sql:is-constant='1'>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref='Employee'/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name='Employee' sql:relation='Employees'>
    <xsd:complexType>
      <xsd:attribute name='EmployeeID' type='xsd:integer'/>
      <xsd:attribute name='FirstName' type='xsd:string'/>
      <xsd:attribute name='LastName' type='xsd:string'/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

The resulting explicit mode query that SQLXML creates is shown below. It's longer and much more complicated than the simple query above. However, the mapping schema allows you to shape the XML into almost any format required, and to be as specific or as general as you need with your queries.

select 1 as TAG,0 as parent,0 as [root!1!_g_0!hide],NULL as
[Employee!2!FirstName],NULL as [Employee!2!EmployeeID],NULL as
[Employee!2!LastName] union all select 2,1,0,_Q2.A0,_Q2.A1,_Q2.A2 from
(select _QB0.FirstName AS A0,_QB0.EmployeeID AS A1,_QB0.LastName AS
A2,_QB0.LastName AS _TBXF3I,_QB0.EmployeeID AS _TBWD6I,_QB0.FirstName AS
_TBKQ5I from Employees _QB0) _Q2 order by 3,2,1

Once the queries become more complex (joining two or more tables and beyond), the explicit mode query becomes much longer and more time consuming to create. Writing these queries by hand every time the XML needs to be shaped is significant work that would be better spent designing the application or writing code.

With caching enabled for your mapping schemas and templates, these explicit mode queries are cached also. The SQLXML components don't have to walk the structure of the mapping schemas and build that explicit mode query again. This obviously improves the performance of your queries immensely.

Templates and Caching

As we noted earlier, your source for the command must be able to provide the file name and last modified date (as exposed by IStream through the Stat method) in order to take advantage of caching. For the input stream, it generally makes the most sense to use the ADO Stream object or some other object that implements IStream. So, in the sample code, we use the CommandStream property on the Command object to assign our templates for use in ADO. If we pass in the template using the CommandText property of the Command object instead, SQLXML cannot obtain the file name and last modified date, so the template is not cached. By loading templates into a Stream object, we control when these values are refreshed, and SQLXML simply queries the stream, in memory, to see whether it has changed. True, we load the template each time we need to use it, but the performance gain is much more significant than if we built dynamic templates. For mapping schemas and XSLT stylesheets, there are SQLXML OLE DB Provider-specific properties that point to the files; SQLXML can then query the files for the values it needs and cache the data and queries as appropriate.

One last note: we've mentioned the IStream interface, which you may or may not be familiar with. The ADO Stream object, the IIS Response object, and MSXML all implement IStream, so if you use one of these to store your files or the results of your query, you're safe. You could also write your own implementation of a stream in C# and use it as your output stream. The benefit of implementing your own stream is that you can do work while SQLXML is chunking the response to you: you can read each chunk as SQLXML sends it and process it immediately. With the ADO Stream object, by contrast, you must wait until SQLXML has written the entire response to the object. If you don't have a business need for chunked processing, use one of the objects mentioned above; you'll still see great performance, but the call will block until the entire response has been written.

Inserting/updating/deleting data

There are a number of ways to use Data Manipulation Language (DML) operations with SQLXML. For the most part, your updating needs will determine the objects you use for the task. There are three ways to accomplish these tasks with SQLXML: 1) Microsoft SQL Server XML Bulk Load (XML Bulk Load), 2) updategrams, and 3) OpenXML. Some of these methods are more flexible than others; others perform faster under particular conditions. Let's look at each in turn and discuss its pros and cons.

XML Bulk Load

XML Bulk Load is a COM object that allows you to load semistructured XML data into SQL Server. When you need to insert large amounts of data at once, XML Bulk Load is the most performant mechanism for the task; for smaller amounts of data, the options discussed below will be much faster. There are no hard and fast rules for size, but if your inserts begin to exceed 1 MB, use XML Bulk Load. Reserve it for simpler inserts, however: XML Bulk Load gives you no opportunity to apply business logic. The data simply gets inserted. There is no way to check, for example, that if X is here and Y is there, then insert Z. Also, you cannot update existing data using the XML Bulk Load object.

When using XML Bulk Load, you can choose whether the entire operation is transacted. For obvious reasons, the non-transacted approach is much faster. The problem is that if a failure occurs, you are not guaranteed a rollback (although partial rollbacks can happen), so some data may remain inserted. When you need to ensure that all or none of your data is inserted, use the transacted approach.
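To make this concrete, here is a minimal sketch of driving XML Bulk Load from Visual Basic. The schema, data, and log file names are hypothetical, the ProgID shown is the one commonly used for SQLXML 3.0, and the connection string follows the pattern used elsewhere in this paper; check all of these against your installation.

```vb
Dim objBL As Object

' Create the XML Bulk Load COM object (SQLXML 3.0 ProgID; verify on your machine).
Set objBL = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad.3.0")

objBL.ConnectionString = "provider=SQLOLEDB;server=(local);" & _
    "database=Northwind;integrated security=SSPI"

' Transaction = True gives all-or-nothing semantics, at a cost in speed.
objBL.Transaction = True
objBL.ErrorLogFile = "c:\bulkload_errors.xml"   ' hypothetical log location

' Mapping schema and data file names are hypothetical.
objBL.Execute "EmployeeSchema.xml", "EmployeeData.xml"

Set objBL = Nothing
```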


Updategrams

An updategram is an XML document that uses a mapping schema to perform updates, inserts, and deletes against a database. The format of the updategram and how to use it are covered in the SQLXML documentation. The following example updates data:

<ROOT xmlns:updg="urn:schemas-microsoft-com:xml-updategram">
  <updg:sync>
    <updg:before>
      <Employees EmployeeID="1" />
    </updg:before>
    <updg:after>
      <Employees LastName="Fuller" />
    </updg:after>
  </updg:sync>
</ROOT>
A full discussion of updategrams is outside the scope of this paper. The previous sample shows a simple update; deleting and inserting data is just as simple and readable. Updategrams perform simple operations fairly quickly. For more complex scenarios (multiple joins, or XML data that does not map directly to your database structure), create mapping schemas to insert, delete, and/or update your data. If you need to perform business logic, like confirming that Mike and Jane are available during a specific time before updating their schedules, updategrams are not the best approach.

For performance purposes, updategrams are very fast at updating small amounts of data, either all at once or in short bursts. If you are inserting large amounts of data all at once, use the XML Bulk Load object instead. Updategrams are translated into standard SQL DML statements to accomplish their work. Their only shortcoming is that you cannot apply business logic while performing these DML operations. That's where OpenXML comes in.


OpenXML

OpenXML achieves everything that updategrams can do and more. OpenXML is an extension to the Transact-SQL language that was introduced in SQL Server 2000. Performance differences are negligible in most cases and, as previously noted, design will drive your needs in this arena, not performance. OpenXML can be a bit more complicated to write than an updategram, and its syntax may appear strange if you've never seen it; SQL Server Books Online discusses the syntax of the OpenXML command. Once past the initial learning curve, though, you can do some very powerful things. As noted in the "Updategrams" section of this paper, updategrams cannot do any sort of consistency checking or business logic as you delete, insert, or update your data. So, for example, if you want to check that Jane and John aren't busy before you insert a meeting request, use OpenXML. You can perform queries in the OpenXML statement to check whether Jane and John already have a meeting scheduled. If they do, you can roll back the insert, set up a tentative request, and so on. Whatever you want to do as a result of the query, you can do in the OpenXML command. This can be very powerful.
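As a rough sketch (reusing the Employees table and columns from the earlier examples; in a real batch you would add your business-logic queries where the comment indicates), an OpenXML-based update might look like this:

```sql
DECLARE @hDoc int

-- Parse the incoming XML into an internal representation.
EXEC sp_xml_preparedocument @hDoc OUTPUT,
    N'<ROOT><Employee EmployeeID="1" LastName="Fuller"/></ROOT>'

-- Here you could first SELECT against other tables to apply business
-- logic (for example, checking for schedule conflicts) before updating.
UPDATE Employees
SET LastName = x.LastName
FROM OPENXML(@hDoc, '/ROOT/Employee', 1)
     WITH (EmployeeID int, LastName nvarchar(20)) AS x
WHERE Employees.EmployeeID = x.EmployeeID

-- Release the document handle.
EXEC sp_xml_removedocument @hDoc
```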

As we noted earlier, there are some general performance issues when discussing DML operations with SQLXML. However, in this case, design needs will outweigh performance requirements. Knowing the shortcomings of each method is very important as you choose the correct method to accomplish your task.

Retrieving BLOB values

How you retrieve BLOB values from SQL Server using SQLXML is driven more by your needs than by raw performance. There are some minor performance implications, discussed below, but depending on your architecture you'll use one method over another. There are two ways to retrieve BLOB values from SQL Server using SQLXML: 1) using the dbobject virtual name, and 2) Base64-encoding the data. The determining factor is whether you have access to the location that dbobject returns to you. If you do, dbobject is the method to use to retrieve the raw binary data, and it will be faster than returning a Base64-encoded value. However, if you do not have access to the location that dbobject returns, or you have a business need for Base64-encoded data, use the Base64 option.
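For example, with a dbobject virtual name configured on your virtual directory, a BLOB column can be fetched directly by URL. The server and virtual directory names below are hypothetical; the XPath-style selector follows the dbobject pattern documented for SQLXML:

```
http://myserver/nwind/dbobject/Employees[@EmployeeID='1']/@Photo
```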

Client-side compared with server-side

There are full-length books comparing client-side and server-side programming and the benefits of each; this paper does not go into detail on the merits of one model or the other. In SQLXML you can specify whether the shaping of your XML is done on the server side or the client side. If your server is experiencing peaks of usage and you'd like to offload some of the processing, you can use the client-side functionality in SQLXML. The code is very simple; you need only add one more line to your application:

cmd.Properties("ClientSideXML") = "True"

All the XML shaping will now be done on the client side: instead of the server streaming XML back to the client, a rowset is returned, and SQLXML then shapes the result into an XML stream on the client. This can improve the responsiveness of your server, especially if SQLXML is used extensively on your server and in your application. The SQLXML 3.0 documentation has a very good discussion comparing client-side and server-side processing. Most of the throughput gains you'll see will be reflected on the server.
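As a minimal sketch, continuing with the ADO Command object from the caching example earlier (the query text is illustrative):

```vb
' Ask SQLXML to shape the rowset into XML on the client instead of the server.
cmd.Properties("ClientSideXML") = "True"

' Client-side shaping understands FOR XML NESTED as well as other modes.
cmd.CommandText = "SELECT EmployeeID, FirstName, LastName " & _
                  "FROM Employees FOR XML NESTED"
cmd.Execute , , adExecuteStream
```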

SQL OLE DB compared with SQLXML OLE DB

If you're using ADO with SQLXML, you should always use the SQLXML OLE DB Provider to query your database. This provider has been optimized to stream XML data back from SQL Server 2000 and is faster than the SQL OLE DB Provider for shaping XML; with every query you execute, you'll see a performance gain over using the SQL OLE DB Provider alone. If you've used the Shape provider, the SQLXML OLE DB Provider will, in some respects, appear familiar. The only change needed to use it is the connection string, which would look like this:

"Provider=SQLXMLOLEDB;data provider=SQLOLEDB;integrated security=SSPI;server=(local);database=Northwind"

Currently the SQLXML OLE DB Provider can be used only with ADO. You'll be unable to use the provider with something like Data Transformation Services (DTS).

Finally, there are a number of miscellaneous points to consider as you work with SQLXML. If you are creating XML views of your data and you have key fields defined in your mapping schema, index the underlying columns in the data store. When a key field is defined in a mapping schema, SQLXML automatically performs a join and a sort on the data store, and an index speeds up the underlying SQL query significantly.
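For example, if EmployeeID is used as a key field in a mapping schema over the Employees table, an index on the underlying column helps the generated join and sort (the index name here is arbitrary):

```sql
CREATE INDEX IX_Employees_EmployeeID
    ON Employees (EmployeeID)
```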

Our findings

We've discussed best practices and why you should do X or Y, but the most interesting information is the findings from testing the sample application included with this white paper. First, the testing methodology: we used the Microsoft Web Application Stress Tool, a good tool for stress testing a Web application. Looking at our application and our customer base, we decided that most user actions would be simply viewing schedules and viewing contact information. In some situations individuals would be inserting, updating, or deleting data, but these would be a much smaller percentage of day-to-day activity. With this in mind, we broke the testing scripts into two distinct areas: 1) read-only access, and 2) read and write access with the distribution spread evenly. This let us measure the responsiveness of the application in the mainline scenario (reads) and in a mixed environment.

We ran each test three times, with 64 threads and the work spread across three clients. There were four scenarios we wanted to analyze, running the same test script against each one: 1) caching disabled and using SQLOLEDB, 2) caching disabled and using SQLXMLOLEDB, 3) caching enabled and using SQLOLEDB, and 4) all the performance enhancements turned on (SQLXMLOLEDB and all caching enabled). We ran each test for five minutes. We wanted to stress the application to a level that would never be reached in practice, to find out how well SQLXML would respond; most of the time our load will be fairly light, but every morning and evening the load increases as users check their schedules for today and tomorrow. The key measurement we looked at was requests per second, but we also took into account, among other numbers, ASP Requests Queued and % Processor Time. All of these can be added using the Performance Counters option in the Microsoft Web Application Stress Tool.

As you'll see below, the percentage gains in the two test areas are similar, but the actual number of requests processed differs considerably. Across the scenarios, the gains ranged from 10-15% greater throughput in some read-only cases up to 100% or more in the heavier updating scenarios.

Because our application is expected to handle mainly read-only operations (checking a schedule), we examined our schedule for a number of days and then looked at employee information. Let's treat the test with caching disabled and SQLOLEDB as our baseline: it has the lowest throughput of the test scenarios and is probably a fairly common configuration for most ADO developers. The performance numbers below use this test as the baseline for comparison.

Test scenario                               Percentage improvement (over baseline)
Caching disabled and using SQLXMLOLEDB      13%
Caching enabled and using SQLOLEDB          39%
Caching enabled and using SQLXMLOLEDB       110%

In our read scenarios, using all the performance suggestions discussed above increased the overall performance of our application by 110%. That's not just one page; that covers viewing our schedule and examining employee data.

Few applications are useful if the data is read-only; at some point the data will need to be altered. The customers who use this application will need to update their schedules and manage the users of the application. Adding, deleting, and updating contacts is a core part of the design. However, as noted above, updating is a much smaller percentage of the activity compared with the read-only scenario discussed above. Again, the test baseline is SQLOLEDB with caching disabled; the numbers in the following table are relative to it.

Test scenario                               Percentage improvement (over baseline)
Caching disabled and using SQLXMLOLEDB      25%
Caching enabled and using SQLOLEDB          38%
Caching enabled and using SQLXMLOLEDB       127%

In this test, performance across the full mixed workload improved even more than in the read-only test: with all the performance mechanisms in place, we saw a 127% performance gain for the total application.

In either scenario, the numbers bear out that using SQLXMLOLEDB and enabling all caching is key to increasing the performance of your SQLXML application. If performance is a requirement, review your application and implement these suggestions wherever possible. And remember, as you test your application, to tweak the implementation to find the best possible performance.


Conclusion

Throughout this paper, we've discussed ways to ensure your SQLXML application is optimized for the greatest possible performance. We hope this information helps you design your applications for performance and usability. As you look through your current SQLXML applications and begin to architect new ones, keep these tips in mind. By doing so, you'll not only choose the best method for your application, but you'll also squeeze the best performance out of SQLXML.
