XML Data (OLTP)---a Technical Reference Guide for Designing Mission-Critical OLTP Solutions

Want more guides like this one? Go to Technical Reference Guides for Designing Mission-Critical Solutions.

If your data is highly structured, the relational model is likely to work best for data storage. However if your data is semi-structured, unstructured, or of an unknown structure, XML is a likely a good choice if you want a platform-independent model to ensure the portability of the data by using structural and semantic markup. XML is also an appropriate option if some of the following conditions are satisfied:

  • Your data is sparse, you do not know the structure of the data, or the structure of your data might change significantly in the future.

  • Your data represents a containment hierarchy instead of references among entities, and might be recursive.

  • Order is inherent in your data.

  • You want to query the data or update parts of it, based on its structure.

If none of these conditions are met, you should use the relational data model. For example, if your data is in XML format but your application only uses the database to store and retrieve the data, an nvarchar column is sufficient. However, storing the data in an XML column can have additional benefits, including having the engine determine that the data is well formed or valid and supporting fine-grained queries and updates into the XML data.

Best Practices

The following resources provide additional information. (Note that the full URLs for the hyperlinked text are provided in the Appendix at the end of this document.)

  • See the white paper What's New for XML in SQL Server 2008?1 for a good description of the evolution of Microsoft SQL Server XML capabilities, from the introduction of XML-related capabilities in SQL Server 2000 to the enhancements included in SQL Server 2008.

  • For general guidance and an overview of the XML capabilities in SQL Server, you can refer to the white papers XML Support in Microsoft SQL Server 20052 and XML Options in Microsoft SQL Server 2005.3 Note that these white papers also apply to SQL Server 2008.

  • Sometimes it can be useful to store structured data in an XML data type column, such as when the data has a flexible structure or the structure is not initially known. In property management, for example, object metadata is modeled as XML and stored in an XML data type column. Properties of different types of objects, even properties with different structures and content models, can be stored in the same XML column and queried across. Properties queried most often are promoted into columns of the same table or a different table. The promoted properties can be indexed and queried, and query plans are simpler than querying the XML column.

    Alternatively, the incoming XML data can be decomposed into tables and queried with SQL. If XML generation is a significant part of the query workload, you might want to store a redundant copy of the XML data in an XML data type column. The redundant copy avoids the run-time cost of XML generation. However, be aware that the XML indexes can be quite large compared to the base data, and testing is often needed to determine if the benefits can offset the additional storage costs.

  • The white paper Performance Optimizations for the XML Data Type in SQL Server 20054 has useful best practices and tuning information, including the following:

    • Element and attribute values in untyped XML (XML data not described by XML schemas) are stored internally as Unicode strings. Operations on these values require data conversion to the appropriate type. A large number of such comparisons can become costly. Element and attribute values in typed XML are stored as typed values within XML instances. This allows XML binary large objects (BLOBs) to be parsed more efficiently than untyped XML. However, typed XML requires validation during XML data insertion and modification. The cost of validation might be non-trivial, and depends on factors such as the complexity of the schema definitions and the number of tags occurring in the XML data.

    • To optimize query performance on frequently accessed column values, "promote" these columns into relational columns and create an index. This optimization technique can reduce the complexity of XML query execution and improve performance compared to using XQuery on the XML column, in the same way that pre-computed values speed up query performance.

    • Several query and data modification best practices are available to reduce the costs of XQuery executions, such as choosing appropriate methods in the SELECT list or in a WHERE clause to perform specific operations and obtain faster results. You can find a list of these optimization possibilities in the "Query and Data Modification" section of the white paper.

    • While using generic element names and distinguishing between different types of elements by using additional attributes can be convenient at times, note that doing so can negatively impact query performance because it does not allow efficient XML index lookups. This is explained in the "Indexing XML Data" section of the white paper.

  • The article Data Types (SQL Azure Database)5 explains how Microsoft SQL Azure supports the XML data type. (At this time, we are not aware of any published case studies.)

Case Studies and References

Examples of successful architectures are described in the following case studies and white papers:

Questions and Considerations

This section provides questions and issues to consider when working with your customers.

  • You should have a very good understanding of the usage scenarios and the types of data before you decide if XML is beneficial or necessary. There are benefits and costs associated with XML, and these trade-offs require discussion with the customer and, potentially, testing.

  • If your data is highly structured, the relational model is likely to work best for data storage. On the other hand, if your data is semi-structured or unstructured, or if its structure is unknown, XML is likely a good choice for a platform-independent model that ensures portability of the data through structural and semantic markup. Additionally, XML an appropriate option if some of the properties listed in the introductory section are satisfied.

  • To our knowledge, most XML-based deployments use promotion of selected columns to relational columns and use relational indexes for efficient retrieval. This usage pattern should be given serious consideration.

  • Consider the cost of XML indexes before you decide to use them extensively. We suggest that you test with a small amount of typical data and then project the results to full volumes. Also ensure that you use the correct type of XML index. In addition to primary XML indexes, SQL Server supports three types of secondary XML index that can improve the performance of specific types of XQuery queries.

Appendix

Following are the full URLs for the hyperlinked text.

1 What's New for XML in SQL Server 2008https://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-whats-new-xml.aspx

2 XML Support in Microsoft SQL Server 2005https://msdn.microsoft.com/en-US/library/ms345117(v=SQL.90).aspx

3 XML Options in Microsoft SQL Server 2005https://msdn.microsoft.com/en-US/library/ms345110(v=SQL.90).aspx

4 Performance Optimizations for the XML Data Type in SQL Server 2005https://msdn.microsoft.com/en-US/library/ms345118(v=SQL.90).aspx

5 Data Types (SQL Azure Database)https://msdn.microsoft.com/en-us/library/ee336233.aspx

6 North Shore Credit Union: NSCU banks on deeper business insight with IT help from Microsoft and TEMENOS T24www.microsoft.com/Canada/casestudies/Case_Study_Detail.aspx?casestudyid=4000006663

7 Austrian Broadcast Corporation (ORF): Austrian Broadcaster Gains High Availability with SQL Server 2005 Database Mirroringwww.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=48607