
What's New in System.Xml for Visual Studio 2005 and the .NET Framework 2.0 Release

 

Mark Fussell, Lead Program Manager
Microsoft Corporation

March 2004

Summary: This first in a series of articles by Mark Fussell details the improvements to the XML APIs in System.Xml and the .NET Framework. These enable you to further enhance the XML support in your applications. (32 printed pages)

Contents

Introduction
Design Goals for System.Xml Version 2.0
The System.Xml Version 2.0 Top Ten Features
10) Static Creation Methods on XmlReader and XmlWriter
9) XML Standards Support by Default
8) Universal Type Support and Conversion
7) XmlReader and XmlWriter Usability
6) The XQuery Language
5) Security
4) Easier XPath Queries with Namespaces
3) The XPathDocument as a Better DOM
2) XPathEditableNavigator, an Updatable Cursor
1) Performance
Conclusion

Introduction

The Microsoft Developer Tools Roadmap 2003-2005 describes the innovations and enhancements within the Visual Studio 2005 and .NET Framework Version V2.0 release (formerly referred to as "Visual Studio codename Whidbey"). Within this roadmap is an overview of the data access support in ADO.NET, covered by the System.Data and System.Xml namespaces. This series of articles provides an in-depth review of the forthcoming V2.0 release for the System.Xml namespace and the XML support in ADO.NET. This first article concentrates on the core XML classes that allow you to read, write, store, manipulate, and query XML. Future articles will focus on:

  • Querying and aggregation of data sources such as files and SQL Server databases with the XQuery language.
  • The XML enhancements in ADO.NET to support the forthcoming SQL Server 2005 Beta 2, formerly referred to as "Yukon."
  • The improvements to the XmlSerializer for serializing between XML and objects.

Design Goals for System.Xml Version 2.0

The System.Xml namespace in the .NET Framework is a set of XML classes that allow you to build XML support into your applications. It consists of a number of essential classes that enable you to read, write, manipulate, and transform XML. Some form of XML manipulation is inevitable in application development today, and all developers need to have an understanding of the core XML classes. The version 1 (V1) release of System.Xml in the .NET Framework had the following design goals:

Standards compliance: Support for the major W3C XML standards that provide cross-platform interoperability, such as XML 1.0, XML Namespaces 1.0, XSLT 1.0, XPath 1.0, and W3C XML Schema 1.0.

Usability: An easy and intuitive API design.

Integration with ADO.NET: The System.Xml classes can really be considered part of ADO.NET as an XML data access API. It was a goal to provide a seamless experience when moving between XML and relational data. The DataSet class is the canonical example here.

Extensibility: This was achieved through the use of abstract classes to define the XML API and the ability to be able to plug together these classes. The significant abstract classes that provide an extensible model are the XmlReader, the XmlWriter, and the XPathNavigator. The latter is of particular significance since it combines a random access, cursor-style API with an XPath query engine.

The V1 design goals above continue to apply in driving the V2.0 release; however, further design goals were added in the V2.0 release around the requirements needed to build a strong set of XML components for the WinFX APIs.

Significant Performance Improvement: This was the number one requirement for the V2.0 release. XML is low in the processing stack and performance gains here have a ripple effect through the rest of the application.

Usability enhancements: These enhancements make common tasks even easier to do in fewer lines of code, with support for a greater range of features to broaden the reach and capability of the applications.

Querying Data Sources: This is standards support for the W3C XQuery language for querying XML, with the ability to aggregate data from different data sources such as SQL Server databases and local XML files.

Enhanced XML Schema Support and Typing: This provides a set of XML type-aware APIs. Virtually every XML API that ships today is untyped in that the data is both stored and surfaced only as string types (such as the DOM API). Integrating schema information deeply across the System.Xml APIs not only provides support for type-aware query languages such as XQuery, but provides for more efficient storage, improved performance, and better integration with the .NET programming languages.

The System.Xml Version 2.0 Top Ten Features

Based on these guiding design goals, the Webdata XML product team set about creating the best user experience for working with XML as a data access technology. The V1 release of System.Xml contained a wealth of innovation from the simple-to-use pull-model XmlReader and push-model XmlWriter classes, to the industry-first cursor model API, the XPathNavigator, XML Schema validation, and a performant read-only XML store for XSLT based upon the XPath data model, the XPathDocument.

Before we look at code examples using the new features, let's look at the XML document and schema that we will use throughout this article. The XML document is an example bookstore inventory.

<?xml version="1.0" encoding="utf-8"?>
<!-- This file represents a fragment of a book store inventory -->
<bookstore xmlns="http://example.books.com">
  <book genre="autobiography" publicationdate="1981" ISBN="1-861003-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <date>09-06-1956</date>
    <samplechapters>1 3 4</samplechapters>
    <price alternative="discount">5.99</price>
    <price>8.99</price>
  </book>
  <book genre="autobiography" publicationdate="1972" ISBN="0399105573">
    <title>The Moon's a Balloon</title>
    <author>
      <first-name>David</first-name>
      <last-name>Niven</last-name>
    </author>
    <date>09-06-1974</date>
    <samplechapters>4 5</samplechapters>
    <price alternative="discount">1.94</price>
    <price>2.57</price>
  </book>
  <book genre="novel" publicationdate="1967" ISBN="0-201-63361-2">
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <date>08-04-1967</date>
    <samplechapters>1 2 3</samplechapters>
    <price alternative="discount">7.99</price>
    <price>11.99</price>
  </book>
  <book genre="philosophy" publicationdate="1991" ISBN="1-861001-57-6">
    <title>The Gorgias</title>
    <author>
      <name>Plato</name>
    </author>
    <date>10-02-1997</date>
    <samplechapters>1 9</samplechapters>
    <price alternative="discount">5.99</price>
    <price>9.99</price>
  </book>
</bookstore>

The associated XML Schema is used to provide validation and types for the elements and attributes for the XML document.

<?xml version="1.0" encoding="utf-16"?>
<xs:schema xmlns:tns="http://example.books.com" xmlns="http://example.books.com"
 attributeFormDefault="unqualified" elementFormDefault="qualified"
 targetNamespace="http://example.books.com" xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:simpleType name="listofsamplechapters">
  <xs:list itemType="xs:string"/>
  </xs:simpleType>

  <xs:element name="bookstore">
    <xs:complexType>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" name="book">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="title" type="xs:string" />
              <xs:element name="author">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element minOccurs="0" name="name" type="xs:string" />
                    <xs:element minOccurs="0" name="first-name" type="xs:string" />
                    <xs:element minOccurs="0" name="last-name" type="xs:string" />
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <xs:element minOccurs="0" name="date" type="xs:string" />
              <xs:element minOccurs="0" name="samplechapters" type="listofsamplechapters" />
              <xs:element maxOccurs="unbounded" name="price">
                <xs:complexType>
                  <xs:simpleContent>
                    <xs:extension base="xs:decimal">
                      <xs:attribute name="alternative" type="xs:string" use="optional" />
                    </xs:extension>
                  </xs:simpleContent>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="genre" type="xs:string" use="required" />
            <xs:attribute name="publicationdate" type="xs:string" use="required" />
            <xs:attribute name="ISBN" type="xs:string" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

In order to keep you in suspense throughout this overview I am going to do the top ten feature count down for System.Xml as voted by me, my dog, and the bearded dragon in the aquarium staring at me. (For those of you who do not know, bearded dragons are large Australian lizards!)

10) Static Creation Methods on XmlReader and XmlWriter

Starting at number 10, there are now static methods on the XmlReader and XmlWriter classes, which in System.Xml V2.0 should be used in preference to the subclasses of XmlReader and XmlWriter, the XmlTextReader and XmlTextWriter classes respectively. The reasons for introducing these new methods:

  • To provide easier and more flexible configuration, without creating a proliferation of XmlReader and XmlWriter implementations with the need to understand when to use which (such as creating an XmlConformantReader, XsdValidatingReader, and so forth).
  • To provide specialized internal optimizations depending on the chosen settings. Since concrete instances are hidden, it is possible to better optimize for particular settings. For example, based on the encoding type an XmlWriter instance can be generated that writes UTF-8 files more efficiently than UTF-16.
  • To provide the ability to pipeline XmlReader or XmlWriter instances together, thereby layering new feature support on top, such as validation or whitespace stripping.

The example code below creates an XmlReader to read the books.xml file.

XmlReader r = XmlReader.Create("books.xml");

When using these static methods, the options for the type of XmlReader or XmlWriter to create are supplied via the XmlReaderSettings and the XmlWriterSettings classes. For example, the following code is used to create an XmlReader that performs validation and strips insignificant whitespace from the document at the same time.

XmlReaderSettings settings = new XmlReaderSettings();
settings.Schemas.Add("books.xsd");
settings.XsdValidate = true;
settings.IgnoreWhitespace = true;
XmlReader reader = XmlReader.Create("books.xml",settings);
while(reader.Read()) {}

The same approach applies to create an XmlWriter class.

XmlWriter writer = XmlWriter.Create(@"c:\output.xml");

With the XmlWriterSettings class you can specify formatting options, such as writing each attribute on its own line. The code example below shows how to indent elements and place attributes on new lines.

XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent=true;
settings.NewLineOnAttributes=true;
XmlWriter writer = XmlWriter.Create(@"c:\output.xml",settings);
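To see the effect of these settings without touching the file system, the following is a minimal sketch that writes to a StringBuilder instead of c:\output.xml. The class name WriterFormattingSketch, the sample book values, and the use of OmitXmlDeclaration are assumptions made for illustration.

```csharp
using System;
using System.Text;
using System.Xml;

public static class WriterFormattingSketch
{
    // Writes one <book> element to a string, with or without attributes on new lines.
    public static string WriteBook(bool newLineOnAttributes)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.NewLineOnAttributes = newLineOnAttributes;
        settings.OmitXmlDeclaration = true; // assumption: keep the output to the element itself

        StringBuilder output = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("book");
            writer.WriteAttributeString("genre", "novel");
            writer.WriteAttributeString("ISBN", "0-201-63361-2");
            writer.WriteElementString("title", "The Confidence Man");
            writer.WriteEndElement();
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WriteBook(false));
        Console.WriteLine();
        Console.WriteLine(WriteBook(true));
    }
}
```

Comparing the two outputs side by side shows that only the attribute layout changes; the element content is identical.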

A key scenario for the settings classes is reuse: a single settings object can be used to create multiple XmlReaders and XmlWriters, potentially on multiple threads. For example, the pages in an ASP.NET application can share settings that are stored in the Application state.
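As a rough illustration of reuse, the sketch below creates several readers from one XmlReaderSettings object and counts the nodes each reader reports. The class and method names (SettingsReuseSketch, CountNodes) and the in-memory sample documents are hypothetical; only IgnoreWhitespace is set so that the effect is easy to observe.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SettingsReuseSketch
{
    // Counts nodes of the given type produced by a reader built from the supplied settings.
    public static int CountNodes(string xml, XmlReaderSettings settings, XmlNodeType type)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == type) count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        // One settings object, reused for every reader created below.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;

        string first = "<bookstore>\n  <book/>\n  <book/>\n</bookstore>";
        string second = "<bookstore>\n  <book/>\n</bookstore>";

        Console.WriteLine(CountNodes(first, settings, XmlNodeType.Element));
        Console.WriteLine(CountNodes(first, settings, XmlNodeType.Whitespace));
        Console.WriteLine(CountNodes(second, settings, XmlNodeType.Element));
    }
}
```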

9) XML Standards Support by Default

The XmlTextReader and XmlTextWriter are not conformant to the XML 1.0 specification by default. The introduction of the static Create methods for the XmlReader and XmlWriter classes provides an ability to enforce standards conformance to a higher level. For instance, one of the difficulties of the XmlTextReader was that it had no DTD support and was unable to resolve entity references in the document. In order to do this, one had to use the XmlValidatingReader class with the correct settings enabled; the functionality was split across separate classes. The XmlTextReader in V2.0 now supports DTDs and hence the ability to resolve entity references, making it a conformant XML parser.

The XmlReader and XmlWriter implementations returned by the static Create methods are conformant by default. The XmlReader created by default has a ConformanceLevel set to "Document," meaning that it attempts to read the XML as a document. You can also set the ConformanceLevel to "Auto," meaning that it automatically attempts to either read the XML as a document or a fragment depending on the type of nodes encountered. The following code creates an XmlReader that performs DTD validation and sets the conformance level to Auto.

XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Auto;
settings.DtdValidate = true;
XmlReader reader = XmlReader.Create("books.xml", settings);
while(reader.Read()) {}
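The difference between conformance levels can be sketched with input that has two top-level elements. This example assumes the ConformanceLevel.Fragment value, which states the fragment case explicitly rather than relying on Auto detection; the class name ConformanceSketch and the sample input are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ConformanceSketch
{
    // Reads the input at the requested conformance level and counts element nodes.
    public static int CountElements(string xml, ConformanceLevel level)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ConformanceLevel = level;

        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element) count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        string fragment = "<book/><book/>"; // two roots: a fragment, not a document

        Console.WriteLine(CountElements(fragment, ConformanceLevel.Fragment));
        try
        {
            CountElements(fragment, ConformanceLevel.Document);
        }
        catch (XmlException e)
        {
            Console.WriteLine("Rejected as a document: " + e.Message);
        }
    }
}
```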

A conformance issue when using the XmlTextWriter was that it did not check the element and attribute name characters for invalid characters. For example, it is possible to write out a name with a space in it, which is illegal according to the XML 1.0 specification. All XmlWriters created via the XmlWriter static Create methods now check for invalid names and by default they also check the content characters to make sure they are valid according to XML 1.0.
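The name check can be observed with a small sketch. It assumes that a writer returned by XmlWriter.Create reports an invalid element name by throwing an ArgumentException; the helper name TryWriteElement is hypothetical.

```csharp
using System;
using System.Text;
using System.Xml;

public static class NameCheckSketch
{
    // Returns true if the writer accepts the element name, false if it rejects it.
    public static bool TryWriteElement(string name)
    {
        StringBuilder output = new StringBuilder();
        try
        {
            using (XmlWriter writer = XmlWriter.Create(output))
            {
                writer.WriteStartElement(name);
                writer.WriteEndElement();
            }
            return true;
        }
        catch (ArgumentException)
        {
            // Assumption: invalid names surface as ArgumentException.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryWriteElement("book"));
        Console.WriteLine(TryWriteElement("book title")); // space is not a legal name character
    }
}
```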

It is possible to layer settings onto an existing XmlReader or XmlWriter class to provide additional functionality. This "pipelining" approach allows you to chain together XmlReaders over XmlReaders or XmlWriters over XmlWriters. The following example shows an XmlNodeReader returned from the first book element found in the books.xml document that then has XML schema validation support layered on top while reading.

XmlDocument doc = new XmlDocument();
doc.Load("books.xml");
XmlNode node = doc.SelectSingleNode("//book");
XmlNodeReader nodereader = new XmlNodeReader(node);

XmlReaderSettings settings = new XmlReaderSettings();
settings.Schemas.Add("http://example.books.com", "books.xsd");
settings.XsdValidate = true;
XmlReader reader = XmlReader.Create(nodereader, settings);
while(reader.Read()) {}

This approach is additive in that functionality cannot be removed; thus, once an XmlReader is created as a validating reader it cannot have that validation removed by layering a nonvalidating XmlReader on top. For those of you who encountered the V1 constraint that XSD validation with the XmlValidatingReader class could only be performed on an XmlTextReader class, you can sleep peacefully at last! However, it is worth noting that DTD validation only works on an XmlTextReader or the XmlReader from the XmlReader.Create method, not any XmlReader.
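The layering idea can be sketched end to end with in-memory data. This version assumes the ValidationType and ValidationEventHandler members of XmlReaderSettings as they appear in the released .NET Framework 2.0, which differ from the XsdValidate property shown in this article; the one-element schema and the class name PipelineValidationSketch are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class PipelineValidationSketch
{
    // Hypothetical schema: a single <price> element of type xs:decimal.
    const string SchemaText =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='price' type='xs:decimal'/>" +
        "</xs:schema>";

    // Layers schema validation over an XmlNodeReader and counts the errors reported.
    public static int CountValidationErrors(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(SchemaText)));

        int errors = 0;
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            if (e.Severity == XmlSeverityType.Error) errors++;
        };

        using (XmlReader inner = new XmlNodeReader(doc))
        using (XmlReader outer = XmlReader.Create(inner, settings))
        {
            while (outer.Read()) { }
        }
        return errors;
    }

    public static void Main()
    {
        Console.WriteLine(CountValidationErrors("<price>8.99</price>"));
        Console.WriteLine(CountValidationErrors("<price>not a number</price>"));
    }
}
```

Registering a handler means validation problems are counted rather than thrown, which suits a sketch that only wants to report whether the inner reader's content is valid.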

As another example, it is possible to perform DTD processing followed by XML Schema processing in a pipeline scenario between XmlReaders. The following code example shows this.

XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdValidate = true;
// DTD Validation performed
XmlReader innerreader = XmlReader.Create("books.xml", settings);
settings.Schemas.Add("http://example.books.com", "books.xsd");
settings.DtdValidate = false;
settings.XsdValidate = true;
// XML Schema Validation performed
XmlReader outerreader = XmlReader.Create(innerreader, settings);

8) Universal Type Support and Conversion

Support for XQuery as a strongly typed language has a pervasive effect across the System.Xml APIs and is the primary reason why the XPathNavigator class now surfaces XML Schema type information. As a result, the XmlReader, XmlWriter, and XPathNavigator classes have all become type aware in that they are able to surface underlying XML Schema types and support conversion between XML Schema types and CLR types. It was possible to do this in the V1 release via the XmlValidatingReader and the XmlConvert helper class, but the difference now is that any XmlReader, XmlWriter, and XPathNavigator can report and convert type information.

It is important to note that the common data model supported by these classes is the XQuery 1.0 and XPath 2.0 Data Model, which although based upon the XML Infoset specification, extends it with XML Schema type information and the ability to support collections of documents and complex values, called sequences. A sequence is an ordered collection of nodes, atomic values, or any mixture of nodes and atomic values. Remember that a data model specifies the information in the document that is accessible, but not the APIs to represent or access the data. Neither does it describe how the data is written (serialized), which is the role of the XML 1.0 specification.

In V1 it was possible to use the XmlConvert class to convert the untyped value of an XML node to a CLR type. For example, the following code converts the string value to a CLR Double type.

XmlReader reader = XmlReader.Create("books.xml");
Double orderTotal = 0.0;
while (reader.Read())
{
   if (reader.IsStartElement() && reader.Name == "price")
      orderTotal += XmlConvert.ToDouble(reader.ReadElementString());
}

In V2.0, using the ReadValueAsXXX methods on the XmlReader class, the value of an element can be read and converted to a CLR value in a single method call, as shown in the following code.

while (reader.Read())
{
   if (reader.IsStartElement() && reader.Name == "price")
      orderTotal += reader.ReadValueAsDouble();
}
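For comparison, here is a self-contained sketch of a single-call typed read. It assumes the ReadElementContentAsDecimal method name from the released .NET Framework 2.0 rather than the ReadValueAsXXX names used in this article, and it reads from an in-memory string; the class name TypedReadSketch is hypothetical. Because a typed read leaves the reader positioned after the end tag, the loop only calls Read when it has not just consumed an element.

```csharp
using System;
using System.IO;
using System.Xml;

public static class TypedReadSketch
{
    // Sums every <price> element in the input using a typed read.
    public static decimal SumPrices(string xml)
    {
        decimal total = 0m;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "price")
                {
                    // Reads the text, converts it, and moves past the end tag.
                    total += reader.ReadElementContentAsDecimal();
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return total;
    }

    public static void Main()
    {
        string xml =
            "<book><price alternative='discount'>5.99</price><price>8.99</price></book>";
        Console.WriteLine(SumPrices(xml));
    }
}
```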

These typed reads can also be applied to attribute values. While this is not immediately a big difference, the range of supported types now extends to collections. If the content of the element is separated by spaces, then this can be read into an array of values. For example, given this XML element fragment for a list of sample book chapters,

<samplechapters>1 3 4</samplechapters>

the following code reads the content into an array of integers:

while (reader.Read())
{
   if (reader.IsStartElement() && reader.Name == "samplechapters")
   {
      int[] values = (int[])reader.ReadValueAs(typeof(int[]));
      foreach (int i in values)
         Console.WriteLine(i);
   }
}
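The conversion that such a list read performs can be approximated by hand. The sketch below is a manual equivalent, not the ReadValueAs implementation: it reads the element text, splits it on whitespace, and converts each token with XmlConvert. The names ListReadSketch and ReadIntList are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ListReadSketch
{
    // Reads the first element with the given name and converts its
    // whitespace-separated content into an array of integers.
    public static int[] ReadIntList(string xml, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            if (!reader.ReadToFollowing(elementName))
            {
                return new int[0];
            }

            string text = reader.ReadElementContentAsString();
            string[] tokens = text.Split(
                new char[] { ' ', '\t', '\r', '\n' },
                StringSplitOptions.RemoveEmptyEntries);

            int[] values = new int[tokens.Length];
            for (int i = 0; i < tokens.Length; i++)
            {
                values[i] = XmlConvert.ToInt32(tokens[i]);
            }
            return values;
        }
    }

    public static void Main()
    {
        foreach (int chapter in ReadIntList("<samplechapters>1 3 4</samplechapters>", "samplechapters"))
        {
            Console.WriteLine(chapter);
        }
    }
}
```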

Up until now, all the values that we have been reading have been untyped. That is, the value is read and stored as a Unicode string value, which can then be coerced into a CLR value. If an XML schema is associated with the XML via a target namespace, then once the type has been read, simple types can be stored in their "native" CLR representation. Thus, xs:int types are stored as CLR int types and xs:dateTime types are stored as CLR DateTime types. In the code example below, when a validating XmlReader is created with an associated schema, the price element values are stored as double types when parsed from the file stream, and the samplechapters element values are returned as a collection type, since it is an xs:list type in the schema.

XmlReaderSettings settings = new XmlReaderSettings();
settings.Schemas.Add("http://example.books.com", "books.xsd");
settings.XsdValidate = true;
XmlReader reader = XmlReader.Create("books.xml", settings);

while (reader.Read())
{
   if (reader.IsStartElement() && reader.Name == "price")
   {
      double price = reader.ReadValueAsDouble();
      Console.WriteLine(price);
   }
   if (reader.IsStartElement() && reader.Name == "samplechapters")
   {
      foreach (object o in reader.ReadValueAsList())
         Console.WriteLine(o);
   }
}

We will see how the type support, now built across the System.Xml namespace, becomes especially relevant when we look at the improvements to the XPathDocument class, when used for validation, writing business rules, and performance.

The XmlWriter also has the ability to convert CLR types to schema types when writing out XML. The following example uses the WriteValue methods to write CLR values that could have been generated with business logic in the application, such as the book price, publicationdate, and ISBN.

Double price = 9.99;
DateTime publicationdate = new DateTime(2003,3,17);
String isbn = "1-756-345-232";

XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.NewLineOnAttributes = true;

using (XmlWriter writer = XmlWriter.Create("output.xml", settings))
{
   writer.WriteStartDocument();
   writer.WriteStartElement("bookstore");
      writer.WriteStartElement("book");
      writer.WriteStartAttribute("publicationdate");
      writer.WriteValue(publicationdate);
      writer.WriteEndAttribute();
      writer.WriteStartAttribute("ISBN");
         writer.WriteValue(isbn);
      writer.WriteEndAttribute();
      writer.WriteElementString("title", "System.Xml V2.0 Review");
      writer.WriteStartElement("price");
         writer.WriteValue(price);
      writer.WriteEndElement(); //price
      writer.WriteEndElement(); //book
   writer.WriteEndElement(); //bookstore
} 

This example writes the XML document shown below. Note that the attributes are each on their own new line as a result of setting the NewLineOnAttributes property on the XmlWriterSettings class, which helps with readability.

<?xml version="1.0" encoding="utf-8"?>
<bookstore>
  <book
    publicationdate="2003-03-17T00:00:00.0000000"
    ISBN="1-756-345-232">
    <title>System.Xml V2.0 Review</title>
    <price>9.99E0</price>
  </book>
</bookstore>

7) XmlReader and XmlWriter Usability

There are a number of new helper methods that complement existing methods on the XmlReader and XmlWriter classes to simplify reading and writing XML.

The new helper methods on the XmlReader are:

  • XmlReader ReadSubtree() This method returns a new XmlReader that reads the current node and all of its descendants; the Read method of the new reader returns false once the entire subtree has been read. It's as if you "snapped" an XmlReader off from the current reader. This method is ideal in those recursive descent scenarios when reading XML. How many times have you had another sub-function to which you have passed the current XmlReader, in order to do some specific processing? The difficulty in the V1 release is that the sub-function had to keep track of the state of the reader through a combination of looking for a named EndElement node type and keeping track of the XmlReader depth. Invariably you got it wrong at the first attempt.
  • bool ReadToDescendant(string qname) This can be considered the same as an XPath descendant query from the current node, or the ".//" query expression. It provides an easy way to find a named element by moving the XmlReader to the next descendant element that matches the specified name, and returns true if a match is found.
  • bool ReadToNextSibling(string qname) This method advances the XmlReader to the next sibling element which matches the specified name, and returns true if a match is found. For example, the following code uses this method and the previous two to split a single file into multiple documents, one for each book in the bookstore.
    using (XmlReader reader = XmlReader.Create("books.xml"))
    {
       reader.MoveToContent();
       if (reader.ReadToDescendant("book"))
       {
          do
          {
             using (XmlWriter writer = XmlWriter.Create(@"bookstore\" + reader["ISBN"] + ".xml"))
                writer.WriteNode(reader.ReadSubtree(), false);
          } while (reader.ReadToNextSibling("book"));
       }
    }
    
    

    The code example above also shows the use of the C# "using" keyword, which is now supported on both the XmlReader and the XmlWriter classes. At the end of the "using" block the Dispose method is called, which in turn calls the Close method. This closes the XmlReader or XmlWriter and frees all resources associated with it, so you do not have to remember to call Close yourself. This is especially useful when handling exceptions thrown by the XmlWriter, where forgetting to call Close is a common bug. (As a side note, the SqlConnection class in ADO.NET also supports the "using" keyword in V2.0 of the .NET Framework, which closes the connection.)

  • object ReadAsObject(System.Type type) This method melds together the XmlReader and the XmlSerializer classes and enables you to surface CLR classes directly from the XmlReader stream. For example, given a class Book defined as
    public class Book
    {
       public string title;
       public double price;
    }
    
    

    the following code creates book objects directly from the XML stream to be handed to another function for processing. This method is useful where you have islands of structured data in an XML document that you want to turn directly into CLR objects.

    if (reader.ReadToDescendant("book"))
    {
       do
       {
          Book book = (Book)(reader.ReadAsObject(typeof(Book)));
          ProcessBook(book); // Do some processing on the book object
       } while (reader.ReadToNextSibling("book"));
    }
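The splitting pattern built from ReadToDescendant, ReadSubtree, and ReadToNextSibling can also be tried without files. The following is a minimal sketch that assumes in-memory strings in place of books.xml and the bookstore output folder; the class name SplitBooksSketch and the sample input are hypothetical. Each subtree reader is closed before the outer reader moves on, which leaves the outer reader at the end of the book element it has just handed out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class SplitBooksSketch
{
    // Returns one serialized <book> element per ISBN attribute value.
    public static Dictionary<string, string> Split(string xml)
    {
        Dictionary<string, string> books = new Dictionary<string, string>();
        XmlWriterSettings writerSettings = new XmlWriterSettings();
        writerSettings.OmitXmlDeclaration = true;

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            if (reader.ReadToDescendant("book"))
            {
                do
                {
                    string isbn = reader.GetAttribute("ISBN");
                    StringBuilder output = new StringBuilder();

                    // The subtree reader sees only this <book> and its descendants.
                    using (XmlReader subtree = reader.ReadSubtree())
                    using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
                    {
                        writer.WriteNode(subtree, false);
                    }
                    books[isbn] = output.ToString();
                } while (reader.ReadToNextSibling("book"));
            }
        }
        return books;
    }

    public static void Main()
    {
        string xml =
            "<bookstore>" +
            "<book ISBN='1-861003-11-0'><title>The Autobiography of Benjamin Franklin</title></book>" +
            "<book ISBN='0399105573'><title>The Moon's a Balloon</title></book>" +
            "</bookstore>";

        foreach (KeyValuePair<string, string> entry in Split(xml))
        {
            Console.WriteLine(entry.Key + ": " + entry.Value);
        }
    }
}
```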
    
    

The new helper methods on the XmlWriter are:

  • void WriteNode(XPathNavigator navigator, bool defattr) This overload is useful for serializing out XPathNavigators in the same way that an XmlReader can be serialized today. For example, the code below is the XPathDocument equivalent to the XmlReader code shown above to split a single file into multiple documents.
    XPathDocument doc = new XPathDocument("books.xml");
    XPathNodeIterator nodes = doc.SelectNodes("//book");
    foreach (XPathNavigator node in nodes)
    {
       string fileName = @"bookstore\" + node.SelectSingleNode("@ISBN").Value + ".xml";
       using (XmlWriter writer = XmlWriter.Create(fileName))
       {
          writer.WriteNode(node,false);
       }
    }
    
    
  • void WriteFromObject(object value) This complements the ReadAsObject() method to invoke the XmlSerializer on the XmlWriter in order to serialize a CLR object to the XML stream. For example, after the Book object has been processed it can be written using the following code.
    Book book = (Book)(reader.ReadAsObject(typeof(Book)));
    ProcessBook(book); // Do some processing on the book object
    writer.WriteFromObject(book);
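The same round trip between an XML stream and a CLR object can be sketched by calling the XmlSerializer class directly over an XmlReader and an XmlWriter. This is a swapped-in technique, not the ReadAsObject and WriteFromObject helpers themselves; the Book class mirrors the one defined earlier, while SerializerSketch and the sample values are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public class Book
{
    public string title;
    public double price;
}

public static class SerializerSketch
{
    // Builds a Book from the element the reader is positioned on or about to read.
    public static Book ReadBook(XmlReader reader)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Book));
        return (Book)serializer.Deserialize(reader);
    }

    // Serializes a Book through an XmlWriter and returns the resulting XML text.
    public static string WriteBook(Book book)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Book));
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;

        StringBuilder output = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            serializer.Serialize(writer, book);
        }
        return output.ToString();
    }

    public static void Main()
    {
        string xml = "<Book><title>The Gorgias</title><price>9.99</price></Book>";
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            Book book = ReadBook(reader);
            Console.WriteLine(book.title);
            Console.WriteLine(WriteBook(book));
        }
    }
}
```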
    
    

As a final, more complete example of combining many of these XmlReader and XmlWriter helper methods, the following code generates an HTML page from an RSS feed, with a hyperlink and a description for each item found within the RSS channel. The interesting aspect of this example is the use of the ReadSubtree method to supply the reader passed to the subfunctions, which greatly simplifies the code and makes it less error-prone.

using System;
using System.IO;
using System.Xml;
using System.Net;
/// <summary>
/// This class takes an RSS feed and generates an XHTML page.
/// Big caveat. This does not work on all RSS feeds.
/// </summary>
public class RSSReader 
{
   public static void Main(string [] args) 
   {
      // create an instance of RSSReader
      RSSReader rssreader = new RSSReader();
      try 
      {
         string url = "http://msdn.microsoft.com/rss.xml";
         using (XmlWriter writer = XmlWriter.Create("output.html"))
         {
            WebClient wc = new WebClient();
            byte[] data = wc.DownloadData(url);
            MemoryStream stream = new MemoryStream(data);
            using (XmlReader reader = XmlReader.Create(stream, String.Empty))
            {
               reader.MoveToContent();
               rssreader.RSSToHtml(reader, writer);
            } // reader gets automatically closed here. No Close() needed 
         } // writer gets automatically closed here. No Close() needed .
      } 
      catch (XmlException e) 
      {
         Console.WriteLine(e.Message);
      }
   }

   public void RSSToHtml(XmlReader reader, XmlWriter writer)
   {
      writer.WriteStartElement("html");   
      while (reader.Read()) 
      {
         switch (reader.LocalName)
         {
            case "channel":
               ChannelToHtml(reader.ReadSubtree(), writer);
               break;
            case "item":
               ItemToHtml(reader.ReadSubtree(), writer);
               break;
         }
      }
      writer.WriteEndElement();
   }

   void ChannelToHtml(XmlReader reader, XmlWriter writer)
   {
      //Create the HTML page for the RSS channel
      writer.WriteStartElement("head");
      reader.Read();
      reader.ReadToDescendant("title");
      writer.WriteNode(reader, true);
      writer.WriteEndElement();
      writer.WriteStartElement("body");
      while (reader.Read()) 
      {
         if (reader.Name == "item") 
         {
            ItemToHtml(reader.ReadSubtree(), writer);
         }
      }
      writer.WriteEndElement(); // close body element
   }

   void ItemToHtml(XmlReader reader, XmlWriter writer)
   {
      //Create the HTML reference link for the item element 
      writer.WriteStartElement("p");
      string title = null, link = null, description = null;
      while (reader.Read())
      {
         switch (reader.Name)
         {
            case "title":
               title = reader.ReadString();
               break;
            case "link":
               link = reader.ReadString();
               break;
            case "description":
               description = reader.ReadString();
               break;
         }
      }
      // elements written out in a different order than read in
      writer.WriteStartElement("a");
      writer.WriteAttributeString("href", link);
      writer.WriteString(title);
      writer.WriteEndElement();
      writer.WriteStartElement("br");
      writer.WriteEndElement();
      writer.WriteString(description);
      writer.WriteEndElement();// write closing </p>
   }
}

6) The XQuery Language

XML has query languages today, XPath 1.0 and XSLT 1.0, both of which are hugely popular. XSLT is positioned as an XML-to-XML transformation language but is also capable of performing queries across XML data sources, often using the document function. Despite the availability of these technologies, the W3C decided to introduce a new XML query language called XQuery. The justification for XQuery is described in the XQuery Language specification at http://www.w3.org/TR/xquery as:

A query language that uses the structure of XML intelligently can express queries across all kinds of data, whether physically stored in XML or viewed as XML via middleware. This specification describes a query language called XQuery, which is designed to be broadly applicable across many types of XML data sources.
XQuery is designed to meet the requirements identified by the W3C XML Query Working Group XML Query 1.0 Requirements and the use cases in XML Query Use Cases. It is designed to be a language in which queries are concise and easily understood. It is also flexible enough to query a broad spectrum of XML information sources, including both databases and documents.

XQuery can also be summarized in the following statement:

The XQuery language is to XML as the SQL language is to relational databases.

Primarily the XQuery language was designed to provide the following benefits:

  • A greater expressiveness with an ability to perform complex query operations such as joins, ordering, and sorting.
  • A human-friendly, non-XML syntax.
  • Strong typing at both runtime and compile time. That is, through the use of W3C XML Schemas types, you can query for particular types of data in the data source, such as "find all the books within this document."
  • A rich set of functions and operators to operate on the XML Schema types. Whereas XML Schema only defined the types, XQuery has defined the operations that are allowed on those types.

In the Visual Studio 2005 Beta 1 release of System.Xml, XQuery is a significant addition. In order to execute an XQuery expression, the XQueryCommand class is used from the System.Xml.Query namespace. The code example below shows a query used to select all the books in the bookstore that have an attribute genre of autobiography, and write the titles of these books out as a new bookstore.

using (XmlWriter writer = XmlWriter.Create("output.xml"))
{
   XQueryCommand xq = new XQueryCommand();
   string query =
         "<bookstore>" +
         "{ for $s in /bookstore/book " +
         "where $s/@genre='autobiography' " +
         "return $s/title }" +
         "</bookstore>";
   xq.Compile(query);
   xq.Execute("books.xml", new XmlUrlResolver(), writer);
}

Here the query expression is:

<bookstore>
{for $s in /bookstore/book
where $s/@genre='autobiography'
return $s/title}
</bookstore>

This expression is first compiled by the Compile method, which does type checking on the values and generates an executable. The Execute method executes the query taking the books.xml document as input (where the XmlUrlResolver is used to load the document from file), and this may throw run-time errors, just like any CLR programming language. The results are written to an XmlWriter.
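For comparison, the same selection can be sketched with the XPath 1.0 support mentioned at the start of this section. This is not XQuery and does not use the XQueryCommand class; it is a minimal sketch that assumes an inline sample document and a hypothetical helper named SelectTitles. Where the XQuery expression constructs the enclosing bookstore element inside the query, the sketch has to build it by hand with an XmlWriter.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.XPath;

public static class TitleQuerySketch
{
    // Hypothetical helper: copies the titles of all autobiography books
    // into a new <bookstore> element and returns it as a string.
    public static string SelectTitles(string xml)
    {
        XPathNavigator nav = new XPathDocument(new StringReader(xml)).CreateNavigator();

        StringBuilder output = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;

        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            // XPath only selects nodes; the wrapper element is written manually.
            writer.WriteStartElement("bookstore");
            foreach (XPathNavigator title in
                     nav.Select("/bookstore/book[@genre='autobiography']/title"))
            {
                writer.WriteNode(title, false);
            }
            writer.WriteEndElement();
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Assumed sample data, not the books.xml used elsewhere in this article.
        string xml =
            "<bookstore>" +
            "<book genre='autobiography'><title>A</title></book>" +
            "<book genre='novel'><title>B</title></book>" +
            "</bookstore>";
        Console.WriteLine(SelectTitles(xml));
    }
}
```

The where clause of the query becomes an XPath predicate, and the return clause becomes the final location step. Joins and ordering, which the XQuery language adds, have no direct equivalent in this form.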

XQuery provides a distributed query mechanism across data sources that are exposed as XML and is set to become a universal query language for data integration. We will look at the use of XQuery over SQL Server in depth in the next article in this series.

5) Security

We are now into the top five features. The V1.1 .NET Framework release of System.Xml addressed many of the security issues within the XML APIs, such as preventing the subclassing of the XmlTextReader in semi-trusted code and providing an XmlSecureResolver class to disallow access to resources that were restricted through policies on the machine. The significant security additions in the V2.0 release are:

  • The ability to treat XML with the same security privileges as code by applying .NET Code Access Security (CAS).
In the same manner that CAS is applied to code when it is downloaded, CAS is now applied to XML as data in order to make it secure. This security model for XML is achieved by "flowing" evidence derived from the source document between XML components, thereby maintaining its security. For example, when an XmlReader is used to load an XML document from a URL, this URL along with the host and the site are used as evidence. Based on the policy settings, this determines the level of trust that can be associated with the document and hence what it is able to execute. In the case of XML documents that contain external entity references, evidence is used to prevent redirection requests to malicious sites. When the XML document is XSLT that could contain a CLR language script (Visual Basic or C#) it is important to be able to control what access privileges that code has. Hence the XmlReader class exposes an Evidence property that is used to set the evidence on the XPathDocument class, which in turn, if this is a stylesheet, is used by the XSLT processor as evidence during compilation and execution. The security model for XML ensures the integrity of the data as it flows between components in System.Xml.
  • The ability to prohibit DTD parsing when loading an XML document.
When the XML 1.0 specification was authored, security was not a top concern, and as a result DTDs can unfortunately be used to mount severe Denial of Service (DoS) attacks, typically through an internal entity expansion technique. In System.Xml V2.0, you can protect against DTD DoS attacks by turning off DTD parsing through the ProhibitDtd property on the XmlReaderSettings class. Setting this property causes the XmlReader to throw an exception when any DTD content is encountered in the XML.
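The DTD restriction can be sketched as follows. This is a minimal sketch, not the article's own sample: it assumes a later framework release in which the same switch is spelled DtdProcessing.Prohibit on XmlReaderSettings (on the release described here you would set ProhibitDtd to true instead), and the TryParse helper and both sample documents are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class DtdGuardSketch
{
    // Hypothetical helper: returns true when the XML parses,
    // false when the reader rejects it with an XmlException.
    public static bool TryParse(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        // Assumption: DtdProcessing.Prohibit is the later equivalent of ProhibitDtd = true.
        settings.DtdProcessing = DtdProcessing.Prohibit;
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            // Thrown as soon as DTD content is encountered.
            return false;
        }
    }

    public static void Main()
    {
        string plain = "<book><title>Safe</title></book>";
        string withDtd = "<!DOCTYPE book [<!ENTITY e \"expanded\">]><book>&e;</book>";
        Console.WriteLine(TryParse(plain));   // document without a DTD
        Console.WriteLine(TryParse(withDtd)); // document with an internal entity declaration
    }
}
```

Rejecting the document outright is deliberately blunt: the reader never gets as far as expanding an entity, so the size of the expansion does not matter.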

4) Easier XPath Queries with Namespaces

One of the most frequently asked questions on the XML newsgroups, and one that is a pitfall for many a user of XPath, is how to issue an XPath query when there are namespace declarations in the document.

When working with the XmlDocument class there is a useful overload for the SelectNodes method that allows you to specify a namespace-to-prefix mapping using an XmlNamespaceManager class. For example, the code below maps the msbooks prefix to the http://example.books.com namespace, so that the msbooks prefix can now be used in the //msbooks:price query to return all the price elements that belong to that namespace. Without this mapping no results are returned. This is where the majority of developers fall down, often resorting to stripping the namespace declarations from their document.

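XmlDocument doc = new XmlDocument();
doc.Load("books.xml");
// map the msbooks prefix to the namespace URI used in the document
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("msbooks", "http://example.books.com");
XmlNodeList nodes = doc.SelectNodes("//msbooks:price", nsmgr);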

For an in-depth article on how namespaces work and how they can be used in XPath queries, read XML Namespaces and How They Affect XPath and XSLT.

Furthermore, when using an XPathNavigator with an XPathDocument in V1 the code becomes very obscure to write. You have to compile the XPath query, call the SetContext method with the XmlNamespaceManager on the compiled XPathExpression class, and then provide this expression as a parameter to the Select method. It is a painful API to solve a painful problem. If you are doing this today then the following code is what you typically need to write:

XPathDocument doc = new XPathDocument("books.xml");
XPathNavigator nav = doc.CreateNavigator();
XPathExpression Expr = nav.Compile("books/msbooks:book/price");
XmlNamespaceManager context = new XmlNamespaceManager(nav.NameTable);
context.AddNamespace("msbooks", "http://example.books.com");
Expr.SetContext(context);
XPathNodeIterator Iterator = nav.Select(Expr);

Here the line:

context.AddNamespace("msbooks", "http://example.books.com");

is used to associate the msbooks prefix with the http://example.books.com namespace to enable the XPath query to successfully find the book elements in the document. Further explanation on this code can be found in the .NET Framework documentation at XPath Queries with Namespaced Mapped Prefixes.

The irony is that the vast majority of XML developers simply want to use the namespace/prefix combinations that are already paired in their document without having to redefine these and add them to an XmlNamespaceManager class. In order to do this, a component can now expose an IXmlNamespaceResolver interface, which the XmlNamespaceManager also implements. All implementations of XmlReaders and XPathNavigators are required to implement this interface, which exposes the namespace stack that is in scope, depending upon the position of the XmlReader or XPathNavigator. This means that for a large class of documents that typically declare namespaces on the document element, a single Select method call on an XPathNavigator is all that is needed to execute XPath queries that contain prefixes that match the prefixes declared in the document. On the XPathNavigator this Select method has the following prototype:

XPathNodeIterator Select(string xpath, IXmlNamespaceResolver nsresolver);

The following document contains a namespace declaration for xmlns:msbooks="http://example.books.com" on the bookstore document element.

<bookstore xmlns:msbooks="http://example.books.com">
  <book genre="autobiography" publicationdate="1981" ISBN="1-861003-11-0">
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <date>09-06-1956</date>
    <samplechapters>1 3 4</samplechapters>
    <msbooks:price alternative="discount">5.99</msbooks:price>
    <price>8.99</price>
  </book>
...
</bookstore>

The following code returns all the <msbooks:price> elements that are bound to the http://example.books.com namespace.

XPathDocument doc = new XPathDocument("books.xml");
XPathNavigator nav = doc.CreateNavigator();
//move to document element 
nav.MoveToChild(XPathNodeType.Element);
foreach (XPathNavigator node in nav.Select("//msbooks:price",nav))
{
   Console.WriteLine(node.Value);
}

Here the XPathNavigator is first moved to the document element, named bookstore, to ensure that the namespace declaration is in scope and then the XPathNavigator is supplied as parameter to the Select method. Of course you still have to be careful with those documents that reuse prefix names where the prefix name appears multiple times in the document, but these tend to be very rare. With the shift towards the XPathDocument as the preferred XML in-memory store, XPath and namespace querying has become a whole lot easier.
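The in-scope lookup that makes this work can be observed directly. The following is a minimal, self-contained sketch that assumes a small inline document in place of books.xml; the class and method names are hypothetical. It shows the navigator resolving the msbooks prefix from its current position and then being passed to Select as the resolver.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class InScopeNamespaceSketch
{
    // Assumed sample document; the namespace is declared on the document element.
    const string Xml =
        "<bookstore xmlns:msbooks='http://example.books.com'>" +
        "<book><msbooks:price>5.99</msbooks:price><price>8.99</price></book>" +
        "</bookstore>";

    static XPathNavigator AtDocumentElement()
    {
        XPathNavigator nav = new XPathDocument(new StringReader(Xml)).CreateNavigator();
        nav.MoveToChild(XPathNodeType.Element); // bookstore
        return nav;
    }

    // The URI that the navigator reports for a prefix at the document element.
    public static string ResolvePrefix(string prefix)
    {
        return AtDocumentElement().LookupNamespace(prefix);
    }

    // The value of the first element matched by the prefixed query.
    public static string FirstNamespacedPrice()
    {
        XPathNavigator nav = AtDocumentElement();
        XPathNodeIterator matches = nav.Select("//msbooks:price", nav);
        return matches.MoveNext() ? matches.Current.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ResolvePrefix("msbooks"));
        Console.WriteLine(FirstNamespacedPrice());
    }
}
```

Because the prefix is resolved at the navigator's position, the caveat about reused prefixes follows naturally: a navigator positioned under a different declaration of the same prefix would hand Select a different URI.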

3) The XPathDocument as a Better DOM

System.Xml V1 shipped with an XML store called the XPathDocument, which is built on the XPath data model to allow for fast XSLT queries. In System.Xml V2.0 this is now the primary XML store and the one that you should prefer to use. That is not to say that the XmlDocument (DOM) class is going away. It is still fully supported and remains a useful standards-based API, but the XmlDocument (DOM) was never designed for use with XML from a data perspective and certainly is not suitable for query support, be that XSLT, XPath, or XQuery. As the world of XML moves forward to embrace XML as both data and documents, the DOM API is no longer suitable, certainly for high-performance enterprise-scale applications. The W3C DOM specification continues to be patched up and to play catch-up in the world of XML processing.

The Limitations of the DOM

Three major limitations restrict what the DOM can do:

  • The Data Model. The DOM's (implied) data model mirrors the XML syntax, with such concepts as entity references and CDATA sections. For XML considered as data, these do not make sense and complicate the API. They are simply lexical information from the XML 1.0 specification. Significantly, since the DOM's data model is not the same as that of any of the XML query languages (XPath 1.0, XSLT 1.0, or XQuery), that model is extremely inefficient for queries. This was the primary reason for the XPathDocument class in the System.Xml V1 release.
  • Nodes Exposed Directly. Since nodes are exposed by the DOM API, there is no way to optimize the internal storage. There is no possibility to develop more efficient ways to store the data over time without breaking applications. For example, the internal storage of the XPathDocument has changed considerably between the .NET Framework V1 and V2.0 releases in order to support faster querying and loading. In V2.0 the XPathDocument is now more performant than the XmlDocument for loading, querying, and writing, since the internal storage structure could be rewritten.
  • The API. The DOM API is ingrained in the world of XML development, but it is not the most usable. It is widely considered that the pull model XmlReader API is easier to use than the event-driven SAX API. We need an easier API for in-memory XML. In System.Xml, the XmlWriter is an intuitive way to write XML documents. Reusing the same class for streaming XML provides a single API for writing XML documents. The XmlWriter is a more usable top-down push model rather than the bottom-up DOM model for building node trees in a document. This is in effect an XmlNodeWriter design.

In System.Xml V2.0, the XPathDocument is positioned as a better XML store for a world that is becoming more data centric than document centric. What this means is that an XML document loaded into the XPathDocument cannot be round-tripped with full document fidelity. DTDs are lost (the XPath data model cannot represent them), CDATA sections are not preserved, and certain types of white space are not kept. In other words, the data in your document is maintained, but some of the XML syntax is not preserved during serialization. The document can be, and typically is, serialized differently from its original form. The XPathDocument is completely capable of dealing with XML as documents, but through a better abstraction layer that does not tie you to the serialization format of XML 1.0. You could say that the XPathDocument is more data centric or "infoset centric," because it is focused on the information the XML document contains and not on the surface syntax.

In V1 the XPathDocument provided the following features:

  • Fast XPath 1.0 and XSLT 1.0 query support.
  • A read-only random access cursor-style API called the XPathNavigator.

In the V2.0 release of the .NET Framework, XPathDocument has the following enhanced features:

  • The ability to use the XmlWriter class to write content. This has the additional benefit in that you can use the XSLT or XQuery classes, which output to an XmlWriter, to build a completely new document or subtree. Imagine building a single DOM from the transformations over five different documents and pulling XML from each one. This is hard today without using intermediate DOMs to cache the results, with a significant performance hit.
  • A fully typed XML store with W3C XML schema type information for each node. As described in the earlier section on Type Support, having schema information is necessary to support XQuery. Further, validation of the XPathDocument can occur at the time of editing through the use of the Validate method, which applies schema information and content validation to the document.
  • Support for an updateable, random access cursor-style API. Called the XPathEditableNavigator, this derives from the read-only XPathNavigator class.
  • Support for node level events.
  • Support for the ability to accept or reject changes across a whole document, after it has been edited.
  • The ability to load and save XML documents to files, streams and URLs.

The following code example loads an XPathDocument, creates an XPathEditableNavigator, navigates to the bookstore element, and appends a new book as a child element using the XmlWriter class. It then saves the document back to the original file.

XPathDocument doc = new XPathDocument("books.xml");
XPathEditableNavigator editor = doc.CreateEditor();

// move to the comment node
editor.MoveToFirstChild();
// move to the bookstore element 
editor.MoveToNext();
// create an XmlWriter that appends a child node to the current position
// i.e. As the last book in the bookstore
using (XmlWriter writer = editor.AppendChild())
{
   writer.WriteStartElement("book");
   writer.WriteAttributeString("genre", "XML Technologies");
   writer.WriteAttributeString("publicationdate", "10-27-2003");
   writer.WriteAttributeString("ISBN", "1-861003-11-1");
   writer.WriteElementString("title", "ADO.NET and System.Xml V2.0");
   writer.WriteStartElement("author");
   writer.WriteElementString("first-name", "Mark");
   writer.WriteElementString("last-name", "Fussell");
   writer.WriteEndElement();
   writer.WriteElementString("price", "27.99");
   writer.WriteEndElement();
}
doc.Save("books.xml");
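The writer-based append shown above can also be tried in a self-contained form. The following is a minimal sketch under a stated assumption: it uses the AppendChild method on an XPathNavigator created from an XmlDocument, which is how later framework releases expose a writer over an in-memory tree, rather than the XPathEditableNavigator class described in this article. The AppendBook helper and the sample data are hypothetical.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

public static class AppendWithWriterSketch
{
    // Hypothetical helper: appends <book><title>...</title></book> as the
    // last child of the document element and returns the resulting XML.
    public static string AppendBook(string xml, string title)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XPathNavigator editor = doc.CreateNavigator();
        editor.MoveToChild(XPathNodeType.Element); // move to the document element

        // The writer's output becomes child nodes of the current position
        // when the writer is closed at the end of the using block.
        using (XmlWriter writer = editor.AppendChild())
        {
            writer.WriteStartElement("book");
            writer.WriteElementString("title", title);
            writer.WriteEndElement();
        }
        return doc.OuterXml;
    }

    public static void Main()
    {
        Console.WriteLine(AppendBook("<bookstore></bookstore>", "ADO.NET and System.Xml V2.0"));
    }
}
```

The design point is the same as in the listing above: the code that builds the new subtree only sees an XmlWriter, so anything that can write to an XmlWriter can populate the store.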

The content of an XPathDocument can be validated with an XML schema to check the structure of the XML document, using the Validate method, which allows the document to be revalidated once it is updated with the XPathEditableNavigator. Before we look at a code example, we need to cover the new XML schema library class called the XmlSchemaSet.

The XmlSchemaSet Class

In System.Xml V1, XML schemas can be loaded into an XmlSchemaCollection class as a library of schemas. In System.Xml V2.0 the XmlValidatingReader and the XmlSchemaCollection classes have been made obsolete, to be replaced by the XmlReader Create methods and the XmlSchemaSet class respectively. The XmlSchemaSet has been introduced into System.Xml V2.0 to address a number of deficiencies, including standards compatibility and performance, and to retire the Microsoft XDR schema format. A comparison between the XmlSchemaCollection and the XmlSchemaSet is provided in Table 1.

Table 1. Comparison between the XmlSchemaCollection and the XmlSchemaSet

XmlSchemaCollection | XmlSchemaSet
Supports Microsoft XDR and W3C XML schemas. | Only supports W3C XML schemas.
Schemas are compiled when the Add method is called. | Schemas are not compiled when the Add method is called. This provides a performance improvement during creation of the schema library.
Each schema generates an individual compiled version that can result in "schema islands." As a result all includes and imports are scoped only within that schema. | Compiled schemas generate a single logical schema, a "set" of schemas. Any imported schemas within a schema that are added to the set are directly added to the set themselves. Thus, all types are available to all schemas.
Only one schema for a particular target namespace can exist in the collection. | Multiple schemas for the same target namespace can be added as long as there are no type conflicts.

The following example shows an XML Schema being associated with an XML document and then validating that the document conforms to the schema content model through the Validate method on the XPathEditableNavigator.

XmlSchemaSet schemaSet = new XmlSchemaSet();
schemaSet.Add("http://example.books.com", "books.xsd");
schemaSet.Compile();

XPathDocument doc = new XPathDocument("books.xml");
XPathEditableNavigator editor = doc.CreateEditor();
editor.Validate(schemaSet, new ValidationEventHandler(ValidationCallback));

public static void ValidationCallback(object sender, ValidationEventArgs args)
{

   if(args.Severity == XmlSeverityType.Warning)
      Console.Write("Warning: ");
   else if(args.Severity == XmlSeverityType.Error)
      Console.Write("Error: ");    
   Console.WriteLine(args.Message);
}
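The XmlSchemaSet can also be exercised on its own, independent of the editing API. The following is a minimal sketch that assumes a tiny inline schema and validates through the XmlReader Create method mentioned above, collecting the callback messages in a list. The schema, the Validate helper, and the sample documents are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaSetSketch
{
    // Assumed schema: a single <price> element of type xs:decimal.
    const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='price' type='xs:decimal'/>" +
        "</xs:schema>";

    // Hypothetical helper: returns one message per validation warning or error.
    public static List<string> Validate(string xml)
    {
        List<string> problems = new List<string>();

        XmlSchemaSet schemaSet = new XmlSchemaSet();
        // Add does not compile the schema; passing null uses the schema's own target namespace.
        schemaSet.Add(null, XmlReader.Create(new StringReader(Xsd)));
        schemaSet.Compile();

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemaSet;
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs args)
        {
            problems.Add(args.Severity + ": " + args.Message);
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // validation happens as the reader advances
        }
        return problems;
    }

    public static void Main()
    {
        Console.WriteLine(Validate("<price>8.99</price>").Count);
        foreach (string problem in Validate("<price>cheap</price>"))
        {
            Console.WriteLine(problem);
        }
    }
}
```

Because a handler is registered, validation failures are reported through the callback rather than thrown, which mirrors the ValidationCallback pattern in the listing above.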

2) XPathEditableNavigator, an Updatable Cursor

You saw this class in the previous topic; it is one that brings tears of joy to my eyes. The XPathEditableNavigator combined with the XmlWriter provides editing support for the XPathDocument class. The XPathEditableNavigator is a cursor-based API for editing XML with methods to perform edit operations on a node tree. Since it is derived from the XPathNavigator class, it has all the capabilities of the XPathNavigator, such as XPath support to easily navigate the document with queries. Like the XmlReader and XmlWriter classes, the XPathEditableNavigator also supports the ability to read and write type information to the XPathDocument.

Rather than showing numerous different code examples of the flexibility of the XPathEditableNavigator API, it is simplest to look at the class definition to see the editing support.

abstract class XPathEditableNavigator : XPathNavigator, IXPathEditable
{
method public abstract XPathNavigator Clone();

// IXPathEditable implementation
method public virtual XPathEditableNavigator CreateEditor();

// IXPathNavigable Interface Implementation
method public virtual XPathNavigator CreateNavigator();

// Create Methods
method public virtual XmlWriter PrependChild();
method public abstract XmlWriter AppendChild();
method public virtual XmlWriter InsertAfter();
method public abstract XmlWriter InsertBefore();
method public abstract XmlWriter CreateAttributes();

// Replace Methods
property public new virtual string OuterXml{get; set;}
property public new virtual string InnerXml{get; set;}

// Update Methods
method public abstract void SetValue(object value);
method public virtual void SetFromObject(object value);
method public virtual XPathEditableNavigator AppendChild(String xml);
method public virtual XPathEditableNavigator AppendChild(XmlReader reader);
method public virtual XPathEditableNavigator AppendChild(XPathNavigator nav);
method public virtual XPathEditableNavigator PrependChild(String xml);
method public virtual XPathEditableNavigator PrependChild(XmlReader reader);
method public virtual XPathEditableNavigator PrependChild(XPathNavigator nav);
method public virtual XPathEditableNavigator InsertBefore(String xml);
method public virtual XPathEditableNavigator InsertBefore(XmlReader reader);
method public virtual XPathEditableNavigator InsertBefore(XPathNavigator nav);
method public virtual XPathEditableNavigator InsertAfter(String xml);
method public virtual XPathEditableNavigator InsertAfter(XmlReader reader);
method public virtual XPathEditableNavigator InsertAfter(XPathNavigator nav);

// Delete Methods
method public abstract bool DeleteCurrent();

// Helpers
method public virtual void PrependChildElement(string prefix, string localName, string namespaceURI, object value);
method public virtual void AppendChildElement(string prefix, string localName, string namespaceURI, object value);
method public virtual void InsertElementBefore(string prefix, string localName, string namespaceURI, object value);
method public virtual void InsertElementAfter(string prefix, string localName, string namespaceURI, object value);
method public virtual void CreateAttribute(string prefix, string localName, string namespaceURI, object value);

// Schema Validation Methods
method public abstract void Validate(XmlSchemaSet schemas, ValidationEventHandler validationEventHandler);
}

As you can see from this class definition, the XPathEditableNavigator has a very rich editing API that is integrated with the other XML components, such as the ability to create subtrees by reading from other XmlReaders and XPathNavigators.

Using an XmlWriter returned from the creation methods is the primary way to build an XML document from scratch. The update methods allow you to easily amend subtrees in an existing document and the DeleteCurrent method removes the current node from the tree. Finally the helper methods are there to provide the ability to insert simple type elements and attributes on loaded documents.

The power and flexibility of the design is highlighted when the XPathEditableNavigator is used with other XML components. In the example shown below, the XQueryCommand class is used to query an existing XML document and generate a new XPathDocument via the XmlWriter. It is then straightforward to navigate to another part of this document with an XPath expression and start creating new nodes either via calls directly to the XmlWriter or by querying data from another document, a database, or any other XML data source.

XPathDocument doc = new XPathDocument();
XPathEditableNavigator editor = doc.CreateEditor();
using (XmlWriter writer = editor.AppendChild())
{
   XQueryCommand xq = new XQueryCommand();
   string query =
         "<bookstore>" +
         "{ for $s in /bookstore/book " +
         "where $s/@genre='autobiography' " +
         "return $s/title }" +
         "</bookstore>";

   xq.Compile(query);
   xq.Execute("books.xml", new XmlUrlResolver(), writer);
}
Console.WriteLine(doc.CreateEditor().InnerXml);
doc.Save("output.xml");

The XPathEditableNavigator has the ability to read and write typed values. The example below compares the XmlDocument class to the XPathDocument class when reading and writing CLR types to an XML document. If the XPathDocument is provided with schema information at load time and the document is validated, the simple types are stored internally as CLR types. Depending on the types in the data, this can provide a significant reduction in the memory footprint for the loaded XML document. (For example, numeric values such as integers are stored more compactly as int types than as strings.)

// using the XmlDocument class
XmlDocument xmldoc = new XmlDocument();
xmldoc.Load("books.xml");
foreach (XmlNode node in xmldoc.SelectNodes("//price/text()"))
{
   Double price = XmlConvert.ToDouble(node.Value);
   node.Value = XmlConvert.ToString(price * 1.1); // add 10%
}
Console.WriteLine(xmldoc.DocumentElement.InnerXml);


// using the XPathDocument class
XPathDocument doc = new XPathDocument("books.xml");
foreach (XPathEditableNavigator node in doc.SelectNodes("//price"))
{
   Double price = node.ValueAsDouble;
   node.SetValue(price * 1.1); // add 10%
}
Console.WriteLine(doc.CreateEditor().InnerXml);
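The typed read and write shown in the second listing can be approximated in a self-contained form. This is a minimal sketch under an assumption: it uses the ValueAsDouble property and the string overload of SetValue on an XPathNavigator created from an XmlDocument, as found in later framework releases, and not the XPathEditableNavigator over an XPathDocument. No schema is attached, so the value is converted from its string form on each read. The RaisePrice helper and sample document are hypothetical.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

public static class TypedValueSketch
{
    // Hypothetical helper: raises the first <price> by 10 percent and
    // returns the value read back from the document.
    public static double RaisePrice(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XPathNavigator price = doc.CreateNavigator().SelectSingleNode("//price");

        // Read as a CLR double without calling XmlConvert in user code.
        double current = price.ValueAsDouble;

        // Write back using the XML lexical form of the double.
        price.SetValue(XmlConvert.ToString(current * 1.1));
        return price.ValueAsDouble;
    }

    public static void Main()
    {
        Console.WriteLine(RaisePrice("<book><price>10</price></book>"));
    }
}
```

Since the arithmetic is done in binary floating point, the stored result is only approximately eleven; a decimal type would be the more natural choice for real prices.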

The XPathEditableNavigator and the XPathDocument classes provide a powerful combination of easy-to-use editing support over a strongly typed XML store that ushers in the end of the reign of the DOM as the API of choice for working with and manipulating XML. The DOM API era is over. Long live the DOM.

And finally, the number one feature is:

1) Performance

Performance was the top goal established from the very start of the V2.0 development milestone and has been dramatically improved across XML parsing, writing, and transformation. This has also been driven by aggressive users of System.Xml, such as the XML messaging needs of "Indigo".

The big performance improvements are:

  • The XmlTextReader and the XmlTextWriter are on average twice as fast as in the V1.1 release. This improvement has been achieved through a significant redesign and rewrite that optimizes for the most common code path.
  • The XmlReaders created via the static Create methods have the same performance as the constructed types, since these share the same code path.
  • The XmlWriters created via the static Create methods have better performance than XmlTextWriter.
  • XML Schema validation is approximately 20 percent faster and is more compliant to the W3C standard.
  • The XSLT performance is three to four times faster.

These relative performance improvements are illustrated in the graph below.

Figure 1. Relative performance improvements in System.Xml

One of the biggest challenges in the V1 release of the .NET Framework is that the MSXML 4.0 XSLT processor is significantly faster than the XslTransform class. This is due to the timing of the V1 release, whereby the XslTransform class was mainly based upon the MSXML 3.0 XSLT processor design. The XslTransform class is still very performant compared to others in the industry, but not as good as the best, MSXML 4.0. Meanwhile, the MSXML 4.0 implementation leapt ahead through further improvements and innovations in XSLT that came from understanding how better to apply optimizations. A primary goal of the V2.0 release is to match and surpass the performance of the MSXML 4.0 XSLT processor. This has been achieved by redesigning the XSLT processor from the ground up, which includes generating MSIL directly that can then be JIT-ed by the .NET runtime. This new XSLT processor is called the XsltCommand class and lives in the System.Xml.Query namespace alongside the XQueryCommand class. In fact, both these classes share a common query runtime architecture, compiling XML query languages down to a common intermediate format in the same manner that the CLR does for programming languages. There is a trade-off here in that it may take longer than in V1 to compile an XSLT stylesheet to generate the executable code, but then the execution time is significantly faster. Since the most common design pattern by far is to precompile XSLT stylesheets in advance and cache these for reuse, in the majority of cases this is immaterial.

The code example below shows the XsltCommand class loading a stylesheet called storeDB.xsl and calling the Execute method to transform the storeDB.xml document to the output document, called store.htm.

XsltCommand myProcessor = new XsltCommand();
myProcessor.Compile("storeDB.xsl");
using (XmlWriter writer = XmlWriter.Create("store.htm"))
{
   myProcessor.Execute("storeDB.xml", new XmlUrlResolver(), null, writer);
}
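The precompile-and-cache pattern that makes the longer compile time immaterial can be sketched as follows. This is a minimal sketch under an assumption: it uses the XslCompiledTransform class from System.Xml.Xsl, the compiling XSLT processor available in later framework releases, in place of the XsltCommand class shown above. The inline stylesheet, the CountBooks helper, and the simple single-threaded cache are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

public static class CachedStylesheetSketch
{
    // Assumed stylesheet: emits <total>N</total> for N book elements.
    const string Stylesheet =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:template match='/'>" +
        "<total><xsl:value-of select='count(/bookstore/book)'/></total>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    // Compiled on first use and reused afterwards (not thread-safe; sketch only).
    static XslCompiledTransform cached;

    static XslCompiledTransform GetTransform()
    {
        if (cached == null)
        {
            XslCompiledTransform transform = new XslCompiledTransform();
            using (XmlReader reader = XmlReader.Create(new StringReader(Stylesheet)))
            {
                transform.Load(reader); // the expensive compile step happens here
            }
            cached = transform;
        }
        return cached;
    }

    public static string CountBooks(string xml)
    {
        StringBuilder output = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;

        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            GetTransform().Transform(input, writer); // cheap once compiled
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(CountBooks("<bookstore><book/><book/></bookstore>"));
        Console.WriteLine(CountBooks("<bookstore><book/></bookstore>"));
    }
}
```

Only the first call pays the compilation cost; every later call reuses the compiled stylesheet, which is the usage the trade-off above is tuned for.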

The performance improvements in the System.Xml V2.0 release also go beyond working with individual components and, given that XML schema type information is available everywhere, further performance optimizations can be applied using this type information. When you load an XPathDocument from a validating reader with an associated schema, the XML is stored as CLR types to optimize the storage. For example if the XML schema indicates that the values are of type xs:int, these are stored as CLR int types in the XPathDocument, rather than as untyped strings. Not only does this enable you to work with the types in your CLR language of choice, it reduces the storage and working set of the document loaded into memory. Of course, how much of a storage benefit is achieved is entirely dependent on your type of data. More important, if you apply an XSLT or XQuery to the XPathDocument and use this to generate another XPathDocument, these CLR types are "flowed" between components in that they are not first copied to string values and then reparsed through a text XML parser. This provides a significant performance improvement when chaining together XML components that utilize schema type information, with the XML 1.0 serialized form only created when written out to text format via the XmlWriter, such as to a TextStream class.

Finally, run the following piece of code on the V1 release of the .NET Framework with your favorite XML document and write down the time in milliseconds that it takes to parse the document 100 times. The bigger the document the better. When the .NET Framework V2.0 Beta 1 release is available, do the same again and hopefully the new number will be at most half your existing number. Then hug someone close to you out of pure delight.

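int start, stop;
start = Environment.TickCount;
for (int i = 0; i < 100; i++)
{
    XmlTextReader reader = new XmlTextReader("largedocument.xml");
    while (reader.Read()) { }
    reader.Close(); // release the file before the next pass
}
stop = Environment.TickCount;
// total elapsed time for all 100 passes; divide by 100 for the per-parse average
Console.WriteLine("XmlTextReader document parsing time in ms: " + (stop - start).ToString());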

Conclusion

This article has provided an in-depth overview, through a climax-building top ten countdown, of the best features of the core XML classes in System.Xml in the forthcoming .NET Framework V2.0 Beta 1 release. These enable you to read, write, manipulate, and transform XML. With improvements in performance, usability, typing, and querying, the XML support in the V2.0 release continues to lead the industry in innovation, standards support, and ease of use. In the next article in this series, we will examine XML in more depth as a data access technology that is integrated as part of ADO.NET, using XQuery to query over SQL Server and other disparate sources of XML.

Mark Fussell is a Lead Program Manager in Microsoft's WebData team, which develops Microsoft's data access technologies including the components within the System.Xml and System.Data namespaces of the .NET Framework, Microsoft XML Core Services (MSXML) and Microsoft Data Access Components (MDAC). His blog is http://weblogs.asp.net/mfussell.
