Using Schema and Serialization to Leverage Business Logic

 

Eric Schmidt
Microsoft Corporation

April 2001

Download Extm0501.exe.

Note The following is required for this demo: the MSXML 4.0 Web release and Beta 1 of the .NET Framework (see the notes on specific betas later in this article).

A note from Chris Lovett: I would like to introduce a good friend of mine, Eric Schmidt, and no, I don't mean the former CEO of Novell. I have decided to hand off the Extreme XML column to Eric so I can focus my energies on a new top-secret project I have taken on. I've enjoyed writing this column and I'll miss it. Eric is a Program Manager on the XML Core Services team here at Microsoft and has a great passion for customers and a deep understanding of MSXML and the .NET Framework. Happy reading.

Hi, I took on this mammoth task because I love talking about XML. My main job on the team is to work with our internal teams building MSXML, the SQL Server XML features, SOAP, and the XML classes for the .NET Framework. Specifically, I work on our evangelism strategy and customer usage of these products. My focus has been data-centric for the past eight years, and I truly believe that nothing is more important in an application than the data. Furthermore, I firmly believe that XML and Web Service-enabled systems will revolutionize all business processes and computing environments. I am a Cancer; I believe that soccer is the number one sport in the world, and that substituting protein for carbohydrates is great for your diet.

As for the future of the Extreme XML column, you'll see a bit of a shift, with more of a focus on the tasks facing today's over-leveraged developer. Also, you'll see that the word extreme will mean different things: maybe extremely abstract, extremely difficult, extremely pragmatic, or, in this case, extremely long. For example, this article is longer and broader than most; it is really two articles in one. I had a lot of things to get off my chest, with the goal of burdening you to go build better applications. Enjoy!

What's Ahead

In this issue of Extreme XML, we are going to examine the importance of schema usage and the use of serialization technology to leverage XML in your applications and services. The majority of development tasks today revolve around developers taking existing infrastructure (business components, databases, queues, and so on) and morphing them into the next version of their product.

Most "next version" applications must support some type of loosely coupled environment between new trading peers and new public services. These new applications must be able to exchange and process data in an efficient, cross-platform, and language-neutral manner. Sounds like an application for XML and Web Services. Now, this article is not about the importance of XML in business-to-business transactions, or about XML being the holy grail of e-business infrastructure. I'll pontificate on those subjects in future articles. Instead, I am going to focus on the real-world developer problem of taking existing business logic and migrating it to a Web Service- and XML-enabled environment.

To begin with, I am going to review, in a concentrated manner, the importance of defining schema for the data being processed by your applications. Then I am going to walk through several serialization technologies that can be used to leverage existing business logic.

As you read the article, it would be prudent to have the source code and schema in front of you in order to better understand the flow, as several key concepts are only implied. You can download the source code from the link at the top of this column.

The Schema Problem

The surge of XML usage over the past several years has not led to a complementary increase in defined data models for XML documents. For this section, I use the term data model to mean the structure, content, and semantics of an XML document.

The main reason for this slow growth in XML data models has been the lack, until now, of a robust XML schema standard. Document Type Definitions (DTDs) have outgrown their usefulness in the enterprise space because they view XML from a document perspective rather than viewing XML document instances from a data and type perspective. Typed data items like addresses, line items, employees, and orders have complex models and are the basis for most applications. Applications look at data from a strongly typed perspective. For example, a line item is an inherited member of an order and contains typed information like product price, which is of type currency. The majority of this type of modeling cannot be accomplished with DTDs. Because of the simple structuring and typing mechanisms in DTDs, numerous XML validation, structuring, and typing systems have been created, including Document Content Description (DCD), SOX, Schematron, RELAX, and XML-Data Reduced (XDR). The latter, XDR, has gained much momentum in the Windows® and B2B communities due to its use in products like SQL Server™, BizTalk™ Server, and MSXML. In addition, most independent software vendors (ISVs) and B2B integrators support XDR because of its data typing support, namespace support, and its XML-based language. However, XDR's usefulness still falls short of providing a truly extensible modeling and typing system for complex data structures. This was a known issue at the time of XDR's creation.
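To make the contrast concrete, here is a minimal sketch using a hypothetical Price element (not part of the sample schema in this article). A DTD can only declare that Price contains character data, while XML Schema can type it, here as a decimal, the closest built-in type to currency:

<!-- DTD: Price is just parsed character data; no numeric typing is possible -->
<!ELEMENT Price (#PCDATA)>

<!-- XML Schema: Price is typed as a decimal and can be constrained further with facets -->
<xsd:element name="Price" type="xsd:decimal"/>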

Building on the lessons learned from previous schema implementations, the W3C XML Schema working group set out to create a specification (XML Schema) for defining the structure, content, and semantics of XML documents. Ultimately, this specification should provide an extensible environment so that it could be applied to any type of business or processing logic. During the development of this article, I was pleased to see that the W3C released XML Schema as a recommendation. This is a tremendous step in solidifying and stabilizing XML-based implementations that need to employ schema services. Next, we're going to look at the importance and power behind XML Schema.

Unleashing XML Schema

For this section, I am going to use the term XML Schema to represent the three specific parts of the W3C recommendation: Part 0: Primer, Part 1: Structures, and Part 2: Datatypes. Second, instead of reiterating the specification, I have distilled five core items you need to know about XML Schema so you can get up and running.

  1. XML Schema is represented in XML 1.0 syntax—This makes XML Schema parseable by any XML 1.0-compliant parser, and thus schemas can be manipulated with a higher-level API like the DOM.
  2. Data typing of simple content—XML Schema provides a specification for primitive data types (string, float, double, and so on) found in most common programming languages. Expanding upon these primitive types, XML Schema provides derived types like int, short, and unsignedShort. Beyond the primitive and derived types, XML Schema provides the ability to create user-defined types, and you can further constrain types with facets like length, range, and format (patterns). This typing facility provides validity constraints for any type of XML document.
  3. Typing of complex content—XML Schema provides the ability to define content models as types. These types can be explicit or abstract and can be restricted or extended in a type instance. For example, you can create a manager type that is based on an employee content model.
  4. Distinction between the type definition and instance of that type—Unlike XDR, XML Schema type definitions are independent of instance declarations. This makes it possible to reuse type definitions in different contexts to describe distinct nodes within the instance document. For example, manager and supervisor elements within the same instance document can both be instances of the same type (a short schema sketch illustrating this appears after this list).
  5. W3C support and industry implementation—On May 2, 2001, the XML Schema specification reached recommendation status. This means that the specification is stable and can be used as the basis for production-level implementations. Like many other XML-based recommendations produced by W3C working groups (for example, DOM, XSLT, and XPath), these recommendations become the industry standard. Born from these standards are implementations, which are the "center of the universe" for any XML-enabled application. The most common implementation is the parser, but most parsers do more than just parse XML. These parsers or engines provide a spectrum of services, including parsing, validation, DOM creation, firing SAX events, and XSLT and XPath functionality. All of these must be done in a compliant manner. Again, I stress compliance because without compliance the recommendation means nothing. These types of standards-based implementations are crucial for interoperability between services and systems.
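To make items 2 through 4 concrete, here is a small schema sketch. The type and element names (EmployeeIDType, EmployeeType, Manager, Supervisor) are invented for illustration and are not part of the purchase order schema used later in this article.

<!-- item 2: a user-defined simple type constrained with a pattern facet -->
<xsd:simpleType name="EmployeeIDType">
   <xsd:restriction base="xsd:string">
      <xsd:pattern value="[A-Z]{2}-[0-9]{4}"/>
   </xsd:restriction>
</xsd:simpleType>

<!-- items 3 and 4: a named complex type reused by two distinct element declarations -->
<xsd:complexType name="EmployeeType">
   <xsd:sequence>
      <xsd:element name="Name" type="xsd:string"/>
      <xsd:element name="EmployeeID" type="EmployeeIDType"/>
   </xsd:sequence>
</xsd:complexType>

<xsd:element name="Manager" type="EmployeeType"/>
<xsd:element name="Supervisor" type="EmployeeType"/>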

An Aside About Implementation

One of the nice things about working on the XML team at Microsoft is that we are constantly striving to deliver the most compliant technology in the most efficient manner. We do this because we are committed to the development community. We also understand the importance of compliant implementations as they relate to building Web Services that can be easily coupled. In terms of XML Schema implementations, you'll find support in the following Microsoft products: MSXML 4.0, the .NET Framework, Visual Studio .NET, and SQL Server 2000 XML Features. This is where we stand today, with the understanding that other Microsoft products will migrate to the XML Schema specification throughout this year. This point about implementation goes way beyond Microsoft products, however. For example, other language platforms like Java, databases like DB2, and XML editing tools like XMLSpy are or will be (more than likely) implementing support for XML Schema. The benefit to you, the developer, is two-fold. First, you are dealing with a standard across languages, operating systems, and development frameworks. Second, you are getting an implementation that was competitively designed, meaning that you should seek out vendors that provide not just compliant implementations, but ones built to provide scalable, high-performance service.

Schema Applied

Enough about the benefits of schema; let's take a look at an actual implementation. This article focuses on the concept of purchase order processing. Here is a snapshot of the purchase order schema (you can find the entire schema, po.xsd, in the download package for this article):

<xsd:element name="PurchaseOrder" type="PurchaseOrderType"/>

<xsd:complexType name="PurchaseOrderType">
   <xsd:sequence>
      <xsd:element name="Comment" type="xsd:string"/>
      <xsd:element name="PurchaseOrderID" type="xsd:string"/>
      <xsd:element name="PurchaseOrderDate" type="xsd:date"/>
      <xsd:element name="BuyerInformation" type="BuyerInformationType"/>
      <xsd:element name="BillingInformation" type="BillingInformationType"/>
      <xsd:element name="ShippingInformation" type="ShippingInformationType"/>
      <xsd:element name="OrderLineItems" type="OrderLineItemsType"/>
      <xsd:element name="ShipTerms" type="xsd:string"/>
      <xsd:element name="ShippingCost" type="xsd:float"/>
      <xsd:element name="SubTotal" type="xsd:float"/>
      <xsd:element name="TaxesAndFees" type="xsd:float"/>
      <xsd:element name="Total" type="xsd:float"/>
      <xsd:element name="PaymentInformation" type="PaymentInformationType"/>
   </xsd:sequence>
   <xsd:attribute name="CorrelationID" type="xsd:string"/>
   <xsd:attribute name="OriginatorID" type="xsd:string"/>
</xsd:complexType>

You'll notice that the top-level element of the schema is named PurchaseOrder, which is of type PurchaseOrderType. PurchaseOrderType contains a sequence of elements, all of which are typed with built-in XML Schema types or references to declared user defined types. The sequence element is important here because it enforces the order of the elements as they appear below PurchaseOrder. Sequencing is not mandatory, but it can provide a more restrictive environment for consumers. For example, parsers and processors can look for data at certain ordinal locations in an XML document by name if sequencing is guaranteed. Typing is also very important because it provides type safety and reflection about a given element. For example, the above element named ShippingInformation is based on the complexType named ShippingInformationType.

<xsd:complexType name="ShippingInformationType">
   <xsd:sequence>
      <xsd:element name="Name" type="NameType"/>
      <xsd:element name="StreetAddress" type="ShippingStreetAddressType"/>
      <xsd:element name="BriefContact" type="BriefContactType" minOccurs="0"/>
   </xsd:sequence>
</xsd:complexType>

The ShippingInformationType is built upon a sequence of elements: Name, StreetAddress, and BriefContact. Notice that the minOccurs attribute for the BriefContact element is set to zero. This allows my application to omit the BriefContact element. The significant element in the ShippingInformationType is the StreetAddress element, which is based on the complexType named ShippingStreetAddressType, shown below. For this exercise, I am combining the benefits discussed earlier in the Unleashing XML Schema section.

<xsd:complexType name="ShippingStreetAddressType">
   <xsd:complexContent>
      <xsd:extension base="AbstractStreetAddressType">
         <xsd:sequence>
            <xsd:element name="HouseColor" type="xsd:string"
                         minOccurs="0" maxOccurs="unbounded"/>
         </xsd:sequence>
      </xsd:extension>
   </xsd:complexContent>
</xsd:complexType>

This is where things get interesting. Notice that the complexType ShippingStreetAddressType is based upon an extension with a base of AbstractStreetAddressType. The extension mechanism provides the ability to inherit from the AbstractStreetAddressType. This is extremely powerful because any XML Schema-enabled consumer will understand that the complexType ShippingStreetAddressType is really based on the AbstractStreetAddressType type or class. This is the implementation for the AbstractStreetAddressType.

   <xsd:complexType name="AbstractStreetAddressType" abstract="true">
      <xsd:sequence>
         
         <xsd:element name="AddressCode">
            <xsd:simpleType>
               <xsd:restriction  base="xsd:string">
                  <xsd:maxLength value="100"/>
               </xsd:restriction>
            </xsd:simpleType>
         </xsd:element>
            
         <xsd:element name="AddressLine" minOccurs="0" maxOccurs="2">
            <xsd:simpleType>
               <xsd:restriction  base="xsd:string">
                  <xsd:maxLength value="100"/>
               </xsd:restriction>
            </xsd:simpleType>
         </xsd:element>
         
         <xsd:element name="City">
            <xsd:simpleType>
               <xsd:restriction  base="xsd:string">
                  <xsd:maxLength value="75"/>
               </xsd:restriction>
            </xsd:simpleType>
         </xsd:element>
         
         <xsd:element name="State_Province" type="State_ProvinceEnum"/>
            
         <xsd:element name="PostalCode">
            <xsd:simpleType>
               <xsd:restriction  base="xsd:string">
                  <xsd:pattern value="[0-9]{5}(-[0-9]{4})?"/>         
               </xsd:restriction>
            </xsd:simpleType>
         </xsd:element>
         
         <xsd:element name="Country" type="xsd:string"/>
         <xsd:element name="Room" type="xsd:string"/>
         <xsd:element name="Building" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
   </xsd:complexType>

Notice that on the complexType named AbstractStreetAddressType, the abstract attribute is set to true. This restricts the use of this type to derived complexTypes, similar to a base, abstract class in C++ or C#. This is exactly what the ShippingStreetAddressType does; in fact, it extends the AbstractStreetAddressType by adding a HouseColor element that the base type does not provide. This model provides a completely typed environment. Anytime I am dealing with street address information, I can use the base class to build my content model in a consistent, typed manner.
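To see what this typed model looks like from the instance side, here is a plausible StreetAddress fragment conforming to ShippingStreetAddressType. The values are invented, and the State_Province value assumes that WA is one of the values allowed by State_ProvinceEnum:

<StreetAddress>
   <AddressCode>HQ-01</AddressCode>
   <AddressLine>One Microsoft Way</AddressLine>
   <City>Redmond</City>
   <State_Province>WA</State_Province>
   <PostalCode>98052-6399</PostalCode>
   <Country>USA</Country>
   <Room>1234</Room>
   <Building>42</Building>
   <HouseColor>Blue</HouseColor>
</StreetAddress>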

Take some time and review the rest of the schema. There are some other interesting points about its model.

XML Enabling Business Components: Step 1—Validation

Note This article is based upon Visual Studio Beta 1 features.

Before we get started, I felt it would be prudent to show you the logical view of how we are going to process XML in this application. The diagram below highlights the steps as they relate to the sections of this document.

Figure 1. How XML is processed in this application

Now that we have spent some time creating an adequate schema and defining the data flow, the next challenge is to retrofit some existing business logic that does purchase order processing. To begin, let's examine the core purchase order library (po.dll). This DLL has several classes. Now, to be honest, I built this object model based on the schema, but as you'll see later, the same rules apply when you want to XML-enable your components. XML-enabling means providing a mechanism for existing business components to participate in XML-based environments like Web Services. The primary public class is called PurchaseOrder and has the following private members:

public class PurchaseOrder
{
   private string               _PurchaseOrderId;
   private string               _CorrelationID;
   private string               _OriginatorID;
   private string               _PurchaseOrderDate;
   private string               _Comment;
   private BuyerInformation     _BuyerInfo;
   private ShipToAddress        _ShipToAddress;
   private BillToAddress        _BillToAddress;
   private string               _ShipTerms;
   private LineItems            _LineItems;
   private float                _ShippingCost = 0.0f;
   private float                _TaxesAndFees;
   private float                _Total = 0.0f;
   private PaymentInformation   _PaymentInformation;
}

The first task is to hydrate an object instance of the PurchaseOrder class. Basically, I want to deserialize an XML document instance that represents a purchase order by mapping the XML data to the members of the object. In order to deserialize the purchase order XML document, I need to make sure that the document is in fact an XML document and that it is valid against the purchase order schema. The easiest way to accomplish this task is with a validating parser or validating XML reader. With the current MSXML 4.0 Web release, XML Schema validation is easily accomplished using the DOMDocument and XMLSchemaCache classes. Here is the code that validates the document using MSXML 4.0 and the purchase order XML schema.

Public Function Validate(POXml As String) As String

Dim oPOSchema       As MSXML2.DOMDocument40
Dim oSchemaCache    As MSXML2.XMLSchemaCache40
Dim oPODoc          As MSXML2.DOMDocument40
    
Set oPOSchema = New MSXML2.DOMDocument40
Set oSchemaCache = New MSXML2.XMLSchemaCache40

oPOSchema.async = False
oPOSchema.Load App.Path + "\po.xsd"

oSchemaCache.Add "", oPOSchema

Set oPODoc = New MSXML2.DOMDocument40

With oPODoc
    .async = False
    Set .schemas = oSchemaCache
    .loadXML POXml
End With

If oPODoc.parseError <> 0 Then
    Err.Raise vbObjectError + 1, "PO Validation", oPODoc.parseError.reason
Else
    Validate = oPODoc.xml
End If
    
End Function

Another method of performing validation is to use the XmlValidatingReader class provided in the System.Xml namespace of the .NET Framework (available in Beta 2). The XmlValidatingReader provides fast, non-cached, forward-only stream access to XML data. In addition, it validates against DTDs, XDR, and XML Schema. This is implemented in a pull-model fashion. One of the nice features of the .NET Framework implementation is that the XmlValidatingReader can raise multiple validation errors to a callback function in order to capture all applicable validation errors; note, however, that MSXML stops parsing when the first validation error occurs. Here is some XmlValidatingReader code that will work in the Visual Studio .NET Beta 2 timeframe.

public bool Validate(string po)
{
   // XmlReader and _ValidationError are class-level members shared with
   // the ValidationCallback method below.
   StringReader POString = new StringReader(po);
   XmlReader = new XmlTextReader(POString);
   XmlValidatingReader reader = new XmlValidatingReader(XmlReader);
   XmlSchemaCollection SchemaColl = new XmlSchemaCollection();

   ValidationEventHandler eventHandler = new ValidationEventHandler(PurchaseOrder.ValidationCallback);

   try
   {
      //add the schema to the cache
      string path = AppDomain.CurrentDomain.BaseDirectory;
      SchemaColl.Add(null, new XmlTextReader(path + "po.xsd"));

      // error schema
      //SchemaColl.Add(null, new XmlTextReader(path + "po_error.xsd"));

      //add the cache to the reader
      reader.Schemas.Add(SchemaColl);

      //set the validation type
      reader.ValidationType = ValidationType.Auto;

      //set the event handler
      reader.ValidationEventHandler += eventHandler;

      //read the entire document; validation errors are raised through the callback
      while (reader.Read())
      {
      }

   …
}

static void ValidationCallback(object sender, ValidationEventArgs args)
{
   _ValidationError = args.Message;

   //append the error message and its location to a log file
   FileStream fs = new FileStream("po_err.txt", FileMode.OpenOrCreate, FileAccess.Write);
   StreamWriter w = new StreamWriter(fs);
   w.BaseStream.Seek(0, SeekOrigin.End);

   w.WriteLine(_ValidationError + " : " + XmlReader.LineNumber + "," + XmlReader.LinePosition);

   w.Close();
}

XML Enabling Business Components: Step 2—Deserialization

Now that the validation code is implemented, we can start deserializing the XML and hydrating the PurchaseOrder object. For this task, I chose to use the XmlTextReader, which is currently available in Beta 1 of the .NET Framework. There are several ways of deserializing the XML. I could have used the DOMDocument instance in the MSXML validation code above and walked the tree with DOM calls or XPath. I could have also written a SAX2 implementation to process the XML in a push-model fashion.

My decision to use the XmlTextReader was based on one critical point: I wanted to potentially validate and deserialize my data in one read, and I wanted stream-level control over the read. The only problem is that the XmlTextReader does not do validation. What I need is the XmlValidatingReader, but since it isn't publicly available at this time, I took a different tack. (Note: the XmlValidatingReader will ship in Visual Studio .NET Beta 2 along with full XML Schema support.) I decided to use the XML Schema support in MSXML 4.0 for validation, in conjunction with the efficient, non-cached services of the XmlTextReader to deserialize the XML data. Again, I could have used the DOM or SAX, but I find the XmlReader to be more developer-friendly, plus I get the power of the rest of the .NET Framework.
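For comparison, here is a rough sketch of the DOM-and-XPath route I chose not to take. It is not part of the sample download; the XPath expressions simply assume the element names from po.xsd, and error handling is omitted.

using System;
using System.Xml;

public class DomAlternative
{
   // Hydrate a couple of members by loading the whole document into a DOM
   // and selecting nodes with XPath, instead of streaming with XmlTextReader.
   public static void Hydrate(string poXml)
   {
      XmlDocument doc = new XmlDocument();
      doc.LoadXml(poXml);   // builds the entire node tree in memory

      XmlNode idNode = doc.SelectSingleNode("/PurchaseOrder/PurchaseOrderID");
      XmlNode totalNode = doc.SelectSingleNode("/PurchaseOrder/Total");

      string purchaseOrderId = (idNode == null) ? "" : idNode.InnerText;
      float total = (totalNode == null) ? 0.0f : float.Parse(totalNode.InnerText);

      Console.WriteLine("{0} : {1}", purchaseOrderId, total);
   }
}

The trade-off is memory and control: the DOM holds the entire document whether or not you need it, whereas the reader lets each class pull only the nodes it cares about.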

A side note here about choosing specific technologies. This is a common lunchroom topic: which is better, SAX or DOM, XSLT or ASP, push model or pull model? The list goes on and on. Bottom line, they are all excellent technologies for specific implementations. Comparing pros and cons in order to make a choice on implementation between them isn't necessarily the right path. A more strategic path is to look at your current architecture, along with your programming practices, and then base your decision on core issues. For example, if your development time is restricted, in the short term Visual Studio .NET may not be a practical decision. If you have a strong base of C++ programmers, SAX may be a more palatable API for parsing, whereas Visual Basic® programmers may prefer the pull model of the .NET Framework readers and writers. If you want a strongly typed, reflection-based object model for core application and Web Service development, the .NET Framework is the way to go. We are committed to providing XML core feature fidelity between our COM-based implementations like MSXML and the XML classes in the .NET Framework. The beauty of this model is that since they are both built on core XML standards, the interoperability is seamless. Granted, there are API differences when you move to .NET. However, keep in mind that although you are moving to a new programming model, the core XML standards drive the implementation, so your parsing, XSD, XDR, DTD, XSLT, and XPath implementations should not need to change. For example, this sample application runs fine under Visual Basic 6.0 with MSXML or in Visual Studio .NET.

Using the XmlTextReader is quite simple. Most of your code will be constructed with while, if, and switch statements. The XmlTextReader reads XML from the source (in this case, a string of XML) in a node-based fashion from the source buffer. As it reads the XML, you have full control over what data you want to interrogate or pull from the reader.

In the case below, I read and check for well-known element names using the LocalName property. These elements are guaranteed to be there because I have previously validated the document. Once I pull the necessary element from the reader, for example PaymentInformation, I pass the reader by reference to the applicable deserializer code. In this case, the PO.PaymentInformation.Deserialize method is based upon the PaymentInformation class. The code in the PaymentInformation class applies similar logic, except it has the job of pulling only the specific data needed to hydrate the data members of the object. Using the reader.ReadString() method, I read the text value of the current element. I like the ReadString() method because it reads past the current element start tag and returns the text contained within the element. In a SAX-based model, I would read the start element, then read the character data, and then read the end element. Either mechanism provides the same results.

Notice that this code was placed in the constructor of the object. I could have implemented this a different way, but I chose this one for the convenience of consumers, so that they can create an instance of this object and hydrate it in one call.

public class PurchaseOrder
{
   public PurchaseOrder(string POXml)
   {
      this.Validate(POXml);

      StringReader strReader = new StringReader(POXml);
      XmlTextReader reader = new XmlTextReader(strReader);

      while (reader.Read())
      {
         if(reader.NodeType == XmlNodeType.Element)
         {
            switch(reader.LocalName)
            {
               case "PurchaseOrder":
                  this.CorrelationID = reader.GetAttribute("CorrelationID").ToString();
                  this.OriginatorID = reader.GetAttribute("OriginatorID").ToString();
                  break;
               case "PaymentInformation":
                  this.PaymentInformation.Deserialize(ref reader);
                  break;
               case "ShippingInformation":
                  this.ShipToAddress.Deserialize(ref reader);
                  break;
               …
               /* remaining logic removed */
               …
            }
         }
      }
   }
}

public class PaymentInformation
{
   override public void Deserialize(ref XmlTextReader reader)
   {
      while (reader.Read())
      {
         if(reader.NodeType == XmlNodeType.Element)
         {
            switch(reader.LocalName)
            {
               case("CardExpires"):
                  _CardExpires = reader.ReadString();
                  break;
               case("CardIssueCode"):
                  _CardIssueCode = reader.ReadString();
                  break;
               /* logic removed */
            }
         }

         // stop once the closing PaymentInformation tag is reached
         if(reader.LocalName == "PaymentInformation" && reader.NodeType == XmlNodeType.EndTag)
         {
            break;
         }
      }
   }
}

I call the appropriate Deserialize method for each member class in the PurchaseOrder class. Each class is responsible for pulling the appropriate amount of data from the buffer. For example, notice that the Deserialize method in the PaymentInformation class reads until it finds a local name of PaymentInformation with an XmlNodeType of EndTag. This is a nice and safe way to process the data, because if there is additional data inside of the PaymentInformation element, the code just continues to run. For example, if my schema also allowed a Frequent Flyer element as a child of the PaymentInformation element, my code would safely ignore it.

Before I move on to the last step in the process, I want to highlight some additional considerations. First, you'll notice that when I map my data from the reader, I use the private members of the object instead of calling accessors. This is an optimization if you normally do type and length checking in the accessor. Since this was already done for me in the validation process, I no longer have to do that work. There are certain instances where this is not possible. For example, you may have a member that is derived from other members and internal data, in which case you would need to call the accessor to do the work. Next, you'll notice that the deserialization code is fairly black box: I pass a reader with no return. Now, I did not add any business logic for failure, but this would be simple to do since each deserializer is written the same way. Finally, since my deserializer code relies heavily on the content model of the instance document, I can control the entire process through my schema. If I want to add a new class of data or an element to an existing class, I edit my schema, add a new member to my class, and drop a new case in for the reader: simple and safe.

XML Enabling Business Components: Step 3—Serialization with XmlTextWriter

Okay, so we have successfully deserialized the XML and hydrated the purchase order. Now we need to serialize the purchase order back out to XML for continued processing by another service. This is a key point: you can write serialization code to target any necessary service. If a down-level service needs the purchase order in a binary format, simply add another serializer. If you find yourself writing numerous serializers and mapping functions, you may want to look into the messaging features of BizTalk Server 2000.

There are several ways to serialize data into XML. The most barbaric way would be to do string concatenation. Although this is simple to do, the code is very fragile and there is no built-in mechanism for constructing a well-formed document instance. For example, neither the string type nor your application logic knows whether the following XML is well-formed until it is parsed.

Note The majority of the logic has been omitted for the sake of simplicity.

public string OldSchoolSerialization()
{
   string temp = "";

   temp = "<PurchaseOrder>";
   /* this is malformed xml */
   temp += "<LineItems>";
   temp += "</PurchaseOrder>";

   return temp;
}

The point here is that you need an XML serialization mechanism that creates well-formed documents and provides other services like output escaping and namespace management. Traditionally, developers have turned to the Document Object Model (DOM) to build their XML documents. Although this approach works, it is not very efficient. When using the DOM, you are building internal structures to represent the nodes within the DOM, and you are allocating memory for data that you do not need. The recommendation is not to use the DOM as a means of serializing your XML unless the data is already in a DOM instance. I'll address this concept later in the article.

A more efficient and safe means of serializing your data into XML is to use some type of XML writer. MSXML 3.0 has a SAX-based writer. In addition, the .NET Framework XML namespace has an XmlTextWriter class that provides a conformant mechanism for creating document instances.

Let's take a look at how you would use the XmlTextWriter to serialize an instance of a PurchaseOrder object. First, I created a new method called Serialize(). Notice that the function returns a string instead of a typed object like a DOM document. I am returning a string so that any XML-enabled client can read the message. Granted, if this data were moving in a tightly coupled manner, I would probably use a more strongly typed route for efficiency purposes.

The XmlTextWriter is simple to use. For this implementation, I chose to write my output to a StringWriter. I could have chosen to write to a stream, but my goal was to get the XML into a string in one pass. The XmlTextWriter is similar to a SAX writer in the sense that you write your document using node-based concepts, which translate into the infoset of the document instance. When using the XmlTextWriter class, you make method calls like WriteStartDocument and WriteStartElement to create the document. In a SAX2 implementation, you would raise events on a writer interface. As with my previous XmlTextReader and SAX2 ContentHandler comparison, the reason to choose a specific technology should be based on strategic reasons, not preference for a certain implementation.

In the Serialize() method, I walk through the purchase order instance, calling the appropriate writer methods to create the document instance in the appropriate schema. For example, the PurchaseOrder element contains two attributes, CorrelationID and OriginatorID. When using the WriteAttribute method to create attributes for an element, the XmlTextWriter gives you control over the attribute's prefix and namespace association. There are some other interesting methods of the XmlTextWriter that help you serialize XML in a flexible, yet safe manner. For example, you could use the WriteString method to create entitized output. This is a helpful feature when your source data could contain special characters, like an ampersand, that need to be entitized. On the opposite end of the spectrum, there is the WriteRaw method, which simply writes whatever is in the string buffer to the output with no entitization. I did not use either feature and chose to use the traditional element and attribute mechanisms. Notice below that I pass the writer by reference to the BuyerInformation.Serialize function.

Note In cases where you are getting data from non-XML-based sources, you should always use entitization services.

public class PurchaseOrder
{
public string Serialize()
   {
      StringWriter w = new StringWriter();
      XmlTextWriter writer = new XmlTextWriter(w);

      writer.Formatting = Formatting.Indented;
      writer.Indentation = 5;

      writer.WriteStartDocument();
      writer.WriteStartElement("", "PurchaseOrder", "");
      writer.WriteAttribute("","CorrelationID","", this.CorrelationID);
      writer.WriteAttribute("","OriginatorID","", this.OriginatorID);
      writer.WriteElementString("Comment","", this.Comment);
      /* code omitted */
      this.BuyerInformation.Serialize(ref writer);
         
      writer.WriteEndElement();
      writer.WriteEndDocument();

      writer.Flush();
         
      this.Validate(w.ToString());

      return (w.ToString());
   }      
}

Once the function logic satisfies the content for the purchase order schema, the function returns the XML as a string. To reiterate, I could have passed this back as a strongly typed object, but I wanted the flexibility of dealing with the string. You should also notice that the code is non-validating, meaning that there is no validation service in the XmlTextWriter; that is why I call Validate on the resulting string before returning it.
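As a quick illustration of the entitization point from earlier in this section, here is a minimal sketch (not part of the sample download) showing the difference between WriteString and WriteRaw: WriteString escapes the ampersand on output, while WriteRaw writes the buffer exactly as given.

using System;
using System.IO;
using System.Xml;

public class EscapingSketch
{
   public static void Main()
   {
      StringWriter w = new StringWriter();
      XmlTextWriter writer = new XmlTextWriter(w);

      writer.WriteStartElement("Comments");

      writer.WriteStartElement("Escaped");
      writer.WriteString("Smith & Sons");      // serialized as Smith &amp; Sons
      writer.WriteEndElement();

      writer.WriteStartElement("Raw");
      writer.WriteRaw("Smith &amp; Sons");     // written verbatim; no escaping applied
      writer.WriteEndElement();

      writer.WriteEndElement();
      writer.Flush();

      Console.WriteLine(w.ToString());
   }
}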

XML Enabling Business Components: Step 3.5—Serialization with System.Xml.Serialization

As an additional method of serializing your objects, you may also want to look into the XmlSerializer class and related classes available in the System.Xml.Serialization namespace. Simply put, this namespace contains classes that are used to serialize objects into XML documents or streams. This is accomplished using attribute-based programming. For example, the following code shows that the property Name is qualified by the attribute XmlElementAttribute. When the object that contains this property is passed to the XmlSerializer class, the property will be written out as an XML element with a text value equal to the value of the Name property.

[XmlElementAttribute(IsNullable=false)]
public string Name
{
   get
   {
      return _Name;
   }
   set
   {
      _Name = value;
   }
}

One of the great uses of the XmlSerializer class is local persistence of objects. This was one of its main design goals. For example, you could use this functionality to persist object state for long-running transactions. You simply serialize out the state and persist the file to disk. When you need to rehydrate the object, deserialize the XML, and your object state is regained.

To reiterate, you would include the appropriate attributes on the applicable classes and their respective members. Next, you create an instance of the XmlSerializer class and pass an instance of the object to the class. Here is some sample code.

public void Serialize(PurchaseOrder po, string filename)
{
   XmlSerializer serializer = new XmlSerializer(typeof(PurchaseOrder));
   TextWriter writer = new StreamWriter(filename);
   serializer.Serialize(writer, po);
   writer.Close();
}   

To deserialize, you would reverse the logic and call the Deserialize method on the serializer. For example:

public void DeSerialize(string filename)
{
   PurchaseOrder po;
   FileStream fs = new FileStream(filename, FileMode.Open);
   XmlSerializer serializer = new XmlSerializer(typeof(PurchaseOrder));
   po = (PurchaseOrder) serializer.Deserialize(fs);
   fs.Close();
}

So you may be wondering why I didn't use the XmlSerializer class; after all, the code is simpler. I chose to create my own serialization mechanism so that I could have complete control over the processing, separate from the actual class model implementation. In addition, I have greater procedural granularity over how I process the document. For example, although the XmlSerializer has facilities to handle unmapped nodes and to alter the output beyond the annotations of the classes, you end up programming to the XmlSerializer API instead of your business logic code. In my custom serialization code, I explicitly state how I want to handle these anomalies. Finally, separating this process from the actual class code makes the environment more pluggable, meaning that I could write serializers and deserializers in a safe, compact manner without disturbing my class structure.

I wouldn't look at this as a pro or con. Instead, I would view this as a preference issue that will be driven by your architecture. If you are building services that will need to process data from various sources and target services beyond business objects, the Reader/Writer model is a smart implementation. If you are building components that need to perform efficient, self-contained XML based serialization, the serialization namespace is the way to go. In future articles, we'll dive into the internal workings of the Serialization namespace and how to leverage its classes.

Conclusion

First and foremost, creating specific and lucid schemas should be your first task when creating XML- and Web Service-enabled applications. If your partners need schema definitions other than XML Schema (for example, DTDs), start with an XML Schema approach and then port the implementation; you'll come out ahead in the long run. Second, assess the overall taxonomy of your business objects and services. Review their hierarchy and coupling; you'll undoubtedly find areas where items could be aggregated. For example, instead of having helper functions that perform specific and private tasks, aggregate them into a more meaningful service that can be universally accessed. Third, start building serialization code for your business objects. (Use whatever technologies you like. For example, if you are going to live in a non-.NET Framework environment, I would suggest that you look at building SAX-based implementations.) During this process, start coupling your services together. For example, in the purchase order sample above, I could just as easily couple the credit card validation service to a new service, bypassing the purchase order service due to its implementation. In future columns, I'll address the process of exposing these services in a Web Services environment using SOAP and related technologies. One final word of advice: eat more protein and learn to shoot left-footed penalty kicks.

Additional Reading