Using XML Data Types
SQL Server 2005 introduced an xml data type that enables you to store XML documents and fragments in a SQL Server database. The xml data type is a built-in data type in SQL Server, and is in some ways similar to other built-in types, such as int and varchar. As with other built-in types, you can use the xml data type as a column type when creating a table; as a variable type, a parameter type, or a function-return type; or in CAST and CONVERT functions.
XML can be self-describing in that it can optionally include an XML header that specifies the encoding of the document, for example:
<?xml version="1.0" encoding="windows-1252"?><doc/>
The XML standard describes how an XML processor can detect the encoding used for a document by examining the first few bytes of the document. There are opportunities for the encoding specified by the application to conflict with the encoding specified by the document. For documents passed as bound parameters, XML is treated as binary data by SQL Server, so no conversions are made and the XML parser can use the encoding specified within the document without problems. However, for XML data that is bound as WSTR, then the application must ensure that the document is encoded as Unicode. This may entail loading the document into a DOM, changing the encoding to Unicode and serializing the document. If this is not done, data conversions may occur, resulting in invalid or corrupt XML.
There is also potential for conflict when XML is specified in literals. For example the following are invalid:
INSERT INTO xmltable(xmlcol) VALUES('<?xml version="1.0" encoding="UTF-16"?><doc/>')
INSERT INTO xmltable(xmlcol) VALUES(N'<?xml version="1.0" encoding="UTF-8"?><doc/>')
DBTYPE_XML is a new data type specific to XML in the SQL Server Native Client OLE DB provider. In addition, XML data can be accessed through the existing OLE DB types of DBTYPE_BYTES, DBTYPE_WSTR, DBTYPE_BSTR, DBTYPE_XML, DBTYPE_STR, DBTYPE_VARIANT, and DBTYPE_IUNKNOWN. Data stored in columns of type XML can be retrieved from a column in a SQL Server Native Client OLE DB provider rowset in the following formats:
A text string
The SQL Server Native Client OLE DB provider does not include a SAX reader, but the ISequentialStream can be easily passed to SAX and DOM objects in MSXML.
ISequentialStream should be use used for retrieval of large XML documents. The same techniques used for other large value types also apply to XML. For more information, see Using Large Value Types.
Data stored in columns of type XML in a rowset can also be retrieved, inserted, or updated by an application via the usual interfaces such as IRow::GetColumns, IRowChange::SetColumns, and ICommand::Execute. Similarly to the retrieval case, an application program can pass either a text string or an ISequentialStream to the SQL Server Native Client OLE DB provider.
To send XML data in string format through the ISequentialStream interface, you must obtain ISequentialStream by specifying DBTYPE_IUNKNOWN and set its pObject argument to null in the binding.
When retrieved XML data is truncated due to the consumer buffer being too small, the length may be returned as 0xffffffff, which means that the length is unknown. This is consistent with its implementation as a data type that is streamed to the client without sending length information ahead of the actual data. In some cases the actual length may be returned when the provider has buffered the whole value, such as IRowset::GetData and where data conversion is performed.
XML data sent to SQL Server is treated as binary data by the server. This prevents any conversions occurring and allows the XML parser to auto-detect the XML encoding. This allows a wider range of XML documents (for example those encoded in UTF-8) to be accepted as input to SQL Server.
If input XML is bound as DBTYPE_WSTR, the application must ensure it is already Unicode encoded to avoid any possibility of corruption by unwanted data conversions.
The following table describes the binding and coercion that occurs when using the listed data types with the SQL Server xml data type.
OK 11, 6
OK4, 6, 12
OK6, 9, 10
OK5, 6, 12
Byte stream via ISequentialStream7
Byte stream via ISequentialStream11
DBTYPE_VARIANT (VT_UI1 | VT_ARRAY)
1If a server type other than DBTYPE_XML is specified with ICommandWithParameters::SetParameterInfo and the accessor type is DBTYPE_XML, an error occurs when the statement is executed (DB_E_ERRORSOCCURRED, the parameter status is DBSTATUS_E_BADACCESSOR); otherwise the data is sent to the server, but the server returns an error indicating that there is no implicit conversion from XML to the parameter’s data type.
2Beyond the scope of this topic.
3Format is UTF-16, no bye-order mark (BOM), no encoding specification, no null termination.
4Format is UTF-16, no BOM, no encoding specification, null termination.
5Format is multibyte characters encoded in client code page with null terminator. Conversion from server supplied Unicode may cause data corruption, so this binding is strongly discouraged.
6BY_REF may be used.
7UTF-16 data must start with a BOM. If it does not, the encoding may not be correctly recognized by the server.
8Validation can happen at create accessor time, or at fetch time. The error is DB_E_ERRORSOCCURRED, binding status set to DBBINDSTATUS_UNSUPPORTEDCONVERSION.
9Data is converted to Unicode using the client codepage before being sent to the server. If the document encoding does not match the client codepage, this can result in data corruption, so this binding is strongly discouraged.
10A BOM is always added to data sent to the server. If the data already started with a BOM, this results in two BOMs at the start of the buffer. The server uses the first BOM to recognize the encoding as UTF-16 and then discards it. The second BOM is interpreted as a zero-width nonbreaking space character.
11Format is UTF-16, no encoding specification, a BOM is added to data received from the server. If an empty string is returned by the server, a BOM is still returned to the application. If the buffer length is an odd number of bytes, the data is truncated correctly. If the whole value is returned in chunks, they can be concatenated to reconstitute the correct value.
12If the buffer length is less than two characters--that is, not enough space for null termination--an overflow error is reported.
No data is returned for NULL XML values.
The XML standard requires UTF-16 encoded XML to start with a byte-order mark (BOM), UTF-16 character code 0xFEFF. When working with WSTR and BSTR bindings, SQL Server Native Client does not require or add a BOM as the encoding is implied by the binding. When working with BYTES, XML, or IUNKNOWN bindings, the intent is to provide simplicity in dealing with other XML processors and storage systems. In this case a BOM should be present with UTF-16 encoded XML, and the application need not be concerned with the actual encoding, since the majority of XML processors (including SQL Server) deduces the encoding by inspecting the first few bytes of the value. XML data received from SQL Server Native Client using BYTES, XML, or IUNKNOWN bindings is always encoded in UTF-16 with a BOM and without an embedded encoding declaration.
Data conversions supplied by OLE DB core services (IDataConvert) are not applicable to DBTYPE_XML.
Validation is done when data is sent to the server. Client-side validation and encoding changes should be handled by your application, and it is strongly recommended that you not process the XML data directly, but should instead use a DOM or SAX reader to process it.
DBTYPE_NULL and DBTYPE_EMPTY can be bound for input parameters but not for output parameters or results. When bound for input parameters the status must be set to DBSTATUS_S_ISNULL or DBSTATUS_S_DEFAULT.
DBTYPE_XML can be converted to DBTYPE_EMPTY and DBTYPE_NULL, DBTYPE_EMPTY can be converted to DBTYPE_XML, but DBTYPE_NULL cannot be converted to DBTYPE_XML. This is consistent with DBTYPE_WSTR.
DBTYPE_IUNKNOWN is a supported binding (as shown in the above table), but there are no conversions between DBTYPE_XML and DBTYPE_IUNKNOWN. DBTYPE_IUNKNOWN may not be used with DBTYPE_BYREF.
SQL Server Native Client adds new values or changes to many of the core OLE DB schema rowsets.
Additions to the COLUMNS and PROCEDURE_PARAMETERS schema rowsets include the following columns.
The name of a catalog in which an XML schema collection is defined. NULL for a non-XML column or un-typed XML column.
The name of a schema in which an XML schema collection is defined. NULL for a non-XML column or un-typed XML column.
The name of XML schema collection. NULL for a non-XML column or un-typed XML column.
In the PROVIDER_TYPES schema rowset, the COLUMN_SIZE value is 0 for the xml data type, and the DATA_TYPE is DBTYPE_XML.
A new schema rowset SS_XMLSCHEMA is introduced for clients to retrieve XML schema information. The SS_XMLSCHEMA rowset contains the following columns.
The catalog an XML collection belongs to.
The schema an XML collection belongs to.
The name of an XML schema collection for typed XML columns, NULL otherwise.
The target name space of an XML schema.
The XML schema content.
Each XML schema is scoped by catalog name, schema name, schema collection name, and target name space Uniform Resource Identifier (URI). In addition, a new GUID with the name DBSCHEMA_XML_COLLECTIONS is also defined. The number of restrictions and restricted columns for the SS_XMLSCHEMA schema rowset are defined as follows.
Number of restrictions
SQL Server Native Client adds new values or changes to many of the core OLE DB property sets.
In order to support the xml data type through OLE DB, SQL Server Native Client implements the new DBPROPSET_SQLSERVERPARAMETER property set, which contains the following values.
The name of a catalog (database) in which an XML schema collection is defined. A part of the SQL three-part name identifier.
The name of an XML schema within the schema collection. A part of the SQL three -part name identifier.
The name of the XML schema collection within the catalog A part of the SQL three -part name identifier.
To support the creation of tables in the ITableDefinition interface, SQL Server Native Client adds three new columns to the DBPROPSET_SQLSERVERCOLUMN property set.
For typed XML columns, this property is a string specifying the name of the catalog where the XML schema is stored. For other column types this property returns an empty string.
For typed XML columns, this property is a string specifying the name of XML schema that defines this column.
For typed XML columns, this property is a string specifying the name of the schema XML schema collection defining the value.
Like the SSPROP_PARAM values, all of these properties are optional and default to empty. SSPROP_COL_XML_SCHEMACOLLECTION_CATALOGNAME and SSPROP_COL_XML_SCHEMACOLLECTION_SCHEMANAME may only be specified if SSPROP_COL_XML_SCHEMACOLLECTIONNAME is specified. When passing XML to the server, if these values are included they are checked for existence (validity) against the current database and the instance data is checked against the schema. In all cases, to be valid they are either all empty or all filled in.
SQL Server Native Client adds new values or changes to many of the core OLE DB interfaces.
In order to support the xml data type through OLE DB, SQL Server Native Client implements a number of changes including the addition of the ISSCommandWithParameters interface. This new interface inherits from the core OLE DB interface ICommandWithParameters. In addition to the three methods inherited from ICommandWithParameters; GetParameterInfo, MapParameterNames, and SetParameterInfo; ISSCommandWithParameters provides the GetParameterProperties and SetParameterProperties methods that are used to handle server specific data types.
The ISSCommandWithParameters interface also makes use of the new SSPARAMPROPS structure.
SQL Server Native Client adds the following SQL Server-specific columns to the rowset returned by the IColumnRowset::GetColumnsRowset method. These columns contain the three-part name of an XML schema collection. For non-XML columns or untyped XML columns, all three columns take the default value of NULL.
The catalog an XML schema collection belongs to,
The schema an XML schema collection belongs to. NULL otherwise.
The name of XML schema collection for typed XML column, NULL otherwise.
An XML instance in an XML column is retrieved through the IRowset::GetData method. Depending on the binding specified by the client, an XML instance can be retrieved as DBTYPE_BSTR, DBTYPE_WSTR, DBTYPE_VARIANT, DBTYPE_XML, DBTYPE_STR, DBTYPE_BYTES, or as an interface via DBTYPE_IUNKNOWN. If the consumer specifies DBTYPE_BSTR, DBTYPE_WSTR, or DBTYPE_VARIANT, the provider converts the XML instance to the user requested type and put it into the location specified in the corresponding binding.
If the consumer specifies DBTYPE_IUNKNOWN and sets the pObject argument to NULL, or sets the pObject argument to IID_ISequentialStream, the provider returns an ISequentialStream interface to the consumer so that the consumer can stream the XML data out of the column. ISequentialStream then returns the XML data as a Unicode character stream.
When returning an XML value bound to DBTYPE_IUNKNOWN, the provider reports a size value of sizeof (IUnknown *). Note that this is consistent with the approach taken when a column is bound as DBTYPE_IUnknown or DBTYPE_IDISPATCH, and by DBTYPE_IUNKNOWN/ISequentialStream when the exact column size cannot be determined.
There are two ways a consumer can update an XML instance in a column. The first one is through the storage object ISequentialStream created by the provider. The consumer can call the ISequentialStream::Write method to directly update the XML instance returned by the provider.
The second approach is through IRowsetChange::SetData or IRowsetChange::InsertRow methods. In this approach, an XML instance in the consumer’s buffer can be specified in a binding of type DBTYPE_BSTR, DBTYPE_WSTR, DBTYPE_VARIANT, DBTYPE_XML or DBTYPE_IUNKNOWN.
In case of DBTYPE_BSTR, DBTYPE_WSTR, or DBTYPE_VARIANT, the provider stores the XML instance residing in the consumer buffer into the proper column.
In the case of DBTYPE_IUNKNOWN/ISequentialStream, if the consumer does not specify any storage object, the consumer must create an ISequentialStream object in advance, bind the XML document with the object, and then pass the object to the provider through the IRowsetChange::SetData method. The consumer can also create a storage object, set the pObject argument to IID_ISequentialStream, create an ISequentialStream object and then pass the ISequentialStream object to the IRowsetChange::SetData method. In both cases, the provider can retrieve the XML object through the ISequentialStream object and insert it into a proper column.
IRowsetUpdate interface provides functionality for delayed updates. The data made available to the rowsets is not made available to other transactions until the consumer calls the IRowsetUpdate:Update method.
The IRowsetFind::FindNextRow method does not work with the xml data type. When IRowsetFind::FindNextRow is called and the hAccessor argument specifies a column of DBTYPE_XML, DB_E_BADBINDINFO is returned. This occurs regardless of the type of column that is being searched. For any other binding type, the FindNextRow fails with DB_E_BADCOMPAREOP if the column to be searched is of the xml data type.
In the SQL Server Native Client ODBC driver, a number of changes have been made to various functions to support the xml data type.
The SQLColAttribute function has three new field identifiers, including SQL_CA_SS_XML_SCHEMACOLLECTION_CATALOG_NAME, SQL_CA_SS_XML_SCHEMACOLLECTION_SCHEMA_NAME, and SQL_CA_SS _XML_SCHEMACOLLECTION_NAME.
The SQL Server Native Client ODBC driver reports SQL_SS_LENGTH_UNLIMITED for the SQL_DESC_DISPLAY_SIZE and SQL_DESC_LENGTH columns.
The SQLColumns function has three new columns including SS_XML_SCHEMACOLLECTION_CATALOG_NAME, SS_XML_SCHEMACOLLECTION_SCHEMA_NAME, and SS_XML_SCHEMACOLLECTION_NAME. The existing TYPE_NAME column is used to indicate the name of the XML type, and the DATA_TYPE for a XML type column or parameter is SQL_SS_XML.
The SQL Server Native Client ODBC driver reports SQL_SS_LENGTH_UNLIMITED for the COLUMN_SIZE and CHAR_OCTET_LENGTH values.
The SQL Server Native Client ODBC driver reports SQL_SS_LENGTH_UNLIMITED when the column size cannot be determined in the SQLDescribeCol function.
The SQL Server Native Client ODBC driver reports SQL_SS_LENGTH_UNLIMITED as the maximum COLUMN_SIZE for the xml data type in the SQLGetTypeInfo function.
The SQLProcedureColumns function has the same column additions as the SQLColumns function.
The SQL Server Native Client ODBC driver reports SQL_SS_LENGTH_UNLIMITED as the maximum COLUMN_SIZE for the xml data type.
When converting from SQL to C data types, SQL_C_WCHAR, SQL_C_BINARY, and SQL_C_CHAR can all be converted to SQL_SS_XML, with the following stipulations:
SQL_C_WCHAR: Format is UTF-16, no byte-order mark (BOM), with null termination.
SQL_C_BINARY: Format is UTF-16, with no null termination. A BOM is added to data received from the server. If an empty string is returned by the server a BOM is still returned to the application. If the buffer length is an odd number of bytes the data ise truncated correctly. If the whole value is returned in chunks they can be concatenated to re-constitute the correct value
SQL_C_CHAR: Format is multibyte characters encoded in client code page with null termination. Conversion from server supplied UTF-16 may cause data corruption, so this binding is strongly discouraged.
When converting from C to SQL data types, SQL_C_WCHAR, SQL_C_BINARY, and SQL_C_CHAR can all be converted to SQL_SS_XML, with the following stipulations:
SQL_C_WCHAR: A BOM is always be added to data sent to the server. If the data already started with a BOM, this results in two BOMs at the start of the buffer. The server uses the first BOM to recognize the encoding as UTF-16 and then discard it. The second BOM is interpreted as a zero-width nonbreaking space character.
SQL_C_BINARY: No conversion is performed, and the data is passed to the server "as is." UTF-16 data must start with a BOM; if it does not, the encoding may not be correctly recognized by the server.
SQL_C_CHAR: The data is converted to UTF-16 on the client and sent to the server just as SQL_C_WCHAR (including the addition of a BOM). If the XML is not encoded in the client code page this can cause data corruption.
The XML standard requires UTF-16 encoded XML to start with a byte-order mark (BOM), UTF-16 character code 0xFEFF. When working with a SQL_C_BINARY binding, SQL Server Native Client does not require or add a BOM, as the encoding is implied by the binding. The intent is to provide simplicity in dealing with other XML processors and storage systems. In this case a BOM should be present with UTF-16 encoded XML, and the application need not be concerned with the actual encoding, because the majority of XML processors (including SQL Server) deduce the encoding by inspecting the first few bytes of the value. XML data received from SQL Server Native Client using SQL_C_BINARY bindings are always encoded in UTF-16 with a BOM and without an embedded encoding declaration.