Primary XML Index

The primary XML index indexes all tags, values, and paths within the XML instances in an XML column. To create a primary XML index, the table in which the XML column occurs must have a clustered index on the primary key of the table. SQL Server uses this primary key to correlate rows in the primary XML index with rows in the table that contains the XML column.

The primary XML index is a shredded and persisted representation of the XML BLOBs in the xml data type column. For each XML binary large object (BLOB) in the column, the index creates several rows of data. The number of rows in the index is approximately equal to the number of nodes in the XML binary large object. When a query retrieves the full XML instance, SQL Server provides the instance from the XML column. Queries within XML instances use the primary XML index, and can return scalar values or XML subtrees by using the index itself.

Each row stores the following node information:

  • Tag name such as an element or attribute name.

  • Node value.

  • Node type such as an element node, attribute node, or text node.

  • Document order information, represented by an internal node identifier.

  • Path from each node to the root of the XML tree. This column is searched for path expressions in the query.

  • Primary key of the base table. The primary key of the base table is duplicated in the primary XML index for a back join with the base table, and the maximum number of columns in the primary key of the base table is limited to 15.

This node information is used to evaluate and construct XML results for a specified query. For optimization purposes, the tag name and the node type information are encoded as integer values, and the Path column uses the same encoding. Also, paths are stored in reverse order to allow matching paths when only the path suffix is known. For example:

  • //ContactRecord/PhoneNumber where only the last two steps are known

OR

  • /Book/*/Title where the wildcard character (*) is specified in the middle of the expression.

The query processor uses the primary XML index for queries that involve xml Data Type Methods and returns either scalar values or the XML subtrees from the primary index itself. (This index stores all the necessary information to reconstruct the XML instance.)

For example, the following query returns summary information stored in the CatalogDescriptionxml type column in the ProductModel table. The query returns <Summary> information only for product models whose catalog description also stores the <Features> description.

WITH XMLNAMESPACES ('https://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ProductModelDescription' AS "PD")

SELECT CatalogDescription.query('
  /PD:ProductDescription/PD:Summary
') as Result
FROM Production.ProductModel
WHERE CatalogDescription.exist ('/PD:ProductDescription/PD:Features') = 1

With regard to the primary XML index, instead of shredding each XML binary large object instance in the base table, the rows in the index that correspond to each XML binary large object are searched sequentially for the expression specified in the exist() method. If the path is found in the Path column in the index, the <Summary> element together with its subtrees is retrieved from the primary XML index and converted into an XML binary large object as the result of the query() method.

Note that the primary XML index is not used when retrieving a full XML instance. For example, the following query retrieves from the table the whole XML instance that describes the manufacturing instructions for a specific product model.

USE AdventureWorks;

SELECT Instructions
FROM Production.ProductModel 
WHERE ProductModelID=7;