Querying varbinary(max) and xml Columns (Full-Text Search)

If a varbinary(max), varbinary, or xml column is full-text indexed, it can be queried using the full-text predicates (CONTAINS and FREETEXT) and functions (CONTAINSTABLE and FREETEXTTABLE), like any other full-text indexed column.

Important

Full-text search also works with image columns. However, the image data type will be removed in a future version of SQL Server. Avoid using this data type in new development work, and plan to modify applications that currently use it. Use the varbinary(max) data type instead.

varbinary(max) or varbinary data

A single varbinary(max) or varbinary column can store many types of documents. SQL Server 2008 supports any document type for which a filter is installed and available in the operative system. The document type of each document is identified by the file extension of the document. For example, for a .doc file extension, full-text search uses the filter that supports Microsoft Word documents. For a list of available document types, query the sys.fulltext_document_types catalog view. 

Note that the Full-Text Engine can leverage existing filters that are installed in the operating system. Before you can use operating-system filters, word breakers, and stemmers, you must load them in the server instance, as follows:

EXEC sp_fulltext_service @action='load_os_resources', @value=1;

To create a full-text index on a varbinary(max) column, the Full-Text Engine needs access to the file extensions of the documents in the varbinary(max) column. This information must be stored in a table column, called a type column, that must be associated with the varbinary(max) column in the full-text index. When indexing a document, the Full-Text Engine uses the file extension in the type column to identify which filter to use.

xml data

An xml data type column stores only XML documents and fragments, and only the XML filter is used for the documents. Therefore, a type column is unnecessary. On xml columns, the full-text index indexes the content of the XML elements, but ignores the XML markup. Attribute values are full-text indexed unless they are numeric values. Element tags are used as token boundaries. Well-formed XML or HTML documents and fragments containing multiple languages are supported.

For more information about querying on an xml column, see Full-Text Index on an XML Column.