Full-Text Search Filters
Indexing documents in an varbinary, varbinary(max), image, or xml data type column requires extra processing. This processing must be performed by a filter. The filter extracts the textual information from the document (removing the formatting). The filter then sends the text to the word-breaker component for the language associated with the table column.
A given filter is specific to a given document type (.doc, .pdf, .xls, .xml, and so forth). These filters implement the IFilter interface. For more information about these document types, query the sys.fulltext_document_types catalog view.
Binary documents can be stored in a single varbinary(max) or image column. For each document, SQL Server chooses the correct filter based on the file extension. Because the file extension is not visible when the file is stored in a varbinary(max) or image column, the file extension (.doc, .xls, .pdf, and so forth) must be stored in a separate column in the table, called a type column. This type column can be of any character-based data type and contains the document file extension, such as .doc for a Microsoft Word document. In the Document table in Adventure Works, the Document column is of type varbinary(max), and the type column, FileExtension, is of type nvarchar(8).
A filter might be able to handle objects embedded in the parent object, depending on its implementation. However, SQL Server does not configure filters to follow links to other objects.
SQL Server 2008 installs its own XML and HTML filters. In addition, any filters for Microsoft proprietary formats (.doc, .xdoc, .ppt and so on) that are already installed on the operating system are also loaded by SQL Server. To identify the filters that are currently loaded on an instance of SQL Server, use the sp_help_fulltext_system_components stored procedure, as follows:
EXEC sp_help_fulltext_system_components 'filter';
Before you can use filters for non Microsoft formats, however, you must manually load them into the server instance. For information about installing additional filters, see How to: Alter the List of Registered Word Breakers and Filters (Transact-SQL).
To view the type column in an existing full-text index