IFilters and protocol handlers (Office SharePoint Server 2007)
Updated: April 16, 2009
Applies To: Office SharePoint Server 2007
The crawler in Microsoft Office SharePoint Server 2007 uses protocol handlers to access content and then uses IFilters to extract content from the files that are crawled. IFilters remove application-specific formatting before the engine indexes the content of a document. Office SharePoint Server 2007 crawls only file types for which both a protocol handler and an IFilter are installed.
This section describes the IFilters and protocol handlers that are included by default in an Office SharePoint Server 2007 installation and explains how to install and register additional IFilters and protocol handlers.
The crawler uses protocol handlers and IFilters as follows:
The crawler retrieves the start addresses of content sources and calls the protocol handler based on the URL’s prefix.
The protocol handler connects to the content source and extracts system-level metadata and access control list (ACL) information.
The protocol handler identifies the file type of each content item, based on the file name extension, and calls the appropriate IFilter associated with that file type.
The IFilter extracts content, removing any embedded formatting, and then retrieves content item metadata.
Content is parsed by one or more language-appropriate word breakers and is added to the content index, also called the full-text index. Metadata and access control lists are added to the search database.
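The steps above can be illustrated with a small, self-contained Python simulation. This is only a conceptual sketch, not the Office SharePoint Server 2007 API: every name in it (FAKE_STORE, file_handler, html_filter, text_filter, word_break, crawl) is hypothetical, the content lives in an in-memory dictionary, and the "IFilters" are simple text functions rather than real filter components.

```python
import os
import re
from urllib.parse import urlparse

# Hypothetical in-memory content source standing in for a real file share.
FAKE_STORE = {
    "file://server/share/report.html": "<html><b>Quarterly</b> report</html>",
    "file://server/share/notes.txt": "Plain text notes",
    "file://server/share/image.xyz": "binary data",
}

def file_handler(url):
    """Hypothetical protocol handler: returns raw content, system metadata, ACL."""
    raw = FAKE_STORE[url]
    system_metadata = {"url": url, "size": len(raw)}
    acl = ["DOMAIN\\readers"]  # made-up access control entry
    return raw, system_metadata, acl

# Protocol handlers are selected by URL prefix (scheme).
PROTOCOL_HANDLERS = {"file": file_handler}

def html_filter(raw):
    """Hypothetical filter: strips markup, returns text plus item metadata."""
    return re.sub(r"<[^>]+>", " ", raw), {"format": "html"}

def text_filter(raw):
    """Hypothetical filter for plain text: nothing to strip."""
    return raw, {"format": "text"}

# Filters are selected by file name extension.
IFILTERS = {".html": html_filter, ".txt": text_filter}

def word_break(text):
    """Very naive word breaker: lowercase alphanumeric tokens."""
    return [w.lower() for w in re.findall(r"\w+", text)]

def crawl(start_addresses):
    full_text_index = {}  # word -> set of URLs containing it
    search_db = {}        # URL -> metadata and ACL
    for url in start_addresses:
        parsed = urlparse(url)
        handler = PROTOCOL_HANDLERS.get(parsed.scheme)
        if handler is None:
            continue  # no protocol handler registered: item is not crawled
        ext = os.path.splitext(parsed.path)[1].lower()
        ifilter = IFILTERS.get(ext)
        if ifilter is None:
            continue  # no filter registered for this file type: not crawled
        raw, system_metadata, acl = handler(url)
        text, item_metadata = ifilter(raw)
        for word in word_break(text):
            full_text_index.setdefault(word, set()).add(url)
        search_db[url] = {
            "metadata": {**system_metadata, **item_metadata},
            "acl": acl,
        }
    return full_text_index, search_db

index, db = crawl(list(FAKE_STORE) + ["ftp://host/a.txt"])
```

In this sketch the .xyz item and the ftp:// address are skipped because no matching filter or protocol handler is registered, which mirrors the rule that only file types with both components installed are crawled.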
In this section: