IFilters and protocol handlers (Search Server 2008)

Applies To: Microsoft Search Server 2008

 

Topic Last Modified: 2009-03-13

Note

Unless otherwise noted, the information in this article applies to both Microsoft Search Server 2008 and Microsoft Search Server 2008 Express.

The crawler in Search Server 2008 uses protocol handlers to access content and IFilters to extract content from the files that it crawls. IFilters remove application-specific formatting before the engine indexes the content of a document. Search Server crawls only content for which both a protocol handler and an IFilter are installed.

This section describes the IFilters and protocol handlers that a Search Server installation includes by default, and explains how to install and register additional IFilters and protocol handlers.

The crawler uses protocol handlers and IFilters as follows:

  1. The crawler retrieves the start addresses of content sources and calls the protocol handler based on the URL’s prefix.

  2. The protocol handler connects to the content source and extracts system-level metadata and access control list (ACL) information.

  3. The protocol handler identifies the file type of each content item, based on the file name extension, and calls the appropriate IFilter associated with that file type.

  4. The IFilter extracts content, removing any embedded formatting, and then retrieves content item metadata.

  5. Content is parsed by one or more language-appropriate word breakers and is added to the full-text index, also called the content index. Metadata and access control lists are added to the search database.
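The five steps above can be sketched in a few lines of Python. This is only an illustration of the flow, not the Search Server implementation or its API: every name in it (FAKE_SOURCE, PROTOCOL_HANDLERS, IFILTERS, crawl, and the handler and filter functions) is hypothetical, the content source is an in-memory dictionary, each start address is treated as a single content item, and the word breaker is a naive regular expression rather than a language-specific component.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical in-memory content source: URL -> (raw content, ACL).
FAKE_SOURCE = {
    "file://server/share/report.html": ("<p>Quarterly <b>sales</b> report</p>", ["DOMAIN\\readers"]),
    "file://server/share/notes.txt": ("Meeting notes", ["DOMAIN\\staff"]),
    "file://server/share/image.xyz": ("\x00\x01", ["DOMAIN\\staff"]),
}

def file_protocol_handler(url):
    """Step 2: connect to the source; return raw content, system metadata, ACL."""
    raw, acl = FAKE_SOURCE[url]
    return raw, {"url": url, "size": len(raw)}, acl

def html_ifilter(raw):
    """Step 4: strip embedded formatting; return plain text and item metadata."""
    text = re.sub(r"<[^>]+>", " ", raw)
    return " ".join(text.split()), {"format": "html"}

def text_ifilter(raw):
    """Step 4 for plain text: nothing to strip."""
    return raw, {"format": "text"}

# Illustrative registries: handlers keyed by URL prefix, IFilters by extension.
PROTOCOL_HANDLERS = {"file": file_protocol_handler}
IFILTERS = {".html": html_ifilter, ".txt": text_ifilter}

def word_break(text):
    """Step 5: naive stand-in for a language-appropriate word breaker."""
    return [word.lower() for word in re.findall(r"\w+", text)]

def crawl(start_addresses):
    full_text_index = {}  # word -> set of URLs containing it
    search_database = {}  # URL -> metadata and ACL
    skipped = []          # items with no protocol handler or no IFilter
    for url in start_addresses:
        parts = urlsplit(url)
        # Step 1: choose the protocol handler from the URL prefix.
        handler = PROTOCOL_HANDLERS.get(parts.scheme)
        if handler is None:
            skipped.append(url)
            continue
        # Step 2: fetch content, system-level metadata, and the ACL.
        raw, system_meta, acl = handler(url)
        # Step 3: choose the IFilter from the file name extension.
        ifilter = IFILTERS.get(PurePosixPath(parts.path).suffix.lower())
        if ifilter is None:
            skipped.append(url)
            continue
        # Step 4: extract text and item metadata.
        text, item_meta = ifilter(raw)
        # Step 5: word-break the text into the index; store metadata and ACL.
        for word in word_break(text):
            full_text_index.setdefault(word, set()).add(url)
        search_database[url] = {**system_meta, **item_meta, "acl": acl}
    return full_text_index, search_database, skipped
```

Calling crawl(list(FAKE_SOURCE) + ["ftp://host/readme.txt"]) in this sketch indexes the .html and .txt items and skips the other two: the .xyz item has no registered IFilter, and the ftp address has no registered protocol handler.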

See Also

Concepts

Add content sources (Search Server 2008)