Full-Text Search Architecture
Full-text search is powered by the Full-Text Engine. The Full-Text Engine has two roles: indexing support and querying support.
Beginning in SQL Server 2008, full-text search architecture consists of the following processes:
The SQL Server process (sqlservr.exe)
The MSFTESQL service does not exist in SQL Server 2008 and later versions. Full-text tasks that were performed by the MSFTESQL service in SQL Server 2005 and earlier versions are now performed by the SQL Server process.
The filter daemon host process (fdhost.exe)
For security reasons, beginning in SQL Server 2008, filters are loaded by separate processes called the filter daemon hosts. A server instance uses a multithreaded process for all multithreaded filters and a single-threaded process for all single-threaded filters.
fdhost.exe replaces the Full-Text Engine filter daemon (msftefd.exe) of SQL Server 2005 and earlier versions.
The fdhost.exe processes are created by an FDHOST launcher service (MSSQLFDLauncher), and they run under the security credentials of the FDHOST launcher service account. Therefore, this service must be running for full-text indexing and full-text querying to work. For information about setting the service account for this service, see How to: Set the FDHOST Launcher (MSSQLFDLauncher) Service Account for Full-Text Search (SQL Server Configuration Manager).
These processes contain the components of the full-text search architecture. These components and their relationships are summarized in the following illustration. The components are described after the illustration.
Full-text search uses the following components of the SQL Server process:
User tables
These tables contain the data to be full-text indexed.
Full-text gatherer
The full-text gatherer works with the full-text crawl threads. It is responsible for scheduling and driving the population of full-text indexes, and also for monitoring full-text catalogs.
Full-text catalog
Beginning in SQL Server 2008, a full-text catalog is a virtual object and does not belong to any filegroup. A full-text catalog is a logical concept that refers to a group of full-text indexes.
Thesaurus files
These files contain synonyms of search terms. For more information, see Thesaurus Configuration.
Stoplist objects
Stoplist objects contain a list of common words that are not useful for the search. For more information, see Stopwords and Stoplists.
Stoplist objects replace the noise word files of SQL Server 2005 and earlier versions.
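To make the roles of the stoplist and the thesaurus concrete, the following is a minimal conceptual sketch in Python. It is not how SQL Server implements these components; the names `STOPLIST`, `THESAURUS`, and `prepare_query_terms` and the sample data are hypothetical and chosen only for illustration. It assumes a query has already been split into terms, drops the stopwords, and expands each remaining term into a group of equivalent terms.

```python
# Toy model only: hypothetical names and data, not SQL Server objects.
STOPLIST = {"a", "an", "and", "the", "of", "for"}
THESAURUS = {"car": {"automobile", "auto"}}


def prepare_query_terms(terms):
    """Drop stopwords, then expand each remaining term with its synonyms.

    Returns one set per surviving term; a document would match a set
    if it contains any member of that set.
    """
    groups = []
    for term in terms:
        key = term.lower()
        if key in STOPLIST:
            continue  # common word, not useful for the search
        groups.append({key} | THESAURUS.get(key, set()))
    return groups


groups = prepare_query_terms(["the", "price", "of", "a", "car"])
# groups now holds two sets: one for "price", one for "car" and its synonyms
```

In this sketch the stoplist shrinks the query and the thesaurus widens it, which is why both are modeled as inputs to query preparation rather than as part of the index itself.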
SQL Server query processor
The query processor compiles and executes SQL queries. If a SQL query includes a full-text search query, the query is sent to the Full-Text Engine, both during compilation and during execution. The query result is matched against the full-text index. For more information, see Full-Text Engine.
Full-Text Engine
The Full-Text Engine in SQL Server is now fully integrated with the query processor. The Full-Text Engine compiles and executes full-text queries. As part of query execution, the Full-Text Engine might receive input from the thesaurus and stoplist. In SQL Server 2008 and later versions, the Full-Text Engine for SQL Server runs inside the SQL Server query processor.
Index writer (indexer)
The index writer builds the structure that is used to store the indexed tokens.
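As an illustration of the kind of structure an index writer might build, the sketch below assumes an inverted index that maps each token to the documents and positions where it occurs. This is a simplified model, not SQL Server's internal storage format; `build_inverted_index` and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each token to a list of (doc_id, position) postings.

    `docs` maps a document id to its already word-broken tokens.
    Positions start at 1.
    """
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for position, token in enumerate(tokens, start=1):
            index[token].append((doc_id, position))
    return dict(index)


# Hypothetical sample data: two rows that have already been tokenized.
docs = {
    1: ["full", "text", "search"],
    2: ["text", "index"],
}
index = build_inverted_index(docs)
postings_for_text = index["text"]  # every place the token "text" occurs
```

Storing positions alongside document ids is what would let a query check that two words appear next to each other, not merely in the same document.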
Filter daemon manager
The filter daemon manager is responsible for monitoring the status of the Full-Text Engine filter daemon host.
The filter daemon host is a process that is started by the Full-Text Engine. It runs the following full-text search components, which are responsible for accessing, filtering, and word breaking data from tables, as well as for word breaking and stemming the query input:
Protocol handler
This component pulls the data from memory for further processing and accesses data from a user table in a specified database. One of its responsibilities is to gather data from the columns being full-text indexed and pass it to the filter daemon host, which applies filtering and word breaking as required.
Filters
Some data types require filtering before the data in a document can be full-text indexed, including data in varbinary, varbinary(max), image, or xml columns. The filter used for a given document depends on its document type. For example, different filters are used for Microsoft Word (.doc) documents, Microsoft Excel (.xls) documents, and XML (.xml) documents. The filter then extracts chunks of text from the document, removing embedded formatting and retaining the text and, potentially, information about the position of the text. The result is a stream of textual information. For more information, see Full-Text Search Filters.
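The filtering step can be pictured with a small Python sketch. It is only a conceptual model: real filters are separate components loaded by the filter daemon host, and the names `filter_xml`, `filter_plain_text`, `FILTERS`, and `extract_text` are hypothetical. The sketch picks a filter by document type and returns the text chunks with the markup removed.

```python
import xml.etree.ElementTree as ET


def filter_xml(data):
    """Yield the text chunks of an XML document, discarding the markup."""
    root = ET.fromstring(data)
    for chunk in root.itertext():
        chunk = chunk.strip()
        if chunk:
            yield chunk


def filter_plain_text(data):
    """Plain text has no formatting to remove; pass it through as one chunk."""
    yield data.decode("utf-8")


# Hypothetical registry: the document type selects the filter.
FILTERS = {".xml": filter_xml, ".txt": filter_plain_text}


def extract_text(document_type, data):
    """Run the filter registered for this document type and collect its output."""
    return list(FILTERS[document_type](data))


doc = (
    b"<report><title>Sales summary</title>"
    b"<body>Revenue grew <b>12%</b> in Q3.</body></report>"
)
chunks = extract_text(".xml", doc)
```

The output is a flat sequence of text chunks, which is the form the word breaker consumes next.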
Word breakers and stemmers
A word breaker is a language-specific component that finds word boundaries based on the lexical rules of a given language (word breaking). Each word breaker is associated with a language-specific stemmer component that conjugates verbs and performs inflectional expansions. At indexing time, the filter daemon host uses a word breaker and stemmer to perform linguistic analysis on the textual data from a given table column. The language that is associated with a table column in the full-text index determines which word breaker and stemmer are used for indexing the column. For more information, see Word Breakers and Stemmers.
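The sketch below gives a rough feel for word breaking and inflectional expansion. It is deliberately naive and is not the algorithm used by any shipped word breaker or stemmer: the regular expression, the `INFLECTIONS` table, and the names `break_words_english`, `inflectional_forms`, `WORD_BREAKERS`, and `analyze_column` are all assumptions made for illustration.

```python
import re


def break_words_english(text):
    """Naive word breaker: every run of ASCII letters or digits is a word.

    Returns (position, word) pairs with positions starting at 1.
    """
    matches = re.finditer(r"[A-Za-z0-9]+", text)
    return [(pos, m.group().lower()) for pos, m in enumerate(matches, start=1)]


# Tiny hand-written table standing in for a real language-specific stemmer.
INFLECTIONS = {
    "run": ["run", "runs", "ran", "running"],
}


def inflectional_forms(word):
    """Return the known inflected forms of a word, or just the word itself."""
    key = word.lower()
    return INFLECTIONS.get(key, [key])


# Hypothetical registry: the column's language selects the word breaker.
WORD_BREAKERS = {"English": break_words_english}


def analyze_column(text, column_language):
    """Break column text into words using the breaker for the column's language."""
    return WORD_BREAKERS[column_language](text)


tokens = analyze_column("Full-text search, v2", "English")
forms = inflectional_forms("run")
```

Keying the breaker on the column's language mirrors the point above that the language associated with a column decides which linguistic components are used.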
For information about all of the full-text linguistic components, see Configuring Full-Text Linguistic Components.