Getting Started with Full-Text Search
Databases in SQL Server are full-text enabled by default. However, to use a full-text index on a table, you must set up full-text indexing capability on the columns of the tables that you want to access using the Full-Text Engine.
This involves the following basic steps:
Create a full-text catalog to store full-text indexes.
Each full-text index must belong to a full-text catalog. You can create a separate text catalog for each full-text index, or you can associate multiple full-text indexes with a given catalog.
Beginning in SQL Server 2008, a full-text catalog is a virtual object and does not belong to any filegroup. A full-text catalog is a logical concept that refers to a group of full-text indexes.
Create a full-text index on the table or indexed view.
A full-text index is a special type of token-based functional index that is built and maintained by the Full-Text Engine. To create full-text search on a table or view, it must have a unique, single-column, non-nullable index. The Full-Text Engine requires this unique index to map each row in the table to a unique, compressible key. A full-text index can include char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, and varbinary(max) columns. For more information, see Full-Text Indexing Overview.
Before learning about creating full-text indexes, it is important to consider how they differ from regular SQL Server indexes. The following table lists the differences.
Regular SQL Server indexes
Only one full-text index allowed per table.
Several regular indexes allowed per table.
The addition of data to full-text indexes, called a population, can be requested through either a schedule or a specific request, or can occur automatically with the addition of new data.
Updated automatically when the data upon which they are based is inserted, updated, or deleted.
Grouped within the same database into one or more full-text catalogs.
This section covers the following:
Choosing the column language
Choosing a filegroup for a full-text index
Assigning the full-text index to a full-text catalog
Associating a stoplist with the full-text index
Updating a full-text index
Choosing the Column Language
For information about things to consider when you are choosing the column language, see Best Practices for Choosing a Language When Creating a Full-Text Index.
Choosing a Filegroup for a Full-Text Index
The process of building a full-text index is fairly I/O intensive (on a high level, it consists of reading data from SQL Server, and then propagating the filtered data to the full-text index). As a best practice, locate a full-text index in the database filegroup that is best for maximizing I/O performance or locate the full-text indexes in a different filegroup on another volume.
When ease of management is important to you, we recommend that you store table data and any affiliated full-text catalogs in the same filegroup. Sometimes, for performance reasons, you might want to have the table data and the full-text index in different filegroups that are stored on different volumes to maximize I/O parallelism.
Assigning the Full-Text Index to a Full-Text Catalog
It is important to plan the placement of full-text indexes for tables in full-text catalogs.
We recommend associating tables with the same update characteristics (such as small number of changes versus large number of changes, or tables that change frequently during a particular time of day) together under the same full-text catalog. By setting up full-text catalog population schedules, full-text indexes stay synchronous with the tables without adversely affecting the resource usage of the database server during periods of high database activity.
When you assign a table to a full-text catalog, consider the following guidelines:
Always select the smallest unique index available for your full-text unique key. (A 4-byte, integer-based index is optimal.) This reduces the resources required by Microsoft Search service in the file system significantly. If the primary key is large (over 100 bytes), consider choosing another unique index in the table (or creating another unique index) as the full-text unique key. Otherwise, if the full-text unique key size exceeds the maximum size allowed (900 bytes), full-text population will not be able to proceed.
If you are indexing a table that has millions of rows, assign the table to its own full-text catalog.
Consider the amount of changes occurring in the tables being full-text indexed, as well as the total number of rows. If the total number of rows being changed, together with the numbers of rows in the table present during the last full-text population, represents millions of rows, assign the table to its own full-text catalog.
Associating a Stoplist with the Full-Text Index
SQL Server 2008 introduces stoplists. A stoplist is a list of stopwords, also known as noise words. A stoplist is associated with each full-text index, and the words in that stoplist are applied to full-text queries on that index. By default, the system stoplist is associated with a new full-text index. However, you can create and use your own stoplist instead. For more information, see Stopwords and Stoplists.
For example, the following CREATE FULLTEXT STOPLISTTransact-SQL statement creates a new full-text stoplist named myStoplist3 by copying from the system stoplist:
CREATE FULLTEXT STOPLIST myStoplist FROM SYSTEM STOPLIST; GO
The following ALTER FULLTEXT STOPLISTTransact-SQL statement alters a stoplist named myStoplist, adding the word 'en', first for Spanish and then for French:
ALTER FULLTEXT STOPLIST MyStoplist ADD 'en' LANGUAGE 'Spanish'; ALTER FULLTEXT STOPLIST MyStoplist ADD 'en' LANGUAGE 'French'; GO
Updating a Full-Text Index
Like regular SQL Server indexes, full-text indexes can be automatically updated as data is modified in the associated tables. This is the default behavior. Alternatively, you can keep your full-text indexes up-to-date manually or at specified scheduled intervals. Populating a full-text index can be time-consuming and resource-intensive, therefore, index updating is usually performed as an asynchronous process that runs in the background and keeps the full-text index up to date after modifications in the base table. Updating a full-text index immediately after each change in the base table can be resource-intensive. Therefore, if you have a very high update/insert/delete rate, you might experience some degradation in query performance. If this occurs, consider scheduling manual change tracking updates to keep up with the numerous changes from time to time, rather than competing with queries for resources.
To monitor the population status, use either the FULLTEXTCATALOGPROPERTY or OBJECTPROPERTYEX functions. To get the catalog population status, run the following statement:
SELECT FULLTEXTCATALOGPROPERTY('AdvWksDocFTCat', 'Populatestatus');
Typically, if a full population is in progress, the result returned is 1.
The following two-part example creates a full-text catalog named AdvWksDocFTCat on the AdventureWorks database and then creates a full-text index on the Document table in AdventureWorks. This statement creates the full-text catalog in the default directory specified during setup. The folder named AdvWksDocFTCat is in the default directory.
To create a full-text catalog named AdvWksDocFTCat, the example uses a CREATE FULLTEXT CATALOG statement:
USE AdventureWorks; GO CREATE FULLTEXT CATALOG AdvWksDocFTCat;
Before you can create a full-text index on the Document table, ensure that the table has a unique, single-column, non-nullable index. The following CREATE INDEX statement creates a unique index, ui_ukDoc, on the DocumentID column of the Document table:
CREATE UNIQUE INDEX ui_ukDoc ON Production.Document(DocumentID);
After you have a unique key, you can create a full-text index on the Document table by using the following CREATE FULLTEXT INDEX statement.
CREATE FULLTEXT INDEX ON Production.Document ( Document --Full-text index column name TYPE COLUMN FileExtension --Name of column that contains file type information Language 2057 --2057 is the LCID for British English ) KEY INDEX ui_ukDoc ON AdvWksDocFTCat --Unique index WITH CHANGE_TRACKING AUTO --Population type; GO
The TYPE COLUMN defined in this example specifies the type column in the table that contains the type of the document in each row of the column 'Document' (which is of binary type). The type column stores the user-supplied file extension—".doc", ".xls", and so on—of the document in a given row. The Full-Text Engine uses the file extension in a given row to invoke the correct filter to use for parsing the data in that row. After the filter has parsed the binary data of the row, the specified word breaker will parse the content (in this example, the word breaker for British English is used). Note that the filtering process happens only at indexing time or if a user inserts or updates a column in the base table while automatic change tracking is enabled for the full-text index. For more information, see Full-Text Search Filters.
For more information about indexes, see Indexes.
Catalog or Dynamic Management View
Returns a row for each full-text catalog to full-text index reference.
Contains a row for each column that is part of a full-text index.
A fulltext index uses internal tables called full-text index fragments to store the inverted index data. This view can be used to query the metadata about these fragments. This view contains a row for each full-text index fragment in every table that contains a full-text index.
Contains a row per full-text index of a tabular object.
Returns information about the content of a full-text index for the specified table.
Returns information about the document-level content of a full-text index for the specified table. A given keyword can appear in several documents.
Returns information about the full-text index populations currently in progress.