Full-Text Search Overview
Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on rules of a particular language such as English or Japanese. Full-text queries can include simple words and phrases or multiple forms of a word or phrase.
Full-text search is applicable in a wide range of business scenarios such as e-businesses—searching for items on a web site; law firms—searching for case histories in a legal-data repository; or human resources departments—matching job descriptions with stored resumes. The basic administrative and development tasks of full-text search are equivalent regardless of business scenarios. However, in a given business scenario, full-text index and queries can be honed to meet business goals. For example, for an e-business maximizing performance might be more important than ranking of results, recall accuracy (how many of the existing matches are actually returned by a full-text query), or supporting multiple languages. For a law firm, returning every possible hit (total recall of information) might be the most important consideration.
For any scenario, a database administrator performs the following basic steps to configure table columns in a database for full-text search:
Create a full-text catalog.
On each table that you want to search, create a full-text index by:
Identify each text columns that you want to include in the full-text index.
If a given column contains documents stored as binary data (varbinary(max), or image data), you must specify a table column (the type column) that identifies the type of each document in the column being indexed.
Specify the language that you want full-text search to use on the documents in the column.
Choose the change-tracking mechanism that you want to use on the full-text index to track changes in the base table and its columns.
Full-text search supports multiple languages through the use of the following linguistic components: word breakers and stemmers, stoplists that contain stopwords (also known as noise words), and thesaurus files. Thesaurus files and, in some cases, stoplists require configuration by a database administrator. A given thesaurus file supports all full-text indexes that use the corresponding language, and a given stoplist can be associated with as many full-text indexes as you want.
For more information, see Administering Full-Text Search.
After the columns have been added to a full-text index, applications and users can run full-text queries on the text in the columns. These queries can search for any of the following:
One or more specific words or phrases (simple term)
A word or a phrase where the words begin with specified text (prefix term)
Inflectional forms of a specific word (generation term)
A word or phrase close to another word or phrase (proximity term)
Synonymous forms of a specific word (thesaurus)
Words or phrases using weighted values (weighted term)
Full-text queries all use a small set of Transact-SQL predicates (CONTAINS and FREETEXT) and functions (CONTAINSTABLE and FREETEXTTABLE). However, the search goals of a given business scenario influence the exact structure of the full-text queries; for example:
e-business—searching for a product on a website:
SELECT product_id FROM products WHERE CONTAINS(product_description, ”Snap Happy 100EZ” OR FORMSOF(THESAURUS,’Snap Happy’) OR ‘100EZ’) AND product_cost<200 …
Recruitment scenario—searching for job candidates that have experience working with SQL Server:
SELECT candidate_name,SSN FROM candidates WHERE CONTAINS(candidate_resume,”SQL Server”) AND candidate_division =DBA
For more information, see Querying SQL Server Using Full-Text Search.
In contrast to full-text search, the LIKE Transact-SQL predicate works on character patterns only. Also, you cannot use the LIKE predicate to query formatted binary data. Furthermore, a LIKE query against a large amount of unstructured text data is much slower than an equivalent full-text query against the same data. A LIKE query against millions of rows of text data can take minutes to return; whereas a full-text query can take only seconds or less against the same data, depending on the number of rows that are returned.