Integrated Full-Text Search (OLTP)---a Technical Reference Guide for Designing Mission-Critical OLTP Solutions
Want more guides like this one? Go to Technical Reference Guides for Designing Mission-Critical Solutions. |
Full-text search refers to the functionality in Microsoft SQL Server that supports full-text queries against character-based data. These types of queries can include words and phrases, in addition to multiple forms of a word or phrase. To support full-text queries, you must implement full-text indexes on the columns referenced in the query. You can configure the columns with character data types or with binary data types. When binary data is indexed, another column in the table must indicate the type of data that is contained within the binary column.
A full-text index is made up of word tokens that are derived from the text being indexed, but ignores stopwords. A stopword can be a word with meaning in a specific language, or it can be a token that does not have linguistic meaning. For example, in the English language, words such as "a," "and," "is," and "the" are left out of the full-text index since they are known to be useless to a search. Although iFTS ignores the inclusion of stopwords, the full-text index does take into account their position.
Best Practices
The following resources provide examples of customer scenarios for integrated full-text search (iFTS), in addition to general iFTS reference material. (Note that the full URLs for the hyperlinked text are provided in the Appendix at the end of this document.)
The white paper SQL Server 2008 Full-Text Search: Internals and Enhancements1 thoroughly covers iFTS in SQL Server 2008, and includes many useful data points.
The SQL customer advisory team (CAT) blog post Best Practices for Integrated Full Text Search (iFTS) in SQL Server 20082 provides best practices and lessons learned from working with iFTS.
The SQL Server Full-Text Search (FTS) Blog3 is managed by the SQL Server Full-Text Search engineering team and provides helpful information.
The article Full-Text Indexing Terabytes of Files with SQL Server and Cloud Storage4 describes a solution for maintaining the full-text index on file in cloud storage.
Surrogate (supplementary) Unicode characters are not supported by the full-text engine. The SQT CAT blog post Full-text query in local languages5 describes a situation in which the full-text index was populated with an English word breaker, but the query was using Chinese to search for it. While this might be acceptable for Unicode, it would not necessarily be acceptable where surrogate Unicode characters are involved. For the definition of surrogate characters, see the article Supplementary Characters.6
The article Best Practices for Choosing a Language When Creating a Full-Text Index7 provides guidance for specifying a column-level language for the indexed column when creating a full-text index.
Case Studies and References
The following two case studies provide helpful information.
Questions and Considerations
This section provides questions and issues to consider when working with your customers.
Understand the number and type of users, as well as their expected search patterns and response-time requirements. With one customer, response-time performance was critical because the users, who were lawyers, were investigating the depositions for arguing a well-known case. Ask customers about user expectations so that you can keep them in mind as the usual design trade-offs are made.
Review the article SQL Server 2008 Full-Text Search: Internals and Enhancements1 to get a good understanding of the capabilities.
Does the search application need to tightly integrate with relational queries and the relational structured data? If so, and if the relational predicate is very selective, it is optimal to have a predicate push-down plan. See the section Implementing Full-Text Search: Best Practices10 in the article "SQL Server 2008 Full-Text Search: Internals and Enhancements."
Does the search application need to integrate with Internet search or Microsoft SharePoint search?
Consider where the binary data resides, because FILESTREAM supports iFTS, but remote binary large object (BLOB) storage (RBS) does not.
Appendix
Following are the full URLs for the hyperlinked text.
1 SQL Server 2008 Full-Text Search: Internals and Enhancementshttps://msdn.microsoft.com/en-us/library/cc721269.aspx
2 Best Practices for Integrated Full Text Search (iFTS) in SQL 2008http://sqlcat.com/msdnmirror/archive/2008/11/05/best-practices-for-integrated-full-text-search-ifts-in-sql-2008.aspx
3 SQL Server Full-Text Search (FTS) Bloghttps://blogs.msdn.com/b/sqlfts/
4 Full Text Indexing Terabytes of Files with SQL Server and Cloud Storagehttps://blogs.msdn.com/b/sqlcat/archive/2010/02/03/full-text-indexing-terabytes-of-files-with-sql-server-and-cloud-storage.aspx
5 Full-text query in local languageshttp://sqlcat.com/msdnmirror/archive/2010/05/20/full-text-query-in-local-languages.aspx
6 Supplementary Charactershttps://msdn.microsoft.com/en-us/library/ms180942.aspx
7 Best Practices for Choosing a Language When Creating a Full-Text Indexhttps://msdn.microsoft.com/en-us/library/ms142507.aspx
8 RSS Aggregator NewsGator Manages 2.5 Billion Articles with SQL Server 2008https://download.microsoft.com/download/4/4/d/44da642f-1c69-4164-93e8-
9 FileControl Partners: Faster Searches of 1 Terabyte of Litigation Documents Gained with SQL Server 2005https://www.microsoft.com/casestudies/Microsoft-Visual-Studio-6.0/FileControl-Partners/Faster-Searches-of-1-Terabyte-of-Litigation-Documents-Gained-with-SQL-Server-2005/200075
10 Implementing Full-Text Search: Best Practiceshttps://msdn.microsoft.com/en-us/library/cc721269(SQL.100).aspx#_Toc202506250