How to: Edit a Thesaurus File (Full-Text Search)

The thesaurus for a given language is configured by editing its thesaurus file (an XML file). Setup installs empty thesaurus files that contain only the <xml> container and a commented-out sample <thesaurus> element. For full-text search queries that look for synonyms to work properly, you must create an actual <thesaurus> element that defines a set of synonyms. You can define two forms of synonyms: expansion sets and replacement sets. For information about the location and structure of a thesaurus file, see Thesaurus Configuration.
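As a rough illustration of the two forms, the following sketch shows what an active <thesaurus> element might look like once the comment markers are removed. It is modeled on the sample that ships in the installed files; the specific terms (Internet Explorer, IE, IE5, NT5, W2K, Windows 2000) are illustrative only, and the root element and schema attribute should be kept exactly as they appear in your own installed file.

```xml
<XML ID="Microsoft Search Thesaurus">
    <thesaurus xmlns="x-schema:tsSchema.xml">
        <!-- 0 = ignore accents when matching; 1 = treat accents as significant -->
        <diacritics_sensitive>0</diacritics_sensitive>

        <!-- Expansion set: a query for any one <sub> term also matches the others -->
        <expansion>
            <sub>Internet Explorer</sub>
            <sub>IE</sub>
            <sub>IE5</sub>
        </expansion>

        <!-- Replacement set: a query for any <pat> term is replaced by the <sub> term -->
        <replacement>
            <pat>NT5</pat>
            <pat>W2K</pat>
            <sub>Windows 2000</sub>
        </replacement>
    </thesaurus>
</XML>
```

In this sketch, no <sub> entry of the expansion set and no <pat> entry of the replacement set is repeated, which is consistent with the duplicate-entry restriction described in this topic.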

Restrictions for thesaurus files

The following restrictions apply to editing a thesaurus file:

  • Only system administrators can update, modify, or delete thesaurus files.

  • When you edit a thesaurus file in a text editor, you must save the file in Unicode format and include a byte order mark (BOM).

  • Thesaurus entries cannot be empty, and they cannot word-break to an empty string (that is, the word breaker must not reduce an entry to nothing).

  • Phrases in the thesaurus file must be no longer than 512 characters.

  • A thesaurus must not contain any duplicate entries among the <sub> entries of expansion sets and the <pat> elements of replacement sets.

Recommendations for thesaurus files

We recommend that entries in the thesaurus file contain no special characters. Word breakers handle special characters in subtle ways, so a thesaurus entry that contains them can affect a full-text query in ways that are hard to predict.

We recommend that <sub> entries contain no stopwords, because stopwords are omitted from the full-text index. Queries are expanded to include the <sub> entries from the thesaurus file, so a <sub> entry that contains stopwords increases query size unnecessarily.

To edit a thesaurus file

  1. Open the thesaurus file in Notepad.

  2. If you are editing the thesaurus file for the first time, remove the following comment lines at the beginning and end of the file, respectively:

    <!--Commented out
    -->
    
  3. Add, modify, or delete a replacement set or an expansion set. For more information, see Thesaurus Configuration.

  4. Save the file and close Notepad.

  5. Use sp_fulltext_load_thesaurus_file to load the content of the thesaurus file into tempdb, specifying the locale identifier (LCID) that corresponds to the language of the thesaurus file. For example, for the English thesaurus file, tsenu.xml, the corresponding LCID is 1033.

    -- Load the English (LCID 1033) thesaurus file, tsenu.xml.
    USE AdventureWorks2008R2;
    EXEC sys.sp_fulltext_load_thesaurus_file 1033;
    GO
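After loading the file, it can be useful to confirm that the thesaurus entries are picked up. The following is a minimal sketch that uses the sys.dm_fts_parser dynamic management function to show how a FORMSOF(THESAURUS, ...) term is expanded for LCID 1033. The search term "IE" is an assumption: it presumes your thesaurus file defines a set that contains that term, so substitute a term from your own file.

```sql
-- Show how the full-text engine expands a thesaurus term for English (LCID 1033).
-- Arguments: query string, LCID, stoplist ID (0 = system stoplist),
--            accent sensitivity (0 = insensitive).
-- "IE" is a placeholder; use a term that exists in your own thesaurus file.
SELECT keyword,
       display_term,
       special_term,
       expansion_type,
       source_term
FROM sys.dm_fts_parser(N'FORMSOF(THESAURUS, "IE")', 1033, 0, 0);
GO
```

If the thesaurus was loaded, the result should include one row per synonym in addition to the row for the source term; if only the source term is returned, check that the comment lines were removed and that the file was saved in Unicode format.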