Manage thesaurus files (SharePoint Server 2010)

Applies to: SharePoint Server 2010

Topic Last Modified: 2012-04-02

By using thesaurus files, a search administrator can specify replacements or synonyms for words or phrases that occur in search queries.

  • Specifying replacements for query words or phrases: A search administrator can designate one or more words or phrases as replacements for particular words or phrases that a user might type in a search box. For example, an administrator might specify that whenever the term “Longhorn” appears in a query, the search system replaces it with “Windows Vista” or “Vista”. Similarly, an administrator might specify that whenever the term “NT5” or the term “W2K” appears in a query, the search system replaces it with “Windows 2000”.

    To specify replacements for query words or phrases, the search administrator inserts a replacement set into a thesaurus file. For more information, see Using replacement sets later in this article.

  • Specifying synonyms for query words or phrases: A search administrator can specify one or more words or phrases as synonyms for a particular word or phrase that a user might type in a search box. For example, an administrator might specify “IE”, “IE8”, and “Internet Explorer” as synonyms for one another. When one of these terms appears in a query, the system also searches for the other terms. Therefore, a query on any of these three terms could return search results that contain “IE”, “IE8”, or “Internet Explorer”.

    To specify synonyms for query words or phrases, the search administrator inserts an expansion set into a thesaurus file. For more information, see Using expansion sets later in this article.
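The difference between the two kinds of sets can be sketched in Python. This is a simplified, hypothetical model, not the SharePoint query processor: the names REPLACEMENTS, EXPANSIONS, and rewrite are illustrative, and case-insensitive matching is an assumption. A replacement drops the original term, whereas an expansion keeps it and adds its synonyms.

```python
# Simplified, hypothetical model of replacement and expansion sets.
# It is NOT SharePoint's query processor; case-insensitive matching is assumed.

# Replacement sets: pattern (lowercased) -> substitutions.
# The pattern itself is no longer searched.
REPLACEMENTS = {
    "longhorn": ["Windows Vista", "Vista"],
    "nt5": ["Windows 2000"],
    "w2k": ["Windows 2000"],
}

# Expansion sets: every member is a synonym of every other member.
EXPANSIONS = [
    {"IE", "IE8", "Internet Explorer"},
]


def rewrite(term: str) -> set[str]:
    """Return the terms that a single query term is searched as."""
    key = term.lower()
    if key in REPLACEMENTS:
        # Replacement: the original term is dropped.
        return set(REPLACEMENTS[key])
    for group in EXPANSIONS:
        if any(key == member.lower() for member in group):
            # Expansion: the original term and all of its synonyms are searched.
            return set(group)
    # No thesaurus entry: the term is searched as typed.
    return {term}
```

For example, rewrite("W2K") returns {"Windows 2000"}, while rewrite("IE8") returns all three Internet Explorer terms.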

The Microsoft SharePoint Server 2010 installation program installs a thesaurus file for each language that the product supports. The installation also provides the language-neutral thesaurus file, which is named tsneu.xml. This file is applied to all queries during query processing, regardless of whether there is a thesaurus file that is specific to the query language. For more information, see Thesaurus files by language later in this article.

By default, SharePoint Server 2010 installs the thesaurus files for all supported languages at %ProgramFiles%\Microsoft Office Servers\14.0\Data\Office Server\Config. When a search administrator creates a Search service application, the search system automatically copies the thesaurus files from the installation location (including any thesaurus files there that an administrator has edited) to %ProgramFiles%\Microsoft Office Servers\14.0\Data\Office Server\Applications\GUID-query-0\Config, where GUID is the GUID of the new Search service application. The search system performs the same operation on every query server that is running the new Search service application. Thus there is a copy of each thesaurus file on each query server that is running that Search service application.

Upon installation, each thesaurus file contains only sample content, which is inactive because it is enclosed in an XML comment. Therefore, you must edit a thesaurus file before the search system can use it. In addition to replacement sets and expansion sets, thesaurus files contain a “diacritics_sensitive” tag that specifies whether the search system ignores or respects diacritical marks such as accents. By default, diacritics_sensitive is set to 0, so diacritical marks are ignored. To direct the search system to respect diacritical marks, change the value of diacritics_sensitive to 1.
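To illustrate what ignoring diacritical marks means, the following Python sketch compares two terms with and without diacritic folding. It approximates folding with Unicode canonical decomposition (NFD) followed by removal of combining marks; the exact normalization that the search system applies is not described in this article, and the function names fold_diacritics and terms_match are hypothetical.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining marks after NFD decomposition, e.g. 'résumé' -> 'resume'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def terms_match(a: str, b: str, diacritics_sensitive: int = 0) -> bool:
    """Compare two terms as the diacritics_sensitive setting suggests.

    0 (the default) ignores diacritical marks; 1 respects them.
    """
    if diacritics_sensitive:
        return a == b
    return fold_diacritics(a) == fold_diacritics(b)
```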

The following example shows the default XML in a thesaurus file:

<XML ID="Microsoft Search Thesaurus">

<!--  Commented out

    <thesaurus xmlns="x-schema:tsSchema.xml">
        <diacritics_sensitive>0</diacritics_sensitive>
        <expansion>
            <sub>Internet Explorer</sub>
            <sub>IE</sub>
            <sub>IE8</sub>
        </expansion>
        <replacement>
            <pat>NT5</pat>
            <pat>W2K</pat>
            <sub>Windows 2000</sub>
        </replacement>
        <expansion>
            <sub>run</sub>
            <sub>jog</sub>
        </expansion>
    </thesaurus>
-->
</XML>
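Because the sample content is inside an XML comment, an XML parser sees only an empty <XML> element. The following Python sketch uses the standard xml.etree.ElementTree module to check whether a thesaurus file has an active (uncommented) <thesaurus> element. The function name is_active is hypothetical, and the sketch assumes that the file content has already been read into a string.

```python
import xml.etree.ElementTree as ET

# Namespace declared on <thesaurus> in the default file.
TS_NAMESPACE = "x-schema:tsSchema.xml"


def is_active(xml_text: str) -> bool:
    """Return True if the thesaurus content is no longer commented out."""
    # The default parser discards comments, so commented-out sample content
    # does not appear in the parsed tree at all.
    root = ET.fromstring(xml_text)
    qualified = root.find(f"{{{TS_NAMESPACE}}}thesaurus")
    # Also accept a <thesaurus> element written without the namespace.
    return qualified is not None or root.find("thesaurus") is not None
```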

A search administrator inserts a replacement set into a thesaurus file to designate one or more words or phrases as replacements for particular words or phrases that a user might type in a search box. Each replacement set in a thesaurus file is enclosed in <replacement> tags. In the replacement set, the administrator specifies one or more query words or phrases to replace by enclosing each word or phrase in <pat> (pattern) tags, and the administrator specifies one or more replacements by enclosing each replacement in <sub> (substitution) tags. For example, the following replacement set replaces the query term “Longhorn” with “Windows Vista” or “Vista”:


<replacement>
    <pat>Longhorn</pat>
    <sub>Windows Vista</sub>
    <sub>Vista</sub>
</replacement>
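The following Python sketch reads the replacement sets from a thesaurus file into a dictionary that maps each pattern to its substitutions, which can be handy for reviewing a large file. It is illustrative only: read_replacements is a hypothetical helper, and it assumes that the <thesaurus> element carries the x-schema:tsSchema.xml namespace shown in the default file.

```python
import xml.etree.ElementTree as ET

NS = {"ts": "x-schema:tsSchema.xml"}


def read_replacements(xml_text: str) -> dict[str, list[str]]:
    """Map each <pat> value to the <sub> values of its replacement set."""
    root = ET.fromstring(xml_text)
    result: dict[str, list[str]] = {}
    for replacement in root.findall(".//ts:replacement", NS):
        # Empty <sub></sub> elements are skipped, so a pattern whose only
        # substitution is empty maps to an empty list.
        subs = [
            (sub.text or "").strip()
            for sub in replacement.findall("ts:sub", NS)
            if (sub.text or "").strip()
        ]
        for pat in replacement.findall("ts:pat", NS):
            result[(pat.text or "").strip()] = subs
    return result
```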

Similarly, the following example shows a replacement set that specifies that the query terms “NT5” and “W2K” are replaced by “Windows 2000”:


<replacement>
    <pat>W2K</pat>
    <pat>NT5</pat>  
    <sub>Windows 2000</sub>
</replacement>

By specifying a pattern with an empty substitution, the search administrator can specify that a query on a particular term returns no results. In the following example, queries for the term “bugs” will not return any results:


<replacement>
    <pat>bugs</pat>    
    <sub></sub>
</replacement>

A search administrator uses an expansion set in a thesaurus file to designate one or more words or phrases as synonyms of one another. A search query that contains any word or phrase in the expansion set is expanded to include all synonyms in the expansion set. Therefore, a search query that includes any word or phrase in the expansion set also returns search results that contain any of the synonyms in the set.

Each expansion set is enclosed in <expansion> tags. In the expansion set, the administrator specifies one or more synonyms by enclosing each synonym in <sub> tags. For example, a search administrator might want to specify an expansion set that designates the following three terms as synonyms: writer, author, blogger. To specify this expansion set, the search administrator adds the following lines to the thesaurus file:


<expansion>
    <sub>writer</sub>
    <sub>author</sub>
    <sub>blogger</sub>
</expansion>

This expansion set specifies that a query on any of the three terms also returns search results that contain either or both of the other two terms.
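If you maintain many expansion sets, you might generate them rather than type them by hand. The following Python sketch builds an <expansion> element with xml.etree.ElementTree (ET.indent requires Python 3.9 or later); the helper name expansion_xml is hypothetical. A practical benefit is that ElementTree escapes reserved characters, so a term such as IT&T is written as IT&amp;T, which XML requires.

```python
import xml.etree.ElementTree as ET


def expansion_xml(synonyms: list[str]) -> str:
    """Return an <expansion> set, ready to paste inside <thesaurus>."""
    expansion = ET.Element("expansion")
    for word in synonyms:
        # Each synonym goes in its own <sub> element; '&' and '<' are escaped.
        ET.SubElement(expansion, "sub").text = word
    ET.indent(expansion, space="    ")  # one level of indentation per child
    return ET.tostring(expansion, encoding="unicode")


print(expansion_xml(["writer", "author", "blogger"]))
```

The printed output matches the writer/author/blogger expansion set shown earlier.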

The word breaker for a given language identifies individual words in a search query by determining word boundaries according to the lexical rules of the language. If you include a word in a thesaurus file that the word breaker might not recognize as a single word, you should also include the word in a custom dictionary so that the word breaker does not break the word into smaller tokens. For example, if you use the term “IT&T” in an expansion set but you do not include it in a custom dictionary, the word breaker might break the term into three separate terms, “IT”, “&”, and “T”. This can cause the expansion set in the thesaurus file not to work as expected when a user issues a search query for “IT&T”. For information about how to create and use custom dictionaries, see Create a custom dictionary (SharePoint Server 2010).
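The following Python sketch is a deliberately naive stand-in for a word breaker. It is not the SharePoint word breaker: it splits on whitespace and then separates runs of letters and digits from punctuation, which is enough to show why “IT&T” can fall apart into three tokens unless it is listed in a custom dictionary. The function naive_word_break and its custom_dictionary parameter are hypothetical models of that behavior.

```python
import re

# Hypothetical stand-in for a language's word breaker; NOT SharePoint's.
TOKEN = re.compile(r"\w+|[^\w\s]")


def naive_word_break(query: str, custom_dictionary: frozenset[str] = frozenset()) -> list[str]:
    tokens: list[str] = []
    for chunk in query.split():
        if chunk in custom_dictionary:
            # A custom-dictionary entry is kept as a single token.
            tokens.append(chunk)
        else:
            # Otherwise split into word runs and individual symbols.
            tokens.extend(TOKEN.findall(chunk))
    return tokens
```

Here naive_word_break("IT&T") yields ["IT", "&", "T"], so an expansion set that contains “IT&T” would never match; passing frozenset({"IT&T"}) keeps the term whole.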

If you edit a thesaurus file in the installation location, the search system automatically propagates the edited file to Search service applications that are created afterward. However, the edited thesaurus file is not automatically propagated to existing Search service applications. For each existing Search service application to which you want the changes to apply, you must manually copy the edited file to the Search service application folder on each query server that is running that Search service application.
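Copying the file to several servers by hand is error-prone, so you might script it. The following Python sketch copies an edited thesaurus file into one Search service application's Config folder on each query server, following the Applications\GUID-query-0\Config layout described earlier. It assumes that each query server's Office Server data folder is reachable as a path (for example, through a UNC path to an administrative share) and that you know the application's GUID; propagate and its parameters are hypothetical. You still need to restart the search service afterward.

```python
import shutil
from pathlib import Path


def propagate(edited_file: Path, server_data_roots: list[Path], app_guid: str) -> list[Path]:
    """Copy an edited thesaurus file to one Search service application.

    Each entry in server_data_roots is assumed to point at a query server's
    ...\\14.0\\Data\\Office Server folder. Returns the destination paths.
    """
    copied: list[Path] = []
    for root in server_data_roots:
        config_dir = root / "Applications" / f"{app_guid}-query-0" / "Config"
        if not config_dir.is_dir():
            # The application is not running on this server (or the path is wrong).
            raise FileNotFoundError(f"Missing application folder: {config_dir}")
        destination = config_dir / edited_file.name
        shutil.copy2(edited_file, destination)  # overwrites the existing copy
        copied.append(destination)
    return copied
```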

Note
  • A file named tsschema.xml is installed in the same directory as the thesaurus files. Do not modify tsschema.xml. All other thesaurus files use it, and changing it might cause unpredictable results.

  • Each <pat> or <sub> tag counts as an item in a thesaurus file. A typical thesaurus file contains about 1,000 items. For performance reasons, do not exceed approximately 10,000 items in a single thesaurus file.

  • If you use words in a thesaurus file that are specified in a stop word file, the search system filters those words out of the thesaurus file. For more information, see Manage stop word files (SharePoint Server 2010).

  • Thesaurus file entries cannot contain only special characters.
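Before you deploy a thesaurus file, you might check it against the guidance in this note. The following Python sketch counts <pat> and <sub> items and flags entries that contain no letters or digits. Treating “special characters” as “not alphanumeric” is an assumption, the 10,000 threshold is the approximate figure from the note, and check_thesaurus is a hypothetical helper. The sketch does not check for stop words.

```python
import xml.etree.ElementTree as ET

NS = {"ts": "x-schema:tsSchema.xml"}
ITEM_LIMIT = 10_000  # approximate guidance, not a hard limit


def check_thesaurus(xml_text: str) -> list[str]:
    """Return a list of human-readable problems (empty if none found)."""
    root = ET.fromstring(xml_text)
    items = root.findall(".//ts:pat", NS) + root.findall(".//ts:sub", NS)
    problems: list[str] = []
    if len(items) > ITEM_LIMIT:
        problems.append(f"{len(items)} items exceeds about {ITEM_LIMIT}")
    for item in items:
        text = (item.text or "").strip()
        # Empty <sub></sub> is allowed; entries with no letters or digits are not.
        if text and not any(ch.isalnum() for ch in text):
            problems.append(f"entry contains only special characters: {text!r}")
    return problems
```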

Use the following procedure to edit a thesaurus file.

Note
When editing a file, you must use matching pairs of opening and closing tags around each entry in the file. If the XML tags in the thesaurus file do not match correctly, an error is logged in the application event log.
To edit a thesaurus file
  1. Verify that the user account that is performing this procedure is a member of the Administrators group on the local computer.

  2. Open a thesaurus file in a text editor. For information about how to locate and identify the appropriate thesaurus file, see Understanding thesaurus files earlier in this article.

  3. If you are changing the thesaurus file for the first time, remove the opening comment line (<!-- Commented out) near the beginning of the file and the closing comment line (-->) near the end of the file.

  4. Edit the thesaurus file as necessary.

  5. Save the thesaurus file.

    Note
    When you save a thesaurus file, always use the default Encoding value, which is Unicode.
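Because mismatched tags only show up as an error in the application event log, it can help to validate the file before you restart the service. The following Python sketch checks for a UTF-16 byte order mark and then parses the file with xml.etree.ElementTree. It assumes that “Unicode” in the Encoding list means UTF-16 with a byte order mark, which is what Windows Notepad writes for that choice; validate_file is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")  # little-endian, big-endian


def validate_file(path: Path) -> None:
    """Raise ValueError if the file is not UTF-16 or is not well-formed XML."""
    raw = path.read_bytes()
    if not raw.startswith(UTF16_BOMS):
        raise ValueError(f"{path.name}: no UTF-16 byte order mark; save it as Unicode")
    # The utf-16 codec reads the byte order mark and removes it.
    text = raw.decode("utf-16")
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        raise ValueError(f"{path.name}: malformed XML, check tag pairs ({err})") from err
```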

After you edit a thesaurus file, you must restart the SharePoint Server Search 14 service for the changes to take effect. You do not need to perform a crawl.

To restart the SharePoint Server Search 14 service
  1. Verify that the user account that is performing this procedure is a member of the Administrators group on the local computer.

  2. Click Start, point to Administrative Tools, and then click Services.

  3. Right-click SharePoint Server Search 14, and then click Restart.

    Thesaurus file changes take effect after the SharePoint Server Search 14 service restarts.
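If you prefer the command line to the Services console, the same restart can be scripted with the Windows net stop and net start commands, which accept either a service name or its quoted display name. The following Python sketch assumes the display name shown in the procedure above and must run from an elevated prompt; restart_commands and restart_search_service are hypothetical helpers, and by default the sketch only prints the commands. If other services depend on this service, net stop may prompt for confirmation.

```python
import platform
import subprocess

# Display name taken from the procedure above.
SERVICE = "SharePoint Server Search 14"


def restart_commands(service: str = SERVICE) -> list[list[str]]:
    """Return the commands that stop and then start the service."""
    return [["net", "stop", service], ["net", "start", service]]


def restart_search_service(dry_run: bool = True) -> None:
    for command in restart_commands():
        if dry_run or platform.system() != "Windows":
            # Show the command as it would be typed at a prompt.
            print(f'{command[0]} {command[1]} "{command[2]}"')
        else:
            # Needs an elevated (administrator) prompt; raises on failure.
            subprocess.run(command, check=True)
```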

The following thesaurus files are installed automatically and available to use.

Language                  File name
Language-neutral          tsneu.xml
Arabic                    tsara.xml
Bengali                   tsben.xml
Bulgarian                 tsbul.xml
Catalan                   tscat.xml
Chinese (Simplified)      tschs.xml
Chinese (Traditional)     tscht.xml
Croatian                  tscro.xml
Czech                     tsces.xml
Danish                    tsdan.xml
Dutch (Netherlands)       tsnld.xml
English (United Kingdom)  tseng.xml
English (United States)   tsenu.xml
Finnish                   tsfin.xml
French (Standard)         tsfra.xml
German (Standard)         tsdeu.xml
Gujarati                  tsguj.xml
Hungarian                 tshun.xml
Icelandic                 tsice.xml
Indonesian                tsind.xml
Italian                   tsita.xml
Japanese                  tsjpn.xml
Kannada                   tskan.xml
Korean                    tskor.xml
Lithuanian                tslit.xml
Malay (Malaysian)         tsmal.xml
Malayalam                 tsmly.xml
Marathi                   tsmar.xml
Norwegian (Bokmal)        tsnor.xml
Polish                    tsplk.xml
Portuguese (Brazil)       tsptb.xml
Portuguese (Portugal)     tspor.xml
Punjabi                   tspun.xml
Romanian                  tsrom.xml
Russian                   tsrus.xml
Serbian (Cyrillic)        tssbc.xml
Serbian (Latin)           tssbl.xml
Slovak                    tssvk.xml
Slovenian                 tsslo.xml
Spanish                   tsesn.xml
Swedish                   tssve.xml
Tamil                     tstam.xml
Telugu                    tstel.xml
Thai                      tstha.xml
Turkish                   tstur.xml
Ukrainian                 tsukr.xml
Urdu (Pakistan)           tsurd.xml
Vietnamese                tsvie.xml

© 2014 Microsoft