Manage stop word files (SharePoint Server 2010)

 

Applies to: SharePoint Server 2010

A stop word, or noise word, is a word that the search system ignores in end-user search queries. A word might be designated as a stop word because it occurs in the language so frequently that it is unlikely to be helpful for identifying or narrowing search results. Articles such as “an” and “the” are typically specified as stop words for English, for example. If a user types the English query “the highest mountain”, “the” is removed from the query if it is a stop word, so that the query becomes “highest mountain”. Potentially offensive words are also sometimes specified as stop words.

In this article:

  • Understanding stop word files

  • Edit a stop word file

  • Stop word files by language

Understanding stop word files

The stop words for a given language are listed in the stop word file for that language. The Microsoft SharePoint Server 2010 installation program automatically installs one stop word file for each language that the product supports. Following installation, many of the stop word files contain some typical stop words for the associated language. For example, by default the U.S. English stop word file (noiseenu.txt) contains the words a, and, is, in, it, of, the, to. At any time after product installation, the search administrator can add or remove words in a stop word file to improve relevance of search results or to meet organization standards. For information about adding or removing words in a stop word file, see Edit a stop word file later in this article. For information about supported languages, see Stop word files by language later in this article.

At query time, the word breaker for the language of the query identifies individual words in the search query by determining word boundaries based on the lexical rules of the language. The word breaker then removes any words from the query that are listed in the stop word file.
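The query-time behavior described above can be illustrated with a minimal sketch. It is not the search system's implementation: whitespace splitting stands in for the language-specific word breaker, case-insensitive matching is an assumption, and the function name remove_stop_words is hypothetical. The stop word set is the default content of noiseenu.txt as listed earlier in this article.

```python
def remove_stop_words(query, stop_words):
    """Return the query with every stop word removed.

    Assumption: whitespace splitting is used here in place of the real
    word breaker, and words are compared case-insensitively.
    """
    words = query.split()
    kept = [word for word in words if word.lower() not in stop_words]
    return " ".join(kept)


# Default entries of the U.S. English stop word file, per this article.
ENU_STOP_WORDS = {"a", "and", "is", "in", "it", "of", "the", "to"}

print(remove_stop_words("the highest mountain", ENU_STOP_WORDS))
```

With the default U.S. English list, the query "the highest mountain" is reduced to "highest mountain", which matches the example at the start of this article.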

By default, the stop word files for all supported languages are installed at %ProgramFiles%\Microsoft Office Servers\14.0\Data\Office Server\Config. When a farm administrator creates a Search service application, the search system automatically copies the stop word files from the installation location (including any stop word files there that a search administrator has edited) to %ProgramFiles%\Microsoft Office Servers\14.0\Data\Applications\GUID-query-n\Config, where GUID is the GUID of the new Search service application and query-n is the query component that is created when the index component is built. The search system performs the same operation on every query server that is running the new Search service application. In this way, there is a copy of each stop word file on each query server that is running that Search service application.

Note

Do not directly edit such a copy of a stop word file. If you change the search topology, or if you create a mirror of the query component, the copy is automatically overwritten with the stop word file from the installation location.

Edit a stop word file

If you edit a stop word file in the installation location, the system automatically propagates the edited stop word file to Search service applications that are created afterward. However, the edited stop word file is not automatically propagated to existing Search service applications. For each existing Search service application to which you want the changes to apply, you must manually copy the edited file to the Search service application folder on each query server that is running that Search service application.
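The manual copy step could be scripted along the following lines. This is a sketch under stated assumptions, not a supported tool: the function names are hypothetical, the folder pattern is derived from the GUID-query-n naming described earlier in this article, and the script handles only the server on which it runs, so it would have to be run on each query server.

```python
import glob
import os
import shutil


def application_config_dirs(office_servers_root, guid="*"):
    """List the Config folders of query components under Data/Applications.

    Assumption: folders are named GUID-query-n, as described in this article.
    Pass a specific GUID to limit the result to one Search service application.
    """
    pattern = os.path.join(
        office_servers_root, "Data", "Applications", guid + "-query-*", "Config"
    )
    return sorted(d for d in glob.glob(pattern) if os.path.isdir(d))


def propagate_stop_word_file(office_servers_root, file_name, guid="*"):
    """Copy one stop word file from the installation location to each match."""
    source = os.path.join(
        office_servers_root, "Data", "Office Server", "Config", file_name
    )
    copied = []
    for config_dir in application_config_dirs(office_servers_root, guid):
        target = os.path.join(config_dir, file_name)
        shutil.copyfile(source, target)
        copied.append(target)
    return copied


# Example call on a query server (not executed here):
# root = os.path.expandvars(r"%ProgramFiles%\Microsoft Office Servers\14.0")
# propagate_stop_word_file(root, "noiseenu.txt")
```

By default the sketch copies the file to every Search service application folder it finds. Passing the guid argument restricts the copy to the applications to which you want the changes to apply.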

Note

  • If you delete a stop word file, the search system might consider all single characters as stop words and remove them from search results. A stop word file must contain at least one entry, even if the entry is merely a period (.) character.

  • If you delete a stop word file and then restart the SharePoint Server Search 14 service, the search system automatically replaces the file by copying the file of the same name from %ProgramFiles%\Microsoft Office Servers\14.0\Data\Office Server\Config to the folder where the file was deleted.

Use the following procedure to edit a stop word file.

To edit a stop word file

  1. Verify that the user account that is performing this procedure is a member of the local server Administrators group.

  2. Open the stop word file in a text editor. For information about locating and identifying the appropriate stop word file, see Understanding stop word files earlier in this article.

  3. Edit the file so that it includes only the words that you want the search system to ignore in search queries.

  4. Save the stop word file.

    Note

    When you save a stop word file, always use the default Encoding value, which is Unicode.

  5. Restart the SharePoint Server Search 14 service by following these steps:

    1. Click Start, point to Administrative Tools, and then click Services.

    2. Right-click SharePoint Server Search 14, and then click Restart.

      Stop word changes take effect after the SharePoint Server Search 14 service restarts.

      Note

      In Microsoft Office SharePoint Server 2007, the search system excluded stop words from queries and from the index. Therefore, after an administrator removed a word from a stop word file, it was necessary to perform a full crawl to index any instances of that stop word that the crawler might encounter. In contrast, in SharePoint Server 2010, the search system excludes stop words from queries, but by design it does not exclude stop words from the index. Therefore, in SharePoint Server 2010, if you remove a word from a stop word file, it is not necessary to perform a new crawl because the stop word is already in the index if it was encountered during a crawl. (If you add a word to a stop word file, it is not necessary to perform a new crawl either, because the search system does not look for stop words in the index.)
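Steps 3 and 4 of the procedure can also be performed with a script instead of a text editor. The following sketch rests on two assumptions that this article does not state: that a stop word file holds one word per line, and that the "Unicode" encoding value corresponds to UTF-16 little-endian with a byte order mark, as it does in Windows Notepad. The function name write_stop_word_file is hypothetical. The sketch also writes a single period when the list is empty, following the earlier note that a stop word file must contain at least one entry.

```python
import codecs


def write_stop_word_file(path, words):
    """Write stop words to a file and return the entries that were written.

    Assumptions: one word per line, and "Unicode" means UTF-16 LE with a
    byte order mark.
    """
    entries = [word.strip() for word in words if word.strip()]
    if not entries:
        # A stop word file must contain at least one entry.
        entries = ["."]
    text = "\r\n".join(entries) + "\r\n"
    with open(path, "wb") as handle:
        handle.write(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
    return entries
```

After the file is written, the SharePoint Server Search 14 service still has to be restarted as described in step 5 before the changes take effect.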

Stop word files by language

When you install SharePoint Server 2010, stop word files are installed for the following languages. If a stop word file does not exist for a language, the search system uses the neutral stop word file noiseneu.txt.

Language                    Stop word file name

Arabic                      noiseara.txt
Bengali                     noiseben.txt
Bulgarian                   noisebul.txt
Catalan                     noisecat.txt
Czech                       noiseces.txt
Chinese (Simplified)        noisechs.txt
Chinese (Traditional)       noisecht.txt
Croatian                    noisecro.txt
Danish                      noisedan.txt
Dutch (Netherlands)         noisenld.txt
English (United Kingdom)    noiseeng.txt
English (United States)     noiseenu.txt
Finnish                     noisefin.txt
French                      noisefra.txt
German                      noisedeu.txt
Greek                       noisegrc.txt
Gujarati                    noiseguj.txt
Hebrew                      noiseheb.txt
Hindi                       noisehin.txt
Hungarian                   noisehun.txt
Icelandic                   noiseice.txt
Indonesian                  noiseind.txt
Italian                     noiseita.txt
Japanese                    noisejpn.txt
Kannada                     noisekan.txt
Korean                      noisekor.txt
Language neutral            noiseneu.txt
Latvian                     noiselav.txt
Lithuanian                  noiselit.txt
Malay                       noisemal.txt
Malayalam                   noisemly.txt
Marathi                     noisemar.txt
Norwegian (Bokmal)          noisenor.txt
Polish                      noiseplk.txt
Portuguese (Portugal)       noisepor.txt
Portuguese (Brazil)         noiseptb.txt
Punjabi                     noisepun.txt
Romanian                    noiserom.txt
Russian                     noiserus.txt
Serbian (Cyrillic)          noisesbc.txt
Serbian (Latin)             noisesbl.txt
Slovak                      noisesvk.txt
Slovenian                   noiseslo.txt
Spanish                     noiseesn.txt
Swedish                     noisesve.txt
Tamil                       noisetam.txt
Telugu                      noisetel.txt
Thai                        noisetha.txt
Turkish                     noisetur.txt
Ukrainian                   noiseurk.txt
Urdu (Pakistan)             noiseurd.txt
Vietnamese                  noisevie.txt
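
The fallback to the neutral stop word file can be pictured as a simple lookup. The sketch below is illustrative only: it includes just a few rows of the table above, the function name stop_word_file_for is hypothetical, and keying the lookup by language display name is an assumption made for readability, not a description of how the search system identifies languages.

```python
# A few rows copied from the table above; the full table has more entries.
STOP_WORD_FILES = {
    "English (United States)": "noiseenu.txt",
    "German": "noisedeu.txt",
    "Japanese": "noisejpn.txt",
}

# Used when no stop word file exists for the language.
NEUTRAL_STOP_WORD_FILE = "noiseneu.txt"


def stop_word_file_for(language):
    """Return the stop word file for a language, or the neutral file."""
    return STOP_WORD_FILES.get(language, NEUTRAL_STOP_WORD_FILE)


print(stop_word_file_for("German"))
print(stop_word_file_for("Estonian"))
```

Here "Estonian" is used only as an example of a language that does not appear in the table, so the lookup returns the neutral file.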