Loading Licensed Third Party Wordbreakers
New: 5 December 2005
Microsoft SQL Server 2005 includes licensed third-party wordbreakers for the following languages:
- Danish
- Polish
- Portuguese-Brazilian
- Portuguese-Iberian
- Russian
- Turkish
These wordbreakers are available but are not installed by default, and must be manually registered.
Note
We recommend having the Microsoft Full-Text Engine for SQL Server (MSFTESQL) service set to run under a low-privileged account.
Registering the Wordbreakers
To register a wordbreaker, you must do the following:
- Add the COM ClassID(s) for the wordbreaker and stemmer interfaces for the language being registered as a key to the <InstanceRoot>\MSSearch\CLSID node of the Microsoft Windows Registry.
- Add a key to the <InstanceRoot>\MSSearch\Language node for the language.
- Add configuration values that specify the location of the lexicon, noise word, and thesaurus files for the language.
Warning
Incorrectly editing the registry can severely damage your system. Before making changes to the registry, you should back up any valued data on the computer.
You must also have the following information:
- Instance IDs for each instance of SQL Server on which you want to register the wordbreakers.
- The FTDATA path for each instance.
Retrieving Instance IDs for Multiple Instances of SQL Server
The registry paths listed in the following instructions are for the first instance of SQL Server 2005, which has instance ID MSSQL.1. If you have multiple instances of SQL Server, you must modify the registry paths by substituting the instance ID for that instance instead of MSSQL.1. To obtain the instance ID for an instance:
- Click Start, and click Run.
- In the Run dialog box, in the Open box, type Regedit.
- Click OK. This opens the Registry Editor.
- In the Registry Editor, select the following registry key, which lists every instance of SQL Server on the computer:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\Instance Names\SQL
The right pane displays the instance names and their corresponding instance IDs.
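The same lookup can be scripted. The following is a minimal sketch, assuming Python on the computer that hosts SQL Server and read access to HKEY_LOCAL_MACHINE; the function names `instance_root` and `list_instance_ids` are illustrative and are not part of SQL Server.

```python
# Registry key (relative to HKEY_LOCAL_MACHINE) that maps instance names to instance IDs.
INSTANCE_NAMES_KEY = r"SOFTWARE\Microsoft\Microsoft SQL Server\Instance Names\SQL"


def instance_root(instance_id):
    """Build the per-instance registry path, relative to HKEY_LOCAL_MACHINE."""
    return "SOFTWARE\\Microsoft\\Microsoft SQL Server\\" + instance_id


def list_instance_ids():
    """Return {instance name: instance ID}, as shown in the right pane of Regedit."""
    import winreg  # Windows-only standard library module, imported lazily

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, INSTANCE_NAMES_KEY) as key:
        value_count = winreg.QueryInfoKey(key)[1]  # number of values under the key
        result = {}
        for index in range(value_count):
            name, data, _value_type = winreg.EnumValue(key, index)
            result[name] = data
        return result
```

On Windows, calling `list_instance_ids()` returns the name-to-ID pairs, and `instance_root("MSSQL.1")` gives the path to substitute into the registry keys used in the rest of this topic.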
Retrieving Instance-Specific FTDATA Path
After obtaining the instance IDs, you must retrieve the appropriate instance-specific path to the FTData folder. You will use this path when adding configuration values that specify the lexicon, noise word, and thesaurus files for a language. To obtain the instance-specific FTData folder path:
- Click Start, and click Run.
- In the Run dialog box, in the Open box, type Regedit.
- Click OK.
- In the Registry Editor, select the following registry key for an instance of SQL Server:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<Instance ID>\MSSQLServer
where <Instance ID> is MSSQL.1 for the first instance of SQL Server. The registry key will therefore be: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer
The right pane displays the FullTextDefaultPath value, which contains the instance-specific path to the FTData folder. For example, for the first instance this might be C:\Program.
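As an alternative to reading the value in Regedit, the FullTextDefaultPath value can be read from a script. This is a sketch only, assuming Python on the SQL Server computer; `mssqlserver_key` and `read_ftdata_path` are hypothetical helper names.

```python
def mssqlserver_key(instance_id="MSSQL.1"):
    """Build the MSSQLServer key path (relative to HKEY_LOCAL_MACHINE) for an instance."""
    return ("SOFTWARE\\Microsoft\\Microsoft SQL Server\\"
            + instance_id + "\\MSSQLServer")


def read_ftdata_path(instance_id="MSSQL.1"):
    """Return the FullTextDefaultPath value for the given instance ID."""
    import winreg  # Windows-only standard library module, imported lazily

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, mssqlserver_key(instance_id)) as key:
        value, _value_type = winreg.QueryValueEx(key, "FullTextDefaultPath")
        return value
```

The returned string is the value to use wherever <Instance_specific_FTData_path> appears later in this topic.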
Installing the Wordbreakers
The installation procedure for third-party wordbreakers licensed by Microsoft consists of three stages.
Note
The Danish wordbreaker is used as an example in the steps below. Values to install other language wordbreakers are provided in the tables later in this topic.
Stage 1. Add the COM ClassID(s) for the wordbreaker and stemmer interfaces for the language being registered.
To add the COM ClassID(s) for these components:
1. Open the Registry Editor:
   - Click Start, and then click Run.
   - In the Run dialog box, in the Open box, type Regedit, and then click OK.
2. In the Registry Editor, select the following registry key for the first instance of SQL Server: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSearch\CLSID
3. On the menu bar, click Edit, click New, and click Key.
4. Type {16BC5CE4-2C78-4CB9-80D5-386A68CC2B2D}.
5. Press ENTER.
6. In the right pane, right-click the Default registry value, and then click Modify.
7. In the Edit String dialog box, in the Value data box, type danlr.dll, and then click OK.
8. Repeat steps 3 through 7, replacing the value in step 4 with {83BC7EF7-D27B-4950-A743-0F8E5CA928F8}.
For other languages, follow the steps above, replacing the key values in steps 4 and 8 with the key values for the language you want. These values are listed below. In step 7, replace danlr.dll with the .dll name for the language you want.
Language | Key value for step 4 | .DLL name for step 7 | Key value for step 8 |
---|---|---|---|
Polish | {B8713269-2D9D-4BF5-BF40-2615D75723D8} | lrpolish.dll | {CA665B09-4642-4C84-A9B7-9B8F3CD7C3F6} |
Portuguese-Brazilian | {25B7FD48-5404-4BEB-9D80-B6982AF404FD} | ptblr.dll | {D5FCDD7E-DBFF-473F-BCCD-3AFD1890EA85} |
Portuguese-Iberian | {5D5F3A69-620C-4952-B067-4D0126BB6086} | ptslr.dll | {D4171BC4-90BE-4F70-8610-DAB1C17F063C} |
Russian | {20036404-F1AF-11D2-A57F-006052076F32} | ruslr.dll | {20036414-F1AF-11D2-A57F-006052076F32} |
Turkish | {23A9C1C3-3C7A-4D2C-B894-4F286459DAD6} | trklr.dll | {8DF412D1-62C7-4667-BBEC-38756576C21B} |
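The Stage 1 keys can also be written as registration (.reg) file text and imported through the Registry Editor instead of being typed by hand. The sketch below only builds the text for the Danish values given above; the helper name `clsid_reg_lines` is illustrative, and saving and importing the file are left to the reader.

```python
def clsid_reg_lines(instance_id, dll_name, class_ids):
    """Return .reg file lines that create one CLSID key per class ID."""
    root = ("HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Microsoft SQL Server\\"
            + instance_id + "\\MSSearch\\CLSID")
    lines = ["Windows Registry Editor Version 5.00", ""]
    for class_id in class_ids:
        lines.append("[" + root + "\\" + class_id + "]")
        lines.append('@="' + dll_name + '"')  # "@" sets the key's Default value
        lines.append("")
    return lines


# Danish values from steps 4, 7, and 8 of the procedure above.
DANISH_CLASS_IDS = [
    "{16BC5CE4-2C78-4CB9-80D5-386A68CC2B2D}",
    "{83BC7EF7-D27B-4950-A743-0F8E5CA928F8}",
]

danish_lines = clsid_reg_lines("MSSQL.1", "danlr.dll", DANISH_CLASS_IDS)
print("\n".join(danish_lines))
```

For another language, pass the .dll name and the two key values from the corresponding row of the table; for another instance, pass its instance ID instead of MSSQL.1.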
Stage 2. Add a key to the <InstanceRoot>\MSSearch\Language node for the language.
To add a key to this node for the Danish language:
- Select the following registry key for the first instance of SQL Server:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSearch\Language
- Repeat steps 3 through 5 in the preceding procedure, replacing the key name in step 4 with dan.
For other languages, follow the preceding steps, replacing the key name in step 4 with the value listed below for the specific language.
Language | Key name for step 4 |
---|---|
Polish | plk |
Portuguese-Brazilian | ptb |
Portuguese-Iberian | pts |
Russian | rus |
Turkish | trk |
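To keep the key names straight when registering several languages, the Stage 2 names can be held in a small lookup table. This is an illustrative sketch; `LANGUAGE_KEY_NAMES` and `language_key` are hypothetical names, and the code only prints the full key paths without touching the registry.

```python
# Key names from Stage 2 (Danish from the example, the rest from the table above).
LANGUAGE_KEY_NAMES = {
    "Danish": "dan",
    "Polish": "plk",
    "Portuguese-Brazilian": "ptb",
    "Portuguese-Iberian": "pts",
    "Russian": "rus",
    "Turkish": "trk",
}


def language_key(language, instance_id="MSSQL.1"):
    """Return the full registry path of the Language key for one language."""
    return ("HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Microsoft SQL Server\\"
            + instance_id + "\\MSSearch\\Language\\" + LANGUAGE_KEY_NAMES[language])


for language in LANGUAGE_KEY_NAMES:
    print(language_key(language))
```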
Stage 3. Add configuration values that give the location of the lexicon, noise word, and thesaurus files for the language.
To add configuration values for these components for the Danish language:
1. Select the registry key you entered in Stage 2 above. For the first instance of SQL Server this would be:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSearch\Language\dan
2. On the menu bar, click Edit, click New, and click String Value.
3. Type NoiseFile.
4. Press ENTER.
5. Right-click the NoiseFile registry value you just added, and then click Modify.
6. In the Edit String dialog box, in the Value data box, type <Instance_specific_FTData_path>\noisedan.txt, where <Instance_specific_FTData_path> is the path retrieved earlier in "Retrieving Instance-Specific FTDATA Path".
7. Click OK.
Repeat steps 2 through 7 to add the values listed below, replacing the value type (step 2), value name (steps 3 and 5), and value data (step 6) for each value.
Note
These values are for Danish.
Value type for step 2 | Value names for steps 3 and 5 | Value data for step 6 |
---|---|---|
String value | TsaurusFile | <Instance_specific_FTData_path>\tsdan.xml |
DWORD value | Locale | 00000406 |
String value | WBreakerClass | {16BC5CE4-2C78-4CB9-80D5-386A68CC2B2D} |
String value | StemmerClass | {83BC7EF7-D27B-4950-A743-0F8E5CA928F8} |
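The Danish Stage 3 values can be expressed as registration (.reg) file text as well. The following is a sketch under stated assumptions: the FTData path `X:\ExampleFTData` is a made-up placeholder, not a real default, and the helper names are illustrative. In .reg syntax, backslashes inside string data are doubled and DWORD data is written as eight hexadecimal digits.

```python
def reg_string(name, data):
    """Format a string value; .reg syntax doubles backslashes and escapes quotes."""
    escaped = data.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + name + '"="' + escaped + '"'


def reg_dword(name, hex_digits):
    """Format a DWORD value as eight lowercase hexadecimal digits."""
    return '"' + name + '"=dword:' + hex_digits.lower().zfill(8)


def danish_language_lines(ftdata_path, instance_id="MSSQL.1"):
    """Return .reg lines for the Danish Language key and its five values."""
    key = ("HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Microsoft SQL Server\\"
           + instance_id + "\\MSSearch\\Language\\dan")
    return [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + key + "]",
        reg_string("NoiseFile", ftdata_path + "\\noisedan.txt"),
        reg_string("TsaurusFile", ftdata_path + "\\tsdan.xml"),
        reg_dword("Locale", "406"),
        reg_string("WBreakerClass", "{16BC5CE4-2C78-4CB9-80D5-386A68CC2B2D}"),
        reg_string("StemmerClass", "{83BC7EF7-D27B-4950-A743-0F8E5CA928F8}"),
    ]


EXAMPLE_FTDATA = "X:\\ExampleFTData"  # hypothetical path; use your FullTextDefaultPath
lines = danish_language_lines(EXAMPLE_FTDATA)
print("\n".join(lines))
```

Replace the placeholder with the FullTextDefaultPath value retrieved for the instance before importing anything generated this way.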
For the Polish language, follow the steps outlined above, using the values listed below. Select the registry key you entered for Polish in Stage 2 above. For the first instance of SQL Server, this would be: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSearch\Language\plk
Value type for step 2 | Value names for steps 3 and 5 | Value data for step 6 |
---|---|---|
String value | NoiseFile | <Instance_specific_FTData_path>\noiseplk.txt |
String value | TsaurusFile | <Instance_specific_FTData_path>\tsplk.xml |
DWORD value | Locale | 00000415 |
String value | WBreakerClass | {CA665B09-4642-4C84-A9B7-9B8F3CD7C3F6} |
String value | StemmerClass | {B8713269-2D9D-4BF5-BF40-2615D75723D8} |
For the Portuguese-Brazilian language, follow the steps outlined above, using the values listed below. Select the registry key you entered for Portuguese-Brazilian in Stage 2 above. For the first instance of SQL Server, this would be: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSearch\Language\ptb
Value type for step 2 | Value names for steps 3 and 5 | Value data for step 6 |
---|---|---|
String value | NoiseFile | <Instance_specific_FTData_path>\noiseptb.txt |
String value | TsaurusFile | <Instance_specific_FTData_path>\tsptb.xml |
DWORD value | Locale | 00000416 |
String value | WBreakerClass | {25B7FD48-5404-4BEB-9D80-B6982AF404FD} |
String value | StemmerClass | {D5FCDD7E-DBFF-473F-BCCD-3AFD1890EA85} |
For the Portuguese-Iberian language, follow the steps outlined above, using the values listed below. Select the registry key you entered for Portuguese-Iberian in Stage 2 above. For the first instance of SQL Server, this would be: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSearch\Language\pts
Value type for step 2 | Value names for steps 3 and 5 | Value data for step 6 |
---|---|---|
String value | NoiseFile | <Instance_specific_FTData_path>\noisepts.txt |
String value | TsaurusFile | <Instance_specific_FTData_path>\tspts.xml |
DWORD value | Locale | 00000816 |
String value | WBreakerClass | {5D5F3A69-620C-4952-B067-4D0126BB6086} |
String value | StemmerClass | {D4171BC4-90BE-4F70-8610-DAB1C17F063C} |
For the Russian language, follow the steps outlined above, using the values listed below. Select the registry key you entered for Russian in Stage 2 above. For the first instance of SQL Server, this would be: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSearch\Language\rus
Value type for step 2 | Value names for steps 3 and 5 | Value data for step 6 |
---|---|---|
String value | NoiseFile | <Instance_specific_FTData_path>\noiserus.txt |
String value | TsaurusFile | <Instance_specific_FTData_path>\tsrus.xml |
DWORD value | Locale | 00000419 |
String value | WBreakerClass | {20036404-F1AF-11D2-A57F-006052076F32} |
String value | StemmerClass | {20036414-F1AF-11D2-A57F-006052076F32} |
For the Turkish language, follow the steps outlined above, using the values listed below. Select the registry key you entered for Turkish in Stage 2 above. For the first instance of SQL Server this would be: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSearch\Language\trk
Value type for step 2 | Value names for steps 3 and 5 | Value data for step 6 |
---|---|---|
String value | NoiseFile | <Instance_specific_FTData_path>\noisetrk.txt |
String value | TsaurusFile | <Instance_specific_FTData_path>\tstrk.xml |
DWORD value | Locale | 0000041f |
String value | WBreakerClass | {8DF412D1-62C7-4667-BBEC-38756576C21B} |
String value | StemmerClass | {23A9C1C3-3C7A-4D2C-B894-4F286459DAD6} |
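The Locale values in the tables above are hexadecimal. If you need to compare them with decimal locale identifiers (for example, the lcid column of the sys.fulltext_languages catalog view, which is an assumption about how you verify the registration), the conversion is a plain base-16 parse. The dictionary name below is illustrative.

```python
# Locale values exactly as listed in the Stage 3 tables (hexadecimal digits).
LOCALE_HEX = {
    "Danish": "00000406",
    "Polish": "00000415",
    "Portuguese-Brazilian": "00000416",
    "Portuguese-Iberian": "00000816",
    "Russian": "00000419",
    "Turkish": "0000041f",
}

# Parse each value as base 16 to get the decimal locale identifier.
locale_decimal = {language: int(digits, 16) for language, digits in LOCALE_HEX.items()}

for language, lcid in locale_decimal.items():
    print(language, lcid)
```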
See Also
Concepts
Word Breakers and Stemmers
Full-Text Search