Deploying Full-Text Indexing
Topic Last Modified: 2005-05-12
Use Exchange System Manager to deploy full-text indexing. Deployment involves the following tasks:
- Creating a full-text index
- Optimizing full-text indexing
- Performing a full population
- Setting a schedule for incremental populations
- Enabling full-text indexing queries
Of these tasks, the most server-intensive is the full population process, which can take from a few minutes for a small database to several days for a large database. However, you can run the population process in the background during business hours without significantly affecting system response time for users.
Before you can use full-text indexing, you must create an initial index (catalog) for each mailbox or public folder store that you want to index. This process will create the necessary file structure, which you will modify when you are optimizing the index.
For detailed steps about how to create an initial full-text index, see How to Create an Initial Full-Text Index.
This section describes how to optimize full-text indexing on your computer running Exchange Server 2003. By distributing frequently accessed files across a RAID array, you can enhance system performance.
There are five major categories of full-text indexing files. By default, these files are installed on the system drive, which typically does not have the input/output (I/O) throughput of the RAID array. Arrange the disk locations of these files (as described in the following table) to optimize the performance of full-text indexing. In some cases, this topic provides separate procedures for moving files in clustered topologies and unclustered topologies. The following are the major categories of full-text indexing files:
- Catalogs: The main indexes. There is only one catalog for each mailbox store or public folder store in Exchange Server 2003.
- Property store: A database that contains various properties of items indexed in the catalog. There is only one property store per server.
- Property store logs: The log files associated with the property store database.
- Temporary files: The files that contain temporary information used by the Microsoft Search service.
- Gather logs: The log files that contain log information for the indexing service. One set of logs exists for each index.
This section refers to the following tools for moving files:
- Pstoreutl: Located in Program Files\Common Files\System\MSSearch\Bin.
- SetTempPath: Located in Program Files\Common Files\System\MSSearch\Bin.
- Catutil: Located in Program Files\Common Files\System\MSSearch\Bin.
Recommended locations for full-text indexing files
|File type|Recommended location|How to specify the location|
|---|---|---|
|Catalogs|RAID array|Specify a location on the RAID array when you create the catalog using Exchange System Manager.|
|Property store|RAID array|Use the Pstoreutl tool.|
|Property store logs|RAID array, in the same location as the property store|Use the Pstoreutl tool.|
|Temporary files|RAID array|Use the SetTempPath tool.|
|Gather logs|Leave in the default location, or move to any location you prefer.|Assign the location in the StreamLogsDirectory registry key.|
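The recommendations in the table can be expressed as a simple planning check. The following Python sketch is illustrative only: the drive letter `E:` standing in for the RAID array, the category names used as dictionary keys, and the `check_layout` function are hypothetical. The check compares drive letters and paths only; it says nothing about actual I/O throughput.

```python
# Hypothetical drive letter standing in for the RAID array; substitute your own.
RAID_DRIVE = "E:"

# Recommended drive for each category of full-text indexing files.
# None means the default location or any location you prefer (gather logs).
RECOMMENDED = {
    "Catalogs": RAID_DRIVE,
    "Property store": RAID_DRIVE,
    "Property store logs": RAID_DRIVE,
    "Temporary files": RAID_DRIVE,
    "Gather logs": None,
}


def check_layout(planned: dict[str, str]) -> list[str]:
    """Return a warning for each category that is not on its recommended drive."""
    warnings = []
    for category, drive in RECOMMENDED.items():
        path = planned.get(category)
        if path is None:
            warnings.append(f"{category}: no location planned")
        elif drive is not None and not path.upper().startswith(drive):
            warnings.append(f"{category}: {path} is not on {drive}")
    # The property store logs belong in the same location as the property store.
    # "Same location" is treated here as the same directory path (an assumption).
    store = planned.get("Property store")
    logs = planned.get("Property store logs")
    if store and logs and store.rstrip("\\").upper() != logs.rstrip("\\").upper():
        warnings.append("Property store logs: not in the same location as the property store")
    return warnings


planned = {
    "Catalogs": r"E:\FTI\Catalogs",
    "Property store": r"E:\FTI\PropertyStore",
    "Property store logs": r"E:\FTI\PropertyStore",
    "Temporary files": r"C:\Temp",
    "Gather logs": r"C:\GatherLogs",
}
for warning in check_layout(planned):
    print(warning)
```

In this example only the temporary files are flagged, because they remain on the system drive; the gather logs are not flagged because any location is acceptable for them.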
For detailed steps about how to optimize full-text indexing, see How to Optimize Full-Text Indexing.
When the first index is created on your server, Exchange Server 2003 creates a new property store database on your Exchange Server 2003 system drive. To improve performance, move the property store database files to your RAID array. You need to move the property store and the property store logs only one time for each server, because all indexes on a server use the same property store.
For detailed steps about how to move the property store in a non-clustered environment, see How to Move the Property Store and the Property Store Logs for Full-Text Indexing in a Non-Clustered Environment.
For detailed steps about how to move the property store in a clustered environment, see How to Move the Property Store and the Property Store Logs for Full-Text Indexing in a Clustered Environment.
By default, the gather and filter temporary files (also known as temp files) are located on the Exchange Server 2003 system drive, which typically does not have the I/O throughput of the RAID array. Use the SetTempPath tool to move the temporary directory to the RAID array. You need to move this directory only one time for each server, because all indexes on a server use the same temporary directory.
For detailed steps about how to move the Microsoft Search service temporary directory, see How to Move the Microsoft Search Server Temporary Directory.
The index should be located on the RAID array. If you did not specify this location when you created the index, use the Catutil tool to move it.
For detailed steps about how to move an index, see How to Move the Index (Catalog) for Full-Text Indexing.
By default, the gather logs are created on the Exchange Server 2003 system drive, which typically does not have the I/O throughput of the RAID array. You can leave the gather logs in the default location, or you can specify a location on a higher-performance drive.
For detailed steps about how to move the gather logs, see How to Move the Gather Logs for Full-Text Indexing.
By default, the index includes only messages (including attachments) that are 16 MB or less in size. As a result, messages with large attachments may be excluded from the index and from users' search results. It is recommended that you increase this limit to the maximum setting of 4,000 MB so that larger messages and attachments are indexed.
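The effect of the limit can be shown with a small Python sketch. The `is_indexed` function is hypothetical and only models the size rule described in the preceding paragraph (a message, including its attachments, is indexed when its size is at or below the limit); it does not read or change the actual Exchange setting.

```python
DEFAULT_LIMIT_MB = 16    # default message size limit for full-text indexing
MAX_LIMIT_MB = 4_000     # maximum setting


def is_indexed(message_size_mb: float, limit_mb: float = DEFAULT_LIMIT_MB) -> bool:
    """Model of the size rule: messages (with attachments) above the limit are skipped."""
    if not 0 < limit_mb <= MAX_LIMIT_MB:
        raise ValueError(f"limit must be greater than 0 and at most {MAX_LIMIT_MB} MB")
    return message_size_mb <= limit_mb


sizes_mb = [0.5, 12, 25, 300]
print([s for s in sizes_mb if not is_indexed(s)])                # [25, 300]
print([s for s in sizes_mb if not is_indexed(s, MAX_LIMIT_MB)])  # []
```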
For detailed steps about how to increase the message size limit, see How to Increase the Message Size Limit for Full-Text Indexing.
It is strongly recommended that you use the checkpointing script provided with Microsoft Exchange 2000 Server SP2 to prevent possible indexing problems. If the Microsoft Search service terminates abnormally during an incremental population of the index, some folders and messages may not be indexed properly. (An incremental population is a process that updates an existing index with data that has changed since the previous population.) Checkpointing remedies this problem by maintaining the following backup files in the catalog directory:
- Two checkpoint record files: <catalog>.chk1.gthr and <catalog>.chk2.gthr.
- Approximately 13 files consisting of the last known complete and uncorrupted set of catalog files, stored in a Save subdirectory.
Checkpointing is not turned on by default because it requires a significant amount of additional disk space: approximately 200 bytes for each document in your database. For example, a database with 5,000,000 messages or documents generates approximately 1 gigabyte (GB) of checkpointing files. The size of these files grows as the number of documents in your database grows, so make sure that there is sufficient disk space before you run the checkpointing script. It is recommended that at least 15 percent of the disk on which you keep the full-text indexing catalogs remain free.
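The sizing guidance above can be turned into a quick pre-check. This Python sketch uses the approximate figure of 200 bytes per document and the 15 percent free-space recommendation from the text. Applying the 15 percent threshold to the space that remains after the checkpoint files are added is an assumption, and the function names are hypothetical.

```python
BYTES_PER_DOCUMENT = 200   # approximate checkpointing overhead per indexed document
MIN_FREE_PERCENT = 15      # recommended minimum free space on the catalog disk


def checkpoint_bytes(document_count: int) -> int:
    """Approximate total size of the checkpointing files."""
    return document_count * BYTES_PER_DOCUMENT


def has_room_for_checkpointing(document_count: int,
                               disk_total_bytes: int,
                               disk_free_bytes: int) -> bool:
    """True if at least 15 percent of the disk is still free after adding checkpoint files.

    Assumption: the threshold is applied after the checkpoint files are written.
    Integer arithmetic avoids floating-point rounding exactly at the threshold.
    """
    free_after = disk_free_bytes - checkpoint_bytes(document_count)
    return free_after * 100 >= MIN_FREE_PERCENT * disk_total_bytes


print(checkpoint_bytes(5_000_000))  # 1000000000 bytes, about 1 GB
print(has_room_for_checkpointing(5_000_000, 100_000_000_000, 20_000_000_000))  # True
```

Because the estimate is linear in the document count, rerunning the check as the store grows gives an early warning before the catalog disk drops below the recommended free space.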
For detailed steps about how to set up checkpointing, see How to Set Up Checkpointing for Full-Text Indexing.
After you create the index, you must run a full population (also called a crawl) to fill the index with data. The resource usage setting for full-text indexing is located on the Full-Text Indexing tab of the server's Properties dialog box. By default, it is set to Low. It is recommended that you use the default setting. A higher setting yields little benefit and could slow down user access to the server running Exchange Server 2003.
With a resource usage setting of Low, the population process runs in the background and can be performed during business hours. Population process threads use idle processing time. User activities receive priority on the system. Because full-text indexing uses only cycles that would otherwise be idle, it should not significantly slow down user access to the server. Expect CPU usage to approach 100 percent as a normal effect of the population process.
Note: If you experience performance issues with the Exchange server while the Microsoft Search service is performing a full or incremental population, you can lower the resource usage setting to Minimum. This further reduces the amount of resources the Microsoft Search service can use. As a result, full or incremental populations take longer to complete, but no data is lost.
For detailed steps about how to start a full population, see How to Start a Full-Text Indexing Full Population.
The initial full population can take a long time. With a typical Exchange Server 2003 configuration, population performance ranges from 10 to 20 messages per second. Actual performance depends on the hardware configuration, the type and size of messages, and the available server resources. As a result, a full population can take anywhere from a few minutes for a small database to several days for a large database.
The content language of documents on your server also affects the time the population takes. For example, populating an index on a server that contains documents written mostly in East Asian languages can take more than five times longer than for a server containing documents that are written in Western European languages. Folders containing Internet news feeds can also significantly lengthen population time if the folders contain messages in uuencode format.
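To get a rough sense of scale, the rates quoted above can be converted into an elapsed-time range. This Python sketch assumes a constant rate between 10 and 20 messages per second. The `slowdown` parameter is a hypothetical multiplier; because the text says East Asian content can take more than five times longer, a value of 5 gives only a lower bound for such a server. Real populations vary with hardware, message mix, and available resources.

```python
SLOW_RATE = 10   # messages per second, lower end of the typical range
FAST_RATE = 20   # messages per second, upper end of the typical range


def population_hours(message_count: int, slowdown: float = 1.0) -> tuple[float, float]:
    """Return (best case, worst case) hours for a full population at a constant rate."""
    best = message_count / FAST_RATE / 3600 * slowdown
    worst = message_count / SLOW_RATE / 3600 * slowdown
    return best, worst


print(population_hours(1_440_000))              # (20.0, 40.0)
print(population_hours(1_440_000, slowdown=5))  # (100.0, 200.0)
```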
For detailed steps about how to view the status of the population process, see How to View the Status of a Full-Text Indexing Population.
For detailed steps about how to pause a full population, see How to Pause a Full-Text Indexing Full Population.
Determine how often you want to run an incremental population to update the index. Because an incremental population runs in the background in the same way as a full population, frequent updates do not significantly affect system response time for users. Schedule incremental populations to occur at least once daily; you may want to schedule them more often, because the index is only as current as its most recent population. Also consider how long an incremental population takes to complete. For example, a typical schedule starts an incremental update at the beginning of each hour. However, if an update lasts more than an hour, the scheduled start that falls while it is still running is skipped, and the next incremental population begins at the start of the first hour after the update finishes.
The schedule for the incremental population only determines when the population process can begin. It does not place a time limit on the population process. Therefore, it is possible that an incremental population will continue to completion outside of the scheduled time.
Note: Generally, if the mailbox store or public folder store is 6 GB or smaller, you can perform incremental updates hourly. If the store is larger than 6 GB, or the server has high memory usage, you may want to update the index less frequently.
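The interaction between an hourly schedule and a long-running update can be simulated. This Python sketch assumes that a population can start only at a scheduled slot, that a slot arriving while the previous population is still running is skipped, and that a population finishing exactly on the hour lets the next one start at that hour. `incremental_start_times` and `suggested_frequency` are hypothetical helpers; the second one simply restates the 6 GB guideline.

```python
import math


def incremental_start_times(durations_min: list[float], interval_min: int = 60) -> list[int]:
    """Minutes (from the first slot) at which successive incremental populations start.

    The schedule only controls when a population may begin; it does not limit how
    long the population runs, so a long run pushes the next start to a later slot.
    """
    starts = []
    busy_until = 0.0  # time at which the previous population finished
    for duration in durations_min:
        # First scheduled slot at or after the previous population finished.
        slot = math.ceil(busy_until / interval_min) * interval_min
        starts.append(slot)
        busy_until = slot + duration
    return starts


def suggested_frequency(store_size_gb: float, high_memory_usage: bool = False) -> str:
    """Restates the guideline: hourly for stores of 6 GB or less, otherwise less often."""
    if store_size_gb <= 6 and not high_memory_usage:
        return "hourly"
    return "less often than hourly (but at least once daily)"


# The population that starts at minute 60 runs 75 minutes, so the minute-120 slot is skipped.
print(incremental_start_times([30, 75, 20]))  # [0, 60, 180]
print(suggested_frequency(10))
```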
For detailed steps about how to set the incremental population schedule, see How to Set the Full-Text Indexing Incremental Population Schedule.
After the initial population and at least one incremental population are complete, enable the use of the index so that users can begin conducting full-text searches against the index.
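The readiness condition can be written as a small check. In this Python sketch, the list of completed population types and the `ready_for_queries` function are hypothetical; the check simply encodes the rule that an initial full population must finish and at least one incremental population must finish after it.

```python
def ready_for_queries(completed: list[str]) -> bool:
    """completed: population types in the order they finished, e.g. ["full", "incremental"]."""
    if "full" not in completed:
        return False
    first_full = completed.index("full")
    # At least one incremental population must have finished after the initial full one.
    return "incremental" in completed[first_full + 1:]


print(ready_for_queries(["full"]))                 # False
print(ready_for_queries(["full", "incremental"]))  # True
```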
For detailed steps about how to enable the use of a full-text index, see How to Enable Full-Text Indexing Queries.
After you enable queries, notify users that the indexes are available for searching (for example, by sending an e-mail announcement), and educate them about what to expect when they run full-text index searches.