Manage the crawling impact of the FAST Search specific connectors

 

Applies to: FAST Search Server 2010

This topic explains how to manage the crawling impact of the indexing connectors that are specific to FAST Search Server 2010 for SharePoint: the FAST Search database connector, the FAST Search Lotus Notes connector, and the FAST Search Web crawler.

Managing the impact of the FAST Search database connector

The FAST Search database connector itself does not have a throttling mechanism to reduce the impact of a crawl on the source database or on the search engine.

However, it is unlikely that the FAST Search database connector will put a significant load on the source database. The connector is able to extract rows from the source database at a much higher rate than the search engine is able to index them. The load on the database is likely to increase only when the SELECT query used by the FAST Search database connector is very complex.

In some SQL/JDBC implementations, selecting a large dataset can cause a large amount of memory to be allocated to the FAST Search database connector. In some cases, the complete dataset is transferred to the FAST Search database connector client before it can be processed. You can avoid this by using a server-side cursor. If you are using SQL Server, add “;selectMethod=cursor” to the JDBCURL parameter in the FAST Search database connector configuration file. Note that this increases memory consumption on the database server, because the result set is held in memory there until it is transferred to the connector.
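As an illustration, the following Python sketch appends the property to a SQL Server JDBC URL. The host, port, and database name are placeholders, and the helper with_server_side_cursor is hypothetical; it is not part of the connector and only shows what the resulting JDBCURL value might look like.

```python
def with_server_side_cursor(jdbc_url: str) -> str:
    """Return jdbc_url with ;selectMethod=cursor appended, unless already set."""
    # SQL Server JDBC URLs carry properties as ;name=value pairs after the host part.
    properties = [p.strip().lower() for p in jdbc_url.split(";")[1:]]
    if any(p.startswith("selectmethod=") for p in properties):
        return jdbc_url  # leave an existing selectMethod setting untouched
    return jdbc_url.rstrip(";") + ";selectMethod=cursor"


# Placeholder server and database names, for illustration only.
url = "jdbc:sqlserver://dbserver.example.com:1433;databaseName=Products"
print(with_server_side_cursor(url))
```

The helper leaves the URL unchanged if a selectMethod property is already present, so it does not override a deliberate setting.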

Managing the impact of the FAST Search Lotus Notes connector

A single instance of the FAST Search Lotus Notes connector is typically able to extract documents from the Domino server significantly faster than the search engine is capable of indexing them. Because of this, the load on the Domino server is not likely to significantly increase when the FAST Search Lotus Notes connector is running.

However, the connector automatically slows down the extraction rate to match the feed rate to the search engine. This means that the FAST Search Lotus Notes connector could increase the load on the search engine backend.

To throttle the extraction rate from the Domino server, you can configure the AdapterThrottleSleepMS parameter in the FAST Search Lotus Notes content connector configuration file. This parameter is located in the ConnectorExecution group, and its value sets how many milliseconds the connector sleeps after each Note that it extracts from the Domino server.

Note that the sleep interval applies to each adapter thread in the connector separately. If you use multiple adapter threads, as specified in the configuration parameter ConnectorExecution/NumAdapters, the maximum extraction rate for the connector increases accordingly. For example, if you set the parameter ConnectorExecution/AdapterThrottleSleepMS to 200, each adapter thread can extract a maximum of 5 documents per second from Domino. With the default value of 3 for the parameter ConnectorExecution/NumAdapters, the connector can extract at most 15 documents per second. The actual rate will be lower because the extraction itself takes time in addition to the sleep interval.
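The arithmetic in this example can be sketched in Python as follows. The function names are hypothetical and the calculation ignores the time spent on the extraction itself, so the result is an upper bound rather than a prediction.

```python
import math


def max_extraction_rate(sleep_ms: int, num_adapters: int = 3) -> float:
    """Upper bound on documents per second for the whole connector.

    sleep_ms     -- value of ConnectorExecution/AdapterThrottleSleepMS
    num_adapters -- value of ConnectorExecution/NumAdapters (default 3)
    """
    if sleep_ms <= 0:
        raise ValueError("sleep_ms must be positive to throttle extraction")
    # Each adapter thread extracts at most one document per sleep interval.
    return num_adapters * 1000.0 / sleep_ms


def required_sleep_ms(target_rate: float, num_adapters: int = 3) -> int:
    """Smallest sleep interval (ms) that keeps the upper bound at or below target_rate."""
    if target_rate <= 0:
        raise ValueError("target_rate must be positive")
    return math.ceil(num_adapters * 1000.0 / target_rate)


print(max_extraction_rate(200))   # 3 threads, 200 ms sleep each
print(required_sleep_ms(6))       # sleep needed to cap the connector at 6 docs/s
```

The second function works the calculation in reverse, which can help when you start from a load that the Domino server can tolerate and need a value for AdapterThrottleSleepMS.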

Managing the impact of the FAST Search Web crawler

When you configure the FAST Search Web crawler, make sure that you do not overload the Web servers that it crawls. Several settings affect the load that the FAST Search Web crawler generates.

Each Node Scheduler crawls a number of Web sites (and servers) at the same time, as configured by the max_sites setting. The crawler issues requests to these Web sites and servers at the interval, in seconds, that is configured in the delay setting, and the max_pending setting limits the number of requests that can be pending to any server at any time. This combination of settings can result in a significant load, especially if multiple Web sites are hosted on the same server.
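To get a rough feel for the combined effect, the following Python sketch estimates the request rate that a single server might see when it hosts several of the Web sites being crawled. It assumes that the delay setting is applied to each Web site independently, which is a simplification, and it does not model max_pending. The function names are hypothetical.

```python
def requests_per_second(sites_on_server: int, delay_seconds: float) -> float:
    """Estimated request rate against one server hosting several crawled sites.

    Assumption: each site receives one request every delay_seconds,
    independently of the other sites on the same server.
    """
    if delay_seconds <= 0:
        raise ValueError("delay_seconds must be positive")
    return sites_on_server / delay_seconds


def delay_for_target_rate(sites_on_server: int, target_rate: float) -> float:
    """Delay (seconds) that keeps the estimated rate at or below target_rate."""
    if target_rate <= 0:
        raise ValueError("target_rate must be positive")
    return sites_on_server / target_rate


# Illustrative values only: 10 crawled sites hosted on the same server.
print(requests_per_second(sites_on_server=10, delay_seconds=5))
print(delay_for_target_rate(sites_on_server=10, target_rate=1.0))
```

Under this assumption, the load on a shared server grows linearly with the number of sites it hosts, which is why a delay that is acceptable for one site can still be too aggressive for a server that hosts many.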

Enabling JavaScript support (with the use_javascript setting) causes an additional load, because both JavaScript and CSS dependencies have to be downloaded. It is not uncommon for a Web item to reference ten to thirty external JavaScript files. To improve performance on Web items that contain many external JavaScript files, the FAST Search Web crawler by default uses a request delay of 0 seconds for these dependencies. You can increase this delay by using the javascript_delay setting. If you increase the javascript_delay setting, make sure that you also adjust the processing timeouts in the Browser Engine, so that it does not time out while it is downloading dependencies.