Manage crawler impact (Search Server 2008)
Updated: September 11, 2008
Applies To: Microsoft Search Server 2008
Unless otherwise noted, the information in this article applies to both Microsoft Search Server 2008 and Microsoft Search Server 2008 Express.
Content crawls can place a significant load on crawled servers and thereby adversely affect response times for the users of those servers. Therefore, we recommend that you use crawler impact rules to specify how aggressively the crawler requests content. A search services administrator can manage the effect of the crawler on a crawled site by using a crawler impact rule to specify one of the following:
The maximum number of documents that the crawler can request at a time from the specified site.
The frequency with which the crawler can request any particular document from the specified site.
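To make the difference between the two options concrete, the following Python sketch models them outside of Search Server. It is only an illustration and is not the product's API: the names CrawlerImpactRule, crawl, max_simultaneous_requests, and delay_seconds are hypothetical, and the sketch assumes that the second option behaves as one request at a time with a fixed wait between requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class CrawlerImpactRule:
    """Hypothetical model of a crawler impact rule for one site.

    Exactly one of the two settings must be provided, mirroring the
    "specify one of the following" choice described above.
    """
    site: str
    max_simultaneous_requests: Optional[int] = None  # option 1
    delay_seconds: Optional[float] = None            # option 2 (assumed meaning)

    def __post_init__(self) -> None:
        has_limit = self.max_simultaneous_requests is not None
        has_delay = self.delay_seconds is not None
        if has_limit == has_delay:
            raise ValueError("set exactly one of the two options")
        if has_limit and self.max_simultaneous_requests < 1:
            raise ValueError("max_simultaneous_requests must be at least 1")
        if has_delay and self.delay_seconds < 0:
            raise ValueError("delay_seconds must not be negative")


def crawl(
    rule: CrawlerImpactRule,
    urls: Iterable[str],
    fetch: Callable[[str], str],
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch every URL while honoring the rule; results keep input order."""
    url_list = list(urls)

    if rule.delay_seconds is not None:
        # Option 2: one request at a time, waiting between requests.
        results = []
        for index, url in enumerate(url_list):
            if index > 0:
                sleep(rule.delay_seconds)
            results.append(fetch(url))
        return results

    # Option 1: at most N requests are in flight at the same time.
    with ThreadPoolExecutor(max_workers=rule.max_simultaneous_requests) as pool:
        return list(pool.map(fetch, url_list))
```

In this model, a rule such as CrawlerImpactRule("intranet.example", max_simultaneous_requests=8) favors crawl speed on a server with spare capacity, whereas CrawlerImpactRule("www.example.com", delay_seconds=5.0) trades crawl speed for a light, predictable load on a site that you do not control. Both host names and both values are placeholders, not recommended settings.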
For crawling internal content in your organization, you can set crawler impact rules based on the performance and capacity of the crawled servers. For example, you might try to avoid crawling internal servers at peak load times. However, for crawling external sites, this kind of coordination is usually not feasible. Therefore, it is best to configure crawl requests to minimize consumption of external site resources and bandwidth so that external site administrators are less inclined to restrict your future access.
During initial deployment, set your crawler impact rules to minimize impact on crawled servers while crawling them frequently enough to ensure relatively fresh results. Later, during the operations phase, you can adjust crawler impact rules based on your experience and the data from your crawl logs.