Configure a dedicated front-end Web server for crawling (Search Server 2008)

Applies To: Microsoft Search Server 2008

 

Topic Last Modified: 2011-01-31

Note

The information in this article applies to both Microsoft Search Server 2008 and Microsoft Search Server 2008 Express.

By default, Search Server 2008 uses all of the front-end Web servers in a server farm to crawl content in the farm. When a farm is configured in this way, crawler behavior depends on the number of front-end Web servers in the farm. If the farm has only one front-end Web server, the index server sends GET requests directly to that server. If the farm has multiple front-end Web servers, the index server sends GET requests to the network load balancer, which forwards each request to one of the front-end Web servers. (A server farm that has more than one front-end Web server must use a network load balancer to distribute user content requests across those servers.) Over time, the network load balancer spreads the crawl requests across all front-end Web servers. When a front-end Web server receives a content request, it retrieves the content from the content databases that are associated with the SharePoint sites being crawled and returns that content to the index server.

In this article:

  • Performance issues caused by using all front-end Web servers for crawling

  • Recommended solution

  • About configuring a dedicated front-end Web server for crawling

Performance issues caused by using all front-end Web servers for crawling

Using all front-end Web servers for crawling in a farm can work well for small to medium-size organizations. Large organizations, however, tend to crawl more content, often gigabytes or even terabytes of it. Crawling content in a farm can cause surges in network traffic and can place considerable demands on front-end Web server resources such as the disk, processors, and memory. Crawling a large amount of content can produce more network traffic to the farm’s front-end Web servers than all user requests combined. This traffic can adversely affect the performance of all front-end Web servers in the farm and thereby increase response times for end-user requests for SharePoint site content.

Recommended solution

We recommend that you use a dedicated front-end Web server for crawling, especially if crawling content is producing more traffic on the front-end Web servers than user requests. You can specify any front-end Web server in your farm for crawling.

Note

We recommend that you do not use the index server as the dedicated front-end Web server for crawling. This configuration can degrade crawl performance by causing contention between the indexing process and the Web server process that responds to the index server's content requests. However, this configuration can be useful if there are hardware constraints and the index server has the capacity to run both processes simultaneously.

We also recommend that you do not include the dedicated front-end Web server in the network load balancing rotation for incoming user requests for content. Otherwise, user requests that the network load balancer directs to the dedicated front-end Web server for crawling might be subjected to inconsistent performance.

When not to configure a dedicated front-end Web server for crawling

Do not configure a dedicated front-end Web server for crawling under any of the following conditions:

  • Another application (such as the Excel Calculation service) is running on the index server. Configuring a dedicated front-end Web server for crawling might prevent that application from communicating with other servers in the farm.

    If other applications are running on the index server, move those applications to another application server before configuring a dedicated front-end Web server for crawling.

  • You want to use the index server as the dedicated front-end Web server for crawling and the index server is also configured as a query server.

  • The NetBIOS name of your query server is also the host name of your SharePoint site.

In either of the preceding two cases, configuring a dedicated front-end Web server for crawling can prevent the index server from propagating the index to another server.

About configuring a dedicated front-end Web server for crawling

There are two ways to configure a dedicated front-end Web server for crawling:

  • Use the Configure Office SharePoint Server Search Service Settings page in Central Administration.

  • Update the Hosts file directly.

Before you configure a dedicated front-end Web server for crawling, we recommend that you read the following section to determine which configuration method to use.

How the Hosts file is affected when you use the user interface to configure a dedicated front-end Web server for crawling

When crawling content, Search Server 2008 reads the Hosts file on the index server to determine whether to use all front-end Web servers for crawling (the default), or to use a dedicated front-end Web server for crawling.
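The effect of a Hosts entry on this decision can be pictured with a minimal Python sketch. It assumes only the standard Hosts file layout (an IP address followed by one or more names, with # starting a comment). The function name lookup_hosts_override and the host name Intranet are hypothetical illustrations, not part of Search Server 2008.

```python
def lookup_hosts_override(hosts_text, host_name):
    """Return the IP address that the Hosts text maps host_name to, or None."""
    wanted = host_name.lower()
    for line in hosts_text.splitlines():
        # Drop any trailing comment, then split into the address and its names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, comment-only line, or malformed entry
        ip_address, names = fields[0], fields[1:]
        if wanted in (name.lower() for name in names):
            return ip_address
    return None


sample = (
    "111.11.111.111 MyMossMachine "
    "#Added by Office SharePoint Server Search (7/15/2008 2:56 PM).\n"
)

# An entry exists, so requests for this name go to the dedicated server.
print(lookup_hosts_override(sample, "mymossmachine"))
# No entry exists, so the default behavior (all front-end Web servers) applies.
print(lookup_hosts_override(sample, "Intranet"))
```

The sketch works on a string rather than on the real Hosts file, so it can be run safely on any computer.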

When you use the Configure Office SharePoint Server Search Service Settings page in Central Administration to select a dedicated front-end Web server for crawling, the SharePoint timer service writes the following entries to the Hosts file:

  • One entry that specifies the IP address and the computer name of the front-end Web server.

  • One entry for each Web application on the front-end Web server that you configured to use a host header. Each such entry specifies the IP address of the front-end Web server, followed by the host header.

Each entry is on a separate line in the Hosts file, like this:

111.11.111.111 MyMossMachine #Added by Office SharePoint Server Search (7/15/2008 2:56 PM).

111.11.111.111 Marketing #Added by Office SharePoint Server Search (7/15/2008 2:56 PM).

111.11.111.111 Human Resources #Added by Office SharePoint Server Search (7/15/2008 2:57 PM).
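As an illustration of the entry layout shown above, the following Python sketch builds lines in the same shape: one for the computer name and one for each host header. It only mimics the sample entries. The function name build_hosts_entries and its parameters are assumptions made for this example, and the exact text that the timer service writes might differ.

```python
from datetime import datetime

# Comment text copied from the sample entries in this article.
MARKER = "#Added by Office SharePoint Server Search"


def build_hosts_entries(ip_address, computer_name, host_headers, when):
    """Return one Hosts line for the computer name and one per host header."""
    # Format the timestamp in the style of the sample, such as 7/15/2008 2:56 PM.
    hour = when.hour % 12 or 12
    meridiem = "AM" if when.hour < 12 else "PM"
    stamp = f"{when.month}/{when.day}/{when.year} {hour}:{when.minute:02d} {meridiem}"
    names = [computer_name] + list(host_headers)
    return [f"{ip_address} {name} {MARKER} ({stamp})." for name in names]


entries = build_hosts_entries(
    "111.11.111.111", "MyMossMachine", ["Marketing"], datetime(2008, 7, 15, 14, 56)
)
for entry in entries:
    print(entry)
```

Every generated line carries the same IP address, which is why a single wrong address affects the computer name and all host headers at once.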

Possible problems

In some cases, the timer service writes an incorrect IP address to your Hosts file. (For more information, see the blog post at https://go.microsoft.com/fwlink/?LinkId=135698.) This can cause problems ranging from inability to crawl content to inability to view sites, such as the Shared Services Provider (SSP) or Central Administration site. The timer service can add an incorrect IP address to the Hosts file in cases such as the following:

  • The server that you specified as your dedicated front-end Web server for crawling has multiple IP addresses assigned to one or more network cards.

  • Your server farm is using network load balancing.

If either of these conditions is true, we recommend that you add the entries to the Hosts file directly instead of using the user interface to specify a dedicated front-end Web server for crawling.
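One way to check whether the timer service wrote an unexpected address is sketched below in Python. The sketch assumes that entries written by the timer service can be recognized by the comment text shown in the sample entries earlier in this article, and that you already know which IP address the dedicated front-end Web server should use. The function name find_unexpected_entries and the address 10.0.0.5 are hypothetical.

```python
# Comment text copied from the sample entries in this article.
MARKER = "#Added by Office SharePoint Server Search"


def find_unexpected_entries(hosts_text, expected_ip):
    """Return timer-service entries whose IP address differs from expected_ip."""
    unexpected = []
    for line in hosts_text.splitlines():
        if MARKER not in line:
            continue  # not an entry written by the timer service
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] != expected_ip:
            unexpected.append(line.strip())
    return unexpected


sample = (
    "127.0.0.1 localhost\n"
    "111.11.111.111 MyMossMachine #Added by Office SharePoint Server Search (7/15/2008 2:56 PM).\n"
    "10.0.0.5 Marketing #Added by Office SharePoint Server Search (7/15/2008 2:56 PM).\n"
)

for entry in find_unexpected_entries(sample, "111.11.111.111"):
    print("Unexpected address:", entry)
```

Lines without the marker, such as the localhost entry, are ignored, so entries that you added yourself are never reported.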

Important

When you use the Configure Office SharePoint Server Search Service Settings page in Central Administration to specify a dedicated front-end Web server for crawling, you cannot correct the Hosts file manually if the timer service adds the wrong IP address. This is because the timer service overwrites the entries in the Hosts file every few minutes, so manual changes do not persist. If this occurs, use the Configure Office SharePoint Server Search Service Settings page in Central Administration to specify that all front-end Web servers are used for crawling, and then remove the entries in the Hosts file that were made by the timer service.
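The cleanup step can be sketched in Python as follows. The sketch assumes that entries made by the timer service are the lines that carry the comment text shown in the sample entries earlier in this article. It works on a string and does not modify the real Hosts file. The function name remove_search_entries is hypothetical. Back up the Hosts file before applying a similar approach to it.

```python
# Comment text copied from the sample entries in this article.
MARKER = "#Added by Office SharePoint Server Search"


def remove_search_entries(hosts_text):
    """Return hosts_text without the lines that carry the timer-service marker."""
    kept = [line for line in hosts_text.splitlines() if MARKER not in line]
    # Rejoin the remaining lines and end with a newline if anything is left.
    return "\n".join(kept) + ("\n" if kept else "")


sample = (
    "127.0.0.1 localhost\n"
    "111.11.111.111 MyMossMachine #Added by Office SharePoint Server Search (7/15/2008 2:56 PM).\n"
    "111.11.111.111 Marketing #Added by Office SharePoint Server Search (7/15/2008 2:56 PM).\n"
)

cleaned = remove_search_entries(sample)
print(cleaned, end="")
```

Run the cleanup only after you switch the setting back to all front-end Web servers. Otherwise, the timer service rewrites the entries within a few minutes.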

To configure a dedicated front-end Web server for crawling, perform one of the following procedures: