In Office SharePoint Server 2007, search is a shared service available at the SSP level. The Office SharePoint Server 2007 search system consists of two main server roles: the index server and the query server.
Crawling and indexing are resource-intensive operations. Crawling content is the process by which the system accesses and parses content and its properties to build a content index from which search queries can be serviced. Crawling consumes processing and memory resources on the index server, the query server or servers servicing the crawl operations, the server or servers hosting the content repository that is being crawled, and the database server that is serving the Office SharePoint Server 2007 farm.
Crawls affect the overall performance of the system, and directly affect user response time and the performance of other shared services in the farm as well as the Web service on the query server that services crawl operations. You can dedicate a query server for crawling operations to reduce the load on other farm servers.
Indexing the crawled content can also affect the overall performance of the system if crawl operations are not assigned to a dedicated query server. If search-related operations constitute a significant portion of farm operations, consider deploying a dedicated query server. See the Dedicated query server for crawling section in this article for more information.
Use the information in this section to specify requirements for index servers in your Office SharePoint Server 2007 farm.
Index server CPU
The index server processor speed influences the crawl speed and the number of crawling threads that can be instantiated. Although there is no specific number or type of processors that are recommended, you should consider the amount of content that will be crawled when determining the index server requirements. In an enterprise environment, the index server should have multiple processors to handle a large indexing load.
The following table shows how crawl speed increases as the number of processors available on the index server increases.
| Number of processors | Percentage of improvement in crawl speed |
|---|---|
| 1 | 0.00 |
| 2 | 10.89 |
| 4 | 19.77 |
| 8 | 30.77 |
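To make the trend in the table easier to read, the following minimal Python sketch computes how many percentage points of crawl speed each step up in processor count adds. The figures are copied from the table above; the names `CRAWL_SPEED_IMPROVEMENT` and `incremental_gains` are illustrative and not part of any SharePoint tooling.

```python
# Crawl-speed improvement (percent) by processor count, copied from the table above.
CRAWL_SPEED_IMPROVEMENT = {1: 0.00, 2: 10.89, 4: 19.77, 8: 30.77}


def incremental_gains(table):
    """Return (from_cpus, to_cpus, added_percentage_points) for each step in the table."""
    counts = sorted(table)
    return [
        (low, high, round(table[high] - table[low], 2))
        for low, high in zip(counts, counts[1:])
    ]


if __name__ == "__main__":
    for low, high, gain in incremental_gains(CRAWL_SPEED_IMPROVEMENT):
        print(f"{low} -> {high} processors: +{gain:.2f} percentage points")
```

Each doubling of the processor count adds roughly 9 to 11 percentage points in these results, so crawl speed grows far more slowly than the processor count itself.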
Index server memory
On the index server, documents are loaded in buffers for processing by the crawler engine. In a farm with a corpus of approximately 1 million documents, the index server requires approximately 1.5 GB of memory. After a document is processed in memory, it is written to disk. The greater the memory capacity, the more documents the crawler can process in parallel, which results in improved crawl speed.
We recommend a minimum of 4 GB RAM on the index server for crawling a corpus with more than 1 million documents.
Index server disk speed
We recommend that you specify RAID 10 with 2 millisecond (ms) access times and write speeds greater than 150 MB/sec for fast disk writes.
Single index and relevance
In SharePoint Portal Server 2003, the content index could be split up across multiple servers to create subsets of the indexed content and to better accommodate growth. Although Office SharePoint Server 2007 supports the use of multiple index servers for scaling out, each index server requires a separate SSP, and there is no way to combine the separate indexes.
Number of index servers
You can deploy multiple index servers to a farm in cases where complete isolation between SSPs is desired, or to scale out your system. Although there is no hard limit on the number of index servers in a farm, testing has been conducted with a maximum of four index servers in a single farm.
The number of index servers you use in a farm depends on the way you want to target your search experience. If the search experience requires all crawled content to be returned in a single result set, you should deploy one SSP with a single index server. Most organizations want all crawled content to be searchable by users, and therefore do not require multiple search scopes.
If the search experience can be split across different scopes to provide separate relevant search result sets over different content repositories, multiple SSPs and index servers can be used. An example of a scenario in which different search scopes are desirable is an enterprise with one division that maintains sensitive documents that must be searchable only by a specific group of users.
Depending on your scale and security requirements, you can associate all your SSPs with a single index server, or associate each SSP with a separate index server.
Note: Querying across multiple SSPs to get a single relevant set of results is not supported in Office SharePoint Server 2007.
A single index server with a robust hardware configuration can support up to 50 million documents. If you are building a single index of this size, we recommend using no more than one index server in a farm because the index is propagated to all query servers in the farm. If a second index server is added, the indexes from the second index server are also propagated to all query servers in the farm, which adds load on the query servers.
To increase search capacity by adding SSPs, you will also need to scale out. At the very least, you should add another index server, database server, and dedicated Web server. If your hardware currently supports indexing 10 million documents within a single SSP, you can scale up by using the same hardware to host 20 SSPs. This will enable you to index approximately 2 million documents per SSP for a total of approximately 40 million documents.
Note: In Microsoft Office SharePoint Server 2007 for Search, you can only use one SSP.
Note: An SSP is always associated with only one index server. However, an index server can accommodate multiple SSPs.
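The SSP scale-up arithmetic in this section can be checked with a short Python sketch. It multiplies the SSP count by the documents indexed per SSP and compares the total with the 50 million document figure quoted for a single index server. The function names are hypothetical, and treating that figure as a ceiling for several SSPs hosted on one index server is an assumption made here for illustration, not a statement from the guidance.

```python
# "Up to 50 million documents" is the figure quoted for one robust index server.
SINGLE_INDEX_SERVER_LIMIT = 50_000_000


def total_indexed_documents(ssp_count, documents_per_ssp):
    """Total documents indexed across all SSPs hosted on the same hardware."""
    return ssp_count * documents_per_ssp


def fits_on_one_index_server(total_documents, limit=SINGLE_INDEX_SERVER_LIMIT):
    """Assumption: the single-index figure is used as a rough ceiling for the total."""
    return total_documents <= limit


if __name__ == "__main__":
    total = total_indexed_documents(ssp_count=20, documents_per_ssp=2_000_000)
    print(total, fits_on_one_index_server(total))
```

With 20 SSPs at approximately 2 million documents each, the total is 40 million documents, which matches the figure given in the text.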
Dedicated query server for crawling
It is a best practice to dedicate a query server for crawl operations.
In a search-enabled farm, all query servers in the farm service crawl operations by default. When a crawl operation commences, the index server sends a request to the query servers, which in turn fetch the content to be crawled and deliver it to the index server. When user load is high, a crawl operation might reduce the responsiveness of the system to user requests.
To mitigate the impact of crawl operations on the performance of the farm, you can configure a dedicated query server for crawling. Dedicating a query server for crawling forces all crawl operations to be serviced through the dedicated server, while all other query servers in the farm continue to service user requests. This configuration is particularly useful for environments in which crawl operations cannot be confined to an overnight window, or for geographically distributed environments in which users are making requests at all hours.
For more information about how to dedicate a query server for crawling, see Configure a dedicated front-end Web server for crawling (Office SharePoint Server 2007).
Note: Dedicating a query server for crawling might affect other services running on the server. A query server used in this way cannot be load balanced, and will not serve end-user requests.
Index server performance optimization
Indexing operations increase the load on the database server, and can reduce the responsiveness of the farm. Indexing operations can also affect other shared services on the application server running the Search Indexing service. You can adjust the indexing performance level for each index server to one of the following three values:
- Reduced
- Partly reduced
- Maximum
The default setting is Reduced. You can only configure this setting for a specific index server, not for the SSP.
Crawls affect performance of the database server because the Office SharePoint Server Search service writes all the metadata collected from the crawled documents into database tables. It is possible for the index server or servers to generate data at a rate that can overload the database server.
You should conduct your own testing to balance crawl speed, network latency, database load, and the load on the content repositories that are being crawled.
The following table shows the relationship between the performance-level setting and the CPU utilization on the index and database servers as tested.
| Performance-level setting | Index server CPU utilization percentage | Database server CPU utilization percentage |
|---|---|---|
| Reduced | 20 | 20 |
| Partly reduced | 24 | 24 |
| Maximum | 25 | 26 |
Consider the scenarios and recommendations for the performance-level setting in the following list:
- If the index server and database servers are used only for the Office SharePoint Server Search service, you can set the level to Maximum. However, we recommend that the maximum increase in database server CPU utilization related to index server activity not be greater than 30 percent. If the increase in database server CPU utilization exceeds 30 percent when the performance level is set to Maximum, we recommend setting the performance level to the next lower setting.
- If the application server and the database server are shared across multiple shared services such as the Office SharePoint Server Search service and Excel Calculation Services, we recommend that you select a lower performance-level setting. However, reducing the maximum allowed indexing activity reduces the speed at which items are indexed, which might cause search results to be outdated. Monitor local server performance to help determine the appropriate index server performance level.
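The 30 percent recommendation above can be expressed as a small decision helper. The following Python sketch is illustrative only: it does not call any SharePoint API, the names `LEVELS`, `next_lower_level`, and `recommend_level` are hypothetical, and applying the 30 percent check at every level (rather than only at Maximum, as the guidance describes) is an assumption.

```python
# Indexer performance levels, ordered from lowest to highest indexing activity.
LEVELS = ["Reduced", "Partly reduced", "Maximum"]

# Recommended ceiling for the increase in database server CPU utilization
# caused by index server activity, in percentage points.
MAX_DB_CPU_INCREASE = 30


def next_lower_level(level):
    """Return the next lower level, or the same level if it is already the lowest."""
    index = LEVELS.index(level)
    return LEVELS[max(index - 1, 0)]


def recommend_level(current_level, db_cpu_increase_percent):
    """Step down one level when the measured database CPU increase exceeds the ceiling."""
    if db_cpu_increase_percent > MAX_DB_CPU_INCREASE:
        return next_lower_level(current_level)
    return current_level


if __name__ == "__main__":
    print(recommend_level("Maximum", 35))
    print(recommend_level("Maximum", 26))
```

The measured increase has to come from your own monitoring of the database server during a crawl; the helper only encodes the step-down rule.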
Use the following procedure to specify the performance-level setting on the index server.
Adjust index server performance
- Click Start, point to All Programs, point to Microsoft Office Server, and then click SharePoint 3.0 Central Administration.
- On the Central Administration home page, click Operations.
- On the Operations page, in the Topology and Services section, click Services on server.
- On the Services on Server page, on the Server menu, select the index server that you want to manage.
- In the Start services in the table below section, click Office SharePoint Server Search.
- On the Configure Office SharePoint Server Search Service Settings page, in the Indexer Performance section, select the performance level that you want to apply.
- Click OK to save your changes.
Crawler impact rules
Crawler impact rules are farm-level search configuration settings that specify the number of simultaneous requests that the Office SharePoint Server Search service generates when it crawls using a specified content source. The greater the number of simultaneous requests, the faster the crawl speed. Note that the request frequency specified in a crawler impact rule directly affects the load on the database server and the load on the server hosting the content that is being crawled. If you increase the request frequency for a given site, you should carefully monitor the servers being crawled to evaluate whether the greater load is acceptable.
The default value is the number of processes on the index server. Therefore, for a quad-processor computer, the default value is eight. We recommend that you adjust the value and measure the load on the target server to determine the optimum number of simultaneous requests. You can select the number of simultaneous requests from the following available values: 1, 2, 4, 8, 16, 32, 64.
You can also create a rule to request one document at a time and wait a specified number of seconds between requests. Such a rule can be useful for crawling a site that has a constant user load.
The following table shows the relationship between the number of simultaneous requests and the CPU utilization on index servers and database servers.
| Number of crawl threads | Index server CPU utilization percentage | Database server CPU utilization percentage |
|---|---|---|
| 4 | 35 | 12 |
| 8 | 40 | 15 |
| 12 | 45 | 15 |
| 16 | 60 | 20 |
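One way to use measurements like those in the table is to pick the highest thread count that stays within CPU ceilings you choose for the index server and the database server. The following Python sketch assumes the table values above as sample data; the function name and the ceilings passed to it are hypothetical, and you would substitute figures from your own testing.

```python
# (crawl threads, index server CPU %, database server CPU %), copied from the table above.
MEASUREMENTS = [
    (4, 35, 12),
    (8, 40, 15),
    (12, 45, 15),
    (16, 60, 20),
]


def highest_thread_count_within(measurements, max_index_cpu, max_db_cpu):
    """Return the largest thread count whose measured CPU use fits both ceilings, or None."""
    eligible = [
        threads
        for threads, index_cpu, db_cpu in measurements
        if index_cpu <= max_index_cpu and db_cpu <= max_db_cpu
    ]
    return max(eligible) if eligible else None


if __name__ == "__main__":
    print(highest_thread_count_within(MEASUREMENTS, max_index_cpu=50, max_db_cpu=15))
```

The result is only a starting point, because the load on the server hosting the crawled content also has to be measured before settling on a value.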
You can create a crawler impact rule by using the following procedure.
Create a crawler impact rule
- Click Start, point to All Programs, point to Microsoft Office Server, and then click SharePoint 3.0 Central Administration.
- On the Central Administration home page, click Application Management.
- On the Application Management page, in the Search section, click Manage search service.
- On the Manage Search Service page, in the Farm-Level Search Settings section, click Crawler impact rules.
- On the Crawler Impact Rules page, click Add Rule.
- On the Add Crawler Impact Rule page, in the Site section, type the name of the site for which you want to create a rule. Do not include the protocol (for example, do not include http://).
- In the Request Frequency section, specify how the crawler will request documents from this site.
  - To simultaneously request multiple documents, select Request up to the specified number of documents at a time and do not wait between requests, and then select the value that you want from the Simultaneous requests list.
  - To request one document at a time, select Request one document at a time and wait the specified time between requests, and then type the number of seconds to wait between requests in the Time to wait (in seconds) box.
- Click OK to create the rule.
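The choices in the procedure above can be modeled as a small data structure, which is useful when you want to record planned rules before entering them in Central Administration. The following Python sketch is a planning aid under stated assumptions, not a SharePoint API: `CrawlerImpactRule`, `make_rule`, and `strip_protocol` are hypothetical names, and the host name in the example is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional

# Values offered in the Simultaneous requests list, as stated in this section.
ALLOWED_SIMULTANEOUS_REQUESTS = (1, 2, 4, 8, 16, 32, 64)


def strip_protocol(site):
    """Drop a leading protocol such as http://, because the Site box expects none."""
    _, separator, rest = site.partition("://")
    return rest if separator else site


@dataclass(frozen=True)
class CrawlerImpactRule:
    site: str
    simultaneous_requests: Optional[int] = None  # "request up to N documents at a time"
    wait_seconds: Optional[int] = None           # "request one document at a time and wait"


def make_rule(site, simultaneous_requests=None, wait_seconds=None):
    """Build a rule, enforcing that exactly one request-frequency mode is chosen."""
    if (simultaneous_requests is None) == (wait_seconds is None):
        raise ValueError("choose exactly one of simultaneous_requests or wait_seconds")
    if (simultaneous_requests is not None
            and simultaneous_requests not in ALLOWED_SIMULTANEOUS_REQUESTS):
        raise ValueError("simultaneous_requests must be one of 1, 2, 4, 8, 16, 32, 64")
    if wait_seconds is not None and wait_seconds < 0:
        raise ValueError("wait_seconds must not be negative")
    return CrawlerImpactRule(strip_protocol(site), simultaneous_requests, wait_seconds)


if __name__ == "__main__":
    print(make_rule("http://intranet.example.com", simultaneous_requests=8))
    print(make_rule("intranet.example.com", wait_seconds=5))
```

Rejecting a rule that sets both modes mirrors the two mutually exclusive options in the Request Frequency section.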
Use the information in this section to determine specifications for query servers in your Office SharePoint Server 2007 farm.
Query server memory
The greater the memory that is available, the fewer times the Office SharePoint Server Search service will need to access the hard disk to perform a given query. Having adequate memory also permits more effective caching. Ideally, enough memory should be installed on the query servers to accommodate the entire index.
The following figure shows the relationship between the size of the index on the query servers and the user response time per query.
Query server disk speed
We recommend using RAID 10 for fast disk writes.
Number of query servers
You can deploy multiple query servers in the farm to achieve redundancy and load balancing. The number of query servers you use depends on how many users are present in the farm and the peak hour load that you expect. We have tested up to eight query servers per farm.
The following figure shows query throughput, database server CPU utilization percentage for the search database, and the query server CPU utilization percentage as query servers are added to the farm. In the test from which this data was generated, the database server used was shared between content databases and the service databases.
Remote server latency
Server latency is a major factor that affects crawl performance. Performance between farm servers must be balanced for overall crawl performance to reach its potential. For example, a powerful index server can be operating at 25% of its capacity if the database server being crawled is not able to respond quickly enough. In such a case, you can scale up the database server, which will in turn increase crawl speeds across the entire farm.
You should conduct your own testing to evaluate the responsiveness of servers in your environment. The database server serving the target farm is often the bottleneck in cases where crawl performance is poor. To improve crawl performance, you can:
- Scale up database server hardware by adding or upgrading processors, adding memory, and upgrading to hard disks with faster seek and write times.
- Increase the memory on query servers in the farm.
- Crawl during non-peak hours so that the database server being crawled can service user traffic during the day, and respond to crawls during off-peak hours.
The Office SharePoint Server 2007 search system crawls both text data and the metadata associated with the content. In Office SharePoint Portal Server 2003, all metadata gathered by the indexing system was stored in a JET database property store. In Office SharePoint Server 2007, the inverted full text index is stored on the index server, and the metadata is stored in the Search database. The index server writes metadata to the database, and the query servers read that data to process property-based queries issued by users.
Use the information in this section to determine specifications for database servers in your Office SharePoint Server 2007 farm.
Database throughput
The database metadata store is shared by the index server and all query servers in the farm. The index server writes all metadata, and the query servers read this data to process search requests. Query throughput is dependent largely on the metadata store responsiveness.
As the number of query servers increases in the farm, the load on the database server also increases and affects the overall query throughput. You should carefully monitor the database server when adding index servers or query servers to the farm to ensure that database performance remains adequate.
Database server hard disk distribution
Because the Office SharePoint Server Search service writes a large amount of data to the search database during crawls, we recommend using separate spindles for the SharedServices_Search_Db, SharedServices_Db, and TempDb databases for better performance in scenarios in which the index contains more than 5 million items.
Database server disk speed
We recommend using RAID 10 for fast disk writes.