Estimate performance and capacity requirements for search environments

Applies To: Office SharePoint Server 2007

This Office product will reach end of support on October 10, 2017. To stay supported, you will need to upgrade. For more information, see Resources to help you upgrade your Office 2007 servers and clients.

 

Topic Last Modified: 2016-11-14

In this article:

  • Key characteristics

  • Test environment

  • Recommendations

This performance and capacity planning scenario incorporates a single Microsoft Office SharePoint Server 2007 farm used for searching and indexing Office SharePoint Server content in an enterprise environment.

Important

Some of the guidance in this article has been updated for Office SharePoint Server 2007 with SP1. For a comprehensive list of Office SharePoint Server 2007 with SP1 updates, see Downloadable book: Planning and deploying Service Pack 1 for Office SharePoint Server 2007 in a multi-server environment.

Key characteristics

Key characteristics describe environmental factors, usage characteristics, and other considerations that are likely to be found in deployments based on this scenario.

The key characteristics for this scenario include:

  • **User response times**   Target user response times for common, uncommon, long-running, and rare operations are listed in the "User response time" table in Plan for software boundaries (Office SharePoint Server). Some organizations might tolerate slower user response times or might require faster user response times. The expected user response time is a key factor that determines overall throughput targets. Throughput is how many requests the server farm can process per second. When you have more users, you require a higher throughput target to achieve the same user response time.

  • **User concurrency**   A concurrency rate of 10 percent is assumed, with 1 percent of all users making requests at a given moment. For example, for 10,000 users, 1,000 users are actively using the solution simultaneously, and 100 users are actively making requests.

  • **Long-running asynchronous tasks**   Tasks such as crawling content and backing up databases add a performance load to the server farm. The general performance characteristics of sample topologies assume that these tasks are running during off-peak hours, such as overnight. Thus, user response times during business hours are not affected.
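The user concurrency assumption can be turned into a quick calculation. The following Python sketch is illustrative only: the function name and parameters are hypothetical, and it follows the worked example above (10,000 users, 1,000 concurrent, 100 making requests) by applying both rates to the total user count.

```python
def estimate_active_users(total_users, concurrency_rate=0.10, request_rate=0.01):
    """Estimate concurrent users and users making requests at a given moment.

    Both rates are fractions of the total user population, which matches
    the worked example in this article (assumed defaults, adjust as needed).
    """
    concurrent_users = round(total_users * concurrency_rate)
    requesting_users = round(total_users * request_rate)
    return concurrent_users, requesting_users


concurrent, requesting = estimate_active_users(10_000)
print(f"Concurrent users: {concurrent}")
print(f"Users making requests: {requesting}")
```

If your organization expects a different concurrency rate, pass your own values instead of the defaults.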

Test environment

Testing for this scenario was designed to help develop estimates of how different farm configurations respond to changes in a variety of factors, including:

  • How many concurrent users are using the system.

  • What kinds of user operations are being performed.

  • How many documents are in the index that is being queried.

It is important to note that although certain conclusions can be drawn from the test results, the specific capacity and performance figures in this section will be different from the figures in real-world environments. The figures in this article are intended to provide a starting point for the design of a properly scaled environment. After you complete your initial system design, test the configuration to determine if your system will support the factors inherent in your environment.

Note

These tests were conducted to simulate an enterprise environment with millions of documents and a large user base. The hardware used for the test environment was configured with robust processors and a large amount of memory and disk capacity. See Hardware Recommendations in the Recommendations section of this article for starting-point hardware recommendations.

For more information about testing your deployment, see Tools for performance and capacity planning (Office SharePoint Server).

Assumptions

  • **64-bit architecture**   Only 64-bit servers were used in the test environment. Although Office SharePoint Server 2007 can be deployed on 32-bit servers, we recommend that you employ 64-bit servers in Office SharePoint Server 2007 farm deployments. For more information, see the "64-bit vs. 32-bit" section in the article About performance and capacity planning (Office SharePoint Server).

  • **Disk-based caching is enabled**   Disk-based caching eliminates the need to access the database multiple times for code fragments or large binary files, such as image, sound, and video files. Enabling disk-based caching will improve performance across your entire deployment. Note that disk-based caching is not enabled by default. For information about enabling disk-based caching, see Disk-based Caching for Binary Large Objects (https://go.microsoft.com/fwlink/?LinkId=82617&clcid=0x409).

Lab Topology

A number of farm configurations were used for testing, ranging from one through eight query servers, one index server, one SSP, and one database server computer running Microsoft SQL Server 2005 database software. All server computers were running the default configuration of Office SharePoint Server 2007 Enterprise Edition on the Microsoft Windows Server 2003 operating system with Service Pack 1 (SP1), Enterprise x64 Edition.

The following table lists the specific hardware used for testing.

| Computer role | Hardware | Hard disk capacity |
|---|---|---|
| Query servers | 4 dual-core Intel Xeon 2.66 gigahertz (GHz) processors; 32 gigabytes (GB) RAM | 40 GB for the operating system (Redundant Array of Independent Disks (RAID) 5); 956 GB for the content index and the operating system paging file (RAID 10) |
| Index server | 4 dual-core Intel Xeon 2.66 GHz processors; 32 GB RAM | 40 GB for the operating system (RAID 5); 956 GB for the content index and the operating system paging file (RAID 10) |
| Database server | 4 dual-core Intel Xeon 2.66 GHz processors; 32 GB RAM | 40 GB for the operating system (RAID 5); 956 GB for the SharedServices_Search_DB database with dedicated small computer system interface (SCSI) controller (RAID 10). The following disks shared a SCSI controller: 273 GB for the SharedServices_DB database (RAID 10); 273 GB for the TempDb database (RAID 10); 273 GB for log files (RAID 10); 136 GB for the SharePoint_Config database (RAID 10) |

A gigabit (1 billion bits/sec) network was used in the test environment. We recommend using a gigabit network between servers in an Office SharePoint Server farm to ensure adequate network bandwidth.

Usage profile

The following tables show the usage profile for the Office SharePoint Server 2007 search test environment.

Note

For testing of this scenario, only query user operations were used to determine system performance.

Approximately 50 million items were crawled for testing. The following table shows the type and number of items crawled. Items were 10 kilobytes (KB) to 100 KB in size, and included list items, web pages, and various document types.

| Type of item | Number of items |
|---|---|
| Content on SharePoint sites | 10 million items, including the following: 420 site collections; 4,000 sites; 24,200 lists; 47,780 document libraries |
| Content on file shares | 15 million items |
| HTTP content | 15 million items |
| People profiles | 2.5 million |
| Stitch (in-memory test tool that generates documents in memory) | 7.5 million |
| Properties (metadata) | 1 million |

The following table shows disk space usage.

| Type of usage | Volume |
|---|---|
| Index size on query server | 100 GB* |
| Index size on index server | 100 GB* |
| Search database size | 600 GB |

Note

The tested index sizes are smaller than what might be observed in a production environment. In the test-generated corpus, the number of unique words is limited and often repeated.

The time to perform a full crawl during testing was 35 days (approximately 15 documents per second). Note that these test results were observed in a production environment where network latency and the responsiveness of the crawled repositories affected crawl speed. Crawl speed measured by documents per second might be significantly faster in a pure test environment, or in environments with greater bandwidth and greater responsiveness of crawled repositories.

If two percent of a corpus of the size used in the test environment changes, an incremental crawl to catch up with the changes takes approximately 8-12 hours, depending on latency and the responsiveness of the sites being crawled. Note that changes to metadata and outbound links take longer to process than changes to the contents of documents.
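To relate crawl rates and crawl durations such as those quoted above, a small calculation can help. The following Python sketch is a rough planning aid, not a prediction: the function names are hypothetical, and it assumes a constant average crawl rate, which real crawls rarely sustain.

```python
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400


def full_crawl_days(item_count, docs_per_second):
    """Days needed to crawl item_count items at a constant average rate."""
    return item_count / docs_per_second / SECONDS_PER_DAY


def average_docs_per_second(item_count, hours):
    """Average crawl rate implied by crawling item_count items in the given hours."""
    return item_count / (hours * SECONDS_PER_HOUR)


# Full crawl of the 50 million item test corpus at 15 documents per second.
print(f"Full crawl: {full_crawl_days(50_000_000, 15):.1f} days")

# Incremental crawl of 2 percent of the corpus (1 million items) in 8 to 12 hours.
changed_items = 50_000_000 * 2 // 100
for hours in (8, 12):
    rate = average_docs_per_second(changed_items, hours)
    print(f"{hours} hours: {rate:.1f} documents per second")
```

At exactly 15 documents per second the full crawl works out to a little under 39 days, which is in line with the approximate figures reported for the test.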

Recommendations

This section provides general performance and capacity recommendations. Use these recommendations to determine the capacity and performance characteristics of the starting topology that you created in Plan for redundancy (Office SharePoint Server), and to determine whether you need to scale out or scale up the starting topology.

Note

Scale out means to add more servers in a particular role, and scale up means to increase the performance or capacity of a given server by adding memory, hard disk capacity, or processor speed.

Hardware recommendations

The following table lists the recommended hardware for Web servers, index servers, and database servers.

Note

Memory requirements for Web, index and database servers are dependent on the size of the farm, the number of concurrent users, and the complexity of features and pages in the farm. The memory recommendations in the following table may be adequate for a small or light usage farm, but memory usage should be carefully monitored to determine if more memory must be added.

| Server role | Recommended hardware |
|---|---|
| Web (query) server | Dual 2.5 GHz or faster processors (3 GHz or faster recommended); 2 GB RAM minimum recommended; 3 GB available disk space; DVD drive, local or network accessible |
| Index server | Dual 2.5 GHz or faster processors (3 GHz or faster recommended); 4 GB RAM minimum recommended; 3 GB available disk space; DVD drive, local or network accessible |
| Database server | Dual 2.5 GHz or faster processors (3 GHz or faster recommended); 4 GB RAM minimum recommended. Hard disk space for the content database is based on a 1:1.2 ratio of content size to database capacity. For example, if you plan for 100 GB of content, you need at least 120 GB of available disk space for the content database, plus additional space for transaction logs. Hard disk space for the search database is based on a 1:4 ratio of index size to database capacity. For example, if your index will be 100 GB in size, you need at least 400 GB of available disk space for the search database, plus additional space for transaction logs. DVD drive, local or network accessible |

Note

The amount of hard disk space required on the database server for transaction logs depends on the log settings. For more information, see Understanding and Managing Transaction Logs (https://go.microsoft.com/fwlink/?LinkId=82925&clcid=0x409).

For more information about minimum and recommended system requirements, see Determine hardware and software requirements (Search Server 2008).

Starting-point topologies

You can estimate the performance of your starting-point topology by comparing your topology to the starting-point topologies that are provided in Plan for redundancy (Office SharePoint Server). Doing so can help you quickly determine if you need to scale up or scale out your starting-point topology to meet your performance and capacity goals.

Capacity and performance of scaled-up and scaled-out topologies

To increase the capacity and performance of one of the starting-point topologies, either scale up by implementing server computers with greater capacity or scale out by adding servers to the topology. This section describes the general performance characteristics of several scaled-up or scaled-out topologies. The sample topologies represent the following common ways to scale up or scale out a topology for a search environment:

  • To accommodate greater user load, add query server computers. You can also add index servers and dedicated query servers to relieve some of the processing burden from the Web servers.

  • To accommodate greater data load, add capacity to the database server role by increasing the capacity of a single (clustered or mirrored) server, by upgrading to a 64-bit server, or by adding clustered or mirrored servers.

  • Maintain a ratio of no greater than eight query server computers to one (clustered or mirrored) database server computer. Testing in our lab yielded an optimum ratio of 7x1x1 (seven query servers to one index server and one database server).

Estimating throughput targets

This section provides test data that shows farm throughput for an increasing number of query servers and more user connections.

Because Office SharePoint Server 2007 can be deployed and configured many ways, there is no simple way to estimate how many users can be supported by a given number of servers. Therefore, it is important that you conduct testing in your own environment before deploying Office SharePoint Server 2007 in a production environment.

There are several factors that can affect throughput, including the number of users, complexity and frequency of user operations, caching, and customization of pages and Web Parts. Each of these factors can have a major effect on farm throughput. You should carefully consider each of these factors when you are planning your deployment.

For more information about caching in Office SharePoint Server 2007, see the following resources:

If your organization has an existing search solution, you can view the Internet Information Services (IIS) logs to determine the usage patterns and trends in your current environment. For more information about parsing IIS logs, see Analyzing Log Files (IIS 6.0) (https://go.microsoft.com/fwlink/?LinkId=78825&clcid=0x409).

If your organization is planning a new search solution deployment, use the information in the following section to estimate your usage patterns.

Test results: Throughput by farm configuration

The table in this section shows test results for a variety of user operation profiles using the hardware and usage profile listed in Test environment earlier in this article. Note that for each farm configuration, a range of one through eight query servers was tested in conjunction with one index server and one database server. Therefore, a 3x1x1 farm configuration signifies three query servers by one index server by one database server. Testing was not conducted on farms containing multiple index or database servers.

The following table shows test results for search-related user operations.

| Farm size | RPS | Query server CPU utilization percentage | Index server CPU utilization percentage | Database server CPU utilization percentage | Database server disk writes/sec average |
|---|---|---|---|---|---|
| 1x1x1 | 24.01 | 99.49 | 1.98 | 7.23 | 6.11 |
| 2x1x1 | 48.04 | 96.98 | 3.95 | 13.02 | 2.66 |
| 3x1x1 | 71.07 | 94.73 | 5.61 | 20.56 | 2.29 |
| 4x1x1 | 93.11 | 91.77 | 8.81 | 29.21 | 2.41 |
| 5x1x1 | 114.95 | 90.50 | 10.27 | 39.38 | 2.45 |
| 6x1x1 | 133.34 | 87.29 | 11.91 | 52.94 | 2.83 |
| 7x1x1 | 148.52 | 80.20 | 15.24 | 63.72 | 3.14 |
| 8x1x1 | 146.94 | 65.65 | 15.15 | 69.15 | 2.87 |

The following graph shows changes in throughput for search operations when the number of query servers changes.

Requests per second versus query servers
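One way to read the throughput results is to look at how much each additional query server contributes. The following Python sketch is a minimal illustration using the RPS values from the test results table; the function and variable names are hypothetical, and the analysis applies only to the tested hardware and usage profile.

```python
# Requests per second (RPS) by number of query servers, copied from the
# test results table (each farm had one index server and one database server).
RPS_BY_QUERY_SERVERS = {
    1: 24.01, 2: 48.04, 3: 71.07, 4: 93.11,
    5: 114.95, 6: 133.34, 7: 148.52, 8: 146.94,
}


def scaling_report(rps_by_servers):
    """Return (servers, RPS per server, RPS gained over the previous row)."""
    rows = []
    previous_rps = 0.0
    for servers in sorted(rps_by_servers):
        rps = rps_by_servers[servers]
        rows.append((servers, rps / servers, rps - previous_rps))
        previous_rps = rps
    return rows


def peak_configuration(rps_by_servers):
    """Return the query server count with the highest measured RPS."""
    return max(rps_by_servers, key=rps_by_servers.get)


for servers, per_server, gained in scaling_report(RPS_BY_QUERY_SERVERS):
    print(f"{servers}x1x1: {per_server:.2f} RPS per query server, {gained:+.2f} RPS gained")

print(f"Peak throughput at {peak_configuration(RPS_BY_QUERY_SERVERS)} query servers")
```

With the tested data, throughput peaks at seven query servers and the eighth server adds none, which is consistent with the 7x1x1 optimum noted earlier in this article. Run the same comparison against your own measurements before you decide how far to scale out.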

Estimate crawl window

In an Office SharePoint Server 2007 search environment, crawling content typically is the longest-running operation that is not initiated by users. You will need to perform testing in your own environment to determine the amount of time it takes to crawl content using a particular content source, and whether the throughput consumed by crawling this content interferes with your target user response times. Typically, you should verify that crawling a particular content source can be contained within an overnight time span of 12 hours.

Estimate disk space requirements

Use the following information to plan the disk space requirements for the index servers, query servers, and database servers in your environment.

Disk space requirements for index servers and query servers

Use the following information to plan the disk space requirements for the index servers and query servers in your server farm.

Note

The size of the content index is typically smaller than the corpus because all noise words are removed before the content is indexed.

Note

If the query server role is enabled on a server other than the index server, the index is automatically propagated to those query servers. To store a copy of the content index in the file system on a query server, each query server requires the same amount of disk space as the index server uses for the content index. For more information, see Plan for redundancy (Office SharePoint Server).

To estimate the disk space requirements for the hard disk that contains the content index:

  1. Estimate how much content you plan to crawl and the average size of each file. If you do not know the average size of files in your corpus, use 10 KB per document as a starting point.

    Use the following formula to calculate how much disk space you need to store the content index:

    GB of disk space required = Total_Corpus_Size (in GB) x File_Size_Modifier x 2.85

    where File_Size_Modifier is a number in the following range, based on the average size of the files in your corpus:

    • 1.0 if your corpus contains very small files (average file size = 1 KB).

    • 0.12 if your corpus contains moderate files (average file size = 10 KB).

    • 0.05 if your corpus contains large files (average file size = 100 KB or larger).

    • 0.01 if your corpus contains very large files (average file size = 500 KB or larger).

Note

This equation is intended only to establish a starting-point estimate. Real-world results may vary widely based on the size and type of documents being indexed, and how much metadata is being indexed during a crawl operation.

Note

The File_Size_Modifier decreases as the file size increases because typically text makes up a smaller portion of a large file. This is because large files often contain embedded bitmaps or other binary objects. This is also why the File_Size_Modifier can vary considerably from one deployment to another, even if file sizes are similar. Therefore, a more reliable way to estimate the index size is to crawl a small sample of the corpus and then extrapolate the total index size from the sample result.

In this equation, you multiply Total_Corpus_Size (in GB) x File_Size_Modifier to get the estimated size of the index file. Next, you multiply by 2.85 to accommodate overhead for master merges when crawled data is merged with the index. The final result is the estimated disk space requirement.

For example, for a corpus size of 1 GB that primarily contains files that average 10 KB in size, use the following values to calculate the estimated size of the index file:

1 GB x 0.12 = 0.12 GB

According to this calculation, the estimated size of the index file is 120 MB.

Next, multiply the estimated size of the index file by 2.85:

120 MB x 2.85 = 342 MB

Thus, the disk space required for the index file and to accommodate indexing operations is 342 MB, or 0.342 GB.

Note

The volume of crawled data can differ based on the content being crawled. A content source is a set of options that you can use to specify the protocol to use when crawling, which URLs to start crawling from, how many levels deep to crawl, and when to crawl.

  2. Based on your estimate, if the content index will fit within your available hard disk space on the index and query servers, go to step 3. Otherwise, add disk space or reevaluate step 1 before proceeding to step 3.

  3. Crawl some of the content.

  4. Evaluate the size of the content index and the number of files that were crawled. Use this information to increase the accuracy of the calculation you performed in step 1.

  5. If the remaining hard disk space is adequate, crawl some more content. Otherwise, add hard disk space as necessary or reevaluate how much content you plan to crawl.

  6. Repeat steps 3 through 5 until all content is crawled.

    After you have crawled the entire corpus, we recommend that you keep a record of the size of your content index and search database for each crawl so that you can determine an average growth rate. Because a corpus tends to grow over time as new content is added to the farm, you should monitor the available hard disk space to ensure that adequate capacity for indexing operations is maintained.
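The content index formula from step 1 can be expressed as a short calculation. The following Python sketch is a starting-point estimate only: the function name is hypothetical, and you supply the File_Size_Modifier yourself from the list in step 1 because the article defines it only for a few average file sizes.

```python
MASTER_MERGE_OVERHEAD = 2.85  # multiplier from the formula in step 1


def index_disk_space_gb(total_corpus_gb, file_size_modifier):
    """Return (estimated index size, estimated disk space required), both in GB."""
    index_size_gb = total_corpus_gb * file_size_modifier
    disk_space_gb = index_size_gb * MASTER_MERGE_OVERHEAD
    return index_size_gb, disk_space_gb


# Worked example from this article: 1 GB corpus, files averaging 10 KB (modifier 0.12).
index_gb, disk_gb = index_disk_space_gb(1, 0.12)
print(f"Estimated index size: {index_gb * 1000:.0f} MB")
print(f"Estimated disk space: {disk_gb * 1000:.0f} MB")
```

After you crawl a sample of your content, you can replace the modifier with the ratio of measured index size to crawled corpus size to refine the estimate.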

Disk space requirements for the search database

The search database, which stores metadata and crawler history information for the search system, typically requires more disk space than the index. This is especially the case if you primarily crawl SharePoint sites, which are very rich in metadata.

Note

Both the metadata for all indexed content and the crawler history are stored in the search database. For this reason, the search database requires more storage space than the content index.

Use the following formula to calculate how much disk space you need for the search database:

GB of disk space required = Total_Corpus_Size (in GB) x File_Size_Modifier x 4

where File_Size_Modifier is a number in the following range, based on the average size of the files in your corpus:

  • 1.0 if your corpus contains very small files (average file size = 1 KB).

  • 0.12 if your corpus contains moderate files (average file size = 10 KB).

  • 0.05 if your corpus contains large files (average file size = 100 KB or larger).

For example, for a corpus size of 1 GB that primarily contains files that average 10 KB in size, substitute the following values into the equation to calculate the estimated size of the index file:

1 GB x 0.12 = 0.12 GB, or 120 MB

Then multiply the estimated size of the index file by 4:

120 MB x 4 = 480 MB

Thus, the disk space required for the search database is 480 MB, or 0.48 GB.
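The search database formula can be sketched the same way. The following Python example is illustrative: the function name is hypothetical, the multiplier of 4 comes from the formula above, and transaction log space is not included.

```python
SEARCH_DB_MULTIPLIER = 4  # multiplier from the search database formula


def search_db_disk_space_gb(total_corpus_gb, file_size_modifier):
    """Estimate search database disk space in GB, excluding transaction logs."""
    return total_corpus_gb * file_size_modifier * SEARCH_DB_MULTIPLIER


# Worked example from this article: 1 GB corpus, files averaging 10 KB (modifier 0.12).
required_gb = search_db_disk_space_gb(1, 0.12)
print(f"Estimated search database size: {required_gb * 1000:.0f} MB")
```

Treat the result as a lower bound for planning, and add space for transaction logs according to your log settings.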

Determining specifications for index, query, and database servers

In Office SharePoint Server 2007, search is a shared service available at the SSP level. The Office SharePoint Server 2007 search system consists of two main server roles: the index server and the query server.

Crawling and indexing are resource-intensive operations. Crawling content is the process by which the system accesses and parses content and its properties to build a content index from which search queries can be serviced. Crawling consumes processing and memory resources on the index server, the query server or servers servicing the crawl operations, the server or servers hosting the content repository that is being crawled, and the database server that is serving the Office SharePoint Server 2007 farm.

Crawls affect the overall performance of the system, and directly affect user response time and the performance of other shared services in the farm as well as the Web service on the query server that services crawl operations. You can dedicate a query server for crawling operations to reduce the load on other farm servers.

Indexing the crawled content can also affect the overall performance of the system if crawl operations are not assigned to a dedicated query server. If search-related operations constitute a significant portion of farm operations, consider deploying a dedicated query server. See the Dedicated query server for crawling section in this article for more information.

Determining specifications for index servers

Use the information in this section to specify requirements for index servers in your Office SharePoint Server 2007 farm.

Index server CPU

The index server processor speed influences the crawl speed and the number of crawling threads that can be instantiated. Although no specific number or type of processors is recommended, you should consider the amount of content that will be crawled when determining the index server requirements. In an enterprise environment, the index server should have multiple processors to handle a large indexing load.

The following table shows how crawl speed increases as the number of processors available on the index server increases.

| Number of processors | Percentage of improvement in crawl speed |
|---|---|
| 1 | 0.00 |
| 2 | 10.89 |
| 4 | 19.77 |
| 8 | 30.77 |

Index server memory

On the index server, documents are loaded in buffers for processing by the crawler engine. In a farm with a corpus of approximately 1 million documents, the index server requires approximately 1.5 GB of memory. After a document is processed in memory, it is written to disk. The greater the memory capacity, the more documents the crawler can process in parallel, which results in improved crawl speed.

We recommend a minimum of 4 GB RAM on the index server for crawling a corpus with more than 1 million documents.

Index server disk speed

We recommend that you specify RAID 10 with 2 millisecond (ms) access times and write speeds greater than 150 MB/sec for fast disk writes.

Single index and relevance

In SharePoint Portal Server 2003, the content index could be split up across multiple servers to create subsets of the indexed content and to better accommodate growth. Although Office SharePoint Server 2007 supports the use of multiple index servers for scaling out, each index server requires a separate SSP, and there is no way to combine the separate indexes.

Number of index servers

You can deploy multiple index servers to a farm in cases where complete isolation between SSPs is desired, or to scale out your system. Although there is no hard limit on the number of index servers in a farm, testing has been conducted with a maximum of four index servers in a single farm.

The number of index servers you use in a farm depends on the way you want to target your search experience. If the search experience requires that all crawled content be returned in a single result set, you should deploy one SSP with a single index server. Most organizations want all crawled content to be searchable by users, and therefore do not require multiple search scopes.

If the search experience can be split across different scopes to provide separate relevant search result sets over different content repositories, multiple SSPs and index servers can be used. An example of a scenario in which different search scopes are desirable is an enterprise with one division that maintains sensitive documents that must be searchable only by a specific group of users.

Depending on your scale and security requirements, you can associate all your SSPs with a single index server, or associate each SSP with a separate index server.

Note

Querying across multiple SSPs to get a single relevant set of results is not supported in Office SharePoint Server 2007.

A single index server with a robust hardware configuration can support up to 50 million documents. If you are building a single index of this size, we recommend using no more than one index server in a farm because the index is propagated to all query servers in the farm. If a second index server is added, the indexes from the second index server are also propagated to all query servers in the farm, which adds load on the query servers.

To increase search capacity by adding SSPs, you will also need to scale out. At the very least, you should add another index server, database server, and dedicated Web server. If your hardware currently supports indexing 10 million documents within a single SSP, you can scale up by using the same hardware to host 20 SSPs. This will enable you to index approximately 2 million documents per SSP for a total of approximately 40 million documents.

Note

In Microsoft Office SharePoint Server 2007 for Search, you can only use one SSP.

Note

An SSP is always associated with only one index server. However, an index server can accommodate multiple SSPs.

Dedicated query server for crawling

It is a best practice to dedicate a query server for crawl operations.

In a search-enabled farm, all query servers in the farm service crawl operations by default. When a crawl operation commences, the index server sends a request to the query servers, which in turn fetch the content to be crawled and deliver it to the index server. When user load is high, a crawl operation might reduce the responsiveness of the system to user requests.

To mitigate the impact of crawl operations on the performance of the farm, you can configure a dedicated query server for crawling. Dedicating a query server for crawling forces all crawl operations to be serviced through the dedicated server, while all other query servers in the farm continue to service user requests. This configuration is particularly useful for environments in which crawl operations cannot be confined to an overnight window, or for geographically distributed environments in which users are making requests at all hours.

For more information about how to dedicate a query server for crawling, see Configure a dedicated front-end Web server for crawling (Office SharePoint Server 2007).

Note

Dedicating a query server for crawling might affect other services running on the server. A query server used in this way cannot be load balanced, and will not serve end-user requests.

Index server performance optimization

Indexing operations increase the load on the database server, and can reduce the responsiveness of the farm. Indexing operations can also affect other shared services on the application server running the Search Indexing service. You can adjust the indexing performance level for each index server to one of the following three values:

  • Reduced

  • Partly reduced

  • Maximum

The default setting is Reduced. You can only configure this setting for a specific index server, not for the SSP.

Crawls affect performance of the database server because the Office SharePoint Server Search service writes all the metadata collected from the crawled documents into database tables. It is possible for the index server or servers to generate data at a rate that can overload the database server.

You should conduct your own testing to balance crawl speed, network latency, database load, and the load on the content repositories that are being crawled.

The following table shows the relationship between the performance-level setting and the CPU utilization on the index and database servers as tested.

| Performance-level setting | Index server CPU utilization percentage | Database server CPU utilization percentage |
|---|---|---|
| Reduced | 20 | 20 |
| Partly reduced | 24 | 24 |
| Maximum | 25 | 26 |

Consider the scenarios and recommendations for the performance-level setting in the following list:

  • If the index server and database servers are used only for the Office SharePoint Server Search service, you can set the level to Maximum. However, we recommend that the maximum increase in database server CPU utilization related to index server activity not be greater than 30 percent. If the increase in database server CPU utilization exceeds 30 percent when the performance level is set to Maximum, we recommend setting the performance level to the next lower setting.

  • If the application server and the database server are shared across multiple shared services such as the Office SharePoint Server Search service and Excel Calculation Services, we recommend that you select a lower performance-level setting. However, reducing the maximum allowed indexing activity reduces the speed at which items are indexed, which might cause search results to be outdated. Monitor local server performance to help determine the appropriate index server performance level.
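The 30 percent guideline in the first recommendation can be written as a simple decision rule. The following Python sketch is hypothetical and does not call any SharePoint API: it only suggests a setting that you would then apply manually in Central Administration. Stepping down from levels other than Maximum is an assumption that extends the guidance, which is stated only for the Maximum setting.

```python
# Performance levels in increasing order of indexing activity.
PERFORMANCE_LEVELS = ["Reduced", "Partly reduced", "Maximum"]

MAX_DB_CPU_INCREASE_PERCENT = 30.0  # guideline from the recommendations above


def recommend_performance_level(current_level, db_cpu_increase_percent):
    """Suggest the next lower level if indexing raises database CPU too much.

    db_cpu_increase_percent is the increase in database server CPU
    utilization that you measured and attribute to index server activity.
    """
    position = PERFORMANCE_LEVELS.index(current_level)
    if db_cpu_increase_percent > MAX_DB_CPU_INCREASE_PERCENT and position > 0:
        return PERFORMANCE_LEVELS[position - 1]
    return current_level


print(recommend_performance_level("Maximum", 35.0))
print(recommend_performance_level("Maximum", 26.0))
```

The measured increase has to come from your own monitoring, for example by comparing database server CPU utilization with and without a crawl running.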

Use the following procedure to specify the performance-level setting on the index server.

Adjust index server performance

  1. Click Start, point to All Programs, point to Microsoft Office Server, and then click SharePoint 3.0 Central Administration.

  2. On the Central Administration home page, click Operations.

  3. On the Operations page, in the Topology and Services section, click Services on server.

  4. On the Services on Server page, on the Server menu, select the index server that you want to manage.

  5. In the Start services in the table below section, click Office SharePoint Server Search.

  6. On the Configure Office SharePoint Server Search Service Settings page, in the Indexer Performance section, select the performance level that you want to apply.

  7. Click OK to save your changes.

Crawler impact rules

Crawler impact rules are farm-level search configuration settings that specify the number of simultaneous requests that the Office SharePoint Server Search service generates when it crawls using a specified content source. The greater the number of simultaneous requests, the faster the crawl speed. Note that the request frequency specified in a crawler impact rule directly affects the load on the database server and the load on the server hosting the content that is being crawled. If you increase the request frequency for a given site, you should carefully monitor the servers being crawled to evaluate whether the greater load is acceptable.

The default value is the number of processors on the index server. Therefore, for a quad-processor computer, the default value is eight. We recommend that you adjust the value and measure the load on the target server to determine the optimum number of simultaneous requests. You can select the number of simultaneous requests from the following available values: 1, 2, 4, 8, 16, 32, 64.

You can also create a rule to request one document at a time and wait a specified number of seconds between requests. Such a rule can be useful for crawling a site that has a constant user load.

The following table shows the relationship between the number of simultaneous requests and the CPU utilization on index servers and database servers.

Number of crawl threads    Index server CPU utilization (%)    Database server CPU utilization (%)
4                          35                                  12
8                          40                                  15
12                         45                                  15
16                         60                                  20
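One way to use measurements like these is to pick the largest tested setting that stays inside a CPU budget. The following Python sketch assumes the figures from the table above and a hypothetical helper name. The budgets are values that you would choose for your own environment, and your own test results should replace the sample data.

```python
# (crawl threads, index server CPU %, database server CPU %) as tested in this article.
MEASUREMENTS = [
    (4, 35, 12),
    (8, 40, 15),
    (12, 45, 15),
    (16, 60, 20),
]


def highest_setting_within_budget(max_index_cpu_pct, max_db_cpu_pct):
    """Return the largest tested thread count that fits both CPU budgets, or None."""
    fitting = [
        threads
        for threads, index_cpu, db_cpu in MEASUREMENTS
        if index_cpu <= max_index_cpu_pct and db_cpu <= max_db_cpu_pct
    ]
    return max(fitting) if fitting else None


print(highest_setting_within_budget(50, 15))
```

Note that 12 appears in the test data but is not one of the selectable values for simultaneous requests listed earlier, so a result of 12 would need to be rounded down to 8 when you create the rule.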

You can create a crawler impact rule by using the following procedure.

Create a crawler impact rule

  1. Click Start, point to All Programs, point to Microsoft Office Server, and then click SharePoint 3.0 Central Administration.

  2. On the Central Administration home page, click Application Management.

  3. On the Application Management page, in the Search section, click Manage search service.

  4. On the Manage Search Service page, in the Farm-Level Search Settings section, click Crawler impact rules.

  5. On the Crawler Impact Rules page, click Add Rule.

  6. On the Add Crawler Impact Rule page, in the Site section, type the name of the site for which you want to create a rule. Do not include the protocol (for example, do not include http://).

  7. In the Request Frequency section, specify how the crawler will request documents from this site.

    1. To simultaneously request multiple documents, select Request up to the specified number of documents at a time and do not wait between requests, and then select the value that you want from the Simultaneous requests list.

    2. To request one document at a time, select Request one document at a time and wait the specified time between requests, and then type the number of seconds to wait between requests in the Time to wait (in seconds) box.

  8. Click OK to create the rule.
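Before you enter a rule in Central Administration, it can help to check the planned values against the constraints described in this section. The following Python sketch is a hypothetical in-memory model, not a SharePoint API. The class name, field names, and the site name in the usage line are assumptions made for illustration. The list of selectable values and the rule about omitting the protocol come from this article, and rejecting a negative wait time is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Selectable values for simultaneous requests, as listed in this article.
ALLOWED_SIMULTANEOUS_REQUESTS = (1, 2, 4, 8, 16, 32, 64)


@dataclass
class CrawlerImpactRule:
    """Hypothetical model of a planned rule; not a SharePoint type."""
    site: str
    simultaneous_requests: Optional[int] = None  # "multiple documents at a time" option
    wait_seconds: Optional[int] = None           # "one document at a time" option

    def validate(self):
        """Return a list of problems; an empty list means the plan is consistent."""
        problems = []
        if "://" in self.site:
            problems.append("site must not include the protocol")
        # Exactly one of the two request frequency options must be chosen.
        both_or_neither = (self.simultaneous_requests is None) == (self.wait_seconds is None)
        if both_or_neither:
            problems.append("choose exactly one request frequency option")
        elif self.simultaneous_requests is not None:
            if self.simultaneous_requests not in ALLOWED_SIMULTANEOUS_REQUESTS:
                problems.append("simultaneous requests must be one of the listed values")
        elif self.wait_seconds < 0:
            problems.append("wait time must not be negative")
        return problems


# Hypothetical site name, shown without the protocol.
print(CrawlerImpactRule("intranet.example.com", simultaneous_requests=8).validate())
```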

Determining specifications for query servers

Use the information in this section to determine specifications for query servers in your Office SharePoint Server 2007 farm.

Query server memory

The greater the memory that is available, the fewer times the Office SharePoint Server Search service will need to access the hard disk to perform a given query. Having adequate memory also permits more effective caching. Ideally, enough memory should be installed on the query servers to accommodate the entire index.
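The sizing goal in the previous paragraph can be expressed as a quick check. This Python sketch is a rough illustration only. The function name and the reserve for the operating system and other services are assumptions, and the reserve in particular should be replaced with a figure measured on your own query servers.

```python
def index_fits_in_memory(index_size_gb, installed_memory_gb, reserved_gb=2.0):
    """Return True if the whole index fits in the memory left after the reserve.

    reserved_gb is an assumed allowance for the operating system and other
    services on the query server; it is not a figure from this article.
    """
    return index_size_gb <= installed_memory_gb - reserved_gb


print(index_fits_in_memory(index_size_gb=10, installed_memory_gb=16))
```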

The following figure shows the relationship between the size of the index on the query servers and the user response time per query.

[Figure: Performance and capacity analysis for search]

Query server disk speed

We recommend using RAID 10 for fast disk writes.

Number of query servers

You can deploy multiple query servers in the farm to achieve redundancy and load balancing. The number of query servers you use depends on how many users are present in the farm and the peak hour load that you expect. We have tested up to eight query servers per farm.

The following figure shows query throughput, database server CPU utilization percentage for the search database, and the query server CPU utilization percentage as query servers are added to the farm. In the test from which this data was generated, the database server used was shared between content databases and the service databases.

[Figure: Search server performance graph]

Remote server latency

Server latency is a major factor that affects crawl performance. Performance between farm servers must be balanced for overall crawl performance to reach its potential. For example, a powerful index server might operate at only 25 percent of its capacity if the database server being crawled cannot respond quickly enough. In such a case, you can scale up the database server, which will in turn increase crawl speeds across the entire farm.

You should conduct your own testing to evaluate the responsiveness of servers in your environment. The database server serving the target farm is often the bottleneck in cases where crawl performance is poor. To improve crawl performance, you can:

  • Scale up database server hardware by adding or upgrading processors, adding memory, and upgrading to hard disks with faster seek and write times.

  • Increase the memory on query servers in the farm.

  • Crawl during off-peak hours so that the database server being crawled can service user traffic during the day and respond to crawls when user load is low.

Determining specifications for database servers

The Office SharePoint Server 2007 search system crawls both text data and the metadata associated with the content. In Office SharePoint Portal Server 2003, all metadata gathered by the indexing system was stored in a JET database property store. In Office SharePoint Server 2007, the inverted full text index is stored on the index server, and the metadata is stored in the Search database. The index server writes metadata to the database, and the query servers read that data to process property-based queries issued by users.

Use the information in this section to determine specifications for database servers in your Office SharePoint Server 2007 farm.

Database throughput

The database metadata store is shared by the index server and all query servers in the farm. The index server writes all metadata, and the query servers read this data to process search requests. Query throughput is dependent largely on the metadata store responsiveness.

As the number of query servers increases in the farm, the load on the database server also increases and affects the overall query throughput. You should carefully monitor the database server when adding index servers or query servers to the farm to ensure that database performance remains adequate.

Database server hard disk distribution

Because the Office SharePoint Server Search service writes a large amount of data to the search database during crawls, we recommend using separate spindles for the SharedServices_Search_Db, SharedServices_Db, and TempDb databases for better performance in scenarios in which the index contains more than 5 million items.
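The recommendation above reduces to a threshold check on the number of indexed items. The following Python sketch uses a hypothetical function name. The 5 million item threshold and the three database names are taken from this article.

```python
# Threshold and database names as stated in this article.
SEPARATE_SPINDLE_THRESHOLD_ITEMS = 5_000_000
DATABASES = ("SharedServices_Search_Db", "SharedServices_Db", "TempDb")


def databases_needing_separate_spindles(indexed_items):
    """Return the databases to place on separate spindles for this index size."""
    if indexed_items > SEPARATE_SPINDLE_THRESHOLD_ITEMS:
        return list(DATABASES)
    return []


print(databases_needing_separate_spindles(8_000_000))
```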

Database server disk speed

We recommend using RAID 10 for fast disk writes.

See Also

Concepts

Configure a dedicated front-end Web server for crawling (Office SharePoint Server 2007)