
Best practices for search performance (Office SharePoint Server 2007)

SharePoint 2007

Updated: July 30, 2009

Applies To: Office SharePoint Server 2007


This article describes some best practices that you can use to maintain a healthy and efficient search system in Microsoft Office SharePoint Server 2007.

In this article:

  • Reasons for initiating a full crawl

  • Monitoring Web servers

  • Monitoring database servers

  • Monitoring index servers

  • Monitoring Office SharePoint Server with SharePoint Diagnostics

  • Using Unified Logging Service logs to diagnose query bottlenecks

  • Optimizing SQL Server for search in Office SharePoint Server 2007

Reasons for initiating a full crawl

Crawling and indexing a large volume of information, documents, and Web pages requires a large amount of computer processing. The crawl process also consumes network and other resources. You must configure an Office SharePoint Server 2007 farm carefully to ensure that the crawling and indexing process does not adversely affect the service available to users. For example, content is usually crawled and indexed during off-peak hours, when servers are underused, in order to maintain peak-hour services for users.

One way to reduce the impact of crawling is to schedule incremental crawls instead of full crawls. An incremental crawl indexes only items that have changed, so the process requires far fewer computer resources. You can schedule full crawls and incremental crawls on each content source. For example, you might run incremental crawls at midnight on weekdays and full crawls at midnight on Saturdays.

Note:

For more information about how to plan crawl schedules, see Plan to crawl content (Office SharePoint Server).

To ensure that the index is complete, you should manually start a full crawl under the following circumstances:

  • When administrators have applied an update or service pack to the servers in the farm.

  • When administrators have added a new managed property to the search configuration.

  • When administrators have added or modified a crawl rule.

  • When you want to re-index ASP.NET pages on Windows SharePoint Services 3.0, Microsoft Office SharePoint Server 2007, or Search Server 2008 sites. The crawler cannot detect when such Web pages have changed. Therefore, only a full crawl re-indexes them.

  • When there have been consecutive incremental crawl failures. If an incremental crawl fails one hundred times consecutively at any level in the repository, the index server removes the affected content from the index.

  • When administrators have repaired a corrupted index.

  • When an administrator has created a server name mapping.

  • When you have altered permissions for some content but the content itself has not changed and you do not have the Post-Service Pack 1 hotfix package (KB941442) or a later service pack applied on the servers. Without this hotfix package, incremental crawls do not check Access Control Lists (ACLs). If ACLs are not checked, a user may see an item in the search results even if an administrator has removed permission for that user since the last full crawl.

Under the following circumstances, Office SharePoint Server 2007 performs a full crawl even when an incremental crawl was scheduled or manually initiated:

  • When an administrator manually stopped the previous crawl.

  • When an administrator restored a content database from a backup.

    Note:

    If you have installed the Infrastructure Update for Microsoft Office Servers, you can specify whether a restore operation causes a full crawl by using options on the Stsadm command-line tool.

  • When an administrator has detached and reattached a content database.

  • When Office SharePoint Server has never run a full crawl on the content.

  • When the change log contains no information for the addresses in the crawl. The change log is used to determine whether an item has changed since the last crawl. Without change log information, incremental crawls cannot occur.

  • When the account that is used to access content has changed. This account can be the default access account or one specified in a crawl rule.

  • When an index corruption has occurred.

Consider your actions carefully in these circumstances. If a crawl starts manually or on a schedule under one of these conditions, the resulting full crawl might require many more resources than the incremental crawl would have required, and this could affect the service that users receive.

Monitoring Web servers

To maximize Office SharePoint Server 2007 search performance, thoroughly analyze the system. Create a baseline by making a thorough investigation of the servers, and periodically re-run performance tests to observe changes and diagnose problems early. When you make an optimization, you can use the baseline to characterize the improvements that result. The following sections list Performance Monitor counters and data available from other tools that are relevant to the performance of Office SharePoint Server 2007.

Note:

For more information about general system monitoring, see Monitoring Performance (http://go.microsoft.com/fwlink/?LinkID=105584&clcid=0x409).

In Office SharePoint Server 2007, Web servers host all sites and site collections. They obtain content that is stored on the database servers and respond to user requests. Because Web servers send responses to user search queries, their performance has a direct effect on search performance. Web servers also affect indexing because the crawler indexes Office SharePoint Server content by sending requests to Web servers.

To monitor the health of Web servers, use the following Performance Monitor counters:

  • Processor/% Processor Time/_Total

    This counter is the primary indicator of processor activity. It measures the proportion of time that the processors spend executing non-idle threads. If this counter averages above 80% over prolonged periods, the processors may be a bottleneck, and you should consider an upgrade.

  • Memory/Available Mbytes

    This counter measures the physical memory immediately available to processes or to the system. If this counter drops too low, the system will slow down as it uses more paging. You should consider adding more memory to the server.

  • Web Service/Current Connections

    This counter measures the number of connections to the World Wide Web service. On Office SharePoint Server 2007 Web servers, this count includes all the concurrent users and, during indexing, the crawler. Use this counter to profile usage patterns and determine peak hours. There is no limit for this counter, and very high values may simply indicate that the server is successfully handling many concurrent connections.

    Note:

    In an Office SharePoint Server farm with multiple Web servers, this counter measures connections on only one server. For information about how to monitor user activity across the whole farm, see Monitoring Office SharePoint Server with SharePoint Diagnostics.

Monitoring database servers

Office SharePoint Server 2007 uses Microsoft SQL Server 2005 or Microsoft SQL Server 2008 to store content databases. Although the index server stores the content index on the file system—not in a database—it stores document metadata and permissions in the search database. Because many searches check metadata and all searches involve security trimming, the performance of database servers directly affects the performance of search.

To monitor the health of database servers, use the following Performance Monitor counters:

  • Processor/% Processor Time/_Total

    This counter is the primary indicator of processor activity, and therefore is as important on database servers as it is on Web servers.

  • LogicalDisk/% Disk Time/DiskName

    This counter measures the proportion of elapsed time that the disk spent servicing read or write requests. For search, monitor this counter for the disk that holds the search database. If this counter frequently averages more than 90 percent, the disk may represent a bottleneck for search.

  • System: Processor Queue Length

    This counter should average less than twice the number of CPU cores on the server.

  • Memory: Available Mbytes

    Ensure that this counter averages at least 20 percent of the total physical RAM.

  • Memory: Pages/sec

    This counter should average under 100.

  • Logical Disk: Disk Transfers/sec

    This counter measures the overall throughput for a partition.

  • Logical Disk: Disk Read Bytes/sec & Disk Write Bytes/sec

    These counters measure the total read and write bandwidth of a particular disk.

  • Logical Disk: Average Disk sec/Read

    Also known as read latency, this counter indicates the time that is required for the disk to retrieve data. Low read latency is important for good responsiveness to user queries.

  • Logical Disk: Average Disk sec/Write

    Also known as write latency, this counter indicates the time that is required for the disk to write the data. Low write latency improves the performance of indexing.

  • LogicalDisk/% Disk Read Time/DiskName

    This counter measures the proportion of elapsed time that the disk spent servicing read requests. A large proportion of read requests for the search database disk may indicate that users are running a large number of searches.

  • LogicalDisk/% Disk Write Time/DiskName

    This counter measures the proportion of elapsed time that the disk spent servicing write requests. A large proportion of write requests for the search database is expected during the indexing process.

    Note:

    When possible, optimize search performance by placing the search database on a separate disk from other databases. If the search database is separated in this manner, these Logical Disk counters are highly diagnostic of search performance because the disk is dedicated to search.

  • SQLServer:Buffer Manager/Page life expectancy

    This counter measures the number of seconds that a database page remains in the buffer pool without references. It should remain above 300 seconds. Values lower than 300 seconds are highly diagnostic of a memory bottleneck and should lead you to consider adding memory to the server.
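As a quick sanity check, the thresholds above can be evaluated mechanically. The following Python sketch applies the rules from this list to a set of sampled values; the counter samples, core count, and RAM size are hypothetical, and in practice you would collect averages with Performance Monitor over a representative interval.

```python
# Sketch: evaluate sampled database-server counters against the thresholds
# described above. Sample values and server specifications are hypothetical.

CPU_CORES = 8
TOTAL_RAM_MB = 16384

samples = {
    "Processor\\% Processor Time\\_Total": 72.0,   # percent
    "LogicalDisk\\% Disk Time\\SearchDb": 94.0,    # percent
    "System\\Processor Queue Length": 11.0,
    "Memory\\Available MBytes": 2048.0,
    "Memory\\Pages/sec": 35.0,
    "SQLServer:Buffer Manager\\Page life expectancy": 240.0,  # seconds
}

def evaluate(s):
    """Return (counter, message) warnings based on the article's rules."""
    warnings = []
    if s["Processor\\% Processor Time\\_Total"] > 80:
        warnings.append(("% Processor Time", "averages above 80%: possible CPU bottleneck"))
    if s["LogicalDisk\\% Disk Time\\SearchDb"] > 90:
        warnings.append(("% Disk Time", "above 90%: search database disk may be a bottleneck"))
    if s["System\\Processor Queue Length"] >= 2 * CPU_CORES:
        warnings.append(("Processor Queue Length", "should average less than 2 x cores"))
    if s["Memory\\Available MBytes"] < 0.20 * TOTAL_RAM_MB:
        warnings.append(("Available MBytes", "below 20% of physical RAM: consider adding memory"))
    if s["Memory\\Pages/sec"] >= 100:
        warnings.append(("Pages/sec", "should average under 100"))
    if s["SQLServer:Buffer Manager\\Page life expectancy"] < 300:
        warnings.append(("Page life expectancy", "below 300 seconds: memory bottleneck likely"))
    return warnings

for counter, msg in evaluate(samples):
    print(f"WARNING {counter}: {msg}")
```

With the sample values shown, the disk, available memory, and page life expectancy rules fire while the others pass, which matches the kind of mixed picture a real baseline review often produces.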

Note:

A full description of general SQL Server optimization is beyond the scope of this Office SharePoint Server article. For more information, see the following resources:

Monitoring index servers

In Office SharePoint Server 2007, index servers crawl content and create an index file. When the crawl process is complete, the index is propagated to query servers that respond to search requests from users.

If the index server performs poorly, there may be no direct effect on users. However, the crawl process is lengthened, sometimes to the extent that you can no longer limit crawling to off-peak hours or separate it from other off-peak activities such as backup.

Note:

Depending on the number of available servers, the index server role may not always be hosted on a dedicated server.

To monitor the health of the index server, use the following Performance Monitor counters during a crawl:

  • Office Server Search Archival Plugin/Total Docs in first queue/Portal_Content

    This counter measures the number of documents in the first queue of the archival plug-in. This should be less than 500 during a crawl, and very low at other times. If the counter is above 500, the search database drive on the database server is probably a bottleneck.

  • Office Server Search Archival Plugin/Total Docs in second queue/Portal_Content

    This counter measures the number of documents in the second queue of the archival plug-in. As with the previous counter, it should be less than 500 during a crawl.

  • Office Server Search Gatherer/Idle Threads

    This counter measures the number of threads in the gatherer process waiting for documents. If this counter is 0, the gatherer is starved for resources. Consider reducing the number of concurrent crawls.

  • Office Server Search Gatherer/Threads Accessing Network

    This counter measures the number of threads in the gatherer process that have sent requests to a remote data store and are either waiting for or processing the response. If this counter is consistently high, network bandwidth may be a bottleneck, or the index server may be crawling one or more slow content hosts.

  • Office Server Search Gatherer/Filtering Threads

    This counter measures the number of threads that have retrieved content and are filtering it. If this number is high, it may indicate that resources on the index server are acting as a bottleneck.

  • Office Server Search Gatherer/Threads In Plug-Ins

    This counter measures the number of threads that have retrieved content and are processing it through plug-ins such as IFilters. This is the stage in the crawl process when the index and the search database are populated.

  • Office Server Search Gatherer Projects/Crawls in progress

    This counter measures the number of crawls in progress. This value should match the number of content sources with crawls scheduled, unless an administrator has manually initiated a crawl.
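The interpretation rules for the archival plug-in and gatherer counters above can be sketched as a small diagnostic routine. This is a minimal Python illustration; the sampled values passed in are hypothetical.

```python
# Sketch: interpret index-server counters sampled during a crawl, per the
# guidance above. The sample values passed in are hypothetical.

def diagnose_crawl(first_queue, second_queue, idle_threads):
    """Return findings based on the archival plug-in and gatherer counters."""
    findings = []
    # Archival plug-in queues above 500 during a crawl suggest that the
    # search database drive on the database server is a bottleneck.
    if first_queue > 500 or second_queue > 500:
        findings.append("search database drive is probably a bottleneck")
    # Zero idle gatherer threads means the gatherer is starved for
    # resources; consider reducing the number of concurrent crawls.
    if idle_threads == 0:
        findings.append("gatherer starved: reduce concurrent crawls")
    return findings

print(diagnose_crawl(first_queue=820, second_queue=130, idle_threads=0))
```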

Monitoring Office SharePoint Server with SharePoint Diagnostics

The SharePoint Diagnostics tool v1.0 (also known as SPDiag) is an advanced diagnostic and analysis tool that presents a wide range of information about any server or farm that runs SharePoint Products and Technologies. You can use SPDiag to view highly detailed snapshots of the server and farm configurations. You can also merge and view information from Internet Information Services (IIS) logs, Unified Logging Service (ULS) logs, and Event logs, and investigate performance issues such as slow requests.

Note:

SPDiag is part of the Microsoft SharePoint Administration Toolkit v3.0 (http://go.microsoft.com/fwlink/?LinkId=141504).

SPDiag can present graphs of performance counters from any server on the farm. However, it also includes several counters that measure farm-wide data based on IIS logs. Such a farm-wide analysis is not possible with Performance Monitor alone.

Note:

To use SPDiag effectively, you must read the SharePoint Diagnostics Tool (SPDiag) User Guide (http://go.microsoft.com/fwlink/?LinkId=141204).

Use the following farm-wide counters to investigate the responsiveness of the Office SharePoint Server 2007 farm:

  • SharePointRequests/Number of unique client IP

    This counter measures the number of unique clients that have made requests over the specified time period. Note that clients that access the farm through a proxy server appear as a single IP address.

  • SharePointRequests/Number of unique client agents

    This counter measures the number of unique client agents (browsers) that have made requests over the specified time period. This counter can differentiate among clients that access the farm through a proxy server because it is based on the agent specified in the browsers’ HTTP requests.

    Note:

    The following counters measure numbers of requests. These requests include both search queries and requests for content.

  • SharePointRequests/Total requests

    This counter measures the total number of requests as it varies over the specified time period. You should always monitor this counter together with the following counters to determine the proportion of fast and slow requests.

  • SharePointRequests/Number of requests with time taken < 1 second

    This counter measures the number of requests satisfied within 1 second. In a well-performing farm, this counter approaches the Total requests counter.

  • SharePointRequests/Number of requests with time taken 1-5 seconds

    This counter measures the number of requests satisfied within 1 to 5 seconds. Such performance is usually acceptable to users, but an increasing proportion of these requests over time may indicate that a bottleneck is developing.

  • SharePointRequests/Number of requests with time taken 5-10 seconds

    This counter measures the number of requests satisfied within 5 to 10 seconds. Users may click away from a page before results are returned.

  • SharePointRequests/Number of requests with time taken > 10 seconds

    This counter measures the number of requests satisfied in more than 10 seconds. If these requests are a significant proportion of the Total requests, the farm is unresponsive and you should investigate bottlenecks and consider upgrading hardware.
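The bucket counters above are easier to reason about as proportions of total requests. The following Python sketch demonstrates the calculation; the counts are hypothetical sample values read from SPDiag, and the 1 percent alert threshold is an illustrative choice, not a documented limit.

```python
# Sketch: summarize the SPDiag request-time buckets into proportions.
# The counts below are hypothetical sample values; the 1% threshold for
# flagging slow requests is an illustrative assumption.

buckets = {
    "< 1 s": 91000,
    "1-5 s": 6000,
    "5-10 s": 1500,
    "> 10 s": 1500,
}
total = sum(buckets.values())

for name, count in buckets.items():
    print(f"{name}: {count / total:.1%}")

# A responsive farm has the sub-second bucket approaching the total; a
# noticeable "> 10 s" share warrants a bottleneck investigation.
slow_share = buckets["> 10 s"] / total
if slow_share > 0.01:
    print(f"Slow requests are {slow_share:.1%} of total: investigate bottlenecks")
```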

Using Unified Logging Service logs to diagnose query bottlenecks

You can use the Unified Logging Service (ULS) to monitor and optimize the performance of the Office SharePoint Server 2007 system. Use ULS logs as one method of optimizing the system, alongside Performance Monitor, Event logs, and Web logs.

This section describes how to use ULS logs to diagnose delays in search performance. By using ULS logs, you can determine which stages of the search process consume the most time and delay the return of results to users. You can also use ULS logs to assess the performance improvements made by small changes to the configuration of the system.

Note:

Use ULS logging sparingly to limit its effect on performance and disk space. After you diagnose performance issues with ULS, you can maximize performance by disabling ULS logging. Store the ULS logs on a disk that has ample free space.

Note:

For more information and additional examples of queries to use in Log Parser 2.2, see How to: Mine the ULS logs for query latency (http://go.microsoft.com/fwlink/?LinkId=148540) in the Microsoft Enterprise Search Blog.

Configuring the Unified Logging Service

Start by configuring the ULS in the Office SharePoint Server 2007 Central Administration Web site.

Important:

Membership in the Farm Administrators group is required to complete the following procedure.

To configure the Unified Logging Service for diagnosing search performance issues

  1. Click Start, click All Programs, click Microsoft Office Server, and then click SharePoint 3.0 Central Administration.

  2. Click the Operations tab.

  3. In the Logging and Reporting section, click Diagnostic logging.

  4. In the Event Throttling section, in the Select a category list, click MS Search Query Processor.

  5. In the Least critical event to report to the trace log list, click High.

  6. At the bottom of the page, click OK.

Using the Log Parser tool

Like Internet Information Services (IIS) logs, Office SharePoint Server 2007 ULS logs are very large text files. To analyze the content of such files, use a log parsing utility to focus on interesting traces and to format the data. The Microsoft Log Parser 2.2 (http://go.microsoft.com/fwlink/?LinkId=148542) tool is free and is ideal for this purpose.

The Log Parser tool is a command-line program. You must use the following parameters to indicate that the ULS logs are Unicode, tab-separated text files:

logparser -i:TSV -iCodepage:-1 -fixedSep:ON

In addition to these parameters, you must provide a Transact-SQL-style query that tells Log Parser how to parse the log file. For example, to focus on query processor timer messages, use the following WHERE clause:

WHERE Category = 'MS Search Query Processor' AND Message LIKE '%Completed query execution with timings:%'

Note:

You can type SQL query text directly at the Log Parser command prompt. Alternatively, you can save the query in a text file and use the file: parameter to pass its location to Log Parser. This is helpful for longer or more complex queries and for queries that you expect to use often.

Search messages in the ULS logs

When the ULS is configured as described above, the following messages will appear in the log files as users run queries:

  • Completed query execution with timings

    This message indicates that a query has completed and includes six timings that describe the query in milliseconds. The six timings are as follows:

    • v1. The total time that is spent processing the query.

    • v2. The latency of the query measured after duplicate detection.

    • v3. The latency of the query measured after security trimming.

    • v4. The latency of the query measured after joining results from the index with results from the search database.

    • v5. The time that is spent waiting for results from the index server.

    • v6. The time that is spent in calls to the database excluding fetching properties from the search database.

    From these six timings you can calculate the following information:

    • v1 – v2. The time that is spent retrieving properties and highlighting hits.

    • v2 – v3. The time that is spent detecting duplicates.

    • v3 – v4. The time that is spent in security trimming.

    • v4 – v5. The time that is spent joining index results with search database results.

  • Join retry

    This message indicates that a retry occurred because the results returned from the search database did not match the results returned from the full-text index, so they could not be joined.

  • Security Trimming retry

    This message indicates that a user executed a query that returned results the user is not authorized to read. Such a query is retried, with a larger number of results requested, until enough results are returned.

  • Near duplicate removal retry

    This message indicates that many virtually identical documents were trimmed out of the results and the query processor did not have enough results to display. Such a query is retried, with a larger number of results requested, until enough results are returned.

Analyzing query timings

Of the messages described earlier in this section, the Completed query execution with timings message is of most use when diagnosing delays in query processing. For example, if security trimming is slow, the v3 – v4 calculation will be a large proportion of the total processing time v1.
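The derivations above can be written out directly. The following Python sketch computes the per-stage timings from the six raw values; the millisecond inputs are hypothetical examples, not values from a real log.

```python
# Sketch: derive the per-stage timings described above from the six raw
# values (in milliseconds) logged in a "Completed query execution with
# timings" message. The input values are hypothetical.

def stage_timings(v1, v2, v3, v4, v5, v6):
    return {
        "property_fetch_and_hit_highlighting": v1 - v2,  # v1 - v2
        "duplicate_detection": v2 - v3,                  # v2 - v3
        "security_trimming": v3 - v4,                    # v3 - v4
        "index_db_join": v4 - v5,                        # v4 - v5
        "index_wait": v5,
        "db_calls_excluding_property_fetch": v6,
        "total": v1,
    }

timings = stage_timings(420, 310, 290, 180, 120, 60)
print(timings)
```

In this example, property fetching and security trimming each account for 110 ms of the 420 ms total, which would point the investigation at the database server and disk holding the search database.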

To analyze timings, pass the following SQL query to the Log Parser tool. The tool creates a comma-separated file that includes the v1 to v6 timings and their differences.

SELECT Timestamp,
       TO_INT(EXTRACT_TOKEN(Message, 7, ' ')) AS TotalQPTime,
       TO_INT(EXTRACT_TOKEN(Message, 8, ' ')) AS v2,
       TO_INT(EXTRACT_TOKEN(Message, 9, ' ')) AS v3,
       TO_INT(EXTRACT_TOKEN(Message, 10, ' ')) AS v4,
       TO_INT(EXTRACT_TOKEN(Message, 11, ' ')) AS TimeSpentInIndex,
       TO_INT(EXTRACT_TOKEN(Message, 12, ' ')) AS v6,
       SUB(v4, TimeSpentInIndex) AS JoinTime,
       SUB(v3, v4) AS SecurityTrimmingTime,
       CASE v2
            WHEN 0 THEN 0
            ELSE SUB(v2, v3)
       END AS DuplicateDetectionTime,
       SUB(TotalQPTime, v2) AS FetchTime
INTO QTiming
FROM 'C:\Program Files\Common Files\microsoft shared\Web Server Extensions\12\LOGS\MOSSServer1*.log'
WHERE Category = 'MS Search Query Processor'
AND Message LIKE '%Completed query execution with timings:%'

You can import the resulting CSV file into Microsoft Office Excel or another analysis tool to create graphs and reports. Bear in mind that in a busy system many queries take place, so you may have to average over many results to produce a meaningful analysis.
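Averaging can also be scripted. The following Python sketch reads a CSV in the shape produced by the query above (the QTiming output with TotalQPTime and SecurityTrimmingTime columns); the embedded CSV rows are hypothetical stand-ins for real Log Parser output.

```python
# Sketch: average per-query timings exported by Log Parser into QTiming.
# The embedded CSV is a hypothetical stand-in for real output.

import csv
import io
import statistics

sample_csv = """Timestamp,TotalQPTime,SecurityTrimmingTime
2009-07-30 01:00:01,420,110
2009-07-30 01:00:05,380,90
2009-07-30 01:00:09,610,240
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
avg_total = statistics.mean(int(r["TotalQPTime"]) for r in rows)
avg_trim = statistics.mean(int(r["SecurityTrimmingTime"]) for r in rows)

print(f"Average query time: {avg_total:.0f} ms")
print(f"Average security trimming time: {avg_trim:.0f} ms "
      f"({avg_trim / avg_total:.0%} of total)")
```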

Analyzing Retries

The timing data available in the Completed query execution with timings message does not include any information about retries, such as retries caused by security trimming. To analyze how much time the query processor spends on retries, compare the total number of queries with the number of retries of each type.

The following query counts the total number of queries executed:

SELECT count (Message) 
FROM 'C:\Program Files\Common Files\microsoft shared\Web Server Extensions\12\LOGS\MOSSServer1*.log' 
WHERE Category = 'MS Search Query Processor' 
AND Message LIKE '%Completed query execution%'

The following query counts the total number of retries caused by security trimming:

SELECT count (Message) 
FROM 'C:\Program Files\Common Files\microsoft shared\Web Server Extensions\12\LOGS\MOSSServer1*.log' 
WHERE Category = 'MS Search Query Processor' 
AND Message LIKE '%Security trimming retry%'

The following query counts the total number of retries caused by duplicate result removal:


SELECT count (Message) 
FROM 'C:\Program Files\Common Files\microsoft shared\Web Server Extensions\12\LOGS\MOSSServer1*.log' 
WHERE Category = 'MS Search Query Processor' 
AND Message LIKE '%Near duplicate removal retry%'

Use these queries to diagnose how retries delay query results. When the number of retries is a significant proportion of the total number of queries, consider revising the configuration. For example, if only a few users can access the documents in one of the content sources, that source may cause many security trimming retries; consider removing it from the index.
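The comparison can be reduced to simple arithmetic. This Python sketch relates retry counts to the total query count; the counts are hypothetical values returned by the three Log Parser queries, and the 5 percent threshold for "significant" is an illustrative assumption rather than a documented limit.

```python
# Sketch: relate retry counts to the total query count, per the guidance
# above. The counts are hypothetical results of the Log Parser queries;
# the 5% "significant" threshold is an illustrative assumption.

total_queries = 12500
security_trimming_retries = 1800
duplicate_removal_retries = 240

for name, retries in [("security trimming", security_trimming_retries),
                      ("duplicate removal", duplicate_removal_retries)]:
    share = retries / total_queries
    flag = "  <- significant; consider revising configuration" if share > 0.05 else ""
    print(f"{name} retries: {share:.1%}{flag}")
```

With these sample counts, security trimming retries (14.4 percent) would be flagged while duplicate removal retries (1.9 percent) would not.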

Optimizing SQL Server for search in Office SharePoint Server 2007

Microsoft Office SharePoint Server 2007 uses Microsoft SQL Server to store content, configuration information, and properties of indexed content. You must optimize SQL Server to ensure optimal performance from Office SharePoint Server.

Note:

Office SharePoint Server 2007 can use databases stored in SQL Server 2008 or SQL Server 2005.

General SQL Server optimization

Some optimizations improve the performance of SQL Server regardless of the server’s purpose. You should ensure that these optimizations are in place for an Office SharePoint Server database.

When you plan and deploy the database server, consider the following recommendations:

  • Where possible, run SQL Server on a dedicated server or server farm.

  • Scale out to multiple database servers in a server farm. Generally, you should install a database server for every four Web servers that are running at capacity.

  • Use a 64-bit version of SQL Server on computers together with 64-bit operating systems.

  • Use as many physical disk spindles as the hardware budget allows.

  • Use high-speed disks.

  • Place databases on RAID 1+0 or RAID 1 disk arrays.

  • Separate log files onto a physical disk that does not store the primary and secondary data files.

Note:

A full description of general SQL Server optimization is beyond the scope of this Office SharePoint Server article. For more information, see the following resources:

Optimizing SQL Server for search queries and indexing

A high priority for many Office SharePoint Server 2007 administrators is optimizing crawling, indexing, and queries. The Office SharePoint Server content index is stored on the file system, outside any SQL Server database. However, the search database stores metadata from indexed files and ACLs. As a result, the performance of SQL Server directly affects the following two search operations:

  • Indexing, when Office SharePoint Server stores metadata and ACLs.

  • Querying. All Office SharePoint Server queries involve security trimming, during which Office SharePoint Server checks the ACLs in the search database and removes results that the user is not permitted to view. In addition, if the user has performed a property search, metadata must be retrieved from the search database.

You should start by following the general SQL Server optimization recommendations given earlier in this article. Additionally, the optimizations listed in the following sections specifically benefit indexing and querying.

Optimize the performance of the tempdb database.

Every user query involves activity in the SQL Server tempdb database. Therefore, the configuration of this database directly affects how quickly query responses are displayed to users.

Take the following steps to optimize tempdb:

  • Set the recovery model to SIMPLE.

  • Allow tempdb files to automatically grow as required.

  • Set the file growth increment to a reasonable value. As a rule, this increment should be approximately 10 percent of the tempdb file size.

  • Pre-allocate space for tempdb files by setting the file size to a value that can accommodate all activity during crawls and queries.

  • Use multiple data files to maximize disk bandwidth. As a rule, use one data file for each CPU core on the computer that is running SQL Server.

  • Make each data file the same size.

  • Put the tempdb database on a separate physical disk from those that store the content databases, configuration database, and search database.
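The sizing rules of thumb above can be sketched as a small calculation. In this Python illustration, the core count and total tempdb size are hypothetical inputs; the output follows the one-file-per-core, equal-size, roughly-10-percent-growth guidance from the list.

```python
# Sketch: apply the tempdb rules of thumb described above. The CPU core
# count and total tempdb size are hypothetical inputs.

def tempdb_layout(cpu_cores, total_tempdb_mb):
    files = cpu_cores                       # one data file per CPU core
    file_size_mb = total_tempdb_mb / files  # equally sized data files
    growth_mb = round(file_size_mb * 0.10)  # ~10% file growth increment
    return {
        "data_files": files,
        "file_size_mb": file_size_mb,
        "growth_increment_mb": growth_mb,
    }

print(tempdb_layout(cpu_cores=8, total_tempdb_mb=16384))
```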

Note:

For more information about tempdb optimization, see Optimizing tempdb Performance (http://go.microsoft.com/fwlink/?LinkId=148537).

Use filegroups to improve performance.

The Office SharePoint Server 2007 crawling and indexing process is I/O intensive and writes a large volume of data to the search database. If the system experiences pronounced off-peak hours, you should schedule indexing to run during those times to ensure that indexing does not compete for resources with user queries. However, in some organizations with global or 24 hour activity, there are no significant drops in demand. If the database server that hosts the search database is close to its I/O capacity, you should consider other ways to separate indexing I/O operations from those associated with user queries.

The search database can be divided into tables that are involved in indexing and tables that are primarily involved in user queries. If you separate these groups of tables onto two physical disks, you ensure that indexing has a minimal effect on query performance. Although the indexing and query tables are in a single database, you can use the Microsoft SQL Server filegroups feature to achieve this separation.

To do this, you create a user filegroup with a secondary data file in it. This data file must be on a dedicated physical disk to achieve a performance improvement. Then move the tables involved in indexing into that filegroup using the following procedure. This procedure can take a significant time, depending on the size of the search database. Therefore, you should perform the procedure when demand is low.

Important:

Backup and restore in Office SharePoint Server 2007 are not filegroup-aware. There is no way to indicate where the new filegroup should be restored to. The restore process tries to put the crawl filegroup on the same drive where it existed when you made the backup. To restore, you must have drives for the crawl and primary filegroups that have the same drive letter as the initial backup drive.

For more information about how to use filegroups for the search database, including Transact-SQL scripts to move tables, see SQL File groups and Search (http://go.microsoft.com/fwlink/?LinkId=132066&clcid=0x409).

To separate indexing tables into a dedicated filegroup

  1. In SQL Server Management Studio, right-click the search database and then click Properties.

  2. In the Select a page list, click Filegroups.

  3. Click Add and name the new filegroup “CrawlFileGroup”.

  4. In the Select a page list, click Files.

  5. Click Add and name the new file.

  6. In the Filegroup cell, select CrawlFileGroup.

  7. In the Path cell, select a path on a separate physical disk from the PRIMARY filegroup.

  8. Click OK.

  9. In SQL Server Management Studio, click New Query.

  10. Copy the following query into the query window, and then click Execute. This Transact-SQL code creates a new stored procedure that moves tables into the new filegroup.

    
    IF EXISTS (SELECT * FROM sysobjects WHERE type = 'P' AND name = 'proc_MoveTableToFileGroup')
       BEGIN
          DROP  Procedure  dbo.proc_MoveTableToFileGroup
       END
    GO
    CREATE PROCEDURE [dbo].[proc_MoveTableToFileGroup]
    (
       @fileGroup sysname,
       @tableName sysname
    )
    as
    begin
       declare @data_space_id int
       declare @object_id int
       declare @index_id int
       declare @index_name sysname
       declare @object_name sysname
       declare @fileGroupName sysname
       declare @index_cols nvarchar(4000)
       declare @sql nvarchar(4000)
       declare @key_ordinal int
       set @index_id = 0
       set @key_ordinal = 0
       select @data_space_id = data_space_id, @fileGroupName = name from sys.filegroups where name = @fileGroup
       if @data_space_id is null
       begin
          raiserror ('The specified filegroup does not exist.', 16, 1)
          return
       end
       while 1=1
       begin
          select top 1
                @object_id = i.object_id, 
                @index_id = i.index_id,
                @index_name = i.name,
                @object_name = o.name
             from 
                sys.indexes AS i
             inner join 
                sys.objects AS o
             ON
                i.object_id = o.object_id
             where 
                i.index_id > 0 and
                o.type = 'U' and
                o.name = @tableName and
                i.index_id > @index_id and
                i.data_space_id <> @data_space_id
             order by i.index_id
          if @@rowcount = 0 
          begin
             if @index_id = 0
                print 'There are no indexes to move to filegroup ' + @fileGroupName + ' for table ' + @tableName
             break
          end
          set @index_cols = null
          set @key_ordinal = 0
          while 1=1
          begin
             select top 1 
                @index_cols = case when @index_cols is null then '[' + c.name + ']' else @index_cols + ', [' + c.name + ']' end + 
            case when i.is_descending_key = 0 then ' asc' else ' desc' end, 
                @key_ordinal = i.key_ordinal
             from 
                sys.index_columns i 
             inner join 
                sys.columns as c
             on 
                i.object_id = c.object_id and
                i.column_id = c.column_id
             where 
                i.object_id = @object_id and 
                i.index_id = @index_id and 
                i.key_ordinal > @key_ordinal
             order by i.key_ordinal
             if @@rowcount = 0 
                break
          end
          select @sql = 
             case when i.is_primary_key = 1 then 
                N'begin try ' + 
                N'begin tran ' + 
                N'alter table [' + @object_name + '] drop constraint [' + i.name + '] ' + 
                N'alter table [' + @object_name + '] add constraint [' + i.name + '] ' + 
                'primary key ' +
                case when  i.type = 1 then 'clustered ' else 'nonclustered ' end +
                ' (' + @index_cols + ') ' + 
                'with (' +
                'IGNORE_DUP_KEY = ' + case when  i.ignore_dup_key = 1 then 'ON ' else 'OFF ' end +
                ', PAD_INDEX = ' + case when  i.is_padded = 1 then 'ON ' else 'OFF ' end +
                case when  i.fill_factor = 0 then '' else ', FILLFACTOR = ' + CONVERT(nvarchar(10),i.fill_factor) end + 
                ') ' + 
                'ON [' + @fileGroupName + ']' + 
                N' commit ' + 
                N'end try ' + 
                N'begin catch ' + 
                N'rollback ' + 
                N'end catch ' 
             else 
                N'create ' + 
                case when  i.is_unique = 1 then 'unique ' else '' end +
                case when  i.type = 1 then 'clustered ' else 'nonclustered ' end +
                'index [' + i.name + '] on [' + @object_name + '] (' + @index_cols + ') ' + 
                'with (' +
                'IGNORE_DUP_KEY = ' + case when  i.ignore_dup_key = 1 then 'ON ' else 'OFF ' end +
                ', PAD_INDEX = ' + case when  i.is_padded = 1 then 'ON ' else 'OFF ' end +
                case when i.fill_factor = 0 then '' else ', FILLFACTOR = ' + CONVERT(nvarchar(10),i.fill_factor) end + ', DROP_EXISTING = ON ) ' + 
                'ON [' + @fileGroupName + ']'
             end
           from 
             sys.indexes AS i
          where 
             i.object_id = @object_id and
             i.index_id = @index_id
          print 'Moving index ' + @index_name + ' to filegroup ' + @fileGroupName 
          print @sql
          print ''
          exec sp_executesql @sql
       end
    end
    
    
  11. In SQL Server Management Studio, click New Query.

  12. Copy the following query into the query window, and then click Execute. This Transact-SQL code executes the new stored procedure for each indexing table.

    
    declare @fileGroup sysname
    set @fileGroup = 'CrawlFileGroup'
    if not exists (select 1 from sys.filegroups where name = @fileGroup)
    begin
       raiserror ('The specified filegroup does not exist.', 16, 1)
       return
    end
    exec proc_MoveTableToFileGroup @fileGroup, 'MSSAnchorChangeLog'
    exec proc_MoveTableToFileGroup @fileGroup, 'MSSAnchorPendingChangeLog'
    exec proc_MoveTableToFileGroup @fileGroup, 'MSSAnchorText'
    exec proc_MoveTableToFileGroup @fileGroup, 'MSSAnchorTransactions'
    exec proc_MoveTableToFileGroup @fileGroup, 'MSSCrawlChangedSourceDocs'
    exec proc_MoveTableToFileGroup @fileGroup, 'MSSCrawlChangedTargetDocs'
    exec proc_MoveTableToFileGroup @fileGroup, 'MSSCrawlContent'
    exec proc_MoveTableToFileGroup @fileGroup, 'MSSCrawlDeletedErrorList'
    exec proc_MoveTableToFileGroup @fileGroup, 'MSSCrawlDeletedURL'
    exec proc_MoveTableToFileGroup @fileGroup, 'MSSCrawlErrorList'
    begin try
       begin tran
       IF  EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_MSSCrawlContent_MSSCrawlHistory]') AND parent_object_id = OBJECT_ID(N'[dbo].[MSSCrawlContent]'))
       ALTER TABLE [dbo].[MSSCrawlContent] DROP CONSTRAINT [FK_MSSCrawlContent_MSSCrawlHistory]
       exec proc_MoveTableToFileGroup @fileGroup, 'MSSCrawlHistory'
       ALTER TABLE [dbo].[MSSCrawlContent]  WITH CHECK ADD  CONSTRAINT [FK_MSSCrawlContent_MSSCrawlHistory] FOREIGN KEY([CrawlID])
       REFERENCES [dbo].[MSSCrawlHistory] ([CrawlID])
       commit
    end try
    begin catch 
       rollback
    end catch
    exec proc_MoveTableToFileGroup @fileGroup, 'MSSCrawlHostList'
    exec proc_MoveTableToFileGroup @fileGroup, 'MSSCrawlQueue'
    exec proc_MoveTableToFileGroup @fileGroup, 'MSSCrawlURL'
    exec proc_MoveTableToFileGroup @fileGroup, 'MSSCrawlURLLog'
    exec proc_MoveTableToFileGroup @fileGroup, 'MSSTranTempTable0'
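If you prefer Transact-SQL to the SQL Server Management Studio dialogs, steps 1 through 8 of this procedure can be sketched with statements such as the following. The database name, logical file name, and file path are placeholders; substitute values that are appropriate for your farm.

```sql
-- Hedged equivalent of steps 1-8: create the filegroup, and then add a
-- data file on a separate physical disk from the PRIMARY filegroup.
-- "SharedServices1_Search_DB", "CrawlData1", and the path are placeholders.
ALTER DATABASE [SharedServices1_Search_DB]
   ADD FILEGROUP [CrawlFileGroup];

ALTER DATABASE [SharedServices1_Search_DB]
   ADD FILE
   (
      NAME = N'CrawlData1',
      FILENAME = N'E:\SQLData\CrawlData1.ndf',
      SIZE = 100MB,
      FILEGROWTH = 10%
   )
   TO FILEGROUP [CrawlFileGroup];
```

After the filegroup and file exist, continue with steps 9 through 12 to create and run the stored procedure that moves the indexing tables.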
    
    

Maintaining SQL Server databases for search

Like any complex system, SQL Server databases require regular maintenance to keep performance at the highest level. This section describes some recommended maintenance practices.

The following disk maintenance procedures may increase the performance of the database servers:

  • Maintain at least 25 percent free disk space to allow for databases to grow. Monitor disk size and free space regularly to avoid a drop in this percentage. The search database may increase in size significantly and very quickly if administrators add new content sources.

  • If the disk controller uses Disk Write Cache, make sure that it has a backup battery so that a power failure will not result in an interruption in service.

The following SQL Server maintenance procedures may improve the performance of the database servers:

  • Run the DBCC CHECKDB Transact-SQL routine regularly on all Office SharePoint Server databases. However, this routine places a significant load on the SQL Server computer and may affect responsiveness. Therefore, use the PHYSICAL_ONLY option and schedule DBCC CHECKDB for off-peak hours. Do not run this routine during an Office SharePoint Server crawl.

  • Avoid performing any unsupported operations on the production computers that are running SQL Server. For example, do not change the database collation or add columns to any Office SharePoint Server database table. For more information about unsupported operations, see Microsoft Knowledge Base article 841057: Support for changes to the databases that are used by Office server products and by Windows SharePoint Services (http://go.microsoft.com/fwlink/?LinkID=105589&clcid=0x409).
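The DBCC CHECKDB recommendation above can be sketched as follows. The database name is a placeholder; substitute the name of the database that you want to check.

```sql
-- Hedged example: check only the physical consistency of the database to
-- reduce load, and suppress informational messages. Schedule this for
-- off-peak hours, and do not run it during a crawl.
-- "SharedServices1_Search_DB" is a placeholder database name.
DBCC CHECKDB (N'SharedServices1_Search_DB') WITH PHYSICAL_ONLY, NO_INFOMSGS;
```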

Over time, fragmentation in the SQL Server indexes that support the search database can affect indexing and query performance. The Microsoft Knowledge Base article KB943345 includes a Transact-SQL script that creates a stored procedure called proc_DefragmentIndices. You can use proc_DefragmentIndices to defragment both the search and content databases in the Office SharePoint Server farm. However, this stored procedure is very expensive as measured by server resources. Therefore, you must use it with great care to avoid a drop in query responsiveness.

Note:

For the proc_DefragmentIndices script and more information about its use, see How to defragment Windows SharePoint Services 3.0 databases and SharePoint Server 2007 databases (http://go.microsoft.com/fwlink/?LinkID=105588&clcid=0x409).
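Before you schedule either defragmentation procedure, you can measure fragmentation directly. The following sketch assumes SQL Server 2005 or later and uses a placeholder database name; it reports indexes in the search database whose average fragmentation exceeds 10 percent.

```sql
-- Hedged sketch: list fragmented indexes in the search database.
-- "SharedServices1_Search_DB" is a placeholder; substitute your own
-- search database name. LIMITED mode keeps the scan inexpensive.
USE [SharedServices1_Search_DB];
SELECT
   OBJECT_NAME(ips.object_id) AS table_name,
   i.name AS index_name,
   ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
INNER JOIN sys.indexes AS i
   ON ips.object_id = i.object_id AND
      ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 10.0
ORDER BY ips.avg_fragmentation_in_percent DESC;
```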

In addition, a defragmentation stored procedure called proc_DefragSearchIndexes that is designed specifically for the Office SharePoint Server search database is available. It defragments only those indexes that provide the maximum performance benefit, such as IX_MSSDocProps and IX_MSSDocSdids. This minimizes the demand on database server resources. You should schedule the stored procedure to run at off-peak times and carefully monitor the effect on users.

Note:

For the proc_DefragSearchIndexes script and more information about its use, see SQL Index defrag and maintenance tasks for Search (http://go.microsoft.com/fwlink/?LinkID=158799&clcid=0x409) in the Microsoft Enterprise Search Blog.
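After you create the stored procedure from the script referenced in the note above, a typical off-peak invocation might look like the following. This sketch assumes that the procedure takes no parameters; verify the actual signature against the published script before you use it.

```sql
-- Hedged example: run the search-specific defragmentation procedure at an
-- off-peak time. Assumes proc_DefragSearchIndexes takes no parameters
-- (verify against the published script).
-- "SharedServices1_Search_DB" is a placeholder database name.
USE [SharedServices1_Search_DB];
EXEC dbo.proc_DefragSearchIndexes;
```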

If you diagnose a disk or RAID bottleneck on a database server in the farm, the following actions may reduce the effect of the problem:

  • Relocate data files and transaction logs to separate disks or RAID arrays. The use of filegroups, as described earlier in this article, maximizes performance.

  • Add disks to the RAID array.

  • Replace slower disks with faster disks.

Avoiding crawler starvation

In a large or complex organization, there are significant challenges to overcome when you configure the Office SharePoint Server 2007 indexing system. You may have many content sources of many kinds with a very large corpus of content that takes a long time to crawl. You should also have carefully planned targets for the freshness of entries in the index to ensure that search results reflect up-to-date content. If you are to achieve your freshness targets, you must optimize the performance of the indexer so that it can crawl all content regularly.

A major obstacle to the speed of indexing is starvation in the crawler. The crawler is in a starved state when it cannot allocate another thread to retrieve the next document in the queue. Starvation can have the following causes:

  • Contention for resources on the database server that hosts the search database

  • Too many hosts participating in the crawl

  • Hosts that do not quickly relinquish a thread. (In this article these are known as “hungry” hosts.)

  • Backups running concurrently with the crawl

Hungry hosts lock a thread and prevent it from moving to the next document. This lock can occur in the following circumstances:

  • When the host is slow because it lacks CPU, memory, or other resources.

  • When the host requires extra work for incremental crawls. For example, if the source is a Microsoft SharePoint Portal Server 2003 server, the crawler must re-process a complete document if the permissions have changed.

  • When hosts or documents are rich in properties. When there are many properties for each document, the database server that hosts the search database must work harder. Business Data Catalog content sources and My Sites are typically rich in properties.

The most efficient crawls usually occur with the following kinds of host:

  • Office SharePoint Server 2007 servers. These servers maintain a log of changes that the crawler can use.

  • File shares. The crawler can check for changes at the folder level and does not examine each document.

  • Exchange public folders. The crawler can check for changes at the folder level and does not examine each post.

Guidelines to avoid starvation

You can minimize crawler starvation by following these best practices:

  • Minimize the number of content sources. Grouping hosts of similar size and type into a single content source increases efficiency.

  • Crawl large Office SharePoint Server 2007 hosts before you add other host types. These hosts are very efficient to crawl, and incremental crawls release threads very quickly.

  • Do not schedule crawls for more than one hungry host simultaneously.

  • As a starting point, schedule four concurrent crawls. You may be able to increase this number for some index servers. For more information, see How to diagnose starvation later in this article.

  • If the crawler becomes starved, pause the crawl for hungry hosts to free threads.

How to diagnose starvation

When you install a new Office SharePoint Server 2007 search system, build up the crawler configuration by adding new content sources over several days or weeks. Monitor the performance of the system as you add each content source to avoid starvation and ease troubleshooting. In this way, you can gain a thorough understanding of the system’s behavior during crawls.

The number of threads that the crawler uses is determined by the Indexer Performance setting. To change this setting, in Central Administration, click the Operations tab, click Services on Server, and then click Office SharePoint Server Search. The following table shows how the Indexer Performance setting controls crawl threads.

Indexer Performance setting   Total number of threads             Maximum threads for each host
---------------------------   ---------------------------------   -----------------------------
Reduced                       Number of processors                Number of processors
Partially reduced             4 times the number of processors    Number of processors plus 4
Maximum                       16 times the number of processors   Number of processors plus 4

In Office SharePoint Server 2007, the number of crawl threads never exceeds 64.

The most common cause of starvation is contention for resources on the database server. You can diagnose this by monitoring the Archival Plugin. In Performance Monitor, use the Office Server Search Archival Plugin object and the Total docs in first queue and Total docs in second queue counters. If these counters are both at 500 for the Portal_Content instance or 50 for the ProfileImport instance, the crawler is starved. Consider tuning the database server as described in Optimizing SQL Server for search in Office SharePoint Server 2007 earlier in this article.

States of starvation that are not caused by the Archival Plugin can be diagnosed with the Office Server Search Gatherer object in Performance Monitor. Concentrate on the Idle Threads, Threads Accessing Network, and Threads In Plug-ins counters and compare them with the total number of threads for the system. For full descriptions of these counters, see Monitoring index servers earlier in this article.

Avoiding bottlenecks caused by Access Control Lists

When Office SharePoint Server 2007 crawls and indexes a content source, it checks Access Control Lists (ACLs) and stores them in the search database. When users search the index, the search database is checked for each result to ensure that the user is authorized to access that result. This process is known as security trimming. Large ACLs with many levels of nesting can therefore adversely affect the performance of both the crawl process and searches within Office SharePoint Server. This section describes how to minimize such performance degradation.

Active Directory provides the following kinds of security principals that you can use to help secure Office SharePoint Server sites and indexed content:

  • User accounts

  • Global groups

  • Domain local groups

  • Universal groups

In Office SharePoint Server 2007, you also have SharePoint groups. This system is very flexible and can include multiple layers of nesting. However, used injudiciously, these security principals can adversely affect Office SharePoint Server search performance.

To ensure maximum performance from the Office SharePoint Server 2007 crawler and searches, observe the following rules when you are using Active Directory security principals and SharePoint groups:

  • Place user accounts into global groups, and global groups into domain local groups. Assign permissions to domain local groups. This is the recommended best practice for using security principals in Active Directory. It ensures that domain controllers can look up group memberships quickly and that users can access resources throughout the forest.

  • If universal groups are necessary, use the same system but put global groups into universal groups and universal groups into domain local groups.

  • Put domain local groups into SharePoint groups to assign permissions to SharePoint sites and other resources.

  • Limit the number of nesting levels used in group membership.

The following specific situations will harm the performance of Office SharePoint Server 2007 search and should be avoided:

  • Do not assign Office SharePoint Server site permissions to individual users.

    Office SharePoint Server sites slow down when more than 2,000 security principals are listed in a site’s DACL. However, an Active Directory or SharePoint group is a single security principal, regardless of the number of users it contains. Therefore, use SharePoint groups to assign site permissions and put users or Active Directory groups into SharePoint groups.

  • Do not use deeply nested Active Directory security groups.

    When a user attempts to access an Office SharePoint Server site, the server must evaluate group memberships to authorize the user. When groups are deeply nested, many queries to the domain controllers are required. In addition, queries to remote domains in the forest may be necessary. These queries must complete before the user is granted access.

  • Do not use distribution lists or security groups that contain contacts.

    Contacts in Active Directory can be added to groups, for example, to distribute e-mail. However, contacts are not security principals and are not involved in authorization. Performance may decrease if you assign permissions to a group that contains contacts.

In Office SharePoint Server 2007, the site owner for each site can add users and groups to the site DACL. It is important that you educate site owners about how to use group memberships responsibly to avoid introducing bottlenecks.

Troubleshooting missing Search Box Web Parts

The Search Box Web Part will not appear in the Search Center and other locations in the Office SharePoint Server 2007 user interface in the following circumstances:

  • A developer has modified the Search Center content page, or the site master page, to hide or remove the Search Box Web Part.

  • A user who has the Full Control or Design permission level on the site has removed or hidden the Search Box Web Part.

  • The Search service is unavailable in the Office SharePoint Server farm, either because of a service interruption or because administrators have taken it offline.

Note:

For more information about the Search Box Web Part, see Configure properties for the Search Box Web Part (Office SharePoint Server 2007).
