Best practices for Search in Office SharePoint Server
Updated: August 28, 2008
Applies To: Office SharePoint Server 2007
This article is one of a series of Best Practices articles for Microsoft Office SharePoint Server 2007. This article describes the best practices for Enterprise Search. Unless otherwise noted, this article applies to both Office SharePoint Server 2007 and Microsoft Search Server 2008. For more articles in the series, see Best practices. For additional information and resources regarding best practices for Office SharePoint Server 2007, see the Best Practices Resource Center (http://go.microsoft.com/fwlink/?LinkID=125981&clcid=0x409).
1. Plan your deployment
Plan for findability. For a search technology to be useful to end-users, they must be able to find what they are looking for with a minimum amount of effort. For a good discussion of findability, see “Chapter 15: Implementing an Optimal Search and Findability Topology” in Microsoft Office SharePoint Server 2007 Best Practices by Ben Curry and Bill English (Microsoft Press, Redmond, WA, 2008).
Use managed properties. This feature enables search administrators to create a one-to-many mapping of related properties. This process reduces the number of property names that users have to use when they perform advanced queries. For example, a search administrator can map the property named “author” to the “writer” and “author2” properties so that users who include the “author” property in their query also get search results for “writer” and “author2”. For more information about managed properties, see Plan the end-user search experience (Office SharePoint Server) and Plan the end-user search experience (Search Server 2008).
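The one-to-many mapping described above can be pictured with a small sketch. This is only an illustrative model in Python, not the Office SharePoint Server API: the `MANAGED_PROPERTIES` table, the `expand_property` and `matches` helpers, and the sample documents are all hypothetical names introduced for this example.

```python
# Hypothetical model of a managed property: one name that users query,
# mapped to several crawled properties that are searched on their behalf.
MANAGED_PROPERTIES = {
    "author": ["author", "writer", "author2"],
}


def expand_property(name):
    """Return the crawled property names covered by a managed property.

    A name with no mapping is treated as a single crawled property.
    """
    return MANAGED_PROPERTIES.get(name.lower(), [name])


def matches(document, prop, value):
    """True if any crawled property mapped to `prop` equals `value`."""
    return any(document.get(p) == value for p in expand_property(prop))


# Sample documents (made up) that store the author under different names.
docs = [
    {"title": "Spec", "writer": "Kim"},
    {"title": "Plan", "author2": "Kim"},
    {"title": "Memo", "author": "Lee"},
]

# A query on the single managed property "author" finds both of Kim's items.
hits = [d["title"] for d in docs if matches(d, "author", "Kim")]
```

In this sketch the user only needs to know the name "author"; the mapping takes care of the "writer" and "author2" variants.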
Create service level agreements. Ensure that service level agreements (SLAs) for content crawls are agreed to before deployment.
2. Start with a well-configured infrastructure
Deploy two or more query servers for increased availability. Multiple query servers provide redundancy for end-user queries. If a query server fails, queries are automatically directed to a healthy query server. For more information, see Plan for redundancy (Office SharePoint Server) and the blog post SearchBeta Hardware Configuration (http://go.microsoft.com/fwlink/?LinkId=126330) on the Microsoft Enterprise Search Blog.
Use separate computers to run SQL Server for content databases and the Shared Services Provider (SSP). For more database recommendations, see Physical storage recommendations (Office SharePoint Server).
Use File Groups to separate the query and crawl tables in the search database.
Use a gigabit network for intra-farm connections. For more information, see Additional performance and capacity planning factors (Office SharePoint Server).
3. Manage access by using Windows security groups
We recommend that you add users to Windows security groups instead of adding users to SharePoint groups for the following reasons:
Because changes to Windows security groups do not directly affect the access control entries (ACEs) on SharePoint sites, you do not have to crawl again when user accounts within those Windows security groups are changed.
During the indexing process, the system stores the ACE of each user who has been added to a SharePoint group instead of the ACE of the SharePoint group itself. This process supports approximately 1000 users per access control list (ACL), after which the “Parameter is incorrect” error causes crawling to fail.
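To make the difference between the two kinds of groups concrete, the following Python sketch models the behavior described above in a deliberately simplified way. It is not how the indexer is implemented; `stored_aces`, the `CONTOSO` account names, and the group names are hypothetical, and the limit of 1000 is the approximate figure quoted in the text.

```python
ACL_USER_LIMIT = 1000  # approximate per-ACL figure quoted in the text


def stored_aces(principals, sharepoint_groups):
    """Return the set of ACEs stored for one ACL in this simplified model.

    SharePoint groups are expanded to their member users; every other
    principal (a user or a Windows security group) is stored as one ACE.
    """
    aces = set()
    for principal in principals:
        if principal in sharepoint_groups:
            aces.update(sharepoint_groups[principal])
        else:
            aces.add(principal)
    return aces


# 1200 hypothetical user accounts.
members = [f"CONTOSO\\user{i}" for i in range(1200)]

# Granting access through a SharePoint group stores one ACE per member.
via_sharepoint_group = stored_aces(["Site Members"], {"Site Members": members})

# Granting access through a Windows security group stores a single ACE.
via_windows_group = stored_aces(["CONTOSO\\SiteMembers"], {})
```

Under these assumptions the SharePoint group produces 1200 entries, which is past the approximate limit, while the Windows security group produces one entry no matter how many users it contains.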
4. Defragment the search database
The search database contains metadata and ACLs of crawled content. Over a series of crawls, the search database can become fragmented. To improve performance of crawls and queries, periodically defragment the search database. For more information, see Database maintenance for Office SharePoint Server 2007 (white paper).
If you are mirroring the computers that run SQL Server, turn mirroring off before you defragment the search database and turn it back on after defragmentation is completed.
5. Always keep your system updated
After testing updates in the test environment, install the latest software updates for Office SharePoint Server 2007, Search Server 2008, and SQL Server as soon as possible. For general guidance about how to deploy software updates, see Deploy software updates for Office SharePoint Server 2007.
6. Monitor SQL Server latency
Search is I/O-intensive for SQL Server and is sensitive to I/O latencies on the Temp database and Search database. Both search and content hosting make heavy use of the Temp database. We recommend that you keep the Search database, SSP database, Temp database, content databases, and their corresponding log files all on separate spindles. This lets you optimize each file, depending on its specific needs. For very large server farms, it is also a good idea to move the content databases onto separate computers that are running SQL Server. Doing so gives the Search and SSP databases their own Temp database and instance of SQL Server, separate from those used by the content databases. For best search performance, we recommend that you maintain the following latencies:
10 milliseconds (ms) or less for the Temp database
10 ms or less for the Search database
20 ms or less for the database log file
Follow the other recommendations in the blog post SQL Monitoring and I/O (http://go.microsoft.com/fwlink/?LinkId=123950) on the Microsoft Enterprise Search Blog. For information about how to troubleshoot SQL Server performance problems, see the I/O Bottlenecks section of the following SQL Server technical article: Troubleshooting Performance Problems in SQL Server 2005 (http://go.microsoft.com/fwlink/?LinkId=123952).
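The latency targets listed in this section can be turned into a simple check. The Python sketch below assumes that you have already collected average I/O latencies, in milliseconds, from your own monitoring tools; the dictionary keys, the `over_target` helper, and the sample measurements are hypothetical and are not part of any SQL Server or SharePoint interface.

```python
# Recommended maximum latencies from this article, in milliseconds.
LATENCY_TARGETS_MS = {
    "temp_database": 10,
    "search_database": 10,
    "log_file": 20,
}


def over_target(measured_ms):
    """Return the measurements that exceed their recommended latency.

    `measured_ms` maps a name from LATENCY_TARGETS_MS to an observed
    latency in milliseconds. Names without a target are ignored.
    """
    return {
        name: value
        for name, value in measured_ms.items()
        if name in LATENCY_TARGETS_MS and value > LATENCY_TARGETS_MS[name]
    }


# Example readings (made up): only the Search database is above its target.
sample = {"temp_database": 8, "search_database": 14, "log_file": 20}
problems = over_target(sample)
```

A value equal to the target counts as acceptable here, because the recommendation is stated as "or less".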
7. Monitor to prevent search starvation
Search starvation occurs when the crawler cannot allocate another thread to retrieve the next document in the crawl queue. Starvation can be caused by:
Resource (I/O) contention on the computer that is running SQL Server.
Too many hosts being crawled at the same time.
Hungry hosts that do not quickly relinquish a thread. Hungry hosts include the following:
Slow hosts. A host being crawled does not have the capacity to service all of the requests that the crawler is sending to it.
Hosts requiring extra work for incremental crawls. Basic HTTP crawls are partially in this category because each document requires a round trip to the server, but the modified date is checked before downloading the document.
Hosts and content that are rich in properties. You will see this more frequently with the following content store types: Business Data Catalog, People Import, and People crawls.
Crawls are paused when backups are being performed.
For more information, see the following blog post: Creating crawl schedules and starvation - How to detect it and minimize it (http://go.microsoft.com/fwlink/?LinkID=123794) on the Microsoft Enterprise Search Blog.
8. Monitor your system to understand query bottlenecks
9. Validate the search visibility setting for each crawled site
The standard best practices for optimizing sites and pages for search engines are equally relevant for Web content management (WCM) sites in SharePoint deployments. A site or page that is better optimized for search engines appears higher in the search results and will help increase traffic to your site. For more information, see How to Optimize SharePoint Server 2007 Web Content Management Sites for Search Engines (http://go.microsoft.com/fwlink/?LinkId=123956).
10. Manually pause crawls before initializing a query server or backing up a farm
Prior to backing up an SSP used for search or initializing query servers, we recommend that you pause all crawls. After the backup is complete, you must manually resume paused crawls. For more information, see Pause and resume a crawl (Office SharePoint Server 2007).
11. Test the crawling and querying subsystems after making any configuration changes
We recommend that you test the crawling and querying functionality of your server farm after you make configuration changes. An easy way to do this is to create a temporary content source that is used only for this purpose. To test, crawl a small set of items, for example ten .txt files on a file share, and then perform search queries on them. Make sure that these items are not currently in the index. It is helpful if they contain unique words, so that they are displayed at the top of the search results page when you query for those words. After the test is complete, delete the content source that you created for the test, because doing so removes the crawled items from the index. The items then no longer appear in search results, and they can be crawled again the next time that you perform this test. For information about crawling content, see Getting your content crawled (Office SharePoint Server 2007) or How to crawl content (Search Server 2008).
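The test items described above can be generated with a short script. The following Python sketch is one possible way to do it: it writes ten .txt files, each containing a random term that is very unlikely to exist anywhere else in the index. The `create_test_items` helper and the file names are hypothetical, and the temporary directory only stands in for the file share that your temporary content source would crawl.

```python
import tempfile
import uuid
from pathlib import Path


def create_test_items(folder, count=10):
    """Write `count` small .txt files, each with its own unique search term.

    Returns a dict that maps each file name to the unique term it contains,
    so that the terms can be used as search queries after the crawl.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    tokens = {}
    for i in range(1, count + 1):
        # A random hex string makes the term effectively unique.
        token = "crawltest" + uuid.uuid4().hex
        path = folder / f"crawl-test-{i:02d}.txt"
        path.write_text(
            f"Search test item {i}. Unique term: {token}\n", encoding="utf-8"
        )
        tokens[path.name] = token
    return tokens


# Demonstration only: a temporary directory stands in for the file share.
with tempfile.TemporaryDirectory() as share:
    tokens = create_test_items(share)
    created = sorted(p.name for p in Path(share).iterdir())
```

After the crawl of the temporary content source finishes, each value in `tokens` can be entered as a query, and the matching file should be the top result.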
12. Review your antivirus policy for crawled objects
When you use certain file-level antivirus software programs in Windows SharePoint Services 3.0, Office SharePoint Server 2007, or Search Server 2008, you should exclude certain folders from being scanned. If you do not exclude these folders, you may experience many unexpected issues. For more information, see the following article in the Microsoft Knowledge Base: 952167: Folders may have to be excluded from antivirus scanning when you use a file-level antivirus program in Windows SharePoint Services 3.0 or in SharePoint Server 2007 (http://go.microsoft.com/fwlink/?LinkId=123963).
13. If you have custom queries, mark appropriate properties as “scope-able” from the crawled property UI so that they do not execute expensive SQL queries
The Office SharePoint Server 2007 Content Publishing team thanks the following contributors to this article:
Luca Bandinelli, Microsoft SharePoint Customer Advisory Team
Dan Blood, Microsoft Search Server
Sid Shah, Microsoft Search Server
Richard Riley, Microsoft SharePoint Marketing
Mitch Prince, Microsoft Consulting Services
Larry Kuhn, Microsoft Consulting Services