Administration at the search service application level

 

Summary: Learn how Microsoft IT configures crawls, content sources, and scopes, and monitors the FAST Search Server 2010 for SharePoint deployment.

Applies to: Microsoft FAST Search Server 2010 for SharePoint, Microsoft Office SharePoint Server 2007

This is the eighth of 12 articles that compose How Microsoft IT deployed FAST Search Server 2010 for SharePoint (white paper). In this article:

  • Configuring crawls

  • Configuring other search settings

  • Monitoring the search system

Configuring crawls

In addition to setting up content sources for crawls, MSIT uses crawl rules and dedicated crawl targets.

Content sources

MSIT reviews content sources on a quarterly schedule to make sure that crawls are appropriately configured. MSIT is also often asked to add a site to crawls or remove one.

To reduce the complexity of scheduling and managing crawls, MSIT consolidates content sources and start addresses based on similarity and priority when possible. For example, MSIT decreased the number of content sources from 25 in the Microsoft Office SharePoint Server 2007 search solution to 14 in the FAST Search Server 2010 for SharePoint solution.

Table 1 shows the number of each kind of content source in the previous and the current enterprise search solutions. Some of the content sources contain multiple start addresses. For example, a content source that contains sites that MSIT does not host (SharePoint sites such as http://sqlserversites and http://devdivsites, and other websites) has more than 150 start addresses.

Table 1. Comparison of content sources

| Type of content source | Number of content sources (SharePoint Server 2007 search solution) | Number of content sources (FAST Search Server 2010 for SharePoint solution) |
|---|---|---|
| SharePoint sites hosted by MSIT (including the user profile store) | 13 | 11 |
| Sites not hosted by MSIT (SharePoint sites and other websites)* | 6 | 1 |
| File shares** | 1 | 1 |
| Microsoft Exchange public folders | 1 | 0 |
| Business Data Catalog | 3 | 1 |
| Custom | 1 | 0 |

*These sites host content that the search service crawls, but the sites do not consume the search service.

**The search service currently crawls nine file shares.

Full crawls

MSIT performs a full crawl of all content sources only when it is necessary, as in the following situations:

  • After deployment of a crawl-related hotfix or a schema configuration change.

  • After addition of managed properties to the search system. In this case, it is only necessary to perform a full crawl of content sources to which the managed properties might apply.

MSIT tries to minimize the impact of full crawls on search performance by crawling content sources in order of priority, so that crawls are performed in a staggered manner.

As of August 2011, the search service was crawling about 90 million items on the Microsoft intranet worldwide. Table 2 shows the number of items crawled from each major content source at that time.

Table 2. Items crawled during full crawl of each content source

| Content source | Number of items crawled |
|---|---|
| Academy portal | 72,619 |
| Asia content (includes team sites, My Sites, and portal sites) | 9,344,940 |
| EMEA content (includes team sites, My Sites, and portal sites) | 15,798,592 |
| File shares | 55,407 |
| Infopedia portal | 69,967 |
| MSLibrary portal | 91,202 |
| MSW portal | 292,751 |
| My Sites | 6,131,717 |
| Sites not hosted by MSIT (SharePoint sites and other websites) | 2,715,466 |
| Office portal | 8,717,300 |
| Primus | 25,168 |
| Redmond custom portals | 5,099,814 |
| SharePoint | 29,495,749 |
| Team | 6,367,784 |

Incremental crawls

MSIT determines the priority of a content source primarily according to the degree of dependence that consumer sites have on the content source for search, and according to how frequently the associated content is updated. MSIT schedules incremental crawls according to four levels of priority of content sources. The content sources that have priorities 1, 2, and 3 are for repositories that MSIT hosts and manages. Table 3 shows how the frequency of incremental crawls depends on the priority of content sources.

Table 3. Incremental crawl frequencies

| Content source priority | Crawl frequency | Content sources (examples) |
|---|---|---|
| 1 | Four times a day | MSW, Infopedia, user profiles |
| 2 | Two times a day | Academy, Team, My Sites |
| 3 | Once a day | SharePoint, MSLibrary |
| 4 | Three times a week | Sites that MSIT does not manage |
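The relationship between priority and crawl frequency in Table 3 can be expressed as a small lookup. The following Python sketch is illustrative only and is not MSIT's scheduling mechanism. The names CRAWLS_PER_WEEK and average_interval_hours are hypothetical, and the sketch assumes that crawls are spaced evenly across the week, which this article does not state.

```python
# Incremental crawls per week for each content source priority (from Table 3).
CRAWLS_PER_WEEK = {
    1: 4 * 7,  # four times a day
    2: 2 * 7,  # two times a day
    3: 1 * 7,  # once a day
    4: 3,      # three times a week
}

HOURS_PER_WEEK = 7 * 24


def average_interval_hours(priority):
    """Return the average number of hours between incremental crawls.

    Assumes evenly spaced crawls (an assumption made for illustration).
    """
    return HOURS_PER_WEEK / CRAWLS_PER_WEEK[priority]


for priority in sorted(CRAWLS_PER_WEEK):
    print(priority, average_interval_hours(priority))
```

Under the even-spacing assumption, a priority 1 content source is recrawled about every 6 hours, and a priority 4 content source about every 56 hours.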

Crawl rules

The current search solution uses about 120 crawl rules. When MSIT builds a new deployment, it uses a custom Windows PowerShell script to create some of these crawl rules. Some of the crawl rules specify how to crawl complex URLs. Other crawl rules specify certain sites or site collections to exclude from crawling because they contain confidential information or information that should not be included in the content index. For example, although the FAST Search Server 2010 for SharePoint deployment can crawl content on the Internet, MSIT uses a crawl rule that excludes the path http://*.*. This rule prevents crawlers from going outside the firewall.
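To illustrate why an exclusion pattern such as http://*.* keeps crawlers inside the firewall, the following Python sketch tests a few URLs against it. This is a simplified model, not the product's matching logic: it assumes that the pattern is compared only with the scheme and host of each URL, so that single-label intranet hosts (which contain no dot) pass and fully qualified Internet hosts are excluded. The function name is_excluded and the host www.example.com are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Exclusion pattern quoted in the article.
EXCLUDE_PATTERNS = ["http://*.*"]


def is_excluded(url, patterns=EXCLUDE_PATTERNS):
    """Return True if the scheme and host of `url` match an exclusion pattern.

    Assumption: only "scheme://host" is compared with the pattern; the real
    crawl rule semantics are not described in this article.
    """
    parts = urlsplit(url)
    if not parts.hostname:
        return False
    origin = f"{parts.scheme}://{parts.hostname}"
    return any(fnmatchcase(origin, pattern) for pattern in patterns)


for url in (
    "http://mslibrary/default.aspx",     # intranet host, no dot in host name
    "http://sqlserversites",             # intranet host, no dot in host name
    "http://www.example.com/page.aspx",  # hypothetical Internet host
):
    print(url, is_excluded(url))
```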

MSIT tries to limit the number of crawl rules that it uses, and it encourages site owners to set permissions on content locally to include or exclude content from crawls. For more information, see Plan site permissions (SharePoint Server 2010).

Dedicated crawl targets

MSIT uses at least one web server as a dedicated crawl target in each farm that it crawls. MSIT uses two web servers as dedicated crawl targets for each farm that hosts a large amount of content (relative to the amount of content in other farms) and for each farm that hosts content that must be crawled frequently due to requirements for freshness of search results.

For information about how to use dedicated crawl targets, see Manage crawl load (SharePoint Server 2010).

Crawler impact rules

MSIT used crawler impact rules at one point as a temporary solution to reduce the load on two sites that had performance issues. The issues were not related to crawler impact and were resolved. Currently, MSIT is not using any crawler impact rules.

For more information, see Manage crawler impact rules (FAST Search Server 2010 for SharePoint).

Configuring other search settings

MSIT configures other search settings such as global scopes and removal of items from the content index, and it manages other global operations such as backups.

Global scopes

Table 4 shows the global scopes that the FAST Search Server 2010 for SharePoint deployment provides. Site owners can use these scopes on their sites.

Table 4. Global scopes

| This global scope | Provides results from this content |
|---|---|
| Intranet | All SharePoint sites (preconfigured scope) |
| MSLibrary | http://mslibrary |
| People | User profiles and My Sites (preconfigured scope) |
| EnterpriseMedia | All SharePoint content that has content type Video. (This scope is used by the Video tab in the enterprise Search Center.) |
| News | MSW |
| Webcasts | MSW |
| MSArchives | MSW |
| ImageOnly | MSW |

Removal of items from the content index

In rare cases, MSIT must immediately remove items from the content index because of business or legal requirements. To do this, MSIT uses Windows PowerShell. MSIT then creates a crawl rule to exclude the content from future crawls.

For more information, see Remove URLs from search results (SharePoint Server 2010).

Time-out settings, IFilters, and file types

MSIT uses the default time-out settings for connection time and request-acknowledgement time. In addition, MSIT uses only the default IFilters and the default file-type inclusions and exclusions list.

Thesaurus settings and stop word files

MSIT uses the default configuration for thesaurus files and stop word files. During product development, MSIT provided input to the product team about the default configuration for these files, and helped validate the usefulness of the default settings before the software was released.

Backups

MSIT performs weekly backups of the FAST Search Server 2010 for SharePoint configuration. This includes the administration database, which contains settings such as managed properties, Best Bets, and global scopes. For more information, see Configuration backup and restore (FAST Search Server 2010 for SharePoint).

As part of regular SQL Server backups, MSIT also backs up the SharePoint configuration database and the other SQL Server databases that are part of the FAST Search Server 2010 for SharePoint deployment.

Monitoring the search system

MSIT uses various tools to monitor the health and use of the search system.

Search health monitoring

To monitor the health of the search system, MSIT uses the following FAST Search Server 2010 for SharePoint tools:

  • Crawl logs. MSIT reviews the crawl logs every day to find errors and troubleshoot issues. For example, site owners or users sometimes report that content is missing from search results. By reviewing the crawl logs, MSIT can determine the time of the last successful crawl of each content source, and can determine whether crawled content was successfully added to the index, whether it was excluded because of a crawl rule, or whether indexing failed because of an error. In the case of a complex issue, MSIT sometimes works with the enterprise search product group, such as when an item is in the content index but does not appear as it should in search results.

    For more information, see Best practices for using crawl logs (FAST Search Server 2010 for SharePoint).

  • Search administration reports. MSIT reviews search administration reports for details about crawl and query performance.

MSIT also uses the following tools to monitor the search system:

  • Microsoft System Center Operations Manager 2007 R2 with the FAST Search Server 2010 for SharePoint management pack. MSIT uses System Center Operations Manager to monitor the status of each server and service in the FAST Search Server 2010 for SharePoint deployment. For example, System Center Operations Manager obtains monitoring data and performance counters from the FAST Search Server 2010 for SharePoint Monitoring Service, which runs on each server in the farm. System Center Operations Manager helps identify issues and can provide alerts when performance bottlenecks occur. For more information, see Monitor FAST Search Server 2010 for SharePoint with SCOM.

  • Custom SQL Server tool for monitoring crawls. MSIT developed a custom tool that helps monitor crawl progress. The tool runs a SQL Server 2008 job on each computer that hosts a crawl database. This helps monitor crawl latencies and can identify when a crawl hangs. The tool writes an entry with a time stamp to a SQL Server table for each item that is crawled. The SQL Server job sends a message if more than 30 minutes elapse between successive time stamps.

  • Windows Server 2008 R2 performance counters. MSIT uses Windows Server 2008 R2 performance counters to monitor the following:

    • Crawl and query progress

    • Disk, memory, and application performance
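The gap check that the custom crawl-monitoring tool performs can be sketched in a few lines. The following Python example is an illustration only: the actual tool runs as a SQL Server job, and the function name find_stalls and the sample time stamps are hypothetical. Only the 30-minute threshold comes from this article.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=30)  # threshold stated in the article


def find_stalls(timestamps, max_gap=MAX_GAP):
    """Return (earlier, later) pairs of successive time stamps
    that are more than `max_gap` apart.

    `timestamps` is an iterable of datetime objects, one per crawled item.
    """
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_gap
    ]


# Hypothetical sample data: three items crawled, then a 45-minute pause.
sample = [
    datetime(2011, 8, 1, 9, 0),
    datetime(2011, 8, 1, 9, 5),
    datetime(2011, 8, 1, 9, 20),
    datetime(2011, 8, 1, 10, 5),
]

for earlier, later in find_stalls(sample):
    print(f"Possible crawl hang: no items between {earlier} and {later}")
```

This sketch only finds gaps between recorded time stamps. To detect a crawl that is hanging right now, a monitor would also need to compare the most recent time stamp with the current time.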

Search usage monitoring

To understand the effectiveness of the enterprise search service and provide the best experience for users, MSIT regularly analyzes use of the service. MSIT uses the results of this analysis to make adjustments to improve search results.

To monitor search usage, MSIT uses the following FAST Search Server 2010 for SharePoint tools:

  • Click-through logs. These logs provide information about click-through rates on search results, which indicate how users browse through results.

  • Query logs. Query logs provide the following information about search queries that users submit:

    • Queries submitted during the previous 30 days

    • Queries submitted most frequently during the previous 30 days

    • Queries submitted most frequently from each site collection during the previous 30 days

    • Queries submitted per scope during the previous 30 days

    • Queries submitted during the previous 12 months

    • Queries that returned zero results

    • Destination pages reached most frequently from search results

      This information helps MSIT configure the search service to provide more useful search results, such as by adjusting Best Bets.
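As an illustration of the kind of analysis that query logs support, the following Python sketch derives the most frequent queries and the queries that returned zero results from a list of log records. The record shape (query text, result count), the function name summarize_queries, and the sample data are hypothetical simplifications; the actual query log format is not described in this article.

```python
from collections import Counter


def summarize_queries(records, top_n=3):
    """Summarize query log records.

    Each record is a (query_text, result_count) tuple; this record shape
    is a hypothetical simplification, not the actual query log format.
    Queries are compared case-insensitively.
    """
    frequency = Counter(query.lower() for query, _ in records)
    zero_results = sorted(
        {query.lower() for query, count in records if count == 0}
    )
    return frequency.most_common(top_n), zero_results


# Hypothetical sample records.
sample_log = [
    ("benefits", 120),
    ("Benefits", 120),
    ("holiday schedule", 35),
    ("benefits", 118),
    ("parking pass", 0),
]

top_queries, no_results = summarize_queries(sample_log)
print(top_queries)
print(no_results)
```

Queries that appear in the zero-result list are natural candidates for new Best Bets or for a check of whether the relevant content is being crawled.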

 

MSIT constantly strives to improve the search experience for employees, and it uses the following resources to understand the user experience:

  • Feedback Tool for search. The Feedback Tool is a Microsoft-internal tool that was developed with the help of the enterprise search product team. This tool provides a Feedback button on the enterprise Search Center results page that invites the search user to explain the goal of the most recent search, rate the search experience and usefulness of the results, report search problems, and suggest improvements. When the user clicks Send, the tool sends an email message to an alias that monitors the feedback. The message captures the user's query, the URL where the user submitted the query (which indicates the scope that was used), and the search results, with an indication of which results were Best Bets, if any. If information that a user was looking for did not appear in the search results, MSIT works with site owners to determine where the associated content is available so that links to the content can be returned in search results. If the information that a user is looking for is in the search results but the ranking is unsatisfactory, MSIT can adjust the Best Bets or use URL promotion or demotion accordingly. For more information, see Best bets and URL promotion.

  • Annual user survey regarding the MSW enterprise portal. This survey includes several questions to help measure user satisfaction with the enterprise search service. Users can provide feedback on the effectiveness of the search service and on what kind of content is difficult to find.

MSIT also uses a third-party tool that monitors and tracks website usage. The tool collects and reports data on the number of visits to MSW and the queries that were performed there.

To view the white paper as a single article on TechNet, or to download it, see Improving enterprise search at Microsoft: How FAST Search Server 2010 for SharePoint Powers Worldwide Intranet Search at Microsoft (https://technet.microsoft.com/en-us/library/bb735129.aspx).