Manage crawl rules (Office SharePoint Server)

Applies To: Office SharePoint Server 2007

This Office product will reach end of support on October 10, 2017. To stay supported, you will need to upgrade. For more information, see Resources to help you upgrade your Office 2007 servers and clients.

 

Topic Last Modified: 2009-08-10

You can add a crawl rule to include or exclude specific paths when you crawl content. When you include a path, you can optionally provide alternative account credentials to crawl it. In addition to adding new crawl rules, you can test, edit, delete, or reorder existing crawl rules.

Crawl rules are applied in the order that they are listed.

To manage crawl rules, you must first open the Manage Crawl Rules page:

  1. Open the administration page for the Shared Services Provider (SSP).

    To open the administration page for the SSP, do the following:

    1. On the top navigation bar, click Application Management.

    2. On the Application Management page, in the Office SharePoint Server Shared Services section, click Create or configure this farm’s shared services.

    3. On the Manage this Farm’s Shared Services page, click the SSP whose administration page you want to open.

  2. On the Shared Services Administration Home page, in the Search section, click Search settings.

  3. On the Configure Search Settings page, in the Crawl Settings section, click Crawl rules.

What do you want to do?

  • Add a crawl rule

  • Test crawl rules on a URL

  • Edit a crawl rule

  • Delete a crawl rule

  • Reorder crawl rules

Add a crawl rule

  1. On the Manage Crawl Rules page, click New Crawl Rule.

  2. On the Add Crawl Rule page, in the Path box in the Path section, type the path affected by the rule. You can use standard wildcard characters in the path. For example:

    • http://server1/folder* matches all Web resources with a URL that starts with http://server1/folder.

    • *://*.txt matches every document with the txt extension.

  3. In the Crawl Configuration section, select one of the following:

    • Exclude all items in this path. Select this option if you want all items in the specified path to be excluded from the crawl.

    • Include all items in this path. Select this option if you want all items in the path to be crawled. If you select this option, you can further refine the inclusion by selecting any combination of the following:

      • Follow links on the URL without crawling the URL itself. Select this option if you want to crawl links contained within the URL, but not the URL itself.

      • Crawl complex URLs (URLs that contain a question mark (?)). Select this option if you want to crawl URLs that contain parameters that use the question mark (?) notation.

      • Crawl SharePoint content as HTTP pages. Normally, SharePoint content is crawled by using a special protocol. Select this option if you want SharePoint content to be crawled as HTTP pages instead. When the content is crawled by using the HTTP protocol, item permissions are not stored.

  4. In the Specify Authentication section, do one of the following:

    • To use the default content access account, select Use the default content access account (NT AUTHORITY\LOCAL SERVICE).

    • If you want to use a different account, select Specify a different content access account and then do the following:

      1. In the Account box, type the account name that can access the paths defined by this crawl rule. Examples are user_name and DOMAIN\user_name.

      2. In the Password and Confirm Password boxes, type the password for this account.

      3. To prevent Basic authentication from being used, select the Do not allow Basic Authentication check box. The server first attempts to use Windows NTLM authentication. If NTLM authentication fails, the server then attempts to use Basic authentication unless this check box is selected.

    • To use a client certificate for authentication, select Specify client certificate, and then click a certificate on the Certificate menu.

  5. Click OK.
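
The wildcard paths described in step 2 can be illustrated with a short Python sketch. This is not SharePoint code and does not call any SharePoint API. The function name path_matches is hypothetical, and the sketch assumes that the asterisk (*) is the only wildcard, that it matches any run of characters, and that paths are compared without regard to case.

```python
import re


def path_matches(pattern: str, url: str) -> bool:
    """Return True if url matches a crawl-rule style path pattern.

    Assumptions (not confirmed by the product documentation):
    only "*" is treated as a wildcard, it matches any run of
    characters including "/", and matching ignores case.
    """
    # Escape every literal piece and join the pieces with ".*",
    # so that "*" is the only character with special meaning.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, url, flags=re.IGNORECASE) is not None


# The two example paths from step 2.
print(path_matches("http://server1/folder*", "http://server1/folder/page.aspx"))
print(path_matches("http://server1/folder*", "http://server1/other/page.aspx"))
print(path_matches("*://*.txt", "http://server1/docs/readme.txt"))
```

Under these assumptions, the first and third calls report a match and the second does not, because its URL does not start with http://server1/folder.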

Test crawl rules on a URL

You can test crawl rules on a URL to determine which rules will be applied when the URL is crawled and what the result of applying those rules will be (either inclusion or exclusion of content). Testing crawl rules on a URL does not actually crawl the URL.

  1. On the Manage Crawl Rules page, in the Type a URL and click test to find out if it matches a rule box, type the URL that you want to test.

  2. Click Test.

  3. The result of the test is listed below the Type a URL and click test to find out if it matches a rule box.
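
The idea behind testing a URL can be sketched in Python as a lookup that reports which rule applies and whether it includes or excludes the URL, without fetching anything. This is an illustrative model only, not the product's implementation. The names CrawlRule and check_url are hypothetical, and the wildcard handling assumes that * matches any run of characters and that case is ignored. Returning None when no rule matches is also an assumption of the sketch.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrawlRule:
    path: str      # path pattern, where "*" is a wildcard (assumed)
    include: bool  # True = include items, False = exclude items


def _matches(pattern: str, url: str) -> bool:
    # Assumption: "*" is the only wildcard and case is ignored.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, url, flags=re.IGNORECASE) is not None


def check_url(rules: List[CrawlRule], url: str) -> Optional[CrawlRule]:
    """Return the first rule in list order that matches url, or None."""
    for rule in rules:
        if _matches(rule.path, url):
            return rule
    return None


rules = [
    CrawlRule("http://server1/folder/private*", include=False),
    CrawlRule("http://server1/folder*", include=True),
]

for url in (
    "http://server1/folder/private/a.docx",
    "http://server1/folder/b.docx",
    "http://server2/",
):
    rule = check_url(rules, url)
    if rule is None:
        print(url, "-> no matching rule")
    else:
        action = "included" if rule.include else "excluded"
        print(url, "->", action, "by rule", rule.path)
```

Like the Test button, the sketch only evaluates the rules against the URL text. It never requests the URL.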

Edit a crawl rule

If you edit a crawl rule, the changes do not take effect until the next full crawl is started.

  • On the Manage Crawl Rules page, in the crawl rules list, click Edit on the menu of the crawl rule that you want to edit.

    You can find information about the settings for crawl rules in the Add a crawl rule section.

Delete a crawl rule

If you delete a crawl rule, the deletion is not reflected until the next full crawl is started.

  1. On the Manage Crawl Rules page, in the crawl rules list, click Delete on the menu of the crawl rule that you want to delete.

  2. Click OK in the message box confirming that you want to delete the crawl rule.

Reorder crawl rules

  • On the Manage Crawl Rules page, in the Order column in the list of crawl rules, select a value in the drop-down list that specifies the position you want the rule to occupy. Other values are shifted accordingly.

    Crawl rules are applied in the order that they are listed. Therefore, if two rules cover the same or overlapping content, the first rule that is listed is applied.

    You can also use a global exclusion rule, which applies regardless of the order in which it is listed. For more information about administering crawl rules, see the Administrating Crawl Rules section in the following resource: Book Excerpt - Chapter 16 Enterprise search and indexing architecture and administration.
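
The reordering behavior described in this section, where one rule is moved to a chosen position and the other rules shift accordingly, can be sketched in Python. The function name move_rule is hypothetical and the sketch models only the list order, not the SharePoint user interface. Positions are 1-based to mirror the values in the Order column.

```python
from typing import List


def move_rule(rules: List[str], old_position: int, new_position: int) -> List[str]:
    """Return a new list with one rule moved; positions are 1-based.

    The rules between the two positions shift by one place,
    and the input list is left unchanged.
    """
    reordered = list(rules)
    rule = reordered.pop(old_position - 1)
    reordered.insert(new_position - 1, rule)
    return reordered


rules = [
    "http://server1/folder*",
    "http://server1/folder/private*",
    "*://*.txt",
]

# Move the more specific rule from position 2 to position 1 so that
# it is listed, and therefore applied, before the broader rule.
for position, path in enumerate(move_rule(rules, 2, 1), start=1):
    print(position, path)
```

Because the first listed rule that covers a URL is the one applied, placing the narrower path ahead of the broader one is what lets it take effect for overlapping content.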