Use crawl rules to specify a different content access account or authentication method (Office SharePoint Server 2007)

Applies To: Office SharePoint Server 2007

This Office product will reach end of support on October 10, 2017. To stay supported, you will need to upgrade. For more information, see Resources to help you upgrade your Office 2007 servers and clients.


Topic Last Modified: 2016-11-14

Before you perform the procedures in this article, confirm that:

In Microsoft Office SharePoint Server 2007, you can create new crawl rules or edit existing crawl rules to specify a different content access account or authentication method to use when crawling a particular path. You can also specify the order in which crawl rules are applied.

Note

The path describes the namespace (typically a URL) that the rule affects. The path can be a specific URL, such as http://contoso, or it can include wildcards. For example, *://*.txt includes every document with the .txt file name extension.
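The wildcard matching described in this note can be pictured with a short Python sketch. It is not the SharePoint implementation. The function name path_matches is hypothetical, and the sketch assumes that * matches any run of characters (including /), that every other character matches literally, and that comparison ignores case.

```python
import re


def path_matches(rule_path: str, url: str) -> bool:
    """Return True if url falls within rule_path.

    Assumptions (not confirmed SharePoint behavior): '*' matches any run of
    characters, including '/', all other characters match literally, and
    matching ignores case.
    """
    # Escape regex metacharacters, then turn each escaped '*' back into '.*'.
    regex = re.escape(rule_path).replace(r"\*", ".*")
    return re.fullmatch(regex, url, flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    print(path_matches("*://*.txt", "http://server1/docs/readme.txt"))            # True
    print(path_matches("*://*.txt", "http://server1/docs/readme.docx"))           # False
    print(path_matches("http://server1/folder*", "http://server1/folder/a.aspx"))  # True
```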

In this article:

  • Crawling sites that use forms-based authentication

  • Create a crawl rule

  • Edit a crawl rule

  • Delete a crawl rule

  • Reorder crawl rules

Important

You must be a shared services administrator to perform the following procedures. For more information, see Plan for security roles (Office SharePoint Server).

Crawling sites that use forms-based authentication

Note

The information in this section applies only to server farms with the Infrastructure Update for Microsoft Office Servers installed. For more information, see Description of the Microsoft Office Servers Infrastructure Update (https://go.microsoft.com/fwlink/?LinkID=121886).

Office SharePoint Server 2007 supports crawling sites that use forms-based authentication (FBA) when the logon form is submitted by using an input element of type Submit. Office SharePoint Server 2007 does not support crawling content on sites whose logon pages contain a series of forms that span multiple pages (wizard-based forms), or forms that use dynamic content rendered by using AJAX, JavaScript, or other dynamic scripting methods. Sites that use the following types of logon forms are not supported:

  • Wizard-style logon pages   Office SharePoint Server 2007 does not crawl sites that use a series of screens to authenticate users. These wizard-style forms present one or more pages based on the user's input in a form on a previous page. Because Office SharePoint Server 2007 cannot crawl multiple logon pages, the creation of a crawl rule for a site that uses this type of authentication is not supported.

  • Logon forms that change dynamically   Office SharePoint Server 2007 does not crawl sites whose logon pages change dynamically because they are built with technologies such as AJAX. A logon screen that uses AJAX can present new options to a user without a visible postback; that is, scripting displays new data without the browser refreshing the page. For example, a user might type a password on such a logon page and then be presented with a new form that asks a security question, without seeing the page refresh. The creation of a crawl rule for a site that uses this type of design is not supported.

Create a crawl rule

Use the following procedure to create a crawl rule that specifies a different content access account or authentication method to use when crawling a particular path.

To create a crawl rule

  1. Complete one of the following steps, depending on whether the Infrastructure Update for Microsoft Office Servers is installed.

    • If the Infrastructure Update for Microsoft Office Servers is installed, in Central Administration, on the Quick Launch, in the Shared Services Administration group, click a shared service.

      On the Shared Services Administration page, in the Search section, click Search administration.

      On the Search Administration page, on the Quick Launch, in the Crawling section, click Crawl rules.

    • If the Infrastructure Update for Microsoft Office Servers is not installed, in Central Administration, on the Quick Launch, in the Shared Services Administration group, click a shared service.

      On the Shared Services Administration page, in the Search section, click Search settings.

      On the Configure Search Settings page, in the Crawl Settings section, click Crawl rules.

  2. On the Manage Crawl Rules page, click New Crawl Rule.

  3. On the Add Crawl Rule page, in the Path section, in the Path box, type the path affected by this rule. You can use standard wildcard characters in the path. For example, you can type:

    • http://server1/folder* to include all Web resources with a URL that starts with http://server1/folder.

    • *://*.txt to include every document with the .txt file name extension.

  4. In the Crawl Configuration section, to prevent a folder or subsite in the path from being crawled, click Exclude all items in this path.

  5. To include items in the path, click Include all items in this path, and then select any combination of the following check boxes:

    • Follow links on the URL without crawling the URL itself

      Select this check box if you want the links on the logon page to be crawled, but you do not want the text on the logon page to be indexed.

    • Crawl complex URLs (URLs that contain a question mark (?))

      Select this check box if you want to crawl URLs that use parameters to display additional content.

    • Crawl SharePoint content as Http pages

      Normally, content on SharePoint sites is crawled by using a special protocol. Select this check box if you want content on SharePoint sites to be crawled as HTTP pages instead.

    Note

    When the content is crawled by using the HTTP protocol, the item permissions are not stored.

  6. In the Specify Authentication section, do one of the following:

    Note

    To select any of the options in this section, make sure that you clicked Include all items in this path in the Crawl Configuration section.

    • To use the default content access account when crawling URLs affected by this crawl rule, select Use the default content access account.

    • If you want to use a different content access account, select Specify a different content access account, and then do the following:

      In the Account box, type the name of an account that can access the paths defined by this crawl rule, for example, user_name or DOMAIN\user_name.

      In the Password and Confirm Password boxes, type the password for the account.

      If you want to prevent Basic authentication from being used, select the Do not allow Basic Authentication check box. Otherwise, if you want to use Basic authentication, clear the Do not allow Basic Authentication check box.

      Note

      You cannot use Basic authentication if the domain account that you specify as the content access account for this crawl rule is from a different domain than your server farm.

    • To use a client certificate for authentication, select Specify client certificate, and then on the Certificate menu, click a certificate.

    • To use forms-based authentication, click Specify forms credentials, type the form location in the Form URL box, and then click Enter Credentials. Note that this option is available only if the Infrastructure Update for Microsoft Office Servers has been installed on your server farm.

      Note

      Office SharePoint Server 2007 with the Infrastructure Update for Microsoft Office Servers supports crawling sites that use forms-based authentication (FBA) when the logon form is submitted by using an input element of type Submit. Office SharePoint Server 2007 with the Infrastructure Update for Microsoft Office Servers does not support crawling content on sites whose logon pages contain a series of forms that span multiple pages (wizard-based forms) or forms that use dynamic content rendered by using AJAX, JavaScript, or other dynamic scripting methods.

    • To use cookie authentication, click Use cookie for crawling, and then do one of the following:

      Note

      This option is available in Office SharePoint Server 2007 only if the Infrastructure Update for Microsoft Office Servers is installed.

      • To obtain a cookie from a URL, type the full location in the Obtain cookie from a URL box, and then click Get Cookie.

      • To select a specific cookie from your computer or your network, click Specify cookie for crawling, click Browse, and then select the cookie to be used.

      • To specify the error pages that are displayed when a cookie has expired, in the Error Pages (semi-colon delimited) box, type the URLs of the pages, separated by semicolons.

  7. Click OK.
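The Error Pages (semi-colon delimited) box in step 6 takes a single string of URLs separated by semicolons. The following Python sketch shows one way to split such a value. The function names and the contoso URLs are hypothetical, and the check for landing on an error page is an assumption about how an expired cookie might be detected, not a description of the crawler's internals.

```python
from typing import List


def parse_error_pages(box_value: str) -> List[str]:
    """Split the Error Pages box value into individual URLs."""
    # Entries are separated by semicolons; drop surrounding spaces and empty entries.
    return [entry.strip() for entry in box_value.split(";") if entry.strip()]


def landed_on_error_page(response_url: str, error_pages: List[str]) -> bool:
    """Hypothetical check: treat reaching a listed error page as an expired cookie."""
    # Compare case-insensitively; otherwise URLs must match exactly.
    return response_url.casefold() in {page.casefold() for page in error_pages}


if __name__ == "__main__":
    pages = parse_error_pages("http://contoso/login.aspx; http://contoso/expired.aspx;")
    print(pages)  # ['http://contoso/login.aspx', 'http://contoso/expired.aspx']
    print(landed_on_error_page("http://contoso/Expired.aspx", pages))  # True
```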

Edit a crawl rule

You can edit an existing crawl rule at any time by going to the Manage Crawl Rules page, clicking the crawl rule, and then making the necessary changes to the path and configuration, as described in the previous procedure.

Delete a crawl rule

Use the following procedure to delete a crawl rule that you no longer need.

To delete a crawl rule

  1. Complete one of the following steps, depending on whether the Infrastructure Update for Microsoft Office Servers is installed.

    • If the Infrastructure Update for Microsoft Office Servers is installed, in Central Administration, on the Quick Launch, in the Shared Services Administration group, click a shared service.

      On the Shared Services Administration page, in the Search section, click Search administration.

      On the Search Administration page, on the Quick Launch, in the Crawling section, click Crawl rules.

    • If the Infrastructure Update for Microsoft Office Servers is not installed, in Central Administration, on the Quick Launch, in the Shared Services Administration group, click a shared service.

      On the Shared Services Administration page, in the Search section, click Search settings.

      On the Configure Search Settings page, in the Crawl Settings section, click Crawl rules.

  2. On the Manage Crawl Rules page, point to the crawl rule that you want to delete, click the arrow that appears, and then click Delete on the menu that appears.

  3. Click OK to confirm the deletion.

Reorder crawl rules

After you create new crawl rules or edit existing ones, we recommend that you specify the order in which you want the rules to be applied when content is crawled. Crawl rules are applied in the order in which they are listed. Therefore, if two rules cover the same or overlapping content, the first rule that is listed is applied. Use the following procedure to specify the order of your crawl rules.
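The first-match behavior can be illustrated with a minimal Python sketch. The CrawlRule class, its fields, and the helper functions are hypothetical, and the wildcard handling assumes that * matches any run of characters and that matching ignores case. Reversing the order of the two example rules changes which rule applies to the same URL.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrawlRule:
    """Hypothetical stand-in for a crawl rule: a path plus include/exclude."""
    path: str
    include: bool  # True = Include all items in this path; False = Exclude


def path_matches(rule_path: str, url: str) -> bool:
    # Assumption: '*' matches any characters, everything else is literal, case is ignored.
    regex = re.escape(rule_path).replace(r"\*", ".*")
    return re.fullmatch(regex, url, flags=re.IGNORECASE) is not None


def applicable_rule(rules: List[CrawlRule], url: str) -> Optional[CrawlRule]:
    # Walk the rules in their listed order and return the first one whose path matches.
    for rule in rules:
        if path_matches(rule.path, url):
            return rule
    return None  # No rule covers this URL.


if __name__ == "__main__":
    url = "http://server1/folder/private/report.aspx"
    rules = [
        CrawlRule("http://server1/folder/private*", include=False),
        CrawlRule("http://server1/folder*", include=True),
    ]
    print(applicable_rule(rules, url).include)                  # False: exclude rule listed first
    print(applicable_rule(list(reversed(rules)), url).include)  # True: include rule now first
```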

To reorder crawl rules

  1. Complete one of the following steps, depending on whether the Infrastructure Update for Microsoft Office Servers is installed.

    • If the Infrastructure Update for Microsoft Office Servers is installed, in Central Administration, on the Quick Launch, in the Shared Services Administration group, click a shared service.

      On the Shared Services Administration page, in the Search section, click Search administration.

      On the Search Administration page, on the Quick Launch, in the Crawling section, click Crawl rules.

    • If the Infrastructure Update for Microsoft Office Servers is not installed, in Central Administration, on the Quick Launch, in the Shared Services Administration group, click a shared service.

      On the Shared Services Administration page, in the Search section, click Search settings.

      On the Configure Search Settings page, in the Crawl Settings section, click Crawl rules.

  2. On the Manage Crawl Rules page, in the Order column of the list of crawl rules, select the position that you want the rule to occupy. The other rules shift position accordingly.
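Choosing a new value in the Order column can be modeled as removing a rule from its current position and inserting it at the new one, with the rules in between moving up or down by one. The following is a minimal Python sketch of that shifting, assuming 1-based positions as shown in the Order column; the function name move_rule and the rule names are hypothetical.

```python
from typing import List


def move_rule(rules: List[str], current: int, new: int) -> List[str]:
    """Return a copy of rules with the rule at position current moved to position new.

    Positions are 1-based, matching the assumed numbering in the Order column.
    """
    if not (1 <= current <= len(rules) and 1 <= new <= len(rules)):
        raise ValueError("positions must be between 1 and the number of rules")
    reordered = list(rules)            # leave the caller's list untouched
    rule = reordered.pop(current - 1)  # take the rule out of its old slot
    reordered.insert(new - 1, rule)    # put it in its new slot; rules in between shift by one
    return reordered


if __name__ == "__main__":
    rules = ["Rule A", "Rule B", "Rule C", "Rule D"]
    print(move_rule(rules, 4, 1))  # ['Rule D', 'Rule A', 'Rule B', 'Rule C']
    print(move_rule(rules, 1, 3))  # ['Rule B', 'Rule C', 'Rule A', 'Rule D']
```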