How to crawl content (Search Server 2008)
Updated: April 16, 2009
Applies To: Microsoft Search Server 2008
Unless otherwise noted, the information in this article applies to both Microsoft Search Server 2008 and Microsoft Search Server 2008 Express.
Before end users can use the enterprise search functionality in Microsoft Search Server 2008 to search for content, you must first crawl the content that you want to make available for them to query. For the purposes of this article, content is any item that can be crawled, such as a Web page, a Microsoft Office Word document, or a SharePoint site.
This article describes the basic steps for getting started with crawling content and links to articles that provide more information and procedures.
Create a content source
A content source defines the type of repository that contains the content you want to crawl, the start addresses at which crawling begins, the crawl behavior, and the crawl schedule. For information about creating a content source, see About content sources (Search Server 2008) and Add a content source to crawl SharePoint sites, Web sites, file shares, or Microsoft Exchange Server public folders (Search Server 2008).
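The four parts of a content source can be pictured as a small data structure. The following Python sketch is a hypothetical model for illustration only. ContentSource, CrawlSchedule, ContentSourceType, and all of their fields are assumptions, not the Search Server 2008 object model; in the product, you define content sources on the Manage Content Sources page.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ContentSourceType(Enum):
    # Repository types named in this article; the labels are illustrative only.
    SHAREPOINT_SITES = "SharePoint sites"
    WEB_SITES = "Web sites"
    FILE_SHARES = "File shares"
    EXCHANGE_PUBLIC_FOLDERS = "Exchange public folders"


@dataclass
class CrawlSchedule:
    # Hypothetical schedule model: interval in hours for each crawl type.
    # None means that crawl type is not scheduled.
    full_crawl_every_hours: Optional[int] = None
    incremental_crawl_every_hours: Optional[int] = None


@dataclass
class ContentSource:
    """Hypothetical model of the four parts of a content source."""
    name: str
    source_type: ContentSourceType           # type of repository
    start_addresses: List[str]               # where crawling begins
    stay_on_start_server: bool = True        # crawl behavior (assumed field)
    schedule: CrawlSchedule = field(default_factory=CrawlSchedule)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the definition looks complete."""
        problems = []
        if not self.start_addresses:
            problems.append("at least one start address is required")
        if len(set(self.start_addresses)) != len(self.start_addresses):
            problems.append("duplicate start addresses")
        return problems


# Usage: a small test source with a weekly full crawl and a daily incremental crawl.
intranet = ContentSource(
    name="Intranet test",  # hypothetical name and address
    source_type=ContentSourceType.WEB_SITES,
    start_addresses=["http://intranet.example.com/"],
    schedule=CrawlSchedule(full_crawl_every_hours=168, incremental_crawl_every_hours=24),
)
print(intranet.validate())  # → []
```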
Specify the credentials to use when crawling all URLs or a specific range of URLs
By default, Search Server 2008 uses the default content access account, which is a Windows domain user account, to crawl the content repositories that are defined by content sources. For a specific URL or range of URLs, you can use a crawl rule to specify different credentials: a different content access account, a client certificate, forms credentials, or a cookie. For information about setting the default content access account, see Change the default content access account (Search Server 2008). For information about using a crawl rule, see Use crawl rules to determine what content gets crawled (Search Server 2008).
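The way a crawl rule overrides the default content access account can be sketched as an ordered lookup. This Python sketch assumes that rules are checked in order, that the first matching rule wins, and that `*` in a rule path is a wildcard; CrawlRule, credential_for, and the credential labels are hypothetical names, not part of Search Server 2008. Credentials are represented by label strings only, never by secrets.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional

# Hypothetical label for the default content access account.
DEFAULT_ACCOUNT = "DOMAIN\\crawl-default"


@dataclass
class CrawlRule:
    path_pattern: str                  # e.g. "http://partner.example.com/*"
    include: bool = True               # False excludes matching URLs from the crawl
    credential: Optional[str] = None   # None means "use the default content access account"


def credential_for(url: str, rules: List[CrawlRule]) -> Optional[str]:
    """Return the credential label to crawl `url` with, or None if the URL is excluded.

    Assumes rules are evaluated in order and the first matching rule applies.
    Matching is case-insensitive; fnmatch's '*' also matches '/'.
    """
    for rule in rules:
        if fnmatchcase(url.lower(), rule.path_pattern.lower()):
            if not rule.include:
                return None
            return rule.credential or DEFAULT_ACCOUNT
    # No rule matched: fall back to the default content access account.
    return DEFAULT_ACCOUNT


# Usage with two hypothetical rules.
rules = [
    CrawlRule("http://partner.example.com/*", credential="partner-client-certificate"),
    CrawlRule("http://intranet.example.com/archive/*", include=False),
]
print(credential_for("http://partner.example.com/docs/a.htm", rules))     # → partner-client-certificate
print(credential_for("http://intranet.example.com/archive/old.htm", rules))  # → None
```

Because the first match wins under this assumption, more specific rules should be placed before broader ones.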
Configure proxy server settings for search
When you crawl content that is hosted outside your network, you probably need a proxy server to reach the host server. In that case, verify the proxy server settings and configure them in Search Server 2008: on the Search Administration page, under Crawling, click Proxy and timeouts. You typically need to configure these settings only once.
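The decision the crawler has to make for each URL (connect directly or go through the proxy) can be sketched as follows. This is a hypothetical Python model: ProxySettings, proxy_for, and the bypass options are assumptions about the kind of settings a proxy page offers, not the exact options on the Proxy and timeouts page. The bypass list is matched against the host name, and dotless host names are treated as local intranet hosts, a common convention.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlsplit


@dataclass
class ProxySettings:
    # Hypothetical model of proxy options; field names are assumptions.
    address: Optional[str] = None      # e.g. "proxy.example.com"; None = no proxy
    port: int = 8080
    bypass_local: bool = True          # connect directly to dotless (intranet) host names
    bypass_prefixes: List[str] = field(default_factory=list)  # host prefixes to reach directly


def proxy_for(url: str, settings: ProxySettings) -> Optional[str]:
    """Return "host:port" of the proxy to use for `url`, or None to connect directly."""
    if settings.address is None:
        return None
    host = urlsplit(url).hostname or ""  # hostname is lowercased by urlsplit
    if settings.bypass_local and "." not in host:
        return None
    if any(host.startswith(prefix) for prefix in settings.bypass_prefixes):
        return None
    return f"{settings.address}:{settings.port}"


# Usage: external sites go through the proxy; intranet hosts do not.
settings = ProxySettings(address="proxy.example.com", port=8080, bypass_prefixes=["10."])
print(proxy_for("http://www.example.org/", settings))  # → proxy.example.com:8080
print(proxy_for("http://intranet/", settings))          # → None
```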
Start a full crawl
Begin by crawling a small amount of content from a single content source to test your configuration. After that content is crawled successfully, expand the scope of the crawl to build your index. For information about starting a full crawl, see Start a full crawl (Search Server 2008).
View the crawl log
During the crawl, we recommend that you view the crawl log to check its progress. The log lets you confirm that the crawl is succeeding or detect problems; common problems include failed authorization and unreachable hosts. When you see problems in the crawl log, you can stop the crawl, adjust the settings on the Manage Content Sources, Manage Crawl Rules, and Manage Farm-Level Search Settings pages, and then run the crawl again. If you encounter problems with federated locations, see Repair federated locations (Search Server 2008).
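When a crawl log contains many errors, grouping them by cause shows which settings page to revisit first. The following Python sketch assumes you already have the error entries as (URL, message) pairs, for example copied from the crawl log; the triage function and the keyword lists are hypothetical, and real crawl log messages may be worded differently.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Assumed keywords for the two common problems named above.
CATEGORIES: Dict[str, List[str]] = {
    "authentication": ["access is denied", "401", "unauthorized"],
    "unreachable host": ["could not be resolved", "unreachable", "timed out"],
}


def triage(entries: Iterable[Tuple[str, str]]) -> Counter:
    """Count (url, message) error entries by problem category; unmatched ones count as "other"."""
    counts: Counter = Counter()
    for _url, message in entries:
        text = message.lower()
        for category, keywords in CATEGORIES.items():
            if any(keyword in text for keyword in keywords):
                counts[category] += 1
                break
        else:
            counts["other"] += 1
    return counts


# Usage with hypothetical log entries.
log = [
    ("http://partner.example.com/a", "Access is denied."),
    ("http://gone.example.com/", "The host could not be resolved."),
    ("http://intranet.example.com/x", "Some other error."),
]
print(dict(triage(log)))  # → {'authentication': 1, 'unreachable host': 1, 'other': 1}
```

Authentication errors usually point to the content access account or a crawl rule, and unreachable hosts point to the start addresses or proxy settings.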