Manage crawl load (SharePoint Server 2010)
Published: November 1, 2011
Summary: Reduce crawler impact on a SharePoint farm by directing traffic to a dedicated Web server and by using Resource Governor to limit CPU usage.
Microsoft SharePoint Server 2010 supports dedicated crawl load management. Crawling is a resource-intensive process that can overload a SharePoint farm. You can manage the load by scheduling crawls for times when the farm is not heavily used, and also by configuring the system to crawl as described in this article. Crawl load management can help resolve and prevent performance issues that arise when users and the crawler access a SharePoint farm at the same time. These issues are most common in large environments, in environments that have large volumes of user requests, and in environments where crawls occur frequently.
By default, the SharePoint Server 2010 crawler crawls all available Web front-end computers in a SharePoint farm through the network load balancer in that farm. Therefore, when a crawl is occurring, the crawler can cause increased network traffic, increased usage of hard disk and processor resources on Web front-end computers, and increased usage of resources on database servers. Putting this additional load on all Web front-end computers at the same time can decrease performance across the SharePoint farm.
This decrease in performance occurs only on the SharePoint farm that is serving user requests, and not on the SharePoint search farm. It can cause delayed response times on the Web front-end computers and for the overall farm. The cause might not be apparent from specific logs, resource counters, or standard monitoring.
You can reduce the effect of crawling on SharePoint performance by doing the following:
Redirect all crawl traffic to a single SharePoint Web front-end computer in a small environment or a specific group of computers in a large environment. This prevents the crawler from using the same resources that are being used to render and serve Web pages and content to active users.
Limit search database usage in Microsoft SQL Server 2008 R2, SQL Server 2008 with Service Pack 1 (SP1) and Cumulative Update 2, and SQL Server 2005 with SP3 and Cumulative Update 3. This prevents the crawler from consuming shared SQL Server disk and processor resources during a crawl.
In Microsoft Office SharePoint Server 2007, you could use Central Administration to redirect crawler traffic to a dedicated Web front-end server. However, in Microsoft SharePoint Server 2010, you must use Windows PowerShell to redirect crawler traffic.
Redirect crawler traffic to a dedicated Web front-end server
This procedure redirects crawler traffic to a dedicated Web front-end server. Before performing this procedure, make sure that the server is removed from network load balancing.
The dedicated Web front-end computer must be online for successful crawls to occur. If the dedicated Web front-end server goes offline, crawling is not automatically redirected to another computer, and the crawl fails after 10 minutes. To prevent this, you can configure multiple dedicated Web front-end computers as crawl targets.
To configure a dedicated Web front-end server to be a crawl target
Verify that you meet the following minimum requirements: See Add-SPShellAdmin. Also verify that the user account that is performing this procedure is a member of the Farm Administrators group.
At the Windows PowerShell command prompt, run the script in the following example:
$listOfUri = New-Object System.Collections.Generic.List[System.Uri](1)
$zoneUrl = [Microsoft.SharePoint.Administration.SPUrlZone]'Default'
$webAppUrl = "<Default Zone FQDN URL>"
$webApp = Get-SPWebApplication -Identity $webAppUrl
$webApp.SiteDataServers.Remove($zoneUrl) ## By default this has no items to remove
$URLOfDedicatedMachine = New-Object System.Uri("<Dedicated crawl target URL>")
$listOfUri.Add($URLOfDedicatedMachine)
$webApp.SiteDataServers.Add($zoneUrl, $listOfUri)
$webApp.Update()
Verify that the Web front-end server is configured for crawling by running the following script at the Windows PowerShell command prompt:
$WebApplication = Get-SPWebApplication <Web application URL>
$WebApplication | fl SiteDataServers
If this returns any values, the Web application uses a dedicated Web front-end server.
When a Web front-end server is dedicated for search crawls, you can remove the throttling configuration that would otherwise limit the load the server accepts from requests and services. You can remove throttling from a server by running the following script at the Windows PowerShell command prompt:
$svc = [Microsoft.SharePoint.Administration.SPWebServiceInstance]::LocalContent
$svc.DisableLocalHttpThrottling = $true
$svc.Update()
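As noted before the procedure, a single crawl target is a single point of failure for crawling. The following is a minimal sketch of how the same script might register more than one dedicated Web front-end server for the Default zone. The two target URLs are placeholders, and the sketch assumes that the SharePoint 2010 Management Shell is in use and that each target has already been removed from network load balancing.

```powershell
# Sketch: register several dedicated crawl targets for the Default zone.
# Replace every <...> placeholder before running; the URLs are illustrative.
$zoneUrl   = [Microsoft.SharePoint.Administration.SPUrlZone]'Default'
$webApp    = Get-SPWebApplication -Identity "<Default Zone FQDN URL>"
$listOfUri = New-Object 'System.Collections.Generic.List[System.Uri]'

# Add one System.Uri per dedicated Web front-end server.
foreach ($target in "<First crawl target URL>", "<Second crawl target URL>") {
    $listOfUri.Add((New-Object System.Uri($target)))
}

# Clear any existing mapping for the zone, then store the new list.
$webApp.SiteDataServers.Remove($zoneUrl) | Out-Null
$webApp.SiteDataServers.Add($zoneUrl, $listOfUri)
$webApp.Update()
```

The only change from the single-target script is that the list holds more than one URI, so the rollback procedure that follows applies unchanged.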
To reset a dedicated Web front-end server
If you have to roll back this change so that all Web front-end servers are crawled, you can run the following script at the Windows PowerShell command prompt:
$zoneUrl = [Microsoft.SharePoint.Administration.SPUrlZone]'Default'
$webAppUrl = "<Your Default Zone FQDN URL>"
$webApp = Get-SPWebApplication -Identity $webAppUrl
$webApp.SiteDataServers.Remove($zoneUrl)
$webApp.Update()
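To confirm the rollback, it may help to list what remains in SiteDataServers. The sketch below assumes the same Web application placeholder as the rollback script and treats an empty collection as meaning that no dedicated crawl target is configured.

```powershell
# Sketch: report the crawl targets still configured for each zone.
$webApp = Get-SPWebApplication -Identity "<Your Default Zone FQDN URL>"

if ($webApp.SiteDataServers.Count -eq 0) {
    Write-Host "No dedicated crawl targets are configured."
} else {
    # Each entry maps a zone (Key) to a list of System.Uri objects (Value).
    foreach ($entry in $webApp.SiteDataServers.GetEnumerator()) {
        Write-Host ("Zone {0}: {1}" -f $entry.Key, ($entry.Value -join ", "))
    }
}
```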
Limit search database usage with Resource Governor
Resource Governor is a technology introduced in SQL Server 2008 that enables you to manage SQL Server workloads and resources by specifying limits on resource consumption by incoming requests. Resource Governor enables you to differentiate workloads and allocate CPU and memory as they are requested, based on the limits that you specify. It is available only in SQL Server 2008 or SQL Server 2008 R2 Enterprise edition. For more information about using Resource Governor, see Managing SQL Server Workloads with Resource Governor (http://go.microsoft.com/fwlink/p/?linkid=129385).
We recommend that you use Resource Governor with SharePoint Server 2010 to do the following:
Limit the amount of SQL Server resources that the Web servers targeted by the search crawl component consume. As a best practice, we recommend limiting the crawl component to 10 percent CPU when the system is under load. For more information, see the procedure To configure Resource Governor for limiting CPU usage (Transact-SQL) in How to: Use Resource Governor to Limit CPU Usage by Backup Compression (Transact-SQL).
Monitor how many resources are consumed by each database in the system. For example, you can use Resource Governor to help you determine the best placement of databases among computers that are running SQL Server. For more information, see Storage and SQL Server capacity planning and configuration (SharePoint Server 2010).
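The 10 percent CPU recommendation above could be applied along the following lines. This is a sketch under stated assumptions, not the procedure from the linked article. It assumes that the SQL Server PowerShell snap-in that provides Invoke-Sqlcmd is loaded, that the account running it is a SQL Server sysadmin, and that crawl-related connections can be identified by the host name of the dedicated crawl target. The names CrawlPool, CrawlGroup, and fnCrawlClassifier are hypothetical.

```powershell
# Sketch: cap CPU for connections coming from the dedicated crawl target.
# All object names and <...> placeholders are illustrative assumptions.
$tsql = @'
USE master;
GO
-- MAX_CPU_PERCENT is enforced only when there is CPU contention.
CREATE RESOURCE POOL CrawlPool WITH (MAX_CPU_PERCENT = 10);
GO
CREATE WORKLOAD GROUP CrawlGroup USING CrawlPool;
GO
-- The classifier function must live in master and be schema-bound.
CREATE FUNCTION dbo.fnCrawlClassifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @group sysname = N'default';
    -- Route sessions opened from the crawl target into the limited group.
    IF HOST_NAME() = N'<Crawl target host name>'
        SET @group = N'CrawlGroup';
    RETURN @group;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnCrawlClassifier);
GO
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO
'@

Invoke-Sqlcmd -ServerInstance "<SQL Server instance>" -Query $tsql
```

Because Resource Governor classifies sessions when they connect, sessions that are already open are not expected to move into the new group until they reconnect. Test the classifier on a non-production instance first, because an error in the function affects every new connection.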