SharePoint Diagnostic Studio 2010 (SPDiag 3.0) (SharePoint Server 2010)

 

Applies to: SharePoint Server 2010

Microsoft SharePoint Diagnostic Studio 2010 (SPDiag version 3.0) was created to simplify and standardize troubleshooting of Microsoft SharePoint 2010 Products, and to provide a unified view of collected data. Administrators of SharePoint 2010 Products can use SPDiag 3.0 to gather relevant information from a farm, display the results in a meaningful way, identify performance issues, and share or export the collected data and reports for analysis by Microsoft support personnel.

The SharePoint 2010 Products platform is highly complex and can be used in a wide variety of ways. Deploying, managing, and troubleshooting SharePoint 2010 Products requires extensive knowledge spanning multiple technology areas, including security, networking, Web technologies such as ASPX, and Microsoft SQL Server.

Traditionally, troubleshooting SharePoint 2010 Products involves manually collecting a wide array of data from servers in the affected farm, and then manually analyzing the data to determine the source of the problem. This process can be complex and time-consuming, and data collection itself can place a significant load on the servers.

SPDiag greatly simplifies the troubleshooting process by providing a single interface for data collection and presentation in a series of preconfigured reports that cover a wide range of data points commonly used to diagnose SharePoint performance and capacity-related issues. Although most common troubleshooting scenarios are addressed by SPDiag, some SharePoint issues might require analysis of additional data not collected by SPDiag.

In this article:

  • What's new in SPDiag 3.0

  • Installing and configuring SPDiag 3.0

  • Using SPDiag 3.0

  • Known issues

What's new in SPDiag 3.0

The SharePoint Diagnostic Studio 2010 (SPDiag version 3.0) contains several important updates and new features that increase its effectiveness as a troubleshooting tool. SPDiag 3.0 is a new version and might not include some functionality present in previous versions of SPDiag.

The following is a list of new features and changes in SPDiag 3.0:

  • Preconfigured reports   SPDiag provides a selection of preconfigured reports that aggregate data from the SharePoint farm and present useful views into common SharePoint troubleshooting scenarios. For more information, see Using preconfigured reports later in this article.

  • Snapshots   You can take snapshots of your farm that aggregate report images, farm topology information, Unified Logging Service (ULS) logs, and usage database data. This makes it easy to consolidate key troubleshooting information about a SharePoint farm and share it with other users or preserve it for comparison and trend analysis.

  • Improved integration with SharePoint Server   Enhanced data collection from more sources.

Installing and configuring SPDiag 3.0

SPDiag is included in the Microsoft SharePoint 2010 Administration Toolkit v2. To download the toolkit, see SharePoint 2010 Administration Toolkit (SharePoint Server 2010).

You can install SPDiag on a farm server, or on a remote computer that is not part of the farm. You must be logged in with a user account that has farm administration privileges to create a new project or access an existing project.

Some SPDiag 3.0 diagnostics jobs require that the farm account has the sysadmin or sqladmin role assigned on the SQL Server instance where the SharePoint 2010 Products databases are located.

To install SPDiag 3.0, select SharePoint Diagnostic Studio from the SharePoint 2010 Administration Toolkit v2 component installation menu. Then, use the following procedure to configure the client computer and the SharePoint Server farm for use with SPDiag.

To configure the client computer and the SharePoint Server farm for use with SPDiag

  1. On the computer on which you are installing SPDiag, install the .NET Framework 3.5.

  2. Optionally, install the latest SharePoint Server 2010 service pack or cumulative update (CU) on all farm servers to ensure that the latest performance upgrades have been installed. The August 2010 SharePoint Server 2010 CU, in particular, includes usage database updates that might considerably improve the performance of certain SPDiag reports.

  3. On the computer on which you are installing SPDiag, install the Microsoft Chart Controls for the Microsoft .NET Framework 3.5.

  4. If you are installing SPDiag on a remote client computer, you must enable Windows PowerShell remoting and the RemoteSigned execution policy on the farm server to which you will connect SPDiag.

    Important

    The RemoteSigned execution policy must be enabled on the farm server to which you will connect SPDiag even when you are installing SPDiag on the farm server.

    In Windows PowerShell on the target server, run the following cmdlets, and enter Yes when prompted:

    1. Enable-PSRemoting -force

    2. Enable-WSManCredSSP -role Server -force

    3. Set-Item WSMan:\localhost\Shell\MaxMemoryPerShellMB 1000

    4. Set-ExecutionPolicy RemoteSigned

  5. If you are installing SPDiag on a remote client computer, enable Windows PowerShell remoting on the client computer. In Windows PowerShell on the client computer, run the following cmdlets, and enter Yes when prompted:

    1. Enable-PSRemoting -force

    2. Enable-WSManCredSSP -role Client -DelegateComputer "<target_computer>" -force

      Note

      The value for <target_computer> should be the host name of the SharePoint Server Web server to which you want to connect.

  6. Make sure that Usage and Health Data Collection has been configured on the target farm. The SPDiag diagnostic provider collects data from the usage database. If the usage database has not been provisioned prior to using SPDiag, you will see the error “Usage Database is not provisioned. Please provision it first.”

For information about how to configure usage and health data collection, see Configure usage and health data collection (SharePoint Server 2010).

Using SPDiag 3.0

SPDiag 3.0 is a diagnostic tool that is used to collect, filter, and display data from a SharePoint farm for troubleshooting. SPDiag is a read-only tool, and cannot make any changes to a farm. You can use SPDiag to help you discover problems yourself, or as a way to collect the data needed by support personnel to help troubleshoot a farm.

The information in this section will help you understand how to create and import projects, filter and collect data, generate graphs and reports, and export data to a file.

SPDiag collects and aggregates data from ULS logs, Windows Event logs, performance counters, SharePoint logs, and SQL databases and then displays that data in various preconfigured reports that were designed to expose specific capacity and performance characteristics and trends.

In this section:

  • Working with projects

  • The SPDiag user interface

  • Using preconfigured reports

Working with projects

An SPDiag project consists of a collection of data from SharePoint Server, Internet Information Services (IIS), ULS and event logs, and performance counter log data from farm servers. Project metadata is stored in a .ttfarm file on the local computer. A project can be saved indefinitely, and data in the project can be exported in several ways for archival or to share with others.

Creating a new project

Before you can begin using SPDiag to troubleshoot a farm, you must create a new project. When you create a project, a .ttfarm file is created on the computer where SPDiag is installed, and several tables are created in the farm’s usage database.

Create a new project

  1. In the SPDiag application window, click New Project.

  2. In the Create Project dialog box, type the host name of the server to which you want to connect, and then click Create Project.

    Tip

    In certain environments, the connection might fail if you do not use the FQDN of the target server.

  3. In the Windows PowerShell Credential Request window, type a user account and password with farm administrator privileges on the target SharePoint Server farm, and then click OK.

  4. The new project is created, and the overview window appears in the main SPDiag window.

Opening a project

To open a project, you must have access to the .ttfarm file for the project. If the .ttfarm file was created on another computer running SPDiag, you should ensure that the computer you are using to open the project has been properly configured by using the guidance in the Installing and configuring SPDiag 3.0 section earlier in this article.

Also, you must be logged in to an account with farm administrator credentials, or enter an account when prompted.

Open a project

  1. In the SPDiag application window, click Open Project.

  2. In the Open dialog box, browse for the .ttfarm file that you want, select it, and then click Open.

The SPDiag user interface

The SPDiag application is divided into four main sections: menu bar, Guide pane, Reports pane, and Report Display pane, each of which is described in this section.

The menu bar appears at the top of the application window.

Menu bar

  • New Project   Creates a new SPDiag project.

  • Open Project   Opens an existing SPDiag project from a .ttfarm file.

  • Take Snapshot   Creates a snapshot of the farm that consists of PNG images of all open reports, a text document that contains information about the farm topology, and log files from the snapshot process. There are two kinds of snapshot available: Light and Full.

    • Light Snapshot   Exports the currently open reports and farm topology information.

    • Full Snapshot   Includes all Light Snapshot data, plus data from the ULS logs and SharePoint usage database for a specified time range. When Full Snapshot is selected, you can use the Start Time and End Time fields to specify the time range for ULS log and usage database data collection.

  • Search   If you are looking for a specific request and you know the correlation ID or user account from which the request originated, click this button to open the Search dialog box. In the Search dialog box, you can enter the correlation ID, user account, and estimated date and time of the request, which begins the search from that point.

  • Assign Permission   You can provision SharePoint farm permissions to a specific user account or group to enable access to SPDiag.

Guide pane

The Guide pane appears at the middle left of the application window. It displays information about each report, including a description of the displayed data, instructions about how to manipulate and filter the data, and troubleshooting guidance specific to the report when it is available. Some reports include guidance on how to identify problems and offer suggestions for resolving them.

Reports pane

The Reports pane appears at the lower-left corner of the application window.

The Reports pane is an expandable menu that contains all of the reports that you can view. Click each node to expand the section and show the contained reports, and double-click on a report to open it in the Report Display pane.

When you use the Save button on the report toolbar to save a report, it will appear in the Customs node at the bottom of the Reports pane.

For more information and a full list of available reports, see Using preconfigured reports later in this article.

Report Display pane

The Report Display pane occupies the main part of the application window. When a project is created or opened, the overview report is displayed.

Two primary reports are displayed as graphs in the overview report: Availability and Latency Percentiles.

  • Availability report   Charts the availability of the HTTP Web Service.

  • Latency Percentiles report   Indicates how much time it takes to render the fastest and most common requests.

As with all graphs in SPDiag, you can zoom into a specific time range by selecting the area on the graph by using the mouse. Doing this in the overview report opens the selected report in a new tab in the main SPDiag window that contains results from the selected area.

When you open a report by double-clicking it in the Reports pane, SPDiag collects the required data from the farm servers and displays it in a new tab in the main window. This display consists of three components: report toolbar, Filter pane, and Data Display pane.

Report toolbar

The report toolbar appears at the top of each open report tab.

The toolbar provides tools for manipulating report data, including tools to refresh, save, export, and change the displayed time scale of reports.

  • Refresh   Requests fresh data from the farm servers.

  • Save   Saves the current report to an XML file that has an SPR extension. Files are saved to the C:\Users\Administrator\Documents\SharePoint Diagnostic Studio\Custom Reports folder on the SPDiag client computer. These files can be opened in any XML or text editor.

  • Hour   Calibrates the data display to span the last hour. This button and the other three time calibration buttons automatically set the end time for the range to the current time.

  • 6 Hours   Calibrates the data display to span the last six hours.

  • 12 Hours   Calibrates the data display to span the last 12 hours.

  • Day   Calibrates the data display to span the last 24-hour period.

  • Open Log   If data from log files is included in a report, you can select the log from the Filter pane, and then click the Open Log button to display the contents of the raw log file.

  • Export   Exports the current report to a PNG image file in the C:\Users\Administrator\Documents\SharePoint Diagnostic Studio\Exported Reports\<date and time> folder on the SPDiag client computer. The name of the final folder is dynamically generated by the date and 24-hour time that the report was exported in the format year.month.day-hour.minute.second. For example, a report exported on March 31, 2011, at 6:11 PM and 22 seconds would be saved to a folder named 2011.3.31-18.11.22.
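
The folder-naming scheme described for Export can be illustrated with a short Python sketch. It only reproduces the documented example; the function name export_folder_name is hypothetical, and the assumption that no field is zero-padded is inferred from that single example (month 3 rather than 03), so SPDiag itself might behave differently for other values.

```python
from datetime import datetime


def export_folder_name(exported_at: datetime) -> str:
    """Build a folder name in the year.month.day-hour.minute.second format.

    Assumption: fields are not zero-padded, based only on the documented
    example 2011.3.31-18.11.22. The hour uses the 24-hour clock.
    """
    return (
        f"{exported_at.year}.{exported_at.month}.{exported_at.day}"
        f"-{exported_at.hour}.{exported_at.minute}.{exported_at.second}"
    )


# March 31, 2011, at 6:11:22 PM, the example from the text
print(export_folder_name(datetime(2011, 3, 31, 18, 11, 22)))  # 2011.3.31-18.11.22
```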

Filter pane

The Filter pane provides report-specific fields that enable you to filter report data and date and time ranges that you want to display. Click a field to change its value and update the report data.

Data Display pane

The Data Display pane shows graphs, charts, tables, and log file data for the report that is being displayed.

Some reports contain a table at the top half of the pane that contains a list of objects, and a separate pane at the bottom that shows details about objects that you select from the table. For example, in the Timer Jobs report, a list of timer jobs is displayed in the top half of the Data Display pane. Selecting a timer job from the list displays detailed ULS trace log information about that job in the lower half of the pane.

You can also right-click a field to filter the report on that value by using the QuickFilter feature. To update the report, right-click a field, and then select a function from the QuickFilter menu.

QuickFilter menu

  • =   Filter by the exact value.

  • <>   Filter by all values within the selected range.

  • >   Filter by all values greater than the selected value.

  • <   Filter by all values smaller than the selected value.

When the data for a specific record in a report indicates that a problem exists, a red circle with an exclamation point will appear at the beginning of the record. Hovering the mouse pointer over the red circle will cause a tooltip to display that contains information about the problem.

When a report contains a graphical display element, such as a chart or graph, you can select areas of the graph by using the mouse to zoom in on a specific time range. To zoom out to the original time range, right-click anywhere on the graph.

Note

Data collection can take a long time, especially when there is significant network latency between the SPDiag client computer and the farm servers, or during times when the farm is under heavy load. There is no way to cancel the process of collecting data and rendering the report after it has begun, and SPDiag will be unresponsive until this process is completed.

Using preconfigured reports

SPDiag provides various reports that present data from logs, SharePoint databases, and performance counters. Reports collect data from the SharePoint farm and display aggregated information that is focused on specific aspects of farm performance.

You can also begin your investigation by opening a report directly from the Reports pane, which appears below the Guide pane. These reports are described in the following paragraphs.

Base report group

The Base report group contains several reports that display information about key general performance indicators.

HTTP Requests

This report displays all HTTP requests across the farm. When you select a row from the top report, the full trace from the request is fetched and displayed in the bottom pane.

Right-click any cell to add a filter with the column name and value of your selection. Alternatively, you can filter these results by using the filter list at the top of the report. When using the LIKE operator, '%' is treated as a wildcard character.
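
As a rough illustration of how a LIKE filter with the '%' wildcard selects rows, the following Python sketch translates a pattern into a regular expression. It is not how SPDiag implements filtering; the helper names and sample URLs are hypothetical, only '%' is handled because it is the only wildcard the text mentions, and case-insensitive matching is an assumption.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE pattern into a compiled regular expression.

    '%' matches any run of characters (including none); every other
    character is matched literally. Case-insensitivity is an assumption.
    """
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile(".*".join(literal_parts), re.IGNORECASE | re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True when the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


# Hypothetical request URLs, for illustration only
urls = [
    "/sites/hr/default.aspx",
    "/sites/hr/_layouts/viewlsts.aspx",
    "/sites/it/default.aspx",
]
print([u for u in urls if like(u, "/sites/hr/%")])    # both /sites/hr/ entries
print([u for u in urls if like(u, "%default.aspx")])  # both default.aspx entries
```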

Click any column header to sort by that column. For example, to find the slowest requests, click the Duration column. Click the header again to reverse the results order.

Once you have customized the filter list, you can save the report so you do not have to generate it again. Saved reports can be found under the Customs node in the Reports pane at the lower-left side of the screen. The next time that you load the report, it will restore the saved sorts and filters, and apply them to the new data.

To save the current set of results to share or to view in a spreadsheet, click the Export button.

Windows Events

This report displays critical and SharePoint-related events from the Windows Event logs on all computers in the farm. Use this report to look for critical issues that occurred during the specified time frame.

Right-click any cell to add a filter with the column name and value of your selection. Alternatively, you can filter these results by using the filter list at the top of the report. When using the LIKE operator, '%' is treated as a wildcard character.

Click any column header to sort by that column.

Once you have customized the filter list, you can save the report so you do not have to generate it again. Saved reports can be found under the Customs node in the Reports pane at the lower-left side of the screen. The next time that you load the report, it will restore the saved sorts and filters, and apply them to the new data.

To save the current set of results to share or to view in a spreadsheet, click the Export button.

ULS Trace Issues

This report displays problems detected in the Unified Logging Service (ULS) trace logs. High-level traces that occur at the time of an issue might provide clues as to the cause. When you select a row from the top report, the full trace from the request or timer job is fetched and displayed in the bottom pane.

Right-click any cell to add a filter with the column name and value of your selection. Alternatively, you can filter these results by using the filter list at the top of the report. When using the LIKE operator, '%' is treated as a wildcard character.

Click any column header to sort by that column. For example, to find the slowest requests, click the Duration column. Click the header again to sort in the other direction.

Once you have customized the filter list, you can save the report so you do not have to generate it again. Saved reports can be found under the Customs node in the Reports pane at the lower-left side of the screen. The next time that you load the report, it will restore the saved sorts and filters, and apply them to the new data.

To save the current set of results to share or to view in a spreadsheet, click the Export button.

Timer Jobs

This report displays all Timer Job executions. When you select a row from the top report, the full trace from the timer job is fetched and displayed in the bottom pane.

Right-click any cell to add a filter with the column name and value of your selection. Alternatively, you can filter these results by using the filter list at the top of the report. When using the LIKE operator, '%' is treated as a wildcard character.

Click any column header to sort by that column. For example, to find the slowest jobs, click the Duration column. Click the header again to sort in the other direction.

Once you have customized the filter list, you can save the report so you do not have to generate it again. Saved reports can be found under the Customs node in the Reports pane at the lower-left side of the screen. The next time that you load the report, it will restore the saved sorts and filters, and apply them to the new data.

To save the current set of results to share or to view in a spreadsheet, click the Export button.

Performance Counters

This report shows key performance counter data over time for counters that are collected in the Usage database.

Filter by a specific Category, Counter, Instance, or Machine by using the custom filter controls. Select a category of interest, and the additional three filter controls (Counter, Instance, Machine) will be dynamically populated with relevant results. Once filtering selections have been made, click Refresh to regenerate the report. Place the pointer over a series in the chart to determine which Category, Counter, Instance, or Machine is related to the value.

You can use the Add-SPDiagnosticsPerformanceCounter Windows PowerShell cmdlet to add other performance counters to servers in the SharePoint farm. Any new counters that you add will automatically be included in the SharePoint Diagnostic Studio dataset.

Capacity report group

The Capacity report group contains several reports that display information about farm capacity indicators.

SQL Server Query IO Over Time

This report shows the input/output (I/O) of expensive stored procedures over time.

The top graph shows the five most expensive queries or stored procedures over time based on the SQL Dynamic Management views.

The bottom table contains detailed information about the expensive queries, including total I/O for the time period selected, average I/O per call, execution count, and CPU cost.

Take notice of any spikes or grouping of spikes and the time when they occurred. Such spikes can indicate an expensive stored procedure call or a bad execution plan.

Look for queries that have high values in the Execution Count and Total IO columns paired with low values in the Average IO column. Such queries might be called too many times.
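
The heuristic in the preceding paragraph can be sketched in Python as follows. This is only an illustration: the QueryStats type, the sample procedure names, and all three threshold values are hypothetical, because the report does not define what counts as "high" or "low".

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    name: str
    execution_count: int
    total_io: int  # total I/O for the selected time period

    @property
    def average_io(self) -> float:
        """Average I/O per call (total I/O divided by execution count)."""
        if self.execution_count == 0:
            return 0.0
        return self.total_io / self.execution_count


def called_too_often(
    q: QueryStats,
    min_executions: int = 10_000,   # hypothetical "high" execution count
    min_total_io: int = 1_000_000,  # hypothetical "high" total I/O
    max_average_io: float = 100.0,  # hypothetical "low" average I/O
) -> bool:
    """Flag queries that are cheap per call but expensive in aggregate."""
    return (
        q.execution_count >= min_executions
        and q.total_io >= min_total_io
        and q.average_io <= max_average_io
    )


# Hypothetical rows from the bottom table of the report
rows = [
    QueryStats("proc_A", execution_count=50_000, total_io=2_000_000),
    QueryStats("proc_B", execution_count=20, total_io=2_000_000),
]
for row in rows:
    print(row.name, row.average_io, called_too_often(row))
```

In this sample, proc_A is flagged because each call is cheap but the call volume is high, whereas proc_B is expensive per call and would be a candidate for execution plan analysis instead.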

Click the Export button to save the execution plan to a file from which it can be more easily read.

CPU

This report shows the processor usage, expressed in percentage of total processor capacity, consumed by each process on each farm server over time. The data that is used to populate this graph is from the | Processor | % Processor Time | _Total performance counter.

Note

Data from performance counters is available only from the date and time that the SPDiag project was opened for the target farm.

Process Memory (MB)

This report shows the memory used by each process, expressed in megabytes (MB), on each farm server over time. The data that is used to populate this graph is from the | Process | Private Bytes | <process name> performance counter.

Performance report group

The Performance report group contains several reports that display information about specific farm performance indicators related to latency and SQL Server.

SQL Read Intensive Traces

This report shows SQL Server queries that read more than 50,000 pages (1 page = 8 kilobytes).
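
To put the 50,000-page threshold in more familiar units, the following Python sketch converts a page count to a data volume by using the 8-kilobyte page size stated above. The constant and function names are illustrative only.

```python
PAGE_SIZE_KB = 8          # 1 page = 8 kilobytes, as stated in the text
THRESHOLD_PAGES = 50_000  # threshold used by the report


def pages_to_mb(pages: int) -> float:
    """Convert a number of 8-KB pages to megabytes (1 MB = 1,024 KB)."""
    return pages * PAGE_SIZE_KB / 1024


print(THRESHOLD_PAGES * PAGE_SIZE_KB)  # 400000 (kilobytes)
print(pages_to_mb(THRESHOLD_PAGES))    # 390.625 (megabytes)
```

In other words, a query appears in this report when it reads roughly 390 MB or more.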

Queries that read a large volume of data can cause SQL Server to respond slowly by forcing useful data out of memory and causing other queries to perform expensive physical reads. This can affect end-user latency for user operations that query data residing on the same SQL Server.

If a correlation ID is present in the query text, you can use it to find the request or timer job that generated the query. Copy the correlation ID into the filter field of the HTTP Requests or Timer Jobs reports.

Latency Tier Breakdown

This report shows a moving average of server-side HTTP request page latency over time. The time that is spent rendering the request is broken down into the three tiers that a typical HTTP request passes through during processing.

  • SQL Server   If SQL Server queries take longer than 250ms, use the SQL Overview report to identify SQL Server bottlenecks.

  • Application Servers   If service calls take longer than 250ms, you can use the Service Call Duration column in the HTTP Requests report to find the requests most affected by service calls.

  • Web Server   If there does not appear to be a bottleneck in either the SQL Server or application server tier, and requests take longer than 250ms on the Web server, use the Duration column in the HTTP Requests report to see the slowest requests overall. You can examine the Latency All Requests report to see whether the issue is isolated to a single computer. Finally, you can examine the CPU report to determine whether one or more Web servers or application servers are exhibiting excessive processor usage.
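
The triage order in the list above can be summarized as a small decision function. This Python sketch is only an aid to reading the guidance: the function name and its inputs (per-tier latency in milliseconds) are hypothetical, and checking SQL Server before the application servers is an assumption based on the order of the list.

```python
THRESHOLD_MS = 250  # per-tier threshold from the guidance above


def next_report(sql_ms: float, service_call_ms: float, web_ms: float) -> str:
    """Suggest which report to open next for a given tier breakdown."""
    if sql_ms > THRESHOLD_MS:
        return "SQL Overview"
    if service_call_ms > THRESHOLD_MS:
        return "HTTP Requests (sort by Service Call Duration)"
    if web_ms > THRESHOLD_MS:
        # Reached only when neither of the other tiers looks like a bottleneck
        return "HTTP Requests (sort by Duration)"
    return "No tier exceeds the threshold"


print(next_report(sql_ms=400, service_call_ms=50, web_ms=120))  # SQL Overview
print(next_report(sql_ms=80, service_call_ms=30, web_ms=600))   # HTTP Requests (sort by Duration)
```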

Changed Objects

This report displays all object types that have changed over a specific period of time based on information in the change log. The change log is a history of changes that have occurred in a content database. It provides search crawlers and other features a means of querying for only those changes that have occurred since a previous crawl.

Data points are collected once every k minutes (where, by default, k is 5). This report displays the data aggregated across all content databases as a stacked bar graph. Each stack represents a different object type (see the corresponding legend).
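
The aggregation behind the stacked bar graph can be pictured with the following Python sketch, which groups change records into k-minute intervals and counts them per object type. It is an illustration only, not SPDiag's implementation: the function name and sample records are hypothetical, the object type Item is invented for the example, and the flooring logic assumes that k divides 60 evenly.

```python
from collections import Counter, defaultdict
from datetime import datetime


def bucket_changes(changes, k_minutes=5):
    """Count changes per object type in each k-minute interval.

    `changes` is an iterable of (timestamp, object_type) pairs.
    Assumes k_minutes divides 60 evenly, as the default of 5 does.
    """
    buckets = defaultdict(Counter)
    for timestamp, object_type in changes:
        # Round the timestamp down to the start of its k-minute interval
        floored_minute = timestamp.minute - timestamp.minute % k_minutes
        start = timestamp.replace(minute=floored_minute, second=0, microsecond=0)
        buckets[start][object_type] += 1
    return dict(buckets)


# Hypothetical change log records
sample = [
    (datetime(2011, 3, 31, 10, 1, 30), "List"),
    (datetime(2011, 3, 31, 10, 3, 0), "Item"),
    (datetime(2011, 3, 31, 10, 4, 59), "List"),
    (datetime(2011, 3, 31, 10, 7, 15), "List"),
]
for start, counts in sorted(bucket_changes(sample).items()):
    print(start.strftime("%H:%M"), dict(counts))
```

Each interval corresponds to one bar in the graph, and each per-type count corresponds to one segment of that bar.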

You can filter these results by the database or the object type. For example, you can customize the report to show only changed objects made on a database named contentdb1, assuming that it is in the filter dropdown. Similarly, you can customize the report to show only data for the changes that have the object type List to see all list-level changes.

This data can contribute to an overall understanding of what kinds of changes are occurring around a specific timeframe. From this data, you can further examine the HTTP Requests report to determine what requests are causing these changes, or you can look at the Changed Objects Per Database report to view the same data by using a different pivot. Additionally, you can look at the Change Types report and the Change Types Per Database report to look further into the kinds of changes being made to the objects.

If you want to save the current set of results to share, click the Export button.

Changed Objects Per Database

This report displays all object types that have changed in specific content databases over a specific period of time based on information in the change log. The change log is a history of changes that have occurred in a content database. It provides search crawlers and other features a means of querying for only those changes that have occurred since a previous crawl.

Data points are collected once every k minutes (where, by default, k is 5). This report displays the data aggregated across all content databases as a stacked bar graph. Each stack represents a different object type (see the corresponding legend).

You can filter these results by the database or the object type. For instance, you can customize the report to show only changed objects made on a database named contentdb1, assuming that it is in the filter dropdown. Similarly, you can customize the report to show only data for the changes with object type 'List' to see all list-level changes.

This data can contribute to an overall understanding of what kinds of changes are occurring around a specific timeframe. From this data, you can further look at the HTTP Requests report to determine what requests are causing these changes, or you can look at the Changed Objects report to view the same data by using a different pivot. Additionally, you can look at the Change Types report and the Change Types Per Database report to look further into the kinds of changes being made to the objects.

If you want to save the current set of results to share, click the Export button.

Change Types

This report displays all change types that have occurred over a specific period of time based on information in the change log. The change log is a history of changes that have occurred in a content database. It provides search crawlers and other features a means of querying for only those changes that have occurred since a previous crawl.

Data points are collected once every k minutes (where, by default, k is 5). This report displays the data aggregated across all content databases as a stacked bar graph. Each stack represents a different change type (see the corresponding legend).

You can filter these results by the database or the change type. For example, you can customize the report to show only changes made on a database named contentdb1, assuming that it is in the filter dropdown. Similarly, you can customize the report to show only data for the changes that have change type Rename to see all rename-related changes.

This data can contribute to an overall understanding of what kinds of changes are occurring in a specific timeframe. From this data, you can further examine the HTTP Requests report to determine what requests are causing these changes, or you can look at the Change Types Per Database report to view the same data by using a different pivot. Additionally, you can look at the Changed Objects report and the Changed Objects Per Database report to look further into the kinds of objects that are being changed.

If you want to save the current set of results to share, click the Export button.

Change Types Per Database

This report displays all change types that have occurred over a specific period of time based on information in the change log. Data points are collected once every k minutes (where, by default, k is 5). This report displays the data aggregated across all databases as a stacked bar graph. Each stack represents a different change type (see the corresponding legend).

You can filter these results by the database or the change type. For example, you can customize the report to show only change types made on database contentdb1, assuming that it is in the filter drop-down list box. Similarly, you can customize the report to show only data for the changes that have the change type Rename to see all rename-related changes.

This data is valuable to obtain an overall understanding of what kinds of changes are occurring around a specific timeframe, and the quantity of changes across each database. From this data, you can further look at the HTTP Requests report to determine what requests are causing these changes, or you can look at the Change Types report to view the same data by using a different pivot. Additionally, you can look at the Changed Objects report and the Changed Objects Per Database report to look further into the kinds of objects that are being changed.

If you want to save the current set of results to share, click the Export button.

Latency All Requests

This report plots the duration of all requests (up to a limit of 50,000).

Use this report to spot abnormal patterns in usage. For example, a poorly performing site might consistently take 5 seconds to load, which would present as a horizontal band at the 5-second mark. For a more detailed view, you can zoom in to a smaller area, or go to the HTTP Requests report and look for requests that take around 5 seconds.

Latency spikes will appear as columns. If the spikes have a regular period, you can look at the Timer Jobs report to see whether a particular job runs during the same time.
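If the spike times are available outside the tool, a simple check can suggest whether they recur at a regular period before you compare them with timer job schedules. The following Python sketch is one possible approach; the function name, the 30-second tolerance, and the sample timestamps are all assumptions made for this example.

```python
from datetime import datetime, timedelta
from statistics import median

def regular_period(spike_times, tolerance=timedelta(seconds=30)):
    """Return the typical interval between spikes if they are evenly spaced.

    Returns None when there are fewer than three spikes or when any interval
    differs from the median interval by more than `tolerance`.
    """
    if len(spike_times) < 3:
        return None
    ordered = sorted(spike_times)
    intervals = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    typical = median(intervals)
    if all(abs(i - typical) <= tolerance.total_seconds() for i in intervals):
        return timedelta(seconds=typical)
    return None

# Hypothetical spike times read off the Latency All Requests chart
spikes = [
    datetime(2010, 6, 1, 9, 0, 0),
    datetime(2010, 6, 1, 9, 15, 5),
    datetime(2010, 6, 1, 9, 30, 0),
    datetime(2010, 6, 1, 9, 45, 10),
]
period = regular_period(spikes)
```

A period of roughly 15 minutes, as in this sample, would be a reason to look in the Timer Jobs report for a job that runs on a similar schedule.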

Latency Percentiles

This report shows several key percentile thresholds over time to give you an idea of how many requests are affected by a particular latency spike.

For example, if the fastest 25 percent of all requests take 1 second or more, it is likely that an outage in some shared resource (such as the network or SQL Server computer) is affecting all requests. Use the Latency Tier Breakdown report to look for issues in shared resources.

On the other hand, if 75 percent of all requests complete quickly, but the 95th percentile is very high, you might have to look for a root cause that affects a smaller number of requests, such as blocking in a single database, or custom code that is only used by a subset of sites.
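The reasoning in the two preceding examples can be sketched in Python as follows. This is an illustration only: the nearest-rank percentile method, the 1-second threshold parameter, and the returned labels are assumptions for this example and do not describe how SPDiag computes its percentiles.

```python
def percentile(durations, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(durations)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]

def triage(durations, slow_seconds=1.0):
    """Classify a latency sample using the 25th, 75th, and 95th percentiles."""
    p25 = percentile(durations, 25)
    p75 = percentile(durations, 75)
    p95 = percentile(durations, 95)
    if p25 >= slow_seconds:
        # Even fast requests are slow: suspect a shared resource.
        return "shared resource"
    if p75 < slow_seconds <= p95:
        # Most requests are fast, but the tail is slow: suspect a narrower cause.
        return "subset of requests"
    return "no clear pattern"

# Hypothetical request durations, in seconds
all_slow = [1.2, 1.5, 2.0, 3.0]
slow_tail = [0.1] * 18 + [6.0, 8.0]
```

With these samples, `triage(all_slow)` points toward a shared resource, and `triage(slow_tail)` points toward a cause that affects only a subset of requests.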

To see logs for the slowest requests, you can view the HTTP Requests report and sort the list by clicking the header of the Duration column.

You can also use the usage reports, such as the Requests Per User report and the Application Workload report, to look for users or applications that place an unexpected load on the network.

SQL Deadlocks

This report lists SQL Server deadlocks over time. SQL Server uses deadlock detection to prevent the server from hanging when two incompatible queries are executed. To resolve a deadlock, SQL Server cancels one or more of the queries. SharePoint Server can recover from some deadlocks and retry the affected queries. However, deadlocks can sometimes cause certain requests to fail.

SQL Blocking

This report lists SQL queries that have blocked other SQL queries.

Blocking can stop all activity on the farm. When blocked requests cannot be processed by the affected database, all available Web server memory will eventually be consumed, which will cause the affected servers to stop responding or crash.

This report displays the request or timer job responsible for generating the blocking query, if possible, and any associated logs. These can be useful if the block is caused by a specific end-user transaction. In such cases, you might have to restructure a list or redesign an application that uses custom queries.

Some blocking cannot be avoided. For example, nightly database maintenance tasks necessarily lock large parts of a database.

Availability report group

The Availability report group contains several reports that display information about farm availability trends and issues.

Availability report

This report charts the availability of the HTTP Web Service. Drops in availability indicate periods when users might have been unable to access their SharePoint sites.

This report calculates availability by dividing the number of successful Web requests by the total number of requests sent to the server. An attempt is made to remove requests coming from automated agents, such as a search crawler, from this calculation. However, some unknown automated agents might not be excluded.
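The calculation described above can be sketched in Python as follows. The request fields (`user_agent`, `succeeded`) and the agent markers are hypothetical names chosen for this example; the source does not document how SPDiag identifies automated agents or which responses it counts as successful.

```python
def availability(requests, agent_markers=("crawler", "bot")):
    """Return successful / total for requests not recognized as automated.

    Returns None when no user requests remain after filtering.
    """
    user_requests = [
        r for r in requests
        if not any(marker in r["user_agent"].lower() for marker in agent_markers)
    ]
    if not user_requests:
        return None
    successful = sum(1 for r in user_requests if r["succeeded"])
    return successful / len(user_requests)

# Hypothetical sample: three user requests (one failed) and one crawler request
sample = [
    {"user_agent": "Mozilla/5.0", "succeeded": True},
    {"user_agent": "Mozilla/5.0", "succeeded": True},
    {"user_agent": "Mozilla/5.0", "succeeded": False},
    {"user_agent": "ExampleSearchCrawler", "succeeded": False},
]
ratio = availability(sample)
```

In this sample the crawler request is excluded, so the result is two successful requests out of three. An automated agent whose user agent string matches none of the markers would still be counted, which corresponds to the limitation noted above.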

Zoom in to a period of low availability by selecting it with your mouse. Subsequent reports used in this investigation will load faster if you select a smaller time range.

Once you have narrowed the time range, you can use the Failed User Requests report to examine details about requests that failed during the selected time period.

Crashes reduce availability by exiting the process without allowing requests to be completed. Because the process does not have an opportunity to write logs during a crash, requests that were running at the time of the crash will not appear in these reports, and their effect on availability will not be displayed in the graph. Regardless, crashes should always be investigated.

Scheduled worker process recycles rarely reduce availability. The server attempts to let requests in the old process complete gracefully while it starts another process to handle new requests. Frequent, unscheduled recycles during periods of higher than average traffic can cause some requests to fail if the server cannot respond to the increased demands of running multiple processes in parallel.

SQL Overview report

This report displays information that can help you understand the overall health of the SQL Server computers in your farm. The report focuses on the following three areas:

SQL Server Locking/Blocking

SQL Server query blocking can increase some SQL Server query duration values, and might contribute to availability issues and increased latency.

  • Average Lock Wait Time   Locks are held on SQL Server resources, such as rows read or modified during a transaction, to prevent concurrent use of resources by different transactions. For example, an update holds an exclusive (X) lock, which blocks shared read locks. A high lock wait time indicates a blocking issue in the SQL Server tier. Pay attention to slow updating threads, because they block reads.

  • Average Latch Wait Time   A latch is primarily used to synchronize database pages. Each latch is associated with a single allocation unit. A latch wait occurs when a latch request cannot be granted immediately because the latch is held by another thread in a conflicting mode. Unlike locks, a latch is released immediately after the operation, even in write operations. High latch wait time might mean that it is taking too long to load a specific page into memory.

When Lock Wait Time is high, examine the SQL Blocking report to identify the queries holding onto the locks.

You can examine the SQL Deadlocks report to identify queries that might have generated failed requests.

SQL Server Disk IO

A common SQL Server performance issue is an I/O bottleneck. When SQL Server does not have sufficient I/O bandwidth to process incoming queries, performance decreases for all requests and across all farm Web servers.

  • Average Disk Queue Length   This metric is for overall Disk I/O. Higher values translate to increased overall I/O pressure. A value greater than 10 might indicate an I/O bottleneck.

  • Average Logical Reads / s   This metric is for the Read Disk I/O. Higher values translate to increased Read I/O pressure.

  • Average Logical Writes / s   This metric is for the Write Disk I/O. Higher values translate to increased Write I/O pressure.

When there is an I/O bottleneck, examine the SQL Read-Intensive Traces report to see what specific queries are consuming the most resources.

SQL Server CPU

When SQL Server computer processor usage is excessively high, SQL queries are queued, and Web server performance is decreased. Processor and I/O performance are related. Therefore, when SQL Server processor usage is high, I/O is usually high also. An average processor usage of 80 percent or higher is considered a bottleneck.

When there is a CPU bottleneck, examine the SQL Read-Intensive Traces report, and then click the CPU column to sort by the most expensive queries.
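The follow-up guidance for the three areas can be summarized as a small decision helper. The Python sketch below is illustrative only: the disk queue threshold (more than 10) and the processor threshold (80 percent) come from this section, but the source gives no numeric threshold for lock wait time, so that limit is left as a parameter for the caller to supply. The function and parameter names are hypothetical.

```python
def next_reports(avg_disk_queue_length, avg_cpu_percent,
                 avg_lock_wait_ms, lock_wait_limit_ms):
    """Suggest which SPDiag reports to open next, based on SQL Overview values."""
    suggestions = []
    if avg_lock_wait_ms > lock_wait_limit_ms:
        # No threshold is given in the text; the caller chooses the limit.
        suggestions.append("SQL Blocking")
    if avg_disk_queue_length > 10:
        suggestions.append("SQL Read-Intensive Traces")
    if avg_cpu_percent >= 80:
        suggestions.append("SQL Read-Intensive Traces (sort by CPU)")
    return suggestions

# Hypothetical readings: a deep disk queue, but moderate CPU and lock waits
reports = next_reports(avg_disk_queue_length=12, avg_cpu_percent=50,
                       avg_lock_wait_ms=5, lock_wait_limit_ms=100)
```

Because processor and I/O pressure often rise together, it is normal for more than one suggestion to be returned for the same time range.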

Worker Process Recycles

Recycles do not usually affect availability. Internet Information Services (IIS) 7.0 creates a new process, allows existing requests to complete in the old process, and then shuts down the recycled process cleanly. However, the first request to a new process can be delayed while the process is initialized.

By default, SharePoint Server schedules worker process recycle jobs to take place overnight. Frequent recycles during working hours can increase the latency of end-user requests. Check to see whether Web.config settings might have been changed, or if the recycle settings have been modified in IIS.

Failed User Requests

These user requests failed, or they were so slow that users might have assumed they failed.

Select a failed request to fetch its trace logs. Look for traces that mention a failure in some component of the system. If the cause is not apparent, look at the Windows Events report for signs of a system failure on the server or in IIS.

If a request failed because it was too slow, look for a gap in the log, which might be highlighted. If the lines before the gap indicate that the delay occurred in SQL Server, this request was most likely a lock victim. Look at the SQL Blocking report to find the blocking query that is the root cause of the issue.
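When you work with exported trace logs, the same gap can be located programmatically. The following Python sketch finds the largest pause between consecutive trace lines. The trace layout (timestamp, message) and the sample messages are assumptions for this example and are not the actual SharePoint trace log format.

```python
from datetime import datetime, timedelta

def largest_gap(trace_lines):
    """Return (gap, line_before, line_after) for the longest pause in a trace.

    `trace_lines` is a list of (timestamp, message) tuples in log order.
    Returns None when there are fewer than two lines.
    """
    best = None
    for before, after in zip(trace_lines, trace_lines[1:]):
        gap = after[0] - before[0]
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best

# Hypothetical trace for one slow request
traces = [
    (datetime(2010, 6, 1, 9, 0, 0), "Request started"),
    (datetime(2010, 6, 1, 9, 0, 1), "Executing SQL query"),
    (datetime(2010, 6, 1, 9, 0, 31), "SQL query returned"),
    (datetime(2010, 6, 1, 9, 0, 32), "Request finished"),
]
gap, before, after = largest_gap(traces)
```

In this sample the 30-second gap follows a line about a SQL query, which is the pattern that suggests checking the SQL Blocking report.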

Some requests, such as downloads of large files, can be expected to be slow.

Crashes

This report displays all of the IIS worker process crashes that occurred in the specified time range. After a row is selected in the top report, the last few seconds of traces from the crashing process are displayed in the bottom panel. These traces might indicate why the crash occurred.

Crashes can significantly affect availability. The availability report might underestimate the effect of crashes because requests that are being executed at the time of a crash are not recorded. Even when a crash does not noticeably affect availability, it could lead to data loss or other problems and should be investigated.

Usage report group

The Usage report group contains several reports that display information about farm usage trends and issues.

Requests Per URL

This report displays the most frequently requested URLs. You can use this report to identify pages that are often accessed and therefore might be high-priority candidates for optimization.

Requests Per User

This report displays the percentage of requests made by the most common user accounts. Some system accounts, such as the search crawler service account, might be expected to generate many requests. At certain times, individual users might also perform operations that create an unexpected peak in resource usage.
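The percentages shown in this kind of report amount to a simple count per account divided by the total. The following Python sketch shows the arithmetic; the account names and the `top` parameter are hypothetical and are used only for illustration.

```python
from collections import Counter

def request_share(users, top=10):
    """Return (user, percent of all requests) for the most active accounts."""
    counts = Counter(users)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(user, 100.0 * n / total) for user, n in counts.most_common(top)]

# Hypothetical request log reduced to the requesting account name
log_users = ["svc_crawl"] * 6 + ["alice"] * 3 + ["bob"]
shares = request_share(log_users)
```

Here the hypothetical crawler account `svc_crawl` makes 60 percent of the requests, which is the kind of result that is expected for a system account but would deserve a closer look for an individual user.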

Application Workload

This report displays the time that is spent serving requests from various client applications in a given time range. The report provides an estimate of which resources are being consumed by client requests. The report might indicate the following considerations:

  • High total durations indicate a need for additional memory on the Web servers.

  • High SQL Server process durations imply high SQL I/O or processor usage, or that requests from client applications might be blocked by other queries.

  • High Web server durations might indicate high processor usage on the farm Web servers.

Requests Per Site

This report displays the percentage of requests made to each site in the farm.

Known issues

This section lists known issues in SPDiag 3.0, and their workarounds when available.

SPDiag requires the remotesigned execution policy to be enabled in PowerShell

If the remotesigned execution policy is not enabled in PowerShell on the farm server to which SPDiag is configured to connect, SPDiag will generate an error message when you try to enter the server name in the New Project window in SPDiag. You might see an error message that resembles the following:

Problem Event Name: CLR20r3

Problem Signature 01: spdiag.exe

Problem Signature 09: System.ArgumentOutOfRange

To correct this issue, run the following command from a Windows PowerShell command prompt on the farm server to which you want to connect:

Set-ExecutionPolicy RemoteSigned

Server connection fails when a SQL Server alias is used

When you attempt to connect SPDiag to a farm from a remote client computer and the SharePoint 2010 Products farm is configured to use a SQL Server alias to connect to the database server, SPDiag will generate the following error:

The usage database was not found or was inaccessible.  Please make sure the TTFARM file has the right information and you have access to the server.

To correct this issue, install the SQL Server client tools on the computer where you installed SPDiag, and then configure a SQL Server alias that matches the alias used by the SharePoint 2010 Products farm.

Some SPDiag diagnostic timer jobs require sysadmin or sqladmin privileges

Some SPDiag 3.0 diagnostics jobs require that the farm account has the sysadmin or sqladmin role assigned on the SQL Server instance where the SharePoint 2010 Products databases are located. If the farm account does not have these roles assigned, it will have insufficient privileges to run diagnostic jobs that are required for certain reports to gather data.

SPDiag reports do not work when the OS locale is not EN-US (1033)

When the operating system locale of the computer where you have installed SPDiag is anything other than EN-US (1033), SPDiag reports do not work because you cannot set the date range for the report. Currently, the only workaround is to change the locale of the client computer to EN-US.

If your SharePoint 2010 Products farm servers use a locale other than EN-US, we recommend that you install SPDiag on a client computer.

See Also

Concepts

SharePoint 2010 Administration Toolkit (SharePoint Server 2010)