SharePoint 2010: Monitoring SharePoint

Don’t neglect your efforts to monitor SharePoint event and data logging. It can have a significant impact on performance.

Steve Wright and Corey Erkes

Adapted from “Pro SharePoint 2010 Governance” (Apress)

Monitoring is one of the most frequently overlooked aspects of running a SharePoint farm. Monitoring lets you answer questions like, “How well is it running?” and, “Will it still be running tomorrow?”

An important part of managing any IT system is collecting diagnostic information you can use to solve problems or understand how the system is behaving. Like any Windows application, SharePoint writes important events to the Windows event logs. These messages include information on processes that have started or stopped, errors that have occurred, and any other events that might correlate with non-SharePoint events.

SharePoint trace files

SharePoint records trace information with a system called the Unified Logging System (ULS). The ULS logs are a collection of files containing data recorded by SharePoint and its service applications. Custom-built components can also write to these logs, recording operational and error information in a way that automatically correlates with other events occurring within the farm.

SharePoint automatically creates a new ULS log file every 30 minutes to limit the size of each file. These files can still become quite large, however. They’re stored in the LOGS directory under the 14-hive by default. Using a default installation path, the folder is at C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\LOGS.

One of the first configurations made in a new production server farm is to move these files to another hard drive on each server within the farm. The C-drive is critical to running the OS. Despite being compressed, the ULS files can quickly fill up the drive and crash the system.

The ULS can and should be configured to prevent unnecessary data from filling up disk space. You can access these settings using Central Administration (CA) under Monitoring | Configure Diagnostic Logging.

You can set how many days of log files should be kept on each server. The default is 14 days. ULS files older than that number of days will be automatically removed from the system. If you need to log large amounts of data or maintain an indefinite history of log files, it may be preferable to back up and remove these files every 30 minutes when each file is closed and the next file is created.

You can also configure a maximum amount of disk space. When this limit is reached, the oldest log files are automatically removed to free up space. ULS logs are written as ordinary text files, so you can read them using a text editor such as Notepad. However, they can be difficult to read directly because they aren’t formatted conveniently and can be very large.
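The two retention rules described above (a maximum age in days and a cap on disk space) can be sketched in a few lines of Python. This is only an illustration of the logic, not SharePoint's implementation: the LogFile type, the files_to_remove function, the file names and the sizes are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LogFile:
    # Hypothetical record describing one closed log file.
    name: str
    closed_at: datetime
    size_bytes: int


def files_to_remove(files, now, days_to_keep, max_total_bytes):
    """Return the names of files a retention pass would delete, oldest first."""
    cutoff = now - timedelta(days=days_to_keep)
    ordered = sorted(files, key=lambda f: f.closed_at)

    # Rule 1: remove anything older than the retention window.
    removed = [f for f in ordered if f.closed_at < cutoff]
    kept = [f for f in ordered if f.closed_at >= cutoff]

    # Rule 2: if the remaining files exceed the disk cap,
    # remove the oldest ones until the total fits.
    total = sum(f.size_bytes for f in kept)
    while kept and total > max_total_bytes:
        oldest = kept.pop(0)
        total -= oldest.size_bytes
        removed.append(oldest)

    return [f.name for f in removed]


# Made-up sample data: four files of different ages and sizes.
now = datetime(2012, 1, 15, 12, 0)
logs = [
    LogFile("a.log", now - timedelta(days=20), 400),
    LogFile("b.log", now - timedelta(days=3), 600),
    LogFile("c.log", now - timedelta(days=2), 500),
    LogFile("d.log", now - timedelta(days=1), 300),
]
removed_names = files_to_remove(logs, now, days_to_keep=14, max_total_bytes=1000)
print(removed_names)
```

In this sample, one file falls outside the 14-day window and a second is removed because the remaining files still exceed the cap. Whichever limit is hit first determines how much history you actually keep, so size both settings together.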

To simplify working with these files, Microsoft provides a ULSViewer application you can download. Microsoft doesn’t support the ULSViewer, but it should serve the needs of small to midsize SharePoint farms. Organizations with very large SharePoint installations may wish to invest in Microsoft System Center or third-party system-management tools.

Event throttling

SharePoint is a large and complex software platform. Therefore, it can produce a vast quantity of trace data. To limit the impact of logging this information, you can configure SharePoint to restrict event logging based on the event category, event severity, and whether the event will be logged in the Windows event logs or a ULS trace file.

Event category describes where the event came from and to what it pertains. For example, an event might be logged by the Excel Services Application and pertain to accessing external data. You can configure each category separately or together with other types of events.

Event severity refers to its likely impact on the rest of the system. Events destined for the Windows event logs are assigned increasing levels of severity, including Verbose, Information, Warning, Error or Critical. The ULS logs use Verbose, Medium, High, Monitorable and Unexpected as severity levels.

When configuring event logging, designate one of these levels as the minimum level to be recorded. For example, if you set the minimum level to Information, all events will be logged except for those at the Verbose level. A separate severity level is configured for each event category and event record destination. This lets you limit the amount of trace information generated while capturing the most important information.

By default, all events with a severity level of Information or higher are logged to the Windows event logs. Events at or above the Medium level are recorded in the ULS trace files. These settings produce significant trace logging, but minimal event log traffic. This is appropriate for most farms.
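The minimum-level rule can be illustrated with a short Python sketch. It assumes the severity levels are ordered from least to most severe exactly as listed above. The should_record function, the "event_log" and "trace" destination keys and the sample override are hypothetical names chosen for illustration, not part of any SharePoint API.

```python
# Severity scales in the order the article lists them
# (assumed here to run from least to most severe).
SCALES = {
    "event_log": ["Verbose", "Information", "Warning", "Error", "Critical"],
    "trace": ["Verbose", "Medium", "High", "Monitorable", "Unexpected"],
}

# Default minimum levels described in the article.
DEFAULTS = {"event_log": "Information", "trace": "Medium"}


def should_record(category, destination, level, overrides=None):
    """Return True if an event at `level` meets the minimum level
    configured for this category and destination."""
    scale = SCALES[destination]
    minimum = (overrides or {}).get((category, destination), DEFAULTS[destination])
    return scale.index(level) >= scale.index(minimum)


# Hypothetical override: trace one category in full detail.
overrides = {("Excel Services Application", "trace"): "Verbose"}

print(should_record("General", "event_log", "Verbose"))
print(should_record("General", "trace", "High"))
print(should_record("Excel Services Application", "trace", "Verbose", overrides))
```

Keying the overrides by category and destination mirrors the point that each combination has its own threshold, so you can raise the detail for one troublesome category without flooding every log.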

Event log flood protection

SharePoint 2010 can prevent event floods from overwhelming your log files. An event flood occurs when a component detects a problem, reports it and continues to experience the same problem. This can quickly fill up the server event logs. It can be almost comical when you lose the original cause of an error because the event log was overwritten by errors resulting from a side effect of the actual problem.

To prevent this situation, SharePoint 2010 monitors the frequency with which each event is being recorded. If it sees the same message recorded more than five times in two minutes, it will record the fact in the log and cease recording each occurrence of that event. It will then write a summary event every two minutes with suppressed event counts until the flood subsides. Then it will return to logging each event.

Event log flood protection applies only to the Windows event logs, and not the ULS trace log files. This feature is turned on by default. You can turn it off on the same page where you configure event throttling. You can also set the threshold count and quiet period for event flood detection using Windows PowerShell, but not CA.
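To make the flood-protection behavior concrete, here is a rough Python simulation of the rule described above: more than five occurrences in two minutes triggers suppression, and summaries are written until the flood subsides. It is a sketch under stated assumptions, not SharePoint's actual algorithm. The FloodGuard class and its parameter names are hypothetical, the two-minute quiet period is an assumed value, and summaries here are emitted only when a new event arrives rather than on a timer.

```python
from collections import deque


class FloodGuard:
    """Hypothetical model of event log flood protection."""

    def __init__(self, threshold=5, window=120.0, quiet=120.0):
        self.threshold = threshold  # occurrences allowed per window
        self.window = window        # detection and summary interval, seconds
        self.quiet = quiet          # assumed gap that ends a flood, seconds
        self.recent = {}            # event_id -> deque of recent timestamps
        self.flooding = {}          # event_id -> suppression state

    def record(self, event_id, ts):
        """Return the lines that would be written to the log for this event."""
        state = self.flooding.get(event_id)
        if state is not None:
            if ts - state["last_seen"] >= self.quiet:
                # The flood has subsided: flush any pending count
                # and return to logging each event.
                out = []
                if state["suppressed"]:
                    out.append(f"summary: {event_id} suppressed {state['suppressed']} times")
                del self.flooding[event_id]
                self.recent[event_id] = deque([ts])
                out.append(f"event: {event_id}")
                return out
            state["last_seen"] = ts
            state["suppressed"] += 1
            if ts - state["last_summary"] >= self.window:
                count = state["suppressed"]
                state["suppressed"] = 0
                state["last_summary"] = ts
                return [f"summary: {event_id} suppressed {count} times"]
            return []

        times = self.recent.setdefault(event_id, deque())
        times.append(ts)
        while times and ts - times[0] > self.window:
            times.popleft()
        if len(times) > self.threshold:
            # Threshold exceeded: note the flood and stop logging repeats.
            self.flooding[event_id] = {"suppressed": 1, "last_seen": ts, "last_summary": ts}
            return [f"notice: {event_id} is flooding; suppressing repeats"]
        return [f"event: {event_id}"]


# Made-up scenario: the same event fires every 10 seconds for three minutes.
guard = FloodGuard()
lines = []
for t in range(0, 180, 10):
    lines += guard.record("E1", float(t))
for line in lines:
    print(line)
```

The point of the design is visible in the output: a handful of individual entries, one notice and a periodic count replace what would otherwise be a long run of identical entries, so the original cause of the problem stays in the log.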

Correlation IDs

Because the various SharePoint components can generate such a vast amount of event and trace data, it can be difficult to tell which events are related to one another. Logs are stored sequentially as items are written to them. Requests being processed simultaneously may generate events that are mixed in the log sequence. SharePoint addresses this problem using correlation IDs.

A correlation ID is a GUID assigned to each request that SharePoint processes. Any event recorded by SharePoint as a result of a request is tagged with that request’s correlation ID. Correlation IDs are also included in some error messages, event log entries, and other interfaces such as the Developer Dashboard. The Developer Dashboard is a diagnostic panel you can turn on to debug problems on a SharePoint page.
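The value of a correlation ID is easiest to see with interleaved entries. The Python sketch below uses simplified, made-up records (a sequence number, a correlation ID and a message) rather than the real ULS file layout. The group_by_correlation function and the sample GUIDs are hypothetical.

```python
from collections import defaultdict


def group_by_correlation(events):
    """Group (sequence, correlation_id, message) records by correlation ID,
    preserving the order in which they were written."""
    grouped = defaultdict(list)
    for seq, correlation_id, message in events:
        grouped[correlation_id].append((seq, message))
    return dict(grouped)


# Made-up entries from two simultaneous requests, mixed in log order.
REQUEST_A = "11111111-1111-1111-1111-111111111111"
REQUEST_B = "22222222-2222-2222-2222-222222222222"
events = [
    (1, REQUEST_A, "request started"),
    (2, REQUEST_B, "request started"),
    (3, REQUEST_A, "loading list data"),
    (4, REQUEST_B, "access denied"),
    (5, REQUEST_A, "request finished"),
]

by_request = group_by_correlation(events)
for seq, message in by_request[REQUEST_B]:
    print(seq, message)
```

Given the correlation ID from an error message, filtering this way recovers everything one request logged, even though other requests wrote entries in between.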

SharePoint logging database

SharePoint 2010 introduced a new form of proactive logging called the SharePoint logging database. This database collects a variety of data from all of the servers in the farm. This gives you a single source for this information without needing to explicitly enable logging or combine log files.

The logging database is stored on the back-end SQL Server in a database called WSS_Logging. There are numerous tables in this database, and they’re difficult to query directly. Fortunately, Microsoft has provided a series of views to simplify retrieving information from these tables.

Much of the data that goes into this database is collected by a set of timer jobs. To prevent runaway data collection in a new farm, these jobs are disabled by default. To collect the information provided by these Diagnostic Data Providers, simply enable the timer jobs within CA:

  • Diagnostic Data Provider: Event Log
  • Diagnostic Data Provider: Performance Counters - Database Servers
  • Diagnostic Data Provider: Performance Counters - Web Front Ends
  • Diagnostic Data Provider: SQL Blocking Queries
  • Diagnostic Data Provider: SQL DMV
  • Diagnostic Data Provider: SQL Memory DMV
  • Diagnostic Data Provider: Trace Log

There are several categories of information you can report from the logging database. Unlike ULS or Windows event logs, these views contain information from all servers in the farm. This data covers the entire contents of the farm including diagnostic, health and feature usage information:

  • ULS logs
  • Windows event logs
  • Performance counters for memory, I/O and CPU utilization
  • SQL Server Dynamic Management Views (DMVs)
  • Usage information for various features
  • Search service crawling and querying
  • Timer jobs

Don’t assume the only data available in this database is reflected in the currently present views. When you configure a new type of information for collection, new tables and views will appear in WSS_Logging to hold that information. These database objects are created on demand as needed.

It’s important to remember the logging database is populated in addition to the ULS and Windows event logs, not in place of them. Turning on large amounts of logging in either mechanism can generate unmanageable amounts of log data. Consider the tools you’ll use for certain purposes and configure them accordingly. Be sure to plan for the storage space required for log and database files when you’re fully using them. Running out of space for these logs can result in the loss of critical information at the worst-possible time.

The information in these tables is useful for both diagnosing problems and planning future upgrades and features. This database collects data over time that you can use for trending performance, usage and search performance.

Steve Wright

Steve Wright is a senior manager in Business Intelligence Management (BIM) for Sogeti USA LLC in Omaha, Neb. Over the last 20-plus years, Wright has worked on air traffic control, financial, insurance and a multitude of other types of systems. He has authored and performed technical reviews for many previous titles covering Microsoft products including Windows, SharePoint, SQL Server and BizTalk.

Corey Erkes

Corey Erkes is a manager consultant for Sogeti USA LLC in Omaha, Neb. Erkes has worked with a wide range of companies at different points in the lifecycles of their SharePoint implementations. He’s also one of the founding members of the Omaha SharePoint Users Group.

©2012 Apress Inc. All rights reserved. Printed with permission from Apress. Copyright 2012. “Pro SharePoint 2010 Governance” by Steve Wright and Corey Erkes. For more information on this title and other similar books, please visit apress.com.