Microsoft Windows SharePoint Services Monitoring Design and Implementation

Published: December 1, 2003

By Microsoft Office Internet Platform and Operations Windows SharePoint Services Team

This case study describes how the Microsoft Office Internet Platform and Operations group configured Microsoft Operations Manager 2000 SP1 to monitor a Windows SharePoint Services (Beta) server farm that hosts 15,000 external customer sites. It presents the group's experiences to help enterprise companies design their own monitoring and instrumentation mechanisms. It is the fourth of four technical white papers describing this deployment.

On This Page

Introduction
Deployment Goals
Server Farm Configuration
Monitoring Categories
Fundamental System, Application, and Server Monitoring
Windows SharePoint Services-Specific Service Monitoring
HTML Viewing and HTML Transformation Server
URL and Administration Port Monitoring
Additional System Monitor Counter Monitoring
Summary
Related Links

Introduction

This white paper describes the way the Internet Platform and Operations group on the Microsoft® Office team designed and implemented the monitoring mechanism for their deployment of Microsoft Windows® SharePoint™ Services (Beta). The team used Microsoft Operations Manager (MOM) 2000 SP1 with various monitoring rules and System Monitor counters to monitor the status of servers and services in the Windows SharePoint Services server farm. This is the fourth of four papers that describe the Windows SharePoint Services hosting experience.

Deployment Goals

The objectives of the monitoring implementation described in this paper were the following:

  • To test and provide enhanced and integrated monitoring features for Windows SharePoint Services server farms

  • To provide extremely high availability of the Windows SharePoint Services server farm to 15,000 external customers and allow administrators and operations engineers to take immediate and proactive actions when a service issue or system fault occurred.

Both objectives were accomplished. The availability of the Windows SharePoint Services server farm in the last year has been more than 99 percent, an excellent record for Beta code. Application, server, and drive issues occurred, but the appropriate groups received MOM notification e-mail messages in time to respond before service was interrupted.
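To put the 99 percent figure in perspective, the following Python sketch converts an availability percentage into a yearly downtime budget. It is plain arithmetic, not a measurement from this deployment; the function name downtime_budget_hours is hypothetical and a 365-day year is assumed.

```python
# Hours in a non-leap year (assumption: 365 days).
HOURS_PER_YEAR = 365 * 24


def downtime_budget_hours(availability_percent):
    """Return the hours of downtime per year implied by an availability percentage."""
    return HOURS_PER_YEAR * (100 - availability_percent) / 100


if __name__ == "__main__":
    # 99 percent availability leaves 1 percent of 8,760 hours as downtime.
    print(downtime_budget_hours(99))
```

An availability of more than 99 percent therefore corresponds to fewer than about 87.6 hours of downtime over the year.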

Microsoft Operations Manager (MOM) and Hewlett-Packard Compaq Insight Manager (CIM) were chosen as the monitoring tools for the following reasons:

  • Hewlett-Packard hardware makes up the server farm, so CIM works well to monitor server status. MOM is fully integrated with the CIM monitoring tool.

  • Many of the required monitoring features identified in this paper are built into MOM. Minimal development and customization efforts were needed for the Windows SharePoint Services monitoring and instrumentation.

  • MOM can automatically notify corresponding groups when service issues or system faults occur on the Windows SharePoint Services server farm.

  • MOM provides default performance monitoring reports that help identify the traffic patterns and system status.

  • The Windows SharePoint Services team plans to ship a MOM management pack in the near future. Visit the MOM Management Pack site at https://go.microsoft.com/fwlink/?LinkId=20493&clcid=0x409 for updates.

  • Microsoft is committed to MOM as the long-term monitoring solution for enterprise companies.

The configuration and best practices outlined in this paper may be of use to anyone deploying Windows SharePoint Services. For more detailed descriptions and configuration steps for Microsoft Operations Manager, see the documentation for Microsoft Operations Manager 2000, available in several formats from the MOM Web site at https://go.microsoft.com/fwlink/?LinkId=20494&clcid=0x409.

Server Farm Configuration

Figure 1: Server Farm Configuration

  1. Public DNS servers

  2. Internet

  3. Router (Cisco Systems)

  4. Load balancer (F5 Networks BIG-IP)

  5. Load balancer (F5 Networks BIG-IP)

  6. Front-end Web server farm (six servers)

  7. SMTP and DNS server

  8. Terminal services, debugging, and administration server

  9. SQL Server server 1

  10. SQL Server server 2

  11. SQL Server server 3

  12. SQL Server server 4

  13. SAN unit (Hewlett-Packard)

  14. Active Directory domain controller 1

  15. Active Directory domain controller 2

  16. MOM server

  17. Backup server (Veritas software)

  18. Backup tape device

  19. HTML transformation server

  20. Imaging and installation server (Altiris deployment server)

  21. Router (Cisco Systems)

  22. Edge network

Figure 1 shows the server farm and network set up by the Internet Platform and Operations group. The following sections describe in detail how monitoring was implemented for this Windows SharePoint Services server farm.

Monitoring Categories

All implemented monitoring rules and counters were prioritized and classified into the following five categories:

  • Fundamental System, Application, and Server Monitoring: Monitoring rules that are critical and must be present at the system or server level.

  • Windows SharePoint Services-Specific Service Monitoring: Events that track features specific to Windows SharePoint Services and its components, including Web Parts.

  • Windows SharePoint Services HTML Transformation and HTML Transformation Server Services: Events that monitor the HTML transformation server, an optional component for a Windows SharePoint Services server farm.

  • Windows SharePoint Services URL Monitoring: URL tests that help identify whether Windows SharePoint Services sites and administrative features work properly on each front-end Web server.

  • Additional System Monitor Counter Monitoring: System Monitor counters that help system administrators understand system load and service usage. The collected data can also be used for ongoing capacity planning.

Fundamental System, Application, and Server Monitoring

The rules and features in this category are critical and must be tracked to ensure that hardware and application errors are caught proactively and handled in a timely manner to avoid service downtime.

Hardware Monitoring

The Internet Platform and Operations group's deployment of Windows SharePoint Services used Hewlett-Packard CIM software to monitor server status. The CIM management pack was installed on the MOM consolidator, and MOM was used as the central monitoring and notification mechanism. When critical alerts occurred on any of the servers or the storage area network (SAN), the CIM software sent e-mail messages through the MOM server to the team managing Windows SharePoint Services and the team managing the physical servers in the lab.

Fundamental Application Monitoring

The MOM rules in this category monitored fundamental Microsoft® Windows Server™ 2003 events and System Monitor counters. Notifications were sent through e-mail to one or more of the following teams:

  • Windows SharePoint Services: Team members in the Internet Platform and Operations group in charge of managing Windows SharePoint Services settings.

  • Lab: Team members in charge of the physical servers in the lab.

  • Active Directory: Team members managing Microsoft Active Directory® directory services for the deployment.

  • SQL Server: Team members managing the settings for Microsoft SQL Server™ and the servers running SQL Server.

Application monitoring can be divided into two pieces, one that uses the MOM rules and one that uses System Monitor counters. Table 1 lists the MOM rules, including which servers the rule tracked, which events it tracked, and which group was notified of alerts.

Table 1 MOM rules

Server Type | Events | Notification Group
Front-end Internet Information Services (IIS) servers | IIS stops and starts (times) | Windows SharePoint Services
Front-end IIS servers | NetLogon stops and starts (times) | Windows SharePoint Services
Front-end IIS servers | Windows SharePoint Services stops and starts (times) | Windows SharePoint Services
Active Directory servers | Standard rules from the Active Directory directory service management pack module | Active Directory
Servers running SQL Server | Standard rules from the SQL Server 2000 management pack module | SQL Server
All servers | Server login successes and failures | Windows SharePoint Services and Lab
All servers | Hewlett-Packard CIM monitoring | Windows SharePoint Services and Lab
Hewlett-Packard SAN HSG80 Data Repository | SAN error notifications | Windows SharePoint Services and Lab
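The routing in Table 1 can be pictured as a lookup from server type and event to notification groups. The following Python sketch is illustrative only: MOM 2000 holds this information in its own rule and notification group settings, and the ROUTING dictionary, the shortened event labels, and the groups_for function are hypothetical names introduced here.

```python
WSS = "Windows SharePoint Services"

# Hypothetical lookup mirroring Table 1: (server type, event) -> notification groups.
ROUTING = {
    ("Front-end IIS servers", "IIS stops and starts"): [WSS],
    ("Front-end IIS servers", "NetLogon stops and starts"): [WSS],
    ("Front-end IIS servers", "Windows SharePoint Services stops and starts"): [WSS],
    ("Active Directory servers", "Active Directory management pack rules"): ["Active Directory"],
    ("Servers running SQL Server", "SQL Server 2000 management pack rules"): ["SQL Server"],
    ("All servers", "Server login successes and failures"): [WSS, "Lab"],
    ("All servers", "Hewlett-Packard CIM monitoring"): [WSS, "Lab"],
    ("Hewlett-Packard SAN HSG80 Data Repository", "SAN error notifications"): [WSS, "Lab"],
}


def groups_for(server_type, event):
    """Return the groups to notify; "All servers" rows apply to every server type."""
    groups = list(ROUTING.get((server_type, event), []))
    for group in ROUTING.get(("All servers", event), []):
        if group not in groups:
            groups.append(group)
    return groups


if __name__ == "__main__":
    print(groups_for("Servers running SQL Server", "Server login successes and failures"))
```

Keeping the "All servers" rows as a fallback avoids repeating the same rule for each server type.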

Table 2 lists the System Monitor counters, the values for which notifications were sent, and which group received the notification.

Table 2 System Monitor counters

System Monitor Counter | Threshold | Notification Group
Memory: % Committed Bytes in Use | Greater than 80 percent | Windows SharePoint Services
Memory: Available MBytes | Less than 50 MB | Windows SharePoint Services
Web Service: Connection Attempts/sec | Greater than 500 attempts per second | Windows SharePoint Services
Processor: % Processor Time: _Total (CPU Utilization) | Greater than 80 percent | Windows SharePoint Services
Current Connections: Warning | 1000 connections | Windows SharePoint Services
Current Connections: Error | 2000 connections | Windows SharePoint Services
Disk Usage | Less than 10 percent | Windows SharePoint Services
System: Processor Queue Length | Greater than 10 threads | Windows SharePoint Services
Memory: Pages/sec | Greater than 220 pages per second | Windows SharePoint Services
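As a rough illustration of how the thresholds in Table 2 translate into checks, the following Python sketch compares sampled counter values against those limits. It is a minimal sketch, not how MOM evaluates performance rules. The THRESHOLDS list and the breached function are hypothetical names. The Current Connections and Disk Usage rows are left out because the table does not state their comparison unambiguously.

```python
import operator

# (counter name, comparison, limit) taken from Table 2.
THRESHOLDS = [
    ("Memory: % Committed Bytes in Use", operator.gt, 80),
    ("Memory: Available MBytes", operator.lt, 50),
    ("Web Service: Connection Attempts/sec", operator.gt, 500),
    ("Processor: % Processor Time: _Total", operator.gt, 80),
    ("System: Processor Queue Length", operator.gt, 10),
    ("Memory: Pages/sec", operator.gt, 220),
]


def breached(samples):
    """Return the names of sampled counters whose value crosses the Table 2 limit.

    `samples` maps counter names to their latest values; counters that were
    not sampled are skipped.
    """
    return [
        name
        for name, compare, limit in THRESHOLDS
        if name in samples and compare(samples[name], limit)
    ]


if __name__ == "__main__":
    # Hypothetical sample values for one front-end Web server.
    print(breached({"Memory: Available MBytes": 40, "Memory: Pages/sec": 100}))
```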

Windows SharePoint Services-Specific Service Monitoring

The monitoring rules and features in this category helped the operations team understand the status of services related to Windows SharePoint Services and troubleshoot issues. Notification e-mail messages for these alerts enabled early escalation of potential issues. This section discusses four groups of events and System Monitor counters for front-end Web servers.

Windows SharePoint Services and SQL Server

The following alerts concern issues between Windows SharePoint Services and SQL Server. Notifications were sent to the Windows SharePoint Services and SQL Server notification groups.

  • Cannot connect to database:

    Event Type: Error

    Event Source: Windows SharePoint Services 2.0

    Event Category: None

    Event ID: 1000

    Description contains substring '#50070'

    Example: #50070: Unable to connect to the database STS_Config on Server_Name. Check the database connection information and make sure that the database server is running.

    This event requires immediate action. When the SQL Server databases for Windows SharePoint Services cannot be reached, Windows SharePoint Services on the front-end Web servers is interrupted.

  • Database Capacity Reached:

    Event Type: Error

    Event Source: Windows SharePoint Services 2.0

    Event Category: None

    Event ID: 1000

    Description contains substring '#50068'

    Example: #50068: The content databases in this cluster are full. You cannot add more Web sites until you change the content database Web site capacity settings or add more content databases.

    When this alert is received, system administrators should increase database capacity or add more content databases.

  • Database Capacity Warning Reached:

    Event Type: Warning

    Event Source: Windows SharePoint Services 2.0

    Event Category: None

    Event ID: 1000

    Description contains substring '#50069'

    Example: #50069: The content databases in this cluster have exceeded the warning Web site count. Either change the content database Web site capacity settings or add more content databases.

    When this alert is received, system administrators should increase database capacity or add more content databases.
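All three rules above share the same event source and event ID and differ only in the substring matched in the description. The following Python sketch shows that matching logic under stated assumptions: SQL_RULES and match_sql_rule are hypothetical names, and in the deployment this matching is configured in MOM rules rather than written as code.

```python
# Description substring -> (rule name, event type), from the three rules above.
SQL_RULES = {
    "#50070": ("Cannot connect to database", "Error"),
    "#50068": ("Database capacity reached", "Error"),
    "#50069": ("Database capacity warning reached", "Warning"),
}


def match_sql_rule(source, event_id, description):
    """Return (rule name, event type) for a matching event, or None."""
    if source != "Windows SharePoint Services 2.0" or event_id != 1000:
        return None
    for marker, rule in SQL_RULES.items():
        if marker in description:
            return rule
    return None


if __name__ == "__main__":
    print(match_sql_rule(
        "Windows SharePoint Services 2.0",
        1000,
        "#50070: Unable to connect to the database STS_Config on Server_Name.",
    ))
```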

Windows SharePoint Services Components

The following events were sent to the Windows SharePoint Services notification group when Web Parts, the SharePoint Timer Service, or other Windows SharePoint Services components produced errors. When these events occurred, Windows SharePoint Services was still available but certain components on one server might not function normally.

  • DDS Web Part Rendering Failure

    Provider Name: Application

    Provider Type: Windows NT Event Log

    Event Type: Error

    Source Name: Windows SharePoint Services 2.0

    Description contains substring 'VerifySafeControls failed for guid'

    Generate Alert: Critical Error

    Notification Group: Windows SharePoint Services

  • DDS Web Part Unsafe control detected rule 2

    Provider Name: Application

    Provider Type: Windows NT Event Log

    Event Number: 1000

    Event Type: Error

    Source Name: Windows SharePoint Services 2.0

    Description contains substring 'Unsafecontrol exception (GetTypeFromGuid)'

    Generate Alert: Critical Error

    Notification Group: Windows SharePoint Services

  • OWSTimer and STSWel error

    Provider Name: Application

    Provider Type: Windows NT Event Log

    Event Number: 1000

    Event Type: Error

    Source Name: Windows SharePoint Services 2.0

    Description contains substring 'eowstimer.exe'

    Generate Alert: Warning

    Notification Group: Windows SharePoint Services

  • W3WP WSS error

    Provider Name: Application

    Provider Type: Windows NT Event Log

    Event Number: 1000

    Event Type: Error

    Source Name: Windows SharePoint Services 2.0

    Description contains substring 'ew3wp.exe'

    Generate Alert: Warning

    Notification Group: Windows SharePoint Services

Windows SharePoint Services Virus Scanner

If McAfee PortalShield or another virus scanner was installed on the front-end Web servers, the following events might be logged.

  • Virus checking, loading virus scanner:

    Event Type: Information

    Event Source: Windows SharePoint Services 2.0

    Event Category: None

    Event ID: 1000

    Example: #96000f: Loading antivirus scanner...

  • Virus checking, cannot load virus scanner:

    Event Type: Information

    Event Source: Windows SharePoint Services 2.0

    Event Category: None

    Event ID: 1000

    Example: #960010: Finished loading antivirus scanner. No scanner installed.

Windows SharePoint Services Active Directory

The following three events are related to Active Directory directory services account creation, deletion, and updates. Immediate action should be taken when an error is received.

  • Cannot add user to Active Directory

    Event Type: Information

    Event Source: Windows SharePoint Services 2.0

    Event Category: None

    Event ID: 1000

    Example: #1966150: Adding user <username> to OU <active directory OU> in domain <domain name> FAILED with HRESULT <error code from AD handler>

  • Cannot delete user from Active Directory

    Event Type: Information

    Event Source: Windows SharePoint Services 2.0

    Event Category: None

    Event ID: 1000

    Example: #1966151: Deleting user %user% from OU %OU% in domain %DOMAIN% FAILED with HRESULT %HR%

  • Cannot update user in Active Directory

    Event Type: Information

    Event Source: Windows SharePoint Services 2.0

    Event Category: None

    Event ID: 1000

    Example: #1966152: Updating user %user% from OU %OU% in domain %DOMAIN% FAILED with HRESULT %HR%
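Because the three messages above follow one pattern, the affected user, organizational unit, domain, and HRESULT can be pulled out of a description for triage. The following Python sketch assumes the message layout shown in the examples. The pattern, the parse_ad_failure function, and the sample values (jdoe, Customers, example.com, 0x80070005) are hypothetical.

```python
import re

# Pattern modeled on the three example messages above (an assumption about
# the exact layout of real event descriptions).
AD_FAILURE = re.compile(
    r"#(?P<code>196615[0-2]): (?P<action>Adding|Deleting|Updating) user (?P<user>.+?) "
    r"(?:to|from) OU (?P<ou>.+?) in domain (?P<domain>.+?) "
    r"FAILED with HRESULT (?P<hresult>\S+)"
)


def parse_ad_failure(description):
    """Return the fields of an Active Directory failure message, or None."""
    match = AD_FAILURE.search(description)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical description with made-up values.
    print(parse_ad_failure(
        "#1966150: Adding user jdoe to OU Customers in domain example.com "
        "FAILED with HRESULT 0x80070005"
    ))
```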

HTML Viewing and HTML Transformation Server

An HTML transformation server is an optional component for a Windows SharePoint Services server farm. An HTML transformation server runs an HTML viewer service, which allows a user to see documents in HTML format, even if the program that created the documents is not installed on the user's computer. If an HTML transformation server is configured, the events in this category should be configured and monitored. The HTML viewer service for Microsoft Office documents is the Microsoft® Office 2003 HTML Viewer service.

When HTML viewer services are started or stopped, or when the HTML transformation server uses more than 90 percent of its processor time, an alert is sent to the HTML Transformation Service Operators notification group.

  • HTML Launcher Started

    Provider Name: Application

    Provider Type: Windows NT Event Log

    From Source: Microsoft.Office.HtmlTrans.Launcher

    Description contains substring 'start'

    Generate Alert: Information

    Notification Group: HTML Transformation Service Operators

  • HTML Load Balancer Stopped

    Provider Name: Application

    Provider Type: Windows NT Event Log

    Event Number: 0

    Source Name: Microsoft.Office.HtmlTrans.LoadBalancer

    Description contains substring 'stop'

    Generate Alert: Critical Error

    Notification Group: HTML Transformation Service Operators

  • HTML Launcher1 Stopped

    Provider Name: Application

    Provider Type: Windows NT Event Log

    From Source: Microsoft.Office.HtmlTrans.Launcher

    Event Number: 0

    Description contains substring 'stop'

    Generate Alert: Critical Error

    Notification Group: HTML Transformation Service Operators

  • HTML Load Balancer 1 Started

    Provider Name: Application

    Provider Type: Windows NT Event Log

    Event Number: 0

    Source Name: Microsoft.Office.HtmlTrans.LoadBalancer

    Description contains substring 'start'

    Generate Alert: Information

    Notification Group: HTML Transformation Service Operators

  • HTML Transformation Server CPU Usage >90%

    Provider Name: Processor–% Processor Time–_Total-3.0-minutes

    Provider Type: Windows NT Performance Counter

    Threshold is greater than 90

    Generate Alert: Critical Error

    Notification Group: HTML Transformation Service Operators
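The start and stop rules above can be combined to track which HTML viewer components are currently running. The following Python sketch replays a sequence of events and keeps the latest state for each source. It is an illustration only: the current_state function and the sample descriptions are hypothetical, and the case-insensitive substring match is an assumption.

```python
# Event sources named in the rules above.
SOURCES = (
    "Microsoft.Office.HtmlTrans.Launcher",
    "Microsoft.Office.HtmlTrans.LoadBalancer",
)


def current_state(events):
    """Replay (source, description) pairs in order and return the latest state per source."""
    state = {}
    for source, description in events:
        if source not in SOURCES:
            continue
        text = description.lower()
        if "stop" in text:
            state[source] = "stopped"
        elif "start" in text:
            state[source] = "running"
    return state


if __name__ == "__main__":
    # Hypothetical event descriptions.
    print(current_state([
        ("Microsoft.Office.HtmlTrans.Launcher", "Service started"),
        ("Microsoft.Office.HtmlTrans.LoadBalancer", "Service started"),
        ("Microsoft.Office.HtmlTrans.Launcher", "Service stopped"),
    ]))
```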

URL and Administration Port Monitoring

MOM scripts provide the capabilities of monitoring URLs and administration ports for Windows SharePoint Services sites. A generic URL monitoring notification rule was implemented to report errors on various URL requests on each front-end Web server.

  • Notification Rules–Script-generated data

    Criteria–With Event ID: 2002

    Generate Alert: Critical Error

    Notification Group: Windows SharePoint Services Service Operators

As the Internet Platform and Operations group implemented this rule, a ping request was sent to each front-end Web server every two minutes. The script issued a ping request to each server up to three times during each test. If all three requests failed, a critical error was generated and reported.

Note: The frequency and number of trials should be adjusted based on your traffic analysis and expectation of server availability. The MOM account on the consolidator server should have access to the URL being tested or anonymous users should have access to the site.

  • Ping FE1 https://site_URL (Change the site URL as appropriate)

    Data Provider: Scheduled every 2 min.

    Provider Type: Time Event

    Responses: Script Name: HTTP Ping–Centrally on the Consolidator computer.

    AttemptedInterval: 1

    Attempts: 3

    LogSuccessEvent: False

    URL: https://site_URL (Change the site URL as appropriate)

  • Ping FE1 WindowsSharePointServicesAdminPort (Assume 8080 is the admin port.)

    Data Provider: Scheduled every 2 min.

    Provider Type: Time Event

    Responses: Script Name: HTTP Ping–Centrally on the Consolidator computer.

    AttemptedInterval: 1

    Attempts: 3

    LogSuccessEvent: False

    URL: https://Server:8080/

Repeat the last two rules for each front-end Web server to make sure that all front-end Web servers respond properly to these fundamental requests.
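The retry behavior described for the ping rules (up to three attempts, with a critical error only when all of them fail) can be sketched in Python as follows. This is not the MOM HTTP Ping script. The http_ok and ping_with_retries functions are hypothetical, the interval is assumed to be in seconds, and the return value 2002 simply mirrors the event ID used by the notification rule above.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=10):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # HTTP errors, DNS failures, refused connections, and timeouts count as failures.
        return False


def ping_with_retries(url, attempts=3, interval=1, fetch=http_ok, sleep=time.sleep):
    """Try the URL up to `attempts` times; return 2002 if every attempt fails, else None."""
    for attempt in range(attempts):
        if fetch(url):
            return None
        if attempt < attempts - 1:
            sleep(interval)  # wait before the next attempt
    return 2002


if __name__ == "__main__":
    # Replace the placeholder with a real front-end URL before running.
    print(ping_with_retries("https://site_URL"))
```

Passing fetch and sleep as parameters lets the retry logic be exercised without network access, for example by substituting a function that always returns False.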

Additional System Monitor Counter Monitoring

Monitoring these System Monitor counters can help system administrators understand system load and service usage. The collected data can also be used for ongoing capacity planning. Because Windows SharePoint Services has its own ISAPI filter and uses the Microsoft .NET Framework, it is also worthwhile to monitor the following front-end and back-end System Monitor counters.

Additional Monitoring Rules on Front-End Servers

  • Process(w3wp)\% Processor Time

  • Process(w3wp)\Private Bytes

  • Process(w3wp)\Working Set

  • Process(w3wp)\Handle Count

  • .NET CLR Memory\# Bytes in All Heaps

  • .NET CLR Memory\Large Object Heap Size

  • .NET CLR Memory\% Time in GC

  • ASP.NET\Worker Process Restarts

Additional Monitoring Rules on Back-End Servers

  • Process(sqlservr)\% Processor Time

  • Process(sqlservr)\Working Set

  • SQLServer:General Statistics\User Connections

  • SQLServer:Locks\Number of Deadlocks/sec

  • SQLServer:Locks\Lock Waits/sec

  • SQLServer:Locks\Lock Wait Time (ms)

  • SQLServer:SQL Statistics\Batch Requests/sec
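To collect the counters listed above from a specific computer, each one is normally written as a full counter path that starts with the computer name. The following Python sketch builds such paths for a subset of the counters. The counter_paths function is hypothetical, and FE1 stands in for a front-end server name, as in the ping rules earlier in this paper.

```python
# A subset of the counters listed above, written as Object(Instance)\Counter.
FRONT_END_COUNTERS = [
    r"Process(w3wp)\% Processor Time",
    r"Process(w3wp)\Private Bytes",
    r"Process(w3wp)\Working Set",
    r"Process(w3wp)\Handle Count",
]

BACK_END_COUNTERS = [
    r"Process(sqlservr)\% Processor Time",
    r"Process(sqlservr)\Working Set",
    r"SQLServer:General Statistics\User Connections",
    r"SQLServer:SQL Statistics\Batch Requests/sec",
]


def counter_paths(server, counters):
    """Prefix each counter with \\\\server\\ to form a full counter path."""
    return ["\\\\" + server + "\\" + counter for counter in counters]


if __name__ == "__main__":
    for path in counter_paths("FE1", FRONT_END_COUNTERS):
        print(path)
```

Paths in this form can be passed to tools that accept performance counter paths, such as the typeperf command on Windows.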

Summary

Administrators deploying Windows SharePoint Services to host customer sites can build on the experience that the Microsoft Internet Platform and Operations group had when they deployed Windows SharePoint Services (Beta) and configured Microsoft Operations Manager for a similar use. From choosing servers, to monitoring, to setting up customer sites, administrators can be confident that someone has been through this before. For more information about the entire environment of the Windows SharePoint Services (Beta) hosting deployment, see the other white papers in this series.

See the following resources for further information:

For the latest information about Windows Server 2003, see the Windows Server 2003 Web site at https://www.microsoft.com/windowsserver2003/default.mspx.

© 2003 Microsoft Corporation. All rights reserved.
