Microsoft Windows SharePoint Services Monitoring Design and Implementation
Archived content. No warranty is made as to technical accuracy. Content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.
Published: December 1, 2003
By Microsoft Office Internet Platform and Operations Windows SharePoint Services Team
This case study describes how the Microsoft Office Internet Platform and Operations group configured Microsoft Operations Manager 2000 SP1 to monitor a Windows SharePoint Services (Beta) server farm that hosts 15,000 external customer sites. It presents the group's experiences to help enterprise companies design their own monitoring and instrumentation mechanisms. It is the fourth of four technical white papers describing this deployment.
On This Page
Introduction
Deployment Goals
Server Farm Configuration
Monitoring Categories
Fundamental System, Application, and Server Monitoring
Windows SharePoint Services-Specific Service Monitoring
HTML Viewing and HTML Transformation Server
URL and Administration Port Monitoring
Additional System Monitor Counter Monitoring
Summary
Related Links
Introduction
This white paper describes the way the Internet Platform and Operations group on the Microsoft® Office team designed and implemented the monitoring mechanism for their deployment of Microsoft Windows® SharePoint™ Services (Beta). The team used Microsoft Operations Manager (MOM) 2000 SP1 with various monitoring rules and System Monitor counters to monitor the status of servers and services in the Windows SharePoint Services server farm. This is the fourth of four papers that describe the Windows SharePoint Services hosting experience.
Deployment Goals
The objectives of the monitoring implementation described in this paper were the following:
To test and provide enhanced and integrated monitoring features for Windows SharePoint Services server farms
To provide extremely high availability of the Windows SharePoint Services server farm to 15,000 external customers and allow administrators and operations engineers to take immediate and proactive actions when a service issue or system fault occurred.
Both objectives were accomplished. The availability of the Windows SharePoint Services server farm in the last year has been more than 99 percent, an excellent record for Beta code. Application, server, and drive issues occurred, but the appropriate groups received MOM notification e-mail messages in time to respond before service was interrupted.
Microsoft Operations Manager (MOM) and Hewlett-Packard Compaq Insight Manager (CIM) were chosen as the monitoring tools for the following reasons:
Hewlett-Packard hardware makes up the server farm, so CIM works well to monitor server status. MOM is fully integrated with the CIM monitoring tool.
MOM supports many of the required monitoring features identified in this paper out of the box. Minimal development and customization effort was needed for the Windows SharePoint Services monitoring and instrumentation.
MOM can automatically notify corresponding groups when service issues or system faults occur on the Windows SharePoint Services server farm.
MOM provides default performance monitoring reports that help identify the traffic patterns and system status.
The Windows SharePoint Services team plans to ship a MOM management pack in the near future. Visit the MOM Management Pack site at https://go.microsoft.com/fwlink/?LinkId=20493&clcid=0x409 for updates.
Microsoft is committed to MOM as the long-term monitoring solution for enterprise companies.
The configuration and best practices outlined in this paper may be of use to anyone deploying Windows SharePoint Services. For more detailed descriptions and configuration steps for Microsoft Operations Manager, see the documentation for Microsoft Operations Manager 2000, available in several formats from the MOM Web site at https://go.microsoft.com/fwlink/?LinkId=20494&clcid=0x409.
Server Farm Configuration
Figure 1: Server Farm Configuration
Public DNS servers
Internet
Router (Cisco Systems)
Load balancer (F5 Networks BIG-IP)
Load balancer (F5 Networks BIG-IP)
Front-end Web server farm (six servers)
SMTP and DNS server
Terminal services, debugging, and administration server
SQL Server server 1
SQL Server server 2
SQL Server server 3
SQL Server server 4
SAN unit (Hewlett Packard)
Active Directory domain controller 1
Active Directory domain controller 2
MOM server
Backup server (Veritas software)
Backup tape device
HTML transformation server
Imaging and installation server (Altiris deployment server)
Router (Cisco Systems)
Edge network
Figure 1 shows the server farm and network set up by the Internet Platform and Operations group. The following sections describe in detail how monitoring was implemented for this Windows SharePoint Services server farm.
Monitoring Categories
All implemented monitoring rules and counters were prioritized and classified into the following five categories:
Fundamental System, Application, and Server Monitoring: Monitoring rules that are critical and must be present at the system or server level.
Windows SharePoint Services-Specific Service Monitoring: Events that track features that are specific to Windows SharePoint Services and its components, including Web Parts.
Windows SharePoint Services HTML Transformation and HTML Transformation Server Services: Events that monitor the HTML transformation server, an optional component for a Windows SharePoint Services server farm.
Windows SharePoint Services URL Monitoring: URL tests that help identify whether Windows SharePoint Services sites and administrative features work properly on each front-end Web server.
Additional System Monitor Counter Monitoring: System Monitor counters that help system administrators understand system load and service usage. The collected data can also be used for ongoing capacity planning.
Fundamental System, Application, and Server Monitoring
The rules and features in this category are critical and must be tracked to ensure that hardware and application errors are caught proactively and handled in a timely manner to avoid service downtime.
Hardware Monitoring
The Internet Platform and Operations group's deployment of Windows SharePoint Services used Hewlett-Packard CIM software to monitor server status. The CIM management pack was installed on the MOM consolidator, and MOM was used as the central monitoring and notification mechanism. When critical alerts occurred on any of the servers or the storage area network (SAN), the CIM software sent e-mail messages through the MOM server to the team managing Windows SharePoint Services and the team managing the physical servers in the lab.
Fundamental Application Monitoring
The MOM rules in this category monitored fundamental Microsoft® Windows Server™ 2003 events and System Monitor counters. Notifications were sent through e-mail to one or more of the following teams:
Windows SharePoint Services: Team members in the Internet Platform and Operations group in charge of managing Windows SharePoint Services settings.
Lab: Team members in charge of the physical servers in the lab.
Active Directory: Team members managing Microsoft Active Directory® directory services for the deployment.
SQL Server: Team members managing the settings for Microsoft SQL Server™ and the servers running SQL Server.
Application monitoring can be divided into two pieces, one that uses the MOM rules and one that uses System Monitor counters. Table 1 lists the MOM rules, including which servers the rule tracked, which events it tracked, and which group was notified of alerts.
Table 1 MOM rules
Server Type | Events | Notification Group |
---|---|---|
Front-end Internet Information Services (IIS) servers | IIS stops and starts (times) | Windows SharePoint Services |
Front-end IIS servers | NetLogon stops and starts (times) | Windows SharePoint Services |
Front-end IIS servers | Windows SharePoint Services stops and starts (times) | Windows SharePoint Services |
Active Directory servers | Standard rules from the Active Directory directory service management pack module | Active Directory |
Servers running SQL Server | Standard rules from the SQL Server 2000 management pack module | SQL Server |
All servers | Server login successes and failures | Windows SharePoint Services and Lab |
All servers | Hewlett-Packard CIM monitoring | Windows SharePoint Services and Lab |
Hewlett-Packard SAN HSG80 Data Repository | SAN error notifications | Windows SharePoint Services and Lab |
Table 2 lists the System Monitor counters, the values for which notifications were sent, and which group received the notification.
Table 2 System Monitor counters
System Monitor Counter | Threshold | Notification Group |
---|---|---|
Memory: % Committed Bytes in Use | Greater than 80 percent | Windows SharePoint Services |
Memory: Available Mbytes | Less than 50 MB | Windows SharePoint Services |
Web Service: Connection Attempts/sec | Greater than 500 attempts per second | Windows SharePoint Services |
Processor: % Processor Time: _Total (CPU Utilization) | Greater than 80 percent | Windows SharePoint Services |
Current Connections–Warning | 1000 connections | Windows SharePoint Services |
Current Connections–Error | 2000 connections | Windows SharePoint Services |
Disk Usage | Less than 10 percent | Windows SharePoint Services |
System: Processor Queue Length | Greater than 10 threads | Windows SharePoint Services |
Memory Pages/sec | Greater than 220 pages per second | Windows SharePoint Services |
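The threshold checks in Table 2 can be sketched in Python as follows. This is an illustrative sketch only, not a description of how MOM evaluates its rules: the `Threshold` class, the `breached_counters` function, and any sample values are hypothetical. The Current Connections and Disk Usage rows are left out because the table does not say how those values are compared or measured.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Threshold:
    """One Table 2 row: a counter name and the test that triggers a notification."""
    counter: str
    is_breached: Callable[[float], bool]


# Counter names and limits transcribed from Table 2.
THRESHOLDS: List[Threshold] = [
    Threshold("Memory: % Committed Bytes in Use", lambda v: v > 80),
    Threshold("Memory: Available Mbytes", lambda v: v < 50),
    Threshold("Web Service: Connection Attempts/sec", lambda v: v > 500),
    Threshold("Processor: % Processor Time: _Total", lambda v: v > 80),
    Threshold("System: Processor Queue Length", lambda v: v > 10),
    Threshold("Memory Pages/sec", lambda v: v > 220),
]


def breached_counters(samples: Dict[str, float]) -> List[str]:
    """Return the names of sampled counters whose value crosses its threshold."""
    return [
        t.counter
        for t in THRESHOLDS
        if t.counter in samples and t.is_breached(samples[t.counter])
    ]
```

Counters missing from `samples` are skipped rather than treated as failures, so the same function can be used on servers that expose only some of the counters.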
Windows SharePoint Services-Specific Service Monitoring
The monitoring rules or features in this category helped the operations team understand the status of services related to Windows SharePoint Services and helped them troubleshoot issues. Notification e-mail messages for these alerts helped early escalation of potential issues. This section discusses four groups of events and System Monitor counters for front-end Web servers.
Windows SharePoint Services and SQL Server
The following alerts concern problems between Windows SharePoint Services and SQL Server. Notification was sent to the Windows SharePoint Services and SQL Server notification groups.
Cannot connect to database:
Event Type: Error
Event Source: Windows SharePoint Services 2.0
Event Category: None
Event ID: 1000
Description contains substring '#50070'
Example: #50070: Unable to connect to the database STS_Config on Server_Name. Check the database connection information and make sure that the database server is running.
This event requires immediate action. When the SQL Server databases for Windows SharePoint Services cannot be reached, Windows SharePoint Services on the front-end Web servers is interrupted.
Database Capacity Reached:
Event Type: Error
Event Source: Windows SharePoint Services 2.0
Event Category: None
Event ID: 1000
Description contains substring '#50068'
Example: #50068: The content databases in this cluster are full. You cannot add more Web sites until you change the content database Web site capacity settings or add more content databases.
When this alert is received, system administrators should increase database capacity or add more content databases.
Database capacity warning reached:
Event Type: Warning
Event Source: Windows SharePoint Services 2.0
Event Category: None
Event ID: 1000
Description contains substring '#50069'
Example: #50069: The content databases in this cluster have exceeded the warning Web site count. Either change the content database Web site capacity settings or add more content databases.
When this alert is received, system administrators should increase database capacity or add more content databases.
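The matching criteria for the three database events above (event source, event ID, event type, and a description substring) can be sketched as a small rule table. This is a hedged illustration, not MOM rule syntax: the `Event` class and the `match_database_rule` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical in-memory view of an Application event log entry."""
    source: str
    event_id: int
    event_type: str
    description: str


WSS_SOURCE = "Windows SharePoint Services 2.0"
WSS_EVENT_ID = 1000

# (description substring, event type, rule name), taken from the rules above.
DATABASE_RULES = [
    ("#50070", "Error", "Cannot connect to database"),
    ("#50068", "Error", "Database capacity reached"),
    ("#50069", "Warning", "Database capacity warning reached"),
]


def match_database_rule(event: Event) -> Optional[str]:
    """Return the name of the first database rule the event satisfies, or None."""
    if event.source != WSS_SOURCE or event.event_id != WSS_EVENT_ID:
        return None
    for substring, event_type, rule_name in DATABASE_RULES:
        if event.event_type == event_type and substring in event.description:
            return rule_name
    return None
```

Because all three events share event ID 1000, the description substring is what tells them apart, which is why each rule carries its own `#5006x` or `#50070` code.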
Windows SharePoint Services Components
The following events were sent to the Windows SharePoint Services notification group when Web Parts, the SharePoint Timer Service, or other Windows SharePoint Services components produced errors. When these events occurred, Windows SharePoint Services was still available but certain components on one server might not function normally.
DDS Web Part Rendering Failure
Provider Name: Application
Provider Type: Windows NT Event Log
Event Type: Error
Source Name: Windows SharePoint Services 2.0
Description contains substring 'VerifySafeControls failed for guid'
Generate Alert: Critical Error
Notification Group: Windows SharePoint Services
DDS Web Part Unsafe control detected rule 2
Provider Name: Application
Provider Type: Windows NT Event Log
Event Number: 1000
Event Type: Error
Source Name: Windows SharePoint Services 2.0
Description contains substring 'Unsafecontrol exception (GetTypeFromGuid)'
Generate Alert: Critical Error
Notification Group: Windows SharePoint Services
OWSTimer and STSWel error
Provider Name: Application
Provider Type: Windows NT Event Log
Event Number: 1000
Event Type: Error
Source Name: Windows SharePoint Services 2.0
Description contains substring 'eowstimer.exe'
Generate Alert: Warning
Notification Group: Windows SharePoint Services
W3WP WSS error
Provider Name: Application
Provider Type: Windows NT Event Log
Event Number: 1000
Event Type: Error
Source Name: Windows SharePoint Services 2.0
Description contains substring 'ew3wp.exe'
Generate Alert: Warning
Notification Group: Windows SharePoint Services
Windows SharePoint Services Virus Scanner
If McAfee PortalShield or another virus scanner is installed on the front-end Web servers, the following events might be logged.
Virus checking, loading virus scanner:
Event Type: Information
Event Source: Windows SharePoint Services 2.0
Event Category: None
Event ID: 1000
Example: #96000f: Loading antivirus scanner...
Virus checking, cannot load virus scanner:
Event Type: Information
Event Source: Windows SharePoint Services 2.0
Event Category: None
Event ID: 1000
Example: #960010: Finished loading antivirus scanner. No scanner installed.
Windows SharePoint Services Active Directory
The following three events are related to Active Directory directory services account creation, deletion, and updates. Immediate action should be taken when an error is received.
Cannot add user to Active Directory
Event Type: Information
Event Source: Windows SharePoint Services 2.0
Event Category: None
Event ID: 1000
Example: #1966150: Adding user <username> to OU <active directory OU> in domain <domain name> FAILED with HRESULT <error code from AD handler>
Cannot delete user from Active Directory
Event Type: Information
Event Source: Windows SharePoint Services 2.0
Event Category: None
Event ID: 1000
Example: #1966151: Deleting user %user% from OU %OU% in domain %DOMAIN% FAILED with HRESULT %HR%
Cannot update user in Active Directory
Event Type: Information
Event Source: Windows SharePoint Services 2.0
Event Category: None
Event ID: 1000
Example: #1966152: Updating user %user% from OU %OU% in domain %DOMAIN% FAILED with HRESULT %HR%
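Because all three Active Directory events share event ID 1000, a responder usually needs the details embedded in the description text. The sketch below, offered as one possible approach rather than part of the original deployment, pulls those fields out with a regular expression modeled on the example messages above. The function name `parse_ad_failure` is hypothetical, and the pattern assumes the real messages follow the example wording exactly.

```python
import re
from typing import Dict, Optional

# Pattern modeled on the three example descriptions above; the real message
# text may differ, so treat this as an assumption to verify against live events.
AD_FAILURE_PATTERN = re.compile(
    r"#(?P<code>1966150|1966151|1966152): "
    r"(?P<operation>Adding|Deleting|Updating) user (?P<user>.+?) "
    r"(?:to|from) OU (?P<ou>.+?) in domain (?P<domain>.+?) "
    r"FAILED with HRESULT (?P<hresult>\S+)"
)


def parse_ad_failure(description: str) -> Optional[Dict[str, str]]:
    """Return the fields of an Active Directory failure event, or None if no match."""
    match = AD_FAILURE_PATTERN.search(description)
    return match.groupdict() if match else None
```

The extracted `hresult` value is the most useful field for triage, since it carries the error code returned by the Active Directory handler.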
HTML Viewing and HTML Transformation Server
An HTML transformation server is an optional component for a Windows SharePoint Services server farm. An HTML transformation server runs an HTML viewer service, which allows a user to see documents in HTML format, even if the program that created the documents is not installed on the user's computer. If an HTML transformation server is configured, the events in this category should be configured and monitored. The HTML viewer service for Microsoft Office documents is the Microsoft® Office 2003 HTML Viewer service.
When HTML viewer services are started or stopped or when the Office HTML Viewer service uses more than 90 percent of system resources, an event will be sent to an HTML Transformation Service Operators notification group.
HTML Launcher Started
Provider Name: Application
Provider Type: Windows NT Event Log
From Source: Microsoft.Office.HtmlTrans.Launcher
Description contains substring 'start'
Generate Alert: Information
Notification Group: HTML Transformation Service Operators
HTML Load Balancer Stopped
Provider Name: Application
Provider Type: Windows NT Event Log
Event Number: 0
Source Name: Microsoft.Office.HtmlTrans.LoadBalancer
Description contains substring 'stop'
Generate Alert: Critical Error
Notification Group: HTML Transformation Service Operators
HTML Launcher1 Stopped
Provider Name: Application
Provider Type: Windows NT Event Log
From Source: Microsoft.Office.HtmlTrans.Launcher
Event Number: 0
Description contains substring 'stop'
Generate Alert: Critical Error
Notification Group: HTML Transformation Service Operators
HTML Load Balancer 1 Started
Provider Name: Application
Provider Type: Windows NT Event Log
Event Number: 0
Source Name: Microsoft.Office.HtmlTrans.LoadBalancer
Description contains substring 'start'
Generate Alert: Information
Notification Group: HTML Transformation Service Operators
HTML Transformation Server CPU Usage >90%
Provider Name: Processor–% Processor Time–_Total-3.0-minutes
Provider Type: Windows NT Performance Counter
Threshold is greater than 90
Generate Alert: Critical Error
Notification Group: HTML Transformation Service Operators
URL and Administration Port Monitoring
MOM scripts provide the capabilities of monitoring URLs and administration ports for Windows SharePoint Services sites. A generic URL monitoring notification rule was implemented to report errors on various URL requests on each front-end Web server.
Notification Rules–Script-generated data
Criteria–With Event ID: 2002
Generate Alert: Critical Error
Notification Group: Windows SharePoint Services Service Operators
In the Internet Platform and Operations group's implementation of this rule, a test ran against each front-end Web server every two minutes. During each test, the script issued a ping request to the server up to three times. If all three requests failed, a critical error was generated and reported.
Note: The frequency and number of trials should be adjusted based on your traffic analysis and expectation of server availability. The MOM account on the consolidator server should have access to the URL being tested or anonymous users should have access to the site.
Ping FE1 https://site_URL (Change the site URL as appropriate)
Data Provider: Scheduled every 2 min.
Provider Type: Time Event
Responses: Script Name: HTTP Ping–Centrally on the Consolidator computer.
AttemptedInterval: 1
Attempts: 3
LogSuccessEvent: False
URL: https://site_URL (Change the site URL as appropriate)
Ping FE1 WindowsSharePointServicesAdminPort (Assume 8080 is the admin port.)
Data Provider: Scheduled every 2 min.
Provider Type: Time Event
Responses: Script Name: HTTP Ping–Centrally on the Consolidator computer.
AttemptedInterval: 1
Attempts: 3
LogSuccessEvent: False
URL: https://Server:8080/
Repeat the last two rules for each front-end Web server to make sure that all front-end Web servers respond properly for these fundamental requests.
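The retry behavior described above (up to three attempts per test, with a critical error only when every attempt fails) can be sketched in Python. This is not the MOM HTTP Ping script. The function names are hypothetical, treating `AttemptedInterval: 1` as a one-second pause between attempts is an assumption, and so is counting any 2xx or 3xx status as success.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

FAILURE_EVENT_ID = 2002  # event ID matched by the notification rule above


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url; network failures raise OSError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404 or 500.
        return error.code


def http_ping(
    url: str,
    attempts: int = 3,
    interval_seconds: float = 1.0,
    fetch: Callable[[str], int] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Return None if any attempt succeeds, or FAILURE_EVENT_ID if all fail."""
    for attempt in range(1, attempts + 1):
        try:
            if 200 <= fetch(url) < 400:
                return None
        except OSError:
            pass  # connection refused, DNS failure, timeout, and similar
        if attempt < attempts:
            sleep(interval_seconds)
    return FAILURE_EVENT_ID
```

Passing `fetch` and `sleep` as parameters keeps the retry logic testable without network access. A scheduler would call `http_ping` once per front-end Web server URL and once per administration port URL every two minutes.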
Additional System Monitor Counter Monitoring
Monitoring these System Monitor counters can help system administrators understand system load and service usage. The collected data can also be used for ongoing capacity planning. Because Windows SharePoint Services has its own ISAPI filter and uses the Microsoft .NET Framework, it is also worthwhile to monitor the following front-end and back-end System Monitor counters and events.
Additional Monitoring Rules on Front-End Servers
Process(w3wp)\% Processor Time
Process(w3wp)\Private Bytes
Process(w3wp)\Working Set
Process(w3wp)\Handle Count
.NET CLR Memory\# Bytes in All Heaps
.NET CLR Memory\Large Object Heap Size
.NET CLR Memory\% Time in GC
ASP.NET\Worker Process Restarts
Additional Monitoring Rules on Back-End Servers
Process(sqlservr)\% Processor Time
Process(sqlservr)\Working Set
SQLServer:General Statistics\User Connections
SQLServer:Locks\Number of Deadlocks/sec
SQLServer:Locks\Lock Waits/sec
SQLServer:Locks\Lock Wait Time (ms)
SQLServer:SQL Statistics\Batch Requests/sec
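As one way to turn collected counter samples into capacity-planning input, the sketch below summarizes a series of samples as an average, a 95th percentile, and a peak. The paper does not say which statistics the group used. The choice of these three and the function name `summarize` are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Summarize counter samples as average, nearest-rank 95th percentile, and peak."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # Nearest-rank percentile: the smallest sample with at least 95 percent
    # of all samples at or below it. Integer math avoids float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "average": mean(ordered),
        "p95": ordered[rank - 1],
        "peak": ordered[-1],
    }
```

Comparing the 95th percentile with the peak helps separate sustained load, which argues for more capacity, from short spikes, which may not.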
Summary
Administrators deploying Windows SharePoint Services to host customer sites can build on the experience that the Microsoft Internet Platform and Operations group had when they deployed Windows SharePoint Services (Beta) and configured Microsoft Operations Manager for a similar use. From choosing servers, to monitoring, to setting up customer sites, administrators can be confident that someone has been through this before. For more information about the entire environment of the Windows SharePoint Services (Beta) hosting deployment, see the other white papers in this series.
Related Links
See the following resources for further information:
Windows SharePoint Services Hosting Configuration and Experience at https://go.microsoft.com/fwlink/?linkid=18323&clcid=0x409
Data Storage Design, Backup, and Restore for Windows SharePoint Services at https://go.microsoft.com/fwlink/?linkid=18324&clcid=0x409
Microsoft Network and Load Balancing Design of Windows SharePoint Services at https://go.microsoft.com/fwlink/?linkid=18325&clcid=0x409
Microsoft Windows SharePoint Services Administrator's Guide at https://go.microsoft.com/fwlink/?linkid=18327&clcid=0x409
Microsoft Operations Manager 2000 Documentation at https://go.microsoft.com/fwlink/?linkid=20493&clcid=0x409
For the latest information about Windows Server 2003, see the Windows Server 2003 Web site at https://www.microsoft.com/windowsserver2003/default.mspx.
This is a preliminary document and may be changed substantially prior to final commercial release of the software described herein.
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
The example companies, organizations, products, people and events depicted herein are fictitious. No association with any real company, organization, product, person or event is intended or should be inferred.
© 2003 Microsoft Corporation. All rights reserved.
Microsoft, Windows, Windows Server, Active Directory, and SharePoint are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.