MOM 2000 SP1 - Performance and Sizing
Archived content. No warranty is made as to technical accuracy. Content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.
Event and Performance Management for Windows®-based Systems
Microsoft Corporation
September 2003
Abstract
This technical paper describes a process for testing Microsoft® Operations Manager 2000 (MOM) Service Pack 1 (SP1) and recommends a suitable computer system size with enough reserve capacity to monitor a specific number of managed computers. It also provides information about the expected performance of that computer system while managing these computers.
On This Page
Prefatory Note
Introduction
MOM SP1 Test Parameters
MOM SP1 Test Results
MOM SP1 Test Results I: Small Configuration - Single DDCAM (MOM Database and DCAM)
MOM/SQL Server Disk Requirements
Database and Data Workload Sizing
Microsoft Operations Manager/SQL Server Test Results
Best Practice: Capacity/Performance Recommendation
MOM SP1 Test Results II: Large Configuration - Separate Database Server and Single DCAM
The MOM/SQL Server Disk Requirements
Database and Data Workload Sizing
Microsoft Operations Manager/SQL Server Test Results
Best Practice: Capacity/Performance Recommendation
MOM SP1 Test Results III: Enterprise Configuration - Separate Database Server and Two DCAMs
The MOM/SQL Server Disk Requirements
Database and Data Workload Sizing
Microsoft Operations Manager/SQL Server Test Results
Best Practice: Capacity/Performance Recommendation
MOM SP1 Management Packs
Appendix A: Test Results For Microsoft Operations Manager 2000 RTM
Appendix B: MOM SP1 Management Sizer
Appendix C: SQL Server Installation for Microsoft Operations Manager Usage
Appendix D: Counter Definitions
Prefatory Note
All tests referred to in this report were designed to determine the minimum computer hardware required for a management server to perform various Microsoft Operations Manager 2000 (MOM) tasks. The MOM test team conducted these tests in June 2003 using MOM Service Pack 1 (MOM SP1).
Note: The MOM test team originally conducted tests in June 2001 using Microsoft Operations Manager 2000 RTM version (MOM RTM). With MOM SP1, data is processed to the MOM database differently; therefore, it is not possible to make direct comparisons of the test results. For the results of the original MOM RTM tests, see Appendix A: Test Results For Microsoft Operations Manager 2000 RTM later in this paper.
The computer systems described herein might not necessarily represent the ideal configuration. The intent is to provide a starting point from which to specify the management server, with the knowledge that the base system you are specifying has been tested and found to be able to perform a given level of tasks.
This report in no way represents or is meant to define an absolute system configuration for any number of managed computers. Instead, this report is meant to show findings and a possible starting point for you to specify the management server. Calculators are provided in the appendices to help you to calculate the database size, and to show the expected input/output activity that might be found on a management server. For more information, refer to the appendices later in this paper.
Testing took into account all events, alerts, and performance counters that occurred during the peak operation of managed computers. This testing did not take into account any Application Management Packs that you might place into service, or the use of any services or scripts to correct certain situations. Although Management Packs were not used in the testing, processing rules were used to generate events, alerts, and performance counters per day at a level higher than any reported by MOM enterprise customers.
For the MOM SP1 testing, the test workload was determined by collecting data from approximately 20 enterprise customers and from the Microsoft Operations and Technologies Group (OTG). The data collected showed a significant drop in the number and rates of events, alerts, and performance counters for MOM SP1. This is due to the tuning of Management Packs and increased database efficiencies. The rate of simulated network line and database usage the MOM test team used to test MOM SP1 far exceeded the actual rates collected from any of the external or internal users. For MOM SP1, actual managed computers were used to generate the network line and database usage workloads, rather than being simulated as was the case with the original MOM RTM testing.
Introduction
MOM is a management system for monitoring managed computers in an organization. Testing, together with many customer deployments, has shown that MOM SP1 scales to the published supported numbers. However, the limits for MOM can vary depending on many variables that are discussed in this paper. This paper describes the testing process used to recommend a suitable management server size with enough reserve capacity to smoothly manage a specific number of computers without putting the MOM SP1 computer systems at risk. It also provides information about the expected performance of the MOM SP1 computer system while managing these computers. Specifically, this paper answers questions such as:
How large must the management server be in terms of hardware resources?
How large is the overall footprint of MOM SP1?
How large should the MOM SP1 database be?
What are the system requirements needed to run MOM SP1 effectively?
What is the expected disk activity on the MOM SP1 database and database server?
What is the expected CPU usage of the MOM SP1 agent on a managed computer?
How might the recommendations contained in this paper be useful? Consider the performance and sizing considerations presented in the following scenarios.
Scenario 1
The systems managers of an online-order-entry environment decide to license MOM SP1 to manage 150 servers worldwide. They determine how large the management server computer system should be and decide to use a single computer for the task. With no experience in MOM capacity planning, it is difficult for them to determine the correct size for the management server. They order a computer system that is much too small for the job. They also learn that they need a much larger-capacity network to accommodate the MOM workload traffic. They will now lose time ordering additional system hardware to rectify this situation.
Scenario 2
The systems managers of an online-order-entry environment decide to license MOM SP1 to manage 1,000 servers worldwide. They determine how large the management server computer system should be and decide to use a series of tiered management systems (alert forwarding) for the task. Unlike in Scenario 1, this company will lose large amounts of money if the servers are not managed correctly, or if they go offline for any reason.
In such a large environment, deciding how large the first tier configuration group management servers should be, and how large the management server in the master configuration group should be compounds the complexity of the sizing considerations. Again, with limited or no experience in MOM capacity planning, it is very difficult for the system managers to determine the correct size for the tier one and tier two management servers. As a result, they order computer systems that are too small for the job. In the process, they also find that they need a separate management network to accommodate the MOM workload traffic. They will now lose time ordering the additional system hardware to rectify the problems.
Conclusion
Careful consideration of the performance and sizing of the hardware systems that support MOM SP1 is critical to the successful implementation of MOM to manage computers in your organization. Although there is no absolute system configuration for any number of managed computers, this technical paper presents the results of performance and sizing testing for MOM SP1 in environments of various sizes. You can use the findings in this paper and the MOM SP1 Management Server Sizer as a starting point to help you to determine the appropriate performance and sizing considerations for MOM in your organization. For more information about the MOM SP1 Management Server Sizer, see “Appendix B: MOM SP1 Management Sizer” later in this paper.
MOM SP1 Test Parameters
This section presents the key factors for the MOM SP1 performance and sizing testing — the hardware used, the scope and goals for the testing, the tools used, and how the test workload was calculated. Later sections present the results of testing.
Hardware Test Environment
This series of tests included three different MOM configuration scenarios, each with an appropriate range of managed computers.
Small configuration - Single DDCAM (MOM database and DCAM), with 20, 50, 85, 140, and 200 managed computers.
Large configuration - Separate database server, single DCAM, with 250, 500, 700, and 1000 managed computers.
Enterprise configuration - Separate database server; two DCAMs, with 700 and 1000 managed computers.
The detailed systems information for each of these scenarios, along with the test results, is described in later sections.
The network used in all tests had a line capacity of 100 Mbps, which represents the bandwidth available in most organizations' production environments; higher-capacity lines are not yet widely used by a large part of the user community. The hardware was set up in a single-tier configuration. Multitiered configurations were not tested.
For all test scenarios, the configuration for the managed computers was the same, as described in Table 1.
Table 1 Managed Computer Configuration
| System component | Description |
|---|---|
| Processor count | 1 |
| Processor type | 1000 MHz Pentium 4 |
| Memory | 512 MB |
| Disk count | 1 |
| Disk designation OS | Drive C |
| Disk size | 7.85 GB (6 GB free space) |
| Network capacity | 100 Mbps (12.5 MB/s) |
Workload Environment
The data workloads used in testing each of the configurations for MOM SP1 were consistently higher than the workloads used in MOM RTM testing, and higher than the actual workloads reported by the largest enterprise customers. For example, alerts delivered to the database for MOM SP1 were 0.00833 alerts per minute per computer at the 1000-managed computer level for the enterprise configuration. This compares to the MOM RTM testing rate of 0.00445 per minute per computer, meaning the MOM SP1 rate was nearly twice as high. Table 2 and Table 3 show the workload levels used in testing MOM SP1.
Table 2 Data Workload Levels per Day Used for Testing MOM SP1
| Managed computer count | Alerts per day | Events per day | Performance counters per day |
|---|---|---|---|
| 20 | 2,250 | 100,000 | 250,000 |
| 50 | 2,250 | 100,000 | 250,000 |
| 85 | 2,250 | 100,000 | 250,000 |
| 140 | 2,250 | 100,000 | 250,000 |
| 200 | 2,250 | 100,000 | 250,000 |
| 250 | 9,000 | 400,000 | 400,000 |
| 500 | 9,000 | 400,000 | 400,000 |
| 700 | 9,000 | 400,000 | 400,000 |
| 1000 | 12,000 | 600,000 | 600,000 |
Note: The values in Table 2 far exceed any numbers reported by the largest enterprise customers for MOM SP1.
Table 3 Data Workload Rates per Minute per Managed Computer Used for Testing MOM SP1
| Managed computer count | Alerts per minute per managed computer | Events per minute per managed computer | Performance counters per minute per managed computer |
|---|---|---|---|
| 20 | 0.078125 | 3.470 | 8.680 |
| 50 | 0.031200 | 1.380 | 3.472 |
| 85 | 0.018350 | 0.816 | 2.042 |
| 140 | 0.011100 | 0.496 | 1.240 |
| 200 | 0.007810 | 0.347 | 0.868 |
| 250 | 0.025000 | 1.111 | 1.111 |
| 500 | 0.012500 | 0.555 | 0.555 |
| 700 | 0.008930 | 0.397 | 0.397 |
| 1000 | 0.008330 | 0.417 | 0.417 |
Note: The values in Table 3 far exceed any numbers reported by the largest enterprise customers. The values for events and performance counters used for MOM RTM testing were higher than the MOM SP1 test scenarios. This is a result of fine-tuning the Management Packs to reduce the volume of events and performance counter traffic for MOM SP1.
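The rates in Table 3 follow directly from the daily totals in Table 2: each rate is the daily total divided by the managed computer count and by 1,440 minutes per day. A quick check in plain Python (the function name is illustrative, not part of MOM):

```python
# Per-minute-per-computer rate = daily total / (managed computers * 1440).
MINUTES_PER_DAY = 24 * 60  # 1440

def per_minute_per_computer(daily_total, managed_computers):
    return daily_total / (managed_computers * MINUTES_PER_DAY)

# 1000-computer enterprise workload: 12,000 alerts per day.
print(round(per_minute_per_computer(12_000, 1000), 5))   # 0.00833
# 20-computer small workload: 250,000 performance counters per day.
print(round(per_minute_per_computer(250_000, 20), 3))    # 8.681
```

The small differences from Table 3 (8.680 versus 8.681) come from the table truncating rather than rounding the final digit.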
Integrated Grooming
MOM SP1 uses an integrated grooming feature, which means that each time MOM SP1 performs a database insert for an event, alert, or performance counter, it also deletes up to 4,000 records by default, according to the grooming parameters that you have established. As a result, the need to periodically groom the MOM database is substantially reduced. Another result of integrated grooming is that ongoing CPU utilization and total I/Os are higher with MOM SP1 than with MOM RTM. However, grooming the MOM database no longer causes the sustained 100 percent CPU utilization that was common with MOM RTM. Table 4 shows the alert latency and grooming data for each level of managed computers tested.
Table 4 Alert Latency and Grooming Data
| Managed computer count | Alert latency (seconds) | Events groomed per day | Performance counters groomed per day |
|---|---|---|---|
| 20 | 42.32 | 3,960,000 | 21,144,000 |
| 50 | 47.37 | 3,936,000 | 20,760,108 |
| 85 | 44.66 | 4,656,000 | 19,632,108 |
| 140 | 49.98 | 3,816,000 | 19,632,108 |
| 200 | 51.27 | 4,752,000 | 21,456,108 |
| 250 | 83.35 | 2,152,110 | 13,174,932 |
| 500 | 121.18 | 2,280,000 | 13,294,932 |
| 700 | 119.19 | 2,160,000 | 15,190,932 |
| 1000 | 121.23 | 2,312,400 | 19,992,264 |
Note: For all test results shown in Table 4, the duration of testing was four hours. Alerts were not groomed during this testing because they did not accumulate fast enough to require grooming.
When the workload was increased for the 250 to 1000 managed-computers level, grooming rates dropped off. This is because the DCAM is performing more database inserts, and therefore it performs integrated grooming at a slightly lower percentage to prevent insert latency. For each insert, the DCAM uses an algorithm to calculate the level of integrated grooming, depending on a number of factors, such as how many inserts are in the queue.
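Microsoft has not published the exact insert-time grooming algorithm, so the sketch below only illustrates the behavior described above. The queue threshold and the linear scale-back are assumptions; the 4,000-record ceiling is the default mentioned in the "Integrated Grooming" section.

```python
MAX_GROOM_PER_INSERT = 4000  # default grooming ceiling described above

def groom_budget(insert_queue_depth, queue_soft_limit=1000):
    """Rows to groom alongside one insert batch (illustrative only)."""
    if insert_queue_depth >= queue_soft_limit:
        return 0  # queue is backed up; defer grooming to avoid insert latency
    # Linear scale-back: a deeper insert queue leaves a smaller grooming budget.
    scale = 1.0 - insert_queue_depth / queue_soft_limit
    return int(MAX_GROOM_PER_INSERT * scale)

print(groom_budget(0))     # 4000: idle queue, full grooming budget
print(groom_budget(500))   # 2000: half-full queue, half the budget
print(groom_budget(1200))  # 0: saturated queue, grooming skipped
```

This matches the pattern in Table 4: as the insert workload grows at the 250-to-1000 computer levels, the grooming rate per insert drops to protect insert latency.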
Scope of Testing
The scope of testing was designed to determine how well MOM SP1 scales, which management server configuration best manages the computers, and the maximum number of computers that a single management server can manage.
For each test configuration, the test procedure was as follows:
1. Set up the MOM database server by using the required database backup, and set up the DCAM(s).
2. Set up the required number of managed computers for the DCAM(s), flushing the queues if MOM agents had already been installed on the managed computers.
3. Stopped and restarted the OnePoint service and the MOM database services, in the following order:
   1. Stopped the OnePoint service on the DCAM(s).
   2. Stopped the MSSQLSERVER and SQLSERVERAGENT services on the MOM database server.
   3. Started the MSSQLSERVER and SQLSERVERAGENT services on the MOM database server.
   4. Flushed the queues on the DCAM(s).
   5. Started the OnePoint service on the DCAM(s).
4. Started collecting the performance counters on the MOM database server and the DCAM(s).
5. Started the specified alert, event, and performance counter workload on each of the managed computers, which began the test.
6. Set up the grooming jobs to run once each hour for the last 2 hours of the test.
7. Ran the test for 4 hours.
Note: MOM SP1 Build 1300 (RTM) was used for all test scenarios.
Performance Monitor Counter Metrics
Table 5 lists the primary performance counters that were collected and used for this analysis. For a complete list and description of the counter functions, see “Appendix D: Counter Definitions” later in this paper.
Table 5 Primary Counters Used in Testing
| Counter object | Counter property | Instances |
|---|---|---|
| Processor | % Processor Time average | Total |
| | % Processor Time peak | |
| | Interrupts/sec | |
| Process | % Processor Time | OnePoint process |
| | Working Set | SQL Server processes |
| | Thread Count | |
| | IO Read Operations/sec | |
| | IO Write Operations/sec | |
| Memory | Available Bytes | Total |
| | Page Faults/sec | |
| | % Committed Bytes In Use | |
| Network Interface | Bytes Total/sec | 100 Mbps network adapter card |
| | Current Bandwidth | |
| Physical Disk | Disk Reads/sec | Drive C |
| | Disk Writes/sec | Database disk drives |
| | Avg. Disk Queue Length | |
| System | Processor Queue Length | Total |
| SQL Server:Databases | Transactions/sec | OnePoint database |
| SQL Server:Buffer Manager | Buffer Cache Hit Ratio | OnePoint database |

| Calculated counters | Counter property | Calculations |
|---|---|---|
| % Network Busy | Bytes Total/sec ÷ Current Bandwidth Bytes | Bytes Total/sec = Bytes Sent/sec + Bytes Received/sec; Current Bandwidth Bytes = Current Bandwidth ÷ 8 |
| Memory Free Space | Available KBytes ÷ Total Physical Memory | |
Note: Table 5 establishes the core performance-counter collection metrics. Other counters might be used for further analysis. The Physical Disk, % Disk Time counter was not used because it gives false readings on Redundant Array of Independent Disks (RAID) arrays. All the database disk arrays used for these tests were RAID 10.
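The two calculated counters at the bottom of Table 5 can be expressed directly in code. The sketch below restates those formulas; the sample values are hypothetical:

```python
# % Network Busy and Memory Free Space, as defined in Table 5.
def percent_network_busy(bytes_sent_per_sec, bytes_received_per_sec,
                         current_bandwidth_bits):
    # Current Bandwidth is reported in bits/sec; divide by 8 for bytes/sec.
    bytes_total_per_sec = bytes_sent_per_sec + bytes_received_per_sec
    current_bandwidth_bytes = current_bandwidth_bits / 8
    return 100.0 * bytes_total_per_sec / current_bandwidth_bytes

def memory_free_space(available_kbytes, total_physical_kbytes):
    return 100.0 * available_kbytes / total_physical_kbytes

# Hypothetical sample: a 100 Mbps adapter carrying 50 KB/s of total traffic.
print(round(percent_network_busy(30_000, 20_000, 100_000_000), 2))  # 0.4
```

Note that 0.4 percent is in line with the steady-state network utilization reported in the test results that follow.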
MOM SP1 Test Results
The following sections show test results for a range of managed computers in different-sized MOM configurations.
MOM SP1 Test Results I: Small Configuration - Single DDCAM (MOM Database and DCAM)
The first series of tests was performed on a small management server: a basic management server that manages a small number of computers. This section shows the capacity of the system. The test results also show the maximum capacity, in terms of the upper bound on the number of managed computers that this server configuration can adequately control.
Hardware Test Environment - Small Configuration, Single DDCAM
Table 6 DDCAM System Configuration
| System component | Description |
|---|---|
| Processor count | 4 |
| Processor type | 550 MHz Pentium 3 |
| Memory | 768 MB |
| Disk count OS | 1 |
| Disk count DB | 6 (RAID 10) |
| Disk count log file | 1 |
| Disk designation OS | C drive (8.46 GB) |
| Disk designation DB | D drive (101.6 GB, 37.1 GB free space) |
| Disk designation log file | E drive (26 GB) |
| Disk I/O capacity - Reads | 750 read operations per second |
| Disk I/O capacity - Writes | 375 write operations per second |
| Network capacity | 100 Mbps (12.5 MB/s) |
| MOM Build | MOM SP1 Build 1300 (RTM) |
MOM/SQL Server Disk Requirements
Table 7 shows the resources needed to install the management server, along with the SQL Server database. Microsoft® SQL Server™ 2000 Standard was used for these tests.
Table 7 Disk Requirements for Small Configuration - Single DDCAM
| Requirement | Value |
|---|---|
| MOM disk space requirement total | 230 MB |
| MOM OnePoint working set average memory | 53.81 MB-79.87 MB |
| OnePoint threads (avg.) | 71 |
| SQL Server working set memory | 604.57 MB-645.37 MB |
| SQL Server database size (disk space) | 6.63 GB |
| Database log disk space | 1 GB |
| MS DTC log size disk space | 512 MB |
Database and Data Workload Sizing
Table 8 shows the number of rows in the MOM database tables prior to running each test for a specific number of managed computers (from 20 to 200). Tables 9 and 10 show the data workloads used in the tests for this configuration.
Table 8 Pre-test Database Table Sizes for the Small Configuration
| Database table | Rows |
|---|---|
| Alert | 88,574 |
| Event | 2,252,322 |
| SampledNumericData | 3,720,018 |
Table 9 Data Workload Levels per Day - Small Configuration (for 20 to 200 managed computers)
| Workload | Per day |
|---|---|
| Alerts | 2,250 |
| Events | 100,000 |
| Performance counters | 250,000 |
Note: The data workload shown in Table 9 was held constant for each level of managed computers (from 20 to 200). This workload is higher than the levels reported by any of the enterprise customers during their testing of MOM SP1 Build 1300 (RTM). The MOM SP1 testing workload values for alerts, events, and performance counters were based on the results of surveys of the largest enterprise customers' workload traffic, and then inflated to represent peak load situations.
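The MOM SP1 Management Server Sizer performs the database-size arithmetic for you; the sketch below only shows the shape of such a calculation for the daily workload above. The bytes-per-row figures are illustrative assumptions, not measured MOM values — use the Sizer for real planning.

```python
# Back-of-envelope daily database growth for the small-configuration workload.
BYTES_PER_ALERT = 2048    # assumption, not a measured MOM figure
BYTES_PER_EVENT = 1024    # assumption
BYTES_PER_SAMPLE = 200    # assumption (SampledNumericData row)

def daily_growth_mb(alerts, events, perf_counters):
    total_bytes = (alerts * BYTES_PER_ALERT
                   + events * BYTES_PER_EVENT
                   + perf_counters * BYTES_PER_SAMPLE)
    return total_bytes / (1024 * 1024)

# 2,250 alerts, 100,000 events, 250,000 performance counters per day.
print(round(daily_growth_mb(2_250, 100_000, 250_000), 1))  # ~149.7 MB/day
```

With integrated grooming deleting expired rows continuously, actual database growth depends on the grooming retention settings rather than accumulating at this raw rate.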
Table 10 Data Workload Rates per Minute per Managed Computer Used for the Small Configuration
| Managed computer count | Alerts per minute per managed computer | Events per minute per managed computer | Performance counters per minute per managed computer |
|---|---|---|---|
| 20 | 0.078125 | 3.470 | 8.680 |
| 50 | 0.031200 | 1.380 | 3.472 |
| 85 | 0.018350 | 0.816 | 2.042 |
| 140 | 0.011100 | 0.496 | 1.240 |
| 200 | 0.007810 | 0.347 | 0.868 |
Note: The values in Table 10 far exceed any numbers reported by the largest enterprise customers. The values for events and performance counters used for MOM RTM testing were higher than the MOM SP1 test scenarios. This is a result of fine-tuning the Management Packs to reduce the volume of events and performance counter traffic for MOM SP1.
In the original MOM RTM testing, the average rate of alerts delivered to the database was 0.00445 per minute per computer. For the MOM SP1 workload used in testing this configuration, the average rate was 0.00781 per minute per computer at the 200-managed computer level, approximately twice the MOM RTM test workload level. At the 20-managed computer level for MOM SP1, the average rate was 0.0781 alerts per minute per computer, a workload over 17 times higher than the MOM RTM testing workload. This means that the alert workloads used for the MOM SP1 performance testing of this configuration range from 2 times to 17 times the MOM RTM testing levels.
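The multipliers quoted above can be verified directly from the stated rates:

```python
RTM_ALERT_RATE = 0.00445  # alerts/min/computer from the MOM RTM testing

for label, sp1_rate in [("200 managed computers", 0.00781),
                        ("20 managed computers", 0.078125)]:
    print(label, round(sp1_rate / RTM_ALERT_RATE, 1))
# 200 managed computers -> 1.8x the RTM rate; 20 managed computers -> 17.6x
```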
Microsoft Operations Manager/SQL Server Test Results
These tests were performed to find what size the management server should be, in terms of hardware, to perform a set level of work. Testing was started from 20 managed computers to find the upper limit. In this test series, the MOM DDCAM was monitored while managing 20 computers at the low end. These findings were used to establish a baseline for the DDCAM operation. Table 11 depicts the growth rate as more managed computers are added.
Table 11 Effects on DDCAM of Additional Managed Computers - Small Configuration
| Managed computer count | % CPU utilization | OnePoint service utilization | OnePoint working set peak | Disk reads/sec | Disk writes/sec | Memory free space | Network busy |
|---|---|---|---|---|---|---|---|
| 20 | 22.56% | 9.00% | 53,809,957 | 288.92 | 268.84 | 57.02% | 0.38% |
| 50 | 27.64% | 20.22% | 54,349,396 | 301.69 | 273.15 | 55.48% | 0.38% |
| 85 | 30.91% | 25.91% | 56,505,485 | 306.24 | 258.50 | 55.19% | 0.38% |
| 140 | 38.19% | 36.76% | 70,732,436 | 276.75 | 259.49 | 55.18% | 0.39% |
| 200 | 42.31% | 44.01% | 79,874,416 | 253.74 | 242.19 | 54.64% | 0.40% |
Figures 1 through 5 graphically present information from Table 11.
Figure 1: Adding managed computers increases CPU utilization
New for MOM SP1 – Increased Managed Computer Capacity for DDCAMs
Notice in Figure 1 that the CPU utilization on this DDCAM, which is a 4-processor 550 MHz system, varies from 22 percent for 20 managed computers to 42 percent for 200 managed computers. With newer multi-gigahertz processors, you can easily manage 200 computers with a 2-processor system.
Figure 2: Increasing I/O has a negative effect on disk performance. In this case, increasing disk queue lengths cause increased latency (see Figure 3)
For MOM SP1, there is a marked increase in read and write activity over MOM RTM, due to the additional activity caused by integrated grooming. In the original MOM RTM testing, the total peak I/O rate for 200 managed computers was 116.60 per second per computer. For more information about integrated grooming, see the "Integrated Grooming" section earlier in this paper.
Figure 3: Disk queues remain at approximately ten for MOM SP1
Figure 3 displays queue lengths of approximately ten. In MOM RTM testing, the queue lengths were less than two. This is the result of the increased I/O activity caused by integrated grooming in MOM SP1. These disk queues could be decreased considerably by adding more disk spindles to the RAID array.
Recommendation: Use the RAID Selector section of the MOM SP1 Management Server Sizer to determine adequate spindle counts based on the various workloads and RAID configurations that you might want to use. The RAID Selector section takes into account that disk queue lengths should be less than two. For more information about the Sizer, see "Appendix B: MOM SP1 Management Sizer" later in this paper.
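As a rough illustration of the idea behind such spindle-count arithmetic: a RAID 10 array services each logical write with two disk writes, so the required spindle count can be estimated from the backend I/O rate. The per-spindle IOPS figure and write penalty below are generic assumptions, not values taken from the Sizer:

```python
import math

def spindles_needed(reads_per_sec, writes_per_sec,
                    iops_per_spindle=100, write_penalty=2):
    # RAID 10: each logical write costs write_penalty physical writes.
    backend_iops = reads_per_sec + writes_per_sec * write_penalty
    return math.ceil(backend_iops / iops_per_spindle)

# Table 11, 200 managed computers: ~254 reads/sec and ~242 writes/sec.
print(spindles_needed(253.74, 242.19))  # 8
```

Under these assumptions the estimate comes out above the six-spindle array used in the test, which is consistent with the observed queue lengths of approximately ten.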
Figure 4: Free memory space is adequate at all managed computer levels
Memory usage for the MOM DDCAM, which includes DCAM and database activity, left as much as 56 percent of the 768 MB of memory free. Across all of these tests, MOM SP1 used approximately the same amount of memory.
Figure 5: Network utilization remains very low at all managed computer levels
Network utilization rose predictably from 20 managed computers to 200 managed computers, with a high point of 0.40 percent utilization. This is consistent with what has been seen throughout the series of testing, and with customer reports about network usage. This utilization factor reflects only steady-state usage and does not include Management Pack or MOM agent pushdowns.
Best Practice: Capacity/Performance Recommendation
Use the MOM SP1 Management Server Sizer to determine the appropriate system size and configurations based on the various workloads that you might want to use. The MOM SP1 Management Server Sizer automatically calculates the RAID selection and spindle count; expected network usage for various network bandwidths; DCAM server and MOM database server hardware sizes; and the database and log sizes. For more information about the MOM SP1 Management Server Sizer, see "Appendix B: MOM SP1 Management Sizer" later in this paper.
MOM SP1 Test Results II: Large Configuration - Separate Database Server and Single DCAM
The second series of tests was performed on a larger management server (DCAM), with the MOM database installed on a separate computer. The test results show the maximum number of managed computers that can be adequately controlled by this server configuration.
Hardware Test Environment - Large Configuration, Separate Database, Single DCAM
Table 12 Database System Configuration for the Large Configuration
| System component | Description |
|---|---|
| Processor count | 4 |
| Processor type | 550 MHz Pentium 3 |
| Memory | 768 MB |
| Disk count OS | 1 |
| Disk count DB | 6 (RAID 10) |
| Disk count log file | 1 |
| Disk designation OS | C drive (8.46 GB) |
| Disk designation DB | D drive (101.6 GB, 37.1 GB free space) |
| Disk designation log file | E drive (26 GB) |
| Disk I/O capacity - Reads | 750 read operations per second |
| Disk I/O capacity - Writes | 375 write operations per second |
| Network capacity | 100 Mbps (12.5 MB/s) |
| MOM Build | MOM SP1 Build 1300 (RTM) |
Table 13 DCAM System Configuration for the Large Configuration
| System component | Description |
|---|---|
| Processor count | 2 |
| Processor type | 800 MHz Pentium 3 |
| Memory | 512 MB |
| Disk count | 1 |
| Disk designation | C drive (14.6 GB, 12.5 GB free space) |
| Network capacity | 100 Mbps (12.5 MB/s) |
| MOM Build | MOM SP1 Build 1300 (RTM) |
The MOM/SQL Server Disk Requirements
Table 14 and Table 15 show the resources needed to install the DCAM and the SQL Server database. SQL Server 2000 Standard was used for these tests.
Table 14 Database Disk Requirements for Large Configuration
| Database server | |
|---|---|
| MOM disk space requirement total | |
| SQL Server working set memory (average) | |
| SQL Server database size (disk space) | |
| Database log disk space | |
| MS DTC log size disk space | |

| DCAM | |
|---|---|
| MOM disk space requirement total | |
| MOM OnePoint working set memory (average) | |
| OnePoint threads (average) | |
Database and Data Workload Sizing
Table 15 shows the number of rows in the MOM database tables prior to running each test for a specific number of managed computers (from 250 to 1,000). Tables 16, 17 and 18 show the data workloads used in the tests for this configuration.
Table 15 Pre-test Database Table Sizes for the Large Configuration
| Database table | Rows |
|---|---|
| Alert | 97,833 |
| Event | 3,303,168 |
| SampledNumericData | 2,947,496 |
Table 16 Data Workload Levels per Day - Large Configuration (for 250, 500, and 700 managed computers)
| Workload | Per day |
|---|---|
| Alerts | 9,000 |
| Events | 400,000 |
| Performance counters | 400,000 |
Note: The data workload shown in Table 16 was held constant for the 250, 500, and 700 levels of managed computers. This workload is higher than the levels reported by any of the enterprise customers during their testing of MOM SP1 Build 1300 (RTM). The MOM SP1 testing workload values for alerts, events, and performance counters were based on the results of surveys of the largest enterprise customers' workload traffic, and then inflated to represent peak load situations.
Table 17 Data Workload Levels per Day - Large Configuration (for 1,000 managed computers)
| Workload | Per day |
|---|---|
| Alerts | 12,000 |
| Events | 600,000 |
| Performance counters | 600,000 |
Table 18 Data Workload Rates per Minute per Managed Computer for the Large Configuration
| Managed computer count | Alerts per minute per managed computer | Events per minute per managed computer | Performance counters per minute per managed computer |
|---|---|---|---|
| 250 | 0.02500 | 1.111 | 1.111 |
| 500 | 0.01250 | 0.555 | 0.555 |
| 700 | 0.00893 | 0.397 | 0.397 |
| 1000 | 0.00833 | 0.417 | 0.417 |
Note: The values in Table 18 far exceed any numbers reported by the largest enterprise customers. The values for events and performance counters used for MOM RTM testing were higher than the MOM SP1 test scenarios. This is a result of fine-tuning the Management Packs to reduce the volume of events and performance counter traffic for MOM SP1.
In the original MOM RTM testing, the average rate of alerts delivered to the database was 0.00445 per minute per computer. For the MOM SP1 workload used in testing this configuration, the average rate was 0.00833 per minute per computer at the 1,000-managed computer level, approximately twice the MOM RTM test workload level. At the 250-managed computer level for MOM SP1, the average rate was 0.025 alerts per minute per computer, a workload nearly 6 times higher than the MOM RTM testing workload. This means that the alert workloads used for the MOM SP1 performance testing of this configuration range from 2 times to 6 times the MOM RTM testing levels.
Microsoft Operations Manager/SQL Server Test Results
These tests were performed to find what size the DCAM and the database server should be, in terms of hardware, to perform a set level of work. Testing was started from 250 managed computers to find the upper limit.
In this test series, the DCAM and the database server were monitored while managing 250 computers at the low end. Tables 19 and 20 depict the effect on the DCAM and the database server, respectively, as more managed computers are added.
Table 19 Effect on DCAM of Additional Managed Computers - Large Configuration
| Managed computer count | % CPU utilization | OnePoint service utilization | OnePoint working set average | Memory free space | Network busy |
|---|---|---|---|---|---|
| 250 | 36.99% | 38.91% | 164,009,537 | 81.12% | 0.07% |
| 500 | 49.30% | 48.51% | 192,642,270 | 77.52% | 0.16% |
| 700 | 50.70% | 48.57% | 192,856,224 | 76.18% | 0.19% |
| 1,000 | 62.23% | 54.65% | 194,204,590 | 70.43% | 0.23% |
Table 20 Effect on Database Server of Additional Managed Computers - Large Configuration
| Managed computer count | % CPU utilization | SQL Server service utilization | SQL Server working set peak | Disk reads/sec | Disk writes/sec | Memory free space | Network busy |
|---|---|---|---|---|---|---|---|
| 250 | 17.96% | 65.38% | 703,313,169 | 56.42 | 166.86 | 56.59% | 0.47% |
| 500 | 22.88% | 76.22% | 709,683,340 | 154.39 | 264.32 | 55.02% | 0.43% |
| 700 | 22.28% | 72.80% | 714,549,239 | 133.46 | 265.90 | 56.03% | 0.45% |
| 1,000 | 24.37% | 80.05% | 720,452,301 | 187.55 | 292.35 | 54.38% | 0.46% |
Figures 6 through 13 graphically present information about the MOM database server and DCAM from Table 19 and Table 20.
Figure 6: Adding managed computers increases CPU utilization on the database server
Even with the 550 MHz, 4-processor system that was used in these tests, the CPU utilization for 1,000 managed computers is approximately 25 percent. With the new, more powerful multi-gigahertz processors, it is expected that this utilization would be drastically reduced.
Figure 7: Adding managed computers increases CPU utilization on the DCAM
Even with the 800 MHz, 2-processor system that was used in these tests, the CPU utilization for 1,000 managed computers is approximately 61 percent. With the new, more powerful multi-gigahertz processors, it is expected that this utilization would be drastically reduced.
Figure 8: Increasing I/O has a negative effect on disk performance. In this case, increasing disk queues cause increased latency (see Figure 9)
Note: Information about I/O activity on the DCAM was not included because it was inconsequential and only reflects the operating system and MOM activity.
For MOM SP1, there is a marked increase in read and write activity over MOM RTM. In the original MOM RTM testing, the total I/O peak rate for 1,000 managed computers was 59.57/sec/computer. The increase is due to the additional activity caused by integrated grooming. For more information about integrated grooming, see the "Integrated Grooming" section earlier in this paper.
Figure 9: Disk queues increase as managed computers are added
Figure 9 displays queue lengths of up to 16. In MOM RTM testing the queue lengths were less than two. This is the result of the increased I/O activity caused by integrated grooming in MOM SP1. These disk queues could be decreased considerably by adding more disk spindles to the RAID array.
Recommendation Use the RAID Selector Section of the MOM SP1 Management Server Sizer to determine the adequate spindle counts based on the various workloads and RAID configurations that you might want to use. The RAID Selector Section of the MOM SP1 Management Server Sizer takes into account that disk queue lengths should be less than two. For more information about the Sizer, see "Appendix B: MOM SP1 Management Sizer" later in this paper.
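To illustrate the kind of spindle arithmetic the RAID Selector automates, the following sketch estimates a spindle count from the Table 20 disk rates at the 1,000-managed computer level. The 125 IOPS per-spindle capacity and the RAID 10 write penalty of 2 are illustrative assumptions, not values produced by the Sizer:

```python
import math

# Back-of-envelope spindle estimate for the MOM database array. The 125 IOPS
# per-spindle figure and the RAID 10 write penalty of 2 are illustrative
# assumptions, not Sizer output.
def spindles_needed(reads_per_sec, writes_per_sec,
                    per_spindle_iops=125, write_penalty=2):
    """Minimum spindles so the back-end I/O rate fits the array."""
    backend_iops = reads_per_sec + writes_per_sec * write_penalty
    return math.ceil(backend_iops / per_spindle_iops)

# Table 20, 1,000 managed computers: 187.55 reads/sec, 292.35 writes/sec.
print(spindles_needed(187.55, 292.35))  # prints 7
```

Under these assumptions the workload calls for seven spindles, one more than the six-disk RAID 10 array used in the tests, which is consistent with the queue lengths observed in Figure 9.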
Figure 10: Free memory space is adequate at all managed computer levels
Memory usage for the MOM database server was 55 percent free memory space with 768 MB of memory, which is consistent with the enterprise configuration test results (see the MOM SP1 Test Results III: Enterprise Configuration - Separate Database Server and Two DCAMs section later in this paper). Best practices would recommend that on a MOM deployment this large, memory would be set at a minimum of 2 GB. It is projected that at the recommended 2 GB memory size, SQL Server would run more efficiently.
Figure 11: Free memory space is adequate at all managed computer levels
Memory usage for the DCAM at the 250-managed computer count was 80 percent free memory space with 512 MB of memory. As expected, and consistent with the findings overall, at the 1,000-managed computer level, free space was 10 percent less, at approximately 70 percent. Best practices would recommend that on a MOM deployment this large, memory be set at a minimum of 1 GB. It is projected that at the recommended memory size, the DCAM would run more efficiently.
Figure 12: Network utilization remains very low at all managed computer levels
As expected, and consistent with MOM RTM testing, network utilization is at a minimum for all levels. As Figure 12 demonstrates, MOM SP1 does not overburden the network. In further tests, the highest utilization seen was 9 percent, which occurred during an agent pushdown; agent pushdowns result in much higher network utilization than routine monitoring.
Figure 13: Network utilization remains very low at all managed computer levels
As in the case of the database server, the network utilization from the managed computers to the DCAM rose consistently from 0.10 percent at the 250-managed computer level to 0.25 percent at the 1,000-managed computer level, as Figure 13 demonstrates. As noted throughout this paper, the workloads were consistently higher than any reported by customer surveys.
Best Practice: Capacity/Performance Recommendation
Use the MOM SP1 Management Server Sizer to determine the appropriate system size and configurations based on the various workloads that you might want to use. The MOM SP1 Management Server Sizer automatically calculates the RAID selection and spindle count; expected network usage for various network bandwidths; DCAM server and MOM database server hardware sizes; and the database and log sizes. For more information about the MOM SP1 Management Server Sizer, see "Appendix B: MOM SP1 Management Sizer" later in this paper.
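For readers without the Sizer at hand, the database-size arithmetic it automates can be sketched as follows, using the daily workload from Table 25 at the 1,000-managed computer level. The average row sizes and the retention period below are hypothetical placeholders, not measured MOM values; the Sizer remains the authoritative tool:

```python
# Rough sketch of database growth using the Table 25 daily workload for
# 1,000 managed computers. Row sizes and retention are assumed values.
DAILY_ROWS = {"alerts": 12_000, "events": 600_000, "perf_counters": 600_000}
AVG_ROW_BYTES = {"alerts": 2_000, "events": 1_500, "perf_counters": 100}  # assumed
RETENTION_DAYS = 30                                                        # assumed

daily_bytes = sum(DAILY_ROWS[k] * AVG_ROW_BYTES[k] for k in DAILY_ROWS)
total_bytes = daily_bytes * RETENTION_DAYS
print(f"~{total_bytes / 2**30:.1f} GB of table data after {RETENTION_DAYS} days")
```

The point of the sketch is the shape of the calculation (rows per day × bytes per row × retention), not the particular numbers, which vary with the Management Packs deployed and the grooming schedule.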
MOM SP1 Test Results III: Enterprise Configuration - Separate Database Server and Two DCAMs
The third series of tests was performed using two large management servers (DCAMs), with the MOM database installed on a separate computer. The test results show the maximum number of managed computers that can be adequately controlled by this server configuration.
Hardware Test Environment - Enterprise Configuration, Separate Database, Two DCAMs
Table 21 Database System Configuration for the Enterprise Configuration
| System component | Description |
|---|---|
| Processor count | 4 |
| Processor type | 550 MHz Pentium 3 |
| Memory | 768 MB |
| Disk count OS | 1 |
| Disk count DB | 6 (RAID 10) |
| Disk count log file | 1 |
| Disk designation OS | C drive (8.46 GB) |
| Disk designation DB | D drive (101.6 GB, 37.1 GB free space) |
| Disk designation log file | E drive (26 GB) |
| Disk I/O capacity - Reads | 750 read operations per second |
| Disk I/O capacity - Writes | 375 write operations per second |
| Network capacity | 100 Mbps (12.5 MB) |
| MOM Build | MOM SP1 Build 1300 (RTM) |
Table 22 DCAM System Configuration for the Enterprise Configuration
| System component | Description |
|---|---|
| Processor count | 2 |
| Processor type | 800 MHz Pentium 3 |
| Memory | 512 MB |
| Disk count | 1 |
| Disk designation | C drive (14.6 GB, 12.5 GB free space) |
| Network capacity | 100 Mbps (12.5 MB) |
| MOM Build | MOM SP1 Build 1300 (RTM) |
The MOM/SQL Server Disk Requirements
Table 23 shows the resources needed to install the SQL Server database. SQL Server 2000 Standard was used for these tests.
Table 23 Disk Requirements for Enterprise Configuration - Separate Database, Two DCAMs
Database server:
MOM disk space requirement total
SQL Server working set memory (average)
SQL Server database size (disk space)
Database log disk space
MS DTC log size disk space
Each DCAM:
MOM disk space requirement total
MOM OnePoint working set memory (average)
OnePoint threads (average)
Database and Data Workload Sizing
Table 24 shows the number of rows in the MOM database tables prior to running each test for a specific number of managed computers (from 700 to 1,000). Table 25 and Table 26 show the data workloads used in the tests for this configuration.
Table 24 Pre-test Database Table Sizes for the Enterprise Configuration
| Table | Rows |
|---|---|
| Alert | 97,833 |
| Event | 3,303,168 |
| SampledNumericData | 2,947,496 |
Table 25 Data Workload Levels per Day for the Enterprise Configuration
| Workload item | For 700 managed computers | For 1,000 managed computers |
|---|---|---|
| Alerts per day | 9,000 | 12,000 |
| Events per day | 400,000 | 600,000 |
| Performance counters per day | 400,000 | 600,000 |
Note: The data workload shown in Table 25 represents higher workloads than the levels reported by any enterprise customers during their testing of MOM SP1 Build 1300 (RTM). The MOM SP1 testing workload values for alerts, events, and performance counters were based on the results of surveys taken from the largest enterprise customers for workload traffic, and then inflated to represent peak load situations.
Table 26 Data Workload Rates per Minute per Managed Computer for the Enterprise Configuration
| Managed computer count | Alerts per minute per managed computer | Events per minute per managed computer | Performance counters per minute per managed computer |
|---|---|---|---|
| 700 | 0.00893 | 0.397 | 0.397 |
| 1,000 | 0.00833 | 0.417 | 0.417 |
Note: The values in Table 26 far exceed any numbers reported by the largest enterprise customers. The values for events and performance counters used in MOM RTM testing were higher than those used in the MOM SP1 test scenarios; this is a result of fine-tuning the Management Packs to reduce the volume of event and performance counter traffic for MOM SP1.
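The per-minute rates in Table 26 follow directly from the daily totals in Table 25: each daily count is divided by the number of managed computers and by the 1,440 minutes in a day. A small check of that conversion:

```python
# Deriving the Table 26 rates from the Table 25 daily totals:
# rate = daily count / managed computers / 1,440 minutes per day.
MINUTES_PER_DAY = 1_440

def per_minute_per_computer(daily_count, computers):
    return daily_count / computers / MINUTES_PER_DAY

print(round(per_minute_per_computer(9_000, 700), 5))     # 0.00893 alerts
print(round(per_minute_per_computer(400_000, 700), 3))   # 0.397 events
print(round(per_minute_per_computer(12_000, 1_000), 5))  # 0.00833 alerts
print(round(per_minute_per_computer(600_000, 1_000), 3)) # 0.417 events
```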
In the original MOM RTM testing, alerts were delivered to the database at an average rate of 0.00445 per minute per computer. For the MOM SP1 workload used in testing this configuration, the average rate was 0.00833 alerts per minute per computer at the 1,000-managed computer level and 0.00893 at the 700-managed computer level. Both are approximately twice the MOM RTM test workload levels, which means that the alert workloads used for the MOM SP1 performance testing of this configuration are twice as high as the MOM RTM testing levels.
Microsoft Operations Manager/SQL Server Test Results
These tests were performed to determine what size, in terms of hardware, the DCAM and the database server should be to perform a set level of work. Testing started at 700 managed computers and scaled up to find the upper limit.
In this series of tests, two DCAMs and the database server were monitored while managing 700 computers at the low end and 1,000 computers at the high end. The first test was conducted with 200 managed computers on one DCAM and 500 managed computers on the other, for a total of 700. The second test was conducted with 500 managed computers on each DCAM, for a total of 1,000. Table 27 depicts the effects on the two DCAMs for these test scenarios. Table 28 depicts the effect on the database server for the two test scenarios.
Table 27 Effect on the DCAMs of Additional Managed Computers - Enterprise Configuration
| DCAM | Managed computer count | % CPU utilization | OnePoint service utilization | OnePoint working set average | Memory free space | Network busy |
|---|---|---|---|---|---|---|
| A | 200/700 | 34.76% | 38.22% | 141,628,304 | 80.70% | 0.26% |
| B | 500/700 | 49.32% | 53.20% | 189,064,637 | 77.22% | 0.26% |
| C | 500/1,000 | 49.45% | 49.98% | 193,576,489 | 76.61% | 0.25% |
| D | 500/1,000 | 50.57% | 51.86% | 194,378,502 | 76.57% | 0.25% |
Note: Table 27 reflects the usage of four different DCAMs. In one test case, DCAM A managed 200 out of the 700 computers and DCAM B managed 500 of the 700 computers. In the second test case, DCAM C managed 500 of the 1,000 computers, and DCAM D managed 500 of 1,000 computers.
Table 28 Effect on Database Server of Additional Managed Computers - Enterprise Configuration
| Managed computer count | % CPU utilization | SQL Server service utilization | SQL Server working set peak | Disk reads/sec | Disk writes/sec | Memory free space | Network busy |
|---|---|---|---|---|---|---|---|
| 700 | 19.76% | 69.67% | 711,511,735 | 94.91 | 204.91 | 56.19% | 0.46% |
| 1,000 | 24.50% | 76.10% | 743,612,153 | 190.44 | 218.82 | 54.35% | 0.45% |
Figures 14 through 21 graphically present information about the DCAMs and the MOM database server from Table 27 and Table 28.
Figure 14: As expected, adding managed computers increases CPU utilization on the database server
Even with the 550 MHz processors used in these tests, the CPU utilization for 1,000 managed computers is approximately 25 percent. It is projected that the utilization for 2,000 managed computers would be less than 50 percent on a 768 MHz processor. With the new, more powerful multi-gigahertz processors, we expect that this utilization would be drastically reduced.
Figure 15: Adding managed computers increases CPU utilization on the DCAMs
Figure 15 reflects the usage of four different DCAMs. For more information, see Table 27. Notice that DCAM B, DCAM C, and DCAM D, which were all managing 500 computers, had almost identical CPU utilization factors. These tests reflect the consistency of MOM DCAMs. Also, note that the utilization for DCAM B, DCAM C, and DCAM D was at 50 percent on an 800 MHz computer, which is well below the 75 percent level, leaving 25 percent reserve capacity. It is expected that on the new, more powerful multi-gigahertz processors, the CPU utilization would be drastically reduced.
Figure 16: Increasing I/O affects disk performance
Note: Information about I/O activity on the DCAM was not included because it was inconsequential and only reflects the operating system and MOM activity.
For MOM SP1, there is a marked increase in read and write activity over MOM RTM. In the original MOM RTM testing, the total peak I/O rate for 1,000 managed computers was 59.57/sec/computer. The increase is due to the additional activity caused by integrated grooming. For more information about integrated grooming, see the "Integrated Grooming" section earlier in this paper.
Figure 17: Disk queues increase as managed computers are added
Figure 17 displays queue lengths of up to 15 on the database server. The queue length on the DCAMs was zero for all test cases, so no figure is shown. In MOM RTM testing the queue lengths were less than two. This is the result of the increased I/O activity caused by integrated grooming in MOM SP1. These disk queues could be decreased considerably by adding more disk spindles to the RAID array.
Recommendation Use the RAID Selector Section of the MOM SP1 Management Server Sizer to determine the adequate spindle counts based on the various workloads and RAID configurations that you might want to use. The RAID Selector Section of the MOM SP1 Management Server Sizer takes into account that disk queue lengths should be less than two. For more information about the Sizer, see "Appendix B: MOM SP1 Management Sizer" later in this paper.
Figure 18: Free memory space is adequate at all managed computer levels
Memory usage for the MOM database server was 60 percent free memory space with 768 MB of memory. Best practices would recommend that on a MOM deployment this large, memory would be set at a minimum of 2 GB. It is projected that at the recommended 2 GB memory size, SQL Server would run more efficiently.
Figure 19: Free memory space is adequate at all managed computer levels
Memory usage for the MOM DCAM was 80 percent free memory space with 512 MB of memory for all tests. This demonstrates efficient use of memory by MOM SP1. Best practices would recommend that on a MOM deployment this large, memory would be set at a minimum of 1 GB. It is projected that at the recommended 1 GB memory size, the DCAM would run more efficiently.
Figure 20: Network utilization remains very low at all managed computer levels
As expected, and consistent with MOM RTM testing, network utilization is at a minimum for all managed computer levels. As Figure 20 demonstrates, MOM SP1 does not overburden the network. In further tests, the highest utilization seen was 9 percent, which occurred during an agent pushdown; agent pushdowns result in much higher network utilization than routine monitoring.
Figure 21: Network utilization remains very low at all managed computer levels
As in the case of the database server, the network utilization from the managed computers to the DCAMs was consistently around 0.25 percent. As Figure 21 demonstrates, the usage for 200 managed computers up to 500 managed computers was about the same, because the workload for all managed computer levels was the same. As noted throughout this paper, the workloads were consistently higher than any reported by customer surveys.
Best Practice: Capacity/Performance Recommendation
Use the MOM SP1 Management Server Sizer to determine the appropriate system size and configurations based on the various workloads that you might want to use. The MOM SP1 Management Server Sizer automatically calculates the RAID selection and spindle count; expected network usage for various network bandwidths; DCAM server and MOM database server hardware sizes; and the database and log sizes. For more information about the MOM SP1 Management Server Sizer, see "Appendix B: MOM SP1 Management Sizer" later in this paper.
MOM SP1 Management Packs
These tests are designed to measure the memory usage (footprint) of the MOM SP1 Management Packs both individually and cumulatively (build-up) as they are added to a managed computer.
Test Parameters
Hardware Test Environment
Table 29 describes the system configuration for the computers used in this test. The same configuration was used for the MOM SP1 server and the three managed computers.
Table 29 Computer Configuration (MOM SP1 Server and Managed Computers)
| System component | Description |
|---|---|
| Processor count | 1 |
| Processor type | 600 MHz Pentium 3 |
| Memory | 256 MB |
| Disk count | 1 |
| Disk designation OS | Drive C |
| Disk size | 12.76 GB |
| Operating system | Windows 2000 Server SP3 |
Software Test Environment
The software and versions used for these tests are listed in Table 30:
Table 30 Products and Versions Used for Tests

| Product name | Version or build tested |
|---|---|
| Windows 2000 Server | Service Pack 3 |
| SQL Server 2000 | RTM + Service Pack 3 |
| MOM 2000 | Service Pack 1 |
| MOM 2000 Application Management Pack | Service Pack 1 |
MOM SP1 Configuration
MOM SP1 was configured as a single configuration group, with three managed computers and with all MOM components installed on a single server.
The software that the test team installed on the MOM SP1 server is as follows:
MOM SP1
MOM SP1 Application Management Pack
SQL Server 2000 SP3
Internet Information Server
Terminal Services
Anti-virus software (eTrust)
After installing MOM SP1, the test team created three performance processing rules to capture performance data from the managed computers. Details of these custom performance processing rules are listed in Table 31. After creating the performance processing rules, the test team created three Public views to chart this information.
Table 31 Custom Performance Processing Rules
| Rule name | Provider |
|---|---|
| Performance-Private Bytes-OnePointService Agent | Process-Private Bytes-OnePointService-10-minutes |
| Performance-% Processor Time-OnePointService Agent | Process-% Processor Time-OnePointService-10-minutes |
| Performance-Working Set-OnePointService Agent | Process-Working Set-OnePointService-10-minutes |
Managed Computers Configuration
The managed computers were a basic Windows 2000 Server configuration. The only additional service or product installed on the managed computers was the eTrust anti-virus software.
Agent Installation and Configuration Process
The test team installed each agent by adding the computer name to the Agent Manager, and then approving the installation of the agent to the managed computers. The installation of the agent was verified by viewing the All Agents view on the MOM SP1 server and by checking for the OnePointService process on each managed computer.
After installing each agent, the custom performance processing rules were enabled for graphing by selecting each computer in the Recent Performance view and enabling the counters for graphing.
For each test case, the managed computers were placed in several default computer groups. The computer groups that were common to each test variation are listed in Table 32.
Table 32 Common Computer Groups
Hardware Attributes – Number of Processors
Hardware Attributes – CPU Vendor
Hardware Attributes – CPU speed
Hardware Attributes – CPU Identifier
Hardware Attributes – BIOS Version
Hardware Attributes – BIOS Date
Microsoft Operations Manager Agents
When adding Management Packs to the agents, the managed computers were explicitly added to the computer groups for each Management Pack. This ensured that the Management Pack was deployed to the managed computer.
Test Cases
Management Pack Memory Build-up Tests
This series of tests is designed to measure the cumulative agent memory footprint as Management Packs are added to a managed computer. For each test case, the Management Packs were added to the agent computers in the order listed. Performance metrics were collected on the OnePointService process by using the custom performance processing rules listed in Table 31 earlier in this paper.
Test Case 1: Windows Management Pack
The managed computer was placed in the following groups:
Common groups listed in Table 32, earlier in this paper
Windows NT & 2000 RRAS & RAS Non-Authorized Computers
Windows 2000 Servers
Windows 2000 License Logging Service
Windows 2000 Dr. Watson
Windows 2000 Any Computer
Service Pack Version
Test Case 2: Add MOM SP1 Management Pack
The managed computer was placed in the following groups:
All groups in Test Case 1
Microsoft Operations Manager Database
Microsoft Operations Manager Data Access Server
Microsoft Operations Manager Consolidator
Test Case 3: Add Active Directory Management Pack
The managed computer was placed in the following groups:
All groups in Test Cases 1 and 2
Windows 2000 Domain Controllers
Active Directory Trust Monitoring
Active Directory Replication Latency Data Collection
Active Directory Client Side Monitoring
Test Case 4: Add Exchange Server Management Pack
The managed computer was placed in the following groups:
All groups in Test Cases 1, 2, and 3
Microsoft Active Directory Connector
Microsoft Exchange Server 2000
Microsoft Exchange Instant Messaging Server
Test Case 5: Add SQL Server Management Pack
The managed computer was placed in the following groups:
All groups in Test Cases 1, 2, 3, and 4
Microsoft SQL Server 2000
Management Pack Memory Footprint Tests
This series of tests is designed to measure the agent memory footprint of individual Management Packs. Each Management Pack listed was added individually to a managed computer. Performance metrics were collected on the OnePointService service by using the custom performance processing rules listed in Table 31 earlier in this paper.
Because the managed computers were all running Windows 2000 Server, the processing rule was modified to keep the Windows Management Pack from automatically being installed on each managed computer.
Test Case 6: Base MOM Agent Only
The managed computer was placed in the following groups:
- Common groups listed in Table 32, earlier in this paper
Test Case 7: Windows Management Pack
The managed computer was placed in the following groups:
Common groups listed in Table 32, earlier in this paper
Windows NT & 2000 RRAS & RAS Non-Authorized Computers
Windows 2000 Servers
Windows 2000 License Logging Service
Windows 2000 Dr. Watson
Windows 2000 Any Computer
Service Pack Version
Test Case 8: MOM SP1 Management Pack
The managed computer was placed in the following groups:
Common groups listed in Table 32, earlier in this paper
Microsoft Operations Manager Database
Microsoft Operations Manager Data Access Server
Microsoft Operations Manager Consolidator
Test Case 9: Active Directory Management Pack
The managed computer was placed in the following groups:
Common groups listed in Table 32, earlier in this paper
Windows 2000 Domain Controllers
Active Directory Trust Monitoring
Active Directory Replication Latency Data Collection
Active Directory Client Side Monitoring
Test Case 10: Exchange Server Management Pack
The managed computer was placed in the following groups:
Common groups listed in Table 32, earlier in this paper
Microsoft Active Directory Connector
Microsoft Exchange Server 2000
Microsoft Exchange Instant Messaging Server
Test Case 11: SQL Server Management Pack
The managed computer was placed in the following groups:
Common groups listed in Table 32, earlier in this paper
Microsoft SQL Server 2000
Test Results
To achieve consistent results for the Management Pack memory footprint test cases, data was sampled for a 2-hour period, starting 15 minutes after the agent was installed on each managed computer. Waiting 15 minutes after the agent installation provides sufficient time for the agent to initiate communication with the MOM SP1 server, to properly identify the computer groups that the managed computer is included in, and for the Management Pack to be installed on the managed computer.
In addition, for the Management Pack memory build-up test cases, a 15-minute wait was observed after the addition of each Management Pack to the managed computer. An average was taken over the 2-hour period after the waiting period, and the results are reported in Tables 33 and 34.
Table 33 Management Pack Memory Build-Up Test Results
| Test case number | Test case title | CPU utilization | Cumulative working set (bytes) |
|---|---|---|---|
| 1 | Windows Management Pack | 0.040% | 18,497,536 |
| 2 | Add MOM SP1 Management Pack | 0.055% | 19,935,232 |
| 3 | Add Active Directory Management Pack | 0.243% | 28,618,752 |
| 4 | Add Exchange Management Pack | 0.427% | 41,435,136 |
| 5 | Add SQL Server Management Pack | 0.483% | 43,552,768 |
Note: Table 33 displays the cumulative memory usage as the Management Packs are added to the managed computer.
Table 34 Management Pack Individual Memory Footprint Test Results
| Test case number | Test case title | CPU utilization | Individual working set (bytes) |
|---|---|---|---|
| 6 | Base MOM Agent Only | 0.004% | 12,193,792 |
| 7 | Windows Management Pack | 0.061% | 6,303,744 |
| 8 | MOM SP1 Management Pack | 0.020% | 1,761,280 |
| 9 | Active Directory Management Pack | 0.121% | 7,553,024 |
| 10 | Exchange Management Pack | 0.111% | 13,300,000 |
| 11 | SQL Server Management Pack | 0.030% | 10,366,976 |
Note: Table 34 displays the net memory usage of each Management Pack when added individually to a managed computer.
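The incremental cost of each pack in the build-up series can be derived by differencing the cumulative working-set values in Table 33. The increments are broadly comparable to, though not identical with, the individual footprints in Table 34, since the packs share common agent infrastructure:

```python
# Incremental working-set cost of each Management Pack in the build-up
# series, obtained by differencing the cumulative values in Table 33.
cumulative = [
    ("Windows", 18_497_536),
    ("MOM SP1", 19_935_232),
    ("Active Directory", 28_618_752),
    ("Exchange", 41_435_136),
    ("SQL Server", 43_552_768),
]

previous = 0
for pack, working_set in cumulative:
    print(f"{pack}: +{working_set - previous:,} bytes")
    previous = working_set
```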
Appendix A: Test Results For Microsoft Operations Manager 2000 RTM
The MOM test team conducted tests in June 2001 using Microsoft Operations Manager 2000 (MOM 2000). The processing of MOM data is handled completely differently with MOM SP1; therefore, it is not possible to make direct comparisons of the test results between MOM 2000 and MOM SP1.
Note: For the tests of MOM 2000, all MOM components were installed on a single computer, which is referred to throughout this appendix as the management server.
For MOM 2000, these tests were conducted using simulators that imitate heartbeat activity. These tests indicate that a management server supporting 700 managed computers met the criteria for managing that many computers and delivering alerts to the database within the two-minute Service Level Agreement. Although the computer systems sized in this report can process MOM events, alerts, and performance counters within the two-minute Service Level Agreement, this should not be construed as a best practice for MOM usage; rather, each represents a minimum-size configuration that has been tested and is known to handle the managed computer count. A best-practice recommendation is offered for each of the management servers depicted in this report.
Test Parameters
This section presents the key factors for the MOM 2000 performance and sizing testing — the hardware used, the scope and goals for the testing, the tools used, and how the test workload was calculated. Later sections present the results of testing.
Hardware Test Environment
For MOM 2000, the test hardware used to conduct this performance study consisted of management servers and client computers. The client computers simulated the activity level of managed computers. Six client computers could simulate up to 2,000 managed computers. Each of the management servers had multiple processor support, although multiple processors were not used in all the testing. The exact configuration of each management server is disclosed in the description section of each test series.
The network had a line capacity of 100 Mbps, which represents the highest available bandwidth for most enterprise environments; testing over higher-capacity lines would not represent a large part of the user community. The hardware was set up in a single-tier configuration. Multitiered configurations were not tested during this phase. The configuration of the client server used to simulate the managed computers is as follows:
Table A-1 Managed Computer Configuration
| Server name | MOMTEST3 – MOMTEST8 |
|---|---|
| Processor count | 1 |
| Processor type | 550 MHz - 733 MHz Pentium 3 |
| Memory | 256 MB |
| Disk count | 1 |
| Disk designation OS | Drive C |
| Disk size | 9.1 GB |
| Disk I/O capacity | 70 read/write operations per second |
| Network capacity | 100 Mbps (12.5 MB) |
| Manufacturer | Compaq |
| Model | Deskpro |
Scope of Testing
The scope of testing covered how well MOM scales, which management server configuration best manages the computers, and the maximum number of computers a single management server can manage. The findings include any bottlenecks that were discovered and recommendations for removing them. The builds of MOM 2000 used include MOM 0003, 0005, 00012.1, 00012.2, and Beta version 00012.5.
Data for all tests performed for MOM 2000 were collected under the following conditions:
The Event Simulator program generated the workload activity for all tests.
The Managed Computer Heartbeat Activity parameter was not tested for scalability because, for many systems, a maximum of 10 clients was used to simulate activity. Heartbeat activity can cause additional network, CPU, and disk usage.
Limited testing was performed for reporting. Database grooming studies also are included in this report.
Goals of Testing
The goal of testing was to determine if MOM meets the high standards of performance that our customers have come to expect from Microsoft. We tested each MOM component for how well it uses resources on the management server, and if the component runs within an acceptable range for defined parameters. These parameters are:
System resource usage
MOM agent usage on managed computers
One of the most important goals of this testing was to determine the correct size of the management server, in terms of hardware resources, needed to manage a set number of managed computers. Scalability was tested to establish the maximum number of managed computers per management server.
Tools for Testing
The software tools used for the testing include the following:
Performance Monitor. Performance Monitor was used to record performance statistics for MOM. This is essential for testing functionality and efficiency. Performance Monitor can detect bottlenecks in any of the core systems such as CPU, disk, memory, and network. Using the performance counters, we can analyze exactly what the computer system is doing, and what level of resources the computer system is using.
Event Simulator. To test the environment accurately from a performance perspective, we needed to create multiple events that strained the Consolidator and the MOM database components. We used Event Simulator, a tool developed by NetIQ, to perform this function. Event Simulator can simulate an event and alert level that is many times the actual quantity of managed computers in our test environment.
Note: Event Simulator is a development test tool, which is not intended for general use.
Test Workload and Calculations
For the MOM 2000 testing, the test workload was determined through a study of the Microsoft Information Technologies Group (ITG) MOM database. A copy of the ITG MOM database that had been in operation for several months was examined to determine the exact number of events and alerts that were generated during peak periods of operation. In addition, the number of performance counters that were collected for each computer also was recorded. These values were then used to simulate actual network line and database usage during the testing.
Test workload information for MOM 2000:
Number of computers that were managed = 350
Time range of test = 67 hours
Suppressed alerts = 1,513
Unsuppressed alerts = 6,273
Events = 9,253,409
Performance counters = approximately 100-130 per computer, per 900 seconds
This results in the following peak numbers:
Suppressed alerts = 0.00108 per computer, per minute
Unsuppressed alerts = 0.00445 per computer, per minute
Events = 6.57 per computer, per minute
Performance counters = 0.144 per computer, per second
Examples of Calculations
Based on the test workload figures from the previous section, simulating 100 managed computers required the following calculations:
Suppressed alerts = 0.00108 × 100 computers = 0.108 per minute; × 10 minutes = 1.08 alerts
This can be rounded off and expressed as one suppressed alert every 10 minutes.
Unsuppressed alerts = 0.00445 × 100 computers = 0.445 per minute; × 2 minutes = 0.89 alerts
This can be rounded off and expressed as one unsuppressed alert every two minutes.
Events = 6.57 × 100 computers = 657 events per minute
Counters = 0.144 × 100 computers = 14.4 counters per second
For MOM 2000, the Event Simulator was set up to generate alerts and events information according to the previous calculations. Setting performance rules within MOM to collect and write these values into the MOM database simulated the network line and database activity of 100 managed computers collecting counters in the previous example.
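The scaling arithmetic above generalizes to any number of managed computers. A minimal sketch, using the per-computer peak rates from the ITG study (the function name is ours, for illustration only):

```python
# Per-computer peak rates measured in the ITG workload study above.
SUPPRESSED_ALERTS_PER_MIN = 0.00108
UNSUPPRESSED_ALERTS_PER_MIN = 0.00445
EVENTS_PER_MIN = 6.57
COUNTERS_PER_SEC = 0.144

def simulated_workload(computers: int) -> dict:
    """Rates to feed the Event Simulator for a given managed-computer count."""
    return {
        "suppressed_alerts_per_10_min": SUPPRESSED_ALERTS_PER_MIN * computers * 10,
        "unsuppressed_alerts_per_2_min": UNSUPPRESSED_ALERTS_PER_MIN * computers * 2,
        "events_per_min": EVENTS_PER_MIN * computers,
        "counters_per_sec": COUNTERS_PER_SEC * computers,
    }

# Reproduces the worked example: 100 computers yield 657 events per minute,
# 14.4 counters per second, and roughly one suppressed alert per 10 minutes.
print(simulated_workload(100))
```

The same function gives the expected workloads quoted later for the 85-, 250-, and 1200-computer configurations.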
Note: Collecting performance counters at too high a rate can cause high CPU utilization. In these tests, large numbers of counters were collected at very short collection intervals to simulate high numbers of managed computers and to measure the activity they cause. In production, such short intervals can cause adverse effects such as very high CPU utilization for long periods of time, and database latency in excess of 20 minutes.
Performance Monitor Counter Metrics
Table A-2 lists the primary performance counters that were collected and used for this analysis. For a complete list and description of the counter functions, see “Appendix D: Counter Definitions” later in this paper.
Table A-2 Primary Counters Used in Testing
| Counter object | Counter property | Instances |
|---|---|---|
| Processor | % Processor Time average | Total |
| | % Processor Time peak | |
| | Interrupts/sec | |
| Process | % Processor Time | OnePoint process |
| | Working Set | SQL Server processes |
| | Thread Count | |
| | IO Read Operations/sec | |
| | IO Write Operations/sec | |
| Memory | Available Bytes | Total |
| | Page Faults/sec | |
| | % Committed Bytes In Use | |
| Network Interface | Bytes Total/sec | 100 Mbps network adapter card |
| | Current Bandwidth | |
| Physical Disk | Disk Reads/sec | Drive C |
| | Disk Writes/sec | Database disk drives |
| | Avg. Disk Queue Length | |
| System | Processor Queue Length | Total |
| SQL Server:Databases | Transactions/sec | OnePoint database |
| SQL Server:Buffer Manager | Buffer Cache Hit Ratio | OnePoint database |

| Calculated counters | Counter property | Calculations |
|---|---|---|
| % Network Busy | Bytes Total/sec / Current Bandwidth Bytes | Bytes Total/sec = Bytes Sent/sec + Bytes Received/sec; Current Bandwidth Bytes = Current Bandwidth / 8 |
| Memory Free Space | Available KBytes / Total Physical Memory | |
Note: Table A-2 establishes the core performance counter collection metrics. Other counters might be used for further analysis. The Physical Disk, % Disk Time counter was not used because it gives false readings on Redundant Array of Independent Disks (RAID) arrays. All database disk arrays used for these tests were RAID 5.
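The two calculated counters in Table A-2 can be expressed directly. A sketch with illustrative sample values (the function names are ours, not Performance Monitor's):

```python
def pct_network_busy(bytes_sent_per_sec: float,
                     bytes_received_per_sec: float,
                     current_bandwidth_bits: float) -> float:
    """% Network Busy = Bytes Total/sec / Current Bandwidth Bytes."""
    bytes_total_per_sec = bytes_sent_per_sec + bytes_received_per_sec
    current_bandwidth_bytes = current_bandwidth_bits / 8  # counter reports bits/sec
    return 100.0 * bytes_total_per_sec / current_bandwidth_bytes

def memory_free_space(available_kbytes: float,
                      total_physical_memory_kbytes: float) -> float:
    """Memory Free Space = Available KBytes / Total Physical Memory."""
    return 100.0 * available_kbytes / total_physical_memory_kbytes

# A 100 Mbps adapter moving 174,000 bytes/sec in total is under 1.5% busy,
# in line with the steady-state network figures reported in the tests below.
print(pct_network_busy(120_000, 54_000, 100_000_000))
```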
Test Results for MOM 2000
The following sections show test results for a range of managed computers with different sized server configurations.
Test Results I: MOM 2000/SQL Server Typical Install, Small Management Server, 20 to 85 Managed Computers
This series of tests was performed on a small management server using SQL Server 2000 Standard, with a 5 GB database and a 1 GB log file. The database was loaded on a three-disk array set to RAID 5. RAID 5 was selected because it is the least expensive form of disk fault tolerance, although it generates the most I/Os per second to provide that fault tolerance. This system is meant to monitor up to 85 managed computers. This section shows the capacity of the system, and makes recommendations for improving the system as tested where applicable. The test results show the asymptotic capacity bounds of managed computers that can be adequately controlled by this server.
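The disk figures in this paper follow a rule of thumb of roughly 70 read/write operations per second per spindle (3 disks give 210, 6 give 420, and 8 give 560 in the configuration tables). The sketch below also applies the conventional RAID 5 write penalty of four physical I/Os per logical write; that penalty is a standard assumption of ours, not a figure stated in the paper:

```python
IOPS_PER_DISK = 70          # per-spindle rule of thumb used throughout this paper
RAID5_WRITE_PENALTY = 4     # read data + read parity + write data + write parity

def raw_array_iops(disk_count: int) -> int:
    """Total physical operations/sec the array can absorb."""
    return disk_count * IOPS_PER_DISK

def effective_iops(disk_count: int, write_fraction: float) -> float:
    """Logical IOPS after the RAID 5 write penalty, for a given read/write mix."""
    cost_per_logical_io = (1.0 - write_fraction) + write_fraction * RAID5_WRITE_PENALTY
    return raw_array_iops(disk_count) / cost_per_logical_io

print(raw_array_iops(3))       # 210, the small server's DB I/O capacity
print(effective_iops(3, 0.5))  # logical IOPS at a 50/50 read/write mix
```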
Management System Configuration
Table A-3 lists the system components used for this series of tests.
Table A-3 System Configuration for Small Server Test
| Item | Value |
|---|---|
| Processor count | 1 |
| Processor type | 733 MHz Pentium 3 |
| Memory | 512 MB |
| Disk count OS | 1×9.1 GB |
| Disk count DB | 3×9.1 GB |
| Disk designation OS | C drive (9.1 GB) |
| Disk designation DB | D drive (21.3 GB) |
| Database size (+ log) | 6 GB |
| DB I/O capacity | 210 read/write operations per second |
| Network capacity | 100 Mbps (12.5 MB) |
| Manufacturer | Compaq |
| Model | 350 ML |
| MOM build | 0003, 0005, 00012.1, 00012.2 |
The MOM 2000/SQL Server Footprint
The following statistics show the actual resources needed to install the management server along with the SQL Server database. SQL Server 2000 Standard was used for these tests. Table A-4 lists the disk sizes that are required to install these components.
Table A-4 System Resources for Small Management Server Test
| Resource | Value |
|---|---|
| MOM disk space requirement total | 230 MB |
| MOM OnePoint working set average memory | 76.85 MB-97.98 MB |
| OnePoint threads | 71 |
| SQL Server working set memory | 64.3 MB-123.52 MB |
| SQL Server database size disk space | 5 GB |
| Database log disk space | 1 GB |
| MS DTC log size disk space | 512 MB |
MOM 2000/SQL Server Test Results
These tests were performed to determine the optimum size of the management server needed to perform this task. We started the testing from 20 managed computers (as covered in test series 1) to find the upper limit. For this test, the database is placed on a three-disk volume designated as the D drive.
Establishing the Baseline of the Small Management Server
In this test series, the management server was monitored while managing 20 computers at the low end. These findings are used to establish a baseline for its operation. Table A-5 depicts the growth rate as more managed computers were added.
Table A-5 Effect on Server of Additional Managed Computers
| Managed computer count | % CPU utilization | OnePoint service | OnePoint working set peak | Disk reads/sec | Disk writes/sec | Memory free space | Network busy |
|---|---|---|---|---|---|---|---|
| 20 | 21.96% | 19.98% | 76,852,765 | 8.98 | 11.45 | 9.57% | 0.635% |
| 30 | 31.78% | 22.83% | 97,987,467 | 11.31 | 14.89 | 8.342% | 0.756% |
| 50 | 42.76% | 29.45% | 97,987,467 | 14.58 | 21.56 | 7.99% | 0.963% |
| 75 | 46.97% | 36.52% | 97,987,467 | 17.94 | 26.45 | 7.560% | 1.041% |
| 85 | 53.74% | 43.81% | 97,987,467 | 20.71 | 29.48 | 7.89% | 1.39% |
Table A-5 shows a linear trend. Figures A-1 through A-5 depict this linear incremental growth by managed computer for CPU utilization, I/O, disk queuing, memory free space, and network usage.
Figure A-1 Adding managed computers affects CPU utilization
Figure A-1 depicts the CPU utilization trend between 20 and 85 managed computers for MOM 2000. As the graph depicts, the growth trend is linear. The utilization rate for 85 managed computers is almost 54 percent. This gives you a good reserve capacity of 46 percent. Because it is not known exactly how much effect the Application Management Packs or scripts will have on CPU utilization, it is recommended that you have as much reserve capacity as possible.
Figure A-2 shows the I/O usage by the MOM 2000 management server monitoring up to 85 managed computers. The maximum I/O count per second is almost 50, which, for this configuration, is very close to the maximum of 60 I/Os per second. This indicates that the disk farm should be larger for I/O capacity if you intend to increase the amount of managed computers.
Figure A-2 Increasing I/O affects disk performance
Figure A-3 shows the disk queue length for MOM 2000. Notice that at 85 managed computers, the queue is less than 2.00. This does not indicate that there is an abnormal amount of contention. Although the queue should be as close to zero as possible, this should not be a problem, and can be corrected easily by the addition of another disk drive to the database volume. Combined with the I/O rate that the managed computers create, this indicates that the configuration can handle the 85-managed computer workload.
Figure A-3 Disk queues remain short at 85-managed computer level for MOM 2000
Figure A-4 indicates that the amount of free memory is just under the 10 percent free space limit. Considering all the other counters for the configuration and the volume, this is acceptable. The memory usage has leveled off, which indicates that no more memory usage for this managed computer group will take place.
Figure A-4 Memory is adequate at the 85-managed computer level
Figure A-5 depicts the network usage of the management server for MOM 2000. Notice that the 100 Mbps network is not affected very much by the activity of 85 managed computers. The total usage is about 1.4 percent in steady state. This indicates that a 10 Mbps network can handle this managed computer workload at about 14 percent usage. This further indicates that the managed computer scan, which can add up to three times the average volume, would still not affect the network enough to cause a bottleneck.
Figure A-5 Network usage is adequate at the 85-managed computer level for MOM 2000
Sizing Recommendations for Small Configurations
The management server used in these tests has enough capacity to manage 85 computers. These tests indicate that the I/Os per second will exceed the recommended I/O capacity if more than 85 computers are managed. In subsequent testing with 90 managed computers, the disk I/O level exceeded the maximum acceptable value, and the memory reserve fell below the acceptable minimum.
As a result of these tests, it can be concluded that the hardware required for an 85-computer system should have the following components, at a minimum:
One CPU running at 733 MHz or higher
512 MB memory
One disk drive designated as C drive for the operating system and Consolidator
3×9.1 GB disk volume designated as D drive for the database and the database log file
100 Mbps network
This configuration is a recommended starting point. Results may vary depending on the types of managed computers, and the events/alerts they are generating.
It is expected that 85 managed computers will generate:
558.45 events per minute
One unsuppressed alert per 3 minutes
One suppressed alert per 11 minutes
12.24 performance counters per second
Best Practice: Capacity/Performance Recommendation
The usage of this system should not exceed the 85-managed computer limit. At that level, CPU utilization already exceeds 50 percent in steady state. Keep in mind that the testing did not include the use of Application Management Packs, which could easily add 25 percent additional utilization and push the system past its maximum steady-state usage. Although memory shows ample reserve capacity, monitor it closely and add more memory if the reserve drops below 5 percent. The disk I/Os should also be monitored, and additional disk capacity should be added if the rate rises above 60 I/Os per second per disk. The management server should be a DDCAM (a single combined database and DCAM unit), and should contain the system components listed in Table A-6.
Table A-6 Recommended Minimum System Capacity for up to 85 Managed Computers
| Item | Value |
|---|---|
| Processor count | 1 |
| Processor type | 550 MHz-733 MHz Pentium III |
| Memory | 512 MB |
| Disk count | 5 |
| Disk designation OS, MOM | C drive, one disk drive |
| Disk designation MOM DB, and DB log | D drive, four disks or more depending on RAID factor |
| MOM DB + DB log size | Refer to the database calculator |
| Disk space size | 9.1 GB or higher |
| Disk I/O capacity | 280 read/write operations per second |
| Network capacity | 100 Mbps (12.5 MB) |
Test Results II: MOM 2000/SQL Server Typical Install, Medium Management Server, 85 to 200 Managed Computers
The next series of tests was performed on a medium management server. The database for these tests was 10 GB, with a 2 GB log file size. The database was installed on a six-disk array set to RAID 5. This system has twice the disk capacity of the previous management server, and is designed to monitor up to 250 managed computers. This section shows the capacity of the system, and makes recommendations for improving the system as tested where applicable. The test results show asymptotic capacity bounds of managed computers that can be adequately controlled by this server.
Management System Configuration
Table A-7 lists the system components used for this series of tests.
Table A-7 System Configuration for Medium Management Server Test
| Item | Value |
|---|---|
| Server name | MOMTEST3 |
| Processor count | 2 |
| Processor type | 550 MHz Pentium 3 |
| Memory | 1 GB |
| Disk count OS | 1×9.1 GB |
| Disk count DB | 6×9.1 GB |
| Disk designation OS | C drive (9.1 GB) |
| Disk designation DB | D drive (54.6 GB) |
| Database size (+ log) | 12 GB |
| DB I/O capacity | 420 read/write operations per second |
| Network capacity | 100 Mbps (12.5 MB) |
| Manufacturer | Compaq |
| Model | 350 ML |
| MOM build | 0003, 0005, 00012.1, 00012.2 |
The MOM 2000/SQL Server Footprint
The data in Table A-8 shows the resources that were used to install the management server and the SQL Server database. SQL Server 2000 Enterprise was used for these tests.
Table A-8 System Resources for the Medium Management Server Test
| Resource | Value |
|---|---|
| MOM disk space requirement total | 230 MB |
| MOM OnePoint working set average memory | 97.98 MB |
| OnePoint threads | 68 |
| SQL Server working set memory | 64.3 MB |
| SQL Server database size disk space | 10 GB |
| Database log disk space | 2 GB |
| MS DTC log size disk space | 512 MB |
MOM 2000/SQL Server Test Results
These tests were performed to determine the optimum size of the management server needed to perform this task. Based on our last set of tests, we started the testing from 86 managed computers to find the upper limit.
Establishing the Baseline of the Medium Management Server
In the third test series, the management server was monitored while managing 85 computers at the low end. These findings are used to establish a baseline for its operation and the upper bounds of the management server limit. Table A-9 depicts the growth rate as more managed computers are added.
Table A-9 Effect on Server of Additional Managed Computers
| Computer count | % CPU utilization | OnePoint service | OnePoint working set peak | Disk reads/sec | Disk writes/sec | Memory free space | Network busy |
|---|---|---|---|---|---|---|---|
| 86 | 31.67% | 27.89% | 97,987,467 | 16.35 | 16.67 | 51.66% | 1.438% |
| 100 | 38.69% | 26.35% | 97,987,467 | 21.02 | 20.71 | 48.00% | 1.441% |
| 250 | 59.65% | 49.73% | 97,987,467 | 34.69 | 57.81 | 46.00% | 1.422% |
Like the previous tests, Table A-9 shows a linear growth trend. The following figures show linear incremental growth by managed computer for CPU utilization, I/O, memory free space, and network usage.
Figure A-6 Utilization trend for a medium configuration
Figure A-6 shows a utilization trend for up to 250 managed computers. Notice the growth trend is again linear. The utilization for 250 managed computers is almost 60 percent on a two-processor system. This gives the user a reserve capacity of 40 percent, which is acceptable. We can also estimate how much effect the Application Management Packs or scripts might have on CPU utilization: they can add up to 25 percent overhead.
Figure A-7 Approaching the limits of I/O count with medium configuration
Figure A-7 shows the I/O usage by the management server managing up to 250 managed computers. The maximum I/O count per second is almost 60, which, for this configuration, is very close to the maximum allowed count of 60 I/Os per second. This indicates that the disk farm should be larger for I/O capacity if you intend to increase the amount of managed computers. The disk queue in these tests was less than one, which indicates that there were no disk bottlenecks.
Figure A-8 shows the memory free space. In this set of tests, memory has been increased to 1 GB, and there is a little more than 40 percent available. This indicates that memory is not a problem or potential bottleneck.
Figure A-8 Increasing memory helps performance with medium configuration
Figure A-9 shows the network usage of the management server. Notice that the 100 Mbps network is not affected very much by the activity of 250 managed computers. The total usage is about 1.6 percent in steady state. This indicates that a 10 Mbps network can handle this managed computer workload at about 16 percent usage. This further indicates that the managed computer scan, which can add up to three times the average volume, will not affect the network enough to cause a bottleneck.
Figure A-9 Adding up to 250 managed computers does not affect network usage adversely
Sizing Recommendations for Medium Configurations
The management server used has enough capacity to manage 250 computers, but little reserve disk capacity. This indicates that the I/Os per second will exceed the recommended I/O capacity if more than 250 computers are managed. In subsequent testing, the disk I/O level exceeded the maximum acceptable values at 260 managed computers. The CPU utilization was at almost 60 percent, which indicates an adequate reserve capacity. The memory had more than enough reserve capacity, as did the network.
As a result of this test, it can be concluded that the hardware required for a 250 managed computer system should be, at a minimum:
Two CPUs running at 550 MHz or higher
1 GB memory
One disk drive designated as the C drive for the operating system and Consolidator
6×9.1 GB disk volume designated as D drive for the database and database log file
100 Mbps network
It is expected that 250 managed computers will generate:
1642.5 events per minute
One unsuppressed alert per minute
One suppressed alert per 4 minutes
36 performance counters per second
Best Practice: Capacity/Performance Recommendation
The usage of this system should not exceed the 250 managed computer limit. The disk I/Os should also be monitored, and additional disk capacity should be added if the I/O per second rate increases above 60 per disk. As a best practice, the management server role should be split into a separate database server and a DCAM server, containing the system components listed in Table A-10.
Table A-10 Recommended Minimum System Capacity for up to 250 Managed Computers
| Database server | Database unit |
|---|---|
| Processor count | 2 |
| Processor type | 550 MHz-733 MHz Pentium 3 |
| Memory | 1 GB |
| Disk count | 7 |
| Disk designation OS, MOM | C drive, one disk drive |
| Disk designation MOM DB, and DB log | D drive, six disks or more, depending on RAID factor |
| MOM DB + DB log size | Refer to the database calculator |
| Disk size | 9.1 GB or higher |
| Disk I/O capacity | 420 read/write operations per second |
| Network capacity | 100 Mbps (12.5 MB) |

| DCAM server | DAS/CAM unit |
|---|---|
| Processor count | 2 |
| Processor type | 550 MHz-733 MHz Pentium 3 |
| Memory | 1 GB |
| Disk count | 1 |
| Disk designation OS, MOM | C drive, one disk drive |
| Disk space size | 9.1 GB or higher |
| Disk I/O capacity | 70 read/write operations per second |
| Network capacity | 100 Mbps (12.5 MB) |
Test Results III: MOM 2000/SQL Server Typical Install, Large Management Server, 250 to 1,200 Managed Computers
The last series of tests was performed on a large management server with four processors and 2 GB of memory. The database size for these tests was 20 GB, with a 5 GB log file size. The database was loaded on an eight-disk array set to RAID 5, running SQL Server 2000 Enterprise. This section shows the capacity of the system, and makes recommendations for improving the system as tested where applicable. The test results show the asymptotic capacity bounds of managed computers that can be adequately controlled by this server.
Management System Configuration
The system tested for this evaluative series was configured as shown in Table A-11.
Table A-11 System Configuration for Large Management Server Test
| Item | Value |
|---|---|
| Server name | MOMTEST4 |
| Processor count | 4 |
| Processor type | 733 MHz Pentium III |
| Memory | 2 GB |
| Disk count OS | 1×9.1 GB |
| Disk count DB | 8×9.1 GB |
| Disk designation OS | C drive (9.1 GB) |
| Disk designation DB | D drive (72.8 GB) |
| Database size (with log) | 25 GB |
| DB I/O capacity | 560 read/write operations per second |
| Network capacity | 100 Mbps (12.5 MB) |
| Manufacturer | Dell |
| Model | PowerEdge 6300 |
| MOM build | 0003, 0005, 00012.1, 00012.2, 66.7 |
Note: MOM build 66.7 was used for verification testing.
The MOM 2000/SQL Server Footprint
Table A-12 shows the resources used to install the management server and the SQL Server database. SQL Server 2000 Enterprise was used for these tests.
Table A-12 System Resources for Large Management Server Test
| Resource | Value |
|---|---|
| MOM disk space requirement total | 230 MB |
| MOM OnePoint working set average memory | 98.81 MB-108.134 MB |
| OnePoint threads | 77 |
| SQL Server working set memory | 64.3 MB-212.23 MB |
| SQL Server database size disk space | 20 GB |
| Database log disk space | 5 GB |
| MS DTC log size disk space | 512 MB |
MOM 2000/SQL Server Test Results
These tests were performed to determine the optimum size of the management server for this task. Based on the last set of tests, the testing was started from 251 computers to find the upper limit. This system has much greater capacity than the previously tested computers. There are four CPUs, rated at 733 MHz. There are eight disks set to RAID 5.
Establishing the Baseline of the Large Management Server
In the fourth test series, the management server was monitored while managing 251 computers at the low end. These findings are used to establish a baseline for its operation. Table A-13 shows the growth rate as more managed computers are added:
Table A-13 Effect on Server of Additional Managed Computers
| Computer count | % CPU utilization | OnePoint service | OnePoint working set peak | Disk reads/sec | Disk writes/sec | Memory free space | Network busy |
|---|---|---|---|---|---|---|---|
| 251 | 18.45% | 8.78% | 98,816,000 | 15.67 | 18.56 | 51.66% | 1.433% |
| 450 | 36.45% | 13.67% | 98,816,000 | 18.35 | 17.90 | 50.03% | 1.649% |
| 550 | 42.98% | 18.69% | 98,816,000 | 23.45 | 22.75 | 49.99% | 1.894% |
| 750 | 43.79% | 22.31% | 98,816,000 | 25.73 | 24.54 | 46.98% | 2.316% |
| 1000 | 46.45% | 35.95% | 106,046,916 | 29.90 | 29.67 | 42.67% | 2.678% |
| 1200 | 53.67% | 47.73% | 108,134,400 | 31.21 | 31.56 | 40.09% | 2.853% |
The growth trend shown in Table A-13 is again linear. The following figures demonstrate linear incremental growth by managed computer for CPU utilization, I/O, memory free space, and network usage.
Figure A-10 Adding CPUs decreases over-utilization in large configurations.
Note: 700 managed computer limit.
Figure A-10 shows the usage trend for up to 1200 managed computers. The growth trend shows that usage starts low, because the testing has gone from two CPUs to four CPUs, but then levels off at about 550 managed computers. As more managed computers are added, the utilization climbs to almost 54 percent. This gives the user a reserve capacity of more than 40 percent, which is acceptable. How much effect the Application Management Packs or scripts will have on CPU utilization can also be estimated: they can add up to 25 percent. In this case, the CPU power is present, but at 1200 managed computers it is close to 60 percent. Whether this is acceptable or not is determined by the frequency of use of these Management Packs and the number of scripts that are processed.
Figure A-11 Increased I/O in large installation means more disk space may be needed.
Note: 700 managed computer limit.
Figure A-11 shows the I/O usage by the management server managing up to 1200 managed computers. The maximum I/O count per second is at almost 63 I/Os per second, which, for this configuration, is above the maximum allowed count of 60 I/Os per second. This indicates that the disk farm should be larger for I/O capacity if the amount of managed computers grows to 1200.
Figure A-12 shows the disk queue in these tests. For 1200 managed computers it is very close to the maximum, but still tolerable. The maximum queue length for this series of tests is almost 2.5, which will not affect system integrity.
Figure A-12 Disk queues get somewhat longer in 1200-computer configuration.
Note: 700 managed computer limit.
Figure A-13 depicts the memory free space. In this set of tests, memory has been increased to 2 GB, and just over 40 percent remains available at the upper end. This indicates that memory is not a problem or potential bottleneck in these tests.
Figure A-13 Increase in memory assures adequate free space.
Note: 700 managed computer limit.
Figure A-14 shows the network usage of the management server. Notice that the 100 Mbps network is not affected very much by 1200 managed computers. The total usage is about 2.85 percent in steady state. This indicates that a 10 Mbps network can handle the managed computer workload at about 28 percent usage. As in the previous tests, this further indicates that the managed computer scan, which can add up to three times the average volume, would still not affect the network enough to cause a bottleneck.
Figure A-14 Network usage is adequate in large configurations.
Note: 700 managed computer limit.
Sizing Recommendations for Large Configurations
These tests indicate that the management server used has enough capacity to manage 1200 computers, but it is very close to the limit on the reserve capacity. The CPU has enough reserve capacity, even with the additional burden of the Application Management Packs. As in the other tests, memory and network are not issues for concern.
With these test results as a basis, it can be concluded that the minimum hardware for a 1200-computer server should be:
Four CPUs running at 733 MHz or higher
2 GB memory
One disk drive designated as the C drive for the operating system and Consolidator
8×9.1 GB disk volume designated as the D drive for the database and database log file (or more)
100 Mbps network
This configuration is recommended as a starting point. Results may vary depending on the types of managed computers and the events or alerts they generate. 1200 managed computers should generate:
7884 events per minute
5.34 unsuppressed alerts per minute
1.30 suppressed alerts per minute
173 performance counters per second
Best Practice: Capacity/Performance Recommendation
To prevent overloading the management server, the total number of managed computers for a large management server computer system should not exceed the 700-managed computer limit. If a large number of additional Application Management Packs are deployed, this should be decreased to no more than 600 managed computers. The disk I/Os should also be monitored, and additional disk capacity should be added if the I/O per second rate increases above 60 per disk. Furthermore, the total managed computer count for any configuration group should not exceed 1,000. An adequate management server to monitor 700 managed computers should be a three-server configuration consisting of the components listed in Table A-14.
Table A-14 Recommended Minimum System Capacity for Large Configurations
| Database server | 1 |
|---|---|
| Processor count | 4 |
| Processor type | 550 MHz-733 MHz Pentium 3 |
| Memory | 2 GB |
| Disk count | 9 |
| Disk designation OS, MOM | C drive, one disk drive |
| Disk designation MOM DB, and DB log | D drive, eight disks or more, depending on RAID factor |
| MOM DB + DB log size | Refer to the database calculator |
| Disk size | 9.1 GB or higher |
| Disk I/O capacity | 560 read/write operations per second |
| Network capacity | 100 Mbps (12.5 MB) |

| DCAM server | DAS/CAM unit |
|---|---|
| Processor count | 2 |
| Processor type | 550 MHz-733 MHz Pentium 3 |
| Memory | 1 GB |
| Disk count | 1 |
| Disk designation OS, MOM | C drive, one disk drive |
| Disk space size | 9.1 GB or higher |
| Disk I/O capacity | 70 read/write operations per second |
| Network capacity | 100 Mbps (12.5 MB) |

| DCAM server | DAS/CAM unit |
|---|---|
| Processor count | 2 |
| Processor type | 550 MHz-733 MHz Pentium 3 |
| Memory | 1 GB |
| Disk count | 1 |
| Disk designation OS, MOM | C drive, one disk drive |
| Disk space size | 9.1 GB or higher |
| Disk I/O capacity | 70 read/write operations per second |
| Network capacity | 100 Mbps (12.5 MB) |
Two DCAM systems are recommended for redundancy and fault-tolerance. It is recommended that the DCAM systems have no more than 350 managed computers assigned to each. Table A-15 lists the recommendations for a two-server configuration.
Table A-15 Recommendations for Two-Server Configuration
| Database server | 1 |
|---|---|
| Processor count | 4 |
| Processor type | 550 MHz-733 MHz Pentium 3 |
| Memory | 2 GB |
| Disk count | 9 |
| Disk designation OS, MOM | C drive, one disk drive |
| Disk designation MOM DB, and DB log | D drive, eight disks or more depending on RAID factor |
| MOM DB + DB log size | Refer to the database calculator |
| Disk space size | 9.1 GB or higher |
| Disk I/O capacity | 560 read/write operations per second |
| Network capacity | 100 Mbps (12.5 MB) |

| DCAM server | DAS/CAM unit |
|---|---|
| Processor count | 4 |
| Processor type | 550 MHz-733 MHz Pentium 3 |
| Memory | 1 GB |
| Disk count | 1 |
| Disk designation OS, MOM | C drive, one disk drive |
| Disk space size | 9.1 GB or higher |
| Disk I/O capacity | 70 read/write operations per second |
| Network capacity | 100 Mbps (12.5 MB) |
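The best-practice limits established in this appendix (85, 250, and 700 managed computers) can be summarized in a small helper. This is our own illustrative sketch, not part of the Management Server Sizer:

```python
def recommended_tier(managed_computers: int) -> str:
    """Map a managed-computer count to the configuration tier from this appendix."""
    if managed_computers <= 85:
        return "small: single DDCAM, 1 CPU, 512 MB, 3-disk RAID 5 DB volume"
    if managed_computers <= 250:
        return "medium: DB server + DCAM, 2 CPUs, 1 GB, 6-disk RAID 5 DB volume"
    if managed_computers <= 700:
        return "large: DB server + 2 DCAMs, 4 CPUs, 2 GB, 8-disk RAID 5 DB volume"
    return "beyond tested limits: split across additional configuration groups"

print(recommended_tier(85))
print(recommended_tier(300))
```

Recall that heavy Application Management Pack usage lowers these thresholds (for example, 700 drops to 600 for the large configuration).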
Report Generation for MOM 2000
This section presents the findings for the reporting function of MOM 2000 in terms of resource consumption. The reports that were tested are shown in Figure A-15. They were selected from the Windows NT/2000 reporting tree and include reports from Windows NT/2000 Capacity Planning, Windows NT/2000 Operations, and Windows NT/2000 Performance Analysis. These were selected because of the reportedly high resource consumption associated with generating them.
Figure A-15 Reporting resource consumption with MOM 2000
Reports and CPU Utilization
The first series of test results concerns CPU and disk usage during the report generation process. Figure A-16 depicts CPU utilization while managing 150 computers. The average utilization is 53.63 percent, measured over a 45-second interval. The figure shows no prolonged 100 percent utilization spikes, indicating even response time and processing distribution.
Figure A-16 Reporting CPU utilization with MOM
Figure A-17 shows report generation managing the same 150 computers. The two CPU spikes of 100 percent each (on the right side of the graph) were prolonged for at least 2.5 minutes each, during which time no MOM Administrator console function was available. The red (dark) line on the lower section of the graph shows the OnePoint service utilization, which is not affected during report generation.
Figure A-17 Reporting CPU spikes with MOM
Figure A-18 shows a normal disk read pattern for the same 150 managed computers. Evenly spaced reads occur every five minutes, which is how often MOM reads from and writes to the disks.
Figure A-18 Reporting disk reads with MOM
The write pattern, as shown in Figure A-19, is identical to the read pattern. Both figures show no reporting for this period.
Figure A-19 Reporting disk writes with MOM
Figure A-20 shows read activity with report generation for the same 150 managed computers. Notice that the disk is constantly running at almost 100 percent I/O activity. This made disk writes and reads almost impossible, and caused very long disk queues. The write activity for this time was the same.
Figure A-20 Reporting disk reads with report generation
Figure A-21 shows the average disk queue length to be above 12 for the same measurement and time as in Figure A-20.
Figure A-21 Reporting disk queuing with MOM
Conclusions and Best Practices: Reporting
These results show that producing reports and managing computers—at the same time and on the same computer—is expensive in CPU and disk resources: report generation adds an estimated 30 percent CPU overhead, drives disk utilization to 100 percent, and produces average disk queue lengths of 12 or more.
Therefore, you should not generate reports on a management server while it is managing computers. Ideally, use a separate reporting system with a duplicate of the database from which to generate reports. If you do not have a separate reporting system, schedule reports for an appropriate off-peak time, and do not generate too many at once; generating more than two reports at a time will overtax most systems.
Grooming the Database
The final series of tests concerned grooming the MOM database. These tests were run while the management server computer system was managing 250 computers to isolate the CPU and disk overhead caused by this function.
Grooming and System Reactions
Grooming added almost 30 percent CPU utilization on the management system. The average pre-grooming rate was approximately 35 percent; with grooming turned on, utilization averaged about 65 percent. This can be observed in Figure A-22.
Figure A-22 Reporting CPU utilization with database grooming
Utilization remained constant for at least 18 minutes, and then dropped back to about 35 percent, its level before grooming was turned on (far right).
Figure A-23 Reporting massive CPU usage
Figure A-23 indicates that the CPU queue length was more than twice the accepted maximum of two. This indicates massive CPU usage. Figure A-24 shows massive disk usage during this same time period.
Figure A-24 Reporting massive disk utilization
Conclusions and Best Practices: Database Grooming
Database grooming is a necessary MOM function. Tests that deferred grooming to later times only produced longer grooming periods when the grooming eventually ran.
Although grooming took place during all tests, the grooming function appears to use excessive resources when isolated. In practice this is less severe, because MOM performs grooming alongside all of its other functions. As a best practice, place the database on a separate server with multiple CPUs and multiple disks to absorb the additional reads and writes that grooming causes. It is also best to keep the primary database, including its log files, as small as possible and no greater than 12 GB.
Appendix B: MOM SP1 Management Sizer
The MOM SP1 Management Sizer recommends database sizes according to the data that you enter; there are no preset limits on what it can recommend. However, any recommendation over 30 GB is an unsupported database configuration: adjust the grooming parameters until the sizer yields a supported database size of 30 GB or less. A recommendation over 30 GB can instead be treated as a data warehouse recommendation for long-term reporting and storage.
To use the MOM SP1 Management Sizer, type the targeted managed computer count in the highlighted yellow area to the right of Managed Computer Count. You can also adjust the grooming parameters to determine the appropriate database size. To change the grooming parameters, type different grooming parameter values in the highlighted yellow areas to the right of the parameters in the Enter Groom Factor section. The MOM SP1 Management Sizer automatically calculates the RAID selection and spindle count; expected network usage for various network bandwidths; DCAM server and MOM database server hardware sizes; and the database and log sizes.
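The sizer's internal formulas are not published in this paper, but the kind of estimate it produces can be sketched as follows. Every per-computer rate and row size in this sketch is an illustrative assumption, not a figure from the sizer itself:

```python
# Rough database-size estimate in the spirit of the MOM SP1 Management Sizer.
# All per-computer rates and row sizes below are illustrative assumptions.

EVENT_ROW_KB = 2.0        # assumed average size of a stored event row
PERF_SAMPLE_ROW_KB = 0.5  # assumed average size of a perf sample row
EVENTS_PER_DAY = 400      # assumed events per managed computer per day
SAMPLES_PER_DAY = 2000    # assumed perf samples per managed computer per day

def estimate_db_size_gb(managed_computers, groom_days):
    """Estimate MOM database size for a given grooming (retention) window."""
    kb_per_day = managed_computers * (
        EVENTS_PER_DAY * EVENT_ROW_KB + SAMPLES_PER_DAY * PERF_SAMPLE_ROW_KB
    )
    return kb_per_day * groom_days / (1024 * 1024)

size = estimate_db_size_gb(managed_computers=250, groom_days=7)
print(f"Estimated database size: {size:.1f} GB")
print("Supported" if size <= 30 else "Over the 30 GB supported limit")
```

As in the sizer itself, tightening the grooming window is the lever that brings an oversized database back under the 30 GB supported limit.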
You can download the MOM SP1 Management Sizer as part of the MOM SP1 Performance and Sizing Kit, which is available from the MOM Web site at https://www.microsoft.com/mom/techinfo/deployment/default.asp.
Appendix C: SQL Server Installation for Microsoft Operations Manager Usage
The correct installation of the Microsoft SQL Server database is very important. If the database is configured incorrectly, the result can be poor performance and violation of the two-minute alert-insertion Service Level Agreement. The following guidelines can assist you in making the proper installation choices.
The SQL Server database should never be installed on the same disk drive as the operating system or the MOM subsystem. Partitioning the C drive will not improve performance, because the database would still reside on the same physical disk as the operating system and MOM, governed by the same disk controller. The MOM database should be placed on its own disk drives and striped across more than one disk, and the database log file should be placed on a separate disk for further efficiency. Figure C-1 depicts how this is accomplished for a medium-sized installation of 250 managed computers:
Figure C-1 Installing the SQL Server database
In addition, configuring SQL Server memory usage correctly is crucial. In limited-memory situations, the SQL Server process can consume most of the available memory and cause excessive page faults. To prevent this, limit SQL Server memory usage by configuring a fixed memory size instead of dynamic memory allocation. This can be accomplished as shown in Figure C-2.
Figure C-2 Configuring SQL Server memory
To configure SQL Server for memory usage
1. Start SQL Server Enterprise Manager.
2. Navigate to the local computer item, right-click it, and then click Properties.
3. In the SQL Server Properties dialog box, click the Memory tab.
4. On the Memory tab, click Use a fixed memory size, and use the slider to set an appropriate memory size.
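The appropriate fixed size depends on everything else running on the computer. The following sketch illustrates the arithmetic only; the reserve and floor values are assumptions for the example, not recommendations from this paper:

```python
# Illustrative arithmetic for choosing a fixed SQL Server memory size.
# The reserve values below are assumptions for the sketch, not tested guidance.

OS_RESERVE_MB = 256   # assumed memory kept free for the operating system
MOM_RESERVE_MB = 256  # assumed memory kept free for the MOM services
MIN_SQL_MB = 128      # assumed floor below which SQL Server would be starved

def fixed_sql_memory_mb(total_physical_mb):
    """Suggest a fixed memory size (MB) to enter on the Memory tab."""
    remaining = total_physical_mb - OS_RESERVE_MB - MOM_RESERVE_MB
    return max(MIN_SQL_MB, remaining)

print(fixed_sql_memory_mb(1024))  # 512
print(fixed_sql_memory_mb(384))   # 128 (floor applies)
```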
Appendix D: Counter Definitions
The counters in this appendix are listed by performance object and counter name, followed by the definition.
**System\Processor Queue Length**
The number of threads in the processor queue. There is a single queue for processor time, even on computers with multiple processors. Unlike the disk counters, this counter counts ready threads only, not threads that are running. A sustained processor queue of greater than two threads generally indicates processor congestion. This counter displays the last observed value only; it is not an average. This counter is necessary to see what kind of contention is occurring on the CPUs of the system: an increase in this value to greater than 2 indicates that an additional CPU is needed, or that some of the workload should be considered for relocation. This is a general purpose queue counter.
**System\Processes**
The number of processes in the computer at the time of data collection. Notice that this is an instantaneous count, not an average over the time interval. Each process represents the running of a program. This counter is useful for keeping track of the running processes in a system, and must be averaged over the measurement interval for the statistics to be useful. This is a general purpose incremental counter.
**System\Threads**
The number of threads in the computer at the time of data collection. Notice that this is an instantaneous count, not an average over the time interval. A thread is the basic executable entity that can execute instructions in a processor. This counter is useful for keeping track of the running threads in a system, and must be averaged over the measurement interval for the statistics to be useful. This is a general purpose incremental counter.
**Processor\% Interrupt Time**
The percentage of time the processor spent receiving and servicing hardware interrupts during the sample interval. This value is an indirect indicator of the activity of devices that generate interrupts, such as the system clock, the mouse, disk drivers, data communication lines, network adapters, and other peripheral devices. These devices normally interrupt the processor when they have completed a task or require attention. Normal thread execution is suspended during interrupts. Most system clocks interrupt the processor every 10 milliseconds, creating a background of interrupt activity. This counter displays the average busy time as a percentage of the sample time, and is useful for tracking the time the CPU spends processing interrupts. Because it can reveal gradual or sudden rises in interrupt time, which reduce the CPU time available for processing workloads, it can be used as a general purpose counter.
**Processor\% Processor Time**
The percentage of time the processor is executing a non-idle thread. This counter was designed as a primary indicator of processor activity. It is calculated by measuring the time that the processor spends executing the thread of the Idle process in each sample interval, and subtracting that value from 100 percent (each processor has an idle thread that consumes cycles when no other threads are ready to run). It can be viewed as the percentage of the sample interval spent doing useful work. This counter is useful for tracking overall CPU activity, and because it shows how busy the CPU is, it is very useful in determining system usage in capacity and performance studies. The calculation for this counter is: % Processor Time = 100% − % Idle Time.
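This subtract-from-100-percent calculation can be sketched as follows; the idle-time samples are hypothetical:

```python
# % Processor Time derived from idle time: measure how long the processor
# spent idle over the sample interval, then subtract from 100 percent.

def percent_processor_time(idle_ms, interval_ms):
    """Busy percentage for one sample interval."""
    return 100.0 * (1.0 - idle_ms / interval_ms)

# Hypothetical samples: idle milliseconds observed in each 1000 ms interval.
idle_samples = [820, 450, 90, 700]
busy = [round(percent_processor_time(i, 1000), 1) for i in idle_samples]
print(busy)  # [18.0, 55.0, 91.0, 30.0]
```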
**Memory\Available Bytes**
The amount of physical memory, in bytes, available to processes running on the computer. It is calculated by adding the space on the Zeroed, Free, and Standby memory lists. Free memory is ready for use; Zeroed memory is pages of memory filled with zeros to prevent later processes from seeing data used by a previous process; Standby memory is memory removed from the working set (physical memory) of a process on its way to disk, but still available to be recalled. This counter displays the last observed value only; it is not an average. This is a general purpose counter used to track memory usage. To produce a "% Memory Free Space" counter, you must know the amount of memory the computer has and perform the calculation: % Memory Free Space = Available Bytes / Total Memory. Because this is a snapshot value, the Available Bytes counter must be averaged over many data points in the report database to provide an accurate value. Note: This is a composite counter.
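The "% Memory Free Space" derivation described above can be sketched as follows; the snapshot values are hypothetical:

```python
# "% Memory Free Space" derived from Available Bytes snapshots: average many
# snapshots, then divide by the computer's total physical memory.

def percent_memory_free(available_samples_bytes, total_bytes):
    """Average the snapshot values, then express them as a percentage."""
    avg_available = sum(available_samples_bytes) / len(available_samples_bytes)
    return 100.0 * avg_available / total_bytes

GB = 1024 ** 3
# Hypothetical snapshots from a computer with 1 GB of physical memory.
samples = [300 * 1024**2, 280 * 1024**2, 340 * 1024**2]
print(f"{percent_memory_free(samples, 1 * GB):.1f}% free")  # 29.9% free
```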
**Memory\Page Faults/sec**
The overall rate at which faulted pages are handled by the processor, measured in pages faulted per second. A page fault occurs when a process requires code or data that is not in its working set (its space in physical memory). This counter includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory). Most processors can handle large numbers of soft faults without consequence, but hard faults can cause significant delays. The counter displays the difference between the values observed in the last two samples, divided by the duration of the sample interval. This is a general purpose counter used to track memory usage: a gradual or radical increase in page faults per second indicates increased memory use. The page fault interrupt is also indicated in the per-process page fault counters and is part of that number.
**Network Interface\Bytes Total/sec**
The rate at which bytes are sent and received on the interface, including framing characters. This is a general purpose counter. It shows network traffic, which can have a negative effect on transaction or message throughput. This counter can either be collected or calculated by using the calculation: Bytes Total/sec = Bytes Sent/sec + Bytes Received/sec.
**Network Interface\Current Bandwidth**
An estimate of the current bandwidth in bits per second. For interfaces that do not vary in bandwidth, or for which no accurate estimate can be made, this value is the nominal bandwidth. This is a general purpose counter. It shows network size in bits per second and is used in calculations of network usage.
**Network Interface\% Network Busy**
A calculated counter that shows network usage. This is a general purpose composite counter. It must be calculated by using the following calculation: % Network Busy = Bytes Total/sec / Current Bandwidth Bytes, where Bytes Total/sec = Bytes Sent/sec + Bytes Received/sec and Current Bandwidth Bytes = Current Bandwidth / 8.
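This composite calculation can be sketched directly; the traffic and bandwidth figures are hypothetical:

```python
# % Network Busy as a composite counter: total bytes per second divided by
# the interface bandwidth converted from bits to bytes.

def percent_network_busy(bytes_sent_per_sec, bytes_received_per_sec,
                         current_bandwidth_bits):
    """Compute % Network Busy from its component counters."""
    bytes_total_per_sec = bytes_sent_per_sec + bytes_received_per_sec
    current_bandwidth_bytes = current_bandwidth_bits / 8
    return 100.0 * bytes_total_per_sec / current_bandwidth_bytes

# Hypothetical load on a 100 Mbps interface.
busy = percent_network_busy(400_000, 850_000, 100_000_000)
print(f"{busy:.1f}% busy")  # 10.0% busy
```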
**Network Interface\Output Queue Length**
The length of the output packet queue, in packets. If this is longer than 2, delays are being experienced, and the bottleneck should be found and eliminated if possible. Because the requests are queued by NDIS in this implementation, this value will always be 0. This is a general purpose counter.
**Physical Disk\% Disk Time**
The percentage of elapsed time that the selected disk drive is busy servicing read or write requests. This is a general purpose counter. It can either be collected or calculated by using the following equation: % Disk Time = % Disk Read Time + % Disk Write Time, or 100% − % Idle Time.
**Physical Disk\Avg. Disk Queue Length**
The average number of both read and write requests that were queued for the selected disk during the sample interval. This is a general purpose counter.
**Server\Bytes Total/sec**
The number of bytes the server has sent to and received from the network. This value provides an overall indication of how busy the server is. This is a general purpose counter.
**Server\% Server Network Busy**
A calculated counter that shows network usage. This is a general purpose counter. It must be calculated by using the following equation: % Server Network Busy = Bytes Total/sec / Current Bandwidth Bytes, where Current Bandwidth Bytes = Network Interface\Current Bandwidth / 8.
**Server Work Queues\Bytes Received/sec**
The rate at which the server is receiving bytes from network clients on this CPU. This value is a measure of how busy the server is. This is a general purpose counter.
**Server Work Queues\Current Clients**
The instantaneous count of the clients being serviced by this CPU. The server actively balances the client load across all the CPUs in the system. This value will always be 0 in the Blocking Queue instance. This is a general purpose counter.
**Server Work Queues\Queue Length**
The current length of the server work queue for this CPU. A sustained queue length greater than four might indicate processor congestion. This is an instantaneous count, not an average over time. This is a general purpose counter.
**SQL Server:Buffer Manager\Buffer Cache Hit Ratio**
The percentage of pages that were found in the buffer pool cache without having to incur a read from disk. This is a general purpose counter.
**SQL Server:Databases\Active Transactions**
The number of active transactions for the database. This is a general purpose counter.
**SQL Server:Databases\Transactions/sec**
The number of transactions started for the database. This is a general purpose counter.
**SQL Server:General Statistics\User Connections**
The number of users connected to the system.
**Process\% Processor Time**
The percentage of elapsed time that all of the threads of this process used the processor to execute instructions. An instruction is the basic unit of execution in a computer, a thread is the object that executes instructions, and a process is the object created when a program is run. Code executed to handle some hardware interrupts and trap conditions is included in this count. On multiprocessor computers, the maximum value of the counter is 100 percent times the number of processors. This is a general purpose counter.
**Process\Page Faults/sec**
The rate at which page faults occur in the threads executing in this process. A page fault occurs when a thread refers to a virtual memory page that is not in its working set in main memory. This will not cause the page to be retrieved from disk if it is on the standby list and already in main memory, or if it is in use by another process with which the page is shared. This is a general purpose counter.
**Thread\% User Time**
The percentage of elapsed time that this thread has spent executing code in user mode. Applications, environment subsystems, and integral subsystems execute in user mode. Code executing in user mode cannot damage the integrity of the Windows NT Executive, Kernel, and device drivers. Unlike some early operating systems, Windows NT uses process boundaries for subsystem protection in addition to the traditional protection of user and privileged modes. These subsystem processes provide additional protection. Therefore, some work done by Windows NT on behalf of your application might appear in other subsystem processes in addition to the privileged time in your process. This is a general purpose counter.