MOM 2000 SP1 - Performance and Sizing

Event and Performance Management for Windows®-based Systems

Microsoft Corporation

September 2003

Abstract

This technical paper describes a process for testing Microsoft® Operations Manager 2000 (MOM) Service Pack 1 (SP1) and recommends a suitable computer system size with enough reserve capacity to monitor a specific number of managed computers. It also provides information about the expected performance of that computer system while managing these computers.

On This Page

Prefatory Note
Introduction
MOM SP1 Test Parameters
MOM SP1 Test Results
MOM SP1 Test Results I: Small Configuration - Single DDCAM (MOM Database and DCAM)
MOM/SQL Server Disk Requirements
Database and Data Workload Sizing
Microsoft Operations Manager/SQL Server Test Results
Best Practice: Capacity/Performance Recommendation
MOM SP1 Test Results II: Large Configuration - Separate Database Server and Single DCAM
The MOM/SQL Server Disk Requirements
Database and Data Workload Sizing
Microsoft Operations Manager/SQL Server Test Results
Best Practice: Capacity/Performance Recommendation
MOM SP1 Test Results III: Enterprise Configuration - Separate Database Server and Two DCAMs
The MOM/SQL Server Disk Requirements
Database and Data Workload Sizing
Microsoft Operations Manager/SQL Server Test Results
Best Practice: Capacity/Performance Recommendation
MOM SP1 Management Packs
Appendix A: Test Results For Microsoft Operations Manager 2000 RTM
Appendix B: MOM SP1 Management Sizer
Appendix C: SQL Server Installation for Microsoft Operations Manager Usage
Appendix D: Counter Definitions

Prefatory Note

All tests referred to in this report were designed to determine the minimum computer hardware required for a management server to perform various Microsoft Operations Manager 2000 (MOM) tasks. The MOM test team conducted these tests in June 2003 using MOM Service Pack 1 (MOM SP1).

Note: The MOM test team originally conducted tests in June 2001 using the Microsoft Operations Manager 2000 RTM version (MOM RTM). Because MOM SP1 writes data to the MOM database differently than MOM RTM does, the test results cannot be compared directly. For the results of the original MOM RTM tests, see Appendix A: Test Results For Microsoft Operations Manager 2000 RTM later in this paper.

The computer systems described herein do not necessarily represent the ideal configuration. The intent is to provide a starting point from which to specify the management server, with the knowledge that the base system you are specifying has been tested and found able to perform a given level of tasks.

This report in no way represents or is meant to define an absolute system configuration for any number of managed computers. Instead, this report is meant to show findings and a possible starting point for you to specify the management server. Calculators are provided in the appendices to help you to calculate the database size, and to show the expected input/output activity that might be found on a management server. For more information, refer to the appendices later in this paper.

Testing took into account all events, alerts, and performance counters that occurred during the peak operation of managed computers. This testing did not take into account any Application Management Packs that you might place into service, or the use of any services or scripts to correct certain situations. Although Management Packs were not used in the testing, processing rules were used to generate events, alerts, and performance counters per day at a level that is higher than any reported by MOM enterprise customers.

For the MOM SP1 testing, the test workload was determined by collecting data from approximately 20 enterprise customers and from the Microsoft Operations and Technologies Group (OTG). The data collected showed a significant drop in the number and rates of events, alerts, and performance counters for MOM SP1. This is due to the tuning of Management Packs and increased database efficiencies. The rate of simulated network line and database usage the MOM test team used to test MOM SP1 far exceeded the actual rates collected from any of the external or internal users. For MOM SP1, actual managed computers were used to generate the network line and database usage workloads, rather than being simulated as was the case with the original MOM RTM testing.

Introduction

MOM is a management system for monitoring managed computers in an organization. Testing, together with the experience of many customers, has shown that MOM SP1 scales to the published supported numbers. However, the limits for MOM can vary depending on many variables that are discussed in this paper. This paper describes the testing process used to recommend a suitable management server size with enough reserve capacity to smoothly manage a specific number of computers without putting the MOM SP1 computer systems at risk. It also provides information about the expected performance of the MOM SP1 computer system while managing these computers. Specifically, this paper answers questions such as:

  • How large must the management server be in terms of hardware resources?

  • How large is the overall footprint of MOM SP1?

  • How large should the MOM SP1 database be?

  • What are the system requirements needed to run MOM SP1 effectively?

  • What is the expected disk activity on the MOM SP1 database and database server?

  • What is the expected CPU usage of the MOM SP1 agent on a managed computer?

How might the recommendations contained in this paper be useful? Consider the performance and sizing considerations presented in the following scenarios.

Scenario 1

The systems managers of an online-order-entry environment decide to license MOM SP1 to manage 150 servers worldwide. They need to determine how large the management server computer system should be, and they decide to use a single computer for the task. With no experience in MOM capacity planning, they find it difficult to determine the correct size for the management server, and they order a computer system that is much too small for the job. They also learn that they need a much larger-capacity network to accommodate the MOM workload traffic. They will now lose time ordering additional system hardware to rectify this situation.

Scenario 2

The systems managers of an online-order-entry environment decide to license MOM SP1 to manage 1,000 servers worldwide. They need to determine how large the management server computer systems should be, and they decide to use a series of tiered management systems (alert forwarding) for the task. Unlike the company in Scenario 1, this company will lose large amounts of money if the servers are not managed correctly, or if they go offline for any reason.

In such a large environment, deciding how large the first-tier configuration group management servers should be, and how large the management server in the master configuration group should be, compounds the complexity of sizing. Again, with limited or no experience in MOM capacity planning, the system managers find it very difficult to determine the correct size for the tier-one and tier-two management servers. As a result, they order computer systems that are too small for the job. In the process, they also find that they need a separate management network to accommodate the MOM workload traffic. They will now lose time ordering the additional system hardware to rectify the problems.

Conclusion

Careful consideration of the performance and sizing of the hardware systems that support MOM SP1 is critical to the successful implementation of MOM to manage computers in your organization. Although there is no absolute system configuration for any number of managed computers, this technical paper presents the results of performance and sizing testing for MOM SP1 in environments of various sizes. You can use the findings in this paper and the MOM SP1 Management Server Sizer as a starting point to help you to determine the appropriate performance and sizing considerations for MOM in your organization. For more information about the MOM SP1 Management Server Sizer, see “Appendix B: MOM SP1 Management Sizer” later in this paper.

MOM SP1 Test Parameters

This section presents the key factors for the MOM SP1 performance and sizing testing — the hardware used, the scope and goals for the testing, the tools used, and how the test workload was calculated. Later sections present the results of testing.

Hardware Test Environment

This series of tests included three different MOM configuration scenarios, each with an appropriate range of managed computers.

  1. Small configuration - Single DDCAM (MOM database and DCAM), with 20, 50, 85, 140, and 200 managed computers.

  2. Large configuration - Separate database server, single DCAM, with 250, 500, 700, and 1000 managed computers.

  3. Enterprise configuration - Separate database server; two DCAMs, with 700 and 1000 managed computers.

The detailed systems information for each of these scenarios, along with the test results, is described in later sections.

The network used in all tests had a line capacity of 100 Mbps, which represents the highest available bandwidth in most organizations' production environments; lines with greater capacity are not widely used. The hardware was set up in a single-tier configuration. Multitiered configurations were not tested.

For all test scenarios, the configuration for the managed computers was the same, as described in Table 1.

Table 1 Managed Computer Configuration

System component     Description
Processor count      1
Processor type       1000 MHz Pentium 4
Memory               512 MB
Disk count           1
Disk designation OS  Drive C
Disk size            7.85 GB (6 GB free space)
Network capacity     100 Mbps (12.5 MB/sec)

Workload Environment

The data workloads used to test each MOM SP1 configuration were consistently higher than the workloads used in MOM RTM testing and higher than the actual workloads reported by the largest enterprise customers. For example, in MOM SP1 testing, alerts were delivered to the database at 0.00833 alerts per minute per computer at the 1,000-managed-computer level for the enterprise configuration. MOM RTM testing used a rate of 0.00445 alerts per minute per computer, so the MOM SP1 rate was nearly twice as high. Table 2 and Table 3 show the workload levels used in testing MOM SP1.

Table 2 Data Workload Levels per Day Used for Testing MOM SP1

Managed computers  Alerts per day  Events per day  Performance counters per day
20                 2,250           100,000         250,000
50                 2,250           100,000         250,000
85                 2,250           100,000         250,000
140                2,250           100,000         250,000
200                2,250           100,000         250,000
250                9,000           400,000         400,000
500                9,000           400,000         400,000
700                9,000           400,000         400,000
1,000              12,000          600,000         600,000

Note: The values in Table 2 far exceed any numbers reported by the largest enterprise customers for MOM SP1.

Table 3 Data Workload Rates per Minute per Managed Computer Used for Testing MOM SP1

Managed computers  Alerts    Events  Performance counters
20                 0.078125  3.470   8.680
50                 0.031200  1.380   3.472
85                 0.018350  0.816   2.042
140                0.011100  0.496   1.240
200                0.007810  0.347   0.868
250                0.025000  1.111   1.111
500                0.012500  0.555   0.555
700                0.008930  0.397   0.397
1,000              0.008330  0.417   0.417

Note: The values in Table 3 far exceed any numbers reported by the largest enterprise customers. The values for events and performance counters used for MOM RTM testing were higher than the MOM SP1 test scenarios. This is a result of fine-tuning the Management Packs to reduce the volume of events and performance counter traffic for MOM SP1.
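The per-minute rates in Table 3 follow from Table 2: divide each daily count by 1,440 minutes and by the number of managed computers. The short Python sketch below reproduces that arithmetic and the comparison with the MOM RTM alert rate of 0.00445 alerts per minute per computer. The function and variable names are illustrative only.

```python
# Sketch: reproduce the per-minute, per-managed-computer rates (Table 3)
# from the daily workload levels (Table 2).

MINUTES_PER_DAY = 24 * 60  # 1,440 minutes

# Table 2: managed computer count -> (alerts, events, performance counters) per day
TABLE_2 = {
    20: (2_250, 100_000, 250_000),
    50: (2_250, 100_000, 250_000),
    85: (2_250, 100_000, 250_000),
    140: (2_250, 100_000, 250_000),
    200: (2_250, 100_000, 250_000),
    250: (9_000, 400_000, 400_000),
    500: (9_000, 400_000, 400_000),
    700: (9_000, 400_000, 400_000),
    1000: (12_000, 600_000, 600_000),
}

# Alert rate used in the original MOM RTM testing, per minute per computer.
RTM_ALERT_RATE = 0.00445


def per_minute_per_computer(items_per_day, computers):
    """Spread a daily item count evenly over every minute and every computer."""
    return items_per_day / MINUTES_PER_DAY / computers


def rates(computers):
    """Return (alerts, events, counters) per minute per managed computer."""
    return tuple(per_minute_per_computer(x, computers) for x in TABLE_2[computers])


if __name__ == "__main__":
    for n in TABLE_2:
        alerts, events, counters = rates(n)
        print(f"{n:>5}: alerts {alerts:.6f}  events {events:.3f}  "
              f"counters {counters:.3f}  alerts vs. RTM {alerts / RTM_ALERT_RATE:.1f}x")
```

By this arithmetic, 2,250 alerts per day across 20 computers is 0.078125 alerts per minute per computer (about 17.6 times the RTM rate), and 100,000 events per day across 140 computers is about 0.496 events per minute per computer.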

Integrated Grooming

MOM SP1 uses an integrated grooming feature, which means that each time MOM SP1 performs a database insert for an event, alert, or a performance counter, it also deletes up to 4,000 records by default according to the grooming parameters that you have established. As a result, the need to periodically groom the MOM database is substantially reduced. Another result of integrated grooming is that ongoing CPU utilization and total I/Os are higher with MOM SP1 than with MOM RTM. However, when you groom the MOM database, you do not experience high CPU utilization, which often reached 100 percent for an extended period of time with MOM RTM. Table 4 shows the alert latency and grooming data for each level of managed computers tested.

Table 4 Alert Latency and Grooming Data

Managed computers  Alert latency (seconds)  Events groomed per day  Performance counters groomed per day
20                 42.32                    3,960,000               21,144,000
50                 47.37                    3,936,000               20,760,108
85                 44.66                    4,656,000               19,632,108
140                49.98                    3,816,000               19,632,108
200                51.27                    4,752,000               21,456,108
250                83.35                    2,152,110               13,174,932
500                121.18                   2,280,000               13,294,932
700                119.19                   2,160,000               15,190,932
1,000              121.23                   2,312,400               19,992,264

Note: For all test results shown in Table 4, the test duration was four hours. Alerts were not groomed during this testing because they did not accumulate fast enough to require grooming.

When the workload was increased for the 250 to 1,000 managed-computer levels, grooming rates dropped. This is because the DCAM performs more database inserts, and therefore it performs integrated grooming at a slightly lower percentage to prevent insert latency. For each insert, the DCAM uses an algorithm to calculate the level of integrated grooming, based on a number of factors, such as how many inserts are in the queue.
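This paper does not publish the DCAM's grooming algorithm. As an illustration only, the Python sketch below models integrated grooming under stated assumptions: rows are timestamps held in memory, each insert deletes up to 4,000 expired rows (the default cited above), and a hypothetical `groom_limit` rule shrinks that budget as the insert queue grows. The scaling rule and the `high_water` threshold are invented for the example and are not MOM's actual logic.

```python
# Hypothetical sketch of insert-time ("integrated") grooming. Rows are modeled
# as timestamps in a deque, oldest first. The per-insert limit of 4,000 comes
# from this paper; the queue-depth scaling rule is an assumption, not MOM's
# actual DCAM algorithm.
from collections import deque
from dataclasses import dataclass, field

DEFAULT_GROOM_LIMIT = 4_000  # default maximum rows deleted per insert


@dataclass
class Table:
    retention: int  # rows older than (now - retention) are eligible for grooming
    rows: deque = field(default_factory=deque)  # row timestamps, oldest first


def groom_limit(queue_depth, high_water=1_000):
    """Assumed rule: shrink the grooming budget as the insert queue backs up."""
    if queue_depth >= high_water:
        # Queue is backed up: fall back to 25% of the default budget.
        return DEFAULT_GROOM_LIMIT // 4
    # Below high water, scale linearly from 100% (empty queue) toward 50%.
    return int(DEFAULT_GROOM_LIMIT * (1 - queue_depth / (2 * high_water)))


def insert_and_groom(table, now, queue_depth):
    """Insert one row stamped `now`, then delete up to groom_limit() expired rows."""
    table.rows.append(now)
    cutoff = now - table.retention
    budget = groom_limit(queue_depth)
    deleted = 0
    while deleted < budget and table.rows and table.rows[0] < cutoff:
        table.rows.popleft()
        deleted += 1
    return deleted
```

With an empty queue the sketch grooms up to 4,000 rows per insert; with a backed-up queue it grooms fewer, which is the qualitative behavior described above for the 250 to 1,000 managed-computer levels.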

Scope of Testing

The scope of testing determines how well MOM SP1 scales, what management server configuration best manages the computers, and what the maximum number of computers is that a single management server can manage.

For each test configuration, the test procedure was as follows:

  1. Set up the MOM database server by using the required database backup and set up the DCAM(s).

  2. Set up the required number of managed computers for the DCAM(s). Flushed the queues if MOM agents had already been installed on the managed computers.

  3. Stopped and restarted the OnePoint service and the MOM database services, in the following order:

    • Stopped OnePoint service on the DCAM(s)

    • Stopped MSSQLSERVER and SQLSERVERAGENT services on the MOM database

    • Started MSSQLSERVER and SQLSERVERAGENT services on the MOM database

    • Flushed the queues on the DCAM(s)

    • Started the OnePoint service on the DCAM(s)

  4. Started collecting the performance counters on the MOM database server and DCAM(s).

  5. Started the specified alert, event, and performance counter workload on each of the managed computers, which began the test.

  6. Set up the grooming jobs to run once each hour for the last 2 hours of the test.

  7. Ran the test for 4 hours.

Note: MOM SP1 Build 1300 (RTM) was used for all test scenarios.
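Step 3 of the procedure lends itself to scripting. The Python sketch below only builds the command list (nothing is executed), using the standard Windows `net stop` and `net start` commands and the service names from the procedure. The host names are placeholders, and the queue-flush step is left as a reminder because this paper does not say how the queues were flushed. The sketch stops SQLSERVERAGENT before MSSQLSERVER and starts it afterward, because SQL Server Agent depends on the SQL Server service.

```python
# Hypothetical helper that lists the service stop/start order used in step 3
# of the test procedure. Host names are placeholders. Each command is meant to
# be run on the host it is paired with; nothing is executed here.

def restart_plan(dcams, db_server):
    """Return (host, command) pairs in the order the test procedure used."""
    steps = [(host, "net stop OnePoint") for host in dcams]
    # SQLSERVERAGENT depends on MSSQLSERVER: stop it first, start it last.
    steps += [(db_server, "net stop SQLSERVERAGENT"),
              (db_server, "net stop MSSQLSERVER"),
              (db_server, "net start MSSQLSERVER"),
              (db_server, "net start SQLSERVERAGENT")]
    # The paper does not say how the queues were flushed; reminder only.
    steps += [(host, "REM flush the MOM queues on this DCAM") for host in dcams]
    steps += [(host, "net start OnePoint") for host in dcams]
    return steps


if __name__ == "__main__":
    for host, command in restart_plan(["DCAM1", "DCAM2"], "MOMDB"):
        print(f"{host:<8} {command}")
```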

Performance Monitor Counter Metrics

Table 5 lists the primary performance counters that were collected and used for this analysis. For a complete list and description of the counter functions, see “Appendix D: Counter Definitions” later in this paper.

Table 5 Primary Counters Used in Testing

Counter object             Counter property          Instances
Processor                  % Processor Time average  Total
                           % Processor Time peak
                           Interrupts/sec
Process                    % Processor Time          OnePoint process
                           Working Set               SQL Server processes
                           Thread Count
                           IO Read Operations/sec
                           IO Write Operations/sec
Memory                     Available Bytes           Total
                           Page Faults/sec
                           % Committed Bytes In Use
Network Interface          Bytes Total/sec           100 Mbps network adapter card
                           Current Bandwidth
Physical Disk              Disk Reads/sec            Drive C
                           Disk Writes/sec           Database disk drives
                           Avg. Disk Queue Length
System                     Processor Queue Length    Total
SQL Server:Databases       Transactions/sec          OnePoint database
SQL Server:Buffer Manager  Buffer Cache Hit Ratio    OnePoint database

Calculated counter         Calculation
% Network Busy             Bytes Total/sec / Current Bandwidth Bytes, where
                           Bytes Total/sec = Bytes Sent/sec + Bytes Received/sec, and
                           Current Bandwidth Bytes = Current Bandwidth / 8
Memory Free Space          Available KBytes / Total Physical Memory

Note: Table 5 establishes the core performance-counter collection metrics. Other counters might be used for further analysis. The Physical Disk, % Disk Time counter was not used because it gives false readings on Redundant Array of Independent Disks (RAID) arrays. All the database disk arrays used for these tests were RAID 10.
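The two calculated counters in Table 5 can be computed directly from raw counter values. In the Python sketch below, the input values are illustrative, not measurements from these tests. The Network Interface Current Bandwidth counter reports bits per second, which is why the formula divides it by 8.

```python
# Sketch of the two calculated counters in Table 5. Inputs are raw Performance
# Monitor values; the example numbers below are illustrative, not measured.

def network_busy_pct(bytes_sent_per_sec, bytes_received_per_sec, current_bandwidth):
    """% Network Busy. current_bandwidth is the Network Interface counter in bits/sec."""
    bytes_total_per_sec = bytes_sent_per_sec + bytes_received_per_sec
    current_bandwidth_bytes = current_bandwidth / 8  # bits/sec -> bytes/sec
    return 100.0 * bytes_total_per_sec / current_bandwidth_bytes


def memory_free_pct(available_kbytes, total_physical_kbytes):
    """Memory Free Space; both arguments must be in kilobytes."""
    return 100.0 * available_kbytes / total_physical_kbytes


if __name__ == "__main__":
    # 100 Mbps adapter: Current Bandwidth reports 100,000,000 bits/sec.
    print(round(network_busy_pct(25_000, 22_500, 100_000_000), 2))  # 0.38
    # 768 MB of RAM is 786,432 KB.
    print(round(memory_free_pct(393_216, 786_432), 2))  # 50.0
```

For a 100 Mbps adapter, 47,500 bytes per second of combined traffic corresponds to 0.38 percent network busy, the level shown for 20 to 85 managed computers in Table 11.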

MOM SP1 Test Results

The following sections show test results for a range of managed computers in different-sized MOM configurations.

MOM SP1 Test Results I: Small Configuration - Single DDCAM (MOM Database and DCAM)

The first series of tests was performed on a small management server: a basic configuration that manages relatively few computers. The test results show the capacity of this system, including the maximum number of managed computers that this server configuration can adequately manage.

Hardware Test Environment - Small Configuration, Single DDCAM

Table 6 DDCAM System Configuration

System component            Description
Processor count             4
Processor type              550 MHz Pentium 3
Memory                      768 MB
Disk count OS               1
Disk count DB               6 (RAID 10)
Disk count log file         1
Disk designation OS         C drive (8.46 GB)
Disk designation DB         D drive (101.6 GB, 37.1 GB free space)
Disk designation log file   E drive (26 GB)
Disk I/O capacity - Reads   750 read operations per second
Disk I/O capacity - Writes  375 write operations per second
Network capacity            100 Mbps (12.5 MB/sec)
MOM build                   MOM SP1 Build 1300 (RTM)

MOM/SQL Server Disk Requirements

Table 7 shows the resources needed to install the management server, along with the SQL Server database. Microsoft® SQL Server™ 2000 Standard was used for these tests.

Table 7 Disk Requirements for Small Configuration - Single DDCAM

MOM disk space requirement total         230 MB
MOM OnePoint working set average memory  53.81 MB-79.87 MB
OnePoint threads (avg.)                  71
SQL Server working set memory            604.57 MB-645.37 MB
SQL Server database size (disk space)    6.63 GB
Database log disk space                  1 GB
MS DTC log size disk space               512 MB

Database and Data Workload Sizing

Table 8 shows the number of rows in the MOM database tables prior to running each test for a specific number of managed computers (from 20 to 200). Tables 9 and 10 show the data workloads used in the tests for this configuration.

Table 8 Pre-test Database Table Sizes for the Small Configuration

Rows in Alert table               88,574
Rows in Event table               2,252,322
Rows in SampledNumericData table  3,720,018

Table 9 Data Workload Levels per Day - Small Configuration (for 20 to 200 managed computers)

Alerts per day                2,250
Events per day                100,000
Performance counters per day  250,000

Note: The data workload shown in Table 9 was held constant for each level of managed computers (from 20 to 200). This workload represents higher workloads than the levels reported by any of the enterprise customers during their testing of MOM SP1 Build 1300 (RTM). The MOM SP1 testing workload values for alerts, events, and performance counters were based on the results of surveys taken by the largest enterprise customers for workload traffic, and then inflated to represent peak load situations.

Table 10 Data Workload Rates per Minute per Managed Computer Used for the Small Configuration

Managed computers  Alerts    Events  Performance counters
20                 0.078125  3.470   8.680
50                 0.031200  1.380   3.472
85                 0.018350  0.816   2.042
140                0.011100  0.496   1.240
200                0.007810  0.347   0.868

Note: The values in Table 10 far exceed any numbers reported by the largest enterprise customers. The values for events and performance counters used for MOM RTM testing were higher than the MOM SP1 test scenarios. This is a result of fine-tuning the Management Packs to reduce the volume of events and performance counter traffic for MOM SP1.

In the original MOM RTM testing, the average rate of alerts delivered to the database was 0.00445 per minute per computer. For the MOM SP1 workload used in testing this configuration, the average rate was 0.00781 alerts per minute per computer at the 200-managed-computer level, approximately twice the MOM RTM test workload. At the 20-managed-computer level, the average rate was 0.0781 alerts per minute per computer, over 17 times the MOM RTM testing workload. The alert workloads used for the MOM SP1 performance testing of this configuration therefore range from 2 times to 17 times the MOM RTM testing levels.

Microsoft Operations Manager/SQL Server Test Results

These tests were performed to find what size the management server should be, in terms of hardware, to perform a set level of work. Testing was started from 20 managed computers to find the upper limit. In this test series, the MOM DDCAM was monitored while managing 20 computers at the low end. These findings were used to establish a baseline for the DDCAM operation. Table 11 depicts the growth rate as more managed computers are added.

Table 11 Effects on DDCAM of Additional Managed Computers - Small Configuration

Managed computers  % CPU utilization  OnePoint service utilization  OnePoint working set peak (bytes)  Disk reads/sec  Disk writes/sec  Memory free space  Network busy
20                 22.56%             9.00%                         53,809,957                         288.92          268.84           57.02%             0.38%
50                 27.64%             20.22%                        54,349,396                         301.69          273.15           55.48%             0.38%
85                 30.91%             25.91%                        56,505,485                         306.24          258.50           55.19%             0.38%
140                38.19%             36.76%                        70,732,436                         276.75          259.49           55.18%             0.39%
200                42.31%             44.01%                        79,874,416                         253.74          242.19           54.64%             0.40%

Figures 1 through 5 graphically present information from Table 11.

Figure 1: Adding managed computers increases CPU utilization

New for MOM SP1 – Increased Managed Computer Capacity for DDCAMs

Notice in Figure 1 that the CPU utilization on this DDCAM, a 4-processor 550 MHz system, ranges from 22 percent for 20 managed computers to 42 percent for 200 managed computers. With newer multi-gigahertz processors, a 2-processor system is expected to manage 200 computers easily.
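One rough way to sanity-check the 2-processor expectation is to assume that CPU demand scales only with aggregate clock speed. That assumption ignores processor architecture, cache, memory, and disk effects, so the Python sketch below gives only a rough indication, not a sizing result. The target system of two 2,000 MHz processors is hypothetical.

```python
# Naive "same work, more clock" estimate. It assumes CPU demand scales only with
# aggregate clock speed, which ignores architecture, cache, memory, and disk
# effects, so treat the result as a rough check, not a sizing result.

def scaled_cpu_pct(measured_pct, old_cpus, old_mhz, new_cpus, new_mhz):
    """Re-express measured % CPU on a system with a different processor count/clock."""
    consumed_mhz = measured_pct / 100.0 * old_cpus * old_mhz
    return 100.0 * consumed_mhz / (new_cpus * new_mhz)


if __name__ == "__main__":
    # 200 managed computers on the 4 x 550 MHz DDCAM (Table 11: 42.31%),
    # re-expressed for a hypothetical system with two 2,000 MHz processors.
    print(f"{scaled_cpu_pct(42.31, 4, 550, 2, 2000):.1f}%")
```

For the 42.31 percent measured at 200 managed computers (Table 11), this naive estimate gives roughly 23 percent on two 2,000 MHz processors.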

Figure 2: Increasing I/O has a negative effect on disk performance. In this case, increasing disk queues cause increased latency (see Figure 3)

For MOM SP1, read and write activity is markedly higher than for MOM RTM because of the additional activity caused by integrated grooming. In the original MOM RTM testing, the total peak I/O rate for 200 managed computers was 116.60 per sec per computer. For more information about integrated grooming, see the “Integrated Grooming” section earlier in this paper.

Figure 3: Disk queue lengths remain at approximately ten for MOM SP1

Figure 3 displays queue lengths of approximately ten. In MOM RTM testing the queue lengths were less than two. This is the result of the increased I/O activity caused by integrated grooming in MOM SP1. These disk queues could be decreased considerably by adding more disk spindles to the RAID array.

Recommendation: Use the RAID Selector Section of the MOM SP1 Management Server Sizer to determine adequate spindle counts for the workloads and RAID configurations that you might want to use. The RAID Selector Section assumes that disk queue lengths should be kept below two. For more information about the Sizer, see "Appendix B: MOM SP1 Management Sizer" later in this paper.
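The Sizer remains the tool for choosing spindle counts. As a back-of-the-envelope check only, the Python sketch below estimates how many disks a RAID array needs to absorb a given read and write rate. It assumes about 125 I/O operations per second per disk, inferred from Table 6 (750 reads per second across six disks, with 375 writes per second consistent with a RAID 10 write penalty of 2), and commonly cited write penalties for RAID 0 and RAID 5. It sizes for throughput only and does not model the less-than-two queue-length target, so real configurations need additional headroom.

```python
import math

# Assumed back-end disk writes per front-end write for common RAID levels.
WRITE_PENALTY = {"RAID0": 1, "RAID10": 2, "RAID5": 4}
MIN_DISKS = {"RAID0": 1, "RAID10": 4, "RAID5": 3}

# Inferred from Table 6: 750 reads/sec across six RAID 10 disks = 125 per disk.
IOPS_PER_SPINDLE = 125


def spindles_needed(reads_per_sec, writes_per_sec, raid="RAID10",
                    iops_per_spindle=IOPS_PER_SPINDLE):
    """Estimate disks needed to absorb the back-end I/O load without saturation."""
    backend_iops = reads_per_sec + writes_per_sec * WRITE_PENALTY[raid]
    disks = math.ceil(backend_iops / iops_per_spindle)
    if raid == "RAID10" and disks % 2:
        disks += 1  # mirrored pairs need an even disk count
    return max(disks, MIN_DISKS[raid])


if __name__ == "__main__":
    # Table 11 disk rates for 20 and 200 managed computers on the DDCAM.
    for label, reads, writes in (("20", 288.92, 268.84), ("200", 253.74, 242.19)):
        print(label, spindles_needed(reads, writes), spindles_needed(reads, writes, "RAID5"))
```

Applied to the Table 11 rates, the sketch suggests about 8 RAID 10 disks at 20 managed computers and 6 at 200. By this estimate, the back-end load exceeds or nearly reaches the tested six-disk array's 750 operations per second at every level in Table 11, which is consistent with the long disk queues in Figure 3.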

Figure 4: Free memory space is adequate at all managed computer levels

On the MOM DDCAM, which handles both DCAM and database activity, free memory space remained at approximately 55 to 57 percent of the 768 MB of memory (Table 11). MOM SP1 used approximately the same amount of memory in all of these tests.

Figure 5: Network utilization remains very low at all managed computer levels

Network utilization has risen predictably from the 20 managed computer count to the 200 managed computers count, with a high point of 0.40 percent utilization. This is consistent with what has been seen throughout the series of testing and consistent with customer reports about network usage. This utilization factor reflects only steady-state usage and does not reflect Management Pack or MOM agent pushdowns.

Best Practice: Capacity/Performance Recommendation

Use the MOM SP1 Management Server Sizer to determine the appropriate system size and configurations based on the various workloads that you might want to use. The MOM SP1 Management Server Sizer automatically calculates the RAID selection and spindle count; expected network usage for various network bandwidths; DCAM server and MOM database server hardware sizes; and the database and log sizes. For more information about the MOM SP1 Management Server Sizer, see "Appendix B: MOM SP1 Management Sizer" later in this paper.

MOM SP1 Test Results II: Large Configuration - Separate Database Server and Single DCAM

The second series of tests was performed on a larger management server (DCAM), with the MOM database installed on a separate computer. The test results show the maximum number of managed computers that can be adequately controlled by this server configuration.

Hardware Test Environment - Large Configuration, Separate Database, Single DCAM

Table 12 Database System Configuration for the Large Configuration

System component            Description
Processor count             4
Processor type              550 MHz Pentium 3
Memory                      768 MB
Disk count OS               1
Disk count DB               6 (RAID 10)
Disk count log file         1
Disk designation OS         C drive (8.46 GB)
Disk designation DB         D drive (101.6 GB, 37.1 GB free space)
Disk designation log file   E drive (26 GB)
Disk I/O capacity - Reads   750 read operations per second
Disk I/O capacity - Writes  375 write operations per second
Network capacity            100 Mbps (12.5 MB/sec)
MOM build                   MOM SP1 Build 1300 (RTM)

Table 13 DCAM System Configuration for the Large Configuration

System component    Description
Processor count     2
Processor type      800 MHz Pentium 3
Memory              512 MB
Disk count          1
Disk designation    C drive (14.6 GB, 12.5 GB free space)
Network capacity    100 Mbps (12.5 MB/sec)
MOM build           MOM SP1 Build 1300 (RTM)

The MOM/SQL Server Disk Requirements

Table 14 shows the resources needed to install the DCAM and the SQL Server database. SQL Server 2000 Standard was used for these tests.

Table 14 Database Disk Requirements for Large Configuration

Database server:

MOM disk space requirement total

SQL Server working set memory (average)

SQL Server database size (disk space)

Database log disk space

MS DTC log size disk space

DCAM:

MOM disk space requirement total

MOM OnePoint working set memory (average)

OnePoint threads (average)

Database and Data Workload Sizing

Table 15 shows the number of rows in the MOM database tables prior to running each test for a specific number of managed computers (from 250 to 1,000). Tables 16, 17 and 18 show the data workloads used in the tests for this configuration.

Table 15 Pre-test Database Table Sizes for the Large Configuration

Rows in Alert table               97,833
Rows in Event table               3,303,168
Rows in SampledNumericData table  2,947,496

Table 16 Data Workload Levels per Day - Large Configuration (for 250, 500, and 700 managed computers)

Alerts per day                9,000
Events per day                400,000
Performance counters per day  400,000

Note: The data workload shown in Table 16 was held constant for the 250, 500, and 700 levels of managed computers. This workload represents higher workloads than the levels reported by any of the enterprise customers during their testing of MOM SP1 Build 1300 (RTM). The MOM SP1 testing workload values for alerts, events, and performance counters were based on the results of surveys taken by the largest enterprise customers for workload traffic, and then inflated to represent peak load situations.

Table 17 Data Workload Levels per Day - Large Configuration (for 1,000 managed computers)

Alerts per day                12,000
Events per day                600,000
Performance counters per day  600,000

Table 18 Data Workload Rates per Minute per Managed Computer for the Large Configuration

Managed computers  Alerts    Events  Performance counters
250                0.02500   1.111   1.111
500                0.01250   0.555   0.555
700                0.00893   0.397   0.397
1,000              0.00833   0.417   0.417

Note: The values in Table 18 far exceed any numbers reported by the largest enterprise customers. The values for events and performance counters used for MOM RTM testing were higher than the MOM SP1 test scenarios. This is a result of fine-tuning the Management Packs to reduce the volume of events and performance counter traffic for MOM SP1.

In the original MOM RTM testing, the average rate of alerts delivered to the database was 0.00445 per minute per computer. For the MOM SP1 workload used in testing this configuration, the average rate was 0.00833 alerts per minute per computer at the 1,000-managed-computer level, approximately twice the MOM RTM test workload. At the 250-managed-computer level, the average rate was 0.025 alerts per minute per computer, nearly 6 times the MOM RTM testing workload. The alert workloads used for the MOM SP1 performance testing of this configuration therefore range from 2 times to 6 times the MOM RTM testing levels.

Microsoft Operations Manager/SQL Server Test Results

These tests were performed to find what size the DCAM and the database server should be, in terms of hardware, to perform a set level of work. Testing was started from 250 managed computers to find the upper limit.

In this test series, the DCAM and the database server were monitored while managing 250 computers at the low end. Tables 19 and 20 depict the effect on the DCAM and the database server, respectively, as more managed computers are added.

Table 19 Effect on DCAM of Additional Managed Computers - Large Configuration

Managed computers  % CPU utilization  OnePoint service utilization  OnePoint working set average (bytes)  Memory free space  Network busy
250                36.99%             38.91%                        164,009,537                           81.12%             0.07%
500                49.30%             48.51%                        192,642,270                           77.52%             0.16%
700                50.70%             48.57%                        192,856,224                           76.18%             0.19%
1,000              62.23%             54.65%                        194,204,590                           70.43%             0.23%

Table 20 Effect on Database Server of Additional Managed Computers - Large Configuration

Managed computers  % CPU utilization  SQL Server service utilization  SQL Server working set peak (bytes)  Disk reads/sec  Disk writes/sec  Memory free space  Network busy
250                17.96%             65.38%                          703,313,169                          56.42           166.86           56.59%             0.47%
500                22.88%             76.22%                          709,683,340                          154.39          264.32           55.02%             0.43%
700                22.28%             72.80%                          714,549,239                          133.46          265.90           56.03%             0.45%
1,000              24.37%             80.05%                          720,452,301                          187.55          292.35           54.38%             0.46%

Figures 6 through 13 graphically present information about the MOM database server and DCAM from Table 19 and Table 20.

Figure 6: Adding managed computers increases CPU utilization on the database server

Even on the 4-processor, 550 MHz system used in these tests, CPU utilization for 1,000 managed computers is only approximately 25 percent. On newer, more powerful multi-gigahertz processors, this utilization is expected to be much lower.

Figure 7: Adding managed computers increases CPU utilization on the DCAM

Even on the 2-processor, 800 MHz system used in these tests, CPU utilization for 1,000 managed computers is approximately 62 percent. On newer, more powerful multi-gigahertz processors, this utilization is expected to be much lower.

Figure 8: Increasing I/O has a negative effect on disk performance. In this case, growing disk queues cause increased latency (see Figure 9)

Note: Information about I/O activity on the DCAM was not included because it was inconsequential and only reflects the operating system and MOM activity.

For MOM SP1, there is a marked increase in read and write activity over MOM RTM, caused by integrated grooming. In the original MOM RTM testing, the peak total I/O rate for 1,000 managed computers was 59.57/sec/computer. For more information about integrated grooming, see the “Integrated Grooming” section earlier in this paper.
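
The read and write columns of Table 20 can be combined to show the overall I/O load on the database server. The following Python sketch is illustrative only: it adds reads and writes with equal weight, which ignores the extra cost of writes on a RAID array, and the variable and function names are hypothetical.

```python
# Disk reads/sec and disk writes/sec on the database server, from Table 20
# (large configuration), keyed by managed computer count.
TABLE20_IO = {
    250: (56.42, 166.86),
    500: (154.39, 264.32),
    700: (133.46, 265.90),
    1000: (187.55, 292.35),
}


def total_io(reads: float, writes: float) -> float:
    """Total logical I/O operations per second (reads and writes weighted equally)."""
    return reads + writes


for count, (reads, writes) in sorted(TABLE20_IO.items()):
    total = total_io(reads, writes)
    print(f"{count:>5} computers: {total:7.2f} I/O per second, "
          f"{writes / total:.0%} of them writes")
```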

Figure 9: Disk queues increase as managed computers are added

Figure 9 displays queue lengths of up to 16. In MOM RTM testing, queue lengths were less than two. The longer queues are the result of the increased I/O activity caused by integrated grooming in MOM SP1, and they could be reduced considerably by adding more disk spindles to the RAID array.

Recommendation: Use the RAID Selector Section of the MOM SP1 Management Server Sizer to determine adequate spindle counts for the workloads and RAID configurations that you plan to use. The RAID Selector Section assumes that disk queue lengths should stay below two. For more information about the Sizer, see "Appendix B: MOM SP1 Management Sizer" later in this paper.
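
The Sizer's internal method is not described here. As a rough cross-check only, the Python sketch below estimates a RAID 10 spindle count from a read and write rate. It assumes about 125 I/O operations per second per spindle (the 750 reads per second that Table 21 lists for a 6-disk RAID 10 array, divided by 6) and a RAID 10 write penalty of 2, because every write is mirrored. The function name is hypothetical, and the estimate sizes for average throughput without modeling queue length, so treat it as a lower bound rather than a substitute for the Sizer.

```python
import math


def raid10_spindles(reads_per_sec: float, writes_per_sec: float,
                    iops_per_spindle: float = 125.0) -> int:
    """Hypothetical RAID 10 spindle estimate; not the Sizer's algorithm.

    Each logical write is mirrored, so it costs two physical I/Os.
    The result is rounded up to an even number (mirrored pairs) and
    never drops below four disks, the smallest RAID 10 array.
    """
    physical_iops = reads_per_sec + 2 * writes_per_sec
    spindles = math.ceil(physical_iops / iops_per_spindle)
    if spindles % 2:
        spindles += 1
    return max(spindles, 4)


# Table 20 load at 1,000 managed computers: 187.55 reads/sec, 292.35 writes/sec
print(raid10_spindles(187.55, 292.35))  # 8
```

The same assumptions reproduce Table 21's write capacity: 6 spindles × 125 operations per second ÷ 2 = 375 writes per second.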

Figure 10: Free memory space is adequate at all managed computer levels

The MOM database server had 55 percent free memory with 768 MB installed, which is consistent with the enterprise configuration test results (see the MOM SP1 Test Results III: Enterprise Configuration - Separate Database Server and Two DCAMs section later in this paper). For a MOM deployment this large, best practice is to install a minimum of 2 GB of memory; at that size, SQL Server is projected to run more efficiently.

Figure 11: Free memory space is adequate at all managed computer levels

The DCAM had 80 percent free memory with 512 MB installed at the 250-managed computer level. As expected, and consistent with the findings overall, free memory at the 1,000-managed computer level was about 10 percentage points lower, at approximately 70 percent. For a MOM deployment this large, best practice is to install a minimum of 2 GB of memory; at that size, the DCAM is projected to run more efficiently.

Figure 12: Network utilization remains very low at all managed computer levels

As expected, and consistent with MOM RTM testing, network utilization is minimal at all levels. As Figure 12 demonstrates, MOM SP1 does not overburden the network. In further tests, the highest utilization observed was 9 percent, during an agent pushdown, which shows that agent pushdowns produce much higher network utilization than routine monitoring.

Figure 13: Network utilization remains very low at all managed computer levels

As with the database server, network utilization between the managed computers and the DCAM remained low, rising steadily from 0.10 percent at the 250-managed computer level to 0.25 percent at the 1,000-managed computer level, as Figure 13 demonstrates. As noted throughout this paper, these workloads were consistently higher than any reported in customer surveys.
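
Percentages this small are easier to judge as throughput. The Python sketch below converts the DCAM "Network busy" values from Table 19, and the 9 percent pushdown peak, into kilobits per second. It assumes that "Network busy" is measured against the 100 Mbps link capacity used in these tests; the names are illustrative.

```python
LINK_MBPS = 100  # 100 Mbps network used in these tests


def busy_to_kbps(percent_busy: float, link_mbps: float = LINK_MBPS) -> float:
    """Convert a 'Network busy' percentage of the link into kilobits per second."""
    return percent_busy / 100 * link_mbps * 1000


# DCAM 'Network busy' percentages from Table 19 (large configuration)
TABLE19_NETWORK_BUSY = {250: 0.07, 500: 0.16, 700: 0.19, 1000: 0.23}

for count, busy in TABLE19_NETWORK_BUSY.items():
    print(f"{count:>5} computers: {busy_to_kbps(busy):5.0f} kbit/s")
print(f"Agent pushdown peak (9%): {busy_to_kbps(9.0):,.0f} kbit/s")
```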

Best Practice: Capacity/Performance Recommendation

Use the MOM SP1 Management Server Sizer to determine the appropriate system size and configurations based on the various workloads that you might want to use. The MOM SP1 Management Server Sizer automatically calculates the RAID selection and spindle count; expected network usage for various network bandwidths; DCAM server and MOM database server hardware sizes; and the database and log sizes. For more information about the MOM SP1 Management Server Sizer, see "Appendix B: MOM SP1 Management Sizer" later in this paper.

MOM SP1 Test Results III: Enterprise Configuration - Separate Database Server and Two DCAMs

The third series of tests was performed using two large management servers (DCAMs), with the MOM database installed on a separate computer. The test results show the maximum number of managed computers that can be adequately controlled by this server configuration.

Hardware Test Environment - Enterprise Configuration, Separate Database, Two DCAMs

Table 21 Database System Configuration for the Enterprise Configuration

System component

Description

Processor count

4

Processor type

550 MHz Pentium 3

Memory

768 MB

Disk count OS

1

Disk count DB

6 (RAID 10)

Disk count log file

1

Disk designation OS

C drive (8.46 GB)

Disk designation DB

D drive (101.6 GB, 37.1 GB free space)

Disk designation log file

E drive (26 GB)

Disk I/O capacity - Reads

750 read operations per second

Disk I/O capacity - Writes

375 write operations per second

Network capacity

100 Mbps (12.5 MB/s)

MOM Build

MOM SP1 Build 1300 (RTM)

Table 22 DCAM System Configuration for the Enterprise Configuration

System component

Description

Processor count

2

Processor type

800 MHz Pentium 3

Memory

512 MB

Disk count

1

Disk designation

C drive (14.6 GB, 12.5 GB free space)

Network capacity

100 Mbps (12.5 MB/s)

MOM Build

MOM SP1 Build 1300 (RTM)

The MOM/SQL Server Disk Requirements

Table 23 shows the resources needed to install the SQL Server database. SQL Server 2000 Standard was used for these tests.

Table 23 Disk Requirements for Enterprise Configuration - Separate Database, Two DCAMs

Database server:

MOM disk space requirement total

SQL Server working set memory (average)

SQL Server database size (disk space)

Database log disk space

MS DTC log size disk space

Each DCAM:

MOM disk space requirement total

MOM OnePoint working set memory (average)

OnePoint threads (average)

Database and Data Workload Sizing

Table 24 shows the number of rows in the MOM database tables prior to running each test for a specific number of managed computers (from 700 to 1,000). Table 25 and Table 26 show the data workloads used in the tests for this configuration.

Table 24 Pre-test Database Table Sizes for the Enterprise Configuration

Rows in Alert table

97,833

Rows in Event table

3,303,168

Rows in SampledNumericData table

2,947,496

Table 25 Data Workload Levels per Day for the Enterprise Configuration

Workload item

For 700 managed computers

For 1,000 managed computers

Alerts per day

9,000

12,000

Events per day

400,000

600,000

Performance counters per day

400,000

600,000

Note: The data workload shown in Table 25 is higher than the levels reported by any enterprise customer during testing of MOM SP1 Build 1300 (RTM). The MOM SP1 test values for alerts, events, and performance counters were based on workload-traffic surveys of the largest enterprise customers and then inflated to represent peak load situations.

Table 26 Data Workload Rates per Minute per Managed Computer for the Enterprise Configuration

Managed computer count

Alerts per minute per managed computer

Events per minute per managed computer

Performance counters per minute per managed computer

700

0.00893

0.397

0.397

1,000

0.00833

0.417

0.417

Note: The values in Table 26 far exceed any numbers reported by the largest enterprise customers. The event and performance counter rates used for MOM RTM testing were higher than those used in the MOM SP1 test scenarios, because the Management Packs were fine-tuned for MOM SP1 to reduce the volume of event and performance counter traffic.
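
The rates in Table 26 follow directly from the daily totals in Table 25: divide by the number of managed computers and by the 1,440 minutes in a day. A minimal Python sketch of that conversion (the helper name is hypothetical):

```python
MINUTES_PER_DAY = 24 * 60


def per_minute_per_computer(items_per_day: float, computers: int) -> float:
    """Convert a daily workload total into a per-minute, per-computer rate."""
    return items_per_day / computers / MINUTES_PER_DAY


# Daily workload totals from Table 25, keyed by managed computer count
TABLE25 = {
    700: {"alerts": 9_000, "events": 400_000, "counters": 400_000},
    1000: {"alerts": 12_000, "events": 600_000, "counters": 600_000},
}

for count, workload in TABLE25.items():
    rates = {name: per_minute_per_computer(total, count)
             for name, total in workload.items()}
    print(count, {name: round(rate, 5) for name, rate in rates.items()})
```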

In the original MOM RTM testing, the average number of alerts delivered to the database was 0.00445 per minute per computer. For the MOM SP1 workload used in testing this configuration, the average was 0.00833 alerts per minute per computer at the 1,000-managed computer level and 0.00893 at the 700-managed computer level. The alert workloads used for the MOM SP1 performance testing of this configuration were therefore approximately twice the MOM RTM testing levels.

Microsoft Operations Manager/SQL Server Test Results

These tests were performed to determine what size the DCAMs and the database server should be, in terms of hardware, to perform a set level of work. Testing began at 700 managed computers and proceeded upward to find the upper limit.

In this series of tests, two DCAMs and the database server were monitored while managing 700 computers at the low end and 1,000 computers at the high end. The first test was conducted with 200 managed computers on one DCAM and 500 managed computers on the other, for a total of 700. The second test was conducted with 500 managed computers on each DCAM, for a total of 1,000. Table 27 depicts the effects on the two DCAMs for these test scenarios. Table 28 depicts the effect on the database server for the two test scenarios.

Table 27 Effect on the DCAMs of Additional Managed Computers - Enterprise Configuration

DCAM

Managed computer count

% CPU utilization

OnePoint service utilization

OnePoint working set average (bytes)

Memory free space

Network busy

A

200/700

34.76%

38.22%

141,628,304

80.70%

0.26%

B

500/700

49.32%

53.20%

189,064,637

77.22%

0.26%

C

500/1,000

49.45%

49.98%

193,576,489

76.61%

0.25%

D

500/1,000

50.57%

51.86%

194,378,502

76.57%

0.25%

Note: Table 27 reports results for four different DCAMs. In the first test case, DCAM A managed 200 of the 700 computers and DCAM B managed the other 500. In the second test case, DCAM C and DCAM D each managed 500 of the 1,000 computers.

Table 28 Effect on Database Server of Additional Managed Computers - Enterprise Configuration

Managed computer count

% CPU utilization

SQL Server service utilization

SQL Server working set peak (bytes)

Disk reads/sec

Disk writes/sec

Memory free space

Network busy

700

19.76%

69.67%

711,511,735

94.91

204.91

56.19%

0.46%

1,000

24.50%

76.10%

743,612,153

190.44

218.82

54.35%

0.45%

Figures 14 through 21 graphically present information about the DCAMs and the MOM database server from Table 27 and Table 28.

Figure 14: As expected, adding managed computers increases CPU utilization on the database server

Even with the 550 MHz processors used in these tests, CPU utilization for 1,000 managed computers is approximately 25 percent. It is projected that utilization for 2,000 managed computers would be less than 50 percent on a 768 MHz processor. On newer, more powerful multi-gigahertz processors, this utilization is expected to be much lower.
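
The paper does not state how the 2,000-computer projection was made. One simple way to sanity-check it is a straight-line extrapolation from the two measured points in Table 28, assuming that database server CPU utilization grows linearly with the managed computer count on the same 550 MHz hardware. The Python sketch below makes that assumption explicit; it is not the authors' method, and the function name is hypothetical.

```python
def linear_projection(x1: float, y1: float, x2: float, y2: float, x: float) -> float:
    """Extend the straight line through (x1, y1) and (x2, y2) to x."""
    slope = (y2 - y1) / (x2 - x1)
    return y2 + slope * (x - x2)


# Database server % CPU utilization from Table 28 (enterprise configuration)
projected = linear_projection(700, 19.76, 1000, 24.50, 2000)
print(f"Projected CPU at 2,000 managed computers: {projected:.1f}%")  # 40.3%
```

Under this assumption the projection is about 40 percent on the tested hardware, which is consistent with the under-50-percent estimate above; actual utilization may not remain linear at higher counts.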

Figure 15: Adding managed computers increases CPU utilization on the DCAMs

Figure 15 shows results for four different DCAMs; for details, see Table 27. DCAM B, DCAM C, and DCAM D, which each managed 500 computers, had nearly identical CPU utilization, which demonstrates the consistency of MOM DCAMs. Their utilization, about 50 percent on an 800 MHz computer, is well below the 75 percent level that preserves 25 percent reserve capacity. On newer, more powerful multi-gigahertz processors, CPU utilization is expected to be much lower.
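
The consistency and reserve figures can be read directly from Table 27. This Python sketch (names are illustrative) computes the spread in CPU utilization among the three DCAMs that each managed 500 computers and how far each sits below the 75 percent level:

```python
CPU_CEILING = 75.0  # utilization level that preserves 25 percent reserve capacity

# % CPU utilization of the DCAMs that each managed 500 computers (Table 27)
DCAM_CPU = {"B": 49.32, "C": 49.45, "D": 50.57}

spread = max(DCAM_CPU.values()) - min(DCAM_CPU.values())
for name, cpu in DCAM_CPU.items():
    print(f"DCAM {name}: {cpu:.2f}% CPU, "
          f"{CPU_CEILING - cpu:.2f} points below {CPU_CEILING:.0f}%")
print(f"Spread across the three DCAMs: {spread:.2f} percentage points")
```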

Figure 16: Increasing I/O affects disk performance

Note: Information about I/O activity on the DCAM was not included because it was inconsequential and only reflects the operating system and MOM activity.

For MOM SP1, there is a marked increase in read and write activity over MOM RTM, caused by integrated grooming. In the original MOM RTM testing, the peak total I/O rate for 1,000 managed computers was 59.57/sec/computer. For more information about integrated grooming, see the “Integrated Grooming” section earlier in this paper.

Figure 17: Disk queues increase as managed computers are added

Figure 17 displays queue lengths of up to 15 on the database server. Queue lengths on the DCAMs were zero for all test cases, so no figure is shown for them. In MOM RTM testing, queue lengths were less than two. The longer queues are the result of the increased I/O activity caused by integrated grooming in MOM SP1, and they could be reduced considerably by adding more disk spindles to the RAID array.

Recommendation: Use the RAID Selector Section of the MOM SP1 Management Server Sizer to determine adequate spindle counts for the workloads and RAID configurations that you plan to use. The RAID Selector Section assumes that disk queue lengths should stay below two. For more information about the Sizer, see "Appendix B: MOM SP1 Management Sizer" later in this paper.

Figure 18: Free memory space is adequate at all managed computer levels

The MOM database server had between 54 and 56 percent free memory with 768 MB installed (see Table 28). For a MOM deployment this large, best practice is to install a minimum of 2 GB of memory; at that size, SQL Server is projected to run more efficiently.

Figure 19: Free memory space is adequate at all managed computer levels

The DCAMs had approximately 77 to 81 percent free memory with 512 MB installed in all tests (see Table 27), which demonstrates efficient use of memory by MOM SP1. For a MOM deployment this large, best practice is to install a minimum of 1 GB of memory; at that size, the DCAM is projected to run more efficiently.

Figure 20: Network utilization remains very low at all managed computer levels

As expected, and consistent with MOM RTM testing, network utilization is minimal at all managed computer levels. As Figure 20 demonstrates, MOM SP1 does not overburden the network. In further tests, the highest utilization observed was 9 percent, during an agent pushdown, which shows that agent pushdowns produce much higher network utilization than routine monitoring.

Figure 21: Network utilization remains very low at all managed computer levels

As with the database server, network utilization between the managed computers and each DCAM was consistently low, at about 0.25 percent. As Figure 21 demonstrates, utilization was about the same whether a DCAM managed 200 or 500 computers, because the workload was the same at all managed computer levels. As noted throughout this paper, these workloads were consistently higher than any reported in customer surveys.

Best Practice: Capacity/Performance Recommendation

Use the MOM SP1 Management Server Sizer to determine the appropriate system size and configurations based on the various workloads that you might want to use. The MOM SP1 Management Server Sizer automatically calculates the RAID selection and spindle count; expected network usage for various network bandwidths; DCAM server and MOM database server hardware sizes; and the database and log sizes. For more information about the MOM SP1 Management Server Sizer, see "Appendix B: MOM SP1 Management Sizer" later in this paper.

MOM SP1 Management Packs

These tests are designed to measure the memory usage (footprint) of the MOM SP1 Management Packs both individually and cumulatively (build-up) as they are added to a managed computer.

Test Parameters

Hardware Test Environment

Table 29 describes the system configuration for the computers used in this test. The same configuration was used for the MOM SP1 server and the three managed computers.

Table 29 Computer Configuration (MOM SP1 Server and Managed Computers)

System component

Description

Processor count

1

Processor type

600 MHz Pentium 3

Memory

256 MB

Disk count

1

Disk designation OS

Drive C

Disk size

12.76 GB

Operating system

Windows 2000 Server SP3

Software Test Environment

The software and versions used for these tests are listed in Table 30.

Table 30 Products and Versions Used for Tests

Product name

Version or build tested

Windows 2000 Server

Service Pack 3

SQL Server 2000

RTM + Service Pack 3

MOM 2000

Service Pack 1

MOM 2000 Application Management Pack

Service Pack 1

MOM SP1 Configuration

MOM SP1 was configured as a single configuration group, with three managed computers and with all MOM components installed on a single server.

The software that the test team installed on the MOM SP1 server is as follows:

  • MOM SP1

  • MOM SP1 Application Management Pack

  • SQL Server 2000 SP3

  • Internet Information Services

  • Terminal Services

  • Anti-virus software (eTrust)

After installing MOM SP1, the test team created three performance processing rules to capture performance data from the managed computers. Details of these custom performance processing rules are listed in Table 31. After creating the performance processing rules, the test team created three Public views to chart this information.

Table 31 Custom Performance Processing Rules

Rule name

Provider

Performance-Private Bytes-OnePointService Agent

Process-Private Bytes-OnePointService-10-minutes

Performance-% Processor Time-OnePointService Agent

Process-% Processor Time-OnePointService-10-minutes

Process-% Processor Time-OnePointService-10-minutes

Process-Working Set-OnePointService-10-minutes

Managed Computers Configuration

The managed computers were a basic Windows 2000 Server configuration. The only additional service or product installed on the managed computers was the eTrust anti-virus software.

Agent Installation and Configuration Process

The test team installed each agent by adding the computer name to the Agent Manager, and then approving the installation of the agent to the managed computers. The installation of the agent was verified by viewing the All Agents view on the MOM SP1 server and by checking for the OnePointService process on each managed computer.

After each agent was installed, the custom performance processing rules were enabled for graphing by selecting each computer in the Recent Performance view and enabling its counters.

For each test case, the managed computers were placed in several default computer groups. The computer groups that were common to each test variation are listed in Table 32.

Table 32 Common Computer Groups

Hardware Attributes – Number of Processors

Hardware Attributes – CPU Vendor

Hardware Attributes – CPU speed

Hardware Attributes – CPU Identifier

Hardware Attributes – BIOS Version

Hardware Attributes – BIOS Date

Microsoft Operations Manager Agents

When adding Management Packs to the agents, the managed computers were explicitly added to the computer groups for each Management Pack. This ensured that the Management Pack was deployed to the managed computer.

Test Cases

Management Pack Memory Build-up Tests

This series of tests is designed to measure the cumulative agent memory footprint as Management Packs are added to a managed computer. For each test case, the Management Packs were added to the agent computers in the order listed. Performance metrics were collected on the OnePointService process by using the custom performance processing rules listed in Table 31 earlier in this paper.

Test Case 1: Windows Management Pack

The managed computer was placed in the following groups:

  1. Common groups listed in Table 32, earlier in this paper

  2. Windows NT & 2000 RRAS & RAS Non-Authorized Computers

  3. Windows 2000 Servers

  4. Windows 2000 License Logging Service

  5. Windows 2000 Dr. Watson

  6. Windows 2000 Any Computer

  7. Service Pack Version

Test Case 2: Add MOM SP1 Management Pack

The managed computer was placed in the following groups:

  1. All groups in Test Case 1

  2. Microsoft Operations Manager Database

  3. Microsoft Operations Manager Data Access Server

  4. Microsoft Operations Manager Consolidator

Test Case 3: Add Active Directory Management Pack

The managed computer was placed in the following groups:

  1. All groups in Test Cases 1 and 2

  2. Windows 2000 Domain Controllers

  3. Active Directory Trust Monitoring

  4. Active Directory Replication Latency Data Collection

  5. Active Directory Client Side Monitoring

Test Case 4: Add Exchange Server Management Pack

The managed computer was placed in the following groups:

  1. All groups in Test Cases 1, 2, and 3

  2. Microsoft Active Directory Connector

  3. Microsoft Exchange Server 2000

  4. Microsoft Exchange Instant Messaging Server

Test Case 5: Add SQL Server Management Pack

The managed computer was placed in the following groups:

  1. All groups in Test Cases 1, 2, 3, and 4

  2. Microsoft SQL Server 2000

Management Pack Memory Footprint Tests

This series of tests is designed to measure the agent memory footprint of individual Management Packs. Each Management Pack listed was added individually to a managed computer. Performance metrics were collected on the OnePointService service by using the custom performance processing rules listed in Table 31 earlier in this paper.

Because all of the managed computers were running Windows 2000 Server, the Windows Management Pack would otherwise have been installed on each of them automatically; to prevent this, the processing rule was modified.

Test Case 6: Base MOM Agent Only

The managed computer was placed in the following groups:

  1. Common groups listed in Table 32, earlier in this paper

Test Case 7: Windows Management Pack

The managed computer was placed in the following groups:

  1. Common groups listed in Table 32, earlier in this paper

  2. Windows NT & 2000 RRAS & RAS Non-Authorized Computers

  3. Windows 2000 Servers

  4. Windows 2000 License Logging Service

  5. Windows 2000 Dr. Watson

  6. Windows 2000 Any Computer

  7. Service Pack Version

Test Case 8: MOM SP1 Management Pack

The managed computer was placed in the following groups:

  1. Common groups listed in Table 32, earlier in this paper

  2. Microsoft Operations Manager Database

  3. Microsoft Operations Manager Data Access Server

  4. Microsoft Operations Manager Consolidator

Test Case 9: Active Directory Management Pack

The managed computer was placed in the following groups:

  1. Common groups listed in Table 32, earlier in this paper

  2. Windows 2000 Domain Controllers

  3. Active Directory Trust Monitoring

  4. Active Directory Replication Latency Data Collection

  5. Active Directory Client Side Monitoring

Test Case 10: Exchange Server Management Pack

The managed computer was placed in the following groups:

  1. Common groups listed in Table 32, earlier in this paper

  2. Microsoft Active Directory Connector

  3. Microsoft Exchange Server 2000

  4. Microsoft Exchange Instant Messaging Server

Test Case 11: SQL Server Management Pack

The managed computer was placed in the following groups:

  1. Common groups listed in Table 32, earlier in this paper

  2. Microsoft SQL Server 2000

Test Results

To achieve consistent results for the Management Pack memory footprint test cases, data was sampled for a 2-hour period, starting 15 minutes after the agent was installed on each managed computer. Waiting 15 minutes after the agent installation provides sufficient time for the agent to initiate communication with the MOM SP1 server, to properly identify the computer groups that the managed computer is included in, and for the Management Pack to be installed on the managed computer.

In addition, for the Management Pack memory build-up test cases, a 15-minute wait was allowed after each Management Pack was added to the managed computer. Metrics were then averaged over the 2-hour period following the wait, and the results are reported in Tables 33 and 34.
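
The measurement procedure, a settling period followed by a fixed averaging window, can be written as a small function. The Python sketch below is illustrative: the function name is hypothetical, the sample values are invented, and the data assumes one reading every 10 minutes, as the "10-minutes" provider names in Table 31 suggest. It is not the tooling the test team used.

```python
from datetime import datetime, timedelta
from statistics import mean


def window_average(samples, start, settle=timedelta(minutes=15),
                   duration=timedelta(hours=2)):
    """Average the values sampled inside [start + settle, start + settle + duration).

    samples: iterable of (timestamp, value) pairs
    start:   when the agent was installed or the Management Pack was added
    """
    window_start = start + settle
    window_end = window_start + duration
    values = [value for stamp, value in samples
              if window_start <= stamp < window_end]
    if not values:
        raise ValueError("no samples fall inside the measurement window")
    return mean(values)


# Hypothetical working-set samples (bytes), one every 10 minutes for 200 minutes.
t0 = datetime(2003, 6, 2, 9, 0)
samples = [(t0 + timedelta(minutes=10 * i), 18_000_000 + 10_000 * i)
           for i in range(20)]
print(window_average(samples, t0))  # averages the 12 samples from minute 20 to 130
```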

Table 33 Management Pack Memory Build-Up Test Results

Test case number

Test case title

CPU utilization

Cumulative Working set (bytes)

1

Windows Management Pack

0.040%

18,497,536

2

Add MOM SP1 Management Pack

0.055%

19,935,232

3

Add Active Directory Management Pack

0.243%

28,618,752

4

Add Exchange Management Pack

0.427%

41,435,136

5

Add SQL Server Management Pack

0.483%

43,552,768

Note: Table 33 displays possible cumulative memory usage as Management Packs are added to the managed computer.

Table 34 Management Pack Individual Memory Footprint Test Results

Test case number

Test case title

CPU utilization

Individual working set (bytes)

6

Base MOM Agent Only

0.004%

12,193,792

7

Windows Management Pack

0.061%

6,303,744

8

MOM SP1 Management Pack

0.020%

1,761,280

9

Active Directory Management Pack

0.121%

7,553,024

10

Exchange Management Pack

0.111%

13,300,000

11

SQL Server Management Pack

0.030%

10,366,976

Note: Table 34 displays possible net memory usage for each Management Pack added to a managed computer.
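
The two tables can be cross-checked with a simple additive model: start from the base agent's working set in Table 34 and add each Management Pack's individual footprint in the build-up order of Table 33. The Python sketch below assumes that footprints add independently, which is only an approximation; variable names are illustrative.

```python
# Working sets in bytes from Table 34: the base agent plus each Management
# Pack's individual (net) footprint, in the build-up order used in Table 33.
BASE_AGENT = 12_193_792
INDIVIDUAL = {
    "Windows": 6_303_744,
    "MOM SP1": 1_761_280,
    "Active Directory": 7_553_024,
    "Exchange": 13_300_000,
    "SQL Server": 10_366_976,
}
# Measured cumulative working sets in bytes from Table 33 (test cases 1-5)
MEASURED_CUMULATIVE = [18_497_536, 19_935_232, 28_618_752, 41_435_136, 43_552_768]

running = BASE_AGENT
for (name, footprint), measured in zip(INDIVIDUAL.items(), MEASURED_CUMULATIVE):
    running += footprint  # additive estimate after adding this Management Pack
    print(f"+ {name:<16} estimated {running:>11,}  measured {measured:>11,}  "
          f"difference {measured - running:>+11,}")
```

For the Windows Management Pack the sum matches the measured value exactly (12,193,792 + 6,303,744 = 18,497,536 bytes). After all five Management Packs, the additive estimate of 51,478,816 bytes exceeds the measured 43,552,768 bytes by 7,926,048 bytes.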

Appendix A: Test Results For Microsoft Operations Manager 2000 RTM

The MOM test team conducted tests in June 2001 using Microsoft Operations Manager 2000 (MOM 2000). Because MOM SP1 processes MOM data in a completely different way, the test results for MOM 2000 and MOM SP1 cannot be compared directly.

Note: For the tests of MOM 2000, all MOM components were installed on a single computer, which is referred to throughout this appendix as the management server.

For MOM 2000, these tests were conducted using simulators that imitate heartbeat activity. They indicate that a management server supporting 700 managed computers met the criteria for managing that many computers and delivered alerts to the database within the two-minute Service Level Agreement. Although the computer systems sized in this report can process MOM events, alerts, and performance counters within the two-minute Service Level Agreement, this should not be construed as a best practice for MOM usage; each is a minimum-size configuration that has been tested and is known to handle the stated managed computer count. A best-practice recommendation is offered for each of the management servers depicted in this report.

Test Parameters

This section presents the key factors for the MOM 2000 performance and sizing testing — the hardware used, the scope and goals for the testing, the tools used, and how the test workload was calculated. Later sections present the results of testing.

Hardware Test Environment

For MOM 2000, the test hardware used to conduct this performance study consisted of management servers and client computers. The client computers simulated the activity level of managed computers; six client computers could simulate up to 2,000 managed computers. Each of the management servers supported multiple processors, although not all of the testing used multiple processors. The exact configuration of each management server is disclosed in the description section of each test series.

The network had a line capacity of 100 Mbps, which represents the highest available bandwidth for most enterprise environments; testing on higher-capacity lines would have produced results that do not apply to a large part of the user community. The hardware was set up in a single-tier configuration; multitiered configurations were not tested during this phase. The configuration of the client computers used to simulate the managed computers is as follows:

Table A-1 Managed Computer Configuration

Server name

MOMTEST3 – MOMTEST8

Processor count

1

Processor type

550 MHz - 733 MHz Pentium 3

Memory

256 MB

Disk count

1

Disk designation OS

Drive C

Disk size

9.1 GB

Disk I/O capacity

70 read/write operations per second

Network capacity

100 Mbps (12.5 MB/s)

Manufacturer

Compaq

Model

Deskpro

Scope of Testing

The testing was scoped to determine how well MOM scales, which management server configuration best manages the computers, and the maximum number of computers a single management server can manage. The findings include any bottlenecks that were discovered and recommendations for removing them. The MOM 2000 builds used were MOM 0003, 0005, 00012.1, 00012.2, and Beta version 00012.5.

Data for all tests performed for MOM 2000 were collected under the following conditions:

  • The Event Simulator program generated the workload activity for all tests.

  • The Managed Computer Heartbeat Activity parameter was not tested for scalability because, for many systems, a maximum of 10 clients were used to simulate activity. Heartbeat activity can cause additional network, CPU, and disk usage.

  • Limited testing was performed for reporting. Database grooming studies also are included in this report.

Goals of Testing

The goal of testing was to determine whether MOM meets the high standards of performance that our customers have come to expect from Microsoft. We tested each MOM component for how well it uses resources on the management server and whether it runs within an acceptable range for defined parameters. These parameters are:

  • System resource usage

  • MOM agent usage on managed computers

One of the most important goals of this testing was to determine the correct size of the management server, in terms of hardware resources, needed to manage a set number of managed computers. Scalability was tested to establish the maximum number of managed computers per management server.

Tools for Testing

The software tools used for the testing include the following:

  • Performance Monitor. Performance Monitor was used to record performance statistics for MOM. This is essential for testing functionality and efficiency. Performance Monitor can detect bottlenecks in any of the core systems such as CPU, disk, memory, and network. Using the performance counters, we can analyze exactly what the computer system is doing, and what level of resources the computer system is using.

  • Event Simulator. To test the environment accurately from a performance perspective, we needed to create multiple events that strained the Consolidator and the MOM database components. We used Event Simulator, a tool developed by NetIQ, to perform this function. Event Simulator can simulate an event and alert level that is many times the actual quantity of managed computers in our test environment.

Note: Event Simulator is a development test tool, which is not intended for general use.

Test Workload and Calculations

For the MOM 2000 testing, the test workload was determined through a study of the Microsoft Information Technologies Group (ITG) MOM database. A copy of the ITG MOM database that had been in operation for several months was examined to determine the exact number of events and alerts that were generated during peak periods of operation. In addition, the number of performance counters that were collected for each computer also was recorded. These values were then used to simulate actual network line and database usage during the testing.

Test workload information for MOM 2000:

  • Number of computers that were managed = 350

  • Time range of test = 67 hours

  • Suppressed alerts = 1,513

  • Unsuppressed alerts = 6,273

  • Events = 9,253,409

  • Performance counters = approximately 100-130 per computer, per 900 seconds

This results in the following peak numbers:

  • Suppressed alerts = 0.00108 per computer, per minute

  • Unsuppressed alerts = 0.00445 per computer, per minute

  • Events = 6.57 per computer, per minute

  • Performance counters = 0.144 per computer, per second
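The peak rates above can be reproduced from the raw sample totals: divide each total by the number of computers and by the length of the sample in minutes (67 × 60 = 4,020). The following is a short Python sketch of that arithmetic. It assumes the upper end (130) of the counter range, which is the value that yields 0.144 per second; the constant and function names are illustrative only. The results agree with the listed figures to within rounding.

```python
# Derive per-computer workload rates from the ITG sample quoted above.
# All inputs are the paper's figures; names are illustrative only.

SAMPLE_COMPUTERS = 350
SAMPLE_HOURS = 67
SUPPRESSED_ALERTS = 1_513
UNSUPPRESSED_ALERTS = 6_273
EVENTS = 9_253_409
COUNTERS_PER_COLLECTION = 130  # assumption: upper end of the 100-130 range
COLLECTION_INTERVAL_SECONDS = 900


def rate_per_computer_per_minute(total: int) -> float:
    """Average occurrences per managed computer per minute over the sample."""
    sample_minutes = SAMPLE_HOURS * 60
    return total / SAMPLE_COMPUTERS / sample_minutes


suppressed_rate = rate_per_computer_per_minute(SUPPRESSED_ALERTS)
unsuppressed_rate = rate_per_computer_per_minute(UNSUPPRESSED_ALERTS)
event_rate = rate_per_computer_per_minute(EVENTS)
# Counters are collected per computer once per interval, so this is a per-second rate.
counter_rate = COUNTERS_PER_COLLECTION / COLLECTION_INTERVAL_SECONDS

print(f"Suppressed alerts:    {suppressed_rate:.5f} per computer, per minute")
print(f"Unsuppressed alerts:  {unsuppressed_rate:.5f} per computer, per minute")
print(f"Events:               {event_rate:.2f} per computer, per minute")
print(f"Performance counters: {counter_rate:.3f} per computer, per second")
```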

Examples of Calculations

Based on the test workload calculations from the previous section, to simulate 100 managed computers required the following calculations:

Suppressed alerts = 0.00108 × 100 computers = 0.108 per minute; 0.108 × 10 minutes = 1.08 alerts

This can be rounded off and expressed as one suppressed alert every 10 minutes.

Unsuppressed alerts = 0.00445 × 100 computers = 0.445 per minute; 0.445 × 2 minutes = 0.89 alerts

This can be rounded off and expressed as one unsuppressed alert every two minutes.

Events = 6.57 × 100 computers = 657 events per minute

Counters = 0.144 × 100 computers = 14.4 counters per second

For MOM 2000, Event Simulator was set up to generate alerts and events according to the previous calculations. To simulate the network line and database activity that the 100 managed computers in the previous example would cause by collecting counters, performance rules were set up in MOM to collect these counter values and write them to the MOM database.

Note: Collecting performance counters can cause high CPU utilization if the collection rate is too high. In these tests, large numbers of counters were collected at very short collection intervals to simulate large numbers of managed computers and to measure the activity they cause. In production, such short collection intervals can cause adverse effects such as very high CPU utilization for long periods of time and database latency in excess of 20 minutes.

Performance Monitor Counter Metrics

Table A-2 lists the primary performance counters that were collected and used for this analysis. For a complete list and description of the counter functions, see “Appendix D: Counter Definitions” later in this paper.

Table A-2 Primary Counters Used in Testing

Counter object (instances): counter properties

Processor (instance: Total): % Processor Time (average), % Processor Time (peak), Interrupts/sec

Process (instances: OnePoint process, SQL Server processes): % Processor Time, Working Set, Thread Count, IO Read Operations/sec, IO Write Operations/sec

Memory (instance: Total): Available Bytes, Page Faults/sec, % Committed Bytes In Use

Network Interface (instance: 100 Mbps network adapter card): Bytes Total/sec, Current Bandwidth

Physical Disk (instances: Drive C, database disk drives): Disk Reads/sec, Disk Writes/sec, Avg. Disk Queue Length

System (instance: Total): Processor Queue Length

SQL Server:Databases (instance: OnePoint database): Transactions/sec

SQL Server:Buffer Manager (instance: OnePoint database): Buffer Cache Hit Ratio

Calculated counters (counter property = calculation):

% Network Busy = Bytes Total/sec / Current Bandwidth Bytes, where Bytes Total/sec = Bytes Sent/sec + Bytes Received/sec, and Current Bandwidth Bytes = Current Bandwidth / 8

Memory Free Space = Available KBytes / Total Physical Memory

Note: Table A-2 establishes the core performance counter collection metrics. Other counters might be used for further analysis. The Physical Disk, % Disk Time counter was not used because it gives false readings on Redundant Array of Independent Disks (RAID) arrays. All database disk arrays used for these tests were RAID 5.
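The two calculated counters can be expressed directly in code. The following is a minimal Python sketch. It assumes that Current Bandwidth is reported in bits per second (which is why the table divides it by 8) and that Available KBytes and Total Physical Memory are expressed in the same unit (KB). The function names and sample inputs are hypothetical.

```python
# Calculated counters from Table A-2, expressed as percentages.
# Sample inputs below are hypothetical, not measurements from the paper.


def percent_network_busy(bytes_sent_per_sec: float,
                         bytes_received_per_sec: float,
                         current_bandwidth_bits_per_sec: float) -> float:
    """% Network Busy = Bytes Total/sec / (Current Bandwidth / 8), as a percentage."""
    bytes_total_per_sec = bytes_sent_per_sec + bytes_received_per_sec
    current_bandwidth_bytes = current_bandwidth_bits_per_sec / 8
    return 100.0 * bytes_total_per_sec / current_bandwidth_bytes


def percent_memory_free(available_kbytes: float, total_physical_kbytes: float) -> float:
    """Memory Free Space = Available KBytes / Total Physical Memory, as a percentage."""
    return 100.0 * available_kbytes / total_physical_kbytes


# A 100 Mbps adapter reports Current Bandwidth as 100,000,000 bits per second.
print(round(percent_network_busy(100_000, 73_750, 100_000_000), 2))
# 512 MB of physical memory is 524,288 KB.
print(round(percent_memory_free(41_366, 524_288), 2))
```

The sample inputs were chosen to give about 1.39 percent and 7.89 percent, which are the values reported for 85 managed computers in Table A-5.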

Test Results for MOM 2000

The following sections show test results for a range of managed computers with different sized server configurations.

Test Results I: MOM 2000/SQL Server Typical Install, Small Management Server, 20 to 85 Managed Computers

This series of tests was performed on a small management server using SQL Server 2000 Standard, with a 5 GB database and a 1 GB log file. The database was loaded on a three-disk array set to RAID 5. RAID 5 was selected for two reasons: it is the least expensive form of disk fault tolerance, and it generates the most I/O operations to maintain that fault tolerance, so it represents a demanding case for disk I/O. This system is meant to monitor up to 85 managed computers. This section shows the capacity of the system and makes recommendations for improving the system as tested, where applicable. The test results show the upper bound on the number of managed computers that this server can adequately manage.
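The extra I/O cost of RAID 5 fault tolerance comes from parity maintenance. The following Python sketch rests on two assumptions, neither taken from this paper. First, it uses the conventional small-write penalties: two physical operations per logical write for mirroring and four for RAID 5. Second, it assumes that Disk Reads/sec and Disk Writes/sec count logical operations as the operating system issues them to a hardware RAID volume. Under those assumptions, the same measured workload costs noticeably more physical I/O on RAID 5.

```python
# Physical disk operations generated by a logical workload under two RAID levels.
# Write penalties are conventional rule-of-thumb values, not figures from the paper.

WRITE_PENALTY = {
    "RAID 1": 2,  # each logical write is written to both mirror members
    "RAID 5": 4,  # read old data, read old parity, write new data, write new parity
}


def physical_ops_per_sec(reads_per_sec: float, writes_per_sec: float, raid_level: str) -> float:
    """Each logical read costs one physical read; each logical write is multiplied by the penalty."""
    return reads_per_sec + WRITE_PENALTY[raid_level] * writes_per_sec


# Disk reads/sec and writes/sec measured at 85 managed computers (Table A-5).
for level in WRITE_PENALTY:
    print(f"{level}: {physical_ops_per_sec(20.71, 29.48, level):.2f} physical operations/sec")
```

Controllers with write-back caches can reduce the effective penalty, so treat these figures as an upper-bound estimate rather than a measurement.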

Management System Configuration

Table A-3 lists the system components used for this series of tests.

Table A-3 System Configuration for Small Server Test

Processor count: 1
Processor type: 733 MHz Pentium III
Memory: 512 MB
Disk count (OS): 1 × 9.1 GB
Disk count (DB): 3 × 9.1 GB
Disk designation (OS): C drive (9.1 GB)
Disk designation (DB): D drive (21.3 GB)
Database size (including log): 6 GB
DB I/O capacity: 210 read/write operations per second
Network capacity: 100 Mbps (12.5 MB/s)
Manufacturer: Compaq
Model: 350 ML
MOM build: 0003, 0005, 00012.1, 00012.2

The MOM 2000/SQL Server Footprint

The following statistics show the actual resources needed to install the management server along with the SQL Server database. SQL Server 2000 Standard was used for these tests. Table A-4 lists the disk sizes that are required to install these components.

Table A-4 System Resources for Small Management Server Test

MOM disk space requirement (total): 230 MB
MOM OnePoint working set, average memory: 76.85 MB to 97.98 MB
OnePoint threads: 71
SQL Server working set memory: 64.3 MB to 123.52 MB
SQL Server database size (disk space): 5 GB
Database log disk space: 1 GB
MS DTC log size (disk space): 512 MB

MOM 2000/SQL Server Test Results

These tests were performed to determine the optimum size of the management server needed to perform this task. We started the testing from 20 managed computers (as covered in test series 1) to find the upper limit. For this test, the database is placed on a three-disk volume designated as the D drive.

Establishing the Baseline of the Small Management Server

In this test series, the management server was monitored while managing 20 computers at the low end. These findings are used to establish a baseline for its operation. Table A-5 depicts the growth rate as more managed computers were added.

Table A-5 Effect on Server of Additional Managed Computers

Managed computers | % CPU utilization | OnePoint service (% CPU) | OnePoint working set peak (bytes) | Disk reads/sec | Disk writes/sec | Memory free space | Network busy
20 | 21.96% | 19.98% | 76,852,765 | 8.98 | 11.45 | 9.57% | 0.635%
30 | 31.78% | 22.83% | 97,987,467 | 11.31 | 14.89 | 8.342% | 0.756%
50 | 42.76% | 29.45% | 97,987,467 | 14.58 | 21.56 | 7.99% | 0.963%
75 | 46.97% | 36.52% | 97,987,467 | 17.94 | 26.45 | 7.560% | 1.041%
85 | 53.74% | 43.81% | 97,987,467 | 20.71 | 29.48 | 7.89% | 1.39%

Table A-5 shows a linear trend. Figures A-1 through A-5 depict this linear incremental growth per managed computer for CPU utilization, I/O, disk queue length, memory free space, and network usage.
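Because Table A-5 shows a roughly linear trend, a least-squares line through its disk reads and writes gives a rough projection of where the database volume reaches the 60 I/Os-per-second maximum that this paper applies to the configuration. The following is a minimal Python sketch. It assumes that total I/O is the sum of the two disk columns and that the trend stays linear beyond the measured range. The names are illustrative.

```python
# Least-squares line through total disk I/O (reads + writes) in Table A-5.

SAMPLES = [  # (managed computers, disk reads/sec, disk writes/sec)
    (20, 8.98, 11.45),
    (30, 11.31, 14.89),
    (50, 14.58, 21.56),
    (75, 17.94, 26.45),
    (85, 20.71, 29.48),
]
IO_LIMIT_PER_SEC = 60.0  # maximum I/O rate used in this paper's analysis


def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept) for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


computers = [c for c, _, _ in SAMPLES]
total_io = [reads + writes for _, reads, writes in SAMPLES]
slope, intercept = fit_line(computers, total_io)
limit_at = (IO_LIMIT_PER_SEC - intercept) / slope

print(f"Each managed computer adds about {slope:.2f} I/Os per second")
print(f"Straight-line projection reaches {IO_LIMIT_PER_SEC:.0f} I/Os per second "
      f"near {limit_at:.0f} managed computers")
```

With these data, the projection lands a little above 100 managed computers. The follow-up test at 90 computers had already exceeded the limits, so the trend is only approximately linear. Measured results should take precedence over the extrapolation.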

Figure A-1 Adding managed computers affects CPU utilization

Figure A-1 depicts the CPU utilization trend between 20 and 85 managed computers for MOM 2000. As the graph depicts, the growth trend is linear. The utilization rate for 85 managed computers is almost 54 percent. This gives you a good reserve capacity of 46 percent. Because it is not known exactly how much effect the Application Management Packs or scripts will have on CPU utilization, it is recommended that you have as much reserve capacity as possible.

Figure A-2 shows the I/O usage by the MOM 2000 management server monitoring up to 85 managed computers. The maximum I/O count is about 50 per second, which, for this configuration, is close to the maximum of 60 I/Os per second. This indicates that the disk farm needs more I/O capacity if you intend to increase the number of managed computers.

Figure A-2 Increasing I/O affects disk performance

Figure A-3 shows the disk queue length for MOM 2000. Notice that at 85 managed computers, the queue is less than 2.00. This does not indicate that there is an abnormal amount of contention. Although the queue should be as close to zero as possible, this should not be a problem, and can be corrected easily by the addition of another disk drive to the database volume. Combined with the I/O rate that the managed computers create, this indicates that the configuration can handle the 85-managed computer workload.

Figure A-3 Disk queues remain short at the 85-managed-computer level for MOM 2000

Figure A-4 indicates that the amount of free memory is just under the 10 percent free space limit. Considering the other counters for this configuration and workload, this is acceptable. Memory usage has leveled off, which indicates that it is not expected to grow further for this number of managed computers.

Figure A-4 Memory is adequate at the 85-managed-computer level

Figure A-5 depicts the network usage of the management server for MOM 2000. Notice that the 100 Mbps network is not affected very much by the activity of 85 managed computers. The total usage is about 1.4 percent in steady state. This indicates that a 10 Mbps network can handle this managed computer workload at about 14 percent usage. This further indicates that the managed computer scan, which can add up to three times the average volume, would still not affect the network enough to cause a bottleneck.

Figure A-5 Network usage is adequate at the 85-managed-computer level for MOM 2000
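The 10 Mbps estimate is a proportional rescaling: a byte rate that uses 1.39 percent of a 100 Mbps link uses ten times that share of a 10 Mbps link. The following Python sketch shows that arithmetic. It reads "can add up to three times the average volume" as a peak of three times the steady-state rate. That reading is an interpretation; if the scan adds three times the average on top of the steady-state traffic, use a multiplier of 4. The function names are hypothetical.

```python
# Rescale a measured % Network Busy value to a different link speed and
# apply a burst multiplier for the managed computer scan.

SCAN_BURST_MULTIPLIER = 3.0  # assumption: scan peaks at three times steady-state volume


def rescale_percent_busy(percent_busy: float, measured_mbps: float, target_mbps: float) -> float:
    """The same byte rate occupies proportionally more of a slower link."""
    return percent_busy * measured_mbps / target_mbps


def peak_percent_busy(percent_busy: float, burst: float = SCAN_BURST_MULTIPLIER) -> float:
    """Steady-state utilization scaled up to the assumed scan peak."""
    return percent_busy * burst


# 1.39 percent busy measured on a 100 Mbps network at 85 managed computers.
steady_10mbps = rescale_percent_busy(1.39, measured_mbps=100, target_mbps=10)
peak_10mbps = peak_percent_busy(steady_10mbps)
print(f"Steady state on 10 Mbps: {steady_10mbps:.1f}% busy")
print(f"Scan peak on 10 Mbps:    {peak_10mbps:.1f}% busy")
```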

Sizing Recommendations for Small Configurations

The management server used in these tests has enough capacity to manage 85 computers. These tests indicate that the I/O rate will exceed the recommended I/O capacity if more than 85 computers are managed. In subsequent testing with 90 managed computers, disk I/O exceeded its maximum acceptable value, and free memory fell below its acceptable reserve.

As a result of these tests, it can be concluded that the hardware required for an 85-computer system should have the following components, at a minimum:

  • One CPU running at 733 MHz or higher

  • 512 MB memory

  • One disk drive designated as C drive for the operating system and Consolidator

  • 3×9.1 GB disk volume designated as D drive for the database and the database log file

  • 100 Mbps network

This configuration is a recommended starting point. Results may vary depending on the types of managed computers, and the events/alerts they are generating.

It is expected that 85 managed computers will generate:

  • 558.45 events per minute

  • One unsuppressed alert per 3 minutes

  • One suppressed alert per 11 minutes

  • 12.24 performance counters per second
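Each expected-workload figure is a per-computer rate multiplied by the number of managed computers. The following Python sketch shows that linear projection, using the per-computer rates from "Test Workload and Calculations" and converting alert rates into an average interval between alerts. The function names are illustrative. The linear model is an assumption carried over from the test workload; real alert rates depend on the managed computers and Management Packs involved.

```python
# Project the expected workload for a given number of managed computers from
# the per-computer rates in "Test Workload and Calculations".

PER_COMPUTER_PER_MINUTE = {
    "events": 6.57,
    "unsuppressed alerts": 0.00445,
    "suppressed alerts": 0.00108,
}
COUNTERS_PER_COMPUTER_PER_SECOND = 0.144


def project_workload(managed_computers: int) -> dict:
    """Scale each per-computer rate linearly by the number of managed computers."""
    workload = {name: rate * managed_computers
                for name, rate in PER_COMPUTER_PER_MINUTE.items()}
    workload["counters per second"] = COUNTERS_PER_COMPUTER_PER_SECOND * managed_computers
    return workload


def minutes_between(rate_per_minute: float) -> float:
    """Average interval between occurrences, in minutes."""
    return 1.0 / rate_per_minute


for n in (85, 250, 1200):
    w = project_workload(n)
    print(f"{n} computers: {w['events']:.2f} events/min, "
          f"one unsuppressed alert every {minutes_between(w['unsuppressed alerts']):.1f} min, "
          f"one suppressed alert every {minutes_between(w['suppressed alerts']):.1f} min, "
          f"{w['counters per second']:.2f} counters/sec")
```

For 85 computers, the sketch gives 558.45 events per minute and 12.24 counters per second. It also gives roughly one unsuppressed alert every 2.6 minutes and one suppressed alert every 10.9 minutes.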

Best Practice: Capacity/Performance Recommendation

The usage of this system should not exceed the 85-managed-computer limit. At 85 managed computers, CPU utilization (53.74 percent) is already above the 50 percent maximum recommended for steady-state usage. Keep in mind that the testing did not include Application Management Packs, which could easily add 25 percent more utilization and push steady-state usage well past that maximum. Although memory shows adequate reserve capacity, monitor it closely and add memory if the reserve drops below 5 percent. Also monitor disk I/O, and add disk capacity if the rate rises above 60 I/Os per second. The management server should be a DDCAM (a single unit hosting both the database and the DCAM) and should contain the system components listed in Table A-6.

Table A-6 Recommended Minimum System Capacity for up to 85 Managed Computers

Processor count: 1
Processor type: 550 MHz-733 MHz Pentium III
Memory: 512 MB
Disk count: 5
Disk designation (OS, MOM): C drive, one disk drive
Disk designation (MOM DB and DB log): D drive, four disks or more depending on RAID factor
MOM DB + DB log size: Refer to the database calculator
Disk space size: 9.1 GB or higher
Disk I/O capacity: 280 read/write operations per second
Network capacity: 100 Mbps (12.5 MB/s)
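The monitoring guidance in the recommendation for up to 85 managed computers reduces to a few threshold checks. The following is a minimal Python sketch. It assumes that readings are averaged over a steady-state period. It also assumes that the 60 I/Os-per-second limit is compared with the combined disk reads and writes on the database volume, as the small-configuration analysis does; the medium and large recommendations express the limit per disk. The class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds taken from the capacity recommendation for small configurations.
CPU_STEADY_STATE_MAX = 50.0        # % CPU, steady state
MANAGEMENT_PACK_ALLOWANCE = 25.0   # extra % CPU that Application Management Packs may add
MEMORY_RESERVE_MIN = 5.0           # % memory free
IO_PER_SEC_MAX = 60.0              # disk reads + writes per second


@dataclass
class ServerSample:
    """One steady-state reading from Performance Monitor (hypothetical structure)."""
    cpu_percent: float
    memory_free_percent: float
    disk_reads_per_sec: float
    disk_writes_per_sec: float


def capacity_warnings(sample: ServerSample, include_mp_allowance: bool = True) -> list[str]:
    """Return a warning for each threshold the sample crosses."""
    warnings = []
    cpu = sample.cpu_percent
    if include_mp_allowance:
        cpu += MANAGEMENT_PACK_ALLOWANCE
    if cpu > CPU_STEADY_STATE_MAX:
        warnings.append(f"CPU {cpu:.1f}% exceeds the {CPU_STEADY_STATE_MAX:.0f}% steady-state maximum")
    if sample.memory_free_percent < MEMORY_RESERVE_MIN:
        warnings.append(f"Free memory {sample.memory_free_percent:.1f}% is below "
                        f"{MEMORY_RESERVE_MIN:.0f}%; add memory")
    io = sample.disk_reads_per_sec + sample.disk_writes_per_sec
    if io > IO_PER_SEC_MAX:
        warnings.append(f"Disk I/O {io:.1f}/sec exceeds {IO_PER_SEC_MAX:.0f}/sec; add disk capacity")
    return warnings


# The 85-managed-computer row of Table A-5.
for warning in capacity_warnings(ServerSample(53.74, 7.89, 20.71, 29.48)):
    print(warning)
```

For the 85-computer row, only the CPU check fires: 53.74 percent is already above 50 percent before any Management Pack allowance is added.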

Test Results II: MOM 2000/SQL Server Typical Install, Medium Management Server, 85 to 250 Managed Computers

The next series of tests was performed on a medium management server. The database for these tests was 10 GB, with a 2 GB log file. The database was installed on a six-disk array set to RAID 5. This system has twice the disk capacity of the previous management server and is designed to monitor up to 250 managed computers. This section shows the capacity of the system and makes recommendations for improving the system as tested, where applicable. The test results show the upper bound on the number of managed computers that this server can adequately manage.

Management System Configuration

Table A-7 lists the system components used for this series of tests.

Table A-7 System Configuration for Medium Management Server Test

Server name: MOMTEST3
Processor count: 2
Processor type: 550 MHz Pentium III
Memory: 1 GB
Disk count (OS): 1 × 9.1 GB
Disk count (DB): 6 × 9.1 GB
Disk designation (OS): C drive (9.1 GB)
Disk designation (DB): D drive (54.6 GB)
Database size (including log): 12 GB
DB I/O capacity: 420 read/write operations per second
Network capacity: 100 Mbps (12.5 MB/s)
Manufacturer: Compaq
Model: 350 ML
MOM build: 0003, 0005, 00012.1, 00012.2

The MOM 2000/SQL Server Footprint

The data in Table A-8 shows the resources that were used to install the management server and the SQL Server database. SQL Server 2000 Enterprise was used for these tests.

Table A-8 System Resources for the Medium Management Server Test

MOM disk space requirement (total): 230 MB
MOM OnePoint working set, average memory: 97.98 MB
OnePoint threads: 68
SQL Server working set memory: 64.3 MB
SQL Server database size (disk space): 10 GB
Database log disk space: 2 GB
MS DTC log size (disk space): 512 MB

MOM 2000/SQL Server Test Results

These tests were performed to determine the optimum size of the management server needed to perform this task. Based on our last set of tests, we started the testing from 86 managed computers to find the upper limit.

Establishing the Baseline of the Medium Management Server

In the third test series, the management server was monitored while managing 86 computers at the low end. These findings establish a baseline for its operation and the upper bound of the management server's capacity. Table A-9 shows the growth rate as more managed computers are added.

Table A-9 Effect on Server of Additional Managed Computers

Managed computers | % CPU utilization | OnePoint service (% CPU) | OnePoint working set peak (bytes) | Disk reads/sec | Disk writes/sec | Memory free space | Network busy
86 | 31.67% | 27.89% | 97,987,467 | 16.35 | 16.67 | 51.66% | 1.438%
100 | 38.69% | 26.35% | 97,987,467 | 21.02 | 20.71 | 48.00% | 1.441%
250 | 59.65% | 49.73% | 97,987,467 | 34.69 | 57.81 | 46.00% | 1.422%

Table A-9 shows, like those in previous tests, a linear growth trend. The following figures show linear incremental growth by managed computer for CPU utilization, I/O, memory free space, and network usage.

Figure A-6 Utilization trend for a medium configuration

Figure A-6 shows a utilization trend for up to 250 managed computers. Notice the growth trend is again linear. The utilization for 250 managed computers is almost 60 percent on a two-processor system. This gives the user a reserve capacity of 40 percent, which is acceptable. We can also estimate how much effect the Application Management Packs or scripts might have on CPU utilization: they can add up to 25 percent overhead.

Figure A-7 Approaching the limits of I/O count with the medium configuration

Figure A-7 shows the I/O usage by the management server managing up to 250 managed computers. The maximum I/O count is almost 60 per second, which, for this configuration, is very close to the maximum allowed count of 60 I/Os per second. This indicates that the disk farm needs more I/O capacity if you intend to increase the number of managed computers. The disk queue length in these tests was less than one, which indicates that there were no disk bottlenecks.

Figure A-8 shows the memory free space. In this set of tests, memory has been increased to 1 GB, and there is a little more than 40 percent available. This indicates that memory is not a problem or potential bottleneck.

Figure A-8 Increasing memory helps performance with the medium configuration

Figure A-9 shows the network usage of the management server. Notice that the 100 Mbps network is not affected very much by the activity of 250 managed computers. The total usage is about 1.6 percent in steady state. This indicates that a 10 Mbps network can handle this managed computer workload at about 16 percent usage. This further indicates that the managed computer scan, which can add up to three times the average volume, will not affect the network enough to cause a bottleneck.

Figure A-9 Adding up to 250 managed computers does not affect network usage adversely

Sizing Recommendations for Medium Configurations

The management server used has enough capacity to manage 250 computers, but not a lot of reserve capacity on the disk end. This indicates that the I/Os per second will exceed the recommended I/O capacity if more than 250 computers are managed. In subsequent testing, the disk I/O level exceeded the maximum values that are considered acceptable at 260 managed computers. The CPU utilization was at almost 60 percent, which indicates an adequate reserve capacity. The memory had more than enough reserve capacity, as did the network.

As a result of this test, it can be concluded that the hardware required for a 250 managed computer system should be, at a minimum:

  • Two CPUs running at 550 MHz or higher

  • 1 GB memory

  • One disk drive designated as the C drive for the operating system and Consolidator

  • 6×9.1 GB disk volume designated as D drive for the database and database log file

  • 100 Mbps network

It is expected that 250 managed computers will generate:

  • 1642.5 events per minute

  • One unsuppressed alert per minute

  • One suppressed alert per 4 minutes

  • 36 performance counters per second

Best Practice: Capacity/Performance Recommendation

The usage of this system should not exceed the 250-managed-computer limit. Also monitor disk I/O, and add disk capacity if the rate rises above 60 I/Os per second per disk. As a best practice, the management system should consist of a separate database server and DCAM server, with the following system components:

Table A-10 Recommended Minimum System Capacity for up to 250 Managed Computers

Database server (database unit)
Processor count: 2
Processor type: 550 MHz-733 MHz Pentium III
Memory: 1 GB
Disk count: 7
Disk designation (OS, MOM): C drive, one disk drive
Disk designation (MOM DB and DB log): D drive, six disks or more, depending on RAID factor
MOM DB + DB log size: Refer to the database calculator
Disk size: 9.1 GB or higher
Disk I/O capacity: 420 read/write operations per second
Network capacity: 100 Mbps (12.5 MB/s)

DCAM server (DAS/CAM unit)
Processor count: 2
Processor type: 550 MHz-733 MHz Pentium III
Memory: 1 GB
Disk count: 1
Disk designation (OS, MOM): C drive, one disk drive
Disk space size: 9.1 GB or higher
Disk I/O capacity: 70 read/write operations per second
Network capacity: 100 Mbps (12.5 MB/s)

Test Results III: MOM 2000/SQL Server Typical Install, Large Management Server, 250 to 1,200 Managed Computers

The last series of tests was performed on a large management server with four processors and 2 GB of memory. The database for these tests was 20 GB, with a 5 GB log file, and was loaded on an eight-disk array set to RAID 5. The database server was SQL Server 2000 Enterprise. This section shows the capacity of the system and makes recommendations for improving the system as tested, where applicable. The test results show the upper bound on the number of managed computers that this server can adequately manage.

Management System Configuration

The system tested for this evaluative series was configured as shown in Table A-11.

Table A-11 System Configuration for Large Management Server Test

Server name: MOMTEST4
Processor count: 4
Processor type: 733 MHz Pentium III
Memory: 2 GB
Disk count (OS): 1 × 9.1 GB
Disk count (DB): 8 × 9.1 GB
Disk designation (OS): C drive (9.1 GB)
Disk designation (DB): D drive (72.8 GB)
Database size (including log): 25 GB
DB I/O capacity: 560 read/write operations per second
Network capacity: 100 Mbps (12.5 MB/s)
Manufacturer: Dell
Model: PowerEdge 6300
MOM build: 0003, 0005, 00012.1, 00012.2, 66.7

Note: MOM build 66.7 was used for verification testing.

The MOM 2000/SQL Server Footprint

The following data in Table A-12 show the resources used to install the management server and the SQL Server database. SQL Server 2000 Enterprise was used for these tests.

Table A-12 System Resources for Large Management Server Test

MOM disk space requirement (total): 230 MB
MOM OnePoint working set, average memory: 98.81 MB to 108.134 MB
OnePoint threads: 77
SQL Server working set memory: 64.3 MB to 212.23 MB
SQL Server database size (disk space): 20 GB
Database log disk space: 5 GB
MS DTC log size (disk space): 512 MB

MOM 2000/SQL Server Test Results

These tests were performed to determine the optimum size of the management server for this task. Based on the last set of tests, the testing was started from 251 computers to find the upper limit. This system has much greater capacity than the previously tested computers. There are four CPUs, rated at 733 MHz. There are eight disks set to RAID 5.

Establishing the Baseline of the Large Management Server

In the fourth test series, the management server was monitored while managing 251 computers at the low end. These findings are used to establish a baseline for its operation. Table A-13 shows the growth rate as more managed computers are added:

Table A-13 Effect on Server of Additional Managed Computers

Managed computers | % CPU utilization | OnePoint service (% CPU) | OnePoint working set peak (bytes) | Disk reads/sec | Disk writes/sec | Memory free space | Network busy
251 | 18.45% | 8.78% | 98,816,000 | 15.67 | 18.56 | 51.66% | 1.433%
450 | 36.45% | 13.67% | 98,816,000 | 18.35 | 17.90 | 50.03% | 1.649%
550 | 42.98% | 18.69% | 98,816,000 | 23.45 | 22.75 | 49.99% | 1.894%
750 | 43.79% | 22.31% | 98,816,000 | 25.73 | 24.54 | 46.98% | 2.316%
1000 | 46.45% | 35.95% | 106,046,916 | 29.90 | 29.67 | 42.67% | 2.678%
1200 | 53.67% | 47.73% | 108,134,400 | 31.21 | 31.56 | 40.09% | 2.853%

The growth trend shown in Table A-13 is again linear. The following figures demonstrate linear incremental growth by managed computer for CPU utilization, I/O, memory free space, and network usage.

Figure A-10 Adding CPUs decreases over-utilization in large configurations

Note: 700-managed-computer limit.

Figure A-10 shows the usage trend for up to 1200 managed computers. The growth trend shows that usage starts low, because the testing has gone from two CPUs to four CPUs, but then levels off at about 550 managed computers. As more managed computers are added, the utilization climbs to almost 54 percent. This gives the user a reserve capacity of more than 40 percent, which is acceptable. How much effect the Application Management Packs or scripts will have on CPU utilization can also be estimated: they can add up to 25 percent. In this case, the CPU power is present, but at 1200 managed computers it is close to 60 percent. Whether this is acceptable or not is determined by the frequency of use of these Management Packs and the number of scripts that are processed.

Figure A-11 Increased I/O in large installations means more disk capacity may be needed

Note: 700-managed-computer limit.

Figure A-11 shows the I/O usage by the management server managing up to 1200 managed computers. The maximum I/O count is almost 63 per second, which, for this configuration, is above the maximum allowed count of 60 I/Os per second. This indicates that the disk farm needs more I/O capacity if the number of managed computers grows to 1200.

Figure A-12 shows the disk queue in these tests. For 1200 managed computers it is very close to the maximum, but still tolerable. The maximum queue length for this series of tests is almost 2.5, which will not affect system integrity.

Figure A-12 Disk queues get somewhat longer in the 1200-computer configuration

Note: 700-managed-computer limit.

Figure A-13 depicts the memory free space. In this set of tests, memory has been increased to 2 GB, and about 40 percent remains available. This indicates that memory is not a problem or potential bottleneck in these tests.

Figure A-13 Increase in memory assures adequate free space

Note: 700-managed-computer limit.

Figure A-14 shows the network usage of the management server. Notice that the 100 Mbps network is not affected very much by 1200 managed computers. The total usage is about 2.85 percent in steady state. This indicates that a 10 Mbps network can handle the managed computer workload at about 28 percent usage. As in the previous tests, this further indicates that the managed computer scan, which can add up to three times the average volume, would still not affect the network enough to cause a bottleneck.

Figure A-14 Network usage is adequate in large configurations

Note: 700-managed-computer limit.

Sizing Recommendations for Large Configurations

These tests indicate that the management server used has enough capacity to manage 1200 computers, but its disk I/O reserve is exhausted: at 1200 managed computers, the I/O rate is slightly above the maximum of 60 I/Os per second. The CPU has enough reserve capacity, even with the additional burden of the Application Management Packs. As in the other tests, memory and network are not issues for concern.

With these test results as a basis, it can be concluded that the minimum hardware for a 1200-computer server should be:

  • Four CPUs running at 733 MHz or higher

  • 2 GB memory

  • One disk drive designated as the C drive for the operating system and Consolidator

  • 8×9.1 GB disk volume designated as the D drive for the database and database log file (or more)

  • 100 Mbps network

This configuration is recommended as a starting point. Results may vary depending on the types of managed computers and the events or alerts they generate. 1200 managed computers should generate:

  • 7884 events per minute

  • 5.34 unsuppressed alerts per minute

  • 1.29 suppressed alerts per minute

  • 173 performance counters per second

Best Practice: Capacity/Performance Recommendation

To prevent overloading the management server, the total number of managed computers for a large management server computer system should not exceed 700. If a large number of Application Management Packs are deployed, reduce this to no more than 600 managed computers. Also monitor disk I/O, and add disk capacity if the rate rises above 60 I/Os per second per disk. In addition, the total managed computer count for any configuration group should not exceed 1,000. A management system adequate for 700 managed computers should be a three-server configuration consisting of the components listed in Table A-14.

Table A-14 Recommended Minimum System Capacity for Large Configurations

Database server (1)
Processor count: 4
Processor type: 550 MHz-733 MHz Pentium III
Memory: 2 GB
Disk count: 9
Disk designation (OS, MOM): C drive, one disk drive
Disk designation (MOM DB and DB log): D drive, eight disks or more, depending on RAID factor
MOM DB + DB log size: Refer to the database calculator
Disk size: 9.1 GB or higher
Disk I/O capacity: 560 read/write operations per second
Network capacity: 100 Mbps (12.5 MB/s)

DCAM server (DAS/CAM unit)
Processor count: 2
Processor type: 550 MHz-733 MHz Pentium III
Memory: 1 GB
Disk count: 1
Disk designation (OS, MOM): C drive, one disk drive
Disk space size: 9.1 GB or higher
Disk I/O capacity: 70 read/write operations per second
Network capacity: 100 Mbps (12.5 MB/s)

DCAM server (DAS/CAM unit)
Processor count: 2
Processor type: 550 MHz-733 MHz Pentium III
Memory: 1 GB
Disk count: 1
Disk designation (OS, MOM): C drive, one disk drive
Disk space size: 9.1 GB or higher
Disk I/O capacity: 70 read/write operations per second
Network capacity: 100 Mbps (12.5 MB/s)

Two DCAM systems are recommended for redundancy and fault-tolerance. It is recommended that the DCAM systems have no more than 350 managed computers assigned to each. Table A-15 lists the recommendations for a two-server configuration.

Table A-15 Recommendations for Two-Server Configuration

Database server (1)
Processor count: 4
Processor type: 550 MHz-733 MHz Pentium III
Memory: 2 GB
Disk count: 9
Disk designation (OS, MOM): C drive, one disk drive
Disk designation (MOM DB and DB log): D drive, eight disks or more, depending on RAID factor
MOM DB + DB log size: Refer to the database calculator
Disk space size: 9.1 GB or higher
Disk I/O capacity: 560 read/write operations per second
Network capacity: 100 Mbps (12.5 MB/s)

DCAM server (DAS/CAM unit)
Processor count: 4
Processor type: 550 MHz-733 MHz Pentium III
Memory: 1 GB
Disk count: 1
Disk designation (OS, MOM): C drive, one disk drive
Disk space size: 9.1 GB or higher
Disk I/O capacity: 70 read/write operations per second
Network capacity: 100 Mbps (12.5 MB/s)
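The limits in the large-configuration recommendation can be combined into a simple planning aid. Those limits are 700 managed computers per large configuration, 600 with many Application Management Packs, and no more than 350 per DCAM. The following Python sketch assumes that each configuration group is one database server with two DCAMs, as in Table A-14, and that computers are spread evenly. It is not a MOM tool, and the names are hypothetical. Because the 700-computer recommendation is stricter than the 1,000-computer configuration-group ceiling, the sketch only needs to enforce the former.

```python
import math
from typing import List

MAX_PER_DCAM = 350             # per DCAM system
MAX_PER_GROUP = 700            # recommended limit for a large configuration
MAX_PER_GROUP_MANY_MPS = 600   # reduced limit with many Application Management Packs
DCAMS_PER_GROUP = 2            # two DCAMs for redundancy and fault tolerance


def even_split(total: int, parts: int) -> List[int]:
    """Split total into `parts` integers that differ by at most one."""
    base, extra = divmod(total, parts)
    return [base + 1 if i < extra else base for i in range(parts)]


def plan_groups(managed_computers: int, many_management_packs: bool = False) -> List[List[int]]:
    """Return, for each configuration group, the number of computers assigned to each DCAM."""
    if managed_computers <= 0:
        raise ValueError("managed_computers must be positive")
    limit = MAX_PER_GROUP_MANY_MPS if many_management_packs else MAX_PER_GROUP
    group_count = math.ceil(managed_computers / limit)
    plan = []
    for group_size in even_split(managed_computers, group_count):
        dcam_loads = even_split(group_size, DCAMS_PER_GROUP)
        # Guaranteed by the limits above: at most 700 / 2 = 350 per DCAM.
        assert max(dcam_loads) <= MAX_PER_DCAM
        plan.append(dcam_loads)
    return plan


print(plan_groups(1300))
print(plan_groups(1300, many_management_packs=True))
```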

Report Generation for MOM 2000

This section presents the findings for the reporting function of MOM 2000 in terms of resource consumption. The reports that were tested are shown in Figure A-15. They were selected from the Windows NT/2000 reporting tree and include reports from Windows NT/2000 Capacity Planning, Windows NT/2000 Operations, and Windows NT/2000 Performance Analysis. These reports were selected because they are reputed to consume a large amount of resources.

Figure A-15 Reporting resource consumption with MOM 2000

Reports and CPU Utilization

The first series of test results concerns CPU and disk usage and the report generation process. Figure A-16 depicts CPU utilization while managing 150 computers. The average utilization is 53.63 percent, and the measurement interval was 45 seconds. The figure shows no prolonged periods of 100 percent utilization, which indicates consistent response times and an even distribution of processing.

Figure A-16 Reporting CPU utilization with MOM

Figure A-17 shows report generation managing the same 150 computers. The two CPU spikes of 100 percent each (on the right side of the graph) were prolonged for at least 2.5 minutes each, during which time no MOM Administrator console function was available. The red (dark) line on the lower section of the graph shows the OnePoint service utilization, which is not affected during report generation.

Figure A-17 Reporting CPU spikes with MOM

Figure A-18 shows a normal disk read pattern for the same 150 managed computers. Reads occur in regular bursts every five minutes, which is how often MOM reads from and writes to the disks.

Figure A-18 Reporting disk reads with MOM

The write pattern, shown in Figure A-19, is identical to the read pattern. Neither figure includes report generation during this period.

Figure A-19: Reporting disk writes with MOM

Figure A-20 shows read activity during report generation for the same 150 managed computers. The disk runs constantly at almost 100 percent I/O activity, which left almost no capacity for other reads and writes and caused very long disk queues. Write activity during this period was the same.

Figure A-20: Reporting disk reads with report generation

Figure A-21 shows that the average disk queue length was above 12 for the same measurement period as Figure A-20.

Figure A-21: Reporting disk queuing with MOM

Conclusions and Best Practices: Reporting

These results show that producing reports while managing computers, at the same time and on the same computer, places a heavy load on CPU and disk resources. Report generation adds an estimated 30 percent CPU overhead, drives disk utilization to 100 percent, and produces average disk queue lengths of 12 or more.

Therefore, you should not generate reports on the management server while it is managing computers. If possible, use a separate reporting system with a duplicate database from which to generate reports. If you do not have a separate reporting system, schedule reports to run at an appropriate time. Do not generate too many reports at once; generating more than two reports at a time will overtax most systems.
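Where reports must still be produced on the management server, the script or job that launches them can enforce the two-report ceiling. A minimal Python sketch, assuming reports are started from a script you control; `generate_report` is a hypothetical callable standing in for whatever actually produces a report, because this paper does not describe a programmatic reporting interface.

```python
from concurrent.futures import ThreadPoolExecutor

# This paper observes that more than two simultaneous reports overtax most systems.
MAX_CONCURRENT_REPORTS = 2


def run_reports(report_names, generate_report, max_concurrent=MAX_CONCURRENT_REPORTS):
    """Call generate_report(name) for every report name, with at most
    `max_concurrent` reports running at the same time.

    Results are returned in the same order as `report_names`.
    """
    # The pool never runs more than max_workers tasks at once; the rest wait.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(generate_report, report_names))
```

A bounded worker pool is used rather than a fixed timetable so that each queued report starts as soon as one of the two slots frees up.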

Grooming the Database

The final series of tests concerned grooming the MOM database. To isolate the CPU and disk overhead caused by grooming, these tests were run while the management server was managing 250 computers.

Grooming and System Reactions

Grooming added almost 30 percent to CPU utilization on the management server. Average utilization before grooming was approximately 35 percent; with grooming turned on, it averaged about 65 percent. This can be observed in Figure A-22.

Figure A-22: Reporting CPU utilization with database grooming

Utilization stayed at this level for at least 18 minutes and then dropped at the end (far right) to about 35 percent, its level before grooming was turned on.
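The grooming overhead quoted above is the difference between two averages. A minimal Python sketch of that comparison, assuming % Processor Time samples for the pre-grooming and grooming periods have been exported at a fixed interval; the function name is hypothetical, and the example values below only mirror the approximate figures above, not the actual test data.

```python
from statistics import mean


def grooming_overhead(baseline, during, interval_s):
    """Compare average % Processor Time before and during grooming.

    baseline, during: lists of % Processor Time samples (percent).
    interval_s: seconds between samples, used to estimate grooming duration.
    """
    baseline_avg = mean(baseline)
    during_avg = mean(during)
    return {
        "baseline_avg": baseline_avg,
        "during_avg": during_avg,
        # Added load in percentage points, e.g. 65 - 35 = 30.
        "added_points": during_avg - baseline_avg,
        "duration_min": len(during) * interval_s / 60.0,
    }
```

With a baseline averaging 35 percent and 24 grooming samples at 65 percent taken 45 seconds apart, it reports 30 added points over 18 minutes.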

Figure A-23: Reporting massive CPU usage

Figure A-23 shows that the CPU queue length was more than twice the accepted maximum of two, which indicates massive CPU usage. Figure A-24 shows massive disk usage during the same period.

Figure A-24: Reporting massive disk utilization

Conclusions and Best Practices: Database Grooming

Database grooming is a necessary function for MOM. Tests that postponed grooming to later times only caused longer grooming periods later on.

Grooming took place during all tests. When isolated, the grooming function appears to use excessive resources; however, this is not really the case when all the other functions that MOM performs simultaneously are taken into account. As a best practice, place the database on a separate server with multiple CPUs and multiple disks to absorb the additional reads and writes that grooming causes. It is also best to keep the primary database, including its log files, as small as possible and no larger than 12 GB.

Appendix B: MOM SP1 Management Sizer

The MOM SP1 Management Sizer recommends database sizes based on the data that you enter; it has no preset limits on what it can recommend. Any recommendation over 30 GB is an unsupported database configuration, and you should adjust the grooming parameters until the recommended size is 30 GB or less, the supported maximum. A recommendation over 30 GB could instead be considered a data warehouse recommendation for long-term reporting and storage.

To use the MOM SP1 Management Sizer, type the target number of managed computers in the highlighted yellow area to the right of Managed Computer Count. You can also adjust the grooming parameters to determine the appropriate database size: type different values in the highlighted yellow areas to the right of the parameters in the Enter Groom Factor section. The MOM SP1 Management Sizer automatically calculates the RAID selection and spindle count; expected network usage for various network bandwidths; DCAM server and MOM database server hardware sizes; and the database and log sizes.
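Finding a supported configuration is a trial-and-error loop: enter a grooming value, read the recommended database size, and repeat. The following minimal Python sketch only automates the final choice among trials you have already run in the Sizer. It assumes each grooming parameter is numeric and that larger values keep more data; the function names and example sizes are hypothetical, and the sketch does not reproduce the Sizer's own calculations.

```python
# Largest supported MOM SP1 database size stated in this paper.
SUPPORTED_MAX_GB = 30.0


def is_supported(recommended_gb):
    """True if a Sizer database-size recommendation is a supported configuration."""
    return recommended_gb <= SUPPORTED_MAX_GB


def longest_supported_grooming(trials):
    """Pick the largest trial grooming value whose recommended size is supported.

    trials: dict mapping a grooming value entered in the Sizer to the
    database size (GB) the Sizer recommended for it.
    Returns None if every trial exceeds the supported size.
    """
    supported = [value for value, size_gb in trials.items() if is_supported(size_gb)]
    return max(supported) if supported else None
```

For hypothetical trials `{4: 18.2, 7: 27.9, 10: 36.5}`, the function returns `7`; the result for `10` would fall into the data warehouse category described above.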

You can download the MOM SP1 Management Sizer as part of the MOM SP1 Performance and Sizing Kit, which is available from the MOM Web site at https://www.microsoft.com/mom/techinfo/deployment/default.asp.

Appendix C: SQL Server Installation for Microsoft Operations Manager Usage

Installing the Microsoft SQL Server database correctly is very important. An incorrect configuration can result in poor performance and in violations of the two-minute alert insertion Service Level Agreement. The following guidelines can help you make the proper choices during installation.

Never install the SQL Server database on the same disk drive as the operating system or the MOM subsystem. Partitioning the C drive does not improve performance, because the database is still on the same physical disk drive as the operating system and MOM, and that drive is still governed by the same disk controller. Place the MOM database on its own disk drives, striped across more than one disk, and place the database log file on a separate disk for further efficiency. Figure C-1 depicts this layout for a medium-sized installation of 250 managed computers:

Figure C-1: Installing the SQL Server database
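The placement rules above can be checked before installation. A minimal Python sketch, assuming you can list which physical disks back each volume; the `Volume` type and `check_layout` function are hypothetical, the log-file checks assume "separate disk" means separate from both the database and the operating system, and the sketch checks only physical disk separation, not disk controllers.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical description of a volume and the physical disks behind it."""
    name: str
    physical_disks: set = field(default_factory=set)


def check_layout(os_vol, data_vol, log_vol):
    """Return a list of problems with a MOM database disk layout.

    os_vol holds the operating system and MOM; data_vol holds the MOM
    database; log_vol holds the database log file.
    """
    problems = []
    # A partition of the system disk still shares its physical disk.
    if data_vol.physical_disks & os_vol.physical_disks:
        problems.append("database shares a physical disk with the operating system or MOM")
    if len(data_vol.physical_disks) < 2:
        problems.append("database is not striped across more than one disk")
    if log_vol.physical_disks & data_vol.physical_disks:
        problems.append("log file shares a physical disk with the database")
    if log_vol.physical_disks & os_vol.physical_disks:
        problems.append("log file shares a physical disk with the operating system or MOM")
    return problems
```

For example, a D: partition carved out of the system disk (`Volume("D:", {0})` alongside `Volume("C:", {0})`) is reported as sharing a physical disk with the operating system, which is the partitioning case described above.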

Configuring SQL Server memory usage correctly is also crucial. When memory is limited, SQL Server processes can consume so much memory that page faults occur. To prevent this, limit SQL Server memory usage by configuring a fixed memory size instead of dynamic memory allocation, as shown in Figure C-2.

Figure C-2: Configuring SQL Server memory

To configure SQL Server for memory usage

  1. Start SQL Server Enterprise Manager.

  2. Navigate to the local computer item, right-click it, and then click Properties.

  3. In the SQL Server Properties dialog box, click the Memory tab.

  4. On the Memory tab, click Use a fixed memory size, and use the slider to set an appropriate memory size.
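The same setting can also be scripted. The following minimal Python sketch builds equivalent T-SQL, assuming that Enterprise Manager's fixed memory size corresponds to setting the `min server memory (MB)` and `max server memory (MB)` options of `sp_configure` to the same value. This paper does not say how much memory to leave for the operating system and MOM, so `reserve_mb` is an assumption you must choose; review the generated statements before running them against your server.

```python
def fixed_memory_tsql(total_mb, reserve_mb):
    """Build T-SQL that pins SQL Server to a fixed memory size.

    total_mb: physical memory installed in the server, in MB.
    reserve_mb: memory to leave for the operating system, MOM, and other
    processes, in MB (a value you choose; not given in this paper).
    """
    size_mb = total_mb - reserve_mb
    if size_mb <= 0:
        raise ValueError("reserve_mb must be smaller than total_mb")
    return "\n".join([
        # The memory options are advanced options and are hidden by default.
        "EXEC sp_configure 'show advanced options', 1",
        "RECONFIGURE",
        # Equal minimum and maximum keep SQL Server at one size once it is reached.
        f"EXEC sp_configure 'min server memory (MB)', {size_mb}",
        f"EXEC sp_configure 'max server memory (MB)', {size_mb}",
        "RECONFIGURE",
    ])
```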

Appendix D: Counter Definitions

Object name

Property name

Definition

System

Processor Queue Length

The number of threads in the processor queue. There is a single queue for processor time, even on computers with multiple processors. Unlike the disk counters, this counter counts ready threads only, not threads that are running. A sustained processor queue of greater than two threads generally indicates processor congestion. This counter displays the last observed value only; it is not an average.

This counter is necessary to see what kind of contention is occurring on the CPUs of the system. A sustained value greater than 2 indicates that an additional CPU is needed, or that some of the workload should be considered for relocation. This counter is a general purpose queue counter.
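Because Processor Queue Length reports only the last observed value, a single high reading says little; the guideline concerns a sustained queue. A minimal Python sketch, assuming "sustained" means that the average over a sliding window of consecutive samples exceeds two; the window length and function name are hypothetical choices, not part of the counter definition.

```python
from statistics import mean


def sustained_queue(samples, window, threshold=2.0):
    """Return True if any `window` consecutive Processor Queue Length samples
    average more than `threshold` threads.

    Averaging over a window keeps one isolated high snapshot from being
    treated as sustained congestion.
    """
    if window <= 0 or len(samples) < window:
        return False
    return any(
        mean(samples[i:i + window]) > threshold
        for i in range(len(samples) - window + 1)
    )
```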

 

Processes

The number of processes in the computer at the time of data collection. Notice that this is an instantaneous count, not an average over the time interval. Each process represents the running of a program.

This counter is useful for keeping track of the running processes in a system. Because it is an instantaneous count, it must be averaged over the samples in the measurement interval for the statistics to be useful. This counter is a general purpose incremental counter.

 

Threads

The number of threads in the computer at the time of data collection. Notice that this is an instantaneous count, not an average over the time interval. A thread is the basic executable entity that can execute instructions in a processor.

This counter is useful for keeping track of the running threads in a system. Because it is an instantaneous count, it must be averaged over the samples in the measurement interval for the statistics to be useful. This counter is a general purpose incremental counter.

Processor

% Interrupt Time

The percentage of time the processor spent receiving and servicing hardware interrupts during the sample interval. This value is an indirect indicator of the activity of devices that generate interrupts, such as the system clock, the mouse, disk drivers, data communication lines, network adapters and other peripheral devices. These devices normally interrupt the processor when they have completed a task or require attention. Normal thread execution is suspended during interrupts. Most system clocks interrupt the processor every 10 milliseconds, creating a background of interrupt activity. This counter displays the average busy time as a percentage of the sample time.

This counter is useful for keeping track of the time the CPU spends processing interrupts. Because this counter can reveal gradual or sudden rises in interrupt time, which reduce the CPU time free for processing workloads, it can be used as a general purpose counter.

 

% Processor Time

The percentage of time the processor is executing a non-Idle thread. This counter was designed as a primary indicator of processor activity. It is calculated by measuring the time that the processor spends executing the thread of the Idle process in each sample interval, and subtracting that value from 100 percent (each processor has an Idle thread that consumes cycles when no other threads are ready to run). It can be viewed as the percentage of the sample interval spent doing useful work. This counter displays the average percentage of busy time observed during the sample interval, which is calculated by monitoring the time the service was inactive, and then subtracting that value from 100 percent.

This counter is useful for keeping track of the time the CPU spends processing any work. Because it displays the amount of time the CPU is busy, it is very useful in determining system usage in capacity and performance studies. The calculation for this counter is: 100 percent - percentage of time spent in the Idle thread

Memory

Available Bytes

The amount of physical memory available to processes running on the computer, in bytes. It is calculated by adding the space on the Zeroed, Free, and Standby memory lists. Free memory is ready for use. Zeroed memory is pages of memory filled with zeros to prevent later processes from seeing data used by a previous process. Standby memory is memory removed from the working set (physical memory) of a process on its way to disk, but it is still available to be recalled. This counter displays the last observed value only; it is not an average.

This is a general purpose counter used to track memory usage. To produce a “% Memory Free Space” counter, you must know the amount of memory in the computer and then perform the calculation:

% Memory Free Space = (Available Bytes / Total Memory) × 100

Because Available Bytes is a snapshot value, you must average many Available Bytes data points from the report database to obtain an accurate value of Available Bytes or % Memory Free Space.

Note: This is a composite counter.
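The averaging and the percentage calculation can be combined in a few lines. A minimal Python sketch, assuming the Available Bytes snapshots have already been pulled from the report database and that total physical memory is known; the function name is hypothetical.

```python
from statistics import mean


def percent_memory_free(available_bytes_samples, total_memory_bytes):
    """Average Available Bytes snapshots, then express the result as a
    percentage of total physical memory (% Memory Free Space)."""
    if not available_bytes_samples:
        raise ValueError("need at least one Available Bytes sample")
    avg_available = mean(available_bytes_samples)
    return avg_available / total_memory_bytes * 100.0
```

For a computer with 1 GB (1,073,741,824 bytes) of memory and snapshots of 256 MB and 512 MB available, the function returns 37.5.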

 

Page Faults/sec

The overall rate at which faulted pages are handled by the processor. It is measured in numbers of pages faulted per second. A page fault occurs when a process requires code or data that is not in its working set (its space in physical memory). This counter includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory). Most processors can handle large numbers of soft faults without consequence. However, hard faults can cause significant delays. This counter displays the difference between the values observed in the last two samples, divided by the duration of the sample interval.

This is a general purpose counter used to track memory usage through the page faults that occur within the system. A gradual or sudden increase in Page Faults/sec indicates increased memory use. The page fault interrupt is also indicated in the processor page faults and is part of that number.

Network Interface

Bytes Total/sec

The rate at which bytes are sent and received on the interface, including framing characters.

This is a general purpose counter. It shows network traffic, which can have a negative effect on transaction or message throughput.

This counter can either be collected or calculated by using the calculation:

Bytes Total/sec = Bytes Sent/sec + Bytes Received/sec.

 

Current Bandwidth

An estimate of the current bandwidth in bits per second. For interfaces that do not vary in bandwidth, or for those where no accurate estimation can be made, this value is the nominal bandwidth.

This is a general purpose counter. It shows network bandwidth in bits per second and is used in calculations of network usage.

 

% Network Busy

A calculated counter that shows network usage. This is a general purpose composite counter.

This counter must be calculated by using the following calculation:

% Network Busy = (Bytes Total/sec / Current Bandwidth Bytes) × 100

where

Bytes Total/sec = Bytes Sent/sec + Bytes Received/sec

and

Current Bandwidth Bytes = Current Bandwidth/8
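The three formulas above combine into one calculation, expressed here as a percentage. A minimal Python sketch, assuming the Bytes Sent/sec, Bytes Received/sec, and Current Bandwidth values have already been collected; the function name is hypothetical.

```python
def percent_network_busy(bytes_sent_per_sec, bytes_received_per_sec, current_bandwidth_bits):
    """% Network Busy from the Network Interface counters.

    Bytes Total/sec = Bytes Sent/sec + Bytes Received/sec
    Current Bandwidth Bytes = Current Bandwidth (bits per second) / 8
    """
    bytes_total_per_sec = bytes_sent_per_sec + bytes_received_per_sec
    current_bandwidth_bytes = current_bandwidth_bits / 8
    return bytes_total_per_sec / current_bandwidth_bytes * 100.0
```

On a 100 Mbps interface (12,500,000 bytes per second), 1,000,000 bytes sent plus 1,500,000 bytes received per second gives 20 percent.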

 

Output Queue Length

Output Queue Length is the length of the output packet queue (in packets). If this is longer than 2, delays are being experienced, and the bottleneck should be found and eliminated if possible. Because the requests are queued by NDIS in this implementation, this will always be 0.

This is a general purpose counter.

Physical Disk

% Disk Time

The percentage of elapsed time that the selected disk drive is busy servicing read or write requests.

This is a general purpose counter. This counter can either be collected or calculated using the following equation:

% Disk Time = % Disk Read Time + % Disk Write Time or

100 - % Idle Time

 

Avg. Disk Queue Length

The average number of both read and write requests that were queued for the selected disk during the sample interval.

This is a general purpose counter.

Server

Bytes Total/sec

The number of bytes the server has sent to and received from the network. This value provides an overall indication of how busy the server is.

This is a general purpose counter.

 

% Server Network Busy

A calculated counter that shows network usage.

This is a general purpose counter. This counter must be calculated by using the following equation:

% Server Network Busy = (Bytes Total/sec / Network Interface_Current Bandwidth Bytes) × 100

where

Network Interface_Current Bandwidth Bytes = Current Bandwidth/8

Server Work Queues

Bytes Received/sec

The rate at which the server is receiving bytes from network clients on this CPU. This value is a measure of how busy the server is.

This is a general purpose counter.

 

Current Clients

Current Clients is the instantaneous count of the clients being serviced by this CPU. The server actively balances the client load across all the CPUs in the system. This value will always be 0 in the Blocking Queue instance.

This is a general purpose counter.

 

Queue Length

Queue Length is the current length of the server work queue for this CPU. A sustained queue length greater than four might indicate processor congestion. This is an instantaneous count, not an average over time.

This is a general purpose counter.

SQL Server:Buffer Manager

Buffer Cache Hit Ratio

Percentage of pages that were found in the buffer pool cache without having to incur a read from disk.

This is a general purpose counter.

SQL Server:Databases

Active Transactions

Number of active transactions for the database.

This is a general purpose counter.

 

Transactions/sec

Number of transactions started for the database per second.

This is a general purpose counter.

SQL Server:General Statistics

User Connections

Number of users connected to the system.

Process

% Processor Time

The percentage of elapsed time that all of the threads of this process used the processor to execute instructions. An instruction is the basic unit of execution in a computer, a thread is the object that executes instructions, and a process is the object created when a program is run. Code executed to handle some hardware interrupts and trap conditions are included in this count. On multiprocessor computers, the maximum value of the counter is 100 percent times the number of processors.
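Because this counter can exceed 100 percent on multiprocessor computers, comparing processes across servers with different processor counts is easier after scaling. A minimal Python sketch; the function name is hypothetical, and dividing by the processor count is a common normalization rather than a calculation defined in this paper.

```python
def normalized_process_cpu(process_percent, processor_count):
    """Scale Process % Processor Time, whose maximum is 100 percent times the
    number of processors, to a 0-100 share of the computer's total CPU capacity."""
    if processor_count < 1:
        raise ValueError("processor_count must be at least 1")
    return process_percent / processor_count
```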

This is a general purpose counter.

 

Page Faults/sec

The rate at which Page Faults occur in the threads executing in this process. A page fault occurs when a thread refers to a virtual memory page that is not in its working set in main memory. This will not cause the page to be retrieved from disk if it is on the standby list and already in main memory, or if it is in use by another process with which the page is shared.

This is a general purpose counter.

Thread

% User Time

The percentage of elapsed time that this thread has spent executing code in user mode. Applications, environment subsystems, and integral subsystems execute in user mode. Code executing in user mode cannot damage the integrity of the Windows NT Executive, Kernel, and device drivers. Unlike some early operating systems, Windows NT uses process boundaries for subsystem protection in addition to the traditional protection of user and privileged modes. These subsystem processes provide additional protection. Therefore, some work done by Windows NT on behalf of your application might appear in other subsystem processes in addition to the privileged time in your process.

This is a general purpose counter.