Domain Sizing and Capacity Planning for Windows NT Server 4.0


By David B. Cross, Microsoft Consulting Services, and Ken Burns, Microsoft Premier Support

Microsoft Windows NT Server 4.0

White Paper Version 2.0

On This Page

Abstract
Introduction
Hardware Configurations
Performance Monitor
Network Monitoring
User's Service Levels
Event Log Monitoring
Domain Size Limitations
General Recommendations
Appendix A - Terminology/Acronyms
Appendix B - Reclaiming Unused Space in the SAM Database
Appendix C - Additional Reading

Abstract

This paper describes some of the key characteristics and capabilities of Microsoft Windows NT Server 4.0 in supporting very large single or multiple master domains in the corporate environment.

Introduction

Overview

This white paper is designed to assist network administrators and deployment planners in properly planning, managing, and monitoring Microsoft Windows NT Server 4.0 domains in the corporate environment.

Environment Assumptions

This white paper addresses the Windows NT Server 4.0 operating system only. At a minimum, Service Pack 3 should also be applied.

The data in this white paper is updated to reflect the changes in Service Pack 4. Note that Service Pack 4 for Windows NT 4.0 includes numerous performance improvements and eliminates various service memory leaks. For additional information, refer to the Readme.txt documentation provided with the service pack.

All network data in this white paper is based on the Ethernet network topology.

Hardware Configurations

This section identifies key hardware components and configurations that affect the design and characteristics of a domain.

Memory and Processor Requirements

Although a number of factors can affect the performance of a domain controller, the two most important aspects of hardware configuration for a given domain controller are the processor speed and the amount of physical RAM installed in the domain controller. The following table provides a guideline for the number of users a given registry and paged pool size will support, given a specified memory and processor configuration. It is assumed that no other services or Microsoft BackOffice applications are concurrently installed on the dedicated domain controllers.

Number of users  SAM size (MB)  Registry size (MB)  Paged pool size (MB)  CPU       Pagefile size (MB)  Physical RAM (MB)
3,000            5              25                  50                    486DX/33  32                  16
7,500            10             25                  50                    486DX/66  64                  32
10,000           15             25                  50                    P, M, A   96                  48
15,000           20             30                  75                    P, M, A   128                 64
20,000           30             50                  100                   P, M, A   256                 128
30,000           45             75                  128                   P, M, A   332                 166
40,000           60             102                 128                   SMP       394                 197
50,000           75             102                 128                   SMP       512                 256
60,000           80             102                 128                   SMP       1 GB                512

Legend:

  • P = Intel Pentium processor

  • M = MIPS processor

  • A = Alpha processor

  • SMP = Symmetric Multiprocessor configuration

Note: SAM sizes may vary significantly from one environment to the next, depending on the number of changes that are applied to a SAM during a given period, as well as any compression routines that are applied. The table should be used as a general guideline only.

For additional information, see Microsoft Knowledge Base article 130914, "Number of Users and Groups Affects SAM Size of Domain."
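For planning scripts, the guideline table can be encoded as a simple lookup. The following Python sketch is illustrative only. The names `Guideline` and `guideline_for` are hypothetical, all sizes are assumed to be in megabytes (with the 1 GB pagefile entered as 1,024 MB), and each row is assumed to cover every user count up to and including its "Number of users" value.

```python
from typing import NamedTuple, Optional


class Guideline(NamedTuple):
    max_users: int
    sam_mb: int
    registry_mb: int
    paged_pool_mb: int
    cpu: str
    pagefile_mb: int
    ram_mb: int


# Rows copied from the guideline table; sizes assumed to be in MB.
GUIDELINES = [
    Guideline(3_000, 5, 25, 50, "486DX/33", 32, 16),
    Guideline(7_500, 10, 25, 50, "486DX/66", 64, 32),
    Guideline(10_000, 15, 25, 50, "P, M, A", 96, 48),
    Guideline(15_000, 20, 30, 75, "P, M, A", 128, 64),
    Guideline(20_000, 30, 50, 100, "P, M, A", 256, 128),
    Guideline(30_000, 45, 75, 128, "P, M, A", 332, 166),
    Guideline(40_000, 60, 102, 128, "SMP", 394, 197),
    Guideline(50_000, 75, 102, 128, "SMP", 512, 256),
    Guideline(60_000, 80, 102, 128, "SMP", 1024, 512),
]


def guideline_for(users: int) -> Optional[Guideline]:
    """Return the first table row that covers `users`, or None if off the table."""
    for row in GUIDELINES:
        if users <= row.max_users:
            return row
    return None
```

For example, `guideline_for(12_000)` selects the 15,000-user row. As with the table itself, the result is a starting point rather than a substitute for measuring an actual server.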

Physical RAM Requirements

An exact formula for physical RAM size for a domain controller cannot be determined without proper analysis of an individual server and the environment in which it will operate. The table above gives guidelines administrators can follow, based on estimates of the number of users in a given domain. The formula for the minimum size of a pagefile is always:

Physical RAM + 12 MB

This size ensures that a given computer has enough pagefile space to save a memory dump if a STOP error occurs.
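As a worked instance of the formula, the short Python sketch below computes the minimum pagefile size for a given amount of RAM. The function name `min_pagefile_mb` is hypothetical, and sizes are taken to be in megabytes.

```python
CRASH_DUMP_OVERHEAD_MB = 12  # from the formula: Physical RAM + 12 MB


def min_pagefile_mb(physical_ram_mb: int) -> int:
    """Minimum pagefile size (MB) needed to hold a full memory dump."""
    if physical_ram_mb <= 0:
        raise ValueError("physical_ram_mb must be positive")
    return physical_ram_mb + CRASH_DUMP_OVERHEAD_MB
```

A domain controller with 256 MB of RAM therefore needs a pagefile of at least 268 MB. The pagefile sizes in the guideline table are larger than this minimum.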

Note: The paged pool size is a very important Windows NT resource to monitor because the paged pool is limited to a maximum size of 192 MB. In addition, the paged pool shares a common address space with the nonpaged pool, and the sum of the two pools may never exceed 256 MB. If the paged or nonpaged pool limits are exceeded because of memory leaks or faulty applications, a server may crash and write a memory dump, or may behave erratically.

Network Requirements

The network requirements for a domain depend on a number of variable factors such as:

  • LAN structure

  • WAN structure

  • Traffic analysis of LAN and WAN

  • Number of logon requests each second for each domain controller

A typical domain controller in a domain with 5,000 or fewer users should be able to reside on a 10-megabit Ethernet backbone. A domain controller in a domain with more than 10,000 users should reside on a Fast Ethernet (100-megabit or greater) backbone. This recommendation is based on the need for users to receive timely WINS name resolution to a domain controller and the need to authenticate without delay during peak logon hours. Generally, replication traffic requirements do not dictate the need for a Fast Ethernet backbone. Please refer to the Network Monitoring section of this white paper for additional information.

Very Large Domain Controller Configuration

A very large domain controller can easily support 8,000 users. Note that hardware performance varies by vendor and the following is presented as a guideline of expected performance:

  • Dual Intel Pentium 200 processor

  • 256 MB RAM

  • Fast Ethernet (100-megabit) backbone

Note: For this white paper, no extrapolations have been made for the Alpha processor.

In a large domain with a consistent environment (that is, an environment in which the domain controller hardware is the same and has been configured in a consistent manner), it may be optimal to configure the registry size and paged pool sizes manually to ensure a consistent level of paged pool allocation from one domain controller to another. For additional information on how to perform this task, see Microsoft Knowledge Base article 126402, "PagedPoolSize and NonPagedPoolSize Values in Windows NT."

Determining Hardware Requirements

The number of accounts in the security accounts manager (SAM) database determines the hardware requirements for domain controllers. To determine the size of the SAM, view the SAM file in the following directory location:

SystemRoot\System32\Config\

Follow these general guidelines when planning for and optimizing domain controllers:

  • Each user takes approximately 1 KB of disk space.

  • Each global group with 300 or fewer users takes approximately 4 KB of disk space.

  • Each global group with more than 300 users takes the following amount of disk space:

    4 KB + (12 bytes * (number of users - 300))

  • Each local group takes 512 bytes of disk space, plus 36 bytes for each member.

  • Each computer account takes 512 bytes of disk space.

Notes:

  • The SAM database (file system) uses fixed length record sizes of 1 KB, and the physical RAM required to read the SAM into memory (RAM) will be slightly smaller than the actual database file size.

  • The 1 KB size is an approximation; a user account may take more than 1 KB of disk space. The actual size will depend on how much data is included in each account. For example, a simple user account with just a user name, a password, and no descriptions or full names will be approximately 1 KB. However, a complex user account that has the maximum amount of information possible on every available input line for names, passwords, paths to home directories, and so on, can increase the size to 8 KB for each user.

For additional information, see Microsoft Knowledge Base article 186626, "Terminal Server and User Accounts/SAM Use."
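The per-object guidelines above can be combined into a rough SAM size estimate. The Python sketch below is a minimal illustration, not a definitive sizing tool. The function names are hypothetical, the large-group formula is read as 4 KB plus 12 bytes for each member beyond 300, and the per-user size defaults to the approximate 1 KB figure (it can be raised toward 8 KB for heavily populated accounts).

```python
from typing import Iterable

KB = 1024


def global_group_bytes(members: int) -> int:
    """About 4 KB, plus 12 bytes for each member beyond the first 300."""
    return 4 * KB + 12 * max(0, members - 300)


def local_group_bytes(members: int) -> int:
    """512 bytes, plus 36 bytes for each member."""
    return 512 + 36 * members


def estimate_sam_bytes(
    users: int,
    computers: int,
    global_groups: Iterable[int],
    local_groups: Iterable[int],
    bytes_per_user: int = KB,
) -> int:
    """Estimate SAM size; the group arguments list each group's member count."""
    total = users * bytes_per_user
    total += computers * 512  # each computer account takes 512 bytes
    total += sum(global_group_bytes(m) for m in global_groups)
    total += sum(local_group_bytes(m) for m in local_groups)
    return total
```

For example, `estimate_sam_bytes(1000, 1000, [100, 1300], [10])` gives roughly 1.5 MB. The estimate should be compared against the actual SAM file size, because change history and account detail can shift the real figure considerably.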

Performance Monitor

The Windows NT Performance Monitor (Perfmon.exe) tool is a graphical tool for observing the performance counters on a computer running Windows NT. To use Performance Monitor, click Start, point to Programs, point to Administrative Tools, and then click Performance Monitor. Performance counters indicate the throughput, queue lengths, and congestion associated with devices and applications. Various objects in the system (such as memory, disks, and the CPU) update these counters. Applications can provide performance counters as well. Performance Monitor is the primary tool for monitoring performance on a computer running Windows NT. It can chart counters in real time, and can also save performance data to a log. For additional information, refer to the Performance Monitor Help file (Perfmon.hlp) or Chapter 10, "About Performance Monitor," in the Microsoft Windows NT Workstation 4.0 Resource Kit.

The overall goal of this section is to identify key Performance Monitor characteristics of a domain relative to the health and stability of the domain.

Server Overhead

Note that Performance Monitor places little or no load on a computer running Windows NT Server. Additional CPU use caused by Performance Monitor being run locally or remotely should account for less than 1 percent of the total system processor time.

Monitor Intervals

Typically, a production domain should have Performance Monitor running a log of all Performance Monitor counters 24 hours a day, 7 days a week. This process may be automated; additional information can be found in the Microsoft Windows NT Workstation 4.0 Resource Kit. The suggested capture interval for all counters is 5 minutes. This interval will generate a daily log file that can be stored and compared for trend analysis, incident troubleshooting, and service level assessment. On average, Performance Monitor generates a 12-MB log file each day (with all counters being recorded at a 5-minute interval).
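The arithmetic behind these figures can be sketched in a few lines of Python. The function names are hypothetical, and the 12 MB per day figure is the average quoted above, so the storage estimate is only approximate.

```python
def samples_per_day(interval_minutes: int = 5) -> int:
    """Number of counter samples logged per day at the given interval."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return (24 * 60) // interval_minutes


def log_storage_mb(days: int, mb_per_day: float = 12.0) -> float:
    """Approximate disk space needed to retain `days` of daily log files."""
    return days * mb_per_day
```

At the suggested 5-minute interval this is 288 samples per day, and keeping 30 days of logs for one domain controller takes about 360 MB.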

The following sections are included to highlight some key Performance Monitor counters that can assist in monitoring the health and status of a domain controller.

Memory Object

Detecting Memory Bottlenecks

The following Performance Monitor counters are key in determining if the amount of physical RAM is a server bottleneck.

Object: Memory
Counter: Available Bytes

The Available Bytes counter details the amount of virtual memory currently on the zeroed, free, and standby lists. Windows NT Server will try to keep at least 4 MB available at all times. If this counter is near 4 MB and the Pages/sec counter is high, there may not be enough RAM.

Note: If Windows NT Server is tuned to maximize throughput for file sharing, this limit can be as low as 1 MB.

Object: Memory
Counter: Pages/sec

Pages/sec is the number of pages read from or written to disk to resolve hard page faults. Hard page faults occur when a process requires code or data that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. A high level of paging activity is acceptable (Pages/sec greater than 150), but if the paging activity is associated with low available bytes, a problem may exist.
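The two counters are meant to be read together: low Available Bytes alone or heavy paging alone is not conclusive. A minimal Python sketch of that combined check follows. The function name and the `near_factor` margin used to decide what counts as "near 4 MB" are assumptions made for illustration; the 4 MB floor and the 150 Pages/sec level come from the text above.

```python
MB = 1024 * 1024


def memory_bottleneck_suspected(
    available_bytes: int,
    pages_per_sec: float,
    floor_mb: int = 4,            # use 1 if tuned for file-sharing throughput
    near_factor: float = 1.25,    # assumed margin for "near" the floor
    high_pages_per_sec: float = 150.0,
) -> bool:
    """True when Available Bytes is near the floor AND paging is high."""
    near_floor = available_bytes <= floor_mb * MB * near_factor
    return near_floor and pages_per_sec > high_pages_per_sec
```

In practice the inputs would be averages over a logging period rather than single samples.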

Object: Memory
Counter: Pool Nonpaged Bytes

Pool Nonpaged Bytes is the number of bytes in the nonpaged pool, an area of system memory (physical memory used by the operating system) for objects that cannot be written to disk but that must remain in physical memory as long as they are allocated. The system may be overloaded if this value is greater than 115 MB or if the sum of the paged and nonpaged pools totals 256 MB.

Note: This counter displays the last observed value only; it is not an average.

Object: Memory
Counter: Pool Paged Bytes

Pool Paged Bytes is the number of bytes in the paged pool, an area of system memory (physical memory used by the operating system) for objects that can be written to disk when they are not being used. The system may be overloaded with a large SAM size or a large number of user sessions if this value is greater than 156 MB or if the sum of the paged and nonpaged pools totals 256 MB.

Note: This counter displays the last observed value only; it is not an average.
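The pool thresholds described for these two counters can be checked together. The Python sketch below is illustrative; the function name `pool_warnings` and the message wording are hypothetical, while the 115 MB, 156 MB, and 256 MB values are the ones stated above.

```python
from typing import List

MB = 1024 * 1024
NONPAGED_WARN_BYTES = 115 * MB
PAGED_WARN_BYTES = 156 * MB
COMBINED_LIMIT_BYTES = 256 * MB


def pool_warnings(pool_paged_bytes: int, pool_nonpaged_bytes: int) -> List[str]:
    """Return a warning for each pool threshold that has been crossed."""
    warnings = []
    if pool_nonpaged_bytes > NONPAGED_WARN_BYTES:
        warnings.append("nonpaged pool above 115 MB")
    if pool_paged_bytes > PAGED_WARN_BYTES:
        warnings.append("paged pool above 156 MB")
    if pool_paged_bytes + pool_nonpaged_bytes >= COMBINED_LIMIT_BYTES:
        warnings.append("paged + nonpaged pools at 256 MB limit")
    return warnings
```

Because both counters report the last observed value, the check is best applied to every logged sample rather than to an average.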

Object: Server
Counter: Pool Nonpaged Failures

The Pool Nonpaged Failures counter records the number of times allocations from the nonpaged pool have failed. If this value is greater than 1, the server does not have enough physical memory.

Object: Server
Counter: Pool Paged Failures

The Pool Paged Failures counter records the number of times allocations from the paged pool have failed. If this value is greater than 1, there is not enough physical memory or the paging file is too small.

Processor Object

Detecting CPU Bottlenecks

Two Performance Monitor counters are key in determining whether or not the CPU is a bottleneck on a particular server. They are:

Object: Processor
Counter: % Processor Time

The % Processor Time counter is the percentage of time that the processor is running a nonidle thread.

Object: Server Work Queues
Counter: Queue Length

The Queue Length counter in Server Work Queues is the current length of the server work queue for a particular CPU. This count is a snapshot; it is not an average over time.

The general rule of thumb is that if an individual processor is running consistently above 85 percent and the average Server Work Queues length is consistently higher than 3, the CPU is throttling (limiting) the performance of the server. For a multiple-CPU server (one with symmetric multiprocessing, or SMP), the equivalent guideline is that the CPUs are a bottleneck when % Total Processor Time is consistently greater than 85 percent and the aggregate queue length is greater than 2 times the number of CPUs in the server.
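This rule of thumb can be applied to a series of logged samples. The Python sketch below is one possible reading, not a definitive test: the function name is hypothetical, "consistently above 85 percent" is interpreted as every processor-time sample exceeding 85, and the queue length is averaged over the period. For an SMP server, pass % Total Processor Time samples and aggregate queue lengths.

```python
from statistics import mean
from typing import Sequence


def cpu_bottleneck(
    processor_time_pct: Sequence[float],
    queue_lengths: Sequence[float],
    cpu_count: int = 1,
) -> bool:
    """Apply the 85 percent / queue-length rule of thumb to logged samples."""
    if not processor_time_pct or not queue_lengths:
        raise ValueError("at least one sample of each counter is required")
    # Single CPU: queue above 3. SMP: aggregate queue above 2 x number of CPUs.
    queue_threshold = 3 if cpu_count == 1 else 2 * cpu_count
    busy = all(pct > 85 for pct in processor_time_pct)
    return busy and mean(queue_lengths) > queue_threshold
```

Requiring both conditions avoids flagging a server that is simply busy but is still keeping up with its work queue.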

To capture a possible CPU bottleneck in Performance Monitor, an administrator must run Performance Monitor over an extended period of time as well as in a granular fashion. Extended monitoring may not highlight the points of time when counters are unusually or unacceptably high; these times can be better captured with granular monitoring. Therefore, it may be necessary to run multiple instances of Performance Monitor against a single server.

Monitoring CPU Usage on a Domain Controller

Depending on the replication schedule of a domain controller, both a primary domain controller (PDC) and a backup domain controller (BDC) will encounter high levels of CPU usage during times of replication on a large SAM domain controller. In general, no cause for alarm is warranted when CPU levels rise to 80 to 90 percent levels for several minutes during times of replication. Administrators should be concerned when the high level CPU usage extends for longer than 5 minutes and rises above 95 percent usage.
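A logged series of % Processor Time samples can be scanned for the condition that warrants concern. The Python sketch below assumes the guidance means a continuous run above 95 percent lasting longer than 5 minutes; the function name and parameters are hypothetical.

```python
from typing import Iterable


def replication_cpu_alarm(
    samples_pct: Iterable[float],
    interval_seconds: float,
    threshold_pct: float = 95.0,
    max_minutes: float = 5.0,
) -> bool:
    """True if CPU stays above the threshold for longer than max_minutes."""
    run = 0  # consecutive samples above the threshold
    for pct in samples_pct:
        run = run + 1 if pct > threshold_pct else 0
        if run * interval_seconds > max_minutes * 60:
            return True
    return False
```

This check needs a sampling interval much shorter than the 5-minute logging interval suggested earlier, which is one reason a second, more granular Performance Monitor instance may be required.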

Interrupt Driven Bottlenecks

All hardware devices installed on a server generate interrupts that must be processed by a CPU. It is often necessary to monitor the number of such interrupts to determine if a CPU bottleneck is caused by hardware interrupts. These interrupts are monitored through Performance Monitor by watching the % Interrupt Time and Interrupts/sec counters in the Processor object. A normal server without any load will have 100 to 200 interrupts each second. A server under heavy load may log as many as 9,000 interrupts each second.

Network Interface Object

Detecting Network Bottlenecks

The following five main Performance Monitor counters can assist in determining whether or not a network interface is a contributing bottleneck on a particular server.

Object: Network Interface
Counter: Output Queue Length

Output Queue Length is the length of the output packet queue in number of packets. Generally, if this value is greater than 3 for sustained periods of time or for more than 15 minutes, the selected network interface may be a performance bottleneck.

Note: This value may not be valid in an SMP environment.

Object: Network Interface
Counter: Bytes Total/sec

Bytes Total/sec is the rate at which bytes are sent and received on a selected network interface, including framing characters. If this value is close to the maximum transfer rate for a selected network and the Output Queue Length is greater than 3, the selected network interface may be a bottleneck.

Object: Network Interface
Counter: Current Bandwidth

Current Bandwidth is an estimate of the interface's current bandwidth in bits per second (bps). For interfaces that do not vary in bandwidth, or for those where no accurate estimate can be made, this value is the nominal bandwidth. This counter is used primarily with Bytes Total/sec to determine network usage levels.
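Because Bytes Total/sec is reported in bytes and Current Bandwidth in bits per second, the two must be put on the same scale before they are compared. The Python sketch below shows the conversion and the combined check with Output Queue Length. The function names and the 90 percent value used for "close to the maximum transfer rate" are assumptions for illustration.

```python
def interface_utilization_pct(
    bytes_total_per_sec: float, current_bandwidth_bps: float
) -> float:
    """Interface usage as a percentage of the reported bandwidth."""
    if current_bandwidth_bps <= 0:
        raise ValueError("current_bandwidth_bps must be positive")
    return bytes_total_per_sec * 8 / current_bandwidth_bps * 100


def interface_bottleneck_suspected(
    bytes_total_per_sec: float,
    current_bandwidth_bps: float,
    output_queue_length: float,
    near_max_pct: float = 90.0,  # assumed meaning of "close to the maximum"
) -> bool:
    """True when usage is near the maximum AND the output queue exceeds 3."""
    usage = interface_utilization_pct(bytes_total_per_sec, current_bandwidth_bps)
    return usage >= near_max_pct and output_queue_length > 3
```

For example, 625,000 bytes per second on a 10-megabit interface is 50 percent utilization.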

Object: Network Segment
Counter: % Network Utilization

The % Network Utilization counter details the percentage of network bandwidth in use on a given network segment.

Object: Network Interface
Counter: Packets Outbound Errors and Packets Received Errors

Packets Outbound Errors and Packets Received Errors are, respectively, the number of outbound packets that could not be transmitted because of errors and the number of inbound packets that contained errors preventing them from being delivered to a higher-layer protocol. If either counter is greater than 1, the selected network interface may be experiencing errors.

Note: The Network Interface and Network Segment objects are not installed by default on Windows NT Server. The SNMP Agent service and the Network Monitor Tools and Agent must be installed prior to collecting these counters.

Ethernet Network Usage

The percentage of Ethernet network usage at which performance degrades varies from environment to environment. Typically, severe performance degradation occurs when network usage rises above the 50 to 70 percent range.

Network-generated Interrupt Requests

In a multiple CPU computer running Windows NT Server, a single CPU may become overloaded servicing network-generated interrupt requests. Load balancing across multiple processors occurs only when multiple network interface cards (NICs) are also installed. The reason for this is that a single CPU is assigned to handle interrupts for a particular NIC at boot time.

Note: Multiple NICs servicing the same network segment will not affect network performance if the network segment is already at saturation.

Server Object

Determining the Number of Domain Controllers

A conservative recommendation for domain controller usage in a production environment is to have one domain controller available for every 2,000 user accounts. This estimate is in addition to the minimum of one PDC and one BDC that any domain needs for fault tolerance. One BDC for every 2,000 user accounts should provide enough domain controllers to handle normal logon validation requests. To ensure that there are enough domain controllers to service users' logon needs, Performance Monitor counters are available to monitor logon requests.

Object: Server
Counter: Logon/sec

This counter records the number of logon requests each second. The counter includes both successful and unsuccessful attempts for interactive, network, and service account logon requests.

Object: Server
Counter: Logon Total

This counter details the total number of interactive, network, and service account logon requests, both successful and unsuccessful, since the server was last started.
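The conservative sizing recommendation at the start of this section reduces to simple arithmetic. The Python sketch below follows one reading of it: a baseline of one PDC and one BDC for fault tolerance, plus one additional BDC for every 2,000 user accounts (rounded up). The function name is hypothetical.

```python
import math


def recommended_domain_controllers(
    user_accounts: int, users_per_dc: int = 2000
) -> int:
    """Baseline PDC + BDC, plus one BDC per 2,000 accounts (rounded up)."""
    if user_accounts < 0:
        raise ValueError("user_accounts must not be negative")
    baseline = 2  # one PDC and one BDC for fault tolerance
    return baseline + math.ceil(user_accounts / users_per_dc)
```

A 10,000-account domain would start with 7 domain controllers under this reading. Logon/sec measured during peak hours should then confirm whether that number is adequate.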

Remote Administration and Monitoring

Many tasks such as remote administration and performance monitoring can be performed remotely in a Windows NT domain. Often, command-line tools and utilities can be more efficient in a WAN or RAS environment for large domains. For additional information on this topic, see the Microsoft white paper, "Windows NT 4.0 Remote Troubleshooting and Diagnostics."

Network Monitoring

Network monitoring is an important consideration for both troubleshooting and network planning. Network monitoring of key processes, triggers, or events may dictate necessary requirements for server hardware or network bandwidth. Likewise, isolation of a server bottleneck or a performance problem may require "sniffing the network" to determine causes or factors. One of the key tools included with Windows NT Server 4.0 is the Network Monitor tool, also known as Netmon.

Network Monitor Overview

Network Monitor is a graphical tool included with both Windows NT Server 4.0 and Systems Management Server that monitors the network data stream, which consists of all information transferred over a network at any given time. Prior to transmission, this information is divided by the network software into smaller pieces, called frames or packets. Each frame contains the following information:

  • The source address of the computer that sent the message

  • The destination address of the computer that received the frame

  • Headers from each protocol used to send the frame

  • The data or a portion of the information being sent

To ensure that security is maintained on your Windows NT network, Windows NT Network Monitor displays only those frames sent to or from your computer, broadcast frames, and multicast frames. For additional functionality, the version of Netmon included with Systems Management Server must be used.

For instructions on how to set up Network Monitor, see Microsoft Knowledge Base article 148942, "How to Capture Network Traffic with Network Monitor."

Two primary processes generate a network load across a LAN or WAN. They are:

  • Logon authentication

  • SAM replication

The following sections describe key aspects of how a domain may affect a network infrastructure.

Primary Domain Controller Dictates Replication

The primary domain controller (PDC) of a given domain is the focal point for all operations that occur in the domain. Before the replication process starts, the PDC requires a large amount of CPU time to calculate what changes need to be replicated where in the domain. Normally, this replication process will work without exception; however, if the Change Log is overrun before all backup domain controllers (BDCs) are synchronized, a full "complete replication" synchronization is then required. This activity can be a cascading event if all BDCs then need a full synchronization instead of the delta changes. The Change Log size should be increased as stated in the Change Log section of this white paper.

It is almost always recommended that a PDC be a symmetric multiprocessing (SMP) computer in a large domain environment. If a PDC is running in a single processor environment, the replication process requires a high percentage of CPU time. If a bad password check, passthrough authentication, or similar task occurs during the replication process, the PDC halts the calculation process to answer the external request. After the external request has been completed, the PDC must restart the replication calculation process from the beginning. To prevent this problem, it is highly recommended that the PDC be a multiprocessor computer.

When you use more than one processor in a PDC, the lowest processor (the processor with the lowest number) will work to process external requests. All other processors will focus on the replication calculation process and will not be halted because of external requests. The lowest processor will assist in the replication calculation process when the exceptions (external requests) have been completed. However, the lowest processor starts its calculation over after each interruption; if another processor has already completed the synchronization, the lowest processor ends its process and does not create duplicate replication network traffic.

With a large number of BDCs (more than 50), the PDC will take resources for a longer period of time. This period of time will increase exponentially with the number of accounts, BDCs, and changes. The PDC is designed to handle this stress and will never flood the network with changes. It will take its time and pass out the updated information in an orderly fashion.

Note that the starting and stopping of replications never floods the network. The PDC breaks the deltas into small manageable packets of no more than 255 deltas in a remote procedure call (RPC) transmission. Therefore, the packets pulsate through the network, with the highest bandwidth being the local segment that the PDC is on, but it will never flood that segment.
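The batching behavior can be illustrated with a short Python sketch. It only models the 255-delta cap described above; the function names are hypothetical, and the real packet contents and RPC interface are not represented.

```python
import math
from typing import List, Sequence

MAX_DELTAS_PER_RPC = 255


def rpc_transmissions(delta_count: int) -> int:
    """Number of RPC transmissions needed to send `delta_count` deltas."""
    if delta_count < 0:
        raise ValueError("delta_count must not be negative")
    return math.ceil(delta_count / MAX_DELTAS_PER_RPC)


def batch_deltas(deltas: Sequence) -> List[Sequence]:
    """Split pending deltas into batches of at most 255."""
    return [
        deltas[i : i + MAX_DELTAS_PER_RPC]
        for i in range(0, len(deltas), MAX_DELTAS_PER_RPC)
    ]
```

For example, 600 pending deltas are sent as three transmissions of 255, 255, and 90 deltas.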

The NetLogon Service and How It Works

The NetLogon service maintains and controls both replication and authentication services. Within the NetLogon service, the PDC maintains the list of account deltas in a circular list of 64 entries (the buffer holding 4,096 bytes). These entries are account deltas of 3 types:

  • User delta: changes to user accounts

  • Group delta: changes to group accounts

  • Delta mod: changes to security settings in Net Admin

Each entry in this list of account deltas has an update value corresponding to it. As the NetLogon service runs, the following sequence is sent to make updates to the User Accounts Subsystem (UAS):

  1. The PDC sends a /PULSE onto the network as a second-class mailslot message (\\*\Mailslot\Net\Netlogon). The /PULSE parameter is ignored unless the server is a PDC. This message contains the value of the most recent update made to the UAS (known as the Y value), the date and time of the notice, the name of the domain, and the values for /PULSE and /RANDOMIZE. A BDC or member server may miss this call because the call's delivery is not guaranteed.

  2. When a BDC or member server receives this message from the PDC, it compares this Y value to its current updated value (known as X). The BDC's or member's UAS will need to be updated if the following equation is true:

    X + 1 < Y

    The BDC or member then sends an I_NetAccountDeltas() call and sends a Server Message Block (SMB) to the PDC requesting that a null session be established and a UAS update commence. BDCs and members use the /RANDOMIZE parameter to determine how long they should wait before requesting updates from the PDC. This is to prevent the PDC from receiving a flood of requests from all the replicated UAS systems. If, however, Y is less than X, the BDC sends an I_NetAccountSync() call, requesting that the entire UAS be replicated because the BDC thinks that it has missed a change in the delta list. BDCs and members never poll the PDC for updates; they always wait for update notices from the PDC.

    Note: Members and BDCs assume the PDC has crashed if they don't receive a pulse from the PDC within one minute of the time they expect a pulse. To confirm this failure, the BDC or member attempts an I_NetAccountDeltas() call. If the failure is confirmed, a message is posted in the error log reporting the PDC crash. After 60 minutes, the service clears the primary flag and treats the next pulse as a new failure.

  3. The PDC can then grant a null session to the requesting server and start transmitting the account deltas to that server until the following equation becomes false:

    X + 1 < Y
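The decision a BDC or member makes when it receives a pulse (step 2 above) can be sketched as a small Python function. This is a simplified model under stated assumptions: the enum and function names are hypothetical, only the comparison of X and Y is represented, and the /RANDOMIZE delay, null session, and actual calls are left out.

```python
from enum import Enum


class SyncAction(Enum):
    NONE = "no update needed"
    PARTIAL = "I_NetAccountDeltas"  # request the account deltas
    FULL = "I_NetAccountSync"       # request the entire UAS


def action_on_pulse(x: int, y: int) -> SyncAction:
    """x is the receiver's current update value; y is the PDC's pulse value."""
    if y < x:
        # The receiver believes it has missed a change in the delta list.
        return SyncAction.FULL
    if x + 1 < y:
        return SyncAction.PARTIAL
    return SyncAction.NONE
```

The PDC then transmits deltas until `x + 1 < y` is no longer true for that server, which corresponds to `action_on_pulse` returning `SyncAction.NONE`.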

Change Log

The Change Log maintains a sequential record of any new or changed passwords, new or changed user and group accounts, and any change in the associated group memberships and user rights. Also, any additions or deletions of machine accounts or domain controllers in the domain are recorded. The Change Log is stored both in memory and on the PDC hard disk at %SystemRoot%\Netlogon.chg. The Change Log is 64 KB in size by default, and can be as large as 4 MB. The default size of 64 KB can support approximately 2,000 changes, and the maximum size can support approximately 130,000 changes. A change averages 32 bytes in size. The size of the Change Log should not degrade performance on domain controllers with 64 MB or more of RAM.

The size of the Change Log determines how many changes can exist before a full synchronization is required on a BDC. When a BDC requests changes, the changes that occurred since the last synchronization are copied to the BDC. Because the Change Log is a circular log, only the most recent changes exist in the log at any given time. Older changes are sequentially replaced with newer changes. If a BDC is offline for a long period of time or if network connectivity problems prevent BDC to PDC communication, the number of changes may exceed the size of the Change Log and a full synchronization must occur. Full synchronization will also occur if more changes are recorded than the log file size can hold at one time.
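The capacity figures follow directly from the 32-byte average change size. The Python sketch below reproduces them and estimates a Change Log size for an expected number of changes, clamped to the 64 KB default and 4 MB maximum. The function names are hypothetical, and because 32 bytes is only an average, some headroom beyond the computed value is advisable.

```python
AVG_CHANGE_BYTES = 32
DEFAULT_LOG_BYTES = 64 * 1024        # 65,536
MAX_LOG_BYTES = 4 * 1024 * 1024      # 4,194,304


def changes_supported(log_bytes: int) -> int:
    """Approximate number of changes a Change Log of this size can hold."""
    return log_bytes // AVG_CHANGE_BYTES


def change_log_size_for(expected_changes: int) -> int:
    """Log size (bytes) for the expected changes, within the allowed range."""
    needed = expected_changes * AVG_CHANGE_BYTES
    return min(max(needed, DEFAULT_LOG_BYTES), MAX_LOG_BYTES)
```

The default log holds 2,048 changes and the maximum holds 131,072, matching the approximate 2,000 and 130,000 figures above. The number of changes to plan for is the number expected between successful synchronizations of the slowest or least-connected BDC.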

Therefore, it is sometimes necessary to manually set the ChangeLogSize to a higher value to prevent unnecessary full synchronization events. The Windows NT registry can be modified to change the default size of the Change Log.

Warning: Using Registry Editor incorrectly can cause serious problems that may require you to reinstall your operating system. Microsoft cannot guarantee that problems resulting from the incorrect use of Registry Editor can be solved. Use Registry Editor at your own risk.

For information about how to edit the registry, view the "Changing Keys And Values" Help topic in Registry Editor (Regedit.exe) or the "Add and Delete Information in the Registry" and "Edit Registry Data" Help topics in Regedt32.exe. Note that you should back up the registry before you edit it. If you are running Windows NT, you should also update your Emergency Repair Disk (ERD).

The Change Log size is set in the following key:

HKEY_LOCAL_MACHINE \Services \NetLogon \Parameters

Value: ChangeLogSize REG_DWORD
Range: 65536 to 4194304 (bytes)
Default: 65536

Note: A domain controller must be restarted before Change Log modifications will take effect.

Partial and Full Synchronization

Two types of synchronization occur within the NetLogon replication service: full synchronization and partial synchronization. Both types occur automatically and are controlled by the PDC. Partial synchronization replicates all changes to the SAM that have occurred since the previous partial or full synchronization. This type of synchronization occurs by default every five minutes and includes all changes recorded in the Change Log. Full synchronization replicates the entire SAM to a given BDC. Full synchronization occurs whenever a new BDC is added to a domain or when more changes have been made on the PDC than can be recorded in the Change Log. For additional information on replication and the Change Log, see the "The NetLogon Service and How It Works" section in this white paper.

Events That Cause Immediate Replication

Security accounts manager (SAM) and local security authority (LSA) replication can fall into a number of categories, including immediate (or urgent) replication. Most replication handled by the NetLogon service occurs at set intervals, but certain types of account or policy changes are considered urgent and must be handled immediately, causing an ANNOUNCE_IMMEDIATE event to be generated and acted upon by the PDC. The following replications are considered urgent:

  • Changing the account lockout policy.

  • Changing the domain password policy.

  • Changing the password on a machine account.

  • Replicating a newly locked-out account.

  • Changing an LSA secret (essentially the "trusting" side of changing the machine account password).

These changes are immediate by necessity. For example, if a workstation were to change its machine account password and then lose its connection to its domain controller, it would not be able to connect to any other domain controller until the replication occurred.

Machine Account Replication

For each Windows NT Workstation computer that is a member of a domain, there is a discrete communication channel (the secure channel) with a domain controller. The secure channel's password is stored along with the computer account on the PDC, and is replicated to all BDCs. The password is also stored in the LSA secret $machine.acc on the workstation. Each workstation holds its own such secret.

Windows NT Workstation changes this password every seven days. Before the password change is considered successful, it must be registered with the PDC. If the PDC is not accessible, Windows NT Workstation caches the new password for three more days and continues the attempt to register it with the PDC. The password on the Windows NT Workstation computer will expire if seven days have passed and the workstation does not send a secure channel password change to the PDC. Computer account password changes are marked as "Announce Immediate" so that each time a computer account password is modified, a replication takes place immediately.

For example, if a domain has 1,000 workstations, a computer account password change will occur every:

1 week / 1,000 = (7 * 24 * 60) minutes / 1,000 = 10,080 minutes / 1,000 = approximately 10 minutes

Therefore, a SAM replication takes place every 10 minutes, regardless of the replication interval defined on the PDC (for example, with the Pulse and PulseMaximum registry settings). Replication may be expensive in situations where BDCs are segmented into different subnets, interconnected by routers.
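The arithmetic above generalizes to any domain size. The following Python sketch is illustrative only: the function name is hypothetical, the seven-day default is taken from the password change interval described above, and the calculation assumes that password changes are spread evenly across the week.

```python
def minutes_between_password_changes(workstations, change_interval_days=7):
    """Average minutes between machine account password changes in a domain.

    Assumes every workstation changes its password once per interval and
    that the changes are evenly distributed over that interval.
    """
    if workstations <= 0:
        raise ValueError("workstations must be a positive number")
    minutes_per_interval = change_interval_days * 24 * 60
    return minutes_per_interval / workstations


if __name__ == "__main__":
    for count in (1000, 10000, 30000):
        print(count, "workstations:",
              round(minutes_between_password_changes(count), 2), "minutes")
```

For 1,000 workstations the function returns 10.08 minutes, which the text rounds to 10 minutes.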

For additional information on how to modify these settings, see Microsoft Knowledge Base article 175468, "Effects of Machine Account Replication on a Domain." For additional information on how to disable machine account password changes, see Microsoft Knowledge Base article 154501, "How to Disable Automatic Machine Account Password Changes."

User Accounts Database Synchronization

User accounts database synchronization occurs on three databases maintained by the system: the security accounts manager (SAM) accounts database, the SAM built-in database, and the local security authority (LSA) database. Contents of these databases are listed in the following table.

Database

Description

SAM accounts database

Contains the user and group accounts that the administrator creates. Also includes all built-in global groups and computer accounts added to the domain, such as domain controllers and Windows NT Workstation-based computers.

SAM built-in database

Contains the built-in local group accounts, such as Administrators, Users, and Guests.

LSA database

Contains the LSA secrets that are used for trust relationships and domain controller computer account passwords. Also included in the LSA database are the account policy settings configured by the administrator.

Replication Intervals

Domain replication intervals are dictated by a number of parameters that are defined in the registry of the PDC. In a large domain or in a domain that encompasses a large wide area network (WAN), manual changes to these parameters may be necessary to optimize the performance and efficiency of the domain replication process. All of the parameters can be found in the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Netlogon\Parameters section of the registry.

The following table details some of the key parameters and how they may be used.

Parameter

Range

Default

Description

Pulse
(REG_DWORD)

60 to 172,800
(seconds)

300

The Pulse parameter dictates the interval at which the PDC will check the Change Log for updates and send synchronization messages to the BDCs that require updates from the PDC. The PDC does not send messages to BDCs that do not require synchronization. The PDC will also send a mandatory message (or pulse) to each BDC whenever the PDC is restarted or the NetLogon service is started. Increasing this parameter reduces the number of times a PDC will notify the BDCs of changes to the SAM.

PulseConcurrency
(REG_DWORD)

1 to 500
(domain controllers)

10

The PulseConcurrency parameter dictates the maximum number of domain controllers to which the PDC may simultaneously send a pulse message. Increasing this parameter decreases the time it takes to synchronize a large domain. However, it creates a corresponding increased load on the PDC and the network.

PulseMaximum
(REG_DWORD)

60 to 86,400
(seconds)

7,200

The PulseMaximum parameter dictates the frequency with which the PDC will send a mandatory message to each BDC. The PDC sends a pulse even if no changes have been made to the PDC. Increasing this parameter reduces the number of times a PDC will try to contact each BDC.

PulseTimeout1
(REG_DWORD)

1 to 120
(seconds)

10

The PulseTimeout1 parameter defines how long the PDC waits for a BDC to respond to a pulse before considering it unresponsive. A BDC that does not respond within this interval is not counted against the PulseConcurrency limit. Increasing this parameter allows more time for BDCs to respond in a WAN environment. However, this increase may cause a partial replication to take longer to complete. Decreasing this parameter may cause the PDC to incorrectly flag BDCs as unresponsive.

PulseTimeout2
(REG_DWORD)

60 to 3,600
(seconds)

300

The PulseTimeout2 parameter defines the length of time a PDC allows for a BDC to complete partial replication. A BDC resets this timer every time it makes an RPC call to the PDC during replication. Increasing this parameter allows BDCs additional time to perform replication in a WAN environment.
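Before changing any of these values, it can help to check a proposed configuration against the ranges listed in the table. The following Python sketch is a minimal illustration: the dictionary simply restates the ranges and defaults from the table above, and the function name is hypothetical. It does not read or write the registry.

```python
# (minimum, maximum, default) for each parameter, copied from the table above.
NETLOGON_PARAMETERS = {
    "Pulse": (60, 172800, 300),            # seconds
    "PulseConcurrency": (1, 500, 10),      # domain controllers
    "PulseMaximum": (60, 86400, 7200),     # seconds
    "PulseTimeout1": (1, 120, 10),         # seconds
    "PulseTimeout2": (60, 3600, 300),      # seconds
}


def check_netlogon_settings(settings):
    """Return a list of problems found in a dict of proposed values."""
    problems = []
    for name, value in settings.items():
        if name not in NETLOGON_PARAMETERS:
            problems.append(f"{name}: not a parameter listed in the table")
            continue
        low, high, _default = NETLOGON_PARAMETERS[name]
        if not low <= value <= high:
            problems.append(f"{name}: {value} is outside {low} to {high}")
    return problems


if __name__ == "__main__":
    proposed = {"Pulse": 3600, "PulseConcurrency": 600}
    for problem in check_netlogon_settings(proposed):
        print(problem)
```

An empty result means only that the values fall within the documented ranges; whether they suit a particular WAN still has to be judged from replication behavior.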

Relative Impact of Replication Traffic on the Network

When a BDC requests changes, it informs the PDC of the last change it received so that the PDC is always aware of which BDC needs to be updated. If a BDC is up-to-date, the NetLogon service on the BDC does not request changes.

Synchronization of the user accounts databases occurs:

  • When a backup domain controller is installed or restarted in the domain.

  • When forced by the administrator using Server Manager.

  • Automatically by domain controllers, depending on the registry configuration.

Replication produces approximately the following traffic:

  • 1 user account = 1 KB

  • 1 group definition = 1 KB

  • 1 group member replication = 8 bytes * (number of users)

Note: All members of a group are always replicated after you add or remove a user to or from a group.

The total replication traffic generated by a given domain is affected by a number of factors such as the number of user accounts in a domain, the number of changes each day, and so on. The following tables show some basic guidelines with regard to replication traffic and how these guidelines may affect a network:

Number of user accounts   Password change interval   Average number of changes each day   Replication traffic each day for each BDC
10,000                    90 days                    111                                  111 KB
20,000                    90 days                    222                                  222 KB
10,000                    60 days                    166                                  166 KB
20,000                    60 days                    333                                  333 KB
10,000                    30 days                    333                                  333 KB
20,000                    30 days                    666                                  666 KB

Note: New user accounts will generate approximately 1 KB per day per BDC for every new account created.
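The user account figures above are consistent with dividing the number of accounts by the password change interval and allowing roughly 1 KB per changed account. The following Python sketch reproduces that estimate. The function name is hypothetical, and the calculation assumes that password changes are spread evenly across the interval and that results are rounded down, as the table values appear to be.

```python
def daily_password_change_traffic(user_accounts, password_interval_days,
                                  kb_per_change=1):
    """Estimate (changes per day, KB per day per BDC) from password changes.

    Uses the guideline of about 1 KB of replication traffic per changed
    user account; values are rounded down to whole changes.
    """
    if password_interval_days <= 0:
        raise ValueError("password_interval_days must be positive")
    changes_per_day = user_accounts // password_interval_days
    return changes_per_day, changes_per_day * kb_per_change


if __name__ == "__main__":
    for accounts, interval in ((10000, 90), (20000, 60), (20000, 30)):
        changes, kb = daily_password_change_traffic(accounts, interval)
        print(accounts, "accounts,", interval, "days:",
              changes, "changes,", kb, "KB per BDC")
```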

Number of machine accounts   Number of changes each day   Replication traffic generated each day for each BDC
10,000                       1,428                        1,429 KB
20,000                       2,856                        2,859 KB
30,000                       4,285                        4,287 KB

Number of local groups changed each day   Average number of users in each group   Number of changes each day   Replication traffic each day for each BDC
100                                       100                                     10,000                       354 KB
200                                       100                                     20,000                       706 KB
100                                       500                                     50,000                       1,760 KB
50                                        1,000                                   50,000                       1,760 KB

Number of global groups changed each day   Average number of users in each group   Number of changes each day   Replication traffic each day for each BDC
100                                        100                                     10,000                       119 KB
200                                        100                                     20,000                       237 KB
50                                         500                                     25,000                       295 KB
50                                         1,000                                   50,000                       588 KB

Replication Frequency

As shown in the tables above, user accounts database synchronization can generate large amounts of network traffic if many updates are made to the accounts database. The frequency of synchronization and amount of traffic generated depends on configuration of the NetLogon service, which is responsible for carrying out synchronization. The NetLogon service of the PDC defines the pulse (replication) frequency in seconds in the following registry key:

Warning: Using Registry Editor incorrectly can cause serious problems that may require you to reinstall your operating system. Microsoft cannot guarantee that problems resulting from the incorrect use of Registry Editor can be solved. Use Registry Editor at your own risk.

For information about how to edit the registry, view the "Changing Keys And Values" Help topic in Registry Editor (Regedit.exe) or the "Add and Delete Information in the Registry" and "Edit Registry Data" Help topics in Regedt32.exe. Note that you should back up the registry before you edit it. If you are running Windows NT, you should also update your Emergency Repair Disk (ERD).

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NetLogon\Parameters

Value:

Pulse REG_DWORD

Range:

60 to 172,800 seconds (48 hours)

Default:

300

All SAM or LSA changes made within this time are bundled together. After this period has elapsed, a pulse is sent to each BDC needing the changes. No changes are sent to a BDC that is up-to-date. Increasing this value on the PDC reduces the number of replications between the PDC and the BDCs. Independently of Pulse, the PulseMaximum parameter ensures that every BDC is sent at least one pulse within each PulseMaximum interval, whether or not its database is current.

Note: Replication takes place immediately if a change is made in LSA secrets (for example, when adding a workstation to the domain or changing trust relationships).

For additional information, see Microsoft Knowledge Base article 150350, "NetLogon Maximum Value of Pulse Should Exceed 3600."
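As an alternative to editing the value interactively, the change can be prepared as a registration file and reviewed before it is imported. The following Python sketch is illustrative only: the function name is hypothetical, the key path is written out in full, and it assumes the REGEDIT4 text format accepted by Regedit.exe. The warning above about editing the registry applies equally to imported files.

```python
NETLOGON_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
    r"\Services\Netlogon\Parameters"
)


def pulse_reg_file(pulse_seconds):
    """Return REGEDIT4 file text that sets the Pulse value (REG_DWORD)."""
    # Range taken from the Pulse description above.
    if not 60 <= pulse_seconds <= 172800:
        raise ValueError("Pulse must be between 60 and 172,800 seconds")
    return (
        "REGEDIT4\n"
        "\n"
        f"[{NETLOGON_KEY}]\n"
        # DWORD data is written as eight hexadecimal digits.
        f"\"Pulse\"=dword:{pulse_seconds:08x}\n"
    )


if __name__ == "__main__":
    print(pulse_reg_file(600))
```

The generated text can be saved with a .reg extension on a test system first, so that the resulting value can be confirmed in Registry Editor before it is applied to a production PDC.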

Forcing Full Synchronization

In Windows NT 3.5x, full synchronization replication could be forced by selecting a BDC in Server Manager and clicking Synchronize with Primary Domain Controller on the Computer menu. In Windows NT 4.0, this procedure triggers a partial synchronization if both the PDC and BDC are running Windows NT Server 4.0. Unless otherwise manually initiated, a full synchronization occurs only when the NetLogon service is started on a BDC. This synchronization is performed by the BDC as it makes a request for a full update of the user accounts subsystem (UAS) from the PDC by making a NetAccountSync() call.

Full synchronization can be manually initiated by typing the following command at a command prompt on the BDC:

net accounts /sync

Full synchronization can also be initiated by using the NLTEST utility from the Windows NT Server 4.0 resource kit. Use of the /sync option forces the BDC to dump the current copy of the SAM and request a new one:

nltest /sync /server:BDC_name

Note: The administrator can select to receive only the changes since the last replication by replacing the /sync parameter with the /repl parameter.

Enabling Full SAM Database Synchronization Every Time Windows NT Starts

By default, when a Windows NT BDC starts, it doesn't attempt to synchronize the SAM database with the PDC until the replication interval expires. Even then, the BDC performs only a partial synchronization by default. You can add the Update value (data type REG_SZ) to the following registry key to guarantee that a full synchronization occurs every time a Windows NT 4.0 domain controller starts:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters

You can supply two settings, Yes and No, to the Update value. A setting of Yes enables the full synchronization option, and a setting of No disables this option.

Verifying That Replication Has Completed

The following command can be used to verify that synchronization has occurred, without actually initiating synchronization:

nltest /bdc_query:DomainName

Replication Governor

The ReplicationGovernor registry entry defines both the size of the data transferred on each call to the PDC and the frequency of those calls. For instance, setting ReplicationGovernor to 50 percent will use a 64-KB buffer rather than a 128-KB buffer, and will have a replication call outstanding on the network a maximum of 50 percent of the time. In effect, a 64-KB buffer will slow down the process of replication as a whole.

Warning: If the ReplicationGovernor setting is set too low for the environment, replication may never complete successfully.
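The 50 percent example above can be expressed as a small calculation. The following Python sketch is an illustration under a stated assumption: the text gives only the 100 percent (128-KB) and 50 percent (64-KB) cases, so scaling the buffer linearly for other settings is assumed here rather than documented. The function name is hypothetical.

```python
FULL_BUFFER_KB = 128  # buffer size at 100 percent, per the text above


def governor_effect(percent):
    """Return (buffer size in KB, maximum fraction of time a call is outstanding).

    Assumes both quantities scale linearly with the ReplicationGovernor
    percentage; only the 50 and 100 percent cases come from the text.
    """
    if not 0 <= percent <= 100:
        raise ValueError("ReplicationGovernor must be between 0 and 100")
    buffer_kb = FULL_BUFFER_KB * percent / 100
    max_busy_fraction = percent / 100
    return buffer_kb, max_busy_fraction


if __name__ == "__main__":
    for setting in (100, 50, 25):
        print(setting, governor_effect(setting))
```

Under this assumption, very low settings combine a small buffer with long idle periods between calls, which is consistent with the warning that replication may never complete if the value is set too low.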

It is theoretically possible to have different replication rates at different times throughout the day. To do this, use a script scheduled with the AT command to change the ReplicationGovernor parameter in the registry and then restart the NetLogon service. The following registry key details how the ReplicationGovernor parameter may be set:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NetLogon\Parameters

Value:

ReplicationGovernor REG_DWORD

Range:

0 to 100 (Percent)

Default:

100

Note: If the parameter is anything but 100 percent, it needs to be set on each BDC.

License Service Replication

The license service performs licensing replication between servers. Data moves from BDCs and member servers to the PDCs, and then optionally from the PDCs to an enterprise server that maintains licensing information across the whole network. This replication, by default, is performed once every 24 hours. If, for some reason, the BDC cannot connect to the license service on the PDC, the BDC will continue to attempt replication once every 15 minutes until it is successful. It is important to note that the license service can cause significant replication traffic if it is not configured or managed properly.

Registry Settings

The following Windows NT registry setting can be used to disable the License Service:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LicenseService

Value: Start

Disable=0x4

Automatic=0x2

Manual=0x3

Note: Windows NT does not need to be restarted before these changes take effect.

Theoretical Domain Limitation Because of Replication

Although no code limitation restricts the number of BDCs in a domain, replication time and traffic set real-world limits on the number that can feasibly be deployed across a WAN.

Two issues affect how NetLogon functionality relates to the number of BDCs in a single domain: the NetLogon algorithm is linear, and the PDC attempts to replicate to only 20 BDCs at a time. Over a local area network (LAN), NetLogon functionality can technically support 500 to 1,000 BDCs in a single domain. The number is slightly lower over slower links (such as RAS, 19.2-Kbps, 56-Kbps, or 128-Kbps links), where NetLogon functionality can technically handle up to 700 BDCs. However, both figures assume an ideal scenario in which very few SAM changes occur over long periods of time. Microsoft does not recommend having a large number of domain controllers except in environments where network topology dictates the design. The preferred design is to have fewer, more powerful domain controllers connected over fast network links. Customers should carefully examine the use of resource domains for satellite offices and similar scenarios.
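One rough way to see why a linear algorithm with a limit of 20 concurrent BDCs constrains very large domains is to count how many batches the PDC needs to reach every BDC. The following Python sketch is a deliberate simplification: it treats replication as discrete rounds of equal size, which is an assumption made for illustration and not a description of the NetLogon implementation. The function name is hypothetical.

```python
import math


def pulse_rounds(bdc_count, concurrent_bdcs=20):
    """Number of batches needed to pulse every BDC once.

    The default of 20 is the concurrency figure quoted in the text above.
    """
    if bdc_count < 0 or concurrent_bdcs <= 0:
        raise ValueError("bdc_count must be >= 0 and concurrent_bdcs > 0")
    return math.ceil(bdc_count / concurrent_bdcs)


if __name__ == "__main__":
    for bdcs in (20, 100, 700, 1000):
        print(bdcs, "BDCs:", pulse_rounds(bdcs), "rounds")
```

The number of rounds grows in direct proportion to the number of BDCs, so any delay per round (for example, a slow WAN link) is multiplied accordingly.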

Note: All members of a group are replicated after adding or removing a user to or from a group.

Domain Trust Traffic

Under certain circumstances, it is possible for two PDCs of two domains with a trust relationship to generate traffic every 15 minutes. The scavenge interval on a domain controller is set in the following registry key:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NetLogon\Parameters

Value:

ScavengeInterval REG_DWORD

Range:

60 to 172,800 seconds (48 hours)

Default:

900 (15 minutes)

This parameter defines the interval at which NetLogon performs miscellaneous work (on the PDC and on the BDCs), such as the following:

  • Finding a domain controller.

  • Determining if a password on a secure channel needs to be changed.

  • Determining if a secure channel has been idle for too long.

  • On domain controllers, sending a mailslot message to each trusted domain for which a domain controller (DC) has not been discovered.

  • On the PDC, attempting to add the <DomainName>[1B] NetBIOS name if it has not already been added.

Note: None of the above operations are critical and the registry parameter may be tuned to optimize the use of leased lines such as ISDN.

Effect of Multihomed Domain Controllers

Multihomed domain controllers have a significant impact on browsing services in Windows NT. Because the browser service does not merge networks, the PDC cannot be multihomed. Each browser service bound to each interface operates independently, and the PDC maintains a "separate" cumulative list on each interface that is not merged. A master browser that exchanges lists with the PDC on one interface will not obtain servers discovered by a master browser that is exchanging lists on the other interface.

Windows NT 4.0 introduced the Unbound Bindings setting, which can be used to prevent the PDC from directly gathering a browse list on more than one interface. Unfortunately, this setting does not force the master browsers in the domain to use only the bound interface card. If WINS is used to provide the IP address for the master browser to find the PDC, there is no way of guaranteeing that the correct interface will be chosen. This limitation cannot be overcome with Windows NT 4.0; to guarantee that the PDC can merge a single domain-wide list, the PDC must not be a multihomed computer.

Likewise, master browsers cannot be multihomed. Because only one IP address is maintained for session establishment to a computer name, and the PDC communicates with a master browser based on its computer name alone, the PDC can only collect the local list of servers discovered by the multihomed master browser from one of its interfaces.

For additional information, see Microsoft Knowledge Base article 181774, "Multihomed Issues with Windows NT."

Application Services' Impact on Network

Although domain services have an impact on the performance of a LAN or WAN, typically, application services and network traffic are the primary users of network bandwidth. Planning a Windows NT networking environment really starts with thinking about the applications that will be used across the LAN and WAN. Some considerations to keep in mind:

  • Are the applications network aware?

  • Do the applications require high WAN bandwidth?

  • Does the application log on, on behalf of the user?

  • If so, how aggressively does it retry on failure?

For additional information about understanding how applications can cause significant network problems, see Microsoft Knowledge Base article 184858, "SMS: CLIMON Consumes PDC lsass Resources When Password Expired."

User's Service Levels

This section provides some basic guidelines for determining an appropriate service level for a general Windows NT 4.0 environment. It should not be construed as a final rule of configuration or guarantee of performance because the service level is affected by a number of factors such as:

  • Hardware environment

  • Software environment

  • Network infrastructure

  • Health of domain

  • Size of domain

Definition

Level of service is defined in this white paper as one of the following conditions:

  • Full availability (20 seconds or less to authenticate on a non-saturated LAN). Full availability means that all clients may authenticate to the domain and access local or trusted resources with no error or timeout related delay.

  • Diminished capacity (more than 20 seconds). Diminished capacity means that a user may receive intermittent logon/network errors or delays in authentication.

  • Denial of service (domain controller not available). Denial of service means that the system is incapable of providing logon or authentication services to a client or user.
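The three conditions above can be expressed as a simple classification, which may be useful when summarizing logon timing measurements. The following Python sketch is illustrative: the function name and the use of None to represent an unavailable domain controller are assumptions made for this example.

```python
FULL_AVAILABILITY_LIMIT_SECONDS = 20  # threshold from the definition above


def service_level(auth_seconds):
    """Classify one authentication attempt.

    auth_seconds is the measured time to authenticate, or None when no
    domain controller was available to service the request.
    """
    if auth_seconds is None:
        return "denial of service"
    if auth_seconds <= FULL_AVAILABILITY_LIMIT_SECONDS:
        return "full availability"
    return "diminished capacity"


if __name__ == "__main__":
    for sample in (4.5, 20, 35, None):
        print(sample, "->", service_level(sample))
```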

Performance of Windows NT Administrative Tools

Note that the performance of standard Windows NT 4.0 administrative tools such as User Manager and Server Manager degrades linearly with the number of user accounts (administered by User Manager) and machine accounts (administered by Server Manager) in a domain. Both User Manager and Server Manager will require longer periods of time and increased network bandwidth to load a particular domain context into the active administrative window. By default, User Manager refreshes every 10 minutes. This may place an unwanted load on a PDC or cause network congestion when it operates across a WAN. Administrators may find that third-party utilities and command-line functions can more easily manage a large domain than the standard Windows NT administrative tools.

Note: The manageability of a large domain through the standard tools may become difficult and cumbersome above 25,000 user or machine accounts, based on the above information. The most efficient tools in a large domain environment will almost always be the NET command-line tools, such as NET USER and NET GROUP.

Service Pack 3 Computers in a Service Pack 4 Domain

Only computers with Service Pack 4 or later should be used to administer domains with Service Pack 4 or later domain controllers, because of limitations of the administrative tools in Service Pack 3 and earlier. A computer with Service Pack 3 or earlier may not be able to perform some functions, such as promoting a BDC to a PDC. For additional information, see Microsoft Knowledge Base article 197488, "Access Denied When Attempting to Promote a BDC to PDC."

Increasing Simultaneous Logon Validations

By default, domain controllers have their Server service configured for Maximize Throughput for File Sharing. Although this is the proper setting for a file and print server, it does not provide the best performance for a domain controller that needs to validate logon requests. Instead, configure the Server service of all domain controllers for Maximize Throughput for Network Applications. By properly configuring this option, most domain controllers can triple their maximum number of simultaneous logon requests, from about 6 to 7 each second, to almost 20.

Load Balancing Across Domain Controllers

Windows NT does not load balance across the various domain controllers that are available to a client computer. However, several methods may be employed to reduce a disproportionate load on one or more domain controllers in a domain. Logon or network validation problems are indications that one or more domain controllers may be overloaded. You can monitor the domain controllers to detect this condition:

  • In Server Manager, you may see a large number (hundreds) of connections to Pipe\LSARPC.

  • In Network Monitor captures of network traffic, you may see:

    • Connections that are frequently not made or transactions that are performed on Pipe\Lsarpc.

    • Server Message Block error message codes that may include STATUS_IO_TIMEOUT and STATUS_PIPE_NOT_AVAILABLE.

  • In Performance Monitor, you may see:

    • A large Handle Count in the LSASS process.

    • A large Thread Count in the LSASS process (300 to 700 or more).

    • A combined high CPU usage (approaching 100 percent) by the LSASS and System processes.

Note: As the LSASS Thread Count increases, the proportion of CPU time used by the System process versus that used by LSASS increases because of increased context switching between the LSASS threads.

For additional information on using Performance Monitor or Network Monitor to monitor a domain, please see the Performance Monitor or Network Monitor section in this white paper.

Using a Preferred Domain Controller

The Secure Channel client (SC_Client) on Windows NT 4.0 uses standard protocol behavior to determine the possible domain controllers (DCs) with which to set up the secure channel. It then requests a secure channel to these DCs and will establish one with whichever is first to answer. This usually results in a secure channel with the DC physically closest, or over the fastest link. Two problems may result because of this standard protocol:

  • If the local DCs are too busy, or environmental conditions on the network delay the response from the local DCs, a remote DC or one on a less-than-optimal link could be the first to respond and set up the secure channel.

  • A very responsive DC could end up with an uneven share of the secure channels and, hence, the authentication load.

A channel remains in effect until either a workstation or domain controller is removed from the network or the DC is too busy to service the channel request. Generally, the first signs that validation load is not balanced across the resource domain and master domain DCs are user complaints of slowness when logging on, slow resource authentication, or logon scripts taking a long time to run. Secure channels can be monitored with the NLTEST utility included in the Windows NT resource kit, and the SETPRFDC utility can be used to direct NetLogon to a preferred DC for a client. For additional information on the SETPRFDC utility, see Microsoft Knowledge Base article 167029, "Resource and Master Domain DCs Do Not Load-Balance Validation."

Note: The SETPRFDC utility may have additional benefits in environments where resource domains (such as Microsoft Exchange Server resource domains) require passthrough authentication. You can use SETPRFDC to allow an administrator to designate specific resource domain BDCs and account domain BDCs to handle authorization traffic and effectively manage the load on the most capable domain controllers.

Domain Controller Discovery

In a network that uses WINS for name resolution (h-node), a workstation's logon server is selected as follows:

  1. A workstation attempts to find a logon server (DC discovery). The NetLogon service first checks to see if it already has cached information about domain controllers in the target domain. If NetLogon finds one or more domain controllers in the target domain, it attempts to contact those computers first.

  2. If NetLogon does not discover any domain controllers in the target domain, or if none of the known domain controllers is reachable, NetLogon instructs NetBT to resolve the TARGETDOMAIN [1C] name. NetBT uses standard node-type name resolution (default h-node on a WINS-enabled system) to resolve the name: NBT Cache, WINS, Broadcast, Lmhosts, HOSTS, and DNS. As soon as NetBT discovers one or more IP addresses for the name, it quits and returns the found address or addresses to NetLogon.

  3. The workstation broadcasts a SAM logon request on its local subnet and immediately sends a directed logon request to each address that was discovered by NetBT. In this manner, any local domain controllers are given preference during DC discovery.

WINS is designed to return the 1C name in an order likely to be the most beneficial to the client. The 1C list is always sent to the client in a specific order: PDC first, then BDCs that register with that WINS server, and then replicated BDCs. The assumption is that BDCs that register with the same WINS server that the client uses are probably closer than BDCs that register with other WINS servers. The WINS server returns a maximum of 25 IP addresses and the workstation sends a directed SAM logon request to each listed DC IP address.

There is no guarantee that the local server, if one exists, will respond first. If the local server is slow to respond to a logon request because of load or resource limitations, and a domain controller across a router is quick to respond, the computer is validated by the remote domain controller, even though the remote domain controller would provide a lower level of service because of bandwidth or usage limitations.

The best way to ensure the use of a DC on a local segment is to change the node type of the clients on that segment to b-node. B-node uses NetBT broadcasts to find the DC, which heavily favors the local DC (through broadcast timeouts) without removing any fault tolerance.

If selecting a local logon server is secondary to load balancing logons across a number of domain controllers, a method and hotfix exist to control this behavior through the use of "<1C> List Randomization." With this feature enabled, the WINS server rotates the list of IP addresses in TARGETDOMAIN [1C] name query responses by a random amount each time a query is made. For additional information on how to enable this capability, please refer to Microsoft Knowledge Base article 231305, "WINS Randomize1cList Feature Aids Load-Balancing Between DCs."
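The default ordering of the 1C list and the effect of randomization can be sketched as follows. This Python example is a model for illustration only, not the WINS implementation: the function names are hypothetical, and applying the 25-address limit before rotating the whole list (including the PDC entry) is an assumption.

```python
import random

MAX_ADDRESSES = 25  # maximum number of IP addresses WINS returns, per the text


def order_1c_list(pdc, local_bdcs, replicated_bdcs):
    """Default order: PDC, BDCs registered with this WINS server, replicated BDCs."""
    ordered = [pdc] + list(local_bdcs) + list(replicated_bdcs)
    return ordered[:MAX_ADDRESSES]


def rotate_1c_list(addresses, rng=random):
    """Rotate the list by a random amount, as <1C> list randomization is described."""
    if not addresses:
        return []
    shift = rng.randrange(len(addresses))
    return addresses[shift:] + addresses[:shift]


if __name__ == "__main__":
    default = order_1c_list("10.0.0.1", ["10.0.0.2", "10.0.0.3"], ["10.1.0.2"])
    print("default order:", default)
    print("one rotation: ", rotate_1c_list(default))
```

Without rotation the PDC is always listed first, which is one reason it tends to attract a disproportionate share of logon requests.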

When it comes to client logon requests, the PDC is often the most powerful domain controller and the most responsive of all the domain controllers. Coupled with the fact the PDC is often the first WINS 1C record to be returned, this causes most of the client computers to connect to the PDC for authentication requests.

Note: In some environments, Microsoft Windows 95 clients may prefer the PDC to other domain controllers for the reasons previously cited.

Using Lmhosts to Locate Multiple Preferred Logon Servers

The Lmhosts file on a client workstation can be used to specify preferred domain controllers to use when logging on. NetBT has been modified to support multiple domain controller entries in the Lmhosts file. Lmhosts lookup must be enabled on the client computer first to take advantage of this feature. Entries must appear as shown here:

10.1.1.1 example1 #PRE #DOM:mydomain
10.1.1.2 example2 #PRE #DOM:mydomain

With the above Lmhosts file entries, a computer attempts to use the IP addresses 10.1.1.1 and 10.1.1.2 to log on to the domain called mydomain.
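When many clients need the same preferred domain controllers, the entries can be generated rather than typed by hand. The following Python sketch is illustrative; the function name is hypothetical, and it produces only lines in the format shown above.

```python
def lmhosts_entries(domain, controllers):
    """Build Lmhosts lines for preferred domain controllers.

    controllers is an iterable of (ip_address, computer_name) pairs.
    """
    return [
        f"{ip_address} {computer_name} #PRE #DOM:{domain}"
        for ip_address, computer_name in controllers
    ]


if __name__ == "__main__":
    preferred = [("10.1.1.1", "example1"), ("10.1.1.2", "example2")]
    for line in lmhosts_entries("mydomain", preferred):
        print(line)
```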

For additional information on using Lmhosts for preferred logon servers, see Microsoft Knowledge Base article 192064, "Using LMHOSTS to Locate Multiple Preferred Logon Servers."

Monitoring Secure Channels

In an environment where account domains and resource domains are connected across wide area network (WAN) links, it is extremely important to monitor the secure channels between domain controllers, because secure channels may be dropped when network connectivity problems occur. Administrators should use the Nltest or Dommon utilities in the Windows NT resource kit to ensure that secure channels between resource domain controllers and account domain controllers remain intact.

CPU Performance on a Domain Controller

As the SAM size grows on a domain controller, especially a PDC, the performance of the server is degraded if the physical memory and CPU resources are not adequate for the associated SAM size. An undersized domain controller can experience short periods of peaked (greater than 90 percent) CPU activity during times of replication that may cause concurrent periods of NetLogon unavailability. Excessively long periods of high CPU usage (greater than 90 percent usage for longer than 5 minutes) associated with the LSASS process may be indicative of one or more conditions:

  • Inadequate processor speed or number of processors

  • A SAM that is undergoing a large number of changes to the database

  • A fragmented SAM database

  • A SAM database that is too large for the current hardware
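The threshold described above (greater than 90 percent usage for longer than 5 minutes) can be checked against logged CPU samples. The following Python sketch is illustrative: the function name and the sample format, a time-ordered list of (timestamp in seconds, CPU percent) pairs such as might be exported from a Performance Monitor log, are assumptions made for this example.

```python
def sustained_high_cpu(samples, threshold=90.0, min_seconds=300):
    """Return True if CPU stays above threshold for longer than min_seconds.

    samples is a list of (timestamp_seconds, cpu_percent) in time order.
    """
    run_start = None  # timestamp at which the current high-CPU run began
    for timestamp, cpu_percent in samples:
        if cpu_percent > threshold:
            if run_start is None:
                run_start = timestamp
            if timestamp - run_start > min_seconds:
                return True
        else:
            run_start = None  # the run was broken by a lower sample
    return False


if __name__ == "__main__":
    log = [(t, 95.0) for t in range(0, 420, 60)]  # 0 to 360 seconds at 95 percent
    print(sustained_high_cpu(log))
```

A short spike during replication does not trigger the check, which matches the distinction the text draws between brief peaks and excessively long periods of high usage.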

For additional information, see the Determining Hardware Requirements section in this white paper.
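The threshold described above (more than 90 percent CPU usage for longer than 5 minutes) can be applied to logged counter data. The following Python sketch assumes a hypothetical input format, a time-ordered list of (seconds, percent) samples such as might be exported from a Performance Monitor log. It is not part of any Windows NT tool.

```python
def sustained_high_cpu(samples, threshold=90.0, min_seconds=300):
    """Find periods where CPU usage stayed above the threshold for too long.

    samples: list of (time_in_seconds, cpu_percent) tuples, sorted by time.
    Returns a list of (start, end) times for runs of consecutive samples above
    the threshold whose duration exceeds min_seconds.
    """
    periods = []
    start = None  # time of the first sample in the current high run
    last = None   # time of the most recent sample in the current high run
    for t, cpu in samples:
        if cpu > threshold:
            if start is None:
                start = t
            last = t
        else:
            if start is not None and last - start > min_seconds:
                periods.append((start, last))
            start = None
    # Close a run that is still open at the end of the data.
    if start is not None and last - start > min_seconds:
        periods.append((start, last))
    return periods


# One sample per minute: eight high samples, a dip, then a short second burst.
log = [(t, 95.0) for t in range(0, 421, 60)] + [(480, 50.0), (540, 95.0), (600, 95.0)]
print(sustained_high_cpu(log))
```

Only the first run is reported, because the second burst lasts just one minute.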

Warning: A SAM database that grows in size with no associated changes over time may indicate that the accounts database has excessive "white space" and should be compressed. See Appendix B for additional information.

Impact of Rdisk Utility

Use of the Rdisk utility on a domain controller to create or update an Emergency Repair Disk or to back up the SAM may have a significant impact on the performance of a domain controller. The Rdisk process runs at a high priority and may use a significant amount of CPU time, which effectively reduces the ability of the domain controller to respond to logon and authentication requests. In addition, the Rdisk utility, similar to the Regback and Regrest utilities outlined in Appendix B, requires sufficient space in the registry to perform this operation. You may need to increase the registry size limit (RSL) to accommodate this operation on domain controllers with large SAM sizes.

Event Log Monitoring

The Windows NT Event Log can be a preemptive tool for identifying health problems on a domain controller. The Windows NT Event Log can be viewed remotely or locally through the Event Viewer under Administrative Tools (which is installed on every computer running Windows NT Workstation or Windows NT Server). Microsoft recommends that customers monitor the event logs on domain controllers in very large domains on a real-time basis to rapidly identify and troubleshoot potential system problems. Real-time monitoring of event logs can be accomplished through the use of tools such as the Microsoft Systems Management Server event-to-trap utility or other third-party event management utilities available through retail channels.

Error Messages or Warnings

The following messages may appear in the Windows NT Event Log when resource limitations exist:

  • 2017: The server was unable to allocate memory from the system nonpaged pool because the server reached the configured limit for nonpaged pool allocations.

  • 2018: The server was unable to allocate memory from the system paged pool because the server reached the configured limit for paged pool allocations.

  • 2019: The server was unable to allocate memory from the system nonpaged pool because the pool was empty.

  • 2020: The server was unable to allocate memory from the system paged pool because the pool was empty.

    In addition, when the nonpaged pool gets low, Windows NT Server will display this error message:

    Not enough storage available to process this command.

  • 2510: The server was unable to map error code 1450.

This error means the server has run out of paged pool space. It usually occurs when a primary domain controller (PDC) has encountered a high level of logon traffic during the time of synchronization when physical memory is not sufficient.

Replication Errors

Permission Problems

Often when a permission problem exists, errors 5, 1300, and 1307 will appear in the Windows NT Event Log.

Sharing Violations

When an account or service has a file open all the time, error 32 will appear in the Windows NT Event Log.

For additional information on troubleshooting replication problems, see Microsoft Knowledge Base article 104204, "Troubleshooting Directory Replicator Problems."

NetLogon Service Fails to Start

When a backup domain controller (BDC) is part of a domain, a computer account is created. (The computer account can be seen with Server Manager.) A default password is given to the computer account and the BDC stores the password in LSA secret storage $machine.acc. The password is then changed every seven days. Each BDC maintains such an LSA secret, which is used by the NetLogon service to establish a secure channel. If the computer account password and the LSA secret are not synchronized, the NetLogon service fails to start on the BDC with the following error message:

NetLogon Event 3210 Failed to authenticate with DOMAINBDC, a Windows NT domain controller for domain DOMAINNAME.

If the computer account has been deleted, one of the following error messages is logged by the BDC NetLogon service:

NetLogon Event 5721 The session setup to the Windows NT domain controller \\DOMAINPDC for the domain DOMAIN failed because the Windows NT domain controller does not have an account for the computer DOMAINBDC.

NetLogon Event 5723 The session setup from the computer DOMAINBDC failed because there is no trust account in the security database for this computer. The name of the account referenced in the security database is DOMAINBDC$.

Note: The Service Control Manager will also log error 7023 on the BDC because the NetLogon service could not be started.

Similarly, the NetLogon service on the PDC logs NetLogon Event 5722 or NetLogon Event 5723, with the following error message, when the password is not synchronized:

The session setup from the computer DOMAINBDC failed to authenticate. The name of the account referenced in the security database is DOMAINBDC$. The following error occurred: Access is denied.

Note: Secure channels may be reset by using the Netdom utility in the resource kit, and typing the following command:

netdom member \\domainmember /joindomain

SAM Corruption

SAM corruption can occur on a Windows NT 4.0 domain controller and is usually indicated by NetLogon Event 5735 on a PDC or BDC. A workstation or member server may also have an associated NetLogon Event 5723. SAM corruption is very rare and is usually caused by one of three conditions:

  • Disk or file corruption.

  • Registry hive corruption.

  • SAM database corruption (most rare). When the SAM database is corrupted, it is usually an individual user or computer account.

The first sign of SAM corruption on the PDC is failure to replicate. If this occurs, it is recommended that you take the PDC offline and then promote a BDC to a PDC. SAM corruption occurs most often on a PDC in combination with taking the PDC offline and promoting a BDC. It can be corrected by formatting the system and reloading Windows NT 4.0 as a domain controller. When the SAM is replicated to the newly installed BDC, the SAM is not fragmented.

Registry Hive Fragmentation

Memory can become fragmented in a registry hive when a process is used to repeatedly modify the same values in the registry. The fragmented memory cells often require the registry hive size to be much larger than the actual data contained within the hive. The RSL may eventually be exceeded. If the RSL is not increased, updates to the SAM eventually do not work because the registry reaches its maximum size. Registry hive fragmentation occurs most often when members are added to large global groups in a domain.

A hotfix is available from Microsoft Premier Support for this issue. See Microsoft Knowledge Base article 197632, "Registry Hive Fragmentation Leads to Excessive Size," for more information.

Notes:

  • This hotfix may provide little or no benefit to a domain if the size of the SAM is excessive because of a large number of users and not because of a large number of groups. This hotfix provides the most benefit in cases where a domain has thousands of global groups and thousands of members in those groups. In some situations where a large number of groups exist with just enough user accounts to force the SAM into an oversize allocation, the hotfix may aggravate an excessive SAM size condition. It is also important to note that the Regback and Regrest utilities documented in Appendix B of this white paper may no longer show compression benefits to a SAM after this hotfix is applied.

  • In Microsoft Windows 2000, account information is not stored in the SAM or registry and this condition no longer exists.

Domain Size Limitations

SAM Size Limitation

The Windows NT SAM database has no physical limitations with regard to the number of users, groups, or machine accounts. It does, however, have a physical limit on the overall memory footprint that is based on the registry size limit (RSL) and the size limit of the paged pool. The registry may not consume more than 80 percent of the paged pool allocation in memory. The paged pool allocation is limited to 192 MB, which is allocated dynamically at boot time, depending on the amount of actual physical RAM installed. Note that, although the paged pool is limited to a maximum of 192 MB, it shares a common 256-MB address space with the nonpaged pool. Therefore, if the amount of memory allocated to the nonpaged pool is 128 MB, the paged pool would be limited to a maximum size of 128 MB.

Windows NT calculates NonPagedPoolSize and PagedPoolSize based on the amount of physical memory present in the computer at boot time. The following algorithms describe how the values are calculated on an Intel-based computer:

Memory Pool Constants

MinimumNonPagedPoolSize        = 256 KB
MinAdditionNonPagedPoolPerMb   = 32 KB
DefaultMaximumNonPagedPool     = 1 MB
MaxAdditionNonPagedPoolPerMb   = 400 KB
PTE_PER_PAGE                   = 1,024
PAGE_SIZE                      = 4,096 bytes

NonPagedPoolSize Calculation

NonPagedPoolSize = MinimumNonPagedPoolSize + ((Physical MB - 4) * MinAdditionNonPagedPoolPerMb)

MaximumNonPagedPoolSize = DefaultMaximumNonPagedPool + ((Physical MB - 4) * MaxAdditionNonPagedPoolPerMb)

If MaximumNonPagedPoolSize < (NonPagedPoolSize + PAGE_SIZE * 16), then MaximumNonPagedPoolSize = (NonPagedPoolSize + PAGE_SIZE * 16)

PagedPoolSize Calculation

Size = (2 * MaximumNonPagedPoolSize) / PAGE_SIZE

Size = (Size + (PTE_PER_PAGE - 1)) / PTE_PER_PAGE

PagedPoolSize = Size * PAGE_SIZE * PTE_PER_PAGE

If PagedPoolSize >= 192 MB, then PagedPoolSize = 192 MB
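The calculation can be followed step by step in a short program. The following Python sketch rests on several assumptions that are not stated in this paper:

  • The per-megabyte terms are read as "Physical MB minus 4".
  • The rounding term is read as "PTE_PER_PAGE minus 1".
  • MinAdditionNonPagedPoolPerMb is taken to be 32 KB.
  • Divisions are treated as integer divisions.
  • Computers with less than 4 MB of RAM are treated as having no additional memory.

The function name is hypothetical.

```python
KB = 1024
MB = 1024 * KB

MINIMUM_NONPAGED_POOL_SIZE = 256 * KB
MIN_ADDITION_NONPAGED_POOL_PER_MB = 32 * KB   # assumption: the unit is KB
DEFAULT_MAXIMUM_NONPAGED_POOL = 1 * MB
MAX_ADDITION_NONPAGED_POOL_PER_MB = 400 * KB
PTE_PER_PAGE = 1024
PAGE_SIZE = 4096
PAGED_POOL_CAP = 192 * MB


def default_pool_sizes(physical_mb):
    """Return (NonPagedPoolSize, MaximumNonPagedPoolSize, PagedPoolSize) in bytes."""
    extra_mb = max(physical_mb - 4, 0)  # assumption: never negative

    nonpaged = MINIMUM_NONPAGED_POOL_SIZE + extra_mb * MIN_ADDITION_NONPAGED_POOL_PER_MB
    max_nonpaged = DEFAULT_MAXIMUM_NONPAGED_POOL + extra_mb * MAX_ADDITION_NONPAGED_POOL_PER_MB

    # The maximum must leave at least 16 pages of headroom above the initial size.
    floor = nonpaged + PAGE_SIZE * 16
    if max_nonpaged < floor:
        max_nonpaged = floor

    # Twice the nonpaged maximum, in pages, rounded up to a whole page table
    # (PTE_PER_PAGE pages, which is 4 MB with these constants).
    size = (2 * max_nonpaged) // PAGE_SIZE
    size = (size + (PTE_PER_PAGE - 1)) // PTE_PER_PAGE
    paged = size * PAGE_SIZE * PTE_PER_PAGE

    if paged >= PAGED_POOL_CAP:
        paged = PAGED_POOL_CAP
    return nonpaged, max_nonpaged, paged


for ram_mb in (64, 128, 512):
    nonpaged, max_nonpaged, paged = default_pool_sizes(ram_mb)
    print(ram_mb, "MB RAM: paged pool", paged // MB, "MB")
```

Under these assumptions, the default paged pool reaches the 192-MB cap well before 512 MB of physical RAM is installed.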

Registry Settings

In all situations, the entry of a non-zero value will override the default dynamic configuration at boot time, and will be a hard limit that cannot be changed dynamically by the operating system.

Paged Pool

To set the paged pool, modify the following registry key:

Warning: Using Registry Editor incorrectly can cause serious problems that may require you to reinstall your operating system. Microsoft cannot guarantee that problems resulting from the incorrect use of Registry Editor can be solved. Use Registry Editor at your own risk.

For information about how to edit the registry, view the "Changing Keys And Values" Help topic in Registry Editor (Regedit.exe) or the "Add and Delete Information in the Registry" and "Edit Registry Data" Help topics in Regedt32.exe. Note that you should back up the registry before you edit it. If you are running Windows NT, you should also update your Emergency Repair Disk (ERD).

HKEY_LOCAL_MACHINE \System \CurrentControlSet \Control \Session Manager\Memory Management

A value entry of PagedPoolSize as type REG_DWORD sets the paged pool. If this value is missing or set to 0, the system calculates the default paged pool as slightly less than the amount of installed RAM, but limits it to 192 MB (0x0C000000). The range of acceptable values is 0x00000001 through 0x0C000000 (192 MB).

Note: The paged pool size cannot be changed dynamically; the server must be restarted for the change to take effect. Microsoft does not recommend manually configuring the paged pool size through the registry unless a special need exists or the server environment dictates a manual setting.

The following table provides a guideline of the number of users a given registry and paged pool size will support. It should be noted that no assumptions are made with regard to the number of machine accounts or global groups that are included in the calculations. It is also assumed that no other services or BackOffice applications are concurrently installed on the dedicated domain controllers.

Number of Users   SAM Size (MB)   Registry Size (MB)   Paged Pool Size (MB)   CPU        Page File Size (MB)   Physical RAM (MB)
3,000             5               25                   50                     486DX/33   32                    16
7,500             10              25                   50                     486DX/66   64                    32
10,000            15              25                   50                     P, M, A    96                    48
15,000            20              30                   75                     P, M, A    128                   64
20,000            30              50                   100                    P, M, A    256                   128
30,000            45              75                   128                    P, M, A    332                   166
40,000            60              102                  128                    SMP        394                   197
50,000            75              102                  128                    SMP        512                   256
60,000            80              102                  128                    SMP        1,024                 512

Legend:

  • P = Intel Pentium processor

  • M = MIPS processor

  • A = Alpha processor

  • SMP = Symmetric Multiprocessor configuration

As illustrated in the table above, a 75-MB SAM within a 102-MB registry would require approximately a 128-MB paged pool. Although it is not recommended, the absolute maximum size of a registry would be approximately 153 MB, which could potentially hold a SAM of approximately 100 MB. The server would require enough physical memory to allocate a 192-MB paged pool.
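For planning scripts, the guideline table can be expressed as a lookup. The following Python sketch copies the table values as given. Treating each row as the upper bound for its user count, and returning nothing beyond 60,000 users, are assumptions made for illustration. The names are hypothetical.

```python
# (users, SAM MB, registry MB, paged pool MB, CPU, page file MB, physical RAM MB)
SIZING_TABLE = [
    (3000, 5, 25, 50, "486DX/33", 32, 16),
    (7500, 10, 25, 50, "486DX/66", 64, 32),
    (10000, 15, 25, 50, "P, M, A", 96, 48),
    (15000, 20, 30, 75, "P, M, A", 128, 64),
    (20000, 30, 50, 100, "P, M, A", 256, 128),
    (30000, 45, 75, 128, "P, M, A", 332, 166),
    (40000, 60, 102, 128, "SMP", 394, 197),
    (50000, 75, 102, 128, "SMP", 512, 256),
    (60000, 80, 102, 128, "SMP", 1024, 512),
]


def sizing_guideline(users):
    """Return the first table row that covers the user count, or None."""
    for row in SIZING_TABLE:
        if users <= row[0]:
            return row
    return None  # beyond the range covered by the guideline table


print(sizing_guideline(12000))
```

A domain of 12,000 users falls in the 15,000-user row, which calls for 64 MB of physical RAM and a 128-MB page file.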

Notes:

  • Using a 100-MB SAM makes the assumption that no other services will be using large allocations of the paged pool. This assumption is not a realistic assumption to make in a production type domain.

  • The largest tested SAM size in a pseudo-production environment has been 80 MB.

  • Changes in Windows 2000 address SAM size concerns by moving the largest part of the registry, the SAM, to the Ntds.dit (directory information tree) file. This move frees up much of the demand for registry memory. Unlike the SAM, which resided entirely in memory, the Active Directory directory services in Windows 2000 is an indexed database store that does not have to reside entirely in memory. This allows for support of a much larger user accounts database. A Windows 2000 domain controller will alleviate SAM size concerns only when running in native mode; in mixed mode used for backwards compatibility, the SAM will still be similar to Windows NT 4.0.

Warning: A Windows NT Server computer that runs out of paged or nonpaged pool memory may perform erratically or potentially stop with a blue screen error message. In domain controllers that have very large SAM sizes, the Pool Paged Bytes, Pool Nonpaged Bytes, Pool Paged Failures, and Pool Nonpaged Failures counters should be monitored closely in Performance Monitor. Pool memory problems can be detected through Performance Monitor by monitoring the Pool Nonpaged Failures and Pool Paged Failures counters. If the counters are greater than one, a memory pool problem exists.

Large Domain Considerations

Extremely large SAM database sizes (80 MB and larger) may exhibit noticeable performance delays in routine operations and are generally not recommended. Such databases require monitoring to ensure that inadequate processing power or physical RAM does not degrade the overall performance of the domain.

Some of the performance areas of the domain controller to monitor as the domain size increases include:

  • System boot time to first user operations. With a very large SAM database, the system boot time increases because the system pages in the entire account database. Some sample times that exemplify the boot time differences between various SAM sizes are shown in the following table, which is based on a dual Pentium Pro 200 computer with 512 MB of RAM. An actual customer database may reflect very different times depending on the content of the user database.

    Time to Reboot a PDC and Load SAM into Memory

    SAM Size              20 MB   60 MB   75 MB
    Boot time (minutes)   2:00    8:00    15:00

  • Time to create a new local or global group increases, as the total SAM size grows very large.

  • As group membership increases (as a percentage of total users), the time to add additional users to a large group in very large SAM databases also increases. For example, as the total SAM size increases beyond 60 MB, it takes longer to add users to a group that already contains 20 percent of the total user accounts.

  • The sheer number of domain local or global groups may also affect the time it takes for the NetLogon service to start on a domain controller. Significant degradation of boot times may appear in domains with global groups in the 6,000 to 15,000 range.

Notes:

  • For batch update operations, the time involved for adding a large number of users to the same group takes longer as the SAM database size increases beyond 60 MB.

  • Global groups that contain more than 5,000 individual members may hinder a Windows 2000 domain migration. Groups with large membership should be avoided if possible to ease migration.

Browse List Limitations

In a complete Windows NT 4.0 client/server environment, there are no browse list limitations. However, in a heterogeneous environment with Windows NT 3.5x clients and/or servers, the browse list may be truncated to fit within a 64-KB buffer. For additional information, see Microsoft Knowledge Base article 152076, "Browser Returns Truncated List of Resources." A similar issue may be encountered with Microsoft Windows 95 clients when a server has more than 1,000 shares or long descriptions are associated with the shares. The large share configuration fills up the buffer on the client and the client may not be able to see all the shares. This manifests itself in browsing errors, network profiles that don't work, and Briefcase problems. For additional information or for a client-side fix, see Microsoft Knowledge Base article 160807, "Cannot Connect to Windows NT Server with Many Shares."

Number of Domain Controllers

Windows NT 4.0 does not have any limitations with regard to the overall number of domain controllers in a domain. However, the network traffic and limitations of domain replication may govern the actual number of domain controllers feasible in a particular environment. Other considerations, such as the number and rate of SAM changes, how applications use the network, and so on, will change the maximum number of BDCs that will optimally operate on a given network.

For additional information, see the Network Monitoring section in this document.

WINS Server Considerations/Limitations

When a Windows NT WINS server is queried for a domain controller (DC), the query is made as a request for a group 1Ch entry. The WINS server replies with up to 25 IP addresses of domain controllers for the queried domain. In the reply, addresses owned by the queried WINS server are returned first, sorted by registration date and time. The first entry in the list is the 1Bh entry that is locally owned by the replying WINS server; the 1Bh entry represents the registration for the PDC. This entry is followed by the domain controller addresses for the requested domain that are owned by the queried server, and then by any remaining 1Ch records that are not owned by the queried WINS server. Therefore, with PDC and BDC network traffic and processor availability being equal, most clients connect to the PDC rather than to a local BDC. A large domain with many domain controllers may be affected by this limitation, which must be considered when planning domain sizing and WINS infrastructure design.
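The ordering described above can be modeled to see which addresses a client receives. The following Python sketch is a simplified illustration and not the WINS implementation. The record layout and function name are hypothetical, and sorting locally owned records oldest first is an assumption, because the sort direction is not stated here.

```python
MAX_ADDRESSES = 25  # maximum number of addresses returned in one reply


def wins_dc_reply(records, queried_server):
    """Order domain controller addresses the way the reply is described.

    records: list of dicts with keys 'type' ('1B' or '1C'), 'ip', 'owner',
    and 'registered' (a sortable registration time).
    """
    local = [r for r in records if r["owner"] == queried_server]
    remote = [r for r in records if r["owner"] != queried_server]

    # 1. The locally owned 1Bh (PDC) registration comes first.
    pdc = [r for r in local if r["type"] == "1B"]
    # 2. Locally owned 1Ch records follow, by registration time
    #    (assumption: oldest first).
    local_dcs = sorted((r for r in local if r["type"] == "1C"),
                       key=lambda r: r["registered"])
    # 3. 1Ch records owned by other WINS servers come last.
    remote_dcs = [r for r in remote if r["type"] == "1C"]

    ordered = pdc + local_dcs + remote_dcs
    return [r["ip"] for r in ordered][:MAX_ADDRESSES]


records = [
    {"type": "1C", "ip": "10.0.0.3", "owner": "WINS1", "registered": 200},
    {"type": "1C", "ip": "10.0.0.2", "owner": "WINS1", "registered": 100},
    {"type": "1B", "ip": "10.0.0.1", "owner": "WINS1", "registered": 50},
] + [
    {"type": "1C", "ip": "10.1.0.%d" % i, "owner": "WINS2", "registered": i}
    for i in range(30)
]
reply = wins_dc_reply(records, "WINS1")
print(len(reply), reply[:4])
```

In this model, eight of the thirty remotely owned domain controllers never appear in the reply, which illustrates why the 25-address limit matters in domains with many domain controllers.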

Domain Trust Limitations

Windows NT 4.0 does not have any limitations for the number of incoming trusts (trusting domains). Any number of "resource" domains can trust a single Master Accounts Domain. However, trusts to other domains should be limited to 2,048 because of LSA secret limitations. See the LSA Secret Limitation section in this white paper for additional information.

LSA Secret Limitation

Windows NT 4.0 has a limit of 4,096 LSA secrets. Examples of LSA secrets are domain trusts and service accounts.

Note: It is recommended not to consume more than half of the LSA secrets for domain trusts.

For additional information, see Microsoft Knowledge Base article 129815, "LSA Secret Limitation Increased to 4096 in WinNT 4.0."

General Recommendations

In conclusion, the following recommendations are made:

  • As a general rule, limit the SAM size to 60 MB, unless a careful study of the environment is performed to assess the impact of such a large SAM. Microsoft recommends that customers review their domain design with Product Support Services or Microsoft Consulting Services before exceeding this limit.

  • In accordance with the prior recommendation, do not let the number of objects in the SAM exceed 40,000 without prior assessment. An object generally refers to a user account or machine account.

  • Move user applications and unnecessary services to member servers. Domain controllers in larger domains should only be used for authentication.

  • Increase the physical RAM in very large domain controllers (domains with 30,000 or more users) to a minimum of 256 MB; 512 MB of physical RAM is preferable. Memory usage should be monitored for maximum performance and optimal configuration.

  • Increase the Windows NT paging file size if necessary. Physical RAM plus 12 MB is considered to be the minimum page file size; two times the physical RAM is the recommended size.

  • Regularly use Windows NT Performance Monitor to monitor CPU and memory usage. Implement hardware upgrades as appropriate.

  • Balance the number of domain controllers. Increasing the number of domain controllers also increases SAM replication traffic. In general, it is recommended that you keep the physical number of domain controllers low in larger domains because of replication traffic. It is often more advantageous to improve the hardware of existing domain controllers than to add additional domain controllers.

  • Move workstation and member server machine accounts to a resource domain.

  • Increase the maximum registry size, or RSL, on all domain controllers to accommodate the appropriate SAM size of the domain.

  • Disable the license logging service to reduce WAN traffic.

Note: License management must be maintained separately using an outside method when this service is disabled.
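The paging file recommendation above is simple arithmetic. The following Python sketch, with a hypothetical function name, returns the minimum and recommended sizes for a given amount of physical RAM.

```python
def page_file_sizes_mb(physical_ram_mb):
    """Return (minimum, recommended) paging file sizes in MB."""
    minimum = physical_ram_mb + 12      # physical RAM plus 12 MB
    recommended = 2 * physical_ram_mb   # two times the physical RAM
    return minimum, recommended


print(page_file_sizes_mb(256))  # prints (268, 512)
```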

Appendix A - Terminology/Acronyms

This table lists some of the more frequently used acronyms found in this white paper.

Acronym   Meaning
BDC       backup domain controller
DC        domain controller
IPC       interprocess communication
LSA       local security authority
LSASS     local security authority subsystem
MCS       Microsoft Consulting Services
MSF       Microsoft Solutions Framework
Netmon    Network Monitor
NIC       network interface card
NPP       nonpaged pool memory
PDC       primary domain controller
Perfmon   Performance Monitor
RAM       random access memory (physical)
RSL       registry size limit
SAM       security accounts manager
SMP       symmetric multiprocessing
UAS       user accounts subsystem

Appendix B - Reclaiming Unused Space in the SAM Database

Over time, the SAM size may grow significantly from the creation of a large number of user accounts, global groups, and workstation accounts. If a large number of users is added to a Windows NT user accounts database, and those users are later deleted, the size of the user accounts database or security accounts manager (SAM) does not shrink in size.

Windows NT does not have a mechanism to compress this empty space, but it is reclaimed when new user or group accounts are created. When the Windows NT primary domain controller synchronizes the SAM with the backup domain controllers, the new records, or changes in existing records, are sent. The SAM is located in %SystemRoot%\System32\Config\Sam.

If the SAM file grows too large, additional memory and PagedPoolSize are needed to load the file at system boot, and to load applications such as User Manager.

Although Windows NT has no built-in method of compressing the SAM database, there are three methods that can be used to effectively compress the SAM on a specific domain controller. The compressed SAM is NOT replicated to backup domain controllers because only new records, or changes in the records, are replicated. Therefore, measures will need to be applied at each domain controller.

Method 1

This method must be performed on a backup domain controller (BDC). Use the emergency repair disk (ERD) and select Inspect registry files. When prompted, choose SECURITY (SecurityPolicy) and SAM (User Accounts Database). This replaces the large SAM with the original one that was created when Windows NT was installed on the computer. This requires the Administrator password that was used when Windows NT was installed, or when rdisk -s was last used. After replacing the SAM, synchronize with the primary domain controller (PDC). To apply the above fix to the original PDC, promote a BDC to become the PDC.

Method 2

This method is the most invasive, and requires that any services or applications be reinstalled. Reinstall Windows NT on the backup domain controller computer as a New Install. This installation replaces the large SAM by creating a new file and downloading the accounts from the PDC. This installation can be performed on all of the BDCs. To reinstall the PDC, promote a BDC to PDC, and then perform the same operation.

Method 3

This method can be performed on a backup or primary domain controller. This method uses the utilities Regback and Regrest from the Windows NT resource kit. Using Regback will copy the records from the SAM into a new file. Restoring the SAM copies this new file over the old SAM. You must reboot the computer after using Regrest to have the restore take effect.

Note: After you back up the SAM using REGBACK, you can compare the two files and determine the size of free space or extraneous information.

The net result is a compressed SAM database. For example, the following command lines could be used. These examples assume the Backup directory already exists on drive C, and Windows NT is also installed on drive C (Windows NT and the Backup directory must reside on the same logical disk):

C:\>regback c:\backup\sam.bak machine_sam

-then-

C:\>regrest c:\backup\sam.bak c:\backup\sam.old machine_sam

For additional information, see Microsoft Knowledge Base article 140380, "User Account Database Fails to Shrink After Deleting Accounts."

Note: If the compacted size of the SAM is more than half the size of the paged pool, you may not be able to compress the SAM. Compaction works by copying the SAM to a new key in the registry and then saving the database. If there is not sufficient space for a full copy to fit in the paged pool, the compaction does not work. If the RSL is reached, this process also does not work.
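The half-of-paged-pool rule in the note above can be checked before compaction is attempted. The following Python sketch uses a hypothetical function name and covers only that one rule. It does not check the registry size limit.

```python
def compaction_may_fail(compacted_sam_mb, paged_pool_mb):
    """True if the compacted SAM is more than half the size of the paged pool."""
    return compacted_sam_mb > paged_pool_mb / 2


print(compaction_may_fail(70, 128))  # prints True
print(compaction_may_fail(60, 128))  # prints False
```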

Appendix C - Additional Reading

References

Microsoft Windows NT Workstation 4.0 Resource Kit, available from Microsoft Press

Inside Windows NT Second Edition, available from Microsoft Press

Network Traffic Analysis and Optimization (Windows NT 3.5x and 4.0, and Windows 95), available on the TechNet CD

Microsoft Knowledge Base Articles

Article ID   Article title
104204       Troubleshooting Directory Replicator Problems
120151       Browsing a Wide Area Network with WINS
124594       Understanding and Configuring Registry Size Limit (RSL)
126402       PagedPoolSize and NonPagedPoolSize Values in Windows NT
129815       LSA Secret Limitation Increased to 4096 in WinNT 4.0
134985       Browsing & Other Traffic Incur High Costs over ISDN Router
140364       Registry Size Limit Change Results in PagedPoolSize Change
140380       User Account Database Fails to Shrink After Deleting Accounts
148942       How to Capture Network Traffic with Network Monitor
149664       Verifying Domain Netlogon Synchronization
150350       NetLogon Maximum Value of Pulse Should Exceed 3600
150518       NetLogon Service Fails when Secure Channel Not Functioning
151259       New Netlogon Registry Entry for Dialup Routers
152076       Browser Returns Truncated List of Resources
152719       WAN and Trust: Traffic on the Wire
154355       How to Tune Trusts for Dialup Routers in a WAN
154398       BDC Secure Channel May Fail If More Than 250 Computer Accounts
154501       How to Disable Automatic Machine Account Password Changes
154502       Replication Increased by ANNOUNCE_IMMEDIATE Events
158148       Domain Secure Channel Utility -- Nltest.exe
165202       WinNT Client Logon in Resource and Master Domain Environment
167029       Resource and Master Domain DCs Do Not Load-Balance Validation
168471       New Synchronization Behavior with Windows NT Server Version 4.0
174205       LSASS May Use a Large Amount of Memory on a Domain Controller
175468       Effects of Machine Account Replication on a Domain
181774       Multihomed Issues with Windows NT
186626       Terminal Server and User Accounts/SAM Use
Q75294       The Netlogon Service and How It Works
197488       Access Denied When Attempting to Promote a BDC to PDC
197632       Registry Hive Fragmentation Leads to Excessive Size
231305       WINS Randomize1cList Feature Aids Load-Balancing Between DCs
192064       Using LMHOSTS to Locate Multiple Preferred Logon Servers

White Papers

BackOffice Server 4.0 Performance Characterization (Available in Microsoft TechNet and at https://www.microsoft.com/technet/prodtechnol/bosi/evaluate/featfunc/bo45perf.mspx . Click Download the BackOffice Server 4.0 Performance Characterization White Paper.)

Windows NT 4.0 Remote Troubleshooting and Diagnostics (Available in Microsoft TechNet and at https://www.microsoft.com/ntserver/support/faqs/remotewp.asp . Click Download this document.)