Monitoring Your Branch Office Environment

Applies To: Windows Server 2008, Windows Server 2012

This topic contains guidelines for monitoring Active Directory Domain Services (AD DS) and domain controllers that run Windows Server 2008. These domain controllers may be writable domain controllers in hub sites or read-only domain controllers (RODCs) in branch offices. Many of the guidelines presented here are updated from the monitoring guidelines in the Windows Server 2003 Active Directory Branch Office Planning and Deployment Guide (https://go.microsoft.com/fwlink/?LinkID=28523).

  • Determining what to monitor

  • Using Windows Reliability and Performance Monitor

  • Monitoring SYSVOL replication

  • Using System Center Operations Manager

Determining what to monitor

We recommend that you monitor all domain controllers regularly. Domain controllers manage the directory service and ensure that the information that is stored in the directory is as up to date as possible. Domain controllers do this by communicating with one another and replicating updates among themselves. If problems exist that prevent replication from occurring, information that is stored in the directory might become outdated. Outdated directory data can result in problems, such as users being unable to access network resources because updated account information is not available. In addition, a directory that is not up to date is a security risk because a domain controller might not have the information that an account has been deleted, disabled, or removed from a security group that is used to grant permissions to resources. In this case, a user might be granted access to resources even though the account is no longer valid.

Monitoring domain controllers gives you the best opportunity to detect problems before they jeopardize your environment and the ability of your users to access network resources. Schedule daily automated health checks for your domain controllers, and monitor the following aspects of the Active Directory environment on all domain controllers in the deployment:

  • General domain controller health (specifically, CPU utilization and disk space use)

  • Directory service performance

Monitoring domain controller performance

To monitor the general performance of domain controllers, observe performance counters for CPU utilization and disk space availability. Monitor these performance counters regularly on all bridgehead servers. If you experience problems with a branch office domain controller, monitor these performance counters on that domain controller.

Windows Server 2008 and Windows Vista both include Windows Reliability and Performance Monitor (Perfmon.msc). You can use this tool to monitor these performance counters. However, we recommend that you use a more comprehensive monitoring solution, such as Microsoft System Center Operations Manager 2007. For more information about using System Center Operations Manager 2007 to monitor AD DS, see the Active Directory Management Pack Guide (https://go.microsoft.com/fwlink/?LinkID=139785).

Monitoring CPU utilization

Monitor CPU utilization on your domain controllers to determine whether the domain controllers are overloaded by logon traffic or whether bridgehead servers are overloaded by replication requests. You can also monitor CPU utilization to verify that you are meeting your service-level agreements.

To monitor a domain controller’s CPU utilization, use Windows Reliability and Performance Monitor to monitor the Processor\% Processor Time counter.
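If you prefer to collect this counter from a script, the following minimal Python sketch shows one possible approach. It assumes that the typeperf command-line tool is available on the server, that the _Total processor instance is the one of interest, and that the output uses the usual PDH-CSV layout (a timestamp column followed by one value column). The host name DC01 and the sample values are illustrative only.

```python
import csv
import io

# Counter path from the text; the "_Total" instance is an assumption.
CPU_COUNTER = r"\Processor(_Total)\% Processor Time"


def typeperf_command(counter, interval_s=15, samples=20):
    """Build a typeperf command line that samples one counter."""
    return ["typeperf", counter, "-si", str(interval_s), "-sc", str(samples)]


def average_from_csv(text):
    """Average the first counter column of typeperf-style CSV output."""
    rows = list(csv.reader(io.StringIO(text)))
    values = []
    for row in rows[1:]:  # skip the first row (header or leading blank line)
        try:
            values.append(float(row[1]))
        except (IndexError, ValueError):
            continue  # ignore blank, header, or status lines
    return sum(values) / len(values) if values else None


# Illustrative sample only, not real measurements.
SAMPLE = (
    '"(PDH-CSV 4.0)","\\\\DC01\\Processor(_Total)\\% Processor Time"\n'
    '"05/01/2008 10:00:00.000","20.0"\n'
    '"05/01/2008 10:00:15.000","40.0"\n'
)

if __name__ == "__main__":
    print(" ".join(typeperf_command(CPU_COUNTER)))
    print(average_from_csv(SAMPLE))
```

You could pass the command list to subprocess.run on the domain controller and feed the captured output to average_from_csv.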

Monitoring available disk space

Problems can occur with your domain controllers if the partition that stores any of the following runs out of disk space:

  • Active Directory database files

  • Active Directory log files

  • The SYSVOL folder

Consequently, you must monitor the available disk space for these partitions.

By default, these files and the folder are stored in either C:\WINDOWS\NTDS or C:\WINDOWS\SYSVOL. To monitor the free disk space on the partition that contains these files and the folder, use Windows Reliability and Performance Monitor to monitor the LogicalDisk\Free Megabytes counter.
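As a complement to the performance counter, a script can check free space directly. The following sketch uses the Python standard library function shutil.disk_usage. The 1024 MB threshold is a hypothetical value chosen for illustration; the guide does not specify a minimum.

```python
import os
import shutil


def free_megabytes(path):
    """Return free space, in megabytes, on the volume that holds path."""
    return shutil.disk_usage(path).free // (1024 * 1024)


def low_space_paths(paths, minimum_mb):
    """Return (path, free MB) pairs whose volume has less than minimum_mb free."""
    results = []
    for path in paths:
        free_mb = free_megabytes(path)
        if free_mb < minimum_mb:
            results.append((path, free_mb))
    return results


if __name__ == "__main__":
    # Default locations named in the text; skip any that do not exist here.
    candidates = [r"C:\WINDOWS\NTDS", r"C:\WINDOWS\SYSVOL"]
    existing = [p for p in candidates if os.path.exists(p)]
    # 1024 MB is a hypothetical threshold, not a value from the guide.
    for path, free_mb in low_space_paths(existing, 1024):
        print(f"LOW: {path} has {free_mb} MB free")
```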

Monitoring disk utilization

Performance problems can occur if the hard disk on the domain controller cannot keep up with all the disk read-write requests that it receives. As requests are received, they are queued and processed when the server has time to service the requests. If the requests are not being processed fast enough, the queue begins to fill up with the backlog of requests. Monitoring the size of the queue provides you with the opportunity to detect this situation before a problem occurs.

Use Windows Reliability and Performance Monitor to observe the queue length. Specifically, use the PhysicalDisk object and the Avg. Disk Queue Length counter. We recommend that the outstanding requests on a domain controller not average more than 10. If the queue length consistently exceeds 10 outstanding requests, you might want to consider using a higher-performance disk configuration or reducing the workload on the domain controller.
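The following sketch shows one way to apply the guideline of 10 outstanding requests to a series of collected Avg. Disk Queue Length samples. How the samples are collected is outside the scope of the sketch, and the function names are hypothetical.

```python
from statistics import mean

QUEUE_LIMIT = 10  # recommended ceiling for the average, taken from the text


def average_exceeds_limit(samples, limit=QUEUE_LIMIT):
    """Return True when the mean queue length is above the limit."""
    return bool(samples) and mean(samples) > limit


def share_over_limit(samples, limit=QUEUE_LIMIT):
    """Return the fraction of samples above the limit (0.0 for no samples)."""
    if not samples:
        return 0.0
    return sum(1 for value in samples if value > limit) / len(samples)


if __name__ == "__main__":
    readings = [4, 8, 12, 16]  # illustrative values only
    print(average_exceeds_limit(readings), share_over_limit(readings))
```

A large share of samples above the limit is one possible way to interpret "consistently exceeds"; choose a cutoff that suits your environment.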

Monitoring memory utilization

Make sure that the domain controller has sufficient memory available at all times. You can use the Available MBytes counter in the Memory object in Windows Reliability and Performance Monitor to monitor the amount of memory available. We recommend that domain controllers always have at least 50 megabytes (MB) of available memory.
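Because the recommendation applies at all times, it can help to look for any individual sample that falls below 50 MB rather than at an average. The following sketch assumes that you have already collected (label, available MB) pairs; the sample values and function name are illustrative.

```python
MIN_AVAILABLE_MB = 50  # minimum recommended in the text


def memory_shortfalls(samples, minimum_mb=MIN_AVAILABLE_MB):
    """Return the (label, available MB) samples that fall below the minimum."""
    return [(label, mb) for label, mb in samples if mb < minimum_mb]


if __name__ == "__main__":
    # Illustrative samples only, not real measurements.
    readings = [("09:00", 412.0), ("09:15", 38.0), ("09:30", 50.0)]
    for label, mb in memory_shortfalls(readings):
        print(f"{label}: only {mb} MB available")
```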

Note

The process that runs AD DS is LSASS.exe. This process is designed to maximize the utilization of available memory but release memory to other processes as they request it. The result is that if you examine memory utilization through Task Manager, it appears that LSASS.exe is monopolizing the memory. On a domain controller, this is expected behavior that helps AD DS run more efficiently.

Monitoring directory service performance

Monitoring directory service performance involves monitoring the amount of information that the directory service is reading and writing to the directory database, as described in the previous section, and it also involves monitoring replication traffic between domain controllers. The following section explains how to monitor Active Directory replication.

Monitoring Active Directory replication

Besides monitoring the domain controllers themselves, you should also monitor replication traffic between them. Problems with the network can interfere with replication. Network problems can cause replication to slow down or stop, resulting in backlogs of replication data and inconsistency among domain controllers. Regular, ongoing monitoring helps you detect problems that might occur in your Active Directory replication environment. It also gives you the opportunity to correct these problems before they affect the directory or the users. While you can use Windows Reliability and Performance Monitor to monitor directory replication from the perspective of the volume of data traffic, there are additional tools available that you can use to monitor other aspects of the replication topology. The following table lists the areas of the replication environment that you should monitor, along with some tools that you can use.

| Area | Tool | Tool notes |
| --- | --- | --- |
| Domain Name System (DNS) and network configuration | dcdiag /test:<testName> | Performs the following tests: Connectivity, DNS, and RegisterInDNS. These Dcdiag tests replace the tests that you could perform by using Netdiag.exe in previous versions of Windows Server. Netdiag.exe is not included in Windows Server 2008 or Remote Server Administration Tools (RSAT). |
| Replication activity | Repadmin.exe and Dcdiag.exe | Replmon.exe, which was previously included in Windows Support Tools, is not included in Windows Server 2008 or RSAT. |
| Domain controller performance | Windows Reliability and Performance Monitor | Windows Reliability and Performance Monitor is included in Windows Server 2008 and Windows Vista. Each performance counter is individually specified. |

Repadmin.exe and Dcdiag.exe are tools that you can use to monitor replication activity between domain controllers. These tools make it easier to detect problems within the replication topology by showing the current status of the various replication connections between domain controllers. These tools are available on servers that run Windows Server 2008 and that have the AD DS server role installed. The tools are also available in RSAT. For more information about RSAT, see RODC Administration (https://go.microsoft.com/fwlink/?LinkID=133521) and Installing Remote Server Administration Tools (https://go.microsoft.com/fwlink/?LinkId=153624).

Repadmin

You can use Repadmin.exe to view the replication topology from the perspective of an individual domain controller and to diagnose replication problems. In addition, you can use the repadmin command to force replication events between domain controllers and to view both the replication metadata and the up-to-dateness vectors. On each domain controller, you can use the /showrepl, /showconn, /replsummary, and /showutdvec switches with the repadmin command to monitor replication information. To organize the repadmin /showrepl output in a more readable, comma-separated value (CSV) format, you can also use the /csv option. For more information about using the /csv option, see Repadmin Requirements, Syntax, and Parameter Descriptions (https://go.microsoft.com/fwlink/?LinkId=147380).

/showrepl

Specifying the /showrepl switch displays both inbound and outbound replication partners for each directory partition that a domain controller hosts. (In the tool, each directory partition is referred to as a “naming context.”) You can examine the replication partners to determine whether the domain controller has the correct connection objects.

For each replication partner, /showrepl also displays the last time that replication was attempted and whether the attempt was successful. For more information about using /showrepl, see Display Replication Partners and Status of a Domain Controller (https://go.microsoft.com/fwlink/?LinkId=124353).
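When you capture the output of repadmin /showrepl with the /csv option, you can filter it with a short script. The following sketch assumes that the CSV output includes a header row with columns such as "Source DSA", "Naming Context", and "Number of Failures"; verify the column names against the output in your environment. The server names and values in the sample are illustrative only.

```python
import csv
import io


def failing_links(csv_text):
    """Return rows from showrepl-style CSV that report at least one failure."""
    failures = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            count = int(row.get("Number of Failures") or 0)
        except ValueError:
            continue  # skip rows whose failure count is not numeric
        if count > 0:
            failures.append(row)
    return failures


# Illustrative sample; the column names are assumed, not guaranteed.
SAMPLE = '''showrepl_COLUMNS,Destination DSA Site,Destination DSA,Naming Context,Source DSA Site,Source DSA,Transport Type,Number of Failures,Last Failure Time,Last Success Time,Last Failure Status
showrepl_INFO,Branch1,RODC01,"DC=example,DC=com",Hub,HUBDC01,RPC,0,0,2008-05-01 10:00:00,0
showrepl_INFO,Branch1,RODC01,"CN=Configuration,DC=example,DC=com",Hub,HUBDC02,RPC,3,2008-05-01 09:45:00,2008-04-30 10:00:00,1722
'''

if __name__ == "__main__":
    for link in failing_links(SAMPLE):
        print(link["Source DSA"], link["Naming Context"], link["Number of Failures"])
```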

/showconn

Specifying the /showconn switch displays the connection objects on the domain controller. You can examine the connection objects to determine whether the domain controller is configured to replicate with the correct bridgehead servers in the hub site. In addition, you can use this switch to verify that the connection is enabled, to identify the transport being used, and to check when the connection object was created and when it was last changed. For more information about using /showconn, see Can I Look at My Connection Objects and Schedule Details? (https://go.microsoft.com/fwlink/?LinkId=124354).

/replsummary

The /replsummary switch is a quick way to check the replication health of your deployment and to determine where potential issues might be. To create this summary, repadmin contacts each domain controller in the forest and collects replication status information.

The information that is collected is summarized, and two views are generated: one from the source perspective and one from the destination perspective. Each view presents the information in three columns: Source DC (or Destination DC, depending on the view), Largest Delta, and Fails/total.

From the source domain controller perspective, the columns can be interpreted as follows:

  • Source DC: The domain controller that other domain controllers are attempting to replicate from.

  • Largest Delta: The longest time that has elapsed since a replication partner last successfully replicated from this domain controller.

  • Fails/total: The number of replica links that are failing, out of the total number of replica links. A high ratio of fails to total suggests that the source domain controller is causing the problem.

From the destination domain controller perspective, the columns can be interpreted as follows:

  • Destination DC: The domain controller on which the replica links that specify inbound replication are located.

  • Largest Delta: The longest time that has elapsed since this domain controller last successfully replicated with any one of its replication partners.

  • Fails/total: A number that shows how many of the replication links on the destination domain controller are failing. A high ratio indicates that the failure is likely attributable to the destination domain controller.

The source domain controller perspective is often more useful because a 100-percent failure rate indicates that the source domain controller cannot be reached by any of its replication partners, which suggests that it is offline or experiencing network issues. For more information about using /replsummary, see Monitor Forest-Wide Replication (https://go.microsoft.com/fwlink/?LinkId=124355).
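The interpretation of the Fails/total column can be sketched as follows. The sketch assumes that you have already extracted the domain controller name, the number of failing links, and the total number of links from the summary; the LinkSummary type, the function names, and the sample values are hypothetical.

```python
from typing import NamedTuple


class LinkSummary(NamedTuple):
    dc: str
    fails: int
    total: int


def failure_ratio(summary):
    """Return fails divided by total, or 0.0 when there are no links."""
    return summary.fails / summary.total if summary.total else 0.0


def rank_by_failures(summaries):
    """Sort summaries so that the highest failure ratio comes first."""
    return sorted(summaries, key=failure_ratio, reverse=True)


def fully_failing(summaries):
    """Return the names of domain controllers whose links are all failing."""
    return [s.dc for s in summaries if s.total > 0 and s.fails == s.total]


if __name__ == "__main__":
    # Illustrative values only.
    source_view = [
        LinkSummary("HUBDC01", 0, 12),
        LinkSummary("HUBDC02", 9, 9),
        LinkSummary("HUBDC03", 2, 10),
    ]
    print([s.dc for s in rank_by_failures(source_view)])
    print(fully_failing(source_view))
```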

/showutdvec

Examining the up-to-dateness (UTD) vector from time to time on one bridgehead server is another good way to ensure that replication is healthy. The UTD vector shows the last time that a domain controller has received updates from each replication partner for a particular naming context. The UTD vector is transitive in that one domain controller does not have to communicate directly with another domain controller to receive an update from it.

Note

/showutdvec shows the health of only inbound replication, which is sufficient for an RODC.

The output of this switch is a list of dates and times that indicate the last time that inbound replication of the Configuration container occurred from each domain controller. If an excessive amount of time has passed since replication last took place from a particular domain controller, investigate it as a possible replication problem.

The entries are listed by domain controller. Occasionally, a globally unique identifier (GUID) appears instead of the name of a domain controller. It is safe to ignore the GUID entries because they indicate invocation IDs for domain controllers that have been demoted or rebuilt. These entries do not affect the health of the topology.

Note

The invocation ID is the server database GUID that domain controllers use to ensure replication consistency after a restore operation. For more information, see article 885875 in the Microsoft Knowledge Base (https://go.microsoft.com/fwlink/?LinkID=137184).
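The following sketch shows one way to scan captured /showutdvec output for stale entries while skipping GUID-only entries. It assumes that each entry has the form "<source> @ USN <number> @ Time <YYYY-MM-DD HH:MM:SS>"; confirm this layout against the output in your environment. The names, values, and two-day limit in the sample are illustrative.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout: "<source> @ USN <n> @ Time <YYYY-MM-DD HH:MM:SS>".
LINE = re.compile(
    r"^(?P<source>\S.*?)\s+@ USN\s+(?P<usn>\d+)\s+@ Time\s+"
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s*$"
)
GUID = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def stale_partners(output, now, max_age):
    """Return (source, last update) pairs that are older than max_age."""
    stale = []
    for raw in output.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue  # not a vector entry
        source = match.group("source")
        if GUID.match(source):
            continue  # invocation ID of a demoted or rebuilt domain controller
        last = datetime.strptime(match.group("time"), "%Y-%m-%d %H:%M:%S")
        if now - last > max_age:
            stale.append((source, last))
    return stale


# Illustrative sample only.
SAMPLE = r"""
Hub\HUBDC01      @ USN   120034 @ Time 2008-05-01 09:58:12
Hub\HUBDC02      @ USN    98211 @ Time 2008-04-20 03:10:45
c2cb0a0d-3fd4-4a09-9d2b-0f4e0b3f1c11 @ USN 4567 @ Time 2007-01-15 08:30:00
"""

if __name__ == "__main__":
    print(stale_partners(SAMPLE, datetime(2008, 5, 1, 12, 0, 0), timedelta(days=2)))
```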

Dcdiag

You can use Dcdiag.exe to analyze the state of a domain controller and its interaction with other domain controllers. Dcdiag.exe performs the following tests and reports both status and problems:

  • Connectivity

  • Replication

  • Topology Integrity

  • Check NC Head Security Descriptors

  • Check Net Logon Rights

  • Locator Get Domain Controller

  • Intersite Health

  • Check Roles

  • Trust Verification

To monitor Active Directory replication, use the following switches with the dcdiag command.

| Switch | Description |
| --- | --- |
| /v | Provides verbose results, which makes troubleshooting errors easier. |
| /f:<LogFile> | Redirects output to the specified log file. |
| /ferr:<ErrLog> | Redirects fatal error output to a separate log file. |

For more information about Dcdiag, see Dcdiag (https://go.microsoft.com/fwlink/?LinkID=133110).
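If you run Dcdiag from a scheduled script, a small helper can assemble the command line from the switches described in this topic. The following sketch only builds the argument list and does not run the tool. The function name and the log file names are hypothetical.

```python
def dcdiag_command(test=None, verbose=True, log_file=None, err_log=None):
    """Build a dcdiag argument list from the switches described in this topic."""
    args = ["dcdiag"]
    if test:
        args.append(f"/test:{test}")
    if verbose:
        args.append("/v")
    if log_file:
        args.append(f"/f:{log_file}")
    if err_log:
        args.append(f"/ferr:{err_log}")
    return args


if __name__ == "__main__":
    # One command per test; the log file names are illustrative.
    for name in ("Connectivity", "DNS", "RegisterInDNS"):
        print(" ".join(dcdiag_command(name, log_file=f"{name}.log", err_log=f"{name}.err")))
```

On a domain controller, you could pass each list to subprocess.run and review the log files afterward.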

When to monitor branch office domain controllers

Monitoring should be part of your daily operations. A regular monitoring solution helps you track the health of your environment, and reviewing its feedback every day makes it much easier to spot potential problems early and to diagnose them before they affect the functionality of the directory.

Monitoring should be integrated into your deployment plan. Building your monitoring solution into your deployment plan provides two key benefits:

  • Makes it possible to monitor your deployment

    If you are able to discover problems during the deployment process, you can address them as they are revealed. If necessary, you can pause the deployment operations so that you can solve problems that might not have been discovered during your deployment testing. Monitoring during your deployment helps to ensure that your new environment is operating as expected.

  • Makes it easier to deploy your monitoring solution

    Having your monitoring plan in place during the deployment operations means that you already know where your monitoring components must be located. Therefore, you might be able to use your deployment process to also distribute your monitoring components. In this way, newly deployed computers will already have their monitoring components in place when they are brought online at their new locations. Implementing your monitoring solution during deployment eliminates the need for a second deployment operation to implement monitoring after the initial deployment is complete.

After the deployment is complete, continue to perform daily monitoring operations and track the health of your directory environment. The items that you should monitor are described in other sections of this topic. Not every organization needs to track every item all the time; over time, you will learn which items are useful in your particular environment. When your environment is stable, continue your monitoring activities on a daily basis throughout the lifetime of your directory.

Scheduling

If your branch office domain controllers replicate with the bridgehead servers in the hub site only once a day, schedule the daily quality assurance check to occur after the replication cycle completes. You can then verify that the day’s replication was successful, detect problems that might have occurred, and correct those problems before the next replication cycle begins. If the quality assurance check is performed before the daily replication cycle, you might not detect problems for up to 24 hours, which might allow the problem to have a greater effect on your environment.
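The timing described above can be worked out with simple date arithmetic. The following sketch assumes a single daily replication window with a known start time and length, plus a safety margin before the check runs. All of the values and function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta


def daily_check_time(day, window_start, window_length, margin):
    """Return when to run the check: after the replication window plus a margin."""
    return datetime.combine(day, window_start) + window_length + margin


def hours_until_next_cycle(check_at, window_start):
    """Return the hours left to fix problems before replication starts again."""
    next_start = datetime.combine(check_at.date(), window_start)
    if next_start <= check_at:
        next_start += timedelta(days=1)
    return (next_start - check_at).total_seconds() / 3600


if __name__ == "__main__":
    # Illustrative schedule: replication starts at 01:00 and runs for 2 hours.
    check = daily_check_time(date(2008, 5, 1), time(1, 0),
                             timedelta(hours=2), timedelta(minutes=30))
    print(check, hours_until_next_cycle(check, time(1, 0)))
```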

Using Windows Reliability and Performance Monitor

Windows Reliability and Performance Monitor is a built-in tool for monitoring various performance counters that are built into Windows Server 2008. You can use Windows Reliability and Performance Monitor to monitor CPU utilization, memory utilization, disk use, and Active Directory database performance, as described in Monitoring domain controller performance.

To monitor the general performance of domain controllers, use Windows Reliability and Performance Monitor to observe performance counters for CPU utilization and disk space availability. Monitor these performance counters regularly on all bridgehead servers. If you experience problems with a branch office domain controller, monitor these counters on that domain controller.

Using Windows Reliability and Performance Monitor involves collecting monitoring data over a period of time and then viewing the results. For example, to monitor whether a server is regularly receiving and applying directory replication updates, you can select one or more counters from the NTDS performance object and then view the current activity in Windows Reliability and Performance Monitor.

Monitor the counters of the following performance objects:

  • NTDS object counters

  • Database object counters

When they are monitored over time, the NTDS and Database performance counters should all show some activity. However, the amount of activity depends on your environment. Factors that affect activity include the number of branch office domain controllers and clients in your environment, how often replication is scheduled, the number of directory changes that occur, and so on.

Installing performance objects and running the Active Directory Diagnostics Data Collector Set

The NTDS and Database object counters are not installed by default. This section explains how to install the NTDS and Database object counters and how to run the Active Directory Diagnostics Data Collector Set to capture NTDS and Database object data over time.

To install NTDS and Database object counters

  1. Click Start, click Administrative Tools, and then click Reliability and Performance Monitor.

  2. Double-click Monitoring Tools, right-click Performance Monitor, and then click Properties.

  3. Click the Data tab, and then click Add.

  4. Double-click the name of the Performance object whose counters you want to install, click the name of each counter, and then click Add. For example, double-click NTDS, and then click each counter that is listed in the following section. After you select the appropriate counters, click Add.

  5. Click OK to close the Add Counters dialog box, and then click OK to close Performance Monitor Properties.

To start the Active Directory Diagnostics Data Collector Set

  1. Click Start, click Administrative Tools, and then click Reliability and Performance Monitor.

  2. Double-click Data Collector Sets, double-click System, right-click Active Directory Diagnostics, and then click Start.

  3. To stop the data collection, right-click Active Directory Diagnostics, and then click Stop.

Using NTDS object counters

Use NTDS performance object counters to monitor the performance of AD DS. The NTDS performance object includes counters that provide information about Active Directory replication activity between domain controllers, Lightweight Directory Access Protocol (LDAP), and authentication. The following table describes the NTDS counters that you can use to monitor AD DS.

| Object\counter | Description | Guidelines |
| --- | --- | --- |
| NTDS\DRA Inbound Bytes Total/sec | Indicates the total number of bytes received per second through inbound replication. This number is the sum of the bytes of uncompressed and compressed data received during inbound replication. | This counter should show activity over time. If it does not, the network is probably slowing replication. |
| NTDS\DRA Inbound Object Updates Remaining in Packet | Indicates the number of object updates received in the most recent directory replication update packet that have not yet been applied to the local server. | This counter should be as low as possible. A high value indicates that the monitored server is receiving changes but is taking a long time to apply them to the database, which usually means that server hardware is slowing replication. |
| NTDS\DRA Outbound Bytes Total/sec | Indicates the total number of bytes sent per second during outbound replication. This number is the sum of bytes of uncompressed and compressed data. | This counter should show activity over time, except on an RODC, where outbound replication does not occur. If it does not show activity, either server hardware or network problems are slowing replication. |
| NTDS\DRA Pending Replication Synchronizations | Indicates the number of directory synchronizations that are queued for this server. This counter helps identify replication backlogs. The higher the number, the larger the backlog. | This counter should be as low as possible. If it is not, the server hardware is probably slowing replication. |
| NTDS\ATQ Threads LDAP | Indicates the number of threads that the directory service is using to service LDAP requests. | If values for this counter and the NTDS\ATQ Threads Total counter are equal, a queue is likely building on the LDAP port, which will result in long response times. If the two counters are always equal, use Server Performance Advisor to troubleshoot the problem. |
| NTDS\ATQ Threads Total | Indicates the total number of threads that are being used by the directory service. | If values for this counter and the NTDS\ATQ Threads LDAP counter are equal, a queue is likely building on the LDAP port, which will result in long response times. If the two counters are always equal, use Server Performance Advisor to troubleshoot the problem. |
| NTDS\Kerberos Authentications/sec | Indicates the number of Kerberos authentications that the domain controller services per second. | This counter should show activity over time. If it does not and the clients use the Windows Server 2008 operating system, network problems are indicated. |
| NTDS\LDAP Bind Time | Indicates the time in milliseconds (msec) that was required to complete the last successful LDAP binding. | This counter should be as low as possible. If it is not, hardware or network-related problems are indicated. |
| NTDS\LDAP Client Sessions | Indicates the number of sessions of connected LDAP clients. | This counter should show activity over time. If it does not, it usually indicates that network-related problems are occurring. |
| NTDS\LDAP Searches/sec | Indicates the number of search operations performed by LDAP clients per second. | This counter should show activity over time. If it does not, network problems are probably hindering the processing of client requests. |
| NTDS\NTLM Authentications | Indicates the number of NTLM authentications serviced by the domain controller per second. | This counter should show activity over time. If it does not and the clients use the Windows® 98 or Windows NT® operating systems, network-related problems are indicated. |
| NTDS\DS Directory Searches/sec | Indicates how many searches are occurring. | This counter should show activity over time. If it does not, network problems are probably hindering the processing of client requests. |
| NTDS\DS Search Sub-operations/sec | Indicates how "large" the directory service search operations are. | This counter should show activity over time. If it does not, network problems are probably hindering the processing of client requests. If the ratio of this counter to DS Directory Searches/sec is very high, the server might be getting bogged down by a large number of expensive queries. You can run the Active Directory Diagnostics Data Collector Set to analyze the load on the server to see if the load is expected. |
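Two of the guidelines above compare counters with each other, which is straightforward to express in a script. The following sketch assumes that you have already collected paired samples of the ATQ thread counters and per-second search rates. The function names are hypothetical, and the guide does not define what ratio counts as "very high".

```python
def ldap_queue_suspected(samples):
    """Return True when ATQ Threads LDAP equals ATQ Threads Total in every sample.

    samples is a list of (atq_threads_ldap, atq_threads_total) pairs.
    """
    return bool(samples) and all(ldap == total for ldap, total in samples)


def sub_operations_per_search(sub_ops_per_sec, searches_per_sec):
    """Return DS Search Sub-operations/sec divided by DS Directory Searches/sec."""
    if searches_per_sec <= 0:
        return None  # no searches, so the ratio is undefined
    return sub_ops_per_sec / searches_per_sec


if __name__ == "__main__":
    # Illustrative values only.
    print(ldap_queue_suspected([(4, 4), (4, 4), (4, 4)]))
    print(sub_operations_per_search(9000.0, 30.0))
```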

Using Database object counters

Use Database performance object counters for advanced monitoring of the Active Directory database. The Database performance object monitors the Extensible Storage Engine (ESENT), which is the transaction-based database system that stores all Active Directory objects. The Database counters provide information about the performance of the database cache, files, and tables. You can use some of these counters to determine whether you need additional hard disks to store Active Directory data. The following table describes the Database counters that you can use to analyze the Active Directory database.

| Object\counter | Description | Guidelines |
| --- | --- | --- |
| Database\Database Cache % Hit | Indicates the percentage of page requests for the database file that were fulfilled by the database cache without causing a file operation. | This counter should show activity over time. If it does not, the server does not have enough free memory. In this case, add more memory. |
| Database\Database Page Fault Stalls/sec | Indicates the number of page faults that occur per second that cannot be serviced because no pages are available in the database cache for allocation. | This counter should be 0. If it is not, the server probably needs more memory. |
| Database\Database Page Evictions/sec | Indicates memory pressure on the database cache. | If this counter is too high, the Active Directory host computer needs more memory. |
| Database\Database Cache Size | Indicates the current amount of memory that AD DS uses to cache its database. | If the amount of memory that AD DS uses is low and the Database Page Eviction rate is high, the host computer might need more memory. |
| Database\Log Threads Waiting | Indicates the number of threads that are waiting for data to be written to the log so that updates to the database can be written. | This counter should be as low as possible. If it is not, the server probably needs more memory or a faster hard disk. |
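The memory-related guidelines for the Database counters can be combined into a simple check. The following sketch is one possible interpretation: the guide states only that Database Page Fault Stalls/sec should be 0, so the thresholds for a "low" cache size and a "high" eviction rate are hypothetical parameters that you must choose for your environment.

```python
def database_memory_findings(stalls_per_sec, cache_size_mb, evictions_per_sec,
                             small_cache_mb, high_evictions_per_sec):
    """Return a list of findings that suggest the server needs more memory."""
    findings = []
    if stalls_per_sec > 0:
        findings.append("Database Page Fault Stalls/sec is above 0")
    if evictions_per_sec > high_evictions_per_sec:
        if cache_size_mb < small_cache_mb:
            findings.append("Database Cache Size is low and the eviction rate is high")
        else:
            findings.append("Database Page Evictions/sec is high")
    return findings


if __name__ == "__main__":
    # Illustrative values and thresholds only.
    for finding in database_memory_findings(2, 256, 500, 512, 100):
        print(finding)
```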

Monitoring SYSVOL replication

This section explains how to use Distributed File System (DFS) Replication tools to monitor replication of SYSVOL. You can use DFS Replication to replicate SYSVOL if the domain functional level is Windows Server 2008. If the domain functional level is not Windows Server 2008, you can only use File Replication Service (FRS) to replicate SYSVOL. If you are using FRS to replicate SYSVOL, you can use a tool such as Ultrasound to monitor FRS. For more information about using Ultrasound, see the Windows Server 2003 Active Directory Branch Office Guide (https://go.microsoft.com/fwlink/?LinkID=28523).

You can use the DFS Management snap-in to administer DFS Replication. DFS Management is not installed by default. You can use the following procedure to install it.

To install DFS Management

  1. Click Start, and then click Server Manager.

  2. Click Add Features.

  3. Double-click Remote Server Administration Tools, double-click Role Administration Tools, double-click File Services Tools, select the Distributed File System Tools check box, and then click Next.

  4. Click Install, and when the installation is complete, click Close.

You can use the DFS Management snap-in to create a health report about DFS Replication of SYSVOL.

To create a health report about DFS Replication of SYSVOL

  1. Click Start, click Administrative Tools, and then click DFS Management.

  2. Double-click Replication, right-click Domain System Volume (SYSVOL), click Create Diagnostic Report, and then follow the instructions in the Diagnostic Report Wizard.

Using System Center Operations Manager

We recommend that you use System Center Operations Manager 2007 to monitor the servers that reside in your data center site. System Center Operations Manager 2007 is the suite of Microsoft enterprise monitoring software. You can purchase System Center Operations Manager 2007 as an additional package that provides a comprehensive, enterprise-wide monitoring solution. After you deploy System Center Operations Manager 2007, you can download management packs from the Microsoft Web site. For more information about System Center Operations Manager, see Microsoft System Center Operations Manager (https://go.microsoft.com/fwlink/?LinkId=124356).

Management packs are preconfigured, technology-specific monitoring solutions that are loaded into the System Center Operations Manager 2007 environment so that you can monitor specific aspects of your deployment. For example, in a branch office environment, you can use the Active Directory Management Pack to monitor the domain controllers in your organization’s data center. For more information about using System Center Operations Manager 2007 to monitor AD DS, see the Active Directory Management Pack Guide (https://go.microsoft.com/fwlink/?LinkID=139785).