Planning Active Directory for Branch Office Environments

Archived content. No warranty is made as to technical accuracy. Content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.

Chapter 3 - Planning Replication for Branch Office Environments

This chapter presents the basic elements of replication in the context of the two types of replication: Microsoft® Windows® 2000 Active Directory™ service replication and File Replication service (FRS) system volume replication. It outlines the replication issues to be considered before setting up sites. After reviewing these considerations, you should be able to decide how and where to place hub sites, when and where to create connection objects, and where to place bridgehead servers.

On This Page

Introduction
Process Flowchart
Replication Fundamentals
Components of the Replication Topology
Determining the Choice of Bridgehead Servers
Determining the Number of Bridgehead Servers
Configuring Replication Topology for Large Branch Office Deployments
Using KCC With a Small Number of Sites (< 100)
Summary

Introduction

This chapter guides you through the process of planning replication for your branch office environment.

Resource Requirements

Individuals from the following teams will be required to participate during this phase of the planning:

  • Windows 2000 Active Directory Architecture

  • Windows 2000 Active Directory Administration

  • Infrastructure Administration

  • Network Administration

What You Will Need

Before you start the replication planning process, you need to finish the forest, domain, and Domain Name System (DNS) planning as presented in the previous chapters. A tool such as Microsoft Visio drawing and diagramming software will enable you to lay out the plan graphically, including the individual steps.

What You Should Know

You should have a thorough familiarity with the basics of Microsoft Windows® 2000 Active Directory™ service, sites, replication, and bridgehead servers. This information is available in the Windows 2000 Server Resource Kit, Distributed Systems Guide.

Process Flowchart

Replication Fundamentals

Microsoft Windows 2000 operating system domain and forest-wide replication consists of two major components:

  • Active Directory replication

  • SYSVOL replication, which uses the File Replication service (FRS)

Administrators planning for a branch office deployment need to understand Active Directory replication and the separate FRS system volume (SYSVOL) replication used to replicate Group Policy changes. This guide assumes a familiarity with the topics covered in the chapters on Active Directory, FRS replication, and related topics in the Windows 2000 Server Resource Kit, specifically the Distributed Systems Guide. There are, however, some additional implications for branch office deployments. A brief review of the important concepts is therefore provided here, followed by the specific considerations and issues encountered in branch office deployments.

Comparison of Active Directory and FRS Replication

Active Directory replication and FRS replication used for SYSVOL are different processes that use the same replication topology but run independently of each other. There are two major differences between how an available replication window is used by Active Directory and by FRS SYSVOL replication: start time and replication behavior. Active Directory replication chooses a start time randomly within the first 15 minutes of a replication window to distribute the concurrent replication load across the window. FRS SYSVOL replication, on the other hand, starts the moment the window opens. This means that while Active Directory replication with multiple partners starts at different times within a 15-minute window, FRS SYSVOL replication with multiple partners starts at the same time for all partners.

Table 3.1 Comparison of Active Directory and FRS replication

                                 Active Directory                        SYSVOL FRS
Scope                            Forest                                  Domain
Type                             Pull (notify/pull within site)          Notify/Push/Ack
Concurrent partners - inbound    Serialized                              Parallel
Concurrent partners - outbound   Parallel                                Parallel
Threading model                  Single thread per replication partner   Multiple threads per replication partner
Versioning                       Per attribute version number            Per file timestamp

Active Directory

Active Directory replication is always a one-way pull replication; the domain controller that needs updates (target domain controller) contacts a replication partner (source domain controller). The source domain controller then selects the updates that the target domain controller needs, and copies them to the target domain controller. Since Active Directory uses a multi-master replication model, every domain controller works as both source and target for its replication partners. From the perspective of a domain controller, it has both inbound and outbound replication traffic, depending on whether it is the source or the destination of a replication sequence.

Inbound Replication

Inbound replication is the incoming data transfer from a replication partner to a domain controller. For a hub domain controller, inbound traffic is data that is replicated from a branch office domain controller. This replication traffic is serialized, meaning that the hub domain controller can handle inbound replication with only a single branch domain controller at a time. This fact is critical in determining the number of bridgehead servers (hub servers) that will be used in the organization. Assuming that replication with each branch domain controller takes two minutes, the hub domain controller can handle a maximum of 30 branch domain controllers in one hour.

Outbound Replication

Outbound replication is the data transfer from a domain controller to its replication partner. For a hub domain controller, this is the replication from the central hub to the branch office domain controller. Outbound replication is not serialized; it is multithreaded. During outbound replication, the branch office domain controllers pull changes from the hub domain controller.

Recommendations for Optimum Performance

For Active Directory replication, a rule of thumb is that a given domain controller that acts as a bridgehead server should not have more than 50 active simultaneous replication connections at any given time in a replication window. (This was determined on a reference server that had four Pentium III Xeon processors with 2 gigabytes (GB) of RAM and 2 megabytes (MB) of L2 cache.) Reducing this limit to fewer than 50 simultaneous connections will have a significant positive impact on CPU utilization, network throughput, and I/O throughput on this domain controller. Additional performance improvement can be achieved by putting the components of Active Directory on different physical drives.

File Replication Service

System policies and logon scripts stored in SYSVOL use FRS to replicate. Each domain controller keeps a copy of SYSVOL for network clients to access. (FRS is also used for Distributed File System (DFS). Because this guide is not concerned with DFS replication, no further mention will be made of this area of FRS replication.) FRS can copy and maintain shared files and folders on multiple servers simultaneously. When changes occur, content is synchronized immediately within sites, and by schedule between sites.

FRS is a multithreaded, multi-master replication engine that replaces the LMRepl service used in the Microsoft Windows NT operating system. Multithreaded means that several replication sessions can run at the same time to handle multiple tasks. This allows FRS to replicate different files between different computers simultaneously. Multi-master replication means that changes to the SYSVOL can be made on any domain controller, and that domain controller will then replicate the changes out to the other domain controllers using a store-and-forward mechanism. FRS SYSVOL replication uses the same Active Directory replication topology defined by connection objects. In contrast to Active Directory replication, FRS SYSVOL replication uses a timestamp on a file to determine which version is newer and should be kept on a domain controller and replicated out to partners.

FRS does not guarantee the order in which files arrive. FRS begins replication in sequential order based on when the files are closed, but file size and link speed determine the order of completion. Because FRS replicates only whole files, the entire file is replicated even if you change only a single byte in the file. By default, FRS can transfer up to 8 files per partner, in parallel.

Key points to know when maximizing the performance of a bridgehead server in a hub site with regard to FRS are to:

  • Place SYSVOL on its own physical disk drive.

  • Split the FRS database files and logs across different physical disk drives.

  • Place the FRS staging area on its own physical disk drive.

Components of the Replication Topology

Components of the replication topology include the Knowledge Consistency Checker (KCC), connection objects, site links, and site link bridges. Basic information on these components as it relates to replication and the branch office scenario is presented here. The information is presented so that you will understand the implications of decisions you will be making in the scenario with a large number of sites (for example, turning off the Inter-Site Topology Generator (ISTG) between sites, and disabling KCC generation of the intra-site replication topology in the staging site).

Connection Objects

In a configuration with 100 or more sites, the KCC will not scale. With more replication partners, you must create a staggered replication schedule, manual connection objects, or both. Your replication topology design should take into account load balancing for hub or bridgehead servers. After this design is implemented, you will need to continually monitor the replication traffic and server behavior to ensure that your design remains optimal for your organization.

Because you will use the KCC only in small deployments (fewer than 100 sites), background information on the KCC is presented in the section "Using KCC With a Small Number of Sites (< 100)" at the end of this chapter.

Several key factors that limit the functionality and scalability of the KCC include the:

  • Number of sites

  • Number of domains

  • Transitivity of site links

  • Use of site link bridges

Intra-Site

When the KCC generates the intra-site topology for the site in which it resides, the KCC creates a one-way connection object in the Active Directory only when a connection object is required for the local computer. These changes propagate to other domain controllers through the normal replication process. Each domain controller uses the same algorithm to compute the replication topology, and in a state of equilibrium among domain controllers, each should arrive at the same result with respect to the target replication topology. In the process, each domain controller creates its own connection objects.

Recommendation for Intra-Site Replication

For intra-site replication, always use the KCC to create and manage the replication topology. (The only exception is presented during the discussion of the staging site, which is a special case.) The intra-site replication topology generation is not a very costly operation. Disabling the KCC for this task can create unpredictable and unmanageable results.

Inter-Site

For every site, one domain controller, known as the Inter-Site Topology Generator (ISTG), is responsible for managing the inbound replication connection objects for all bridgehead servers in the site in which it is located. If the server holding the ISTG role is taken offline, the role is automatically transferred to another domain controller in that site by the system. Unlike the transfer of the Operations Master roles, this transfer does not require any intervention by the administrator. For inter-site replication the administrator creates the connection objects by using scripts, third-party tools, or by hand.

For both inter-site and intra-site replication, the KCC periodically creates and checks connection objects to maintain directory connectivity. Sometimes the administrator may want to supplement the least-cost spanning tree topology of connection objects created by the KCC, for example, to add more connectivity to improve replication latency. In those cases, the administrator can use a script or manually create additional connections. If you create a connection that is identical to one the KCC would create, the KCC will not create an additional one, nor will it delete or alter any connections that it has not created.

There are three types of connection objects. In the Active Directory Sites and Services console, select a connection object and look in the details pane under the Name column. The three types of objects are displayed differently:

  • <automatically generated> (A connection object created by the KCC)

  • {GUID for a name} (A connection object that was initially created by the KCC but has been modified by an administrator; the object is converted to a non-KCC connection object, with a globally unique identifier (GUID) for a name.)

  • Any other name indicates a connection object that was created by the administrator.

A connection object does not restrict which partitions may be replicated between the two servers. The rule is that the directory will replicate all partitions that are common between the two servers. To illustrate, if GC1 (global catalog server 1) has two inbound connections from GC2 and GC3, GC1 will replicate all partitions in the enterprise from both GCs, even if there is redundant replication.

In the Active Directory Sites and Services console, under the site name, server name, NTDS Settings object, you can see the connection objects for this server in the details pane. A server listed in the details pane is the source of replication to the selected server. Each connection object uses a schedule to control its replication times. The schedule is derived from the site link schedule, if one exists, and from the replication interval on the site link. This schedule is also used by applications and other components to control their behavior.

A site link is an Active Directory object that connects two or more sites. A site link allows the administrator to assign a cost, a replication schedule, and a transport for replication. Cost is an arbitrary value selected by the administrator to reflect the relative speed and reliability of the physical connection between the sites; the lower the cost, the more desirable the connection.

To get a feasible cost factor (not including the operating cost of the link) for site links that correlates to the available bandwidth, you could use the following formula:

Cost = 1024 / log10(available bandwidth in Kb)

The following table shows some examples of the formula applied to various line speeds.

Table 3.2 Cost factor calculation sample

Available bandwidth (kilobits/second)    Cost
9.6                                      1042
19.2                                      798
38.8                                      644
56                                        586
64                                        567
128                                       486
256                                       425
512                                       378
1024                                      340
2048                                      309
4096                                      283
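
The cost values in Table 3.2 can be reproduced from the formula above. The following short Python sketch (an illustration added here, not part of the original guide) assumes a base-10 logarithm and rounds to the nearest whole number:

import math

def site_link_cost(bandwidth_kbps):
    # Suggested site link cost: 1024 / log10(available bandwidth in Kb).
    return round(1024 / math.log10(bandwidth_kbps))

# Reproduce the values from Table 3.2.
for kbps in (9.6, 19.2, 38.8, 56, 64, 128, 256, 512, 1024, 2048, 4096):
    print(f"{kbps:>7} Kb -> cost {site_link_cost(kbps)}")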

If a site link contains more than two sites, all of the sites in that site link are considered to be connected to each other in a fully connected (N x N) topology.

A site link bridge (SLB) is a collection of two or more site links and provides a structure to build transitive links between sites and evaluate the "least cost path." See the <100 sites section for an extended discussion. Site link bridges are only significant when the "Bridge all site links" option is not enabled. This may be necessary in scenarios where not all naming contexts are in connected sites. Bridging all site links implies that all site links are transitive. When this mode is on, bridges are ignored, and all site links are considered to be in one big bridge. This is the default behavior in Windows 2000.

Sites and Domains

Chapter 6, "Active Directory Replication," in the Windows 2000 Server Resource Kit Distributed Systems Guide provides a detailed discussion of sites, site link bridges, and other advanced topics. These topics include site link scheduling, bridgehead server design, a comparison of the IP and SMTP transports, and how replication works with respect to the three naming contexts (Configuration, Schema, and Domain).

More Information

Details on how the tuning measures can be applied are outlined later in this guide. For in-depth information about how these formulas were derived and tested, read Knowledge Base article 244368, "How to Optimize Active Directory Replication in a Large Network."

Determining the Choice of Bridgehead Servers

Before you can determine how many bridgehead servers you need in the hub, you must understand which servers should be bridgehead servers, and why. Inbound replication behaves differently than outbound replication, and therefore places a different load on the server. In conjunction with deciding which servers are to be partners for replication between sites, you need to consider the implications of managing the replication schedule.

Inbound Versus Outbound Replication

Inbound replication is serialized, meaning that a bridgehead server in a hub site will go through the list of its replication partners one by one when the replication window opens. Especially when network connections are slow, it might take a long time to finish inbound replication from all inbound partners. If the window closes before the hub domain controller has finished replicating changes in from the branch sites, the hub domain controller will nevertheless continue until it has reached the end of the list. This is important when you plan to use multiple replication cycles. If the list of incoming connections cannot be finished before the window closes, the replication will be carried over, and can potentially affect replication sequences that would start when the inbound window closes.

Outbound replication from the hub, however, is performed in parallel. While for inbound replication the number of replication partners determines the time it will take to finish, for outbound replication the limiting factors will be the performance of the bridgehead servers, the CPU load, and the disk I/O.

Determining Inbound Replication Partners for a Bridgehead Server

The following formula can be used to compute the number of inbound replication partners for a bridgehead server (hub domain controller):

Number of inbound replication partners = R / N

(where R = length of the replication window in minutes, and N = # of minutes a domain controller needs to replicate all changes)

Several factors that contribute to the time it will take a bridgehead server to replicate changes from the branch (N) include:

  • Whether the change model is centralized or decentralized

  • If decentralized, how many changes happen in a branch per day

  • Total number of domain controllers

  • Time needed to establish dial-on-demand network connectivity

A centralized administration model, and therefore a centralized change control model, is very beneficial for branch office deployments. Because incoming replication to the hub bridgehead servers is serialized, but outgoing replication can happen in parallel, it is important to initiate changes in the location from which they can be replicated out in parallel, that is, outbound from the hub or data center. If the administration model is decentralized, however, and changes like creating new users and managing group memberships can happen in branches, a worst-case scenario (the highest number of possible changes) must be used to compute the potential replication traffic to ensure that there is sufficient time and capacity for replication. To calculate the replication traffic, use the following spreadsheet, taken from the Microsoft Press book, "Building Enterprise Active Directories."

Table 3.3 Table to calculate the replication traffic (taken from the Microsoft Press book, "Building Enterprise Active Directories")

Even if the administration model is centralized, and no administration tasks can be performed in the branch, there will still be some changes initiated in the branch that have to be replicated to the hub, such as user and computer password changes. To take one example: Assume there are 25 users on average in a branch using 25 workstations. The default password change policy is 60 days. This results in approximately one password change per day that has to be replicated to the hub. The replication traffic for a single password change is around 18 kilobytes (KB).
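
As a rough illustration of this estimate (a sketch added here, not taken from the guide; the account counts, the 60-day policy, and the 18 KB per change are the figures quoted above), the daily branch-to-hub traffic from password changes alone can be approximated as follows:

# Approximate daily branch-to-hub replication traffic caused by password
# changes, using the figures quoted in the text (user and computer accounts
# are both assumed to follow the 60-day password change policy).
users_per_branch = 25
workstations_per_branch = 25
password_max_age_days = 60
kb_per_password_change = 18

accounts = users_per_branch + workstations_per_branch
changes_per_day = accounts / password_max_age_days         # ~0.83, roughly one per day
traffic_kb_per_day = changes_per_day * kb_per_password_change

print(f"~{changes_per_day:.1f} password changes/day, ~{traffic_kb_per_day:.0f} KB/day")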

For more information about network traffic caused by replication and for tools, refer to the Microsoft Press book, "Building Enterprise Active Directories."

Overhead for Replication

An additional factor for replication network traffic is the number of domain controllers for a specific naming context. When a domain controller initiates a replication sequence, it sends some information to its replication partner so that the partner can determine which updates the requesting domain controller already has for that naming context. Part of this information is a table with one entry for each domain controller that holds a full replica of the naming context. For inter-site replication, each naming context is sourced separately.

Assume that there is a single domain branch office deployment. In each replication sequence, the three naming contexts (Schema, Configuration, and Domain) need to be replicated. Also assume that there are no changes for the Configuration and the Schema containers, and only a single password change for the Domain naming context. If there are only two domain controllers in the deployment, the traffic breakdown would be as follows:

  • 13 KB to set up the replication sequence

  • 5 KB to initiate replication of the domain naming context, including the changed password

  • 1.5 KB each for the Schema and Configuration naming contexts (where no changes occurred)

Total traffic in this case would be 21 KB.

Now compare these network traffic numbers with a larger deployment consisting of 1,002 domain controllers instead of two. If there are 1,002 domain controllers, each domain controller adds overhead to the replication sequence of every naming context, for both the request and the reply. The overhead is 24 bytes per domain controller. Therefore, for the additional 1,000 domain controllers, multiply a request and a reply by the three naming contexts, the number of additional domain controllers, and the per-domain controller overhead:

2 * 3 * 1000 * 24 bytes = 144,000 bytes = 141 KB

The total replication traffic is now 21 KB + 141 KB = 162 KB.

Estimating the Replication Payload

If an ISDN dial-up line is used, where only 50% of the 64-kilobit bandwidth can be used for replication (the balance being used by other applications), it will take ((162 * 8) / (64/2)) seconds = 40.5 seconds. This is just for the data transmission. To get the total for N, add the overhead for creating the connection.

For dial-on-demand lines, ample time for session setup has to be added to the equation. Considering that in many cases the dial-on-demand line would already be open, a replication time of one minute per domain controller sounds feasible. However, if replication happens during off-times, the line has to be opened for replication specifically. The time it takes to open the lines adds to the time needed for replication. If the line opens for replication, in most cases a minimum of two minutes instead of one minute should be used. For a dial-up line, N is now two minutes.
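
The estimates above can be pulled together in a few lines of Python (a sketch added for illustration, not part of the guide; the 60-second allowance for opening the dial-on-demand line is an assumption chosen for illustration):

# Estimate N, the minutes of inbound replication per branch domain controller,
# following the simplified arithmetic used in the text.
base_traffic_kb = 21          # setup + Domain NC change + idle Schema/Configuration NCs (2 DCs)
extra_dcs = 1000              # additional domain controllers beyond the original two
overhead_bytes_per_dc = 24    # per-DC overhead, per naming context, per direction

# Request + reply, times 3 naming contexts, times the additional DCs.
overhead_kb = 2 * 3 * extra_dcs * overhead_bytes_per_dc / 1024    # ~141 KB
total_kb = base_traffic_kb + overhead_kb                          # ~162 KB

usable_kbps = 64 / 2                                              # 50% of an ISDN line
transfer_secs = total_kb * 8 / usable_kbps                        # ~40.5 seconds

dial_setup_secs = 60                                              # assumed allowance for opening the line
n_minutes = (transfer_secs + dial_setup_secs) / 60
print(f"total ~{total_kb:.0f} KB, transfer ~{transfer_secs:.0f} s, N ~ {n_minutes:.1f} min")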

Given the N that has been calculated and the replication window, R, the formula R/N tells you the total number of inbound connections a single bridgehead server could support. It can be used to plan the total number of bridgehead servers needed in hub sites. If the number of branches is higher than the number of inbound connections your bridgehead servers can support, two tactics can be used:

  • Increase the number of bridgehead servers

  • Increase the length of time available for inbound replication to the hub site

Which tactic you choose probably depends on the result of your calculation for outbound replication partners.

Determining Outbound Replication Partners for a Bridgehead Server

To determine the maximum number of outbound replication partners a hub bridgehead server can support in a given replication cycle, use the following formula:

Maximum number of outbound replication partners = (H * O) / (K * T)

where:

H = Sum of hours that outbound replication can occur per day

O = # of concurrent connections per hour of replication (a realistic value is 30 on the reference server specified below)

K = Number of required replication cycles per day (This parameter is driven by replication latency requirements.)

T = Time necessary for outbound replication (Depending on assumed replication traffic, this should be one hour or a multiple of one hour.)

This formula assumes that outgoing Active Directory replication and FRS replication to the branch can be completed in time T. In most cases, T= 1 hour is a good estimate. This is a very conservative estimate; replication to branch offices will most likely depend on the network speed.

Assuming that each branch has at least a dedicated 64-kilobit connection, and that 50% of the bandwidth is available for replication, a total data volume of about 14 MB can be transmitted in one hour. For comparison, creating 5,000 users in Active Directory generates replication traffic of around 1 MB.

This formula results in the total number of outbound replication partners that a single bridgehead server could support in one replication cycle.

Let's look at one example. The reference hardware for the bridgehead servers is again a four-processor 500-megahertz (MHz) Xeon server with a well-configured storage system for Active Directory (seven hard drives: two mirrored for the operating system, two mirrored for the log files, and three in a RAID 5 array for the database).

The customer has the following requirements for replication:

  • Network traffic for replication is restricted to 12 hours per day.

  • Replication has to happen twice daily from the hub to the branch, and once daily from the branch to the hub.

  • Most changes will happen in the hub.

  • Changes from the hub can be replicated to the branch in less than one hour in all foreseeable scenarios. This includes traffic for FRS replication of the SYSVOL, created by policy changes.

  • Replicating changes from the branch to the hub takes an average of one minute (these are only minor changes, such as the change of a user password).

  • Policies will never be changed in branches; thus there will be no outbound FRS replication traffic from the branches.

  • The hub site has to replicate to 1,200 branch sites.

The initial idea is to break up the time available for replication into three different slices:

  • Outbound from hub (four hours)

  • Inbound to the hub (four hours)

  • Outbound from the hub again (four hours) (This cycle is for changes received from branches which are now replicating to other branches.)

Outbound Replication Calculation

Outbound replication involves the following calculation: (8*30)/(2*1) = 120 outbound replication partners, assuming the following variables:

  • Concurrent connections to one bridgehead server (O): 30

  • Hours of replication: H = 8

  • Number of cycles: K = 2

  • Time needed for replication T = 1 hour

Each bridgehead server will serve 30 branch offices per 1-hour group, in four groups per cycle, resulting in two 4-hour outbound replication cycles per day (eight hours of outbound replication in total). The administrator then distributes the 120 branches into groups of 30 across the 4-hour cycle.
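
The same outbound result can be checked with a short Python sketch (added for illustration, not part of the guide), applying the (H * O) / (K * T) formula to the example values above:

def max_outbound_partners(h_hours, o_connections_per_hour, k_cycles, t_hours):
    # Maximum outbound partners per bridgehead server: (H * O) / (K * T).
    return (h_hours * o_connections_per_hour) // (k_cycles * t_hours)

# Example from the text: 8 outbound hours per day, 30 concurrent connections
# per hour, 2 replication cycles per day, 1 hour per outbound replication run.
print(max_outbound_partners(8, 30, 2, 1))    # 120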

Inbound Replication Calculation

To compute inbound replication, use the numbers above that resulted in one minute of inbound replication per branch domain controller. The calculation also uses the 4-hour inbound replication window decided on above.

Using the formula:

Number of inbound replication partners = R / N

where:

R = Length of the replication window in minutes

N = # of minutes a domain controller needs to replicate all changes

R is calculated as four hours times 60 minutes per hour = 240 minutes (4 * 60 = 240), and N is a given of one minute. One bridgehead server could therefore replicate in from 240 / 1 = 240 branch office domain controllers.

Determining the Number of Bridgehead Servers

The formulas for inbound and outbound replication connections provide the basis for determining the number of bridgehead servers needed for a given scenario. The limiting factor is the smaller of the two calculated values, outbound and inbound. The calculations made earlier resulted in a maximum of 120 replication partners per bridgehead server for outbound replication, and a maximum of 240 replication partners per bridgehead server for inbound replication. Therefore, the limiting factor is outbound replication, with 120 possible partners.

Note: In Windows 2000 a domain controller can have a maximum of 800 replication partners. This is especially important to consider when determining the number of bridgehead servers you will need in your hub site.

To calculate the required number of hub site bridgehead servers for an organization, divide the number of branches by the number of replication partners per bridgehead server to get the number of bridgehead servers required for Active Directory replication:

Number of bridgehead servers = B / (number of replication partners per bridgehead server)

where B is the number of branches that will have a domain controller installed.

For our 1,200 sites, this is 1,200 / 120 = 10 bridgehead servers.

One tactic to minimize the number of bridgehead servers is to adjust the replication cycle. As we saw, the number of possible replication partners for inbound replication to the hub is very high, 240, compared to 120 for outbound. Therefore, you can adjust the replication schedule to:

5 hours outbound

2 hours inbound

5 hours outbound

Since the available time for inbound replication is now shorter, fewer inbound replication partners, 180, can be supported by each bridgehead server. Now, however, you can create 5 groups of 30 outbound replication partners, which results in 150 replication partners. The smaller of the two numbers (180 and 150) is now the limiting factor in the calculation of the number of bridgehead servers required: 1,200 / 150 = 8.
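
Putting the two limits together, the required number of bridgehead servers follows from the smaller of the inbound and outbound capacities. A small sketch (added for illustration, not part of the guide) using the chapter's example figures:

import math

def bridgeheads_needed(branches, inbound_capacity, outbound_capacity):
    # Required hub bridgehead servers: branches / min(inbound, outbound) capacity.
    limit = min(inbound_capacity, outbound_capacity)
    return math.ceil(branches / limit)

branches = 1200

# Original 4/4/4 schedule: 240 inbound and 120 outbound partners per bridgehead.
print(bridgeheads_needed(branches, inbound_capacity=240, outbound_capacity=120))   # 10

# Adjusted schedule from the text: 180 inbound and 150 outbound partners.
print(bridgeheads_needed(branches, inbound_capacity=180, outbound_capacity=150))   # 8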

Note: When you plan for the number of bridgehead servers, be conservative and leave ample time for replication. If you do not plan for enough available replication time, you will have replication backlogs. Besides good planning, monitoring of successful replication and replication failures (as well as of CPU utilization, disk I/O queues, and so on) is absolutely necessary to determine whether bridgehead servers are getting into an overload situation. Planning for replication monitoring is a must in branch office deployment planning.

Recommendation: Configure bridgehead servers for failover and redundancy.

The configuration of your replication topology depends on the number of sites you have. For a very large number of sites (> 100), the KCC should not be used to compute the inter-site replication topology, so that you can control the redundancy and failover strategy for bridgehead servers. Connection objects should instead be created by using scripts, third-party tools, or by hand.

Configuring Replication Topology for Large Branch Office Deployments

There are two methods for creating a replication topology in a branch office environment:

  • Use the KCC to create connection objects. This method is recommended if your branch office environment has 100 sites or fewer. (See the Using KCC With a Small Number of Sites section later in this chapter for more information.) With a larger number of sites, however, the KCC experiences scalability issues and either needs to be tuned or cannot be used at all.

  • Use scripted, third-party, or manual creation of connection objects. This method is recommended if your branch office environment has more than 100 sites.

Non-KCC Creation of Connection Objects

There are two ways to create connection objects when not using the KCC:

  • An administrator creates the connection objects manually, using the Active Directory Sites and Services console.

  • Use the hub-spoke topology scripts included with this guide, or use a third-party tool.

Caution: Special care must be taken when connection objects are created without using the KCC. As soon as a connection object is created, it replicates directory service information immediately without obeying the schedule. This means that if you create 180 inbound connection objects at the same time to 180 branch office domain controllers, all 180 domain controllers will start replicating from the hub domain controller immediately. If different schedules are used on the connection objects to distribute the load, these schedules are used starting with the second replication cycle. This can create a high load on the bridgehead server. If you need to create a lot of connection objects at once, make sure that the server can handle the load or deploy the connection objects in batches according to the server's capabilities. Note that FRS replication will wait for the replication schedule for both the first and all subsequent replication cycles.

Disable Inter-Site Topology Generator

This option works well in typical hub-spoke configurations. It is generally used in configurations with hundreds of sites.

When automatic inter-site topology generation is disabled entirely, it becomes the responsibility of the administrator to create the necessary inter-site replication connection objects to ensure that replication data continues to flow across the forest. Typically, customers with enough sites to surpass the KCC limits employ hub-and-spoke network topologies to connect a corporate headquarters with a large number of homogeneous branch office sites. This symmetry greatly simplifies the process.

Before creating your own connection objects without the help of the KCC, there are several points to consider:

Server failures

Consider the case where the domain controller BODC1 in a branch office site is connected to the domain controller BH1 in the corporate hub site, and BH1 suffers a hardware error, power failure, or some other catastrophic event. When automatic inter-site topology generation is enabled, the KCC takes care of adding an additional connection to temporarily replicate from another domain controller in the corporate hub site until BH1 returns online. Without automatic inter-site topology generation, redundant connections must be defined to ensure that replication continues to occur in cases of server failure.

Define two connections inbound to BODC1, one from BH1, and one from BH2. If there are two domain controllers in the branch office, BODC1 and BODC2, then the second connection should be from BH2 to BODC2. This allows updates to be replicated from the corporate hub site in the event that one of the two branch office domain controllers fails. Redundant connections defined in this manner may force the same Active Directory updates to be replicated more than once unless the IP transport is being used and all connections inbound to the site have the same destination domain controller within the site.

When using the SMTP transport or multiple destination domain controllers, the replication schedule should be alternated such that the updates from one source are received, applied, and replicated within the destination site before the request to the second source is made. Extending the example above, the first connection might replicate on odd hours and the second connection replicate on even hours.

Global Catalog placement

If a site contains global catalog servers, one or more of the global catalog servers must be used for replication to and from the site. This will ensure that the global catalog servers remain synchronized.

Domain placement

If domain controllers of a particular domain are spread out over multiple sites, one or more domain controllers of that domain must be used for replication with other domain controllers of that same domain. This ensures that domain data is replicated across all domain controllers of that domain. It is not sufficient for a domain controller from domain A in site 1 to replicate solely with a global catalog server from domain B in site 2 when site 2 contains a domain controller for domain A. Because the global catalog server for domain B has only a subset of the attributes for objects in domain A, it cannot act as a conduit to replicate attributes beyond this set (between the domain controllers for domain A).

Load balancing

You must consider the distribution of the inbound and outbound replication load. For example, if you have 100 domain controllers in your corporate hub site and 1,000 branch offices with one domain controller each, you do not want to configure all 1,000 branch office domain controllers to replicate from the same domain controller in your hub site. Instead, balance the load so that each domain controller in the corporate hub communicates with 10 branch office sites. Because only one inbound replication can occur at a time and communication with branch office sites is often over slow wide area network (WAN) links, failing to load balance will not only increase the CPU and memory load on the hub site domain controller, but may also result in very large backlogs of data to replicate.

If the environment has acceptable bandwidth, a single run of the KCC can also be used to initially create connections that can then be adapted by an administrator. If the inter-site KCC will not be run periodically thereafter, the administrator must define additional replication connections so that replication continues to function if the source domain controller identified by the first connection fails. If all existing connections fail and the inter-site KCC is not re-run, the administrator must connect directly to the target domain controller and create a connection to a domain controller that is reachable. In configurations with high volatility (when the optimal source domain controllers are occasionally unavailable for long periods of time due to network failures) it is advisable to have more than one extra connection.

Planning Staggered Replication Schedules

With the implementation of non-KCC-generated connection objects, managing the replication schedule becomes a critical configuration task. Using a staggered replication schedule among branch servers is one approach to managing the traffic between the hub and the branches.

Assume that you want to trigger replication from the hub bridgehead servers to the branch servers once per hour. You set the schedule on each of the even-numbered branch office domain controllers (BODC0, BODC2, ...) so that they pull from hub BH1 during the even hours of the day and pull from hub BH2 during the odd hours of the day. Similarly, set the schedule on each of the odd-numbered branch domain controllers so that they pull from hub BH2 during the even hours of the day and pull from hub BH1 during the odd hours of the day. The schedules on connection objects under the two hub servers are set in the same way. This automatically balances the load on the hub domain controllers across all of the branch servers. Further, if one of the hub servers were to fail, the load on the remaining server remains unchanged, but the branch servers now get their replication updates every two hours instead of every hour.

If you wanted to replicate with the branches every two hours, you can spread the load out on the hub servers by setting the schedules on half of the even numbered branches to replicate with hub BH1 during the even hours and the other half of the even numbered branches to replicate with hub BH1 during the odd hours. Do the same thing for the odd numbered branches and hub BH2. Additionally, you can construct a redundant set of connections where replication to each branch alternates between hubs every two hours by making the period on these schedules four hours and then adding a second set of connections, offset by two hours and pulling from alternate hubs.
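
The even/odd assignment described above reduces to a simple parity rule. The following sketch (added for illustration; the server names simply follow the example's naming) shows which hub a given branch pulls from in a given hour under the hourly schedule:

# Staggered hourly schedule for two hub bridgeheads (BH1, BH2): even-numbered
# branches pull from BH1 on even hours and BH2 on odd hours; odd-numbered
# branches do the opposite.
def hub_for(branch_index, hour):
    return "BH1" if branch_index % 2 == hour % 2 else "BH2"

# Which hub do the first four branches pull from at 02:00 and 03:00?
for branch in range(4):
    print(f"BODC{branch}: 02:00 -> {hub_for(branch, 2)}, 03:00 -> {hub_for(branch, 3)}")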

Figure 3.1: Manually configured fault tolerant topology with load balancing

The Hub-Topology Scripts

In a hub-and-spoke topology, the hub site contains one or more well-connected bridgehead domain controllers and there are some number of branch domain controllers, each in their own branch office site. A set of hub-topology scripts are included with this guide, in the \ADBranch\BranchDC\Mkdsx folder. You can use these scripts to create connection objects for the domain controllers in a given domain. Given a list of hub and branch servers, you can use the scripts to create the replication topology automatically. The scripts are designed to balance the load among the hub servers, as well as provide redundancy in case of failover.

When using the scripts to create your connection objects, you will use four main files:

  • Topo.dat. Contains a list of hub bridgehead servers and a list of branch office domain controllers. This file must be manually built with all of your hub bridgehead servers and the first domain controller for each branch office.

  • Mkhubbchtop.cmd. Takes the Topo.dat file as input and builds the Mkdsx.dat file with the hub-and-spoke topology.

  • Mkdsx.dat. Contains the hub-and-spoke topology to be used by Mkdsx.cmd. It is generated by running Mkhubbchtop.cmd with the Topo.dat file as input.

  • Mkdsx.cmd. Creates the connection objects for the hub-and-spoke topology specified in the Mkdsx.dat file.

Consider a hub site with two domain controllers (BH1 and BH2) that are well connected by a high-speed link. These two domain controllers connect with 200 branch office domain controllers (BODC0 through BODC199) in a hub-and-spoke arrangement. To make this environment work, create manual connection objects and set the schedule attribute in each connection object appropriately. To implement the hub-and-spoke topology for this environment, create 200 connection objects under the BH1 server in Active Directory, where each connection object refers to one of the branch office domain controllers.

You should also create the same set of connection objects under the BH2 server. In addition, each branch office domain controller needs two connection objects, referring to BH1 and BH2 respectively. While creating all of these connections manually is possible, the likelihood of an error, or omitting a connection object increases with the number of hub and branch servers.
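
To illustrate the scale of the task (a sketch added here, not part of the guide; the output is simply a list of source/destination pairs, not the input format of any particular tool), the full set of connection objects for this example can be enumerated as follows:

# Enumerate the connection objects needed for the example hub-and-spoke
# topology: two hub bridgeheads (BH1, BH2) and 200 branch DCs (BODC0-BODC199).
hubs = ["BH1", "BH2"]
branches = [f"BODC{i}" for i in range(200)]

connections = []
# Inbound connections on each hub bridgehead, one from every branch DC.
for hub in hubs:
    for branch in branches:
        connections.append((branch, hub))     # (source, destination)
# Inbound connections on each branch DC, one from every hub bridgehead.
for branch in branches:
    for hub in hubs:
        connections.append((hub, branch))

print(len(connections))    # 800 connection objects to create and schedule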

The hub-spoke topology scripts included with this planning guide can be used to build the hub-and-spoke replication topology, creating all of the connection objects between the domain controllers for you. To build this environment, you would edit Topo.dat to include the bridgehead servers (BH1 and BH2) and all of the branch office domain controllers. You could then use Mkhubbchtop.cmd and Mkdsx.cmd to build the connection objects for this environment.

Note: For the specifics about how to modify Topo.dat and run Mkhubbchtop.cmd, refer to Deployment Chapter 4, "Planning the Hub Site for Branch Office Environments" in the Active Directory Branch Office Deployment and Operations Guide. For the specifics on using Mkdsx.cmd to build the connection objects, refer to Deployment Chapter 7, "Pre-shipment Configuration of the Branch Office Domain Controller" in the Active Directory Branch Office Deployment and Operations Guide.

Connection Schedules and Load Balancing

Connection schedules are an attribute associated with each connection object on a domain controller. A connection schedule contains a 7 x 24 array of bytes, one byte for each hour in a 7-day week (UTC time zone). The low 4 bits of each byte indicate the number of times replication is attempted in that hour. If all 4 bits are set, replication is attempted 4 times in that hour; if all 4 bits are clear, no replication is attempted in that hour.

Note: Once replication begins in an hour it can run over into the next hour(s) as needed to replicate all the data.

The upper 4 bits in each schedule byte are reserved for future use. The example schedule below (each entry is 2 hex digits) would trigger replication of both the SYSVOL (by FRS) and the Active Directory content on the even hours of the week. This is how the schedule information is interpreted by Active Directory and FRS; the user interface may not present the schedule in this form.

                                  Hour of the day 
    00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 
sun=01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00  
mon=01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00  
tue=01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00  
wed=01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00  
thu=01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00  
fri=01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00  
sat=01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00  
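
For illustration (a sketch added here, not a tool from the guide), the even-hours schedule shown above can be generated as a 7 x 24 array of bytes, where the low 4 bits of each byte hold the number of replication attempts for that hour:

# Build a 7 x 24 connection schedule (UTC) that attempts replication once in
# every even hour, matching the example above.
DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]

def even_hour_schedule(attempts_per_hour=1):
    # The low 4 bits of each byte give the attempts per hour; odd hours stay 0.
    return [[attempts_per_hour if hour % 2 == 0 else 0x00 for hour in range(24)]
            for _ in DAYS]

schedule = even_hour_schedule()
for day, hours in zip(DAYS, schedule):
    print(f"{day}=" + " ".join(f"{b:02x}" for b in hours))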

Scheduling Hub Server Load Balanced Replication

Assume you want to trigger replication from the hub servers to the branch office domain controllers once per hour. Then you would set the schedule on each of the even-numbered branch office domain controllers (BODC0, BODC2, and so on) so that they pull from hub BH1 during the even hours of the day and pull from hub BH2 during the odd hours of the day. Similarly, set the schedule on each of the odd-numbered branch office domain controllers to pull from hub BH2 during the even hours of the day and pull from hub BH1 during the odd hours of the day. The schedules on connection objects under the two hub servers are set in the same way. This automatically balances the load on the hub domain controllers across all the branch office domain controllers. Further, if one of the hub servers were to fail, the load on the remaining server remains unchanged, but the branch office domain controllers now get their replication updates every two hours instead of every hour.

If instead you wanted to replicate with the branches every two hours then you could spread the load out on the hub servers by setting the schedules on half of the even-numbered branches to replicate with hub BH1 during the even hours and the other half of the even-numbered branches to replicate with hub BH1 during the odd hours. It's the same for the odd-numbered branches and hub BH2. In addition, by making the period on these schedules four hours and then adding a second set of connections offset by two hours and pulling from alternate hubs you can construct a redundant set of connections where replication to each branch alternates between hubs every two hours.

Scripting Load Balancing

Staggering the replication times as discussed in the above examples can be done using the hub-spoke scripts; however, it requires some additional configuration and planning. You will need to:

  • Create two Topo.dat files: one listing the even branch office domain controllers (Eventopo.dat) and the other listing the odd branch office domain controllers (Oddtopo.dat). Remove the /auto_cleanup switch from both Topo.dat files.

  • Create two mask.txt files: one with the replication schedule for the even branch office domain controllers (Evenmask.txt) and the other with the replication schedule for the odd branch office domain controllers (Oddmask.txt).

  • In the Eventopo.dat file, change Mask.txt on the "/schedmask mask.txt" line to evenmask.txt and change Mkdsx.dat on the "/output mkdsx.dat" line to evenmkdsx.dat.

  • In the Oddtopo.dat file, change Mask.txt on the "/schedmask mask.txt" line to oddmask.txt and change Mkdsx.dat on the "/output mkdsx.dat" line to oddmkdsx.dat.

As you create your branch office domain controllers and are ready to create the connection objects between them and the bridgehead servers, you will run the Mkdsx.cmd file with either Evenmkdsx.dat or Oddmkdsx.dat, depending on the domain controller.

Using KCC With a Small Number of Sites (< 100)

If the number of sites is less than or equal to 100, the KCC can be used to compute the inter-site replication topology.

KCC Configuration

If the KCC is used for both intra-site and inter-site replication, the administrator creates site links and site link bridges as necessary. The configuration on the site links, such as the cost factor and schedule, is used by the Inter-Site Topology Generator (ISTG) to compute the inter-site replication topology and create connection objects for the bridgehead servers. The schedule from the site links is inherited by the connection objects. These connection objects will have a GUID-based name and the description "<automatically generated>" in the Active Directory Sites and Services console.

Using the KCC is the best solution with a small number of sites, as the KCC provides automatic failover if connections between servers become unavailable.

Site Links

Site links have a replication interval and a schedule that are independent of cost. The KCC translates the site link information into connection objects. Cost is used by the KCC to prefer one site link path over another.

There are scenarios where the KCC has to go through multiple site links to create a replication path. This is the case if not all naming contexts (or all the domains) are available in each site, but domain controllers holding the same naming context are deployed to sites that are not included in at least one common site link. In this case, the KCC has to go through a series of site links to create a replication path, and create a connection object. Note that the connection object will be created between the two domain controllers directly. The chosen path through the site links cannot be seen.

The KCC can only create a connection object through multiple site links if transitiveness of site links is enabled. If the KCC cannot connect all domain controllers or global catalog servers so that all naming contexts are connected, the KCC will generate error 1311 in the event log. In other words, the KCC cannot create a spanning tree replication topology, because a spanning tree requires all sites to be connected with all naming contexts.

If the KCC logs this error in the event log, the KCC goes into a special mode called conservation mode. In conservation mode, the KCC will not delete any connection objects that are no longer required, or that could be deleted to improve the efficiency of the replication topology. If the administrator observes a situation where unnecessary connection objects are not deleted, it is good practice to check the event log for the KCC error noted above and resolve the configuration error by closing the topology, that is, by adding more site links or site link bridges.

KCC Generation of Connection Objects

In this scenario, the KCC runs on its default schedule and determines the least cost spanning tree configuration. This is only feasible in environments where the number of sites does not exceed 100. The KCC always selects one server in a site as the bridgehead server. When two sites are connected with each other, no load balancing (using multiple bridgehead servers for the same naming context) or staggering of schedules is done. If the hub location is implemented as a single site and the KCC is used to select bridgehead servers and create connection objects, this can lead to an overload situation on one server.

If the Administrator chooses to modify one of these connection objects it will be converted into a manual connection object. This conversion means that the administrator now owns the connection object, and controls it completely. The KCC will respect the connection object in the future and consider it an existing connection when it computes whether additional connection objects have to be created. It will be used by the KCC as long as the connection can be verified. In the case of an error condition (for example, because one site is unavailable for some time, or a bridgehead server is taken offline for maintenance), the KCC will create a new temporary connection object rerouting replication until the error condition has been solved. Then replication will fall back to the original connection object, and the temporary connection object will be deleted.

Hub Sites

The KCC selects one server per site and naming context as a bridgehead server. As mentioned earlier, when a large number of branches have to be served from a single hub site, an overload condition can easily occur on a single hub server. This happens because it is highly likely that the KCC selects one server as the replication partner for all branches. In this case, when leaving the choice of bridgehead server to the KCC, you should consider splitting the single hub site into two or more, which will force the KCC to distribute the branches among the hub servers in the various hub sub-sites. The branch office sites will then be connected to one of the hub sites with a site link.

Figure 3.2: Branches connected to hub sub-sites

The figure above shows three different sites in a data center: HubSiteA, HubSiteB, and HubSiteC. Each of them has one server that takes the role of a bridgehead server. Each branch is connected to one of the hub sites through a site link. There is also one site link in the data center that connects the three hub sites. The schedule on this site link is very different from the site links used from the branches to the hub (replication is always available), and in order to simulate the same behavior as intra-site replication, the notification mechanism is enabled on the site link. This ensures that updates will be sent to other sites within 5 minutes, and that it takes a maximum of 15 minutes until new information is replicated to all hub sites.

Summary

By now you should have an understanding of the replication issues presented by a large branch office deployment with slow links. This understanding enables you to decide where to place your bridgehead servers, and how many of them are needed for successful replication with your branches. This chapter should also have helped you understand the configuration necessary for creating the connection objects and for scheduling the replication.

The next chapter outlines the planning necessary to create a hub or data center site.

For More Information

For more information, refer to the resource list at the end of Planning Chapter 1, Overview of Planning Active Directory for Branch Office Environments.