Active Directory Replication Traffic

12/09/2009

By Andreas Luther—Microsoft Enterprise Services

Editor's Note This article is excerpted from "Optimizing Network Traffic," which is part of the Microsoft Press Notes From the Field series that outlines the best system management practices and procedures. For more information on this and other Microsoft Press books, go to https://www.microsoft.com/mspress/.

Introduction

The Windows 2000 Active Directory extends the replication model introduced in Microsoft Exchange Server 4.0—a directory based on a database and a flexible replication engine. The new Active Directory model uses an updated replication architecture to meet the needs for an enterprise directory service. The new design results in finer replication granularity and an architecture that allows administrators to tune replication for specific environments, controlling what is replicated to whom, how, and when. Active Directory replication is designed to work well without tuning, but if you have to perform tuning you'll need a solid understanding of the architecture and the resulting network traffic.

This chapter introduces the new Active Directory replication architecture, shows how to detect network packets that are caused by replication, and presents some network traffic statistics that will help you define an efficient replication topology. It is not a complete discussion of this topic—that level of detail is available in other sources—but is instead intended as a functional overview useful for implementation planning.

Replication Architecture

The Active Directory is made up of one or more naming contexts, also called partitions. A naming context is a contiguous sub-tree of the directory (such as the directory schema) that is a unit of replication. In the Active Directory each domain controller always holds at least three naming context replicas:

The schema
The configuration (replication topology and related metadata)
One or more domain naming contexts (sub-trees containing the actual objects in the directory)

The schema container defines the objects (such as users) and attributes (such as telephone numbers) that can be created in the Active Directory, and the rules for creating and manipulating them. Schema information (which attributes are mandatory for object creation, what additional attributes can be set, and what attribute data types are used) is replicated to all domain controllers to ensure that objects are created and manipulated in accordance with the rules.

The configuration container includes information about the Active Directory as a whole—what domains exist, what sites are available, what domain controllers are running in the particular sites and domains, and what additional services are offered. All enterprise domain controllers need this information to make operational decisions (such as choosing replication partners) so it is replicated to all of them.

A domain naming context holds objects such as users, groups, computers, and organizational units. A full domain naming context replica contains a read-write replica of all information in the domain: all objects and all attributes. A domain controller holds a full replica of its domain naming context. A partial domain naming context replica contains a read-only subset of the information in the domain: all objects, but only selected attributes. A domain controller that's a global catalog (GC) server contains a partial replica of every other domain in the forest (and a full replica of its own domain).

GC servers speed up enterprise-wide directory searches by acting as indexers for the enterprise, holding in their database a copy of selected attributes for all enterprise objects—a small set of common object attributes typically meaningful in searches: first and last names for user objects, locations for printers, etc. Thus the GC optimizes a search for, say, a specific color-printer, by consulting its database. Even if the specific attribute is not found in the GC database, the user or application can at least find out which domain controllers to contact for more information.

Global catalogs are also needed for logon operations. GC servers map user principal names (FredG@Acme.com) to accounts (HQ.Acme.com\FredG). Only GC servers know all user memberships in universal groups, so if the user is a member of a universal group, the logon process must contact a GC to add the security IDs (SIDs) of those groups to the user's access token.

Normal replication mechanisms keep GC server partial domain replicas up to date. When Active Directory builds a replication topology for a naming context, it includes the partial domain replicas. However, a partial domain replica cannot act as a replication source for a full domain replica, because the partial domain replica only knows about a subset of attributes. A partial domain replica can act as a replication source for another partial domain replica. This allows for very low-cost topologies.

Object Model

Objects in the Active Directory are defined by their attributes' types and values. An object receives its identity from its Global Unique Identifier (GUID), the only attribute that cannot be changed. The GUID lets the system keep track of an object in the Active Directory even after a move between domains changes its distinguished name (DN).

As noted, the schema defines which attributes can be used on objects. The Active Directory's Extensible Storage Engine (ESE) reserves space only for attributes that actually have values set. This helps conserve database space because user objects have more than 100 defined attributes, but only 20 to 30 are commonly used. Using this model, the replication engine minimizes traffic by replicating only object attributes that have values assigned, and only attributes with values that have changed since the last replication.

Multi-Master Replication

The Windows 2000 Active Directory uses a multi-master replication topology that allows you to use any domain controller to manipulate the domain database and to replicate changes to its replication partners.

Domain controllers use update sequence numbers (USNs) to see if replication partners are up to date. In the case of collisions (when the same attribute of the same object is manipulated on two domain controllers at the same time) the last writer wins. To determine last writer status, an algorithm checks: attribute version number, then attribute timestamp, then the GUIDs of the domain controllers that performed the write operation. This ensures that the attribute value is determined consistently and locally, reducing communication between domain controllers.
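The comparison order described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the directory service's actual code: the `AttributeStamp` type and its field names are hypothetical, the timestamp unit is assumed, and comparing GUIDs as plain strings is an assumption about how the final tie-break is ordered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeStamp:
    """Hypothetical per-attribute metadata used to pick a winner."""
    version: int      # attribute version number
    timestamp: int    # time of the write (unit assumed: seconds)
    writer_guid: str  # GUID of the DC that performed the write

    def key(self):
        # Tuples compare element by element, which gives the
        # version -> timestamp -> GUID order described in the text.
        return (self.version, self.timestamp, self.writer_guid)


def last_writer(a: AttributeStamp, b: AttributeStamp) -> AttributeStamp:
    """Return the stamp that wins a collision between a and b."""
    return a if a.key() >= b.key() else b


dc1 = AttributeStamp(version=3, timestamp=1000, writer_guid="aaaa")
dc2 = AttributeStamp(version=2, timestamp=2000, writer_guid="bbbb")
winner = last_writer(dc1, dc2)  # dc1: the higher version outranks the later timestamp
```

Because every domain controller evaluates the same three values in the same order, each one can reach the same result locally without asking its partners.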

Replication Topology

Replication efficiency is enhanced by a flexible replication topology that can reflect the structure of an existing network. The Active Directory's replication topology generator runs as part of the Knowledge Consistency Checker (KCC). You feed the KCC information on the cost of sending data from one location to another, and which domain controllers are running in the same location. Using this, the KCC builds an inter-site replication topology that is a spanning tree based on low-cost routing decisions between remote locations, and a more strongly connected intra-site topology. You can disable the KCC topology generator and manually create the connection objects required for replication. During this process, the Active Directory logging mechanism identifies domain controllers that appear to be isolated from the enterprise-wide replication.

Note: Using the replication topology generator is strongly recommended: it simplifies a complex task, has a flexible architecture that reacts to failures and to changes you make later in the network topology, and helps compute the lowest-cost topology.

The next sections discuss essential replication concepts.

Connection Objects

The fact that one domain controller uses another as a source for replication information is expressed as a connection object in the Active Directory. These define incoming replication only. For example, if domain controller DC1 has a connection object to DC2, then DC1 can use all naming contexts on DC2 as a source for updated information, but DC2 cannot use DC1 unless it creates a connection object that defines DC1 as a source. Once a connection object has been created, it can be used to replicate information from all naming contexts.

Sites

Used in the Active Directory to express proximity of network connection, a site is defined in terms of IP subnetworks, a concept used in all routed TCP/IP network environments. A legal subnet definition could be 177.177.177.0/24, where 177.177.177.0 describes the IP subnet, and 24 tells how many bits are used to define it. The remaining bits of an IP address (32 – 24 = 8 bits in this example) can be used to define hosts.

A site consists of one or more subnets (unique network segments). For example, in a network with three subnets in Redmond and two in Paris, the administrator can create two sites: one in Redmond and one in Paris, and add the subnets to the local sites.
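The subnet arithmetic and the subnet-to-site grouping can be sketched with Python's standard `ipaddress` module. The 10.x subnet addresses assigned to Redmond and Paris below are hypothetical, chosen only to mirror the three-subnet and two-subnet example; the `find_site` helper is likewise illustrative and is not how Active Directory itself performs the lookup.

```python
import ipaddress

# The subnet from the text: 24 bits identify the subnet, the rest identify hosts.
subnet = ipaddress.ip_network("177.177.177.0/24")
host_bits = subnet.max_prefixlen - subnet.prefixlen  # 32 - 24

# Hypothetical subnet assignments: three subnets in Redmond, two in Paris.
SITE_SUBNETS = {
    "Redmond": ["10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"],
    "Paris": ["10.2.1.0/24", "10.2.2.0/24"],
}


def find_site(address):
    """Return the name of the site whose subnets contain address, or None."""
    ip = ipaddress.ip_address(address)
    for site, subnets in SITE_SUBNETS.items():
        if any(ip in ipaddress.ip_network(s) for s in subnets):
            return site
    return None
```

A call such as `find_site("10.2.2.40")` resolves to the Paris site in this sketch, while an address outside every listed subnet resolves to no site.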

The Active Directory uses site information in these ways:

The KCC generates a replication topology more strongly-connected within a site than between sites (adds some traffic but reduces intra-site replication latency).
Active Directory does not compress intra-site replication messages (adds some traffic but reduces CPU utilization on DCs).
Intra-site DC replication is change-based; inter-site DC replication is scheduled.
Client machines use site information to find nearby DCs for logon operations.
The Active Directory uses site information to help users find the closest machine that offers a needed network or a third-party service.

Intra-Site Replication

Intra-site replication (between domain controllers in the same site) attempts to complete in the fewest CPU cycles possible. Because domain controllers should be able to serve clients quickly for logons, searches, etc., the network connection between them is assumed to be reliable and to have plenty of available bandwidth.

Replication is a trade-off: data should be as accurate as possible on all domain controllers, which means that latency should be as small as possible, which means fast updates, which means frequent replication. On the other hand, frequency doesn't always equal efficiency. For instance, during a bulk import of directory objects, the changes in a domain controller's database may be out of date again within 10 seconds, so it makes no sense to replicate them until the database is stable (the bulk import is complete).

Intra-site replication avoids unnecessary network traffic by introducing a change notification mechanism that replaces the usual polling of replication partners for updates. When a change is performed in its database, a domain controller waits a configurable interval (default 5 minutes), accepts more changes during this time, then sends a notification to its replication partners, which pull the changes. If no changes are performed for a configurable period (default 6 hours) the domain controller initiates a replication sequence anyway, just to make sure that it did not miss anything.
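The batching effect of the notification delay can be sketched as follows. This is a simplified model, not the actual scheduler: it assumes that the first change after an idle period starts the timer, that later changes inside the window do not restart it, and that times are expressed in seconds. The function name and the sample change times are hypothetical.

```python
def notification_times(change_times, delay=300):
    """Return the times at which partners would be notified.

    change_times: times (in seconds) at which changes are written.
    delay: hold-back interval; 300 seconds matches the 5-minute default.
    """
    notifications = []
    window_end = None
    for t in sorted(change_times):
        if window_end is None or t > window_end:
            # First change after an idle period opens a new window.
            window_end = t + delay
            notifications.append(window_end)
        # Otherwise the change falls inside the open window and is batched.
    return notifications


# Five changes collapse into two notifications.
result = notification_times([0, 10, 200, 400, 450])
```

In this model a burst of changes, such as a bulk import, produces one notification per window rather than one per change.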

Changes considered security-sensitive are replicated immediately, and intra-site partners are notified right away: lockout of user accounts, changes of domain trust passwords, and some changes in the roles of domain controllers.

Intra-site replication topology is a bi-directional ring built using domain controller GUIDs. If a ring contains seven or more DCs, bi-directional connections are added to keep the path between any pair to less than three hops. New DCs configured in the site are included in the ring. One bi-directional ring is built for each naming context available in a site. Schema and configuration information share the same topology and only one bi-directional ring is built for them, because they must be replicated to all domain controllers.

If all of a site's domain controllers are in the same domain, the two rings are the same—the ring that includes all site domain controllers is equivalent to the ring that includes all domain controllers in that domain. You have more than one distinct ring only when your site contains more than one domain: 2 domains = 3 rings, 3 domains = 4 rings, and so forth. To find out what rings exist in a site, use the Active Directory Sites and Services Manager snap-in to check the connection objects and see what incoming replication connections they represent.

Inter-Site Replication

Inter-site replication is based on the assumption that the WAN is connected by slower links, so it is designed to minimize traffic rather than CPU cycles. Before being sent out, data is compressed to about 10% to 15% of original volume.

Inter-site replication topology is a spanning tree. As long as a replication route can be constructed between all sites in the enterprise, the replication topology is functional. It is not necessary to create additional links. The administrator decides which sites are connected, and can create a site link that allows domain controllers from any site to talk to domain controllers in any other site. Site links are based on the cost of replication (which reflects the speed and reliability of the underlying network) and its schedule (which defines a window when replication is allowed over the link).

Unlike intra-site replication, inter-site replication does not use a notification process. Inter-site replication can be fully scheduled by the administrator on a per-site-link basis.

Since there is no notification between the replication partners, a domain controller does not know which naming context was updated on the source replication partner. Therefore, it has to check all existing naming contexts on the source machine. A normal domain controller (that is, one that uses a GC as replication partner) will check only the normal three naming contexts on the GC (schema, configuration, and its domain) but never the partial naming contexts of other domains. For that reason, the initial replication setup traffic is slightly higher in the inter-site case. But if many objects are replicated, compression kicks in and makes this kind of replication more efficient.

Replication Transports

While intra-site replication supports only replication based on remote procedure calls (RPCs), the initial release of Windows 2000 offers two transports for inter-site replication:

Synchronous (scheduled) via RPC over TCP/IP
Asynchronous via Simple Mail Transfer Protocol (SMTP) using the Collaborative Data Objects (CDO v2) interface and the SMTP component in IIS 5, which is included in Windows 2000.

The intra-site RPC transport does not support data compression; the inter-site transports, both RPC and SMTP, do. RPC-based replication can be used for any kind of replication—intra-domain, configuration information, or global catalog information.

The SMTP transport has some restrictions: it can be used to replicate configuration and global catalog information, but it cannot be used for replication between domain controllers that belong to the same domain and therefore have to replicate the full domain naming context. The reason for this restriction is that some domain operations (for example, Group Policy) require the support of the File Replication Service (FRS), which does not yet support an asynchronous transport like SMTP for replication.

How to Measure Replication Traffic

Windows 2000 gives you some tools for assessing network load:

Performance Monitor
Event Log
Network Monitor

The Windows 2000 Performance Monitor looks slightly different from the versions available in Windows NT. It is implemented as a Microsoft ActiveX control that can be used either as a Microsoft Management Console (MMC) snap-in, or as a control in a Web page. This allows you to monitor servers and network traffic from a browser.

The Performance Monitor counters most frequently used to measure replication traffic appear under the NTDS object. They are:

DRA Inbound Bytes Total. Total number of bytes replicated in. Sum of the number of uncompressed bytes (never compressed) and the number of compressed bytes (after compression).
DRA Inbound Bytes Not Compressed. Number of bytes replicated in, that were not compressed at the source (which typically implies they arrived from other DSAs in the same site).
DRA Inbound Bytes Compressed (Before Compression). Original size in bytes of inbound compressed replication data (size before compression).
DRA Inbound Bytes Compressed (After Compression). Compressed size in bytes of inbound compressed replication data (size after compression).
DRA Outbound Bytes Total. Total number of bytes replicated out. Sum of the number of uncompressed bytes (never compressed) and the number of compressed bytes (after compression).
DRA Outbound Bytes Not Compressed. Number of bytes replicated out that were not compressed (which typically implies they were sent to DSAs in the same site, or that less than 50,000 bytes of replicated data was sent).
DRA Outbound Bytes Compressed (Before Compression). Original size in bytes of outbound compressed replication data (size before compression).
DRA Outbound Bytes Compressed (After Compression). Compressed size in bytes of outbound compressed replication data (size after compression).
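The relationship between these counters can be written out as a short calculation. The sketch below only restates the definitions in the list; the function name and the sample counter values are hypothetical and are not measurements from this chapter.

```python
def summarize_inbound(not_compressed, compressed_before, compressed_after):
    """Combine three DRA Inbound counter readings.

    Returns (total_bytes, compression_ratio), where the ratio is the
    compressed size divided by the original size, or None if nothing
    was compressed.
    """
    # Bytes Total = bytes never compressed + compressed bytes after compression.
    total = not_compressed + compressed_after
    ratio = compressed_after / compressed_before if compressed_before else None
    return total, ratio


# Hypothetical readings: 20,000 bytes arrived uncompressed and 1,000,000 bytes
# of replication data were compressed to 120,000 bytes before being sent.
total, ratio = summarize_inbound(20_000, 1_000_000, 120_000)
```

The same arithmetic applies to the outbound counters.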

You can also retrieve a subset of this information from the Event Log. The Event Log for directory service logging is set to the lowest level by default. This reduces the size of the log files, and restricts logging to important events such as errors, lost connections to replication partners, etc. Activating higher levels of event logging consumes CPU time and can present you with the tedious task of finding the right information in huge log files.

The third tool is Network Monitor. Measuring replication traffic with a network "sniffer" helps you isolate network packets that belong to replication between domain controllers. These figures differ for the RPC transport and the SMTP transport.

Sniffing RPC Replication Traffic

An easy way to detect replication traffic is to start Network Monitor and then force replication, which you can do either by using the Sites and Services Administration snap-in in MMC, or REPADMIN.EXE, which allows you to specify the particular naming context that has to be replicated. Once REPADMIN.EXE returns and reports that replication was successful, you can stop Network Monitor.

At this point, however, you have measured all incoming and outgoing packets from the server machine. Some could have been sent by other services running on the machine, such as NetLogon or the file server.

It is not easy to determine which packets belong to replication. One way would be to use the IP port that is used by replication. In general, this is not possible because replication uses dynamic RPC port mapping (as a means of security). This process allows the replication server to request an available port from the RPC port mapper interface, which tells the requesting client the port used by the replication interface, which the client then uses to communicate with the replication server.

One way to get around this is to configure the IP port used on the replication server by adding this value in the registry:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters\TCP/IP Port

You can set this to 1349 (decimal), for example, to make 1349 the IP port, then find all replication-related packets by filtering on that port with Network Monitor.
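Once the port is fixed, separating replication packets from the rest of a capture is a matter of filtering on that port. The sketch below shows the idea on an already-parsed capture; the `Packet` record and the sample data are hypothetical and do not correspond to a Network Monitor export format.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical, already-parsed capture record."""
    src_port: int
    dst_port: int
    length: int  # bytes on the wire


def bytes_on_port(packets, port):
    """Sum the sizes of packets sent to or from the given TCP port."""
    return sum(p.length for p in packets if port in (p.src_port, p.dst_port))


capture = [
    Packet(src_port=3021, dst_port=1349, length=1500),  # replication request
    Packet(src_port=1349, dst_port=3021, length=900),   # replication reply
    Packet(src_port=3050, dst_port=445, length=600),    # unrelated file traffic
]
replication_bytes = bytes_on_port(capture, 1349)
```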

Sniffing SMTP Transport Replication Traffic

Finding packets that belong to SMTP replication is easier because the SMTP service always uses port number 25 (decimal). Filtering network traffic using this port number shows only the SMTP-related replication packets.

Traffic Scenarios

This section discusses the traffic caused in two scenarios: single attribute replication, then complete object replication (such as users, groups, etc). It examines replication between domain controllers in the same domain (intra-site and inter-site), and global catalog server replication (intra-site and inter-site). The inter-site GC replication examines the RPC and SMTP transports. In the other cases, only the RPC transport can be used.

This is the basic scenario:

Single Attribute Changes

The Windows 2000 Active Directory has a replication granularity of one attribute: if only a single attribute changes, only that one new value is sent over the network.

There are two areas to examine: how efficient single-property replication is when compared to whole-object replication, and how replication traffic grows with attribute size.

The tests use non-security attributes because replicating security-related attributes (attributes owned by the security accounts manager—SAM) involves a special domain controller, the PDC emulator, that adds non-replication traffic onto the wire. (Passwords are tested later, however.) Specifically, the tests use string-data attributes taken from the user object because these affect traffic growth clearly.

Replication of one attribute with string size 1 – 75 characters.

Character	Bytes	Diff	Character	Bytes	Diff	Character	Bytes	Diff
1	4396		26	4444	0	51	4492	0
2	4396	0	27	4444	0	52	4492	0
3	4396	0	28	4444	0	53	4492	0
4	4396	0	29	4444	0	54	4492	0
5	4396	0	30	4444	0	55	4492	0
6	4396	0	31	4444	0	56	4492	0
7	4396	0	32	4444	0	57	4508	16
8	4396	0	33	4460	16	58	4508	0
9	4412	16	34	4460	0	59	4508	0
10	4412	0	35	4460	0	60	4508	0
11	4412	0	36	4460	0	61	4508	0
12	4412	0	37	4460	0	62	4508	0
13	4412	0	38	4460	0	63	4508	0
14	4412	0	39	4460	0	64	4508	0
15	4412	0	40	4460	0	65	4524	16
16	4412	0	41	4476	16	66	4524	0
17	4428	16	42	4476	0	67	4524	0
18	4428	0	43	4476	0	68	4524	0
19	4428	0	44	4476	0	69	4524	0
20	4428	0	45	4476	0	70	4524	0
21	4428	0	46	4476	0	71	4524	0
22	4428	0	47	4476	0	72	4524	0
23	4428	0	48	4476	0	73	4688	164
24	4428	0	49	4492	16	74	4688	0
25	4444	16	50	4492	0	75	4688	0

The table shows that the replication of a 1-character string property causes 4396 bytes of traffic. Increasing the string size up to 8 characters does not change the value. From 9 to 16 characters the traffic increases by 16 bytes, an increment that recurs up to 72 characters. This result is not surprising because the Active Directory uses Unicode to store strings in the database. Each Unicode character is 2 bytes, so for every 8 characters a 16-byte buffer is created. From the 72nd to the 73rd character there is a big jump. This was caused by network fragmentation; the packet that contained the changed value reached its maximum size of 1,500 bytes and a new packet had to be created (causing overhead for an empty packet). After this jump, the 8-character/16-byte pattern resumes.
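The 8-character/16-byte pattern can be captured in a small formula. The sketch below is a model fitted to the measured values in the table, valid only for 1 to 72 characters (before the packet-boundary jump); the function name is hypothetical.

```python
BASE_BYTES = 4396      # measured traffic for a 1-character string
CHARS_PER_BUFFER = 8   # 8 Unicode characters x 2 bytes = one 16-byte buffer
BUFFER_BYTES = 16


def predicted_bytes(chars):
    """Predict replication traffic for one string attribute of the given length."""
    if not 1 <= chars <= 72:
        raise ValueError("model only covers 1 to 72 characters")
    extra_buffers = (chars - 1) // CHARS_PER_BUFFER
    return BASE_BYTES + BUFFER_BYTES * extra_buffers
```

For example, `predicted_bytes(57)` gives 4508, matching the measured value for a 57-character string.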

The next table shows what happens when multiple attributes are changed at the same time. It begins with one attribute and increases to six. Two things are examined: how traffic grows as the attribute values grow from one to ten characters, and how replication traffic increases as the number of changed attributes increases.

Changing multiple attributes simultaneously.

Characters	One attribute (bytes)	Diff	Two attributes (bytes)	Diff	Three attributes (bytes)	Diff	Four attributes (bytes)	Diff	Five attributes (bytes)	Diff	Six attributes (bytes)	Diff
1	4396		4460		4524		4736		4800		4864
2	4396	0	4460	0	4524	0	4736	0	4800	0	4864	0
3	4396	0	4460	0	4524	0	4752	16	4816	16	4880	16
4	4396	0	4460	0	4524	0	4752	0	4816	0	4880	0
5	4396	0	4476	16	4688	164	4768	16	4832	16	4912	32
6	4396	0	4476	0	4688	0	4768	0	4832	0	4912	0
7	4396	0	4476	0	4704	16	4784	16	4848	16	4928	16
8	4396	0	4476	0	4704	0	4784	0	4848	0	4928	0
9	4412	16	4492	16	4720	16	4800	16	4864	16	4960	32
10	4412	0	4492	0	4720	0	4800	0	4880	16	4960	0

Replicating one attribute with a one-character string size takes 4396 bytes. Two attributes with one-character strings each take 4460 bytes, a difference of 64 bytes. The growth from two attributes to three is again 64 bytes. From three attributes to four is a bigger jump, however, because the increase exceeds the maximum Ethernet packet size and an extra packet has to be created. From four to five and from five to six attributes, the pattern resumes the 64-byte jump.

It can be surmised, then, that it takes 64 bytes to replicate one attribute if it has a one-character string. Ten-character-string attributes show an increase of 80 bytes per attribute (except when a new packet has to be created).
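The per-attribute figures can be recomputed from the table rows for 1-character and 10-character strings. The sketch below takes the byte counts from the table and reports the most common step between adjacent columns, which filters out the single step that includes the extra-packet overhead; the helper names are hypothetical.

```python
from collections import Counter

# Bytes for one to six changed attributes, taken from the table above.
ONE_CHAR = [4396, 4460, 4524, 4736, 4800, 4864]
TEN_CHAR = [4412, 4492, 4720, 4800, 4880, 4960]


def step_sizes(row):
    """Traffic added by each additional attribute."""
    return [b - a for a, b in zip(row, row[1:])]


def typical_step(row):
    """Most common per-attribute increase, ignoring the packet-boundary jump."""
    return Counter(step_sizes(row)).most_common(1)[0][0]
```

In both rows exactly one step is larger than the rest, which is where the additional packet was created.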

These tests again show the 8-character/16-byte jump. For example, in the column for 4 attributes, there is a 16-byte jump after two characters are added to the string size. This is the same behavior seen during the single-attribute replication.

Since the replication of single attributes causes little network traffic, it is not useful to research the behavior in different domain/site scenarios.

Object Replication

For object replication, all domain, site, and GC scenarios are covered. Here is the scenario:

Figure 11.1: Replication scenario for object replication testing.

This small environment consists of two domains: Microsoft.com and Sales.Microsoft.com. Each has two domain controllers. Microsoft.com has Red-MS1 and Red-MS2; Sales.Microsoft.com has Red-Sales1 and Mil-Sales2.

The network is distributed over two sites: Redmond (headquarters) and Milan. Both sites have one global catalog: Red-MS1 for Redmond, Mil-Sales2 (the only domain controller in Milan) for Milan.

This small scenario covers all replication cases.

Replication between two domain controllers that belong to the same domain, both intra-site (Red-MS1 and Red-MS2) and inter-site (Red-Sales1 and Mil-Sales2).
Partial global catalog server replication, both intra-site (Red-MS1 and Red-Sales1) and inter-site (Red-MS1 and Mil-Sales2). Replication between Red-MS1 and Mil-Sales2 can use either the RPC or the SMTP transport.

Intra-Site Domain Replication (Red-MS1 and Red-MS2)

Intra-site replication assumes fast and reliable network connections. This should be a 10-Mbps Ethernet network or a comparable network topology. The emphasis in intra-site replication is on using the fewest possible CPU cycles at the domain controllers. This frees the domain controllers for other tasks, such as client logons and search operations. This is why data compression is not available within a site; it would cause additional CPU load.

In this sample case, both the global and the universal groups had no members. For all objects, only the mandatory attributes were set.

The following table shows how many bytes were created when objects were sent from one replication partner to another:

Intra-site domain replication.

#Objects	Users	Global groups	Universal groups	Volumes
1	13,019	11,309	11,145	10,277
10	47,037	26,902	26,823	22,848
100	386,148	187,754	185,606	149,736
500	1,914,087	905,015	906,079	715,577
1,000	3,818,256	1,815,170	1,803,090	1,436,085

This shows that the network traffic is absolutely predictable. Replicating 1,000 users, for example, causes twice as much traffic as replicating 500 users.
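One way to check this linearity is to compute the marginal cost per additional user between adjacent measurements. The sketch below uses the user-object figures from the table; the function name is hypothetical, and the result is an estimate derived from these five data points only.

```python
# Bytes replicated for N user objects (intra-site domain replication table).
USER_BYTES = {
    1: 13_019,
    10: 47_037,
    100: 386_148,
    500: 1_914_087,
    1_000: 3_818_256,
}


def marginal_costs(samples):
    """Bytes added per extra object between each pair of adjacent measurements."""
    points = sorted(samples.items())
    return [
        (bytes2 - bytes1) / (count2 - count1)
        for (count1, bytes1), (count2, bytes2) in zip(points, points[1:])
    ]


costs = marginal_costs(USER_BYTES)
```

Each value in `costs` falls between 3,700 and 3,900 bytes, so for planning purposes each additional user object with only mandatory attributes adds roughly 3,800 bytes on top of a fixed per-replication overhead.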

Replicating user objects causes more traffic than replicating objects that are not security principals. This is not surprising; the same effect was evident in the database sizing tests in Chapter 10.

The next test examines the replication of additional attributes. For the test, user objects were created, and filled with different sets of attributes. In the first test, only mandatory attributes were set. For the next test, one attribute was added to the mandatory attributes, then three added, then five added. All additional attributes were string attributes filled with 10 characters. The numbers in quotation marks show the overhead per attribute. For the replication of one user with five additional attributes, for example, this is (13,451 bytes – 13,019 bytes) / 5 = 86 bytes.

Replication of additional attributes.

#Users	Mandatory attributes	Plus 1 attribute	Plus 3 attributes	Plus 5 attributes
1	13,019	13,233 "214"	13,439 "80"	13,451 "86"
10	47,037	47,917 "88"	49,923 "96"	51,765 "96"
100	386,148	396,308 "102"	416,107 "99"	435,496 "96"
500	1,914,087	1,966,967 "106"	2,064,423 "102"	2,160,454 "94"
1,000	3,818,256	3,919,177 "101"	4,123,535 "103"	4,328,794 "107"
5,000	19,123,820	19,619,815 "99"	20,628,381 "102"	21,611,973 "98"

Again, the traffic is very predictable. Adding one attribute to the object adds around 100 bytes of traffic. This makes it easy to compute additional traffic caused by replicating objects with a specific attribute set.

The next test concentrates on group replication. Group size depends on the number of members. This test shows how much overhead is created for a single group member (results are in bytes transferred). The numbers in quotation marks show the overhead per group member. To replicate 1 group with 100 members, this is: (29,212 bytes – 11,309 bytes) / 100 = 179 bytes.

Group replication test results.

#Groups	No members	10 members	20 members	100 members
1	11,309	13,023 "171"	15,028 "186"	29,212 "179"
10	26,902	45,180 "183"	36,199 "191"	206,193 "179"
100	187,754	370,333 "183"	549,351 "181"	2,007,563 "182"
500	905,015	1,822,257 "183"	2,745,787 "184"	9,956,677 "181"
1,000	1,815,170	3,633,795 "182"	5,458,848 "182"	19,920,866 "181"

The results again show a very predictable pattern. The overhead per group member is around 180 bytes.
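The per-member overhead follows the formula given before the table: subtract the traffic for empty groups and divide by the total number of memberships. A minimal sketch, with a hypothetical function name and two data points taken from the table:

```python
def per_member_overhead(bytes_with_members, bytes_without, groups, members):
    """Average extra bytes replicated per group member."""
    return (bytes_with_members - bytes_without) / (groups * members)


# 1 group with 100 members versus 1 empty group.
single = per_member_overhead(29_212, 11_309, groups=1, members=100)

# 100 groups with 100 members each versus 100 empty groups.
many = per_member_overhead(2_007_563, 187_754, groups=100, members=100)
```

Both values round to the figures quoted in the table (179 and 182 bytes).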

The next test shows one of the most common operations: password changes. In this scenario, the PDC emulator was used for the password change, and only the replication traffic between this domain controller and a replication partner was captured. The numbers in the second column represent the bytes per operation. The third column shows the bytes per changed password.

Password change traffic.

#Users	Bytes	Bytes/user
1	10,805	1,842
10	12,811	385
100	59,856	509
500	275,422	533
1,000	444,085	425
5,000	3,014,610	601

Again the results are predictable: about 500 - 600 bytes/user. The test stops at 5,000 password changes, since more than that many per day would be rare; if users changed their passwords every 14 days, this would be the equivalent of 90,000 users in the same domain and in the same site. (Inter-site replication works differently, and is covered later.) Even if a company uses Windows 2000 or Windows NT workstations exclusively, with 45,000 users working on 45,000 workstations, the resulting 3 MB each day for password changes would be rare. And remember: intra-site traffic assumes a good network connection.

To summarize, intra-site domain replication is very predictable. Replicating more objects or attributes just increases network traffic following the same ratio. Intra-site replication assumes a good network connection, so domain controllers don't waste expensive CPU cycles on compression.

Intra-Site GC Replication

Intra-site global catalog replication involves two domain controllers (one of which is a global catalog) that belong to different domains. The global catalog holds a partial replica of the domain-naming context of the other domain, and because only this subset of attributes has to be replicated, replication between these two domain controllers is also called partial replication.

Figure 11.2: Intra-site global catalog replication.

This scenario examines the traffic between Red-Sales1 (a domain controller in Sales.Microsoft.com) and Red-MS1 (a domain controller in Microsoft.com that is also a global catalog server).

The first test examines the traffic generated for objects. Unlike in domain replication, group type plays a role in global catalog server replication. Universal groups publish the group memberships in the GCs, but global groups do not, so more replication traffic is expected for universal groups than for global groups.

The table shows the total bytes replicated for each number of objects. The numbers in quotation marks are the corresponding intra-site domain replication figures, for comparison.

Intra-site replication traffic.

#Objects	Users	Global groups	Universal groups	Volumes
1	12,401 "13,019"	11,601 "11,309"	11,437 "11,145"	11,101 "10,277"
10	35,595 "47,037"	26,783 "26,902"	26,862 "26,823"	23,011 "22,848"
100	272,877 "386,148"	183,123 "187,754"	183,205 "185,606"	145,199 "149,736"
500	1,323,177 "1,914,087"	879,823 "905,015"	879,990 "906,079"	690,042 "715,577"
1,000	2,640,974 "3,818,256"	1,750,665 "1,815,170"	1,751,239 "1,803,090"	1,370,457 "1,436,085"
5,000	13,189,354 "19,123,820"	8,735,103 "8,985,915"	8,745,150	6,860,815

The test shows that the replicated objects for GC replication are smaller than for domain replication. For user objects, the traffic is around 2/3 of what is replicated within a domain. For smaller objects (groups and non-security principals such as volumes), the difference is not big: only a few percent.

Universal and global groups are the same size because they were created with no members.
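
The size ratio between GC and domain replication can be read straight off the intra-site table above. The following is a minimal sketch in Python; the byte counts are copied from the table rows, while the dictionary layout and the function name gc_to_domain_ratio are illustrative assumptions, not part of any Active Directory tool.

```python
# Byte counts copied from the intra-site table above, as
# object count -> (GC replication bytes, domain replication bytes).
USERS = {
    100: (272_877, 386_148),
    500: (1_323_177, 1_914_087),
    1_000: (2_640_974, 3_818_256),
}
VOLUMES = {
    100: (145_199, 149_736),
    500: (690_042, 715_577),
    1_000: (1_370_457, 1_436_085),
}


def gc_to_domain_ratio(rows):
    """Return {object count: GC bytes / domain bytes} for one object type."""
    return {count: gc / domain for count, (gc, domain) in rows.items()}


if __name__ == "__main__":
    for label, rows in (("users", USERS), ("volumes", VOLUMES)):
        for count, ratio in gc_to_domain_ratio(rows).items():
            print(f"{label:<8} {count:>5} objects: GC/domain = {ratio:.2f}")
```

The same function can be applied to the group columns by adding their rows in the same shape.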

The next two tables compare global and universal groups.

Global groups.

#Groups	No members	10 members	20 members	100 members
1	11,601	11,519	11,437	11,437
10	26,783	26,783	26,783	26,783
100	183,123	183,369	183,369	183,041

The amount of data does not change when members are added to the group, because global groups do not replicate their members to the GC.

Universal groups.

#Groups	No members	10 members	20 members	100 members
1	11,437	13,491	17,149	31,731
10	26,862	46,816	67,256	245,908
100	183,205	388,725	593,623	2,207,641

Universal group replication traffic depends heavily on the number of group members. Replicating 100 universal groups of 100 members each generates about 12 times the traffic of the comparable global groups.
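
The cost of each membership can be estimated from the universal group table by subtracting the traffic for empty groups. The sketch below does this arithmetic in Python; the figures are copied from the tables above, and the names UNIVERSAL and bytes_per_member are chosen here purely for illustration.

```python
# Universal group traffic in bytes, copied from the table above.
# Keys are (number of groups, members per group).
UNIVERSAL = {
    (100, 0): 183_205,
    (100, 10): 388_725,
    (100, 20): 593_623,
    (100, 100): 2_207_641,
}
# Global group figure for 100 groups of 100 members, from the global groups table.
GLOBAL_100_GROUPS_100_MEMBERS = 183_041


def bytes_per_member(groups, members):
    """Extra bytes replicated per membership, relative to empty groups."""
    extra = UNIVERSAL[(groups, members)] - UNIVERSAL[(groups, 0)]
    return extra / (groups * members)


if __name__ == "__main__":
    for members in (10, 20, 100):
        cost = bytes_per_member(100, members)
        print(f"100 groups x {members:>3} members: {cost:.1f} bytes per member")
    factor = UNIVERSAL[(100, 100)] / GLOBAL_100_GROUPS_100_MEMBERS
    print(f"universal vs. global at 100 x 100: {factor:.1f}x")
```

On these rows each membership adds a little over 200 bytes, which gives a simple way to scale the estimate to other group sizes.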

Intra-site GC replication is predictable. The objects replicated are smaller than within a domain; they can be as much as 1/3 smaller. However, this relates only to objects that were created with the minimum possible number of attributes set. If additional attributes are used, they won't be replicated to the global catalog, so GC traffic becomes even smaller relative to domain traffic. Conversely, if you change the schema so that attributes are added to partial replication, or if applications (such as messaging systems) add them, global catalog replication traffic increases. Group type is also relevant for the traffic to the global catalog, because universal groups replicate their group members.

Inter-Site Domain Replication

The next scenario covers inter-site domain replication, which involves two domain controllers in different sites within the same domain. These serve as bridgehead servers, and are the only domain controllers of this particular domain that replicate over a WAN link. Once they receive updated information, they distribute it to the other domain controllers in their site.

This scenario examines replication between Red-Sales1 and Mil-Sales2. Both domain controllers are in the Sales.Microsoft.com domain, but are in different sites (Redmond and Milan).

Figure 11.3: Inter-site domain replication.

Only object creation was tested. Changing the number of group members is not a factor because group members are always replicated between domain controllers of the same domain.

The table shows the traffic for object creation. The numbers in quotation marks are the intra-site results for the same operations for comparison:

Traffic generated by object creation.

#Objects	Users	Global groups	Universal groups	Volumes
1	14,108 "13,019"	10,437 "11,309"	11,227 "11,145"	9,667 "10,277"
10	45,563 "47,037"	25,683 "26,902"	26,741 "26,823"	21,691 "22,848"
100	39,583 "386,148"	28,743 "187,754"	29,675 "185,606"	22,602 "149,736"
500	173,105 "1,914,087"	102,404 "905,015"	119,180 "906,079"	81,691 "715,577"
1,000	291,041 "3,818,256"	194,926 "1,815,170"	199,054 "1,803,090"	151,989 "1,436,085"

The results demonstrate how compression works. Up to a certain size, data is not compressed. In fact, for small replications (such as replicating one user) there is slightly more traffic than in the intra-site case. This is caused by the need to source more naming contexts (schema and configuration). Once the size of the data that has to be replicated exceeds 50,000 bytes, compression kicks in and reduces the amount of data considerably. A good example is the replication of 10 users and 100 users. Replicating 100 users causes much less network traffic (per user) than replicating 10 users. This is clearly the result of compression.
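
The effect of compression is easiest to see as bytes per user and as a ratio of inter-site to intra-site traffic. The following Python sketch computes both from the user column of the table above; the data layout and the function name per_user_and_ratio are assumptions made for this illustration.

```python
# Bytes for user creation, copied from the table above, as
# object count -> (inter-site bytes, intra-site bytes).
USER_CREATION = {
    1: (14_108, 13_019),
    10: (45_563, 47_037),
    100: (39_583, 386_148),
    500: (173_105, 1_914_087),
    1_000: (291_041, 3_818_256),
}


def per_user_and_ratio(count):
    """Return (inter-site bytes per user, inter-site / intra-site ratio)."""
    inter, intra = USER_CREATION[count]
    return inter / count, inter / intra


if __name__ == "__main__":
    for count in USER_CREATION:
        per_user, ratio = per_user_and_ratio(count)
        print(f"{count:>5} users: {per_user:>9.1f} bytes/user, "
              f"inter/intra = {ratio:.2f}")
```

Below the threshold the ratio stays close to 1; from 100 users upward it falls to roughly a tenth or less of the intra-site figure.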

Inter-Site Global Catalog Replication: RPC-over-IP

The next scenario is inter-site GC replication, which involves two domain controllers (one a GC) that belong to different domains and are replicating schema, configuration, and the partial domain naming context. This type of replication can use RPC-over-IP or the SMTP transport. The first test uses RPCs.

Figure 11.4: Inter-site global catalog replication.

In the example, Mil-Sales2 (a GC in Sales.Microsoft.com) replicates partial Microsoft.com information from Red-MS1. Note that Red-MS1, which is also a GC, would not use Mil-Sales2 as a source for the Sales.Microsoft.com naming context, because Red-MS1 can find a closer domain controller in Sales.Microsoft.com (in this case, Red-Sales1 is in the same site).

For this replication set, two factors are of interest: how does partial inter-site replication compare to partial intra-site replication, and how does group membership affect the overall picture?

The first table gives an overview of partial replication of objects. The numbers in quotation marks are the figures for intra-site GC replication, for comparison.

Partial inter-site replication.

#Objects	Users	Global groups	Universal groups	Volumes
1	12,565 "12,401"	11,471 "11,601"	11,389 "11,437"	11,183 "11,101"
10	36,018 "35,595"	26,895 "26,783"	26,813 "26,862"	23,171 "23,011"
100	32,391 "272,877"	28,600 "183,123"	28,379 "183,205"	24,598 "145,199"
500	121,481 "1,323,177"	101,858 "879,823"	102,200 "879,990"	83,099 "690,042"
1,000	233,503 "2,640,974"	194,047 "1,750,665"	194,357 "1,751,239"	170,918 "1,370,457"

The table shows interesting differences between inter- and intra-site replication; again, replicating 100 users causes much less network traffic (per user) than replicating 10 users. In fact, inter-site replication of 1,000 users generates less traffic than intra-site replication of 100 users.

The next two tables show group memberships again, beginning with global groups. The numbers are much smaller again, compared to intra-site replication. The number of group members does not change the picture.

Global groups.

#Groups	No members	10 members	20 members	100 members
1	10,437 "11,601"	11,951 "11,519"	11,389 "11,437"	11,553 "11,437"
10	25,683 "26,783"	26,141 "26,783"	26,223 "26,783"	26,223 "26,783"
100	28,743 "183,123"	28,227 "183,369"	28,319 "183,369"	28,781 "183,041"

Universal groups.

#Groups	No members	10 members	20 members	100 members
1	11,227 "11,437"	17,010 "13,491"	15,786 "17,149"	32,270 "31,731"
10	26,741 "26,862"	46,898 "46,816"	12,999 "67,256"	17,902 "245,908"
100	29,675 "183,205"	39,120 "388,725"	46,912 "593,623"	107,551 "2,207,641"

Again, the numbers are much lower than for intra-site replication. However, this time traffic increases with the number of group members.

Inter-Site Global Catalog Replication: SMTP Transport

This test uses the same scenario: replication between a global catalog and a domain controller that belongs to a different domain, with the two machines in different sites. This time, the SMTP transport is used.

The table shows the replication traffic. The RPC numbers are added in quotation marks.

Inter-site global catalog replication over the SMTP transport (bytes replicated).

#Objects	Users	Global groups	Universal groups	Volumes
1	22,253 "12,565"	20,714 "11,471"	20,855 "11,389"	20,078 "11,183"
10	52,887 "36,018"	40,499 "26,895"	40,417 "26,813"	35,409 "23,171"
100	59,675 "32,391"	55,537 "28,600"	55,587 "28,379"	41,375 "24,598"
500	224,804 "121,481"	203,787 "101,858"	203,629 "102,200"	182,324 "83,099"
1,000	440,165 "233,503"	281,916 "194,047"	394,434 "194,357"	349,389 "170,918"

Again, compression helps to minimize the traffic. The threshold at which compression kicks in is higher when compared to the RPC transport; this threshold is around 65,000 bytes. Also, the SMTP transport creates more traffic overall than the RPC transport—about 80% to 100% more.
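
The extra traffic caused by the SMTP transport can be expressed as a percentage over the RPC figure for the same operation. The sketch below does so for the users column of the table above; the values are copied from the table, and the function name smtp_overhead_percent is an illustrative choice.

```python
# User replication in bytes, copied from the table above, as
# object count -> (SMTP bytes, RPC bytes).
USERS_SMTP_VS_RPC = {
    1: (22_253, 12_565),
    10: (52_887, 36_018),
    100: (59_675, 32_391),
    500: (224_804, 121_481),
    1_000: (440_165, 233_503),
}


def smtp_overhead_percent(smtp_bytes, rpc_bytes):
    """Extra SMTP traffic as a percentage of the RPC traffic."""
    return (smtp_bytes - rpc_bytes) / rpc_bytes * 100


if __name__ == "__main__":
    for count, (smtp, rpc) in USERS_SMTP_VS_RPC.items():
        print(f"{count:>5} users: SMTP adds "
              f"{smtp_overhead_percent(smtp, rpc):.0f}% over RPC")
```

The other columns can be checked the same way by substituting their rows.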

Summary of Network Traffic Analysis

To summarize, here are some replication recommendations:

Intra-site replication assumes good network connectivity, so domain controllers can save CPU cycles (for client logons, search operations, and so on) by not compressing data for intra-site replication.
Replication traffic is predictable. Use the tables in this chapter to find the data for your objects. If you set additional attributes on objects, add 100 bytes per attribute with a string size up to 10 characters.
Partial replication (global catalog replication) is smaller than normal replication. The difference is bigger when more attributes are used on objects.
Inter-site replication adds compression. If there is a slow link between domain controllers, create a new site.
Inter-site replication is scheduled. This reduces communication between domain controllers.
The SMTP transport creates more network traffic than the RPC Site connector. Use RPCs between sites whenever possible.
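
The rule of thumb for additional attributes in the list above can be turned into a quick estimate. The following Python sketch assumes uncompressed (intra-site) traffic and short string attributes of up to 10 characters; the function name estimate_bytes and its parameters are hypothetical, and the 100-byte default is simply the figure quoted in the recommendation.

```python
def estimate_bytes(measured_bytes, object_count, extra_attributes=0,
                   bytes_per_attribute=100):
    """Estimate uncompressed replication traffic for a batch of objects.

    measured_bytes      -- table figure for object_count objects
    extra_attributes    -- additional short string attributes set per object
    bytes_per_attribute -- rule-of-thumb cost from the recommendation above
    """
    if object_count < 0 or extra_attributes < 0:
        raise ValueError("counts must be non-negative")
    return measured_bytes + object_count * extra_attributes * bytes_per_attribute


if __name__ == "__main__":
    # 1,000 users replicated within a site and domain (3,818,256 bytes in the
    # chapter's tables), each with five extra short string attributes.
    print(estimate_bytes(3_818_256, 1_000, extra_attributes=5))
```

Because inter-site replication compresses larger transfers, this estimate should be treated as an upper bound there rather than a prediction.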

How can you choose between a single site and the two inter-site transports?

If good network connectivity is available and fast client logon is desired—use one site.
If reduced network traffic is desired, but the connection between domain controllers is fairly reliable—use multiple sites and the RPC-over-IP replication connector.
If the network connection is unreliable, or domain controllers have no direct network connection (connected only through a messaging system)—use the SMTP replication connector. Remember that it can be used only for schema, configuration, and partial replication, not for replication between two domain controllers in the same domain.
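
The three guidelines above can be summarized as a small decision function. This is only a sketch of the reasoning in Python: the names Design and choose_design and the boolean inputs are invented for illustration, and raising an error for the unsupported same-domain SMTP case is an assumption of this sketch, since the text does not say what to do in that situation.

```python
from enum import Enum


class Design(Enum):
    SINGLE_SITE = "one site"
    MULTI_SITE_RPC = "multiple sites, RPC-over-IP connector"
    MULTI_SITE_SMTP = "multiple sites, SMTP connector"


def choose_design(good_connectivity, reliable_link, same_domain):
    """Map the guidelines above onto a replication design.

    good_connectivity -- fast, well-connected network between the DCs
    reliable_link     -- link is slower but fairly reliable
    same_domain       -- the two DCs belong to the same domain
    """
    if good_connectivity:
        return Design.SINGLE_SITE
    if reliable_link:
        return Design.MULTI_SITE_RPC
    if same_domain:
        # Assumption of this sketch: SMTP is not available for replication
        # between DCs of the same domain, so no option fits.
        raise ValueError("SMTP cannot replicate between DCs of the same domain")
    return Design.MULTI_SITE_SMTP


if __name__ == "__main__":
    print(choose_design(False, True, same_domain=True).value)
```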