Recovering Your Active Directory Forest

Applies To: Windows 2000 Server, Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2008, Windows Server 2008 R2, Windows Server 2012

This section provides an overview of the recommended path for recovering a forest. The forest recovery steps are described in detail later.

The following list summarizes the recovery steps at a high level:

  1. Identify the problem

    Work with IT and Microsoft Support to determine the scope of the problem and potential causes, and evaluate possible remedies with all business stakeholders. In many cases, total forest recovery should be the last option.

  2. Decide how to recover the forest

    After you determine that forest recovery is necessary, complete preliminary steps to prepare for it: determine the current forest structure, identify the functions that each DC performs, decide which DC to restore for each domain, and ensure that all writeable DCs are taken offline.

  3. Perform initial recovery

    In isolation, recover one DC for each domain, clean them, and reconnect the domains. Reset privileged accounts, and rectify problems caused by security breaches in this phase.

  4. Redeploy remaining DCs

    Redeploy the forest to return it to its state before the failure. This step will need to be adapted to your specific design and requirements. Virtualized domain controller cloning can help expedite this process.

  5. Cleanup

    After functionality has been restored, reconfigure name resolution as needed, and get line-of-business (LOB) applications working.

The following flowchart shows the recovery process.

[Figure: flowchart of the forest recovery process]

The steps in this guide are designed to minimize the possibility of reintroducing dangerous data into the recovered forest. You might have to modify these steps to account for such factors as:

  • Scalability

  • Remote manageability

  • Speed of recovery

However, modifications to these forest recovery steps can increase the risk of reintroducing dangerous data. For more information about possible modifications to these forest recovery steps, see What can I do to speed up recovery?

Identify the problem

When symptoms of a forest-wide failure appear, such as in event logs or other monitoring solutions, work with Microsoft Support to determine the cause of the failure, and evaluate any possible remedies.

Examples of forest-wide failures include the following:

  • All DCs have been logically corrupted or physically damaged to a point that business continuity is impossible; for example, all business applications that depend on AD DS are nonfunctional.

  • A rogue administrator has compromised the Active Directory environment.

  • An attacker intentionally—or an administrator accidentally—runs a script that spreads data corruption across the forest.

  • An attacker intentionally—or an administrator accidentally—extends the Active Directory schema with malicious or conflicting changes.

  • An attacker has managed to install malicious software on DCs, and you have been advised by Microsoft Support to recover the forest from backup.

    Important

    This guide does not cover security recommendations for recovering a forest that has been hacked or compromised. In general, follow Pass-the-Hash mitigation techniques to harden the environment. For more information, see Mitigating Pass-the-Hash (PtH) Attacks and Other Credential Theft Techniques.

  • None of the DCs can replicate with their replication partners.

  • Changes cannot be made to AD DS at any domain controller.

  • New DCs cannot be installed in any domain.

Decide how to recover the forest

Recovering an entire Active Directory forest involves either restoring it from backup or reinstalling Active Directory Domain Services (AD DS) on every domain controller (DC) in the forest. Recovering the forest restores each domain in the forest to its state at the time of the last trusted backup. Consequently, the restore operation will result in the loss of at least the following Active Directory data:

  • All objects (such as users and computers) that were added after the last trusted backup

  • All updates that were made to existing objects since the last trusted backup

  • All changes that were made to either the configuration partition or the schema partition in AD DS (such as schema changes) since the last trusted backup

For each domain in the forest, you must know the password of a Domain Admins account. Preferably, this is the password of the built-in Administrator account, which must not be disabled. You must also know the Directory Services Restore Mode (DSRM) password to perform a system state restore of a DC. In general, it is a good practice to archive the Administrator account and DSRM password history in a safe place for as long as the backups are valid, that is, within the tombstone lifetime period, or within the deleted object lifetime period if Active Directory Recycle Bin is enabled. You can also synchronize the DSRM password with a domain user account to make it easier to remember. For more information, see KB article 961320. Synchronize the DSRM account in advance of the forest recovery, as part of preparation.

Note

By default, the Administrator account is a member of the built-in Administrators group, as are the Domain Admins and Enterprise Admins groups. The built-in Administrators group has full control of all DCs in the domain.

Determining which backups to use

Back up at least two writeable DCs for each domain regularly so that you have several backups to choose from. Note that you cannot use the backup of a read-only domain controller (RODC) to restore a writeable DC. We recommend that you restore the DCs by using backups that were taken a few days before the failure occurred. In general, you must weigh how recent the restored data is against how safe it is. A more recent backup recovers more useful data, but it might increase the risk of reintroducing dangerous data into the restored forest.
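The tradeoff between recent data and safe data can be expressed as a simple selection rule: take the newest backup that is still older than the estimated failure time by a safety margin. The following Python sketch is illustrative only. The pick_backup helper, the three-day margin (a stand-in for "a few days"), and the sample dates are assumptions, not values from this guide.

```python
from datetime import date, timedelta

def pick_backup(backup_dates, failure_date, margin_days=3):
    """Return the newest backup taken at least margin_days before failure_date.

    margin_days=3 is an assumed stand-in for "a few days". Returns None when
    no backup is old enough to be considered safe.
    """
    cutoff = failure_date - timedelta(days=margin_days)
    safe = [d for d in backup_dates if d <= cutoff]
    return max(safe) if safe else None

# Hypothetical backup dates and an estimated failure date.
backups = [date(2013, 5, d) for d in (1, 4, 7, 9, 10)]
chosen = pick_backup(backups, failure_date=date(2013, 5, 11))
print(chosen)  # prints 2013-05-07
```

A larger margin lowers the risk of restoring data that was already damaged, at the cost of losing more legitimate changes.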

Restoring system state backups depends on the original operating system and server of the backup. For example, you should not restore a system state backup to a different server. In this case, you may see the following warning:

“The specified backup is of a different server than the current one. We do not recommend performing a system state recovery with the backup to an alternate server because the server might become unusable. Are you sure you want to use this backup for recovering the current server?”

If you need to restore Active Directory to different hardware, create full server backups and plan to perform a full server recovery.

Important

Beginning with Windows Server 2008, restoring a system state backup to a new installation of Windows Server, on new hardware or on the same hardware, is not supported. If Windows Server is reinstalled on the same hardware, as recommended later in this guide, then you can restore the domain controller in this order:

  1. Perform a full server restore in order to restore the operating system and all files and applications.

  2. Perform a system state restore using wbadmin.exe in order to mark SYSVOL as authoritative.

For more information, see Microsoft KB article 249694.

If the time of the occurrence of the failure is unknown, investigate further to identify backups that hold the last safe state of the forest. This approach is less desirable. Therefore, we strongly recommend that you keep detailed logs about the health state of AD DS on a daily basis so that, if there is a forest-wide failure, the approximate time of failure can be identified. You should also keep a local copy of backups to enable faster recovery.

If Active Directory Recycle Bin is enabled, the backup lifetime is equal to the deletedObjectLifetime value or the tombstoneLifetime value, whichever is less. For more information, see Active Directory Recycle Bin Step-by-Step Guide (https://go.microsoft.com/fwlink/?LinkId=178657).
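The lifetime rule above reduces to taking the smaller of two values. The following Python sketch is a minimal illustration under stated assumptions: the function names, the 180-day and 120-day values, and the sample dates are hypothetical, and a backup whose age exactly equals the lifetime is conservatively treated as expired.

```python
from datetime import date

def backup_lifetime_days(tombstone_lifetime, deleted_object_lifetime=None,
                         recycle_bin_enabled=False):
    """Return the number of days for which a backup remains usable.

    With Recycle Bin enabled, the smaller of the two lifetimes applies.
    Assumption: an unset deleted object lifetime falls back to the
    tombstone lifetime.
    """
    if recycle_bin_enabled and deleted_object_lifetime is not None:
        return min(tombstone_lifetime, deleted_object_lifetime)
    return tombstone_lifetime

def backup_is_usable(backup_date, today, lifetime_days):
    """Return True if the backup is younger than the backup lifetime."""
    return (today - backup_date).days < lifetime_days

# Hypothetical forest: tombstoneLifetime = 180, deletedObjectLifetime = 120.
lifetime = backup_lifetime_days(180, 120, recycle_bin_enabled=True)
print(lifetime)  # prints 120
print(backup_is_usable(date(2013, 4, 1), date(2013, 6, 1), lifetime))
print(backup_is_usable(date(2013, 1, 1), date(2013, 6, 1), lifetime))
```

In this example the April backup is 61 days old and still usable, while the January backup is 151 days old and is not.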

As an alternative, you can also use the Active Directory database mounting tool (Dsamain.exe) and a Lightweight Directory Access Protocol (LDAP) tool, such as Ldp.exe or Active Directory Users and Computers, to identify which backup has the last safe state of the forest. The Active Directory database mounting tool, which is included in Windows Server 2008 and later Windows Server operating systems, exposes Active Directory data that is stored in backups or snapshots as an LDAP server. Then, you can use an LDAP tool to browse the data. This approach has the advantage of not requiring you to restart any DC in Directory Services Restore Mode (DSRM) to examine the contents of the backup of AD DS.

For more information about using the Active Directory database mounting tool, see the Active Directory Database Mounting Tool Step-by-Step Guide.

You can also use the ntdsutil snapshot command to create snapshots of the Active Directory database. By scheduling a task to periodically create snapshots, you can obtain additional copies of the Active Directory database over time. You can use these copies to better identify when the forest-wide failure occurred and then choose the best backup to restore. To create snapshots, use the version of ntdsutil that ships with Windows Server 2008 or the Remote Server Administration Tools (RSAT) for Windows Vista or later. The target DC can run any version of Windows Server. For more information about using the ntdsutil snapshot command, see Snapshot.

Determining which domain controllers to restore

Ease of the restore process is an important factor when deciding which domain controller to restore. We recommend that you have a dedicated DC for each domain that is the preferred DC for a restore. A dedicated restore DC makes it easier to reliably plan and execute the forest recovery because you use the same source configuration that was used to perform restore tests. You can script the recovery without having to contend with different configurations, such as whether the DC holds operations master roles, or whether it is a global catalog (GC) server or a DNS server.

Note

While it is not recommended to restore an operations master role holder in the interest of simplicity, some organizations may choose to restore one for other advantages. For example, restoring the RID master may help prevent problems with managing RIDs during the recovery.

Choose a DC that best meets the following criteria:

  • A DC that is writeable. This is mandatory.

  • A DC running Windows Server 2012 as a virtual machine on a hypervisor that supports VM-GenerationID. This DC can be used as a source for cloning.

  • A DC that is accessible, either physically or on a virtual network, and preferably located in a datacenter. This way, you can easily isolate it from the network during forest recovery.

  • A DC that has a good full server backup. A good backup is a backup that can be restored successfully, was taken a few days before the failure, and contains as much useful data as possible.

  • A DC that was a Domain Name System (DNS) server before the failure. This saves the time required to reinstall DNS.

  • If you also use Windows Deployment Services, choose a DC that is not configured to use BitLocker Network Unlock. In this case, BitLocker Network Unlock is not supported for the first DC that you restore from backup during a forest recovery.

    BitLocker Network Unlock as the only key protector cannot be used on DCs where you have deployed Windows Deployment Services (WDS) because doing so results in a scenario where the first DC requires Active Directory and WDS to be working in order to unlock. But before you restore the first DC, Active Directory is not yet available for WDS, so it cannot unlock.

    To determine if a DC is configured to use BitLocker Network Unlock, check that a Network Unlock certificate is identified in the following registry key:

    HKEY_LOCAL_MACHINE\Software\Policies\Microsoft\SystemCertificates\FVE_NKP

Maintain security procedures when handling or restoring backup files that include Active Directory. The urgency that accompanies forest recovery can unintentionally lead to overlooking security best practices. For more information, see the section titled “Establishing Domain Controller Backup and Restore Strategies” in Best Practice Guide for Securing Active Directory Installations and Day-to-Day Operations: Part II.

Identify the current forest structure and DC functions

Determine the current forest structure by identifying all the domains in the forest. Make a list of all of the DCs in each domain, particularly the DCs that have backups, and virtualized DCs which can be a source for cloning. A list of DCs for the forest root domain will be the most important because you will recover this domain first. After you restore the forest root domain, you can obtain a list of the other domains, DCs, and the sites in the forest by using Active Directory snap-ins.

Prepare a table that shows the functions of each DC in the domain, as shown in the following example. This will help you revert to the pre-failure configuration of the forest after recovery.

DC name Operating system        FSMO                                 GC   RODC  Backup  DNS  Server Core  VM   VM-GenID
DC_1    Windows Server 2012     Schema master, Domain naming master  Yes  No    Yes     No   No           Yes  Yes
DC_2    Windows Server 2012     None                                 Yes  No    Yes     Yes  No           Yes  Yes
DC_3    Windows Server 2012     Infrastructure master                No   No    No      Yes  Yes          Yes  Yes
DC_4    Windows Server 2012     PDC emulator, RID master             Yes  No    No      No   No           Yes  No
DC_5    Windows Server 2012     None                                 No   No    Yes     Yes  No           Yes  Yes
RODC_1  Windows Server 2008 R2  None                                 Yes  Yes   Yes     Yes  Yes          Yes  No
RODC_2  Windows Server 2008     None                                 Yes  Yes   No      Yes  Yes          Yes  No

For each domain in the forest, identify a single writeable DC that has a trusted backup of the Active Directory database for that domain. Use caution when you choose a backup to restore a DC. If the day and cause of the failure are approximately known, the general recommendation is to use a backup that was made a few days before that date.

In this example, there are three backup candidates, that is, writeable DCs that have a backup: DC_1, DC_2, and DC_5. Of these backup candidates, you restore only one. The recommended DC is DC_5 for the following reasons:

  • It satisfies the requirements for use as a source for virtualized DC cloning: it runs Windows Server 2012 as a virtual DC on a hypervisor that supports VM-GenerationID, and it runs only software that can be cloned (or that can be removed if it cannot be cloned). After the restore, the PDC emulator role will be seized to that server, and it can be added to the Cloneable Domain Controllers group for the domain.

  • It runs a full installation of Windows Server 2012. A DC that runs a Server Core installation can be less convenient as a target for recovery.

  • It is a DNS server. Therefore, DNS does not have to be reinstalled.

Note

Because DC_5 is not a global catalog server, it also has an advantage in that the global catalog does not need to be removed after the restore. But whether or not the DC is also a global catalog server is not a decisive factor because beginning with Windows Server 2012, all DCs are global catalog servers by default, and removing and adding the global catalog after the restore is recommended as part of the forest recovery process in any case.
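The selection reasoning above can be expressed as a filter followed by a ranking. The following Python sketch transcribes the example table and scores each writeable DC that has a backup against the preferences discussed in this section. The DC class, the score function, and the equal weighting of the preferences are assumptions made for illustration; they are not part of the recovery procedure.

```python
from dataclasses import dataclass

@dataclass
class DC:
    name: str
    fsmo: bool         # holds at least one operations master role
    gc: bool           # global catalog server
    rodc: bool         # read-only domain controller
    backup: bool       # a backup exists
    dns: bool          # DNS server
    server_core: bool  # Server Core installation
    vm: bool           # virtual machine
    vm_genid: bool     # hypervisor supports VM-GenerationID

# Rows transcribed from the example table above.
INVENTORY = [
    DC("DC_1", True, True, False, True, False, False, True, True),
    DC("DC_2", False, True, False, True, True, False, True, True),
    DC("DC_3", True, False, False, False, True, True, True, True),
    DC("DC_4", True, True, False, False, False, False, True, False),
    DC("DC_5", False, False, False, True, True, False, True, True),
    DC("RODC_1", False, True, True, True, True, True, True, False),
    DC("RODC_2", False, True, True, False, True, True, True, False),
]

def score(dc):
    """Count the preferences that a DC satisfies (assumed equal weights)."""
    return sum([
        dc.vm and dc.vm_genid,  # usable as a cloning source
        not dc.server_core,     # full installation
        dc.dns,                 # DNS does not have to be reinstalled
        not dc.fsmo,            # no operations master roles
        not dc.gc,              # no global catalog to remove
    ])

def recommend(inventory):
    """Return the best writeable DC that has a backup, or None."""
    eligible = [dc for dc in inventory if not dc.rodc and dc.backup]
    return max(eligible, key=score, default=None)

print(recommend(INVENTORY).name)  # prints DC_5
```

Being writeable and having a backup are treated as hard requirements, and everything else as a preference, which mirrors the distinction between mandatory and desirable criteria in this section.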

Recover the forest in isolation

The preferred scenario is to shut down all writeable DCs before the first restored DC is brought back into production. This ensures that any dangerous data does not replicate back into the recovered forest. It is particularly important to shut down all operations master role holders.

Note

There may be cases where you move the first DC that you plan to recover for each domain to an isolated network while allowing other DCs to remain online in order to minimize system downtime. For example, if you are recovering from a failed schema upgrade, you may choose to keep domain controllers running on the production network while you perform recovery steps in isolation.

If you are running virtualized DCs, you can move them to a virtual network that is isolated from the production network where you will perform recovery. Moving virtualized DCs to a separate network provides two benefits:

  • Because they are isolated, recovered DCs are protected from a recurrence of the problem that caused the forest recovery.

  • Virtualized DC cloning can be performed on the separate network so that a critical number of DCs can be running and tested before they are brought back to the production network.

If you are running DCs on physical hardware, disconnect the network cable of the first DC that you plan to restore in the forest root domain. If possible, also disconnect the network cables of all other DCs. This prevents DCs from replicating if they are accidentally started during the forest recovery process.

In a large forest that is spread across multiple locations, it can be difficult to guarantee that all writeable DCs are shut down. For this reason, the recovery steps—such as resetting the computer account and krbtgt account, in addition to metadata cleanup—are designed to ensure that the recovered writeable DCs do not replicate with dangerous writeable DCs (in case some are still online in the forest).

However, only by taking writeable DCs offline can you guarantee that replication does not occur. Therefore, whenever possible, you should deploy remote management technology that can help you to shut down and physically isolate the writeable DCs during forest recovery.

RODCs can continue to operate while writeable DCs are offline. No other DC will directly replicate any changes from any RODC—especially, no Schema or Configuration container changes—so they do not pose the same risk as writeable DCs during recovery. After all the writeable DCs are recovered and online, you should rebuild all the RODCs.

RODCs will continue to allow access to local resources that are cached in their respective sites while the recovery operations are going on in parallel. Local resources that are not cached on the RODC will have authentication requests forwarded to a writeable DC. These requests will fail because writeable DCs are offline. Some operations such as password changes will also not work until you recover writeable DCs.

If you are using a hub-and-spoke network architecture, you can concentrate first on recovering the writeable DCs in the hub sites. Later, you can rebuild the RODCs in remote sites.

Perform initial recovery

This section includes the following steps:

  • Restore the first writeable domain controller in each domain

  • Reconnect each restored writeable domain controller to a common network

  • Add the global catalog to a domain controller in the forest root domain

Restore the first writeable domain controller in each domain

Beginning with a writeable DC in the forest root domain, complete the steps in this section in order to restore the first DC. The forest root domain is important because it stores the Schema Admins and Enterprise Admins groups. It also helps maintain the trust hierarchy in the forest. In addition, the forest root domain usually holds the DNS root server for the forest’s DNS namespace. Consequently, the Active Directory–integrated DNS zone for that domain contains the alias (CNAME) resource records for all other DCs in the forest (which are required for replication) and the global catalog DNS resource records.

After you recover the forest root domain, repeat the same steps to recover the remaining domains in the forest. You can recover more than one domain simultaneously; however, always recover a parent domain before recovering a child to prevent any break in the trust hierarchy or DNS name resolution.
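The ordering rule above (forest root first, then every parent before its children) can be sketched as a sort on DNS name depth. This Python example is illustrative: the recovery_order helper and the contoso.com and fabrikam.com domain names are hypothetical, and the sketch relies on the fact that a child domain's DNS name has more labels than its parent's name.

```python
def recovery_order(forest_root, domains):
    """Order domains so the forest root is first and parents precede children.

    A child domain's DNS name has more labels than its parent's name, so
    sorting by label count keeps every parent ahead of its children.
    """
    others = sorted(
        (d for d in domains if d.lower() != forest_root.lower()),
        key=lambda d: (d.count("."), d.lower()),
    )
    return [forest_root] + others

# Hypothetical forest with two trees.
domains = ["emea.corp.contoso.com", "corp.contoso.com", "contoso.com", "fabrikam.com"]
print(recovery_order("contoso.com", domains))
```

Domains at the same depth do not depend on each other, so they are candidates for simultaneous recovery.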

For each domain that you recover, restore only one writeable DC from backup. This is the most important part of the recovery because the DC must have a database that has not been influenced by whatever caused the forest to fail. It is important to have a trusted backup that is thoroughly tested before it is introduced into the production environment.

Then perform the following steps. Procedures for performing certain steps are in Appendix A: Forest Recovery Procedures.

  1. If you plan to restore a physical server, ensure that the network cable of the target DC is not attached and therefore is not connected to the production network. For a virtual machine, you can remove the network adapter or use a network adapter that is attached to another network where you can test the recovery process while isolated from the production network.

  2. Because this is the first writeable DC in the domain, you must perform a nonauthoritative restore of AD DS and an authoritative restore of SYSVOL. The restore operation must be completed by using an Active Directory-aware backup and restore application, such as Windows Server Backup (that is, you should not restore the DC by using unsupported methods such as restoring a VM snapshot).

    An authoritative restore of SYSVOL is required because replication of the SYSVOL replicated folder must be started after you recover from a disaster. All subsequent DCs that are added in the domain must resynchronize their SYSVOL folder with a copy of the folder that has been selected to be authoritative before the folder can be advertised.

    Warning

    Perform an authoritative (or primary) restore operation of SYSVOL only for the first DC to be restored in the forest root domain. Incorrectly performing primary restore operations of the SYSVOL on other DCs leads to replication conflicts of SYSVOL data.

    There are two options to perform a nonauthoritative restore of AD DS and an authoritative restore of SYSVOL:

  3. After you restore and restart the writeable DC, verify that the failure did not affect the data on the DC. If the DC data is damaged, then repeat step 2 with a different backup.

    If the restored domain controller hosts an operations master role, you may need to add the following registry entry to avoid AD DS being unavailable until it has completed replication of a writeable directory partition:

    HKLM\System\CurrentControlSet\Services\NTDS\Parameters\Repl Perform Initial Synchronizations

    Create the entry with the data type REG_DWORD and a value of 0. After the forest is recovered completely, you can reset the value of this entry to 1. With a value of 1, a domain controller that holds operations master roles must, after it restarts, have successful AD DS inbound and outbound replication with its known replica partners before it advertises itself as a domain controller and starts providing services to clients. For more information about initial synchronization requirements, see KB article 305476.

    Continue to the next steps only after you restore and verify the data and before you join this computer to the production network.

  4. If you suspect that the forest-wide failure was related to network intrusion or malicious attack, reset the account passwords for all administrative accounts, including members of the Enterprise Admins, Domain Admins, Schema Admins, Server Operators, Account Operators groups, and so on. The reset of administrative account passwords should be completed before additional domain controllers are installed during the next phase of the forest recovery.

  5. On the first restored DC in the forest root domain, seize all domain-wide and forest-wide operations master roles. Enterprise Admins and Schema Admins credentials are needed to seize forest-wide operations master roles.

    In each child domain, seize domain-wide operations master roles. Although you might retain the operations master roles on the restored DC only temporarily, seizing these roles ensures that you know which DC hosts them at this point in the forest recovery process. As part of your post-recovery process, you can redistribute the operations master roles as needed. For more information about seizing operations master roles, see Seizing an operations master role. For recommendations about where to place operations master roles, see What Are Operations Masters?.

  6. Clean up metadata of all other writeable DCs in the forest root domain that you are not restoring from backup (all writeable DCs in the domain except for this first DC). If you use the version of Active Directory Users and Computers or Active Directory Sites and Services that is included with Windows Server 2008 or later or RSAT for Windows Vista or later, metadata cleanup is performed automatically when you delete a DC object. In addition, the server object and computer object for the deleted DC are also deleted automatically. For more information, see Cleaning metadata of removed writable DCs.

    Cleaning up metadata prevents possible duplication of NTDS-settings objects if AD DS is installed on a DC in a different site. Potentially, this could also save the Knowledge Consistency Checker (KCC) the process of creating replication links when the DCs themselves might not be present. Moreover, as part of metadata cleanup, DC Locator DNS resource records for all other DCs in the domain will be deleted from DNS.

    Until the metadata of all other DCs in the domain is removed, this DC, if it were a RID master before recovery, will not assume the RID master role and therefore will not be able to issue new RIDs. You might see event ID 16650 in the System log in Event Viewer indicating this failure, but you should see event ID 16648 indicating success a little while after you have cleaned the metadata.

  7. If you have DNS zones that are stored in AD DS, ensure that the local DNS Server service is installed and running on the DC that you have restored. If this DC was not a DNS server before the forest failure, you must install and configure the DNS server.

    Note

    If the restored DC runs Windows Server 2008, you need to install the hotfix in KB article 975654 or connect the server to an isolated network temporarily in order to install DNS server. The hotfix is not required for any other versions of Windows Server.

    In the forest root domain, configure the restored DC with its own IP address (or a loopback address, such as 127.0.0.1) as its preferred DNS server. You can configure this setting in the TCP/IP properties of the local area network (LAN) adapter. This is the first DNS server in the forest. For more information, see Configure TCP/IP to use DNS.

    In each child domain, configure the restored DC with the IP address of the first DNS server in the forest root domain as its preferred DNS server. You can configure this setting in the TCP/IP properties of the LAN adapter. For more information, see Configure TCP/IP to use DNS.

    In the _msdcs and domain DNS zones, delete NS records of DCs that no longer exist after metadata cleanup. Check if the SRV records of the cleaned up DCs have been removed. To help speed up DNS SRV record removal, run:

    nltest.exe /dsderegdns:server.domain.tld
    
  8. Raise the value of the available RID pool by 100,000. For more information, see Raising the value of available RID pools. If you have reason to believe that raising the RID Pool by 100,000 is insufficient for your particular situation, you should determine the lowest increase that is still safe to use. RIDs are a finite resource that should not be used up needlessly.

    If new security principals were created in the domain after the time of the backup that you use for the restore, these security principals might have access rights on certain objects. These security principals no longer exist after recovery because the recovery has reverted to the backup; however, their access rights might still exist. If the available RID pool is not raised after a restore, new user objects that are created after the forest recovery might obtain identical security IDs (SIDs) and could have access to those objects, which was not originally intended.

    To illustrate, consider the example of the new employee named Amy that was mentioned in the introduction. The user object for Amy no longer exists after the restore operation because it was created after the backup that was used to restore the domain. However, any access rights that were assigned to that user object might persist after the restore operation. If the SID for that user object is reassigned to a new object after the restore operation, the new object would obtain those access rights.

  9. Invalidate the current RID pool. The current RID pool is invalidated after a system state restore. But if a system state restore was not performed, the current RID pool needs to be invalidated to prevent the restored DC from re-issuing RIDs from the RID pool that was assigned at the time the backup was created. For more information, see Invalidating the current RID pool.

    Note

    The first time that you attempt to create an object with a SID after you invalidate the RID pool, you will receive an error. The attempt to create an object triggers a request for a new RID pool. Retrying the operation succeeds because the new RID pool will have been allocated.

  10. Reset the computer account password of this DC twice. For more information, see Resetting the computer account password of the domain controller.

  11. Reset the krbtgt password twice. For more information, see Resetting the krbtgt password.

    Because the krbtgt password history holds two passwords, reset the password twice to remove the original (pre-failure) password from the password history.

    Note

    If the forest recovery is in response to a security breach, you may also reset the trust passwords. For more information, see Resetting a trust password on one side of the trust.

  12. If the forest has multiple domains and the restored DC was a global catalog server before the failure, clear the Global catalog check box in the NTDS Settings properties to remove the global catalog from the DC. The exception to this rule is the common case of a forest with just one domain. In this case, it is not required to remove the global catalog. For more information, see Removing the global catalog.

    By restoring a global catalog from a backup that is more recent than other backups that are used to restore DCs in other domains, you might introduce lingering objects. Consider the following example. In domain A, DC1 is restored from a backup that was taken at time T1. In domain B, DC2 is restored from a global catalog backup that was taken at time T2. Suppose T2 is more recent than T1, and some objects were created between T1 and T2. After these DCs are restored, DC2, which is a global catalog, holds newer data for domain A's partial replica than domain A holds itself. DC2, in this case, holds lingering objects because these objects are not present on DC1.

    The presence of lingering objects can lead to problems. For instance, e-mail messages might not be delivered to a user whose user object was moved between domains. After you bring the outdated DC or global catalog server back online, both instances of the user object appear in the global catalog. Both objects have the same e-mail address; therefore, e-mail messages cannot be delivered.

    A second problem is that a user account that no longer exists might still appear in the global address list. A third problem is that a universal group that no longer exists might still appear in a user's access token.

    If you did restore a DC that was a global catalog, either inadvertently or because that was the only backup that you trusted, we recommend that you prevent the occurrence of lingering objects by disabling the global catalog soon after the restore operation is complete. Disabling the global catalog flag will result in the computer losing all its partial replicas (partitions) and relegating itself to regular DC status.

  13. Configure Windows Time Service. In the forest root domain, configure the PDC emulator to synchronize time from an external time source. For more information, see Configure the Windows Time service on the PDC emulator in the Forest Root Domain.
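The arithmetic behind step 8 (raising the available RID pool by 100,000) can be sketched as follows. The sketch assumes that the pool is stored as the rIDAvailablePool large integer, whose upper 32 bits hold the total number of RIDs that can be issued in the domain and whose lower 32 bits hold the next RID to be allocated. The helper names and the sample value are illustrative; follow the procedure referenced in step 8 to make the actual change.

```python
LOW_MASK = 0xFFFFFFFF

def split_rid_pool(value):
    """Split a 64-bit rIDAvailablePool value into its (high, low) 32-bit parts."""
    return value >> 32, value & LOW_MASK

def raise_rid_pool(value, increase=100_000):
    """Return the value with the low part (next RID to allocate) raised."""
    high, low = split_rid_pool(value)
    new_low = low + increase
    if new_low > high:
        raise ValueError("increase exceeds the RIDs that remain in the domain")
    return (high << 32) | new_low

# Illustrative value: high part 1073741823, low part 2100.
current = 4611686014132422708
print(split_rid_pool(current))  # prints (1073741823, 2100)
print(raise_rid_pool(current))  # prints 4611686014132522708
```

Only the low part changes. Raising it skips RIDs that might have been issued after the backup was taken, which is why a larger increase than necessary wastes a finite resource.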

Reconnect each restored writeable domain controller to a common network

At this stage you should have one DC restored (and recovery steps performed) in the forest root domain and in each of the remaining domains. Join these DCs to a common network that is isolated from the rest of the environment and complete the following steps in order to validate forest health and replication.

Note

When you join the physical DCs to an isolated network, you may need to change their IP addresses. As a result, the IP addresses in the DNS records will be wrong. Because a global catalog server is not available, secure dynamic updates for DNS will fail. Virtual DCs are advantageous in this case because they can be joined to a new virtual network without changing their IP addresses. This is one reason why virtual DCs are recommended as the first domain controllers to be restored during forest recovery.

After validation, join the DCs to the production network and complete the steps to verify forest replication health.

  • To fix name resolution, create DNS delegation records and configure DNS forwarding and root hints as needed. Run repadmin /replsum to check replication between DCs.

  • If the restored DCs are not direct replication partners, you can speed up replication recovery considerably by creating temporary connection objects between them.

  • To validate metadata cleanup, run Repadmin /viewlist * for a list of all DCs in the forest. Run Nltest /DCList:<domain> for a list of all DCs in the domain.

  • To check DC and DNS health, run DCDiag /v to report errors on all DCs in the forest.
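
The checks listed above can be collected into a small script so that they run the same way on every restored DC. The following is a minimal sketch, assuming Python is available on the DC and the command-line tools are on the path. The function names and the domain name `corp.example.com` are hypothetical; only the commands themselves come from the list above.

```python
import subprocess


def validation_commands(domain):
    """Build the health-check command lines named in the list above."""
    return [
        ["repadmin", "/replsum"],         # replication summary between DCs
        ["repadmin", "/viewlist", "*"],   # all DCs in the forest
        ["nltest", f"/dclist:{domain}"],  # all DCs in one domain
        ["dcdiag", "/v"],                 # verbose DC and DNS diagnostics
    ]


def run_checks(domain, runner=subprocess.run):
    """Run each command and return (command line, exit code) pairs.

    The runner parameter exists only so the function can be exercised
    without the real tools; by default it is subprocess.run.
    """
    results = []
    for cmd in validation_commands(domain):
        completed = runner(cmd, capture_output=True, text=True)
        results.append((" ".join(cmd), completed.returncode))
    return results


# Print the command lines for a hypothetical domain without running them.
for cmd in validation_commands("corp.example.com"):
    print(" ".join(cmd))
```

Exit codes alone are not a reliable indicator of health for these tools, so the captured output should still be reviewed by hand.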

Add the global catalog to a domain controller in the forest root domain

A global catalog is required for these and other reasons:

  • To enable logons for users.

  • To enable the Net Logon service running on the DCs in each child domain to register and remove records on the DNS server in the root domain.

Although it is preferred that the forest root DC become a global catalog, it is possible to elect any of the restored DCs to become a global catalog.

Note

A DC will not be advertised as a global catalog server until it has completed a full synchronization of all directory partitions in the forest. Therefore, force the DC to replicate with each of the restored DCs in the forest. Monitor the Directory Service event log in Event Viewer for event ID 1119, which indicates that this DC is a global catalog server, or verify that the following registry value is set to 1: HKLM\System\CurrentControlSet\Services\NTDS\Parameters\Global Catalog Promotion Complete

For more information, see Adding the global catalog.
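
The registry check mentioned in the note can be scripted. The sketch below uses the `winreg` module from the Python standard library, which is available only on Windows. The function names are hypothetical, and treating a missing key or value as "not yet promoted" is an assumption of this sketch rather than documented behavior.

```python
GC_KEY = r"System\CurrentControlSet\Services\NTDS\Parameters"
GC_VALUE = "Global Catalog Promotion Complete"


def read_registry_value(key_path, value_name):
    """Read a value under HKEY_LOCAL_MACHINE (Windows only)."""
    import winreg  # standard library, present only on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        data, _value_type = winreg.QueryValueEx(key, value_name)
    return data


def gc_promotion_complete(reader=read_registry_value):
    """Return True when the promotion-complete value equals 1.

    The reader parameter allows the logic to be tested off-Windows.
    """
    try:
        return reader(GC_KEY, GC_VALUE) == 1
    except OSError:
        # Assumption: a missing key or value means promotion has not finished.
        return False
```

On the DC itself, calling `gc_promotion_complete()` with no arguments reads the live registry. Event ID 1119 in the Directory Service log remains the other way to confirm the same state.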

At this stage you should have a stable forest, with one DC for each domain and one global catalog in the forest. You should make a new backup of each of the DCs that you have just restored. You can now begin to redeploy other DCs in the forest by installing AD DS.

Redeploy remaining DCs

The steps up to this point apply to all forests: find a valid backup for each domain, recover the domains in isolation, reconnect them, reset the global catalog, and clean up. In this next step you will redeploy the forest. The way to do this will greatly depend on your forest design, your service level agreements, site structure, available bandwidth, and numerous other factors. You will need to design your own redeployment plan based on the principles and suggestions in this section, in a way that is best suited to your business requirements.

The next step is to install AD DS on all DCs that were present before the forest recovery took place. If the DCs still exist, AD DS will need to be removed forcibly, or the DCs can be reinstalled. Any existing backups for these DCs cannot be reused, because the corresponding metadata has been removed during forest recovery. In an uncomplicated environment this redeployment process can be as simple as reconnecting the recovered DCs to the production network, and promoting new DCs as needed.

In a large enterprise faced with a worldwide infrastructure, a more sophisticated plan is needed. The first phase is usually to restore AD as a service; that is, to install strategically placed DCs so that all critical business divisions and applications can start working again. It may be acceptable for branch offices to temporarily have reduced performance as a result of this. As a second phase, all remaining and less critical DCs are redeployed.

There are two methods to install additional DCs, both of which can be automated:

  • Cloning

    For virtualized environments that run Windows Server 2012, cloning is the fastest and simplest way to recover a large number of DCs. You can automate the recovery of all virtualized DCs in a domain after you restore a single virtualized DC from backup.

    For more information about cloning and prerequisites, see Introduction to Active Directory Domain Services (AD DS) Virtualization (Level 100).

  • Re-install AD DS by using Windows PowerShell on servers that run Windows Server 2012 (or Dcpromo.exe on servers that run earlier versions of Windows Server) or by using the user interface

    To expedite re-installing AD DS, you can use the Install from Media (IFM) option to reduce replication traffic during the installation. For more information about using the ntdsutil ifm command to create installation media, see Installing AD DS from Media.
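
As an illustration of the IFM option, the following sketch assembles the ntdsutil command line that creates installation media. The helper name `ifm_command` and the target folder `C:\IFM` are hypothetical. The `create sysvol full` and `create sysvol rodc` subcommands are the ntdsutil ifm variants that include SYSVOL; confirm them against Installing AD DS from Media before relying on this.

```python
import subprocess

# ntdsutil ifm subcommands that include SYSVOL in the media.
IFM_KINDS = {
    "full": "create sysvol full",  # media for a writeable DC
    "rodc": "create sysvol rodc",  # media for an RODC
}


def ifm_command(target_dir, kind="full"):
    """Return the argument list for creating IFM media in target_dir."""
    if kind not in IFM_KINDS:
        raise ValueError(f"unknown IFM kind: {kind!r}")
    return [
        "ntdsutil",
        "activate instance ntds",
        "ifm",
        f"{IFM_KINDS[kind]} {target_dir}",
        "quit",
        "quit",
    ]


def create_ifm_media(target_dir, kind="full"):
    """Run ntdsutil on a trusted writeable DC (not called by this sketch)."""
    return subprocess.run(ifm_command(target_dir, kind), check=True)
```

Run `create_ifm_media(r"C:\IFM")` only on a writeable DC that you trust to be free of damaged data, because the media becomes the starting point for every DC installed from it.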

Consider the following additional points for each replica DC that is recovered in the forest by virtualized DC cloning or by installing AD DS (as opposed to restoring from backup):

  • All software on a DC that is used as the source for cloning must be able to be cloned. Applications and services that cannot be cloned should be removed before cloning is initiated. If that is not possible, an alternative virtualized DC should be chosen as the source.

  • If you clone additional virtualized DCs from the first virtualized DC to be restored, the source DC will need to be shut down while its VHDX file is copied. Then it will need to be running and available online when the clone virtual DCs are first started. If the downtime required by the shutdown is not acceptable for the first recovered DC, deploy an additional virtualized DC by installing AD DS to act as the source for cloning.

  • There is no restriction on the host name of the cloned virtualized DC or the server on which you want to install AD DS. You can use a new host name or the host name that was in use previously. For more information about DNS host name syntax, see Creating DNS Computer Names (https://go.microsoft.com/fwlink/?LinkId=74564).

  • Configure each server with the first DNS server in the forest (the first DC that was restored in the root domain) as the preferred DNS server in the TCP/IP properties of its network adapter. For more information, see Configure TCP/IP to use DNS.

  • Redeploy all RODCs in the domain, either by virtualized DC cloning if several RODCs are deployed in a central location, or by the traditional method of rebuilding them by removing and reinstalling AD DS if they are deployed individually in isolated locations such as branch offices.

    Rebuilding RODCs ensures that they do not contain any lingering objects and can help prevent replication conflicts from occurring later. When you remove AD DS from an RODC, choose the option to retain DC metadata. Using this option retains the krbtgt account for the RODC, the permissions for the delegated RODC administrator account, and the Password Replication Policy (PRP), and it prevents you from having to use Domain Admin credentials to remove and reinstall AD DS on an RODC. It also retains the DNS server and global catalog roles if they were originally installed on the RODC.

    When you rebuild DCs (RODCs or writeable DCs), there may be increased replication traffic during their reinstallation. To help reduce that impact, you can stagger the schedule of the RODC installations, and you can use the Install from Media (IFM) option. If you use the IFM option, run the ntdsutil ifm command on a writeable DC that you trust to be free of damaged data. This helps prevent possible corruption from appearing on the RODC after the AD DS reinstallation is complete. For more information about IFM, see Installing AD DS from Media.

    For more information about rebuilding RODCs, see RODC Removal and Reinstallation.

  • If a DC was running the DNS Server service before the forest malfunction, install and configure the DNS Server service during the installation of AD DS. Otherwise, configure its former DNS clients with other DNS servers.

  • If you require additional global catalogs to share authentication or query load for users or applications, you can either add the global catalog to the source virtualized DC before cloning or you can make a DC a global catalog server during the installation of AD DS.

Cleanup

Perform the following post-recovery steps as needed:

  • After the entire forest is recovered, you can revert to the original DNS configuration, including configuration of the preferred and alternate DNS servers on each of the DCs. After the DNS servers are configured as they were before the malfunction, their previous name resolution capabilities will be restored. Delete any DNS records for DCs that have not been recovered.

  • Delete Windows Internet Name Service (WINS) records for all DCs that have not been recovered.

  • You can transfer the operations master roles to other DCs in the domain or forest and add more global catalog servers based on the configuration before the failure.

  • Because the entire forest is restored to a previous state, any objects (such as users and computers) that were added and all updates (such as password changes) that were made to existing objects after this point are lost. Therefore, you should re-create these missing objects and reapply the missing updates as appropriate.

  • You might also need to restore outgoing trusts with external domains and forests, because these external trust relationships are not restored automatically from backups.