Upgrading Always On Availability Group Replica Instances
Updated: May 17, 2016
Applies To: SQL Server 2016
THIS TOPIC APPLIES TO: SQL Server (starting with 2016)Azure SQL DatabaseAzure SQL Data Warehouse Parallel Data Warehouse
When upgrading a SQL Server Always On Availability Group to a new SQL Server 2016 version, to a new SQL Serverservice pack or cumulative update, or when installing to a new Windows service pack or cumulative update, you can reduce downtime for the primary replica to only a single manual failover by performing a rolling upgrade (or two manual failovers if failing back to the original primary). During the upgrade process, a secondary replica will not be available for failover or for read-only operations, and, after the upgrade, it may take some time for the secondary replica to catch up with the primary replica node depending upon the volume of activity on the primary replica node (so expect high network traffic).
Before you begin, review the following important information:
Supported Version and Edition Upgrades: Verify that you can upgrade to SQL Server 2016 from your version of the Windows operating system and version of SQL Server. For example, you cannot upgrade directly from a SQL Server 2005 instance to SQL Server 2016.
Choose a Database Engine Upgrade Method: Select the appropriate upgrade method and steps based on your review of supported version and edition upgrades and also based on other components installed in your environment to upgrade components in the correct order.
Plan and Test the Database Engine Upgrade Plan: Review the release notes and known upgrade issues, the pre-upgrade checklist, and develop and test the upgrade plan.
Hardware and Software Requirements for Installing SQL Server 2016: Review the software requirements for installing SQL Server 2016. If additional software is required, install it on each node before you begin the upgrade process to minimize any downtime.
The following best practices should be observed when performing server upgrades or updates in order to minimize downtime and data loss for your availability groups:
Before starting the rolling upgrade,
Perform a practice manual failover on at least one of your synchronous-commit replica instances
Protect your data by performing a full database backup on every availability database
Run DBCC CHECKDB on every availability database
Always upgrade the remote secondary replica instances first, then local secondary replica instances next, and the primary replica instance last.
Backups cannot occur on a database that is in the process of being upgraded. Prior to upgrading the secondary replicas, configure the automated backup preference to run backups only on the primary replica. During a version upgrade, no replicas will be readable or available for backups. During a non-version upgrade, you can configure automated backups to run on secondary replicas prior to upgrading the primary replica.
During a version upgrade, readable secondaries cannot be read after an upgrade of the readable secondary and before either the primary replica is failed over to an upgraded secondary or the primary replica is upgraded.
To prevent the availability group from unintended failovers during the upgrade process, remove availability failover from all synchronous-commit replicas before you begin.
Do not upgrade the primary replica instance before failing over the availability group to an upgraded instance with a secondary replica first. Otherwise, client applications may suffer extended downtime during the upgrade on the primary replica instance.
Always fail over the availability group to a synchronous-commit secondary replica instance. If you fail over to an asynchronous-commit secondary replica instance, the databases will suffer data loss, and data movement is automatically suspended until you manually resume data movement.
Do not upgrade the primary replica instance before upgrading or updating any other secondary replica instance. An upgraded primary replica can no longer ship logs to any secondary replica whose SQL Server 2016 instance that has not yet been upgraded to the same version. When data movement to a secondary replica is suspended, no automatic failover can occur for that replica, and your availability databases are vulnerable to data loss.
Before failing over an availability group, verify that the synchronization state of the failover target is SYNCHRONIZED.
In practice, the exact process will depend on factors such as the deployment topology of your availability groups and the commit mode of each replica. But in the simplest scenario, a rolling upgrade is a multi-stage process that in its simplest form involves the following steps:
Remove automatic failover on all synchronous-commit replicas
Upgrade all remote secondary replica instances running asynchronous-commit secondary replicas
Upgrade the all local replica secondary instances that are not currently running the primary replica
Manually fail over the availability group to a local synchronous-commit secondary replica
Upgrade or update the local replica instance that formerly hosted the primary replica
Configure automatic failover partners as desired
If necessary, you can perform an extra manual failover to return the availability group to its original configuration.
If you have deployed an availability group only for disaster recovery, you may need to fail over the availability group to an asynchronous-commit secondary replica. Such configuration is illustrated by the following figure:
In this situation, you must fail over the availability group to the asynchronous-commit secondary replica during the rolling upgrade. To prevent data loss, change the commit mode to synchronous commit and wait for the secondary replica to be synchronized before you fail over the availability group. Therefore, the rolling upgrade process may look as follows:
Upgrade the secondary replica instance on the remote site
Change the commit mode to synchronous commit
Wait until synchronization state is SYNCHRONIZED
Fail over the availability group to the secondary replica on the remote site
Upgrade or update the local (primary site) replica instance
Fail over the availability group back to the primary site
Change the commit mode to asynchronous commit
Since the synchronous-commit mode is not a recommended setting for data synchronization to a remote site, client applications may notice an immediate increase in database latency after the setting change. Moreover, performing a failover will cause all unacknowledged log messages to be discarded. The amount of discarded log messages can be quite large due to the high network latency between the two sites, causing clients to experience a high volume of transactional failure. You can minimize impact to client applications by doing the following:
Carefully select a maintenance window during low client traffic
While upgrading or updating SQL Server 2016 on the primary site, change the availability mode back to asynchronous commit, then revert to synchronous commit when you are ready to fail over to the primary site again
If an availability group contains failover cluster instance (FCI) nodes, you should upgrade the inactive nodes before you upgrade the active nodes. The figure below illustrates a common availability group scenario with FCIs for local high availability and asynchronous commit between the FCIs for remote disaster recovery, and the upgrade sequence.
Upgrade or update REMOTE2
Fail over FCI2 to REMOTE2
Upgrade or update REMOTE1
Upgrade or update PRIMARY2
Fail over FCI1 to PRIMARY2
Upgrade or update PRIMARY1
If you are running multiple availability groups with primary replicas on separate server nodes (an Active/Active configuration), the upgrade path involves more failover steps to preserve high availability in the process. Suppose you are running three availability groups on three server nodes as shown in the following table, and all secondary replicas are running in synchronous-commit mode.
It may be appropriate in your situation to perform a load-balanced rolling upgrade in the following sequence:
Fail over AG2 to Node3 (to free up Node2)
Upgrade or update Node2
Fail over AG1 to Node2 (to free up Node1)
Upgrade or update Node1
Fail over both AG2 and AG3 to Node1 (to free up Node3)
Upgrade or update Node3
Fail over AG3 to Node3
This upgrade sequence has an average downtime of less than two failovers per availability group. The resulting configuration is shown in the table below.
Based on your specific implementation, your upgrade path may vary, and the downtime that client applications experience may vary as well.