This topic helps you troubleshoot typical problems with configuring server instances for Always On availability groups. Typical configuration problems include a disabled Always On availability groups feature, incorrectly configured accounts, a missing database mirroring endpoint, an inaccessible endpoint (SQL Server Error 1418), missing network access, and a failed join database command (SQL Server Error 35250).
Always On Availability Groups Is Not Enabled: If an instance of SQL Server isn't enabled for Always On availability groups, the instance doesn't support availability group creation and can't host any availability replicas.
Network access: Documents the requirement that each server instance that is hosting an availability replica must be able to access the port of each of the other server instances over TCP.
Join Database Fails (SQL Server Error 35250): Discusses the possible causes and resolution of a failure to join secondary databases to an availability group because the connection to the primary replica isn't active.
Always On Availability Groups Is Not Enabled
The Always On availability groups feature must be enabled on each of the instances of SQL Server.
If the Always On Availability Groups feature isn't enabled, you'll get the following error message when you try to create an availability group on SQL Server:
The Always On Availability Groups feature must be enabled for server instance 'SQL1VM' before you can create an availability group on this instance. To enable this feature, open the SQL Server Configuration Manager, select SQL Server Services, right-click on the SQL Server service name, select Properties, and use the Always On Availability Groups tab of the Server Properties dialog. Enabling Always On Availability Groups may require that the server instance is hosted by a Windows Server Failover Cluster (WSFC) node. (Microsoft.SqlServer.Management.HadrTasks)
The error message clearly states that the AG feature isn't enabled and tells you how to enable it. Besides the obvious case where the feature was never enabled, there are two scenarios that can lead to this state:
If SQL Server was installed and the Always On Availability Groups feature was enabled before you installed the Windows Failover Clustering feature, you may get this error when you attempt to create an Always On AG.
If you remove and rebuild an existing Windows Failover Clustering feature while SQL Server still has Always On configured, this error may occur when you attempt to use the AG again.
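In such cases, you can try to resolve the error by disabling the Always On Availability Groups feature, restarting the SQL Server service, re-enabling the feature, and then restarting the service again.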
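The disable, restart, re-enable, and restart sequence can be scripted. The following is a minimal sketch. It assumes that the SqlServer PowerShell module is installed, that the session is elevated, and that `SQL1VM` (the instance named in the error above) is a default instance. Disabling the feature takes any existing availability groups on the instance offline, so plan for downtime.

```powershell
# Assumes: Install-Module SqlServer has been run and this session is elevated.
Import-Module SqlServer

# Instance name taken from the example error; use "Server\Instance" for a named instance.
$instance = 'SQL1VM'

# Turn the feature off; -Force restarts the SQL Server service without prompting.
Disable-SqlAlwaysOn -ServerInstance $instance -Force

# Turn the feature back on; -Force again restarts the service so the change takes effect.
Enable-SqlAlwaysOn -ServerInstance $instance -Force

# IsHadrEnabled returns 1 when Always On availability groups is enabled.
sqlcmd -E -S $instance -Q "SELECT SERVERPROPERTY('IsHadrEnabled') AS IsHadrEnabled"
```

If `IsHadrEnabled` still returns 0, the node is probably not yet a member of a working WSFC. Confirm the cluster state before you retry.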
Accounts
The accounts under which SQL Server is running must be correctly configured.
Do the accounts have the correct permissions?
If the partners run under the same domain account, the correct user logins exist automatically in both master databases. This simplifies the security configuration and is recommended.
If two server instances run under different accounts, each account must be created as a login in master on the remote server instance, and that login must be granted CONNECT permission on the database mirroring endpoint of that server instance. For more information, see Set Up Login Accounts for Database Mirroring or Always On Availability Groups (SQL Server). You can use the following query on each instance to check whether the logins have CONNECT permission:
SELECT
perm.class_desc,
prin.name,
perm.permission_name,
perm.state_desc,
prin.type_desc as PrincipalType,
prin.is_disabled
FROM sys.server_permissions perm
LEFT JOIN sys.server_principals prin ON perm.grantee_principal_id = prin.principal_id
LEFT JOIN sys.tcp_endpoints tep ON perm.major_id = tep.endpoint_id
WHERE
perm.class_desc = 'ENDPOINT'
AND perm.permission_name = 'CONNECT'
AND tep.type = 4; -- type 4 = DATABASE_MIRRORING endpoint
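If the query shows that the other replica's service account is missing, you can create the login and grant it CONNECT. The following is a minimal sketch to run on the instance that owns the endpoint. The account `CONTOSO\sqlsvc2` is hypothetical. `Endpoint_Mirroring` is the example endpoint name used later in this topic; substitute the name that sys.database_mirroring_endpoints returns.

```powershell
# Hypothetical values: replace with your instance, the OTHER replica's service account, and your endpoint name.
$server_name    = $env:computername      # or "server\instance"
$remote_account = 'CONTOSO\sqlsvc2'      # hypothetical domain service account of the remote replica
$endpoint_name  = 'Endpoint_Mirroring'   # example endpoint name

# Create the Windows login only if it doesn't exist yet, then grant CONNECT on the mirroring endpoint.
$query = @"
IF SUSER_ID(N'$remote_account') IS NULL
    CREATE LOGIN [$remote_account] FROM WINDOWS;
GRANT CONNECT ON ENDPOINT::[$endpoint_name] TO [$remote_account];
"@

# -b makes sqlcmd return a nonzero exit code if a statement fails.
sqlcmd -E -S $server_name -b -Q $query
```

Run the same steps on each replica, and swap the account names accordingly. Then rerun the permission query to confirm the grant.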
If SQL Server is running under a built-in account, such as Local System, Local Service, or Network Service, or a nondomain account, you must use certificates for endpoint authentication. If your service accounts are using domain accounts in the same domain, you can choose to grant CONNECT access for each service account on all the replica locations or you can use certificates. For more information, see Use Certificates for a Database Mirroring Endpoint (Transact-SQL).
Endpoints
Endpoints must be correctly configured.
Make sure that each instance of SQL Server that is going to host an availability replica (each replica location) has a database mirroring endpoint. To determine whether a database mirroring endpoint exists on a given server instance, use the sys.database_mirroring_endpoints catalog view:
SELECT name, state_desc FROM sys.database_mirroring_endpoints
To identify the port currently associated with the database mirroring endpoint of a server instance, use the following Transact-SQL statement:
SELECT type_desc, port FROM sys.tcp_endpoints;
GO
For Always On availability groups setup issues that are difficult to explain, we recommend that you inspect each server instance to determine whether it's listening on the correct ports.
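One way to do this inspection is to ask SQL Server which port its mirroring endpoint uses, and then check which process is actually listening on that port. The following is a minimal sketch. It assumes a default instance on the local machine and an elevated session, because reading another process's path may need elevation.

```powershell
# Replace with "server\instance" for a named instance.
$server_name = $env:computername

# -h -1 drops headers, -W trims padding, SET NOCOUNT ON drops the row-count line.
$rows = sqlcmd -E -S $server_name -h -1 -W -Q "SET NOCOUNT ON; SELECT port FROM sys.tcp_endpoints WHERE type_desc = 'DATABASE_MIRRORING'"
$port = $rows | Where-Object { $_ -match '^\d+$' } | Select-Object -First 1

if (-not $port) {
    Write-Warning "No database mirroring endpoint found on $server_name."
    return
}

# List every listener on that port with the owning process and its executable path.
Get-NetTCPConnection -LocalPort ([int]$port) -State Listen -ErrorAction SilentlyContinue |
    ForEach-Object {
        $proc = Get-Process -Id $_.OwningProcess
        [pscustomobject]@{
            LocalAddress = $_.LocalAddress
            LocalPort    = $_.LocalPort
            ProcessId    = $proc.Id
            ProcessName  = $proc.ProcessName
            Path         = $proc.Path
        }
    }
```

Each SQL Server instance runs sqlservr.exe from its own Binn folder. If the path belongs to a different instance than you expect, another instance has bound the endpoint port.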
Make sure that the endpoints are started (STATE=STARTED). On each server instance, use the following Transact-SQL statement:
SELECT state_desc FROM sys.database_mirroring_endpoints
In some cases, if the endpoint is started but the AG replicas aren't communicating, you can try stopping and restarting the endpoint. Run ALTER ENDPOINT [Endpoint_Mirroring] STATE = STOPPED, followed by ALTER ENDPOINT [Endpoint_Mirroring] STATE = STARTED.
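The restart can be wrapped in a short script that shows the endpoint state before and after the change. The following is a minimal sketch. It assumes a local default instance and the example endpoint name `Endpoint_Mirroring`. Stopping the endpoint on the primary replica briefly disconnects every secondary replica.

```powershell
# Hypothetical values; Endpoint_Mirroring is the example endpoint name.
$server_name   = $env:computername      # or "server\instance"
$endpoint_name = 'Endpoint_Mirroring'

# Current state of the mirroring endpoint.
sqlcmd -E -S $server_name -Q "SELECT name, state_desc FROM sys.database_mirroring_endpoints"

# Stop, then start the endpoint; -b returns a nonzero exit code if either statement fails.
sqlcmd -E -S $server_name -b -Q "ALTER ENDPOINT [$endpoint_name] STATE = STOPPED;"
sqlcmd -E -S $server_name -b -Q "ALTER ENDPOINT [$endpoint_name] STATE = STARTED;"

# The state should read STARTED again.
sqlcmd -E -S $server_name -Q "SELECT name, state_desc FROM sys.database_mirroring_endpoints"
```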
Make sure that the login from the other server has CONNECT permission. To determine who has CONNECT permission for an endpoint, on each server instance use the following Transact-SQL statement:
SELECT 'Metadata Check';
SELECT EP.name, SP.STATE,
CONVERT(nvarchar(38), suser_name(SP.grantor_principal_id))
AS GRANTOR,
SP.TYPE AS PERMISSION,
CONVERT(nvarchar(46),suser_name(SP.grantee_principal_id))
AS GRANTEE
FROM sys.server_permissions SP , sys.endpoints EP
WHERE SP.major_id = EP.endpoint_id
ORDER BY Permission,grantor, grantee;
Ensure the correct server name is used in the endpoint URL
For the server name in an endpoint URL, you can use any name that uniquely identifies the machine. The server address can be a NetBIOS name (if the systems are in the same domain), a fully qualified domain name (FQDN), or an IP address (preferably a static IP address). Using the FQDN is the recommended option.
If you've already defined an endpoint URL, you can query it, together with the replica it belongs to, by using:
SELECT replica_server_name, endpoint_url FROM sys.availability_replicas;
Next, compare the endpoint_url output to the server name (NetBIOS name or FQDN).
To query the server name, run PowerShell locally on the replica and retrieve both its NetBIOS name and its FQDN.
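The following is a minimal sketch of that comparison. It reads the local NetBIOS name and FQDN, and then checks whether the host part of an endpoint URL matches either one. The URL `TCP://SQLVM1.contoso.com:5022` is hypothetical; paste the `endpoint_url` value for this replica's row instead.

```powershell
# NetBIOS name of this replica.
$netbios = $env:COMPUTERNAME

# FQDN as resolved through DNS.
$fqdn = [System.Net.Dns]::GetHostEntry($netbios).HostName

# Hypothetical: this replica's endpoint_url from sys.availability_replicas.
$endpoint_url = 'TCP://SQLVM1.contoso.com:5022'

# [uri] parses the TCP:// URL into its Host and Port parts.
$uri = [uri]$endpoint_url

"NetBIOS name : $netbios"
"FQDN         : $fqdn"
"URL host     : $($uri.Host) (port $($uri.Port))"

# String -eq is case-insensitive in PowerShell.
if ($uri.Host -eq $fqdn -or $uri.Host -eq $netbios) {
    'endpoint_url matches this server name.'
} else {
    'endpoint_url does NOT match the NetBIOS name or FQDN of this server.'
}
```

A URL that uses an IP address is reported as a mismatch by this check. In that case, compare the address with the output of `Get-NetIPAddress` instead.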
Network access
Each server instance that is hosting an availability replica must be able to access the port of each of the other server instances over TCP. This is especially important if the server instances are in different domains that don't trust each other (untrusted domains). Check whether you can connect to the endpoints by following these steps:
Use Test-NetConnection (the PowerShell equivalent of Telnet) to test the endpoint port on each of the other replicas, once by server name and once by IP address.
If the endpoint is listening and the connection succeeds, the output shows "TcpTestSucceeded : True"; otherwise, it shows "TcpTestSucceeded : False".
If the connection to the IP address succeeds but the connection to the server name fails, there's likely a DNS or name resolution issue.
If the connection succeeds by server name but not by IP address, more than one endpoint (perhaps from another SQL Server instance) may be defined on that server and listening on that port. Although the endpoint on the instance in question shows "STARTED", another instance may actually have bound the port, which prevents the correct instance from listening and establishing TCP connections.
If Test-NetConnection fails to connect, look for firewall or antivirus software that may be blocking the endpoint port. Check whether the firewall allows endpoint port communication between the server instances that host the primary replica and the secondary replica (port 5022 by default).
Look for enabled inbound firewall rules that block traffic. In PowerShell, the Get-NetFirewallRule cmdlet can list them.
If you're running SQL Server on an Azure VM, also ensure that the Network Security Group (NSG) allows traffic to the endpoint port.
Capture the output of the Get-NetTCPConnection cmdlet (the equivalent of netstat -a) and verify that the state is Listen or Established on the IP address and port specified for the endpoint.
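The checks in these steps can be run from one PowerShell session. The following is a minimal sketch. The partner name `SQLVM2` and the address `10.0.0.5` are hypothetical, and 5022 is the default endpoint port. Run the connection tests from one replica and the Get-NetTCPConnection check on the partner replica.

```powershell
# Hypothetical partner replica; replace with your values.
$partner_name = 'SQLVM2'
$partner_ip   = '10.0.0.5'
$port         = 5022      # default database mirroring endpoint port

# Test the endpoint port by name and by IP; compare TcpTestSucceeded in each result.
Test-NetConnection -ComputerName $partner_name -Port $port
Test-NetConnection -ComputerName $partner_ip   -Port $port

# Enabled inbound rules that block traffic (check whether any cover the endpoint port).
Get-NetFirewallRule -Direction Inbound -Action Block -Enabled True |
    Format-Table DisplayName, Profile, Enabled, Action

# On the partner replica: listening or established connections on the endpoint port.
Get-NetTCPConnection -LocalPort $port -State Listen, Established -ErrorAction SilentlyContinue |
    Select-Object LocalAddress, LocalPort, RemoteAddress, RemotePort, State, OwningProcess
```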
Once the listener is configured you can validate the IP address and port it is listening on by using the following query:
$server_name = $env:computername #replace this with your sql instance "server\instance"
sqlcmd -E -S$server_name -Q"SELECT dns_name AS AG_listener_name, port, ip_configuration_string_from_cluster
FROM sys.availability_group_listeners"
You can also find the listener information together with the SQL Server ports using this query:
$server_name = $env:computername #replace this with your sql instance "server\instance"
sqlcmd -E -S $server_name -Q "SELECT convert(varchar(32), SERVERPROPERTY ('servername')) servername, convert(varchar(32),ip_address) ip_address, port, type_desc,state_desc, start_time
FROM sys.dm_tcp_listener_states
WHERE ip_address not in ('127.0.0.1', '::1') and type <> 2"
If you can't connect to the listener and suspect that a port is blocked, test it with the PowerShell Test-NetConnection cmdlet (the equivalent of Telnet).
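The following is a minimal sketch of that test. The listener name `AG_Listener` matches the example used later in this topic, and port 1433 is an assumption; use the `dns_name` and `port` values that the listener query returns.

```powershell
# Hypothetical listener values; take them from sys.availability_group_listeners.
$listener_name = 'AG_Listener'
$listener_port = 1433

# Resolve the listener name; a multi-subnet listener can return one address per subnet.
Resolve-DnsName -Name $listener_name

# TcpTestSucceeded : True means the listener port is reachable from this machine.
Test-NetConnection -ComputerName $listener_name -Port $listener_port
```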
Endpoint Access (SQL Server Error 1418)
SQL Server Error 1418 indicates that the server network address specified in the endpoint URL can't be reached or doesn't exist. The message suggests that you verify the network address name and reissue the command.
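When the replicas are already defined, you can check every endpoint URL in one pass. The following script resolves the host and port of each URL and tests them with Test-NetConnection. It is a minimal sketch that assumes you run it on a replica where sys.availability_replicas is populated. When error 1418 occurs during initial setup, test the URL you intended to use directly instead.

```powershell
# Replace with "server\instance" if needed.
$server_name = $env:computername

# -h -1 drops headers and -W trims padding, so each output line is one endpoint URL.
$urls = sqlcmd -E -S $server_name -h -1 -W -Q "SET NOCOUNT ON; SELECT endpoint_url FROM sys.availability_replicas"

foreach ($url in $urls) {
    if ([string]::IsNullOrWhiteSpace($url)) { continue }

    # For example, TCP://sqlvm2.contoso.com:5022 gives Host = sqlvm2.contoso.com and Port = 5022.
    $uri = [uri]($url.Trim())

    $result = Test-NetConnection -ComputerName $uri.Host -Port $uri.Port -WarningAction SilentlyContinue
    '{0,-45} TcpTestSucceeded={1}' -f $url.Trim(), $result.TcpTestSucceeded
}
```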
Join Database Fails (SQL Server Error 35250)
This section discusses the possible causes and resolution of a failure to join secondary databases to the availability group because the connection to the primary replica isn't active. This is the full error message:
Msg 35250 The connection to the primary replica is not active. The command cannot be processed.
Resolution:
The steps are summarized below.
For detailed step-by-step instructions, refer to Engine error MSSQLSERVER_35250.
Ensure the endpoint is created and started.
Check whether you can connect to the endpoint via Telnet (or Test-NetConnection), and ensure that no firewall rules are blocking connectivity.
Check for errors in the system. Query sys.dm_hadr_availability_replica_states for last_connect_error_number, which may help you diagnose the join issue.
Ensure that the endpoint is defined so that it matches the IP address and port that the AG is using.
Check whether the network service account has CONNECT permission on the endpoint.
Check for possible name resolution issues.
Ensure that your SQL Server is running a recent build (preferably the latest build, to avoid running into already-fixed issues).
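To read the last connection error for each replica, you can join the replica state DMV to the replica catalog view. The following is a minimal sketch. It assumes you run it against the primary replica, which reports state for all replicas.

```powershell
# Replace with "server\instance" for the primary replica.
$server_name = $env:computername

# Connection state plus the last connection error number, text, and time for each replica.
sqlcmd -E -S $server_name -W -Q "SELECT ar.replica_server_name, ars.role_desc, ars.connected_state_desc,
       ars.last_connect_error_number, ars.last_connect_error_description, ars.last_connect_error_timestamp
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar ON ar.replica_id = ars.replica_id"
```

A DISCONNECTED secondary replica with a nonzero `last_connect_error_number` is a good place to start. Look up that error number to narrow down the cause.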
If you're using command-line programs such as SQLCMD, make sure that you specify the correct switch for the server name. For instance, in SQLCMD you must use the uppercase -S switch, which specifies the server name, not the lowercase -s switch, which specifies the column separator.
Example: sqlcmd -S AG_Listener,port -E -d AgDb1 -K ReadOnly -M
Ensure that the availability group listener is online. To check, run the following query on the primary replica:
SELECT * FROM sys.dm_tcp_listener_states;
If you find the listener is offline, you can attempt to bring it online using a command like this:
ALTER AVAILABILITY GROUP myAG RESTART LISTENER 'AG_Listener';
Ensure READ_ONLY_ROUTING_LIST is correctly populated. On Primary replica, ensure that the READ_ONLY_ROUTING_LIST contains only server instances that are hosting readable secondary replicas.
To view the properties of each replica, run this query and examine the read-only routing URL (read_only_routing_url) of each readable secondary replica:
SELECT replica_id, replica_server_name, secondary_role_allow_connections_desc, read_only_routing_url
FROM sys.availability_replicas;
To view a read-only routing list and compare it with the replicas' routing URLs:
SELECT * FROM sys.availability_read_only_routing_lists;
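The routing-list view stores replica IDs rather than names. Joining it to sys.availability_replicas makes it easier to compare the list with each replica's routing URL. The following is a minimal sketch to run on the primary replica.

```powershell
# Replace with "server\instance" for the primary replica.
$server_name = $env:computername

# Translate replica IDs to names and show each target's routing URL, in priority order.
sqlcmd -E -S $server_name -W -Q "SELECT src.replica_server_name AS primary_replica, rl.routing_priority,
       dst.replica_server_name AS routed_to, dst.read_only_routing_url, dst.secondary_role_allow_connections_desc
FROM sys.availability_read_only_routing_lists AS rl
JOIN sys.availability_replicas AS src ON src.replica_id = rl.replica_id
JOIN sys.availability_replicas AS dst ON dst.replica_id = rl.read_only_replica_id
ORDER BY primary_replica, rl.routing_priority"
```

Every `routed_to` replica should be a secondary replica that allows read-only connections and has a non-NULL `read_only_routing_url`.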
To change a read-only routing list you can use a query like this:
ALTER AVAILABILITY GROUP [AG1]
MODIFY REPLICA ON
N'COMPUTER02' WITH
(PRIMARY_ROLE (READ_ONLY_ROUTING_LIST=('COMPUTER01','COMPUTER02')));
Check that the READ_ONLY_ROUTING_URL port is open. Ensure that Windows Firewall isn't blocking the READ_ONLY_ROUTING_URL port. Configure Windows Firewall for Database Engine access on every replica in the read_only_routing_list and for any clients that will connect to those replicas.
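The following is a minimal sketch of opening that port on a replica. It assumes an elevated session and that the routing URL uses port 1433; substitute the port from each replica's READ_ONLY_ROUTING_URL. The rule's display name is arbitrary.

```powershell
# Hypothetical port; take it from the replica's READ_ONLY_ROUTING_URL.
$routing_port = 1433
$rule_name    = "SQL Server read-only routing (TCP $routing_port)"

# Allow inbound TCP traffic to the Database Engine port on this replica.
New-NetFirewallRule -DisplayName $rule_name -Direction Inbound -Protocol TCP -LocalPort $routing_port -Action Allow

# Confirm that the rule exists and is enabled.
Get-NetFirewallRule -DisplayName $rule_name | Format-Table DisplayName, Enabled, Direction, Action
```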
Note
If you're running SQL Server on an Azure VM, you must take additional configuration steps. Ensure that the network security group (NSG) of each replica VM allows traffic to the endpoint port, and to the DNN port if you're using a DNN listener. If you're using a VNN listener, ensure that the load balancer is configured correctly.
Ensure that the READ_ONLY_ROUTING_URL (TCP://system-address:port) contains the correct fully qualified domain name (FQDN) and port number.
Ensure that SQL Server network configuration is correct in SQL Server Configuration Manager.
Verify on every replica in the read_only_routing_list that:
SQL Server remote connectivity is enabled
TCP/IP is enabled
The IP addresses are configured correctly
Note
You can quickly verify that all of these are configured properly by connecting from a remote machine to the target secondary replica's SQL Server instance name using the TCP:SQL_Instance syntax.
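The following is a minimal sketch of that remote check, run from a machine other than the replica. `COMPUTER01` is the example replica name from the routing-list statement. Append `\INSTANCE` or `,port` for a named instance or a nondefault port.

```powershell
# Hypothetical target; the tcp: prefix forces sqlcmd to use the TCP/IP protocol.
$secondary = 'tcp:COMPUTER01'

# net_transport should report TCP if remote TCP connectivity works end to end.
sqlcmd -S $secondary -E -Q "SELECT @@SERVERNAME AS server_name, net_transport FROM sys.dm_exec_connections WHERE session_id = @@SPID"
```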