Business Continuity in Windows Azure SQL Database
This article describes the business continuity capabilities provided by Windows Azure SQL Database. It also includes information about the built-in fault tolerance that supports high availability of SQL Database applications.
Business continuity issues can belong to one of the following main categories:
Failure of individual servers, devices, or network connectivity
Corruption, unwanted modification, or deletion of data
Widespread loss of data center facilities
The purpose of creating database backups is to enable you to recover from data loss caused by such issues. Backing up and restoring data works differently in Windows Azure SQL Database than in an on-premises SQL Server, and must work with the available resources and tools. For an overview of the backup and restore options provided by Windows Azure SQL Database, see Windows Azure SQL Database Backup and Restore. The following sections explain how Windows Azure SQL Database addresses these categories to help provide business continuity:
How to help protect your database from failure of individual servers and devices
How to help protect your database from unwanted deletions or modifications
How to help protect your database from widespread loss of data center facilities
SQL Database services in Windows Azure provide built-in fault tolerance capabilities. Protecting your database from unwanted deletions or modifications, or from widespread loss of data center facilities, requires that you develop a business continuity strategy. These capabilities and strategies are described in the sections below.
How to help protect your database from failure of individual servers and devices
By storing your data in Windows Azure SQL Database, you take advantage of many fault tolerance and secure infrastructure capabilities that you would otherwise have to design, acquire, implement, and manage. This section covers what Windows Azure SQL Database does for you without any additional expense.
Windows Azure SQL Database mitigates outages due to failures of individual devices, such as hard drives, network interface adapters, or even entire servers. Data durability and fault tolerance are enhanced by maintaining multiple copies of all data on different physical nodes located across fully independent physical sub-systems, such as server racks and network routers. At any one time, Windows Azure SQL Database keeps three replicas of data running: one primary replica and two secondary replicas. Windows Azure SQL Database uses a quorum-based commit scheme in which data is written to the primary and one secondary replica before the transaction is considered committed. If the hardware fails on the primary replica, Windows Azure SQL Database detects the failure and fails over to a secondary replica. In case of a physical loss of a replica, Windows Azure SQL Database creates a new replica automatically. Therefore, there are at least two physical, transactionally consistent copies of your data in the data center. The following diagram illustrates how Windows Azure SQL Database keeps three replicas in the physical server racks in the data center.
In addition to the redundant replicas, Windows Azure SQL Database maintains internal copies of your data for the last 14 days for all your databases within the data center. These copies provide a safeguard against simultaneous or catastrophic hardware and system failures, but they are not available to customers. We recommend that you implement your own backup and restore solutions as described in the following sections.
The goal is that failures within the data center do not result in data loss; however, they may result in temporary disconnects and transaction failures.
Your application should be resilient to these temporary disconnects. We recommend that you implement retry logic in your application to handle connection losses. For more information on how to handle connection-loss errors, see the Connection Management in SQL Database article in the TechNet Wiki.
We do not currently provide an SLA for the typical recovery point objective (RPO) and recovery time objective (RTO). For more information on Windows Azure Service Level Agreements, see Service Level Agreements.
How to help protect your database from unwanted deletions or modifications
Users or applications can make unwanted changes to data that later need to be reverted; for example, a user might accidentally modify data that belongs to the wrong customer. The ability to restore application data in the event of a corruption, unwanted modification, or deletion is a fundamental requirement for software applications. We recommend the following approaches for implementing backups of your Windows Azure SQL Database instances.
Database copies can be used as a backup and restore option to help protect your database as described in the sections below:
To make a separate copy of your Windows Azure SQL Database, run a Transact-SQL command to create a database copy on a periodic basis, and then manage these copies. Execute the Transact-SQL CREATE DATABASE statement with the AS COPY OF clause to create an independent copy of your database in Windows Azure SQL Database. Note that a copy operation may take a long time, potentially hours, depending on the size of the source database and the workload currently being executed in the data center and against the source database.
The following code example demonstrates how to execute a database copy by using the Transact-SQL CREATE DATABASE (Windows Azure SQL Database) statement:
CREATE DATABASE destination_database_name AS COPY OF [source_server_name].source_database_name
You can create a copy of your database on a different Windows Azure SQL Database server, but the server must be in the same data center. A copy operation might take a long time, but the final copy of your destination database will be transactionally consistent with the source database when the copy process completes. For example, if you start a copy process at 14:00 in 24-hour time notation, Windows Azure SQL Database immediately creates the new copy based on the system backup taken nearest to 14:00; this backup is never more than 5 minutes prior to 14:00. Windows Azure SQL Database then replicates transactions committed to the source database to the destination database until the destination database catches up to the source database. Once the destination database catches up, the copy operation is considered complete, and the replication link between the source and the destination is severed, leaving the new destination database isolated from any future changes to the source database. At this point, the new destination database becomes accessible.

As an example, assume that a database copy process takes several hours to complete. The database copy mirrors the source database at the point in time when the copy operation completes. Therefore, when considering the restore as a mechanism to recover from a user error at a point in time, be aware that if the user error occurs while the copy operation is in progress, the error is replicated to the destination database as well.
Aside from enabling a simpler recovery process, such as maintaining the original connection string, copying the database to a different server does not provide any additional disaster protection. The Windows Azure SQL Database server is a logical server, and while the source database and the destination database may be grouped under the same logical server, the two databases are not necessarily physically located on the same machine. Furthermore, when you consider the two secondary replicas for both the source and the destination databases, the likelihood of physical co-location of the six different copies becomes remote. In summary, copying to a separate logical server does not provide any stronger protection.
After the database copy begins, you can query the sys.databases (Windows Azure SQL Database) and sys.dm_database_copies (Windows Azure SQL Database) views on the master database of the destination server to retrieve more information about the copying progress.
SELECT [databases].[name], [copies].*
FROM sys.dm_database_copies AS copies
JOIN sys.databases AS databases
    ON copies.database_id = databases.database_id
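As a sketch, you can narrow the same join to a single in-flight copy and monitor its progress using the percent_complete and error_desc columns of sys.dm_database_copies; the destination database name below is a hypothetical example:

```sql
-- Monitor the progress of one in-flight database copy.
-- The destination database name is hypothetical.
SELECT databases.name,
       copies.percent_complete,
       copies.start_date,
       copies.error_desc
FROM sys.dm_database_copies AS copies
JOIN sys.databases AS databases
    ON copies.database_id = databases.database_id
WHERE databases.name = 'Database1_copy_02_01_2012';
```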
The frequency at which you choose to copy your database can vary and depends on business needs. To recover from user or application errors, we recommend that you create a daily copy and maintain two or three running copies on a rotating basis by dropping the oldest copy every day after a fresh copy completes.
Note that although we recommend daily copies, you can copy your database more frequently; however, we recommend performing a database copy operation no more than once per hour. Each database copy process can finish at a different time, but each is transactionally consistent with the source database at the point in time when the copy completes. If you start two database copies 5 minutes apart, they may both finish at almost the same time, which creates nearly identical database copies, and you would be billed twice for the same or similar copy of the data.
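The daily rotation scheme described above can be sketched as follows; the database names and date suffixes are hypothetical examples, not from the original article:

```sql
-- Each day, create a fresh copy of the source database
-- with a date suffix in its name.
CREATE DATABASE Database1_copy_02_03_2012 AS COPY OF Database1;

-- After verifying that the new copy completed successfully,
-- drop the oldest copy to keep two or three rotating copies.
DROP DATABASE Database1_copy_01_31_2012;
```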
For more information on the database copy, see Copying Databases in Windows Azure SQL Database.
Database Copy Restore
You can implement a restore for your database and its copy on the same server by renaming them. For example, consider a source database named “Database1” and a copy named “Database1_copy_02_01_2012”. The following Transact-SQL script demonstrates how to swap the two database names on the same server. After the script runs successfully, new database traffic is directed to the copy rather than to the original database.
ALTER DATABASE Database1 MODIFY NAME = Database1_OLD
GO
WAITFOR DELAY '00:00:30'
GO
ALTER DATABASE Database1_copy_02_01_2012 MODIFY NAME = Database1
GO
While the service runs the rename process, existing connections to the original database are killed. Therefore, it is important to ensure that your application is resilient to connection losses to Windows Azure SQL Database and can re-establish the connection when such an event occurs. The Transact-SQL script includes a WAITFOR DELAY statement between the two ALTER statements. This helps ensure that new connections established during the recovery period do not accidentally connect to the old database before the renaming is done. If you need to copy your database to a different Windows Azure SQL Database server, renaming will not work. In such cases, modify your application to point to the database copy on the separate server.
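Once you have confirmed that the application works correctly against the renamed copy, you can drop the retired original database to stop accruing charges for it. This short sketch follows the database names used in the example above:

```sql
-- Drop the retired original database after verifying
-- that the renamed copy serves the application correctly.
DROP DATABASE Database1_OLD;
```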
Database copies might take a long time and are variable in duration.
Your bill will include charges for each database copy. However, you are charged for a new database copy only after the copy process completes successfully. Each database copy is billed at the same rate as the source database.
You are responsible for managing the copies and dropping them when appropriate.
Database usage is calculated daily, so if your database copy is active only for one day, you will be charged a prorated fee for one day for your copy database.
Restoring to a different logical server requires modifications in connection strings.
Restoring to a different logical server does not guarantee better data-loss protection or improved system performance.
While a copy process is in progress, all changes to the original database are replicated to the copy. Therefore, deciding which database copy to restore is an important issue.
The SQL Database November 2011 release introduced Federations in Windows Azure SQL Database (formerly SQL Azure). Currently, you cannot copy a database that contains federations by using the database copy operation. Conversely, creating a federation fails if a database copy operation is active in the database. A database copy cannot be performed on federation members either.
The recovery time objective (RTO) should be equal to “time to recognize the error + time to rename the database + time interval between the two ALTER statements”.
How to help protect your database from widespread loss of data center facilities
To help protect against data center loss in the event of a disaster, you need to store database backups offsite, outside of the data center in which your database application is deployed. To achieve that, we recommend that you use both the database copy described in the previous section and the SQL Database Import/Export Service. The database copy maintains transactional consistency during export. The SQL Database Import/Export Service copies the object definitions from your source Windows Azure SQL Database to a logical export file (BACPAC), and then bulk copies the data from the user tables into the BACPAC. You should determine the specific destination of the BACPAC and the specific recovery procedure based on your application's service level agreement and other business requirements.
For an overview of the backup and restore options provided by Windows Azure SQL Database, see Windows Azure SQL Database Backup and Restore. For more information on how to use the SQL Database Import/Export service, see SQL DAC Examples, and DAC Import/Export Hosted as a Service. For more information on how to use Windows Azure Management Portal to import/export database in Windows Azure SQL Database, see How to: Import and Export a Database (Windows Azure SQL Database).
To help protect your data against the loss of a data center, you can implement the following approaches:
Export the BACPAC file to a blob using a storage account in a different data center.
Export the BACPAC file to a blob using a storage account in the same data center and rely on Windows Azure Storage geo-replication to copy the BACPAC file to a separate data center.
Import the BACPAC file to your on-premises SQL Server.
The following sections describe these approaches and also the advantages and disadvantages of each.
Export the BACPAC file to a blob using a storage account in a different data center
With this approach, you create a storage account in a data center in the same region but separate from the data center in which you have your Windows Azure SQL Database account. For example, if your application and database run in the North Europe data center, we recommend that you maintain a copy of your database or BACPAC in another data center in the same region, such as West Europe. For more protection, you can also maintain a copy of your database or BACPAC in another data center in a different region, for example South Central US.
Once the file export operation is complete, you can immediately use the import operation to recreate the database and data on another Windows Azure SQL Database server in the same or a different data center. Then, you can delete the used Windows Azure Blobs to reduce costs. Importing into Windows Azure SQL Database at the targeted data center validates the operation and reduces the overall duration of recovery. In this case, you will be charged for the database. Alternatively, you can defer the import operation until the actual failover completes. In this case, Microsoft charges you for the Blob storage. Note that you would still need to perform periodic imports into the database in each backup data center to validate the operation and test the disaster recovery procedure. Note also that you will be charged for the bandwidth cost of copying the BACPAC to one or more data centers.
The following diagram illustrates this enhanced data protection. It demonstrates performing a database copy in data center A and then exporting it to Windows Azure Storage in data center B. Next, the diagram shows importing the BACPAC to recreate the database in data center B and copying the BACPAC to data center C. Finally, it shows importing the BACPAC to recreate the database in data center C.
If data center A fails, the recovery point objective (RPO), or potential data loss in your application, is determined by how often you perform an export. For example, if you export data once a day, your potential data loss will be up to 24 hours of data. The recovery time objective (RTO) of the database itself will be minimal if you perform an immediate database import. If you defer the import, the RTO will be determined by the size of the BACPAC and the time it takes to import into the new database.
Export the BACPAC file to a blob using a storage account in the same data center
With this approach, you need to have a storage account in the same data center where you have your SQL Database account. This recovery solution relies on the automatic geo-replication of Windows Azure storage objects, such as Windows Azure Blobs and Tables, to another data center in the same region, at no additional cost. For example, if your storage account is in the North Europe data center, your Windows Azure storage objects are automatically replicated to the second data center in the same region, such as West Europe.
With this approach, you do not have the option of immediately importing the BACPAC to a new database in the second data center. This is because the decision to fail over is made by Windows Azure. Therefore, the geo-replicated BACPAC becomes accessible only after Windows Azure executes the failover operation. You will be billed for the usage of the Windows Azure storage account, but there is no bandwidth charge because the BACPAC copy is done within the same data center and the cost of geo-replication is included in the storage cost. It is important to note that the Import/Export service produces the BACPAC using Block Blobs, which helps to preserve its integrity by replicating the entire BACPAC.
The following diagram illustrates copying a database in data center A, and then exporting it to the Windows Azure storage in the same data center. Next, the diagram shows that Windows Azure performs geo-replication of Windows Azure storage objects from the data center A to data center B automatically. Finally, the diagram illustrates importing the BACPAC to recreate the database in the data center B.
With this approach, the typical recovery point objective (RPO) should be equal to “archiving interval time x 2”. If the failure occurs right after the archive is created, the Blob data may be lost; in that case, only the previous archive will be available. The typical recovery time objective (RTO) should be equal to “24 hours + time to import”. Although this second option can take longer than the first option, it provides a reduction in storage costs.
|Windows Azure Blobs and Tables are geo-replicated between two data centers 100s of miles apart from each other on the same continent, to provide additional data durability in the case of a major disaster, at no additional cost. For more information, see Introducing Geo-replication for Windows Azure Storage blog post. With this first release of geo-replication, we do not provide an SLA for how long it might take to asynchronously geo-replicate the data, though transactions are typically geo-replicated within a few minutes after they have been committed in the primary location and the estimated time that the data will be accessible to customers after a disaster is 24 hours.|
Import the BACPAC file to an on-premises SQL Server
In addition, you can download the BACPAC from your storage account to a local client machine and then import the archive into a local SQL Server 2012, SQL Server 2008 R2, SQL Server 2008, or SQL Server 2005 instance by using the Import/Export functionality provided by SQL Server Management Studio in SQL Server 2012. Importing the archive to a local instance of SQL Server helps you ensure that you maintain a local copy of your data without needing an internet connection or the availability of Windows Azure data centers. To import an archive file to a local instance of SQL Server, download the SQL Server 2012 product here and install the Management Tools. For more information, see Install SQL Server 2012 in the SQL Server Books Online. After the Microsoft SQL Server 2012 Management Tools are installed, download the archive export file from your storage account to your local client. Once the download is complete, connect to the target server by using SQL Server Management Studio. In Object Explorer, right-click the server, select Import Data-tier Application…, and then follow the steps to import the archive export file.
These options require a Windows Azure Storage account.
Your application should manage failover events.
Your bill will include charges for each database copy.
Your bill will include charges for offline archives stored in your Windows Azure Storage account.
You are responsible for managing the database copies and dropping them when appropriate.
You might need an on-premises SQL Server instance.
After failover, you might lose some data.
The SQL Database November 2011 release introduced Federations in Windows Azure SQL Database (formerly SQL Azure). To implement a recovery strategy for your federated data, you can either write your own custom export scripts using the bulk copy utility (BCP.exe) or the System.Data.SqlClient.SqlBulkCopy class, or currently use the SQL Database Migration Wizard on Codeplex.
Warning The SQL Database Migration Wizard tool is built by the community and is not supported.
Data Sync Service and Disaster Recovery
Microsoft Windows Azure SQL Database Data Sync service provides data synchronization capabilities for Windows Azure SQL Database instances. The service currently has two main capabilities:
Synchronization of data between on-premises SQL Server databases and Windows Azure SQL Database instances, allowing on-premises and cloud-based applications to utilize the same data.
Synchronization of data between two or more SQL Database instances; the databases can be in the same data center, in different data centers, or in different regions. This capability provides a scale-out mechanism for applications via multiple copies of the data. It also enables multiple application and database instances to be deployed around the world close to end users, with all the databases being kept in synchronization. Note that SQL Data Sync is often used with Windows Azure Traffic Manager.
When using Windows Azure SQL Database Data Sync, you can configure data to be synchronized at a fixed interval. During the synchronization process, after the first initial synchronization, only the changed data is transmitted; that is, only the rows that have been inserted, updated, or deleted since the previous synchronization. SQL Data Sync keeps multiple copies of your database up to date, and these copies can be stored in the cloud or on-premises.
We recommend that you assess the following design considerations of the SQL Data Sync service before using it as part of your disaster recovery strategy:
Data Sync does not synchronize transactions or preserve transaction boundaries. Data Sync applies net changes in batches one table at a time, with each batch being a transaction. A small number of changes would be applied in one batch, but a large number of changes may be applied in multiple batches. If, for example, both an Order row and the corresponding OrderDetails row are inserted in one transaction on one database, it is possible that the Order row and the OrderDetails row might get inserted in different transactions on the other database. In the event of a failure during synchronization, such as a network error, some transactions may not be applied until the next synchronization. If there is a failover before the transactions have been applied in the following synchronization, the data could be in an inconsistent state for the application; for example, the Order row is present, but the OrderDetails row is not.
The SQL Data Sync service has some limitations regarding the database schemas it supports. For example, SQL Data Sync cannot synchronize any table that does not have a primary key. In addition, SQL Data Sync synchronizes only data, not stored procedures or triggers. For more information, see Data Sync FAQ.
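To illustrate the primary key requirement, a table must define a primary key before SQL Data Sync can synchronize it. The table name and columns below are hypothetical examples, not from this article:

```sql
-- A table like this can be synchronized by SQL Data Sync
-- because it defines a primary key; the table and columns
-- are illustrative only.
CREATE TABLE dbo.Orders
(
    OrderID    int      NOT NULL PRIMARY KEY,
    CustomerID int      NOT NULL,
    OrderDate  datetime NOT NULL
);
```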
SQL Data Sync does not keep multiple versions of a database available to allow failover or restore to a particular time.
If these considerations are acceptable for your scenario and applications, SQL Data Sync may be an option for your disaster recovery strategy.
If you use the Data Sync service as a disaster recovery solution, we recommend that you build application logic to validate the operational capacity of the application after failover.
|SQL Data Sync is currently available only as a Preview and is meant only for product feedback for future releases and should not be used in production environments.|