Plan for availability (SharePoint Foundation 2010)

07/22/2014

Applies to: SharePoint Foundation 2010

This article describes key decisions in choosing availability strategies for a Microsoft SharePoint Foundation 2010 environment.

As you carefully review your availability requirements, be aware that the higher the level of availability and the more systems that you protect, the more complex and costly your availability solution is likely to be.

Not all solutions in an organization are likely to require the same level of availability. You can offer different levels of availability for different sites, different services, or different farms.

In this article:

Availability overview
Choosing an availability strategy and level
Redundancy and failover between closely located data centers configured as a single farm ("stretched" farm)

Availability overview

Availability is the degree to which a SharePoint Foundation environment is perceived by users to be available. An available system is a system that is resilient — that is, incidents that affect service occur infrequently, and timely and effective action is taken when they do occur.

Availability is part of business continuity management (BCM), and is related to backup and recovery and disaster recovery. For more information about these related processes, see Plan for backup and recovery (SharePoint Foundation 2010) and Plan for disaster recovery (SharePoint Foundation 2010).

Note

When calculating availability, most organizations specifically exempt or add hours for planned maintenance activities.

One of the most common measures of availability is percentage of uptime, that is, the percentage of time that a given system is active and working. It is often expressed as a number of nines. For example, a system with 99.999 percent uptime is said to have five nines of availability.

The following table correlates uptime percentage with calendar time equivalents.

Acceptable uptime percentage	Downtime per day	Downtime per month	Downtime per year
95	72.00 minutes	36 hours	18.26 days
99 (two nines)	14.40 minutes	7 hours	3.65 days
99.9 (three nines)	86.40 seconds	43 minutes	8.77 hours
99.99 (four nines)	8.64 seconds	4 minutes	52.60 minutes
99.999 (five nines)	0.86 seconds	26 seconds	5.26 minutes
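
The rows in this table can be reproduced with a short calculation. The following Python sketch is illustrative only. The function name is hypothetical, and the 30-day month and 365.25-day year are assumptions that appear to match the table values; they are not stated in this article.

```python
def downtime_seconds(uptime_percent: float, period_hours: float) -> float:
    """Return the downtime, in seconds, that an uptime percentage allows
    over a period of the given length in hours."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100 - uptime_percent) / 100 * period_hours * 3600


DAY_HOURS = 24
MONTH_HOURS = 24 * 30      # assumption: 30-day month
YEAR_HOURS = 24 * 365.25   # assumption: average year, including leap years

for pct in (95, 99, 99.9, 99.99, 99.999):
    per_day = downtime_seconds(pct, DAY_HOURS)
    per_month = downtime_seconds(pct, MONTH_HOURS)
    per_year = downtime_seconds(pct, YEAR_HOURS)
    print(f"{pct}%: {per_day:.2f} s/day, "
          f"{per_month / 60:.1f} min/month, {per_year / 3600:.2f} h/year")
```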

If you can make an educated guess about the total number of hours of downtime you are likely to have in a year, a month, or a week, you can use the following formulas to calculate the uptime percentage for that period:

% uptime/year = 100 × (8760 - total hours of downtime per year)/8760

% uptime/month = 100 × ((24 × number of days in the month) - total hours of downtime in that calendar month)/(24 × number of days in the month)

% uptime/week = 100 × (168 - total hours of downtime in that week)/168
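
As a minimal sketch, the uptime percentage for any period is the share of hours in that period during which the system was up, multiplied by 100. The function and constant names below are hypothetical and are used only for illustration.

```python
HOURS_PER_YEAR = 8760   # 24 x 365
HOURS_PER_WEEK = 168    # 24 x 7


def hours_in_month(days_in_month: int) -> int:
    """Return the number of hours in a calendar month of the given length."""
    return 24 * days_in_month


def uptime_percent(downtime_hours: float, period_hours: float) -> float:
    """Return the uptime percentage for a period, given the hours of downtime."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime_hours must be between 0 and period_hours")
    return 100 * (period_hours - downtime_hours) / period_hours


# Example inputs (hypothetical downtime estimates)
print(uptime_percent(8.76, HOURS_PER_YEAR))      # estimate of 8.76 hours per year
print(uptime_percent(2, hours_in_month(30)))     # estimate of 2 hours in a 30-day month
print(uptime_percent(0.5, HOURS_PER_WEEK))       # estimate of 30 minutes in a week
```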

Costs of availability

Availability is one of the more expensive requirements for a system. The higher the level of availability and the more systems that you protect, the more complex and costly an availability solution is likely to be. When you invest in availability, costs include the following:

Additional hardware and software, which can increase the complexity of interactions among software applications and settings.
Additional operational complexity.

The costs of improving availability should be evaluated in conjunction with your business needs — not all solutions in an organization are likely to require the same level of availability. You can offer different levels of availability for different sites, different services, or different farms.

Availability is a key area in which information technology (IT) groups offer service level agreements (SLAs) to set expectations with customer groups. Many IT organizations offer various SLAs that are associated with different chargeback levels.

Determining availability requirements

To gauge your organization's tolerance of downtime for a site, service, or farm, answer the following questions:

If the site, service, or farm becomes unavailable, will employees be unable to perform their expected job responsibilities?
If the site, service, or farm becomes unavailable, will business and customer transactions be stopped, leading to loss of business and customers?

If you answered yes to either of these questions, you should invest in an availability solution.

Choosing an availability strategy and level

You can choose among many approaches to improve availability in a SharePoint Foundation environment, including the following:

Improve the fault tolerance of server hardware components.
Increase the redundancy of server roles within a farm.

Hardware component fault tolerance

Hardware component fault tolerance is the redundancy of hardware components and infrastructure systems such as power supplies at the server level. When planning for hardware component fault tolerance, consider the following:

Complete redundancy of every component within a server may be impossible or impractical. Use additional servers for additional redundancy.
Ensure that servers have multiple power supplies connected to different power sources for maximum redundancy.

In any system, we recommend that you work with hardware vendors to obtain fault-tolerant hardware that is appropriate for the system, including redundant array of independent disks (RAID) arrays.

Redundancy within a farm

SharePoint Foundation 2010 supports running server roles on redundant computers (that is, scaling out) within a farm to increase capacity and to provide basic availability.

The capacity that you require determines both the number of servers and the size of the servers in a farm. After you have met your base capacity requirements, you may want to add more servers to increase overall availability. The following illustration shows how you can provide redundancy for each server role.

Figure: Availability within a server farm (single farm availability)

The following table describes the server roles in a SharePoint Foundation 2010 environment and the redundancy strategies that can be used for each within a farm.

Server role	Preferred redundancy strategy within a farm
Front-end Web server	Deploy multiple front-end Web servers within a farm, and use Network Load Balancing (NLB).
Application server	Deploy multiple application servers within a farm.
Database server	Deploy database servers by using clustering or high-availability database mirroring.
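
To see why adding redundant servers to a role can raise availability, consider a rough estimate. The following Python sketch assumes that each server in the role fails independently and that the role stays available as long as at least one server is up. Both are simplifying assumptions that are not part of this article, and real farms share dependencies (network, storage, power) that make failures correlated. The function name is hypothetical.

```python
def combined_availability(single_server_availability: float, server_count: int) -> float:
    """Estimate the probability that at least one of several identical,
    independently failing servers is available.

    single_server_availability is a fraction between 0 and 1 (for example, 0.99).
    """
    if not 0 <= single_server_availability <= 1:
        raise ValueError("single_server_availability must be between 0 and 1")
    if server_count < 1:
        raise ValueError("server_count must be at least 1")
    # The role is down only if every server is down at the same time.
    probability_all_down = (1 - single_server_availability) ** server_count
    return 1 - probability_all_down


for count in (1, 2, 3):
    print(count, combined_availability(0.99, count))
```

Under these assumptions, the estimate is an upper bound rather than a prediction; measured availability is usually lower.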

Database availability strategies

You can use Microsoft SQL Server failover clustering or SQL Server high-availability database mirroring to support availability of databases in a SharePoint Foundation environment.

SQL Server failover clustering

Failover clustering can provide availability support for an instance of SQL Server. A failover cluster is a combination of one or more nodes or servers, and two or more shared disks. A failover cluster instance appears as a single computer, but has functionality that provides failover from one node to another if the current node becomes unavailable. SharePoint Foundation can run on any combination of active and passive nodes in a cluster that is supported by SQL Server.

SharePoint Foundation references the cluster as a whole; therefore, failover is automatic and seamless from the perspective of SharePoint Foundation.

For detailed information about failover clustering, see Getting Started with SQL Server 2008 Failover Clustering (https://go.microsoft.com/fwlink/p/?LinkID=102837&clcid=0x409) and Configure availability by using SQL Server clustering (SharePoint Foundation 2010).

SQL Server high-availability mirroring

Database mirroring is a SQL Server technology that can deliver database redundancy on a per-database basis. In database mirroring, transactions are sent directly from a principal database and server to a mirror database and server when the transaction log buffer of the principal database is written to disk. This technique can keep the mirror database almost up to date with the principal database. SQL Server Enterprise Edition provides additional functionality that improves database mirroring performance.

For mirroring within a SharePoint Foundation farm, you must use high-availability mirroring, also known as high-safety mode with automatic failover. High-availability database mirroring involves three server instances: a principal, a mirror, and a witness. The witness server enables SQL Server to automatically fail over from the principal server to the mirror server. Failover from the principal database to the mirror database typically takes several seconds.

A change from previous versions is that SharePoint Foundation is mirroring-aware. After you have configured a database mirror instance of SQL Server, you use SharePoint Central Administration or Windows PowerShell cmdlets to identify the failover (mirror) database server location for a configuration database, content database, or service application database. Setting a failover database location adds a parameter to the connection string that SharePoint Foundation uses to connect to SQL Server. If a SQL Server time-out event occurs, the following happens:

The witness server that is configured for SQL Server mirroring automatically swaps the roles of the principal and mirror databases.
SharePoint Foundation automatically attempts to contact the server that is specified as the failover database.
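
The effect of a failover location on a connection string can be illustrated with a small sketch. The following Python example is illustrative only. `Failover Partner` is the keyword that the .NET SqlClient provider uses to name a mirror server; the exact connection string that SharePoint Foundation builds is not shown in this article, and the function name, server names, and database name are hypothetical.

```python
from typing import Optional


def build_connection_string(principal_server: str,
                            database: str,
                            failover_server: Optional[str] = None) -> str:
    """Build a SqlClient-style connection string, optionally naming a mirror server."""
    parts = [
        f"Data Source={principal_server}",
        f"Initial Catalog={database}",
        "Integrated Security=True",
    ]
    if failover_server:
        # Assumption: the failover location is expressed with the
        # SqlClient "Failover Partner" keyword.
        parts.append(f"Failover Partner={failover_server}")
    return ";".join(parts)


# Hypothetical server and database names
print(build_connection_string("SQL01", "WSS_Content"))
print(build_connection_string("SQL01", "WSS_Content", "SQL02"))
```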

For information about how to configure database mirroring, see Configure availability by using SQL Server database mirroring (SharePoint Foundation 2010).

For general information about database mirroring, see Database Mirroring (https://go.microsoft.com/fwlink/p/?LinkID=180597).

Note

Databases that have been configured to use the SQL Server FILESTREAM remote BLOB store provider cannot be mirrored.

Comparison of database availability strategies for a single farm: SQL Server failover clustering vs. SQL Server high-availability mirroring

The following table compares failover clustering to synchronous SQL Server high-availability mirroring.

	SQL Server failover clustering	SQL Server high-availability mirroring
Time to failover	Cluster member takes over immediately upon failure.	Mirror takes over immediately upon failure.
Transactional consistency?	Yes	Yes
Transactional concurrency?	Yes	Yes
Time to recovery	Shorter time to recovery (milliseconds)	Slightly longer time to recovery (milliseconds).
Steps required for failover?	Failure is automatically detected by database nodes; SharePoint Foundation 2010 references the cluster so that failover is seamless and automatic.	Failure is automatically detected by the database; SharePoint Foundation 2010 is aware of the mirror location, if it has been configured correctly, so that failover is automatic.
Protection against failed storage?	Does not protect against failed storage, because storage is shared between nodes in the cluster.	Protects against failed storage because both the principal and mirror database servers write to local disks.
Storage types supported	Shared storage (more expensive).	Can use less-expensive direct-attached storage (DAS).
Location requirements	Members of the cluster must be on the same subnet.	Principal, mirror, and witness servers must be on the same LAN (up to 1 millisecond latency roundtrip).
Recovery model	SQL Server full recovery model recommended. You can use the SQL Server simple recovery model, but the only available recovery point if the cluster is lost will be the last full backup.	Requires SQL Server full recovery model.
Performance overhead	Some decrease in performance may occur while a failover is occurring.	High-availability mirroring introduces transactional latency because it is synchronous. It also requires additional memory and processor overhead.
Operational burden	Set up and maintained at the server level.	The operational burden is larger than clustering. Must be set up and maintained for all databases. Reconfiguring after failover is manual.

Service application redundancy strategies

The redundancy strategy you follow for protecting service applications that run in a farm varies, depending on where the service application stores data.

Service applications that store data in databases

To help protect service applications that store data in databases, you must follow these steps:

Install the service on multiple application servers to provide redundancy within the environment.
Configure SQL Server clustering or mirroring to protect the data.

The following service applications store data in databases:

Business Data Connectivity service application
Application Registry service application

We do not recommend mirroring the Application Registry database, because it is used only when upgrading Windows SharePoint Services 3.0 Business Data Catalog information to SharePoint Foundation 2010.
Usage and Health Data Collection service application

Note

We recommend that you do not mirror the Usage and Health Data Collection service application Logging database.
Microsoft SharePoint Foundation Subscription Settings service

Redundancy and failover between closely located data centers configured as a single farm ("stretched" farm)

Some enterprises have data centers that are located close to one another with high-bandwidth connections so that they can be configured as a single farm. This is called a "stretched" farm. For a stretched farm to work, there must be less than 1 millisecond latency between SQL Server and the front-end Web servers in one direction, and at least 1 gigabit per second bandwidth.
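
The two thresholds above can be expressed as a simple check against measured values. The following Python sketch is illustrative only. The function and constant names are hypothetical, and how you measure one-way latency and bandwidth between the data centers is outside the scope of this example.

```python
MAX_ONE_WAY_LATENCY_MS = 1.0   # latency must be less than 1 millisecond in one direction
MIN_BANDWIDTH_GBPS = 1.0       # bandwidth must be at least 1 gigabit per second


def meets_stretched_farm_requirements(one_way_latency_ms: float,
                                      bandwidth_gbps: float) -> bool:
    """Return True if the measured link satisfies both stretched farm thresholds."""
    return (one_way_latency_ms < MAX_ONE_WAY_LATENCY_MS
            and bandwidth_gbps >= MIN_BANDWIDTH_GBPS)


# Hypothetical measurements between two data centers
print(meets_stretched_farm_requirements(0.4, 10.0))
print(meets_stretched_farm_requirements(2.5, 10.0))
```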

In this scenario, you can provide fault tolerance by following the standard guidance for making databases and service applications redundant.

The following illustration shows a stretched farm.

Figure: "Stretched" farm