Plan for availability (SharePoint Server 2010)

 

Applies to: SharePoint Server 2010, SharePoint Foundation 2010

This article describes key decisions in choosing availability strategies for a Microsoft SharePoint Server 2010 environment.

As you carefully review your availability requirements, be aware that the higher the level of availability and the more systems that you protect, the more complex and costly your availability solution is likely to be.

Not all solutions in an organization are likely to require the same level of availability. You can offer different levels of availability for different sites, different services, or different farms.

In this article:

  • Availability overview

  • Choosing an availability strategy and level

  • Redundancy and failover between closely located data centers configured as a single farm ("stretched" farm)

Availability overview

Availability is the degree to which a SharePoint Server environment is perceived by users to be available. An available system is a system that is resilient — that is, incidents that affect service occur infrequently, and timely and effective action is taken when they do occur.

Availability is part of business continuity management (BCM), and is related to backup and recovery and disaster recovery. For more information about these related processes, see Plan for backup and recovery in SharePoint Server 2010 and Plan for disaster recovery (SharePoint Server 2010).

Note

When calculating availability, most organizations specifically exempt or add hours for planned maintenance activities.

One of the most common measures of availability is percentage of uptime, that is, the percentage of time that a given system is active and working. Uptime is often expressed as a number of nines. For example, a system with 99.999 percent uptime is said to have five nines of availability.

The following table correlates uptime percentage with calendar time equivalents.

Acceptable uptime percentage   Downtime per day   Downtime per month   Downtime per year
95                             72.00 minutes      36 hours             18.26 days
99 (two nines)                 14.40 minutes      7 hours              3.65 days
99.9 (three nines)             86.40 seconds      43 minutes           8.77 hours
99.99 (four nines)             8.64 seconds       4 minutes            52.60 minutes
99.999 (five nines)            0.86 seconds       26 seconds           5.26 minutes
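The relationship between an uptime percentage and its downtime allowance can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and the 30-day month and 365.25-day year are assumptions that happen to reproduce the figures in the table above.

```python
# Downtime allowed by an uptime percentage over a period.
# Assumptions (not stated in the article): a 30-day month and a
# 365.25-day year, chosen because they match the table's figures.

SECONDS_PER_DAY = 24 * 60 * 60


def downtime_seconds(uptime_percent: float, period_days: float) -> float:
    """Return the seconds of downtime allowed in a period of period_days."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    unavailable_fraction = (100 - uptime_percent) / 100
    return unavailable_fraction * period_days * SECONDS_PER_DAY


if __name__ == "__main__":
    for pct in (95, 99, 99.9, 99.99, 99.999):
        per_day = downtime_seconds(pct, 1)
        per_month = downtime_seconds(pct, 30)
        per_year = downtime_seconds(pct, 365.25)
        print(f"{pct}%: {per_day:.2f} s/day, "
              f"{per_month / 60:.1f} min/month, "
              f"{per_year / 3600:.2f} h/year")
```

Working in seconds and converting only for display keeps the unit handling in one place.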

If you can make an educated guess about the total number of hours of downtime you are likely to have per year, you can use the following formulas to calculate the uptime percentage for a year, a month, or a week:

% uptime/year = 100 × (8760 - number of total hours downtime per year)/8760

% uptime/month = 100 × ((24 × number of days in the month) - number of total hours downtime in that calendar month)/(24 × number of days in the month)

% uptime/week = 100 × (168 - number of total hours downtime in that week)/168
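The uptime calculation can also be sketched in Python. The function below computes uptime as the share of hours in the period that were not lost to downtime, multiplied by 100. The function and constant names are illustrative and do not come from any SharePoint tool.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours
HOURS_PER_WEEK = 168   # 7 days x 24 hours


def uptime_percent(downtime_hours: float, total_hours: float) -> float:
    """Return uptime as a percentage of the hours in the period."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= downtime_hours <= total_hours:
        raise ValueError("downtime_hours must be between 0 and total_hours")
    return 100 * (total_hours - downtime_hours) / total_hours


if __name__ == "__main__":
    # 8.76 hours of downtime in a year is roughly three nines.
    print(round(uptime_percent(8.76, HOURS_PER_YEAR), 3))
    # For a month, pass 24 x the number of days in that month.
    print(round(uptime_percent(7.2, 24 * 30), 3))
```

A period with no downtime yields 100, and a period that is down for its full length yields 0, which is a quick sanity check on any variant of the formula.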

Costs of availability

Availability is one of the more expensive requirements for a system. The higher the level of availability and the more systems that you protect, the more complex and costly an availability solution is likely to be. When you invest in availability, costs include the following:

  • Additional hardware and software, which can increase the complexity of interactions among software applications and settings.

  • Additional operational complexity.

The costs of improving availability should be evaluated in conjunction with your business needs — not all solutions in an organization are likely to require the same level of availability. You can offer different levels of availability for different sites, different services, or different farms.

Availability is a key area in which information technology (IT) groups offer service level agreements (SLAs) to set expectations with customer groups. Many IT organizations offer various SLAs that are associated with different chargeback levels.

Determining availability requirements

To gauge your organization's tolerance of downtime for a site, service, or farm, answer the following questions:

  • If the site, service, or farm becomes unavailable, will employees be unable to perform their expected job responsibilities?

  • If the site, service, or farm becomes unavailable, will business and customer transactions be stopped, leading to loss of business and customers?

If you answered yes to either of these questions, you should invest in an availability solution.

Choosing an availability strategy and level

You can choose among many approaches to improve availability in a SharePoint Server environment, including the following:

  • Improve the fault tolerance of server hardware components.

  • Increase the redundancy of server roles within a farm.

Hardware component fault tolerance

Hardware component fault tolerance is the redundancy of hardware components and infrastructure systems such as power supplies at the server level. When planning for hardware component fault tolerance, consider the following:

  • Complete redundancy of every component within a server may be impossible or impractical. Use additional servers for additional redundancy.

  • Ensure that servers have multiple power supplies connected to different power sources for maximum redundancy.

In any system, we recommend that you work with hardware vendors to obtain fault-tolerant hardware that is appropriate for the system, including redundant array of independent disks (RAID) arrays. For recommendations, see Performance and capacity management (SharePoint Server 2010) and Storage and SQL Server capacity planning and configuration (SharePoint Server 2010).

Redundancy within a farm

SharePoint Server 2010 supports running server roles on redundant computers (that is, scaling out) within a farm to increase capacity and to provide basic availability.

The capacity that you require determines both the number of servers and the size of the servers in a farm. After you have met your base capacity requirements, you may want to add more servers to increase overall availability. The following illustration shows how you can provide redundancy for each server role.

[Illustration: Availability within a server farm (single farm availability)]

The following table describes the server roles in a SharePoint Server 2010 environment and the redundancy strategies that can be used for each within a farm.

Server role            Preferred redundancy strategy within a farm
Front-end Web server   Deploy multiple front-end Web servers within a farm, and use Network Load Balancing (NLB).
Application server     Deploy multiple application servers within a farm.
Database server        Deploy database servers by using clustering or high-availability database mirroring.
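To give a rough sense of why adding servers to a role raises availability, the following Python sketch estimates the availability of a role served by several redundant servers. It rests on two assumptions that the article does not make: server failures are independent, and any one surviving server can carry the load. Real farms share power, network, and storage, so treat the result as an optimistic upper bound.

```python
def combined_availability(server_availability: float, server_count: int) -> float:
    """Availability of a role that stays up while at least one server is up.

    Assumes independent failures and that one server can carry the full
    load; both are simplifications made for illustration.
    """
    if not 0 <= server_availability <= 1:
        raise ValueError("server_availability must be between 0 and 1")
    if server_count < 1:
        raise ValueError("server_count must be at least 1")
    # The role is down only when every server is down at the same time.
    return 1 - (1 - server_availability) ** server_count


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(count, f"{combined_availability(0.99, count):.6f}")
```

Under these assumptions, two servers that are each available 99 percent of the time give about 99.99 percent for the role.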

Database availability strategies

You can use Microsoft SQL Server failover clustering or SQL Server high-availability database mirroring to support availability of databases in a SharePoint Server environment.

SQL Server failover clustering

Failover clustering can provide availability support for an instance of SQL Server. A failover cluster is a combination of one or more nodes or servers, and two or more shared disks. A failover cluster instance appears as a single computer, but has functionality that provides failover from one node to another if the current node becomes unavailable. SharePoint Server can run on any combination of active and passive nodes in a cluster that is supported by SQL Server.

SharePoint Server references the cluster as a whole; therefore, failover is automatic and seamless from the perspective of SharePoint Server.

For detailed information about failover clustering, see Getting Started with SQL Server 2008 Failover Clustering (https://go.microsoft.com/fwlink/p/?LinkID=102837&clcid=0x409) and Configure availability by using SQL Server clustering (SharePoint Server 2010).

SQL Server high-availability mirroring

Database mirroring is a SQL Server technology that can deliver database redundancy on a per-database basis. In database mirroring, transactions are sent directly from a principal database and server to a mirror database and server when the transaction log buffer of the principal database is written to disk. This technique can keep the mirror database almost up to date with the principal database. SQL Server Enterprise Edition provides additional functionality that improves database mirroring performance. For more information, see SQL Server 2008 R2 and SharePoint 2010 Products: Better Together (white paper) (SharePoint Server 2010).

For mirroring within a SharePoint Server farm, you must use high-availability mirroring, also known as high-safety mode with automatic failover. High-availability database mirroring involves three server instances: a principal, a mirror, and a witness. The witness server enables SQL Server to automatically fail over from the principal server to the mirror server. Failover from the principal database to the mirror database typically takes several seconds.

A change from previous versions is that SharePoint Server is mirroring-aware. After you have configured a database mirror instance of SQL Server, you then use SharePoint Central Administration or Windows PowerShell cmdlets to identify the failover (mirror) database server location for a configuration database, content database, or service application database. Setting a failover database location adds a parameter to the connection string that SharePoint Server uses to connect to SQL Server. In the event of a SQL Server time-out event, the following occurs:

  1. The witness server that is configured for SQL Server mirroring automatically swaps the roles of the principal and mirror databases.

  2. SharePoint Server automatically attempts to contact the server that is specified as the failover database.
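To illustrate the connection string parameter mentioned above, the following Python sketch builds a SQL Server client connection string that names a mirror. In SqlClient-style connection strings the mirror is identified by the Failover Partner keyword; whether SharePoint Server emits exactly this string is an assumption. The server and database names are hypothetical. SharePoint Server manages its own connection strings, so this shows the shape only.

```python
from typing import Optional


def build_connection_string(principal: str, database: str,
                            failover_partner: Optional[str] = None) -> str:
    """Build a SqlClient-style connection string, optionally naming a mirror."""
    parts = [
        f"Data Source={principal}",
        f"Initial Catalog={database}",
        "Integrated Security=True",
    ]
    if failover_partner:
        # The client tries this server when the principal cannot be reached.
        parts.append(f"Failover Partner={failover_partner}")
    return ";".join(parts)


if __name__ == "__main__":
    # SQL01, SQL02, and WSS_Content are hypothetical names.
    print(build_connection_string("SQL01", "WSS_Content", "SQL02"))
```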

For information about how to configure database mirroring, see Configure availability by using SQL Server database mirroring (SharePoint Server 2010).

For general information about database mirroring, see Database Mirroring (https://go.microsoft.com/fwlink/p/?LinkID=180597).

Note

Databases that have been configured to use the SQL Server FILESTREAM remote BLOB store provider cannot be mirrored.

Comparison of database availability strategies for a single farm: SQL Server failover clustering vs. SQL Server high-availability mirroring

The following table compares failover clustering to synchronous SQL Server high-availability mirroring.

Time to failover
  • SQL Server failover clustering: Cluster member takes over immediately upon failure.
  • SQL Server high-availability mirroring: Mirror takes over immediately upon failure.

Transactional consistency?
  • SQL Server failover clustering: Yes
  • SQL Server high-availability mirroring: Yes

Transactional concurrency?
  • SQL Server failover clustering: Yes
  • SQL Server high-availability mirroring: Yes

Time to recovery
  • SQL Server failover clustering: Shorter time to recovery (milliseconds).
  • SQL Server high-availability mirroring: Slightly longer time to recovery (milliseconds).

Steps required for failover?
  • SQL Server failover clustering: Failure is automatically detected by database nodes; SharePoint Server 2010 references the cluster so that failover is seamless and automatic.
  • SQL Server high-availability mirroring: Failure is automatically detected by the database; SharePoint Server 2010 is aware of the mirror location, if it has been configured correctly, so that failover is automatic.

Protection against failed storage?
  • SQL Server failover clustering: Does not protect against failed storage, because storage is shared between nodes in the cluster.
  • SQL Server high-availability mirroring: Protects against failed storage because both the principal and mirror database servers write to local disks.

Storage types supported
  • SQL Server failover clustering: Shared storage (more expensive).
  • SQL Server high-availability mirroring: Can use less-expensive direct-attached storage (DAS).

Location requirements
  • SQL Server failover clustering: Members of the cluster must be on the same subnet.
  • SQL Server high-availability mirroring: Principal, mirror, and witness servers must be on the same LAN (up to 1 millisecond latency round trip).

Recovery model
  • SQL Server failover clustering: SQL Server full recovery model recommended. You can use the SQL Server simple recovery model, but the only available recovery point if the cluster is lost will be the last full backup. For more information, see Storage and SQL Server capacity planning and configuration (SharePoint Server 2010).
  • SQL Server high-availability mirroring: Requires SQL Server full recovery model.

Performance overhead
  • SQL Server failover clustering: Some decrease in performance may occur while a failover is occurring.
  • SQL Server high-availability mirroring: High-availability mirroring introduces transactional latency because it is synchronous. It also requires additional memory and processor overhead.

Operational burden
  • SQL Server failover clustering: Set up and maintained at the server level.
  • SQL Server high-availability mirroring: The operational burden is larger than clustering. Must be set up and maintained for all databases. Reconfiguring after failover is manual.

Service application redundancy strategies

The redundancy strategy you follow for protecting service applications that run in a farm varies, depending on where the service application stores data.

Service applications that store data outside a database

To protect service applications that store data outside a database, install the service application on multiple application servers to provide redundancy within the environment.

In this release of SharePoint Server, when you install a service application on multiple application servers, the timer jobs run either on all the application servers that are running the service instance associated with that service application or on the first available server. If an application server fails, timer jobs that are running on that server will be restarted on another server when the next timer job is scheduled to run.

Installing a service application on multiple application servers keeps the service application running, but does not guarantee against data loss. If an application server fails, the active connections for that application server will be lost and users will lose some data.

The following service applications store data outside a database:

  • Access Services

  • Excel Services Application

Service applications that store data in databases

To help protect service applications that store data in databases, you must follow these steps:

  1. Install the service on multiple application servers to provide redundancy within the environment.

  2. Configure SQL Server clustering or mirroring to protect the data.

The following service applications store data in databases:

  • Search service application, including the following databases:

    • Search Administration

    • Crawl

    • Property

      Note

      Mirroring the Search databases is supported, but providing redundancy for Search requires additional work. For details, see the section Search redundancy strategies within a farm.

  • User Profile service, including the following databases:

    • Profiles

    • Social

    • Synchronization

      Note

      Mirroring the Synchronization database is not supported.

  • Business Data Connectivity service application

  • Application Registry service application

    We do not recommend mirroring the Application Registry database, because it is only used when upgrading Microsoft Office SharePoint Server 2007 Business Data Catalog information to SharePoint Server 2010.

  • Usage and Health Data Collection service application

    Note

    We recommend that you do not mirror the Usage and Health Data Collection service application Logging database.

  • Managed Metadata service application

  • Secure Store service application

  • State service application

  • Web Analytics service application, including the following databases:

    • Reporting

    • Staging

      Note

      Mirroring the Staging database is not supported.

  • Word Automation Services service application

  • Microsoft SharePoint Foundation Subscription Settings Service

  • PerformancePoint Services
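The mirroring exceptions noted in the list above can be captured in a small lookup, for example as part of a planning checklist. This Python sketch records only what the list states; the key labels are illustrative and are not actual database names.

```python
# Mirroring exceptions taken from the notes in the list above.
# Keys are illustrative labels, not actual database names.
MIRRORING_EXCEPTIONS = {
    "User Profile: Synchronization": "not supported",
    "Web Analytics: Staging": "not supported",
    "Application Registry": "not recommended",
    "Usage and Health Data Collection: Logging": "not recommended",
}


def mirroring_guidance(database_label: str) -> str:
    """Return the exception noted for a database, if the list notes one."""
    return MIRRORING_EXCEPTIONS.get(database_label, "no exception noted")


if __name__ == "__main__":
    for label in ("User Profile: Profiles", "Web Analytics: Staging"):
        print(f"{label}: {mirroring_guidance(label)}")
```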

Search redundancy strategies within a farm

This section applies to SharePoint Server 2010 only.

The Search service application is a special case for redundancy within a farm. The following illustration shows how redundancy and failover can be configured for a medium dedicated Search service application that crawls approximately 40 million items. For more information about the architecture of the Search service application, see "Search Architectures for Microsoft SharePoint Server 2010" in the article Technical diagrams (SharePoint Server 2010).

[Illustration: Redundant Search service application (highly available search architecture)]

  • Query server. A query server hosts query components and index partitions.

    • Query components return search results. Each query component is part of an index partition, which is associated with a specific property database that contains metadata associated with a specific set of crawled content. You can make an index partition redundant by adding "mirror" query components to an index partition and putting them on different farm servers.

      Note

      The use of the term mirror query components refers to identical file copies, not to SQL Server database mirroring.

    • Index partitions are groups of query components, each of which holds a subset of the full text index and returns search results. Each index partition is associated with a specific property database that contains metadata that is associated with a specific set of crawled content. You can decide which servers in a farm will handle queries by creating a query component on that server. If you want to balance the load of handling queries across multiple farm servers, add query components to an index partition and associate them with the servers that you want to use to handle queries. For more information, see Add or remove a query component. You can make an index partition redundant by adding mirror query components to an index partition and putting them on different query servers.

  • Crawl server. A crawl server hosts crawl components and a search administration component.

    • Crawl components process crawls of content sources, propagate the resulting index files to query components, and add information about the location and crawl schedule of content sources to their associated crawl databases. Crawl components are associated with a single Search service application. You can distribute the crawl load by adding crawl components to different crawl servers. You can have as many crawl components on a given crawl server as resources allow. If you have many content locations, you can add crawl components and crawl databases and dedicate them to specific content. Each crawl component on a given crawl server should be associated with a separate crawl database. For redundancy, we recommend that you have at least two crawl components. Each crawl component should be set to crawl both crawl databases. If a database grows to more than 25 million items, we recommend that you add a new crawl database and crawl component.

    • The search administration component monitors incoming user actions and updates the search administration database. Only one search administration component is allowed per Search service application. The search administration component can run on any server, preferably either a crawl server or a query server.

  • Database servers. Database servers host crawl databases, property databases, the search administration database, and other SharePoint Server 2010 databases.

    • Crawl database

      Crawl databases contain data that is related to the location of content sources, crawl schedules, and other information that is specific to crawl operations for a specific Search service application. You can distribute the database load by adding crawl databases to different computers that are running SQL Server. Crawl databases are associated with crawl components and can be dedicated to specific hosts by creating host distribution rules. For more information about crawl components, see Add or remove a crawl component (SharePoint Server 2010). For more information about host distribution rules, see Add or remove a host distribution rule. Crawl databases are redundant if they are mirrored or deployed to a SQL Server failover cluster.

    • Property database

      Property databases contain metadata that is associated with crawled content. You can distribute the database load of queries by adding property databases to different computers that are running SQL Server. Property databases are associated with index partitions and return any metadata associated with content in query results.

      Property databases are redundant if they are mirrored or deployed to a SQL Server failover cluster.

    • Search Administration database

      There is only one Search Administration database per Search service application instance in a farm.

      The Search Administration database is only redundant if it is mirrored or deployed to a SQL Server failover cluster.
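The crawl sizing guidance above (at least two crawl components for redundancy, and a new crawl database and crawl component once a database exceeds 25 million items) can be turned into a rough estimate. The Python sketch below assumes that items are spread evenly across crawl databases, which the article does not state. The function name is hypothetical.

```python
import math

ITEMS_PER_CRAWL_DATABASE = 25_000_000  # threshold given in the guidance above
MIN_CRAWL_COMPONENTS = 2               # minimum recommended for redundancy


def estimate_crawl_topology(total_items: int) -> tuple:
    """Return (crawl_databases, crawl_components) for a corpus size.

    Assumes items are spread evenly across crawl databases, which is a
    simplification made for illustration.
    """
    if total_items < 0:
        raise ValueError("total_items must not be negative")
    databases = max(1, math.ceil(total_items / ITEMS_PER_CRAWL_DATABASE))
    # One crawl component per crawl database, but never fewer than two.
    components = max(MIN_CRAWL_COMPONENTS, databases)
    return databases, components


if __name__ == "__main__":
    # The medium farm described above crawls approximately 40 million items.
    print(estimate_crawl_topology(40_000_000))
```

Treat the output as a starting point for planning rather than a sizing rule; actual topology depends on content locations and available server resources.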

For more information about search redundancy, see Manage search topology (SharePoint Server 2010).

Redundancy and failover between closely located data centers configured as a single farm ("stretched" farm)

Some enterprises have data centers that are located close to one another with high-bandwidth connections so that they can be configured as a single farm. This is called a "stretched" farm. For a stretched farm to work, there must be less than 1 millisecond latency between SQL Server and the front-end Web servers in one direction, and at least 1 gigabit per second bandwidth.

In this scenario, you can provide fault tolerance by following the standard guidance for making databases and service applications redundant.
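The latency and bandwidth thresholds for a stretched farm can be expressed as a simple check. In this Python sketch the measurements are assumed to come from your own network testing, and the function name is hypothetical.

```python
MAX_ONE_WAY_LATENCY_MS = 1.0  # latency must be less than this, in one direction
MIN_BANDWIDTH_GBPS = 1.0      # bandwidth must be at least this


def meets_stretched_farm_requirements(one_way_latency_ms: float,
                                      bandwidth_gbps: float) -> bool:
    """Check measured link figures against the stretched farm thresholds."""
    return (one_way_latency_ms < MAX_ONE_WAY_LATENCY_MS
            and bandwidth_gbps >= MIN_BANDWIDTH_GBPS)


if __name__ == "__main__":
    # Hypothetical measurements between two data centers.
    print(meets_stretched_farm_requirements(0.4, 10.0))
    print(meets_stretched_farm_requirements(2.5, 10.0))
```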

The following illustration shows a stretched farm.

[Illustration: "Stretched" farm]

See Also

Other Resources

Resource Center: Business Continuity Management for SharePoint Server 2010