Recommendations: Redundancy and availability (FAST Search Server 2010 for SharePoint)

 

Applies to: FAST Search Server 2010

This article describes redundancy and availability recommendations for Microsoft FAST Search Server 2010 for SharePoint.

Any recommendation on redundancy and availability of FAST Search Server 2010 for SharePoint will depend on your business environment. When sizing your deployment, we recommend that you evaluate the scenarios and deployments we have tested, copy the topology of the deployment with the most relevant characteristics and then further develop the topology to fit your environment.

Each part of the system has its own redundancy and availability possibilities. These are described in the following sections:

  • Content SSA redundancy and availability

  • Query SSA redundancy and availability

  • Indexing redundancy and availability

  • Search redundancy and availability

  • SQL Server database redundancy and availability

  • Storage subsystem redundancy and availability

  • Front-end web server redundancy and availability

Note

In FAST Search Server 2010 for SharePoint, the people search functionality is provided by the Query SSA. It includes a crawl component that retrieves and indexes user profiles, and then stores the metadata properties related to the user profiles in a SQL Server database. For people search redundancy and availability, you may want to add another crawl component within the Query SSA.

For a complete recommendations overview, refer to Performance and capacity recommendations (FAST Search Server 2010 for SharePoint).

For more information about the test scenarios, refer to Performance and capacity test results (FAST Search Server 2010 for SharePoint).

Content SSA redundancy and availability

The FAST Search Content SSA enables you to crawl and index content. This SSA represents the default indexing connector for your FAST Search Server 2010 for SharePoint deployment, and you will typically deploy the Content SSA on the parent SharePoint Server 2010 farm. We recommend that you scale out the Content SSA when your business environment requires a certain level of crawling and indexing redundancy and throughput. However, do not scale out the Content SSA if there is a chance that the feeding chain will freeze on a partial update, for example, if you do not have a backup indexer.

You can scale out the Content SSA by adding crawl components to the SSA. For more information about how to add a crawl component, see Multiple server deployment of the Content SSA (FAST Search Server 2010 for SharePoint).

Important

Do not deploy more than one Content SSA associated with your FAST Search Server 2010 for SharePoint farm.

For general information about redundancy and availability for the application servers inside a SharePoint Server 2010 farm, see Plan for availability (SharePoint Server 2010).

Query SSA redundancy and availability

The FAST Search Query SSA provides the query-side integration between the FAST Search Server 2010 for SharePoint farm and the parent SharePoint Server 2010 farm. You must deploy the Query SSA on the parent SharePoint Server 2010 farm. You can scale out the Query SSA for query availability by adding query components to the SSA. For more information about how to add a query component, see Multiple server deployment of the Query SSA (FAST Search Server 2010 for SharePoint).

Important

Do not deploy more than one Query SSA associated with your FAST Search Server 2010 for SharePoint farm.

For general information about redundancy and availability for the application servers inside a SharePoint Server 2010 farm, see Plan for availability (SharePoint Server 2010).

Indexing redundancy and availability

The following indexing components within the FAST Search Server 2010 for SharePoint farm support redundancy and availability:

Component Description

Content distributor

A content distributor is a stateless component that handles crawling flow control. If a content distributor server fails, the outstanding item batches associated with this content distributor will fail too. If you have redundant content distributors, the flow control protocol will ensure that the indexing connector resubmits the subset of item batches associated with the failing content distributor.

Item processing

An item processing component handles a given set of item batches. If a server that is running item processing fails, the outstanding item batches associated with the item processing instances running on this server will fail too. If you distribute item processing to two or more servers in the farm, each running one processor thread, the flow control protocol will ensure that the indexing connector resubmits the subset of item batches associated with the failing item processing server.

Link analysis

Link analysis consists of database lookup and link processing, and is performed by the web analyzer component. The web analyzer component operates in a batch processing mode, and can distribute link analysis jobs to multiple servers that are running link analysis subcomponents.

If a server that is running link analysis fails with unrecoverable disk errors, you must re-establish the web analyzer link database from the latest backup. If no backup exists, the ranking based on link analysis will be incomplete until you complete a re-crawl. If a server performing database lookup fails, and there is no redundancy, this blocks crawling. You can configure redundancy for the database lookup component during deployment, by modifying the deployment configuration file.

Indexing dispatcher

An indexing dispatcher is a stateless component that handles indexing flow control. If an indexing dispatcher server fails, the outstanding item batches associated with this indexing dispatcher will fail too. If you have redundant indexing dispatchers, the flow control protocol will ensure that the indexing connector resubmits the subset of item batches associated with the failing indexing dispatcher.

Indexer

There is one active indexer per index column. You can scale out the indexing component by defining multiple index columns. Within each index column, you can set up a backup indexer for high availability. If an indexer server fails with unrecoverable disk errors, you must either recover the server from the latest backup, or manually enable a backup indexer to be the new primary indexer.

Full indexing redundancy requires a backup indexer on a separate row, which will give you an increased server count and additional storage volume requirements. Full redundancy provides the quickest recovery path from hardware failures, but other options might be more attractive when hardware outages are infrequent:

  • You can run a full re-crawl of all the content sources after recovery. Depending on the deployment, this may take several days. If you have a separate search row, you can perform the re-crawl while keeping the old index searchable.

  • You can run regular backups of the index data.
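
The resubmission behavior described above for the content distributor, item processing, and indexing dispatcher components can be illustrated with a small simulation. This is a minimal sketch and not the actual flow control protocol: the function name `feed_batches`, the distributor names, and the round-robin assignment are hypothetical, and they only show why a redundant component lets outstanding item batches complete after a failure.

```python
from itertools import cycle


def feed_batches(batches, distributors, failed):
    """Simulate an indexing connector feeding item batches.

    batches:      list of batch identifiers to submit
    distributors: list of distributor names (hypothetical)
    failed:       set of distributor names that fail during feeding
    Returns (delivered, lost): a dict batch -> distributor that accepted it,
    and a list of batches that could not be delivered.
    """
    healthy = [d for d in distributors if d not in failed]
    # Round-robin assignment is an assumption made only for this sketch.
    assignment = dict(zip(batches, cycle(distributors)))
    retry_targets = cycle(healthy) if healthy else None

    delivered, lost = {}, []
    for batch in batches:
        target = assignment[batch]
        if target in failed:
            # The batch fails together with its distributor.
            if retry_targets is None:
                lost.append(batch)  # no redundancy: nothing to resubmit to
                continue
            target = next(retry_targets)  # resubmit to a surviving one
        delivered[batch] = target
    return delivered, lost


if __name__ == "__main__":
    batches = ["b1", "b2", "b3", "b4"]
    print(feed_batches(batches, ["cd1", "cd2"], failed={"cd1"}))
    print(feed_batches(batches, ["cd1"], failed={"cd1"}))
```

In the first call only the batches that were outstanding on the failed distributor are resubmitted. In the second call there is no redundant distributor, so the batches stay undelivered until the component is recovered.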

FAST Search specific indexing connectors

There are three FAST Search specific connectors:

  • FAST Search Web crawler

  • FAST Search Lotus Notes connector

  • FAST Search database connector

The FAST Search Web crawler is an alternative indexing connector that we recommend for large-scale web crawl use cases. You can scale out this component by deploying multiple node schedulers that will handle crawl scheduling of different parts of the overall crawl.

The FAST Search Lotus Notes and FAST Search database connectors are stand-alone components, each with one or more associated content repositories, and you can scale out the system by deploying multiple instances of the indexing connectors.

Search redundancy and availability

The following components within the FAST Search Server 2010 for SharePoint farm support redundancy and availability:

Component Description

Query matching

There is at least one query matching component per index column. You can scale out the query matching component by defining multiple search rows. If a query matching server fails, the outstanding queries dispatched to the search row that contains the server will fail too. Subsequent queries will be handled by another search row in the given index column.

Query processing

Query processing is a stateless component that handles query processing flow control. If a query processing server fails, the remaining queries associated with this query processing server will fail too. If you have multiple query processing components, the flow control protocol will ensure that subsequent queries will be handled by another query processing server.
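
The query matching failover described above can be pictured as choosing, for each index column, a healthy query matching server from one of the search rows. The following is a minimal sketch under stated assumptions: the function `select_query_matching`, the `(row, column)` health map, and the "first healthy row wins" policy are hypothetical and do not describe how the product actually dispatches queries.

```python
def select_query_matching(health, num_columns, num_rows):
    """Pick one healthy query matching server per index column.

    health maps (row, column) -> True if that server is up (hypothetical
    representation). Returns {column: row}. Raises RuntimeError if some
    index column has no healthy search row, because part of the index
    would then be unavailable for queries.
    """
    selection = {}
    for column in range(num_columns):
        for row in range(num_rows):
            if health.get((row, column), False):
                selection[column] = row
                break
        else:
            # No search row can serve this index column.
            raise RuntimeError(
                f"no healthy search row for index column {column}"
            )
    return selection


if __name__ == "__main__":
    # Two index columns, two search rows; the server in row 0, column 1 is down.
    health = {(0, 0): True, (0, 1): False, (1, 0): True, (1, 1): True}
    print(select_query_matching(health, num_columns=2, num_rows=2))
```

With a single search row, the same failure would leave index column 1 without any query matching server, which is why multiple search rows are needed for query availability.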

Note

The system will in most cases provide search availability during crawling or indexing failure recovery. But be aware of the following:

  • If you must recover the system from backup, search will be down for the time that is required to complete the recovery. Depending on the size of the backup, this may take significant time.

  • If you exclude the binary index from the backup, the time to recover the index becomes significantly longer. The reason is that you must rebuild the index from the pre-index item store (FiXML files). In such situations, you can improve availability for search by keeping the latest available index (before the error situation) on non-affected search rows available for queries until the index is rebuilt from the backup, at the cost of a longer freshness delay before new content is made searchable.

  • If you do not have end-to-end fault-tolerance in the crawling and indexing chain, and you do not take data backups, you will have to re-crawl and re-index all content from the source repositories. During the re-crawl only content that was indexed after you started the re-crawl will be available for search.
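
The recovery options in this note can be summarized as a small decision function. This is only a sketch of the reasoning above: the names `Recovery` and `choose_recovery` and the three boolean inputs are hypothetical, and a real recovery plan depends on details of your deployment that are not modeled here.

```python
from enum import Enum


class Recovery(Enum):
    FAILOVER = "fail over to redundant components"
    RESTORE_INDEX = "restore the binary index from backup"
    REBUILD_FROM_FIXML = "restore the backup and rebuild the index from FiXML"
    FULL_RECRAWL = "re-crawl and re-index all content"


def choose_recovery(fault_tolerant, has_backup, backup_has_binary_index):
    """Return the recovery path implied by the note (simplified).

    fault_tolerant:          the crawling and indexing chain is redundant
                             end to end (for example, a backup indexer exists)
    has_backup:              data backups are taken
    backup_has_binary_index: the backup includes the binary index
    """
    if fault_tolerant:
        return Recovery.FAILOVER
    if not has_backup:
        # Neither redundancy nor backup: everything must be re-crawled.
        return Recovery.FULL_RECRAWL
    if backup_has_binary_index:
        return Recovery.RESTORE_INDEX
    # Backup without the binary index: slower, rebuilt from FiXML files.
    return Recovery.REBUILD_FROM_FIXML


if __name__ == "__main__":
    print(choose_recovery(False, True, False).value)
```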

SQL Server database redundancy and availability

A FAST Search Server 2010 for SharePoint farm must have access to a Microsoft SQL Server host that stores configuration information. Typically, you use an existing SQL Server host within the parent SharePoint Server 2010 farm.

For more information about redundancy and availability for the SQL Server within the SharePoint Server farm, refer to Storage and SQL Server capacity planning and configuration (SharePoint Server 2010). Note that the reference to "search" in that topic refers to SharePoint Server 2010 search, which also uses the database as a property store for the index.

Note

The FAST Search Server 2010 for SharePoint farm does not use the SQL Server database for indexing metadata properties. This differs from the default SharePoint Server 2010 search.

Storage subsystem redundancy and availability

The storage subsystem for a FAST Search Server 2010 for SharePoint farm must have some redundancy. Even in a redundant setup, loss of storage leads to decreased performance during a recovery period that can last for days. We recommend that you use a redundant RAID disk set, such as RAID1, RAID10, RAID5, or RAID50, for your storage subsystem, preferably also with hot spares.
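
When you compare these RAID levels, it can help to estimate how much of the raw disk capacity remains usable. The following is a minimal sketch using the standard capacity rules for each level. The function names, the assumption of two-way mirrors for RAID10, and the disk counts and sizes in the usage example are illustrative only, and hot spares are not counted.

```python
def usable_disks(level, disks, groups=2):
    """Return how many disks' worth of capacity is usable.

    level:  "RAID1", "RAID10", "RAID5" or "RAID50"
    disks:  number of disks in the set (hot spares excluded)
    groups: number of RAID5 groups striped together (RAID50 only)
    """
    if level == "RAID1":
        if disks < 2:
            raise ValueError("RAID1 needs at least 2 disks")
        return 1  # every disk holds a full copy of the data
    if level == "RAID10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID10 needs an even number of disks, at least 4")
        return disks // 2  # assumes two-way mirrors
    if level == "RAID5":
        if disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        return disks - 1  # one disk's worth of parity
    if level == "RAID50":
        if groups < 2 or disks % groups or disks // groups < 3:
            raise ValueError("RAID50 needs 2+ equal groups of 3+ disks")
        return disks - groups  # one disk's worth of parity per group
    raise ValueError(f"unsupported RAID level: {level}")


def usable_capacity_tb(level, disks, disk_tb, groups=2):
    """Usable capacity in TB for identical disks of size disk_tb."""
    return usable_disks(level, disks, groups) * disk_tb


if __name__ == "__main__":
    for level in ("RAID1", "RAID10", "RAID5", "RAID50"):
        count = 2 if level == "RAID1" else 8
        print(level, count, "disks:", usable_capacity_tb(level, count, 1.0), "TB")
```

The mirrored levels give up more capacity than the parity-based levels, so the choice affects the storage volume you need to provision.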

Front-end web server redundancy and availability

You can deploy front-end web servers in the parent SharePoint Server 2010 farm, or in child farms. In the latter case, you connect the front-end web servers through an SSA proxy in the parent farm. You should plan availability for the front-end web servers according to the guidelines in Plan for availability (SharePoint Server 2010).

See Also

Concepts

Recommendations: Content volume capacity (FAST Search Server 2010 for SharePoint)
Recommendations: Content freshness (FAST Search Server 2010 for SharePoint)
Recommendations: Query throughput (FAST Search Server 2010 for SharePoint)
Recommendations: Storage (FAST Search Server 2010 for SharePoint)
Recommendations: Virtualization (FAST Search Server 2010 for SharePoint)