Plan for performance and capacity (FAST Search Server 2010 for SharePoint)

Updated: July 8, 2010

This article provides general performance and capacity recommendations. Use these recommendations to determine the capacity and performance characteristics of your Microsoft FAST Search Server 2010 for SharePoint topology.

In this section:

  • Content volume capacity dimensioning

  • Content feeding capacity dimensioning

  • Query performance dimensioning

  • Serving queries for multiple SharePoint farms

  • Feature performance effect

  • FAST Search Server Performance and capacity test results and recommendations

FAST Search Server Performance and capacity test results and recommendations

You can download a white paper that provides additional information about the performance and capacity characteristics of FAST Search Server 2010 for SharePoint, including details on how Microsoft tested it. The white paper includes the following:

  • Test farm characteristics

  • Test results

  • Recommendations

  • Troubleshooting performance and scalability

The white paper also provides details on how to configure FAST Search Server 2010 for SharePoint to handle up to 40 million items per index column.

Before reading this white paper, make sure that you understand the key concepts behind capacity management in FAST Search Server 2010 for SharePoint. For more information, see Plan FAST Search Server farm topology (FAST Search Server 2010 for SharePoint).

Downloadable white paper: FAST Search Server 2010 for SharePoint Capacity Planning (https://www.microsoft.com/downloads/details.aspx?FamilyID=65b799e3-825c-4398-8cd7-3311d3297997).

Content volume capacity dimensioning

As a general guideline, plan to deploy one index column per 15 million indexed items. This guideline is based on an indexed mix of 70% SharePoint items and 30% file share documents, with source document sizes between 10 and 100 kilobytes (KB).

The default indexer configuration does not allow more than 30 million items per index column, and indexing more than 15 million items per index column will adversely affect indexing and query performance.
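The sizing rule above reduces to an integer ceiling division. The following Python sketch applies it; the 15 million and 30 million figures come from this section, while the function name index_columns_needed and its parameters are illustrative only and not part of any FAST Search Server API. It models only the default indexer limit, not the non-default configuration for up to 40 million items per column that the white paper describes.

```python
# Figures taken from the content volume guidance above.
RECOMMENDED_ITEMS_PER_COLUMN = 15_000_000  # general sizing guideline
DEFAULT_MAX_ITEMS_PER_COLUMN = 30_000_000  # limit of the default indexer configuration


def index_columns_needed(total_items: int,
                         items_per_column: int = RECOMMENDED_ITEMS_PER_COLUMN,
                         max_items_per_column: int = DEFAULT_MAX_ITEMS_PER_COLUMN) -> int:
    """Return how many index columns are needed for total_items (hypothetical helper)."""
    if total_items < 0:
        raise ValueError("total_items must not be negative")
    if not 0 < items_per_column <= max_items_per_column:
        raise ValueError("items_per_column must be positive and within the indexer limit")
    # Integer ceiling division; a farm always has at least one index column.
    return max(1, -(-total_items // items_per_column))


if __name__ == "__main__":
    for items in (8_000_000, 15_000_000, 40_000_000):
        print(f"{items:>12,} items -> {index_columns_needed(items)} column(s)")
```

For example, 40 million items at the recommended 15 million per column gives 3 columns, while packing 30 million per column (the default maximum) gives 2, at the cost of the indexing and query performance penalty described above.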

You can add content volume capacity to your farm in two ways:

When adding index columns, follow the topology recommendations in Deployment options (FAST Search Server 2010 for SharePoint).

Important:
Some documents, such as PowerPoint presentations with many graphics, may have a large source size but contain very little searchable content. For such documents the searchable item is very small, and you might plan for more items per column.

Adding columns to a running installation requires re-indexing all content to re-partition the index columns. Plan for sufficient index column capacity from the start, because a complete rebuild of a multi-column index may take several days.

Content feeding capacity dimensioning

The content feeding chain must be dimensioned to provide sufficient capacity for retrieving and indexing new and updated content. Dimensioning should take into account:

  • Content retrieval. How fast new or updated content can be fed into the system.

  • Processing and indexing. How efficiently the system can process and index items.

Crawl dimensioning

In a FAST Search Server 2010 for SharePoint environment, crawling must be tuned to detect content changes as quickly as possible. The FAST Content SSA uses a content pull approach, in which the length of each crawl cycle determines the average time it takes to discover changed content that must be indexed. Test in your own environment how long it takes to crawl each content source, and whether the throughput consumed by crawling that content interferes with your target user response times.
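To make the relationship between crawl cycle length and discovery time concrete, the following Python sketch assumes that every item is revisited exactly once per crawl cycle and that a change is equally likely at any moment. Both are simplifying assumptions, not documented FAST Content SSA behavior. Under them, a change waits half a cycle on average and one full cycle at worst; the small simulation checks the average. The function names are hypothetical.

```python
import math
import random


def discovery_delay(cycle_hours: float):
    """Return (average, worst-case) hours before a change is picked up.

    Assumes each item is visited exactly once per crawl cycle and that a
    change can happen at any moment with equal probability.
    """
    if cycle_hours <= 0:
        raise ValueError("cycle_hours must be positive")
    return cycle_hours / 2, cycle_hours


def simulated_average_delay(cycle_hours: float, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check of the average delay under the same assumptions."""
    rng = random.Random(seed)
    # Point within each cycle at which the crawler reaches this particular item.
    visit_offset = rng.uniform(0, cycle_hours)
    total_delay = 0.0
    for _ in range(trials):
        # Moment the item changes, spread evenly over ten whole crawl cycles.
        change_time = rng.uniform(0, 10 * cycle_hours)
        # First crawler visit at or after the change.
        cycles_ahead = math.ceil((change_time - visit_offset) / cycle_hours)
        next_visit = visit_offset + cycles_ahead * cycle_hours
        total_delay += next_visit - change_time
    return total_delay / trials


if __name__ == "__main__":
    for cycle in (1.0, 4.0, 24.0):
        avg, worst = discovery_delay(cycle)
        sim = simulated_average_delay(cycle)
        print(f"cycle {cycle:>4} h: average {avg:.1f} h "
              f"(simulated {sim:.2f} h), worst case {worst:.1f} h")
```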

For the FAST Content SSA, you can change the number of concurrent requests that the connector generates when it crawls a specified content source. The more concurrent requests, the faster the crawl. The goal is that every change in the content repositories results in a re-indexed item. As long as the connector is dimensioned to catch all updates, the load on item processing and indexing does not increase when you reduce the time required to crawl all content sources; the only effect of adding capacity to the FAST Content SSA is then reduced indexing latency. For more information, see Manage crawler impact rules (FAST Search Server 2010 for SharePoint).
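The following Python sketch illustrates why extra crawl capacity shortens latency without adding processing load. It assumes a hypothetical steady change rate, a crawler that finishes every cycle (so it catches all updates), and that no item changes twice within one cycle. CrawlScenario is an illustrative name, not a product API.

```python
from dataclasses import dataclass


@dataclass
class CrawlScenario:
    # Hypothetical inputs: how often content changes, and how long one full
    # crawl of all content sources takes with the chosen crawler capacity.
    changes_per_hour: float
    cycle_hours: float

    def items_fed_per_cycle(self) -> float:
        # Assumes each change touches a different item, so every change made
        # during one cycle becomes one re-indexed item in the next crawl.
        return self.changes_per_hour * self.cycle_hours

    def items_fed_per_hour(self) -> float:
        # Average load on item processing and indexing.
        return self.items_fed_per_cycle() / self.cycle_hours

    def average_latency_hours(self) -> float:
        # Half a cycle on average, if changes happen at uniformly random times.
        return self.cycle_hours / 2


if __name__ == "__main__":
    for cycle in (4.0, 1.0):
        s = CrawlScenario(changes_per_hour=2_000, cycle_hours=cycle)
        print(f"{cycle} h cycle: {s.items_fed_per_cycle():,.0f} items per cycle, "
              f"{s.items_fed_per_hour():,.0f} items/hour, "
              f"average latency {s.average_latency_hours():.1f} h")
```

With 2,000 changes per hour, a 4-hour cycle feeds 8,000 items per cycle and a 1-hour cycle feeds 2,000, but both average 2,000 items per hour to item processing and indexing; only the average latency drops, from 2 hours to 30 minutes.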

You scale out the Content SSA by adding more crawl components to the SSA. For more information, see Multiple server deployment of the Content SSA (FAST Search Server 2010 for SharePoint).

Indexing latency capacity dimensioning

Two main dimensioning parameters affect the overall indexing latency in your deployment:

  • Item processing capacity. The main item processing overhead comes from parsing document formats and extracting searchable content and metadata. If your content consists of complex documents, such as large PDF or Word documents, item processing may become a bottleneck. In this case, deploy the item processing component to all servers in your deployment. For more information, see Deployment options (FAST Search Server 2010 for SharePoint).

  • Indexing capacity. FAST Search Server 2010 for SharePoint uses an incremental indexing mechanism that ensures low indexing latency for new or updated content. However, you must keep a reasonable balance between the total number of indexed items and the item update rate.

Run a basic benchmark on a 2-node deployment to get a rough indication of where the bottleneck is. If both nodes run at a steady, high CPU load, item processing is likely the bottleneck. To reduce indexing latency, we recommend reducing the number of items per column. For a given content volume this means deploying more columns and therefore more nodes, which both increases item processing capacity (each node contributes to item processing) and reduces re-indexing time, and so reduces indexing latency.
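The effect of lowering the number of items per column can be sketched as follows. This Python example assumes, for illustration only, one node per column, identical item processing throughput on every node, and that item processing rather than crawling is the limiting factor. The 100,000 items per hour per node figure is a placeholder, not a measured value; replace it with the rate from your own 2-node benchmark. The function name is hypothetical.

```python
def reindex_estimate(total_items: int,
                     items_per_column: int,
                     items_per_hour_per_node: float,
                     nodes_per_column: int = 1):
    """Return (columns, aggregate items per hour, hours for a full re-index).

    Simplified model: every node contributes the same item processing
    throughput, and item processing (not crawling) is the bottleneck.
    """
    if min(total_items, items_per_column, items_per_hour_per_node, nodes_per_column) <= 0:
        raise ValueError("all inputs must be positive")
    columns = -(-total_items // items_per_column)  # ceiling division
    capacity = columns * nodes_per_column * items_per_hour_per_node
    return columns, capacity, total_items / capacity


if __name__ == "__main__":
    # 100_000 items/hour/node is a placeholder; use your own benchmark result.
    for per_column in (15_000_000, 10_000_000):
        columns, capacity, hours = reindex_estimate(30_000_000, per_column, 100_000)
        print(f"{per_column:,} per column: {columns} columns, "
              f"{capacity:,} items/hour, about {hours:.0f} h to re-index")
```

Under these assumptions, 30 million items at 15 million per column gives 2 columns and about 150 hours for a full re-index, while 10 million per column gives 3 columns and about 100 hours.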

Query performance dimensioning

There are two main parameters to consider when dimensioning for query performance:

  • Query throughput. The maximum number of queries served is measured in queries per second (QPS). For search solutions behind the firewall this is normally not a limiting factor. Unless you plan to handle peak query rates of more than 5 QPS, you do not need to scale out the search solution for this parameter.

    Scaling out for more QPS implies adding more search rows to the deployment. For more information, see Search Cluster.

    For high QPS deployments, you may also have to scale out the Query SSA by adding more query components to the SSA. For more information, see Multiple server deployment of the Query SSA (FAST Search Server 2010 for SharePoint).

  • Query latency. Query latency is the average round-trip delay from when a user issues a query until the query result is presented. Apart from reducing the number of items per column, the main way to improve query latency is to deploy one or more additional search rows. Additional search rows keep the indexing load from affecting query latency, and they also provide high query availability. For more information, see Different levels of high availability.
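The query dimensioning rules above can be combined into a rough row estimate. In this Python sketch, the 5 QPS threshold comes from this section; qps_per_row is a hypothetical capacity you would need to measure in your own farm, and extra_rows stands for the additional rows deployed to isolate query latency from indexing load and to gain availability. The function names are illustrative only.

```python
import math

# Threshold quoted in the query performance guidance above.
QPS_SCALE_OUT_THRESHOLD = 5.0


def needs_qps_scale_out(peak_qps: float) -> bool:
    """True if the planned peak query rate exceeds the 5 QPS guideline."""
    return peak_qps > QPS_SCALE_OUT_THRESHOLD


def search_rows(peak_qps: float, qps_per_row: float, extra_rows: int = 0) -> int:
    """Rough search row count (hypothetical helper).

    qps_per_row: measured capacity of one search row in your own farm.
    extra_rows: rows added on top of the throughput requirement to keep
    indexing load away from query latency and to gain query availability.
    """
    if qps_per_row <= 0 or extra_rows < 0:
        raise ValueError("qps_per_row must be positive and extra_rows non-negative")
    if needs_qps_scale_out(peak_qps):
        throughput_rows = math.ceil(peak_qps / qps_per_row)
    else:
        throughput_rows = 1  # below the threshold, one row covers throughput
    return max(1, throughput_rows) + extra_rows


if __name__ == "__main__":
    print(search_rows(3, qps_per_row=10))                 # throughput only
    print(search_rows(3, qps_per_row=10, extra_rows=1))   # plus a row for latency/availability
    print(search_rows(25, qps_per_row=10, extra_rows=1))
```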

Serving queries for multiple SharePoint farms

By using the SharePoint Server shared service application framework, you can have one parent farm serving queries for multiple child farms. You connect the front-end web servers in the child farm to the Query SSA in the parent farm via the SSA Proxy.

Feature performance effect

The following table summarizes the performance effect of FAST Search Server 2010 for SharePoint search-related features. The values shown should be regarded as rules of thumb, because the effect of a single feature varies with usage. The table also does not cover inter-dependencies between features.

Feature Item processing Indexing Query matching Query processing RAM - Query matching Disk access Disk space Net/IO

Deep refiners: M, L, L, H (note 1), L-M (note 2)

Shallow refiners: L, H, H (note 3)

Property extraction: M, L

Trim duplicates: M, L-H (note 4)

Full-text sorting: L, L, H (note 1)

Hit highlighted summary: M

Complex queries (many terms): M, M, H

Substring search: L-M (note 5)

Stemming: L, L, L, L

Spell check: L, L, L

Synonyms: L, L, L

High stop-word threshold (note 6): H, H, H

Managed property boost (note 6): L, L, L

H = High, M = Medium, L = Low. Where no rating is given, the feature has an insignificant effect on the corresponding resource compared to not using the feature.

Notes:

  1. The memory usage pattern is similar for deep refiners and full-text sorting. The query matching component keeps aggregation data for the associated managed properties in main memory. The memory usage is proportional to the number of items per column and the number of unique values of the associated managed property.

  2. Deep string refiners with many unique values in the index have a noticeable I/O load effect on the interface between the query matching and query processing nodes. This I/O load is proportional to the number of columns and the QPS in the farm. For more information on performance tuning of deep refiners, see RefinerConfiguration.

  3. Shallow string refiners have a considerable I/O load effect on the interface between the query matching and query processing nodes. This I/O load is proportional to the average size of the associated managed property, the number of columns and the QPS in the farm. In most cases, deep refiners are the recommended option for query refinement.

  4. Duplicate trimming may in certain cases have a noticeable I/O load effect on the interface between the query matching and query processing nodes. This I/O load is proportional to the average number of duplicates per query result, the number of columns and the QPS in the farm.

  5. Applying substring search to large managed properties (for example, the body) noticeably increases index disk usage.

  6. For more information on this feature, see Relevance features.
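Notes 1 to 4 describe loads that are proportional to several factors at once. Reading "proportional to A and B" as proportional to their product (an interpretation, not a formula stated in this article), the relative change in load between a baseline farm and a planned farm is the product of the ratios of each factor. The following Python sketch computes that ratio; scaling_factor and the factor names are arbitrary, hypothetical labels.

```python
from math import prod
from typing import Mapping


def scaling_factor(baseline: Mapping[str, float], planned: Mapping[str, float]) -> float:
    """Relative load of `planned` versus `baseline` (hypothetical helper).

    Assumes the load is proportional to the product of the listed factors,
    for example columns and QPS for the deep refiner I/O load in note 2.
    """
    if baseline.keys() != planned.keys():
        raise ValueError("baseline and planned must list the same factors")
    if any(value <= 0 for value in baseline.values()):
        raise ValueError("baseline factors must be positive")
    return prod(planned[name] / baseline[name] for name in baseline)


if __name__ == "__main__":
    # Note 2: deep refiner I/O load scales with columns and QPS.
    io = scaling_factor({"columns": 2, "qps": 5}, {"columns": 4, "qps": 10})
    # Note 1: query matching memory scales with items per column and unique values.
    ram = scaling_factor({"items_per_column": 10_000_000, "unique_values": 1_000},
                         {"items_per_column": 15_000_000, "unique_values": 1_000})
    print(f"I/O load x{io:.1f}, query matching memory x{ram:.1f}")
```

Under this reading, doubling both the number of columns and the peak QPS roughly quadruples the deep refiner I/O load from note 2, while going from 10 to 15 million items per column with the same number of unique values raises the memory use from note 1 by about 1.5 times.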

See also

Concepts

Plan search topology (FAST Search Server 2010 for SharePoint)
Plan for redundancy and availability (FAST Search Server 2010 for SharePoint)
Manage search topology (FAST Search Server 2010 for SharePoint)

Change history

Date Description Reason

July 8, 2010

2010/07/05

Content update

May 12, 2010

Initial publication