Test results: Large scenario (FAST Search Server 2010 for SharePoint)

 

Applies to: FAST Search Server 2010

With the large Microsoft FAST Search Server 2010 for SharePoint test scenario we targeted a large test corpus. To meet freshness goals, incremental crawls were likely to occur during business hours. The amount of content for the scenario was up to 100 million items.

We set up the parent Microsoft SharePoint Server 2010 farm with two front-end web servers, two application servers and one database server, and arranged them as follows:

  • We used the SharePoint Server 2010 crawler, indexing connector framework and the FAST Content Search Service Application (Content SSA) to crawl content. We distributed two crawl components for the Content SSA across the two application servers, mainly to accommodate I/O limitations in the test setup (1 Gbit/s network), where a single network adapter would have been a bottleneck

  • One of the application servers also hosted Central Administration for the farm

  • The database server hosted the crawl databases, the FAST Search Server 2010 for SharePoint administration databases and the other SharePoint Server 2010 databases

We used no separate data storage because the application servers and front-end web servers only needed space for operating system, application binaries and log files.

In this article:

  • Test deployments

  • Test characteristics

  • Test results

Test deployments

Within the large scenario, we tested the following FAST Search Server 2010 for SharePoint deployments:

Name  Description
L1    Single row, six column deployment, with an additional administration node (7 servers in total)
L2    Same as L1, with the addition of a dedicated search row (13 servers in total)
L3    Same as L2, but where the search row included a backup indexer row
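The server totals follow from the topology: one server per row and column position, plus the administration node. The following Python sketch only illustrates that arithmetic. The function name `server_count` is hypothetical and is not part of any FAST Search Server 2010 for SharePoint tool.

```python
def server_count(columns, rows, admin_nodes=1):
    """One server per (row, column) cell, plus dedicated administration nodes."""
    return columns * rows + admin_nodes


# L1: single row, six columns, one administration node.
l1_servers = server_count(columns=6, rows=1)
# L2: two rows (indexer row plus search row), six columns, one administration node.
l2_servers = server_count(columns=6, rows=2)

print("L1 servers:", l1_servers)
print("L2 servers:", l2_servers)
```

L3 enables the backup indexer on the existing search row, so it uses the same servers as L2.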

Test characteristics

This section provides detailed information about the hardware, software, topology and configuration of the test environment.

Hardware/Software

We tested the specified deployments using the following hardware and software.

FAST Search Server 2010 for SharePoint servers

  • Windows Server 2008 R2 x64 Enterprise Edition

  • 2x Intel L5520 CPUs with Hyper-threading and Turbo Boost switched on

  • 24 GB memory

  • 1 Gbit/s network card

  • Storage subsystem:

    • OS: 2x 146 GB 10k RPM SAS disks in RAID1

    • Application: 12x 146 GB 10k RPM SAS disks in RAID50 (two parity groups of 6 drives each). Total formatted capacity of 2 TB.

    • Disk controller: HP Smart Array P410, firmware 3.30

    • Disks: HP DG0146FARVU, firmware HPD6

SharePoint Server 2010 servers

  • Windows Server 2008 R2 x64 Enterprise Edition

  • 2x Intel L5420 CPUs

  • 16 GB memory

  • 1 Gbit/s network card

  • Storage subsystem for OS/Programs: 2x 146 GB 10k RPM SAS disks in RAID1

SQL servers

Same specification as for the SharePoint Server 2010 servers, but with an additional RAID array for SQL data: 6x 146 GB 10k RPM SAS disks in RAID5.

Topology

This section describes the topology of all the test deployments.

L1

L1 was a single row, six column deployment with an additional administration node. We used the following deployment.xml file to set up L1.

<?xml version="1.0" encoding="utf-8" ?> 

<deployment version="14" modifiedBy="contoso\user" 

   modifiedTime="2009-03-14T14:39:17+01:00" comment="L1" 
   xmlns="https://www.microsoft.com/enterprisesearch" 
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
   xsi:schemaLocation="https://www.microsoft.com/enterprisesearch deployment.xsd"> 

   <instanceid>L1</instanceid> 

   <connector-databaseconnectionstring> 
      <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=L1.jdbc]]> 
   </connector-databaseconnectionstring> 

   <host name="fs4sp1.contoso.com"> 
      <admin /> 
      <query /> 
      <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp2.contoso.com"> 
      <query /> 
      <searchengine row="0" column="0" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp3.contoso.com"> 
      <content-distributor /> 
      <searchengine row="0" column="1" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp4.contoso.com"> 
      <content-distributor /> 
      <searchengine row="0" column="2" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp5.contoso.com"> 
      <content-distributor /> 
      <searchengine row="0" column="3" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp6.contoso.com"> 
      <indexing-dispatcher /> 
      <searchengine row="0" column="4" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp7.contoso.com"> 
      <indexing-dispatcher /> 
      <searchengine row="0" column="5" /> 
      <document-processor processes="12" /> 
   </host> 

   <searchcluster> 
      <row id="0" index="primary" search="true" /> 
   </searchcluster> 

</deployment> 
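Before you deploy a file like this, it can help to check that every row and column position is assigned to exactly one host. The following Python sketch shows one way to do that with the standard library. It is an illustration, not a product tool: the helper names `search_grid` and `missing_cells` and the host names in `SAMPLE` are hypothetical, and the sample is an abbreviated two-column file rather than the full L1 deployment. Only the namespace URI and element names are taken from the file above.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the deployment.xml file above.
NS = {"d": "https://www.microsoft.com/enterprisesearch"}

# Abbreviated, hypothetical two-column deployment with the same shape as L1.
SAMPLE = """<deployment version="14" xmlns="https://www.microsoft.com/enterprisesearch">
   <host name="host-a.example.com">
      <admin />
      <query />
   </host>
   <host name="host-b.example.com">
      <searchengine row="0" column="0" />
   </host>
   <host name="host-c.example.com">
      <searchengine row="0" column="1" />
   </host>
   <searchcluster>
      <row id="0" index="primary" search="true" />
   </searchcluster>
</deployment>"""


def search_grid(xml_text):
    """Map (row, column) to host name for every searchengine element."""
    root = ET.fromstring(xml_text)
    grid = {}
    for host in root.findall("d:host", NS):
        name = host.get("name", "")
        if name != name.strip():
            raise ValueError(f"host name has stray whitespace: {name!r}")
        for engine in host.findall("d:searchengine", NS):
            cell = (int(engine.get("row")), int(engine.get("column")))
            if cell in grid:
                raise ValueError(f"cell {cell} is assigned twice")
            grid[cell] = name
    return grid


def missing_cells(xml_text):
    """List grid cells of declared rows that no host serves."""
    root = ET.fromstring(xml_text)
    grid = search_grid(xml_text)
    rows = [int(r.get("id")) for r in root.findall("d:searchcluster/d:row", NS)]
    columns = {column for _, column in grid}
    return sorted((r, c) for r in rows for c in columns if (r, c) not in grid)


grid = search_grid(SAMPLE)
gaps = missing_cells(SAMPLE)
print(grid)
print("missing cells:", gaps)
```

An empty list of missing cells means that each declared row has a search engine in every column that appears in the file.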

L2

L2 had the same configuration as L1 with the addition of a dedicated search row.

The search row added query throughput capacity, introduced query redundancy, and provided better separation of query and content processing load. Three of the servers in the dedicated search row also ran a query processing component (query). The deployment also included a query processing component on the administration server (fs4sp1.contoso.com). The Query SSA does not use that component during ordinary operation, but it can fall back to it to keep serving queries if you take down the whole search row for maintenance.

We used the following deployment.xml file to set up L2.

<?xml version="1.0" encoding="utf-8" ?> 

<deployment version="14" modifiedBy="contoso\user" 

   modifiedTime="2009-03-14T14:39:17+01:00" comment="L2" 
   xmlns="https://www.microsoft.com/enterprisesearch" 
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
   xsi:schemaLocation="https://www.microsoft.com/enterprisesearch deployment.xsd"> 

   <instanceid>L2</instanceid> 

   <connector-databaseconnectionstring> 
      <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=L2.jdbc]]> 
   </connector-databaseconnectionstring> 

   <host name="fs4sp1.contoso.com"> 
      <admin /> 
      <query /> 
      <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp2.contoso.com"> 
      <searchengine row="0" column="0" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp3.contoso.com"> 
      <content-distributor /> 
      <searchengine row="0" column="1" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp4.contoso.com"> 
      <content-distributor /> 
      <searchengine row="0" column="2" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp5.contoso.com"> 
      <content-distributor /> 
      <searchengine row="0" column="3" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp6.contoso.com"> 
      <indexing-dispatcher /> 
      <searchengine row="0" column="4" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp7.contoso.com"> 
      <indexing-dispatcher /> 
      <searchengine row="0" column="5" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp8.contoso.com"> 
      <query /> 
      <searchengine row="1" column="0" /> 
   </host> 

   <host name="fs4sp9.contoso.com"> 
      <query /> 
      <searchengine row="1" column="1" /> 
   </host> 

   <host name="fs4sp10.contoso.com"> 
      <query /> 
      <searchengine row="1" column="2" /> 
   </host> 

   <host name="fs4sp11.contoso.com"> 
      <searchengine row="1" column="3" /> 
   </host> 

   <host name="fs4sp12.contoso.com"> 
      <searchengine row="1" column="4" /> 
   </host> 

   <host name="fs4sp13.contoso.com"> 
      <searchengine row="1" column="5" /> 
   </host> 

   <searchcluster> 
      <row id="0" index="primary" search="true" /> 
      <row id="1" index="none" search="true" /> 
   </searchcluster> 

</deployment>

L3

L3 had the same configuration as L2 with an additional backup indexer enabled on the search row. We deployed the backup indexer by modifying the L2 deployment.xml file as follows.

... 
<searchcluster> 
   <row id="0" index="primary" search="true" /> 
   <row id="1" index="secondary" search="true" /> 
</searchcluster> 
... 
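The change from L2 to L3 touches a single attribute on the search row. If you prefer to script the edit, the following Python sketch shows one possible approach with the standard library. The function name `enable_backup_indexer` and the abbreviated `L2_FRAGMENT` are hypothetical; only the element names, attribute values and namespace URI come from the files above.

```python
import xml.etree.ElementTree as ET

NS_URI = "https://www.microsoft.com/enterprisesearch"
# Keep the default namespace (no prefix) when the tree is written back out.
ET.register_namespace("", NS_URI)

# Abbreviated stand-in for the L2 deployment.xml file (hosts omitted).
L2_FRAGMENT = """<deployment version="14" xmlns="https://www.microsoft.com/enterprisesearch">
   <searchcluster>
      <row id="0" index="primary" search="true" />
      <row id="1" index="none" search="true" />
   </searchcluster>
</deployment>"""


def enable_backup_indexer(xml_text, row_id="1"):
    """Return the XML text with index="secondary" set on the given row."""
    root = ET.fromstring(xml_text)
    for row in root.iter(f"{{{NS_URI}}}row"):
        if row.get("id") == row_id:
            row.set("index", "secondary")
    return ET.tostring(root, encoding="unicode")


l3_text = enable_backup_indexer(L2_FRAGMENT)
print(l3_text)
```

The sketch only rewrites the file content. It does not apply the new deployment to a running farm.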

Dataset

This section describes the test farm dataset: the database content and sizes, search indexes and external data sources.

The following table shows the overall metrics.

Object                               Value
Search index size (# of items)       103 million
Size of crawl database               358 GB
Size of crawl database log file      65 GB
Size of property database            <0.1 GB
Size of property database log file   0.6 GB
Size of SSA administration database  <0.1 GB

The next table shows the content source types we used to build the index. The numbers in the table reflect the total number of items per source and include replicated copies. The difference between the total number of items and the index size can have two causes:

  • Items may have been disabled from indexing in the content source, or

  • The document format type could not be indexed.

For SharePoint sources, the size of the respective content database in SQL represents the raw data size.

Content source           Items    Raw data size  Average size per item
File share 1 (4 copies)  2.4 M    308 GB         128 kB
File share 2 (4 copies)  58.6 M   13.4 TB        229 kB
SharePoint 1 (4 copies)  18.1 M   8.0 TB         443 kB
SharePoint 2 (3 copies)  13.6 M   6.0 TB         443 kB
HTML 1 (3 copies)        3.2 M    26 GB          8.1 kB
HTML 2 (3 copies)        9.5 M    411 GB         43 kB
Total                    105.5 M  28 TB          268 kB
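The average size per item is the raw data size divided by the number of items. The following Python sketch reproduces that calculation for the file share and HTML sources, and also estimates the share of items that did not reach the index. It assumes decimal units (1 kB = 1,000 bytes); the table does not say which convention was used, and the variable and function names are illustrative only.

```python
# Assumption: decimal units. The source does not state the convention.
KB, GB, TB = 10**3, 10**9, 10**12

# (items, raw data size in bytes) for the file share and HTML sources.
SOURCES = {
    "File share 1": (2.4e6, 308 * GB),
    "File share 2": (58.6e6, 13.4 * TB),
    "HTML 1": (3.2e6, 26 * GB),
    "HTML 2": (9.5e6, 411 * GB),
}


def average_item_kb(items, raw_bytes):
    """Average raw size per item, in kB."""
    return raw_bytes / items / KB


averages = {name: average_item_kb(*row) for name, row in SOURCES.items()}
for name, kb in averages.items():
    print(f"{name}: {kb:.1f} kB per item")

# Share of the 105.5 million source items missing from the 103 million item index.
total_items = 105.5e6
indexed_items = 103e6
not_indexed_share = (total_items - indexed_items) / total_items
print(f"Not indexed: {not_indexed_share:.1%}")
```

The computed averages agree with the table after rounding, and roughly 2 percent of the source items are not in the index.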

To reach sufficient content volume in the testing of the large scenario, we added replicas of the data sources. Each copy of each document then appeared as a unique item in the index, but they were treated as duplicates by the duplicate trimming feature. From a query matching perspective the load was similar to having all unique documents indexed, but any results from these sources triggered duplicate detection and collapsing in the search results.

Note

The large test scenario did not include people search data.

Test results

This section provides data that shows how the various deployments performed under load: crawling and indexing performance, query performance and disk usage.

Crawling and indexing performance

All the large scenario deployments were limited by the bandwidth of the content sources; L1 through L3 all achieved crawl rates of around 200 items per second.
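To put that rate in perspective, the following Python sketch estimates how long a full crawl of the 103 million item index would take. It is a rough estimate that assumes a constant, uninterrupted rate of 200 items per second. The tests did not report a measured full crawl time, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def full_crawl_days(items, items_per_second):
    """Days needed to crawl all items at a constant rate."""
    return items / items_per_second / SECONDS_PER_DAY


# 103 million indexed items at the observed rate of about 200 items per second.
days = full_crawl_days(items=103e6, items_per_second=200)
print(f"Estimated full crawl time: {days:.1f} days")
```

Under these assumptions a full crawl takes about six days.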

Query performance

L1, L2 and L3 were scaled-up versions of M1, M4 and M6 in the medium test scenario (see Test results: Medium scenario (FAST Search Server 2010 for SharePoint) for more information). To let the large deployments index more content while maintaining query performance, we added more columns to the medium deployments. The following diagram shows the query latency as a function of queries per second (QPS) for L1 through L3, with and without ongoing crawls.

Query performance graph

L2 and L3 had approximately the same performance, and were also similar to the corresponding smaller scale M4 and M6.

L1 showed a slightly different performance pattern compared to M1. Given a maximum accepted latency of 1 second, M1 achieved 27 QPS with idle crawl and 16 QPS during crawling. The corresponding numbers for L1 were 18 and 9 QPS. The additional columns in L1 compared to M1 meant that more servers were involved in each query, and the slowest server for any given query determined the query latency. This effect was much less visible for the L2 and L3 deployments, which had dedicated search rows whose servers did not have other components competing for resources.
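The QPS figures can also be read as a relative cost of crawling. The following Python sketch computes the fraction of query capacity lost while a crawl is running, using only the four numbers quoted in the text. The names `QPS` and `crawl_penalty` are illustrative.

```python
# QPS at the 1 second latency limit: (idle crawl, during crawling).
QPS = {"M1": (27, 16), "L1": (18, 9)}


def crawl_penalty(idle_qps, crawling_qps):
    """Fraction of query capacity lost while a crawl is running."""
    return (idle_qps - crawling_qps) / idle_qps


penalties = {name: crawl_penalty(*pair) for name, pair in QPS.items()}
for name, penalty in penalties.items():
    print(f"{name}: {penalty:.0%} lower QPS during crawling")

# L1 query capacity relative to M1 with idle crawl.
l1_vs_m1_idle = QPS["L1"][0] / QPS["M1"][0]
print(f"L1 reaches {l1_vs_m1_idle:.0%} of the M1 idle-crawl QPS")
```

By this measure L1 loses half of its query capacity during crawling, compared with about 41 percent for M1.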

Disk usage

The following table shows the combined disk usage on all servers in the L1 deployment. L2 and L3 used additional disk space for replication of FiXML and index files on the second row.

Content source  Raw source data size  FiXML data size  Index data size  Other data size
Total           28 TB                 1.1 TB           3.8 TB           104 GB
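These totals can be turned into ratios that are easier to compare across deployments. The following Python sketch computes the combined on-disk size for L1, its size relative to the raw source data, and the average disk footprint per indexed item. It assumes decimal units and uses the 103 million item index size from the dataset section. The variable names are illustrative.

```python
# Assumption: decimal units. The source does not state the convention.
KB, GB, TB = 10**3, 10**9, 10**12

raw_source = 28 * TB
fixml = 1.1 * TB
index = 3.8 * TB
other = 104 * GB
indexed_items = 103e6  # from the dataset section

# Combined disk usage on all L1 servers.
total_on_disk = fixml + index + other
disk_to_raw_ratio = total_on_disk / raw_source
per_item_kb = total_on_disk / indexed_items / KB

print(f"Total on disk: {total_on_disk / TB:.2f} TB")
print(f"Disk usage relative to raw data: {disk_to_raw_ratio:.1%}")
print(f"Disk usage per indexed item: {per_item_kb:.1f} kB")
```

In L1 the FiXML, index and other data together take up roughly 18 percent of the raw source data size.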

See Also

Concepts

Performance and capacity test results (FAST Search Server 2010 for SharePoint)
Test results: Extra-small scenario (FAST Search Server 2010 for SharePoint)
Test results: Medium scenario (FAST Search Server 2010 for SharePoint)