Test results: Large scenario (FAST Search Server 2010 for SharePoint)

07/22/2014

Applies to: FAST Search Server 2010

In the large Microsoft FAST Search Server 2010 for SharePoint test scenario, we targeted a large test corpus. To meet freshness goals, incremental crawls were likely to occur during business hours. The scenario included up to 100 million items.

We set up the parent Microsoft SharePoint Server 2010 farm with two front-end web servers, two application servers and one database server, and arranged them as follows:

We used the SharePoint Server 2010 crawler, the indexing connector framework and the FAST Content Search Service Application (Content SSA) to crawl content. We distributed two crawl components for the Content SSA across the two application servers, mainly to accommodate I/O limitations in the test setup (1 Gbit/s network), where a single network adapter would have been a bottleneck.
One of the application servers also hosted Central Administration for the farm.
The database server hosted the crawl databases, the FAST Search Server 2010 for SharePoint administration databases and the other SharePoint Server 2010 databases.

We used no separate data storage because the application servers and front-end web servers only needed space for the operating system, application binaries and log files.

In this article:

Test deployments
Test characteristics
Test results

Test deployments

Within the large scenario, we tested the following FAST Search Server 2010 for SharePoint deployments:

Name	Description
L1	Single-row, six-column deployment, with an additional administration node (7 servers in total)
L2	Same as L1, with the addition of a dedicated search row (13 servers in total)
L3	Same as L2, but with a backup indexer enabled on the search row
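As a rough illustration of what the column count means for index distribution, the following sketch divides the 103 million indexed items from the dataset table by the six columns of L1. It assumes that items are spread evenly across columns, which this article does not state; the actual distribution may vary.

```latex
% Assumption: even distribution of indexed items across the six columns of L1
\frac{103 \times 10^{6}\ \text{items}}{6\ \text{columns}} \approx 17.2 \times 10^{6}\ \text{items per column}
```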

Test characteristics

This section provides detailed information about the hardware, software, topology and configuration of the test environment.

Hardware/Software

We tested the specified deployments using the following hardware and software.

FAST Search Server 2010 for SharePoint servers

Windows Server 2008 R2 x64 Enterprise Edition
2x Intel L5520 CPUs with Hyper-Threading and Turbo Boost switched on
24 GB memory
1 Gbit/s network card
Storage subsystem:
- OS: 2x 146 GB 10k RPM SAS disks in RAID1
- Application: 12x 146 GB 10k RPM SAS disks in RAID50 (two parity groups of 6 drives each). Total formatted capacity of 2 TB.
- Disk controller: HP Smart Array P410, firmware 3.30
- Disks: HP DG0146FARVU, firmware HPD6

SharePoint Server 2010 servers

Windows Server 2008 R2 x64 Enterprise Edition
2x Intel L5420 CPUs
16 GB memory
1 Gbit/s network card
Storage subsystem for OS/programs: 2x 146 GB 10k RPM SAS disks in RAID1

SQL servers

Same specification as for the SharePoint Server 2010 servers, but with an additional RAID array for SQL data: 6x 146 GB 10k RPM SAS disks in RAID5.

Topology

This section describes the topology of all the test deployments.

L1 was a single-row, six-column deployment with an additional administration node. We used the following deployment.xml file to set up L1.

<?xml version="1.0" encoding="utf-8" ?> 

<deployment version="14" modifiedBy="contoso\user" 

   modifiedTime="2009-03-14T14:39:17+01:00" comment="L1" 
   xmlns="https://www.microsoft.com/enterprisesearch" 
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
   xsi:schemaLocation="https://www.microsoft.com/enterprisesearch deployment.xsd"> 

   <instanceid>L1</instanceid> 

   <connector-databaseconnectionstring> 
      <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=L1.jdbc]]>
   </connector-databaseconnectionstring> 

   <host name="fs4sp1.contoso.com"> 
      <admin /> 
      <query /> 
      <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp2.contoso.com"> 
      <query /> 
      <searchengine row="0" column="0" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp3.contoso.com"> 
      <content-distributor /> 
      <searchengine row="0" column="1" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp4.contoso.com"> 
      <content-distributor /> 
      <searchengine row="0" column="2" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp5.contoso.com"> 
      <content-distributor /> 
      <searchengine row="0" column="3" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp6.contoso.com"> 
      <indexing-dispatcher /> 
      <searchengine row="0" column="4" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp7.contoso.com"> 
      <indexing-dispatcher /> 
      <searchengine row="0" column="5" /> 
      <document-processor processes="12" /> 
   </host> 

   <searchcluster> 
      <row id="0" index="primary" search="true" /> 
   </searchcluster> 

</deployment>

L2 had the same configuration as L1 with the addition of a dedicated search row.

The search row added query throughput capacity, introduced query redundancy and provided better separation of query and content processing load. Three of the servers in the dedicated search row also included a query processing component (query). The deployment also included a query processing component on the administration server (fs4sp1.contoso.com). During ordinary operation, the Query SSA does not use the query processing component on the administration server, but it may use it as a fallback to serve queries if you take down the whole search row for maintenance.

We used the following deployment.xml file to set up L2.

<?xml version="1.0" encoding="utf-8" ?> 

<deployment version="14" modifiedBy="contoso\user" 

   modifiedTime="2009-03-14T14:39:17+01:00" comment="L2" 
   xmlns="https://www.microsoft.com/enterprisesearch" 
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
   xsi:schemaLocation="https://www.microsoft.com/enterprisesearch deployment.xsd"> 

   <instanceid>L2</instanceid> 

   <connector-databaseconnectionstring> 
      <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=L2.jdbc]]>
   </connector-databaseconnectionstring> 

   <host name="fs4sp1.contoso.com"> 
      <admin /> 
      <query /> 
      <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp2.contoso.com"> 
      <searchengine row="0" column="0" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp3.contoso.com"> 
      <content-distributor /> 
      <searchengine row="0" column="1" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp4.contoso.com"> 
      <content-distributor /> 
      <searchengine row="0" column="2" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp5.contoso.com"> 
      <content-distributor /> 
      <searchengine row="0" column="3" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp6.contoso.com"> 
      <indexing-dispatcher /> 
      <searchengine row="0" column="4" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp7.contoso.com"> 
      <indexing-dispatcher /> 
      <searchengine row="0" column="5" /> 
      <document-processor processes="12" /> 
   </host> 

   <host name="fs4sp8.contoso.com">
      <query />
      <searchengine row="1" column="0" />
   </host>

   <host name="fs4sp9.contoso.com">
      <query />
      <searchengine row="1" column="1" />
   </host>

   <host name="fs4sp10.contoso.com">
      <query />
      <searchengine row="1" column="2" />
   </host>

   <host name="fs4sp11.contoso.com">
      <searchengine row="1" column="3" />
   </host>

   <host name="fs4sp12.contoso.com">
      <searchengine row="1" column="4" />
   </host>

   <host name="fs4sp13.contoso.com">
      <searchengine row="1" column="5" />
   </host>

   <searchcluster> 
      <row id="0" index="primary" search="true" /> 
      <row id="1" index="none" search="true" /> 
   </searchcluster> 

</deployment>

L3 had the same configuration as L2 with an additional backup indexer enabled on the search row. We deployed the backup indexer by modifying the L2 deployment.xml file as follows.

... 
<searchcluster> 
   <row id="0" index="primary" search="true" /> 
   <row id="1" index="secondary" search="true" /> 
</searchcluster> 
...

Dataset

This section describes the test farm dataset: the database content and sizes, search indexes and external data sources.

The following table shows the overall metrics.

Object	Value
Search index size (# of items)	103 million
Size of crawl database	358 GB
Size of crawl database log file	65 GB
Size of property database	<0.1 GB
Size of property database log file	0.6 GB
Size of SSA administration database	<0.1 GB

The next table shows the content source types we used to build the index. The numbers in the table reflect the total number of items per source and include replicated copies. The difference between the total number of items and the index size has two possible causes:

Items may have been disabled from indexing in the content source, or
The document format type could not be indexed.

For SharePoint sources, the size of the respective content database in SQL represents the raw data size.

Content source	Items	Raw data size	Average size per item
File share 1 (4 copies)	2.4 M	308 GB	128 kB
File share 2 (4 copies)	58.6 M	13.4 TB	229 kB
SharePoint 1 (4 copies)	18.1 M	8.0 TB	443 kB
SharePoint 2 (3 copies)	13.6 M	6.0 TB	443 kB
HTML 1 (3 copies)	3.2 M	26 GB	8.1 kB
HTML 2 (3 copies)	9.5 M	411 GB	43 kB
Total	105.5 M	28 TB	268 kB
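The raw data sizes in the table are consistent with multiplying each item count by its average item size, using decimal units (1 kB = 10^3 bytes, 1 TB = 10^12 bytes). The following worked check applies this to two rows and to the total; the small gap between the computed and listed overall average comes from rounding in the table.

```latex
% Decimal units assumed: 1 kB = 10^3 B, 1 TB = 10^12 B
\begin{aligned}
\text{SharePoint 1:}\quad & 18.1 \times 10^{6} \times 443\ \text{kB} \approx 8.0\ \text{TB} \\
\text{File share 2:}\quad & 58.6 \times 10^{6} \times 229\ \text{kB} \approx 13.4\ \text{TB} \\
\text{Overall average:}\quad & \frac{28\ \text{TB}}{105.5 \times 10^{6}\ \text{items}} \approx 265\ \text{kB} \quad (\text{table: } 268\ \text{kB})
\end{aligned}
```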

To reach sufficient content volume for testing the large scenario, we added replicas of the data sources. Each copy of each document then appeared as a unique item in the index, but the duplicate trimming feature treated the copies as duplicates. From a query matching perspective, the load was similar to having all unique documents indexed, but any results from these sources triggered duplicate detection and collapsing in the search results.

Note

The large test scenario did not include people search data.

Test results

This section provides data that shows how the various deployments performed under load: crawling and indexing performance, query performance and disk usage.

Crawling and indexing performance

Crawl rates in all the large scenario deployments were limited by the bandwidth of the content sources; L1 through L3 each achieved around 200 items per second.
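To put the crawl rate in perspective, the following back-of-the-envelope estimate converts roughly 200 items per second into the time for a full crawl of the 105.5 million items in the dataset table, and into average network throughput using the 268 kB average item size. It assumes a sustained, constant rate over the whole crawl, which real crawls do not necessarily achieve.

```latex
% Assumption: constant 200 items/s for the entire crawl; 268 kB average item size
\begin{aligned}
t_{\text{full crawl}} &\approx \frac{105.5 \times 10^{6}\ \text{items}}{200\ \text{items/s}} \approx 5.3 \times 10^{5}\ \text{s} \approx 147\ \text{h} \approx 6.1\ \text{days} \\
\text{throughput} &\approx 200\ \text{items/s} \times 268\ \text{kB} \approx 53.6\ \text{MB/s} \approx 429\ \text{Mbit/s}
\end{aligned}
```

For comparison, each server in the test setup had a 1 Gbit/s network card.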

Query performance

L1, L2 and L3 were scaled-up versions of M1, M4 and M6 in the medium test scenario (refer to Test results: Medium scenario (FAST Search Server 2010 for SharePoint) for more information). To let the large deployments index more content while maintaining query performance, we added more columns to the medium deployments. The following diagram shows the query latency as a function of QPS for L1 through L3, with and without ongoing crawls.

Query performance graph

L2 and L3 had approximately the same performance, which was also similar to that of the corresponding smaller-scale M4 and M6 deployments.

L1 showed a slightly different performance pattern compared to M1. Given a maximum acceptable latency of 1 second, M1 achieved 27 QPS with the crawl idle and 16 QPS during crawling. The corresponding numbers for L1 were 18 and 9 QPS. Because L1 had more columns than M1, more servers were involved in each query, and the slowest server for any given query at any time determined the query latency. This effect was much less visible in the L2 and L3 deployments, which had dedicated search rows where the servers did not have other components competing for resources.
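The relative drops in query capacity at the 1 second latency limit can be worked out from the numbers above. The last line is a simplified fan-out model, offered only as an illustration of the "slowest server" effect and not as the method used in these tests: if a query goes to N servers in parallel and each server independently exceeds the latency limit with probability p, the query exceeds the limit whenever any one server does. Independence and a common p across servers are assumptions.

```latex
% QPS figures from the text; the fan-out model assumes independent servers with equal p
\begin{aligned}
\text{M1, crawling vs. idle:}\quad & 1 - \tfrac{16}{27} \approx 41\%\ \text{lower QPS} \\
\text{L1, crawling vs. idle:}\quad & 1 - \tfrac{9}{18} = 50\%\ \text{lower QPS} \\
\text{L1 vs. M1, idle:}\quad & 1 - \tfrac{18}{27} \approx 33\%\ \text{lower QPS} \\
\text{L1 vs. M1, crawling:}\quad & 1 - \tfrac{9}{16} \approx 44\%\ \text{lower QPS} \\[4pt]
P(\text{query latency} > 1\ \text{s}) &= 1 - (1 - p)^{N}
\end{aligned}
```

With hypothetical values p = 0.01, the model gives about 5.9% for N = 6 versus about 3.0% for N = 3, which shows why adding columns can amplify the impact of a single slow server.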

Disk usage

The following table shows the combined disk usage on all servers in the L1 deployment. L2 and L3 used additional disk space for replication of FiXML and index files on the second row.

Content source	Raw source data size	FiXML data size	Index data size	Other data size
Total	28 TB	1.1 TB	3.8 TB	104 GB
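Relating the L1 disk usage to the raw source data gives a rough sense of the storage overhead of indexing. The following derivation uses only the figures in the table above and the 103 million indexed items from the dataset section; the per-item value is an average over all content sources.

```latex
% Figures from the L1 disk usage table; 104 GB = 0.104 TB (decimal units)
\begin{aligned}
\text{FiXML:}\quad & \tfrac{1.1\ \text{TB}}{28\ \text{TB}} \approx 3.9\% \\
\text{Index:}\quad & \tfrac{3.8\ \text{TB}}{28\ \text{TB}} \approx 13.6\% \\
\text{Total:}\quad & 1.1 + 3.8 + 0.104 \approx 5.0\ \text{TB} \approx 18\%\ \text{of raw data} \\
\text{Per indexed item:}\quad & \tfrac{5.004 \times 10^{12}\ \text{B}}{103 \times 10^{6}\ \text{items}} \approx 48.6\ \text{kB}
\end{aligned}
```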