Test results: Extra-small scenario (FAST Search Server 2010 for SharePoint)

 

Applies to: FAST Search Server 2010

With the extra-small Microsoft FAST Search Server 2010 for SharePoint test scenario we targeted a small test corpus with high query rates. The scenario had no off-business hours with reduced query load, and crawls could occur at any time. The amount of content for the scenario was up to 8 million items. We measured query performance at content volumes of 1M, 5M, and 8M items.

We set up the parent Microsoft SharePoint Server 2010 farm with four front-end web servers, one application server and one database server, and arranged them as follows:

  • We used the SharePoint Server 2010 crawler, the indexing connector framework, and the FAST Content Search Service Application (Content SSA) to crawl content. The crawl component of the Content SSA ran on the application server.

  • The application server also hosted the Central Administration for the farm

  • The database server hosted the crawl databases, the FAST Search Server 2010 for SharePoint administration databases and the other SharePoint Server 2010 databases

We did not use separate data storage because the application server and the front-end web servers only needed space for the operating system, application binaries, and log files.

In this article:

  • Test deployments

  • Test characteristics

  • Test results

Test deployments

Within the extra-small scenario, we tested the following FAST Search Server 2010 for SharePoint deployments:

Name | Description
---- | -----------
XS1  | A stand-alone server that hosted all the FAST Search Server 2010 for SharePoint components, using regular disk drives
XS2  | Same as XS1, but deployed on a Hyper-V virtual machine
XS3  | Multi-node deployment using four virtual machines running on the same physical server
XS4  | Same as XS1, with the addition of a dedicated search row
XS5  | Same as XS1, but with storage on SAS SSD drives

Test characteristics

This section provides detailed information about the hardware, software, topology and configuration of the test environment.

Hardware/Software

We tested all the specified deployments on similar hardware and software. The main difference was that we used virtualization for some setups and solid-state disk (SSD) storage for others.

FAST Search Server 2010 for SharePoint servers

  • Windows Server 2008 R2 x64 Enterprise Edition

  • 2x Intel L5520 CPUs with Hyper-threading and Turbo Boost switched on

  • 24 GB memory

  • 1 Gbit/s network card

  • Storage subsystem:

    • OS: 2x 146 GB 10k RPM SAS disks in RAID1

    • Application: 7x 146 GB 10k RPM SAS disks in RAID5. Total formatted capacity of 880 GB.

    • Disk controller: HP Smart Array P410, firmware 3.30

    • Disks: HP DG0146FARVU, firmware HPD6

Variations:

XS2/XS3 (virtualized servers running under Hyper-V):

  • 4 CPU cores

  • 8 GB memory

  • 800 GB disk on servers with an index component

XS5 (storage subsystem):

  • Application: 2x 400 GB SSD disks in RAID0. Total formatted capacity of 800 GB.

  • SSD disks: Stec ZeusIOPS MLC Gen3, part Z16IZF2D-400UCM-MSF

SharePoint Server 2010 servers:

  • Windows Server 2008 R2 x64 Enterprise Edition

  • 2x Intel L5420 CPUs

  • 16 GB memory

  • 1 Gbit/s network card

  • Storage subsystem for OS/Programs: 2x 146 GB 10k RPM SAS disks in RAID1

SQL servers:

Same specification as the SharePoint Server 2010 servers, but with an additional RAID array for SQL data: 6x 146 GB 10k RPM SAS disks in RAID5.

Topology

This section provides the deployment files (deployment.xml) we used to set up the test deployments.

XS1

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
   modifiedTime="2009-03-14T14:39:17+01:00" comment="XS1"
   xmlns="https://www.microsoft.com/enterprisesearch"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="https://www.microsoft.com/enterprisesearch deployment.xsd">

   <instanceid>XS1</instanceid>

   <connector-databaseconnectionstring>
      <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=XS1.jdbc]]>
   </connector-databaseconnectionstring>

   <host name="fs4sp1.contoso.com">
      <admin />
      <query />
      <content-distributor />
      <indexing-dispatcher />
      <searchengine row="0" column="0" />
      <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4" />
      <document-processor processes="12" />
   </host>

   <searchcluster>
      <row id="0" index="primary" search="true" />
   </searchcluster>

</deployment>

XS2

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
   modifiedTime="2009-03-14T14:39:17+01:00" comment="XS2"
   xmlns="https://www.microsoft.com/enterprisesearch"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="https://www.microsoft.com/enterprisesearch deployment.xsd">

   <instanceid>XS2</instanceid>

   <connector-databaseconnectionstring>
      <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=XS2.jdbc]]>
   </connector-databaseconnectionstring>

   <host name="fs4sp1.contoso.com">
      <admin />
      <query />
      <content-distributor />
      <indexing-dispatcher />
      <searchengine row="0" column="0" />
      <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="1" />
      <document-processor processes="4" />
   </host>

   <searchcluster>
      <row id="0" index="primary" search="true" />
   </searchcluster>

</deployment>

Note that this is the default deployment.xml file.

XS3

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
   modifiedTime="2009-03-14T14:39:17+01:00" comment="XS3"
   xmlns="https://www.microsoft.com/enterprisesearch"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="https://www.microsoft.com/enterprisesearch deployment.xsd">

   <instanceid>XS3</instanceid>

   <connector-databaseconnectionstring>
      <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=XS3.jdbc]]>
   </connector-databaseconnectionstring>

   <host name="fs4sp1.contoso.com">
      <admin />
      <query />
      <document-processor processes="4" />
   </host>

   <host name="fs4sp2.contoso.com">
      <indexing-dispatcher />
      <searchengine row="0" column="0" />
   </host>

   <host name="fs4sp3.contoso.com">
      <content-distributor />
      <document-processor processes="4" />
   </host>

   <host name="fs4sp4.contoso.com">
      <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4" />
      <document-processor processes="4" />
   </host>

   <searchcluster>
      <row id="0" index="primary" search="true" />
   </searchcluster>

</deployment>

XS4

<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
   modifiedTime="2009-03-14T14:39:17+01:00" comment="XS4"
   xmlns="https://www.microsoft.com/enterprisesearch"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="https://www.microsoft.com/enterprisesearch deployment.xsd">

   <instanceid>XS4</instanceid>

   <connector-databaseconnectionstring>
      <![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=XS4.jdbc]]>
   </connector-databaseconnectionstring>

   <host name="fs4sp1.contoso.com">
      <admin />
      <query />
      <content-distributor />
      <indexing-dispatcher />
      <searchengine row="0" column="0" />
      <webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4" />
      <document-processor processes="16" />
   </host>

   <host name="fs4sp2.contoso.com">
      <query />
      <searchengine row="1" column="0" />
   </host>

   <searchcluster>
      <row id="0" index="primary" search="true" />
      <row id="1" index="none" search="true" />
   </searchcluster>

</deployment>
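
In this deployment file, the second row in the searchcluster section (index="none" search="true") is the dedicated search row that distinguishes XS4 from XS1: it serves queries from its own copy of the index but does not perform indexing.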

XS5

Same as XS1

Dataset

This section describes the test farm dataset: the database content and sizes, the search indexes, and the external data sources.

The following table shows the overall metrics.

Object | Value
------ | -----
Search index size (# of items) | 5.4 M
Size of crawl database | 16.7 GB
Size of crawl database log file | 1.0 GB
Size of property database | < 0.1 GB
Size of property database log file | < 0.1 GB
Size of SSA administration database | < 0.1 GB

The next table shows the content source types that we used to build the index. The numbers in the table reflect the total number of items per source. The difference between the total number of items and the index size can have two causes:

  • Items may have been excluded from indexing in the content source, or

  • The document format could not be indexed.

For SharePoint Server 2010 sources, the size of the respective content database in SQL represents the raw data size.

Content source | Items | Raw data size | Average size per item
-------------- | ----- | ------------- | ---------------------
HTML 1 | 1.1 M | 8.8 GB | 8.1 kB
SharePoint 1 | 4.5 M | 2.0 TB | 443 kB
HTML 2 | 3.2 M | 137 GB | 43 kB
Total | 8.8 M | 2.2 TB | 246 kB

Note

The extra-small test scenario did not include people search data.
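
The "Average size per item" column follows directly from the Items and Raw data size columns. The following sketch is illustrative only (it was not part of the test tooling); it reproduces the per-source averages using decimal units, so small differences from the table are due to rounding.

# Reproduce the "Average size per item" column from the table above.
# Illustrative sketch only; the figures are taken from the table (decimal units).

SOURCES = {
    # name: (items, raw data size in bytes)
    "HTML 1":       (1.1e6, 8.8e9),     # 8.8 GB
    "SharePoint 1": (4.5e6, 2.0e12),    # 2.0 TB
    "HTML 2":       (3.2e6, 137e9),     # 137 GB
}

def average_item_size_kb(items, raw_bytes):
    """Average raw size per item, in kB."""
    return raw_bytes / items / 1000

for name, (items, raw_bytes) in SOURCES.items():
    print(f"{name}: {average_item_size_kb(items, raw_bytes):.0f} kB per item")

total_items = sum(items for items, _ in SOURCES.values())
total_bytes = sum(raw for _, raw in SOURCES.values())
print(f"Total: {average_item_size_kb(total_items, total_bytes):.0f} kB per item")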

Test results

This section provides data that show how the various deployments performed under load: crawling and indexing performance, query performance, and disk usage.

Crawling and indexing performance

All the tested deployments had the same CPU resources available, except for XS2. XS2 was running on a single virtual machine and was therefore limited to four CPU cores as opposed to sixteen for the other deployments. The following graph shows the average number of items per second for the different content sources during a full crawl.

Full crawl performance graph

Overall, XS2 showed a 65-70% decrease in performance compared to the deployments running on physical hardware. XS3, which ran four VMs and thus had the same hardware footprint as XS1, showed a 35-40% degradation compared to running directly on the host computer. The major degradation for XS3 stemmed from the lower I/O performance when a deployment runs on a virtual machine using a fixed-size VHD file. The split of XS3 resources across four virtual machines also resulted in more server-to-server communication.

Query performance

The following subsections describe how the deployment configurations and varying content volume affected query performance. We also tried tuning XS5 to make the most of its high-performance storage. The last subsection compares the results of this tuning with XS1 and with XS5 without tuning.

Impact of deployment configuration

The following graph shows the query performance when crawling is idle.

Impact of deployment configuration (graph 1)

XS1 and XS5 showed only minor differences, with slightly better performance for the SSD-based XS5 (running with two SSDs versus seven regular SAS spindles for XS1). As expected, the additional search row in XS4 did not improve query performance under idle crawl conditions. XS4 had the same throughput as XS1/XS5 under high load, but with slightly higher latency. The increased latency was caused by queries being directed to both search rows, which lowered the cache hit ratio and added server-to-server communication.

The virtualized deployments (XS2 and XS3) had significantly lower query performance, with more variation, than the non-virtualized options. This reduction was related to storage performance, and to the search components having at most four CPU cores at their disposal.

The following graph shows the query performance during a full crawl.

Impact of deployment configuration (graph 2)

The XS1 deployment had reduced query performance under concurrent crawling and indexing. XS5 was less affected because of its better storage performance, but we still saw CPU contention between the item processing and query components. XS4 was the least affected, as this deployment had a dedicated search row. XS4 results varied more under concurrent high query and crawl load because of competition for network resources.

The virtualized deployments did not reach 10 QPS during the full crawl. XS1 (native hardware) and XS3 (virtualized) used the same hardware, yet the non-virtualized deployment had more than five times the throughput. The difference was caused by virtualization overhead, especially in storage performance, and by the limitation of four CPU cores per virtual machine. Under high query load, the query components could use all sixteen CPU cores in XS1, whereas they were restricted to at most four CPU cores in XS3.

Impact of varying content volume

Even though the deployments in this scenario were sized for 8M items, we also tested query performance with 1M and 5M items indexed. The following graph shows how the content volume affected query performance in the XS1 deployment.

Impact of varying content volume graph

The solid lines show that maximum query capacity improved with less content: a maximum of 90 QPS at 1M items, 80 QPS at 5M items, and 64 QPS at 8M items. During crawling and indexing, the 1M index could still sustain more than 40 QPS, although with considerable variance. The variance occurred because the total index was fairly small and most of it fit inside the application and OS-level caches. Both the 5M and 8M indexes had lower maximum query performance during crawling and indexing, in the 25-30 QPS range.

Impact of tuning for high performance storage

Even though the XS5 deployment already outperformed XS1 with default settings, tuning for high-performance storage made better use of the higher IOPS potential of the SSDs. Such tuning spreads the workload across multiple smaller index partitions and allows more parallel query execution at the expense of more disk operations.

The following graph shows the result of this tuning at full capacity (8M items). The tuning enabled the SSD-based XS5 deployment to serve up to 75 QPS, and it also reduced the response time under light query load. For example, the response time at 40 QPS with idle crawl was reduced from 0.4 to 0.2 seconds. Further, the response time during crawls was better and more consistent with this tuning. The tuned XS5 deployment delivered around 40 QPS with sub-second latency during crawls, whereas XS1 delivered only 15 QPS with the same load and latency requirements.

Impact of high performance storage graph

In general, high-performance storage improves query performance, especially during concurrent content crawls, and reduces or even eliminates the performance-driven need to run search on dedicated rows. SSDs also provide sufficient performance with a smaller number of disks; in the extra-small test scenario, two SSDs outperformed seven SAS spindles. This is attractive where power or space restrictions do not allow a larger number of disks, for example on blade servers.

Disk usage

The following table shows the combined increase in disk usage on all servers after the various content sources were indexed. Note that deployments providing redundancy (multiple search/indexer rows) require additional disk space because they replicate FiXML and index data.

Content source | Items | FiXML data size | Index data size | Other data size
-------------- | ----- | --------------- | --------------- | ---------------
HTML 1 | 1.1 M | 6 GB | 20 GB | 4 GB
SharePoint 1 | 4.5 M | 41 GB | 108 GB | 15 GB
HTML 2 | 3.2 M | 27 GB | 123 GB | 22 GB
Total | 8.8 M | 74 GB | 251 GB | 41 GB
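
From the totals in the table, you can estimate the disk footprint per indexed item and the extra space that redundancy adds. The following sketch is illustrative only; it simply applies the note above that redundant search/indexer rows replicate the FiXML and index data.

# Estimate disk usage per item and the effect of a redundant row,
# based on the "Total" row of the table above. Illustrative sketch only.

ITEMS = 8.8e6
FIXML_GB = 74
INDEX_GB = 251
OTHER_GB = 41

single_row_gb = FIXML_GB + INDEX_GB + OTHER_GB
per_item_kb = single_row_gb * 1e6 / ITEMS   # GB -> kB, decimal units

print(f"Single row: {single_row_gb} GB total, ~{per_item_kb:.0f} kB per item")

# Per the note above, a redundant search/indexer row replicates the FiXML and
# index data, so it roughly adds another copy of those two data sets.
redundant_gb = single_row_gb + FIXML_GB + INDEX_GB
print(f"With one redundant row: ~{redundant_gb} GB combined")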

See Also

Concepts

Performance and capacity test results (FAST Search Server 2010 for SharePoint)
Test results: Medium scenario (FAST Search Server 2010 for SharePoint)
Test results: Large scenario (FAST Search Server 2010 for SharePoint)