
Divisional portal environment lab study (SharePoint Server 2010)

Published: June 17, 2010

This document provides guidance on performance and capacity planning for a divisional portal based on Microsoft SharePoint Server 2010. It includes the following:

  • Test environment specifications, such as hardware, farm topology and configuration

  • Test farm dataset

  • Test data and recommendations for how to determine the hardware, topology and configuration that you must have to deploy a similar environment, and how to optimize your environment for appropriate capacity and performance characteristics

Introduction to this environment

This document outlines the test methodology and results to provide guidance for capacity planning of a typical divisional portal. A divisional portal is a SharePoint Server 2010 deployment where teams mainly do collaborative activities and some content publishing. This document assumes a "division" to be an organization inside an enterprise with 1,000 to 10,000 employees.

Different scenarios will have different requirements. Therefore, it is important to supplement this guidance with additional testing on your own hardware and in your own environment. If your planned design and workload resembles the environment described in this document, you can use this document to draw conclusions about scaling your environment up and out.

When you read this document, you will understand how to do the following:

  • Estimate the hardware that is required to support the scale you need: the number of users, the load, and the features enabled.

  • Design your physical and logical topology for optimal reliability and efficiency. High availability and disaster recovery are not covered in this document.

  • Understand the effect of ongoing search crawls on RPS for a divisional portal deployment.

The SharePoint Server 2010 environment described in this document is a lab environment that mimics a production environment at a large company. For details about the production environment, see Departmental collaboration environment technical case study (SharePoint Server 2010).

Before reading this document, make sure that you understand the key concepts behind capacity management in SharePoint Server 2010. The capacity management documentation describes the recommended approach, provides context for making effective use of the information in this document, and defines the terms used throughout it.

Glossary

There are some specialized terms that you will encounter in this document. Here are some key terms and their definitions.

  • RPS: Requests per second. The number of requests received by a farm or server in one second. This is a common measurement of server and farm load.

    Note that requests differ from page loads; each page contains several components, each of which creates one or more requests when the page is loaded. Therefore, one page load creates several requests. Typically, authentication checks and events using insignificant resources are not counted in RPS measurements.

  • Green Zone: This is the state at which the server can maintain the following set of criteria:

    • The server-side latency for at least 75% of the requests is less than 0.5 seconds.

    • All servers have a CPU Utilization of less than 50%.

    Note: Because this lab environment did not have an active search crawl running, the database server was kept at 40% CPU utilization or lower, to reserve 10% for the search crawl load. This assumes that SQL Server Resource Governor is used in production to limit the Search crawl load to 10% CPU.

    • Failure rate is less than 0.01%.

  • Red Zone (Max): This is the state at which the server can maintain the following set of criteria:

    • HTTP request throttling feature is enabled, but no 503 errors (Server Busy) are returned.

    • Failure rate is less than 0.1%.

    • The server-side latency is less than 1 second for at least 75% of the requests.

    • Database server CPU utilization is less than or equal to 75%, which allows for 10% to be reserved for the Search crawl load, limited by using SQL Server Resource Governor.

    • All Web servers have a CPU Utilization of less than or equal to 75%.

  • AxBxC (Graph notation): This is the number of Web servers, application servers, and database servers respectively in a farm. For example, 2x1x1 means that this environment has 2 Web servers, 1 application server, and 1 database server.

  • MDF and LDF: SQL Server physical files. For more information, see Files and Filegroups Architecture.
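To make these definitions concrete, here is a minimal sketch that computes RPS and a latency percentile from per-request measurements and classifies them against the zone criteria above. The record shape and function names are hypothetical, not part of SharePoint or the lab's tooling; the thresholds follow this lab's definitions, including the 40% database CPU ceiling for the Green Zone.

```python
def percentile(values, pct):
    """Nearest-rank percentile of a list of per-request latencies (seconds)."""
    ordered = sorted(values)
    rank = -(-len(ordered) * pct // 100)  # ceiling division, avoids math.ceil
    return ordered[max(int(rank), 1) - 1]

def classify(latencies_sec, window_sec, web_cpu_pct, db_cpu_pct, failure_rate):
    """Classify one measurement interval against the zone definitions above.
    Thresholds mirror this lab's criteria, including the 40% database CPU
    ceiling that reserves headroom for a Search crawl."""
    rps = len(latencies_sec) / window_sec  # counts requests, not page loads
    p75 = percentile(latencies_sec, 75)
    if p75 < 0.5 and web_cpu_pct < 50 and db_cpu_pct < 40 and failure_rate < 0.0001:
        return rps, "green"
    if p75 < 1.0 and web_cpu_pct <= 75 and db_cpu_pct <= 75 and failure_rate < 0.001:
        return rps, "red (max)"
    return rps, "over capacity"

# 4 requests in a 2-second window -> 2 RPS; counters all healthy -> Green Zone.
print(classify([0.2, 0.3, 0.3, 0.9], 2.0, 45.0, 35.0, 0.0))  # (2.0, 'green')
```

Note that a single page load fans out into several such requests, which is why RPS figures are always higher than page loads per second.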

Overview

This section provides an overview of our assumptions and our test methodology.

Assumptions

For our testing, we made the following assumptions:

  • In the scope of this testing, we did not consider disk I/O as a limiting factor. It is assumed that an infinite number of spindles are available.

  • The tests model only peak time usage on a typical divisional portal. We did not consider cyclical changes in traffic seen with day-night cycles. That also means that timer jobs which generally require scheduled nightly runs are not included in the mix.

  • There is no custom code running on the divisional portal deployment in this case. We cannot guarantee behavior of custom code/third-party solutions installed and running in your divisional portal.

  • For the purpose of these tests, all of the services databases and the content databases were put on the same instance of Microsoft SQL Server. The usage database was maintained on a separate instance of SQL Server.

  • For the purpose of these tests, BLOB cache is enabled.

  • Search crawl traffic is not considered in these tests. However, to factor in the effect of an ongoing search crawl, we modified the definitions of a healthy farm: the Green Zone criterion for SQL Server CPU was lowered to 40 percent to allow for a 10 percent tax from Search crawls, and similarly we used 80 percent SQL Server CPU as the criterion for max RPS.

Test methodology

We used Visual Studio Team System for Test 2008 SP2 to perform the performance testing. The testing goal was to find the performance characteristics of the green zone, the max zone, and various system stages in between for each topology. The Glossary defines "green zone" and "max zone" in terms of specific performance counter values; in general, a farm configuration operating around the "max zone" breakpoint can be considered under stress, whereas a farm configuration operating at the "green zone" breakpoint can be considered healthy.

The test approach was to start with the most basic farm configuration and run a set of tests. The first test gradually increased the load on the system while we monitored its performance characteristics. From this test we derived the throughput and latency at various user loads and also identified the system bottleneck. With this data, we identified the user loads at which the farm exhibited green zone and max zone characteristics. We then ran separate, longer tests at those pre-identified constant user loads. These tests verified that the farm configuration could sustain green zone and max zone performance at the respective user loads over a longer period of time.

Later, while doing the tests for the next configuration, we scaled out the system to eliminate the bottlenecks identified in the previous run. We kept iterating in this manner until we hit the SQL Server CPU bottleneck.

We started off with a minimal farm configuration of 1 combined Web server/application server and 1 database server. Through multiple iterations, we finally arrived at a farm configuration of 3 Web servers, 1 application server, and 1 database server, where the database server CPU was maxed out. Below you will find a quick summary and charts of the tests we performed in each iteration to establish the green zone and max zone for that configuration. That is followed by a comparison of the green zone and max zone across iterations, from which we derive our recommendations.
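The iteration described above amounts to sweeping user load upward and reading off two breakpoints. A rough sketch of that bookkeeping, assuming the step results have already been collected as (user load, RPS, database CPU%) tuples, a data shape we invented for illustration:

```python
def find_breakpoints(step_results):
    """Given step-load measurements as (user_load, rps, db_cpu_pct) tuples in
    increasing load order, return the highest load still in the Green Zone
    (database CPU under 40%, per this lab's definition) and the load at which
    max RPS was observed."""
    green_zone_load = None
    max_load, max_rps = None, float("-inf")
    for load, rps, db_cpu in step_results:
        if db_cpu < 40:
            green_zone_load = load  # last step that still met the criterion
        if rps > max_rps:
            max_load, max_rps = load, rps
    return green_zone_load, max_load

# Illustrative steps, loosely shaped like the 2x1x1 run reported below:
print(find_breakpoints([(40, 109, 21.2), (80, 190, 36.1),
                        (115, 251, 43.7), (200, 318, 56.2)]))  # (80, 200)
```

The longer constant-load runs are then performed at the two returned loads to confirm that the farm sustains those characteristics over time.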

The SharePoint Admin Toolkit team has built a tool named the Load Test Toolkit (LTK), which is publicly available for customers to download and use.

Specifications

This section provides detailed information about the hardware, software, topology, and configuration of the lab environment.

Hardware

The table that follows presents hardware specs for the computers that were used in this testing. Every Web server that was added to the server farm during multiple iterations of the test complies with the same specifications.

                             Web server                    Application server            Database server
Processor(s)                 2px4c@2.33GHz                 2px4c@2.33GHz                 4px4c@3.19GHz
RAM                          8 GB                          8 GB                          32 GB
Number of network adapters   2                             2                             1
Network adapter speed        1 gigabit                     1 gigabit                     1 gigabit
Load balancer type           F5 - Hardware load balancer   Not applicable                Not applicable
ULS Logging level            Medium                        Medium                        Not applicable

Software

The table that follows lists the software installed and running on the servers that were used in this testing effort.

                          Web server                        Application server                Database server
Operating system          Windows Server 2008 R2 x64        Windows Server 2008 R2 x64        Windows Server 2008 x64
Software version          SharePoint Server 2010 and        SharePoint Server 2010 and        SQL Server 2008 R2 CTP3
                          Office Web Applications           Office Web Applications
                          (pre-release versions)            (pre-release versions)
Authentication            Windows NTLM                      Windows NTLM                      Windows NTLM
Load balancer type        F5 - Hardware load balancer       Not applicable                    Not applicable
ULS Logging level         Medium                            Medium                            Not applicable
Anti-virus settings       Disabled                          Disabled                          Disabled

Services running locally:

  • Web server: Microsoft SharePoint Foundation Incoming E-Mail; Microsoft SharePoint Foundation Web Application; Microsoft SharePoint Foundation Workflow Timer Service; Search Query and Site Settings Service; SharePoint Server Search

  • Application server: Central Administration; Excel Services; Managed Metadata Web Service; Microsoft SharePoint Foundation Incoming E-Mail; Microsoft SharePoint Foundation Web Application; Microsoft SharePoint Foundation Workflow Timer Service; PowerPoint Services; Search Query and Site Settings Service; SharePoint Server Search; Visio Graphics Services; Word Viewing Service

  • Database server: Not applicable

The table indicates which services are provisioned in the test environment. Other services such as the User Profile service and Web Analytics are not provisioned.

Topology and configuration

The following diagram shows the topology used for the tests. We changed the number of Web servers from 1 to 2 to 3, as we moved between iterations, but otherwise the topology remained the same.

Farm topology diagram for this environment

Dataset and disk geometry

The test farm was populated with about 1.62 terabytes of content, distributed across five content databases of different sizes. The following table shows this distribution:

Content database             1        2        3        4             5
Content database size        36 GB    135 GB   175 GB   1.2 terabytes 75 GB
Number of sites              44       74       9        9             222
Number of webs               1544     2308     2242     2041          1178
RAID configuration           0        0        0        0             0
Number of spindles for MDF   1        1        5        3             1
Number of spindles for LDF   1        1        1        1             1

Transactional mix

The following are important notes about the transactional mix:

  • There are no My Sites provisioned on the divisional portal. Also, the User Profile service, which supports My Sites, is not running on the farm. The transactional mix does not include any My Site page/web service hits or traffic related to Outlook Social Connector.

  • The test mix does not include any traffic generated by co-authoring on documents.

  • The test mix does not include traffic from Search crawls. However, this was factored into our tests by modifying the Green Zone definition to be 40 percent SQL Server CPU usage instead of the standard 50 percent, to allow 10 percent for the search crawl. Similarly, we used 80 percent SQL Server CPU as the criterion for max RPS.

The following table describes the overall transaction mix. The percentages total 100.

Feature or Service                     Operation                                                 Read/write   Percentage of mix
ECM                                    Get static files                                          r            8.93%
                                       View home page                                            r            1.52%
Microsoft InfoPath                     Display/edit upsize list item and new forms               r            0.32%
                                       Download file by using "Save as"                          r            1.39%
Microsoft OneNote 2010                 Open Microsoft Office OneNote 2007 file                   r            13.04%
Search                                 Search through OSSSearch.aspx or SearchCenter             r            4.11%
Workflow                               Start autostart workflow                                  w            0.35%
Microsoft Visio                        Render Visio file in PNG/XAML                             r            0.90%
Office Web Applications - PowerPoint   Render Microsoft PowerPoint, scroll to 6 slides           r            0.05%
Office Web Applications - Word         Render and scroll Microsoft Word doc in PNG/Silverlight   r            0.24%
Microsoft SharePoint Foundation        List - Check out and then check in an item                w            0.83%
                                       List - Get list                                           r            0.83%
                                       List - Outlook sync                                       r            1.66%
                                       List - Get list item changes                              r            2.49%
                                       List - Update list items and add new items                w            4.34%
                                       Get view and view collection                              r            0.22%
                                       Get webs                                                  r            1.21%
                                       Browse to Access denied page                              r            0.07%
                                       Browse to list feeds                                      r            0.62%
                                       Browse to viewlists                                       r            0.03%
                                       Browse to default.aspx (home page)                        r            1.70%
                                       Upload doc to doc lib                                     w            0.05%
                                       Browse to List/Library's default view                     r            7.16%
                                       Delete doc in doclib by using DAV                         w            0.83%
                                       Get doc from doclib by using DAV                          r            6.44%
                                       Lock and unlock a doc in doclib by using DAV              w            3.32%
                                       Propfind list by using DAV                                r            4.16%
                                       Propfind site by using DAV                                r            4.16%
                                       List document by using FPSE                               r            0.91%
                                       Upload doc by using FPSE                                  w            0.91%
                                       Browse to all site content page                           r            0.03%
                                       View RSS feeds of lists or wikis                          r            2.03%
Excel Services                         Render small/large Excel files                            r            1.56%
Workspaces                             WXP - Cobalt internal protocol                            r            23.00%
                                       Full file upload using WXP                                w            0.57%

Results and analysis

This section presents the results and analysis of the tests for each farm configuration.

Results from 1x1 farm configuration

Summary of results

  • On a 1 Web server and 1 database server farm, in addition to Web server duties, the same computer was also acting as application server. Clearly this computer (still called Web server) was the bottleneck. As presented in the data here, the Web server CPU reached around 86% utilization when the farm was subjected to user load of 125 users by using the transactional mix described earlier in this document. At that point, the farm exhibited max RPS of 101.37.

  • Even at a small user load, Web server utilization was always too high to consider this farm as a healthy farm. For the workload and dataset that we used for the test, we do not recommend this configuration as a real deployment.

  • By the definition of "green zone", there is no real "green zone" for this farm: it is always under stress, even at a small load. As for the "max zone", at the smallest load at which the farm was in the "max zone", the RPS was 75.

  • Because the Web server was the bottleneck due to its dual role as an application server, for the next iteration, we separated out the application server role onto its own computer.

Performance counters and graphs

The following table presents various performance counters captured during testing a 1x1 farm at different steps in user load.

User Load                    50       75       100      125
RPS                          74.958   89.001   95.79    101.37
Latency (sec)                0.42     0.66     0.81     0.81
Web server CPU (%)           79.6     80.1     89.9     86
Application server CPU (%)   N/A      N/A      N/A      N/A
Database server CPU (%)      15.1     18.2     18.6     18.1
75th percentile (sec)        0.3      0.35     0.55     0.59
95th percentile (sec)        0.71     0.77     1.03     1

The following chart shows the RPS and latency results for a 1x1 configuration.

Chart with RPS and latency at 1x1 scale

The following chart shows performance counter data in a 1x1 configuration.

Chart with performance counters at 1x1 scale

Results from 1x1x1 farm configuration

Summary of results

  • On a 1 Web server, 1 application server and 1 database server farm, the Web server was the bottleneck. As presented in the data in this section, the Web server CPU reached around 85% utilization when the farm was subjected to user load of 150 users by using the transactional mix described earlier in this document. At that point, the farm exhibited max RPS of 124.1.

  • This configuration delivered "green zone" RPS of 99, with 75th percentile latency being 0.23 sec, and the Web server CPU hovering around 56 % utilization. This indicates that this farm can healthily deliver an RPS of around 99. "Max zone" RPS delivered by this farm was 123 with latencies of 0.25 sec and the Web server CPU hovering around 85%.

  • Because the Web server CPU was the bottleneck in this iteration, we relieved the bottleneck by adding another Web server for the next iteration.

Performance counters and graphs

The following table presents various performance counters captured during testing a 1x1x1 farm, at different steps in user load.

User Load                    25      50      75      100     125     150
RPS                          53.38   91.8    112.2   123.25  123.25  124.1
Latency (sec)                0.22    0.23    0.27    0.32    0.35    0.42
Web server CPU (%)           34.2    56      71.7    81.5    84.5    84.9
Application server CPU (%)   23.2    33.8    34.4    32      30.9    35.8
Database server CPU (%)      12.9    19.7    24.1    25.2    23.8    40.9
75th percentile (sec)        0.54    0.52    0.68    0.71    0.74    0.88

The following chart shows RPS and latency results for a 1x1x1 configuration.

Chart with RPS and latency at 1x1x1 scale

The following chart shows performance counter data in a 1x1x1 configuration.

Chart with performance counters at 1x1x1 scale

Results from 2x1x1 farm configuration

Summary of results

  • On a 2 Web server, 1 application server and 1 database server farm, the Web server was the bottleneck. As presented in the data in this section, Web server CPU reached around 76% utilization when the farm was subjected to user load of 200 users by using the transactional mix described earlier in this document. At that point, the farm exhibited max RPS of 318.

  • This configuration delivered "green zone" RPS of 191, with 75th percentile latency being 0.37 sec, and Web server CPU hovering around 47 % utilization. This indicates that this farm can healthily deliver an RPS of around 191. "Max zone" RPS delivered by this farm was 291 with latencies of 0.5 sec and Web server CPU hovering around 75%.

  • Because the Web server CPU was the bottleneck in this iteration, we relieved the bottleneck by adding another Web server for the next iteration.

Performance counters and graphs

The following table presents various performance counters captured during testing a 2x1x1 farm, at different steps in user load.

User Load                    40      80      115     150     175     200
RPS                          109     190     251     287     304     318
Latency (sec)                0.32    0.37    0.42    0.49    0.54    0.59
Web server CPU (%)           27.5    47.3    61.5    66.9    73.8    76.2
Application server CPU (%)   17.6    29.7    34.7    38      45      45.9
Database server CPU (%)      21.2    36.1    43.7    48.5    52.8    56.2
75th percentile (sec)        0.205   0.23    0.27    0.3     0.305   0.305
95th percentile (sec)        0.535   0.57    0.625   0.745   0.645   0.57

The following chart shows RPS and latency results for a 2x1x1 configuration.

Chart with RPS and latency at 2x1x1 scale

The following chart shows performance counter data in a 2x1x1 configuration.

Chart with performance counters at 2x1x1 scale

Results from 3x1x1 farm configuration

Summary of results

  • On a 3 Web server, 1 application server and 1 database server farm, finally, the database server CPU was the bottleneck. As presented in the data in this section, database server CPU reached around 76% utilization when the farm was subjected to user load of 226 users by using the transactional mix described earlier in this document. At that point, the farm exhibited max RPS of 310.

  • This configuration delivered "green zone" RPS of 242, with 75th percentile latency being 0.41 sec, and database server CPU hovering around 44% utilization. This indicates that this farm can healthily deliver an RPS of around 242. "Max zone" RPS delivered by this farm was 318 with latencies of 0.5 sec and database server CPU hovering around 75%.

  • This was the last configuration in the series.

Performance counters and graphs

The following table presents various performance counters captured during testing a 3x1x1 farm, at different steps in user load.

User Load                    66      103     141     17      202     226
RPS                          193.8   218.5   269.8   275.5   318.25  310
Latency (sec)                0.3     0.41    0.47    0.58    0.54    0.78
Web server CPU (%)           33      38.3    45.8    43.3    51      42.5
Application server CPU (%)   28      32.6    46.5    40      45.1    43.7
Database server CPU (%)      41.6    44.2    52.6    48      61.8    75
75th percentile (sec)        0.22    0.24    0.30    0.65    0.78    0.87
95th percentile (sec)        0.49    0.57    0.72    1.49    0.51    1.43

The following chart shows RPS and latency results in a 3x1x1 configuration.

Chart with RPS and latency at 3x1x1 scale

The following chart shows performance counter data for a 3x1x1 configuration.

Chart with performance counters at 3x1x1 scale

Comparison

From the iterative tests we performed, we found out the points at which a configuration enters max zone or green zone. Here’s a table of those points.

The table and charts in this section provide a summary for all the results that were presented earlier in this article.

Topology                   1x1              1x1x1   2x1x1   3x1x1
Max RPS                    75               123     291     318
Green Zone RPS             Not applicable   99      191     242
Max latency (sec)          0.29             0.25    0.5     0.5
Green Zone latency (sec)   0.23             0.23    0.37    0.41

The following chart shows a summary of RPS at different configurations.

Chart with comparison of RPS at each scale

The following chart shows a summary of latency at different configurations.

Comparison of latency at all scales

A note on disk I/O

Disk I/O bottlenecks were not considered when making the recommendations in this document. However, it is still interesting to observe the trend. Here are the numbers:

Configuration   1x1     1x1x1   2x1x1   3x1x1
Max RPS         75      154     291     318
Reads/sec       38      34      54      58
Writes/sec      135     115     230     270

Because we ran the tests in durations of 1 hour, and because the tests use only a fixed set of sites, webs, document libraries, and so on, SQL Server could cache all the data. Thus, our testing caused very little read I/O, and we see more write I/O operations than read operations. It is important to be aware that this is an artifact of the test methodology and not a good representation of real deployments. Most typical divisional portals would have 3 to 4 times more read operations than write operations.

The following chart shows I/Ops at different RPS.

Chart with IOPs at all scales

Tests with Search incremental crawl

As we mentioned before, all the tests until now were run without Search crawl traffic. To provide information about how an ongoing search crawl can affect the performance of a farm, we measured the max user RPS and the corresponding user latencies with search crawl traffic in the mix. We added a separate Web server to the 3x1x1 farm, designated as a crawl target. We saw a 17% drop in RPS compared to the original RPS exhibited by the 3x1x1 farm.

In a separate test on the same farm, we used Resource Governor to limit the resources available to the search crawl to 10%. With Search using fewer resources, the max RPS of the farm climbed by 6%.

                               Baseline 3x1x1   Only incremental crawl   No Resource Governor   10% Resource Governor
RPS                            318              N/A                      276                    294.5
Percent of baseline RPS        100%             N/A                      83%                    88%
Database server CPU (%)        83.40            8.00                     86.60                  87.3
SA database server CPU (%)     3.16             2.13                     3.88                   4.2
Web server CPU (%)             53.40            0.30                     47.00                  46.5
Application server CPU (%)     22.10            28.60                    48.00                  41.3
Crawl Web server CPU (%)       0.50             16.50                    15.00                  12.1

The following chart shows results from tests with incremental Search crawl turned on.

Requests per second with Search running
Important: Here we are only talking about an incremental crawl, on a farm where the content does not change very much. It is important to be aware that 10% resource utilization will be insufficient for a full search crawl, and it may also prove insufficient if there are many content changes between crawls. We do not advise limiting Search resource utilization to 10% if you are running a full search crawl, or if your farm generally sees a high volume of content changes between crawls.

Summary of results and recommendations

To summarize the results from all configurations we tested:

  • With the configuration, dataset, and test workload we selected for the tests, we could scale out to a maximum of 3 Web servers before SQL Server became the CPU bottleneck. The absolute max RPS we could reach at that point was around 318.

  • With each additional Web server, the increase in RPS was almost linear. We can extrapolate that, as long as SQL Server is not the bottleneck, adding more Web servers yields a further increase in RPS.

  • Latencies were not much affected as we approached the SQL Server bottleneck.

  • An incremental Search crawl directly affects the RPS offered by a configuration. The effect can be minimized by using Resource Governor.
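The near-linear scaling observation can be turned into a rough extrapolation. The sketch below fits a least-squares line to the max RPS numbers from the 1x1x1, 2x1x1, and 3x1x1 iterations; treat the projected value only as an upper bound that holds while SQL Server is not the bottleneck, which the 3x1x1 results show it already is:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Max RPS versus number of Web servers, from the iterations above.
slope, intercept = fit_line([1, 2, 3], [123, 291, 318])
projected_4th = slope * 4 + intercept  # ~439 RPS, only if nothing else bottlenecks
```

In practice the 3x1x1 data already shows the curve flattening (291 to 318), so the honest reading is that a fourth Web server buys little without also scaling SQL Server.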

Using the results, here are a few recommendations on how to achieve even larger scale if you must have more RPS from your divisional portal:

  • A 1x1 farm can deliver up to 75 RPS, but it is usually stressed. It is not a recommended configuration for a divisional portal in production.

  • Separate the content databases and services databases onto separate instances of SQL Server. In the test workload, when SQL Server was bottlenecked on CPU, only 3% of the traffic was to the services databases, so this step would have achieved only slightly better scale-out than what we saw. In general, the scale-out gained by separating content databases and services databases is directly proportional to the traffic to the services databases in your farm.

  • Separate individual content databases onto separate instances of SQL Server. In the dataset used for testing, we had 5 content databases, all located on the same instance of SQL Server. By separating them onto different computers, you spread CPU utilization across multiple computers and can therefore see much larger RPS numbers.

  • Finally, when SQL Server is bottlenecked on CPU, adding more CPU to the SQL Server computer can increase the RPS potential of the farm almost linearly.

How to translate these results into your deployment

In this article, we discussed results measured in RPS and latency, but how do you apply these in the real world? Here is some math based on our experience with a divisional portal internal to Microsoft.

A divisional portal at Microsoft that supports around 8,000 employees collaborating heavily experiences an average RPS of 110. That gives a users-to-RPS ratio of ~72 (that is, 8000/110). Using this ratio and the results discussed earlier in this article, we can estimate how many users a particular farm configuration can support healthily:

Farm configuration   "Green Zone" RPS   Approximate number of users it can support
1x1x1                99                 7128
2x1x1                191                13752
3x1x1                242                17424

Of course, this is directly applicable only if your transactional mix and hardware are exactly the same as those used for these tests. Your divisional portal may have a different usage pattern, so the ratio may not apply directly. However, we expect it to be approximately applicable.
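The arithmetic behind the estimates is simply the Green Zone RPS multiplied by the users-to-RPS ratio. A small sketch (the ratio of 72 comes from the 8,000-employee example above; your own transactional mix will yield a different ratio):

```python
# Users-to-RPS ratio observed on a Microsoft-internal divisional portal:
# ~8000 heavily collaborating employees generating an average of 110 RPS.
USERS_PER_RPS = 72  # 8000/110, rounded down as in the article

def supported_users(green_zone_rps, users_per_rps=USERS_PER_RPS):
    """Rough capacity estimate; only as good as how closely your
    transactional mix matches the one used in these tests."""
    return green_zone_rps * users_per_rps

for topology, rps in [("1x1x1", 99), ("2x1x1", 191), ("3x1x1", 242)]:
    print(topology, supported_users(rps))
```

To apply this to your own deployment, first measure your actual average RPS and concurrent user count to derive your own ratio, then substitute it for the constant above.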

About the authors

Gaurav Doshi is a Program Manager for SharePoint Server at Microsoft.

Raj Dhrolia is a Software Test Engineer for SharePoint Server at Microsoft.

Wayne Roseberry is a Principal Test Lead for SharePoint Server at Microsoft.
