Estimate performance and capacity requirements for social environments (SharePoint Server 2013) - SharePoint Server

01/20/2023

APPLIES TO: SharePoint Server 2013. Does not apply to SharePoint Server 2016, SharePoint Server 2019, SharePoint Server Subscription Edition, or SharePoint in Microsoft 365.

To create a performance and capacity plan for an enterprise intranet My Site and social computing portal solution, this article contains information about the following areas:

Lab environment specifications, such as hardware, farm topology, and farm configuration
The test farm workload and dataset that was used to generate test load
Test results and analysis that demonstrate and explain trends in throughput, latency, and hardware demand under load at specific scale points.

Use the information in this article to understand the following concepts:

Characteristics of the scenario under both normal and peak loads
How performance trends change when farm servers are scaled out
How to estimate an appropriate starting point for your planned architecture
Important factors to consider when you plan for the resources your farm will need to maintain acceptable levels of performance under peak load

Introduction to this environment

Enterprises often use SharePoint Server 2013 to publish My Site and social computing portals that authenticated users access on an intranet site. This article contains capacity and performance data to help plan the number of computers to use and the types of computers that are required to publish My Site and social computing portals in SharePoint Server 2013.

Additional guidance explains how to scale out servers in a SharePoint Server 2013 enterprise My Site and social computing portal solution. Capacity planning informs decisions about hardware to purchase and system configurations that optimize your solution.

Because individual SharePoint Server 2013 farms are unique, each farm has different requirements that depend on hardware, user behavior, the configuration of installed features, and many other factors. Therefore, supplement this guidance with additional testing on your own hardware in your own environment. If your planned design and workload resembles the environment described in this article, you can use this article to draw conclusions about how to scale your environment.

Test results in this article were produced in a test lab, using a workload, dataset, and architecture to simulate a production environment under highly controlled conditions. While great care was exercised in designing these tests, the performance characteristics of a test lab are never the same as the behavior of a production environment. These test results do not represent the performance and capacity characteristics of a production farm. Instead, the test results demonstrate observed trends in throughput, latency, and hardware demand. Use the analysis of the observed data to help you plan capacity and manage your own farm.

This article includes the following:

Specifications, which include hardware, topology, and configuration
The workload, which includes an analysis of the demand on the farm, the number of users, and usage characteristics
The dataset, such as database sizes and content types
Test results and analysis to scale out web servers

Before you read this article, read the following articles to make sure that you understand the key concepts behind capacity management in SharePoint Server 2013.

These articles provide the following information:

The recommended approach to capacity management
How to make effective use of the information in this article

Glossary

The following list contains definitions for key terms found in this article:

RPS: Requests per second. RPS is the number of requests that a farm or server receives in one second. This is a common measurement of server and farm load.

Important

Note that requests differ from page loads. Each page contains several components, each of which creates one or more requests when a browser loads a page. Therefore, one page load creates several requests. Authentication checks and events that use insignificant resources typically are not counted in RPS measurements.

Green Zone: Green Zone represents a defined set of load characteristics under normal operating conditions, up to expected daily peak loads. A farm that operates in this range should be able to sustain response times and latency that are within acceptable parameters.

This is the state at which the server can maintain the following set of criteria:
- The server-side latency for at least 75 percent of the requests is less than 0.5 seconds.
- All servers maintain an average CPU utilization of less than 50 percent.
- Failure rate is less than 0.1 percent.

Red Zone (Max): Red Zone represents a defined set of load characteristics under peak operation conditions. At Red Zone, the farm experiences very high transient resource demands that it can sustain only for limited periods before failures and other performance and reliability issues occur.

This is the state at which the server can maintain the following set of criteria for a limited duration:
- Server-side latency for at least 75 percent of the requests is less than 1 second.
- Average database server CPU utilization is less than 80 percent.
- Failure rate is less than 0.1 percent.
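
The two zone definitions can be read as a simple threshold check. The following Python sketch is illustrative only: the `Sample` structure, its field names, and the `classify_zone` function are assumptions made for this example and are not part of any SharePoint tooling. The thresholds are copied from the definitions above.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Steady-state measurements for one load level (hypothetical structure)."""
    p75_latency_s: float     # server-side latency at the 75th percentile, in seconds
    max_avg_cpu_pct: float   # highest average CPU utilization across all servers
    db_avg_cpu_pct: float    # average CPU utilization on the database server
    failure_rate_pct: float  # failed requests as a percentage of all requests


def classify_zone(s: Sample) -> str:
    """Return 'green', 'red', or 'beyond red' based on the glossary thresholds."""
    if s.failure_rate_pct >= 0.1:
        return "beyond red"
    if s.p75_latency_s < 0.5 and s.max_avg_cpu_pct < 50:
        return "green"
    if s.p75_latency_s < 1.0 and s.db_avg_cpu_pct < 80:
        return "red"
    return "beyond red"
```

A measurement that satisfies the Green Zone criteria also satisfies the latency and failure-rate criteria for Red Zone, so the sketch checks Green Zone first.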

Overview

This section summarizes our scaling approach, the relationship between this lab environment and a similar case study environment, and our test methodology.

Scaling approach

We recommend that you scale the computers in your environment in the specific order we followed for scaling our test lab environment. This approach will enable you to find the best configuration for your workload.

We divided the performance test cycles into three workload categories. The primary parameter that determined the category boundary was the number of user profiles, which we set at 10K, 100K, and 500K. Another parameter was the number of active users who were carrying out actions related to the social set of features. Using both the number of users with a profile and the number of active users, we ran tests to simulate usage of the application that would be similar to actual deployments. The following table depicts the initial data set and the number of active users.

Initial Data Set

Entity	% of users with this feature	Small (10K users)	Medium (100K users)	Large (500K users)
Number of user profiles for users	100%	10K	100K	500K
Number of provisioned My Sites	100%	10K	100K	500K
Number of user profiles that have user photos	50%	5K	50K	250K
Number of user profiles that have posts	10%	1K	10K	50K
Number of teams		1,860	18,600	93K
Number of active users per day	10%	1K	10K	50K
Number of active users per hour	5%	500	5K	25K
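
The percentage column in the preceding table can be applied to other deployment sizes when you plan your own dataset. The following Python sketch is a minimal illustration; the dictionary keys and the `initial_dataset` function are names invented for this example, and the percentages are taken from the table.

```python
# Percentage of users with each feature, from the Initial Data Set table.
FEATURE_PERCENT = {
    "user_profiles": 100,
    "provisioned_my_sites": 100,
    "profiles_with_photos": 50,
    "profiles_with_posts": 10,
    "active_users_per_day": 10,
    "active_users_per_hour": 5,
}


def initial_dataset(total_users: int) -> dict:
    """Scale each table row to the given number of user profiles."""
    return {name: total_users * pct // 100 for name, pct in FEATURE_PERCENT.items()}


for size in (10_000, 100_000, 500_000):
    print(size, initial_dataset(size))
```

Integer percentages are used so that the results are exact whole numbers of users.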

Testing focused on the following key scenarios:

News Feed page access and other actions
Profile page
Site feed page access and other actions
Outlook Social Connector Activity Feed Sync
OneDrive page access
OneDrive client usage

To simulate a realistic deployment scenario, all tests were run on a database that already had data. The dataset was a model of a tree organization with an average of 4-6 users per team, and 3-4 levels deep. To generate these numbers, we analyzed traffic from an internal social site. The following table describes the set of parameters that we used to build the initial data set.

Data model for initial database

Data entity description	Number
Average number of users on a team	5
Average number of levels per organization	4
Number of teams per 1,000 users	186
Average number of colleagues a user follows	50
Number of User Profile properties	93
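
As a quick consistency check, the teams-per-1,000-users ratio in this table reproduces the team counts in the Initial Data Set table. The following Python sketch shows the arithmetic; the function names are assumptions for this example, and the total number of follow relationships is a derived estimate rather than a figure reported in the tests.

```python
TEAMS_PER_1000_USERS = 186      # from the data model table
AVG_COLLEAGUES_FOLLOWED = 50    # from the data model table


def team_count(total_users: int) -> int:
    """Number of teams implied by the teams-per-1,000-users ratio."""
    return total_users * TEAMS_PER_1000_USERS // 1000


def follow_relationships(total_users: int) -> int:
    """Estimated number of 'user follows colleague' relationships (derived value)."""
    return total_users * AVG_COLLEAGUES_FOLLOWED


for users in (10_000, 100_000, 500_000):
    print(users, team_count(users), follow_relationships(users))
```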

The following table describes the set of parameters in terms of actions that would result in the data population:

Usage characteristics

Parameter	Number or percentage
Percentage of users with 1-3 posts	10%
Average number of posts per user	2
Average number of replies per post	2
Percentage of posts that are Liked	15%
Percentage of posts with links	5%
Percentage of posts with tags	12%
Percentage of posts with user mentions	8%
Percentage of posts with image attached	5%

To create each of our scale tests, we applied the following action mix to the preceding data set and the number of active users:

User READ Actions

User action	% of user taking this action	Scenario	Feature or URL
Navigate to My Site home page	12%	Newsfeed	Newsfeed page (http://my/default.aspx)
Navigate to the user's public profile page	8%	Profile	Profile page (http://my/person.aspx?accountname=<alias>)
Navigate to the user's private profile page	4%	Profile	Profile page (http://my/person.aspx)
Automatic syncing of activity feed	32%	Outlook Social Connector	none
Navigate to the People I'm following page	3%	Follow People List	http://my/MyPeople.aspx
Navigate to the default document library	6%	OneDrive	https://msft-my.spoppe.com/personal/<user>/Documents
Navigate to followed documents page	3%	OneDrive	https://msft-my.spoppe.com/personal/<user>/Social/FollowedContent.aspx
Navigate to the site feed page	8%	Site Feed	Site Feed page (https://<domain>/teams/<site>/newsfeed.aspx)
View all replies on a thread	1%	Newsfeed	Newsfeed page (http://my/default.aspx)
View Everyone feed	3%	Newsfeed	Newsfeed page (http://my/default.aspx)
View more posts on the newsfeed	2%	Newsfeed	Newsfeed page (http://my/default.aspx)
View the @mentions page	1%	Newsfeed	Newsfeed page (http://my/default.aspx)
View newsfeed (Mobile)	1%	Mobile	Mobile Representational State Transfer (REST) Call
View categorized newsfeed	3%	Mobile	Mobile REST Call

User WRITE Actions

User action	Percentage	Scenario	Feature or URL
Create root post in the feed	0.5%	Newsfeed	Newsfeed page (http://my/default.aspx)
Like a post in the feed	0.3%	Newsfeed	Newsfeed page (http://my/default.aspx)
Reply to a post in the feed	0.7%	Newsfeed	Newsfeed page (http://my/default.aspx)
Create post in the feed with @mention	0.1%	Newsfeed	Newsfeed page (http://my/default.aspx)
Create root post in the site feed	0.5%	Site Feed	Site feed page (https://<domain>/teams/<site>/newsfeed.aspx)
Create post in the site feed with @mention	0.5%	Site Feed	Site feed page (https://<domain>/teams/<site>/newsfeed.aspx)
Reply to a post in the site feed	0.15%	Site Feed	Site feed page (https://<domain>/teams/<site>/newsfeed.aspx)
Create post in the site feed with a tag	0.05%	Site Feed	Site feed page (https://<domain>/teams/<site>/newsfeed.aspx)

OneDrive client actions

User action	Percentage	Scenario	Feature or URL
OneDrive initial sync	0.2%	OneDrive	Initial Sync
OneDrive incremental sync - download a file	0.88%	OneDrive	Incremental Sync
OneDrive incremental sync - no changes	8.1%	OneDrive	Incremental Sync
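
A load generator can reproduce an action mix like this one by drawing actions at random, weighted by their percentages. The following Python sketch is a minimal illustration and not the test harness that was used in the lab. `ACTION_MIX` contains only an excerpt of the rows from the preceding tables, and `sample_actions` is a hypothetical helper.

```python
import random
from collections import Counter

# Excerpt of the action mix (percentage of user actions), copied from the tables above.
ACTION_MIX = {
    "Navigate to My Site home page": 12.0,
    "Navigate to the user's public profile page": 8.0,
    "Automatic syncing of activity feed": 32.0,
    "Navigate to the site feed page": 8.0,
    "Create root post in the feed": 0.5,
    "Reply to a post in the feed": 0.7,
    "OneDrive incremental sync - no changes": 8.1,
}


def sample_actions(n: int, rng: random.Random) -> list:
    """Draw n actions at random, weighted by their share of the mix."""
    actions = list(ACTION_MIX)
    weights = list(ACTION_MIX.values())
    # random.choices accepts relative weights, so they do not need to sum to 100.
    return rng.choices(actions, weights=weights, k=n)


counts = Counter(sample_actions(10_000, random.Random(0)))
for action, count in counts.most_common():
    print(f"{count:6d}  {action}")
```

Passing a seeded `random.Random` instance keeps a simulated run repeatable, which helps when you compare results across topology changes.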

Test Methodology

We started with a minimum SharePoint Server 2013 farm configuration for social features. We applied a characteristic social load to the test farm and increased the load until we observed levels of normal and maximum server capacity. We analyzed bottlenecks at each of these load levels and added machines of the overloaded role to scale out the farm configuration. This addition alleviated the bottlenecks in each case and provided a view of scalability characteristics of the server for a particular dataset. We repeated this scale-out process for three deployment sizes to provide representative summaries of a SharePoint Server 2013 farm's scalability characteristics and guidelines for capacity planning.

Specifications

This section provides detailed information about the hardware, software, topology, and configuration of the lab environment.

Important

All web servers and application servers in the test lab were virtualized by using Hyper-V hosts. Database servers were not virtualized. The physical host hardware and virtual machine virtual hardware are detailed separately in the following sections.

Hardware

The following table lists hardware specifications for the computers that were used in this test. Front-end web servers that were added to the server farm during multiple iterations of the test also complied with these specifications.

Hyper-V Hosts

The farm includes a total of three identically configured Hyper-V hosts, and each host runs one to four virtual machines.

Host hardware	Value
Processor(s)	2 Quad-core 2.27 GHz processors
RAM	64 GB
Operating System	Windows Server 2008 R2 SP1
Number of network adapters	2
Network adapter speed	1 Gigabit

Virtual web servers and application servers

The farm has from one to eight virtual web servers. An additional dedicated virtual server runs the Distributed Cache Service.

Note

In a production environment, dedicated servers that run the Distributed Cache Service are typically deployed in a highly available configuration. For test purposes, we used a single dedicated server for Distributed Cache because high availability was not a critical factor.

VM hardware	Web servers
Processors	4 virtual processors
RAM	12 GB
Operating system	Windows Server 2008 R2 SP1
Size of the SharePoint drive	100 GB
Number of network adapters	2
Network adapter speed	1 Gigabit
Authentication	Windows NTLM
Load balancer type	F5 Big IP
Services running locally	Microsoft SharePoint Foundation Web Application, Microsoft SharePoint Foundation Incoming E-Mail, Microsoft SharePoint Foundation Workflow Timer Service, Managed Metadata Web Service, User Profile Service

VM hardware	Cache
Processors	4 virtual processors
RAM	12 GB
Operating system	Windows Server 2008 R2 SP1
Size of the SharePoint drive	100 GB
Number of network adapters	2
Network adapter speed	1 Gigabit
Authentication	Windows NTLM
Services running locally	Distributed Cache, Microsoft SharePoint Foundation Workflow Timer Service

VM hardware	Search query component
Processors	4 virtual processors
RAM	12 GB
Operating system	Windows Server 2008 R2 SP1
Number of network adapters	2
Network adapter speed	1 Gigabit
Authentication	Windows NTLM
Services running locally	Microsoft SharePoint Foundation Web Application, Microsoft SharePoint Foundation Incoming E-Mail, Microsoft SharePoint Foundation Workflow Timer Service, Search Query and Site Settings Service, SharePoint Server Search

VM Hardware	Search index component
Processors	4 virtual processors
RAM	12 GB
Operating system	Windows Server 2008 R2 SP1
Number of network adapters	2
Network adapter speed	1 Gigabit
Authentication	Windows NTLM
Services running locally	Microsoft SharePoint Foundation Web Application, Microsoft SharePoint Foundation Incoming E-Mail, Microsoft SharePoint Foundation Workflow Timer Service, SharePoint Server Search

Database servers

One physical database server runs the default SQL Server instance that has the SharePoint databases. This article does not track the logging database.

Note

If you enable usage reporting, we recommend that you store the logging database on a separate Logical Unit Number (LUN). Large deployments and some medium deployments might require a dedicated logging database server to accommodate the demand on the processor that a high volume of logging events generates.

In this lab environment, logging was constrained, and the logging database was stored in a separate instance of SQL Server.

Database Server - Default Instance


Processors	2 Quad-core 3.3 GHz processors
RAM	32 GB
Operating system	Windows Server 2008 R2 SP1
Storage and geometry	Direct Attached Storage (DAS); internal array with 6 x 300 GB 15krpm disks; external array with 15 x 450 GB 15krpm disks; 50 x content data (external RAID10, 2x3 spindles 300 GB each); 50 x content logs (internal RAID10, 2x2 spindles 300 GB each); 1 x temp data (internal RAID10, 2x2 spindles 300 GB each); 1 x temp log (internal RAID10, 2x2 spindles 300 GB each)
Number of network adapters	1
Network adapter speed	1 Gigabit
Authentication	Windows NTLM
Software version	SQL Server 2008 R2

Topology

The following table shows the topology for this lab environment:

Lab environment topology

Role	Small deployment (10K users)	Medium deployment (100K users)	Large deployment (500K users)
Web server	2-4	4-8	8
Cache	1	1-2	3
SQL Server	1	1-2	2

Test Process

Important

The tests only model normal business-hour usage on a typical social computing portal. We did not consider the cyclical changes in user-generated traffic that day/night cycles produce. We tested Timer jobs such as Profile Synchronization and People Search Crawl, which require significant resources, independently with the same test workload to determine their effect.

The tests focus on social operations, such as newsfeeds, social tagging, and reading people profiles. The test mix includes a small amount of typical collaboration traffic to better simulate a production environment. We expect these results will help to design a separate portal that is dedicated to My Sites and social features.

The test mix does not include traffic from the Search Content Crawl.

We conducted tests against small, medium, and large deployments for the social features. To configure server hardware, we started at minimum configurations for the smallest size and populated the test database with the dataset as described in the Scaling approach section.

We used Visual Studio Team System (VSTS) to simulate a workload and apply a characteristic social load, driving a small load against the server at first. We uniformly increased this load slowly and recorded performance metrics on all server roles until we observed maximum RPS. This was recognizable as the state where an increase of the applied load on the farm resulted in no increase in delivered RPS output because of server bottleneck constraints.
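
The maximum-RPS condition described above (more applied load, no more delivered RPS) can be detected from the recorded metrics. The following Python sketch is a simplified illustration and not the analysis that was used in the lab. The `find_max_rps` function, the 2 percent tolerance, and the sample numbers are all assumptions for this example.

```python
def find_max_rps(samples, tolerance=0.02):
    """Find the point where delivered RPS stops growing as applied load increases.

    samples: list of (applied_load, delivered_rps) pairs, sorted by applied load.
    tolerance: relative RPS gain below which the curve is treated as flat.
    Returns the (applied_load, delivered_rps) pair at the start of the plateau,
    or the last sample if no plateau is observed.
    """
    for (load, rps), (_, next_rps) in zip(samples, samples[1:]):
        if next_rps <= rps * (1 + tolerance):
            return load, rps
    return samples[-1]


# Hypothetical ramp: delivered RPS levels off at about 280.
ramp = [(100, 95), (200, 190), (300, 280), (400, 282), (500, 281)]
print(find_max_rps(ramp))
```

If the function returns the last sample, the ramp probably ended before the farm reached its bottleneck, and the test should be extended with higher load.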

From these recorded metrics, we defined green zone and red zone states, which represent the normal and fully loaded states of the VM server at a given computer configuration. We then applied a steady load at both green zone and red zone levels to analyze steady-state performance metrics at these loads. This provided a server health and performance representation of the VM server under these key load conditions for each topology configuration.

After we understood the green and red load characteristics and the scaling curve for each topology, we identified the scaling bottleneck that limited RPS. In the case of social workload, this was typically web server CPU for small datasets. For larger datasets, we also observed memory pressure on the Distributed Cache nodes. We added virtual servers of the overloaded role to the configuration to remove the bottlenecks in each case and continue the scale-out process. We then repeated the analysis of performance trends and their conformity to green and red zone definitions at each configuration size until we achieved requirements for a specific deployment size.
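
One scale-out step from this process can be sketched as "find the most constrained role, then add one server of that role." The following Python example is a minimal illustration under stated assumptions: the `scale_out` function is hypothetical, and `pressure` is an invented normalized measure (0.0 to 1.0) of each role's most constrained resource, such as CPU on web servers or memory on Distributed Cache nodes.

```python
def scale_out(topology: dict, pressure: dict):
    """Add one server to the most constrained role.

    topology: role name -> current number of servers
    pressure: role name -> utilization (0.0 to 1.0) of that role's most
              constrained resource at maximum RPS
    Returns the role that was scaled and the updated topology.
    """
    role = max(pressure, key=pressure.get)
    updated = dict(topology)  # copy so that the input is not modified
    updated[role] = updated.get(role, 0) + 1
    return role, updated


role, farm = scale_out(
    {"web": 2, "cache": 1, "sql": 1},
    {"web": 0.95, "cache": 0.40, "sql": 0.35},  # hypothetical measurements
)
print(role, farm)
```

After each step, the load test is repeated against the new topology before another server is added.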

After we understood each deployment size, we reconfigured the test farm to the smallest configuration of the next larger size and populated the dataset as described in the Scaling approach section. We then repeated the analysis and scale-out cycle and measured the scale-out characteristics of each dataset size.

Results and analysis

This section shows the measured results for the three deployment sizes. Specifically, it shows how scaling out the server farm by adding web servers affects green and red zone RPS, latency, and average CPU usage.

The following trends were consistent across all three deployment sizes:

Both red and green zone RPS increase linearly with the number of virtual web servers.
The primary bottleneck across all tested configurations was the web server CPU.
At red zone, latency increases slightly as we add web servers and increase load. This is caused by added pressure on SQL Server and the Distributed Cache service (which is running on all web servers in the test farm).
Additionally, average CPU usage on SQL Server and Distributed Cache computers increases as the number of web servers increases. This is caused by additional caching load on SQL Server and the Distributed Cache service.
Green zone latency remains fairly flat as the number of web servers increases. This is because the web servers are not overburdened at green zone load levels.
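
Because RPS grows roughly linearly with the number of web servers, a first estimate of the web server count for a target load is the target RPS divided by the RPS that one server delivers. The following Python sketch shows that arithmetic. The `web_servers_needed` function and the measurement values are hypothetical, and the per-server figure must come from your own green zone tests.

```python
import math


def web_servers_needed(target_rps: float, measured: list) -> int:
    """Estimate the web server count for a target green zone RPS.

    measured: (web_server_count, green_zone_rps) pairs from your own tests.
    Assumes the linear trend holds up to the target, which must be verified
    by testing because SQL Server and Distributed Cache load also increase.
    """
    # Use the lowest observed RPS per server to keep the estimate conservative.
    rps_per_server = min(rps / count for count, rps in measured)
    return math.ceil(target_rps / rps_per_server)


# Hypothetical measurements: 2 servers delivered 200 RPS, 4 delivered 390 RPS.
print(web_servers_needed(600, [(2, 200.0), (4, 390.0)]))
```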

Small Scale Results

The following graph shows how increasing the number of web servers affects RPS for both green and red zones.

Screenshot showing how increasing the number of front-end web servers affects RPS for both Green and RED zones in the 10k user scenario.

The following graph shows how increasing the number of web servers affects latency for both green and red zone load levels.

Screenshot showing how increasing the number of front-end web servers affects latency for both Green and RED zones in the 10k user scenario.

The following graph shows how increasing the number of web servers affects average CPU usage for both green and red zone load levels.

Screenshot showing how increasing the number of front-end web servers affects CPU usage for both Green and RED zones in the 10k user scenario.

Medium Scale Results

The following graph shows how increasing the number of web servers affects RPS for both green and red zones.

Screenshot showing how increasing the number of front-end web servers affects RPS for both Green and RED zones in the 100k user scenario.

The following graph shows how increasing the number of web servers affects latency for both green and red zone load levels.

Screenshot showing how increasing the number of front-end web servers affects latency for both Green and RED zones in the 100k user scenario.

The following graph shows how increasing the number of web servers affects average CPU usage for both green and red zone load levels.

Screenshot showing how increasing the number of front-end web servers affects CPU usage for both Green and RED zones in the 100k user scenario.

Large Scale Results

The following graph shows how increasing the number of web servers affects RPS for both green and red zones.

Screenshot showing how increasing the number of front-end web servers affects RPS for both Green and RED zones in the 500k user scenario.

The following graph shows how increasing the number of web servers affects latency for both green and red zone load levels.

Screenshot showing how increasing the number of front-end web servers affects latency for both Green and RED zones in the 500k user scenario.

The following graph shows how increasing the number of web servers affects average CPU usage for both green and red zone load levels.

Screenshot showing how increasing the number of front-end web servers affects CPU usage for both Green and RED zones in the 500k user scenario.

As the number of web servers increases, the following events occur:

Average CPU usage increases for SQL Server and Distributed Cache nodes because of added burden on these shared resources.
Average web server CPU usage at red zone decreases slightly because the bottleneck shifts slightly to the SQL Server and Distributed Cache computers.
Average web server CPU usage at green zone remains constant because servers are kept at recommended load levels.

Recommendations

A successful SharePoint Server 2013 social deployment as measured by performance depends on the following factors:

The number of active users who you want to support
The expected transaction mix of read and write operations
How the load is distributed across the farm servers

The expected number of active users is one key factor in determining the number of servers that you should plan to have in the topology. The number of active users also determines how the services that the social scenario requires are distributed across the servers.

Though our testing used a typical dataset and applied the load complexity that you might expect in a real-world customer deployment, every deployment is unique. Your capacity planning effort should consider expected usage characteristics, feature configuration, and hardware resource availability. Some factors that can significantly affect the capacity numbers are as follows:

A pattern of increased email usage might increase the load that the Outlook Social Connector generates.
A significant increase in the percentage of write actions (for example, an increase in tagging or @mention) in the transaction mix might increase the load on the database server.
You can add or remove web servers to balance CPU load between web servers, SQL Server, and Distributed Cache nodes.

Carefully follow standard SharePoint Server 2013 configuration guidance for optimal performance. Considerations that matter specifically for social transactions are as follows:

Separate physical disks for Profile DB - Because of the heavy disk usage that social transactions can have on Profile DB, we recommend that you keep Profile DB on its own set of physical disks on the server that runs SQL Server.
Memory requirements for User Profile service application - The User Profile service application is located on front-end web servers and relies heavily on its in-memory cache. Make sure that front-end web servers have enough RAM to cache many requests for data. Minimum recommended RAM is 12 GB per front-end web server.
Memory requirements for Distributed Cache servers - Social features, microblogging in particular, depend heavily on sufficient and robust Distributed Cache storage. Low memory situations on these computers can degrade the capacity of the SharePoint farm while this cache is being repopulated. Therefore, we recommend that you configure servers that host the Distributed Cache to use at least 12 GB of RAM, and that you scale out as needed based on the total number of users in the deployment.
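
The memory recommendations above can be checked against a planned server list before deployment. The following Python sketch is illustrative only: the role names, the tuple layout, and the `undersized_servers` function are assumptions for this example, and the 12 GB minimums are taken from the preceding recommendations.

```python
# Minimum recommended RAM in GB per role, from the recommendations above.
MIN_RAM_GB = {"web": 12, "distributed_cache": 12}


def undersized_servers(servers: list) -> list:
    """Return the names of servers whose RAM is below the minimum for their role.

    servers: list of (name, role, ram_gb) tuples; roles without a listed
    minimum are not checked.
    """
    return [
        name
        for name, role, ram_gb in servers
        if ram_gb < MIN_RAM_GB.get(role, 0)
    ]


planned = [  # hypothetical server plan
    ("WFE1", "web", 12),
    ("WFE2", "web", 8),
    ("DC1", "distributed_cache", 16),
    ("SQL1", "sql", 32),
]
print(undersized_servers(planned))
```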

The SharePoint Server 2013 social deployment makes it mandatory to provision a personal site for every user who wants to use social features. Plan the growth of the creation of personal site collections at the level of the content database. For more information about how to scale personal site collections, see Software boundaries and limits for SharePoint 2013.
