
Microsoft SharePoint Server 2010 social environment: Lab study

 

Applies to: SharePoint Server 2010

Topic Last Modified: 2011-09-25

This article provides guidance on performance and capacity planning for a My Site and social computing portal based on Microsoft SharePoint Server 2010. This article describes the following:

  • Test environment specifications such as hardware, farm topology, and configuration.

  • Test farm dataset.

  • Test data and recommendations for how to determine the hardware, topology, and configuration that you must deploy in a test environment and how to optimize an environment for appropriate capacity and performance characteristics.

Here are the key findings from our test of the My Site and social computing portal:

  • The environment scaled up to eight front-end Web servers for one application server and one database server (8×1×1). The increase in throughput was almost linear throughout. We experienced no additional gains in throughput by adding more than eight front-end Web servers because the bottleneck at this point was the CPU utilization of the database server.

  • We achieved further scale-out by separating the content database and services database onto separate database servers (8×1×2).

  • We reached maximum throughput by using the 8x1x2 topology. At that point, front-end Web server utilization and application server CPU utilization were the bottleneck. Given this, it seems that, for the given hardware, dataset, and test workload, the maximum possible requests per second (RPS) is represented by Max Zone RPS for 8x1x2, which is about 1,877 RPS.

  • Looking at the trends, we think it might be possible to extract the same throughput with a healthy farm if the bottlenecks on the front-end Web server and application server are addressed. The front-end Web server bottleneck can be addressed by adding more front-end Web servers. The application server bottleneck can be addressed by using two computers to play the role of application server. However, we did not try this in the lab.

  • Latency is not affected by throughput or hardware variations.

  • If you have security trimming turned on, one front-end Web server can support from 8 through 10 RPS of Outlook Social Connector traffic. This means that one front-end Web server can support about 28,000 to 36,000 employees using the Outlook Social Connector all day. Thus, if you are rolling out the Outlook Social Connector to 100,000 employees, you will need three front-end Web servers at 10 RPS each, or four at 8 RPS each, to support the Outlook Social Connector traffic. These values can vary depending on social tagging usage at your company. If you determine that your company will have less social tagging activity than what we used in the dataset for this testing effort, your throughput per front-end Web server might exceed the range of 8 through 10 RPS.

  • The incremental People Search crawl has little effect on the throughput of the farm as long as the farm is maintained in a healthy state.

The test methodology and results in this article provide guidance for planning the capacity of a social computing portal. A social computing portal is a SharePoint Server 2010 deployment where each person in the company can maintain a user profile, find experts in the company, connect with other employees through newsfeeds, and maintain a personal site for document storage and sharing. In addition to the traffic generated by the social computing features, significant collaboration traffic is created by users who upload, share, view, and update documents on their personal sites. These results should help in designing a separate portal dedicated to My Sites and social features.

Different scenarios will have different requirements. Therefore, you must supplement this guidance with additional testing on your own hardware and in your own environment.

After you read this article, you will understand how to:

  • Estimate the hardware that is required for the scale-out that you need to support. This estimate should include the number of users, load, and the features enabled.

  • Design your physical and logical topology for optimal reliability and efficiency. High availability and disaster recovery are not covered in this article.

  • Account for the effect of ongoing People Search crawls and profile synchronizations on the RPS of a social computing portal deployment.

Before reading this article, you should read the following:

If you are interested in reading capacity planning guidance about typical collaboration scenarios, see Enterprise intranet collaboration environment technical case study (SharePoint Server 2010)

Note: There is no custom code running on the social computing portal deployment in this lab study. We cannot guarantee the behavior of custom code or third-party solutions that might be installed on your My Site and social computing portal.

Note: NTLM authentication was used for this lab study.

The following list contains definitions for key terms found in this article:

  • RPS: Requests per second. RPS is the number of requests received by a farm or server in one second. This is a common measurement of server and farm load.

    Note that requests differ from page loads. Each page contains several components, each of which creates one or more requests when a page is loaded. Therefore, one page load creates several requests. Authentication checks and events that use insignificant resources typically are not counted in RPS measurements.

  • Green Zone: This is the state at which the server can maintain the following set of criteria:

    • The server-side latency for at least 75 percent of the requests is less than 1 second.

    • All servers have a CPU utilization of less than 50 percent.

    • Failure rate is less than 0.01 percent.

    Note: Because this lab environment did not have an active search crawl running, the database server was kept at 40 percent CPU utilization or lower to reserve 10 percent for the search crawl load. This assumes that Microsoft SQL Server Resource Governor is used in production to limit the search crawl load to 10 percent CPU utilization.

  • Red Zone (Max): This is the state at which the server can maintain the following set of criteria:

    • HTTP request throttling feature is enabled, but no 503 (Server Busy) errors are returned.

    • The failure rate for HTTP requests is less than 0.1 percent.

    • The server-side latency is less than 1 second for at least 75 percent of the requests.

    • Database server CPU utilization is less than 80 percent. This allows for 10 percent of utilization reserved for the search crawl load and is limited by using SQL Server Resource Governor.

  • AxBxC (Graph notation): This is the number of front-end Web servers, application servers, and database servers in a farm. For example, 8x1x2 means that this environment has eight front-end Web servers, one application server, and two database servers.

  • VSTS Load: These are threads that are used internally by Visual Studio Team System (VSTS) to simulate virtual users. We increased the VSTS load to generate more and more RPS for the topology.

  • MDF and LDF: SQL Server physical files. For more information, see Files and Filegroups Architecture.
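The Green Zone and Red Zone (Max) criteria defined above can be expressed as a small check. The following is a minimal sketch, not part of the lab tooling: the `Sample` type, its field names, and the `classify` function are hypothetical, and the 40 percent database limit for the Green Zone reflects the lab adjustment described in the note.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One measured load step. All field names are hypothetical."""
    latency_p75_s: float      # 75th percentile server-side latency, in seconds
    max_cpu_pct: float        # highest CPU among front-end and application servers
    db_cpu_pct: float         # database server CPU utilization
    failure_rate_pct: float   # failed HTTP requests, in percent
    got_503: bool = False     # True if any 503 (Server Busy) error was returned


def classify(s: Sample, green_db_cpu_limit: float = 40.0) -> str:
    """Return 'green', 'red', or 'beyond' based on the criteria in the glossary."""
    latency_ok = s.latency_p75_s < 1.0
    if (latency_ok and s.max_cpu_pct < 50.0
            and s.db_cpu_pct <= green_db_cpu_limit
            and s.failure_rate_pct < 0.01):
        return "green"
    if (latency_ok and not s.got_503
            and s.db_cpu_pct < 80.0
            and s.failure_rate_pct < 0.1):
        return "red"
    return "beyond"


print(classify(Sample(0.12, 47.84, 5.45, 0.0)))
print(classify(Sample(0.22, 75.13, 7.64, 0.0)))
```

The two sample calls use the 1x1x1 Green Zone and Max Zone figures reported later in this article; the failure rate of 0.0 is an assumed value because the article does not list it.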

This section summarizes our scaling approach, the relationship between this lab environment and a similar case study environment, and our test methodology.

We recommend that you scale the computers in your environment in a specific order. It is the same approach we took for scaling our lab environment. This approach will enable you to find the best configuration for your workload. The approach we took is as follows:

  1. First, we scaled out the front-end Web servers. They were scaled out as far as possible under the tested workload until the database server was unable to accommodate any more requests from the front-end Web servers.

  2. Until this point, the content databases and the services databases (such as the profile database and the social database) were on the same database server. When we noticed that the database server was the bottleneck, we scaled out the database server by moving the content databases to another database server. After this, the load on the database servers created by the front-end Web servers decreased to the point where we were able to scale out the front-end Web servers even more.

  3. In the lab environment, we did not test the scale out beyond this. However, if you need more scale, the next logical step would be to have two computers share application server responsibilities.
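The scaling order in the steps above can be summarized as a simple lookup. This is only an illustrative sketch: the function name, the bottleneck labels, and the returned wording are hypothetical, and the application server step was not tested in the lab.

```python
def next_scaling_step(bottleneck: str, content_db_separated: bool) -> str:
    """Suggest the next step, following the order described in this article."""
    if bottleneck == "front-end":
        return "Add another front-end Web server."
    if bottleneck == "database" and not content_db_separated:
        return "Move the content databases to a second database server."
    if bottleneck == "application":
        return "Share the application server role across two computers (not tested in the lab)."
    return "Not covered by this lab study."


print(next_scaling_step("front-end", content_db_separated=False))
print(next_scaling_step("database", content_db_separated=False))
```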

We began with a minimal farm configuration of one front-end Web server, one application server, and one SQL Server-based computer. Through multiple iterations, we ended at a farm configuration of eight front-end Web servers, one application server, and two SQL Server-based computers. In the Results and analysis section later in this article, you will find a comparison of Green Zone and Max Zone performance characteristics across different iterations. How we discovered the Green Zone and the Max Zone for each iteration is covered in the Results from iterations section.

The lab environment outlined in this article is a smaller scale model of a production environment at Microsoft. Although there are significant differences between the two environments, it can be useful to view them side by side because both are My Site and social computing environments. As a result, the patterns observed should be similar.

The lab environment contains a dataset that closely mimics the dataset from the production environment. The workload that is used for testing is largely similar to the workload seen in the production environment, with few significant differences. The most significant of the differences is that, in the lab environment, we use fewer distinct users to perform the operations and we perform operations on a smaller number of user profiles compared to the production environment. Also, the lab tests occur over a shorter time. All of these affect the number of cache hits that occur for the user profile cache that is maintained on the application server.

The User Profile service caches recently used user profiles on the application server. The default size of this cache is 256 MB, which translates to approximately 500,000 user profiles. Because the number of user profiles that was used in testing was limited to 1,500, and the duration of the tests was less than the recycle time of the cache, cache hits usually occurred. Therefore, the throughput numbers presented in this article are higher than you should expect in production. Account for cache misses in your environment and plan for a lower throughput number.

For a detailed case study of a production My Site and social computing portal at Microsoft, see Social environment technical case study (SharePoint Server 2010).

This article provides results from a test lab environment. Because this was a lab environment, we were able to control certain factors to show specific aspects of performance for this workload. In addition, certain elements of the production environment were excluded from the lab environment to simplify testing overhead. Note that we do not recommend omitting these elements for production environments:

  • Between test runs, we modified only one variable at a time to make it easy to compare results between test runs.

  • The database servers that were used in this lab environment were not part of a cluster because redundancy was not necessary for the purposes of these tests.

Search crawl was not running during the tests, whereas it might be running in a production environment. To take this into account, we lowered the SQL Server CPU utilization in our definition of Green Zone and Red Zone (Max) to accommodate the resources that a search crawl would have consumed if it were running at the same time as our tests.

This section provides detailed information about the hardware, software, topology, and configuration of the lab environment.

The following table lists hardware specifications for the computers that were used in this test. Front-end Web servers that were added to the server farm during multiple iterations of the test also complied with these specifications.

 

| | Front-end Web server | Application server | Database server |
|---|---|---|---|
| Processor | 2px4c@2.33 GHz | 2px4c@2.33 GHz | 4px4c@3.10 GHz |
| RAM | 8 GB | 8 GB | 32 GB |
| Number of network adapters | 2 | 2 | 1 |
| Network adapter speed | 1 gigabit | 1 gigabit | 1 gigabit |
| Load balancer type | F5 - Hardware load balancer | Not applicable | Not applicable |
| ULS logging level | Medium | Medium | Not applicable |

The following table lists software specifications for the computers that were used in this test. Front-end Web servers that were added to the server farm during multiple iterations of the test also complied with these specifications.

 

| | Front-end Web server | Application server | Database server |
|---|---|---|---|
| Operating system | Windows Server 2008 R2 x64 | Windows Server 2008 R2 x64 | Windows Server 2008 x64 |
| Software version | Microsoft SharePoint 4763.1000 (RTM), Office Web Applications 4763.1000 (RTM) | Microsoft SharePoint 4763.1000 (RTM), WAC 4763.1000 (RTM) | SQL Server 2008 R2 CTP3 |
| Load balancer type | F5 - Hardware load balancer | Not applicable | Not applicable |
| ULS logging level | Medium | Medium | Not applicable |
| Antivirus settings | Disabled | Disabled | Disabled |

Services running

  • Front-end Web server: SharePoint Foundation Incoming E-Mail; SharePoint Foundation Web Application; SharePoint Foundation Workflow Timer Service

  • Application server: Central Administration; Excel Calculation Services; Managed Metadata Web Service; SharePoint Foundation Incoming E-Mail; SharePoint Foundation Web Application; SharePoint Foundation Workflow Timer Service; PowerPoint service; User Profile service; User Profile Synchronization service; Word Viewing service

The following diagram shows the topology for this lab environment:

Farm topology diagram for this environment

The test farm was populated with:

  • 166.5 GB of My Site content, evenly distributed across 10 content databases

  • 27.7 GB of Profile database content

  • 3.7 GB of social database content (GUIDs for social tags, notes, and ratings)

  • 0.14 GB of Managed Metadata database content (text for social tags and corresponding GUIDs)

The following table explains the dataset in detail.

 

| Item | Value |
|---|---|
| Number of user profiles | ~150K |
| Average number of memberships per user | 74 |
| Average number of direct reports per user | 6 |
| Average number of colleagues per user | 28 |
| Number of total profile properties | 101 |
| Number of multiple-value properties | 21 |
| Number of audiences | 130 |
| Number of My Sites | ~10K |
| Number of blog sites | ~600 |
| Total number of events in activity feed | 798K* |
| Number of social tags and ratings | 5.04 million** |

* A social tagging study from del.icio.us suggests that an active user creates 4.2 tags/month. Tags, in this context, refer to any activity of assigning metadata to URLs. This includes keyword tags, ratings, and notes. This means an active user creates 4.2 tags/30 days = 0.14 tags/day. Assuming one-third of the social portal users are tagging, there are 150K/3 × 0.14 tagging events per day. Activity feed tables maintain activity for 14 days. Therefore, the total number of tagging events in the activity feed table equals 150K/3 × 0.14 × 14. In addition to tagging events, if we assume that active users generate one additional event per day, such as a profile property update or status update, we have 150K/3 × 1 × 14 events added to activity feed tables. Thus, the total number of events in the activity feed tables equals 150K/3 × 1.14 × 14 = 798K. Of those events, 98K are tagging activities that may trigger security trimming. The rest of the events will be randomly distributed among status update and profile property changes.

** Assumes that one-third of the population are active users and each creates 4.2 tags per month, where a tag can mean a keyword tag, a note, or a rating. Assuming the farm exists for two years, the total number of tags will be 150K/3 × 4.2 × 12 × 2 = 5.04 million.
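The arithmetic in the two footnotes can be checked with a few lines of Python. This is a minimal sketch that reuses only the figures stated in the footnotes; the variable names are illustrative.

```python
profiles = 150_000
active_users = profiles // 3      # one-third of users are assumed active
tags_per_month = 4.2              # per active user, from the cited study
tags_per_day = 0.14               # 4.2 / 30, as stated in the footnote
feed_retention_days = 14          # activity feed tables keep 14 days
farm_age_years = 2

tagging_events = round(active_users * tags_per_day * feed_retention_days)
other_events = active_users * 1 * feed_retention_days
total_feed_events = tagging_events + other_events
total_tags = round(active_users * tags_per_month * 12 * farm_age_years)

print(tagging_events)      # tagging events in the activity feed
print(total_feed_events)   # all events in the activity feed
print(total_tags)          # tags, notes, and ratings after two years
```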

The following table explains the disk geometry in detail.

 

| Database | ContentDB 1, 2, 3, 4 | ContentDB 5, 6 | ContentDB 7, 8 | ContentDB 9, 10 | Profile | Social | Metadata |
|---|---|---|---|---|---|---|---|
| Database size | 61.4 GB | 39 GB | 32.3 GB | 33.7 GB | 27.7 GB | 3.7 GB | 0.14 GB |
| RAID configuration | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Number of spindles for MDF | 1 | 1 | 1 | 1 | 6 | 1 | 1 |
| Number of spindles for LDF | One physical spindle shared by all databases |

Important notes:

  • The tests only model prime-time usage on a typical social computing portal. We did not consider the cyclical changes in user-generated traffic that is seen with day/night cycles. Timer jobs such as Profile Synchronization and People Search Crawl, which require significant resources, were tested independently with the same test workload to determine their effect.

  • The tests focus more on social operations, such as newsfeeds, social tagging, and reading people profiles. They include a small amount of typical collaboration traffic; however, that is not the focus. We expect these results will help in designing a separate portal dedicated to My Sites and social features.

  • The test mix does not include traffic from the Search Content Crawl. This was factored into our tests, however, by modifying the Green Zone definition to 40 percent SQL Server CPU utilization, instead of the standard 50 percent, to allow for 10 percent CPU utilization for the search crawl. Similarly, we used 80 percent SQL Server CPU as the criteria for Max RPS.

  • In addition to the test mix listed in the following table, we also added 8 RPS for each front-end Web server for Outlook Social Connector traffic. We had security trimming turned on. The Secure Token Service showed significant signs of stress as we approached about 8 RPS of Outlook Social Connector traffic on a single front-end Web server when obtaining colleague activities. This was a function of the dataset, test workload, and hardware we used in the lab for testing. You might see completely different behavior. To avoid additional stress on the Secure Token Service, we decided to add Outlook Social Connector traffic as a function of the number of front-end Web servers in each iteration. Thus, for 1x1x1, we have 8 RPS of Outlook Social Connector traffic, whereas for 2x1x1 we have 16 RPS of Outlook Social Connector traffic, and so on.
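The rule in the last note, 8 RPS of Outlook Social Connector traffic per front-end Web server, can be sketched as follows. The helper names `parse_topology` and `osc_rps` are hypothetical; the topology labels follow the AxBxC notation used in this article.

```python
OSC_RPS_PER_FRONT_END = 8  # Outlook Social Connector RPS added per front-end Web server


def parse_topology(label: str) -> tuple:
    """Split an AxBxC label into (front-end, application, database) server counts."""
    front_end, application, database = (int(part) for part in label.lower().split("x"))
    return front_end, application, database


def osc_rps(label: str) -> int:
    """Outlook Social Connector traffic added to the test mix for a topology."""
    return parse_topology(label)[0] * OSC_RPS_PER_FRONT_END


for topology in ("1x1x1", "2x1x1", "8x1x2"):
    print(topology, osc_rps(topology))
```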

The overall test mix is presented in the following table.

 

| Test | Read/write | Percent of mix |
|---|---|---|
| Add a colleague. | Write | 2.11 |
| Create a rating on a URL, write a note, or tag a URL. | Write | 3.22 |
| List operations documents. | Read/Write | 2.36 |
| Get published links to model client calls to PublishedLinksService.asmx. | Read | 6.92 |
| Get RSS feeds from lists. | Read | 3.72 |
| View all items in document libraries and lists on My Site. | Read | 1.07 |
| View a blog post. | Read | 0.04 |
| View various My Site pages (my content, colleagues, newsfeed, my profile, someone else’s profile, organization browser, memberships, tags, and notes). | Read | 3.87 |
| Sync for Shared OneNote files. | Read | 10.0 |
| Edit my profile page or status message; update picture. | Write | 2.31 |
| Office Web Apps: Open and scroll files (PowerPoint, Word, and Excel). | Read | 0.13 |
| List sync with Outlook. | Read | 48.16 |
| Upload a document. | Write | 0.09 |
| Load pages, document libraries, and folders from the content database. | Read | 15.93 |
| Co-author documents. | Read/Write | 0.17 |

The following table describes additional Outlook Social Connector scenario test mix generating 8 RPS per front-end Web server.

 

| Test | Read/write | Percent of mix |
|---|---|---|
| Auto-sync my colleagues. | Read | 4 percent |
| Auto-sync my colleagues' news feeds. | Read | 96 percent |

As mentioned earlier, we started with a minimal farm configuration of one front-end Web server, one application server, and one SQL Server-based computer. Through multiple iterations, we finally ended at a farm that has eight front-end Web servers, one application server, and two SQL Server computers. For each of these iterations, we performed step-load tests to determine the Green Zone and Max Zone. In the following table, you will find a comparison of these Green Zone and Max Zone performance characteristics for the different iterations.

The following table and charts provide a summary for comparison and analysis.

The Green Zone performance characteristics across topologies are summarized in the following table.

 

| Topology | 1x1x1 | 2x1x1 | 3x1x1 | 5x1x1 | 8x1x1 | 8x1x2 |
|---|---|---|---|---|---|---|
| Green Zone RPS | 137.25 | 278.08 | 440.72 | 683.07 | 793.67 | 873.4 |
| Green Zone 75th percentile latency (seconds) | 0.12 | 0.16 | 0.14 | 0.16 | 0.31 | 0.32 |
| Green Zone front-end Web server CPU (percent) | 47.84 | 46.88 | 48.68 | 46.13 | 31.79 | 36.90 |
| Green Zone application server CPU (percent) | 9.45 | 18.88 | 26.91 | 35.58 | 48.73 | 47.20 |
| Green Zone SQL Server CPU (percent) | 5.45 | 10.61 | 16.46 | 24.73 | 30.03 | 32.40 (17.9 for content DB and 14.5 for services DB) |

The following chart shows the variations in the CPU utilization plotted on RPS and offered by different topologies for Green Zone results.

Chart showing CPU utilization with RPS in the Gree

As illustrated in the previous chart:

  • RPS increased throughout as we added more computers to the topologies.

  • It is clear that front-end Web server CPU was the driving factor leading the topology to the boundary of the Green Zone until 5x1x1. At 8x1x1, the application server CPU reached the boundary for the Green Zone before the front-end Web servers could reach the Green Zone boundaries.

  • Throughout the test, the SQL Server CPU remained in very healthy territory.
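One way to read the Green Zone results is to divide the RPS by the number of front-end Web servers in each topology. The following is a rough sketch using only the Green Zone RPS values reported above; the function name is illustrative.

```python
# Green Zone RPS per topology, copied from the table above.
green_zone_rps = {
    "1x1x1": 137.25, "2x1x1": 278.08, "3x1x1": 440.72,
    "5x1x1": 683.07, "8x1x1": 793.67, "8x1x2": 873.4,
}


def rps_per_front_end(results: dict) -> dict:
    """Divide each topology's RPS by its front-end Web server count."""
    per_server = {}
    for topology, rps in results.items():
        front_ends = int(topology.split("x")[0])
        per_server[topology] = round(rps / front_ends, 2)
    return per_server


for topology, value in rps_per_front_end(green_zone_rps).items():
    print(f"{topology}: {value} RPS per front-end Web server")
```

The per-server figure stays close to 137 through 147 RPS up to 5x1x1 and drops at eight front-end Web servers, which is consistent with the application server CPU reaching the Green Zone boundary first at 8x1x1.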

The following table provides a summary of results across topologies for Max Zone.

 

| Topology | 1x1x1 | 2x1x1 | 3x1x1 | 5x1x1 | 8x1x1 | 8x1x2 |
|---|---|---|---|---|---|---|
| Max Zone RPS | 203.28 | 450.75 | 615.00 | 971.13 | 1655 | 1877 |
| Max Zone latency (seconds) | 0.22 | 0.23 | 0.22 | 0.22 | 0.31 | 0.32 |
| Max Zone front-end Web server CPU (percent) | 75.13 | 78.17 | 70.00 | 67.02 | 67 | 71.6 |
| Max Zone application server CPU (percent) | 12.97 | 27.07 | 28.40 | 48.28 | 67.1 | 73.4 |
| Max Zone SQL Server CPU (percent) | 7.64 | 16.06 | 21.00 | 38.38 | 79.5 | 74.9 (45.9 for content DB and 29 for services DB) |

The following chart presents variations in CPU utilization plotted on RPS and offered by different topologies for Max Zone results.

Chart showing CPU utilization with RPS in the MaxZ

As illustrated in the previous chart:

  • RPS increased throughout as we added more computers to topologies.

  • It is clear that front-end Web server CPU was the bottleneck until 5x1x1. At 8x1x1, the SQL Server CPU became the bottleneck.

  • Initially, the application server CPU utilization was higher than the SQL Server CPU utilization. However, it seems that the growth rate of the SQL Server CPU utilization is greater than the growth rate of the application server CPU utilization. At higher throughput levels, the SQL Server CPU utilization overtook the application server CPU utilization and became the bottleneck.
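The bottleneck in each Max Zone result can be read off as the role with the highest CPU utilization. This sketch uses the Max Zone figures reported above. For 8x1x2, it assumes that the relevant SQL Server figure is the busier of the two database servers (45.9 percent for the content database server) rather than the combined 74.9 percent; that choice is an assumption of this example.

```python
# Max Zone CPU utilization (percent) per role, copied from the Max Zone results.
max_zone_cpu = {
    "1x1x1": {"front-end Web server": 75.13, "application server": 12.97, "SQL Server": 7.64},
    "2x1x1": {"front-end Web server": 78.17, "application server": 27.07, "SQL Server": 16.06},
    "3x1x1": {"front-end Web server": 70.00, "application server": 28.40, "SQL Server": 21.00},
    "5x1x1": {"front-end Web server": 67.02, "application server": 48.28, "SQL Server": 38.38},
    "8x1x1": {"front-end Web server": 67.0, "application server": 67.1, "SQL Server": 79.5},
    # Assumption: use the busier single database server for the two-server topology.
    "8x1x2": {"front-end Web server": 71.6, "application server": 73.4, "SQL Server": 45.9},
}


def busiest_role(cpu_by_role: dict) -> str:
    """Return the role with the highest CPU utilization."""
    return max(cpu_by_role, key=cpu_by_role.get)


for topology, cpu in max_zone_cpu.items():
    print(topology, "->", busiest_role(cpu))
```

Under that assumption, the busiest role is the front-end Web server through 5x1x1, SQL Server at 8x1x1, and the application server at 8x1x2, closely followed by the front-end Web servers.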

The following charts compare throughput and latencies for the Green Zone and Max Zone for different topologies.

Chart showing RPS for each topology Chart showing latency for each topology

As illustrated in the previous charts:

  • Latencies do not vary much with throughput or topologies. In our testing, latencies were all under 0.5 seconds, which is very acceptable.

  • Throughput increase is almost linear.

The following table and chart present the disk I/O that was observed on each database in different topologies. Disk I/O was not a bottleneck in the topologies we measured. Given that trend, we did not record the data for later topologies.

 

| | 1x1x1 Max Zone | 2x1x1 Max Zone | 3x1x1 Max Zone | 5x1x1 Max Zone |
|---|---|---|---|---|
| Reads/Second (Content DB) | 21.33 | 20.80 | 24.24 | 22.42 |
| Reads/Second (Profile DB) | 14.97 | 17.20 | 19.82 | 13.50 |
| Reads/Second (Social DB) | 1.81 | 1.83 | 2.10 | 2.01 |
| Writes/Second (Content DB) | 50.12 | 76.24 | 80.02 | 99.16 |
| Writes/Second (Profile DB) | 9.01 | 24.31 | 23.35 | 38.29 |
| Writes/Second (Social DB) | 4.12 | 9.47 | 10.63 | 19.45 |

Chart showing I/Ops for each topology

We wanted to measure the effect of the People Search crawl on the throughput offered by a configuration and on end-user latencies. For this test, we used the results given by an 8x1x1 configuration as the baseline and started the incremental People Search crawl. The incremental crawl indexed 49,375 items in 53 minutes.

A comparison of the performance characteristics exhibited by the 8x1x1 configuration with and without the People Search incremental crawl is presented in the following table.

 

| | Baseline 8x1x1 Green Zone results | 8x1x1 with People Search crawl Green Zone results |
|---|---|---|
| Throughput (RPS) | 1024.00 | 1026.00 |
| Front-end Web server CPU (percent) | 39.84 | 41.6 |
| Application server CPU (percent) | 41.40 | 43.1 |
| Content/Service SQL Server CPU (percent) | 36.63 | 39.5 |
| Crawl server CPU (percent) | 0.52 | 34.6 |
| SQL Server CPU for Search (percent) | 3.62 | 14.8 |

As described in this table:

  • RPS almost remained the same. Because there was no resource bottleneck in the 8x1x1 Green Zone, there is no reason for RPS to be affected.

  • The front-end Web server and Content/Service SQL Server CPU utilization became only slightly higher.

  • The Crawl server CPU and the SQL Server CPU for search increased from 0.5 percent to 34.6 percent and from 3.6 percent to 14.8 percent, respectively.
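The differences between the baseline and the crawl run can be computed directly from the reported figures. This is a minimal sketch; the dictionary keys are shortened labels chosen for this example.

```python
baseline = {
    "Throughput (RPS)": 1024.00,
    "Front-end Web server CPU": 39.84,
    "Application server CPU": 41.40,
    "Content/Service SQL Server CPU": 36.63,
    "Crawl server CPU": 0.52,
    "SQL Server CPU for Search": 3.62,
}
with_crawl = {
    "Throughput (RPS)": 1026.00,
    "Front-end Web server CPU": 41.6,
    "Application server CPU": 43.1,
    "Content/Service SQL Server CPU": 39.5,
    "Crawl server CPU": 34.6,
    "SQL Server CPU for Search": 14.8,
}

# Absolute change per counter (RPS for throughput, percentage points for CPU).
delta = {name: round(with_crawl[name] - baseline[name], 2) for name in baseline}
rps_change_pct = 100 * delta["Throughput (RPS)"] / baseline["Throughput (RPS)"]

for name, change in delta.items():
    print(f"{name}: {change:+}")
print(f"Throughput change: {rps_change_pct:.2f} percent")
```

The farm-facing counters move by only a few percentage points, while the crawl server and the search database server absorb most of the additional load.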

The application server was not a bottleneck in any of the configurations. Additionally, if you look at application server CPU utilization for different VSTS loads in any single configuration, you will notice that it grows and then flattens out. A clear example of this is seen in the 8x1x1 configuration, as shown in the following table.

 

| VSTS load | 416 | 616 | 816 | 1016 | 1216 | 1416 | 1616 |
|---|---|---|---|---|---|---|---|
| Application server CPU utilization (percent) | 37.6 | 49.4 | 57.9 | 61.9 | 67.1 | 65.3 | 63.10 |

This is expected. In the case of a social portal, most of the operations require dealing with the SharePoint Server User Profile service. Most of the operations require fetching the profile for a user from the Profile database that is provisioned when the User Profile service is created.

To avoid frequent SQL Server round trips, the application server for the User Profile service maintains a cache of user profiles. Initially, as the test environment is warming up, this cache is empty and the application server responds to incoming requests from the front-end Web server by constantly fetching user profiles from SQL Server. These profiles are cached, and then all requests from the front-end Web server can be responded to by the application server without causing a SQL Server round-trip. It does this by looking for the profile in the cache.

Because the number of user profiles used in testing was limited, the application server eventually cached all of those user profiles. While the cache was filling, application server CPU utilization increased. When all the profiles were cached, the work became a steady stream of cache lookups. Therefore, the application server CPU utilization leveled off.
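The warm-up behavior described above can be illustrated with a toy simulation. This sketch is not the User Profile service implementation: it models the cache as a plain set, picks profiles uniformly at random, and uses the 1,500 test profiles mentioned earlier in this article; the request counts and function name are arbitrary assumptions.

```python
import random


def simulate_hit_ratio(num_profiles=1500, requests_per_step=2000, steps=6, seed=0):
    """Return the cache hit ratio observed in each successive load step."""
    rng = random.Random(seed)
    cache = set()
    ratios = []
    for _ in range(steps):
        hits = 0
        for _ in range(requests_per_step):
            profile_id = rng.randrange(num_profiles)
            if profile_id in cache:
                hits += 1              # served from the cache
            else:
                cache.add(profile_id)  # stands in for a SQL Server round trip
        ratios.append(hits / requests_per_step)
    return ratios


for step, ratio in enumerate(simulate_hit_ratio(), start=1):
    print(f"step {step}: hit ratio {ratio:.3f}")
```

With a small, fixed set of profiles, the hit ratio climbs toward 1 after the first few steps, which mirrors the point at which the work becomes mostly cache lookups.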

The Outlook Social Connector is an add-in that is included with Office 2010 and shows the activities of your SharePoint colleagues in Outlook. This add-in is also available as a free download for 2007 Microsoft Office system and Microsoft Office 2003.

The Outlook Social Connector checks SharePoint Server one time every hour to get the activities for those users who are listed as a colleague of a particular user. It caches those activities each hour. On subsequent checks for colleague activities, the Outlook Social Connector only asks for any new activities since the last time that it checked SharePoint Server. Thus, it follows a very predictable traffic pattern. For a 100,000-people deployment of the Outlook Social Connector and SharePoint Server, assuming all users are using it throughout the whole day, the Outlook Social Connector generates 100,000 requests per hour, which translates to about 27.8 RPS.

Displaying colleague activities could lead to the possibility of information disclosure. For example, a URL that is tagged by a colleague may be something confidential that another user does not have access to. In this case, the user can find out about the existence of that confidential piece of content by seeing it in the Outlook Social Connector. To prevent this information disclosure, SharePoint Server filters all activities and shows only those URLs that a user has access to in the activity feeds. This filtering is what we call security trimming. By default, security trimming is turned on. However, it can be turned off by a SharePoint Server administrator.

Not every activity requires security trimming. Out of sixteen activity types that SharePoint Server 2010 supports, only four activities (tagging, Note Board comments, ratings, and distribution list (DL) membership changes) require security trimming. Also, because the Outlook Social Connector asks only for a delta of activities that have happened since the last time that it synced, the number of activities per user that would require security trimming would be reasonably low.

Every request from the Outlook Social Connector that requires security trimming results in an authenticated Windows Communication Foundation (WCF) call to the application server for the Search Service. To get the authentication token to make this call, a WCF call is first made to the Secure Token Service.

We found that when Outlook Social Connector traffic went beyond 8 RPS per front-end Web server, the Secure Token Service came under stress. The stress on the Secure Token Service might not happen to each user because it is affected by the total number of users and total amount of social tagging that occurs on the activities of a user’s colleagues. In the dataset we created and with the users we used, we probably had enough activities requiring security trimming that we saw this occur. Hence, we increased the Outlook Social Connector traffic as a function of the number of front-end Web servers available. For the 1x1x1 configuration, we generated 8 RPS of Outlook Social Connector traffic. However, for a 2x1x1 configuration, we generated 16 RPS of Outlook Social Connector traffic, and so on.

This means that, for the dataset, test mix, and hardware we had for testing, we could support about 8 × 60 × 60, that is, 28,800 requests per hour. With the way the Outlook Social Connector works, this means that we could have supported 28,800 employees using the Outlook Social Connector on a single front-end Web server that had security trimming turned on. Similarly, we could support 28,800 × 3, which is 86,400 employees using the Outlook Social Connector on three front-end Web servers that have security trimming turned on.
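The capacity arithmetic above can be sketched in a few lines. This example assumes, as the text does, that each Outlook Social Connector user generates one request per hour; the function names are hypothetical.

```python
import math

SECONDS_PER_HOUR = 60 * 60


def employees_per_front_end(osc_rps_per_front_end: float) -> int:
    """Employees one front-end Web server can serve at one request per user per hour."""
    return int(osc_rps_per_front_end * SECONDS_PER_HOUR)


def front_ends_needed(employees: int, osc_rps_per_front_end: float) -> int:
    """Front-end Web servers needed for Outlook Social Connector traffic alone."""
    return math.ceil(employees / employees_per_front_end(osc_rps_per_front_end))


print(employees_per_front_end(8))        # 8 RPS per front-end Web server
print(3 * employees_per_front_end(8))    # three front-end Web servers
print(front_ends_needed(100_000, 8))
print(front_ends_needed(100_000, 10))
```

The result is sensitive to the per-server RPS you can sustain: at 8 RPS a 100,000-employee rollout rounds up to four front-end Web servers, and at 10 RPS it rounds up to three.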

This should help you estimate the hardware that is required to support Outlook Social Connector traffic, but be aware that the results we saw are specific to the dataset, test mix, and hardware we used for testing. Also, remember that you have the option of turning off security trimming by using Windows PowerShell 2.0, or changing the frequency of Outlook Social Connector synchronization with SharePoint Server. Both of these options will have a significant effect on hardware requirements.

The following results are ordered based on the scaling approach described in Overview, earlier in this article.

This section describes the test results that were obtained with one Web server, one application server, and one database server.

  • In addition to the test mix presented earlier in this article, this farm had 8-RPS traffic from the Outlook Social Connector asking for feed events by a user.

  • On a farm with one front-end Web server, one application server, and one SQL Server computer, the front-end Web server was clearly the bottleneck. As presented in the following table, the front-end Web server CPU reached about 90 percent utilization when the farm was subjected to 238 RPS by using the transactional mix that is described earlier in this document.

  • This configuration delivered Green Zone RPS of 137.25, with 75th percentile latency being 0.12 seconds and front-end Web server CPU hovering around 47.8 percent utilization. This indicates that this farm can successfully deliver an RPS of about 137.25. Max Zone RPS delivered by this farm was 203.2 with latencies of 0.22 seconds and front-end Web server CPU hovering around 75 percent.

  • Because the front-end Web server was the bottleneck, we added another front-end Web server to the farm for the next iteration.

Various performance counters captured during testing the 1x1x1 farm, at different steps in VSTS load, are presented in the following table.

 

VSTS load                           52      77      102     127     152     177
RPS                                 99.8    147     188     218     238     243
Front-end Web server CPU [%]        33.9    50      71.8    81.1    90.8    89
Application server CPU [%]          7.92    11.7    13.5    14.1    13.9    13.3
SQL Server CPU [%]                  4.7     6.48    7.99    8.21    8.41    8.88
75th percentile latency [seconds]   0.13    0.16    0.17    0.25    0.3     0.45
95th percentile latency [seconds]   0.29    0.47    0.41    0.55    0.55    0.77

Chart showing RPS and latency for 1x1x1 topology

Chart showing RPS and CPU utilization for 1x1x1 topology
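One way to read a throughput figure off the 1x1x1 table at a chosen CPU level is linear interpolation between adjacent load steps. The following is a minimal sketch under that assumption. It is not the method used to derive the Green Zone figures in this article, and the function name is hypothetical. At 47.8 percent front-end Web server CPU it yields a value close to, but not identical to, the reported Green Zone RPS of 137.25.

```python
def rps_at_cpu(cpu_samples, rps_samples, target_cpu):
    """Linearly interpolate RPS at a target CPU percentage.

    Samples must be ordered by increasing load. Only segments where CPU
    is rising are considered, so the flat or falling tail is skipped.
    """
    points = list(zip(cpu_samples, rps_samples))
    for (cpu_lo, rps_lo), (cpu_hi, rps_hi) in zip(points, points[1:]):
        if cpu_lo < cpu_hi and cpu_lo <= target_cpu <= cpu_hi:
            fraction = (target_cpu - cpu_lo) / (cpu_hi - cpu_lo)
            return rps_lo + fraction * (rps_hi - rps_lo)
    raise ValueError("target CPU is outside the measured range")


# Front-end Web server CPU and RPS from the 1x1x1 table.
wfe_cpu_1x1x1 = [33.9, 50, 71.8, 81.1, 90.8, 89]
rps_1x1x1 = [99.8, 147, 188, 218, 238, 243]

estimate = rps_at_cpu(wfe_cpu_1x1x1, rps_1x1x1, 47.8)
print(f"Estimated RPS at 47.8% CPU: {estimate:.1f}")
```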

This section describes the test results that were obtained with two Web servers, one application server, and one database server.

  • In addition to the test mix presented earlier in this article, this farm had 16-RPS traffic from the Outlook Social Connector asking for feed events by a user.

  • On a farm with two front-end Web servers, one application server, and one SQL Server computer, the front-end Web servers were the bottleneck. As presented in the data here, the front-end Web server CPU reached about 89 percent utilization when the farm was subjected to 520 RPS by using the transactional mix described earlier in this document.

  • This configuration delivered a Green Zone RPS of 278, with a 75th percentile latency of 0.16 seconds and front-end Web server CPU utilization of about 47 percent. This indicates that this farm can successfully deliver about 278 RPS with the test mix and hardware that was used for the tests. The Max Zone RPS delivered by this farm was 450, with a latency of 0.23 seconds and front-end Web server CPU utilization of about 78 percent.

  • Because the front-end Web server CPU was the bottleneck in this iteration, we relieved the bottleneck by adding another front-end Web server for the next iteration.

Various performance counters captured during testing the 2x1x1 farm, at different steps in VSTS load, are presented in the following table and chart.

 

VSTS load                           104     154     204     254     304     354
RPS                                 190     278     390     455     500     520
Front-end Web server CPU [%]        36      50.9    71.9    86.9    87.1    89.5
Application server CPU [%]          16      24.9    28.3    26.5    26.5    24.9
SQL Server CPU [%]                  8.06    10.6    14.2    16.4    17.9    18.9
75th percentile latency [seconds]   0.16    0.22    0.22    0.33    0.42    0.53
95th percentile latency [seconds]   0.42    0.64    0.51    0.69    0.73    0.89

Chart showing RPS and latency for 2x1x1 topology

Chart showing RPS and CPU utilization for 2x1x1 topology

This section describes the test results that were obtained with three Web servers, one application server, and one database server.

  • In addition to the test mix presented earlier in this article, this farm had 24-RPS traffic from the Outlook Social Connector asking for feed events by a user.

  • On a farm with three front-end Web servers, one application server, and one SQL Server computer, the front-end Web servers were the bottleneck. As presented in the data here, the front-end Web server CPU reached about 76 percent utilization when the farm was subjected to 629 RPS by using the transactional mix described earlier in this document.

  • This configuration delivered a Green Zone RPS of 440, with a 75th percentile latency of 0.14 seconds and front-end Web server CPU utilization of about 48 percent. This indicates that this farm can deliver about 440 RPS with the test mix and hardware that was used for the tests. The Max Zone RPS delivered by this farm was 615, with a latency of 0.22 seconds and front-end Web server CPU utilization of about 70 percent.

  • Because the front-end Web server CPU was the bottleneck in this iteration, we decided to add more front-end Web servers. Considering the delta between iterations seen previously by addition of one front-end Web server, we decided to add two front-end Web servers. By doing this, we hoped to find that the application server or the SQL Server computer was the bottleneck.

Various performance counters captured during testing the 3x1x1 farm, at different steps in VSTS load, are presented in the following table and charts.

 

VSTS load                           156     231     306     381     456     531
RPS                                 264     393     532     624     634     629
Front-end Web server CPU [%]        30.5    46.3    62.55   72.95   75.4    76
Application server CPU [%]          22.7    35.6    34.2    32.5    32.5    29.4
SQL Server CPU [%]                  10.4    14.8    20.8    22.5    22.8    22.4
75th percentile latency [seconds]   0.17    0.26    0.27    0.28    0.31    0.40
95th percentile latency [seconds]   0.63    1.08    0.76    0.68    0.88    0.98

Chart showing RPS and latency for 3x1x1 topology

Chart showing RPS and CPU utilization for 3x1x1 topology
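The 3x1x1 data shows throughput leveling off after the fourth load step. A simple way to locate that point programmatically is to look for the first step at which the relative RPS gain falls below a threshold. The sketch below uses a 2 percent threshold, which is an arbitrary assumption chosen for illustration and not a criterion from this study.

```python
def saturation_step(loads, rps, min_gain=0.02):
    """Return (load, rps) for the last step before RPS growth stalls.

    Growth is considered stalled when the relative gain over the
    previous step is below min_gain (assumed threshold: 2 percent).
    """
    for i in range(1, len(rps)):
        gain = (rps[i] - rps[i - 1]) / rps[i - 1]
        if gain < min_gain:
            return loads[i - 1], rps[i - 1]
    # Throughput was still growing at the last measured step.
    return loads[-1], rps[-1]


# VSTS load and RPS from the 3x1x1 table.
loads_3x1x1 = [156, 231, 306, 381, 456, 531]
rps_3x1x1 = [264, 393, 532, 624, 634, 629]

print(saturation_step(loads_3x1x1, rps_3x1x1))  # (381, 624)
```

Past that step, additional VSTS load adds latency rather than throughput, which is consistent with the front-end Web servers being the limiting tier in this configuration.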

This section describes the test results that were obtained with five Web servers, one application server, and one database server.

  • In addition to the test mix presented earlier in this article, this farm had 40-RPS traffic from the Outlook Social Connector asking for feed events by a user.

  • On a farm with five front-end Web servers, one application server, and one SQL Server computer, we saw a significant increase in SQL Server CPU and application server CPU utilization, but the front-end Web server CPU was still the bottleneck. As presented in the data here, the front-end Web server CPU reached about 88 percent utilization when the farm was subjected to 1,315 RPS by using the transactional mix described earlier in this document.

  • This configuration delivered a Green Zone RPS of 683, with a 75th percentile latency of 0.16 seconds and front-end Web server CPU utilization of about 46 percent. This indicates that this farm can successfully deliver about 683 RPS with the test mix and hardware that was used for the tests. The Max Zone RPS delivered by this farm was 971, with a latency of 0.22 seconds and front-end Web server CPU utilization of about 68 percent.

  • Looking at the trend, we decided to add three front-end Web servers and test the 8x1x1 configuration. We hoped to find that the application server or the SQL Server computer was the bottleneck in that configuration.

Various performance counters captured during testing the 5x1x1 farm, at different steps in user load, are presented here. Because we saw no significant effect of VSTS load or configuration changes on latency, we stopped recording it.

 

VSTS load                           260     385     510     635     760     885
RPS                                 359     560     901     1188    1281    1315
Front-end Web server CPU [%]        20.5    34      56.2    77.5    86.1    88
Application server CPU [%]          40.2    50.6    66.9    71.3    66.3    58.7
SQL Server CPU [%]                  13.9    20.3    34.9    53.6    58.4    64

Chart showing RPS and CPU utilization for 5x1x1 topology
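The 5x1x1 data also gives a rough way to anticipate where the SQL Server computer might become the bottleneck. The sketch below fits a least-squares line to SQL Server CPU as a function of RPS and projects the RPS at which CPU would reach 80 percent. Both the linear relationship and the 80 percent threshold are assumptions made for illustration. The projection lands in the same general range as the throughput later observed on the 8x1x1 farm, but it should be treated as an estimate only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_rps_at_cpu(rps, cpu, target_cpu):
    """Project the RPS at which CPU would reach target_cpu (linear model)."""
    slope, intercept = fit_line(rps, cpu)
    return (target_cpu - intercept) / slope


# RPS and SQL Server CPU from the 5x1x1 table.
rps_5x1x1 = [359, 560, 901, 1188, 1281, 1315]
sql_cpu_5x1x1 = [13.9, 20.3, 34.9, 53.6, 58.4, 64]

projection = projected_rps_at_cpu(rps_5x1x1, sql_cpu_5x1x1, 80)
print(f"Projected RPS at 80% SQL Server CPU: {projection:.0f}")
```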

This section describes the test results that were obtained with eight Web servers, one application server, and one database server.

  • In addition to the test mix presented earlier in this article, this farm had 64-RPS traffic from the Outlook Social Connector asking for feed events by a user.

  • On a farm with eight front-end Web servers, one application server, and one SQL Server computer, the SQL Server CPU was finally the bottleneck. As presented in the data here, the SQL Server CPU reached about 80 percent utilization when the farm was subjected to RPS of 1664 by using the transactional mix described earlier in this document.

  • This configuration delivered a Green Zone RPS of 793, with a 75th percentile latency of 0.31 seconds and SQL Server CPU utilization of about 30 percent. However, application server CPU utilization was about 48 percent. This indicates that this farm can successfully deliver about 793 RPS with the test mix and hardware that was used for the tests. The Max Zone RPS delivered by this farm was 1,655, with a latency of 0.31 seconds and SQL Server CPU utilization of about 80 percent.

  • Because the SQL Server CPU was the bottleneck in this iteration, we relieved the bottleneck by separating the content database and services database on two instances of SQL Server for the next iteration.

Various performance counters captured during testing the 8x1x1 farm, at different steps in VSTS load, are presented in the following table and chart.

 

VSTS load                           416     616     816     1016    1216    1416    1616
RPS                                 664     1101    1359    1530    1655    1664    1617
Front-end Web server CPU [%]        26.7    44.4    54.7    61.5    67      65.9    65.1
Application server CPU [%]          37.6    49.4    57.9    61.9    67.1    65.3    63.1
SQL Server CPU [%]                  23.2    42      57.9    69.5    79.5    80.8    77.3

Chart showing RPS and CPU utilization for 8x1x1 topology
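To see how throughput scaled as front-end Web servers were added, it can help to divide the Green Zone and Max Zone RPS reported for each Nx1x1 iteration by the number of front-end Web servers. The sketch below does that with the figures quoted in the text above. The dictionary layout and function name are illustrative.

```python
# Front-end Web server count -> (Green Zone RPS, Max Zone RPS), as reported
# in the text for the 1x1x1, 2x1x1, 3x1x1, 5x1x1, and 8x1x1 iterations.
reported = {
    1: (137.25, 203.2),
    2: (278, 450),
    3: (440, 615),
    5: (683, 971),
    8: (793, 1655),
}


def per_front_end(results):
    """Return {count: (Green Zone RPS per server, Max Zone RPS per server)}."""
    return {
        count: (round(green / count, 1), round(max_zone / count, 1))
        for count, (green, max_zone) in results.items()
    }


for count, (green, max_zone) in per_front_end(reported).items():
    print(f"{count}x1x1: green {green} RPS/server, max {max_zone} RPS/server")
```

Green Zone RPS per front-end Web server stays in a narrow band through five servers and drops at eight, which matches the observation that the SQL Server CPU, not the front-end tier, limited the 8x1x1 farm.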

This section describes the test results that were obtained with eight Web servers, one application server, and two database servers.

  • In addition to the test mix presented earlier in this article, this farm had 64-RPS traffic from the Outlook Social Connector asking for feed events by a user.

  • On a farm with eight front-end Web servers, one application server, and two SQL Server computers, we could take the configuration to its extreme. The front-end Web servers and the application server were both bottlenecks, and the combined SQL Server CPU utilization was also high, at about 75 percent. The farm exhibited 1,817 RPS at maximum load.

  • This was the last iteration we tried. But clearly, if you need more scale, the next step would be to use two computers to perform application server duties. That would enable you to have many more front-end Web servers and therefore reduce the load on each front-end Web server.

Various performance counters captured during testing the 8x1x2 farm, at different steps in VSTS load, are presented in the following table and chart.

 

VSTS load                           466       666       866       1066      1266      1416
RPS                                 466.00    873.40    1431.00   1703.00   1766.00   1817.00
Front-end Web server CPU [%]        19.90     36.90     57.60     68.00     71.40     71.60
Application server CPU [%]          29.80     47.20     63.50     71.40     71.90     73.40
Total SQL Server CPU [%]            19.61     32.40     55.20     63.60     68.50     74.90
Content SQL Server CPU [%]          9.93      17.90     31.90     40.10     42.30     45.90
Services SQL Server CPU [%]         9.68      14.50     23.30     23.50     26.20     29.00

Chart showing RPS and CPU utilization for 8x1x2 topology
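In the 8x1x2 table, the Total SQL Server CPU row is the sum of the content and services database server rows, not an average. The following sketch checks that relationship and computes what share of the combined load falls on the content database server. The variable and function names are illustrative.

```python
# CPU utilization rows from the 8x1x2 table, in load-step order.
content_sql = [9.93, 17.90, 31.90, 40.10, 42.30, 45.90]
services_sql = [9.68, 14.50, 23.30, 23.50, 26.20, 29.00]
total_sql = [19.61, 32.40, 55.20, 63.60, 68.50, 74.90]


def totals_match(content, services, total, tolerance=0.01):
    """Check that content + services equals the reported total at each step."""
    return all(
        abs(c + s - t) <= tolerance
        for c, s, t in zip(content, services, total)
    )


def content_share(content, total):
    """Percentage of combined SQL Server CPU used by the content server."""
    return [100 * c / t for c, t in zip(content, total)]


print(totals_match(content_sql, services_sql, total_sql))  # True
print([round(share) for share in content_share(content_sql, total_sql)])
```

Because the total is a sum across two computers, a combined figure in the mid-70s does not mean either SQL Server computer was close to saturation. At maximum load, the busier of the two (the content database server) was at 45.90 percent.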

Gaurav Doshi is a Program Manager at Microsoft.

Wenyu Cai is a Software Engineer in Test at Microsoft.
