Capacity requirements for the Web Analytics Shared Service in SharePoint Server 2010

Applies to: SharePoint Server 2010

Initial capacity testing was performed for a simulated midsized deployment of Microsoft SharePoint Server 2010 and other applications that included 30,000 SharePoint entities. This article describes the results of the capacity testing activities and contains guidance on capacity management for the Web Analytics service application in SharePoint Server 2010.

In SharePoint Server 2010, the Web Analytics service application enables you to collect, report, and analyze the usage and effectiveness of SharePoint Server 2010 sites. Web Analytics features include reporting, Web Analytics workflow, and the Web Analytics Web Part. For more information, see Reporting and usage analysis overview (SharePoint Server 2010).

The aspects of capacity planning that are described in this article include the following:

  • Description of the architecture and topology.

  • Capacity planning guidelines based on key factors, such as total expected traffic and the number of SharePoint components.

  • Description of the other factors that affect the performance and capacity requirements.

Before you continue to read this article, make sure that you understand key concepts related to SharePoint Server 2010 capacity management. The resources that are listed in this section can help you learn about frequently used terms and get an overview of the recommended approach to capacity management. These resources can also help you use the information that is provided in this article more effectively.

For more conceptual information about performance and capacity management, see the following articles:

In this article:

  • Introduction

  • Hardware specifications and topology

  • Capacity requirements

Introduction

Overview

As part of SharePoint Server 2010, the Web Analytics service application is a set of features that you can use to collect, report, and analyze the usage and effectiveness of a SharePoint Server 2010 deployment. You can organize SharePoint Web Analytics reports into three main categories:

  • Traffic

  • Search

  • Inventory

SharePoint Web Analytics reports are typically aggregated for various SharePoint entities, such as sites, site collections, and Web applications for each farm. To view an architectural overview of the Web Analytics service application in a SharePoint deployment, see Architectural overview later in this article.

The Web Analytics shared service requires resources primarily at the application server and database server level. This article does not cover capacity planning for the Web server layer, because the Web Analytics service’s capacity requirements are minimal at this level.

This article contains the capacity requirements for several application servers and Microsoft SQL Server based computers, according to the following criteria:

  • Total expected site traffic (clicks, search queries, ratings).

  • Number of SharePoint components (sites, site collections, and Web applications) for each farm.

Other, less significant factors that can affect the capacity requirements are summarized in Other factors later in this article.

Architectural overview

The following diagram (Figure 1) shows the flow of the site usage data from a Web browser to the analytics databases, and then back to the Web browser as reports. The usage data is logged to the usage files on the Web servers. The usage timer job calls the Logging Web Service to submit the raw data from the usage files. The Logging Web Service writes it to the staging database, where the raw data is stored for seven days (this is not configurable). The Web Analytics components Log Batcher and User Behavior Analyzer clean and process the raw data on the staging database. The Report Consolidator runs one time every 24 hours. The Report Consolidator aggregates the raw data from the staging database on various dimensions, and then writes it to the reporting database. The aggregated data is stored in the reporting database for a default period of 25 months (this is configurable).
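The two retention windows in this data flow can be illustrated with a short date calculation. The following Python sketch is a hypothetical illustration, not part of SharePoint: the function names are invented, and it assumes that the purge boundary is simply "today minus the retention period". The 7-day staging window is fixed; the 25-month reporting window is the configurable default.

```python
import calendar
from datetime import date, timedelta

# Retention periods stated in the article. The boundary semantics below are assumptions.
STAGING_RETENTION_DAYS = 7               # raw data in the staging database (not configurable)
DEFAULT_REPORTING_RETENTION_MONTHS = 25  # aggregated data in the reporting database (configurable)


def months_before(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`, clamping the day
    to the last valid day of the target month (for example, 31 March -> 28 February)."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def staging_cutoff(today: date) -> date:
    """Approximate boundary before which raw usage data is purged from staging (assumed)."""
    return today - timedelta(days=STAGING_RETENTION_DAYS)


def reporting_cutoff(today: date, months: int = DEFAULT_REPORTING_RETENTION_MONTHS) -> date:
    """Approximate boundary before which aggregated data is purged from reporting (assumed)."""
    return months_before(today, months)


if __name__ == "__main__":
    today = date(2010, 6, 15)
    print("staging boundary:", staging_cutoff(today))      # 2010-06-08
    print("reporting boundary:", reporting_cutoff(today))  # 2008-05-15
```

Passing a different `months` value to `reporting_cutoff` mirrors changing the configurable reporting retention period.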

Figure 1. SharePoint Server 2010 Web Analytics architectural overview

The performance of the Logging Web Service depends primarily on the number of application servers. (Scaling out is available for the application servers.) The performance of the Log Batcher and User Behavior Analyzer depends primarily on the analytics staging database, because the read and write activity performed by all the different components can slow the staging database down. (Scaling out is available for the staging database.) The performance of the Report Consolidator depends primarily on the reporting database. (Scaling out the reporting database is not supported.)
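The dependencies in the preceding paragraph can be summarized as a small lookup table. The following Python sketch is illustrative only: `ComponentProfile`, `PROFILES`, and `tier_to_scale_out` are hypothetical names, not part of any SharePoint API, and the data simply restates which tier each component depends on and whether that tier can be scaled out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComponentProfile:
    """Hypothetical record: the tier a Web Analytics component mainly depends on."""
    primary_dependency: str
    scale_out_supported: bool


# Restates the dependencies described in the article (names are illustrative).
PROFILES = {
    "Logging Web Service": ComponentProfile("application servers", True),
    "Log Batcher": ComponentProfile("staging database", True),
    "User Behavior Analyzer": ComponentProfile("staging database", True),
    "Report Consolidator": ComponentProfile("reporting database", False),
}


def tier_to_scale_out(component: str) -> Optional[str]:
    """Return the tier to scale out when `component` is the bottleneck,
    or None if the tier it depends on does not support scaling out."""
    profile = PROFILES[component]
    return profile.primary_dependency if profile.scale_out_supported else None


if __name__ == "__main__":
    for name in PROFILES:
        print(f"{name}: {tier_to_scale_out(name) or 'scale-out not supported'}")
```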

Hardware specifications and topology

This section provides detailed information about the hardware, software, topology, and configuration of a case study environment.

Hardware

Note

This environment is scaled to accommodate initial builds of SharePoint Server 2010 and other products. Therefore, the deployed hardware has larger capacity than necessary to serve the demand typically experienced by this environment. This hardware is described only to provide additional context for this environment and serve as a starting point for similar environments. It is important to conduct your own capacity management based on your planned workload and usage characteristics. For more information about the capacity management process, see Performance and capacity management (SharePoint Server 2010).

Web servers

This article does not cover capacity planning for the Web server layer, because the Web Analytics service’s capacity requirements are minimal at this level.

Application servers

The following table describes the configuration of each application server. Depending on the site traffic and the number of SharePoint components that are involved, you will need one or more application servers.

Application server            Minimum requirement

Processors                    4 quad core @ 2.33 GHz
RAM                           8 GB
Operating system              Windows Server 2008, 64-bit
Size of the SharePoint drive  300 GB
Number of network adapters    1
Network adapter speed         1 Gbps
Authentication                NTLM
Load balancer type            SharePoint Load Balancer
Software version              SharePoint Server 2010
Services running locally      • Central Administration
                              • Microsoft SharePoint Foundation Incoming E-mail
                              • Microsoft SharePoint Foundation Web Application
                              • Microsoft SharePoint Foundation Workflow Timer Service
                              • Search Query and Site Settings Service
                              • SharePoint Server Search
                              • Web Analytics Data Processing Service
                              • Web Analytics Web Service

Database servers

Instances of SQL Server were required for both the staging and reporting databases. The following table describes the configuration of each database server.

Database server               Minimum requirement

Processors                    4 quad core @ 2.4 GHz
RAM                           32 GB
Operating system              Windows Server 2008, 64-bit
Disk size                     3 TB
Number of network adapters    1
Network adapter speed         1 Gbps
Authentication                NTLM
Software version              SQL Server 2008

Note

Although we used a 3 TB disk for our capacity testing, your environment will likely require a much larger disk to support Web Analytics.

Note

We used the configuration that is described in the previous table for our capacity testing. Your environment will likely require fast, enterprise-class storage to support Web Analytics. For example, use a multi-disk RAID array or a similar disk configuration to increase input/output operations per second (IOPS) for daily incremental synchronization. We also recommend that you spread the SQL Server data load across multiple disk spindles.

For more information about best practices for configuring SQL Server, see the following resources:

Topology

The following diagram (Figure 2) shows the Web Analytics topology.

Figure 2. Web Analytics topology

Capacity requirements

Testing methodology

This section presents the capacity requirements in terms of the total daily site traffic (measured by the number of clicks, search queries, and ratings) that can be supported by different numbers of application servers and SQL Server based computers. The numbers presented here are for a midsize SharePoint deployment that has about 30,000 SharePoint entities. The Web Analytics shared service aggregates the data for each day. Therefore, the data volume that is presented corresponds to the total number of records (clicks, search queries, and ratings) that the SharePoint farm is expected to receive each day.
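Because every click, search query, and rating counts as one record, the daily traffic figure used in this section is a simple sum. The following Python sketch shows that arithmetic; `DailyTraffic` and `records_in_millions` are hypothetical names, and the sample counts in the usage lines are invented for illustration rather than taken from the case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DailyTraffic:
    """Hypothetical container for one day of expected site traffic."""
    clicks: int          # page view clicks
    search_queries: int
    ratings: int

    def records(self) -> int:
        # Each click, search query, or rating makes up one record.
        return self.clicks + self.search_queries + self.ratings


def records_in_millions(traffic: DailyTraffic) -> float:
    """Express daily records in millions, the unit used in Figures 3 and 4."""
    return traffic.records() / 1_000_000


if __name__ == "__main__":
    # Invented example values, not measurements from the case study.
    day = DailyTraffic(clicks=1_800_000, search_queries=150_000, ratings=50_000)
    print(day.records(), records_in_millions(day))
```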

This section provides diagrams that show the daily site traffic that can be supported by one, two, or three application servers (Figure 3) and by various database configurations (Figure 4). In the diagrams, data is shown by using two colors:

  • Green   Green values indicate the safe limit for the site traffic that can be processed for the corresponding number of application servers and SQL Server based computers.

  • Yellow   Yellow values indicate the expected limit for the site traffic that can be processed for the corresponding number of application servers and SQL Server based computers.

The green and yellow values are estimates that are based on two key factors:

  • Total site traffic, measured by number of page view clicks, search queries, and ratings.

  • Number of SharePoint entities, such as sites, site collections, and Web applications, for each farm.

The estimates also depend on other properties of the data and on the data retention period in the reporting database. For testing, these other properties were held constant, as described in Dataset description later in this section.

In smaller SharePoint deployments, you can share the application servers and SQL Server based computers with other SharePoint services and databases. However, the capacity figures in this article come from a test environment in which the Web Analytics shared service was the only major service running on the servers. Actual performance in environments that actively run other shared services at the same time might vary.

To determine the capacity requirements for your environment, first estimate the expected daily site traffic and the number of SharePoint components in your deployment. Then estimate the number of application servers and the number of SQL Server based computers independently, by using Figure 3 and Figure 4.
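The independent sizing step above amounts to picking, for each tier, the smallest topology whose safe (green) limit covers the expected traffic. The following Python sketch assumes you supply the (topology, safe limit) pairs yourself, read from Figure 3 for application servers and from Figure 4 for SQL Server; `smallest_topology` is a hypothetical helper, and no limit values from the figures are embedded in it.

```python
from typing import Optional, Sequence, Tuple


def smallest_topology(expected_millions_per_day: float,
                      safe_limits: Sequence[Tuple[str, float]]) -> Optional[str]:
    """Return the first topology whose safe (green) limit covers the expected
    daily traffic.

    `safe_limits` must be ordered from the smallest to the largest topology,
    with limits in millions of records per day. Returns None when the expected
    traffic exceeds every tested topology.
    """
    for topology, safe_limit in safe_limits:
        if expected_millions_per_day <= safe_limit:
            return topology
    return None
```

Call it once with the application server limits and once with the SQL Server limits, because the two tiers are sized independently. A result of None means that the expected traffic is beyond the tested range, which is the situation that the Important note later in this article addresses.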

Dataset description

The dataset that was selected for the test environment has approximately 30,000 SharePoint components, which include all Web applications, site collections, and sites. Other characteristics of the data that were kept constant in the environment are listed in the following table.

Dataset characteristics              Value

Number of SharePoint components      30,000
Number of unique users               117,000
Number of unique queries             68,000
Number of unique assets              500,000
Data size in the reporting database  200 GB

The total site traffic, measured by number of clicks, search queries, and ratings, was increased as part of this case study to establish the number of records that can be supported by the corresponding topology.

Important

Some commonly used topologies typically exceed the capacity planning guidance. These topologies include the following:

  • Team sites

  • My Site Web sites

  • Self-provisioning portals

If you anticipate that you might exceed the capacity planning guidelines, we recommend that you turn off the Web Analytics service application. For more information about how to turn off a service application, see Starting or stopping a service.

Application servers

The following diagram (Figure 3) shows the daily site traffic that can be supported by one, two, or three application servers. The site traffic is represented in millions of records (each click, search query, or rating makes up a record) each day. The yellow line represents the expected number of records for the corresponding topology, whereas the green line represents the safe assumption for the number of records.

Figure 3. Daily site traffic vs. the application servers topology

The Web Analytics workload is not very CPU-intensive or memory-intensive on the application servers. Therefore, CPU and memory usage are not summarized in this section.

SQL Server based computers

The following diagram (Figure 4) shows the daily site traffic that can be supported by each of the following configurations:

  • One instance of SQL Server for both staging and reporting databases (1S+R).

  • Two instances of SQL Server, one staging database and one reporting database (1S1R).

  • Three instances of SQL Server, two staging databases and one reporting database (2S1R).
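The labels in the preceding list encode how many SQL Server instances host which databases. The following Python sketch is a hypothetical helper that translates those labels into counts; the `<n>S<m>R` pattern is an assumption generalized from the 1S1R and 2S1R examples, and 1S+R is handled as the single shared-instance case.

```python
import re
from typing import NamedTuple


class SqlTopology(NamedTuple):
    instances: int
    staging_databases: int
    reporting_databases: int


# Assumed pattern for labels such as "1S1R" or "2S1R": each database on its own instance.
_SEPARATE = re.compile(r"^(\d+)S(\d+)R$")


def parse_topology(label: str) -> SqlTopology:
    """Translate the article's SQL Server topology labels into instance and database counts."""
    if label == "1S+R":
        # One instance hosts both the staging and the reporting database.
        return SqlTopology(instances=1, staging_databases=1, reporting_databases=1)
    match = _SEPARATE.match(label)
    if match is None:
        raise ValueError(f"unrecognized topology label: {label!r}")
    staging, reporting = int(match.group(1)), int(match.group(2))
    return SqlTopology(staging + reporting, staging, reporting)
```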

The site traffic is represented in millions of records (each click, search, or rating makes up a record) each day. The yellow line represents the expected number of records for the corresponding topology, whereas the green line represents the safe assumption for the number of records.

Figure 4. Daily site traffic vs. SQL Server topology

The following table summarizes the CPU and memory usage of the various components on the instances of SQL Server that are hosting the staging database and the reporting database.

Configuration                                   1S+R                 1S1R      1S1R       2S1R      2S1R
Database role                                   Staging + Reporting  Staging   Reporting  Staging   Reporting
Sum of % processor time (8-processor computer)  19                   192       5.78       100       13.4
SQL Server buffer hit ratio (%)                 99                   100       100        100       100
% Disk time                                     7,142                535       5.28       59.3      98.2
Disk queue length                               357                  28.6      0.26       2.97      4.91
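The processor-time row is a sum across all processors, so its maximum is 100 times the number of processors, and values above 100 do not by themselves indicate saturation. The following Python sketch, assuming the 8-processor computer named in the row label, converts those sums into average per-processor utilization; `average_cpu_percent` is a hypothetical name.

```python
def average_cpu_percent(summed_percent: float, processors: int = 8) -> float:
    """Convert a "sum of % processor time" across all processors
    (maximum = 100 * processors) into average per-processor utilization."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return summed_percent / processors


# Processor-time values copied from the table above (8-processor computer).
MEASURED = {
    "1S+R staging + reporting": 19,
    "1S1R staging": 192,
    "1S1R reporting": 5.78,
    "2S1R staging": 100,
    "2S1R reporting": 13.4,
}

if __name__ == "__main__":
    for name, value in MEASURED.items():
        print(f"{name}: {average_cpu_percent(value):.2f}% average per processor")
```

For example, the 192 measured on the 1S1R staging server corresponds to an average of 24 percent per processor.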

Other factors

Many other factors can affect the performance of various analytics components and, therefore, capacity planning. These factors primarily affect the performance of the Report Extractor component, because they can affect the size of the data that is aggregated each day. The total size of the data in the reporting database also affects the performance of the Report Extractor, although this effect is not significant because the data is partitioned daily. Some of these other factors are as follows:

  • Number of unique queries each day.

  • Number of unique users each day.

  • Total number of unique assets clicked each day.

  • Existing data size in the reporting warehouse, based on the data retention in the warehouse.

The overall effect of these factors is less significant than the total data volume and the number of site entities. However, it is important to conduct your own capacity management based on your planned workload and usage characteristics. For more information about the capacity management process, see Performance and capacity management (SharePoint Server 2010).

See Also

Concepts

Performance and capacity management (SharePoint Server 2010)
SharePoint 2010 Administration Toolkit (SharePoint Server 2010)