
Designing large lists and maximizing list performance (SharePoint Server 2010)

 

Applies to: SharePoint Server 2010

Topic Last Modified: 2011-09-24

This article provides information about the performance of large document libraries and large lists. The recommendations in this article are the result of a series of performance tests that were conducted with Microsoft SharePoint Server 2010. The focus of this article is on the performance characteristics of large lists and on how different configurations affect the performance of large lists and of the farm. SharePoint Server 2010 has several new improvements that make it easier to create and use large lists. However, creating and implementing large lists still requires careful planning. You must consider many factors, such as information architecture, performance, disaster recovery, and governance. This article covers information architecture and features that are used to implement large lists, and it covers the effect that specific configurations have on performance.

You must also make key design choices that can affect large list performance. These include permissions, the number of columns added to a list, the number of lookup columns in views, and the folders and indexes that are used to organize a list. The effect of these decisions on performance grows as the size of a list increases. This article explains how different design choices affect the performance of large lists so that you can design a large list that meets performance requirements while also meeting business requirements and providing a good user experience.

Although this article is specific to SharePoint Server 2010, the throttles and limits also apply to Microsoft SharePoint Foundation. This article covers several features that greatly enhance the experience of working with large lists; these features are available only in SharePoint Server 2010. Otherwise, no distinction between SharePoint Foundation and SharePoint Server is made in this article.


There are three main methods that can be used for accessing list data: list views with metadata navigation, the Content Query Web Part, and search. Each method has pros and cons and particular uses to which it is well suited.

List views always access the Microsoft SQL Server database. This results in slower query performance and a larger load on SQL Server resources than other methods. List views also render the most HTML, which results in slower page load times than other methods. However, list views provide the best experience for users to configure views, dynamically filter data, and perform actions on documents, such as managing versions and editing properties. You can use metadata navigation to filter list view results. You should use list views when you must have rich column data and access to list item actions. In high read and query scenarios, you should consider using other query methods.

The Content Query Web Part displays a statically configured view of data that is cached by using the Portal Site Map Provider for better performance. The Content Query Web Part renders the least HTML and is cached. The result is faster page load times, and caching makes it easier to have multiple queries on one page. The Content Query Web Part should be used to show links to related list items, documents, or pages. Although the Content Query Web Part can also be configured not to be cached, you should use this configuration only on pages for which throughput requirements are low or on pages for which the cache is not useful, such as where queries change based on the user who accesses the page.

Search Web Parts can be used to offload queries to a system that is optimized for finding content (versus editing properties and seeing the updates in real time). Search Web Parts can be configured to use static or user-specified queries. Search Web Parts have good performance. However, the data is only as current as the most recent crawl. This means results are older than results from list views and Content Query Web Parts.

There are some common large list scenarios, and depending on the scenario, you will make different design decisions. For example, in a collaborative large list scenario, users frequently add content and update properties. In this kind of scenario, you would not want the list size to grow into millions of items, because content changes frequently and filtering it becomes difficult. If you are working with unstructured document libraries, this article can help you understand the throttles and limits that protect SQL Server performance. The following table shows considerations for large list scenarios.

 

| Scenario | List size | Management | Ratio of read/update/add | New content | Users |
| --- | --- | --- | --- | --- | --- |
| Unstructured document library | Hundreds | No manager | High reads, balanced adds and updates | Manual upload | Tens |
| Collaborative large list or library | Thousands | Informal subject owners | High reads, more updates than adds | Manual upload | Hundreds |
| Structured large repository | Tens of thousands | Dedicated content steward | Very high reads, fewer adds, and significantly fewer updates | Submission and upload | Tens of thousands |
| Large-scale archive | Millions | Team of content stewards | Low reads and updates, high adds | Submission | Tens of thousands |

The unstructured document library is often used for a team or a workgroup and typically has tens to hundreds of documents. These libraries can grow to be greater than the list view threshold without any planning, which can affect operations, such as adding columns. One potential problem is that users might get list view threshold exceptions if views grow to be greater than 5,000 items. This risk can be reduced by monitoring libraries that are approaching the list view threshold. (A meter is displayed on the library settings page of a document library to indicate that the document library is approaching the list view threshold.)

This scenario typically has tens or even hundreds of users, but few concurrent users. Therefore, load inside a single library is rarely an issue. However, there can be many of these kinds of libraries. Rather than planning to support specific instances, it is more important to focus on supporting the scale of these libraries.

The collaborative large list ranges from hundreds to thousands of items and is used as storage for a large amount of active content. Collaborative large lists generally include knowledge management solutions, engineering libraries, and sales and marketing collateral repositories. Users actively add and edit content (a large number of read operations and write operations). You can put structure and management in place to keep the library organized. However, because a great deal of work is performed by users, events might occur that are beyond the control of administrators. This can make it easy for the list to grow faster than expected or past the limits it was planned for. This kind of repository can have hundreds or thousands of users with tens or even hundreds of concurrent users.

Compared to a structured repository or archive, a collaborative large list is more prone to administrative changes such as adding and deleting folders, adding content types and columns, or reorganizing content. These actions can be prevented by the list view threshold due to the size of the list.

The structured large repository ranges from thousands to hundreds of thousands of items. The content is usually final and is submitted by users or system processes, such as workflows. Structured large repositories are typically used for departmental records archives, high value document storage, and final documents that are displayed on Web pages. The content is generally structured and highly managed so that it is easier to control the growth of the list. This scenario can have tens or hundreds of concurrent users and a user base of thousands. The percentage of reads is much larger than writes. However, there still might be updates to content, and content might frequently be added and deleted. A knowledge management repository for a division or organization is an example of a structured large repository.

In this scenario, it is important to thoroughly understand user needs and to perform comprehensive testing before the solution goes live. Therefore, the solution is fairly complete and final before it is filled with a large amount of content. For example, configuration of appropriate metadata navigation hierarchies and filters might be necessary to provide an appropriate content browse experience.

A large-scale archive ranges from thousands to millions of items, either in a single list or spread across multiple lists, or at the highest end, multiple site collections. This scenario typically has a low amount of reads and updates and is generally used only as long-term storage for documents that must be retained for compliance or other reasons (for example, documents that must be retained for seven years to meet legal requirements). High throughput of the submission and deletion of documents is important in this scenario. Search is the main method for retrieving content.

The features that helped with large lists in Office SharePoint Server 2007 still help with SharePoint Server 2010, and many of them are improved to provide better performance at large scale. SharePoint Server 2010 also has many new features that help improve the performance of large lists and that enable users to use large lists effectively. This section summarizes new and improved features in SharePoint Server 2010.

The following sections discuss the features from Microsoft Office SharePoint Server 2007 that are improved in SharePoint Server 2010.

You can configure the Content Query Web Part to display results by filtering on lists, content types, and columns. You can sort results and select columns to be displayed. Doing this makes the Content Query Web Part ideal for displaying large list content on Web pages. Content Query Web Parts are generally cached. This allows for faster page loads and less database load. One usage of Content Query Web Parts in knowledge management scenarios is to use them on publishing pages to display links to documents related to the content of the Web page.

SharePoint Server 2010 provides performance improvements in the following key scenarios:

  • Optimizing single list queries to take advantage of indexes more effectively

  • Improving invalidation and refresh algorithms and default settings to improve the cache utilization when users perform write operations

SharePoint Server 2010 brings new search capabilities that include a search term refinement panel and improved scalability that has support for sub-second query latency with 100 million documents. There is also Microsoft FAST Search Server 2010 for SharePoint, which can be used to reach larger scale points than SharePoint Server 2010 Search.

Some of the new search improvements that help with finding content in large lists include support for Boolean operators in free text queries; improved operator support such as equal to, less than, and greater than; range refinements; and prefix matching on keywords and properties. For example, the query “share*” finds results that include “SharePoint.” Search also has query suggestions that make recommendations based on what the user is typing for a query. The search user interface is also improved with panels for related searches, best bets, related people, and keyword refinements.

SharePoint Server 2010 Search also improves capabilities around scale. SharePoint Server 2010 Search supports scaling out of index, crawl, and query servers. Other improvements include fresher indexes, better resiliency, and higher availability. FAST Search Server 2010 for SharePoint includes all of the SharePoint Server 2010 Search capabilities and adds scale for extreme demands, entity extraction, tunable relevance ranking, visual best bets, thumbnails, and previews.

The Document Center and Records Center are SharePoint Server 2010 site templates that you can use to create structured repositories. The Document Center site template includes features such as preconfigured Content Query Web Parts that return results relevant to the logged-in user and a document library with metadata navigation configured.

The Records Center site template resembles the Document Center site template, but it has the content organizer feature enabled for routing documents and has a record library in which added items are automatically declared records and cannot be deleted. The Records Center site template is the only out-of-box site template that does not have the document parser enabled, which preserves the fidelity of submitted content. Disabling the document parser also improves the performance of certain operations, which makes the Records Center more suitable for large-scale document storage (tens of millions of items) than other site templates.

This section describes new features in SharePoint Server 2010 that help with managing large lists and list performance.

The content organizer can be used on any site to route content to particular document libraries, folders, or even other sites. The content organizer can be used to automatically create folders for content based on metadata properties. Users can submit content to the content organizer from other sites and not worry about where it is stored within the file plan. The content organizer can be used to balance content into different folders to automatically maintain a maximum size for each folder. When the specified size limit is reached, a new subfolder is created to contain additional items.
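The folder-balancing behavior described above can be sketched as follows. This is an illustrative Python analogue, not the content organizer's actual implementation; the folder naming scheme and the size limit parameter are assumptions.

```python
class FolderBalancer:
    """Sketch of the content organizer's folder balancing: when the
    current subfolder reaches its size limit, a new subfolder is
    created to contain additional items."""

    def __init__(self, max_items_per_folder=2500):
        self.max_items = max_items_per_folder
        self.folders = {}      # folder name -> item count
        self.current = None    # folder currently receiving items

    def _new_folder(self):
        name = f"Folder{len(self.folders) + 1}"   # naming is an assumption
        self.folders[name] = 0
        self.current = name

    def route(self, item):
        """Return the folder the item is routed to, rolling over to a
        new subfolder when the current one is full."""
        if self.current is None or self.folders[self.current] >= self.max_items:
            self._new_folder()
        self.folders[self.current] += 1
        return self.current
```

With a limit of 2, routing five items fills Folder1 and Folder2 and starts Folder3, which mirrors how the organizer keeps any one folder from growing unboundedly.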

Metadata navigation is a new SharePoint Server 2010 feature that enables users to dynamically filter lists so they can find what they need. Metadata navigation enables users to select filter options, and it handles performing the query in the most efficient manner possible. Metadata navigation consists of two parts. One part is a set of navigation controls (navigation hierarchies and key filters) that lets users filter a list. The second part is a mechanism for rearranging and retrying queries.

Metadata navigation has retry and fallback logic that attempts to perform queries efficiently by using indexes. If a query will return too many results, the query falls back and returns a subset of the results for better performance. If no appropriate query can be made, fallback occurs and the filters are performed on a limited set of results. Metadata navigation automatically creates indexes. Together, retry, fallback, and index management make metadata navigation a very important part of working effectively with large lists. There are two kinds of filtering mechanisms: navigation hierarchies and key filters.
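The retry and fallback behavior can be sketched as a simple loop over candidate indexes. This is a conceptual Python sketch, not SharePoint's actual logic; `estimate_for`, the index names, and the estimate-based decision are illustrative assumptions.

```python
LIST_VIEW_THRESHOLD = 5000  # default limit discussed in this article

def query_with_fallback(estimate_for, indexes, limit=LIST_VIEW_THRESHOLD):
    """Try the query against each applicable index in turn (retry).
    estimate_for(index) returns the estimated result count for the
    query using that index, or None if the index does not apply.
    Returns (index_used, partial): partial=True means no index kept
    the query under the threshold, so only a limited subset of
    results would be returned (fallback)."""
    for index in indexes:
        estimate = estimate_for(index)
        if estimate is None:
            continue                 # index not applicable; retry the next one
        if estimate <= limit:
            return index, False      # efficient query, complete result set
    # No index keeps the operation under the threshold: fall back to
    # filtering a limited set of results.
    return None, True
```

A query that would return 12,000 items through one index but only 900 through another succeeds on the retry; if every candidate exceeds the limit, the sketch signals fallback.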

Navigation hierarchies use a tree control to navigate hierarchies of folders, content types, choice fields, or managed metadata term sets. This enables users to use a tree control to pivot on a metadata hierarchy, much as they navigate folders. When users select an item in a hierarchy for a managed metadata column, all items that match the specified term or any of its descendant child terms are displayed. This is called descendant inclusion, and it can be used on fields that are tied to a managed metadata term set. Users can select the item again to filter on only that particular term, excluding the descendant child terms. All metadata navigation queries are recursive and display results from all the folders in the list.

Key filters can be configured to perform additional filtering of results within the hierarchy. For example, you can add the Modified By column as a key filter and then type a user name to obtain results where Modified By matches the entered user. For more information, see Metadata navigation and filtering (http://go.microsoft.com/fwlink/p/?LinkId=219154). The following figure shows an example of metadata navigation hierarchies and key filters.

Screenshot of key filters list

Managed metadata is a new set of features that add more information architecture capabilities to SharePoint Server. The managed metadata features include a shared service called the managed metadata service. The managed metadata service can be used to store term sets that can be reused throughout a SharePoint Server 2010 deployment. Some of the managed metadata features include the following:

  • Term sets that support flat or deep hierarchies

  • Managed metadata column type that uses term sets as available properties

  • Term sets that can be open so anyone can add new terms, or restricted so only specific users can manage the term set

By using managed metadata columns and term sets to organize content, you can make use of features such as the Content Query Web Part and metadata navigation to help users find and discover content. Managed metadata also helps with regular search queries because it adds keywords that can be used to classify documents. Managed metadata can be used in the search refinement panel. The following figure shows an example of managed metadata navigation.

Screenshot of terms

SharePoint Server 2010 introduces several configurable limits to help maintain farm performance. At the Web application level there are now configurable throttles and limits. These have been added so that operations from individual users or processes do not adversely affect farm performance. For example, the list view threshold is a limit that prevents queries that affect more than a certain number of list items.

Indexes are important for large lists. In SharePoint Server 2010, you can now create compound indexes. Compound indexes are useful when queries are commonly performed on two columns because a query on just one column might not be selective enough. Compound indexes are not utilized by views. However, they are utilized by metadata navigation. When a throttling condition occurs, the metadata navigation logic can try again and use applicable compound indexes and single indexes for the chosen filter conditions to find complete or partial results that satisfy the query.
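The advantage of a compound index over a single-column index can be illustrated with a simplified index-selection sketch. The leading-prefix matching rule here is a common relational-index heuristic used for illustration; it is an assumption, not SharePoint's actual query planner.

```python
def pick_index(filter_columns, indexes):
    """Pick the index whose leading columns cover the most filter
    columns. Each index is a tuple of column names; an index is only
    usable up to the first column that is not in the filter (the
    leading-prefix rule). Returns None when no index applies."""
    best, best_cover = None, 0
    for index in indexes:
        cover = 0
        for col in index:
            if col in filter_columns:
                cover += 1          # leading column is filtered on; keep going
            else:
                break               # prefix broken; stop counting
        if cover > best_cover:
            best, best_cover = index, cover
    return best
```

A query filtering on both ContentType and Modified can use the compound index, which is more selective than either single-column index alone; a query filtering only on Modified falls back to the single-column index.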

The developer dashboard displays detailed diagnostic information for each page load. By default the dashboard is off. However, you can manually turn it on or configure the dashboard to stay on all the time. When the developer dashboard is turned on, you can use it to get information about database queries, load times, and errors. The developer dashboard makes it easier to analyze and diagnose performance issues quickly. The following figure shows the developer dashboard. For large lists under throttling conditions, metadata navigation activity is visible in the developer dashboard: the list of indexes that is used for retry and partial results appears in the operation tree on the left, and the indexed SQL Server queries that are attempted appear in the list on the right.

Screenshot of Developer Dashboard

The developer dashboard is also useful for debugging custom Web Parts and queries. For more information about how to enable the developer dashboard, see Blog: Enable the Developer Dashboard using the Object Model / Powershell (http://go.microsoft.com/fwlink/p/?LinkId=219613).

The content iterator developer API simplifies writing code against large lists and is especially important with the new list view threshold limit. The content iterator is a method for retrieving content to perform operations on it in small sets rather than performing operations on the entire set of data. This prevents operations from exceeding the list view threshold.
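The idea behind the content iterator can be sketched as follows. This is an illustrative Python analogue of the batching pattern, not the actual SharePoint server object model API; `fetch_batch`, the batch size, and the callback shape are assumptions.

```python
def iterate_content(fetch_batch, process, batch_size=2000):
    """Process a large list in small sets rather than in one operation.
    fetch_batch(start, size) returns up to `size` items beginning at
    `start`, or an empty sequence when the source is exhausted. Each
    batch stays well under the 5,000-item list view threshold, so no
    single database operation is blocked by it."""
    start = 0
    processed = 0
    while True:
        batch = fetch_batch(start, batch_size)
        if not batch:
            break
        for item in batch:
            process(item)
            processed += 1
        start += len(batch)
    return processed
```

Processing 4,500 items this way issues three small fetches (2,000 + 2,000 + 500) instead of one operation that touches every row at once.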

By default, SharePoint Server 2010 stores file (Binary Large Object (BLOB)) data in SQL Server databases. A large amount of a content database is commonly BLOB data. Remote BLOB Storage (RBS) allows this data to be stored outside of SQL Server, which makes it possible to have less expensive storage options and reduce content database size. Remote BLOB Storage is a library API set that is incorporated as an add-on feature pack for SQL Server 2008. A third-party remote BLOB storage provider is required to make use of the Remote BLOB Storage API.

This section includes an explanation of the testing methodology that was used for tests that are discussed in this article. Deviations from this methodology are noted where data is presented.

The test farm configuration is specified in the following figures and table. Two aspects of the test configuration were significantly different from most real-world deployments. Specifically:

  • NTLM authentication was used to avoid having the domain controller become the bottleneck. This resulted in a small performance improvement.

  • The application server contained a SQL Server instance that was used for the logging database. This was done to reduce load on the main SQL Server instance because the logging level was much higher than in real-world deployments.

Topology diagram for this test farm

 

| Computer name | Two Web servers | One application server | One database server |
| --- | --- | --- | --- |
| Role | Front-end Web server | Application server | SQL Server instance |
| Processor(s) | 2px4c@2.33 GHz | 2px4c@2.33 GHz | 4px2c@3.19 GHz |
| RAM | 8 GB | 8 GB | 32 GB |
| Operating system | Windows Server 2008 R2 x64 | Windows Server 2008 R2 x64 | Windows Server 2008 R2 x64 |
| Storage and its geometry (including SQL Server disk configuration) | 50 + 18 + 205 GB | 50 + 18 + 300 GB | Disk array: 15 disks of 450 GB @ 15K RPM |
| Number of NICs | 2 | 2 | 2 |
| NIC speed | 1 gigabit | 1 gigabit | 1 gigabit |
| Authentication | NTLM | NTLM | NTLM |
| Software version | SharePoint Server 2010 (Pre Release) | SharePoint Server 2010 (Pre Release), SQL Server 2008 CTP 3 | SQL Server 2008 CTP 3 |
| Number of SQL Server instances | N/A | 1 | 1 |
| Load balancer type | Hardware | N/A | N/A |
| Output cache settings | | | |
| Object cache settings | | | |
| BLOB cache settings | | | |
| ULS logging level | Medium | Medium | Medium |
| Usage database location | | X | |
| Usage database settings (what is being logged) | | | |
| IRM settings | None | None | None |
| Anti-virus setting | None | None | None |

 

| Database type | Number of databases | RAID configuration |
| --- | --- | --- |
| Temp DB | 1 | 0 |
| Config DB | 1 | 0 |
| Content DB #1 | 1 | 0 |
| Profile DB | 1 | 0 |
| Search DB | 1 | 0 |
| Taxonomy DB | 1 | 0 |

Tests were conducted at an optimal load point, or green zone, with a general mix of operations. To measure particular changes, tests were conducted at each point that a variable was altered. To find the optimal load point, additional threads were added to saturate the environment while staying within the following limits:

  • 75th percentile latency is less than 1 second

  • Front-end Web server CPU is less than 50 percent

  • SQL Server CPU is less than 50 percent

  • Application server CPU is less than 50 percent

  • Failure rate is less than 0.01 percent
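As a quick sanity check when reproducing this kind of test, the green-zone criteria above can be expressed as a single predicate. The thresholds come from the list above; the metric names are illustrative assumptions.

```python
# Green-zone ceilings from the test methodology: every measured metric
# must stay below its ceiling for the load point to qualify.
GREEN_ZONE = {
    "latency_p75_seconds": 1.0,     # 75th percentile latency < 1 second
    "web_cpu_percent": 50.0,        # front-end Web server CPU < 50 percent
    "sql_cpu_percent": 50.0,        # SQL Server CPU < 50 percent
    "app_cpu_percent": 50.0,        # application server CPU < 50 percent
    "failure_rate_percent": 0.01,   # failure rate < 0.01 percent
}

def in_green_zone(metrics):
    """Return True only if every metric is strictly below its ceiling."""
    return all(metrics[name] < ceiling for name, ceiling in GREEN_ZONE.items())
```

A load point with 55 percent SQL Server CPU fails the check even if every other metric is comfortably within limits, which is exactly how the saturation search above decides when to stop adding threads.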

The following table defines each test and provides an overview of the process that was used for it.

 

| Test name | Test description |
| --- | --- |
| Document Upload | Upload a document. Edit and update the properties of the document. |
| Document Upload and Route | Upload a document. Edit and update the properties of the document. Route the document matching a routing rule. |
| Document Download | Download a document. |
| Access Document Library | Access a document library list view page. |
| Access Home Page with Content Query Web Parts | Access a Document Center site home page that has three Content Query Web Parts: a cached Content Query Web Part that returns the 15 highest rated documents, a cached Content Query Web Part that returns the 15 newest documents, and a non-cached Content Query Web Part that returns the 15 most recent items that were modified by the current user. |
| Managed Metadata Fallback Query | A list view query that returns more than 5,000 results, filtering on a single value managed metadata column. |
| Managed Metadata Selective Query | A list view query that returns 1,000 results, filtering on a single value managed metadata column. |
| Content Type Fallback Query | A list view query that returns more than 5,000 results, filtering by content type. |
| Content Type Selective Query | A list view query that returns 1,000 results, filtering by content type. |

The makeup of each test mix varied. Tests were conducted by using a Visual Studio Test System. Specific data points for each test were populated, and then the test mix was run for 2 minutes of warm up and 10 minutes of data collection. The results that are presented in this article are averaged over those 10 minutes. The following table shows the percent of each solution in one test mix.

 

| Solution name | Percent in the mix |
| --- | --- |
| Document Upload (including editing document properties) | 20 |
| Document Download | 20 |
| Access Document Library | 20 |
| Access Home Page with Content Query Web Parts | 10 |
| Managed Metadata Fallback Query (more than 5,000 results) | 5 |
| Managed Metadata Selective Query (100 results) | 10 |
| Content Type Fallback Query (more than 5,000 results) | 5 |
| Content Type Selective Query (100 results) | 10 |
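When reproducing a mix like this, the percentages can drive a weighted test scheduler. The mix values below are copied from the table above; the scheduler itself is an illustrative assumption, not the actual Visual Studio Test System configuration.

```python
import random

# Percentages from the test mix table; they sum to 100.
TEST_MIX = {
    "Document Upload": 20,
    "Document Download": 20,
    "Access Document Library": 20,
    "Access Home Page with Content Query Web Parts": 10,
    "Managed Metadata Fallback Query": 5,
    "Managed Metadata Selective Query": 10,
    "Content Type Fallback Query": 5,
    "Content Type Selective Query": 10,
}

def next_test(rng=random):
    """Draw the next test to run, weighted by its share of the mix."""
    names = list(TEST_MIX)
    return rng.choices(names, weights=[TEST_MIX[n] for n in names])[0]
```

Over a 10-minute collection window, this weighting means roughly one in five operations is a document upload, matching the mix the results in this article were averaged over.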

Limits prevent operations that adversely affect farm performance. These defaults have been tested and carefully chosen. For some limits, such as the list view threshold, we strongly recommend not changing the value. Carefully consider the effect of changing these limits. If these limits prevent an operation from being performed, first consider changing the operation to run in a more efficiently indexed way rather than changing the limit to accommodate poorly performing operations. Most of the throttles and limits that are covered in this section can be configured in Central Administration by going to Manage Web Applications and selecting General Settings – Resource Throttling from the ribbon for a particular Web application.

SharePoint Server 2010 supports document libraries and lists that have tens of millions of items. You can create very large document libraries by using folders, standard views, site hierarchies, and metadata navigation. To retrieve data from large lists by using list views or Collaborative Application Markup Language (CAML) queries, the data must be partitioned by using folders or indexes or both. Otherwise, search is the only mechanism that can be used to access the data efficiently. The number of items that a single document library can support can vary depending on how documents and folders are organized, the size and kind of documents stored, the usage of the document library, and the number of columns in the document library.

Tip
With the introduction of the list view threshold, you might wonder how it relates to the Office SharePoint Server 2007 recommendation of 2,000 items per folder. It might seem that the new limit is simply 5,000 rather than 2,000. If users primarily browse content by using folders, this is the same concept. However, with the introduction of metadata navigation retry and fallback, large queries return a subset of results for better performance. This means that you can have thousands of items in folders, and performance is protected if queries return too many results.

By default, the list view threshold prevents operations that will involve more than 5,000 items, such as queries that will return more than 5,000 items or adding a column to a list that contains more than 5,000 items. Although this is a configurable default, we strongly recommend that you keep it. If poorly performing queries are used on lists with more than 5,000 items, overall throughput can significantly decrease when you increase this limit.

Some operations, such as non-indexed queries or adding a column to a list, take time and resources that are proportional to the number of items in the list. On a small list this does not matter because there are so few items that the operation is fast. As the list size increases, these operations take longer and use more resources. Rather than let these operations run unbounded, the list view threshold blocks them. You can think of the list view threshold as a guard rail along a highway letting you know that you should change the query and how data is accessed, or you should perform the operation when farm usage is low.

The list view threshold is the maximum number of list or library items that a database operation, such as a query, can involve at one time. By default, this is set to 5,000 items. This limit has a major effect on large lists because, by the definition of this threshold, a large list is a list that has more items than this limit. Operations that exceed this limit are throttled. Operations such as creating an index on a list that is over this limit are prevented because the operation affects more than 5,000 items. This limit prevents queries that have a selectivity (the number of items that match the filter criteria) of more than 5,000 items. This limit also prevents queries that filter on columns that are not indexed, because a query that filters (and in some cases sorts) on a column that is not indexed must examine all items in the list to retrieve the correct dataset, and therefore operates on more items than the list view threshold allows. The default value for this limit is based on farm and list performance and on how SQL Server manages locks. We recommend that this limit not be changed.

To minimize database contention, SQL Server uses row-level locking as a strategy to ensure accurate updates without adversely affecting users who are accessing other rows. However, if a read or write database operation, such as a query, causes more than 5,000 rows to be locked at the same time, then it is more efficient for SQL Server to escalate the lock to the entire table until the database operation is completed. When this lock escalation occurs, it prevents other users from accessing the table. For more information about locks, see Lock Escalation (Database Engine) (http://go.microsoft.com/fwlink/p/?LinkId=219156).
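Given this escalation point, a bulk operation can be split so that no single database operation locks 5,000 or more rows. The sketch below is a hypothetical illustration of that batching pattern, not a SharePoint API; `delete_batch` is an assumed callback standing in for one database operation.

```python
ESCALATION_POINT = 5000  # SQL Server escalates row locks to a table lock
                         # once a single operation locks ~5,000 rows

def delete_in_chunks(item_ids, delete_batch, chunk_size=4000):
    """Split a bulk delete into chunks so each database operation
    locks fewer rows than the escalation point, avoiding a table-level
    lock that would block other users."""
    assert chunk_size < ESCALATION_POINT
    deleted = 0
    for start in range(0, len(item_ids), chunk_size):
        chunk = item_ids[start:start + chunk_size]
        delete_batch(chunk)   # one operation, fewer than 5,000 row locks
        deleted += len(chunk)
    return deleted
```

Deleting 9,000 items this way issues three operations (4,000 + 4,000 + 1,000) instead of one operation that would escalate to a table lock.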

The following graph shows the throughput of a mix of queries against a large list as the list view threshold is adjusted. This mix of queries contains queries that return all items in the list so, as the list view threshold is raised, more items are returned. Even changing the limit from the default of 5,000 to 10,000 has a significant effect on performance. Rather than raising or lowering the list view threshold to improve performance, we recommend that you do not change the default list view threshold and focus instead on making sure queries perform well.

Chart showing list view threshold throughput

List view threshold exceptions occur because operations perform poorly. The operations should be reconfigured. Rather than raising the limit, you should consider why inefficient operations are being performed and then fix them. In a worst-case scenario, you can temporarily change the EnableThrottling setting to false for a particular list to ignore the list view threshold. This can only be done at the list level and not for a site. This should be done to allow list access only until changes can be made to fix poorly performing operations that are blocked by the list view threshold. The EnableThrottling setting should be changed back to true as soon as possible.

Important
List view threshold exceptions can be common, especially immediately after upgrade. It might seem simpler to resolve these issues by changing the list view threshold. We strongly recommend that you do not change the list view threshold.

Farm administrators and local computer administrators on the front-end Web server, where a query originates, are not blocked by the list view threshold. These users should be careful when browsing large lists that are not configured correctly, and they must also be careful when they perform testing. It might look like things are working as expected, but the data that is returned to users who are subject to the threshold can differ greatly. For a list of operations that are prevented by the list view threshold, see Operations blocked by the list view threshold later in this article.

Likewise, timer services can run by using an account that is not restricted by the list view threshold. Although this enables certain scenarios, such as deferred creation of an index on a large list, you should generally take care to ensure that such code avoids performing large list operations.

 

List view threshold

Default: 5,000

Existed in 2007: No

Configurable: Yes

Configuration location: Central Administration, per Web application

 

List view threshold for auditors and administrators

Default: 20,000

Existed in 2007: No

Configurable: Yes

Configuration location: Central Administration, per Web application

The list view threshold for auditors and administrators is the threshold that applies to certain service accounts, such as the search query account or the object cache super-reader and super-writer accounts. For example, the Content Query Web Part automatically uses this limit to cache the results of a large query, which saves server resources. Custom code can also request this higher limit when it runs as an account that is designated super-reader or super-writer in the Web application security policy.

 

Allow object model override

Default: Yes

Existed in 2007: No

Configurable: Yes

Configuration location: Central Administration, per Web application

Allowing the object model override specifies whether service accounts can use the list view threshold for auditors and administrators. A farm administrator must enable the object model override and programmatically specify that a list is an exception. Programmers with appropriate permission can then programmatically request that their query or list use the higher threshold. If you change the value to No, custom code that is run by auditors or administrators is subject to the regular list view threshold, even if it requests an override. We recommend that you leave this setting at the default value and, if it is necessary, configure only the list view threshold for auditors and administrators.

 

Daily time window for large queries

Default: Off

Existed in 2007: No

Configurable: Yes

Configuration location: Central Administration, per Web application

A daily time window can be set so that operations can be performed without being subject to the list view threshold. The time can be adjusted in 15-minute increments, up to 24 hours. A database operation or query started within the daily time window continues until completion even if it does not finish within the specified time window. By default, the daily time window is not configured, because the off-peak hours vary widely between deployments. Therefore, this is left to the administrator to decide. We recommend that a daily time window be specified only if there is a reasonable off-hours time frame where few people are using the Web application. This lets users perform administrative operations for large lists, such as creating an index, during time frames when farm usage is much lower.

As the number of unique permissions in a list increases, performance decreases. You should reconsider any design in which all or most content in a large list must be uniquely secured. The throughput difference for operations on a list between 0 and 1,000 unique permissions is around 20 percent. There is a configurable default of 50,000 unique permissions per list. However, we recommend that you consider lowering this limit to 5,000 unique permissions and for large lists consider using a design that uses as few unique permissions as possible. This will help not only performance, but also manageability.

We recommend the following:

  • Minimize the use of unique permissions on individual items, and simplify list designs that require most items to have unique permissions.

  • If unique permissions are needed, try to set them only at the list or folder level and minimize the number of individual items that need unique permissions.

  • Reconsider your design if each item requires individual permissions. Investigate dividing items between multiple lists, or organize items into folders and groups so proper access can be granted without putting unique permissions on every item.

Fine-grained permissions can affect performance, and they are difficult to manage when they are set differently on many individual items. Setting fine-grained permissions on a list or folder that exceeds the list view threshold is blocked because too many individual items must be updated. Because fine-grained permissions also affect performance in other ways, there is a configurable limit of, by default, 50,000 unique permissions per list. If you try to declare unique permissions after this limit has been reached, you are blocked from doing so. Unlike the list view threshold, this limit applies when you create unique permissions on an item, not at query time.

Whenever permission inheritance is broken for an item, such as a folder, it is counted as a unique permission toward this limit. Each time permissions inheritance is broken, a new scope ID is created. Each time that you query on a view, you join against the scopes table. Then, when a query is performed, each unique access control list (ACL) must be parsed and processed. A large number of unique permissions in a list will adversely affect performance and is not recommended. As the number of unique permissions in a list grows, query performance will degrade. Even though the default limit is 50,000 unique permissions, you might want to consider lowering this limit to 5,000 unique permissions.

 

List unique permissions threshold

Default: 50,000

Existed in 2007: No

Configurable: Yes

Configuration location: Central Administration, per Web application

When columns are added to a list, they are mapped to columns in a SQL Server database table. Each row in the database table supports a fixed number of columns of each type. For example, a single database table row supports eight date and time columns and twelve number columns. If a list has more than eight date and time columns, each list item uses two database table rows.

For small lists, the performance effect of this row wrapping is negligible, but for a large list it can have a major effect. Each column type has its own limit, and exceeding the limit for any single column type causes row wrapping, regardless of how many columns of other types the list has.

The following table shows the number of columns for specific data types before row wrapping occurs.

 

Column type: Number of columns per table row

Single line of text or choice: 64

Multiple lines of text: 32

Date and time: 8

Yes or no: 16

Number and currency: 12

Calculated: 8

Integer, single-value lookup, people and group, managed metadata: 16

Unique identifier: 1

Row wrapping causes a decrease in throughput of approximately 35 percent per additional row for most operations. To check how many rows a list is using, you must analyze the list schema and examine the column types of the fields on the list.

The following graph shows the performance of read-only queries as the number of SQL Server database rows that are used for a list increases to accommodate more managed metadata columns. To get to the second row, 15 managed metadata columns were added to the list, and to get to the third row, 31 managed metadata columns were added. Testing was conducted by using only queries that filtered on items in the list. For each additional row, throughput decreases by approximately 35 percent.

Chart showing row wrapping throughput

 

Row size limit

Default: 6

Existed in 2007: No

Configurable: Yes

Configuration location: Object model only, SPWebApplication.MaxListItemRowStorage

The row size limit specifies the maximum number of table rows that are internal to the database that is used for each item in a list. To accommodate wide lists that have many columns, each item is wrapped over several internal table rows, up to six rows. For example, if you have a list that has many small columns, one that contains hundreds of Yes/No columns, then you could reach this limit. In this case, you would be unable to add more Yes/No columns to the list. However, you might be able to add columns of other types. Because each additional row adds overhead, for a large list you should minimize the number of columns of the same types to avoid row wrapping.
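As a back-of-the-envelope illustration, the wrapping rule can be sketched in Python. This is not SharePoint code: the per-type limits come from the table above, the six-row cap is the default row size limit, and the 35 percent figure is the approximate per-row degradation described earlier. The function names and the dictionary keys are hypothetical.

```python
import math

# Columns of each type that fit in one internal database table row
# (from the table above).
COLUMNS_PER_ROW = {
    "single_line_of_text_or_choice": 64,
    "multiple_lines_of_text": 32,
    "date_and_time": 8,
    "yes_no": 16,
    "number_and_currency": 12,
    "calculated": 8,
    "int_lookup_person_managed_metadata": 16,
}

MAX_ROWS = 6  # default SPWebApplication.MaxListItemRowStorage


def rows_used(column_counts):
    """Estimate internal rows per item; the worst column type drives wrapping.

    Note: built-in list columns also consume slots, so a real list can wrap
    sooner than this estimate suggests.
    """
    rows = 1
    for column_type, count in column_counts.items():
        rows = max(rows, math.ceil(count / COLUMNS_PER_ROW[column_type]))
    if rows > MAX_ROWS:
        raise ValueError("list would exceed the row size limit of %d" % MAX_ROWS)
    return rows


def relative_throughput(rows):
    """Approximate throughput relative to a one-row list (about -35% per extra row)."""
    return 0.65 ** (rows - 1)


# 20 date and time columns need three rows (only 8 fit per row):
print(rows_used({"date_and_time": 20, "yes_no": 10}))  # 3
print(round(relative_throughput(3), 2))                # 0.42
```

The key point the sketch captures is that a single over-limit column type forces an extra row for every item, even if every other column type is far under its limit.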

Each lookup column in a list view causes a join with another table, and each additional lookup column in a view increases the complexity of metadata navigation and list view queries. In addition to standard lookup columns, single-value and multiple-value managed metadata columns and single-value and multiple-value people and group columns count as lookup columns. Adding lookup columns to a view does not cause a gradual, linear decrease in performance; performance is relatively stable up to eight columns and degrades rapidly after that.

The following graph shows the change in throughput as the number of lookup columns in a view increases. Throughput is fairly stable from zero to eight lookup columns, but it decreases greatly at 10 lookup columns. This test was performed on a list whose items used only one database row; if a list is row wrapping, performance degrades faster.

Chart showing lookup columns in a view throughput

The following graph shows SQL Server CPU utilization as the number of lookup columns in a view increases. As you can see, there is a significant change at 10 lookup columns. For a list that has a large number of queries, having views that contain more than eight lookup columns causes the queries to use a disproportionately large amount of SQL Server resources. We recommend that you do not change this limit to be more than eight.

Chart showing SQL CPU utilization - lookup columns

This decrease in performance depends only on the number of lookup columns in a view or query, not on the total number of lookup columns on the list. However, SharePoint Workspace cannot synchronize any list that has more than eight lookup columns, regardless of whether the columns are used in a view.

 

List view lookup threshold

Default: 8

Existed in 2007: No

Configurable: Yes

Configuration location: Central Administration, per Web application

   

This section describes limits that are not covered elsewhere in this article.

 

Indexes per list

Default: 20

Existed in 2007: Yes, limit was 10

Configurable: No

The preceding table shows the maximum number of indexes that can be created per list, including compound indexes and indexes that are created by SharePoint Server 2010. This limit is not configurable.

 

Datasheet view and export to Excel limit

Default: 50,000

Existed in 2007: No

Configurable: No

The preceding table shows the maximum number of items that can be used with export to Microsoft Excel and the datasheet view. However, the datasheet view will be blocked by the list view threshold. Therefore, if your list view threshold is 5,000 items and you have between 5,000 and 50,000 items in a list view, when you attempt to use the datasheet view you will get a list view exception message even though the datasheet view limit is higher.
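The interaction between the two limits can be sketched as follows. This is a hypothetical helper, not a SharePoint API; the 5,000 and 50,000 values are the defaults discussed in this article.

```python
DATASHEET_HARD_LIMIT = 50_000  # non-configurable datasheet/export limit


def datasheet_blocked(view_item_count, list_view_threshold=5_000):
    """Return why the datasheet view is blocked, or None if it is allowed."""
    if view_item_count > DATASHEET_HARD_LIMIT:
        return "exceeds the 50,000-item datasheet limit"
    if view_item_count > list_view_threshold:
        return "exceeds the list view threshold"
    return None


# With the default threshold, a 20,000-item view fails on the lower limit first:
print(datasheet_blocked(20_000))  # exceeds the list view threshold
```

In other words, whichever limit is lower governs: raising the list view threshold above 50,000 items does not unblock the datasheet view.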

 

SharePoint Workspace synchronization limit

Default: 30,000 items per list

Existed in 2007: No

Configurable: No

Microsoft SharePoint Workspace has a non-configurable limit that blocks synchronization of any list that has more than 30,000 items. If a list contains more than 30,000 items, users cannot synchronize the list by using SharePoint Workspace, and items cannot be selectively synchronized.

When a list exceeds the list view threshold, some operations that might have worked previously are blocked. The biggest concern is the default list view because this is what users most frequently use to access a list. List views must be configured to work correctly for a large list. For example, an error occurs when you access a list if the root of the list contains more items than the list view threshold. If the metadata navigation feature is enabled, a subset of the results is displayed rather than an error.

The list view threshold blocks any database operation that touches more items than the threshold allows; it does not limit only the number of items returned or modified. For example, if a filter on a non-indexed column returns 100 results but the list contains 10,000 items, the query fails because it must scan all 10,000 items. If you add an index to that column, the operation touches only the 100 matching items and succeeds.
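The scan-versus-seek distinction can be illustrated with a small sketch. This is hypothetical Python, not SharePoint code; it only models the counting rule that the threshold applies to rows examined, not rows returned.

```python
LIST_VIEW_THRESHOLD = 5_000  # default per-Web-application setting


def rows_scanned(list_size, matching_items, column_indexed):
    """Rows the database must examine for a filtered query."""
    # With an index, the query seeks directly to the matching items;
    # without one, every item in the list is scanned.
    return matching_items if column_indexed else list_size


def query_allowed(list_size, matching_items, column_indexed):
    return rows_scanned(list_size, matching_items, column_indexed) <= LIST_VIEW_THRESHOLD


print(query_allowed(10_000, 100, column_indexed=False))  # False: full scan of 10,000 items
print(query_allowed(10_000, 100, column_indexed=True))   # True: touches only 100 items
```

The same query over the same data passes or fails depending solely on whether the filtered column is indexed, which is why indexing is the primary remedy for threshold errors.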

Operations on large lists can be classified into the following two groups:

  • List exceeds the list view threshold   Some operations are prevented when the size of the entire list exceeds the list view threshold, even if items are divided into folders. These operations include recursive queries, such as manage checked out versions, which operate on all items regardless of what folder they are in. Views that return all items without folders are also prevented. In addition, operations that affect the entire list, such as adding a column and creating or deleting indexes, are blocked.

  • Container exceeds the list view threshold   Some operations are prevented because a folder or the root of the list contains more items than the list view threshold. For example, if a list contains 10,000 items and a folder contains 3,000 items, you can rename or delete the folder. However, if the folder contains 6,000 items (exceeding the list view threshold) you cannot delete the folder because the operation exceeds the list view threshold.

When a list exceeds the list view threshold, you must plan to correctly configure views and other navigation options. Ideally, you should configure views and other navigation options in advance, but often lists can grow past the list view threshold and require action. Some operations, such as creating a column or indexing a column in a list that has many items, will take a long time. These operations are prevented by the list view threshold. However, they can be performed during the daily time window or by farm or computer administrators. You should plan for these operations before you perform them. If the list is already too large, plan to use a daily time window or administrative credentials to perform these operations.

Tip
The list view threshold prevents some list administrative actions that are common when a list is being set up. If possible, you should configure all content types, columns, and indexes for a list before the size is greater than the list view threshold.

A list can become so large that some operations time out when they are run by using a Web browser. For example, if a list contains millions of documents, adding a new column might take too long. In such cases, you must use Windows PowerShell to perform the operation, and you should do so during off-peak hours because the operation blocks other users of the list.

The following tables list operations that are blocked by the list view threshold.

 

Operation Description

Add/remove/update a list column

Blocked for all column types, including lookup and calculated columns, and for many kinds of updates, such as a type change or a uniqueness change. Some updates, such as a name change, are not blocked because they do not affect every item in the list.

Add/remove/update a list content type

Affects every item in the list so it is blocked for any list that has more items than the list view threshold.

Create/remove an index

Affects every item in the list so it is blocked for any list that has more items than the list view threshold.

Manage files which have no checked-in version

A non-indexed recursive query that fails for any list that has more items than the list view threshold.

Non-indexed recursive queries

Includes filters and some sorts. This operation fails when the list size is greater than the list view threshold. Because there is no index, it does a full scan against the entire list. Also, it returns all items, and it ignores folders.

Cross list query

Includes queries by the Content Query Web Part and follows the list view threshold setting for auditors and administrators, which by default is 20,000 items. If the operation involves more than 20,000 items, the query fails.

Lookup columns that enforce relationship behavior

You cannot create lookup columns that enforce relationship behavior when the list that is referenced contains more items than the list view threshold.

Deleting a list

Affects every item in the list so it is blocked for any list that has more items than the list view threshold.

Deleting a site

If the sum of all items in a site is greater than the list view threshold, deleting the site is prevented because it affects too many items.

Save list as template with data

Affects every item in the list so it is blocked for any list that has more items than the list view threshold.

Showing totals in list views

Performs a query against every item in the list so it is blocked for any list that has more items than the list view threshold.

Enable/disable attachments in a list

Affects every item in the list so it is blocked for any list that has more items than the list view threshold.

 

Operation Description

Delete/copy/rename a folder

Fails when the folder contains more items than the list view threshold because it affects too many rows.

Queries that filter on non-indexed columns

Fails when the container (folder or list) contains more items than the list view threshold. The operation performs a full scan against the entire folder because there is no index.

Set fine-grained security permissions

Fails when the list or folder on which you are trying to set fine-grained permissions contains more items than the list view threshold because it affects too many rows. You can still set fine-grained permissions on child items, such as documents, in a large list, although you cannot set the permissions on the list itself or on folders that contain more items than the list view threshold.

Open with Explorer

Does not show any items if a container has more items than the list view threshold (excluding items in subfolders). If a folder has 8,000 items, but it has a subfolder that contains 4,000 items and only 4,000 items in the root, then Open with Explorer will work. If the root of a list contains more items than the list view threshold, then Open with Explorer will not show anything. To use Open with Explorer, the list must have items organized into folders in amounts less than the list view threshold in the root of any container.

This section contains information about features that might not work as expected with regard to large lists.

The datasheet view button that is available in the library ribbon tab of a document library is not disabled if the list grows above the list view threshold. However, if the list size exceeds the list view threshold, the view loads some items, but it displays a message that states, “You do not have permission to view the entire list because it is larger than the list view threshold enforced by the administrator.” You can disable the datasheet view option from the ribbon in the settings for the list. There is also a hard limit of 50,000 items, so this view is blocked even if the list view threshold is above 50,000 items.

Before you implement a large list, consider the business case and requirements. Requirements such as service level agreement (SLA), time for backup and restore, size of content, amount of content (number of items), and access times are all important to consider. Depending on the size and demand of the application, you must make important choices at multiple levels, including hardware, content storage, and SharePoint Server 2010 information architecture. A large application that has millions of items and hundreds of concurrent users might require stand-alone hardware for the specific project, although a document repository with tens of concurrent users and tens of thousands of documents can work well with existing shared hardware and a single document library in an existing site.

The end result of planning should be a list of column types (names, data type, and usage), indexes, folder structure, usage of pages and links for navigation, planned structure of permissions, estimated number of items, and total data size. Details should also include information about the kinds of queries that will be performed and how data from the list will be accessed, created, and updated.

After you plan the design and implementation for a large list, the next step is to design and build a prototype. This stage of planning is about designing the large list, implementing a proof of concept, and validating that it will work. At this stage, it might be useful to populate a test environment with a large amount of content to validate assumptions about data access and performance. The end result of the design process should be a proof of concept of the intended list, documentation of the columns, content types, folder structure, views, indexes, columns used for metadata navigation or other retrieval methods, any taxonomies used, usage of various Web Parts, and usage of any other features such as the content organizer.

For large lists, it is important to estimate various numbers to make capacity planning and design decisions. There are a few important numbers that you should plan for, which include the following:

  • Total content database size

  • Average and maximum file sizes

  • Number of versions

  • Amount of content – total number of items in a list

The total content database size is important for planning the needed disk space and hardware, and for determining what is supportable for backup, restore, and a service level agreement. The overall content database size is the most important factor in determining the amount of downtime that is necessary for backup and restore.

The size of the content database can be estimated by multiplying the average document size by the average number of versions per document and by the expected number of documents, and then adding approximately 20 percent for content database data other than the files themselves. This estimate errs on the high side because files generally grow over time, so the average size of the current checked-in documents is typically larger than the average size across all versions. Unless you have mechanisms to effectively control the amount of content, you should also add a significant buffer in case the list grows larger than you anticipated.

The maximum file size is needed to ensure that the correct Web application setting is specified for files that can be uploaded (by default 50 MB, the maximum can be 2 GB). The average file size is used to help understand the rate at which content can grow and to estimate the total content size. The average file size can be estimated by evaluating files in systems that currently fill the role of the intended system.

Tip
You should plan that an additional 10 to 20 percent will be added to the content database for data in addition to files and that the search index will be approximately 75 percent of the content database size.
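The estimation rules above can be expressed as a short sizing sketch. The functions are hypothetical helpers, and the example inputs (1,000,000 documents, 250 KB average, 3 versions each) are illustrative values, not recommendations.

```python
GB = 1024 ** 3


def estimate_content_db_bytes(avg_doc_bytes, avg_versions_per_doc, doc_count,
                              overhead=0.20):
    """Average document size x versions per document x documents, plus ~20% overhead."""
    return avg_doc_bytes * avg_versions_per_doc * doc_count * (1 + overhead)


def estimate_search_index_bytes(content_db_bytes, ratio=0.75):
    """The search index is roughly 75 percent of the content database size."""
    return content_db_bytes * ratio


db = estimate_content_db_bytes(250 * 1024, 3, 1_000_000)
print(round(db / GB, 1))                               # 858.3 (content database, GB)
print(round(estimate_search_index_bytes(db) / GB, 1))  # 643.7 (search index, GB)
```

Running the numbers like this early in planning makes it clear whether the repository fits within the backup, restore, and SLA targets before any content is migrated.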

You must consider versioning because it can greatly increase the size of content. There are methods with which you can limit versions. For example, you can use information management retention policies to delete all previous versions after a specific amount of time, or you can limit the number of versions to be saved. Other factors also affect versions, for example, if your repository has all content submitted to it by using a content organizer, there might be no versions at all because the content organizer copies only the latest checked-in version. If documents in your repository are actively edited by users, then you might have to consider coauthoring; each coauthoring session creates a version automatically. Consider the usage of the repository, and evaluate existing solutions to estimate the average number of versions that will be created for a document.

Amount of content is the total number of items in a single list. To estimate the amount of content, you should evaluate the existing sources of content and what will be moved to the new system, or examine how many users will use the system and what the purpose of the system is. There are some other related numbers, including items per container and items per metadata pivot or index filter. These numbers are also important when you plan views and metadata navigation.

Lists that have large storage requirements can force a fundamental decision about how to store the documents. By default, SharePoint Server 2010 stores all documents as BLOBs in the SQL Server database. SharePoint Server 2010 and SQL Server 2008 provide a Remote BLOB Storage API that allows documents to be stored outside the SQL Server database, which reduces database size. The decision whether to use Remote BLOB Storage is largely driven by cost savings.

Current testing by Microsoft has shown that Remote BLOB Storage causes a 5- to 10-percent decrease in throughput, and for large files, no perceptible difference in latency. However, performance might differ depending on the specific Remote BLOB Storage provider that is used. By using Remote BLOB Storage, the content database size is reduced. However, this does not necessarily mean that you can store more items in a content database. Performance is affected by the amount of items in lists in the SQL Server database; even though the BLOBs are removed, the list size does not change. There are a few scenarios where the cost benefit can easily outweigh the performance concerns:

  • Archive, non-collaborative data

  • Storing very large BLOBs such as videos and images that are infrequently updated

Using Remote BLOB Storage can add more servers and technology to your farm and requires the addition of a Remote BLOB Storage provider. A Remote BLOB Storage provider can support storing BLOBs on less expensive storage outside of the SQL Server database. SQL Server Enterprise is required to use the Remote BLOB Storage API.

The crossover point where Remote BLOB Storage becomes cost-effective might be in the range of terabytes of data, but you should not use Remote BLOB Storage solely because you have terabyte-sized content databases. Think carefully about backup, restore, and service level agreements: Remote BLOB Storage makes disaster recovery more difficult because two storage systems must be kept synchronized. The key concern is the time that is required to restore the system after a disaster and to back up and recover the BLOBs. For more information, see Overview of RBS (SharePoint Server 2010).

Selecting the appropriate architecture for a large list project is important because the decisions can be difficult to change after they have been implemented. Plan ahead and consider the size and amount of content, usage of the repository, how content will be added and updated, and how content will be accessed. All of these things can affect how content is organized (in one list, multiple lists, or even multiple site collections), what metadata is used, and how content is retrieved. All of these decisions are especially important for a large list because with a large amount of content it becomes much more difficult to redesign how a system is used.

When you design a large list solution, it is important to consider whether single list architecture is appropriate. The decision to place content in a single list should be based on business requirements, such as ease of use for working on and discovering content. In many instances, it might be more sensible to use multiple lists. Creating a successful implementation with great usability and user experience with the capabilities of SharePoint Server 2010 and available resources should be your top priority.

Use a single list to make it easy for users to find and work on content so they do not have to decide where to put their content or which list they must access to find what they are looking for. However, as the amount of content increases, it can also be more difficult to find content, especially by using methods such as filtering views or navigating folders. When a list starts to reach into the hundreds of thousands of items, using metadata navigation might become difficult. Queries might return hundreds or thousands of results because they are not specific enough.

For example, if there are 5,000 terms in the domain of an index and each term has an equal amount of items that match the filter, filtering on one term will result in 20 results with a list of 100,000 items and 200 results with a list of 1,000,000 items. If the list size is too large, many filters that users select will not return a reasonable result set for users to find what they are looking for. If a project has multiple kinds of distinctly separate content that users will commonly be able to differentiate between, then you should consider using multiple lists.
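The arithmetic in this example is a simple ratio, sketched below as a hypothetical helper (assuming filter values are spread evenly across items, as in the example):

```python
def expected_results_per_filter(total_items, distinct_terms):
    """Average items returned when filtering on one term, assuming an even spread."""
    return total_items / distinct_terms


print(expected_results_per_filter(100_000, 5_000))    # 20.0
print(expected_results_per_filter(1_000_000, 5_000))  # 200.0
```

Because the result count grows linearly with list size while the number of filter terms stays fixed, a taxonomy that produces usable result sets at 100,000 items can become too coarse at 1,000,000 items.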

At very large scale points, such as a large-scale archive scenario, it might be worth considering a multi-site collection architecture. A new SharePoint Server 2010 feature allows site collections to be grouped together to load balance documents. The content organizer feature is used to route documents between multiple site collections. Search must be used to retrieve the documents. This is fine for long-term storage because it allows content to be balanced across multiple site collections and scales to support many more documents than a single list.

Summary of list recommendations

  • Use single lists for large numbers of items when:

    • It is not logical to place items in separate lists.

    • It will provide the best user experience.

  • Use multiple lists for large amounts of items when:

    • It is logical to group items into multiple lists.

    • It will provide the best user experience.

    • There will not be confusion among users about which list to use to add or find content.

  • Use a multiple site collection architecture when:

    • The repository requires more than the tens of millions of items supported in a single repository.

    • It is logical to group items into multiple site collections, for example to partition data by year.

With SharePoint Server 2010, metadata and content types are useful for creating information architecture. New features, such as managed metadata, term sets, and metadata navigation, make metadata very useful and important for retrieving content. Because operations, such as modifying content types and columns, are blocked on large lists, it is especially important to plan ahead with requirements for metadata. If you plan to use metadata navigation or some other method of retrieving content by metadata, then it is important to plan content types and the columns they will have during the design phase.

In most large list scenarios, metadata is not only useful; it is required for users to be able to use the system effectively. There are three main ways that metadata can be applied: built-in system processes, custom configurations and code, and manual application by users. For columns to be useful for retrieving content, most items should have a value specified for the column, so it is important to plan which columns will be used for navigation and how the metadata will be filled in. The only way to guarantee that the correct metadata is applied is to use built-in processes and customizations. For example, because every item has a content type, if content types are used to classify documents, every item has this metadata, which makes it easy to filter based on content type.

The most basic categorization of content should be with content types. If you have metadata to classify content, such as document or item type, you should consider using that classification for your content types. Content types allow you to define the columns that are associated with a type of content, in addition to being associated with workflows and the template. Only one template can be associated with a content type, and this is the template that you use when you create a new instance of the content type by using the New Document menu in a document library.

You can use templates for file formats in Microsoft Word, Microsoft PowerPoint, and Excel, in addition to other products. When users create new instances of the content type, the associated client application opens so that users can begin authoring from the template. When content is uploaded, users can select from the available content types. Content types should be distinct and specific, and there should be few enough per list that users have no difficulty selecting which content type to use.

Because content types control the metadata that users must fill in when they create or upload an item, consider the columns that are necessary to meet business requirements and also minimize the barrier for submitting content. Picking a good set of content types to classify content at the primary level automatically helps with navigation. Because every item has a content type, there is a pivot to filter with that will work for every item.

Summary of content type recommendations

  • Use content types as the most basic method for organizing content.

  • Use content types to select the specific columns that are necessary for that type of content.

  • Keep the number of content types per list small (no more than 10), and make them distinct enough that users can easily understand which content type they should use.

  • Content types provide a built-in column for which every item will have a value that can be used for filtering and metadata navigation.

A product specifications library is used by product development teams to store design specifications, testing plans, and other product development items. In this example, the following six content types are used. All the content types have columns for project start date, project end date, budget, members of the design team, product name, and product type.

  • Product specification – A Word file that contains the details of the design for a product. Additional metadata includes the designer and final review date.

  • Test specification – A Word file that contains the test plan for a product. Additional metadata includes the tester and test completion status.

  • Development plan – A Word file that contains the plan for developing the product.

  • Storyboard – A PowerPoint presentation that is used to present the mockups of the design.

  • Cost analysis – An Excel spreadsheet that analyzes the cost to develop the product and the potential market opportunities.

  • Timeline – An Excel spreadsheet that contains details about the schedule for developing the product.

In this example, users can filter based on content type to find a product specification or a product storyboard. The custom templates also help structure users' work. The numbers of columns and content types are small enough that users can easily select the appropriate options for their work, and the metadata that users fill in makes it easy to filter and find content.

Number of columns and required columns

You can use columns to specify the kind of metadata that an item has, and you can mark them as hidden, optional, or required. Use hidden columns for automated tasks, such as workflows, so users cannot edit them. Use required columns only when it is absolutely necessary. For example, a metadata property might be necessary to route an item to the appropriate location or for navigation. In these cases, you do not want users to leave the value empty. The fewer items that have metadata filled in for a column that is used as a navigation filter, the less useful the navigation is because many items will never be returned by a query.

As the number of metadata columns increases, it becomes less likely that users will fill in metadata because of the additional work to figure out which columns apply and then to specify a value. If you use a large number of required columns, user adoption might be difficult because it takes so much time to upload content. In a very open and collaborative scenario, this can be detrimental. But as the value of the content and effort to create that content increases, it becomes more likely that users will take the time to fill in the appropriate fields, especially when this operation is not frequent.

During the design phase, you should consider what metadata is necessary to perform required operations and to retrieve content, estimate how long it will take users to fill in that metadata, and estimate the effect on users. If users do not adopt the system because the overhead for creating content is so high, it might be difficult to restructure the system later.

Field types and single vs. multiple value fields

One consideration in the choice of columns is the column type and whether the column should accept multiple values. Queries on managed metadata columns are more efficient than queries on choice columns, so consider using managed metadata columns instead of choice columns. Some columns, such as managed metadata and person or group columns, can support multiple values. Queries on multiple value columns are not as efficient as queries on single value columns.

Columns and content types are typically the central components for classifying and retrieving content in a large list. A list of columns and content types should already have been prepared during the planning process. The number of columns and content types that are added to a list can affect performance in subtle ways. The number of columns of a specific type that are added to a single list will cause row wrapping. For more information, see Row wrapping earlier in this article.

Example of a collaborative large list and columns: product specifications library

The following columns are used in this example:

  • Columns maintained automatically by SharePoint Server 2010: ID, Content Type, Modified, Created, Modified By, Created By, Document ID

  • Columns maintained by customizations:

    • Metadata default values by folder for Product Type and Product Team (Each Product Type has a folder and each Product Type folder has multiple Product Team folders)

    • Workflows update: Approval Status, Project Completed

  • Columns maintained by users: Designer, Tester, Final Review Date

  • Columns that work well for navigation: Content Type, Product Type, Product Team

  • Columns that are used for tracking status and also work well for navigation: Final Review Date, Approval Status, Project Completed

  • Columns that can be useful for navigation: Designer, Tester, Product Name, Modified, Modified By

Summary of column recommendations

  • Minimize the number of columns that are available to fill in.

  • Carefully select the columns that will be used for system processes and navigation. Consider which fields will be required, and minimize the number of required fields.

  • Required fields should be used when they are necessary for navigation, such as using Content Query Web Parts against a particular field. They should also be used for administration, such as specifying retention actions on a date field that users must specify.

  • Because queries on single value columns are faster than queries on multiple value columns, make columns single value unless multiple values are necessary.

The total number of columns of particular types on a list can cause row wrapping, which reduces performance. Minimize the number of columns on a large list and avoid row wrapping if possible.
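Conceptually, row wrapping can be modeled as follows. This sketch uses invented per-row capacities (the real limits are fixed by the content database schema and differ by column type), but it shows why the single worst column type determines how many database rows each list item occupies, and therefore how many rows SQL Server reads per item.

```python
import math

# Hypothetical per-row capacities for illustration only; the actual
# per-type limits are defined by the SharePoint content database schema.
ROW_CAPACITY = {"single_line_of_text": 64, "datetime": 8, "number": 12}

def rows_needed(column_counts):
    """Return how many content database rows one list item occupies.

    Row wrapping occurs when the count of columns of any one type
    exceeds that type's per-row capacity, forcing an extra row for
    every item in the list.
    """
    wraps = [math.ceil(count / ROW_CAPACITY[col_type])
             for col_type, count in column_counts.items()]
    return max(wraps, default=1)
```

For example, a list with 65 single-line-of-text columns would wrap to two rows per item under these illustrative capacities, doubling the rows scanned by every query against the list.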

Folders

You should carefully consider how to organize content into folders. There are three main ways that you can use folders:

  • Organize content into folders logically. For example, organize contracts into folders based on the year or month that the contract was signed, or invoices based on the date that the invoice was created. In this scenario, users can easily navigate through the folder structure or use metadata to find documents. Also, it is easy for documents to be automatically routed to the correct folder. The content organizer provides capabilities that can be used to limit the amount of items in a single folder by creating subfolders when items are added over a certain limit.

  • Organize content into folders for retention and permissions or other administrative functions. For example, a folder for confidential documents that fewer users have access to, or location-based retention so documents have a different retention schedule based on their folder. In this scenario, user navigation might be more difficult because the users might not care where a document is as long as they have access to it. Metadata navigation and search are the dominant methods of finding documents, rather than using the folder structure.

  • Organize content into folders to aid user navigation by topics or categories. Many users are accustomed to navigating with folders. For particular applications, it might be important to retain a folder structure that is easy for users to navigate. In this scenario, users generally understand the folder structure and know where to look for and place documents. Metadata navigation and search can be used in combination with this method.

Improvements in SharePoint Server 2010 give you more flexibility in using folders and less dependence on performance considerations. By using managed metadata and metadata navigation, users can easily filter on metadata, rather than navigating through folders. This enables you to organize content for administrative purposes such as permissions or policy, rather than only for user navigation. For instance, you can have a folder of classified material that only certain employees can access and another folder that all employees can access. You can specify different permissions on the folders and then use the content organizer to automatically move content into the appropriate folder, based on metadata. You can still choose to use folders for navigation in combination with metadata navigation.

With the content organizer feature, content can be automatically moved into folders based on content types and metadata, without users needing to decide where content should be placed. In addition, the content organizer can automatically create new folders when a folder reaches a specified item limit. Be aware that Open with Explorer (WebDAV) will not work on a large list unless items are organized into folders so that the root of each folder contains no more items than the list view threshold.
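The folder-balancing behavior described above can be sketched as follows. The class, the folder naming scheme, and the default limit are all illustrative; the actual behavior is configured through the content organizer settings, not through code like this.

```python
class BalancedFolder:
    """Illustrative model of content organizer folder balancing:
    when the target folder reaches its item limit, new items are
    routed into automatically created, numbered subfolders."""

    def __init__(self, name, limit=2500):  # limit is an example value
        self.name = name
        self.limit = limit
        self.items = []
        self.subfolders = []

    def route(self, item):
        """Place an item and return the folder path it landed in."""
        if len(self.items) < self.limit:
            self.items.append(item)
            return self.name
        # Root is full: create a new numbered subfolder if needed.
        if not self.subfolders or len(self.subfolders[-1].items) >= self.limit:
            sub = BalancedFolder(f"{self.name}/{len(self.subfolders) + 1}",
                                 self.limit)
            self.subfolders.append(sub)
        sub = self.subfolders[-1]
        sub.items.append(item)
        return sub.name
```

Keeping each folder under the list view threshold in this way is what preserves Open with Explorer (WebDAV) access on a large list.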

Folder-based views and metadata-based views perform very similarly. From a logical and user experience standpoint, it is sensible to use folders to divide content. However, metadata navigation performs recursive queries, so items are returned regardless of which folders they are in; if metadata navigation is the main method of retrieving content, the folder structure might not be important.

The following graph shows the results of a test that was conducted by using views of various types to access the same number of items. All views returned 1,000 results. This graph shows the requests per second of these views individually as the list size increases. The results show that as list size increases, the performance of the folder and indexed views remains relatively flat, although folders have better performance at smaller list sizes. For most large lists, a combination of folders and indexed views is used, so performance differences should not dictate whether to use folders or indexes to retrieve data.

Chart showing folder and indexed view throughput

Summary of folder recommendations

  • Plan for how items will be organized into folders. Items can be moved automatically or manually.

  • Features such as metadata navigation make it less necessary to limit the number of items in folders.

  • Use metadata navigation in combination with folder navigation so that folders can be used to manage policy and retention rather than only to organize content for retrieval.

  • When it provides the best user experience, consider organizing content into folders to aid navigation, even in combination with other navigation options.

  • When you use the content organizer to automatically move documents into folders based on metadata, consider enabling the option to create additional subfolders when a folder reaches a specific item limit.

  • You must organize items into folders with no more items than the list view threshold in the root of any particular folder if you plan to use Open with Explorer (WebDAV).

  • To retrieve content in list views, you must use metadata navigation and indexes if you do not use folders.

In the product specifications library, folders are used to aid navigation and to logically place content. It is clear to users which folder they should use to create a new product specification. There is a folder for each product type, and each product type has multiple folders for each product team. Each product team has a document set for each product they are designing, and the documents that are specific to a product are stored in the document set. This creates a structure that looks like the following:

  • Downhill Skis – Product Type (Folder)

    • Racing Skis – Product Team (Folder)

      Super Faster Downhill Racer X – Product (Document Set)

You can configure metadata defaults for each of the folders, and it is easy for library users to find content by using folders.

The Records Center site is used for long-term storage of items that must be retained for legal compliance, but content is no longer actively modified. In this scenario, items are automatically routed into two folders: restricted and confidential. The restricted folder has strict permissions that a small number of people have access to, and the documents must be retained for 10 years. The confidential folder allows more access than the restricted folder, and documents in it must be retained for 7 years. This helps lower the number of unique permissions and makes it easier to manage permissions because items receive their permissions from the appropriate folder. All items that come into the Records Center site are routed to the root, confidential, or restricted folders based on metadata.

Indexes

There is a non-configurable limit of 20 indexes per list, which includes compound indexes and the indexes that some SharePoint Server 2010 features create by default. Adding indexes to a list has a minimal effect on performance, although it does affect some operations, such as add and update. Avoid creating more indexes than necessary, because unused indexes cause a small negative performance effect, and some SharePoint Server 2010 features add indexes when they are enabled. For example, SharePoint Server 2010 requires at least three index slots if you use the expiration and eDiscovery features. Consider keeping at least three index slots available in case you must create new indexes later.

SharePoint Server 2010 uses its own indexing mechanism to work with its database table structure. Create indexes in SharePoint Server by modifying the settings for the list.

When planning for indexes, consider the following:

  • Indexes are needed to perform filtering on lists above the list view threshold.

  • Carefully plan which single and compound indexes to create because there is a limit of 20 indexes.

  • Reserve index slots for SharePoint Server 2010 features that might need to create columns such as eDiscovery and Information Management Policy Retention.

  • Create single-column indexes for fields that are used alone to filter, for example with Content Query Web Parts, filtered views, and metadata navigation hierarchies and key filters that are often applied as a single filter.

  • Create compound indexes for queries that filter on two columns where each column individually commonly returns more items than the list view threshold but the two together are selective.

  • Indexes can negatively affect performance of some operations, such as adding a document or editing its properties. Therefore, only create indexes when it is necessary.

Lists have a hard limit of 20 indexes, and indexed columns are very important to a large list; therefore, you must select single and compound indexes carefully. Several features use indexes. For example, the metadata navigation feature automatically creates indexes for configured navigation hierarchies and key filters. Create indexes on the columns that are important for filtering in your navigation and information architecture.

SharePoint Server features can create indexes that count against the list index limit. The following SharePoint Server 2010 features are common in document libraries and add indexed columns:

  • Hold & Record Status – added and indexed when the In Place Records Management site collection feature or the Hold & eDiscovery feature is enabled and an item in the list is declared a record or placed on hold.

  • Expiration Date and Content Type – added and indexed when retention is enabled in an Information Management Policy on a content type that is added to the list, or when location-based retention is enabled on the list.

The metadata navigation feature automatically creates appropriate indexes for the hierarchy and key filter columns that are selected on the metadata navigation settings page. Single indexes are created for all of the hierarchy tree fields and for some of the more selective key filter types so that indexed results and partial results can be returned when they are used alone. Compound indexes are created for all supported combinations of hierarchies and key filters together to maximize selectivity when both a tree filter and key filter value are being used.

For a list where you have many columns that you might want to filter on, you might need to manage indexes manually to avoid reaching the index limit. If particular combinations of navigation hierarchies and key filters will never be used, consider not creating compound indexes to decrease the number of indexes. When you create these indexes, it is important to choose single indexing on columns that are selective and can be used alone in the list view, either as the only filter or as the first filter that is applied before choosing another. Compound indexes should be used when two filters will usually be used together in metadata navigation or custom queries and when one index is not very selective on its own. Create indexes for columns that are used for filtering with list views, Web Parts, and other data access methods.

There might be cases in which multiple indexes are not useful. Consider a situation where you are filtering on two columns, Division and Content Type. Because this is an AND operation, only the intersection where the filters for Division and Content Type match are returned. This means that when the query is performed, all the items that match the Division filter are returned first and then those items are filtered by Content Type. If there are only 10 items matching a particular division, you will have a sufficiently small set of data that an index on Content Type will not matter. If there are thousands of items that match the value for Division, then a compound index should be used.
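The following sketch models this behavior with invented data (the Division and Content Type values are the ones from the example above) and the default list view threshold of 5,000. It is not SharePoint code; it only illustrates why the first, indexed filter must be selective enough on its own unless a compound index covers both columns.

```python
# Default list view threshold in SharePoint Server 2010.
LIST_VIEW_THRESHOLD = 5000

def query(items, first_filter, second_filter):
    """Apply an indexed filter, check the threshold, then refine.

    The first filter determines how many rows must be considered;
    the second filter only runs against that intermediate set.
    """
    candidates = [i for i in items if first_filter(i)]
    if len(candidates) > LIST_VIEW_THRESHOLD:
        # Without a compound index, the query is throttled even if the
        # final AND result would have been small.
        raise RuntimeError("query throttled: first filter not selective")
    return [i for i in candidates if second_filter(i)]

# Invented data: 10 items in one division, 6,000 in another.
items = ([{"division": "Skis", "ctype": "Spec"}] * 10
         + [{"division": "Boards", "ctype": "Spec"}] * 6000)

# Division is selective (10 items), so filtering by it first succeeds.
result = query(items, lambda i: i["division"] == "Skis",
               lambda i: i["ctype"] == "Spec")
```

Running the same query with the "Boards" division as the first filter would be throttled, because 6,000 candidate rows exceed the threshold before the second filter is applied; that is the case where a compound index is needed.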

A compound index is an index on two columns, which allows for more efficient queries when an AND operation filters on both columns. Metadata navigation is the only out-of-box SharePoint Server feature that uses compound indexes. When the metadata navigation feature is enabled, retry and fallback logic are used even if metadata navigation controls are not configured for the list. Compound indexes are not used by views unless the query is a metadata navigation query.

Indexes are required to perform queries on lists that have more items than the list view threshold in a single container, and they can provide a significant performance improvement. However, because each index must be maintained, indexes can have an adverse effect on other operations: when items are created, updated, or deleted, any index the item participates in must also be updated. Testing was conducted with upload, update, and delete operations on a list that has 10,000 items, and the results showed that the performance difference between zero and 20 indexes was less than 10 percent.

By default, the metadata navigation feature automatically creates single and compound indexes for most column types that support indexes; you can disable this option on the metadata navigation settings page. Single indexes are created for each supported column type, and compound indexes are created for each supported combination of navigation hierarchies and key filters. Compound indexes are not created automatically for combinations of two key filters, although you can create them manually. Single indexes are also not automatically created for key filter types that have fewer values and might not be selective alone, such as Yes/No, single value choice, and content type columns; indexes on these columns are supported and can be created manually.

The following list shows which indexes metadata navigation creates automatically for each column type.

Navigation hierarchies:

  • Single value managed metadata – Yes

  • Multiple value managed metadata – No (system indexed as multiple value)

  • Content Type ID – Yes

  • Single value choice – Yes

  • Folder – No (system indexed by folder)

Key filters:

  • Single value managed metadata – Yes

  • Multiple value managed metadata – No (system indexed as multiple value)

  • Content Type ID – No (can be manually created)

  • Single value choice – No (can be manually created)

  • Multiple value choice – No (not supported as indexed)

  • Number – Yes

  • Date – Yes

  • Yes/No – No (can be manually created)

  • Currency – Yes

  • User (single value) – Yes

  • User (multiple value) – No (system indexed as multiple value)

  • All tags – No (system indexed as multiple value; a special filter over all managed metadata values in the item)

With the metadata navigation feature, users can select a navigation hierarchy and a key filter together, and the feature automatically creates compound indexes for all supported combinations of navigation hierarchies and key filters. Compound indexes are created only with the Single value managed metadata and Content Type ID navigation hierarchies; no compound indexes are created for the Multiple value managed metadata, Single value choice, or Folder hierarchies. The supported combinations are as follows:

  • Single value managed metadata hierarchy – compound indexes with the Single value managed metadata, Content Type ID, Number, Date, User (single value), Yes/No, and Currency key filters

  • Content Type ID hierarchy – compound indexes with the Single value managed metadata, Number, Date, User (single value), Yes/No, and Currency key filters

  • No compound indexes are created for the Multiple value managed metadata, Single value choice, Multiple value choice, All tags, and User (multiple value) key filters

Metadata selectivity

Metadata selectivity increases in importance as the size of the list increases. The following recommendations apply to lists of any size, but they matter most for large lists.

Selectivity is the number of items that must be considered to return results for a query. There are two aspects to this: actual selectivity (the total number of results that match a query's search condition) and throttling selectivity (the number of rows that must be considered after the conditions on indexed columns are applied). Actual selectivity is the primary consideration for user experience; throttling selectivity is the primary consideration for the effect on the instance of SQL Server.

To effectively use metadata navigation and other list filtering mechanisms, the metadata that is used for filtering must be selective. By default, list views display 30 items so that users can scan results quickly to find what they are looking for. If queries return more than 30 results, users must page through them. If you are using a Content Query Web Part, 10 to 15 is a common number of results; additional results beyond the configured number are not displayed. When a query returns hundreds of results, it becomes difficult to find what you are looking for. Selectivity is also important to help prevent operations that exceed the list view threshold, which, in the case of metadata navigation, causes fallback so that not all results are returned.
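The two selectivity measures can be illustrated with a small model. The column names, data, and the distinction between indexed and non-indexed conditions are invented for the example; only conditions on indexed columns reduce throttling selectivity.

```python
def selectivity(items, conditions, indexed_columns):
    """Return (actual, throttling) selectivity for a query.

    conditions: list of (column, value) equality filters.
    indexed_columns: the subset of columns that have indexes.
    """
    indexed = [c for c in conditions if c[0] in indexed_columns]
    rows = items
    for column, value in indexed:
        rows = [i for i in rows if i.get(column) == value]
    throttling = len(rows)  # rows SQL Server must consider
    for column, value in conditions:
        rows = [i for i in rows if i.get(column) == value]
    actual = len(rows)      # results the user finally sees
    return actual, throttling

# Invented data: 1,000 items, 40 of which have a Tester value.
items = [{"ctype": "Spec", "tester": "Ann"}] * 40 + [{"ctype": "Spec"}] * 960
actual, throttling = selectivity(
    items,
    conditions=[("ctype", "Spec"), ("tester", "Ann")],
    indexed_columns={"ctype"})
```

Here the query returns only 40 results (good actual selectivity), but because only the unselective ctype column is indexed, 1,000 rows must be considered (poor throttling selectivity), which is the number the list view threshold is checked against.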

Content organizer

The content organizer can be the central component for organizing content in a document repository. Repositories that use the content organizer have a submission experience in which users upload a document when it is in a final state. Examples of scenarios in which the content organizer can be used include the following:

  • Automatically routing documents based on metadata between and within sites.

  • Routing documents and creating new folders automatically, such as folders based on day, month, and year.

  • Automatically balancing the number of items in folders.

Data access and retrieval

The main purpose of most large lists is to store content so that it can be retrieved. It is important to plan how users will retrieve content, because this has the largest effect on the performance of a large list and on the success of the implementation. Several features, including search, metadata navigation, Content Query Web Parts, and views, can be used to help users retrieve content. Custom solutions, such as custom Web Parts that query the data, are also commonly used. Plan ahead for how the Web site will be organized. You can use a central landing page, such as the home page of the Document Center site, to roll up content and provide entry points into the large list. You can also use publishing pages to create a Web site where each page covers a topic, and Web Parts pull related documents or items from the large list. Considerations for data access and retrieval include the following:

  • Any combination of search, Content Query Web Parts, metadata navigation, list views, and custom Web Parts can be used.

  • Plan ahead for how content will be retrieved and which columns will be used to filter and sort.

  • Plan the basic page model. Consider whether all the work is performed in the document library, or whether there is a landing page, or a multi-page model that links to related content.

There are three main SharePoint Server 2010 features that can be used to query and filter list data with simple configuration: list view and metadata navigation, Content Query Web Part, and search. There are additional options for using custom code to query a list; those options are not described in this article.

  • Views allow you to configure columns that are displayed. There are a variety of display methods for list data. Views can be configured to filter and sort results based on columns.

    • Metadata navigation is a filtering control for SharePoint Server 2010 list views. When metadata navigation is configured, you can select metadata hierarchies and key filters to dynamically filter results that are displayed in a list view.

  • The Content Query Web Part displays data from SharePoint Server 2010 lists. Queries can be configured to return results from one or multiple lists. By default, the Content Query Web Part caches results, although it can be configured not to.

  • Search boxes or Search Results Web Parts can be used to return search results. You can narrow these results to a particular list and perform a guided search using the search metadata refinement controls.

The following list describes each query method and how it is used.

  • List view and metadata navigation – List views always access the SQL Server back-end database, which results in the most performance-impacting queries and causes higher SQL Server load. Use list views to provide more options for interacting with documents (check-in, check-out, editing properties) and real-time access to data.

  • Content Query Web Part – Content Query Web Parts use the portal site map provider to cache queries, and they render the least amount of HTML, which results in the fastest query and render times. Use Content Query Web Parts for dynamic navigation and to perform multiple queries on a single page.

  • Content Query Web Part (non-cached) – To provide real-time access to data, the Content Query Web Part can query the database directly. Use this configuration when the query cannot be cached, when real-time updates are required, or for pages that are accessed less than one time per hour, where the cache would expire before it is reused; the initial load of a cached Content Query Web Part carries additional overhead.

  • Search – Use search queries to offload reads to a server infrastructure that is easier to scale and that is optimized for read performance. Search queries can be configured to use static queries, or you can allow users to specify search queries.

The following are the pros and cons of using Content Query Web Parts.

Pros:

  • Great as a navigation component, for example, to link to related documents or pages.

  • Simple configuration to display different columns.

  • Multiple Content Query Web Parts can easily be used on one page.

  • Fastest rendering time compared to search and list views.

  • Cached by default, which reduces SQL Server load.

Cons:

  • Only a limited number of properties can be displayed.

  • Links go directly only to items such as the document, page, or list item itself.

  • You cannot perform list actions.

The Content Query Web Part is used to retrieve content from lists, including pages, documents, and list items. By default, Content Query Web Parts are cached, which provides better performance and less effect on SQL Server resources. The default cache lifetime is 30 minutes, so data remains fairly current. However, even a cached Content Query Web Part uses more SQL Server resources than search queries do. Because Content Query Web Parts render the least amount of HTML, pages render faster, and multiple Content Query Web Parts can be configured on a single page. Cached Content Query Web Parts provide quick data access as the list size increases; non-cached Content Query Web Parts have latency that is almost the same as a similar list view query.

Content Query Web Parts should be used as a navigational component and to provide content rollups on pages. For example, you can use pages to provide overviews of content that is located in a document library and then use Content Query Web Parts to return related pages and documents. Some other examples include items modified by the current user, newest items, and highest rated items. The Content Query Web Part can be used in high-read scenarios where most users do not need to perform list actions such as check-in, check-out, or manage versions. The following figure is a Content Query Web Part that displays highest rated documents.

Screenshot with highest rated documents

The Content Query Web Part can be used to access content without entering a list view. Users might have a small amount of content that they frequently access or items that they want to track. On the Document Center site template, three Content Query Web Parts are used by default: one that returns items modified by the logged in user, another for highest rated documents, and a third for most recent documents. Combined, all three of these Content Query Web Parts provide a landing page that provides content that a user might frequently access or be most interested in. This supports discovering new content and quickly accessing documents that are used frequently. Another example is creating a favorites list so that users can mark content to track, and then use a Content Query Web Part to return the list of favorites so that users can quickly access the content they frequently use without accessing the list itself.

When you use a Content Query Web Part with a large list, there are some important things to consider so that it correctly returns results and is not blocked by the list view threshold. Items must be filtered to an amount lower than the list view threshold by using an indexed column. We do not recommend that you use cross-list queries when one of the lists is a large list. If the total number of items considered in the cross-list query is greater than the list view threshold for auditors and administrators (by default, 20,000), the operation is blocked.

Summary of Content Query Web Parts recommendations

  • Use Content Query Web Parts to return items that users might frequently access, be interested in, or that might help users discover content.

  • When you are using a Content Query Web Part against a large list, you should filter items so the query does not exceed the list view threshold.

  • You should only use columns with indexes for filtering.

  • Do not use the Content Query Web Part to query multiple lists if the total amount of items considered is greater than the list view threshold for auditors and administrators (by default 20,000).

  • Use caching for faster load times and less SQL Server load.

Search Web Parts

The following table shows the pros and cons of Search Web Parts.

 

Pros

  • Offload queries from computers that are running SQL Server to search servers that are easier to scale.

  • Search query and index servers are more scalable and perform better than directly querying SQL Server.

  • Results are based on full-text search of documents rather than only on metadata.

  • Text summaries provide more information about results than Content Query Web Parts provide.

Cons

  • Results are the least current and are only as up to date as the most recent crawl.

  • Results do not display column values.

  • You cannot perform list actions.

Search queries scale better than directly accessing SQL Server resources because search is optimized for high-read scenarios, and it is easier to scale out to multiple search index and query servers than to scale SQL Server instances. You can use preconfigured Search Web Parts, search boxes, or a combination of the two to help users retrieve content. Search queries offload work from SQL Server to the search indexes, reducing SQL Server load. Search queries are also less affected by list size than Content Query Web Parts or list view queries are.

You can use search in any scenario to display results from preconfigured or user-specified queries. Search provides the best performance at high scale points. Search should not be used if list actions must be performed on items or if data must be displayed in real time because results are only as current as the most recent crawl. The following table shows the three Search Web Parts that can be used.

 

  • Core Results Web Part: Complete results with paging; the most full-featured Web Part. It can take system-specified or user-specified queries.

  • Federated Results Web Part: A small set of results that has an optional link to the full results.

  • Search Box Web Part: A Web Part that takes user input for a query.

List views

The following table shows the pros and cons of using list views.

 

Pros

  • List actions such as check-in, check-out, and editing properties can be used to interact with documents.

  • Views are easy to customize and can display different columns.

  • Highly interactive experience, with dynamic filtering and sorting of results in real time.

Cons

  • The most negative effect on performance for latency and throughput.

  • The slowest rendering time.

  • The best user experience is with only one List View Web Part per page.

List views and metadata navigation can support content retrieval in large document libraries that use folders, indexes, or both. Querying with a list view is performed in real time and queries the SQL Server database directly. This provides the most recent results. However, it also has the largest effect on performance: overall throughput decreases and the latency of operations increases as lists grow large. List views also render the most content for a page to load. Therefore, page rendering time is often greater with a list view.

You should use metadata navigation and list views when you must be able to perform list view actions on items. List views can be the primary method of working with a list in low-read scenarios. In scenarios that have many read operations, you might want to consider other query methods as the primary method for accessing list data.

Summary of view configuration recommendations

  • Carefully select the columns that are used in views. Having more columns means there is more data to render, which increases page load time. There is a tradeoff between page load time and optimal user experience.

  • Minimize the number of lookup columns (such as managed metadata and people-and-groups columns) in views, because each lookup causes a join with another database table and increases database load.

  • Do not use totals for columns.

  • If views are not filtered by using indexed columns, select the option to show items in folders, and ensure that individual folders do not have more items than the list view threshold.

  • Views should be filtered on indexed columns to reduce the number of items that are returned so the number is less than the list view threshold (especially if there are no subfolders or if folders contain more items than the list view threshold).

  • Enable the metadata navigation feature to return the most recent results for queries that would otherwise be prevented by the list view threshold. By default, this is enabled on almost all sites.

  • If you are using filtered views in combination with metadata navigation, consider using per-location views to create unfiltered views for metadata navigation pivots so all results are returned.

Number of columns and lookup columns

Views are the most common method that is used to access list data. Views should be carefully selected to optimize how users find content and to meet performance requirements. For a large list, pay special attention to how views are configured. We recommend only standard views and custom views. Datasheet, Gantt, and Calendar views are not recommended for lists that exceed the list view threshold, because these views can be blocked by it. Views should have as few columns as possible, and you should be especially careful with the number of lookup columns (managed metadata, people-and-groups, and lookup types) because lookups perform joins on other tables, which affects performance.

Column filtering and totals

The new list view threshold in SharePoint Server 2010 represents a major change to how views must be used with large lists. Users get errors if a view attempts to return more results than the list view threshold. Using totals on a large list will always be blocked by the list view threshold; therefore, do not use them. What matters is the number of items that must be scanned, not necessarily the number of rows that are returned. If a view has a filter where the column Color = “Red” and Color is not an indexed column, the query might be blocked. Even though only 100 items might match this query, if there are 10,000 items in the list, the query must scan all 10,000 items. In this case, users get errors when they try to access the view. To resolve this problem, you can use folders, filters and indexing, and metadata navigation.
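The distinction between items scanned and items returned can be illustrated with a simple model. This sketch is not SharePoint code; the `items_scanned` function and the sample data are hypothetical, and the counts only stand in for the rows the database must touch.

```python
# Sketch: why an unindexed filter is blocked by the list view threshold.
# The threshold value matches the SharePoint Server 2010 default for
# regular users; everything else here is an illustrative model.

LIST_VIEW_THRESHOLD = 5000

def items_scanned(list_items, column, value, indexed_columns):
    """Model how many rows the database must touch to evaluate a filter."""
    if column in indexed_columns:
        # With an index, only the matching rows are touched.
        return sum(1 for item in list_items if item.get(column) == value)
    # Without an index, every row in the list is scanned.
    return len(list_items)

# A list of 10,000 items, only 100 of which are red.
items = [{"Color": "Red" if i < 100 else "Blue"} for i in range(10000)]

print(items_scanned(items, "Color", "Red", indexed_columns=set()))      # 10000
print(items_scanned(items, "Color", "Red", indexed_columns={"Color"}))  # 100
```

In the unindexed case the scan count (10,000) exceeds the threshold and the query is blocked, even though only 100 rows would be returned; with the index, only the 100 matching rows are touched and the query succeeds.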

Folders

If a list is organized so that no folder contains more items than the list view threshold, you can select the option to show items in folders. Showing all items outside of folders should be avoided unless you have mechanisms in place to filter the results to amounts less than the list view threshold.

Indexing

In the earlier example, the query on the Color column was blocked because the column was not indexed. To resolve this problem, the Color column can be indexed, and queries will then work if the values are distinct enough. If there are only 100 red items, this works. However, if more items match than the list view threshold, the query is still blocked even with indexing. By default, the ID field, the folder structure, and multiple-value lookups are indexed by the system. Any new columns that are created and used for filtering must have indexes that are manually created.
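The effect of a column index can be shown with a simple in-memory model. This is not how SharePoint stores its indexes (it uses a separate database table); the sketch only shows why an indexed filter can locate matching items without scanning every row.

```python
# Sketch: a column index maps each value to the IDs of the items that
# hold it, so a filter becomes a direct lookup instead of a full scan.
from collections import defaultdict

def build_index(items, column):
    """Build an in-memory index for one column (illustrative only)."""
    index = defaultdict(list)
    for item_id, item in enumerate(items):
        index[item[column]].append(item_id)
    return index

# 10,000 items; every 100th item is red, so 100 red items in total.
items = [{"Color": "Red" if i % 100 == 0 else "Blue"} for i in range(10000)]

color_index = build_index(items, "Color")
print(len(color_index["Red"]))  # 100 matching items, found without a scan
```

Note that the lookup itself can still be blocked: if `color_index["Red"]` held more IDs than the list view threshold, the query would exceed the threshold even though the column is indexed.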

Recently changed items

The Recently Changed Items view shows the most recently changed items. It can be used as the default view when users frequently access recently modified content from a variety of sources. This view is easy to configure because it uses system metadata that is always set for every item. For a large list, you must either set the item limit to an amount under the list view threshold or filter the results to an amount less than the list view threshold. To create this view, index the Modified field and sort in descending order. Specify a filter on the Modified column that uses the is greater than or equal to operator with the formula [Today-x], where x is the number of days' worth of data to show. Choose x so that the formula returns fewer items than the list view threshold.
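The same filter can be expressed in CAML, the query language SharePoint uses for list views. The `recently_changed_filter` helper below is hypothetical; the element names (`Where`, `Geq`, `FieldRef`, `Today` with `OffsetDays`) follow standard CAML, but treat the exact shape as a sketch rather than a drop-in query.

```python
# Sketch: build the CAML <Where> clause equivalent to the
# "Modified >= [Today-x]" view filter described above.
def recently_changed_filter(days):
    """Return a CAML filter for items modified in the last `days` days."""
    return (
        "<Where>"
        "<Geq>"
        "<FieldRef Name='Modified'/>"
        f"<Value Type='DateTime'><Today OffsetDays='-{days}'/></Value>"
        "</Geq>"
        "</Where>"
    )

# For example, a seven-day window:
print(recently_changed_filter(7))
```

Because Modified is indexed in this scenario, the query seeks on the date range; the window (here, seven days) should be small enough that it matches fewer items than the list view threshold.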

My Items

The My Items view can be used in repositories where users frequently access their own documents. This view is easy to configure because it uses system metadata that is always set for every item. In this view, you filter by both the Modified By and Created By columns. To create this view, in the filters, select the Modified By column, set the value to [Me], and then set a second filter with OR on the Created By column, with the value also set to [Me]. The Created By column should be used in addition to the Modified By column when multiple users edit the same documents; Modified By is not a multiple-user column, so filtering on it alone will not necessarily show all the documents a user has ever modified. Because the operation is an OR, both columns should be indexed.
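In CAML terms, the My Items filter is an `Or` over the two user columns. The helper below is illustrative: `Editor` and `Author` are the internal names of the Modified By and Created By fields, and `<UserID/>` is the CAML token for the current user, but treat the exact markup as a sketch.

```python
# Sketch: CAML <Where> clause for the My Items view, i.e.
# Modified By = [Me] OR Created By = [Me].
def my_items_filter():
    """Return a CAML filter for items the current user modified or created."""
    return (
        "<Where>"
        "<Or>"
        "<Eq><FieldRef Name='Editor'/>"          # Modified By
        "<Value Type='Integer'><UserID/></Value></Eq>"
        "<Eq><FieldRef Name='Author'/>"          # Created By
        "<Value Type='Integer'><UserID/></Value></Eq>"
        "</Or>"
        "</Where>"
    )

print(my_items_filter())
```

Because an OR must evaluate both branches, each branch needs its own index to keep the scan count below the list view threshold, which is why both columns are indexed in this view.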

SharePoint Server 2010 provides new and improved features that enhance the user experience and the performance of working with large lists. Throttles and limits protect the performance of the farm for other users and prevent operations from consuming a disproportionately large amount of SQL Server resources. Metadata enhancements and metadata navigation provide an enhanced experience for retrieving list data, and Content Query Web Parts, search, and list views can all be configured to work with large lists. Careful planning is still required to create the correct solution for your requirements. However, large lists can be built quickly by configuring out-of-the-box features that have been designed to perform well.
