Plan enterprise content storage

Applies To: Office SharePoint Server 2007

This Office product will reach end of support on October 10, 2017. To stay supported, you will need to upgrade. For more information, see Resources to help you upgrade your Office 2007 servers and clients.

 

Topic Last Modified: 2016-11-14

This article contains information to help solution planners and designers properly plan and configure a large-scale enterprise content management solution based on Microsoft Office SharePoint Server 2007 so that it performs well while providing the features needed by site users. Office SharePoint Server 2007 supports high-capacity document storage; a document library can contain up to 5 million documents. However, depending on how the content is used, the performance of sites containing a very large number of documents can degrade. The prescriptive guidance provided in this article can help you design large-scale content management solutions that scale to the requirements of your enterprise while providing the users of your solution with a well-performing environment in which to create and use documents.

Decisions you make about the capacities of site collections, sites, and libraries in Office SharePoint Server 2007 should take into account not only the physical storage constraints of your Office SharePoint Server 2007 environment but also the content usage and viewing patterns of your users. For example, if users view or query a set of documents in a document library containing thousands of documents, performance can degrade if the site is not configured properly. Or, if a service-level agreement requires that content be backed up twice a day, the backups might not finish within the required time if the set of content is too large. This article discusses techniques you can use to provide necessary content management functionality while maintaining acceptable performance.

In this article, four levels of content storage are discussed:

  • Site collection

  • Site

  • Library

  • Folder

For each level of storage, this article describes the benefits of organizing content at that level, discusses how performance can decrease as the number of stored documents increases, and provides recommendations for improving performance when high volumes of content are present.

In this article:

  • Typical large-scale content management scenarios

  • Site collections: content storage benefits and limitations

  • Sites: content storage benefits and limitations

  • Libraries: content storage benefits and limitations

  • Folders: content storage benefits and considerations

  • Summary of recommendations

Note

Although the examples in this article are primarily relevant for solutions based on Office SharePoint Server 2007, the prescriptive guidance information provided here applies to both Office SharePoint Server 2007 and Windows SharePoint Services 3.0.

Typical large-scale content management scenarios

Typically, large-scale content management scenarios are variants of one of the following scenarios:

  • Large-scale authoring environment

  • Large-scale content archive

  • Extremely large-scale content archive

The scenario descriptions provided here are intended to clarify what we mean by large-scale solutions and to provide examples that you can compare with your own content management goals.

Large-scale authoring environment

In a large-scale authoring environment, a site can contain a library in which users actively edit 50,000 or more documents across 500 or more folders. Versioning is enabled, and typically 10 or more previous versions of each document exist. Documents are checked in and out frequently and workflows are used to control their life cycles. Twenty or more content types might be in use. A typical database for this type of site contains approximately 150 gigabytes (GB) of data. (Note that each version of a document is stored separately in the database.) Typically, in a large-scale authoring environment, 80% of site users are authors who have access to major and minor versions of documents, while 20% of site users have read-only permissions and can only view major versions of the content.
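As a rough back-of-the-envelope check of these figures, the following sketch counts the stored versions in such a library and works out the average size per stored version that a 150 GB database would imply. The function names are illustrative, and the assumption that every document has exactly 10 previous versions plus its current version is a simplification. A real content database also holds metadata and other data, so treat the result as an order-of-magnitude estimate only.

```python
def stored_versions(documents: int, previous_versions: int) -> int:
    """Each version is stored separately, so count the current version plus all previous ones."""
    return documents * (previous_versions + 1)


def average_version_size_kb(database_gb: float, versions: int) -> float:
    """Average size per stored version in KB, treating 1 GB as 1024 * 1024 KB."""
    return database_gb * 1024 * 1024 / versions


# Figures taken from the scenario: 50,000 documents, 10 previous versions, 150 GB.
versions = stored_versions(documents=50_000, previous_versions=10)
average_kb = average_version_size_kb(150, versions)

print(versions)           # 550000 stored versions
print(round(average_kb))  # roughly 286 KB per stored version
```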

A large-scale authoring environment site can be based on the Office SharePoint Server 2007 Document Center site template, which includes a single, large document library and which is optimized for large-scale authoring. See The Document Center site.

Large-scale content archive

A large-scale archive is a document repository in which users are either viewing documents or uploading new documents. Little or no authoring takes place in the site. There are two primary large-scale archive scenarios: knowledge base and records management.

In a knowledge base site, there is only a single version of most documents, so the site can easily scale to hold 1,000,000 or more documents. The content is typically stored in a single database as large as 400 GB. In a typical scenario, such as an enterprise's technical support center, 10,000 users might access the content, primarily to read it. A subset of users (3,000–4,000) uploads new content to the site. A knowledge base site can be based on the Document Center site template.

Another type of large-scale archive is a records center, based on the Records Center site template. This site template contains features that you can use to manage the retention and disposition of records (documents that serve as evidence of activities or transactions performed by the organization and that must be retained for some time period). Similar to a knowledge base site, a records center contains a single version of each document and could typically hold 1,000,000 or more documents. Many more users submit content to a records center than view or read it.

Extremely large-scale content archive

If the user interface of a site is customized to remove resource-intensive user interface operations such as complex viewing queries, an extremely large-scale content archive can be used as a reference library or content repository. An extremely large-scale archive might contain up to 10,000,000 documents distributed across 5,000 or more folders. The database can grow larger than three terabytes (TB).

In an extremely large-scale archive, users (50,000 or more) primarily browse content by searching. Content is submitted by using a custom submission form.

Site collections: content storage benefits and limitations

A site collection is a set of Web sites that has the same owner and shares administration settings. Each site collection contains a top-level Web site and can contain one or more subsites. A site collection usually has a shared navigation structure.

Benefits of storing content in the same site collection

The sites in a site collection are usually interrelated by purpose. To maximize your solution's usability, store all related data and content within a single site collection. Benefits of doing this include:

  • Content types and columns managed in a site collection can be shared across all sites in the site collection. Conversely, there is no automatic mechanism for propagating content types and column definitions across multiple site collections.

  • Information management policies managed in the site collection can be made available to content in all sites in the site collection.

  • Office SharePoint Server 2007 automatically updates links to renamed or moved files within a site collection to reflect their new names or locations. Conversely, links to documents in other site collections are not updated.

  • If the site collection is on a server running Windows SharePoint Services 3.0, searching can only be done over the content in that site collection. If the site collection is on a server running Office SharePoint Server 2007, content can be searched across multiple site collections.

  • Some views in Windows SharePoint Services 3.0 and Office SharePoint Server 2007 list documents from multiple sites within a single site collection (for example, a view enumerating all tasks assigned to a user across a site collection). Also, developers can create cross-site database queries within a site collection, but cross-site queries are not supported across multiple site collections.

  • Content quotas and other quotas can only be managed at the site-collection level.

Limits on storing content in the same site collection

Keep the following limits in mind when planning how to allocate your content across one or more site collections:

  • Creating too many subsites of any site in a site collection might affect performance and usability. Limit the number of subsites of any site to 2,000 at most.

  • All sites in a site collection share the same back-end resources. In particular, all content in a site collection must be stored in the same content database. Because of this, the performance of database operations — such as backing up and restoring content — will depend on the amount of content across the entire site collection, the size of the database, the speed of the servers hosting the database, and other factors. Depending on the amount of content and the configuration of the database, you might need to segment a site collection into multiple site collections to meet service-level agreements for backing up and restoring, throughput, or other requirements. It is beyond the scope of this article to provide prescriptive guidance about managing the size and performance of databases. For more information about capacity planning, see Plan for performance and capacity (Office SharePoint Server).

  • Keep extremely active sites in separate site collections. For example, a knowledge base site on the Internet that allows anonymous browsing could generate a lot of database activity. If other sites use the same database, their performance could suffer. By putting the knowledge base site in a separate site collection with its own database, you ensure that other sites no longer have to compete with it for database resources.
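To illustrate how database size interacts with a backup service-level agreement, the following sketch estimates how long a backup takes and checks whether it fits in the window implied by the number of backups per day. The function names and the 50 MB/s throughput are hypothetical values chosen only for illustration. Measure the actual throughput of your own environment before drawing conclusions.

```python
def backup_hours(database_gb: float, throughput_mb_per_s: float) -> float:
    """Estimated backup duration in hours for a database of the given size."""
    seconds = database_gb * 1024 / throughput_mb_per_s
    return seconds / 3600


def meets_backup_sla(database_gb: float, throughput_mb_per_s: float, backups_per_day: int) -> bool:
    """True if one backup completes within the window available between backups."""
    window_hours = 24 / backups_per_day
    return backup_hours(database_gb, throughput_mb_per_s) <= window_hours


# Hypothetical throughput; database sizes taken from the scenarios in this article.
ASSUMED_THROUGHPUT_MB_PER_S = 50

print(meets_backup_sla(400, ASSUMED_THROUGHPUT_MB_PER_S, backups_per_day=2))       # 400 GB knowledge base
print(meets_backup_sla(3 * 1024, ASSUMED_THROUGHPUT_MB_PER_S, backups_per_day=2))  # 3 TB archive
```

Under this assumed throughput, the 400 GB database fits comfortably in a 12-hour window while the 3 TB database does not, which is the kind of result that would suggest segmenting content across multiple site collections and databases.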

Note that Windows SharePoint Services 3.0 and Office SharePoint Server 2007 have a number of features that reduce the need to have your IT department restore content. The Recycle Bin and the Site Collection Recycle Bin provide a double safety mechanism for restoring inadvertently deleted items. Document versioning also provides a safety net for lost documents because their previous versions are available. To further ensure the availability of previous versions, an administrator can remove the Delete Versions permission from authors' permissions; this helps to guarantee that previous versions of content are available without having to restore them from the database.

Sites: content storage benefits and limitations

A Web site is the primary means of organizing related content in Office SharePoint Server 2007 and Windows SharePoint Services 3.0.

Benefits of storing content in the same site

  • It is easier to create pages that display views of multiple libraries and lists when they are in the same site.

  • The site navigation user interface is optimized to make it easy to find and navigate to libraries within the same site.

  • You define and assign permissions to groups at the site level.

  • You can define a set of content types and site columns for use in a site.

The Document Center site

Office SharePoint Server 2007 includes a Document Center site template. Use this template to create a site that is optimized for creating and using large numbers of documents.

To enable document management best practices, sites based on the Document Center site template have recommended document management features enabled by default, including:

  • Navigation features to help authors find their content.

  • Major/minor versioning enabled.

  • Required check-in and check-out of documents.

  • Multiple content types enabled.

  • A Relevant Documents Web Part that generates a personalized view of documents checked out by, created by, or last modified by the current user. You can configure the Web Part to use more than one criterion.

  • An Upcoming Tasks Web Part that generates a personalized view of document-related tasks assigned to the current user.

Column indexing is a technique that helps ensure that a view or query returns a list of items in the recommended range of 2,000 or fewer items. Use the following table to determine the right columns to index for each query that the Relevant Documents Web Part supports:

If you configure the Relevant Documents Web Part to…    Then, in the Shared Documents library, index the following column:
Include documents last modified by me                    Modified By
Include documents created by me                          Created By
Include documents checked out by me                      Checked Out By

Along with indexing columns to improve the performance of the Relevant Documents Web Part, make sure that the Show items from the entire site collection checkbox is not selected when configuring the Web Part in a large-scale document management environment.

Limits on storing content in the same site

  • More than 2,000 libraries and lists in a single site will degrade performance.

  • Usability tests show that having more than 50 lists and libraries in the site's navigation structure makes it more difficult to navigate the content by using the user interface.

Libraries: content storage benefits and limitations

A document library is a location in a site containing files of one or more content types. Document libraries are designed to manage and store related documents and to let users create new documents of the appropriate types.

Benefits of storing content in the same library

  • It is easier for users to add new documents or find existing documents within a single library.

  • Many document management settings — such as permissions, content versioning, and approval — are applied at the library level.

  • Views created by using the user interface are bound to a particular library.

  • Information management policies, such as content auditing and retention settings, can be applied to a library.

Limits on storing content in the same library

  • The maximum recommended size of a library is 10,000,000 documents.

  • To apply unique document management settings to content, such as required checkouts or versioning, the content must be stored in a separate library.

  • If multiple content types are used in a library and each content type has one or more columns of metadata that only apply to that content type, views can become confusing. To alleviate this, you can associate each content type with a separate library.

  • The performance of views of content degrades when the number of items viewed exceeds 2,000 items. Remedies for this limitation are to organize the content in the library into folders each containing 2,000 or fewer items, or to create views that take advantage of indexed columns to return sets of 2,000 or fewer items (see below for a discussion of using indexed columns in views).

Note

All Web page content in a site is stored in a single Pages library in that site, which contains all of that site's Web content pages. The recommended limitation of 2,000 or fewer items per view or query applies to Pages libraries in addition to document libraries.

Using indexed columns to improve view performance

As mentioned above, the performance of views degrades if the number of items displayed exceeds 2,000 items. A useful technique for limiting the number of items to display in a view is to index a column used in the view, and then to filter the view based on that column so that 2,000 or fewer items are displayed. (For an indexed column, Office SharePoint Server 2007 maintains a record of the column's values to make view-related queries more efficient.)

For example, if it is unlikely that more than 2,000 items in a library will be modified in any seven-day period, you could index the Modified column in a library and then filter a view so that only items changed in the last seven days are displayed. (To do this, specify that the Modified column is greater than [Today]-7.) As another example, if it is likely that each author will create fewer than 2,000 items, you could index the Created By column and then filter a view so that authors only see the documents they created. (To do this, specify that the Created By column is equal to [Me].)
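The logic of these two filters can be sketched in a few lines of Python. This is only a model of the filtering idea, not SharePoint code: the in-memory list of dictionaries, the function names, and the sample data are all hypothetical. It keeps items modified within the last seven days, or items created by one author, and checks the result against the recommended 2,000-item limit.

```python
from datetime import datetime, timedelta

VIEW_ITEM_LIMIT = 2000  # recommended maximum number of items per view


def modified_within(items, days, now):
    """Keep items whose Modified timestamp falls within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [item for item in items if item["Modified"] > cutoff]


def created_by(items, author):
    """Keep items created by the given author."""
    return [item for item in items if item["Created By"] == author]


def within_view_limit(items):
    """True if the result set stays within the recommended view size."""
    return len(items) <= VIEW_ITEM_LIMIT


# Hypothetical library: 10,000 documents, one modified per hour going back in time,
# created by 20 different authors.
now = datetime(2007, 6, 1, 12, 0, 0)
library = [
    {"Name": f"doc{i}.docx", "Modified": now - timedelta(hours=i), "Created By": f"author{i % 20}"}
    for i in range(10_000)
]

recent = modified_within(library, days=7, now=now)
mine = created_by(library, "author3")

print(len(library), within_view_limit(library))  # unfiltered view exceeds the limit
print(len(recent), within_view_limit(recent))
print(len(mine), within_view_limit(mine))
```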

The following column types can be indexed and used to filter views:

  • Single line of text

  • Multiple lines of text

  • Number

  • Currency

  • Choice

  • Date and Time

  • Lookup

  • Yes/No

  • Person or Group

  • Calculated

Here are other considerations in creating views filtered by indexed columns:

  • Only one indexed column can be used in a view.

  • Do not create filters using "Or" to provide multiple criteria when using an indexed column to filter a view.

  • Using the Item Limit feature to modify a view does not improve the view's performance.
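The first two considerations can be expressed as a simple planning check. The sketch below is a hypothetical model, not a SharePoint API: a view filter is described only by the list of columns it filters on and whether it combines criteria with "Or", and the function returns a list of warnings.

```python
def check_view_filter(filter_columns, indexed_columns, uses_or=False):
    """Return warnings for a planned view filter (hypothetical planning model)."""
    warnings = []
    indexed_in_filter = [column for column in filter_columns if column in indexed_columns]
    if not indexed_in_filter:
        warnings.append("filter does not use an indexed column")
    if len(indexed_in_filter) > 1:
        warnings.append("only one indexed column can be used in a view")
    if indexed_in_filter and uses_or:
        warnings.append('avoid "Or" when filtering on an indexed column')
    return warnings


indexed = {"Modified", "Created By"}

print(check_view_filter(["Created By"], indexed))  # prints []
print(check_view_filter(["Created By", "Modified"], indexed, uses_or=True))
```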

Note

If a user tries to create a view that could benefit from using an indexed column, Office SharePoint Server 2007 will display a warning message recommending that approach.

Folders: content storage benefits and considerations

A folder is a named subdivision of the content in a library, similar to a folder in a file system. The primary purpose of folders is to organize content to match the expected functionality of the library. For example, if a library is intended to provide product specifications, the set of folders in the library could be named for each feature area in the product or for each team member who writes product specifications.

Folders can be used to enhance library performance. If you divide content across multiple folders, each containing 2,000 or fewer items, views on the folders will perform well. Note that, to take advantage of this, views available within folders must be configured to show only the items inside the folders (this feature is available in the default Office SharePoint Server 2007 view-creation interface). Note also that, if folders contain 2,000 or fewer items, views in the folders do not have to be filtered using indexed columns.
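A minimal sketch of the folder arithmetic follows. It computes how many folders are needed to keep each one at 2,000 or fewer items and shows one simple way to distribute items. The sequential folder names are purely hypothetical; in practice, folders are usually named for something meaningful to users, such as a feature area or an author.

```python
import math

FOLDER_ITEM_LIMIT = 2000  # recommended maximum number of items per folder


def folders_needed(total_items, limit=FOLDER_ITEM_LIMIT):
    """Minimum number of folders so that no folder exceeds the limit."""
    return math.ceil(total_items / limit)


def assign_to_folders(item_names, limit=FOLDER_ITEM_LIMIT):
    """Distribute items into sequentially named folders, filling each up to the limit."""
    folders = {}
    for index, name in enumerate(item_names):
        folder = f"Folder{index // limit + 1:03d}"  # hypothetical naming scheme
        folders.setdefault(folder, []).append(name)
    return folders


print(folders_needed(50_000))      # large-scale authoring library
print(folders_needed(10_000_000))  # extremely large-scale archive

layout = assign_to_folders([f"doc{i}.docx" for i in range(4_500)])
print({folder: len(items) for folder, items in layout.items()})
```

For the two scenario sizes, this gives 25 and 5,000 folders respectively, the latter being consistent with the "5,000 or more folders" described for the extremely large-scale archive.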

Summary of recommendations

Here is a summary of the recommendations for improving performance at each level of storage when high volumes of content are being stored:

Level              Performance limits

Site collection    2,000 subsites of any site is the recommended limit.
                   The same content database is used for an entire site collection. This may affect performance in operations such as backup and restore.

Site               2,000 libraries and lists is the recommended limit.

Library            10,000,000 documents is the recommended limit.
                   2,000 items per view is the recommended limit.

Folder             2,000 items per folder is the recommended limit.
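For planning purposes, the limits summarized above can be captured as data and compared with a proposed design. The following sketch is only an illustration: the dictionary keys, the function name, and the sample plan are hypothetical, and the numbers are the recommended limits stated in this article rather than hard product constraints.

```python
# Recommended limits from the summary above.
RECOMMENDED_LIMITS = {
    "subsites_per_site": 2_000,
    "lists_and_libraries_per_site": 2_000,
    "documents_per_library": 10_000_000,
    "items_per_view": 2_000,
    "items_per_folder": 2_000,
}


def exceeded_limits(plan):
    """Return the names of recommended limits that the planned design exceeds."""
    return sorted(
        name
        for name, limit in RECOMMENDED_LIMITS.items()
        if plan.get(name, 0) > limit
    )


# Hypothetical design: a single library with large, unfiltered folders.
plan = {
    "subsites_per_site": 40,
    "lists_and_libraries_per_site": 12,
    "documents_per_library": 1_000_000,
    "items_per_view": 3_500,
    "items_per_folder": 3_500,
}

print(exceeded_limits(plan))
```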


See Also

Concepts

Plan for performance and capacity (Office SharePoint Server)
Plan records management