Storing your millions of SharePoint documents externally can boost both SharePoint and SQL Server performance.
SharePoint has clearly staked its claim as one of the premier collaboration platforms. Working together means sharing documents, so among its many other features, SharePoint includes document management. Therefore, you’re probably using SharePoint as a place to store your documents to make it easier to share them with others.
Every time you store another document in SharePoint, the database grows. Each document you store in SharePoint may be anywhere from 500KB to 2MB, sometimes some even larger. When you start talking about hundreds of thousands or even millions of documents stored in SharePoint, it can soon become massive. SharePoint stores all these documents in a SQL Server database. As the database size grows beyond a certain threshold, it can become difficult to manage and degrades the performance of both SharePoint and SQL Server.
Like other relational databases, SQL Server wasn’t necessarily designed to store documents (called binary large objects, or BLOBs, in SharePoint lingo). Instead, it was designed to store structured relational data. SharePoint stores all the documents as BLOBs in the database, and by storing these documents as such, your SQL Server database can get real big real quick (see Figure 1).
Figure 1 A bloated content database causes performance bottlenecks
If the SQL Server database you’re using is 20GB or less, it will still be manageable. In many cases, your SQL Server database may grow into the hundreds of gigabytes or even the terabyte range. At this size, database management operations become more challenging, time consuming and problematic if anything goes wrong. Even simple backup and restore operations take on a whole new meaning.
And, when all your documents are stored in SQL Server, SharePoint performance can suffer. SharePoint provides a simple document storage mechanism within SQL Server that lets you archive older documents to cheaper storage, keeping documents around for a certain period of time for compliance reasons and more.
The only real solution to any storage-related performance problems is to migrate all documents (BLOBs) out of your SQL Server database and into external storage. Besides other benefits, this can immediately reduce the size of your SQL Server database by almost 95 percent. Then, no matter how large and how many your documents you’re storing in SharePoint, your SQL Server database size stays very small. All the documents now reside outside in external storage.
SharePoint provides two mechanisms for externalizing BLOBs. One is external BLOB storage (EBS), which works with Microsoft Office SharePoint Server 2007 and SharePoint 2010. However, SharePoint doesn’t provide any EBS providers out of the box. You have to use a third-party EBS provider.
In addition to EBS, SharePoint 2010 (if used with SQL Server 2008 R2) provides remote BLOB storage (RBS). Unlike EBS, SharePoint 2010 comes with a built-in RBS provider for file streaming. RBS is actually a feature added to SQL Server 2008 R2 that SharePoint 2010 is tapping into. By providing RBS in SQL Server 2008 R2, Microsoft has essentially acknowledged that storing BLOBs in the database is not the ideal approach and keeping them outside is often a better strategy.
If you decide to use a third-party EBS or RBS provider, you can quickly migrate your BLOBs out of SQL Server. Most of these providers are installed on Web front-end (WFE) servers and also provide GUI-based administration features (normally as part of SharePoint Administration GUI).
You plug both EBS and RBS modules into the WFE servers. Once that happens, SharePoint directs all runtime document requests (either for read or write) to the EBS or RBS module. This module is then responsible for reading or writing the BLOB to the external storage. So, any documents created or updated at runtime are automatically stored externally.
If you decide to use a third-party EBS or RBS provider, you’ll probably have multiple storage options (see Figure 2). Before you migrate your BLOBs out of the database, you’ll need to choose an option. Here are several typical options:
Figure 2 RBS- or EBS-based external BLOB storage in SharePoint
If you decide to use the built-in RBS provider with SharePoint, your only storage option is the File System. However, third-party providers will give you all of the previously mentioned options and make your job easier.
If you have an exceptionally large SharePoint content database, it will likely take some time for you to migrate all the BLOBs out of the database and to external storage. Still, you should be able to migrate your BLOBs without having to stop SharePoint.
Using a third-party solution can help you start the migration process while SharePoint is still up and available to all users. You’ll be able to keep track of the migration process while your users are reading and writing documents in SharePoint. Any documents added or updated at runtime while the migration is underway are also migrated.
Because migrating BLOBs out of the database is a lengthy task, any third-party provider you choose should offer you some additional features, such as:
If you have a lot of legacy documents you’ve just recently scanned or haven’t moved into SharePoint, you should consider how you’ll import those. While the normal import feature of SharePoint will copy all your documents into the SQL Server database, this may be undesirable for two reasons: First, keeping so many BLOBs in SQL Server is not a good idea. Second, copying so much data from one location into your SQL Server database can be cumbersome. Your legacy documents are probably in some file system. If you’re using a third-party provider to externalize your documents, you should be able to move them to external storage yourself and start an import process provided by the third-party solution. This import process should simply iterate over the folders where your legacy documents were copied in the external storage. Then it should update the SQL Server database with only the meta data of those documents. This will expedite importing legacy documents.
Now that you’re migrating documents out of your SQL Server database and into external storage, you should specify multiple tiers of storage (see Figure 3). Each tier would have an importance level and will let you keep the most-active and newest documents in the highest-priority tier. You’d store the least-active and oldest documents in the lowest-priority tier.
Having multi-tier storage gives you various benefits. First, you can grow your storage capacity because each tier is likely a separate storage location. Second, you can improve SharePoint performance because the active tiers will each have fewer documents. This makes it faster for SharePoint to fetch documents in a folder.
Figure 3 Multi-tier storage and archiving can be helpful for compliance reasons
Another important aspect of multi-tier storage is that you can specify archiving rules. As a document ages, you can move it to lower priority tiers. Also, because the older documents don’t change as frequently, you can simplify overall document management tasks (including backup and restore).
Automatic archiving is also useful if your company is subject to compliance rules. You may be asked to keep documents for a certain period of time but you don’t want your primary tier to become overloaded with data. Archiving older documents to another tier frees up storage space on your primary tier.
Archiving also lets you eventually delete documents if your rules specify that certain data must be deleted after a given time. You can set up your system to perform all archiving tasks automatically.
In addition to the other benefits of external storage, if you decide to use a third-party solution, you’re likely to also get in-memory caching of BLOBs and list data. Caching BLOBs on WFE servers dramatically improves SharePoint response time and makes it more scalable, because caching reduces expensive trips to the database and even to external storage (see Figure 4).
Figure 4 BLOB and list caching to improve SharePoint performance
You can specify how much memory caching should consume on your WFE servers. This way you can balance the overall memory capacity with other applications. You can also specify expiration rules so BLOBs and lists are cached only as long as you need.
Moving documents out of the SharePoint content database can be an involved process, but they’ll be kept in outside storage, which is an important benefit. You also improve SharePoint performance with caching. It makes sense when you consider all the benefits you get by externalizing your BLOBs with the help of a third-party EBS or RBS provider.
Iqbal Khan is the president and technology evangelist of Alachisoft (alachisoft.com). Alachisoft provides NCachePoint & NCache. NCachePoint is the industry’s leading SharePoint performance and scalability product, and NCache is a popular .NET distributed cache. You can reach him at firstname.lastname@example.org.