Storing BLOBs outside the database can elevate the performance of SharePoint 2010.
SharePoint has become an extremely popular portal platform. It provides a great number of applications for collaborating among different company groups or even between different companies. Process management, document management and other major applications are all part of SharePoint’s array of features.
But SharePoint’s popularity has its downside as well. As more people use it, major performance and scalability bottlenecks can appear because SharePoint makes heavy use of the SQL Server database for everything. This includes structured data as well as documents from Microsoft Word, Excel, PowerPoint and Adobe Acrobat. All of these documents are stored in, and served from, the database.
There are two potential issues associated with SharePoint 2010. One: Databases can grow exceptionally large because of all the binary large object (BLOB) data. Two: Reading and writing BLOBs alongside relational data can slow down SQL Server performance because a relational database is not the ideal place for storing BLOBs.
Relational databases are designed to handle structured relational data. Their architecture is geared toward that. Microsoft has added support for BLOBs, but it’s not the ideal situation. File storage, on the other hand, is designed to store files, which are basically streams of data or BLOBs.
SharePoint 2010 is heavily document-centric with Word, Excel, PDF, and PowerPoint files. That vast number of large documents quickly makes the database larger than is practical. As a result, SharePoint performance can take a hit. Figure 1 shows the performance issues associated with the SQL Server database and BLOBs.
Figure 1 Bloated SQL Server database slows down SharePoint 2010.
This problem gets worse as you add more users and documents to SharePoint. If you have tens of thousands of documents stored in the database, the database becomes overloaded and a lot of documents have to travel back and forth between the SharePoint Web farm and the SQL database.
An abnormally large database contributes significantly to sluggishness. If this SharePoint data were structured relational data, SQL Server could index it intelligently and handle it properly.
The total size of the BLOB data can expand quickly and grow larger than the total size of the document metadata and other structured data stored in the database. It is helpful to move BLOB data out of the SQL Server database and into separate storage. That’s because BLOB data can consume a lot of file space, and it uses server resources that are optimized for database access patterns rather than for streaming files.
Moving BLOBs out of the Database
In Microsoft Office SharePoint Server (MOSS) 2007, Microsoft provided a mechanism called External BLOB Storage, or EBS. The EBS plug-in architecture lets third-party vendors supply an EBS module that intercepts SharePoint database traffic and redirects all the BLOB traffic to a separate BLOB store. EBS works fine and it handles this issue quite effectively.
EBS lets you migrate BLOBs out of the database and keep them in file system storage, a storage-area network (SAN) or network-attached storage (NAS). These storage systems are well-suited for storing BLOBs because the BLOBs are all documents (Excel, Word, PDF and the like) that a typical organization creates and shares across its users.
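The idea of keeping only a reference in the database while the bytes live in file storage can be sketched in a few lines of Python. This is a minimal illustration, not the EBS interface: the `ExternalBlobStore` and `ContentDatabase` classes, the generated BLOB IDs and the temporary directory are all assumptions made for the example.

```python
import os
import tempfile
import uuid


class ExternalBlobStore:
    """Hypothetical file-system BLOB store (not the EBS or RBS API)."""

    def __init__(self, root_dir):
        self.root_dir = root_dir
        os.makedirs(root_dir, exist_ok=True)

    def put(self, data):
        # Each BLOB is written to its own file, named by a generated ID.
        blob_id = uuid.uuid4().hex
        with open(os.path.join(self.root_dir, blob_id), "wb") as f:
            f.write(data)
        return blob_id

    def get(self, blob_id):
        with open(os.path.join(self.root_dir, blob_id), "rb") as f:
            return f.read()


class ContentDatabase:
    """Stand-in for the content database: holds metadata and a BLOB ID only."""

    def __init__(self, blob_store):
        self.blob_store = blob_store
        self.rows = {}  # document name -> {"size": int, "blob_id": str}

    def save_document(self, name, data):
        # The bytes go to external storage; the row keeps just the reference.
        blob_id = self.blob_store.put(data)
        self.rows[name] = {"size": len(data), "blob_id": blob_id}

    def load_document(self, name):
        return self.blob_store.get(self.rows[name]["blob_id"])


def demo():
    with tempfile.TemporaryDirectory() as root:
        db = ContentDatabase(ExternalBlobStore(root))
        payload = b"quarterly report " * 1000
        db.save_document("report.docx", payload)
        round_trip_ok = db.load_document("report.docx") == payload
        return round_trip_ok, db.rows["report.docx"]["size"], len(os.listdir(root))
```

Calling `demo()` stores one document and reads it back. The database row stays small no matter how large the document is, which is the property that keeps the database manageable.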
However, you can’t use EBS directly because no EBS provider is bundled with MOSS 2007. You must rely on a third-party EBS provider. Without a third-party module, IT can’t use EBS. Another drawback is that EBS is not 100 percent .NET in its architecture. It is still based on the old COM interface. A third-party module also has to take care of issues like BLOB cleanup. When a document is deleted, SharePoint never asks the module to delete the BLOB. It just stops referring to it.
The EBS module that a vendor writes must therefore have a BLOB garbage collection or BLOB cleanup feature that periodically searches for BLOBs no longer referenced by SharePoint and deletes them, because the user has deleted the corresponding documents.
Similarly, if the user updates a document, SharePoint never actually updates the existing BLOB. It always creates a new BLOB, so the older BLOB is still around. The garbage collection feature of the EBS module must remove those older BLOBs as well. EBS does the job well: moving the BLOBs out of the database improves SharePoint performance quite a bit.
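The cleanup described above amounts to comparing what is stored against what is still referenced. The following Python sketch shows that comparison under simplifying assumptions: the BLOB store is a plain dictionary, the reference list is a set of IDs, and the function name `collect_orphaned_blobs` is hypothetical. It is not how any particular EBS module is implemented.

```python
def collect_orphaned_blobs(blob_store, referenced_ids):
    """Delete every stored BLOB whose ID is no longer referenced.

    blob_store: dict mapping BLOB ID -> bytes (stand-in for external storage).
    referenced_ids: set of BLOB IDs the content database still points to.
    Returns the sorted list of deleted IDs.
    """
    orphaned = [blob_id for blob_id in blob_store if blob_id not in referenced_ids]
    for blob_id in orphaned:
        del blob_store[blob_id]
    return sorted(orphaned)


# Simulated history: budget.xlsx was updated once (b1 -> b2) and
# notes.docx (b3) was deleted, so only b2 is still referenced.
blob_store = {
    "b1": b"budget.xlsx, version 1",
    "b2": b"budget.xlsx, version 2",
    "b3": b"notes.docx",
}
documents = {"budget.xlsx": "b2"}

deleted = collect_orphaned_blobs(blob_store, set(documents.values()))
# deleted is ["b1", "b3"]; only "b2" remains in blob_store
```

A production collector would also need to avoid racing with writes that are still in flight, for example by skipping BLOBs newer than some grace period. That detail is left out here to keep the sketch short.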
Enter SQL Server 2008 and RBS
Microsoft SQL Server 2008 has a built-in Remote BLOB Storage (RBS) feature. RBS lets SQL Server users store all the BLOBs outside the database. Microsoft provides a built-in FILESTREAM provider for the regular file system. It has also published the interface and the specs so that third-party storage companies can develop providers for their own specialized storage. Storage vendors such as Hitachi and EMC, or even cloud storage services like Windows Azure or Amazon cloud storage, can now implement RBS for their own storage systems.
All of these features are available to SQL Server 2008 users. SharePoint taps into them as well, so you can configure SharePoint 2010 to use SQL Server 2008 for its content database. To use RBS in SharePoint 2010, you have to be using SQL Server 2008. SharePoint 2010 does work with SQL Server 2005, but SQL Server 2005 does not have the RBS feature.
From SharePoint’s perspective, RBS does exactly the same thing as EBS. It’s a mechanism for storing BLOBs outside the database. The only difference is that with EBS, third-party vendors had to provide the EBS modules. With RBS, Microsoft already has a FILESTREAM provider.
You’ll find that the most convenient way to use RBS in SharePoint is through a third-party product. Doing so makes the entire process easy and comprehensive, with everything administered through GUI tools. If you prefer not to use third-party software, you can configure RBS for BLOB storage yourself within SQL Server 2008 and SharePoint 2010.
RBS in SQL Server 2008 has a built-in garbage collection process called RBS Maintainer, which was not there for EBS. RBS Maintainer is a separate process that cleans up all the unreferenced BLOBs, a job that, with EBS, the third-party vendor had to implement on its own.
However, from a user’s perspective, RBS and EBS provide the same value when delivered through a third-party implementation. If you don’t want to use a third-party vendor, then RBS is the only option. If you don’t mind a third-party solution for taking the BLOBs out of the database, you can choose either RBS or EBS.
Both EBS and RBS improve SharePoint performance equally. The difference is that EBS relies on a legacy COM interface, whereas RBS is a purely .NET-based solution. From a technology perspective, RBS fits into .NET quite nicely.
You can configure SharePoint 2010 to use the FILESTREAM RBS provider, which is built into SQL Server 2008. Currently, that’s the only provider that comes with SQL Server 2008. In the future, you’ll see other third-party RBS providers. With these, you can move the BLOBs into this outside storage (see Figure 2).
Figure 2 BLOBs moved out of SQL Server 2008 with RBS
Moving the BLOBs outside the database is a key element of improving your SharePoint performance and making your database much more manageable. However, it is not the only step. There are other things you can do to improve SharePoint performance further.
BLOB and List Caching Also Improve Performance
Once you externalize the BLOBs, caching BLOB data can significantly improve SharePoint application performance even further. This is particularly true when you use a distributed cache to keep frequently used BLOBs in the memory of the Web front-end (WFE) servers. This minimizes trips to BLOB storage, so you can read those BLOBs quickly and speed up the response time of your SharePoint application.
You can also use caching for other types of data. For example, SharePoint makes heavy use of list data, which you can cache as well. Virtually everything in SharePoint 2010 is shown through lists, and SharePoint has to make a database trip each time it reads a list. By caching those lists in the WFE server memory, you avoid countless database trips and improve performance. Caching BLOBs and list data can significantly elevate performance levels (see Figure 3).
In-memory caching is gaining considerable traction in both the .NET and Java spaces. Your cache should run on your WFE servers. Depending on whether you have 32- or 64-bit servers and how much memory you have available, it can be allocated as little as 500MB or as much as 5GB or 10GB of memory.
This cache holds all the data the WFE reads, whether from BLOB storage or from SQL Server, and it’s all transparent. Whatever is fetched is automatically kept in the cache. The next time SharePoint needs the same data, whether it’s a BLOB or a list, it will find it in the cache. Because the cache is distributed, it actually spans multiple WFE servers.
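The transparent behavior described above is commonly called a read-through cache, and the memory allocation acts as a budget that forces older entries out. The Python sketch below illustrates both ideas under stated assumptions: the `ReadThroughCache` class, the `load_from_backend` function and the tiny 1,000-byte budget are made up for the example, and least-recently-used eviction is one common policy rather than a description of any specific product.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Hypothetical byte-budgeted read-through cache with LRU eviction."""

    def __init__(self, loader, max_bytes):
        self.loader = loader          # called on a miss to fetch from the backend
        self.max_bytes = max_bytes    # memory budget for cached values
        self.used_bytes = 0
        self.items = OrderedDict()    # key -> bytes, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1
        value = self.loader(key)          # trip to BLOB storage or the database
        self._store(key, value)
        return value

    def _store(self, key, value):
        if len(value) > self.max_bytes:
            return                        # larger than the whole budget: skip
        self.items[key] = value
        self.used_bytes += len(value)
        while self.used_bytes > self.max_bytes:
            _, evicted = self.items.popitem(last=False)  # drop the LRU entry
            self.used_bytes -= len(evicted)


backend_reads = []


def load_from_backend(key):
    # Stand-in for an expensive read; records each trip for inspection.
    backend_reads.append(key)
    return b"x" * 400


cache = ReadThroughCache(load_from_backend, max_bytes=1000)
cache.get("a.docx")   # miss: fetched from the backend and cached
cache.get("a.docx")   # hit: served from memory, no backend trip
cache.get("b.pdf")    # miss
cache.get("c.xlsx")   # miss: exceeds the budget, so a.docx is evicted
```

After these four calls the backend has been read three times instead of four, and the cache holds `b.pdf` and `c.xlsx` within its budget.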
For large installations, you can also migrate the cache to a dedicated caching tier. Because it is distributed, the cache stays synchronized across multiple servers. So if a document is updated from one Web server, the others will also know about it. Caching helps take performance to the next level because in-memory cache is extremely fast.
Figure 3 Data caching further improves SharePoint performance
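One simple way to keep per-server caches consistent is to invalidate a key on every server whenever it is written. The sketch below simulates that within a single Python process. The `CacheNode` and `CacheCluster` classes are hypothetical, the "servers" are plain objects, and a real distributed cache would send these notifications over the network and may use a different synchronization strategy altogether.

```python
class CacheNode:
    """Local cache of one simulated WFE server."""

    def __init__(self, name):
        self.name = name
        self.local = {}

    def invalidate(self, key):
        self.local.pop(key, None)


class CacheCluster:
    """Hypothetical coordinator that tells every node about each write."""

    def __init__(self, backend):
        self.backend = backend   # dict standing in for BLOB storage / database
        self.nodes = []

    def add_node(self, name):
        node = CacheNode(name)
        self.nodes.append(node)
        return node

    def read(self, node, key):
        if key not in node.local:
            node.local[key] = self.backend[key]   # read-through on a miss
        return node.local[key]

    def write(self, node, key, value):
        self.backend[key] = value
        for peer in self.nodes:
            peer.invalidate(key)                  # drop stale copies everywhere
        node.local[key] = value                   # the writer keeps the new copy


cluster = CacheCluster({"policy.docx": "v1"})
wfe1 = cluster.add_node("WFE1")
wfe2 = cluster.add_node("WFE2")

before = cluster.read(wfe2, "policy.docx")    # WFE2 caches "v1"
cluster.write(wfe1, "policy.docx", "v2")      # update arrives through WFE1
after = cluster.read(wfe2, "policy.docx")     # WFE2 re-reads and sees "v2"
```

Without the invalidation loop in `write`, WFE2 would keep serving its cached `v1` after the update.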
If your SharePoint installation is configured to use a single worker process on each WFE, you can keep the cache within the worker process. However, you need to consider the worker process memory size limit on a 32-bit platform, where a single worker process can’t have more than 1GB. If that’s the case, you can keep the cache in a separate process.
After BLOBs are externalized and caching is implemented, performance improves several times over because you’re no longer making those expensive network trips. You’re not going to BLOB storage and you’re not going to SQL Server for the list data. Everything is right there in the WFE server memory.
Performance benchmarks with an in-process (in-proc) cache show performance that is at least three to four times better because the data is in your own process memory. Even if the cache is out-of-process (out-proc), inter-process communication on the same box is much faster than going across the network.
While you can certainly have your developers implement your own caching, it’s best to use a third-party distributed cache to avoid a time-consuming and error-prone custom implementation.
Take advantage of BLOB externalization with SQL Server 2008 RBS and improve your SharePoint 2010 performance. If you’re not prepared to move to SQL Server 2008, but still want to externalize BLOBs, you can do it yourself or use a third-party option to move BLOBs out of the database via EBS.
Iqbal Khan is the president and technology evangelist of Alachisoft (alachisoft.com). Alachisoft provides NCachePoint & NCache. NCachePoint is the industry’s leading SharePoint performance and scalability product, and NCache is a popular .NET distributed cache. You can reach him at firstname.lastname@example.org.