FILESTREAM (OLTP)---a Technical Reference Guide for Designing Mission-Critical OLTP Solutions

Want more guides like this one? Go to Technical Reference Guides for Designing Mission-Critical Solutions.

FILESTREAM integrates the Microsoft SQL Server Database Engine with an NTFS file system by storing large object (BLOB) data as files on the file system. Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Microsoft Win32–based file system interfaces provide high-performance streaming-based access to the data.

Historically, databases—at least to the average user—serve as data stores for well-defined data types, such as integers and strings. If a solution is required to store large images, such as pictures and documents, the application developer must decide whether to store the file on a file server with a pointer in the database or store the file in the database. Each option has its own set of benefits and important considerations.

FILESTREAM takes the benefits of both options and delivers them as an integrated part of the database engine.

Best Practices

Following are some best practices and pitfalls to avoid when using FILESTREAM. (Note that the full URLs for the hyperlinked text are provided in the Appendix at the end of this document.)

Overview

Development

  • FILESTREAM values must never be accessed without obtaining a handle from SQL Server. This is important from a consistency point of view (where SQL Server may have multiple copies of an object present in the filesystem to provide transactional consistency) and from a locking point of view (it is important that no attempt is made to move or delete files while data is being streamed in or out of the files). Note the example of Win32 access shown in the article FILESTREAM Data in SQL Server 2008 (ADO.NET)7.

  • The Microsoft Research white paper To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem8 presents the trade-offs between storing data in the filesystem and storing data in a database. The conclusion from that article is that objects smaller than 256K are best stored in a database while objects larger than 1M are best stored in the filesystem. Between 256K and 1M, the read:write ratio and rate of object overwrite or replacement are the most important factors. As a specific example, thumbnails (less than 250 KB) would be best stored as row data. The largest impact of storing small objects using FILESTREAM typically occurs during restore operations.

Case Studies and References

Examples of successful architectures are described in the following case studies and white papers:

Questions and Considerations

This section provides questions and issues to consider when working with your customers. The considerations listed below are based on SQL Server 2008 experiences and may change in future. For additional background and discussion of the Questions and Considerations summarized below, please refer to FILESTREAM Design and Implementation Considerations11.

Planning and Migrations

  • Solutions that will host large volumes of FILESTREAM data require thoughtful development considerations, mostly in terms of accommodating features available to database administrators (DBAs), such as piecemeal backups and disk management.

  • Involve the existing application tier for migrations. While it is possible to migrate to FILESTREAM in a way that is transparent to the application tier, doing so is inefficient.

  • Very large destinations (multiple terabytes) for FILESTREAM will require a partitioning design that overcomes the limitation of having one directory per FILESTREAM filegroup.

  • For migrations, you cannot bulk load individual files into FILESTREAM. You can use BULK COPY to mass move values from row data into FILESTREAM. Mass loading of data is possible from a table that does not use FILESTREAM to a table that does.

  • If the solution delivery is a migration plan, can it be done offline? An offline approach is often far simpler to carry out.

  • New solutions are relatively easy to deploy, even outside of the guidance provided in BOL. However, migrations can be complex, depending on the availability requirements, the total data size, and the capacity to accommodate copies of original records.

  • Do not underestimate the complexity of migrations.

Development

  • For very large databases (multiple terabytes), develop so that the database can be restored in a piece-meal fashion.

  • The NEWSEQUENTIALID() function cannot be used if true sequential GUIDs are required. This is because of server restarts and network card changes. The function is also limited to use as a default value for a column. While it is possible to generate a sequential GUID, doing so requires a custom function.

  • Use Microsoft Visual Studio Team System for Database Professionals to develop solutions, especially complex solutions that involve many partitions and filegroups.

Deployment

  • FILESTREAM has its own default filegroup. The use of FILESTREAM implies the use of at least two filegroups: one for row data and one for FILESTREAM data. If data can be marked read-only, implement this. You can then store this data in filegroups that can be marked as read-only.

  • Test the deployment and decide upon a rollback plan. You might need to have several rollback plans in place, depending on how much data is involved and where in the deployment a rollback-inducing issue might be encountered. Ensure that sufficient time windows are allowed for in a rollback plan. If you are migrating or on-boarding a large amount of data, a last-minute deployment failure might require rollback. Therefore, ensure that adequate planning is in place to respond quickly and effectively.

  • Refer to BOL for several requirements for enabling FILESTREAM. Be sure to decide whether the data will be accessed through the Win32-based API as this level of access needs to be specifically enabled.

  • If there is a large amount of data that will be stored in the FILESTREAM filegroup, ensure that the disks are ready and the solution can gracefully expand or shrink, should the destination run out of space.

  • Validate cluster implementation, testing after enabling FILESTREAM and prior to loading data.

Database Management and Administration

  • FILESTREAM filegroups can only have one directory. Adding FILESTREAM filegroups will not resolve disk limitations. If the solution is not partition aware for the FILESTREAM content, the only way to overcome out-of-space issues is to extend the size of the volume.

  • Running index rebuilds or another maintenance tasks do not access the FILESTREAM data; maintenance task durations, therefore, are not only shorter, but are more predictable.

  • Storing the data in the database requires pre-allocation of space. A FILESTREAM destination grows incrementally, value-by-value. There are no tools within SQL Server Management Studio or included in SQL Server 2008 R2 to view the amount of consumed space or free space. Make sure that any tools you use do not try to connect to the files.

  • FILESTREAM environments can get very large; familiarity with the many backup and restore features is critical. Validating backups and restores can be tricky in very large environments.

  • While administering systems, do not try to access FILESTREAM data using Windows Explorer.

  • The garbage collector that clears out records to be deleted is not aggressive enough. To make this occur more quickly, force a checkpoint, run a backup, wait ten minutes, and then run another backup.

Feature Interaction

  • You cannot use database mirroring with a database configured for FILESTREAM, however other databases within a SQL Server instance that support FILESTREAM can still participate in database mirroring.

  • Log shipping does support FILESTREAM-enabled databases.

  • Refer to BOL for additional considerations when using FILESTREAM with other SQL Server features.

Performance

  • Disable 8.3 filename generation for folders that hold FILESTREAM data and exclude these folders from operating system indexing services.

  • Accessing the FILESTREAM values through the Win32-based API is far more efficient than through Transact-SQL. The TDS protocol is not optimized for streaming access. However, set-based operations can only be carried out by using Transact-SQL.

  • Using the DATALENGTH() function to get the size of a value is far more efficient than using LEN().

Appendix

Following are the full URLs for the hyperlinked text.

1 Designing and Implementing FILESTREAM Storage https://msdn.microsoft.com/en-us/library/bb895234.aspx

2 FILESTREAM Storage in SQL Server 2008https://msdn.microsoft.com/en-us/library/cc949109(SQL.100).aspx

3 FILESTREAM Design and Implementation Considerationshttp://blog.sqlcat.com/whitepapers/archive/2011/02/22/filestream-design-and-implementation-considerations.aspx

4 Managing Unstructured Data with SQL Server 2008https://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-manage-unstructured.aspx

5 MSDN Webcast: 24 Hours of SQL Server 2008: Manage Your Unstructured Data with SQL Server 2008 https://www.microsoft.com/events/series/msdnsqlserver2008.aspx?tab=Webcasts&seriesid=114&webcastid=4620

6 Using FileTables to Manage Unstructured FILESTREAM Datahttps://msdn.microsoft.com/en-us/library/ff929144(SQL.110).aspx

7 FILESTREAM Data in SQL Server 2008 (ADO.NET)https://msdn.microsoft.com/en-us/library/cc716724.aspx

8 To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem https://research.microsoft.com/apps/pubs/default.aspx?id=64525

9 London Drugs: Retailer Improves Data Management to Reduce Costs and Safeguard Customer Goodwillhttps://www.microsoft.com/canada/casestudies/Case_Study_Detail.aspx?casestudyid=4000008137

10 McLaren Electronic Systems: McLaren Electronics Fuels Analysis of Formula One Racing Data with SQL Serverhttps://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=4000001476

11 FILESTREAM Design and Implementation Considerations http://blog.sqlcat.com/whitepapers/archive/2011/02/22/filestream-design-and-implementation-considerations.aspx