Securing Data in Hosted Applications
By John deVadoss, Senior Microsoft Application Development and Platform Marketing; Fred Chong, Microsoft Application Architecture; and Gianpaolo Carraro, Microsoft Service Delivery
See other Viewpoint articles.
Trust, or the lack thereof, is a key factor with respect to the adoption of the Hosted Application and the Software as a Service (SaaS) model. The case could be made that data is the most important asset of any business application—data about products, customers, employees, suppliers, and more. In a hosted model, an organization must surrender a level of control over its own data, trusting the service provider.
In order to earn this trust, one of the highest priorities for a provider is to create a data architecture that is both robust and secure enough to satisfy tenants or customers who are concerned about surrendering control of vital business data to a third party. A highly secure data architecture is one that provides defense in depth, using multiple defense levels that complement one another to provide data protection in different ways, under different circumstances, against both internal and external threats.
In this article, we rely on three underlying patterns to realize multi-tenant data architectures:
Filtering: Using an intermediary layer between a tenant and a data source that acts like a sieve, making it appear to the tenant as though its data is the only data in the database
Permissions: Using access control lists (ACLs) to determine who can access data in the application and what they can do with it
Encryption: Obscuring every tenant's critical data so that it will remain inaccessible to unauthorized parties even if they come into possession of it
In a multi-tier application environment, application developers traditionally use one of two methods to secure access to data stored in databases: impersonation or a trusted subsystem account.
With the impersonation access method, the database is set up to allow individual users to access different tables, views, queries, stored procedures, and other database objects. When an end user performs an action that directly or indirectly requires a call to a database, the application presents itself to the database as that user, literally impersonating the user for the purposes of accessing the database. (In technical terms, the application employs the user's security context). A mechanism such as Kerberos delegation can be used to allow the application process to connect to the database on behalf of the user.
With the trusted subsystem access method, the application always connects to the database using its own application process identity, independent of the identity of the user; the server then grants the application access to the database objects that the application can read or manipulate. Any additional security must be implemented within the application itself to prevent individual end users from accessing any database objects that should not be exposed to them. This approach makes security management easier, eliminating the need to configure access to database objects on a per-user basis, but it means giving up the ability to secure database objects for individual users.
In a SaaS or hosted application, the concept of "users" is a bit more complicated than in traditional applications because of the distinction between a tenant and an end user. The tenant is an organization that uses the application to access its own data store, which is logically isolated from data stores belonging to any other tenants. Each tenant grants access to the application to one or more end users, allowing them to access some portion of the tenant's data using end user accounts controlled by the tenant.
In this scenario, you can use a hybrid approach to data access that combines aspects of both the impersonation and trusted subsystem access methods. Therefore, you can take advantage of the database server's native security mechanisms to enforce the maximum logical isolation of tenant data without creating an unworkably complex security model.
This approach involves creating a database access account for each tenant and using ACLs to grant each of these tenant accounts access to the database objects the tenant is allowed to use. When an end user performs an action that directly or indirectly requires a call to a database, the application uses credentials associated with the tenant account, rather than credentials associated with the end user. (One way for the application to obtain the proper credentials is through impersonation, in conjunction with a credentialing system like Kerberos. A second approach is to use a security token service that returns Fan actual set of encrypted logon credentials established for the tenant that the application process can then submit to the database.) The database server does not distinguish between requests originating from different end users associated with the same tenant, and it grants all such requests access to the tenant's data.
For example, consider an end user of a customer relationship management (CRM) application who performs an operation that queries the database for customer records matching a certain string. The application submits the query to the database using the security context of the tenant, so instead of returning all of the matching records in the database, the query retrieves only the matching rows from the tables the tenant is allowed to access. So far, so good—but suppose the end user's role allows her to access only records of customers located within a certain geographic region. The application must intercept the query results and present the user with only the records that she is entitled to see.
Securing Database Tables
To secure a database on the table level, use the GRANT command in Microsoft SQL Server 2005 to grant a tenant user account access to a table or other database object:
GRANT SELECT, UPDATE, INSERT, DELETE ON [TableName] FOR [UserName]
This adds the user account to the ACL for the table. If you use the hybrid approach to database access discussed earlier, in which end users are associated with the security contexts of their respective tenants, this only needs to be done once, during the tenant provisioning process; any end user accounts created by the tenant will be able to access the table.
This pattern is appropriate for use with both separate-database and separate-schema approaches. In the separate-database approach, you can isolate data by simply restricting access on a database-wide level to the tenant associated with that database, although you can also use this pattern on the table level to create another layer of security.
Tenant View Filters
Views in SQL Server can be used to grant individual tenants access to some of the rows in a given table, while preventing them from accessing other rows.
In SQL Server, a view is a virtual table defined by the results of a SELECT query. The resulting view can then be queried and used in stored procedures as if it were an actual database table. For example, the following statement in SQL Server creates a view of a table called Employees, which has been filtered so that only the rows belonging to a single tenant are visible:
CREATE VIEW TenantEmployees AS SELECT * FROM Employees WHERE TenantID = SUSER_SID()
This statement obtains the security identifier (SID) of the user account accessing the database (which, you'll recall, is an account belonging to the tenant, not the end user) and uses it to determine which rows should be included in the view. (The example assumes that the unique tenant ID number is identical to the tenant's SID. If this is not the case, one or more additional steps would be required to associate each tenant with the correct rows.) Each individual tenant's data access account would be granted permission to use the TenantEmployees view, but granted no permissions to the Employees source table itself. You can build queries and shared procedures to take advantage of views, which provides tenants with the appearance of data isolation even within a multi-tenant database.
This pattern is slightly more complex than the Secure Database Tables pattern, but is an appropriate way to secure tenant data in a shared-schema application, in which multiple tenants share the same set of tables.
Tenant Data Encryption
A way to further protect tenant data is by encrypting it within the database, so that data will remain secure even if it falls into the wrong hands.
Cryptographic methods are categorized as either symmetric or asymmetric. In symmetric cryptography, a key is generated that is used to encrypt and decrypt data. Data encrypted with a symmetric key can be decrypted with the same key. In asymmetric cryptography (also called public-key cryptography), two keys are used, designated the public key and the private key. Data that is encrypted with a given public key can only be decrypted with the corresponding private key, and vice versa. Generally, public keys are distributed to any and all parties interested in communicating with the key holder, while private keys are held secure. For example, if Alice wants to send an encrypted message to Bob, she obtains Bob's public key through some agreed-upon means, and uses it to encrypt the message. The resulting encrypted message, or cyphertext, can only be decrypted by someone in possession of Bob's private key (in practice, this should only be Bob). This way, Bob never has to share his private key with Alice. To send a message to Bob using symmetric encryption, Alice would have to send the symmetric key separately—which runs the risk that the key might be intercepted by a third party during transmission.
Public-key cryptography requires significantly more computing power than symmetric cryptography; a strong key pair can take hundreds or even thousands of times as long to encrypt and decrypt data as a symmetric key of similar quality. For hosted applications in which every piece of stored data is encrypted, the resulting processing overhead can render public-key cryptography infeasible as an overall solution. A better approach is to use a key wrapping system that combines the advantages of both systems.
With this approach, three keys are created for each tenant as part of the provisioning process: a symmetric key and an asymmetric key pair consisting of a public key and a private key. The more efficient symmetric key is used to encrypt the tenant's critical data for storage. To add another layer of security, a public/private key pair is used to encrypt and decrypt the symmetric key, to keep it secure from any potential interlopers. When an end user logs on, the application uses impersonation to access the database using the tenant's security context, which grants the application process access to the tenant's private key. The application (still impersonating the tenant, of course) can then use the tenant's private key to decrypt the tenant's symmetric key and use it to read and write data.
This is another example of the defense-in-depth principle in action. Accidental or malicious exposure of tenant data to other tenants—a nightmare scenario for the security-conscious service provider—is prevented on multiple levels. The first line of defense, at the database level, prevents end users from accessing the private data of other tenants. If a bug or a virus in the database server were to cause an incorrect row to be delivered to the tenant, the encrypted contents of the row would be useless without access to the tenant's private key.
The importance of encryption increases the closer a hosted application is to the "shared" end of the isolated/shared continuum. Encryption is especially important in situations involving high-value data or privacy concerns, or when multiple tenants share the same set of database tables.
Because you can't index encrypted columns, selecting which columns of which tables to encrypt involves making a tradeoff between data security and performance. Think about the uses and sensitivity of the various kinds of data in your data model when making decisions about encryption.
Designing a hosted data architecture that reconciles the competing benefits and demands of sharing and isolation isn't a trivial task, but the patterns that we have discussed should help you identify many of the critical questions you will face and, we hope, help you create the foundation layer of trust that's vital to the success of your hosted application.