Enabling Information Security through HBI Information Classification

Technical Case Study

Published: December 2007
Updated: June 2008

To prevent the inadvertent disclosure of High Business Impact (HBI) information, Microsoft IT designed and implemented a system that combines Microsoft technologies with a third-party solution. The system automatically identifies and classifies HBI information at risk and then starts the remediation process.

The security of sensitive information is one of the greatest concerns facing companies today. The loss or theft of HBI information is of particular concern because the resulting breach could cost a company revenue, productivity, reputation, and brand value, or even its competitive advantage if the information includes key intellectual property (IP).

This paper describes the approach, design, implementation, and benefits of such a technical solution at Microsoft. The paper also provides suggested best practices so that Microsoft customers can benefit from the lessons that the project team learned. This paper is intended for IT professionals who design and manage compliance systems, in addition to risk managers and compliance auditors.

Situation

At Microsoft, about 100 terabytes of data, any of which could contain HBI information, is dispersed across more than 110,000 managed Microsoft® SharePoint® sites and more than 30,000 file shares. Microsoft IT needed a technology-based solution to identify information that could be at risk and then help prevent its unauthorized disclosure, whether inadvertent or malicious.

Microsoft has long had content policies in place in accordance with a number of regulatory and corporate mandates. The missing component was an automated identification and monitoring mechanism through which line-of-business owners and end users could confirm, at a detailed level, that they were handling content in compliance with those policies and guidelines. The sheer volume of information at Microsoft made it impossible to manually inspect all SharePoint sites and file shares, and to manually notify the owners or custodians of HBI information about policy compliance issues.

The challenge was how to deliver this missing capability across the global organization without installing a huge new IT infrastructure or incurring enormous costs. Before the information classification solution was developed, there were concerns that HBI information could be unintentionally accessible to a wide range of Microsoft personnel, including employees who simply used basic search tools to gather information in the daily course of their work. At the same time, it was important to raise awareness among end users of the risks of non-secure HBI information and of their role in helping to ensure the security of such sensitive information.

Considering the volume of systems, content, employees, and business processes potentially affected by the implementation of a content loss prevention (CLP) solution, many chief information officers, chief information security officers, and chief security officers struggle with identifying how and where to start. Implementing a technology solution to prevent the loss or misuse of sensitive content is just one part of the picture. In fact, an organization must address an entire set of business processes and operations to prepare for such an implementation and to manage the resulting incidents and intelligence that arise from the use of this technology. The most effective content loss prevention efforts are those that an organization meticulously plans and executes based on a deep understanding of its most important content governance, risk, and compliance challenges.

For large enterprises, the technical aspects of a CLP solution can play a critical role in enabling automation of discovery and remediation activities. CLP solutions that use Microsoft technologies can provide a solid foundation that enables an enterprise to scan and classify enormous volumes of information in a timely and regularly scheduled manner. The enterprise can then focus valuable human resources on remediation efforts. Automation of these otherwise time-intensive activities also enables the creation of repeatable, service-oriented operations processes with the lowest possible total cost of ownership (TCO) for the solution.

Solution

In 2006, Microsoft IT initiated a CLP project to meet content security and compliance objectives for HBI information at Microsoft while minimizing the impact on business operations. By using Microsoft technologies such as Microsoft Office SharePoint Server 2007 together with a third-party application, Microsoft IT ultimately designed and implemented the HBI Information Classification solution. This solution automates the identification and classification of HBI information at risk, as well as part of the subsequent remediation process. It enables users to classify and help protect HBI information contained in SharePoint sites and file shares according to Microsoft data-handling standards. The third-party part of the solution, Tablus Content Sentinel, was itself built on the Microsoft .NET Framework, Windows® Compute Cluster Server 2003, and Microsoft SQL Server® 2005.

Prior to embarking on designing and implementing the technical solution, the project team spent a considerable amount of time and effort defining the scope and approach of the project. The team's primary goals were to:

  • Develop an initial solution for HBI data, which could then be expanded to various classifications of data in various locations.


  • Establish an effective, repeatable service while minimizing impact to daily business activities.


  • Make the HBI project widely recognizable across the business.

Solution Approach

As part of a global corporation with approximately 71,000 employees working in more than 500 Microsoft offices around the world, Microsoft IT realized that it needed to approach HBI information security in incremental steps and address multiple, sometimes competing requirements. Microsoft IT's overall information security policies had to take into account regulatory compliance requirements as well as the protection of intellectual property, which considerably broadened the scope of the HBI project.

Many organizations focus first on network monitoring–based solutions to prevent unwanted transmission of HBI information. But the project team realized that at the scale Microsoft had to address, catching content in motion would be overly costly and ineffective without first addressing the root of the problem: gaining visibility and control over HBI content at rest.

The team established five objectives for the initial project:

  • Identify the location of HBI content across the network.


  • Reduce the volume of HBI content that could move across the network or be used on workstations.


  • Implement ownership and access controls for HBI content.


  • Understand and address the business processes that contribute to the sprawl of HBI content.


  • Establish the content loss prevention capability as a valued service within the larger IT organization in accordance with the Microsoft Operations Framework (MOF).

"One of the primary project objectives was to establish the content loss prevention capability as a valued service within the larger IT organization in accordance with the Microsoft Operations Framework."

Olav Opedal, Security Program Manager

On a strategic level, the MOF posits that IT groups, including IT security groups, must clearly focus on supporting the business objectives of the organization and emphasizing the business value that IT provides. The idea is that IT can help reduce risks and enable new ways of doing business. In addition, IT systems and services are more effectively managed when regarded as an asset to the development and execution of key business strategies. This approach requires IT groups to demonstrate how their services make specific, tangible, and critical contributions to achieving business outcomes.

Note: For more information about the Microsoft Operations Framework, visit http://www.microsoft.com/technet/solutionaccelerators/cits/mo/mof/mofeo.mspx.

In the context of content security at Microsoft, the project team created a project plan. The goal of this plan was to demonstrate that the proposed content loss prevention strategies and technologies were the most effective means of achieving compliance and maintaining policy mandates now and well into the future. The plan focused on advancing the content loss prevention service, within the first three months following implementation, from a basic maturity level, where new IT infrastructure is generally considered a cost center, to a fully mature level, where the business value of the IT infrastructure is clearly understood and the infrastructure is viewed as a strategic business asset and enabler.

From a scope perspective, the project team decided to start by inventorying HBI information within the enormous volume of content stored across the network file shares and SharePoint sites at Microsoft. The team decided to develop an automated scanning tool and apply it to those data repositories to identify HBI information. Remediation of issues with HBI information would then follow, including limiting broad access, asset classification, asset lockdown, asset removal, and data rights management encryption. This approach would also become the framework that would eventually include all data assets in motion and, potentially, additional classifications of data, such as Medium Business Impact (MBI) or Low Business Impact (LBI) information.

In defining the scope and approach for the project, the team adopted the following methodology:

  1. Develop proof of concept


  2. Conduct risk analysis


  3. Design and build


  4. Pilot and deploy


  5. Provide service management

The approach that the team followed for the initial project supported control requirements to mitigate critical information security risk for data at rest. It required the design of several compliance modules and deployment of an existing incident tracking and remediation tool integrated with the MSE ticketing system used throughout Microsoft. Each compliance module would focus on specific types of search and remediation activities, such as scanning and locking down SharePoint sites or file shares, or identifying specific types of intellectual property, such as source code in various locations. The deployment of these combined elements would provide precise, automated detection of HBI data at rest in documents located on SharePoint sites and file shares, or elsewhere, and methods to quickly remediate potential issues. In general, after an organization identifies issues with HBI information, it has a duty to address those issues and safeguard that information. Therefore, the remediation component of the solution was crucial to implementing a complete solution.

Because of the large volume of content at Microsoft, and the heavy reliance on SharePoint sites that facilitate the sharing of information and that have varying levels of data owners and users, the project team had to transform high-level corporate policies into detailed guidelines for how to apply content security to IT services. This transformation required close collaboration between IT service owners, the corporate legal department, and various other stakeholders to determine appropriate remediation steps.

The parameters that Microsoft IT developed for its initial discovery required the ability to define stringent criteria for automated content evaluation. With enormous data loads and thousands of locations to scan, enterprise scalability, performance, and accuracy were all top considerations. Precision of content detection was a particular concern: Microsoft IT wanted a system that would reliably catch most at-risk content while maintaining a very low rate of false positives. Previous research by the project team indicated that systems with high false-positive rates require much more human intervention, resulting in a much higher TCO.
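The link between false positives and TCO can be made concrete with a little arithmetic. The following Python sketch assumes, purely for illustration, that every alert (true or false) needs one manual review of fixed length; the 5-minute default and the two precision values are hypothetical, not figures from this paper.

```python
def review_hours(true_hits: int, precision: float, minutes_per_review: float = 5.0) -> float:
    """Estimate analyst hours needed to triage the alerts from one scan.

    precision = true positives / all alerts, so total alerts = true_hits / precision.
    Every alert, true or false, is assumed to need one manual review.
    """
    if not 0 < precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    alerts = true_hits / precision
    return alerts * minutes_per_review / 60


# The same 1,000 genuine HBI findings seen through two hypothetical detectors.
for p in (0.9, 0.2):
    print(f"precision {p:.0%}: {review_hours(1000, p):.1f} analyst hours")
```

Because review effort scales with 1/precision, the 20 percent detector needs 4.5 times the analyst hours of the 90 percent detector to surface the same true findings.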

Finally, the project team developed an education campaign to improve end users' awareness and understanding of their role in helping to ensure compliance and the security of the company's sensitive digital information assets. The HBI project is one step in Microsoft IT's evolving governance model. With its strong focus on enforcing Microsoft data-handling policies and standards, the compliance framework that the project established is a major component in articulating IT governance.

Solution Technical Design

Accuracy, performance, and scalability are the three most important attributes of any enterprise content scanning solution, and of the HBI project in particular. The project team evaluated the third-party Tablus Content Sentinel application in a proof-of-concept phase to determine firsthand how well the technology could meet its needs. In November 2006, after a successful proof of concept followed by extensive risk analysis with the business owners, the project team selected Tablus Content Sentinel as the core content scanning tool for the solution.

This application enabled Microsoft IT to help identify and classify HBI data within the Microsoft environment. At a very high level, the core intellectual property of the application is a content analysis engine used to recognize confidential information. The content analysis engine evaluates information assets by using a variety of techniques to identify protected data. These techniques include searching for specific keywords, phrases, or entities; identifying patterns in data; and analyzing the context in which a suspicious string is detected.
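The three techniques named above (keywords, patterns, and context) can be combined roughly as follows. This is a minimal Python sketch under stated assumptions: the keyword list, the Social Security number pattern, the context terms, and the scoring weights are all hypothetical, and nothing here reflects the internals or API of Tablus Content Sentinel.

```python
import re
from dataclasses import dataclass

# Hypothetical rule set for illustration only.
KEYWORDS = {"confidential", "hbi", "do not distribute"}
PATTERNS = {
    # U.S. Social Security number shape, e.g. 123-45-6789
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
CONTEXT_TERMS = {"ssn", "social security", "employee"}
CONTEXT_WINDOW = 40  # characters examined on each side of a pattern match


@dataclass
class Finding:
    kind: str
    text: str
    score: int


def analyze(document: str) -> list[Finding]:
    """Combine keyword, pattern, and context evidence into scored findings."""
    lowered = document.lower()
    findings = [Finding("keyword", kw, 1) for kw in sorted(KEYWORDS) if kw in lowered]
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(document):
            start = max(0, match.start() - CONTEXT_WINDOW)
            window = lowered[start:match.end() + CONTEXT_WINDOW]
            # A bare digit pattern is weak evidence; nearby context terms make it strong.
            score = 3 if any(term in window for term in CONTEXT_TERMS) else 1
            findings.append(Finding(name, match.group(), score))
    return findings


for f in analyze("Employee record (Confidential): SSN 123-45-6789"):
    print(f)
```

The context check is what keeps an order number such as "123-45-6789" in an unrelated document at a low score, which is one simple way to trade recall against false positives.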

With enormous volumes of data to scan and remediate, the infrastructure supporting the automated scanning tool must be high performance and highly scalable. The Tablus Content Sentinel engine is built on the Microsoft .NET Framework. It is capable of running on the Windows Compute Cluster Server 2003 operating system to create a grid computing architecture that allows for capacity expansion by adding servers to the grid of compute clusters. Figure 1 shows how compute clusters are located in various regions where significant amounts of data reside:


Figure 1: Windows Compute Cluster Server 2003 grid architecture for Tablus Content Sentinel

Microsoft IT dedicated 10 load-balanced grid computers to scan all of the SharePoint sites and file shares connected to its storage area network (SAN). A lightweight agent was automatically deployed to scan contractor workstations in Asia and the stand-alone file shares not connected to the SAN. Tablus Content Sentinel scanning activities are coordinated through the enterprise controller. The site connectors for each location manage both grid computers and lightweight agents.
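The division of labor between grid computers and lightweight agents can be sketched as a simple dispatch rule. In this hypothetical Python sketch, the site names, node names, and round-robin assignment are assumptions for illustration; the paper does not describe how the enterprise controller or site connectors actually schedule work.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Target:
    path: str
    site: str
    on_san: bool  # True if the data store is connected to the SAN


def plan_scans(targets, grid_nodes_by_site):
    """Assign each scan target to a grid node or to a temporary agent.

    SAN-connected targets are spread round-robin across their site's grid
    nodes; everything else (workstations, stand-alone shares) gets a
    lightweight agent that is removed after the scan.
    """
    rotations = {site: cycle(nodes) for site, nodes in grid_nodes_by_site.items() if nodes}
    plan = []
    for t in targets:
        if t.on_san and t.site in rotations:
            plan.append((t.path, next(rotations[t.site])))
        else:
            plan.append((t.path, "agent"))
    return plan


targets = [
    Target("redmond-san/finance", "Redmond", True),
    Target("redmond-san/legal", "Redmond", True),
    Target("contractor-ws-17", "Asia", False),
    Target("redmond-san/hr", "Redmond", True),
]
for path, assignee in plan_scans(targets, {"Redmond": ["grid-1", "grid-2"]}):
    print(path, "->", assignee)
```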

The grid computers are permanent components of the infrastructure and are used to scan large, centrally located data stores. The lightweight agents deploy temporarily to workstations or servers where content resides in remote locations, and then remove themselves after each scanning activity. The results of the scans are stored in a SQL Server 2005 database at each location, and then combined into the SQL Server 2005 Enterprise results database. Approximately 1 percent of the data at rest changes daily. Incremental scans of the systems analyze only new, moved, edited, or renamed files.
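The incremental-scan rule (analyze only new, moved, edited, or renamed files) can be illustrated by comparing two snapshots of a data store. This Python sketch is an assumption about how such change detection might work, using content hashes; the paper does not say how Tablus Content Sentinel tracks changes.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to a SHA-256 digest of its contents.

    Hashing every file keeps the sketch short; a production scanner would more
    likely compare size and modification time first and hash only candidates.
    """
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def changed_files(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Select only the files an incremental scan needs to analyze."""
    removed_digests = {d for p, d in previous.items() if p not in current}
    result = {"new": [], "edited": [], "moved_or_renamed": []}
    for path, digest in sorted(current.items()):
        if path in previous:
            if previous[path] != digest:
                result["edited"].append(path)
        elif digest in removed_digests:
            # The same content reappeared under a different path.
            result["moved_or_renamed"].append(path)
        else:
            result["new"].append(path)
    return result


previous = {"a.docx": "h1", "b.xlsx": "h2", "c.txt": "h3"}
current = {"a.docx": "h1", "b.xlsx": "h2x", "archive/c.txt": "h3", "d.pptx": "h4"}
print(changed_files(previous, current))
# {'new': ['d.pptx'], 'edited': ['b.xlsx'], 'moved_or_renamed': ['archive/c.txt']}
```

Unchanged files such as `a.docx` are skipped entirely, which is what keeps daily scans proportional to the small fraction of data that changes.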

Compliance Modules

Although Tablus Content Sentinel provides a content analysis engine for the solution, the project team needed to create or customize additional components to automate as many processes as possible. Based on the business requirements, the project team identified the need for the following technical components, called compliance modules, for automated tool development:

  • SharePoint Lockdown module


  • File Share Lockdown module


  • WinSE IP Identification module

The project team created modules based on custom Web services and Office SharePoint Server 2007 workflow capabilities. These modules enable content classification, automated lockdown, and remediation notification for the two main content sources, SharePoint sites and file shares. This part of the solution bridges the gap between the Tablus Content Sentinel content scanning engine and end users by providing relevant workflow for security operators. The Web service and workflow engine solution integrates file share and SharePoint classification and lockdown modules with the content scanning solution to lock down and reclassify content appropriately. Figure 2 shows the high-level automated workflow actions that both the SharePoint and file share compliance modules accomplish.


Figure 2: Automated workflow for SharePoint and File Share compliance modules

Apart from the specific automated lockdown remediation actions that each of the SharePoint and File Share compliance modules performs, the additional automated workflow activities are generally the same for SharePoint sites and file shares. The content for a particular site or share is scanned based on well-defined criteria, and HBI information is identified. Results are compared to a known list of exceptions and previously identified false positives. Content owners are then automatically notified by e-mail to take appropriate actions, including notification to Microsoft IT in the event of a false positive, visual classification of the site or share, or deletion of the information.
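The shared part of this workflow (filter scan results against exceptions and known false positives, then notify owners) might look like the following Python sketch. The data shapes, the one-message-per-owner grouping, and the message text are hypothetical; the real solution used custom Web services and Office SharePoint Server 2007 workflows rather than this code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    location: str  # SharePoint site or file share path
    item: str      # document within that location
    owner: str     # e-mail address of the content owner


def build_notifications(hits, exceptions, known_false_positives):
    """Drop excepted locations and previously confirmed false positives, then
    group the remaining hits into one message per owner."""
    pending = defaultdict(list)
    for hit in hits:
        if hit.location in exceptions:
            continue
        if (hit.location, hit.item) in known_false_positives:
            continue
        pending[hit.owner].append(hit)
    messages = {}
    for owner, owner_hits in pending.items():
        lines = [f"- {h.location}/{h.item}" for h in owner_hits]
        messages[owner] = (
            "Possible HBI content was found:\n" + "\n".join(lines)
            + "\nPlease classify the location, delete the content, "
            "or report a false positive to Microsoft IT."
        )
    return messages


hits = [
    Hit("sites/finance", "q3.xlsx", "alice@example.com"),
    Hit("sites/finance", "budget.docx", "alice@example.com"),
    Hit("shares/eng", "notes.txt", "bob@example.com"),
    Hit("sites/legal", "memo.docx", "carol@example.com"),
]
msgs = build_notifications(hits, {"sites/legal"}, {("shares/eng", "notes.txt")})
for owner, body in msgs.items():
    print(owner, body, sep="\n")
```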

SharePoint Lockdown Module

The SharePoint Lockdown module provides the capability to help lock down IT-managed SharePoint sites by using a three-pronged strategy:

  • Content monitoring to identify sensitive content


  • Classifying data by classifying SharePoint sites


  • Enforcing higher levels of access controls on HBI data

File Share Lockdown Module

The File Share Lockdown module provides the capability to help lock down IT-managed file shares and achieves the following:

  • Classifying each file share and tracking the owners of all managed file shares


  • Enabling the administrator to specify a list of disallowed user groups (from policy, Microsoft Windows NT®-authenticated users, etc.)


  • Removing those from the access control lists (ACLs) in file shares and directories every 24 hours


  • Removing groups larger than a specified size from shares that are classified at a higher level


  • Notifying share owners of the removal with information about compliance policy


  • Providing workflow for security operators and end users to perform remediation to comply with standards
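The ACL rules in the list above can be sketched against a simple in-memory model of a share. This Python sketch is hypothetical: the group names, the 500-member threshold, and the `Share` type are assumptions, it does not call any Windows ACL API, and running it every 24 hours would be the job of an external scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class Share:
    path: str
    owner: str
    classification: str                       # "HBI", "MBI", or "LBI"
    acl: set[str] = field(default_factory=set)


def lock_down(share, disallowed, group_sizes, max_group_size, strict_levels=frozenset({"HBI"})):
    """Remove policy-violating groups from a share's ACL and return them, sorted.

    Disallowed groups are removed from every share. On shares classified at a
    strict level, groups larger than max_group_size are removed as well.
    Groups with an unknown size are left in place in this sketch.
    """
    removed = {g for g in share.acl if g in disallowed}
    if share.classification in strict_levels:
        removed |= {g for g in share.acl if group_sizes.get(g, 0) > max_group_size}
    share.acl -= removed
    return sorted(removed)


share = Share("fs01/research", "alice", "HBI",
              {"Authenticated Users", "Research-Team", "All-Engineering"})
removed = lock_down(share, {"Authenticated Users"},
                    {"Research-Team": 40, "All-Engineering": 5000}, max_group_size=500)
if removed:
    # In the described workflow, the share owner is notified with policy details.
    print(f"Notify {share.owner}: removed {removed} from {share.path}")
```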

WinSE IP Identification Module

The project team also developed a compliance module to detect and remediate nonsecure source code on vendor-assigned desktop computers. The WinSE IP Identification module includes the following capabilities:

  • Rules to identify Windows source code


  • A workflow to approve identification of source code


  • Tools to lock down data

Figure 3 shows the high-level automated workflow actions that the WinSE IP Identification module accomplishes.


Figure 3: Automated workflow for WinSE IP Identification compliance module

The automated workflow activities for the WinSE IP Identification compliance module begin with content scans of specific workstations based on well-defined criteria. Potential Windows source code is identified, and the results are compared to a known list of exceptions and previously identified false positives. The appropriate security operators are then automatically notified by e-mail to take appropriate action: notifying Microsoft IT of a false positive, locking down access to the data, or deleting the data. In the last two cases, the operators may also investigate the source of the data.
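The three capabilities of this module (identification rules, an approval workflow, and lockdown actions) can be sketched together. In this Python sketch, the extension list and the copyright-header rule are hypothetical stand-ins for the real identification rules, which the paper does not publish, and the `decide` callback stands in for the security operator's approval step.

```python
import os
import re
from enum import Enum


class Decision(Enum):
    FALSE_POSITIVE = "report false positive to Microsoft IT"
    LOCK_DOWN = "lock down access and investigate source"
    DELETE = "delete and investigate source"


# Hypothetical identification rule: a source-like file extension plus a
# copyright header. Real rules would be far more specific.
SOURCE_EXTENSIONS = {".c", ".cpp", ".h", ".idl"}
HEADER_MARKER = re.compile(r"copyright \(c\) microsoft corporation", re.IGNORECASE)


def is_candidate(filename: str, text: str) -> bool:
    ext = os.path.splitext(filename)[1].lower()
    return ext in SOURCE_EXTENSIONS and HEADER_MARKER.search(text) is not None


def triage(candidates, known_false_positives, decide):
    """Send each new candidate to an operator decision, skipping known false positives."""
    return {name: decide(name) for name in candidates if name not in known_false_positives}


files = {
    "ntfs.c": "// Copyright (c) Microsoft Corporation\nint main(void) { return 0; }",
    "readme.txt": "Copyright (c) Microsoft Corporation",
    "scratch.cpp": "// my own test code",
}
candidates = [name for name, text in files.items() if is_candidate(name, text)]
print(triage(candidates, set(), lambda name: Decision.LOCK_DOWN))
```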

Solution Implementation

The initial content scan to locate and remediate HBI content focused on 12 terabytes of content across the file shares and SharePoint sites located in a single data center, the Redmond data center. That initial scan took only nine days to complete. After three months, the total volume scanned was up to 75 percent of the HBI content across the file shares and SharePoint sites worldwide. The project team completed 100 percent of scanning for the HBI portion of the project in September 2007, when the total scanned content exceeded 100 terabytes.
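The figures above, combined with the 1 percent daily change rate reported earlier, give a rough sense of why incremental scanning matters. The following Python arithmetic uses only numbers stated in this paper and assumes, for illustration only, that the Redmond scan's throughput is representative; the grid was later expanded across regions, so actual capacity may differ.

```python
# Figures stated in this paper; everything derived from them is approximate.
initial_tb, initial_days = 12, 9   # first scan of the Redmond data center
total_tb = 100                     # total content scanned by September 2007
daily_change = 0.01                # about 1 percent of data at rest changes daily

throughput = initial_tb / initial_days          # TB per day in the first scan
full_rescan_days = total_tb / throughput        # if that rate held for all content
daily_incremental_tb = total_tb * daily_change  # content an incremental scan revisits

print(f"initial throughput: {throughput:.2f} TB/day")               # 1.33 TB/day
print(f"full rescan at that rate: {full_rescan_days:.0f} days")     # 75 days
print(f"daily incremental volume: {daily_incremental_tb:.1f} TB")   # 1.0 TB
```

Under that assumption, a full rescan would take weeks, while the daily change set of about 1 TB fits within a single day of scanning.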

The project team progressed from initial deployment to an established IT service in just 90 days. Incremental scans now occur on a scheduled basis, and end users routinely use remediation tools that the solution provides when they are notified of issues with HBI information.

As a critical part of the implementation, Microsoft IT pursued a range of awareness and outreach efforts to internal customers. Because long-term success would depend on building a culture of compliance across the company, Microsoft IT planned to create a grass-roots awareness of, and ultimately demand for, content discovery and other services built around remediation. The internal promotional tactics included poster campaigns, e-mail, and newsletter notices that educated users on HBI, MBI, and LBI data.

In all cases, these marketing messages educated end users on compliance priorities and emerging capabilities. For instance, Microsoft IT sent e-mail that alerted users to the availability of content scanning and remediation capabilities for individual business users as Tablus Content Sentinel scanning capabilities came online. Ultimately, all these efforts fostered awareness among end users that they are frontline data custodians and play a lead role in maintaining policy compliance.

The solution further fosters a culture of compliance within Microsoft by involving line-of-business and content owners and others within the company in the remediation of security issues. For instance, when a Tablus Content Sentinel scan of a particular network share reveals a highly sensitive document that has been misclassified as Low Business Impact, the system automatically notifies the owner of that document that a problem needs to be addressed. The system can then monitor the problem and recognize whether it is adequately resolved within a specified period. The project team measures risk reduction and success rate by using key performance indicators (KPIs), such as the time needed to remediate, the number of notifications sent before an incident is closed, and the number of incidents uncovered in each content category.
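The three example KPIs can be computed directly from incident records. This Python sketch assumes a hypothetical `Incident` record with open and close timestamps and a notification counter; the paper does not describe the schema of the incident tracking tool.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class Incident:
    category: str                 # content category, e.g. "source code"
    opened: datetime
    closed: Optional[datetime]    # None while the incident is still open
    notifications_sent: int


def kpis(incidents):
    """Compute the three example KPIs named in the text."""
    closed = [i for i in incidents if i.closed is not None]
    return {
        "mean_days_to_remediate": (
            mean((i.closed - i.opened).total_seconds() / 86400 for i in closed) if closed else None
        ),
        "mean_notifications_to_close": (
            mean(i.notifications_sent for i in closed) if closed else None
        ),
        "incidents_per_category": dict(Counter(i.category for i in incidents)),
    }


incidents = [
    Incident("source code", datetime(2007, 9, 1), datetime(2007, 9, 4), 2),
    Incident("personal data", datetime(2007, 9, 1), datetime(2007, 9, 2), 1),
    Incident("personal data", datetime(2007, 9, 3), None, 3),
]
print(kpis(incidents))
```

Open incidents count toward the per-category total but not toward the remediation averages, since they have no close time yet.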

In the near future, Microsoft IT plans to implement a non-compliance amnesty program. Users will be able to use Tablus Content Sentinel to scan their own laptops, desktop computers, and other systems, and then remediate any issues they find. By using subtle social pressure, the company can progress toward its goal of cultural change. Rather than imposing technology unilaterally, the self-scan empowers users across the company to support security objectives. It also encourages people who would otherwise be hard to reach through on-network scanning to manage the sensitive content on their systems in compliance with corporate policy.

Best Practices

Prioritize Content According to Governance, Risk, and Compliance

The first step in a content loss prevention effort is to assess enterprise content: what it is, how much of it the organization has, how it is used, and where it is located. Table 1 provides some basic guidelines for evaluating content.

Table 1. Content Evaluation Guidelines

Inventories | Purpose
--- | ---
Types of content that are or should be classified as sensitive | Begin to understand what content requires protection
Locations where content resides | Outline and quantify the systems that will need to be monitored
Business functions that require access to this content | Understand how the content is currently used to keep business flowing
Individuals, by business function, who require access to this content | Learn which individuals can potentially access and expose sensitive content

To understand what content must be protected and how it should be protected, an organization first needs to clearly understand any industry or government regulations with which it must comply. The organization should start by listing the regulations that pertain to the business, followed by any business governance requirements for protecting its most sensitive content. Each type of content then requires an evaluation of the impact of a potential breach. The goal is to prioritize risks and address the most serious threats first, which an organization best accomplishes through a thorough understanding of risk in the context of business impact and content type.
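One common way to turn risk in the context of business impact and content type into an ordered work list is an impact-times-likelihood score. The Python sketch below swaps in that generic technique for illustration; the paper does not prescribe a scoring scheme, and the content types and 1-to-5 ratings are hypothetical.

```python
from typing import NamedTuple


class ContentRisk(NamedTuple):
    content_type: str
    impact: int      # 1 (minor) to 5 (severe) business impact of a breach
    likelihood: int  # 1 (rare) to 5 (frequent) chance of exposure

    @property
    def score(self) -> int:
        return self.impact * self.likelihood


def prioritize(risks):
    """Highest score first; ties go to higher impact, then alphabetical order."""
    return sorted(risks, key=lambda r: (-r.score, -r.impact, r.content_type))


risks = [
    ContentRisk("marketing drafts", 2, 4),
    ContentRisk("payment card data", 5, 3),
    ContentRisk("source code", 5, 2),
    ContentRisk("employee records", 4, 3),
]
for r in prioritize(risks):
    print(f"{r.score:>2}  {r.content_type}")
```

Breaking ties on impact reflects the idea that, for equal overall scores, the content whose breach would hurt most should be addressed first.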

Build a Project Plan to Establish the Solution as an Operational Service

After an organization has defined an initial set of content protection goals, the next step is to create an overall project plan with clearly delineated benchmarks and steps to reach these goals. The plan should drive the effort beyond a proof of concept or an initial implementation run by a dedicated IT project team, toward a complete operational solution that is fully integrated with day-to-day business operations at all appropriate points throughout the organization. This effort requires mapping content protection policies into guidelines that determine how to handle the many content protection situations that may arise.

Start at the Root of the Problem

After an organization develops an understanding of its sensitive content according to business and policy priorities, and develops a basic understanding of how that content is stored and where it travels on the network, the next logical step is to create an inventory of this content stored across the network. Starting with content discovery enables the organization to understand the magnitude of the sprawl of sensitive content that has organically accumulated over time. This aids greatly in estimating and focusing subsequent efforts.

It is also wise to approach a content inventory activity with a narrow initial scope, or content vector, that scans for a single class of content or a limited number of classes—for example, high-impact personal information or PCI-regulated content. Limiting the class of content for discovery initially enables IT and compliance executives to keep tighter control over discovery and remediation.
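Narrowing the scope to a single content vector also makes it easier to tune detection for precision. For example, a scan focused on PCI-regulated content might pair a pattern for card-number-shaped digit runs with the Luhn checksum that valid payment card numbers satisfy, discarding most random digit strings. This Python sketch is illustrative only; the regular expression is a simplification (13 to 16 digits with optional single spaces or hyphens) and does not check issuer prefixes.

```python
import re

# 13 to 16 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Keep only pattern matches that also pass the checksum."""
    return [m.group() for m in CANDIDATE.finditer(text) if luhn_valid(m.group())]


print(find_card_numbers("Card 4111 1111 1111 1111, ref 4111 1111 1111 1112"))
# ['4111 1111 1111 1111']
```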

Use Cross-Functional Teams

One of the most important aspects of a successful content security strategy is to involve key business team members from across the organization. Different employees handle sensitive content for different purposes and in different ways, and the flow of content across the company varies from one business process to the next. Because it can be difficult to predict where content might ultimately go, inside or outside the network, an organization needs the support of staff from all departments (for example, IT, privacy/compliance, human resources, legal, marketing/communications, and business operations) to act on policies and remediate any incidents that are discovered.

Promote a Culture of Content Protection and Awareness

Technology and policies alone will not protect an organization. The organization needs to continuously evangelize the importance of protecting sensitive content, and provide training on the do's and don'ts of sharing content. Establishing who within the organization has ownership over content is just the first step in promoting an attentive and vigilant culture of content security. Training and ongoing oversight are also key, and are just as important as the technical safeguards and solutions that organizations implement.

Expand Coverage

After an organization completes the process of implementing and rolling out content discovery for the highest-priority segment of the sensitive content, it should expand the program. There are two directions for this expansion:

  • Implementing additional safeguards, such as network and desktop monitoring of HBI segments


  • Expanding content segments covered, such as MBI and LBI data classifications

Benefits

Microsoft IT estimates that the return on investment (ROI) for the HBI project is as high as 600 percent since the project's implementation. The automated solution has significantly reduced the number of operators required to conduct manual search and notification efforts, while performing a far more comprehensive analysis of all digital information assets. In fact, Microsoft IT estimates that manual scanning reached less than 1 percent of all these assets over the course of a year, whereas the initial automated comprehensive scan of huge volumes of shares and sites finished in just 14 days.

Perhaps the greatest benefit from the solution has been the reduction in risk to Microsoft.

Conclusion

Adequate safeguarding of HBI information in large organizations is a critical but daunting undertaking. The sheer volume of information at Microsoft made manual methods of identifying and classifying HBI information a challenging task. To streamline the effort, Microsoft IT developed a comprehensive approach that includes clear articulation and enforcement of IT governance, thorough engagement of business owners to prioritize risks, and service-oriented operational processes.

By using Microsoft technologies and the third-party Tablus Content Sentinel application, Microsoft IT implemented automated discovery scanning and remediation methods for HBI information. These methods can examine and classify enormous volumes of information in a short period of time. This solution has resulted in significant ROI, increased compliance with data-handling standards, and an impressive reduction in the overall risk associated with the loss of sensitive information.

For More Information

For more information about Microsoft products or services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information via the World Wide Web, go to:

http://www.microsoft.com

http://www.microsoft.com/technet/itshowcase

© 2008 Microsoft Corporation. All rights reserved.

This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft, SharePoint, Windows, and Windows NT are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.


Situation

Information security processes that included manual methods of inspection and remediation could not reach all HBI content across the Microsoft network. Microsoft IT needed an automated, service-oriented solution to help reduce the biggest operational risk, the loss of sensitive information.

Solution

A highly scalable content scanning solution based on a grid computing architecture, coupled with automated remediation and workflow components, helps enable users to identify, classify, and protect their sensitive data in various locations across the Microsoft network.

Benefits

  • Nearly all SharePoint sites and file shares are automatically scanned for HBI content on a regular basis, compared with coverage of less than 1 percent through manual methods.
  • Fewer personnel are required to perform identification and remediation of HBI information.
  • Remediation actions or notifications to information owners are generated automatically as content issues are discovered.
  • Process improvements have led to better compliance with data-handling standards, more efficient handling of sensitive information, and a significant reduction of overall risk associated with the loss of sensitive information.

Products & Technologies

  • Tablus Content Sentinel
  • .NET Framework
  • Windows Compute Cluster Server 2003
  • SQL Server 2005
  • Office SharePoint Server 2007