Microsoft IT Uses File Classification Infrastructure to Help Secure Personally Identifiable Information
Technical Case Study
Published: May 2011
Learn how Microsoft Information Technology (IT) used File Classification Infrastructure (FCI) to create a solution to automatically classify, manage, and protect sensitive data, including personally identifiable information and financial information. Using the new FCI-based solution, Microsoft IT can obtain file-level details about content sensitivity while reducing misclassification of personally identifiable information from 30% to 3%. The new solution enables risk mitigation for sensitive data by using automatically applied controls. Ultimately, this solution helped Microsoft IT achieve industry best practices for regulatory compliance.
Microsoft Information Technology (IT) was using a custom-built classification system that required manually tagging sensitive material, and could only operate at the file-share level. Microsoft IT needed an automated classification system that could help secure personally identifiable information (PII) within files.
Microsoft IT used Windows Server 2008 R2 File Classification Infrastructure (FCI) technology as the centerpiece of their new content classification solution.
In today's high-tech world, collecting and storing data are business-critical processes that form an integral component of daily operations. However, the ever-increasing dependency on and use of electronic data also make data management more challenging—especially in light of government regulations for the appropriate use and storage of personally identifiable information (PII) and financial information. Improper storage of PII can also be a significant financial concern, as the cost of storage-related security breaches can be hundreds of dollars per record.
Microsoft Information Technology (IT) had been using an internally built solution to help secure personally identifiable information (PII), financial information, and other types of sensitive data by classifying internal file shares and Microsoft® SharePoint® sites. However, this solution was limited to defining information sensitivity at a file-share level. It also required each user to specify the sensitivity level of his or her file shares manually, which frequently led to mislabeled information.
This custom, internally developed solution also had a high total cost of ownership. It required significant development and maintenance resources to fix identified issues and keep the system up to date, because each upgrade to the storage operating systems required upgrading the code.
Microsoft IT needed a solution that would bring consistency to the file classification process across all teams and automatically scan content at the file level for keywords, terms, and patterns. The solution then had to apply the correct rights management protection based on predefined security policies. Cost of ownership and performance were also important drivers for developing a new solution. Microsoft IT needed a system built from off-the-shelf, standardized Microsoft technology that could scale across terabytes of data. With such a large amount of information, the solution had to scan files efficiently while maintaining a high degree of accuracy when identifying sensitive PII.
As the company's first and best customer, Microsoft IT regularly adopts early releases of Microsoft technologies, tests them in a real-world environment, and provides critical feedback to improve products before they are generally available to the public. Microsoft IT worked closely with the Windows Server team, which was developing Windows Server® 2008 R2 File Classification Infrastructure (FCI) technology, and adopted FCI as the centerpiece of its new content classification, file tagging, and file protection solution.
Why File Classification Infrastructure?
With File Classification Infrastructure (FCI), Microsoft IT could make two significant advances in how they classified sensitive information. First, instead of relying on manual input to set sensitivity, the new system would automate the classification processes using predefined policies. Second, the new system could scan data within files, providing much greater granularity than the older system, which classified only at the file-share level.
"Our ability to implement a taxonomy with the new FCI-based solution enables life-cycle management of sensitive documents. This key feature helps us comply with Microsoft security, compliance, and retention policies."
Olav Opedal, Senior Solution Manager,
Microsoft Information Security and Risk Management
Microsoft IT also saw FCI as a means to improve the manageability of the classification system (including retention management) and to reduce total cost of ownership. As a ready-made file classification solution within Windows Server 2008 R2, FCI let Microsoft IT build a new solution using off-the-shelf technology.
Microsoft IT implemented its new file classification solution by enabling FCI on existing Windows Server 2008 R2 servers, using both content analysis (such as terms and patterns) and location (for example, files on a given share are set to moderate business impact by default) to identify sensitive data.
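One way to picture this combination is that each file receives the more sensitive of two results: the default for its location and the highest level triggered by its content. The Python sketch below illustrates only that precedence rule. The share path, the three-level ordering, and both content patterns are hypothetical assumptions; this is not the FCI rule engine or Microsoft IT's actual configuration.

```python
import re
from enum import IntEnum


class Impact(IntEnum):
    """Business-impact levels, ordered so that max() returns the more sensitive one."""
    LBI = 1  # low business impact
    MBI = 2  # moderate business impact
    HBI = 3  # high business impact


# Hypothetical location rule: files under this share default to moderate impact.
LOCATION_DEFAULTS = {r"\\payroll-fs\finance": Impact.MBI}

# Hypothetical content rules: (pattern, level applied when the pattern matches).
CONTENT_RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), Impact.HBI),      # SSN-like number
    (re.compile(r"\bsalary\b", re.IGNORECASE), Impact.MBI),  # payroll term
]


def classify(path: str, text: str) -> Impact:
    """Return the higher of the location default and any matching content rule."""
    level = Impact.LBI
    for prefix, default in LOCATION_DEFAULTS.items():
        if path.lower().startswith(prefix.lower()):
            level = max(level, default)
    for pattern, rule_level in CONTENT_RULES:
        if pattern.search(text):
            level = max(level, rule_level)
    return level


if __name__ == "__main__":
    print(classify(r"\\payroll-fs\finance\q1.docx", "Quarterly summary").name)  # MBI
    print(classify(r"\\payroll-fs\finance\q1.docx", "SSN 123-45-6789").name)    # HBI
    print(classify(r"\\team-fs\docs\notes.txt", "Lunch menu").name)             # LBI
```

Because the sketch takes the maximum, a location default can only raise a file's level, never lower it: a content match on an otherwise low-impact share still yields HBI.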
As illustrated in Figure 1, the Microsoft IT systems were designed to support employees accessing internal file shares, as well as partners or vendors accessing externally facing SharePoint servers.
Figure 1: Microsoft IT's implementation of FCI, which manages classification, access, encryption, and data retention
Additional details concerning the preceding figure:
A. Employees can upload new financial documents to SharePoint and retrieve archived documents via file share or SharePoint.
B. Authenticated partners or vendors can upload new financial documents and retrieve archived documents via SharePoint.
C. Users upload documents to SharePoint Servers, with a Microsoft Silverlight® application controlling the flow.
D. At the end of the month, documents are transferred to file shares and file editing is closed. The documents are retained for 10 years.
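Item D describes a simple date rule. The Python sketch below shows one way to compute when a document becomes eligible for disposal. It assumes, because the case study does not say, that the 10-year period starts at the month-end close and runs to the last day of the same month ten years later; the function names are hypothetical.

```python
import calendar
from datetime import date

# From item D: documents are retained for 10 years after the month-end close.
RETENTION_YEARS = 10


def month_end(d: date) -> date:
    """Last calendar day of d's month (the assumed close date for uploads)."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return date(d.year, d.month, last_day)


def retention_expiry(uploaded: date) -> date:
    """Assumed rule: keep until the last day of the same month RETENTION_YEARS later.

    Counting to a month end avoids an invalid date when the close falls on
    February 29 and the target year is not a leap year.
    """
    closed = month_end(uploaded)
    return month_end(date(closed.year + RETENTION_YEARS, closed.month, 1))


def is_expired(uploaded: date, today: date) -> bool:
    """True once the retention period has fully elapsed."""
    return today > retention_expiry(uploaded)


if __name__ == "__main__":
    print(retention_expiry(date(2011, 5, 12)))  # 2021-05-31
    print(retention_expiry(date(2012, 2, 10)))  # 2022-02-28
```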
Windows PowerShell Script Development
Microsoft IT created Windows PowerShell™ scripts to configure FCI across multiple servers. The scripts defined the regular expressions used to identify PII and financial data, as well as the actions FCI should take for those document types.
For more information about Windows PowerShell scripts and other automation tools that can be used with FCI, see http://www.microsoft.com/fci.
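Defining rules and actions once in a script and applying them to every server keeps the configuration identical everywhere. The Python sketch below models that idea with plain data structures. It does not call FCI or Windows PowerShell; the rule names, patterns, server names, and action labels (rms_protect, restrict_acl) are hypothetical stand-ins for the RMS and ICACLS.exe actions described in this case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationRule:
    """Sets a classification property value when the pattern matches file content."""
    name: str
    pattern: str          # regular expression searched in file content
    property_value: str   # value written to the classification property


@dataclass(frozen=True)
class ManagementTask:
    """Runs an action on files whose classification property has a given value."""
    name: str
    condition_value: str
    action: str           # hypothetical action label, e.g. "rms_protect"


@dataclass(frozen=True)
class ServerConfig:
    server: str
    rules: tuple
    tasks: tuple


# Hypothetical shared definitions; the real patterns and actions are not published.
RULES = (
    ClassificationRule("ssn", r"\b\d{3}-\d{2}-\d{4}\b", "HBI"),
    ClassificationRule("card", r"\b(?:\d{4}[- ]?){3}\d{4}\b", "HBI"),
)
TASKS = (
    ManagementTask("protect-hbi", "HBI", "rms_protect"),
    ManagementTask("restrict-hbi", "HBI", "restrict_acl"),
)


def build_configs(servers):
    """Give every server the same rule and task definitions."""
    return [ServerConfig(server, RULES, TASKS) for server in servers]


def actions_for(config: ServerConfig, property_value: str):
    """Actions that the server's tasks would run for a file with this value."""
    return [t.action for t in config.tasks if t.condition_value == property_value]


if __name__ == "__main__":
    for cfg in build_configs(["fs01", "fs02"]):
        print(cfg.server, actions_for(cfg, "HBI"))
```

Keeping the definitions immutable (frozen dataclasses and tuples) means one server's configuration cannot drift by being edited in place.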
Microsoft IT deployed their new FCI-based classification solution in the following manner:
- Microsoft IT started by designing an appropriate ontology and taxonomy based on Microsoft policy and data handling standards (see Figure 2).
For more information about developing an ontology and taxonomy for FCI, see http://go.microsoft.com/fwlink/?LinkId=217117.
Figure 2: Microsoft IT's FCI ontology
- When the taxonomy was ready, Microsoft IT deployed FCI to a single Windows Server 2008 R2 server to test:
  - Proper execution of the Windows PowerShell scripts and the configuration they set.
  - FCI's ability to scan, tag, and protect data automatically by using Microsoft Active Directory® Rights Management Services (RMS) and by managing access control lists (ACLs) with the ICACLS.exe tool. Both RMS protection and ICACLS.exe were invoked from a File Management Task (part of FCI functionality).
  - The efficiency of the regular expressions, to determine their performance impact on the servers and to optimize classification accuracy. By creating a set of test documents with known levels of content sensitivity, Microsoft IT gauged each regular expression's accuracy and optimized its use.
- After basic tagging, regular expression use, and application of security controls were confirmed, Microsoft IT expanded from the single test server to a small-scale deployment on low-impact systems in the Windows® File Storage product team, using the same Windows PowerShell scripts.
- Once the smaller-scale deployment in the Windows File Storage product team was running smoothly, Microsoft IT implemented a large-scale system that supports the company's Global Payroll/Finance servers. Systems in this department were an especially good fit for the FCI solution because of their stringent regulatory compliance requirements.
  - The first task was to identify, tag, and consolidate the data on the File Server Utility servers that was to be managed by FCI on the Global Payroll servers.
  - The Active Directory Rights Management Services Bulk Protection Tool was downloaded and installed on the file share servers.
Note: For more information about the Active Directory Rights Management Services Bulk Protection Tool, see http://technet.microsoft.com/library/ff625714(WS.10).aspx.
You can also download a copy of the tool from http://www.microsoft.com/downloads/en/details.aspx?FamilyID=f9fbe58f-c175-41d0-afdc-6f160ab809cd.
  - Windows PowerShell scripts were run to enable FCI and configure its settings.
  - After the data migration, Microsoft IT sent users of the payroll data a communication that described the new FCI-based system and explained how the level of protection would be displayed in RMS-encrypted documents.
  - The Active Directory Rights Management Services Bulk Protection Tool was invoked by an FCI file management task.
  - Business teams could stop performing the old manual file classification process, because RMS protection was applied to individual files (see Figure 3).
Figure 3: An example RMS-encrypted document that is tagged as Microsoft confidential and proprietary
Microsoft IT's deployment of the new FCI solution for the Windows Server team and Global Payroll:
- Consolidated all data under one top-level file share containing more than 20,000 sub-shares across twenty countries.
- Involved an estimated 82,000 documents (approximately 50 gigabytes of data), including approximately 40,000 encrypted documents.
- Repurposed existing security policies to control access to all documents.
Because it scans down to the file level, the new FCI-based system tagged each file with its sensitivity and applied the appropriate file retention and protection settings. The FCI classification solution showed a significant improvement in classification accuracy. As shown in Figure 4, the previous solution's dependence on users manually entering sensitivity tags resulted in a 30% average misclassification rate for the Windows Server team, whereas the new FCI-based classification system misclassified only 3% of the PII and financial content. Microsoft IT further mitigated the remaining 3% misclassification by collaborating with the business data owner to identify which sub-shares contain sensitive information.
Figure 4: The FCI solution reduced the frequency of Windows Server team misclassifications to approximately 3%
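As a quick check on the size of this change, the drop from a 30% to a 3% average misclassification rate is a tenfold (one order of magnitude) reduction, which corresponds to a 90% relative decrease:

```latex
\frac{30\%}{3\%} = 10,
\qquad
\frac{30\% - 3\%}{30\%} = \frac{27\%}{30\%} = 0.9 = 90\%
```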
In the course of working with FCI to design, implement, and operate the new file classification solution, Microsoft IT followed these best practices:
- Build an inclusive ontology and taxonomy that supports your data. Consider auditing requirements and encryption scenarios when defining your file classification taxonomy.
The richness of the metadata that FCI applies depends on a common taxonomy: one that is based on approved standards (for example, NIST) and properly applied across the entire organization. Also keep in mind that the more granular the taxonomy, the more unwieldy it can be to implement.
Note: At Microsoft, end users are only exposed to three levels (high, moderate, and low business impact); the detailed underpinnings of the taxonomic hierarchy are not made visible.
- Ensure rollback mechanisms are in place. During your deployment, make sure that all involved systems have a rollback mechanism in place to ensure a painless resetting of systems to known states, if necessary.
- Find a balance between regular expression efficiency and accuracy. Regular expressions can have a significant impact on overall system performance, so test and optimize them to keep that impact low while achieving an acceptable rate of false positives and false negatives. This includes creating a test body of content with known types and levels of sensitivity that you can use to evaluate each regular expression's accuracy.
- Promote internal collaboration between business owners, Security, Compliance, Legal, and those who provide the infrastructure. Due to the sensitive nature of the information and the number of different teams in your organization that may need to be involved, it is important to ensure that all stakeholders can provide input at an early stage, and work together to design a system that fulfills all key criteria.
- Consider your FCI deployment in light of existing permissions and security groups. Engage with the business early on to gain insight into how their permissions are configured and what their pain points are. Leverage the FCI configuration wherever possible to streamline your system implementation and maintenance.
- Base your implementation and operational processes on industry standards, and then document them. Use a standard, such as the Microsoft Operations Framework (MOF), and then carefully document your file classification processes to ensure smooth operations between people within the same team, as well as across teams. You should plan to review your policies at least on an annual basis.
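The note under the first practice describes a many-to-one mapping: a detailed taxonomy underneath, and only three levels shown to end users. The Python sketch below illustrates that idea. The category names are hypothetical and are not taken from Microsoft IT's taxonomy (Figure 2); treating a file with several categories as its most sensitive one, and rejecting unmapped categories, are assumptions of this sketch.

```python
# Hypothetical detailed taxonomy; each leaf category maps to one of three visible levels.
VISIBLE_LEVELS = ("Low Business Impact", "Moderate Business Impact", "High Business Impact")

DETAILED_TO_VISIBLE = {
    "pii/government-id": "High Business Impact",
    "pii/contact-info": "Moderate Business Impact",
    "finance/payroll": "High Business Impact",
    "finance/forecast": "Moderate Business Impact",
    "general/public": "Low Business Impact",
}


def visible_label(categories) -> str:
    """Collapse a file's detailed categories to the most sensitive visible level.

    Files with no categories fall back to the lowest level; an unmapped
    category raises ValueError so gaps in the taxonomy surface early.
    """
    rank = VISIBLE_LEVELS.index
    label = VISIBLE_LEVELS[0]
    for category in categories:
        if category not in DETAILED_TO_VISIBLE:
            raise ValueError(f"unmapped category: {category}")
        candidate = DETAILED_TO_VISIBLE[category]
        if rank(candidate) > rank(label):
            label = candidate
    return label


if __name__ == "__main__":
    print(visible_label(["pii/contact-info", "finance/payroll"]))  # High Business Impact
```

Only the mapping table needs to change when the detailed taxonomy grows; what end users see stays fixed at three levels.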
By implementing FCI, Microsoft IT derived a number of benefits:
- Mitigated risk. Microsoft IT can help safeguard its most important information by applying controls based on data sensitivity. Using the FCI-based solution, Microsoft IT achieved industry best practices for regulatory compliance.
- Reduced audit time. Previously, an audit required slow manual validation of information to confirm whether the security controls were appropriate. With FCI, Microsoft IT can provide server reports that include a description of the classification processes to help confirm compliance with Microsoft data handling standards.
- Reduced total cost of ownership. Building a solution based on FCI enabled Microsoft IT to focus on their operations rather than maintaining custom code. It also reduced their operations overhead in information security.
- Automated, continuous process. FCI's scalable solution removed the need for manual interaction from the business teams. The consistency of the new process reduces the risk, present under the previous system, that company information handling and classification standards would be misinterpreted.
- More granular view of data. With FCI, Microsoft IT can obtain file-level details about content sensitivity, offering a significantly improved level of detail as compared to the previous solution's limitation of classifying at the file-share level.
- Reuse of the current permission structure. FCI's ability to use existing policies to validate access to encrypted documents avoided the creation and management of up to 140 new security groups.
- Faster, more accurate tagging. The new system reduced misclassification of PII from 30% to 3%.
- Improved security. Because FCI attaches metadata to the file data, it supports RMS file encryption. This protects the content while maintaining the file classification details.
Using the File Classification Infrastructure (FCI) in Windows Server 2008 R2, Microsoft IT created a new automated content classification, file tagging, and file protection solution for the Windows Server team and for Microsoft Global Payroll. This new solution consolidated all data under one top-level file share containing more than 20,000 sub-shares across twenty countries, and involved an estimated 82,000 documents totaling approximately 50 gigabytes of data.
Before implementing the FCI-based classification solution, Microsoft IT had been using an internally developed system that could only define information sensitivity at a file-share level. It relied on manual input from the business teams to specify the sensitivity level of their file shares, which was time consuming and increased the risk of mislabeled information.
After implementing the new solution, Microsoft IT was able to obtain file-level details about content sensitivity while also gaining an order-of-magnitude improvement in classification accuracy. Misclassifications of File Server Utility PII data dropped from 30% to 3%, and the remaining misclassifications are mitigated by collaborating with the business data owner to identify which sub-shares contain sensitive information. Files that are tagged as containing high business impact (HBI) information are encrypted automatically via RMS to enhance their security further, both during transmission and in storage.
Total cost of ownership has also significantly improved in the FCI-based solution, as Microsoft IT no longer has to maintain custom code and has reduced their operations overhead in information security. Ultimately, FCI allows Microsoft IT to focus on business needs rather than building infrastructure.
Overall, the FCI-based classification system enables Microsoft IT to more accurately identify HBI content and manage it more effectively while reducing costs and mitigating risks. Microsoft IT is currently expanding its FCI implementation into the file servers of the company's Legal department. In addition, Microsoft IT is continuing to work closely with the File Server team as they develop further enhancements to FCI, such as multi-server management and reporting capabilities.
For More Information
For more information about Microsoft products or services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Order Centre at (800) 933-4750. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information via the World Wide Web, go to:
© 2011 Microsoft Corporation. All rights reserved.
This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft, Active Directory, SharePoint, Silverlight, Windows, Windows PowerShell, and Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. All other trademarks are property of their respective owners.