Security Patch Management Evolution for Data-Center Servers at Microsoft

Technical Case Study

The following content may no longer reflect Microsoft’s current position or infrastructure. This content should be viewed as reference documentation only, to inform IT business decisions within your own company or organization.

Assessing and maintaining the integrity of software in a networked environment through a well-defined patch management program is a key first step toward successful information security. By focusing on policies, technologies, and processes, Microsoft Information Technology (MSIT) was able to reduce risk, improve performance, and improve availability of software resources at Microsoft.


Situation

Without a standardized tool or process, Microsoft IT struggled to manage data center server patching. This left Microsoft's server environment unacceptably vulnerable.

Solution

Microsoft IT chose a multi-pronged approach to address this situation. Focusing on policy changes, technology solutions, and well-defined processes enabled MSIT to achieve its goals.

Security patch management is a process that gives organizations control over the deployment and maintenance of interim software patches into their production environments. It helps organizations maintain the security and stability of the production environment.

At Microsoft, the configuration management program of today evolved from a program that initially used Microsoft Systems Management Server (SMS) to address only security patch management. When System Center Configuration Manager was released, MSIT began to use the product as a discovery mechanism for asset inventory information and for security patch management.

From the perspective of managing security patches, the core activities have changed little from earlier efforts. The number of servers for which MSIT manages the configuration continues to grow, from 24,000 servers in 2010 to 34,000 servers in 2013. The integration and enhancement of the features available in Configuration Manager have helped MSIT keep up with the ever-increasing number of threats and the volume of security patches now regularly released.

Benefits

  • Patch compliance increased from 70% to 96%
  • Patch variability decreased from 40% to 5%
  • The patch cycle improved from 30 to 19 days

Situation

In 2010, MSIT continued, with renewed rigor, its effort to improve the security of data center servers.

Server security is as important as network security because servers often hold a great deal of an organization's vital information. If a server is compromised, all of its contents may become available to steal or manipulate at will. Applying security patches promptly greatly reduces the risk of a security breach and the problems that come with it, such as data theft, data loss, or even legal penalties. Patches were being applied to Microsoft servers an average of 30 days after patch release. That delay left servers exposed throughout the vulnerability window, including to zero-day attacks, which occur in the time between when a vulnerability is first exploited and when developers publish a patch to counter the threat.

Contributing factors included the many instances of Microsoft System Center Configuration Manager spread across IT, which added cost and complexity to operational management of the environment. In addition, patch compliance was running at 60% with variability of 20-40%. As patches lagged, vulnerabilities compounded month over month.

Long patching cycles, low patch compliance, and high variability left Microsoft vulnerable to well-publicized attacks as well as emergency situations. These conditions led to emergency scrambles and out-of-band patching requirements, which increased costs as large teams of people rallied to address each issue. An additional negative outcome was outages for users, because patching took line-of-business applications and operations offline.

Solution

MSIT's solution approach included policy changes, use of new technologies and process changes. This multi-pronged approach supported the increasing need Microsoft had to ensure a secure environment.

Policies

One of the foundational policies required to improve patching at Microsoft was the implementation of compliance deadlines. The organization was serious about limiting risk and meeting its risk obligation to the board of directors, which required senior leadership to uphold compliance deadlines. MSIT adopted the policy that a patch not installed by a server owner before a compliance deadline would be installed for them. Executive sponsorship was key to getting server owners to participate and adhere to deadlines.

Technologies

In addition to policy implementation, MSIT adopted technology to support the goals. In 2010, the focus was on Configuration Manager server agent health. Instead of continually reviewing issues server by server, MSIT started grouping issues by symptom and performing root cause analysis on the largest buckets of issues. Once the root cause was determined and the fix implemented, a new baseline was measured and the process repeated until that bucket of symptoms was at zero. This focus was responsible for the jump in patch compliance from 70% to 90% in 2011.
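The bucket-by-symptom approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the server names, the symptom labels, and the helper functions `largest_buckets` and `resolve_bucket` are hypothetical and do not come from MSIT's tooling.

```python
from collections import Counter

# Hypothetical agent-health records: (server name, symptom label).
# Both the names and the symptom labels are invented for illustration.
issues = [
    ("srv-001", "wmi_repository_corrupt"),
    ("srv-002", "client_not_reporting"),
    ("srv-003", "wmi_repository_corrupt"),
    ("srv-004", "low_disk_space"),
    ("srv-005", "wmi_repository_corrupt"),
    ("srv-006", "client_not_reporting"),
]


def largest_buckets(records, top_n=3):
    """Group issues by symptom and return the biggest buckets first."""
    counts = Counter(symptom for _server, symptom in records)
    return counts.most_common(top_n)


def resolve_bucket(records, symptom):
    """Simulate deploying a root-cause fix: drop every issue with that symptom."""
    return [record for record in records if record[1] != symptom]


# Work the largest bucket first, then re-baseline and repeat.
remaining = issues
while remaining:
    symptom, count = largest_buckets(remaining, top_n=1)[0]
    print(f"Root-causing '{symptom}' ({count} servers)")
    remaining = resolve_bucket(remaining, symptom)
```

Working the largest bucket first means each root-cause fix clears the greatest number of servers, which is what makes this approach faster than reviewing issues server by server.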

In 2013 MSIT expanded their automation tool set to include System Center Orchestrator, a component of the System Center suite.

The first scenario targeted was patching servers in a clustered environment. In this complex scenario, the goal is to patch and reboot each server participating in a cluster in sequence, ensuring the end-user experience is not compromised. Traditionally, an operator running scripts and validation steps tailored to an application would perform these steps until each server in the cluster was compliant. Using Orchestrator, these scripts and business logic were transformed into a workflow and programmatically executed across the entire cluster. The result was improved predictability, achieved by reducing error-prone manual activities.
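The sequenced cluster workflow can be pictured with a short Python sketch. It is not an Orchestrator runbook and does not use any System Center API. The function `patch_cluster` and its four callbacks (`drain`, `patch_and_reboot`, `is_healthy`, `resume`) are hypothetical placeholders for the application-specific scripts and validation steps the case study mentions.

```python
from typing import Callable, Iterable, List


def patch_cluster(
    nodes: Iterable[str],
    drain: Callable[[str], None],
    patch_and_reboot: Callable[[str], bool],
    is_healthy: Callable[[str], bool],
    resume: Callable[[str], None],
) -> List[str]:
    """Patch cluster nodes one at a time and stop at the first failure.

    Returns the nodes that were patched and returned to service.
    """
    completed = []
    for node in nodes:
        drain(node)  # move workload off the node before touching it
        succeeded = patch_and_reboot(node) and is_healthy(node)
        if not succeeded:
            # Leave the failed node drained and halt, so the remaining
            # nodes keep serving users while an operator investigates.
            break
        resume(node)  # the node rejoins before the next one leaves
        completed.append(node)
    return completed


# Usage with stand-in callbacks that only record what happened.
log = []
patched = patch_cluster(
    ["node-a", "node-b", "node-c"],
    drain=lambda n: log.append(("drain", n)),
    patch_and_reboot=lambda n: True,
    is_healthy=lambda n: True,
    resume=lambda n: log.append(("resume", n)),
)
print(patched)
```

Halting on the first failure is one possible design choice. It keeps at most one node out of service at a time, which matches the stated goal of not compromising the end-user experience.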

Orchestrator is also used as the "suspenders" to the "belt" provided by System Center Configuration Manager. In situations where Configuration Manager logs a failed attempt to patch a server, a signal is passed to Orchestrator to initiate a standard patch workflow. The workflow repeats until the server is successfully patched or the service window expires. In this scenario, transient infrastructure issues and SCCM health issues are mitigated.
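The retry behavior described above, repeating until the server is patched or the service window expires, might look like the following Python sketch. The function `retry_until_patched` and its parameters are assumptions made for illustration. The real workflow runs in Orchestrator, and its interface is not shown in this case study.

```python
import time
from typing import Callable


def retry_until_patched(
    run_patch_workflow: Callable[[], bool],
    window_seconds: float,
    pause_seconds: float = 0.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Re-run the patch workflow until it succeeds or the window closes.

    Returns True if the server was patched inside the service window.
    """
    deadline = clock() + window_seconds
    while clock() < deadline:
        if run_patch_workflow():
            return True
        sleep(pause_seconds)  # give transient issues time to clear
    return False


# Usage: a stand-in workflow that fails twice, then succeeds.
attempts = []


def flaky_workflow() -> bool:
    attempts.append(1)
    return len(attempts) >= 3


patched_in_window = retry_until_patched(flaky_workflow, window_seconds=60)
print(patched_in_window, len(attempts))
```

Bounding the loop by the service window rather than by a fixed attempt count keeps retries from spilling into business hours, at the cost of leaving the server unpatched until the next window if the underlying problem is not transient.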

With the addition of System Center Orchestrator, MSIT has improved patch compliance from 90% to 96%, and done so with a smaller labor footprint.

Processes

Along with policy and technology efforts, there was a significant focus on processes. This included re-engineering current processes as well as implementing new ones. One of the initial wins in 2010 was consolidating System Center Configuration Manager server instances under one operational group in MSIT. Approximately 150 instances were consolidated into a handful of centrally managed servers. This decreased operational and maintenance costs because the footprint to manage became much smaller.

Also in 2010, MSIT implemented a new role, the Service Transition Manager, to be the interface between IT operations and the needs of internal IT groups. This provided an opportunity to onboard internal clients to more automated processes and tools, further decreasing variability and reducing the need for manual patching across the company. The priority was on driving adoption of the automated patching service by internal MSIT groups. To increase adoption, Service Transition Managers collected requirements for further features of the service.

In 2012, MSIT instituted the Patch Cycle Triage process. This included weekly instead of monthly reviews of agent health issues and the publishing of metrics and reports to all patching stakeholders. This process change increased the visibility of the patching efforts, and clear accountability resulted in more complete and rapid resolution of issues.

The following table lists example metrics that MSIT gathers for review and to ensure visibility into the overall performance of the area.

Table 1. Patch Management Metrics

Metric | Description
Number of patches released | Number of patches released per month; provides a baseline for month-over-month comparison.
Overall compliance per patch cycle | Overall compliance metric for all patched servers in the environment against the successful deployment of all updates during a patch cycle.
Patch success ratio (per patch) | Can be used to determine whether a single patch failure negatively impacted overall compliance metrics.
Patch success ratio (per server) | Can be used to determine whether a specific type of server or configuration is the common factor in patch success or failure.
Number of support incidents (per patch) | Number of support engagements that are initiated during a patch deployment, per patch.
Agent health: 98% healthy (daily measurement) | Number of systems with a CM agent installed that have successfully returned inventory data and patch results within the configured refresh schedule.
Time from smoke test success to 60% saturation deployment | Establishes an ongoing baseline comparison that helps validate each milestone of the patch process in meeting overall compliance goals for each patch cycle.
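Several of the metrics in Table 1 are simple ratios over per-server deployment results. The Python sketch below shows one way they could be computed. The data layout, the server names, and the patch identifiers are hypothetical. The reading of "overall compliance" as the share of servers on which every update succeeded is an assumption, because the case study does not give an exact formula.

```python
# Hypothetical deployment results: results[server][patch] is True on success.
results = {
    "srv-001": {"KB-A": True, "KB-B": True},
    "srv-002": {"KB-A": True, "KB-B": False},
    "srv-003": {"KB-A": True, "KB-B": True},
    "srv-004": {"KB-A": False, "KB-B": False},
}


def overall_compliance(results):
    """Share of servers on which every update in the cycle succeeded (assumed definition)."""
    compliant = sum(all(outcomes.values()) for outcomes in results.values())
    return compliant / len(results)


def success_ratio_per_patch(results):
    """For each patch, the share of targeted servers where it installed."""
    patches = sorted({p for outcomes in results.values() for p in outcomes})
    ratios = {}
    for patch in patches:
        outcomes = [o[patch] for o in results.values() if patch in o]
        ratios[patch] = sum(outcomes) / len(outcomes)
    return ratios


def success_ratio_per_server(results):
    """For each server, the share of its patches that installed."""
    return {
        server: sum(outcomes.values()) / len(outcomes)
        for server, outcomes in results.items()
    }


print(overall_compliance(results))
print(success_ratio_per_patch(results))
print(success_ratio_per_server(results))
```

Comparing the per-patch and per-server ratios helps separate a bad patch (one patch failing on many servers) from a bad server or configuration (many patches failing on one server).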

MSIT has formalized the security patch process. Patches are released on the second Tuesday of every month. MSIT has adopted a 19-day cycle to complete patch and software updates. The 19-day cycle, developed in cooperation with executive leadership, operations, server and application owners, and Information Security, balances the desire to reduce risk with the business's need for time to prepare and orchestrate updates across test and production servers. The process drives the activities of the teams that are accountable for security patching. This 19-day cycle is a significant improvement over the 30-day cycle followed in 2010.
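The cycle dates follow directly from the calendar. The Python sketch below computes the second Tuesday of a month and the date 19 days later. Counting the release day as day zero is an assumption, because the case study does not say how the 19 days are counted. The function names are illustrative only.

```python
import calendar
from datetime import date, timedelta


def second_tuesday(year: int, month: int) -> date:
    """Return the second Tuesday of the given month."""
    first_weekday = date(year, month, 1).weekday()  # Monday is 0
    days_to_first_tuesday = (calendar.TUESDAY - first_weekday) % 7
    return date(year, month, 1 + days_to_first_tuesday + 7)


def cycle_end(year: int, month: int, cycle_days: int = 19) -> date:
    """Return the assumed end of the patch cycle for that month's release."""
    return second_tuesday(year, month) + timedelta(days=cycle_days)


print(second_tuesday(2013, 10))  # 2013-10-08
print(cycle_end(2013, 10))       # 2013-10-27
```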

To provide context for this process, the following diagram outlines the architecture that Microsoft uses for server configuration management.

Architecture for server configuration management at Microsoft

Conclusion

The biggest change since Microsoft first employed a patch management process has been the cadence and consistency with which patches are applied. The established patch management process gives server and application owners predictability, which enables them to meet compliance expectations. Patching compliance increased from 70% in 2010 to 96% in 2013, and patch variability decreased from 20-40% in 2010 to 3-5% in 2013.

Improved processes and the use of System Center Configuration Manager and System Center Orchestrator have reduced the patch cycle from 30 to 19 days, despite a steady increase in the number of released patches, the inclusion of non-security software updates and software distributions, and growth in the number of servers in the environment. Successfully deploying System Center Configuration Manager and the Orchestrator-based solutions has automated patching and significantly reduced manual patching efforts.

The security patch management service was designed to proactively narrow risk by shortening the amount of time that a security or configuration vulnerability can affect servers on the network. This has been achieved through the creation of a predictable global process, centralized reporting and administration, and policy support to ensure compliance.

Resources

Server Configuration Management at Microsoft

Microsoft IT was able to improve performance and server availability and reduce risks by shortening the cycle time to deliver security and non-security updates. Desired configuration management has enabled IT administrators to identify configuration drift across platforms, services, and line-of-business applications.

Technical White Paper

How Microsoft IT Implements Server Patch Management

Minimizing the threat of vulnerabilities requires organizations to have properly configured systems, to use the latest software, and to install the recommended software updates. Assessing and maintaining the integrity of software in a networked environment through a well-defined patch management program is a key first step toward successful information security. Microsoft IT uses the System Center suite as the primary solution in its server patch management process.

For More Information

For more information about Microsoft products or services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Order Centre at (800) 933-4750. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information via the World Wide Web, go to:

https://www.microsoft.com

https://www.microsoft.com/microsoft-IT

© 2013 Microsoft Corporation. All rights reserved. Microsoft and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners. This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.