Patch Management on Business-Critical Servers
Published: January 17, 2012
Authors: Dan Griffin, Microsoft MVP - Enterprise Security and Tom Jones, Software Architect, JW Secure
Best Practice Rules for Patching
Software system security has come to depend on customer information technology (IT) organizations closely monitoring patches for vulnerabilities, and on the ability of those organizations to test and deploy the patches before they can be exploited. Many IT organizations are measured, formally or informally, on the uptime of the systems they maintain; yet many software patches do not take effect until the affected computer has been rebooted. Rebooting a business-critical computer decreases the system's uptime and introduces the potential for longer service outages if incompatibilities are introduced. This article provides three rules as actionable advice about how to manage patches to reduce downtime while still maintaining the security of software services through the proactive reduction of dependencies and the use of workaround solutions.
Before any patch can be evaluated, you must first know that it exists and that it is relevant to your environment. For organizations with many software packages installed across many computers, that alone can be a problem, so an inventory of installed software and the computer systems where it is found is critical. Only with that inventory can the impact of a patch on the organization be understood. Patches can cause two types of disruption. First, applying a patch can force a server to reboot. Second, the patched software might behave differently, causing application compatibility problems. Because neither condition can be known in advance, test each patch on a test server configured exactly like a production server.
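The software-to-server inventory described above can be as simple as a lookup table. The following sketch shows the idea; the server names and package names are hypothetical examples, not output from any real inventory tool.

```python
# Sketch: map installed software to the servers that run it, so the scope
# of a new patch can be assessed quickly. All inventory data below is
# hypothetical example data.

# server -> installed packages (hypothetical)
inventory = {
    "web-01": ["IIS 7.5", "OpenSSL 1.0.0"],
    "web-02": ["IIS 7.5"],
    "db-01":  ["SQL Server 2008 R2", "OpenSSL 1.0.0"],
}

def servers_affected(inventory, package):
    """Return the servers on which the patched package is installed."""
    return sorted(s for s, pkgs in inventory.items() if package in pkgs)

print(servers_affected(inventory, "OpenSSL 1.0.0"))  # ['db-01', 'web-01']
```

With such a table in place, the arrival of a security bulletin for a given package immediately yields the list of servers that need testing and scheduling.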
Rule 1: Check patches.
You can use the following process to analyze a patch that is recommended by a security bulletin:
With the above process in mind, try the patch on a test server. If the patch installs without a reboot, chances are good that it will install on running servers without a reboot as well. It is still a best practice to take one production server out of rotation and test the patch there to be sure it is applied properly. Once you have verified the patch, you can apply it to all servers.
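The staged verification just described can be expressed as a simple control flow: test server first, then one production server as a canary, then the rest. This is a minimal sketch; `apply_patch` is a hypothetical callback that returns whether the installation succeeded.

```python
# Sketch of the staged rollout described above. apply_patch(server) is a
# hypothetical function returning True on a successful, verified install.

def staged_rollout(apply_patch, test_server, production_servers):
    if not apply_patch(test_server):
        return "abort: failed on test server"
    # Take one production server out of rotation as a canary.
    canary, rest = production_servers[0], production_servers[1:]
    if not apply_patch(canary):
        return "abort: failed on canary server"
    for server in rest:
        apply_patch(server)
    return "deployed"
```

The key design point is that no patch reaches the full production fleet until it has survived both a test machine and one real, temporarily out-of-rotation production machine.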
Practices to Avoid Downtime
You can avoid downtime by limiting the code that actually runs on business-critical machines during normal operation. No production technology has yet been able to reliably patch executable code that is loaded into computer memory and running. That implies that any patch to running code will require the code to be stopped and restarted. In the case of the operating system, that means a reboot.
To understand the impact of patching on server downtime, the authors analyzed all Microsoft Security Bulletins for Windows Server 2008 and Windows Server 2008 R2. The bulletins were separated into those that did not require rebooting the operating system to complete the patch, and those that might require reboots specifically for server installations. It was apparent that only a small number could be installed with no fear of a reboot. For the remainder of the bulletins, there are methods that can limit the number of required reboots.
It's a useful best practice to think of reboot reduction using the 10-80-10 rule. The data from Windows Server 2008 showed that patches not requiring a reboot are about 10 percent of the total. Evidence shows that another 10 percent of patches apply to the Windows Server kernel and thus require a reboot unless there is a workaround. The remaining 80 percent will only require reboots if the specific executable code being patched is running when the patch is applied. Paying attention to this last class of patches can help increase the time between reboots and even improve the overall security of the server.
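The 10-80-10 rule can be turned into a rough expected-reboot estimate. In this sketch, `p_running` is an assumed probability that the patched executable is actually running on a given server; the figure itself is illustrative, not from the bulletin data.

```python
# Worked example of the 10-80-10 rule: ~10% of patches never need a
# reboot, ~10% always do (kernel patches), and ~80% need one only if the
# patched code is loaded and running when the patch is applied.

def expected_reboot_fraction(p_running):
    """Fraction of patches expected to force a reboot, given the assumed
    probability that the patched executable is running."""
    return 0.10 * 0.0 + 0.10 * 1.0 + 0.80 * p_running

# If only a quarter of patched components are typically running on a
# minimized server, roughly 30% of patches force a reboot instead of 90%.
print(round(expected_reboot_fraction(0.25), 2))  # 0.3
print(round(expected_reboot_fraction(1.0), 2))   # 0.9
```

This is why Rule 2 below matters: every service that is not running moves its patches out of the reboot-required column.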
Rule 2: Wherever possible, ensure that only essential code is running during normal operation.
When building the server operating system image, do not include any role that is not absolutely essential. If possible, configure the system to avoid starting services automatically unless their use during normal operation is certain. As a general rule, the less code running, the smaller the attack surface exposed and the less likely that a patched executable will force a reboot.
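One way to act on this rule is to periodically compare the services configured to start automatically against an approved-essential list. The sketch below uses hypothetical service names and a hypothetical essential list; a real review would draw on the actual service configuration of the server role.

```python
# Sketch: flag auto-start services that are not on an approved-essential
# list, as candidates to disable. Service names and the essential list
# are hypothetical examples.

ESSENTIAL = {"Dhcp", "Dnscache", "EventLog", "W3SVC"}

services = [
    ("W3SVC", "auto"),     # web server: essential for this role
    ("Spooler", "auto"),   # print spooler: rarely needed on a web server
    ("Fax", "manual"),
    ("Telnet", "auto"),
]

def review_candidates(services, essential):
    """Auto-start services not on the essential list: review and disable."""
    return [name for name, start in services
            if start == "auto" and name not in essential]

print(review_candidates(services, ESSENTIAL))  # ['Spooler', 'Telnet']
```

Every service removed from the auto-start set shrinks both the attack surface and the set of patches that can force a reboot.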
How to Extend Operating Time Without Reboots
Rule 3: Look for a workaround.
If the patch requires a reboot, review the patch documentation for a workaround that can be safely used, for the near term, in lieu of the patch. For longer-term maintainability, we recommend including the deferred patch in the next patch cycle for which you determine that a reboot is required. Alternatively, you can wait to apply the patch until it can be installed together with a new operating system image (e.g., with a new service pack).
This brings us to the closing theme of this paper: Dial back the reboots by a) reducing the number of running code dependencies, and b) using workaround solutions instead of the patch. As stated above, this two-pronged strategy applies to a high percentage of released patches.
There are two additional specific steps you can take to reduce the number of running code dependencies on a server:
As discussed above, there are steps that any IT professional can take to maintain the security and availability of a business-critical infrastructure. One good metric of success is the number of days between reboots. The rules, which are summarized below, can substantially increase the time between reboots, for example from one reboot per month to one every other month.
Helpful information on patching policies can be obtained from these sites.
About the Authors
Dan Griffin is the founder of JW Secure, Inc., a software security consultancy based in Seattle. He has published several articles on Windows security software development and is a frequent conference speaker and security blogger. You can also follow him on Twitter.
Tom Jones is a software architect and author specializing in security, reliability and usability for networked solutions for financial and other critical cloud-based enterprises. His innovations in security span a full range from mandatory integrity to encrypting modems.