Chapter 2: Survey of Security Risk Management Practices

08/25/2008

Published: October 15, 2004 | Updated: March 15, 2006

This chapter starts with a review of the strengths and weaknesses of the proactive and reactive approaches to security risk management. The chapter then assesses and compares qualitative security risk management and quantitative security risk management, the two traditional methods. The Microsoft security risk management process is presented as an alternative method, one that provides a balance between these methodologies, resulting in a process that has proven to be extremely effective within Microsoft.

Note: It is important to lay a foundation for the Microsoft security risk management process by reviewing the different ways that organizations have approached security risk management in the past. Readers who are already well versed in security risk management may want to skim through the chapter quickly; others who are relatively new to security or risk management are encouraged to read it thoroughly.

Comparing Approaches to Risk Management

Many organizations are introduced to security risk management by the necessity of responding to a relatively small security incident. A staff member's computer becomes infected with a virus, for example, and an office-manager-turned-in-house-PC-expert must figure out how to eradicate the virus without destroying the computer or the data that it held. Whatever the initial incident, as more and more issues relating to security arise and begin to impact the business, many organizations get frustrated with responding to one crisis after another. They want an alternative to this reactive approach, one that seeks to reduce the probability that security incidents will occur in the first place. Organizations that effectively manage risk evolve toward a more proactive approach, but as you will learn in this chapter, it is only part of the solution.

The Reactive Approach

Today, many information technology (IT) professionals feel tremendous pressure to complete their tasks quickly with as little inconvenience to users as possible. When a security event occurs, many IT professionals feel like the only things they have time to do are to contain the situation, figure out what happened, and fix the affected systems as quickly as possible. Some may try to identify the root cause, but even that might seem like a luxury for those under extreme resource constraints. While a reactive approach can be an effective tactical response to security risks that have been exploited and turned into security incidents, imposing a small degree of rigor to the reactive approach can help organizations of all types to better use their resources.

Recent security incidents may help an organization to predict and prepare for future problems. This means that an organization that takes time to respond to security incidents in a calm and rational manner while determining the underlying reasons that allowed the incident to transpire will be better able to both protect itself from similar problems in the future and respond more quickly to other issues that may arise.

A deep examination into incident response is beyond the scope of this guide, but following six steps when you respond to security incidents can help you manage them quickly and efficiently:

1. Protect human life and people's safety. This should always be your first priority. For example, if affected computers include life support systems, shutting them off may not be an option; perhaps you could logically isolate the systems on the network by reconfiguring routers and switches without disrupting their ability to help patients.
2. Contain the damage. Containing the harm that the attack caused helps to limit additional damage. Protect important data, software, and hardware quickly. Minimizing disruption of computing resources is an important consideration, but keeping systems up during an attack may result in greater and more widespread problems in the long run. For example, if a worm infects your environment, you could try to limit the damage by disconnecting servers from the network. However, sometimes disconnecting servers can cause more harm than good. Use your best judgment and your knowledge of your own network and systems to make this determination. If you determine that there will be no adverse effects, or that they would be outweighed by the benefits of acting, begin containment as quickly as possible by disconnecting the systems known to be affected from the network. If you cannot contain the damage by isolating the servers, ensure that you actively monitor the attacker’s actions in order to be able to remedy the damage as soon as possible. In any event, ensure that all log files are saved before shutting off any server, in order to preserve the information contained in those files as evidence if you (or your lawyers) need it later.
3. Assess the damage. Immediately make a duplicate of the hard disks in any servers that were attacked and put those aside for forensic use later. Then begin to determine the extent of the damage that the attack caused, as soon as you have contained the situation and duplicated the hard disks. This is important so that you can restore the organization's operations as soon as possible while preserving a copy of the hard disks for investigative purposes. If it is not possible to assess the damage in a timely manner, you should implement a contingency plan so that normal business operations and productivity can continue. It is at this point that organizations may want to engage law enforcement regarding the incident; however, you should establish and maintain working relationships with law enforcement agencies that have jurisdiction over your organization's business before an incident occurs so that when a serious problem arises you know whom to contact and how to work with them. You should also advise your company’s legal department immediately, so that they can determine whether a civil lawsuit can be brought against anyone as a result of the damage.
4. Determine the cause of the damage. In order to ascertain the origin of the assault, it is necessary to understand the resources at which the attack was aimed and what vulnerabilities were exploited to gain access or disrupt services. Review the system configuration, patch level, system logs, audit logs, and audit trails on both the systems that were directly affected and the network devices that route traffic to them. These reviews often help you to discover where the attack originated in the system and what other resources were affected. You should conduct this activity on the computer systems in place and not on the duplicate drives created in step 3. Those drives must be preserved intact for forensic purposes so that law enforcement or your lawyers can use them to trace the perpetrators of the attack and bring them to justice. If you need to create a backup for testing purposes to determine the cause of the damage, create a second backup from your original system and leave the drives created in step 3 unused.
5. Repair the damage. In most cases, it is very important that the damage be repaired as quickly as possible to restore normal business operations and recover data lost during the attack. The organization's business continuity plans and procedures should cover the restoration strategy. The incident response team should also be available to handle the restore and recovery process or to provide guidance on the process to the responsible team. During recovery, contingency procedures are executed to limit the spread of the damage and isolate it. Before returning repaired systems to service, confirm that you have mitigated whatever vulnerabilities were exploited during the incident so that the systems are not reinfected immediately.
6. Review response and update policies. After the documentation and recovery phases are complete, you should review the process thoroughly. Determine with your team which steps were executed successfully and what mistakes were made. In almost all cases, you will find weaknesses in your incident response plan and processes that need to be modified to allow you to handle incidents better in the future. This is the point of the after-the-fact exercise: you are looking for opportunities for improvement. Any flaws should prompt another round of the incident-response planning process so that you can handle future incidents more smoothly.

This methodology is illustrated in the following diagram:

Figure 2.1: Incident Response Process


The Proactive Approach

Proactive security risk management has many advantages over a reactive approach. Instead of waiting for bad things to happen and then responding to them afterwards, you minimize the possibility of the bad things ever occurring in the first place. You make plans to protect your organization's important assets by implementing controls that reduce the risk of vulnerabilities being exploited by malicious software, attackers, or accidental misuse. An analogy may help to illustrate this idea. Influenza is a deadly respiratory disease that infects millions of people in the United States alone each year. Of those, over 100,000 must be treated in hospitals, and about 36,000 die. You could choose to deal with the threat of the disease by waiting to see if you get infected and then taking medicine to treat the symptoms if you do become ill. Alternatively, you could choose to get vaccinated before the influenza season begins.

Organizations should not, of course, completely forsake incident response. An effective proactive approach can help organizations to significantly reduce the number of security incidents that arise in the future, but it is not likely that such problems will completely disappear. Therefore, organizations should continue to improve their incident response processes while simultaneously developing long-term proactive approaches.

Later sections in this chapter, and the remaining chapters of this guide, will examine proactive security risk management in detail. Each of the security risk management methodologies shares some common high-level procedures:

Identify business assets.
Determine what damage an attack against an asset could cause to the organization.
Identify the security vulnerabilities that the attack could exploit.
Determine how to minimize the risk of attack by implementing appropriate controls.

Approaches to Risk Prioritization

The terms risk management and risk assessment are used frequently throughout this guide, and, although related, they are not interchangeable. The Microsoft security risk management process defines risk management as the overall effort to manage risk to an acceptable level across the business. Risk assessment is defined as the process to identify and prioritize risks to the business.

There are many different methodologies for prioritizing or assessing risks, but most are based on one of two approaches or a combination of the two: quantitative risk management or qualitative risk management. Refer to the list of resources in the "More Information" section at the end of Chapter 1, "Introduction to the Security Risk Management Guide," for links to some other risk assessment methodologies. The next few sections of this chapter are a summary and comparison of quantitative risk assessment and qualitative risk assessment, followed by a brief description of the Microsoft security risk management process so that you can see how it combines aspects of both approaches.

Quantitative Risk Assessment

In quantitative risk assessments, the goal is to try to calculate objective numeric values for each of the components gathered during the risk assessment and cost-benefit analysis. For example, you estimate the true value of each business asset in terms of what it would cost to replace it, what it would cost in terms of lost productivity, what it would cost in terms of brand reputation, and other direct and indirect business values. You endeavor to use the same objectivity when computing asset exposure, cost of controls, and all of the other values that you identify during the risk management process.

Note: This section is intended to show at a high level some of the steps involved in quantitative risk assessments; it is not a prescriptive guide for using that approach in security risk management projects.

There are some significant weaknesses inherent in this approach that are not easily overcome. First, there is no formal and rigorous way to effectively calculate values for assets and controls. In other words, while it may appear to give you more detail, the financial values actually obscure the fact that the numbers are based on estimates. How can you precisely and accurately calculate the impact that a highly public security incident might have on your brand? If it is available you can examine historical data, but quite often it is not.

Second, organizations that have tried to meticulously apply all aspects of quantitative risk management have found the process to be extremely costly. Such projects usually take a very long time to complete their first full cycle, and they usually involve a lot of staff members arguing over the details of how specific fiscal values were calculated. Third, for organizations with high value assets, the cost of exposure may be so high that you would spend an exceedingly large amount of money to mitigate any risks to which you were exposed. This is not realistic, though; an organization would not spend its entire budget to protect a single asset, or even its top five assets.

Details of the Quantitative Approach

At this point, it may be helpful to gain a general understanding of both the advantages and drawbacks of quantitative risk assessments. The rest of this section looks at some of the factors and values that are typically evaluated during a quantitative risk assessment such as asset valuation; costing controls; determining Return On Security Investment (ROSI); and calculating values for Single Loss Expectancy (SLE), Annual Rate of Occurrence (ARO), and Annual Loss Expectancy (ALE). This is by no means a comprehensive examination of all aspects of quantitative risk assessment, merely a brief examination of some of the details of that approach so that you can see that the numbers that form the foundation of all the calculations are themselves subjective.

Valuing Assets

Determining the monetary value of an asset is an important part of security risk management. Business managers often rely on the value of an asset to guide them in determining how much money and time they should spend securing it. Many organizations maintain a list of asset values (AVs) as part of their business continuity plans. Note how the numbers calculated are actually subjective estimates, though: No objective tools or methods for determining the value of an asset exist. To assign a value to an asset, calculate the following three primary factors:

The overall value of the asset to your organization. Calculate or estimate the asset’s value in direct financial terms. Consider a simplified example of the impact of temporary disruption of an e-commerce Web site that normally runs seven days a week, 24 hours a day, generating an average of $2,000 per hour in revenue from customer orders. You can state with confidence that the annual value of the Web site in terms of sales revenue is $17,520,000.
The immediate financial impact of losing the asset. If you deliberately simplify the example and assume that the Web site generates revenue at a constant rate per hour, and the same Web site becomes unavailable for six hours, the calculated exposure is 0.000685, or 0.0685 percent, per year (6 hours out of the 8,760 hours in a year). By multiplying this exposure percentage by the annual value of the asset, you can predict that the directly attributable losses in this case would be approximately $12,000. In reality, most e-commerce Web sites generate revenue at a wide range of rates depending upon the time of day, the day of the week, the season, marketing campaigns, and other factors. Additionally, some customers may find an alternative Web site that they prefer to the original, so the Web site may have some permanent loss of users. Calculating the revenue loss is actually quite complex if you want to be precise and consider all potential types of loss.
The indirect business impact of losing the asset. In this example, the company estimates that it would spend $10,000 on advertising to counteract the negative publicity from such an incident. The company also estimates a loss of 0.01, or 1 percent, of annual sales, which is $175,200. By combining the extra advertising expenses and the loss in annual sales revenue, you can predict a total of $185,200 in indirect losses in this case.
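The arithmetic in the e-commerce example above can be reproduced with a short Python sketch. The function names are illustrative choices, not part of the guide, and the sketch keeps the same deliberate simplification of a constant hourly revenue rate.

```python
HOURS_PER_YEAR = 24 * 365  # the example site runs 24 hours a day, seven days a week


def annual_value(revenue_per_hour: float) -> float:
    """Annual sales revenue, assuming a constant hourly rate."""
    return revenue_per_hour * HOURS_PER_YEAR


def exposure(outage_hours: float) -> float:
    """Fraction of the year during which the asset is unavailable."""
    return outage_hours / HOURS_PER_YEAR


def direct_loss(revenue_per_hour: float, outage_hours: float) -> float:
    """Immediate financial impact: annual value times exposure."""
    return annual_value(revenue_per_hour) * exposure(outage_hours)


def indirect_loss(revenue_per_hour: float, advertising_cost: float,
                  lost_sales_fraction: float) -> float:
    """Indirect impact: extra advertising plus a share of annual sales."""
    return advertising_cost + annual_value(revenue_per_hour) * lost_sales_fraction


# Figures taken from the example in the text.
print(f"Annual value:  ${annual_value(2_000):,.0f}")
print(f"Exposure:      {exposure(6):.6f}")
print(f"Direct loss:   ${direct_loss(2_000, 6):,.0f}")
print(f"Indirect loss: ${indirect_loss(2_000, 10_000, 0.01):,.0f}")
```

Every input here (hourly revenue, outage length, advertising spend, lost-sales percentage) is an estimate supplied by the assessor, which is why the results are only as objective as those inputs.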

Determining the SLE

The SLE is the total amount of revenue that is lost from a single occurrence of the risk. It is a monetary amount that is assigned to a single event that represents the company’s potential loss amount if a specific threat exploits a vulnerability. (The SLE is similar to the impact in a qualitative risk analysis.) Calculate the SLE by multiplying the asset value by the exposure factor (EF). The exposure factor represents the percentage of loss that a realized threat could have on a certain asset. If a Web farm has an asset value of $150,000, and a fire results in damages worth an estimated 25 percent of its value, then the SLE in this case would be $37,500. This is an oversimplified example, though; other expenses may need to be considered.
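As a minimal sketch, the SLE calculation for the Web farm example can be written as follows. The function name and the range check on the exposure factor are assumptions added for illustration.

```python
def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    """SLE = asset value x exposure factor (EF given as a fraction, 0.0 to 1.0)."""
    if not 0.0 <= exposure_factor <= 1.0:
        raise ValueError("exposure factor must be between 0.0 and 1.0")
    return asset_value * exposure_factor


# Web farm worth $150,000; a fire is estimated to destroy 25 percent of its value.
sle = single_loss_expectancy(150_000, 0.25)
print(f"SLE: ${sle:,.0f}")
```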

Determining the ARO

The ARO is the number of times that you reasonably expect the risk to occur during one year. Making these estimates is very difficult; there is very little actuarial data available. What has been gathered so far appears to be private information held by a few property insurance firms. To estimate the ARO, draw on your past experience and consult security risk management experts and security and business consultants. The ARO is similar to the probability in a qualitative risk analysis; for a risk that is expected no more than once a year, it ranges from 0 percent (never) to 100 percent (always).

Determining the ALE

The ALE is the total amount of money that your organization will lose in one year if nothing is done to mitigate the risk. Calculate this value by multiplying the SLE by the ARO. The ALE is similar to the relative rank of a qualitative risk analysis.

For example, if a fire at the same company’s Web farm results in $37,500 in damages, and the ARO of a fire taking place is 0.1 (indicating once in ten years), then the ALE value in this case would be $3,750 ($37,500 x 0.1 = $3,750).

The ALE provides a value that your organization can work with to budget what it will cost to establish controls or safeguards to prevent this type of damage—in this case, $3,750 or less per year—and provide an adequate level of protection. It is important to quantify the real possibility of a risk and how much damage, in monetary terms, the threat may cause in order to be able to know how much can be spent to protect against the potential consequence of the threat.
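The ALE calculation for the fire example can be sketched in the same way. The function name and the check for a negative ARO are illustrative assumptions.

```python
def annual_loss_expectancy(sle: float, aro: float) -> float:
    """ALE = single loss expectancy x annual rate of occurrence."""
    if aro < 0:
        raise ValueError("ARO cannot be negative")
    return sle * aro


# A fire costing $37,500 that is expected once in ten years (ARO = 0.1).
ale = annual_loss_expectancy(37_500, 0.1)
print(f"ALE: ${ale:,.0f}")
```

Under this reading, a control that costs more than the printed amount per year would cost more than the loss it is meant to prevent.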

Determining Cost of Controls

Determining the cost of controls requires accurate estimates on how much acquiring, testing, deploying, operating, and maintaining each control would cost. Such costs would include buying or developing the control solution; deploying and configuring the control solution; maintaining the control solution; communicating new policies or procedures related to the new control to users; training users and IT staff on how to use and support the control; monitoring the control; and contending with the loss of convenience or productivity that the control might impose. For example, to reduce the risk of fire damaging the Web farm, the fictional organization might consider deploying an automated fire suppression system. It would need to hire a contractor to design and install the system and would then need to monitor the system on an ongoing basis. It would also need to check the system periodically and, occasionally, recharge it with whatever chemical retardants the system uses.

ROSI

Estimate the ROSI by using the following equation:

(ALE before control) – (ALE after control) – (annual cost of control) = ROSI

For example, the ALE of the threat of an attacker bringing down a Web server is $12,000, and after the suggested safeguard is implemented, the ALE is valued at $3,000. The annual cost of maintenance and operation of the safeguard is $650, so the ROSI is $8,350 each year as expressed in the following equation:

$12,000 - $3,000 - $650 = $8,350.
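The same equation as a small Python sketch; the function and parameter names are illustrative, not terminology from the guide.

```python
def rosi(ale_before: float, ale_after: float, annual_control_cost: float) -> float:
    """ROSI = (ALE before control) - (ALE after control) - (annual cost of control)."""
    return ale_before - ale_after - annual_control_cost


# Web server example: ALE drops from $12,000 to $3,000; the safeguard costs $650 a year.
result = rosi(12_000, 3_000, 650)
print(f"ROSI: ${result:,.0f} per year")
```

A negative result would indicate that, on these estimates, the control costs more each year than the loss it avoids.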

Results of the Quantitative Risk Analyses

The input items from the quantitative risk analyses provide clearly defined goals and results. The following items generally are derived from the results of the previous steps:

Assigned monetary values for assets
A comprehensive list of significant threats
The probability of each threat occurring
The loss potential for the company on a per-threat basis over 12 months
Recommended safeguards, controls, and actions

You have seen for yourself how all of these calculations are based on subjective estimates. Key numbers that provide the basis for the results are not drawn from objective equations or well-defined actuarial datasets but rather from the opinions of those performing the assessment. The AV, SLE, ARO, and cost of controls are all numbers that the participants themselves insert (after much discussion and compromise, typically).

Qualitative Risk Assessment

What differentiates qualitative risk assessment from quantitative risk assessment is that in the former you do not try to assign hard financial values to assets, expected losses, and cost of controls. Instead, you calculate relative values. Risk analysis is usually conducted through a combination of questionnaires and collaborative workshops involving people from a variety of groups within the organization such as information security experts; information technology managers and staff; business asset owners and users; and senior managers. If used, questionnaires are typically distributed a few days to a few weeks ahead of the first workshop. The questionnaires are designed to discover what assets and controls are already deployed, and the information gathered can be very helpful during the workshops that follow. In the workshops participants identify assets and estimate their relative values. Next they try to figure out what threats each asset may be facing, and then they try to imagine what types of vulnerabilities those threats might exploit in the future. The information security experts and the system administrators typically come up with controls to mitigate the risks for the group to consider and the approximate cost of each control. Finally, the results are presented to management for consideration during a cost-benefit analysis.

As you can see, the basic process for qualitative assessments is very similar to what happens in the quantitative approach. The difference is in the details. Comparisons between the value of one asset and another are relative, and participants do not invest a lot of time trying to calculate precise financial numbers for asset valuation. The same is true for calculating the possible impact from a risk being realized and the cost of implementing controls.

The benefits of a qualitative approach are that it overcomes the challenge of calculating accurate figures for asset value, cost of control, and so on, and the process is much less demanding on staff. Qualitative risk management projects can typically start to show significant results within a few weeks, whereas most organizations that choose a quantitative approach see little benefit for months, and sometimes even years, of effort. The drawback of a qualitative approach is that the resulting figures are vague; some Business Decision Makers (BDMs), especially those with finance or accounting backgrounds, may not be comfortable with the relative values determined during a qualitative risk assessment project.
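To make the idea of relative values concrete, here is a minimal sketch of how workshop results might be ranked. The three-level scale, the multiplication of impact by probability, and the sample risks and their ratings are all hypothetical; the chapter does not prescribe a particular scale or scoring rule.

```python
from dataclasses import dataclass

# Hypothetical three-level scale; the chapter does not prescribe one.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class Risk:
    name: str
    impact: str       # relative impact class agreed on in the workshop
    probability: str  # relative likelihood class agreed on in the workshop

    @property
    def rank(self) -> int:
        """Relative rank: impact level times probability level."""
        return LEVELS[self.impact] * LEVELS[self.probability]


def prioritize(risks):
    """Return risks ordered from highest to lowest relative rank."""
    return sorted(risks, key=lambda r: r.rank, reverse=True)


# Illustrative ratings only.
risks = [
    Risk("Virus infection on staff computers", "medium", "high"),
    Risk("Fire in the Web farm", "high", "low"),
    Risk("Attacker brings down a Web server", "high", "medium"),
]

for risk in prioritize(risks):
    print(risk.rank, risk.name)
```

The sketch also shows the drawback noted above: two quite different risks can end up with the same rank, so a coarse scale may not differentiate sufficiently between important risks.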

Comparing the Two Approaches

Both qualitative and quantitative approaches to security risk management have their advantages and disadvantages. Certain situations may call for organizations to adopt the quantitative approach. Alternatively, organizations of small size or with limited resources will probably find the qualitative approach much more to their liking. The following table summarizes the benefits and drawbacks of each approach:

Table 2.1: Benefits and Drawbacks of Each Risk Management Approach

	Quantitative	Qualitative
Benefits	Risks are prioritized by financial impact; assets are prioritized by financial values. Results facilitate management of risk by return on security investment. Results can be expressed in management-specific terminology (for example, monetary values and probability expressed as a specific percentage). Accuracy tends to increase over time as the organization builds historic record of data while gaining experience.	Enables visibility and understanding of risk ranking. Easier to reach consensus. Not necessary to quantify threat frequency. Not necessary to determine financial values of assets. Easier to involve people who are not experts on security or computers.
Drawbacks	Impact values assigned to risks are based on subjective opinions of participants. Process to reach credible results and consensus is very time consuming. Calculations can be complex and time consuming. Results are presented in monetary terms only, and they may be difficult for non-technical people to interpret. Process requires expertise, so participants cannot be easily coached through it.	Insufficient differentiation between important risks. Difficult to justify investing in control implementation because there is no basis for a cost-benefit analysis. Results are dependent upon the quality of the risk management team that is created.

In years past, the quantitative approaches seemed to dominate security risk management; however, that has changed recently as more and more practitioners have admitted that strictly following quantitative risk management processes typically results in difficult, long-running projects that see few tangible benefits. As you will see in subsequent chapters, the Microsoft security risk management process combines the best of both methodologies into a unique, hybrid approach.

The Microsoft Security Risk Management Process

The Microsoft security risk management process is a hybrid approach that joins the best elements of the two traditional approaches. As you will see in the chapters that follow, this guide presents a unique approach to security risk management that is significantly faster than a traditional quantitative approach. Yet it still provides results that are more detailed and easily justified to executives than a typical qualitative approach. By combining the simplicity and elegance of the qualitative approach with some of the rigor of the quantitative approach, this guide offers a unique process for managing security risks that is both effective and usable. The goal of the process is for stakeholders to be able to understand every step of the assessment. This approach, significantly simpler than traditional quantitative risk management, minimizes resistance to results of the risk analysis and decision support phases, enabling consensus to be achieved more quickly and maintained throughout the process.

The Microsoft security risk management process consists of four phases. The first, the Assessing Risk phase, combines aspects of both quantitative and qualitative risk assessment methodologies. A qualitative approach is used to quickly triage the entire list of security risks. The most serious risks identified during this triage are then examined in more detail using a quantitative approach. The result is a relatively short list of the most important risks that have been examined in detail.
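The two-stage idea in the Assessing Risk phase can be sketched as a qualitative triage followed by a quantitative look at the short list. This is only an illustration under stated assumptions: the scoring scale, the use of ALE as the detailed measure, the size of the short list, and every rating and estimate below are hypothetical and are not taken from the Microsoft process itself.

```python
from dataclasses import dataclass

# Hypothetical qualitative scale used only for the triage stage.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class Risk:
    name: str
    impact: str
    probability: str


def qualitative_score(risk: Risk) -> int:
    """Relative score used to triage the full list quickly."""
    return LEVELS[risk.impact] * LEVELS[risk.probability]


def triage(risks, keep: int):
    """Stage 1: keep only the most serious risks by relative score."""
    return sorted(risks, key=qualitative_score, reverse=True)[:keep]


def detailed_assessment(short_list, estimates):
    """Stage 2: compute ALE (SLE x ARO) for each short-listed risk.

    `estimates` maps a risk name to a (SLE, ARO) pair gathered only for
    the risks that survived triage. Returns (name, ALE), largest first.
    """
    results = [(r.name, estimates[r.name][0] * estimates[r.name][1])
               for r in short_list]
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Illustrative inputs only.
all_risks = [
    Risk("Fire in the Web farm", "high", "low"),
    Risk("Misconfigured office printer", "low", "low"),
    Risk("Attacker brings down a Web server", "high", "medium"),
    Risk("Lost visitor badge", "low", "medium"),
]
estimates = {
    "Fire in the Web farm": (37_500, 0.1),
    "Attacker brings down a Web server": (12_000, 1.0),
}

short_list = triage(all_risks, keep=2)
for name, ale in detailed_assessment(short_list, estimates):
    print(f"{name}: ${ale:,.0f} per year")
```

The design point is that the costly estimates (SLE and ARO) are needed only for the risks that survive triage, which is where the time savings over a fully quantitative assessment would come from.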

This short list is used during the next phase, Conducting Decision Support, in which potential control solutions are proposed and evaluated and the best ones are then presented to the organization's Security Steering Committee as recommendations for mitigating the top risks. During the third phase, Implementing Controls, the Mitigation Owners actually put control solutions in place. The fourth phase, Measuring Program Effectiveness, is used to verify that the controls are actually providing the expected degree of protection and to watch for changes in the environment such as new business applications or attack tools that might change the organization's risk profile.

Because the Microsoft security risk management process is ongoing, the cycle restarts with each new risk assessment. The frequency with which the cycle recurs will vary from one organization to another; many find that an annual recurrence is sufficient so long as the organization is proactively monitoring for new vulnerabilities, threats, and assets.

Figure 2.2: Phases of the Microsoft Security Risk Management Process


Figure 2.2 illustrates the four phases of the Microsoft security risk management process. The next chapter, Chapter 3, "Security Risk Management Overview," provides a comprehensive look at the process. The chapters that succeed it explain in detail the steps and tasks associated with each of the four phases.

This accelerator is part of a larger series of tools and guidance from Solution Accelerators.