Event Monitoring and Response on the Microsoft Network

Technical White Paper

Published: June 30, 2006

Situation

The network security team at Microsoft is responsible for monitoring a huge network for security threats, including hacker attacks, malicious software, unauthorized software, and violations of corporate policy. Because of the sheer scope of its responsibilities, the network security team also faces the challenge of processing a huge amount of data. Finding useful information in this volume is a serious challenge that must be overcome in order to maintain the security of critical assets.

Solution

The network security team has implemented a detection-in-depth strategy that aggregates information from multiple event feeds. These feeds include the proxy/firewall servers, a network-based intrusion detection system, a custom client agent monitoring tool, and a service that monitors compliance with corporate policy. These feeds are aggregated by a third-party correlation system that identifies possible threat patterns. The system automatically scans feeds for behaviors that correspond to known vulnerabilities.

Benefits

  • This solution enables Microsoft to monitor its large, organic network and detect security threats with a high degree of confidence.
  • This solution enables the network security team to identify and respond to threats very rapidly.
  • By automating collection and evaluation of monitoring data, the network security team can monitor a huge volume of information effectively and efficiently.

Products & Technologies

  • Microsoft Internet Security and Acceleration Server 2004

Executive Summary

Microsoft has one of the largest experimental computer networks in the world. The network is highly organic, with business functions such as accounting, finance, and human resources on the same worldwide network as sales and marketing, product development, and testing. A huge variety of activities and applications coexist in the same environment, including development environments, testing environments, and more than 600 line-of-business applications. The network security team at Microsoft faces the challenge of monitoring this environment in an effort to detect, in near real time, events that result in disruption and compromise of network resources.

Planning a security strategy for this environment, or for any environment, requires careful attention to the assets to be monitored, the relative importance of these assets, and the relationship between infrastructure requirements and corporate policies. The security strategy must also ensure that there are tools, processes, and qualified personnel with specialized skills to perform the monitoring.

The Microsoft network security team monitors the network at multiple points as part of a detection-in-depth strategy to secure the network and protect the information held in digital assets.

The process of event monitoring and response is highly automated, enabling the network security team to capture, analyze, and respond to a huge volume of event data. The team uses the following applications, tools, and features to monitor the event data:

  • Microsoft® Internet Security and Acceleration (ISA) Server 2004 and a set of custom scripts detect clients that are passing unauthorized traffic over the network. This traffic is blocked at the firewall, and the client is prevented from communicating over the network.

  • The team uses a custom client agent monitoring tool to examine ISA Server proxy server logs, and to identify and categorize network traffic by the executable file sending the traffic. This action enables traffic associated with malicious software (malware) to be identified quickly, in addition to providing insight into policy violations and suspicious data transfers.

  • The team uses a network-based intrusion detection system (NIDS) to monitor internal networks and raise an alert when suspicious activity is detected.

  • The team uses Audit Collection Services (ACS) to monitor security events and to detect possible violations of corporate policy. Audit collection will be generally available in the next version of Microsoft Operations Manager, Microsoft System Center Operations Manager 2007.

  • Monitoring of events related to antivirus software helps to detect patterns of events related to direct malicious attacks, including keyloggers, remote control agents, and rootkits.

  • The team uses a third-party correlation system to aggregate multiple information feeds, including ISA Server proxy server logs, NIDS, client agent monitoring, and ACS. This aggregated data is then automatically correlated against behavior patterns that correspond to known threats. This correlation enables the team to detect security threats with a high degree of confidence.

Responding to security vulnerabilities when they are discovered is an important part of the network security team's job. Whenever a new software update is announced, the team identifies any vulnerabilities that the update addresses. The team then implements a process to monitor and mitigate any exploits that attempt to take advantage of these vulnerabilities.

Introduction

The Microsoft enterprise is large, complex, and constantly changing, with more than 120,000 users and 300,000 devices at 400 sites worldwide. The size and diversity of the computing environment create unique challenges to network security. The Microsoft network security team is responsible for monitoring and responding to threats initiated from sources inside and outside the corporate firewall, including malicious software and violations of corporate policy. When resources are compromised, the results can include lost productivity, privacy breaches, theft of intellectual property, and insertion and modification of digital assets, in addition to legal costs and damaged credibility.

In the past, Microsoft took a reactive approach to network security. The primary activities of the network security team were responding to intrusions and carrying out investigations when intrusions occurred. The team spent a high proportion of its time responding to security threats. The team recognized that if it developed a more proactive threat detection strategy, it would be able to better prepare for security threats and respond to them more quickly.

To achieve this goal, the network security team is constantly assessing risks and threats, and monitoring changes to the network infrastructure, product vulnerabilities, and emerging attack vectors. This work enables the team to create strategies and plans for detection of threats and protection of the network. The team successfully validated this approach for the Microsoft network, put NIDS into production, and deployed an event correlation engine.

This system enables the network security team to respond to security threats in near real time. This paper describes the tools and processes used to monitor security events on the Microsoft network and provides some of the lessons learned related to this effort.

Planning

As its network security strategy has evolved, Microsoft has gained valuable experience that can help other organizations plan their network security strategies. A successful monitoring and response strategy requires careful planning to meet the needs of the organization that it serves. The monitoring system and response processes should be aligned not only with infrastructure requirements, but also with the organization's policies and legal requirements.

A necessary precursor to effective event monitoring and response is to reduce the attack surface as much as possible. An organization should ensure that computers receive all available security updates and are properly configured to maintain a level of security that the organization considers reasonable. The organization should implement some form of antivirus and antispyware solution to protect against attacks. Processes and personnel should be in place to ensure that updates and virus signatures are kept up to date and that IT-mandated computer configurations are maintained. This strategy provides a good basis from which to evolve a solution for event monitoring and response.

Just ensuring that the environment is up to date and correctly configured does not ensure security. For example, a well-maintained environment does not necessarily provide protection against new and evolving threats, such as worms delivered through instant messaging, or inside threats. An organization should take these types of threats into account throughout the planning process.

Each environment is different. However, based on the network security team's experience, the following planning process helps address the priorities for planning a successful strategy for monitoring and response:

  1. Identify assets.

  2. Determine how to protect assets.

  3. Review legal policies and requirements.

  4. Prioritize assets.

  5. Determine who will monitor the assets.

  6. Establish a response framework.

The following sections describe these areas in detail.

Identify Assets

First, an organization should determine and inventory the assets that must be protected. Typically, these assets include servers, clients, and corporate information, including confidential records and intellectual property. The organization should determine where these assets are located and what their current security situation is.

This stage can be very informative. In some cases, the organization may determine that assets are too disaggregated to be protected properly, in which case some reorganization of assets may be required. This stage may also identify good candidates for server consolidation, which can both reduce costs and simplify security requirements.

Microsoft Information Technology (Microsoft IT) has an internal initiative called Least Privilege Access (LPA) to provide ongoing guidance to business units and product groups to maintain the security of assets. LPA mandates that employees, vendors, and partners have access only to the resources that they need to do their work, and no more.

Determine How to Protect Assets

After assets have been identified, the organization must determine how those assets will be protected. Assets should be classified according to the potential business impact of a threat against them. This classification can be used to determine who should be able to access those assets and what level of security protection they require.

Review Legal Policies and Requirements

Reviewing the legal requirements for the monitoring system is an essential step in determining a monitoring architecture. Privacy and telecommunications laws are placing stricter requirements on how organizations monitor and store security event data and other information. Regulatory compliance is an important part of a corporate policy for event monitoring, and it is only becoming more important as regulation increases. Corporate policies for record retention, security event auditing, and monitoring of corporate policy violations all affect an organization's monitoring infrastructure and are closely tied to legal needs. International organizations need to take special heed of differences in regulations between countries. In some cases, it may be necessary to provide separate monitoring for locations that have different monitoring requirements due to differing privacy laws.

Prioritize Assets

To create an effective monitoring architecture, the organization must determine its priorities at the corporate level regarding the importance of the assets.

Resources allocated to processing and storing security event data must be capable of scaling to fit the size and growth of the network as well as data streams being captured and any unusual traffic that events and users generate. These requirements have to be balanced for the monitoring architecture to be effective. Some of the considerations to be determined at this stage are:

  • Asset importance. An organization's most critical assets require the highest level of protection. Conversely, a relatively low level of protection may be acceptable on some assets in cases where low-level systems cannot interact with high-value assets.

  • Scalability. An organization's monitoring system must be capable of growing as its network grows. The systems used to store monitoring and audit data must also be scalable.

  • Data volume. Monitoring can produce a high volume of data. This data must be managed effectively for monitoring staff to be able to make effective use of it and to respond to events in a timely manner.

  • Data storage. Determining what data will be stored and for how long determines the amount of data storage infrastructure required.
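
The data storage consideration can be made concrete with a back-of-the-envelope estimate. The following minimal sketch multiplies daily data volume by the retention period. The function name, the 90-day retention period, and the 25 percent overhead allowance are illustrative assumptions, not figures from the Microsoft deployment.

```python
def storage_required_gb(daily_volume_gb: float, retention_days: int,
                        overhead: float = 0.25) -> float:
    """Estimate the storage needed to retain monitoring data.

    overhead is a hypothetical allowance for indexes and growth.
    """
    if daily_volume_gb < 0 or retention_days < 0 or overhead < 0:
        raise ValueError("inputs must be non-negative")
    return daily_volume_gb * retention_days * (1 + overhead)


# Example input: 20 GB per day retained for 90 days with 25 percent overhead.
estimate_gb = storage_required_gb(20, 90)  # 2250.0
```

An estimate of this kind is only a starting point. Legal retention requirements and the growth of the network both change the inputs over time.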

Determine Who Will Monitor the Assets

The organization should determine who will be responsible for monitoring assets. This effort includes determining the number and skill level of personnel required. It is important to ensure freedom from conflict of interest when creating a network security function. To ensure objectivity and accountability, the team must be independent of the business units and of the digital assets being monitored. A successful network monitoring team will consist of highly trained and experienced staff.

The team should have deep knowledge of emerging threats in the areas of hacker methodology, malware distribution and behavior, and the security vulnerabilities commonly used to compromise systems. Steep learning curves mean that it may take time to bring new staff up to full productivity.

Establish a Response Framework

Finally, the organization needs to develop a response framework. This framework is a set of defined roles, processes, and procedures for determining the appropriate response to an incident. This framework varies from organization to organization but typically includes:

  • An incident response plan.

  • Contact information and details for personnel to be notified, and criteria for that notification in case of an incident.

  • Procedures to help assess, contain, and remediate security incidents.

Monitoring Goals at Microsoft

The goal of the network security team at Microsoft is to protect the integrity and security of key assets, as well as to maintain the productivity of users by protecting systems from disruption.

Key assets at Microsoft include, but are not limited to, host devices, client devices, and servers. Digital assets include classifications like personally identifiable information, intellectual property, and confidential data. Monitoring goals provide context for specific objectives and elements of the detection strategy, response roles, and processes.

To achieve its mission, the network security team monitors Microsoft assets for many classes of threats, including:

  • The presence of malicious software, including viruses, Trojan horses, and worms.

  • Denial of service (DoS) attacks in progress, including those caused by malicious software and accidental DoS attacks caused by misconfigured software or hardware.

  • Hacker intrusions and attempted hacker intrusions.

  • Unauthorized transmission of intellectual property and theft of intellectual property.

  • Accidental or purposeful leaks of information.

  • Violations of corporate policy, including mishandling intellectual property, violating security policy, and browsing unauthorized Web sites.

In addition, the network security team actively monitors the network for unusual behavior and for traffic related to new applications that have not been encountered before. Investigating and researching unknown executable files or unusual network behavior is an important part of the network security team's job. By investigating these unknowns, the team can understand what the executable file does, determine the risk and threat that it represents, and decide how to treat it when it is detected again. Anomalies typically have completely benign causes, but in some cases, they may indicate a new type of malicious code.

Microsoft Environment

No single architecture for event monitoring and response is going to be right for every organization. An event monitoring solution must be designed to meet the specific needs of the environment where it is implemented. The Microsoft environment is highly organic and includes tens of thousands of employees, servers, and clients. Development, testing, and production servers all run in the same environment. Monitoring this environment effectively is a significant challenge. Several unique characteristics of the Microsoft environment drive the monitoring strategy that the network security team uses, including the following:

  • The size of the user population and the high proportion of knowledgeable users

  • The presence of a large developer population with many applications under development and active testing environments

  • A wide range of assets to be protected

  • The presence of both managed and unmanaged clients on the network

User Population

Microsoft has a very large, knowledgeable, global user population, with more than 120,000 users using the network on a regular basis. The bulk of these users access the Internet every day.

Developer Population

Microsoft has a large, active developer population. This fact creates additional challenges, because a number of experimental, beta, and custom applications appear on the network. Developers also work on diverse platforms, including both Microsoft and non-Microsoft operating systems. Applications under development, test, and research may generate unusual network traffic, or they may trigger security events that suggest or appear as a threat. The presence of many labs and testing environments, including labs where stress testing occurs, creates network traffic that can be difficult to distinguish from traffic related to viruses or worms.

Company Assets

Microsoft must protect a wide variety of assets. Critical assets, such as personally identifiable information, intellectual property, and confidential data, require careful security and maintenance. The importance and sensitivity of servers can vary greatly from server to server, even within the same server role. Microsoft also has a large number of line-of-business applications that handle sensitive and confidential employee information running on its network. This mix of assets requires sophisticated monitoring to ensure that they are all kept as secure as possible.

Managed and Unmanaged Clients

The Microsoft environment includes both managed and unmanaged clients. For purposes of event monitoring and response, an unmanaged client is a device or host that Microsoft IT does not manage. An unmanaged device is not domain joined and does not have elevated security credentials present to allow scanning and access to the device for investigative processes. Neither antivirus software with real-time monitoring enabled nor the Microsoft Systems Management Server (SMS) agent is present. The network security team cannot determine what software is running on an unmanaged client, and it cannot enforce a security configuration that is consistent with security policies.

The presence of unmanaged clients is a major reason that Microsoft requires multiple levels of monitoring. The network security team needs to be able to detect and respond to threats and violations that originate from unmanaged computers. The team uses network-based intrusion detection to help detect security threats that originate from these unmanaged clients.

Monitoring Process

The network security team actively attempts to detect threats and prevent incidents. This practice enables the team to apply its resources much more efficiently, to respond in near real time, and to mitigate risk during the event.

At Microsoft, event monitoring occurs at the proxy/firewall servers as well as on the internal network. Information from these sources is combined into a unified data format that is monitored to detect attack patterns.

Microsoft uses the following applications, tools, and features to monitor the event data:

  • Event correlation system. The network security team uses a third-party correlation system to collect and parse information from the ISA Server-based servers, client agent monitoring tool, NIDS, antivirus software, and ACS. This information is aggregated into a unified data format and correlated with security vulnerabilities to detect possible security threats.

  • ISA Server. Microsoft uses several arrays of proxy/firewall servers running ISA Server to help protect its network and manage traffic between the network and the Internet. Microsoft uses the built-in features of ISA Server, as well as some custom tools, to block badly behaving clients from communicating over the firewall servers.

  • Client agent monitoring tool. A custom tool called the client agent monitoring tool collects the ISA Server proxy logs. This tool parses the logs and identifies and categorizes network traffic according to the name of the executable file that created the traffic. It also records the ports used to communicate and the size of the traffic transmitted. This tool enables the identification of traffic from known malicious software.

  • NIDS. The network security team uses NIDS to monitor network activity within the corporate network. NIDS examines packets passed on the internal network to determine whether clients are carrying out any activities that correspond to known security threats.

  • ACS. ACS collects Microsoft Windows® Security Audit logs that relate to account administration, authorization, and access to corporate resources so that these events can be monitored for unauthorized activity.

Event Correlation System

The network security team monitors a huge volume of information from multiple sources to help protect network resources at Microsoft. Although many security threats can be detected at the firewall, at the network level through NIDS, or at the client level through the client agent monitoring tool, these tools in isolation do not provide sufficient protection for the Microsoft network.

Many security threats can be successfully detected only by comparing information from multiple sources and correlating it to identify patterns of behavior that represent security threats. This task is too large to be done manually, particularly in an environment as large as the Microsoft environment. This is one of the reasons that Microsoft uses a third-party correlation system to automate correlation of multiple feeds and detection of security threats.

The correlation system acts as a single point of aggregation for multiple information feeds related to security monitoring. The correlation system unifies these feeds and places them into a single format for easier analysis, reporting, and threat correlation.
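
The idea of placing several feeds into a single format can be sketched as follows. This is a minimal illustration, not the format of the third-party correlation system, which this paper does not describe. The record type, the field names, and the layout assumed for each feed are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UnifiedEvent:
    """Hypothetical common record shared by every feed."""
    timestamp: datetime
    source_feed: str   # for example "proxy", "nids", or "acs"
    client_ip: str
    event_type: str
    detail: str


def from_proxy(record: dict) -> UnifiedEvent:
    # Assumed proxy log fields: "time" (epoch seconds), "src", "exe", "dst".
    return UnifiedEvent(
        timestamp=datetime.fromtimestamp(record["time"], tz=timezone.utc),
        source_feed="proxy",
        client_ip=record["src"],
        event_type="connection",
        detail=f'{record["exe"]} -> {record["dst"]}',
    )


def from_nids(record: dict) -> UnifiedEvent:
    # Assumed NIDS alert fields: "ts" (ISO 8601 string), "attacker", "signature".
    return UnifiedEvent(
        timestamp=datetime.fromisoformat(record["ts"]),
        source_feed="nids",
        client_ip=record["attacker"],
        event_type="alert",
        detail=record["signature"],
    )


# Once every feed shares one record type, events can be merged and ordered in time.
events = sorted(
    [
        from_nids({"ts": "2006-06-30T12:00:05+00:00", "attacker": "10.0.0.7",
                   "signature": "port scan"}),
        from_proxy({"time": 1151668800, "src": "10.0.0.7",
                    "exe": "unknown.exe", "dst": "192.0.2.10"}),
    ],
    key=lambda event: event.timestamp,
)
```

With a shared record type, analysis and reporting code needs to understand only one layout, regardless of how many feeds are added later.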

The correlation system enables collection and analysis of monitoring information to be centralized and automated. Achieving a comparable level of monitoring without the correlation system would typically require some sort of solution at each monitoring point. This approach would likely be prohibitively expensive to implement and maintain.

The feeds that the event correlation system aggregates include:

  • ACS. ACS logs provide information about security activities such as creation of user accounts and granting of permissions. This information is used primarily to detect violations of corporate policy.

  • Firewall (ISA Server-based) server. Proxy server logs provide information about traffic through the proxy servers.

  • Client agent monitoring. The logs produced as the output of the client agent monitoring tool provide information about network activity associated with the executable file sending or receiving the information and the category of application that it represents.

  • Antivirus logs. Logs from the antivirus system provide information about virus infections and about the current state of antivirus software installed on clients on the network.

  • NIDS. The NIDS sensor network collects intrusion detection alerts. This network is deployed across the Microsoft environment in strategic locations, such as digital asset concentration areas or traffic aggregation points. The alerts reflect the presence of malicious traffic on the network or evidence of unusual behavior indicative of possible compromise.

The correlation system processes a huge number of events, more than 2 million every day. To enable the network security team to make sense of this high volume of information, the correlation system carries out automated analysis of the logs. The correlation system uses correlation rules that the information security team created to detect attack patterns that exploit known vulnerabilities. When an attack pattern is detected, an alert is raised for the network security team.

An attack typically has a known pattern of events that are executed in sequence. For example, a worm that attacks a particular vulnerability may use a file transfer request to download malicious code, and then conduct a port scan to find additional vulnerable computers, and finally attempt to connect to a vulnerable port. This kind of behavior may be very difficult to see when the elements are examined in isolation. For example, a port scan is not itself proof of an attack. So rather than examine events in isolation, the correlation system checks across all its monitored feeds for patterns that correspond to known threats. In this example, when the correlation system determines that these events executed from one client in sequence, it can identify a threat and generate an alert with a high degree of certainty.
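
The worm example above can be expressed as a simple sequence rule. The following is a minimal sketch of that kind of correlation, not the rule language of the third-party system. The event type names, the tuple layout, and the ten-minute window are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rule: event types that must occur, in this order, from one client.
WORM_RULE = ("file_transfer", "port_scan", "vulnerable_port_connect")


def correlate(events, rule=WORM_RULE, window_seconds=600):
    """Return the set of clients whose events match the rule in sequence.

    events: iterable of (timestamp_seconds, client, event_type) tuples
    drawn from any of the monitored feeds.
    """
    # client -> (index of the next rule step, time the partial match started)
    progress = defaultdict(lambda: (0, None))
    alerts = set()
    for ts, client, event_type in sorted(events):
        index, started = progress[client]
        # Discard a partial match that is older than the window.
        if started is not None and ts - started > window_seconds:
            index, started = 0, None
        if event_type == rule[index]:
            if index == 0:
                started = ts
            index += 1
            if index == len(rule):
                alerts.add(client)
                index, started = 0, None
        progress[client] = (index, started)
    return alerts


sample = [
    (0, "10.0.0.7", "file_transfer"),
    (30, "10.0.0.9", "port_scan"),  # a port scan alone does not raise an alert
    (60, "10.0.0.7", "port_scan"),
    (90, "10.0.0.7", "vulnerable_port_connect"),
]
flagged = correlate(sample)  # {"10.0.0.7"}
```

The sketch is deliberately simple: it tracks one partial match per client and ignores events that do not advance the rule. A production correlation engine would handle overlapping matches and many rules at once.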

Malicious software evolves very quickly. Names of executable files change as threats are updated or as new threats appear. One of the great advantages of the correlation system is that threats can be correlated to attack patterns that correspond to a vulnerability, rather than to specific units of malicious software. This correlation provides a high probability of catching future attempts to exploit that vulnerability, even when those attempts originate from a new piece of malicious software.

ISA Server

Microsoft uses ISA Server on dedicated proxy/firewall arrays to monitor and control traffic across the network and to the Internet. Microsoft maintains 15 proxy arrays located in 12 hub sites worldwide. These hubs serve more than 400 Microsoft sites in total.

Microsoft logs traffic that communicates over the proxy servers. These logs include a wealth of important information about this traffic, including the originating IP, the destination IP, and the name of the executable file that initiated the communication. If the client that is sending the traffic is fully managed, additional information can be gathered, including the user name of the person logged on to the client and the name of the client. These logs are stored centrally and consumed by the client agent monitoring tool and the correlation system.

Detection and Isolation of Badly Behaving Clients

One of the key challenges that the network security team faces is lead generation, identification, and isolation of badly behaving clients that are sending unauthorized traffic over the network and to the Internet. These clients may be infected by malicious software that is probing the network for additional vulnerabilities, or they may be misconfigured.

The network security team has implemented several measures to address badly behaving clients proactively through ISA Server. Microsoft uses a mix of ISA Server features and custom scripts to identify these clients, isolate them, and prevent them from consuming proxy server resources or accessing any other network resources.

Several other features of ISA Server facilitate the prevention and mitigation of security threats, including the following:

  • ISA Server can block a specific port on a specific client from communicating over the corporate network or the Internet. This capability enables the blocking of traffic that corresponds to known attacks.

  • ISA Server can block a particular executable file, such as a known malicious program, worm, or peer-to-peer application, from communicating.

  • ISA Server can be used to establish session limits to prevent a client from establishing a huge number of connections in an attempt to overload the proxy server.

  • The new firewall client in the latest version of ISA Server can block ports based on a wildcard port assignment. Because some new attacks create random application names, this feature enables the blocking of those attacks by closing a range of ports to all traffic.

Detection of Infected Clients

Detecting clients that may be infected with malicious code can be challenging. In an environment that contains traffic as varied and unpredictable as the Microsoft network, distinguishing malicious traffic from regular traffic is difficult. For example, network traffic associated with new products under development, or stress testing of products, can resemble traffic generated by a DoS attack.

The network security team uses custom scripts to proactively detect infected clients and prevent them from communicating, a process called ratholing. To detect these badly behaving clients, a script scans the proxy logs at regular intervals. This script searches for patterns of behavior that correspond to known attacks.

When the script identifies a badly behaving client, its name and IP address are added to a central database of known bad clients, and a message alerts the client owner that a problem exists with the client.

To communicate with a proxy array, a client must complete a three-way handshake with the proxy server. If the client appears on the list of bad clients, the proxy server prevents this handshake from occurring. This action effectively prevents the client from spreading its infection, launching an attack, or consuming proxy resources.

Every five minutes, every proxy server worldwide requests the list of bad clients from the central database. It then compares this list to a local list of bad clients that the proxy server maintains. If any new clients have been added to the central list since the last check, they are added to the local list. If any clients have been removed from the central list, they are removed from the local list. This action ensures that every proxy server on the Microsoft network blocks newly detected badly behaving clients within five minutes of their discovery.
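
The comparison of the central list with the local list amounts to two set differences. The following minimal sketch shows the idea. The function and variable names are hypothetical, and the actual scripts used at Microsoft are not described in this paper.

```python
POLL_INTERVAL_SECONDS = 5 * 60  # each proxy server checks every five minutes


def sync_bad_clients(local: set, central: set):
    """Bring a proxy server's local block list in line with the central list.

    Updates local in place and returns the clients that were added and
    removed, so that the changes can be logged.
    """
    added = central - local      # newly detected bad clients
    removed = local - central    # clients that have been cleared
    local |= added
    local -= removed
    return added, removed


local_list = {"10.0.0.7", "10.0.0.8"}
central_list = {"10.0.0.8", "10.0.0.9"}
added, removed = sync_bad_clients(local_list, central_list)
# added is {"10.0.0.9"}, removed is {"10.0.0.7"}, and local_list now equals central_list
```

Because each proxy server applies only the differences, a client that the Helpdesk removes from the central list regains connectivity at the next poll without any manual change on the proxy servers.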

Often, the first time that a user learns of a problem with one of his or her clients is when that client ceases to communicate over the network because it has been blocked at the proxy servers. For this reason, the Microsoft Helpdesk also has access to the list of bad clients. When a user calls the Helpdesk to determine why his or her client has lost connectivity, the Helpdesk can quickly determine whether the client has been ratholed. This capability enables the Helpdesk to respond more effectively to user requests related to infected clients. After the problem has been resolved, the Helpdesk can remove the client from the list of bad clients.

Future Directions

Microsoft ISA Server 2006 includes a new feature called Flood Resiliency, which helps protect proxy servers from traffic that badly behaving clients generate. With Flood Resiliency, ISA Server 2006 can respond to traffic from a badly behaving client by temporarily terminating its access and generating an alert. Security staff can then investigate the incident and take an appropriate response. This feature is currently in pilot within Microsoft and is expected to replace the custom ratholing scripts currently in use.

Client Agent Monitoring Tool

The network security team monitors logs from its ISA Server proxy/firewall servers to proactively identify possible security threats. The volume of information that these logs produce, however, is far too large to be monitored in its raw form. The logs must be processed in some way that filters out irrelevant data and makes identifying the most relevant events easier. To achieve this goal, the network security team uses a custom tool called the client agent monitoring tool.

Proxy server logs at Microsoft produce more than 500 gigabytes (GB) of data per day. To handle this huge volume, the client agent monitoring tool uses only the information that the network security team has determined is most useful, in particular the names of executable files that are sending information over the network. This technique trims the amount of data to approximately 20 GB per day, or roughly 4 percent of the raw volume.

The tool periodically gathers proxy server logs. It normalizes the logs, removing information that is not relevant to the monitoring process. In addition to the name of the executable file, the tool collects identifying information from the client, such as IP address and current user.
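As an illustration of this normalization step, the following Python sketch reduces one raw log line to the fields that the paragraph names. The tab-delimited layout and the field names are assumptions made for the example; this paper does not describe the actual proxy log schema or the tool's implementation.

```python
from dataclasses import dataclass

# Hypothetical column layout for a tab-delimited proxy log line.
FIELDS = ["timestamp", "client_ip", "username", "executable", "bytes_sent", "url"]


@dataclass(frozen=True)
class NormalizedRecord:
    client_ip: str
    username: str
    executable: str
    bytes_sent: int


def normalize(line):
    """Keep only the fields used for monitoring and drop the rest."""
    raw = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
    return NormalizedRecord(
        client_ip=raw["client_ip"],
        username=raw["username"],
        executable=raw["executable"].lower(),  # compare names case-insensitively
        bytes_sent=int(raw["bytes_sent"]),
    )


# Illustrative input only.
sample = "\t".join(
    ["2006-06-30 12:00:00", "10.0.0.5", "EXAMPLE\\someuser",
     "Example.EXE", "2048", "http://example.com/"]
)
record = normalize(sample)
```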

The client agent monitoring tool includes a database of known names of executable files, organized by category. The tool looks up the name of each executable file logged in this database. The tool can then categorize traffic by executable file and prioritize the traffic appropriately. The categories of executable files include:

  • Known malicious software

  • Peer-to-peer application

  • Microsoft application

  • Non-Microsoft application

  • Spyware or other unwanted software

The tool produces four types of alerts that are fed into the correlation system:

  • Informational: for non-critical events that may be logged for future research and analysis

  • Notification: for non-critical events that may be of interest to the monitoring team

  • Warning: for events that may indicate a problem

  • Critical: for outbreaks and serious incidents in progress

Depending on the category of the executable file, the tool may raise an alert. For example, traffic related to known malicious software indicates an infected client on the network and demands some response.
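The lookup and alerting logic can be sketched as two table lookups. In the Python sketch below, the categories and alert types come from the lists above, but the executable names and the mapping from category to alert type are assumptions for illustration; the paper states only that known malicious software demands a response.

```python
from enum import Enum


class Category(Enum):
    MALICIOUS = "Known malicious software"
    PEER_TO_PEER = "Peer-to-peer application"
    MICROSOFT = "Microsoft application"
    NON_MICROSOFT = "Non-Microsoft application"
    UNWANTED = "Spyware or other unwanted software"


class Alert(Enum):
    INFORMATIONAL = 1
    NOTIFICATION = 2
    WARNING = 3
    CRITICAL = 4


# Hypothetical entries standing in for the tool's database of executable names.
KNOWN_EXECUTABLES = {
    "badworm.exe": Category.MALICIOUS,
    "fileshare.exe": Category.PEER_TO_PEER,
    "winword.exe": Category.MICROSOFT,
}

# Assumed policy: which categories raise an alert, and of what type.
ALERT_POLICY = {
    Category.MALICIOUS: Alert.CRITICAL,
    Category.UNWANTED: Alert.WARNING,
    Category.PEER_TO_PEER: Alert.NOTIFICATION,
}


def alert_for(executable):
    """Return the alert to raise for this executable, or None if no alert applies."""
    category = KNOWN_EXECUTABLES.get(executable.lower())
    if category is None:
        return None
    return ALERT_POLICY.get(category)
```

Keeping the policy in a table separate from the name database means that a response is defined once per category rather than once per executable file.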

When the tool detects suspicious traffic that originates from a client with the ISA Server proxy client installed, that client is identified along with the currently logged-on user and the amount of information being transferred. In this case, the network security team may contact that user or the user's manager to alert him or her to the problem.

A benefit of this system is that it makes it easy to identify when a new executable file appears and begins sending traffic on the network. When a new executable file is encountered, it is marked for further research and classification. The length of this process can vary. In many cases, Internet research may reveal the identity of an executable file extremely quickly. In other cases, more research may be required. After the executable file is identified, it can be classified in the client agent system. After the executable file is classified, the tool will take the appropriate action when it encounters the file again.
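Detecting a new executable file reduces to finding names in the traffic that have no entry in the classification database. The following Python sketch shows one way to do this; the function name and the sample names are hypothetical.

```python
def find_unclassified(observed, known):
    """Return the observed executable names that have no classification yet."""
    known_lower = {name.lower() for name in known}
    return sorted({name.lower() for name in observed} - known_lower)


# Illustrative data only.
known = {"winword.exe": "Microsoft application"}
observed = ["WINWORD.EXE", "newtool.exe", "NewTool.exe"]

# Names that need research and classification.
research_queue = find_unclassified(observed, known)

# After research, the file is classified and no longer appears in the queue.
known["newtool.exe"] = "Non-Microsoft application"
remaining = find_unclassified(observed, known)
```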

This approach does have a drawback. Some malicious software is named after legitimate Microsoft executable files, so the name alone does not reveal it. In this case, the way to discover suspicious traffic is to examine how the executable file behaves on the network. If this behavior corresponds to a known vulnerability, the file can be flagged for closer examination.

Because the tool not only identifies traffic by executable file, but also classifies that executable file, the tool greatly simplifies response actions. For example, after a piece of software has been identified as a peer-to-peer application, the response can be formulated as a matter of corporate policy, rather than treating each peer-to-peer application as an individual case.

The client agent monitoring tool produces a log of traffic classified by executable file and sends that log to the correlation system.

Besides forwarding its output as a feed to the correlation system, the client agent monitoring tool stores logs in a Microsoft SQL Server™ database. This data is then analyzed over time through a custom-built online analytical processing (OLAP) cube system. This analysis provides additional insight into the health of the network. For example, analysis may uncover subtle plans of attack that were missed during routine monitoring. The database also provides a measure of the performance of the network security team for comparison against goals.

Network-Based Intrusion Detection System

Microsoft uses NIDS inside the corporate firewall to help protect corporate assets. NIDS examines packets at the network layer and monitors them for behavior that corresponds to known security threats. When suspicious behavior is detected, an event is raised to alert the network security team.

Several factors drove the network security team's decision to implement this type of NIDS solution. These factors include:

  • Monitoring for unmanaged clients. Because NIDS operates at the network level, it enables the network security team to monitor traffic that originates from unmanaged clients.

  • Difficulty in enforcing proxy client installation. Microsoft has an extremely open network. Although corporate policy mandates that the ISA Server proxy/firewall client is installed and enabled on every client on the corporate network, adherence to this policy cannot always be guaranteed. It is impossible to ensure that the ISA Server proxy client or a monitoring agent runs on every client.

  • No performance impact. Because network-based intrusion detection is passive, it has no impact on the performance of the network.

  • No baseline available for network activity. Many organizations use an intrusion prevention solution with active blocking to help protect their networks. Although this is often a valid approach, it requires a network activity baseline. Activity on the Microsoft internal network is unpredictable and often volatile. The lack of any network activity baseline rules out many active blocking solutions.

  • Large number of protocols in use. The Microsoft network carries traffic in a very large number of protocols. These include new and emerging protocols associated with products that are still under development. Intrusion prevention requires deep protocol analysis to function. Although a large number of protocols does not rule out intrusion prevention in principle, it is an important consideration at Microsoft. The NIDS implementation at Microsoft recognizes and parses more than 100 protocols natively.

Although the network security team has chosen to implement NIDS for monitoring the Microsoft network, this choice does not invalidate a host-based intrusion detection system (HIDS). The team has determined that HIDS may provide improved monitoring depth for selected assets.

Audit Collection Services

Corporate policy violations represent a serious class of threats that the network security team must monitor and respond to. Violations of corporate policy can include unauthorized leaks of confidential information, unauthorized access or transmission of intellectual property, and attempts to gain unauthorized access to corporate resources. To facilitate monitoring for compliance with corporate policy, the network security team uses ACS.

ACS collects, normalizes, and stores security events, providing both real-time and after-the-fact analysis of event data. Microsoft uses ACS to monitor the activity of users who have certain rights, particularly where that activity involves granting or changing access to corporate resources. Among the security events audited are creation of user accounts, changes in group membership, and changes to security policies.

ACS monitors systems via an agent installed on the client. The agent forwards events to a server that acts as an event collector. Rules on the server can raise an alert based on the events detected. In addition, events are logged to a SQL Server database for storage. This data is also periodically offloaded to a data warehouse for analysis and reporting. ACS logs are also provided to the correlation system for correlation with other security information.
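The collector's role can be pictured as storing every forwarded event and checking each one against alert rules. The Python sketch below illustrates that flow under stated assumptions: the event fields, the event kind names, and the single rule are hypothetical and are not the ACS data model or rule syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityEvent:
    kind: str    # for example "user_created" or "group_membership_changed"
    actor: str   # account that performed the action
    target: str  # account, group, or policy that was changed


# Assumed list of groups whose membership changes should raise an alert.
PRIVILEGED_GROUPS = {"Domain Admins"}


def evaluate(event):
    """Hypothetical collector rule: alert on changes to privileged groups."""
    if event.kind == "group_membership_changed" and event.target in PRIVILEGED_GROUPS:
        return f"ALERT: {event.actor} changed membership of {event.target}"
    return None


def collect(events):
    """Store every event and return any alerts that the rule raises."""
    stored, alerts = [], []
    for event in events:
        stored.append(event)  # stands in for the database write
        message = evaluate(event)
        if message is not None:
            alerts.append(message)
    return stored, alerts


stored, alerts = collect([
    SecurityEvent("user_created", "admin1", "newuser"),
    SecurityEvent("group_membership_changed", "admin1", "Domain Admins"),
])
```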

The ACS logs contain important information that relates to security events on the Microsoft network. These events include changes made to user rights as well as how rights are used. This information can be used to detect when rights are used or changed in a way that violates corporate policy. In the event of a policy violation, these records can also help provide an audit trail to show how the violation occurred.

In addition to providing information for reporting and analysis, historical security event logs can be important to complying with government regulations. Many governments require the retention of security event records for a specific period of time.

Microsoft chose to use ACS to monitor policy compliance for several reasons. One of the primary reasons was the size of the Microsoft environment. The network security team monitors more than 250 million security events every day worldwide by using ACS. ACS provides an auditing solution capable of scaling to the extremely high volume of events that the Microsoft environment generates. In addition, ACS provides Microsoft with the ability to centralize collection and analysis of audit data, greatly facilitating analysis and monitoring.

Preparing for Security Vulnerabilities

An important part of the network security team's responsibilities is addressing security vulnerabilities when those vulnerabilities are discovered. Whenever an update is created to address a security vulnerability, attacks that seek to exploit that vulnerability can begin to appear quickly. Responding to new vulnerabilities before exploits begin to emerge is a key piece of any architecture for detection and response.

When a new update is announced for Microsoft software, the network security team carries out the following step-by-step response to vulnerabilities that the update addresses:

  1. The network security team researches the new update by using Microsoft TechNet and determines exactly what the vulnerability is. The team assesses the risk that this vulnerability entails in the Microsoft environment according to the number of users and systems affected. If the risk is significant, the team continues to the next step in the process.

  2. The network security team creates a profile for any threat that seeks to exploit that vulnerability. This profile typically consists of a sequence of actions that can be detected through the correlation system, including ports used and error messages generated when the vulnerability is exploited.

  3. A correlation rule is created in the correlation system to detect this exploit, and it is added to the signature of the correlation system.

  4. Because a new and emerging exploit is always a high-priority situation, the correlation system is configured to page a support engineer whenever an exploit is detected until the attack surface for this vulnerability is reduced to a degree where the risk is no longer urgent.

  5. When an exploit is detected, the network security team attempts to isolate the exploiting code as quickly as possible.

  6. As an outbreak evolves, the support team examines the information collected during the outbreak. This information is used to create a script that can scan for and identify infected clients at the proxy/firewall servers.

  7. This script can be run to periodically detect infected computers and block them from spreading the infection.

This system provides for very rapid response to new vulnerabilities.
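Steps 2 through 4 can be illustrated with a small sketch. Here a threat profile is modeled as an ordered sequence of (port, error message fragment) pairs, and a client matches when its events contain those steps in order. The Python code below is an illustration under those assumptions; the port numbers, message text, and function names are hypothetical, and the rule format of the correlation system is not described in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    client_ip: str
    port: int
    message: str


# Hypothetical profile: ordered steps observed when the exploit runs.
PROFILE = [(445, "access violation"), (4444, "connection opened")]


def matches_profile(events, profile):
    """True if one client's time-ordered events contain the profile steps in order."""
    if not profile:
        return False
    step = 0
    for event in events:
        port, text = profile[step]
        if event.port == port and text in event.message.lower():
            step += 1
            if step == len(profile):
                return True
    return False


def clients_to_page(events, profile):
    """Group events by client and return the clients that match the profile."""
    by_client = {}
    for event in events:
        by_client.setdefault(event.client_ip, []).append(event)
    return sorted(ip for ip, evs in by_client.items() if matches_profile(evs, profile))


events = [
    Event("10.0.0.1", 445, "Access violation in service"),
    Event("10.0.0.2", 445, "Access violation in service"),
    Event("10.0.0.3", 4444, "Connection opened"),
    Event("10.0.0.1", 4444, "Connection opened"),
    Event("10.0.0.3", 445, "Access violation in service"),
]
flagged = clients_to_page(events, PROFILE)
```

Requiring the full sequence, rather than any single step, is one way to keep a paging rule from firing on ordinary traffic that happens to use the same port.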

Incident Response

When an incident arises that requires a response, the network security team has several options available. Some incidents can be resolved relatively simply, often by blocking a problem client and contacting the owner of the client. Other incidents are more serious and can involve a response by multiple departments and possible legal action.

When an incident occurs, the first priority of the network security team is to contain the problem. For example, in the case of a network worm outbreak, the response may be to create a script that scans for infected clients and adds those clients to the list of bad clients. This response effectively removes those clients from the network and minimizes the outbreak. The network security team may also attempt to contact the owners of infected clients and inform them of the problem.

Microsoft has a sophisticated incident response plan (IRP) that has evolved over the years, keeping pace as threats emerge and change. This plan uses internal best practices on roles, processes, procedures, and communications for a coordinated response to significant network events.

The network security team has engaged users worldwide with training in the IRP. This training prepares vendors, partners, and Microsoft staff to act as a first line of defense for the network. As a result, the network security team has assembled a virtual team of 90,000 users who work together in watching devices, servers, and the network for signs of compromise. The network security team also invests in localized training to ensure that the virtual team understands how to participate in security and response anywhere and in any language.

Lessons Learned

Implementing and maintaining an event monitoring and response system is a complex task that requires careful attention to the unique characteristics of the environment and the requirements of the individual company. In implementing its event monitoring and response system, the Microsoft network security team has learned a number of valuable lessons. The following sections describe some of the lessons learned.

Assigning Resources

To achieve the goal of an effective resource monitoring and response system, executives and managers must make the assignment of resources to this task a priority. They must ensure that the training, tools, and personnel are available to provide security coverage of assets 24 hours a day, seven days a week. This coverage is a prerequisite for monitoring resources efficiently and responsibly.

Training

To provide the highest level of protection for assets, every user on the network must have the skills necessary to be a virtual member of a response team. To achieve this goal, the network security team invests in worldwide training of users. This training has resulted in a global virtual team of users who partner with the network security team in detection and response.

Maintaining the System

Tuning the intrusion detection system is an ongoing process that requires active attention. Attacks evolve extremely quickly. Threats change their patterns, names, and vectors. New threats appear and must be addressed rapidly. Through careful authoring of correlation rules, the network security team has been able to identify intrusions with a high level of confidence. Correlation rules must be refined over time as threats are better understood, reducing the proportion of false positive results.

Planning to Implement Audit Collection Services

Microsoft uses ACS to monitor for corporate policy violations and to provide a consolidated audit trail for security events. Planning an ACS implementation requires important decisions regarding the information that will be logged and where and how long the information will be stored.

Different departments will likely have different auditing priorities. For example, the legal department may want the highest level of auditing possible, but the IT department may see that this will cause a prohibitive drain on resources. The correct balance will be a function of the individual environment and the organization's needs.

The depth of auditing that an organization implements has both practical and security costs. Auditing requires servers and storage. The amount of storage required can quickly grow quite large depending on the size of the environment, the number of events audited, and the amount of time the record is kept. Both the network operations team and the security team at the organization need to be involved in making these decisions.
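A rough storage estimate multiplies the three factors named above. The Python sketch below uses the figure of 250 million events per day cited earlier in this paper; the average event size of 500 bytes and the retention period of 365 days are assumptions chosen only to show the arithmetic, not measurements from the Microsoft environment.

```python
def audit_storage_bytes(events_per_day, avg_event_bytes, retention_days):
    """Estimate raw storage as events per day x bytes per event x days retained."""
    return events_per_day * avg_event_bytes * retention_days


# 250 million events per day is from this paper; the other two values are assumed.
total_bytes = audit_storage_bytes(250_000_000, 500, 365)
terabytes = total_bytes / 10**12  # decimal terabytes
```

Under these assumptions the estimate is about 45.6 terabytes before indexes, compression, or backups are considered, which shows how quickly auditing depth and retention drive storage cost.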

Conclusion

The cost of security breaches can be high and can include disruption of productivity, loss of intellectual property, and compromise of confidential data, in addition to time and resources dedicated to responding to the breaches. To help protect Microsoft corporate resources from security threats, the network security team at Microsoft employs a proactive monitoring strategy that includes multiple monitoring points and data feeds. Each component of the event monitoring infrastructure plays a particular role in helping to protect the Microsoft network.

The network security team implements a detection-in-depth approach to information security. It monitors multiple points on the network by using the following activities and tools:

  • Monitoring of traffic at the ISA Server proxy/firewall servers to detect and block badly behaving clients

  • Network-based intrusion detection to examine packets at the network level and detect patterns of behavior that may correspond to security threats

  • A custom client agent monitoring tool that examines proxy logs and detects activity that originates from known malicious, forbidden, or suspicious executable files

  • ACS to monitor security events for possible breaches of corporate policy

The heart of the event monitoring system at Microsoft is the correlation system. Correlation aggregates information feeds from each monitoring point into a single location. This aggregation of feeds enables one technology to correlate threats across all feeds. To implement a monitoring solution without correlation would typically require a separate solution for each monitoring point—an approach that is likely to be cumbersome and prohibitively expensive.
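The value of aggregating feeds in one place can be shown with a small sketch: once events from every monitoring point are keyed by client, a single pass can find clients that several independent feeds have reported. The Python code below is an illustration only; the feed names mirror the monitoring points listed above, while the threshold of two feeds and the sample addresses are assumptions, not the logic of the third-party correlation system.

```python
from collections import defaultdict


def correlate(feeds, min_feeds=2):
    """Return clients reported by at least min_feeds distinct feeds.

    feeds maps a feed name to an iterable of client identifiers.
    The result maps each qualifying client to the sorted feed names.
    """
    seen = defaultdict(set)
    for feed_name, clients in feeds.items():
        for client in clients:
            seen[client].add(feed_name)
    return {
        client: sorted(names)
        for client, names in seen.items()
        if len(names) >= min_feeds
    }


# Illustrative data only.
feeds = {
    "proxy": ["10.0.0.5", "10.0.0.7"],
    "nids": ["10.0.0.5"],
    "client_agent": ["10.0.0.5", "10.0.0.8"],
    "acs": [],
}
suspects = correlate(feeds)
```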

The correlation system also enables monitoring to be highly automated. This automation is essential, because the monitoring system generates millions of events every day. Automation enables the meaningful information to be found within this mass of data.

For More Information

For more information about Microsoft products or services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information through the World Wide Web, go to:

http://www.microsoft.com

http://www.microsoft.com/technet/itshowcase

© 2009 Microsoft Corporation. All rights reserved.