Event Monitoring and Response on the Microsoft Network
Technical White Paper
Published: June 30, 2006
Situation
The network security team at Microsoft is responsible for monitoring a huge network
for security threats, including hacker attacks, malicious software, unauthorized
software, and violations of corporate policy. Because of the sheer scope of its
responsibilities, the network security team also faces the challenge of processing
a huge amount of data. Finding useful information in this volume is a serious challenge
that must be overcome in order to maintain the security of critical assets.
Solution
The network security team has implemented a detection-in-depth strategy that aggregates
information from multiple event feeds. These feeds include the proxy/firewall servers,
a network-based intrusion detection system, a custom client agent monitoring tool,
and a service that monitors compliance with corporate policy. These feeds are aggregated
by a third-party correlation system that identifies possible threat patterns. The
system automatically scans feeds for behaviors that correspond to known vulnerabilities.
Benefits
- This solution enables Microsoft to monitor its large, organic network and
detect security threats with a high degree of confidence.
- This solution enables the network security team to identify and respond to
threats very rapidly.
- By automating collection and evaluation of monitoring data, the network security
team can monitor a huge volume of information effectively and efficiently.
Products & Technologies
- Microsoft Internet Security and Acceleration Server 2004
Executive Summary
Microsoft has one of the largest experimental computer networks in the world. The
network is highly organic, with business functions such as accounting, finance,
and human resources on the same worldwide network as sales and marketing, product
development, and testing. A huge variety of activities and applications coexist
in the same environment, including development environments, testing environments,
and more than 600 line-of-business applications. The network security team at Microsoft
faces the challenge of monitoring this environment to detect, in near real time,
events that disrupt or compromise network resources.
Planning a security strategy for this environment, or for any environment, requires
careful attention to the assets to be monitored, the relative importance of these
assets, and the relationship between infrastructure requirements and corporate policies.
The security strategy must also ensure that there are tools, processes, and qualified
personnel with specialized skills to perform the monitoring.
The Microsoft network security team monitors the network at multiple points as part
of a detection-in-depth strategy to provide security to the network and provide
information protection to digital assets.
The process of event monitoring and response is highly automated, enabling the network
security team to capture, analyze, and respond to a huge volume of event data. The
team uses the following applications, tools, and features to monitor the event data:
- Microsoft® Internet Security and Acceleration (ISA) Server 2004 and a set of
custom scripts detect clients that are passing unauthorized traffic over the network.
This traffic is blocked at the firewall, and the client is prevented from communicating
over the network.
- The team uses a custom client agent monitoring tool to examine ISA Server proxy
server logs and to identify and categorize network traffic by the executable file
sending the traffic. This approach enables traffic associated with malicious software
(malware) to be identified quickly, in addition to providing insight into policy
violations and suspicious data transfers.
- The team uses a network-based intrusion detection system (NIDS) to monitor internal
networks and raise an alert when suspicious activity is detected.
- The team uses Audit Collection Services (ACS) to monitor security events and to
detect possible violations of corporate policy. Audit collection will be generally
available in the next version of Microsoft Operations Manager, Microsoft System
Center Operations Manager 2007.
- Monitoring events from antivirus software helps the team detect patterns that
indicate direct malicious attacks, including keyloggers, remote control agents,
and rootkits.
- The team uses a third-party correlation system to aggregate multiple information
feeds, including ISA Server proxy server logs, NIDS, client agent monitoring, and
ACS. This aggregated data is then automatically correlated against behavior patterns
that correspond to known threats. This correlation enables the team to detect security
threats with a high degree of confidence.
Responding to security vulnerabilities when they are discovered is an important
part of the network security team's job. Whenever a new software update is announced,
the team identifies any vulnerabilities that the update addresses. The team then
implements a process to monitor and mitigate any exploits that attempt to take advantage
of these vulnerabilities.
Introduction
The Microsoft enterprise is large, complex, and constantly changing, with more than
120,000 users and 300,000 devices at 400 sites worldwide. The size and diversity
of the computing environment create unique challenges to network security. The Microsoft
network security team is responsible for monitoring and responding to threats initiated
from sources inside and outside the corporate firewall, including malicious software
and violations of corporate policy. When resources are compromised, the results
can include lost productivity, privacy breaches, theft of intellectual property,
and insertion and modification of digital assets, in addition to legal costs and
damaged credibility.
In the past, Microsoft took a reactive approach to network security. The primary
activities of the network security team were responding to intrusions and carrying
out investigations when intrusions occurred. The team spent a high proportion of
its time responding to security threats. The team recognized that if it developed
a more proactive threat detection strategy, it would be able to better prepare for
security threats and respond to them more quickly.
To achieve this goal, the network security team is constantly assessing risks and
threats, and monitoring changes to the network infrastructure, product vulnerabilities,
and emerging attack vectors. This work enables the team to create strategies and
plans for detection of threats and protection of the network. The team successfully
validated this approach for the Microsoft network, put NIDS into production, and
deployed an event correlation engine.
This system enables the network security team to respond to security threats in
near real time. This paper describes the tools and processes used to monitor security
events on the Microsoft network and provides some of the lessons learned related
to this effort.
Planning
As its network security strategy has evolved, Microsoft has gained valuable experience
that can help other organizations plan their network security strategies. A successful
monitoring and response strategy requires careful planning to meet the needs of
the organization that it serves. The monitoring system and response processes should
be aligned not only with infrastructure requirements, but also with the organization's
policies and legal requirements.
A necessary precursor to effective event monitoring and response is to reduce the
attack surface as much as possible. An organization should ensure that computers
receive all available security updates and are properly configured to maintain a
level of security that the organization considers reasonable. The organization should
implement some form of antivirus and antispyware solution to protect against attacks.
Processes and personnel should be in place to ensure that updates and virus signatures
are kept up to date and that IT-mandated computer configurations are maintained.
This strategy provides a good basis from which to evolve a solution for event monitoring
and response.
Keeping the environment up to date and correctly configured does not by itself
guarantee security. For example, a well-maintained environment does not necessarily
provide protection against new and evolving threats, such as worms delivered through
instant messaging, or against insider threats. An organization should take these types
of threats into account throughout the planning process.
Each environment is different. However, based on the network security team's experience,
the following planning process helps address the priorities for planning a successful
strategy for monitoring and response:
- Identify assets.
- Determine how to protect assets.
- Review legal policies and requirements.
- Prioritize assets.
- Determine who will monitor the assets.
- Establish a response framework.
The following sections describe these areas in detail.
Identify Assets
First, an organization should determine and inventory the assets that must be protected.
Typically, these assets include servers, clients, and corporate information, including
confidential records and intellectual property. The organization should determine
where these assets are located and what their current security situation is.
This stage can be very informative. In some cases, the organization may determine
that assets are too disaggregated to be protected properly, in which case some reorganization
of assets may be required. This stage may also identify good candidates for server
consolidation, which can both reduce costs and simplify security requirements.
Microsoft Information Technology (Microsoft IT) has an internal initiative called
Least Privilege Access (LPA) to provide ongoing guidance to business units and product
groups to maintain the security of assets. LPA mandates that employees, vendors,
and partners have access only to the resources that they need to do their work,
and no more.
Determine How to Protect Assets
After assets have been identified, the organization must determine how those assets
will be protected. Assets should be classified according to the potential business
impact of a threat against them. This classification can be used to determine who
should be able to access those assets and what level of security protection they
require.
Review Legal Policies and Requirements
Reviewing the legal requirements for the monitoring system is an essential step
in determining a monitoring architecture. Privacy and telecommunications laws are
placing stricter requirements on how organizations monitor and store security event
data and other information. Regulatory compliance is an important part of a corporate
policy for event monitoring, and it is only becoming more important as regulation
increases. Corporate policies for record retention, security event auditing, and
monitoring of corporate policy violations all affect an organization's monitoring
infrastructure and are closely tied to legal needs. International organizations
need to take special heed of differences in regulations between countries. In some
cases, it may be necessary to provide separate monitoring for locations that have
different monitoring requirements due to differing privacy laws.
Prioritize Assets
To create an effective monitoring architecture, the organization must determine
its priorities at the corporate level regarding the importance of the assets.
Resources allocated to processing and storing security event data must be able to
scale with the size and growth of the network, with the data streams being captured,
and with any unusual traffic that events and users generate. These requirements
have to be balanced for the monitoring architecture to be effective. Some of the
considerations to be determined at this stage are:
- Asset importance. An organization's most critical assets require the highest
level of protection. Conversely, a relatively low level of protection may be acceptable
for some assets in cases where low-level systems cannot interact with high-value
assets.
- Scalability. An organization's monitoring system must be capable of growing
as its network grows. The systems used to store monitoring and audit data must also
be scalable.
- Data volume. Monitoring can produce a high volume of data. This data must
be managed so that monitoring staff can make effective use of it and respond to
events in a timely manner.
- Data storage. Determining what data will be stored and for how long determines
the amount of data storage infrastructure required.
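The storage consideration comes down to simple arithmetic. Retained volume is roughly the daily data volume multiplied by the retention period, plus any overhead that the storage system adds. The following Python sketch makes that explicit. The function name, the 90-day retention period, and the overhead factor are illustrative assumptions, not Microsoft figures. The 20 GB per day input is the reduced proxy-log volume that this paper reports for the client agent monitoring tool.

```python
def required_storage_gb(daily_volume_gb: float, retention_days: int,
                        overhead_factor: float = 1.0) -> float:
    """Estimate storage needed to retain monitoring data.

    overhead_factor is a hypothetical multiplier for indexes, replicas, and
    headroom; 1.0 means raw data only.
    """
    if daily_volume_gb < 0 or retention_days < 0 or overhead_factor < 1.0:
        raise ValueError("volume and retention must be non-negative; overhead >= 1.0")
    return daily_volume_gb * retention_days * overhead_factor


# Example: 20 GB/day kept for an assumed 90 days with an assumed 25% headroom.
print(required_storage_gb(20, 90, 1.25))  # prints 2250.0
```

Doubling either the daily volume or the retention period doubles the requirement, so decisions about what to store and decisions about how long to keep it carry equal weight.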
Determine Who Will Monitor the Assets
The organization should determine who will be responsible for monitoring assets.
This effort includes determining the number and skill level of personnel required.
It is important to ensure freedom from conflict of interest when creating a network
security function. The team must be independent of the business units and digital
assets that it monitors to ensure objectivity and accountability. A successful network
monitoring team will consist of highly trained and experienced staff.
Team members should have deep knowledge of emerging threats in the areas of hacker
methodology, malware distribution and behavior, and security vulnerabilities commonly
exploited to compromise systems. Steep learning curves mean that it may take time
to bring new staff up to full productivity.
Establish a Response Framework
Finally, the organization needs to develop a response framework. This framework
is a set of defined roles, processes, and procedures for determining the appropriate
response to an incident. This framework varies from organization to organization
but typically includes:
- An incident response plan.
- Contact information and details for personnel to be notified, and criteria for that
notification in case of an incident.
- Procedures to help assess, contain, and remediate security incidents.
Monitoring Goals at Microsoft
The goal of the network security team at Microsoft is to protect the integrity and
security of key assets, as well as to maintain the productivity of users by protecting
systems from disruption.
Key assets at Microsoft include, but are not limited to, host devices, client devices,
and servers. Digital assets include classifications like personally identifiable
information, intellectual property, and confidential data. Monitoring goals provide
context for specific objectives and elements of the detection strategy, response
roles, and processes.
To achieve its mission, the network security team monitors Microsoft assets for
many classes of threats, including:
- The presence of malicious software, including viruses, Trojan horses, and worms.
- Denial of service (DoS) attacks in progress, including those caused by malicious
software and accidental DoS attacks caused by misconfigured software or hardware.
- Hacker intrusions and attempted hacker intrusions.
- Unauthorized transmission and theft of intellectual property.
- Accidental or purposeful leaks of information.
- Violations of corporate policy, including mishandling intellectual property, violating
security policy, and browsing unauthorized Web sites.
In addition, the network security team actively monitors the network for unusual
behavior and for traffic related to new applications that have not been encountered
before. Investigating and researching unknown executable files or unusual network
behavior is an important part of the network security team's job. By investigating
these unknowns, the team gains an understanding of the agent, determines the risk
and threat that the executable file represents, and decides on the appropriate
treatment upon detection. Anomalies usually have benign causes, but in some cases,
they may indicate a new type of malicious code.
Microsoft Environment
No single architecture for event monitoring and response is going to be right for
every organization. An event monitoring solution must be designed to meet the specific
needs of the environment where it is implemented. The Microsoft environment is highly
organic and includes tens of thousands of employees, servers, and clients. Development,
testing, and production servers all run in the same environment. Monitoring this
environment effectively is a significant challenge. Several unique characteristics
of the Microsoft environment drive the monitoring strategy that the network security
team uses, including the following:
- The size of the user population and the high proportion of knowledgeable users
- The presence of a large developer population with many applications under development
and active testing environments
- A wide range of assets to be protected
- The presence of both managed and unmanaged clients on the network
User Population
Microsoft has a very large, knowledgeable, global user population, with more than
120,000 users using the network on a regular basis. The bulk of these users access
the Internet every day.
Developer Population
Microsoft has a large, active developer population. This fact creates additional
challenges, because a number of experimental, beta, and custom applications appear
on the network. Developers also work on diverse platforms, including both Microsoft
and non-Microsoft operating systems. Applications under development, test, and research
may generate unusual network traffic, or they may trigger security events that suggest
or appear as a threat. The presence of many labs and testing environments, including
labs where stress testing occurs, creates network traffic that can be difficult
to distinguish from traffic related to viruses or worms.
Company Assets
Microsoft must protect a wide variety of assets. Critical assets, such as personally
identifiable information, intellectual property, and confidential data, require
careful security and maintenance. The importance and sensitivity of servers can
vary greatly from server to server, even within the same server role. Microsoft
also runs a large number of line-of-business applications on its network that handle
sensitive and confidential employee information. This mix of assets requires
sophisticated monitoring to ensure that all of them are kept as secure as possible.
Managed and Unmanaged Clients
The Microsoft environment includes both managed and unmanaged clients. For purposes
of event monitoring and response, an unmanaged client is a device or host that Microsoft
IT does not manage. An unmanaged device is not domain joined and does not have elevated
security credentials present to allow scanning and access to the device for investigative
processes. Neither antivirus software with real-time monitoring enabled nor the
Microsoft Systems Management Server (SMS) agent is present. The network security
team cannot determine what software is running on an unmanaged client, nor can it
enforce a security configuration that is consistent with security policies.
The presence of unmanaged clients is a major reason that Microsoft requires multiple
levels of monitoring. The network security team needs to be able to detect and respond
to threats and violations that originate from unmanaged computers. The team uses
network-based intrusion detection to help detect security threats that originate
from these unmanaged clients.
Monitoring Process
The network security team actively attempts to detect threats and prevent incidents.
This practice enables the team to apply its resources much more efficiently, to
respond in near real time, and to mitigate risk during the event.
At Microsoft, event monitoring occurs at the proxy/firewall servers as well as the
internal network. Information from these sources is combined into a unified
data format that is monitored to detect attack patterns.
Microsoft uses the following applications, tools, and features to monitor the event
data:
- Event correlation system. The network security team uses a third-party correlation
system to collect and parse information from the ISA Server-based servers, client
agent monitoring tool, NIDS, antivirus software, and ACS. This information is aggregated
into a unified data format and correlated with known security vulnerabilities to detect
possible security threats.
- ISA Server. Microsoft uses several arrays of proxy/firewall servers running
ISA Server to help protect its network and manage traffic between the network and
the Internet. Microsoft uses the built-in features of ISA Server, as well as some
custom tools, to block badly behaving clients from communicating through the firewall
servers.
- Client agent monitoring tool. A custom tool called the client agent monitoring
tool collects ISA Server proxy logs. This tool parses the logs and identifies
and categorizes network traffic according to the name of the executable file that
created the traffic. It also records the ports used to communicate and the size
of the traffic transmitted. This tool enables the identification of traffic from
known malicious software.
- NIDS. The network security team uses NIDS to monitor network activity within
the corporate network. NIDS examines packets passed on the internal network to determine
whether clients are carrying out any activities that correspond to known security
threats.
- ACS. ACS collects Microsoft Windows® Security Audit logs that relate
to account administration, authorization, and access to corporate resources so that
these events can be monitored for unauthorized activity.
Event Correlation System
The network security team monitors a huge volume of information from multiple sources
to help protect network resources at Microsoft. Although many security threats can
be detected at the firewall, at the network level through NIDS, or at the client
level through the client agent monitoring tool, these tools in isolation do not
provide sufficient protection for the Microsoft network.
Many security threats can be successfully detected only by comparing information
from multiple sources and correlating it to identify patterns of behavior that represent
security threats. This task is too large to be done manually, particularly in an
environment as large as the Microsoft environment. This is one of the reasons that
Microsoft uses a third-party correlation system to automate correlation of multiple
feeds and detection of security threats.
The correlation system acts as a single point of aggregation for multiple information
feeds related to security monitoring. The correlation system unifies these feeds
and places them into a single format for easier analysis, reporting, and threat
correlation.
The correlation system enables collection and analysis of monitoring information
to be centralized and automated. Achieving a comparable level of monitoring without
the correlation system would typically require some sort of solution at each monitoring
point. This approach would likely be prohibitively expensive to implement and maintain.
The feeds that the event correlation system aggregates include:
- ACS. ACS logs provide information about security activities such as creation
of user accounts and granting of permissions. This information is used primarily
to detect violations of corporate policy.
- Firewall (ISA Server-based) servers. Proxy server logs provide information
about traffic through the proxy servers.
- Client agent monitoring. The logs produced by the client agent monitoring tool
provide information about network activity associated with the executable file
sending or receiving the information and the category of application that it
represents.
- Antivirus logs. Logs from the antivirus system provide information about
virus infections and about the current state of antivirus software installed on
clients on the network.
- NIDS. The NIDS sensor network collects intrusion detection alerts. This network
is deployed across the Microsoft environment in strategic locations, such as digital
asset concentration areas or traffic aggregation points. The alerts reflect the
presence of malicious traffic on the network or evidence of unusual behavior indicative
of possible compromise.
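Here is a minimal sketch of the "single format" step, using two of the feeds above. Each feed gets a small normalizer that maps its native fields onto one shared record type, so that later correlation rules can treat all sources alike. The `Event` fields, the raw field names (`time`, `client_ip`, `executable`, `dest_ip`, `ts`, `src`, `signature`), and the action labels are hypothetical. They do not describe the third-party product's schema or the real log formats.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical unified record that every feed is mapped onto."""
    timestamp: datetime
    source: str      # feed name, such as "proxy" or "nids"
    client_ip: str
    action: str      # normalized action label used by correlation rules
    detail: str


def from_proxy(record: dict) -> Event:
    # Assumed proxy fields: epoch seconds, client IP, executable, destination IP.
    return Event(
        timestamp=datetime.fromtimestamp(record["time"], tz=timezone.utc),
        source="proxy",
        client_ip=record["client_ip"],
        action="outbound_connection",
        detail=f'{record["executable"]} -> {record["dest_ip"]}',
    )


def from_nids(alert: dict) -> Event:
    # Assumed NIDS fields: ISO 8601 timestamp with offset, source IP, signature name.
    return Event(
        timestamp=datetime.fromisoformat(alert["ts"]),
        source="nids",
        client_ip=alert["src"],
        action="ids_alert",
        detail=alert["signature"],
    )


NORMALIZERS = {"proxy": from_proxy, "nids": from_nids}


def normalize(feed: str, raw: dict) -> Event:
    """Dispatch a raw record to the normalizer registered for its feed."""
    return NORMALIZERS[feed](raw)


# Usage: merge records from two feeds into one time-ordered stream.
stream = sorted(
    [
        normalize("nids", {"ts": "2006-06-30T12:00:05+00:00",
                           "src": "10.0.0.7", "signature": "port scan"}),
        normalize("proxy", {"time": 1151668800, "client_ip": "10.0.0.7",
                            "executable": "example.exe", "dest_ip": "192.0.2.10"}),
    ],
    key=lambda e: e.timestamp,
)
```

Once every feed is normalized this way, events from different sources share one timeline and one vocabulary of actions. A correlation rule needs exactly that as input. Adding another feed, such as ACS or antivirus logs, means writing one more normalizer rather than changing the rules.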
The correlation system processes a huge number of events, more than 2 million every
day. To enable the network security team to make sense of this high volume of information,
the correlation system carries out automated analysis of the logs. The correlation
system uses correlation rules that the information security team created to detect
attack patterns that exploit known vulnerabilities. When an attack pattern is detected,
an alert is raised for the network security team.
An attack typically has a known pattern of events that are executed in sequence.
For example, a worm that attacks a particular vulnerability may use a file transfer
request to download malicious code, and then conduct a port scan to find additional
vulnerable computers, and finally attempt to connect to a vulnerable port. This
kind of behavior may be very difficult to see when the elements are examined in
isolation. For example, a port scan is not itself proof of an attack. So rather
than examine events in isolation, the correlation system checks across all its monitored
feeds for patterns that correspond to known threats. In this example, when the correlation
system determines that these events executed from one client in sequence, it can
identify a threat and generate an alert with a high degree of certainty.
Malicious software evolves very quickly. Names of executable files change as threats
are updated or as new threats appear. One of the great advantages of the correlation
system is that threats can be correlated to attack patterns that correspond to a
vulnerability, rather than to specific units of malicious software. This correlation
provides a high probability of catching future attempts to exploit that vulnerability,
even when those attempts originate from a new piece of malicious software.
ISA Server
Microsoft uses ISA Server on dedicated proxy/firewall arrays to monitor and control
traffic across the network and to the Internet. Microsoft maintains 15 proxy arrays
located in 12 hub sites worldwide. These hubs serve more than 400 Microsoft sites
in total.
Microsoft logs traffic that passes through the proxy servers. These logs include
a wealth of important information about this traffic, including the originating
IP address, the destination IP address, and the name of the executable file that initiated the communication.
If the client that is sending the traffic is fully managed, additional information
can be gathered, including the user name of the person logged on to the client and
the name of the client. These logs are stored centrally and consumed by the client
agent monitoring tool and the correlation system.
Detection and Isolation of Badly Behaving Clients
One of the key challenges that the network security team faces is lead generation,
identification, and isolation of badly behaving clients that are sending unauthorized
traffic over the network and to the Internet. These clients may be infected by malicious
software that is probing the network for additional vulnerabilities, or they may
be misconfigured.
The network security team has implemented several measures to address badly behaving
clients proactively through ISA Server. Microsoft uses a mix of ISA Server features
and custom scripts to identify these clients, isolate them, and prevent them from
consuming proxy server resources or accessing any other network resources.
Several other features of ISA Server facilitate the prevention and mitigation of
security threats, including the following:
- ISA Server can block a specific port on a specific client from communicating over
the corporate network or the Internet. This capability enables the blocking of traffic
that corresponds to known attacks.
- ISA Server can block a particular executable file, such as a known malicious program,
worm, or peer-to-peer application, from communicating.
- ISA Server can be used to establish session limits to prevent a client from establishing
a huge number of connections in an attempt to overload the proxy server.
- The new firewall client in the latest version of ISA Server can block ports based
on a wildcard port assignment. Because some new attacks create random application
names, this feature enables the blocking of those attacks by closing a range of
ports to all traffic.
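The Python model below shows how a proxy might evaluate these four controls for each new connection. It is a hypothetical sketch, not the ISA Server configuration model or API. The class, its fields, the case-insensitive matching of executable names, and the session limit are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ProxyPolicy:
    """Hypothetical model of the four controls above; not the ISA Server API."""
    blocked_client_ports: set[tuple[str, int]] = field(default_factory=set)
    blocked_executables: set[str] = field(default_factory=set)  # lowercase names
    blocked_port_ranges: list[range] = field(default_factory=list)
    max_sessions_per_client: int = 100  # assumed limit
    active_sessions: dict[str, int] = field(default_factory=dict)

    def allow(self, client_ip: str, port: int, executable: str) -> bool:
        """Return True if a new connection passes every control."""
        if (client_ip, port) in self.blocked_client_ports:
            return False  # specific port on a specific client
        if executable.lower() in self.blocked_executables:
            return False  # known malicious or unwanted program
        if any(port in r for r in self.blocked_port_ranges):
            return False  # range of ports closed to all traffic
        if self.active_sessions.get(client_ip, 0) >= self.max_sessions_per_client:
            return False  # session limit reached
        return True

    def open_session(self, client_ip: str, port: int, executable: str) -> bool:
        """Admit the connection if allowed and count it against the client's limit."""
        if not self.allow(client_ip, port, executable):
            return False
        self.active_sessions[client_ip] = self.active_sessions.get(client_ip, 0) + 1
        return True

    def close_session(self, client_ip: str) -> None:
        """Release one session slot for the client."""
        count = self.active_sessions.get(client_ip, 0)
        if count > 0:
            self.active_sessions[client_ip] = count - 1


policy = ProxyPolicy(
    blocked_client_ports={("10.0.0.5", 445)},
    blocked_executables={"example-worm.exe"},  # hypothetical name
    blocked_port_ranges=[range(6000, 7000)],   # ports 6000 through 6999
    max_sessions_per_client=2,
)
```

Each control is checked independently, so any one of them is enough to refuse a connection. A Python `range` stands in for the wildcard port assignment because membership tests on it are cheap.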
Detection of Infected Clients
Detecting clients that may be infected with malicious code can be challenging. In
an environment that contains traffic as varied and unpredictable as the Microsoft
network, distinguishing malicious traffic from regular traffic is difficult. For
example, network traffic associated with new products under development, or stress
testing of products, can resemble traffic generated by a DoS attack.
The network security team uses custom scripts to proactively detect infected clients
and prevent them from communicating, a process called ratholing. To detect these
badly behaving clients, a script scans the proxy logs at regular intervals. This
script searches for patterns of behavior that correspond to known attacks.
When the script identifies a badly behaving client, its name and IP address are
added to a central database of known bad clients, and a message alerts the client
owner that a problem exists with the client.
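The paper does not publish the patterns that the ratholing script looks for. The Python sketch below therefore substitutes one plausible stand-in rule: a client that contacts more than a threshold number of distinct hosts on the same port during one scan interval, which resembles worm propagation. The rule, the threshold, the log field names, and the in-memory stand-ins for the central database and the owner notification are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Iterable

SCAN_THRESHOLD = 50  # hypothetical: distinct destinations per port per interval


def scan_proxy_log(entries: Iterable[dict], bad_clients: dict,
                   notify_owner: Callable[[str, str], None],
                   threshold: int = SCAN_THRESHOLD) -> list:
    """Flag clients whose entries match one hypothetical attack pattern.

    entries: proxy log rows for one scan interval, with assumed keys
    "client_name", "client_ip", "dest_ip", and "dest_port".
    bad_clients: stand-in for the central database (client IP -> client name).
    """
    targets = defaultdict(set)  # (client_ip, dest_port) -> distinct destinations
    names = {}
    for e in entries:
        targets[(e["client_ip"], e["dest_port"])].add(e["dest_ip"])
        names[e["client_ip"]] = e["client_name"]
    newly_flagged = []
    for (client_ip, _port), dests in targets.items():
        if len(dests) > threshold and client_ip not in bad_clients:
            bad_clients[client_ip] = names[client_ip]  # record name and IP address
            notify_owner(names[client_ip], client_ip)  # tell the owner
            newly_flagged.append(client_ip)
    return newly_flagged


# Usage: one client probes 60 hosts on port 445; another makes a normal request.
log = [{"client_name": "LAB-PC-01", "client_ip": "10.0.0.9",
        "dest_ip": f"10.1.0.{i}", "dest_port": 445} for i in range(60)]
log.append({"client_name": "DESK-02", "client_ip": "10.0.0.10",
            "dest_ip": "192.0.2.1", "dest_port": 80})
bad_clients = {}
messages = []
flagged = scan_proxy_log(log, bad_clients, lambda name, ip: messages.append((name, ip)))
```

Clients that are already listed are skipped, so repeated scans of the same logs do not send duplicate notifications.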
To communicate with a proxy array, a client must complete a three-way handshake
with the proxy server. If the client appears on the list of bad clients, the proxy
server prevents this handshake from occurring. This action effectively prevents
the client from spreading its infection, launching an attack, or consuming proxy
resources.
Every five minutes, every proxy server worldwide requests the list of bad clients
from the central database. It then compares this list to a local list of bad clients
that the proxy server maintains. Any clients that have been added to the central list
since the last check are added to the local list, and any clients that have been removed
from the central list are removed from the local list. This process ensures that every
proxy server on the Microsoft network blocks newly detected badly behaving clients
within five minutes of their discovery.
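The five-minute refresh makes each proxy's local set of bad clients equal to the central set. The proxy then consults the local set before letting a handshake complete. The following Python sketch assumes that both lists are sets of client IP addresses. The `fetch_central` callable is hypothetical and stands in for the real database query.

```python
import time
from typing import Callable, Iterable

SYNC_INTERVAL_SECONDS = 5 * 60  # every five minutes, as described above


def sync_bad_clients(local: set[str], central: set[str]) -> tuple[set[str], set[str]]:
    """Update `local` in place to match `central`; return (added, removed)."""
    added = central - local
    removed = local - central
    local |= added
    local -= removed
    return added, removed


def accept_handshake(client_ip: str, local: set[str]) -> bool:
    """Decide whether a client may complete the three-way handshake."""
    return client_ip not in local


def poll_forever(fetch_central: Callable[[], Iterable[str]], local: set[str]) -> None:
    """Refresh the local list on a fixed interval (fetch_central is hypothetical)."""
    while True:
        sync_bad_clients(local, set(fetch_central()))
        time.sleep(SYNC_INTERVAL_SECONDS)
```

Computing both differences, rather than overwriting the local set, keeps the add and remove steps that the text describes and records what changed in each cycle. A newly listed client can therefore go unblocked for up to roughly one polling interval, which is the basis of the five-minute guarantee.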
Often, the first time that a user learns of a problem with one of his or her clients
is when that client ceases to communicate over the network because it has been blocked
at the proxy servers. For this reason, the Microsoft Helpdesk also has access to
the list of bad clients. When a user calls the Helpdesk to determine why his or
her client has lost connectivity, the Helpdesk can quickly determine whether the
client has been ratholed. This capability enables the Helpdesk to respond more effectively
to user requests related to infected clients. After the problem
has been resolved, the Helpdesk has the ability to remove the client from the list
of bad clients.
Future Directions
Microsoft ISA Server 2006 includes a new feature called Flood Resiliency, which
helps protect proxy servers from traffic that badly behaving clients generate. With
Flood Resiliency, ISA Server 2006 can respond to traffic from a badly behaving client
by temporarily terminating its access and generating an alert. Security staff can
then investigate the incident and take an appropriate response. This feature is
currently in pilot within Microsoft and is expected to replace the custom ratholing
scripts currently in use.
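The behavior described for Flood Resiliency is to temporarily cut off a flooding client and raise an alert. A sliding-window request counter can illustrate that behavior. The Python sketch below shows a generic rate-limiting technique with assumed thresholds. It is not how ISA Server 2006 implements the feature, and the class name, parameters, and alert callback are hypothetical.

```python
from collections import defaultdict, deque
from typing import Callable


class FloodGuard:
    """Hypothetical per-client flood detector with temporary suspension."""

    def __init__(self, max_requests: int, window_s: float, block_s: float,
                 alert: Callable[[str], None]):
        self.max_requests = max_requests  # allowed requests per window
        self.window_s = window_s          # sliding window length in seconds
        self.block_s = block_s            # suspension length in seconds
        self.alert = alert
        self.recent = defaultdict(deque)  # client -> recent request times
        self.blocked_until = {}           # client -> time suspension ends

    def request(self, client: str, now: float) -> bool:
        """Record a request at time `now`; return True if it is allowed."""
        if self.blocked_until.get(client, 0.0) > now:
            return False  # still suspended
        q = self.recent[client]
        while q and now - q[0] >= self.window_s:
            q.popleft()  # drop requests that fell out of the window
        q.append(now)
        if len(q) > self.max_requests:
            self.blocked_until[client] = now + self.block_s
            q.clear()
            self.alert(client)  # let security staff investigate
            return False
        return True


alerts = []
guard = FloodGuard(max_requests=3, window_s=1.0, block_s=10.0, alert=alerts.append)
results = [guard.request("10.0.0.4", t) for t in (0.0, 0.1, 0.2, 0.3, 5.0)]
```

The fourth request inside one second suspends the client and raises a single alert. The client is then refused until the suspension expires, and other clients are unaffected.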
Client Agent Monitoring Tool
The network security team monitors logs from its ISA Server proxy/firewall servers
to proactively identify possible security threats. The volume of information that
these logs produce, however, is far too large to be monitored in its raw form. The
logs must be processed in some way that filters out irrelevant data and makes identifying
the most relevant events easier. To achieve this goal, the network security team
uses a custom tool called the client agent monitoring tool.
Proxy servers at Microsoft produce more than 500 gigabytes (GB) of log data per
day. To handle this huge volume, the client agent monitoring tool uses only the
information that the network security team has determined is most useful, in particular
the names of executable files that are sending information over the network. This
technique trims the amount of data to approximately 20 GB per day.
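The trimming step can be pictured as a projection: read each log row and keep only the executable name and the client's identifying fields. The following Python sketch assumes a hypothetical tab-separated log with a header row that contains columns named `executable`, `client_ip`, and `user`. Real ISA Server log layouts differ, and `normalize` is an illustrative name.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def normalize(log: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one trimmed record per log row, keeping only executable, client IP, and user."""
    for row in csv.DictReader(log, delimiter="\t"):
        exe = (row.get("executable") or "").strip().lower()
        if not exe:
            continue  # nothing to classify without an executable name
        yield {
            "executable": exe,  # lowercased because Windows file names are case-insensitive
            "client_ip": (row.get("client_ip") or "").strip(),
            "user": (row.get("user") or "").strip(),
        }


if __name__ == "__main__":
    sample = (
        "client_ip\tuser\texecutable\turl\tbytes_sent\n"
        "10.1.2.3\talice\tIEXPLORE.EXE\thttp://example.com/\t512\n"
    )
    for record in normalize(io.StringIO(sample)):
        print(record)
```

Discarding columns such as the URL and byte counts is what makes the output so much smaller than the raw logs. In the figures above, roughly 20 GB remain out of more than 500 GB.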
The tool periodically gathers proxy server logs. It normalizes the logs, removing
information that is not relevant to the monitoring process. In addition to the name
of the executable file, the tool collects identifying information from the client,
such as IP address and current user.
The client agent monitoring tool includes a database of known names of executable
files, organized by category. The tool looks up the name of each executable file
logged in this database. The tool can then categorize traffic by executable file
and prioritize the traffic appropriately. The categories of executable files include:
The tool produces four types of alerts that are fed into the correlation system:
-
Informational: for non-critical events that may be logged for future research and
analysis
-
Notification: for non-critical events that may be of interest to the monitoring
team
-
Warning: for events that may indicate a problem
-
Critical: for outbreaks and serious incidents in progress
Depending on the category of the executable file, the tool may raise an alert. For
example, traffic related to known malicious software indicates an infected client
on the network and demands some response.
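The lookup and alert decision described above can be sketched as two tables: one maps executable names to categories, and one maps categories to the four alert levels. In the following Python sketch, the category names, the example executable names, and the mapping from category to alert level are all hypothetical policy choices. They are not the contents of the Microsoft database.

```python
from enum import IntEnum
from typing import Dict, Optional


class Alert(IntEnum):
    """The four alert types the tool feeds to the correlation system."""
    INFORMATIONAL = 1
    NOTIFICATION = 2
    WARNING = 3
    CRITICAL = 4


# Hypothetical category database: executable name -> category.
CATEGORY_DB: Dict[str, str] = {
    "iexplore.exe": "approved",
    "p2pclient.exe": "peer_to_peer",
    "badworm.exe": "malicious",
}

# Hypothetical policy: category -> alert level (None means no alert is raised).
CATEGORY_ALERT: Dict[str, Optional[Alert]] = {
    "approved": None,
    "peer_to_peer": Alert.WARNING,
    "malicious": Alert.CRITICAL,
    "unknown": Alert.NOTIFICATION,
}


def alert_for(executable: str) -> Optional[Alert]:
    """Look up the executable's category and return the alert level, if any."""
    category = CATEGORY_DB.get(executable.lower(), "unknown")
    return CATEGORY_ALERT[category]
```

Keeping the policy in a separate table from the name database means that changing how a whole category is treated does not require touching individual entries.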
When the tool detects suspicious traffic that originates from a client with the
ISA Server proxy client installed, that client is identified along with the currently
logged-on user and the amount of information being transferred. In this case, the
network security team may contact that user or the user's manager to alert him or
her to the problem.
A benefit of this system is that it makes it easy to identify when a new executable
file appears and begins sending traffic on the network. When a new executable file
is encountered, it is marked for further research and classification. The length
of this process can vary. In many cases, Internet research may reveal the identity
of an executable file extremely quickly. In other cases, more research may be required.
After the executable file is identified, it can be classified in the client agent
system. After the executable file is classified, the tool will take the appropriate
action when it encounters the file again.
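The cycle of flagging a new executable, researching it, and classifying it amounts to a catalog with a pending-research queue. The following Python sketch assumes that names are compared without regard to case. The class `ExecutableCatalog` and its methods are hypothetical.

```python
from typing import Dict, Set


class ExecutableCatalog:
    """Known executables plus a queue of names awaiting research (illustrative)."""

    def __init__(self, known: Dict[str, str]):
        self.known = {name.lower(): category for name, category in known.items()}
        self.pending_research: Set[str] = set()

    def observe(self, executable: str) -> str:
        """Return the category for a logged executable, queuing unknown names."""
        name = executable.lower()
        category = self.known.get(name)
        if category is None:
            self.pending_research.add(name)  # first sighting: mark for research
            return "unknown"
        return category

    def classify(self, executable: str, category: str) -> None:
        """Record the outcome of research so later sightings are handled directly."""
        name = executable.lower()
        self.known[name] = category
        self.pending_research.discard(name)
```

For example, after `classify("newtool.exe", "peer_to_peer")`, the next call to `observe("NEWTOOL.EXE")` returns the new category, and the name leaves the research queue.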
This approach does have a drawback. Some malicious software is named after Microsoft
executable files. In this case, the way to discover suspicious traffic is to examine
how the executable file behaves on the network. If this behavior corresponds to a
known vulnerability, the file can be flagged for closer examination.
Because the tool not only identifies traffic by executable file, but also classifies
that executable file, the tool greatly simplifies response actions. For example,
after a piece of software has been identified as a peer-to-peer application, the
response can be formulated as a matter of corporate policy, rather than treating
each peer-to-peer application as an individual case.
The client agent monitoring tool produces a log of traffic classified by executable
file and sends that log to the correlation system. The tool also stores its logs in
a Microsoft SQL Server™ database. This data is then
analyzed over time through a custom-built online analytical processing (OLAP) cube
system. This analysis provides additional insight into the health of the network.
For example, analysis may uncover subtle plans of attack that were missed during
routine monitoring. The database also provides a measure of the performance of the
network security team for comparison against goals.
Network-Based Intrusion Detection System
Microsoft uses NIDS inside the corporate firewall to help protect corporate assets.
NIDS examines packets at the network layer and monitors them for behavior that corresponds
to known security threats. When suspicious behavior is detected, an event is raised
to alert the network security team.
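At its simplest, signature-based detection compares each observed packet against a list of known threat patterns and reports any matches without altering the traffic. The following Python sketch shows only that matching step. A real NIDS also parses protocols and reassembles streams, which this sketch omits. The `Signature` and `Packet` types and the example pattern are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Signature:
    name: str
    dst_port: Optional[int]  # None matches any destination port
    payload: bytes           # byte pattern that must appear in the packet payload


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int
    payload: bytes


def inspect(packet: Packet, signatures: List[Signature]) -> List[str]:
    """Return the names of all signatures the packet matches (passive: nothing is dropped)."""
    return [
        sig.name
        for sig in signatures
        if (sig.dst_port is None or sig.dst_port == packet.dst_port)
        and sig.payload in packet.payload
    ]
```

The function only reports matches, which reflects the passive, monitor-only role described above. Each non-empty result would become an event raised to the network security team.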
Several factors drove the network security team's decision to implement this type
of NIDS solution. These factors include:
-
Monitoring for unmanaged clients. Because NIDS operates at the network level,
it enables the network security team to monitor traffic that originates from unmanaged
clients.
-
Difficulty in enforcing proxy client installation. Microsoft has an extremely
open network. Although corporate policy mandates that the ISA Server proxy/firewall
client is installed and enabled on every client on the corporate network, adherence
to this policy cannot always be guaranteed. It is impossible to ensure that the
ISA Server proxy client or a monitoring agent runs on every client.
-
No performance impact. Because network-based intrusion detection is passive,
it has no impact on the performance of the network.
-
No baseline available for network activity. Many organizations use an intrusion
prevention solution with active blocking to help protect their networks. Although
this is often a valid approach, it requires a network activity baseline. Activity
on the Microsoft internal network is unpredictable and often volatile. The lack
of any network activity baseline rules out many active blocking solutions.
-
Large number of protocols in use. The Microsoft network carries traffic in a
very large number of protocols. These include new and emerging protocols associated
with products that are still under development. Intrusion prevention requires deep
protocol analysis to function. Although a high number of protocols does not rule
out intrusion prevention in principle, it is an important consideration at Microsoft.
The NIDS implementation at Microsoft recognizes and parses more than 100 protocols
natively.
Although the network security team has chosen to implement NIDS for monitoring the
Microsoft network, this choice does not rule out a host-based intrusion detection
system (HIDS). The team has determined that HIDS may provide greater monitoring
depth for selected assets.
Audit Collection Services
Corporate policy violations represent a serious class of threats that the network
security team must monitor and respond to. Violations of corporate policy can include
unauthorized leaks of confidential information, unauthorized access or transmission
of intellectual property, and attempts to gain unauthorized access to corporate
resources. To facilitate monitoring for compliance with corporate policy, the network
security team uses ACS.
ACS collects, normalizes, and stores security events, providing both real-time and
after-the-fact analysis of event data. Microsoft uses ACS to monitor the activity
of users who have certain rights, particularly where that activity involves granting
or changing access to corporate resources. Among the security events audited are
creation of user accounts, changes in group membership, and changes to security
policies.
ACS monitors systems via an agent installed on the client. The agent forwards events
to a server that acts as an event collector. Rules on the server can raise an alert
based on the events detected. In addition, events are logged to a SQL Server database
for storage. This data is also periodically offloaded to a data warehouse for analysis
and reporting. ACS logs are also provided to the correlation system for correlation
with other security information.
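The collector's role can be sketched as a loop over forwarded events: store each event, then test it against the server-side rules. The following Python sketch assumes normalized event types such as `group_membership_changed` rather than real Windows event IDs. It uses a plain list in place of the SQL Server database. The rule shown, which alerts on changes to the built-in Domain Admins group, is a hypothetical example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class SecurityEvent:
    kind: str    # hypothetical normalized type, e.g. "group_membership_changed"
    actor: str   # account that performed the action
    target: str  # account, group, or policy affected


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[SecurityEvent], bool]


def collect(
    events: Iterable[SecurityEvent], rules: List[Rule], store: List[SecurityEvent]
) -> List[str]:
    """Store every forwarded event and return an alert string for each rule match."""
    alerts: List[str] = []
    for event in events:
        store.append(event)  # stand-in for the SQL Server insert
        for rule in rules:
            if rule.matches(event):
                alerts.append(f"{rule.name}: {event.actor} changed {event.target}")
    return alerts


PRIVILEGED_GROUP_CHANGE = Rule(
    name="privileged-group-change",
    matches=lambda e: e.kind == "group_membership_changed" and e.target == "Domain Admins",
)
```

Every event is stored whether or not it triggers an alert. This preserves the complete audit trail for later reporting and correlation.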
The ACS logs contain important information that relates to security events on the
Microsoft network. These events include changes made to user rights as well as how
rights are used. This information can be used to detect when rights are used or
changed in a way that violates corporate policy. In the event of a policy violation,
these records can also help provide an audit trail to show how the violation occurred.
In addition to providing information for reporting and analysis, historical security
event logs can be important for complying with government regulations. Many governments
require the retention of security event records for a specific period of time.
Microsoft chose to use ACS to monitor policy compliance for several reasons. One
of the primary reasons was the size of the Microsoft environment. The network security
team monitors more than 250 million security events every day worldwide by using
ACS. ACS provides an auditing solution capable of scaling to the extremely high
volume of events that the Microsoft environment generates. In addition, ACS provides
Microsoft with the ability to centralize collection and analysis of audit data,
greatly facilitating analysis and monitoring.
Preparing for Security Vulnerabilities
An important part of the network security team's responsibilities is addressing
security vulnerabilities when those vulnerabilities are discovered. Whenever an
update is created to address a security vulnerability, attacks that seek to exploit
that vulnerability can begin to appear quickly. Responding to new vulnerabilities
before exploits begin to emerge is a key piece of any architecture for detection
and response.
When a new update is announced for Microsoft software, the network security team
carries out the following step-by-step response to vulnerabilities that the update
addresses:
-
The network security team researches the new update by using Microsoft TechNet and
determines exactly what the vulnerability is. The team assesses the risk that this
vulnerability entails in the Microsoft environment according to the number of users
and systems affected. If the risk is significant, the team continues to the next
step in the process.
-
The network security team creates a profile for any threat that seeks to exploit
that vulnerability. This profile typically consists of a sequence of actions that
can be detected through the correlation system, including ports used and error messages
generated when the vulnerability is exploited.
-
A correlation rule is created in the correlation system to detect this exploit,
and the rule is added to the correlation system's set of signatures.
-
Because a new and emerging exploit is always a high-priority situation, the correlation
system is configured to page a support engineer whenever an exploit is detected
until the attack surface for this vulnerability is reduced to a degree where the
risk is no longer urgent.
-
When an exploit is detected, the network security team attempts to isolate the exploiting
code as quickly as possible.
-
As an outbreak evolves, the support team examines the information collected during
the outbreak. This information is used to create a script that can scan for and
identify infected clients at the proxy/firewall servers.
-
This script can be run to periodically detect infected computers and block them
from spreading the infection.
This system provides for very rapid response to new vulnerabilities.
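A threat profile defined as a sequence of actions can be matched as an ordered subsequence of one client's events, completed within a time limit. The following Python sketch illustrates such a correlation rule. The action strings, such as `connect:445`, the time window, and the `detect` function are all hypothetical. This is not the rule syntax of the third-party correlation system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Event:
    time: float   # seconds
    source: str   # client identifier, e.g. IP address
    action: str   # hypothetical normalized action, e.g. "connect:445"


def detect(events: Sequence[Event], profile: Sequence[str], window: float) -> List[str]:
    """Return sources that perform every step of `profile`, in order, within `window` seconds."""
    by_source: Dict[str, List[Event]] = defaultdict(list)
    for event in sorted(events, key=lambda e: e.time):
        by_source[event.source].append(event)

    hits: List[str] = []
    for source, evs in by_source.items():
        for i, start in enumerate(evs):
            if start.action != profile[0]:
                continue
            step = 1  # index of the next profile step still to match
            for event in evs[i + 1:]:
                if step == len(profile) or event.time - start.time > window:
                    break
                if event.action == profile[step]:
                    step += 1  # greedy: take the earliest event that fits
            if step == len(profile):
                hits.append(source)  # in practice, page the support engineer here
                break
    return hits
```

Matching each step greedily at its earliest occurrence is sufficient: if any in-order match exists within the window, the earliest-match scan also finds one.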
Incident Response
When an incident arises that requires a response, the network security team has
several options available. Some incidents can be resolved relatively simply, often
by blocking a problem client and contacting the owner of the client. Other incidents
are more serious and can involve a response by multiple departments and possible
legal action.
When an incident occurs, the first priority of the network security team is to contain
the problem. For example, in the case of a network worm outbreak, the response may
be to create a script that scans for infected clients and adds those clients
to the list of bad clients. This response effectively removes those clients from
the network and limits the spread of the outbreak. The network security team may also attempt
to contact the owners of infected clients and inform them of the problem.
Microsoft has a sophisticated incident response plan (IRP) that has evolved over
the years, keeping pace as threats emerge and change. This plan uses internal best
practices on roles, processes, procedures, and communications for a coordinated
response to significant network events.
The network security team has engaged users worldwide with training in the IRP.
This training prepares vendors, partners, and Microsoft staff to act as a first
line of defense for the network. As a result, the network security team has assembled
a virtual team of 90,000 users who work together to watch devices, servers, and
the network for signs of compromise. The network security team also invests in localized
training to ensure that the virtual team understands how to participate in security
and response anywhere and in any language.
Lessons Learned
Implementing and maintaining an event monitoring and response system is a complex
task that requires careful attention to the unique characteristics of the environment
and the requirements of the individual company. In implementing its event monitoring
and response system, the Microsoft network security team has learned a number of
valuable lessons. The following sections describe some of the lessons learned.
Assigning Resources
To achieve the goal of an effective event monitoring and response system, executives
and managers must make the assignment of resources to this task a priority. They
must ensure that the training, tools, and personnel are available to provide security
coverage of assets 24 hours a day, seven days a week. This commitment is a prerequisite
for monitoring resources efficiently and responsibly.
Training
To provide the highest level of protection for assets, every user on the network
must have the skills necessary to be a virtual member of a response team. To achieve
this goal, the network security team invests in worldwide training of users. This
training has resulted in a global virtual team of users who partner with the network
security team in detection and response.
Maintaining the System
Tuning the intrusion detection system is an ongoing process that requires active
attention. Attacks evolve extremely quickly. Threats change their patterns, names,
and vectors. New threats appear and must be addressed rapidly. Through careful authoring
of correlation rules, the network security team has been able to identify intrusions
with a high level of confidence. Correlation rules must be refined over time as
threats are better understood, reducing the proportion of false positive results.
Planning to Implement Audit Collection Services
Microsoft uses ACS to monitor for corporate policy violations and to provide a consolidated
audit trail for security events. Planning an ACS implementation requires important
decisions regarding the information that will be logged and where and how long the
information will be stored.
Different departments will likely have different auditing priorities. For example,
the legal department may want the highest level of auditing possible, but the IT
department may see that this will cause a prohibitive drain on resources. The correct
balance will be a function of the individual environment and the organization's
needs.
The depth of auditing that an organization implements has both practical and security
costs. Auditing requires servers and storage. The amount of storage required can
quickly grow quite large depending on the size of the environment, the number of
events audited, and the amount of time the record is kept. Both the network operations
team and the security team at the organization need to be involved in making these
decisions.
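A first-order storage estimate multiplies events per day by the average stored size of an event and by the retention period. The following Python sketch takes the 250 million events per day from the ACS section above. The 500 bytes per event and the 90-day retention period are illustrative assumptions, not Microsoft figures. Indexes, database overhead, and data-warehouse copies would add to the result.

```python
def audit_storage_bytes(events_per_day: int, bytes_per_event: int, retention_days: int) -> int:
    """Raw storage needed to retain every audited event for the retention period."""
    return events_per_day * bytes_per_event * retention_days


if __name__ == "__main__":
    # 250 million events/day is from the text; 500 bytes/event and
    # 90-day retention are assumptions for illustration.
    total = audit_storage_bytes(250_000_000, 500, 90)
    print(f"{total / 1e12:.2f} TB")  # 11.25 TB
```

Because the estimate is a simple product, doubling either the retention period or the number of audited event types roughly doubles the storage bill. This is why both operations and security staff need to take part in the decision.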
Conclusion
The cost of security breaches can be high and can include disruption of productivity,
loss of intellectual property, and compromise of confidential data, in addition
to time and resources dedicated to responding to the breaches. To help protect Microsoft
corporate resources from security threats, the network security team at Microsoft
employs a proactive monitoring strategy that includes multiple monitoring points
and data feeds. Each component of the event monitoring infrastructure plays a particular
role in helping to protect the Microsoft network.
The network security team implements a detection-in-depth approach to information
security. It monitors multiple points on the network by using the following activities
and tools:
-
Monitoring of traffic at the ISA Server firewall proxy servers to detect and block
badly behaving clients
-
Network-based intrusion detection to examine packets at the network level and detect
patterns of behavior that may correspond to security threats
-
A custom client agent monitoring tool that examines proxy logs and detects activity
that originates from known malicious, forbidden, or suspicious executable files
-
ACS to monitor security-related activity for possible breaches of corporate policy
The heart of the event monitoring system at Microsoft is the correlation system.
Correlation aggregates information feeds from each monitoring point into a single
location. This aggregation of feeds enables one technology to correlate threats
across all feeds. Implementing a monitoring solution without correlation would typically
require a separate solution for each monitoring point, an approach that is likely
to be cumbersome and prohibitively expensive.
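The value of aggregation is that activity reported by separate feeds can be tied to the same client. The following Python sketch merges several time-sorted feeds into a single stream with `heapq.merge`. It then flags any client that at least two different feeds report within a time window. The feed names, the `(timestamp, client)` event shape, and the window length are assumptions for illustration. The commercial correlation system works differently.

```python
import heapq
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Set, Tuple

# A feed event is (timestamp_seconds, client_id); each feed is assumed sorted by time.
FeedEvent = Tuple[float, str]


def _tag(feed: str, events: Iterable[FeedEvent]) -> Iterator[Tuple[float, str, str]]:
    """Label each event with the name of the feed it came from."""
    for timestamp, client in events:
        yield timestamp, client, feed


def correlate(
    feeds: Dict[str, Iterable[FeedEvent]], window: float, min_feeds: int = 2
) -> Dict[str, Set[str]]:
    """Return clients reported by at least `min_feeds` distinct feeds within `window` seconds."""
    streams = [_tag(name, events) for name, events in feeds.items()]
    recent: Dict[str, Deque[Tuple[float, str]]] = defaultdict(deque)
    flagged: Dict[str, Set[str]] = {}
    # heapq.merge interleaves the sorted feeds into one time-ordered stream.
    for timestamp, client, feed in heapq.merge(*streams, key=lambda e: e[0]):
        window_events = recent[client]
        window_events.append((timestamp, feed))
        # Drop this client's events that are older than the correlation window.
        while window_events and window_events[0][0] < timestamp - window:
            window_events.popleft()
        feeds_seen = {f for _, f in window_events}
        if len(feeds_seen) >= min_feeds:
            flagged.setdefault(client, set()).update(feeds_seen)
    return flagged
```

A client that only one feed reports, or that several feeds report far apart in time, is not flagged. This keeps corroborated events separate from isolated noise.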
The correlation system also enables monitoring to be highly automated. This automation
is essential, because the monitoring system generates millions of events every day.
Automation enables the meaningful information to be found within this mass of data.
For More Information
For more information about Microsoft products or services, call the Microsoft Sales
Information Center at (800) 426-9400. In Canada, call the Microsoft Canada information
Centre at (800) 563-9048. Outside the 50 United States and Canada, please contact
your local Microsoft subsidiary. To access information through the World Wide Web,
go to:
http://www.microsoft.com
http://www.microsoft.com/technet/itshowcase