4 Security Technologies Every IT Organization Must Have
Matt Clapham and Todd Thompson
At a Glance:
- Risk management dashboard
- Anti-malware technologies
- Network anomaly detection
- Desired configuration management
When it comes to IT security, most enterprises really have roughly the same issues to deal with. Microsoft is no exception. We spent two years on the Risk Management and Compliance team in Microsoft® Managed Solutions (MMS). (See "Microsoft Gives Energizer a Recharge for Its IT Division" for more information about MMS.)
The Risk Management and Compliance team is charged with defining, monitoring, and correcting the risk posture of all MMS environments (for both customer-facing services and infrastructure coordination). Early on, our manager, Arjuna Shunn, recognized that we needed a technology solution that provided the desired controls and monitoring in a centralized, cohesive fashion. The technologies we’ll discuss here are a direct result of our early ideas, coupled with two years of experience using various Microsoft and third-party products in our operations.
First off, we needed security technologies that would cover the three primary control types—preventive, detective, and corrective—as well as provide auditing and reporting. We saw this collection of tools breaking down into four categories: risk management dashboard, anti-malware, network anomaly detection, and desired configuration management. We have tried to include at least one representative of each in our risk management operations. And we’ve found that by taking advantage of technologies from each of the four areas, the IT security team can achieve a reasonable balance between cost and effectiveness.
Our two primary risk management operations—security incident response and compliance management—have benefitted greatly from this approach, but we still have a way to go in achieving the coordination we ultimately desire amongst the tools. A cohesive set of technologies can offer so much more in the way of operational efficiency, but sadly the industry doesn’t yet have that integrated system.
Fortunately, all is not lost for IT security teams; it’s simply a matter of time before the four systems start working together and interoperating for a greater effect. Once these systems all work together, they’ll not only allow for active monitoring of system security posture, but they’ll also come in handy during audits or other routine IT operations. In this article, we describe the ideal function of each system, adding some examples from our own day-to-day operations.
Risk Management Dashboard
In our opinion, a risk management dashboard (RMD) is absolutely essential. It is the single most important technology to the operation of an IT security team. Confidentiality, integrity, availability, and accountability (CIA2) risks in an enterprise are often monitored by disparate systems and processes with no single interface for data aggregation, correlation, and risk remediation. Additionally, regulatory requirements specify increasingly difficult levels of enterprise data transparency, and there is no streamlined system to readily track policy from creation to enterprise execution. This is evidenced by common enterprise difficulties in data acquisition, correlation, assessment, remediation, and compliance. (While not a complete RMD solution as we outline here, System Center Operations Manager 2007, as shown in Figure 1, does provide a single interface for monitoring multiple resources and gathering related alerts.)
Figure 1 System Center Operations Manager 2007 provides a single interface for viewing alerts and managing resources from across the network
Data acquisition is hampered by an inability to aggregate and normalize data from disparate sources. Data aggregation in and of itself is challenging, as it requires breaking out of the all too common siloed approach to gathering and reporting data. Even where data aggregation is accomplished, normalization continues to pose an even bigger challenge because it is extremely difficult to establish the common framework needed to support the normalization of data. Without this normalization, it’s impossible to compare security and health-related events coming from different systems in a meaningful way.
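To make the normalization problem concrete, the following is a minimal Python sketch that maps events from two differently shaped feeds onto one common record. The feed layouts, the field names, and the 1-to-5 severity scale are all hypothetical and chosen only for illustration; they are not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NormalizedEvent:
    source: str          # which feed produced the event
    host: str            # host name, lower-cased for comparison
    timestamp: datetime  # always timezone-aware UTC
    severity: int        # hypothetical common scale: 1 (low) to 5 (critical)
    kind: str            # short description of what happened


# Hypothetical mapping from one feed's text levels to the common scale.
_AV_SEVERITY = {"info": 1, "warning": 3, "critical": 5}


def normalize_antivirus(raw: dict) -> NormalizedEvent:
    """Assumed antivirus feed: ISO-8601 time with offset, text severity."""
    return NormalizedEvent(
        source="antivirus",
        host=raw["computer"].lower(),
        timestamp=datetime.fromisoformat(raw["time"]).astimezone(timezone.utc),
        severity=_AV_SEVERITY[raw["level"].lower()],
        kind=raw["threat_type"],
    )


def normalize_router(raw: dict) -> NormalizedEvent:
    """Assumed router feed: Unix epoch seconds, syslog severity 0 (worst) to 7."""
    syslog_severity = int(raw["pri"])
    return NormalizedEvent(
        source="router",
        host=raw["hostname"].lower(),
        timestamp=datetime.fromtimestamp(raw["epoch"], tz=timezone.utc),
        # Invert and compress the syslog scale into the 2..5 range of the common scale.
        severity=max(1, 5 - syslog_severity // 2),
        kind=raw["msg"],
    )
```

Once both feeds produce the same record type, events from different systems can be sorted, filtered, and compared by host, time, and severity without feed-specific logic.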
To perform the needed automation, the risk management dashboard must have access to data feeds from sources other than the security technologies described here. A lot of non-security data can be used for determining overall risk posture. Information like router logs, asset tracking, patch status, currently logged on users, performance reporting, and change management data can all provide relevant information to the incident investigator. Thus, the overall system needs access to all of this data. We’re aware that even the most Microsoft-centric enterprise infrastructures include non-Microsoft technologies, so the RMD needs to accept feeds from non-Microsoft technologies via some common interface.
If data can be acquired and normalized, the next step is to correlate it. The goal is to correlate a sequence of events—such as a network anomaly event, an antivirus reporting event, and a desired configuration variance event—into an actionable piece of risk-related data. This requires intensive manual work in order to build the correlation logic that will result in meaningful risk-based alerting. Based on the technologies that are currently available, this correlation logic is, at best, difficult to achieve.
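As an illustration of the kind of correlation rule described above, this Python sketch flags a host only when a network anomaly event, an antivirus event, and a configuration variance event all arrive for it within one time window. The source labels ("nad", "antivirus", "dcm"), the 30-minute default window, and the rule itself are assumptions made for the example, not a description of any shipping correlation engine.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    host: str
    source: str          # hypothetical labels: "nad", "antivirus", "dcm"
    timestamp: datetime


REQUIRED_SOURCES = frozenset({"nad", "antivirus", "dcm"})


def correlate(events, window=timedelta(minutes=30)):
    """Return the sorted hosts for which every required source reported within one window."""
    by_host = defaultdict(list)
    for ev in events:
        if ev.source in REQUIRED_SOURCES:
            by_host[ev.host].append(ev)

    flagged = []
    for host, host_events in by_host.items():
        host_events.sort(key=lambda e: e.timestamp)
        # Try each event as the start of a window and collect the sources seen inside it.
        for i, first in enumerate(host_events):
            seen = {
                e.source
                for e in host_events[i:]
                if e.timestamp - first.timestamp <= window
            }
            if seen >= REQUIRED_SOURCES:
                flagged.append(host)
                break
    return sorted(flagged)
```

Even a rule this small shows why the work is manual: someone has to decide which sources matter, how wide the window is, and what combination counts as actionable.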
Assessing the data also requires intensive manual work. In the case where correlation occurs, an analyst must still look at the correlated data to determine its efficacy. The efficacy of the correlated data is only as strong as the rules upon which it is based. This leads to further human analysis of correlated data to assess the risk to the IT environment. A clear and codified risk management framework could help here to reduce the amount of manual intervention required to reach a triage level. Without this framework, developing a practical set of correlation rules in a given implementation is difficult.
Assuming a system has reached the point of assessing risk, the next step is automated remediation. Today, this is only really available in the enterprise, and even then it is only done for a limited set of technologies, such as patch management or antivirus. Automated remediation that is spearheaded by the RMD system will provide the IT admin with a tremendous ability to maintain an acceptable level of IT risk. When multiple risks are identified simultaneously, the correlation logic should be able to help prioritize the incident response based on asset classification data.
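One simple way to prioritize simultaneous incidents by asset classification is to weight each incident's severity by the sensitivity of the affected asset. The sketch below assumes a hypothetical three-tier classification and a 1-to-5 severity scale; the weights and the choice to treat unclassified hosts as low sensitivity are illustrative only.

```python
from typing import NamedTuple


class Incident(NamedTuple):
    incident_id: str
    host: str
    severity: int  # hypothetical scale: 1 (low) to 5 (critical)


# Hypothetical weights per asset classification tier.
ASSET_WEIGHT = {"high": 3, "moderate": 2, "low": 1}


def prioritize(incidents, asset_class):
    """Order incidents by severity weighted by the asset's classification.

    asset_class maps host name to "high", "moderate", or "low".
    Hosts missing from the map are treated as "low" (an assumption).
    """
    def score(incident):
        tier = asset_class.get(incident.host, "low")
        return incident.severity * ASSET_WEIGHT[tier]

    return sorted(incidents, key=score, reverse=True)
```

With these weights, a moderate-severity incident on a high-sensitivity server can outrank a critical one on a low-sensitivity kiosk, which is the behavior the asset classification data is meant to drive.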
Finally, providing evidence of compliance with various regulatory requirements is a giant challenge for IT departments and, as we’ve mentioned, this system should support such functionality. A centralized risk management system tied into policy could be a great asset in producing reports and evidence of compliance. But the policy needs to be more than just the wheres and whys. To facilitate automated remediation, the policy needs to be translated into a set of standards that the dashboard can monitor, enforce, and provide feedback on.
So we can sum up the ideal risk management dashboard as a way to provide a unified interface for enterprise health assessment, regulatory and policy compliance, and risk management processes. This is achieved by combining information from disparate security and health-related products and sources into one cohesive risk-related display. A good RMD solution needs to do the following:
- Aggregate, normalize, and correlate data from disparate sources
- Provide automated information gathering and risk assessment
- Map regulatory requirements to policies and support auditing and reporting
- Provide a unified framework that can be modified to fit an enterprise’s needs
The dashboard should allow data to be organized and displayed with meaning, showing, for example, top incidents by type or source system, pending incidents, resolved incidents, and so on. It should also let you drill into each report line item for more detail. When looking at an event, the user should have easy access to any and all related data. This will improve decisions and allow them to be made faster.
We should point out that the RMD is also useful to administrators not on the security team. Since the dashboard encompasses a holistic view of the environment, the RMD can act as a central point for all staff to view current status. The dashboard may, for example, alert the messaging team about a denial of service attack at the SMTP gateway. While this is a security incident to the risk management team, the messaging team will see it as an availability incident. Though the messaging team may not be responsible for fielding and resolving such an incident, they will at least want to be aware of incidents like this that affect the assets the team manages.
Anti-Malware Technologies
A comprehensive anti-malware system is important for protecting your infrastructure against unforeseen threats hiding in code and user actions. Currently, there are generally two separate types of tools to protect against malware: antivirus and anti-spyware. (Windows® Defender, for example, shown in Figure 2, falls into the latter category.) Both effectively prevent, detect, and correct different types of infection. However, it’s only a matter of time before these two types of protection are unified into a single solution and there will be just one anti-malware stack on a system.
Figure 2 Windows Defender helps protect the client against spyware
A thorough anti-malware solution needs to monitor in real-time and periodically scan. It should centrally report known malware (including viruses, spyware, and rootkits) and unknown malware based on typical risky behaviors. Robust anti-malware technology watches all the classic entry points (including the file system, registry, shell, browser, e-mail, and connected applications) via tight integration with the OS and apps.
Additionally, an anti-malware solution needs to cover more than just host security. It needs to watch common messaging and collaboration services where infected files often pass through, such as SharePoint® and instant messaging. It also needs to provide automated prevention by stopping operations (known and suspected) as well as carefully scanning user data to remove things like macro viruses that are hiding in user documents and have not yet infected the system.
It goes without saying that anti-malware is useless without updates. The system must keep its signature and removal systems updated to stay ahead of the latest threats. If an emerging threat appears, the vendor should get additional protections in place before the next zero-day threat breaks. An easy-to-manage anti-malware solution is also centrally configurable and updatable.
Of course, this protection can’t come at the cost of performance. If performance suffers, productivity will suffer. Users may even try to disable the anti-malware solution, leading to no protection at all.
Finally, don’t overlook the importance of auxiliary technologies to assist the anti-malware system. Firewalls, limiting user privilege, and other strategies also improve protection against malicious code and user actions.
Network Anomaly Detection
While anti-malware keeps an eye on systems, network anomaly detection (NAD) monitors the common pathways, watching for well-known indicators of suspicious behavior and reporting this information to the RMD for remediation. (A firewall, for example, as shown in Figure 3, would be included in this category.) Suspicious behaviors could be well-known attack traffic (such as a worm or denial of service) or data that fits a certain pattern (such as U.S. Social Security numbers) being sent via e-mail.
Figure 3 A firewall is an important part of a network anomaly detection solution
Despite the best efforts in IT management, any large enterprise network will inevitably encounter an occasional malware incident. NAD can provide an early warning system that can help accelerate remediation. Furthermore, the NAD data-monitoring capabilities, and the ability to identify and stop sensitive information from being leaked, are handy tools in protecting information in an environment concerned about data leaks and regulatory compliance.
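The pattern-matching side of data leak detection can be sketched with a regular expression. The Python example below looks for text shaped like a U.S. Social Security number in its common written form (NNN-NN-NNNN) and skips number ranges that are never issued. It is a deliberately simple illustration: a pattern match alone will still produce false positives, and the function name is hypothetical.

```python
import re

# Matches NNN-NN-NNNN, excluding values that are never issued:
# area 000, 666, or 900-999; group 00; serial 0000.
SSN_PATTERN = re.compile(
    r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b"
)


def find_ssn_candidates(text):
    """Return every substring of text that looks like a U.S. SSN."""
    return SSN_PATTERN.findall(text)
```

A real NAD product would combine a match like this with context (surrounding keywords, sender, destination) before raising an alert, precisely to keep the false positive rate down.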
As with anti-malware technologies, NAD must constantly adapt to the latest set of threats and sensitive data types or its value is greatly diminished. Additionally, a good NAD system should understand enough about the actual anomalies to minimize the number of false positives being reported. Otherwise, administrators may start to ignore the reports coming from NAD, assuming that each is just another false alarm.
After a bit of tuning or training, the NAD system should be aware of and monitor for typical traffic usage patterns. This is important since new types of malware and other attacks may reveal themselves with a change in usage patterns. Networking equipment plays a significant role in the overall NAD system, as the solution must process data from routers, switches, and firewalls. The NAD alerts can then be processed by the correlation engine in the RMD.
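A very small version of the baseline idea is to compare a new traffic measurement with the mean and standard deviation of earlier measurements. The Python sketch below uses a plain z-score test; the three-standard-deviation threshold and the choice of metric (for example, connections per minute) are assumptions for illustration, and production systems use far richer models.

```python
from statistics import mean, stdev


def is_anomalous(baseline, observed, threshold=3.0):
    """Return True if observed deviates from the baseline mean by more
    than threshold sample standard deviations."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > threshold
```

Raising the threshold trades missed detections for fewer false alarms, which is the tuning step the paragraph above refers to.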
One interesting possibility here is for the network anomaly detectors to be built into the host anti-malware software or firewall, casting a net of protection where all the included computers help watch for and potentially stop attacks before they spread.
Desired Configuration Management
One of the biggest challenges IT departments face in large enterprises is keeping systems configured appropriately. There are many motivations for wanting to maintain system configurations—ease of management, simplifying scalability, ensuring compliance, locking down against various forms of intrusion, and promoting productivity, to name a few. Many of these reasons factor into security.
Desired configuration management is a largely untapped area in most enterprises due to its complexity and startup costs. But studies show that in the long term, maintaining systems leads to cost savings and improved reliability. A desired configuration management (DCM) solution can help.
A good DCM system should automatically scan the network to ensure that new systems are deployed as desired and also verify that established systems remain in compliance. Whether it’s making sure the latest patch is deployed or minimizing the number of users who have domain privileges, a complete DCM solution must configure systems, analyze them, and report how close each configuration is to the ideal. The corrections can be simple (such as patching new systems on the network as they are deployed) or forceful (such as enforcing e-mail client settings for users). The key is to tie them back to the organization’s policies and regulatory stance. (DCM can also assist in malware infection identification and removal, since simple types will be readily spotted in scan results.)
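Reporting "how close each configuration is to the ideal" can be as simple as comparing a host's settings with a desired baseline and counting the variances. The Python sketch below does exactly that; the setting names and the equal weighting of every setting are hypothetical simplifications.

```python
def compliance_report(desired, actual):
    """Compare actual settings with the desired baseline.

    Returns (score, variances), where score is the fraction of desired
    settings that match (1.0 means fully compliant) and variances maps
    each non-matching setting to a (desired, actual) pair. A setting
    missing from actual is reported with an actual value of None.
    """
    variances = {
        key: (want, actual.get(key))
        for key, want in desired.items()
        if actual.get(key) != want
    }
    score = 1.0 - len(variances) / len(desired) if desired else 1.0
    return score, variances
```

The variances dictionary is what a self-correcting system would act on, and what a monitoring-only deployment would hand to an administrator.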
DCM is one of the primary data reporters to the RMD’s correlation engine, providing important detail about how far the host in question deviates from its normal configuration. The timeliness of its data can make the difference between a one-alarm and a five-alarm security incident, so an asset needs to be scanned at a frequency proportional to its value. For instance, a low-sensitivity asset may be scanned only monthly, a moderate-sensitivity asset weekly, and a high-sensitivity asset at least once daily.
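The tiered scan schedule above translates directly into a lookup table. In this Python sketch, the tier names are hypothetical labels and "monthly" is approximated as 30 days.

```python
from datetime import datetime, timedelta

# Scan interval per sensitivity tier; "monthly" is approximated as 30 days.
SCAN_INTERVAL = {
    "high": timedelta(days=1),
    "moderate": timedelta(weeks=1),
    "low": timedelta(days=30),
}


def scan_due(sensitivity, last_scan, now):
    """Return True if the asset's last scan is older than its tier allows."""
    return now - last_scan >= SCAN_INTERVAL[sensitivity]
```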
A DCM setup is also a fundamental part of a good network access protection (NAP) mechanism. This is so the system can verify that all connected systems are configured appropriately and block new or unknown systems until validated. In addition, DCM should look for vulnerabilities of configuration (such as weak access control lists on a share) so admins can take appropriate action.
Don’t assume that DCM only applies to hosts like clients or servers. Networking devices and the core directory itself are candidates for DCM inclusion. If a reasonably well-understood network design is available, it should be trivial for the ideal DCM system to enforce standards for associated devices. Imagine the possibilities! Instead of tweaking router configurations manually, the DCM can handle the implementation. Or say a set of standard Group Policy settings are associated with a particular set of servers or clients. The DCM should monitor them and send an alert if the domain settings change.
The data collected by the DCM should be robust enough to be used in an IT audit. A good DCM must have hooks into change management and regulatory compliance controls so that audit concerns can be addressed in a timely, almost continuous fashion. For example, one of our routine control checks is to verify that the local administrator membership on MMS servers hasn’t changed. Enforcement of this rule is what DCM is made for!
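A control check like the local administrator membership example reduces to a set comparison between an approved list and what a scan actually finds. The Python sketch below shows only that comparison; how the current membership is collected from each server is outside its scope, and the function name is hypothetical.

```python
def admin_membership_drift(approved, current):
    """Report accounts added to or removed from a local administrators group.

    Names are compared case-insensitively, since Windows account names
    are not case-sensitive.
    """
    approved_set = {name.lower() for name in approved}
    current_set = {name.lower() for name in current}
    return {
        "added": sorted(current_set - approved_set),
        "removed": sorted(approved_set - current_set),
    }
```

An empty result in both lists is the evidence an auditor wants; anything in "added" is a candidate for an alert to the RMD.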
Even if your IT department isn’t ready for a self-correcting DCM system, a variation known as desired configuration monitoring can still be deployed and offers great results. This, too, provides reports and alerts about configuration problems, but the same economies of scale aren’t necessarily achieved since the remediation is largely manual. Importantly, however, the notification and data that the RMD needs will be available for correlation.
One word of caution, though. DCM, in either form that we’ve just described, has to be scoped to just the minimal important configuration items coupled with the standard asset collection data. Otherwise, it can impinge on the flexibility the IT team may need. If the DCM configuration guides become dated, DCM will quickly be considered an operational tax rather than a useful tool. So keeping abreast of the latest thinking in how to configure assets optimally is important. Further, we suggest that the DCM system include regular updates to its configuration cookbooks according to the best practices for the asset or application in question.
Wrapping Up
Whether you build it in-house or buy it from a reputable vendor, the dashboard is essential, acting as your team’s primary tool for incident response. Anti-malware is also essential, helping protect against the threats that multiply daily. Network anomaly detection is on the verge of changing from just malware signatures and host intrusion detection to include data leakage discovery—and that last function can help prevent the next well-publicized network breach. Desired configuration management, while still new, will soon be a mainstay for monitoring and maintaining configurations. Regardless of who provides these tools, you must have at least one for each of these four categories!
From what we’ve seen so far, no single vendor (Microsoft included) offers a single holistic solution that addresses each of the four spaces. It is up to you to find the selection of products that will suit your specific needs. We just want this article to provide insight as to what you need to consider, what you should be looking to achieve, and what the ideal solution will do.
Matt Clapham is a Security Engineer in the Infrastructure and Security group of the Microsoft IT division. Prior to that he worked for two years on the Microsoft Managed Solutions Risk Management and Compliance team.
Todd Thompson is a Security Engineer in the Infrastructure and Security group of the Microsoft IT division. Prior to that he worked for two years on the Microsoft Managed Solutions Risk Management and Compliance team.
© 2008 Microsoft Corporation and CMP Media, LLC. All rights reserved; reproduction in part or in whole without permission is prohibited.