Network payload-based anomaly detection and content-based alert correlation

Every computer on the Internet nowadays is a potential target for a new attack at any moment. The pervasive use of signature-based anti-virus scanners and misuse detection Intrusion Detection Systems have failed to provide adequate protection against a constant barrage of "zero-day" attacks. Such attacks may cause denial-of-service, system crashes, or information theft resulting in the loss of critical information. In this thesis, we consider the problem of detecting these "zero-day" intrusions quickly and accurately upon their very first appearance. Most current Network Intrusion Detection Systems (NIDS) use simple features, like packet headers and derived statistics describing connections and sessions (packet rates, bytes transferred, etc.) to detect unusual events that indicate a system is likely under attack. These approaches, however, are blind to the content of the packet stream, and in particular, the packet content delivered to a service that contains the data and code that exploits the vulnerable application software. We conjecture that fast and efficient detectors that focus on network packet content anomaly detection will improve defenses and identify zero-day attacks far more accurately than approaches that consider only header information. We therefore present two payload-based anomaly detectors, PAYL and Anagram, for intrusion detection. They are designed to detect attacks that are otherwise normal connections except that the packets carry bad (anomalous) content indicative of a new exploit. These payload-based anomaly sensors can augment other sensors and enrich the view of network traffic to detect malicious events. Both PAYL and Anagram create models of site-specific normal network application payload as n-grams in a fully automatic, unsupervised and very efficient fashion. PAYL computes, during a training phase, a profile of byte (1-gram ) frequency distribution and their standard deviation of the application payload flowing to a single host and port. PAYL produces a very fine-grained model that is conditioned on payload length. Anagram models high-order n-grams (n > 1) which capture the sequential information between bytes. We experimentally demonstrate that both of these sensors are capable of detecting new attacks with high accuracy and low false positive rates. Furthermore, in order to detect the very early onset of a worm attack, we designed an ingress/egress correlation function that is built in the sensors to quickly identify the worms' initial propagation. The sensors are also designed to generate robust signatures of validated malicious packet content. The technique does not depend upon the detection of probing or scanning behavior or the prevalence of common probe payload, so it is especially useful for the detection of slow and stealthy worms. An often-cited weakness of anomaly detection systems is that they suffer from "mimicry attack": clever adversaries may craft attacks that appear normal to an anomaly detector and hence will go unnoticed as a false negative. A mimicry attack against a site defended by a content-based anomaly detector can be executed by an attacker by sniffing the target site's traffic flow, modeling the byte distributions of that flow, and blending their exploit with "normal" appearing byte padding. To defend against such attacks, we further propose the techniques of randomized modeling and randomized testing. Under randomized modeling/testing, each sensor will randomly partition the payload into several subsequences, each of whom are modeled/tested separately, thus building a "model/test diversity" on each host that is unknown to the mimicry attacker. This raises the bar for the attackers as they have no means to know how and where to pad the exploit code to appear normal within each randomly computed partition, even if they have the global knowledge of the target site's content flow. Finally, PAYL/Anagram's speed andhigh detection rate makes it valuable not only as a stand-alone network-based sensor, but also as a host-based data-flow classifier in an instrumented, fault-tolerant host-based environment; this enables significant cost amortization and the possibility of a "symbiotic" feedback loop that can improve accuracy and reduce false positive rates over time. Besides building stand-alone anomaly sensors, we also demonstrate a collaborative security strategy whereby different hosts may exchange payload alerts to increase the accuracy of the local sensor and reduce false positives. We propose and examine several new approaches to enable the sharing of suspicious payloads via privacy-preserving technologies. We detail the work we have done with our PAYL and Anagram, to support generalized payload correlation and signature generation without releasing identifiable payload data. The important principle demonstrated is that correlation of multiple alerts can identify true positives from the set of anomaly alerts, reducing incorrect decisions and producing accurate mitigation against zero-day attacks. A new wave of cleverly crafted polymorphic attacks has substantially complicated the task of automatically generating "string-based" signatures to filter newly discovered zero-day attacks. Although the payload anomaly detection techniques we present are able to detect these attacks, correlating the individual packet content delivering distinct instances of the same polymorphic attack are shown to have limited value, requiring new approaches for generating robust signatures.

[1]  Salvatore J. Stolfo,et al.  Anagram: A Content Anomaly Detector Resistant to Mimicry Attack , 2006, RAID.

[2]  Helen J. Wang,et al.  Privacy-Preserving Friends Troubleshooting Network , 2005, NDSS.

[3]  M Damashek,et al.  Gauging Similarity with n-Grams: Language-Independent Categorization of Text , 1995, Science.

[4]  Peter Szor,et al.  HUNTING FOR METAMORPHIC , 2001 .

[5]  John W. Lockwood,et al.  Deep packet inspection using parallel bloom filters , 2004, IEEE Micro.

[6]  Angelos D. Keromytis,et al.  Application communities: using monoculture for dependability , 2005 .

[7]  Philip K. Chan,et al.  Learning nonstationary models of normal network traffic for detecting novel attacks , 2002, KDD.

[8]  Somesh Jha,et al.  Global Intrusion Detection in the DOMINO Overlay System , 2004, NDSS.

[9]  Blaine Nelson,et al.  Can machine learning be secure? , 2006, ASIACCS '06.

[10]  Christopher Krügel,et al.  Polymorphic Worm Detection Using Structural Information of Executables , 2005, RAID.

[11]  Yong Tang,et al.  Defending against Internet worms: a signature-based approach , 2005, Proceedings IEEE 24th Annual Joint Conference of the IEEE Computer and Communications Societies..

[12]  Salvatore J. Stolfo,et al.  Anomaly Detection in Computer Security and an Application to File System Accesses , 2005, ISMIS.

[13]  Carla Marceau,et al.  Characterizing the behavior of a program using multiple-length N-grams , 2001, NSPW '00.

[14]  David Brumley,et al.  Vulnerability-Specific Execution Filtering for Exploit Prevention on Commodity Software , 2006, NDSS.

[15]  Salvatore J. Stolfo,et al.  A framework for constructing features and models for intrusion detection systems , 2000, TSEC.

[16]  Adi Shamir,et al.  Playing "Hide and Seek" with Stored Keys , 1999, Financial Cryptography.

[17]  Konrad Rieck,et al.  Detecting Unknown Network Attacks Using Language Models , 2006, DIMVA.

[18]  Harold S. Javitz,et al.  The NIDES Statistical Component Description and Justification , 1994 .

[19]  Kymie M. C. Tan,et al.  "Why 6?" Defining the operational limits of stide, an anomaly-based intrusion detector , 2002, Proceedings 2002 IEEE Symposium on Security and Privacy.

[20]  R. Sekar,et al.  Specification-based anomaly detection: a new approach for detecting network intrusions , 2002, CCS '02.

[21]  Giovanni Vigna,et al.  NetSTAT: a network-based intrusion detection approach , 1998, Proceedings 14th Annual Computer Security Applications Conference (Cat. No.98EX217).

[22]  Hari Balakrishnan,et al.  Fast portscan detection using sequential hypothesis testing , 2004, IEEE Symposium on Security and Privacy, 2004. Proceedings. 2004.

[23]  Benny Pinkas,et al.  Privacy Preserving Data Mining , 2000, Journal of Cryptology.

[24]  Guofei Gu,et al.  Worm detection, early warning and response based on local victim information , 2004, 20th Annual Computer Security Applications Conference.

[25]  Somesh Jha,et al.  Static Analysis of Executables to Detect Malicious Patterns , 2003, USENIX Security Symposium.

[26]  Wenke Lee,et al.  Modeling Botnet Propagation Using Time Zones , 2006, NDSS.

[27]  Vinod Yegneswaran,et al.  Characteristics of internet background radiation , 2004, IMC '04.

[28]  Ramakrishnan Srikant,et al.  Privacy-preserving data mining , 2000, SIGMOD '00.

[29]  Zhenkai Liang,et al.  Fast and automated generation of attack signatures: a basis for building self-protecting servers , 2005, CCS '05.

[30]  Wenke Lee,et al.  Misleading worm signature generators using deliberate noise injection , 2006, 2006 IEEE Symposium on Security and Privacy (S&P'06).

[31]  Burton H. Bloom,et al.  Space/time trade-offs in hash coding with allowable errors , 1970, CACM.

[32]  Stephanie Forrest,et al.  A sense of self for Unix processes , 1996, Proceedings 1996 IEEE Symposium on Security and Privacy.

[33]  David A. Wagner,et al.  Mimicry attacks on host-based intrusion detection systems , 2002, CCS '02.

[34]  Dan Gusfield,et al.  Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology , 1997 .

[35]  Sudipto Guha,et al.  Clustering Data Streams , 2000, FOCS.

[36]  Helen J. Wang,et al.  Shield: vulnerability-driven network filters for preventing known vulnerability exploits , 2004, SIGCOMM 2004.

[37]  Ming-Yang Kao,et al.  Hamsa: fast signature generation for zero-day polymorphic worms with provable attack resilience , 2006, 2006 IEEE Symposium on Security and Privacy (S&P'06).

[38]  B. Karp,et al.  Autograph: Toward Automated, Distributed Worm Signature Detection , 2004, USENIX Security Symposium.

[39]  Donald Ervin Knuth,et al.  The Art of Computer Programming , 1968 .

[40]  Helen J. Wang,et al.  Applications of secure electronic voting to automated privacy-preserving troubleshooting , 2005, CCS '05.

[41]  Dawn Xiaodong Song,et al.  Privacy-Preserving Set Operations , 2005, CRYPTO.

[42]  David A. Wagner,et al.  Intrusion detection via static analysis , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.

[43]  Farnam Jahanian,et al.  The Zombie Roundup: Understanding, Detecting, and Disrupting Botnets , 2005, SRUTI.

[44]  Vern Paxson,et al.  Bro: a system for detecting network intruders in real-time , 1998, Comput. Networks.

[45]  Miguel Castro,et al.  Vigilante: end-to-end containment of internet worms , 2005, SOSP '05.

[46]  Vern Paxson,et al.  How to Own the Internet in Your Spare Time , 2002, USENIX Security Symposium.

[47]  Angelos D. Keromytis,et al.  Software Self-Healing Using Collaborative Application Communities , 2006, NDSS.

[48]  Vern Paxson,et al.  The top speed of flash worms , 2004, WORM '04.

[49]  Vern Paxson,et al.  Proceedings of the 13th USENIX Security Symposium , 2022 .

[50]  David Moore,et al.  Internet quarantine: requirements for containing self-propagating code , 2003, IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428).

[51]  Vitaly Shmatikov,et al.  Privacy-Preserving Sharing and Correlation of Security Alerts , 2004, USENIX Security Symposium.

[52]  Ke Wang,et al.  Fileprints: identifying file types by n-gram analysis , 2005, Proceedings from the Sixth Annual IEEE SMC Information Assurance Workshop.

[53]  George Varghese,et al.  Automated Worm Fingerprinting , 2004, OSDI.

[54]  Peter G. Neumann,et al.  EMERALD: Event Monitoring Enabling Responses to Anomalous Live Disturbances , 1997, CCS 2002.

[55]  Peng Ning,et al.  Privacy-preserving alert correlation: a concept hierarchy based approach , 2005, 21st Annual Computer Security Applications Conference (ACSAC'05).

[56]  Kevin A. Kwiat,et al.  Modeling the spread of active worms , 2003, IEEE INFOCOM 2003. Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428).

[57]  James Newsome,et al.  Polygraph: automatically generating signatures for polymorphic worms , 2005, 2005 IEEE Symposium on Security and Privacy (S&P'05).

[58]  Zhendong Su,et al.  On deriving unknown vulnerabilities from zero-day polymorphic and metamorphic worm exploits , 2005, CCS '05.

[59]  Christopher Krügel,et al.  Service specific anomaly detection for network intrusion detection , 2002, SAC '02.

[60]  Salvatore J. Stolfo,et al.  Anomalous Payload-Based Network Intrusion Detection , 2004, RAID.

[61]  Wenke Lee,et al.  Advanced Polymorphic Worms: Evading IDS by Blending in with Normal Traffic , 2005 .

[62]  Crispan Cowan,et al.  StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks , 1998, USENIX Security Symposium.

[63]  Angelos D. Keromytis,et al.  Bloodhound: Searching Out Malicious Input in Network Flows for Automatic Repair Validation , 2006 .

[64]  Kymie M. C. Tan,et al.  Undermining an Anomaly-Based Intrusion Detection System Using Common Exploits , 2002, RAID.

[65]  Philip K. Chan,et al.  Learning Models of Network Traffic for Detecting Novel Attacks , 2002 .

[66]  Moni Naor,et al.  Universal one-way hash functions and their cryptographic applications , 1989, STOC '89.

[67]  Salvatore J. Stolfo,et al.  Collaborative Distributed Intrusion Detection , 2004 .

[68]  Salvatore J. Stolfo Worm and Attack Early Warning , 2004, IEEE Secur. Priv..

[69]  Salvatore J. Stolfo,et al.  FLIPS: Hybrid Adaptive Intrusion Prevention , 2005, RAID.

[70]  Helen J. Wang,et al.  Automatic Misconfiguration Troubleshooting with PeerPressure , 2004, OSDI.

[71]  Jon Crowcroft,et al.  Honeycomb , 2004, Comput. Commun. Rev..

[72]  Richard Lippmann,et al.  The 1999 DARPA off-line intrusion detection evaluation , 2000, Comput. Networks.

[73]  Angelos D. Keromytis,et al.  A Dynamic Mechanism for Recovering from Buffer Overflow Attacks , 2005, ISC.

[74]  Matthew V. Mahoney,et al.  Network traffic anomaly detection based on packet bytes , 2003, SAC '03.

[75]  Carla E. Brodley,et al.  Approaches to Online Learning and Concept Drift for User Identification in Computer Security , 1998, KDD.

[76]  Angelos D. Keromytis,et al.  Building a Reactive Immune System for Software Services , 2005, USENIX Annual Technical Conference, General Track.

[77]  Salvatore J. Stolfo,et al.  Detecting Malicious Software by Monitoring Anomalous Windows Registry Accesses , 2002, RAID.