Using an Ensemble of One-Class SVM Classifiers to Harden Payload-based Anomaly Detection Systems

Unsupervised or unlabeled learning approaches for network anomaly detection have been recently proposed. In particular, recent work on unlabeled anomaly detection focused on high speed classification based on simple payload statistics. For example, PAYL, an anomaly IDS, measures the occurrence frequency in the payload of n-grams. A simple model of normal traffic is then constructed according to this description of the packets' content. It has been demonstrated that anomaly detectors based on payload statistics can be "evaded" by mimicry attacks using byte substitution and padding techniques. In this paper we propose a new approach to construct high speed payload-based anomaly IDS intended to be accurate and hard to evade. We propose a new technique to extract the features from the payload. We use a feature clustering algorithm originally proposed for text classification problems to reduce the dimensionality of the feature space. Accuracy and hardness of evasion are obtained by constructing our anomaly-based IDS using an ensemble of one-class SVM classifiers that work on different feature spaces.

[1]  Thorsten Joachims,et al.  Text categorization with support vector machines , 1999 .

[2]  Fabio Roli,et al.  Fusion of multiple classifiers for intrusion detection in computer networks , 2003, Pattern Recognit. Lett..

[3]  David M. J. Tax,et al.  One-class classification , 2001 .

[4]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorization , 2001, CSUR.

[5]  David A. Wagner,et al.  Mimicry attacks on host-based intrusion detection systems , 2002, CCS '02.

[6]  Wenke Lee,et al.  Polymorphic Blending Attacks , 2006, USENIX Security Symposium.

[7]  David G. Stork,et al.  Pattern Classification , 1973 .

[8]  Robert P. W. Duin,et al.  Combining One-Class Classifiers , 2001, Multiple Classifier Systems.

[9]  Salvatore J. Stolfo,et al.  Anomalous Payload-Based Worm Detection and Signature Generation , 2005, RAID.

[10]  Bernhard Schölkopf,et al.  Estimating the Support of a High-Dimensional Distribution , 2001, Neural Computation.

[11]  Salvatore J. Stolfo,et al.  A Geometric Framework for Unsupervised Anomaly Detection , 2002, Applications of Data Mining in Computer Security.

[12]  Christopher Krügel,et al.  Service specific anomaly detection for network intrusion detection , 2002, SAC '02.

[13]  Salvatore J. Stolfo,et al.  Anomalous Payload-Based Network Intrusion Detection , 2004, RAID.

[14]  John McHugh,et al.  Defending Yourself: The Role of Intrusion Detection Systems , 2000, IEEE Software.

[15]  Matthew V. Mahoney,et al.  Network traffic anomaly detection based on packet bytes , 2003, SAC '03.

[16]  Roberto Brunelli,et al.  Person identification using multiple cues , 1995, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[18]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[19]  Jörg Kindermann,et al.  Text Categorization with Support Vector Machines. How to Represent Texts in Input Space? , 2002, Machine Learning.

[20]  Eric van den Berg,et al.  A Fast Static Analysis Approach to Detect Exploit Code Inside Network Flows , 2005, RAID.

[21]  Christopher Krügel,et al.  Accurate Buffer Overflow Detection via Abstract Payload Execution , 2002, RAID.

[22]  Christopher Krügel,et al.  Automating Mimicry Attacks Using Static Binary Analysis , 2005, USENIX Security Symposium.

[23]  Inderjit S. Dhillon,et al.  A Divisive Information-Theoretic Feature Clustering Algorithm for Text Classification , 2003, J. Mach. Learn. Res..

[24]  Eleazar Eskin,et al.  A GEOMETRIC FRAMEWORK FOR UNSUPERVISED ANOMALY DETECTION: DETECTING INTRUSIONS IN UNLABELED DATA , 2002 .

[25]  Leonid Portnoy,et al.  Intrusion detection with unlabeled data using clustering , 2000 .

[26]  Thomas G. Dietterich Multiple Classifier Systems , 2000, Lecture Notes in Computer Science.