An anomaly detection system based on variable N-gram features and one-class SVM

Context: Run-time detection of system anomalies at the host level remains a challenging task. Existing techniques suffer from high rates of false alarms, hindering large-scale deployment of anomaly detection techniques in commercial settings. Objective: To reduce the false alarm rate, we present a new anomaly detection system based on a novel feature extraction technique, which combines the frequency with the temporal information from system call traces, and on one-class support vector machine (OC-SVM) detector.Method: The proposed feature extraction approach starts by segmenting the system call traces into multiple n-grams of variable length and mapping them to fixed-size sparse feature vectors, which are then used to train OC-SVM detectors.Results: The results achieved on a real-world system call dataset show that our feature vectors with up to 6-grams outperform the term vector models (using the most common weighting schemes) proposed in related work. More importantly, our anomaly detection system using OC-SVM with a Gaussian kernel, trained on our feature vectors, achieves a higher-level of detection accuracy (with a lower false alarm rate) than that achieved by Markovian and n-gram based models as well as by the state-of-the-art anomaly detection techniques.Conclusion: The proposed feature extraction approach from traces of events provides new and general data representations that are suitable for training standard one-class machine learning algorithms, while preserving the temporal dependencies among these events.

[1]  Carla Marceau,et al.  Characterizing the behavior of a program using multiple-length N-grams , 2001, NSPW '00.

[2]  Stephanie Forrest,et al.  The Evolution of System-Call Monitoring , 2008, 2008 Annual Computer Security Applications Conference (ACSAC).

[3]  Jun Xu,et al.  Non-Control-Data Attacks Are Realistic Threats , 2005, USENIX Security Symposium.

[4]  Weibo Gong,et al.  Anomaly detection using call stack information , 2003, 2003 Symposium on Security and Privacy, 2003..

[5]  Stefan Axelsson,et al.  The base-rate fallacy and the difficulty of intrusion detection , 2000, TSEC.

[6]  Debin Gao,et al.  Gray-box extraction of execution graphs for anomaly detection , 2004, CCS '04.

[7]  AxelssonStefan The base-rate fallacy and the difficulty of intrusion detection , 2000 .

[8]  Chun-Hung Richard Lin,et al.  Intrusion detection system: A comprehensive review , 2013, J. Netw. Comput. Appl..

[9]  Chee Kheong Siew,et al.  Extreme learning machine: Theory and applications , 2006, Neurocomputing.

[10]  V. Rao Vemuri,et al.  Use of K-Nearest Neighbor classifier for intrusion detection , 2002, Comput. Secur..

[11]  Robert Sabourin,et al.  Combining Hidden Markov Models for Improved Anomaly Detection , 2009, 2009 IEEE International Conference on Communications.

[12]  Tom Fawcett,et al.  An introduction to ROC analysis , 2006, Pattern Recognit. Lett..

[13]  Stephanie Forrest,et al.  A sense of self for Unix processes , 1996, Proceedings 1996 IEEE Symposium on Security and Privacy.

[14]  David A. Wagner,et al.  Mimicry attacks on host-based intrusion detection systems , 2002, CCS '02.

[15]  David A. Wagner,et al.  Intrusion detection via static analysis , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.

[16]  G. G. Stokes "J." , 1890, The New Yale Book of Quotations.

[17]  Fabio Roli,et al.  Adversarial attacks against intrusion detection systems: Taxonomy, solutions and open issues , 2013, Inf. Sci..

[18]  Somesh Jha,et al.  Markov chains, classifiers, and intrusion detection , 2001, Proceedings. 14th IEEE Computer Security Foundations Workshop, 2001..

[19]  Jiankun Hu,et al.  Host-Based Anomaly Intrusion Detection , 2010, Handbook of Information and Communication Security.

[20]  Salvatore J. Stolfo,et al.  Learning Rules from System Call Arguments and Sequences for Anomaly 20 Detection , 2003 .

[21]  D. Vere-Jones Markov Chains , 1972, Nature.

[22]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[23]  Samuel Williams,et al.  Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2009, Parallel Comput..

[24]  L. Baum,et al.  A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains , 1970 .

[25]  S D Walter,et al.  The partial area under the summary ROC curve , 2005, Statistics in medicine.

[26]  VARUN CHANDOLA,et al.  Anomaly detection: A survey , 2009, CSUR.

[27]  Vasant Honavar,et al.  Learning Classifiers for Misuse Detection Using a Bag of System Calls Representation , 2005, ISI.

[28]  Robert H. Deng,et al.  On Detection of Erratic Arguments , 2011, SecureComm.

[29]  Dae-Ki Kang,et al.  Learning classifiers for misuse and anomaly detection using a bag of system calls representation , 2005, Proceedings from the Sixth Annual IEEE SMC Information Assurance Workshop.

[30]  A. Nur Zincir-Heywood,et al.  Mimicry Attacks Demystified: What Can Attackers Do to Evade Detection? , 2008, 2008 Sixth Annual Conference on Privacy, Security and Trust.

[31]  Christopher Krügel,et al.  Bayesian event classification for intrusion detection , 2003, 19th Annual Computer Security Applications Conference, 2003. Proceedings..

[32]  Barak A. Pearlmutter,et al.  Detecting intrusions using system calls: alternative data models , 1999, Proceedings of the 1999 IEEE Symposium on Security and Privacy (Cat. No.99CB36344).

[33]  Dong Xiang,et al.  Information-theoretic measures for anomaly detection , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.

[34]  Somesh Jha,et al.  Detecting Manipulated Remote Call Streams , 2002, USENIX Security Symposium.

[35]  Carrie Gates,et al.  Challenging the anomaly detection paradigm: a provocative discussion , 2006, NSPW '06.

[36]  Robert P. W. Duin,et al.  Support Vector Data Description , 2004, Machine Learning.

[37]  Konrad Rieck,et al.  A close look on n-grams in intrusion detection: anomaly detection vs. classification , 2013, AISec.

[38]  Niels Provos,et al.  Improving Host Security with System Call Policies , 2003, USENIX Security Symposium.

[39]  Sheng-Hsun Hsu,et al.  Application of SVM and ANN for intrusion detection , 2005, Comput. Oper. Res..

[40]  Charles X. Ling,et al.  Using AUC and accuracy in evaluating learning algorithms , 2005, IEEE Transactions on Knowledge and Data Engineering.

[41]  Salvatore J. Stolfo,et al.  Casting out Demons: Sanitizing Training Data for Anomaly Sensors , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[42]  R. Sekar,et al.  Dataflow anomaly detection , 2006, 2006 IEEE Symposium on Security and Privacy (S&P'06).

[43]  Kymie M. C. Tan,et al.  Undermining an Anomaly-Based Intrusion Detection System Using Common Exploits , 2002, RAID.

[44]  Anup Ghosh,et al.  Simple, state-based approaches to program-based anomaly detection , 2002, TSEC.

[45]  J. Hanley,et al.  The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.

[46]  A. Föhrenbach,et al.  SIMPLE++ , 2000, OR Spectr..

[47]  Debin Gao,et al.  On Gray-Box Program Tracking for Anomaly Detection , 2004, USENIX Security Symposium.

[48]  Kymie M. C. Tan,et al.  "Why 6?" Defining the operational limits of stide, an anomaly-based intrusion detector , 2002, Proceedings 2002 IEEE Symposium on Security and Privacy.

[49]  Somesh Jha,et al.  Efficient Context-Sensitive Intrusion Detection , 2004, NDSS.

[50]  Stefan Axelsson,et al.  Intrusion Detection Systems: A Survey and Taxonomy , 2002 .

[51]  Kuldip K. Paliwal,et al.  Intrusion detection using text processing techniques with a kernel based similarity measure , 2007, Comput. Secur..

[52]  Vern Paxson,et al.  Outside the Closed World: On Using Machine Learning for Network Intrusion Detection , 2010, 2010 IEEE Symposium on Security and Privacy.

[53]  Jiankun Hu,et al.  A Semantic Approach to Host-Based Intrusion Detection Systems Using Contiguousand Discontiguous System Call Patterns , 2014, IEEE Transactions on Computers.

[54]  Bernhard Schölkopf,et al.  Support Vector Method for Novelty Detection , 1999, NIPS.

[55]  R. Sekar,et al.  A practical mimicry attack against powerful system-call monitors , 2008, ASIACCS '08.

[56]  Christopher Krügel,et al.  On the Detection of Anomalous System Call Arguments , 2003, ESORICS.

[57]  Carsten Willems,et al.  Learning and Classification of Malware Behavior , 2008, DIMVA.

[58]  Jiankun Hu,et al.  Generation of a new IDS test dataset: Time to retire the KDD collection , 2013, 2013 IEEE Wireless Communications and Networking Conference (WCNC).

[59]  Gerard Salton,et al.  Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer , 1989 .

[60]  Salvatore J. Stolfo,et al.  Using artificial anomalies to detect unknown and known network intrusions , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[61]  Marc Dacier,et al.  Intrusion Detection Using Variable-Length Audit Trail Patterns , 2000, Recent Advances in Intrusion Detection.

[62]  William W. Cohen Fast Effective Rule Induction , 1995, ICML.

[63]  Dorothy E. Denning,et al.  An Intrusion-Detection Model , 1986, 1986 IEEE Symposium on Security and Privacy.

[64]  Christopher Krügel,et al.  Automating Mimicry Attacks Using Static Binary Analysis , 2005, USENIX Security Symposium.

[65]  V. Rao Vemuri,et al.  Intrusion Detection Using Text Processing Techniques with a Binary-Weighted Cosine Metric , 2006 .

[66]  Stefano Zanero,et al.  Detecting Intrusions through System Call Sequence and Argument Analysis , 2010, IEEE Transactions on Dependable and Secure Computing.

[67]  Giovanni Vigna,et al.  Exploiting Execution Context for the Detection of Anomalous System Calls , 2007, RAID.

[68]  V. Rao Vemuri,et al.  Using Text Categorization Techniques for Intrusion Detection , 2002, USENIX Security Symposium.

[69]  Somesh Jha,et al.  Formalizing sensitivity in static analysis for intrusion detection , 2004, IEEE Symposium on Security and Privacy, 2004. Proceedings. 2004.

[70]  Robert Sabourin,et al.  A survey of techniques for incremental learning of HMM parameters , 2012, Inf. Sci..

[71]  R. Sekar,et al.  A fast automaton-based method for detecting anomalous program behaviors , 2001, Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001.

[72]  John McHugh,et al.  Defending Yourself: The Role of Intrusion Detection Systems , 2000, IEEE Software.