Host-Based Intrusion Detection System with System Calls

In a contemporary data center, Linux applications often generate a large quantity of real-time system call traces, a volume that traditional host-based intrusion detection systems (HIDS) deployed on every single host are poorly suited to handle. Training data mining models on system calls using a single host with fixed computing and storage capacity is time-consuming, and intermediate datasets cannot be handled efficiently. Maintaining and updating HIDS installed on every physical or virtual host is cumbersome, and comprehensive system call analysis can hardly be performed to detect complex, distributed attacks that span multiple hosts. Considering these limitations of current system-call-based HIDS, this article reviews the development of system-call-based HIDS and discusses future research trends. Algorithms and techniques relevant to system-call-based HIDS are investigated, including feature extraction methods and various data mining algorithms. HIDS dataset issues are discussed, including the currently available system call datasets and approaches researchers can use to generate new ones. The application of system-call-based HIDS to current embedded systems is studied, and related work is surveyed. Finally, future research trends are forecast in three areas: reducing the false-positive rate, improving detection efficiency, and enhancing collaborative security.
