Machine learning techniques for web intrusion detection — A comparison

The rapid development of web applications has created many security problems related to intrusions, not only on computers and network systems but also on web applications themselves. Most techniques currently used in Web Intrusion Systems (WIS) cannot cope with the dynamic and complex nature of cyber-attacks on web applications. However, web intrusion detection techniques based on machine learning and statistical analysis of data can autonomously distinguish intrusive from non-intrusive traffic with a low false-positive rate. In this paper, we present a survey of machine learning techniques used to build WIS. In addition, we develop an experimental framework for the comparative analysis of several machine learning techniques applied to the well-known CSIC 2010 HTTP benchmark data set [13].

[1] Edward P. K. Tsang, et al. Simplifying Decision Trees Learned by Genetic Programming, 2006, 2006 IEEE International Conference on Evolutionary Computation.

[2] Yoram Singer, et al. Improved Boosting Algorithms Using Confidence-rated Predictions, 1998, COLT '98.

[3] Leslie G. Valiant, et al. A theory of the learnable, 1984, CACM.

[4] David D. Lewis, et al. Feature Selection and Feature Extraction for Text Categorization, 1992, HLT.

[5] Carsten Willems, et al. Learning and Classification of Malware Behavior, 2008, DIMVA.

[6] Wei-Yang Lin, et al. Intrusion detection by machine learning: A review, 2009, Expert Syst. Appl.

[7] Richard A. Kemmerer, et al. State Transition Analysis: A Rule-Based Intrusion Detection Approach, 1995, IEEE Trans. Software Eng.

[8] Leslie G. Valiant, et al. Cryptographic Limitations on Learning Boolean Formulae and Finite Automata, 1993, Machine Learning: From Theory to Applications.

[9] Nguyen Xuan Hoai, et al. Generating artificial attack data for intrusion detection using machine learning, 2014, SoICT.

[10] VanLoi Cao, et al. A scheme for building a dataset for intrusion detection systems, 2013, 2013 Third World Congress on Information and Communication Technologies (WICT 2013).

[11] Léon Bottou, et al. The Tradeoffs of Large Scale Learning, 2007, NIPS.

[12] David D. Lewis, et al. Feature Selection and Feature Extraction for Text Categorization, 1992, HLT.

[13] A. Agresti, et al. Approximate is Better than “Exact” for Interval Estimation of Binomial Proportions, 1998.

[14] Leo Breiman, et al. Random Forests, 2001, Machine Learning.

[15] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[16] Robert E. Schapire, et al. The strength of weak learnability, 1990, Mach. Learn.

[17] Guanhua Yan, et al. Exploring Discriminatory Features for Automated Malware Classification, 2013, DIMVA.

[18] J. Ross Quinlan, et al. Simplifying Decision Trees, 1987, Int. J. Man Mach. Stud.

[19] Yoav Freund, et al. Boosting a weak learning algorithm by majority, 1990, COLT '90.

[20] Salvatore J. Stolfo, et al. A data mining framework for building intrusion detection models, 1999, Proceedings of the 1999 IEEE Symposium on Security and Privacy (Cat. No.99CB36344).

[21] Harris Drucker, et al. Boosting Performance in Neural Networks, 1993, Int. J. Pattern Recognit. Artif. Intell.