Applying feature selection to payload-based Web Application Firewalls

Web Application Firewalls (WAFs) analyze the HTTP traffic in order to protect Web applications from attacks. To be effective, WAFs need to analyze the payload of the packets. One of the techniques used for intrusion detection is to extract features from the payload by means of n-grams. An n-gram is a subsequence of n items from a given sequence. The number of n-grams is 256 to the nth power. Since it grows exponentially with n, the curse of dimensionality and computational complexity problem arise. In this paper we propose to apply feature selection in order to reduce the number of features extracted by n-grams and thus to improve the effectiveness of WAFs. We conduct experiments on our own HTTP data set. After extracting n-grams from this data set, we apply the Generic-Feature-Selection (GeFS) measure for intrusion detection [5] to select important features. We use four different classifiers to test the detection accuracy before and after feature selection. The experiments show that we can remove more than 95% of irrelevant and redundant features from the original data set (and thus improve the performance by more than 80% on average), while reducing only slightly (by less than 6%) the accuracy of WAFs.

[1]  W. B. Cavnar,et al.  N-gram-based text categorization , 1994 .

[2]  Salvatore J. Stolfo,et al.  Spectrogram: A Mixture-of-Markov-Chains Model for Anomaly Detection in Web Traffic , 2009, NDSS.

[3]  Daniel Q. Naiman Statistical anomaly detection via httpd data analysis , 2004, Comput. Stat. Data Anal..

[4]  Richard Lippmann,et al.  The 1999 DARPA off-line intrusion detection evaluation , 2000, Comput. Networks.

[5]  Gregory J. Conti,et al.  Toward Instrumenting Network Warfare Competitions to Generate Labeled Datasets , 2009, CSET.

[6]  Mark A. Hall,et al.  Correlation-based Feature Selection for Machine Learning , 2003 .

[7]  Gonzalo Álvarez,et al.  Application of the Generic Feature Selection Measure in Detection of Web Attacks , 2011, CISIS.

[8]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[9]  Masoud Nikravesh,et al.  Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing) , 2006 .

[10]  Masoud Nikravesh,et al.  Feature Extraction - Foundations and Applications , 2006, Feature Extraction.

[11]  Salvatore J. Stolfo,et al.  A data mining framework for building intrusion detection models , 1999, Proceedings of the 1999 IEEE Symposium on Security and Privacy (Cat. No.99CB36344).

[12]  Konrad Rieck,et al.  Detecting Unknown Network Attacks Using Language Models , 2006, DIMVA.

[13]  Salvatore J. Stolfo,et al.  Anomalous Payload-Based Network Intrusion Detection , 2004, RAID.

[14]  Slobodan Petrovic,et al.  Towards a Generic Feature-Selection Measure for Intrusion Detection , 2010, 2010 20th International Conference on Pattern Recognition.

[15]  Michael Becher Web Application Firewalls , 2007 .

[16]  John McHugh,et al.  Testing Intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory , 2000, TSEC.

[17]  Gonzalo Álvarez,et al.  A Self-learning Anomaly-Based Web Application Firewall , 2009, CISIS.

[18]  Wenke Lee,et al.  McPAD: A multiple classifier system for accurate payload-based anomaly detection , 2009, Comput. Networks.

[19]  Carla Marceau,et al.  Characterizing the behavior of a program using multiple-length N-grams , 2001, NSPW '00.

[20]  Fuhui Long,et al.  Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy , 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Salvatore J. Stolfo,et al.  Anagram: A Content Anomaly Detector Resistant to Mimicry Attack , 2006, RAID.

[22]  R.K. Cunningham,et al.  Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation , 2000, Proceedings DARPA Information Survivability Conference and Exposition. DISCEX'00.

[23]  Hiroshi Motoda,et al.  Computational Methods of Feature Selection , 2022 .