Application of the Generic Feature Selection Measure in Detection of Web Attacks

Feature selection for filtering HTTP-traffic in Web application firewalls (WAFs) is an important task. We focus on the Generic-Feature-Selection (GeFS) measure [4], which was successfully tested on low-level package filters, i.e., the KDD CUP'99 dataset. However, the performance of the GeFS measure in analyzing high-level HTTP-traffic is still unknown. In this paper we study the GeFS measure for WAFs. We conduct experiments on the publicly available ECML/PKDD-2007 dataset. Since this dataset does not target any real Web application, we additionally generate our new CSIC-2010 dataset. We analyze the statistical properties of both two datasets to provide more insides of their nature and quality. Subsequently, we determine appropriate instances of the GeFS measure for feature selection. We use different classifiers to test the detection accuracies. The experiments show that we can remove 63% of irrelevant and redundant features from the original dataset, while reducing only 0.12% the detection accuracy of WAFs.