Web site defacement, the introduction of unauthorized modifications to a Web site, is a very common form of attack. Detecting defacements automatically is difficult because Web pages are highly dynamic and the degree of dynamism varies widely from page to page. In this paper we propose a novel detection approach based on genetic programming (GP), an established evolutionary computation paradigm for the automatic generation of algorithms. What makes GP particularly attractive in this context is that it does not rely on domain-specific knowledge, which is invariably hard to describe and synthesize. In a preliminary learning phase, GP builds an algorithm from a sequence of readings of the remote page to be monitored and a sample set of attacks. We then monitor the remote page at regular intervals and apply that algorithm, which raises an alert when it finds a suspect modification. We developed a prototype on top of a broader Web detection framework that we proposed earlier, and we tested our approach on a dataset of 15 dynamic Web pages, observed for about a month, and a collection of real Web defacements. We compared the results with those of a solution we developed earlier, whose design embedded a substantial amount of domain-specific knowledge; the results show that GP may be an effective approach for this task.
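
To make the two phases concrete, the following is a minimal sketch of how a GP-style expression tree could be scored during learning and applied during monitoring. It is not the prototype described here: the nested-tuple tree encoding, the two numeric page features, the threshold, and the names `evaluate`, `is_suspect`, and `fitness` are all assumptions introduced for illustration, and the evolutionary search itself (selection, crossover, mutation) is omitted.

```python
import operator
from typing import Callable, Sequence

# Hypothetical encoding of a GP individual: a nested tuple (op, left, right),
# a feature reference ("x", index), or a numeric constant.
OPS: dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


def evaluate(tree, features: Sequence[float]) -> float:
    """Recursively evaluate an expression tree on one page reading."""
    if isinstance(tree, (int, float)):
        return float(tree)
    if tree[0] == "x":
        return float(features[tree[1]])
    op, left, right = tree
    return OPS[op](evaluate(left, features), evaluate(right, features))


def is_suspect(tree, features: Sequence[float], threshold: float = 0.0) -> bool:
    """Monitoring phase: raise an alert when the tree output exceeds the threshold."""
    return evaluate(tree, features) > threshold


def fitness(tree, normal: list, attacks: list) -> float:
    """Learning phase: fraction of misclassified samples (lower is better)."""
    false_positives = sum(is_suspect(tree, f) for f in normal)
    false_negatives = sum(not is_suspect(tree, f) for f in attacks)
    return (false_positives + false_negatives) / (len(normal) + len(attacks))


# Assumed features: (fraction of changed text, number of new external links).
normal_readings = [(0.1, 0), (0.2, 1)]
attack_samples = [(0.9, 3), (0.3, 5)]

# A hand-written candidate standing in for an evolved individual:
# x0 + 0.1 * x1 - 0.5
candidate = ("-", ("+", ("x", 0), ("*", 0.1, ("x", 1))), 0.5)

score = fitness(candidate, normal_readings, attack_samples)
```

In a full GP run, a population of such trees would be generated at random and refined over many generations, with `fitness` guiding selection; the best individual would then be applied to each new reading through `is_suspect`.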