Detection of Web Defacements by means of Genetic Programming

MEDVET, Eric; FILLON, Cyril; BARTOLI, Alberto
2007-01-01

Abstract

Web site defacement, the process of introducing unauthorized modifications to a Web site, is a very common form of attack. Detecting such events automatically is very difficult because Web pages are highly dynamic and their degree of dynamism may vary widely across different pages. In this paper we propose a novel detection approach based on genetic programming (GP), an established evolutionary computation paradigm for automatic generation of algorithms. What makes GP particularly attractive in this context is that it does not rely on any domain-specific knowledge, whose description and synthesis is invariably a hard job. In a preliminary learning phase, GP builds an algorithm based on a sequence of readings of the remote page to be monitored and on a sample set of attacks. Then, we monitor the remote page at regular intervals and apply that algorithm, which raises an alert when a suspect modification is found. We developed a prototype based on a broader Web detection framework we proposed earlier and we tested our approach over a dataset of 15 dynamic Web pages, observed for about a month, and a collection of real Web defacements. We compared the results to those of a solution we developed earlier, whose design embedded a substantial amount of domain specific knowledge, and the results clearly show that GP may be an effective approach for this job.
2007
0769528767
9780769528762

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/1706350
Citations
  • Scopus 16