Identifying vulnerable websites by analysis of common strings in phishing URLs

It has been shown that most phishing sites are created when a phisher compromises a vulnerable web server and re-purposes it to host a counterfeit website without the knowledge of the server's owner. In this paper, we examine the common vulnerabilities that allow these phishing sites to be created. We suggest a method for identifying common attack methods and for informing webmasters and their hosting companies of ways to defend their servers. Our method applies a Longest Common Substring algorithm to known phishing URLs and investigates the resulting substrings to identify common vulnerabilities, exploits, and attack tools that may be prevalent among those who hack servers for phishing. Following a case-study approach, we then select four prevalent attacks suggested by our methodology, use our findings to identify the underlying vulnerability in each case, and document statistics showing that these vulnerabilities are responsible for the creation of phishing websites. Digging further, we identify attack tools created to exploit these vulnerabilities and examine how current intrusion detection signatures detect them. Finally, we suggest a means by which this work could be integrated with Intrusion Detection Systems to allow webmasters or hosting providers to reduce their vulnerability to hosting phishing websites.
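The abstract does not say exactly how the Longest Common Substring algorithm is applied across a corpus of URLs. The Python sketch below assumes one plausible reading and is not the authors' implementation. It computes the longest common substring of every pair of URL paths with the standard dynamic-programming method and counts how often each resulting substring recurs. Several choices here are assumptions for illustration only: comparing paths rather than whole URLs (because the compromised hostnames differ), the helper names `longest_common_substring` and `common_url_substrings`, the `min_len` noise threshold, and the example URLs.

```python
from collections import Counter
from itertools import combinations
from urllib.parse import urlsplit


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest substring shared by a and b.

    Classic O(len(a) * len(b)) dynamic programme; on ties, the
    occurrence that ends earliest in `a` is kept.
    """
    best_len, best_end = 0, 0
    # prev[j] = length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def common_url_substrings(urls, min_len=8):
    """Hypothetical helper: count pairwise longest common substrings.

    Only the path component of each URL is compared (an assumption:
    hostnames of compromised servers are expected to differ). Substrings
    shorter than `min_len` (an illustrative threshold, not a published
    value) are discarded as noise.
    """
    paths = [urlsplit(u).path for u in urls]
    counts = Counter()
    for p, q in combinations(paths, 2):
        s = longest_common_substring(p, q)
        if len(s) >= min_len:
            counts[s] += 1
    return counts


if __name__ == "__main__":
    # Made-up URLs on reserved .example hosts, for illustration only.
    urls = [
        "http://victim-one.example/images/.secure/bank/index.php",
        "http://victim-two.example/forum/images/.secure/bank/login.htm",
        "http://victim-three.example/~user/verify/account.html",
    ]
    print(common_url_substrings(urls).most_common())
    # → [('/images/.secure/bank/', 1)]
```

A recurring substring such as the hidden `/.secure/` directory under `/images/` is the kind of pattern an analyst would then investigate by hand. On a real feed, substrings that recur across many pairs point to a shared exploit or attack tool.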