A method for the automated detection phishing websites through both site characteristics and image analysis

Phishing website analysis is largely still a time-consuming manual process of discovering potential phishing sites, verifying if suspicious sites truly are malicious spoofs and if so, distributing their URLs to the appropriate blacklisting services. Attackers increasingly use sophisticated systems for bringing phishing sites up and down rapidly at new locations, making automated response essential. In this paper, we present a method for rapid, automated detection and analysis of phishing websites. Our method relies on near real-time gathering and analysis of URLs posted on social media sites. We fetch the pages pointed to by each URL and characterize each page with a set of easily computed values such as number of images and links. We also capture a screen-shot of the rendered page image, compute a hash of the image and use the Hamming distance between these image hashes as a form of visual comparison. We provide initial results demonstrate the feasibility of our techniques by comparing legitimate sites to known fraudulent versions from Phishtank.com, by actively introducing a series of minor changes to a phishing toolkit captured in a local honeypot and by performing some initial analysis on a set of over 2.8 million URLs posted to Twitter over a 4 days in August 2011. We discuss the issues encountered during our testing such as resolvability and legitimacy of URL's posted on Twitter, the data sets used, the characteristics of the phishing sites we discovered, and our plans for future work.

[1]  Brian Ryner,et al.  Large-Scale Automatic Classification of Phishing Pages , 2010, NDSS.

[2]  Christopher Krügel,et al.  On the Effectiveness of Techniques to Detect Phishing Sites , 2007, DIMVA.

[3]  Eric Medvet,et al.  Visual-similarity-based phishing detection , 2008, SecureComm.

[4]  Tyler Moore,et al.  Temporal Correlations between Spam and Phishing Websites , 2009, LEET.

[5]  Christopher Krügel,et al.  A layout-similarity-based approach for detecting phishing pages , 2007, 2007 Third International Conference on Security and Privacy in Communications Networks and the Workshops - SecureComm 2007.

[6]  Lawrence K. Saul,et al.  Identifying suspicious URLs: an application of large-scale online learning , 2009, ICML '09.

[7]  Nick Feamster,et al.  Building a Dynamic Reputation System for DNS , 2010, USENIX Security Symposium.

[8]  Lorrie Faith Cranor,et al.  Cantina: a content-based approach to detecting phishing web sites , 2007, WWW '07.

[9]  Youssef Iraqi,et al.  Lexical URL analysis for discriminating phishing and legitimate websites , 2011, CEAS '11.

[10]  Philip S. Yu,et al.  A new method for similarity indexing of market basket data , 1999, SIGMOD '99.

[11]  Carolyn Penstein Rosé,et al.  CANTINA+: A Feature-Rich Machine Learning Framework for Detecting Phishing Web Sites , 2011, TSEC.

[12]  Xiaotie Deng,et al.  Detecting Phishing Web Pages with Visual Similarity Assessment Based on Earth Mover's Distance (EMD) , 2006, IEEE Transactions on Dependable and Secure Computing.

[13]  Thamar Solorio,et al.  Lexical feature based phishing URL detection using online learning , 2010, AISec '10.

[14]  Min Wu,et al.  Do security toolbars actually prevent phishing attacks? , 2006, CHI.

[15]  Gang Liu,et al.  Discovering phishing target based on semantic link network , 2010 .

[16]  Christoph Zauner,et al.  Implementation and Benchmarking of Perceptual Image Hash Functions , 2010 .

[17]  Niels Provos,et al.  A framework for detection and measurement of phishing attacks , 2007, WORM '07.

[18]  Lorrie Faith Cranor,et al.  Phinding Phish: An Evaluation of Anti-Phishing Toolbars , 2007, NDSS.

[19]  Fadi A. Thabtah,et al.  Predicting Phishing Websites Using Classification Mining Techniques with Experimental Case Studies , 2010, 2010 Seventh International Conference on Information Technology: New Generations.