Phishing URL Detection Using URL Ranking

The openness of the Web gives criminals opportunities to upload malicious content. Despite extensive research, email-based spam filtering techniques are unable to protect other web services. A countermeasure is therefore needed that generalizes across web services to protect users from phishing host URLs. This paper describes an approach that automatically classifies URLs based on their lexical and host-based features. Clustering is performed on the entire dataset, and a cluster ID (or label) is derived for each URL, which in turn is used as a predictive feature by the classification system. Online URL reputation services are used to categorize URLs, and the returned categories serve as a supplemental source of information that enables the system to rank URLs. The classifier achieves 93-98% accuracy by detecting a large number of phishing hosts while maintaining a modest false positive rate. URL clustering, URL classification, and URL categorization mechanisms work in conjunction to give each URL a rank.
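
The idea of feeding a cluster ID to the classifier as an extra feature can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the lexical features, the clustering algorithm, or the number of clusters, so the feature set in `lexical_features`, the plain k-means routine, and the example URLs are all hypothetical. Host-based features and the reputation-service categories are omitted.

```python
import re
from urllib.parse import urlparse


def lexical_features(url):
    """Return a small, illustrative lexical feature vector for one URL.

    The chosen features are assumptions, not the paper's feature set.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Split path and query into tokens on common URL delimiters.
    tokens = [t for t in re.split(r"[/?.=&_-]+", parsed.path + "?" + parsed.query) if t]
    return [
        float(len(url)),                          # total URL length
        float(host.count(".")),                   # dots in the host name
        float(sum(c.isdigit() for c in url)),     # digit count
        1.0 if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host) else 0.0,  # IPv4 host
        float(len(tokens)),                       # path/query token count
    ]


def nearest_centroid(vec, centroids):
    """Index of the centroid closest to vec (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, c)) for c in centroids]
    return dists.index(min(dists))


def kmeans(vectors, k, iterations=10):
    """Plain k-means with deterministic initialisation from the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest_centroid(v, centroids)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def add_cluster_feature(vectors, centroids):
    """Append each vector's cluster ID so a classifier can use it as a feature."""
    return [v + [float(nearest_centroid(v, centroids))] for v in vectors]


# Hypothetical URLs, used only to exercise the code.
urls = [
    "https://www.example.com/index.html",
    "http://192.0.2.7/login/verify.php?id=1234",
    "https://example.org/about",
    "http://198.51.100.23/secure/update-account.php?session=987654",
]
vectors = [lexical_features(u) for u in urls]
centroids = kmeans(vectors, k=2)
augmented = add_cluster_feature(vectors, centroids)
```

Each row of `augmented` is the original feature vector with the cluster ID appended, and could be passed to any supervised classifier. In practice the features would likely need scaling before clustering, since raw URL length dominates the distance computation in this sketch.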
