Everything Is in the Name - A URL Based Approach for Phishing Detection

Phishing, an attack in which a user is tricked into revealing sensitive information on a spoofed website, is one of the most common threats to cybersecurity. Most modern web browsers counter phishing attacks using a blacklist of confirmed phishing URLs. However, a major disadvantage of the blacklist method is that it is ineffective against newly generated phishing URLs that have not yet been reported. Machine learning techniques that rely on features extracted from the URL (e.g., URL length and bag-of-words) or from the web page (e.g., TF-IDF and form fields) are considered more effective at identifying new phishing attacks. The main benefit of URL-based features over page-based features is that the machine learning model can classify a new URL on the fly, before the web browser loads the page, thus avoiding other potential dangers such as drive-by download attacks and cryptojacking attacks.
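To make the idea of URL-based features concrete, the following is a minimal sketch of how the two example features named above (URL length and bag-of-words) could be computed from the URL string alone, without fetching the page. The function name `url_features`, the extra `host_length` and `num_dots` features, the tokenization rule, and the sample URL are all illustrative assumptions, not the feature set used in the paper.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def url_features(url: str) -> dict:
    """Compute simple lexical features from a URL string (hypothetical helper)."""
    parts = urlsplit(url)
    # Assumed tokenization: lowercase, split on any non-alphanumeric character.
    tokens = [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]
    return {
        "url_length": len(url),              # total number of characters
        "host_length": len(parts.netloc),    # length of the host part (assumed extra feature)
        "num_dots": parts.netloc.count("."), # dots in the host (assumed extra feature)
        "bag_of_words": Counter(tokens),     # token -> occurrence count
    }


# Made-up URL for illustration only.
sample_url = "http://paypal.example-login.com/secure/update"
features = url_features(sample_url)
```

Because every feature here is derived from the string itself, a classifier trained on such features could score a URL before any network request is made, which is the advantage the abstract attributes to URL-based detection.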
