Systematization of Knowledge (SoK): A Systematic Review of Software-Based Web Phishing Detection

Phishing is a form of cyber attack that leverages social engineering and other sophisticated techniques to harvest personal information from users of websites. The number of unique phishing websites detected by the Anti-Phishing Working Group has grown at an average annual rate of 36.29% over the past six years and 97.36% over the past two years. In the wake of this rise, mitigating phishing attacks has received growing interest from the cyber security community. Extensive research and development have been conducted to detect phishing attempts based on their unique content, network, and URL characteristics. Existing approaches differ significantly in their intuitions, data analysis methods, and evaluation methodologies. This warrants a careful systematization so that the advantages and limitations of each approach, as well as its applicability in different contexts, can be analyzed and contrasted in a rigorous and principled way. This paper presents a systematic study of phishing detection schemes, especially software-based ones. Starting from a phishing detection taxonomy, we study evaluation datasets, detection features, detection techniques, and evaluation metrics. Finally, we provide insights that we believe will help guide the development of more effective and efficient phishing detection schemes.
