Feature selection for phishing detection: a review of research

Web services motivate phishers to develop ever more deceptive websites, posing a never-ending threat to users. This challenge compels researchers to develop more proficient phishing detection approaches that incorporate hybrid features, machine learning classifiers, and feature selection methods. However, the classification performance of these approaches remains inadequate across the vast web. This is attributed to the limited selection of the best features from the massive number of hybrid ones, and to the varying outcomes of the applied feature selection methods under realistic conditions. This paper surveys prominent studies on this topic, highlights their limitations, and emphasises how they could be improved to raise detection performance. The survey also restates further particulars to promote certain facets of the current research trend, in the hope of helping researchers develop detection approaches and obtain the best-quality outcomes from feature selection.
