Application of Feature Engineering for Phishing Detection

Phishing attacks target financial returns by luring Internet users to exposure their sensitive information. Phishing originates from e-mail fraud, and recently it is also spread by social networks and short message service (SMS), which makes phishing become more widespread. Phishing attacks have drawn great attention due to their high volume and causing heavy losses, and many methods have been developed to fight against them. However, most of researches suffered low detection accuracy or high false positive (FP) rate, and phishing attacks are facing the Internet users continuously. In this paper, we are concerned about feature engineering for improving the classification performance on phishing web pages detection. We propose a novel anti-phishing framework that employs feature engineering including feature selection and feature extraction. First, we perform feature selection based on genetic algorithm (GA) to divide features into critical features and non-critical features. Then, the non-critical features are projected to a new feature by implementing feature extraction based on a two-stage projection pursuit (PP) algorithm. Finally, we take the critical features and the new feature as input data to construct the detection model. Our anti-phishing framework does not simply eliminate the non-critical features, but considers utilizing their projection in the process of classification, which is different from literatures. Experimental results show that the proposed framework is effective in detecting phishing web pages. key words: phishing detection, feature engineering, feature selection, feature extraction, two-stage projection pursuit

[1]  Ramana Rao Kompella,et al.  PhishNet: Predictive Blacklisting to Detect Phishing Attacks , 2010, 2010 Proceedings IEEE INFOCOM.

[2]  T. L. McCluskey,et al.  An assessment of features related to phishing websites using an automated technique , 2012, 2012 International Conference for Internet Technology and Secured Transactions.

[3]  Bernhard Schölkopf,et al.  Feature selection for support vector machines by means of genetic algorithm , 2003, Proceedings. 15th IEEE International Conference on Tools with Artificial Intelligence.

[4]  姜青山 Intelligent anti-phishing framework using multiple classifiers combination , 2012 .

[5]  T. L. McCluskey,et al.  Intelligent rule-based phishing websites classification , 2014, IET Inf. Secur..

[6]  Mingxing He,et al.  An efficient phishing webpage detector , 2011, Expert Syst. Appl..

[7]  Tommy W. S. Chow,et al.  Textual and Visual Content-Based Anti-Phishing: A Bayesian Approach , 2011, IEEE Transactions on Neural Networks.

[8]  T. L. McCluskey,et al.  Predicting phishing websites based on self-structuring neural network , 2013, Neural Computing and Applications.

[9]  Yu Zhou,et al.  Visual Similarity Based Anti-phishing with the Combination of Local and Global Features , 2014, 2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications.

[10]  Lorrie Faith Cranor,et al.  Cantina: a content-based approach to detecting phishing web sites , 2007, WWW '07.

[11]  Zhijun Yan,et al.  A domain-feature enhanced classification model for the detection of Chinese phishing e-Business websites , 2014, Inf. Manag..

[12]  Qingzhong Liu,et al.  Feature Selection for Improved Phishing Detection , 2012, IEA/AIE.

[13]  Lawrence K. Saul,et al.  Beyond blacklists: learning to detect malicious web sites from suspicious URLs , 2009, KDD.

[14]  Steve Love,et al.  A game design framework for avoiding phishing attacks , 2013, Comput. Hum. Behav..

[15]  Fadi A. Thabtah,et al.  Phishing detection based Associative Classification data mining , 2014, Expert Syst. Appl..

[16]  Gang Liu,et al.  Antiphishing through Phishing Target Discovery , 2012, IEEE Internet Computing.

[17]  Jihoon Yang,et al.  Feature Subset Selection Using a Genetic Algorithm , 1998, IEEE Intell. Syst..

[18]  Huajun Huang,et al.  A SVM-based Technique to Detect Phishing URLs , 2012 .

[19]  Youssef Iraqi,et al.  Phishing Detection: A Literature Survey , 2013, IEEE Communications Surveys & Tutorials.

[20]  Wen Gao,et al.  Classifiability-Based Discriminatory Projection Pursuit , 2011, IEEE Transactions on Neural Networks.

[21]  Brian Ryner,et al.  Large-Scale Automatic Classification of Phishing Pages , 2010, NDSS.

[22]  Ilango Krishnamurthi,et al.  An efficacious method for detecting phishing webpages through target domain identification , 2014, Decis. Support Syst..

[23]  Lorrie Faith Cranor,et al.  Anti-Phishing Phil: the design and evaluation of a game that teaches people not to fall for phish , 2007, SOUPS '07.

[24]  Carolyn Penstein Rosé,et al.  CANTINA+: A Feature-Rich Machine Learning Framework for Detecting Phishing Web Sites , 2011, TSEC.