Two-stage ELM for phishing Web pages detection using hybrid features

Phishing attacks grow in volume every day, driven by the high financial returns they bring attackers. Recently, there has been significant interest in applying machine learning to phishing Web page detection. Unlike prior work, this paper uses the predicted labels of textual content as features and proposes a novel framework for phishing Web page detection based on hybrid features: URL-based, Web-based, rule-based, and textual content-based. We realize this framework with an efficient two-stage extreme learning machine (ELM). The first stage builds ELM classification models on the textual content of Web pages; in this stage, Optical Character Recognition (OCR) serves as an auxiliary tool for extracting text from Web pages rendered as images. The second stage builds a classification model on the hybrid features using a linear-combination ensemble of ELMs (LC-ELMs), with the combination weights calculated by the generalized inverse. Experimental results indicate that the proposed framework is promising for detecting phishing Web pages.
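The two-stage structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no hyperparameters, activation function, or dataset, so the sigmoid activation, the hidden-layer sizes, the ensemble size, the {-1, +1} label encoding, and the synthetic Gaussian features are all assumptions made for illustration, and the OCR step is omitted. The names `ELM`, `LCELM`, and `two_stage_demo` are hypothetical. The sketch shows only the two ideas the abstract names: the stage-one predicted text label becomes one column of the hybrid feature vector, and the ensemble's combination weights come from a generalized (Moore-Penrose) inverse.

```python
import numpy as np


class ELM:
    """Basic ELM: random, fixed hidden layer; output weights by least squares."""

    def __init__(self, n_hidden, rng):
        self.n_hidden = n_hidden
        self.rng = rng

    def _hidden(self, X):
        # Sigmoid activation is an assumption; the abstract does not name one.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Hidden-layer parameters are drawn once and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Output weights via the Moore-Penrose pseudo-inverse of H.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


class LCELM:
    """Linear combination of several ELMs; weights from the generalized inverse."""

    def __init__(self, n_models, n_hidden, rng):
        self.models = [ELM(n_hidden, rng) for _ in range(n_models)]

    def _stack(self, X):
        # One column of real-valued outputs per base ELM.
        return np.column_stack([m.predict(X) for m in self.models])

    def fit(self, X, y):
        for m in self.models:
            m.fit(X, y)
        # Least-squares combination weights: w = pinv(O) y.
        self.w = np.linalg.pinv(self._stack(X)) @ y
        return self

    def predict(self, X):
        return self._stack(X) @ self.w


def two_stage_demo(seed=0):
    """Run both stages on synthetic data; returns in-sample accuracy."""
    rng = np.random.default_rng(seed)
    n = 200
    y = np.where(rng.random(n) < 0.5, 1.0, -1.0)  # +1 phishing, -1 legitimate

    # Synthetic stand-ins for textual features and URL/Web/rule-based features.
    text_feats = rng.normal(size=(n, 20)) + 0.8 * y[:, None]
    other_feats = rng.normal(size=(n, 10)) + 0.5 * y[:, None]

    # Stage 1: ELM on textual content; its predicted label becomes a feature.
    stage1 = ELM(30, rng).fit(text_feats, y)
    text_label = np.sign(stage1.predict(text_feats))

    # Stage 2: LC-ELMs on the hybrid feature vector.
    hybrid = np.column_stack([other_feats, text_label])
    stage2 = LCELM(5, 20, rng).fit(hybrid, y)
    pred = np.sign(stage2.predict(hybrid))
    return float(np.mean(pred == y))
```

Calling `two_stage_demo()` returns the fraction of training samples classified correctly. Because it is measured on the same synthetic data used for fitting, it says nothing about the detection performance reported in the paper; a real evaluation would need held-out data and the actual feature extraction.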
