A Prior-based Transfer Learning Method for the Phishing Detection

In this paper, we introduce a prior-based transfer learning method for our statistical machine learning classifier, which is based on logistic regression and detects phishing sites using selected features of their URLs. Because the feature distributions are mismatched across phishing domains, we employ separate models for different regions. Since we cannot collect enough data from a new region to rebuild the detection model from scratch, we instead adapt the existing models with the transfer learning algorithm. The proposed algorithm was evaluated on a real-world phishing website detection task. In our experiments, it achieved more than 97% accuracy. The results demonstrate that using this algorithm in the anti-phishing scenario is feasible and that it is ready for our large-scale detection engine.
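The abstract does not spell out the form of the prior, so the following is only an illustrative sketch of one common way to do prior-based adaptation of a logistic regression model. It assumes a Gaussian prior centred on the weights of an existing (source-region) model and fits the new region by gradient descent on the penalised log-loss. The function name `fit_with_prior`, the penalty weight `lam`, the synthetic data, and all hyperparameters are hypothetical and not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_with_prior(X, y, w_prior, lam=1.0, lr=0.1, steps=500):
    """Logistic regression with an assumed Gaussian prior centred on w_prior.

    Minimises  mean log-loss + (lam / 2) * ||w - w_prior||^2
    by plain gradient descent, starting from the prior weights.
    """
    w = w_prior.astype(float).copy()
    n = X.shape[0]
    for _ in range(steps):
        p = sigmoid(X @ w)
        # Data term pulls w toward the new region; prior term pulls it back.
        grad = X.T @ (p - y) / n + lam * (w - w_prior)
        w -= lr * grad
    return w

# Synthetic stand-in for URL features (assumption: 3 numeric features).
rng = np.random.default_rng(0)
w_src = np.array([2.0, -1.0, 0.5])    # weights of an existing region's model
w_true = np.array([-2.0, 1.0, 0.5])   # hypothetical target-region behaviour
X_tgt = rng.normal(size=(40, 3))      # small sample from the new region
y_tgt = (X_tgt @ w_true > 0).astype(float)

w_strong = fit_with_prior(X_tgt, y_tgt, w_src, lam=5.0)   # trusts the prior
w_weak = fit_with_prior(X_tgt, y_tgt, w_src, lam=0.01)    # trusts the new data
```

In this sketch, `lam` controls how strongly the adapted model stays anchored to the existing one: a large value keeps the weights near `w_src` when target data are scarce, while a small value lets the new region's data dominate.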
