A Prior-based Transfer Learning Method for the Phishing Detection

In this paper, we introduce a prior-based transfer learning method for our statistical machine learning classifier, which uses logistic regression on a set of selected URL features to detect phishing sites. Because feature distributions differ across phishing domains, we employ a separate model for each region. Since it is impossible to collect enough data from a new region to rebuild a detection model from scratch, we instead adapt the existing models with a transfer learning algorithm. We evaluated the proposed algorithm on a real-world phishing website detection task, and in our experiments it achieves more than 97% accuracy. These results demonstrate that the algorithm is feasible for anti-phishing use and ready for deployment in our large-scale detection engine.

Yang Xin | Dan Li | Jianyi Zhang | Yangxi Ou
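The abstract does not specify the form of the prior. One common way to realise prior-based transfer for logistic regression is MAP estimation with a Gaussian prior centred on the weights of an existing source-region model. Under that formulation you minimise (1/n) * sum_i loss(w; x_i, y_i) + (lam/2) * ||w - w_src||^2. The new region's scarce data then move the weights away from the source model only as far as the evidence supports, and `lam` controls how far that is. This may or may not match the authors' exact method. The minimal Python sketch below assumes this formulation. The two URL features, the synthetic data generator, the hyperparameters, and every function name (`fit_map`, `make_region`, `accuracy`, and so on) are hypothetical illustrations, not taken from the paper.

```python
import math
import random


def softplus(z):
    """log(1 + exp(z)), computed without overflow."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    """Linear score: w[0] is the bias, w[1:] pairs with the features in x."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def objective(w, X, y, prior_mean, lam):
    """Mean logistic loss plus (lam / 2) * ||w - prior_mean||^2, i.e. a scaled
    negative log-posterior under a Gaussian prior centred on prior_mean."""
    nll = 0.0
    for xi, yi in zip(X, y):
        z = score(w, xi)
        nll += yi * softplus(-z) + (1 - yi) * softplus(z)
    penalty = 0.5 * lam * sum((wj - mj) ** 2 for wj, mj in zip(w, prior_mean))
    return nll / len(X) + penalty


def fit_map(X, y, prior_mean, lam=1.0, lr=0.1, epochs=1000):
    """Illustrative MAP fit by batch gradient descent on objective().

    prior_mean all zeros -> ordinary L2-regularised logistic regression.
    prior_mean = w_src   -> the new model is pulled toward a source-region
    model, more strongly as lam grows. Keep lr * lam well below 2.
    """
    n = len(X)
    w = list(prior_mean)  # start from the prior mean
    for _ in range(epochs):
        # Gradient of the Gaussian-prior penalty.
        grad = [lam * (wj - mj) for wj, mj in zip(w, prior_mean)]
        # Add the gradient of the mean logistic loss.
        for xi, yi in zip(X, y):
            err = (sigmoid(score(w, xi)) - yi) / n
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def accuracy(w, X, y):
    """Fraction of samples whose predicted label (score > 0) matches y."""
    return sum((score(w, xi) > 0) == bool(yi) for xi, yi in zip(X, y)) / len(X)


def make_region(rng, n, shift):
    """Hypothetical synthetic data: two URL features scaled to [0, 1]
    (think: normalised URL length, digit ratio). 'shift' moves the phishing
    cluster to mimic a region with a different feature distribution."""
    X, y = [], []
    for _ in range(n):
        phish = rng.random() < 0.5
        centre = 0.7 - shift if phish else 0.3
        X.append([min(1.0, max(0.0, rng.gauss(centre, 0.1))) for _ in range(2)])
        y.append(int(phish))
    return X, y


rng = random.Random(0)
X_src, y_src = make_region(rng, 200, shift=0.0)     # plenty of source data
X_tgt, y_tgt = make_region(rng, 20, shift=0.15)     # little new-region data
X_test, y_test = make_region(rng, 500, shift=0.15)  # held-out new-region data

# Source model: zero-mean prior, i.e. plain regularised logistic regression.
w_src = fit_map(X_src, y_src, prior_mean=[0.0, 0.0, 0.0], lam=0.01)
# Adapted model: prior centred on the source weights.
w_tgt = fit_map(X_tgt, y_tgt, prior_mean=w_src, lam=0.1)

print("source model on new region:", accuracy(w_src, X_test, y_test))
print("prior-adapted model       :", accuracy(w_tgt, X_test, y_test))
```

With `lam` near zero the adapted model is fit to the 20 new-region samples alone and can overfit. With a very large `lam` it simply reproduces the source model. The printed accuracies depend on the seed and `shift`. They illustrate the mechanism and are not the paper's reported results.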