A real-time automatic detection of phishing URLs

Phishing, a fraudulent attempt to obtain users' sensitive data by masquerading as a trustworthy entity, has become the most dangerous form of online fraud affecting online businesses and information security. In this paper, we reveal some new aspects of the common features that appear in phishing URLs and introduce a statistical machine learning classifier that relies on these selected features to detect phishing sites. Unlike previous studies, we do not use a conventional feature extraction method, since some of these features need to be treated differently and some cannot be retrieved in the traditional way. Comprehensive experiments show that the proposed method achieves high accuracy on a balanced dataset and an error rate below 1% in a simulated real-world phishing scenario, with high processing speed. Moreover, the strong performance of the proposed algorithm demonstrates that the new characteristics and their corresponding extraction methods are useful for anti-phishing.
