Distributed Phishing Detection by Applying Variable Selection Using Bayesian Additive Regression Trees

Phishing continues to be one of the most damaging attacks, causing huge monetary losses for both financial institutions and their customers. Mobile devices are now widely used to access the Internet, and therefore to access financial and confidential data. However, unlike PCs and wired devices, mobile devices lack basic defensive applications to protect against various types of attacks. As a consequence, phishing has recently evolved to target mobile users through Vishing and SMishing attacks. This study presents a client-server distributed architecture that detects phishing e-mails by taking advantage of the automatic variable selection in Bayesian Additive Regression Trees (BART). When combined with other classifiers, BART improves their predictive accuracy. Furthermore, the overall architecture proves well suited to resource-constrained environments.
