Hybrid Feature Selection for Phishing Email Detection

Phishing emails are more prevalent than ever, putting average computer users and organizations at risk of significant data, brand, and financial loss. Through an analysis of a collection of phishing and ham emails, this paper focuses on fundamental attacker behavior that can be extracted from the email header. It also puts forward a hybrid feature selection approach that combines content-based and behavior-based features, with the behavior-based features mined from the email header. On a publicly available test corpus, our hybrid feature selection achieves 96% accuracy. In addition, we validated the quality of our proposed behavior-based feature using information gain.
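As an illustration of the information gain measure mentioned above, the following is a minimal sketch and not the paper's implementation. It computes the information gain of a single discrete feature with respect to the phishing/ham label. The function names, the feature (a hypothetical flag for a header-derived behavior) and the toy data are assumptions made for this example only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Information gain of a discrete feature: H(labels) - H(labels | feature)."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(
        (len(group) / total) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - conditional


# Toy data (hypothetical): 1 means the header-derived behavior flag is set.
labels = ["phish", "phish", "ham", "ham"]
informative_flag = [1, 1, 0, 0]    # separates the two classes perfectly
uninformative_flag = [1, 0, 1, 0]  # carries no information about the class

gain_informative = information_gain(informative_flag, labels)
gain_uninformative = information_gain(uninformative_flag, labels)
```

A feature with higher information gain reduces more of the uncertainty about the class label, so ranking candidate features by this value is one common way to compare their quality.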
