Machine Learning Approaches for Modeling Spammer Behavior

Spam is commonly defined as unsolicited or unwanted email sent over the Internet, and it poses a potential threat to Internet security. Users spend valuable time deleting spam and, more importantly, the ever-increasing volume of spam occupies server storage space and consumes network bandwidth. Keyword-based spam filtering strategies will eventually become less effective because spammers constantly change their tricks to circumvent these filters. The evasive tactics that spammers use are themselves patterns, however, and these patterns can be modeled to combat spam. This paper investigates the possibility of modeling spammer behavioral patterns with well-known classification algorithms: the Naive Bayesian classifier (Naive Bayes), Decision Tree Induction (DTI), and Support Vector Machines (SVMs). Preliminary experimental results demonstrate a promising detection rate of around 92%, a considerable improvement over similar research on spammer behavior modeling.
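To make the idea of classifying behavioral patterns concrete, the following is a minimal sketch of a Bernoulli Naive Bayes classifier over binary behavior features. It is not the paper's implementation: the three features (`forged_sender`, `obfuscated_url`, `bulk_recipients`), the toy training data, and the function names are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict

# Hypothetical binary behavior features (assumed for illustration only):
# [forged_sender, obfuscated_url, bulk_recipients]
SAMPLES = [
    [1, 1, 1], [1, 0, 1], [0, 1, 1],   # labeled spam
    [0, 0, 0], [0, 0, 1], [0, 0, 0],   # labeled ham
]
LABELS = ["spam", "spam", "spam", "ham", "ham", "ham"]


def train_bernoulli_nb(samples, labels, alpha=1.0):
    """Estimate class priors and per-feature probabilities with Laplace smoothing."""
    n_features = len(samples[0])
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for x, y in zip(samples, labels):
        class_counts[y] += 1
        for j, value in enumerate(x):
            feature_counts[y][j] += value
    total = len(samples)
    model = {}
    for y, n_y in class_counts.items():
        log_prior = math.log(n_y / total)
        # P(feature_j = 1 | class y), smoothed so no probability is 0 or 1
        probs = [(feature_counts[y][j] + alpha) / (n_y + 2 * alpha)
                 for j in range(n_features)]
        model[y] = (log_prior, probs)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    best_label, best_score = None, -math.inf
    for y, (log_prior, probs) in model.items():
        score = log_prior
        for value, p in zip(x, probs):
            score += math.log(p if value else 1.0 - p)
        if score > best_score:
            best_label, best_score = y, score
    return best_label


model = train_bernoulli_nb(SAMPLES, LABELS)
print(predict(model, [1, 1, 0]))  # message with forged sender and obfuscated URL
print(predict(model, [0, 0, 0]))  # message showing none of the behaviors
```

The same feature vectors could be passed to a decision tree or SVM learner instead. Naive Bayes is shown here only because it fits in a few lines without external libraries.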
