Modeling Spammer Behavior: Naïve Bayes vs. Artificial Neural Networks

Addressing the problem of spam email on the Internet, this paper presents a comparative study of Naïve Bayes and Artificial Neural Network (ANN) based modeling of spammer behavior. Keyword-based spam filtering techniques fall short of modeling spammer behavior because spammers constantly change tactics to circumvent these filters. The evasive tactics that spammers use are themselves patterns that can be modeled to combat spam. Both Naïve Bayes and ANN are observed to be well suited to modeling common spammer patterns. Experimental results demonstrate that both achieve a promising detection rate of around 92%, a considerable improvement over contemporary keyword-based filtering approaches.
