Adaptive Spam Detection Inspired by the Immune System

This paper proposes a novel approach to spam detection inspired by the cross-regulation model, a model of the adaptive immune system. We test a preliminary algorithm on six e-mail corpora and compare its results with those of the Naive Bayes classifier and of another binary classification method that we previously developed for biomedical text-mining applications. The results are encouraging and can likely be improved with further development of this bio-inspired model. We show that the cross-regulation model is promising as a bio-inspired algorithm for spam detection in particular and for binary classification in general. Finally, we present evidence that our bio-inspired model is relevant for understanding immune regulation itself.
