Optimising anti-spam filters with evolutionary algorithms

This work is devoted to the problem of optimising the scores of anti-spam filters, which is essential for the accuracy of any filter-based anti-spam system and is also one of the biggest challenges in this research area. In particular, this optimisation problem is considered from two different points of view: single-objective and multiobjective problem formulations. Some of the existing approaches within both formulations are surveyed, and their advantages and disadvantages are discussed. The two most popular evolutionary multiobjective algorithms and one single-objective algorithm are adapted to the optimisation of anti-spam filter scores and compared on publicly available datasets that are widely used for benchmarking. This comparison is discussed, and recommendations are provided for developers and users of anti-spam filter optimisation.
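To make the two formulations concrete, here is a minimal Python sketch built on assumptions that are not taken from this work. Each message is reduced to the set of filter rules it triggers. A message is flagged as spam when the scores of its triggered rules sum to a threshold or more (5.0 here, an assumed value). The two objectives are the counts of false positives and false negatives. The names `errors`, `weighted_cost` and `pareto_front`, the threshold, and the false-positive weight are all hypothetical. A single-objective algorithm could minimise a weighted sum such as `weighted_cost`. A multiobjective evolutionary algorithm (NSGA-II is a well-known example) would instead retain the score vectors whose error pairs are mutually non-dominated.

```python
"""Hypothetical sketch: evaluating anti-spam rule scores under single- and
multiobjective formulations. Not the method of the work described above."""
from typing import List, Sequence, Set, Tuple

# A message is represented only by the indices of the rules it triggers
# and its true label (True = spam). This representation is an assumption.
Message = Tuple[Set[int], bool]

ASSUMED_THRESHOLD = 5.0  # assumed spam threshold, not taken from the source


def is_flagged(scores: Sequence[float], triggered: Set[int],
               threshold: float = ASSUMED_THRESHOLD) -> bool:
    """Flag a message when its triggered rule scores sum to the threshold or more."""
    return sum(scores[i] for i in triggered) >= threshold


def errors(scores: Sequence[float], messages: Sequence[Message],
           threshold: float = ASSUMED_THRESHOLD) -> Tuple[int, int]:
    """Return (false positives, false negatives): the two objectives to minimise."""
    fp = fn = 0
    for triggered, is_spam in messages:
        flagged = is_flagged(scores, triggered, threshold)
        if flagged and not is_spam:
            fp += 1  # legitimate mail wrongly flagged as spam
        elif not flagged and is_spam:
            fn += 1  # spam that slipped through
    return fp, fn


def weighted_cost(scores: Sequence[float], messages: Sequence[Message],
                  fp_weight: float = 10.0,
                  threshold: float = ASSUMED_THRESHOLD) -> float:
    """Single-objective formulation: one scalar, with a hypothetical extra
    penalty on false positives."""
    fp, fn = errors(scores, messages, threshold)
    return fp_weight * fp + fn


def dominates(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """Pareto dominance for minimisation: a is no worse in every objective
    and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Multiobjective formulation: keep only the non-dominated (FP, FN) pairs."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    msgs = [({0, 1}, True), ({2}, False), ({0}, True), ({1, 2}, False)]
    candidates = [[3.0, 2.5, 1.0], [5.0, 0.5, 4.5], [1.0, 1.0, 6.0]]
    objectives = [errors(s, msgs) for s in candidates]
    print(objectives)                # [(0, 1), (1, 0), (2, 2)]
    print(pareto_front(objectives))  # [(0, 1), (1, 0)]
```

The evolutionary search itself is deliberately left out. An algorithm would generate candidate score vectors, evaluate them with `errors`, and rank them either by a scalar cost or by Pareto dominance. The multiobjective view leaves the trade-off between blocked legitimate mail and missed spam to the user rather than fixing it in advance through a weight.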