A spam filtering multi-objective optimization study covering parsimony maximization and three-way classification

Advances in applications of multi-objective optimization to anti-spam filtering. Parsimony maximization of rule-based spam classifiers. Three-way classification balancing user effort and confidence level. Indicator-based, machine learning and decomposition-based evolutionary optimization.

Classifier performance optimization in machine learning can be stated as a multi-objective optimization problem. In this context, recent works have shown the utility of simple evolutionary multi-objective algorithms (NSGA-II, SPEA2) for optimizing the global performance of different anti-spam filters. The present work extends existing contributions in the spam filtering domain by using three evolutionary multi-objective algorithms not previously applied there: two indicator-based (SMS-EMOA, CH-EMOA) and one decomposition-based (MOEA/D). The proposed approaches are used to optimize the performance of a heterogeneous ensemble of classifiers in two different but complementary scenarios: parsimony maximization and e-mail classification under a low confidence level. Experimental results on a publicly available standard corpus led to conclusions regarding both the utility of rule-based classification filters and the appropriateness of a three-way classification system in the spam filtering domain.
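To make the three-way scenario concrete, the following is a minimal sketch and not the authors' implementation. It assumes a filter that outputs a spam score in [0, 1] and two hypothetical thresholds, `lower` and `upper`: messages scoring between them are left undecided for the user to review. The three objectives (false positive rate, false negative rate, undecided rate) and the function names are illustrative assumptions, chosen to show how user effort trades off against classification errors.

```python
from typing import Sequence, Tuple


def three_way_label(score: float, lower: float, upper: float) -> str:
    """Map a spam score to 'spam', 'ham', or 'undecided' (assumed scheme)."""
    if score >= upper:
        return "spam"
    if score <= lower:
        return "ham"
    return "undecided"


def objectives(scores: Sequence[float], is_spam: Sequence[bool],
               lower: float, upper: float) -> Tuple[float, float, float]:
    """Return (false positive rate, false negative rate, undecided rate).

    All three are fractions of the whole message set and all are minimized.
    """
    fp = fn = undecided = 0
    for score, truth in zip(scores, is_spam):
        label = three_way_label(score, lower, upper)
        if label == "undecided":
            undecided += 1          # user must inspect this message
        elif label == "spam" and not truth:
            fp += 1                 # legitimate mail blocked
        elif label == "ham" and truth:
            fn += 1                 # spam delivered
    n = len(scores)
    return fp / n, fn / n, undecided / n


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization: a is no worse everywhere, better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


# Toy data (made up for illustration only).
scores = [0.1, 0.4, 0.6, 0.9]
is_spam = [False, True, False, True]

wide = objectives(scores, is_spam, lower=0.3, upper=0.7)    # three-way
binary = objectives(scores, is_spam, lower=0.5, upper=0.5)  # two-way
print(wide, binary, dominates(wide, binary), dominates(binary, wide))
```

On this toy data neither threshold setting dominates the other: the three-way setting makes no errors but defers half of the messages, while the binary setting defers none and misclassifies half. Trade-offs of this kind are what a multi-objective algorithm such as NSGA-II, SMS-EMOA or MOEA/D explores when it searches over threshold values.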
