Spam detection in email body using hybrid of artificial neural network and evolutionary algorithms

Spam detection is a significant problem that many researchers have addressed with a variety of strategies. Building a single model that covers the wide range of spam categories is difficult, because spam types change constantly. Low accuracy and a high false-positive rate are substantial problems in spam detection. A global optimization algorithm is an appropriate way to address them, owing to its ability to generate new solutions and to avoid becoming trapped in local optima. In this study, a hybrid machine learning approach that combines an Artificial Neural Network (ANN) with Differential Evolution (DE) is designed to detect spam effectively. ANN-DE with a Genetic Algorithm (GA) is compared against ANN-DE with the InfoGain algorithm, where GA and InfoGain serve as feature selection algorithms, to show which approach performs best in spam detection. The Spambase dataset of 4601 e-mails, of which 1813 were spam (39.40%) and 2788 were non-spam (60.60%), was used to train and test these algorithms. Classification performance was evaluated using false positives, false negatives, accuracy, precision, and recall. The experimental results show that the proposed hybrid of ANN-DE and GA gives a better result, with 93.81% accuracy, than ANN-DE and InfoGain, with 93.28% accuracy. The results suggest that the proposed ANN-DE with GA is promising, and this study provides a new method for training an ANN for spam detection in practice.
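The idea of training an ANN with DE and scoring it with the metrics named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the network size, the DE/rand/1/bin variant with in-place replacement, the parameter values (F, CR, population size), the synthetic two-feature data, and all function names are assumptions made for illustration, and no feature selection step (GA or InfoGain) is included.

```python
import math
import random

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def n_weights(n_in, n_hidden):
    # hidden weights and biases, plus output weights and bias
    return n_hidden * (n_in + 1) + n_hidden + 1

def forward(weights, x, n_hidden):
    """One-hidden-layer sigmoid network; weights is a flat list."""
    n_in, idx, hidden = len(x), 0, []
    for _ in range(n_hidden):
        s = weights[idx + n_in]  # bias of this hidden unit
        s += sum(weights[idx + j] * x[j] for j in range(n_in))
        hidden.append(sigmoid(s))
        idx += n_in + 1
    s = weights[idx + n_hidden]  # output bias
    s += sum(weights[idx + j] * hidden[j] for j in range(n_hidden))
    return sigmoid(s)

def mse(weights, X, y, n_hidden):
    return sum((forward(weights, x, n_hidden) - t) ** 2 for x, t in zip(X, y)) / len(X)

def differential_evolution(loss, dim, pop_size=20, F=0.5, CR=0.9,
                           generations=60, seed=0):
    """DE/rand/1/bin with greedy selection (assumed variant, pop_size >= 4)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    fit = [loss(ind) for ind in pop]
    history = [min(fit)]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated gene
            trial = [
                pop[a][j] + F * (pop[b][j] - pop[c][j])
                if (rng.random() < CR or j == j_rand) else pop[i][j]
                for j in range(dim)
            ]
            f_trial = loss(trial)
            if f_trial <= fit[i]:  # keep the trial only if it is no worse
                pop[i], fit[i] = trial, f_trial
        history.append(min(fit))
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best], history

def confusion_metrics(y_true, y_pred):
    """Spam is the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def make_toy_data(n=40, seed=1):
    # Synthetic stand-in for e-mail features; NOT the Spambase dataset.
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [1 if x0 + x1 > 1.0 else 0 for x0, x1 in X]
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    n_hidden = 3
    dim = n_weights(2, n_hidden)
    best_w, best_loss, history = differential_evolution(
        lambda w: mse(w, X, y, n_hidden), dim)
    y_pred = [1 if forward(best_w, x, n_hidden) >= 0.5 else 0 for x in X]
    print(confusion_metrics(y, y_pred))
```

Because DE needs only loss values and no gradients, the network weights are treated as one flat vector. The greedy selection step means the best loss in the population never increases from one generation to the next.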
