A lifelong spam emails classification model

Abstract Spam email classification using data mining and machine learning approaches has attracted researchers' attention due to its clear benefit in protecting internet users. Several features can be used to build data mining and machine learning based spam classification models. Yet spammers know that the longer they use the same set of features to trick email users, the more likely anti-spam parties are to develop tools for combating such messages. Spammers therefore adapt by continuously changing the set of features they use to compose spam emails. For that reason, even though traditional classification methods achieve sound classification results, they are ineffective for lifelong classification of spam emails because they are prone to so-called “concept drift”. In the current study, an enhanced model is proposed for lifelong spam classification. For evaluation, the overall performance of the proposed model is compared against various other stream mining classification techniques. The results demonstrate the success of the proposed model as a lifelong spam email classification method.
