An Improved Multi-Class Classification Algorithm based on Association Classification Approach and its Application to Spam Emails

Email is used every day for personal and professional matters. Yet its benefits have been undermined by the widespread use of annoying, harmful, and fraudulent messages, commonly referred to as spam emails. Several anti-spam approaches built around machine learning and data mining techniques have been devised in the literature. An intelligent data mining approach referred to as Associative Classification (AC) presents itself as a possible method that might efficiently identify spam emails. In this study, an improved Spam Classification based on the Association Classification algorithm (SCAC) is proposed. In addition to a robust rule generation procedure, an improved model creation process, and an enhanced prediction mechanism, the SCAC algorithm is able to derive a new class value that does not exist in the original dataset, namely the “Uncertain” class value. Hence, the SCAC algorithm contributes not only to the field of AC but also to the spam detection domain. Together, these contributions account for the strong classification performance of the SCAC algorithm when compared to several other intelligent techniques.
