A Combining Classifiers Approach for Detecting Email Spams

Email is a fast and cheap communication medium, but spam has become a nuisance for its users. Good spam filtering requires not only high classification accuracy but also a low false positive rate. This paper presents a combining-classifiers approach with a committee selection mechanism, whose main objective is to combine the individual decisions of good classifiers to obtain the best classification outcome in the spam classification domain. Three classifiers are selected: Boosted Bayesian, Boosted Naïve Bayes, and Support Vector Machine (SVM). In the combined classifier, Boosted Bayesian and Boosted Naïve Bayes serve as the committee members and SVM serves as the president. The committee members were selected on the basis of our previous study, which found that boosting with AdaBoost improves the performance of probabilistic classifiers. Results show that the novel combining-classifiers approach outperforms the individual classifiers, with higher accuracy and fewer false positives. In addition, the greedy stepwise feature search method is found to perform well in this study.
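The abstract does not spell out how the committee and the president interact, so the following is only a minimal sketch under an assumed rule: if the committee members agree, their shared decision is used; if they disagree, the president's decision is used. The function names, the label encoding (1 for spam, 0 for legitimate mail), and the rule itself are illustrative assumptions, not the paper's stated method.

```python
from typing import List, Sequence

SPAM, HAM = 1, 0  # assumed label encoding for illustration


def committee_decision(member_votes: Sequence[int], president_vote: int) -> int:
    """Combine votes for one message.

    Assumed rule (not taken from the paper): a unanimous committee
    decides; any disagreement is resolved by the president.
    """
    if not member_votes:
        return president_vote
    first = member_votes[0]
    if all(vote == first for vote in member_votes):
        return first
    return president_vote


def classify_batch(
    member_predictions: Sequence[Sequence[int]],
    president_predictions: Sequence[int],
) -> List[int]:
    """Apply committee_decision message by message.

    member_predictions holds one prediction list per committee member
    (e.g. Boosted Bayesian, Boosted Naive Bayes); president_predictions
    holds the president's (e.g. SVM) predictions for the same messages.
    """
    per_message_votes = zip(*member_predictions)
    return [
        committee_decision(votes, president)
        for votes, president in zip(per_message_votes, president_predictions)
    ]


if __name__ == "__main__":
    boosted_bayesian = [SPAM, HAM, SPAM, HAM]
    boosted_naive_bayes = [SPAM, HAM, HAM, SPAM]
    svm = [SPAM, SPAM, HAM, SPAM]
    print(classify_batch([boosted_bayesian, boosted_naive_bayes], svm))
```

With these hypothetical predictions, the first two messages are decided by the unanimous committee and the last two fall back to the SVM. Other combination rules (weighted voting, stacking) would fit the same interface.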
