Detection of Illegitimate Emails Using Boosting Algorithm

In this paper, we report on experiments to detect illegitimate emails using a boosting algorithm. We call an email illegitimate if it is not useful to the receiver or to society. We divide the problem into two major areas of illegitimate email detection: suspicious email detection and spam email detection. Boosting can raise the accuracy of traditional classification algorithms, but it requires choosing a suitable weak learner as well as the number of boosting iterations. We propose suitable weak learners and parameter settings for the boosting algorithm for this task. We first analyze the problem using the base learners alone, and then apply the boosting algorithm with suitable weak learners and parameter settings, such as the number of boosting iterations. We propose a Naive Bayes classifier as a suitable weak learner for the boosting algorithm: in our experiments it reaches its maximum performance with very few boosting iterations.
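To make the setup concrete, the following is a minimal sketch of boosting with a Naive Bayes weak learner, not the implementation used in the experiments. It assumes an AdaBoost.M1-style reweighting scheme, a Bernoulli Naive Bayes model over word presence, and a tiny invented corpus; the function names (`train_nb`, `predict_nb`, `adaboost_m1`, `predict`), the example emails, and the choice of five rounds are all hypothetical.

```python
import math

def train_nb(docs, labels, weights, vocab):
    """Weighted Bernoulli Naive Bayes with Laplace smoothing.
    Assumes both classes are present with positive total weight."""
    total = sum(weights)
    model = {}
    for c in (0, 1):
        class_weight = sum(w for w, y in zip(weights, labels) if y == c)
        word_prob = {}
        for word in vocab:
            hit = sum(w for w, d, y in zip(weights, docs, labels)
                      if y == c and word in d)
            word_prob[word] = (hit + 1.0) / (class_weight + 2.0)
        model[c] = (math.log(class_weight / total), word_prob)
    return model

def predict_nb(model, doc):
    """Return the class (0 or 1) with the higher log posterior."""
    scores = {}
    for c, (log_prior, word_prob) in model.items():
        scores[c] = log_prior + sum(
            math.log(p) if word in doc else math.log(1.0 - p)
            for word, p in word_prob.items())
    return 1 if scores[1] > scores[0] else 0

def adaboost_m1(docs, labels, rounds):
    """AdaBoost.M1-style loop; returns a list of (alpha, model) pairs."""
    n = len(docs)
    vocab = sorted(set().union(*docs))
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Scale weights to sum to n so the +1/+2 smoothing stays meaningful.
        model = train_nb(docs, labels, [wi * n for wi in w], vocab)
        preds = [predict_nb(model, d) for d in docs]
        err = sum(wi for wi, p, y in zip(w, preds, labels) if p != y)
        if err >= 0.5:          # weak learner no better than chance: stop
            break
        perfect = err < 1e-10
        err = max(err, 1e-10)   # avoid division by zero (a simplification)
        ensemble.append((math.log((1.0 - err) / err), model))
        if perfect:             # nothing left to reweight
            break
        beta = err / (1.0 - err)
        w = [wi * beta if p == y else wi for wi, p, y in zip(w, preds, labels)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, doc):
    """Weighted vote of the boosted models: 1 = illegitimate, 0 = legitimate."""
    vote = sum(a if predict_nb(m, doc) == 1 else -a for a, m in ensemble)
    return 1 if vote > 0 else 0

# Invented toy corpus: each email is reduced to a set of words.
docs = [{"free", "prize", "click"}, {"win", "prize", "now"},
        {"meeting", "agenda", "monday"}, {"project", "report", "monday"},
        {"free", "click", "now"}, {"lunch", "meeting", "report"}]
labels = [1, 1, 0, 0, 1, 0]

ensemble = adaboost_m1(docs, labels, rounds=5)
print(len(ensemble), predict(ensemble, {"free", "prize"}))
```

The `rounds` argument corresponds to the number of boosting iterations discussed above. On this trivially separable toy data the loop stops early, which illustrates only the mechanics and says nothing about performance on real email corpora.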
