Random Forest Technique for E-mail Classification

Email has become an efficient and popular communication mechanism as the number of Internet users has grown. Because email is prone to misuse, managing it is an important and growing problem for individuals and organizations. One example of misuse is the indiscriminate posting of unsolicited messages, known as spam. Spam is commonly defined as unsolicited bulk email, that is, email sent to multiple recipients who did not ask for it. Classification algorithms such as Neural Networks (NN), Support Vector Machines (SVM), and Naive Bayes (NB) are currently applied to various datasets and show good classification results. This paper describes email classification using the Random Forest (RF) technique. RF is an ensemble learning technique: ensemble learning is a data mining approach whose methods generate many classifiers, such as decision trees, and aggregate their results by taking a weighted vote of their predictions. First, the body of each message is evaluated and, after preprocessing, its tokens are extracted. Next, a term selection method retains the most discriminative terms and removes the others. Iterative patterns are then extracted and a feature vector is built for each sample. Finally, Random Forest is applied as the classifier: if the identified category is 0, the message is non-spam; if it is 1, the message is spam.
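The pipeline summarized above (preprocess the body, extract tokens, select discriminative terms, build a feature vector, classify with a Random Forest) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The stopword list, the document-frequency-difference score used in the hypothetical `select_terms`, and the toy training messages are all invented for illustration, and binary term-presence vectors stand in for the "iterative patterns", which the abstract does not define. The forest grows Gini-split trees on bootstrap samples, tries the square root of the number of features at each split, and combines trees by an equal-weight majority vote, the simplest case of the weighted vote mentioned above.

```python
import math
import random
import re
from collections import Counter

SPAM, HAM = 1, 0  # label convention from the abstract: 1 = spam, 0 = non-spam
STOPWORDS = {"a", "an", "the", "to", "of", "and", "for", "is", "your"}  # hypothetical list

def tokenize(body):
    """Preprocess a message body: lowercase, split on non-alphanumerics, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", body.lower()) if t not in STOPWORDS]

def select_terms(token_sets, labels, k):
    """Assumed term selection: keep the k terms whose per-class document
    frequencies differ most (ties broken alphabetically)."""
    n_spam = labels.count(SPAM) or 1
    n_ham = labels.count(HAM) or 1
    df_spam, df_ham = Counter(), Counter()
    for tokens, y in zip(token_sets, labels):
        (df_spam if y == SPAM else df_ham).update(tokens)  # tokens is a set -> document frequency

    def score(t):  # absolute difference of per-class document frequencies
        return abs(df_spam[t] / n_spam - df_ham[t] / n_ham)
    return sorted(set(df_spam) | set(df_ham), key=lambda t: (-score(t), t))[:k]

def vectorize(tokens, terms):
    """Binary feature vector: 1 if the selected term occurs in the message."""
    return [1 if t in tokens else 0 for t in terms]

def gini(labels):
    """Gini impurity of 0/1 labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    p = sum(labels) / len(labels) if labels else 0.0
    return 2.0 * p * (1.0 - p)

def build_tree(X, y, depth, max_depth, n_try, rng):
    """Grow a tree on binary features, trying n_try random features per split.
    Internal nodes are (feature, left, right) tuples; leaves are labels."""
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label
    best = None
    for f in rng.sample(range(len(X[0])), n_try):
        left = [i for i, row in enumerate(X) if row[f] == 0]
        right = [i for i, row in enumerate(X) if row[f] == 1]
        if not left or not right:
            continue  # feature is constant in this node, so it cannot split
        cost = (len(left) * gini([y[i] for i in left])
                + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or cost < best[0]:
            best = (cost, f, left, right)
    if best is None:
        return Counter(y).most_common(1)[0][0]
    _, f, left, right = best

    def child(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_try, rng)
    return (f, child(left), child(right))

def predict_tree(node, x):
    """Walk from the root to a leaf, following the feature value at each node."""
    while isinstance(node, tuple):
        f, left, right = node
        node = right if x[f] == 1 else left
    return node

class RandomForest:
    """Bagged trees with random feature subsets, combined by equal-weight majority vote."""
    def __init__(self, n_trees=25, max_depth=4, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n_try = max(1, int(math.sqrt(len(X[0]))))
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(len(X)) for _ in X]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, n_try, self.rng))
        return self

    def predict(self, x):
        votes = Counter(predict_tree(t, x) for t in self.trees)
        return SPAM if votes[SPAM] > votes[HAM] else HAM

# Toy, invented training data (illustration only).
train = [("win a free prize now click here", SPAM), ("cheap pills free offer click now", SPAM),
         ("claim your free money prize today", SPAM), ("meeting agenda for monday project review", HAM),
         ("please review the attached project report", HAM), ("lunch on monday with the team", HAM)]
token_sets = [set(tokenize(body)) for body, _ in train]
labels = [y for _, y in train]
terms = select_terms(token_sets, labels, k=6)
X = [vectorize(ts, terms) for ts in token_sets]
forest = RandomForest(n_trees=25, max_depth=4, seed=42).fit(X, labels)
query = vectorize(set(tokenize("free prize click now")), terms)
print(forest.predict(query))  # predicted label: 1 = spam, 0 = non-spam
```

The forest is written from scratch only so that each step of the pipeline is visible; in practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally replace the hand-written trees, and a real term selection method (for example chi-square scoring) would replace the simple frequency-difference score assumed here.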

M. Tech Student | P. P. Halkarnikar | Bhagyashri U. Gaikwad