Paper information - Spam filtering with several novel Bayesian classifiers

Spam filtering with several novel Bayesian classifiers

In this paper, we report our work on spam filtering with three novel Bayesian classification methods: aggregating one-dependence estimators (AODE), hidden Naive Bayes (HNB), and locally weighted learning with Naive Bayes (LWNB). Four traditional classifiers, Naive Bayes, k-nearest neighbor (kNN), support vector machine (SVM), and C4.5, are also evaluated for comparison. Four feature selection methods, gain ratio, information gain, symmetrical uncertainty, and ReliefF, are used to select relevant words for spam filtering. Experimental results on two corpora show that Bayesian classifiers are promising for spam filtering, especially AODE.
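Three of the feature selection scores named in the abstract (information gain, gain ratio, and symmetrical uncertainty) are all derived from entropy, so they can be illustrated together. The following is a minimal sketch, not the authors' implementation: it assumes each message is a set of tokens and each word is treated as a binary presence feature, and the toy corpus, the function names `entropy` and `word_scores`, and the label strings are hypothetical. ReliefF is instance-based rather than entropy-based and is omitted here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def word_scores(docs, labels, word):
    """Score one word, as a binary presence feature, against the class labels."""
    present = [word in doc for doc in docs]
    h_class = entropy(labels)   # H(class)
    h_word = entropy(present)   # H(word), also the split information

    # Conditional entropy H(class | word present or absent).
    h_cond = 0.0
    for value in (True, False):
        subset = [lab for lab, p in zip(labels, present) if p == value]
        if subset:
            h_cond += len(subset) / len(labels) * entropy(subset)

    info_gain = h_class - h_cond
    # Gain ratio normalizes information gain by the entropy of the feature.
    gain_ratio = info_gain / h_word if h_word > 0 else 0.0
    # Symmetrical uncertainty normalizes by the sum of both entropies.
    denom = h_class + h_word
    sym_unc = 2 * info_gain / denom if denom > 0 else 0.0
    return {
        "information_gain": info_gain,
        "gain_ratio": gain_ratio,
        "symmetrical_uncertainty": sym_unc,
    }


# Hypothetical toy corpus: each message is a set of tokens.
docs = [{"free", "offer"}, {"free", "win"}, {"meeting", "notes"}, {"lunch", "notes"}]
labels = ["spam", "spam", "ham", "ham"]

scores_free = word_scores(docs, labels, "free")
scores_offer = word_scores(docs, labels, "offer")
```

In this toy corpus, "free" separates the two classes perfectly, so it scores higher than "offer", which appears in only one spam message. Ranking words by one of these scores and keeping the top-ranked ones is one common way to select relevant words before training a classifier.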

Chunhua Zhang | Yingjie Tian | Chuanliang Chen

[1] Igor Kononenko,et al. Estimating Attributes: Analysis and Extensions of RELIEF , 1994, ECML.

[2] Liangxiao Jiang,et al. Hidden Naive Bayes , 2005, AAAI.

[3] Susan T. Dumais,et al. A Bayesian Approach to Filtering Junk E-Mail , 1998, AAAI 1998.

[4] Georgios Paliouras,et al. Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach , 2000, ArXiv.

[5] Tianshun Yao,et al. An evaluation of statistical spam filtering techniques , 2004, TALIP.

[6] Joshua Alspector,et al. SVM-based Filtering of E-mail Spam with Content-specific Misclassification Costs , 2001 .

[7] Georgios Paliouras,et al. Learning to Filter Unsolicited Commercial E-Mail , 2006 .

[8] Georgios Paliouras,et al. A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists , 2004, Information Retrieval.

[9] Geoffrey I. Webb,et al. Not So Naive Bayes: Aggregating One-Dependence Estimators , 2005, Machine Learning.

[10] Bernhard Pfahringer,et al. Locally Weighted Naive Bayes , 2002, UAI.

[11] Lluís Màrquez i Villodre,et al. Boosting Trees for Anti-Spam Email Filtering , 2001, ArXiv.

[12] Karl-Michael Schneider,et al. A Comparison of Event Models for Naive Bayes Anti-Spam E-Mail Filtering , 2003, EACL.

[13] Harris Drucker,et al. Support vector machines for spam categorization , 1999, IEEE Trans. Neural Networks.