This article gives an overview of some of the most popular machine learning methods (Bayesian classification, k-NN, ANNs, SVMs) and of their applicability to the problem of spam filtering. It presents brief descriptions of the algorithms, intended to be understandable to a reader with no prior familiarity with them. The author wrote a deliberately simple sample implementation of each technique and compares their performance on the PU1 spam corpus. Finally, some ideas are given on how to construct a practically useful spam filter using the discussed techniques. The article grew out of the author's first attempt at applying machine learning techniques in practice, and may therefore be of interest primarily to those getting acquainted with machine learning.