Paper information - Active learning with simplified SVMs for spam categorization

Active learning with simplified SVMs for spam categorization

We propose a method for spam categorization based on support vector machines (SVMs) with an active learning strategy, and we study the use of SVMs in classifying e-mail as spam or nonspam. Standard algorithms for training SVMs generally produce solutions with more support vectors than are strictly necessary, so the paper applies an algorithm that recognizes and eliminates the unnecessary support vectors. We analyze the particular properties of this task and identify why SVMs, especially the simplified SVMs, are well suited to dealing with spam. Instead of using a randomly selected training set, the learner has access to a pool of unlabeled instances and can request the labels for some number of them. We introduce a new method for choosing which instances to request next.
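The pool-based setting described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the paper's new selection method works, so the code below substitutes a different, commonly used heuristic: request the label of the pool instance closest to the current separating hyperplane. Everything in it is an assumption made for illustration: the two-feature toy data, the names `make_pool`, `oracle`, `train_linear_svm`, `most_uncertain`, and `active_learn`, and the plain subgradient trainer that stands in for a real SVM solver. The support-vector simplification step is not shown.

```python
import random


def decision(w, b, x):
    """Signed score of x under the linear decision function w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, epochs=100, lr=0.05, lam=0.01):
    """Fit a linear SVM by subgradient descent on the regularized hinge loss.

    Illustrative stand-in for a proper SVM solver (assumption).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:
                # Margin violated: move toward the example and shrink w.
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Margin satisfied: only the regularizer contributes.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def most_uncertain(w, b, pool, candidates):
    """Index of the candidate whose score is closest to zero."""
    return min(candidates, key=lambda i: abs(decision(w, b, pool[i])))


def active_learn(pool, oracle, seed_idx, budget):
    """Request up to `budget` labels from `oracle`, one query per round."""
    labeled = {i: oracle(pool[i]) for i in seed_idx}
    for _ in range(budget):
        candidates = [i for i in range(len(pool)) if i not in labeled]
        if not candidates:
            break
        idx = sorted(labeled)
        w, b = train_linear_svm([pool[i] for i in idx], [labeled[i] for i in idx])
        query = most_uncertain(w, b, pool, candidates)
        labeled[query] = oracle(pool[query])
    idx = sorted(labeled)
    w, b = train_linear_svm([pool[i] for i in idx], [labeled[i] for i in idx])
    return w, b, labeled


def make_pool(n, seed=0):
    """Toy pool: [spam-word count, ham-word count]; first two points seed both classes."""
    rng = random.Random(seed)
    pool = [[5.0, 0.0], [0.0, 5.0]]
    pool += [[rng.uniform(0, 5), rng.uniform(0, 5)] for _ in range(n)]
    return pool


def oracle(x):
    """Hypothetical labeler: +1 (spam) if spam words outnumber ham words, else -1."""
    return 1 if x[0] > x[1] else -1


if __name__ == "__main__":
    pool = make_pool(30)
    w, b, labeled = active_learn(pool, oracle, seed_idx=[0, 1], budget=5)
    print("labels requested:", len(labeled))
```

The seed set holds one instance of each class so that the first classifier is defined. Each round retrains on all labels gathered so far, which is affordable here only because the toy pool is tiny.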

Hou-Kuan Huang | Kai Li | Kun-Lun Li | Sheng-Feng Tian

[1] Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.

[2] Daphne Koller, et al. Support Vector Machine Active Learning with Applications to Text Classification, 2000, J. Mach. Learn. Res.

[3] Gunnar Rätsch, et al. An introduction to kernel-based learning algorithms, 2001, IEEE Trans. Neural Networks.

[4] William A. Gale, et al. A sequential algorithm for training text classifiers, 1994, SIGIR '94.

[5] Huang Houkuan, et al. An architecture of active learning SVMs for spam, 2002, 6th International Conference on Signal Processing, 2002.

[6] Christopher J. C. Burges, et al. Simplified Support Vector Decision Rules, 1996, ICML.

[7] Nello Cristianini, et al. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, 2000.

[8] Colin Campbell, et al. An introduction to kernel methods, 2001.

[9] Susan T. Dumais, et al. A Bayesian Approach to Filtering Junk E-Mail, 1998, AAAI 1998.

[10] William W. Cohen. Learning Rules that Classify E-Mail, 1996.

[11] Nello Cristianini, et al. Query Learning with Large Margin Classifiers, 2000.

[12] Harris Drucker, et al. Support vector machines for spam categorization, 1999, IEEE Trans. Neural Networks.

[13] Tom Downs, et al. Exact Simplification of Support Vector Solutions, 2002, J. Mach. Learn. Res.