Paper information - Online Discriminative Spam Filter Training

We describe a very simple technique for discriminatively training a spam filter. Our results on the TREC Enron spam corpus would have been the best for the Ham at .1% measure, and second best by the 1-ROCA measure. For the Mr. X corpus, our 1-ROCA measure was a close second best, and third best by the Ham at .1% measure. We use a very simple feature extractor (all words in the subject and body). Our learning algorithm is also very simple: gradient descent of a logistic regression model.
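The learning algorithm named in the abstract, gradient descent on a logistic regression model applied one message at a time, can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the `subj:`/`text:` feature prefixes, the learning rate of 0.1, the repeated passes over the data, and the toy messages are all assumptions chosen for illustration, as are the names `extract_features` and `OnlineLogisticFilter`.

```python
import math
import re
from collections import defaultdict


def extract_features(subject, text):
    """Return the set of binary word features for one message.

    Assumption: subject words and words from the second text field are
    kept as separate features by prefixing them.
    """
    feats = {"subj:" + w for w in re.findall(r"\w+", subject.lower())}
    feats |= {"text:" + w for w in re.findall(r"\w+", text.lower())}
    return feats


class OnlineLogisticFilter:
    """Logistic regression trained by one gradient step per message."""

    def __init__(self, learning_rate=0.1):  # rate is an assumed value
        self.learning_rate = learning_rate
        self.weights = defaultdict(float)
        self.bias = 0.0

    def predict(self, feats):
        """Estimated probability that the message is spam."""
        z = self.bias + sum(self.weights.get(f, 0.0) for f in feats)
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, feats, is_spam):
        """One gradient step on the log-likelihood of this message."""
        error = (1.0 if is_spam else 0.0) - self.predict(feats)
        step = self.learning_rate * error
        self.bias += step
        for f in feats:  # binary features: each active weight moves by step
            self.weights[f] += step


# Toy labelled stream (hypothetical data): (subject, text, is_spam).
messages = [
    ("Cheap meds now", "buy cheap pills online", True),
    ("Win money", "claim your free prize now", True),
    ("Meeting agenda", "please review the agenda for tomorrow", False),
    ("Lunch", "are you free for lunch tomorrow", False),
]

model = OnlineLogisticFilter()
for _ in range(20):  # several passes only because the toy stream is tiny
    for subject, text, is_spam in messages:
        model.update(extract_features(subject, text), is_spam)

scores = [model.predict(extract_features(s, t)) for s, t, _ in messages]
```

Each update moves the weight of every word present in the message in proportion to the prediction error, so a message the filter already classifies confidently changes the model very little.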

Joshua Goodman | Wen-tau Yih
