Active Learning Method for Chinese Spam Filtering

An active learning method is proposed for filtering Chinese spam. Training a filtering model on fully labeled data is costly and time-consuming, whereas unlabeled emails are easy to obtain. To reduce the number of emails that must be labeled, a sample selection method based on misclassification and low certainty is proposed. The relaxed online SVM (ROSVM) model is used as the online filtering model. Experimental results show that the proposed method reduces the number of training emails and the computational cost of the spam filter while also improving its accuracy.
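The selection idea in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the character-bigram features (`char_bigrams`), the stand-in online linear classifier with hinge-style updates (`OnlineLinearFilter`, which is not ROSVM), the `margin` threshold that defines "low certainty", and the simulated user feedback that reveals misclassifications. The sketch requests a label only when the current email was misclassified or scored close to the decision boundary, and trains only on those emails.

```python
def char_bigrams(text):
    """Hypothetical feature extractor: overlapping character bigrams.

    Chinese text has no whitespace between words, so bigrams avoid
    the need for a word segmenter in this illustration.
    """
    return {text[i:i + 2] for i in range(len(text) - 1)}


class OnlineLinearFilter:
    """Stand-in online linear classifier (NOT the ROSVM model)."""

    def __init__(self, lr=0.5):
        self.w = {}   # feature -> weight
        self.lr = lr  # assumed learning rate

    def score(self, feats):
        # Positive score means spam, negative means ham.
        return sum(self.w.get(f, 0.0) for f in feats)

    def update(self, feats, label):
        # Hinge-style update: adjust only if the margin is violated.
        if label * self.score(feats) < 1.0:
            for f in feats:
                self.w[f] = self.w.get(f, 0.0) + self.lr * label


def run_active_filter(stream, margin=1.0):
    """Process (text, label) pairs online; label is +1 spam, -1 ham.

    The true label is consulted for training only when the email is
    misclassified (simulating user feedback) or has low certainty.
    """
    model = OnlineLinearFilter()
    labels_used = 0
    errors = 0
    for text, true_label in stream:
        feats = char_bigrams(text)
        s = model.score(feats)
        predicted = 1 if s > 0 else -1
        misclassified = predicted != true_label
        low_certainty = abs(s) < margin
        errors += int(misclassified)
        if misclassified or low_certainty:
            labels_used += 1
            model.update(feats, true_label)
    return model, labels_used, errors


if __name__ == "__main__":
    # Toy stream: one spam-like and one ham-like message, repeated.
    stream = [("免费中奖点击领取", 1), ("明天下午开会讨论", -1)] * 10
    model, labels_used, errors = run_active_filter(stream)
    print(labels_used, "labels requested for", len(stream), "emails;", errors, "errors")
```

On a toy stream like this, most emails are filtered without requesting a label once the model is confident, which is the label-saving effect the abstract describes. The threshold `margin` trades labeling cost against how quickly the filter adapts; its value here is arbitrary.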
