Using cellular automata for improving KNN-based spam filtering

With the rapid growth of the Internet, electronic mail (e-mail) has become a popular communication tool. However, junk mail, also known as spam, has increasingly become part of daily life for users as well as for Internet service providers. Many solutions to this problem have been proposed over the last decade. Content-based anti-spam filtering is currently an important research topic, in which spam filtering is treated as a special case of binary text categorization, and many machine learning techniques have been developed and applied to classify e-mail as spam or non-spam. In this paper, we propose an enhanced K-Nearest Neighbours (KNN) method, called Cellular Automaton Combined with KNN (CA-KNN), for spam filtering. In the proposed method, a cellular automaton identifies which instances of the training set should be used to classify a new e-mail: CA-KNN selects the nearest neighbours not from the whole training set, but only from a reduced subset selected by the cellular automaton.
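The abstract does not specify the automaton's cells, neighbourhood or transition rules, so the following is only a hypothetical sketch of the general idea, assuming each training e-mail has already been encoded as a numeric feature vector. Each training instance is treated as a cell that is alive (kept) or dead (removed); at each synchronous step a cell survives only if the majority label among its k nearest alive neighbours matches its own. This is an iterated, Wilson-style editing rule swapped in for the authors' unspecified one. KNN then searches only the surviving cells. The names `ca_select` and `knn_classify`, the parameters `k` and `max_steps`, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# An instance is (feature_vector, label); labels here are "spam" or "ham".
Instance = Tuple[Sequence[float], str]


def k_nearest(x: Sequence[float], pool: List[Instance], k: int) -> List[Instance]:
    """Return the k instances in `pool` closest to x by Euclidean distance."""
    return sorted(pool, key=lambda inst: math.dist(x, inst[0]))[:k]


def majority_label(neighbours: List[Instance]) -> str:
    """Most frequent label among the neighbours (ties go to the first one seen)."""
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def ca_select(train: List[Instance], k: int = 3, max_steps: int = 10) -> List[Instance]:
    """Hypothetical CA-style instance selection (not the paper's exact rule).

    Each training instance is a cell that is alive (kept) or dead (removed).
    At every synchronous step, an alive cell survives only if the majority
    label of its k nearest alive neighbours equals its own label.
    Dead cells stay dead. Stop when no cell changes or after max_steps.
    """
    alive = [True] * len(train)
    for _ in range(max_steps):
        new_alive = alive[:]
        for i, (x, label) in enumerate(train):
            if not alive[i]:
                continue
            others = [train[j] for j in range(len(train)) if alive[j] and j != i]
            if others and majority_label(k_nearest(x, others, k)) != label:
                new_alive[i] = False
        if new_alive == alive:  # fixed point reached: no cell changed state
            break
        alive = new_alive
    return [inst for inst, keep in zip(train, alive) if keep]


def knn_classify(x: Sequence[float], pool: List[Instance], k: int) -> str:
    """Plain KNN: majority vote over the k nearest instances in `pool`."""
    return majority_label(k_nearest(x, pool, k))


# Toy 2-D data (illustrative only): a ham cluster near (0, 0), a spam
# cluster near (1, 1), and one spam-labelled point inside the ham cluster.
train: List[Instance] = [
    ((0.0, 0.0), "ham"), ((0.1, 0.0), "ham"),
    ((0.0, 0.1), "ham"), ((0.1, 0.1), "ham"),
    ((1.0, 1.0), "spam"), ((0.9, 1.0), "spam"),
    ((1.0, 0.9), "spam"), ((0.9, 0.9), "spam"),
    ((0.05, 0.05), "spam"),  # noisy label inside the ham cluster
]
subset = ca_select(train, k=3) or train  # fall back to full set if all cells died
query = (0.06, 0.04)
print(len(subset))                       # 8: the noisy point was removed
print(knn_classify(query, train, k=1))   # spam: misled by the noisy point
print(knn_classify(query, subset, k=1))  # ham: neighbour taken from the reduced subset
```

On this toy data the automaton removes the spam-labelled point sitting inside the ham cluster, so a 1-NN query near it is classified as ham on the reduced subset but as spam on the full training set. A smaller subset also means fewer distance computations per query.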
