Relaxing Feature Selection in Spam Filtering by Using Case-Based Reasoning Systems

This paper compares two alternative strategies for feature selection in SPAMHUNTING, a well-known case-based reasoning spam filtering system. The first uses the k most predictive features. The second is a percentage-based strategy that exploits our amount-of-information measure. Finally, we confirm that the percentage-based feature selection method is better suited to the spam filtering domain.
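The difference between the two strategies can be illustrated with a minimal sketch. It makes several assumptions. Each term of a message already carries a non-negative relevance score, which stands in for the amount-of-information measure; that measure's exact formula is not given here. The percentage-based strategy is assumed to keep the highest-ranked terms until their cumulative score reaches a fixed fraction of the message's total score. The function names and example scores are hypothetical.

```python
from typing import Dict, List


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Fixed-size strategy: keep the k highest-scoring terms of a message."""
    # sorted() is stable, so tied terms keep their original dictionary order.
    ranked = sorted(scores, key=lambda t: scores[t], reverse=True)
    return ranked[:k]


def select_by_percentage(scores: Dict[str, float], pct: float) -> List[str]:
    """Variable-size strategy (assumed form): keep top-ranked terms until their
    cumulative score reaches the fraction pct of the message's total score."""
    if not 0.0 < pct <= 1.0:
        raise ValueError("pct must be in (0, 1]")
    ranked = sorted(scores, key=lambda t: scores[t], reverse=True)
    threshold = pct * sum(scores.values())
    selected: List[str] = []
    accumulated = 0.0
    for term in ranked:
        if accumulated >= threshold:
            break  # enough of the message's information is already covered
        selected.append(term)
        accumulated += scores[term]
    return selected


# Hypothetical per-term scores for two messages.
concentrated = {"a": 8.0, "b": 1.0, "c": 1.0}        # one dominant term
flat = {"w": 1.0, "x": 1.0, "y": 1.0, "z": 1.0}      # evenly spread scores

print(select_top_k(concentrated, 2))            # ['a', 'b']
print(select_by_percentage(concentrated, 0.5))  # ['a']
print(select_by_percentage(flat, 0.5))          # ['w', 'x']
```

With `pct=0.5`, the percentage rule adapts the number of selected terms to each message. It keeps one term for the concentrated message and two for the flat one. By contrast, `select_top_k` always keeps exactly k terms, regardless of how the scores are distributed.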
