Feature selection and similarity coefficient based method for email spam filtering

Many real-world threats can be traced to people's activities on the Internet, and spam is one of the most pressing online security problems. Spam filters try to identify likely spam either manually or automatically. Most datasets used in spam filtering research are large and contain irrelevant and/or redundant features. This redundant information reduces the accuracy and detection rate of many detection and filtering methods. In this study, a statistical feature selection approach is combined with similarity coefficients to improve the accuracy and detection rate of spam detection and filtering. Results on email spam datasets show that the proposed approach improves the detection rate, false alarm rate, and accuracy.
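The abstract does not name the statistic or the similarity coefficient used, so the following is only a minimal sketch of the general idea under stated assumptions: binary word-presence features, a chi-square score as the statistical feature selector, and the Tanimoto (Jaccard) coefficient with a nearest-neighbour rule as the similarity-based classifier. All function names and the toy data are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def chi_square_scores(X: Sequence[Sequence[int]], y: Sequence[int]) -> List[float]:
    """Chi-square score of each binary feature against a binary label (1 = spam)."""
    n = len(X)
    scores = []
    for j in range(len(X[0])):
        # 2x2 contingency table: feature present/absent vs. spam/ham.
        a = sum(1 for row, lab in zip(X, y) if row[j] == 1 and lab == 1)
        b = sum(1 for row, lab in zip(X, y) if row[j] == 1 and lab == 0)
        c = sum(1 for row, lab in zip(X, y) if row[j] == 0 and lab == 1)
        d = sum(1 for row, lab in zip(X, y) if row[j] == 0 and lab == 0)
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A constant feature gives a zero denominator; score it as uninformative.
        scores.append(n * (a * d - b * c) ** 2 / denom if denom else 0.0)
    return scores


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features, returned in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    return sorted(ranked[:k])


def tanimoto(u: Sequence[int], v: Sequence[int]) -> float:
    """Tanimoto (Jaccard) coefficient between two binary vectors."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return both / either if either else 0.0


def classify(x, X_train, y_train, keep: Sequence[int]) -> int:
    """Label of the training email most similar to x on the selected features."""
    def project(row):
        return [row[j] for j in keep]

    query = project(x)
    best = max(range(len(X_train)),
               key=lambda i: tanimoto(query, project(X_train[i])))
    return y_train[best]


# Toy data (assumed): columns are presence of "free", "winner", "meeting", "the".
X = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = ham

keep = select_top_k(chi_square_scores(X, y), k=2)
print(keep)                                   # selected feature indices
print(classify([1, 1, 0, 1], X, y, keep))     # predicted label for a new email
```

In this toy set the constant column ("the") scores zero and is dropped, which illustrates how removing redundant features before computing similarity keeps uninformative words from inflating the coefficient.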
