An E-mail Filtering Approach Using Classification Techniques

E-mail is one of the most popular means of communication because of its accessibility, low sending cost, and fast message transfer. However, spam e-mail has become a severe problem for this Internet application. Filtering is an important way to isolate spam. This paper proposes an approach to spam filtering based on classification techniques. The approach analyses the body of each e-mail message and assigns weights to terms (features) that help distinguish spam from clean (ham) e-mails. An adaptation is also proposed to reduce the dimensionality of the extracted features: only meaningful terms, determined by consulting a dictionary, are retained. A comparative study of several classification algorithms was conducted, and its results support the efficiency of the proposed filtering approach. The approach was evaluated on the Enron dataset.
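The pipeline described above (extract terms from the message body, keep only dictionary terms, weight them, classify) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name a weighting scheme or a specific classifier, so the sketch assumes a multinomial Naive Bayes classifier whose smoothed per-class log-probabilities act as the term weights. The `DICTIONARY` set, the `TRAIN` messages, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a real dictionary lookup (assumption, not from the paper).
DICTIONARY = {"free", "money", "offer", "win", "meeting", "report", "project", "schedule"}

def tokenize(body):
    """Lowercase the body and keep only terms found in the dictionary."""
    return [t for t in re.findall(r"[a-z]+", body.lower()) if t in DICTIONARY]

def train(messages):
    """Count dictionary terms per class from (body, label) pairs."""
    term_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for body, label in messages:
        class_counts[label] += 1
        term_counts[label].update(tokenize(body))
    return term_counts, class_counts

def classify(body, model):
    """Return the label with the higher Naive Bayes log-score.

    Assumes both classes appear at least once in the training data.
    """
    term_counts, class_counts = model
    vocab = set(term_counts["spam"]) | set(term_counts["ham"])
    total_docs = sum(class_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_terms = sum(term_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # class prior
        for term in tokenize(body):
            # Term weight: Laplace-smoothed log-probability of the term in this class.
            score += math.log((term_counts[label][term] + 1) / (total_terms + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data, invented for illustration only.
TRAIN = [
    ("Win free money now, special offer", "spam"),
    ("Free offer: win money", "spam"),
    ("Project meeting schedule attached", "ham"),
    ("Please review the project report before the meeting", "ham"),
]
model = train(TRAIN)
```

Calling `classify("free money offer", model)` on this toy data selects the spam class, because those terms occur only in the spam examples. Restricting tokens to `DICTIONARY` is what keeps the feature space small: misspellings, obfuscated tokens, and other strings that are not dictionary words never become features.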
