Email classification using data reduction method

Correctly classifying user email in the presence of spam is an important problem for anti-spam researchers. This paper presents an effective and efficient email classification technique based on a data filtering method. We introduce a filtering technique that uses an instance selection method (ISM) to remove uninformative instances from the training data before the model is built, and then classify the test data. The objective of ISM is to identify which instances (examples, patterns) in an email corpus should be selected as representatives of the entire dataset, without significant loss of information. We use the WEKA interface in our integrated classification model and test a range of classification algorithms. Our empirical studies show strong performance in terms of classification accuracy, together with a reduction in false positives.
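To make the idea of instance selection concrete, the following is a minimal sketch only. The abstract does not say which selection algorithm is used, so this example substitutes Hart's condensed nearest neighbour rule, a classic instance selection technique, as a stand-in. The function names, the two-dimensional feature vectors, and the labels are all hypothetical, and the sketch is written in plain Python rather than against the WEKA interface.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Instance = Tuple[Vector, str]  # (feature vector, class label)


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_label(point: Vector, store: List[Instance]) -> str:
    """Label of the stored instance closest to `point` (1-nearest-neighbour)."""
    return min(store, key=lambda item: squared_distance(point, item[0]))[1]


def condensed_nearest_neighbor(data: List[Instance]) -> List[Instance]:
    """Select a subset of `data` that still classifies every training
    instance correctly with a 1-nearest-neighbour rule (Hart's CNN).
    This is an illustrative stand-in, not the paper's ISM."""
    if not data:
        return []
    store = [data[0]]
    changed = True
    while changed:
        changed = False
        for vector, label in data:
            if (vector, label) in store:
                continue  # already selected; also guards against duplicates
            if nearest_label(vector, store) != label:
                # The current subset misclassifies this instance, so keep it.
                store.append((vector, label))
                changed = True
    return store


# Made-up toy features, e.g. (suspicious-word score, link score).
data: List[Instance] = [
    ((0.0, 0.0), "ham"),
    ((0.1, 0.2), "ham"),
    ((0.2, 0.1), "ham"),
    ((5.0, 5.0), "spam"),
    ((5.1, 4.9), "spam"),
    ((4.9, 5.2), "spam"),
]

reduced = condensed_nearest_neighbor(data)
print(len(reduced), "of", len(data), "instances kept")
```

On this toy data the two well-separated clusters are each represented by a single instance, which illustrates the stated goal of keeping representatives of the dataset without losing the information needed to classify it.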
