Filtering spam messages and mails using fuzzy C means algorithm

Advances in computer technology have changed the world in many ways, and with the internet, communication is just a click away. Email plays a very important role in effective, low-cost, and fast communication between people, so email services are a daily necessity for users. Everything from transactions to business and general communication is carried out by email. This communication is often affected by attacks on the email system, including spam. Spamming is the use of electronic messaging systems to send large volumes of unsolicited messages. Spam fills the internet with multiple copies of a message, which are sent repeatedly to different recipients who did not request them and which urge the recipients to open them. In this paper we analyze the performance of different machine learning techniques, with and without feature selection, to identify the best classifier for spam mail classification. First, we apply each classifier to the dataset without selecting any features and examine the outcome. Next, we apply the best-first feature selection algorithm to select the desired features and then apply the same classification algorithms. We found that accuracy improved when feature selection was applied in the experiments.
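
The two-stage comparison described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy rows are invented (they are not the paper's dataset), a nearest-centroid classifier stands in for the unnamed classifiers, and greedy forward selection is used as a simplified stand-in for best-first search, which additionally keeps a queue of candidate subsets and can backtrack. Selection and scoring share the same held-out rows here, which gives an optimistic estimate.

```python
# Hypothetical toy data: each row is ([spam_word_count, length_in_words], label),
# with label 1 = spam and 0 = legitimate mail. Invented for illustration only.
TRAIN = [([5, 10], 1), ([6, 90], 1), ([4, 50], 1),
         ([0, 20], 0), ([1, 80], 0), ([0, 95], 0)]
TEST = [([5, 90], 1), ([0, 40], 0), ([6, 55], 1), ([1, 70], 0)]


def project(row, features):
    """Keep only the columns listed in `features`."""
    return [row[j] for j in features]


def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def accuracy(train, test, features):
    """Nearest-centroid accuracy on `test` using only the given feature indices."""
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append(project(x, features))
    centroids = {y: centroid(rows) for y, rows in by_label.items()}

    correct = 0
    for x, y in test:
        p = project(x, features)
        # Predict the label whose centroid has the smallest squared distance.
        pred = min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        correct += pred == y
    return correct / len(test)


def forward_select(train, valid, n_features):
    """Greedy forward selection: add the feature that most improves accuracy,
    and stop as soon as no remaining feature improves it."""
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        choice = None
        for j in range(n_features):
            if j in selected:
                continue
            score = accuracy(train, valid, selected + [j])
            if score > best:
                best, choice, improved = score, j, True
        if improved:
            selected.append(choice)
    return selected, best


baseline = accuracy(TRAIN, TEST, [0, 1])          # stage 1: all features
selected, with_selection = forward_select(TRAIN, TEST, 2)  # stage 2: selected features
print("all features:", baseline)
print("selected features:", selected, with_selection)
```

On this toy data the large-scale length feature dominates the distance computation and hurts the classifier, so dropping it helps. This mirrors the direction of the reported result but says nothing about its size on real mail.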
