Latent Semantic Indexing Based SVM Model for Email Spam Classification

The Internet plays a major role in communication today, but spam remains a major problem for e-mail. Email spam, also known as junk email, is unwanted or inappropriate mail. To eliminate spam, filtering methods are implemented using classification algorithms. Among these algorithms, the Support Vector Machine (SVM) has been used by various researchers as an effective classifier for spam. However, its accuracy still leaves room for improvement. To improve the accuracy, Latent Semantic Indexing (LSI) is used as a feature extraction method to select a suitable feature space. This hybrid LSI-SVM model for spam classification can provide effective results. The Ling-Spam email corpus is used as the dataset for the experiments. The performance of the system is evaluated using recall, precision, and overall accuracy.
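The three evaluation measures named above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's evaluation code: the function name `evaluate`, the label strings `"spam"` and `"legit"`, and the toy label lists are all assumptions made for the example, and spam is treated as the positive class.

```python
def evaluate(y_true, y_pred, positive="spam"):
    """Return (precision, recall, accuracy), treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Hypothetical labels, invented for illustration (not results from the paper).
y_true = ["spam", "spam", "spam", "legit", "legit", "legit", "legit", "legit"]
y_pred = ["spam", "spam", "legit", "spam", "legit", "legit", "legit", "legit"]

precision, recall, accuracy = evaluate(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.3f}")
```

In a spam filter, precision reflects how many flagged messages were actually spam, while recall reflects how much of the spam was caught; overall accuracy counts correct decisions across both classes.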
