Spam e-mail classification for the Internet of Things environment using semantic similarity approach

Unauthorized messages that advertise a service or product and are sent by electronic mail are called spam e-mails. Detecting spam e-mail remains a challenging task. Existing countermeasures based on statistical keywords, concepts, and IP address blacklists are inefficient because they have difficulty identifying new attack patterns generated by Internet of Things botnet devices. Other spam detection approaches combine conceptual knowledge engineering with machine learning techniques. However, modern spammers evade these hybrid techniques by exploiting word polysemy and ambiguity, which arise from the context-sensitive nature of words. This paper proposes integrating Naïve Bayesian classification with conceptual and semantic similarity techniques to combat the ambiguity that polysemy introduces in spam detection. To analyse the effectiveness of the approach, experiments were conducted on the benchmark data sets Spambase, PU1, Enron corpus, and Ling-spam. The experimental results show that the proposed system achieves an accuracy of 98.89%, which is higher than that of the existing approaches.
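The general idea of pairing a Naïve Bayesian classifier with a conceptual layer can be illustrated with a minimal sketch. This is not the authors' system: the `CONCEPTS` table, the `to_concepts` helper, the `ConceptNaiveBayes` class, and the four training messages are all hypothetical, and the hand-written table stands in for a real semantic resource. The sketch only collapses related surface words into one concept before counting; it does not perform the context-dependent sense disambiguation that polysemy requires.

```python
import math
from collections import Counter, defaultdict

# Hypothetical word-to-concept table (an assumption for illustration only).
CONCEPTS = {
    "cheap": "low_price", "discount": "low_price", "bargain": "low_price",
    "pills": "medication", "meds": "medication", "drugs": "medication",
    "meeting": "appointment", "call": "appointment",
}


def to_concepts(text):
    """Lowercase, split on whitespace, and replace known words by their concept."""
    return [CONCEPTS.get(tok, tok) for tok in text.lower().split()]


class ConceptNaiveBayes:
    """Multinomial Naive Bayes over concept tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(to_concepts(text))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            n_tokens = sum(self.token_counts[label].values())
            for tok in to_concepts(text):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.token_counts[label][tok]
                score += math.log((count + 1) / (n_tokens + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_texts = [
    "cheap pills online",
    "discount drugs today",
    "project meeting tomorrow",
    "call about the project",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = ConceptNaiveBayes().fit(train_texts, train_labels)
```

In this toy setup, `model.predict("bargain meds")` is classified as spam even though neither word occurs in the training messages, because both map to concepts that the spam class has already seen. A keyword-only classifier would have no evidence for either word.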
