A new semantic attribute deep learning with a linguistic attribute hierarchy for spam detection

The massive increase in spam poses a serious threat to email and SMS, which have become important means of communication. Spam not only annoys users but also presents a security threat. Machine learning techniques have been widely used for spam detection. In this paper, we propose another form of deep learning for spam detection, a linguistic attribute hierarchy embedded with linguistic decision trees, and examine the effect of semantic attributes, as represented by the linguistic attribute hierarchy, on spam detection. A case study on the SMS message database from the UCI machine learning repository shows that a linguistic attribute hierarchy embedded with linguistic decision trees provides a transparent approach to in-depth analysis of attribute impact on spam detection. This approach can not only efficiently tackle the ‘curse of dimensionality’ in spam detection with a large number of attributes, but also improve spam detection performance when the semantic attributes are organised into a proper hierarchy.
