How Spam Features Change in Twitter and the Impact to Machine Learning Based Detection

Twitter spam is a critical problem, and current solutions rely mainly on machine learning based detection. However, recent studies have found that spam features change continuously, day by day (the ‘Spam Drift’ problem), which may significantly degrade detection performance. In this paper, we carried out a real-data driven study to explore the ‘Spam Drift’ problem and its impact on machine learning based detection. Our study found that only a small group of spam features changes continuously. The results also suggest a counter-intuitive conclusion: the ‘Spam Drift’ problem does not seriously affect spam detection Precision (SP) or non-spam detection Recall (NR), two metrics that industry prioritises in practice.
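The two metrics named above can be computed from a binary confusion matrix. The sketch below is illustrative only and assumes a label encoding of 1 = spam and 0 = non-spam. The function name `spam_precision_and_nonspam_recall` is hypothetical and does not come from the paper. SP is the fraction of tweets flagged as spam that really are spam, TP / (TP + FP), and NR is the fraction of genuine non-spam tweets left unflagged, TN / (TN + FP).

```python
from typing import Sequence, Tuple


def spam_precision_and_nonspam_recall(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (SP, NR), assuming label 1 = spam and 0 = non-spam."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the confusion-matrix cells that SP and NR depend on.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # spam caught
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # non-spam wrongly flagged
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # non-spam correctly passed

    # Spam detection Precision: of everything flagged as spam, how much is spam.
    sp = tp / (tp + fp) if tp + fp else 0.0
    # Non-spam detection Recall: of all real non-spam, how much was left unflagged.
    nr = tn / (tn + fp) if tn + fp else 0.0
    return sp, nr


if __name__ == "__main__":
    # Toy labels (made up for illustration): 3 spam tweets, 5 non-spam tweets.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 0, 0, 0]
    print(spam_precision_and_nonspam_recall(y_true, y_pred))  # (0.5, 0.8)
```

Neither formula contains false negatives (spam that slips through undetected). In this toy example, two of the three spam tweets are missed, but SP and NR are still 0.5 and 0.8. Both metrics mainly penalise false positives, meaning legitimate tweets that are wrongly flagged.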
