Asymmetric self-learning for tackling Twitter Spam Drift

Spam has become a critical problem on Twitter. In order to stop spammers, security companies apply blacklisting services to filter spam links. However, over 90% victims will visit a new malicious link before it is blocked by blacklists. To eliminate the limitation of blacklists, researchers have proposed a number of statistical features based mechanisms, and applied machine learning techniques to detect Twitter spam. In our labelled large dataset, we observe that the statistical properties of spam tweets vary over time, and thus the performance of existing ML based classifiers are poor. This phenomenon is referred as “Twitter Spam Drift”. In order to tackle this problem, we carry out deep analysis of 1 million spam tweets and 1 million non-spam tweets, and propose an asymmetric self-learning (ASL) approach. The proposed ASL can discover new information of changed tweeter spam and incorporate it into classifier training process. A number of experiments are performed to evaluate the ASL approach. The results show that the ASL approach can be used to significantly improve the spam detection accuracy of using traditional ML algorithms.

[1]  Kyumin Lee,et al.  Uncovering social spammers: social honeypots + machine learning , 2010, SIGIR.

[2]  Vern Paxson,et al.  @spam: the underground on 140 characters or less , 2010, CCS '10.

[3]  Alex Hai Wang,et al.  Don't follow me: Spam detection in Twitter , 2010, 2010 International Conference on Security and Cryptography (SECRYPT).

[4]  Gianluca Stringhini,et al.  Detecting spammers on social networks , 2010, ACSAC '10.

[5]  Dawn Xiaodong Song,et al.  Suspended accounts in retrospect: an analysis of twitter spam , 2011, IMC '11.

[6]  Dawn Xiaodong Song,et al.  Design and Evaluation of a Real-Time URL Spam Filtering Service , 2011, 2011 IEEE Symposium on Security and Privacy.

[7]  Jong Kim,et al.  Spam Filtering in Twitter Using Sender-Receiver Relationship , 2011, RAID.

[8]  Alok N. Choudhary,et al.  Towards Online Spam Filtering in Social Networks , 2012, NDSS.

[9]  Chao Yang,et al.  Empirical Evaluation and New Design for Fighting Evolving Twitter Spammers , 2013, IEEE Trans. Inf. Forensics Secur..

[10]  Jong Kim,et al.  WarningBird: A Near Real-Time Detection System for Suspicious URLs in Twitter Stream , 2013, IEEE Transactions on Dependable and Secure Computing.

[11]  Gianluca Stringhini,et al.  COMPA: Detecting Compromised Accounts on Social Networks , 2013, NDSS.

[12]  Christopher Ke,et al.  AN IN-DEPTH ANALYSIS OF ABUSE ON TWITTER , 2014 .

[13]  Xiao Chen,et al.  6 million spam tweets: A large ground truth for timely Twitter spam detection , 2015, 2015 IEEE International Conference on Communications (ICC).

[14]  Man Ho Au,et al.  Security and privacy in big data , 2016, Concurr. Comput. Pract. Exp..