Detecting spamming activities in Twitter based on deep‐learning technique

Twitter spam has long been a critical but difficult problem to address. So far, researchers have developed a series of machine learning–based methods and blacklisting techniques to detect spamming activities on Twitter. According to our investigation, current methods and techniques achieve an accuracy of around 87%. However, because of spam drift and information fabrication, these machine learning–based methods cannot efficiently detect spam activities in real‐life scenarios. Meanwhile, blacklisting cannot keep up with the variations of spamming activities, as manually inspecting suspicious URLs is extremely time‐consuming. In this paper, we propose a novel deep learning–based technique to address the above challenges. The syntax of each tweet is learned through WordVector and trained by deep learning. We then construct a binary classifier to differentiate spam from regular tweets. In experiments, we collected and labeled a 10‐day real tweet dataset as ground truth to evaluate our proposed method. We first conducted an empirical analysis with a series of comparisons: (1) the performance of different classifiers, (2) other existing text‐based methods, and (3) nontext‐based detection techniques. According to the experimental results, our proposed method largely outperformed previous methods. We further conducted principal component analysis on typical methods to theoretically justify the better performance of our method. We extracted each method's features via dimensionality reduction and found that our features were the most distinct among all the detection methods, which supports the better performance of our method.
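The pipeline described above (word vectors for each tweet, followed by a binary classifier) can be illustrated with a deliberately small sketch. It is not the authors' implementation. The word vectors in `TOY_VECTORS` are hypothetical hand-picked values rather than learned embeddings, the example tweets are invented, and a single-layer logistic classifier stands in for the deep network. The names `embed`, `train`, and `predict` are assumptions made for illustration only.

```python
import math

# Hypothetical 3-dimensional word vectors (assumption: hand-picked for the
# sketch; in practice they would be learned from a large tweet corpus).
TOY_VECTORS = {
    "free": [0.9, 0.1, 0.0],
    "click": [0.8, 0.2, 0.1],
    "win": [0.9, 0.0, 0.2],
    "prize": [0.85, 0.1, 0.1],
    "lunch": [0.1, 0.9, 0.2],
    "friends": [0.0, 0.8, 0.3],
    "movie": [0.1, 0.7, 0.4],
    "tonight": [0.2, 0.8, 0.1],
}
DIM = 3


def embed(tweet):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [TOY_VECTORS[w] for w in tweet.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(x, weights, bias):
    """Estimated probability that embedding x belongs to a spam tweet."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(samples, labels, epochs=500, lr=0.5):
    """Fit a logistic classifier with plain stochastic gradient descent."""
    weights, bias = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = score(x, weights, bias) - y  # gradient of log loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(tweet, weights, bias):
    """Return True when the tweet is classified as spam."""
    return score(embed(tweet), weights, bias) >= 0.5


# Invented training tweets: label 1 = spam, 0 = regular.
train_tweets = [
    ("win free prize", 1),
    ("click free", 1),
    ("click win prize", 1),
    ("lunch friends tonight", 0),
    ("movie tonight", 0),
    ("friends movie", 0),
]
samples = [embed(text) for text, _ in train_tweets]
labels = [label for _, label in train_tweets]
weights, bias = train(samples, labels)

print(predict("free prize click", weights, bias))
print(predict("lunch movie friends", weights, bias))
```

Averaging word vectors is only one simple way to turn a tweet into a fixed-length feature vector. It is used here because it keeps the sketch free of external dependencies.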
