Social Spammer Detection in Microblogging

The availability of microblogging services such as Twitter and Sina Weibo makes them popular platforms for spammers to unfairly overpower normal users with unwanted content via social networks, a practice known as social spamming. The rise of social spamming can significantly hinder the use of microblogging systems for effective information dissemination and sharing. Distinct features of microblogging systems present new challenges for social spammer detection. First, unlike traditional social networks, microblogging allows connections to be established between two parties without mutual consent, which makes it easier for spammers to imitate normal users by quickly accumulating a large number of "human" friends. Second, microblogging messages are short, noisy, and unstructured. As a result, traditional social spammer detection methods are not directly applicable to microblogging. In this paper, we investigate how to use network and content information jointly to perform effective social spammer detection in microblogging. In particular, we present an optimization formulation that models the social network and content information in a unified framework. Experiments on a real-world Twitter dataset demonstrate that our proposed method can effectively utilize both kinds of information for social spammer detection.
