Detecting Marionette Microblog Users for Improved Information Credibility

In this paper, we mine a special group of microblog users: the "marionette" users, who are created or employed by backstage "puppeteers", either through programs or manually. Unlike normal users, who access microblogs for information sharing or social communication, marionette users perform specific tasks for financial profit. For example, they follow certain users to increase those users' "statistical popularity", or retweet certain tweets to amplify their "statistical impact". The fabricated follower and retweet counts not only mislead normal users with wrong information, but also seriously impair microblog-based applications such as popular tweet selection and expert finding. We therefore study the important problem of detecting marionette users on microblog platforms. This problem is challenging because puppeteers employ complicated strategies to generate marionette users that behave similarly to normal ones. To tackle this challenge, we propose to take into account two types of discriminative information: (1) individual user tweeting behaviors and (2) the social interactions among users. By integrating both types of information into a semi-supervised probabilistic model, we can effectively distinguish marionette users from normal ones. Applying the proposed model to one of the most popular microblog platforms in China (Sina Weibo), we find that it detects marionette users with an F-measure close to 0.9. In addition, we propose an application that measures the credibility of retweet counts.
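For readers unfamiliar with the reported metric, the following minimal sketch shows how an F-measure is computed from detection counts. The function name `f_measure`, the `beta` parameter, and the counts in the usage line are illustrative assumptions, not values or code from the paper; the abstract states only that the F-measure is close to 0.9.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure for a binary detector (hypothetical helper, not from the paper).

    tp: marionette users correctly flagged
    fp: normal users wrongly flagged as marionettes
    fn: marionette users the detector missed
    """
    # Precision: fraction of flagged users that really are marionettes.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of true marionettes that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall (beta = 1 gives F1).
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up counts chosen only to illustrate the arithmetic.
score = f_measure(tp=90, fp=10, fn=10)
print(round(score, 3))
```

With these made-up counts, precision and recall are both 0.9, so their harmonic mean is also 0.9. Because the F-measure is a harmonic mean, a score near 0.9 cannot be reached by a detector that is strong on only one of precision or recall.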
