Fame for sale: Efficient detection of fake Twitter followers

Fake followers are Twitter accounts created specifically to inflate the number of followers of a target account. They are dangerous for the social platform and beyond, since they may alter concepts like popularity and influence in the Twittersphere, hence impacting the economy, politics, and society. In this paper, we contribute along several dimensions. First, we review some of the most relevant existing features and rules (proposed by Academia and Media) for detecting anomalous Twitter accounts. Second, we create a baseline dataset of verified human and fake follower accounts, which is publicly available to the scientific community. Third, we exploit the baseline dataset to train a set of machine-learning classifiers built over the reviewed rules and features. Our results show that most of the rules proposed by Media provide unsatisfactory performance in revealing fake followers, while features proposed in the past by Academia for spam detection provide good results. Building on the most promising features, we revise the classifiers to reduce both overfitting and the cost of gathering the data needed to compute the features. The final result is a novel Class A classifier, general enough to thwart overfitting, lightweight thanks to its use of the less costly features, and still able to correctly classify more than 95% of the accounts of the original training set. Finally, we perform an information fusion-based sensitivity analysis to assess the global sensitivity of each of the features employed by the classifier. The findings reported in this paper, besides being supported by a thorough experimental methodology and interesting in their own right, also pave the way for further investigation of the novel issue of fake Twitter followers.
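To make the evaluation step concrete, the following minimal sketch shows how a single profile-based rule could be scored against a labeled baseline. It is not the paper's classifier or dataset: the `Account` fields, the rule `rule_predicts_fake`, its threshold `max_ratio`, and the six toy accounts are all hypothetical, chosen only to illustrate how a rule's predictions are compared with ground-truth labels.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical profile-level features; cheap to gather because they
    # need only the account's profile, not its timeline or social graph.
    followers: int
    friends: int
    tweets: int
    is_fake: bool  # ground-truth label from a (toy) baseline dataset


def rule_predicts_fake(acc: Account, max_ratio: float = 50.0) -> bool:
    """Illustrative rule (an assumption, not taken from the paper):
    flag accounts that follow far more users than follow them back,
    or that have never tweeted."""
    ratio = acc.friends / max(acc.followers, 1)
    return ratio >= max_ratio or acc.tweets == 0


def confusion(accounts, predict):
    """Return (tp, fp, tn, fn), treating 'fake' as the positive class."""
    tp = fp = tn = fn = 0
    for acc in accounts:
        predicted = predict(acc)
        if predicted and acc.is_fake:
            tp += 1
        elif predicted and not acc.is_fake:
            fp += 1
        elif not predicted and acc.is_fake:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up accounts for illustration only; not real data.
ACCOUNTS = [
    Account(followers=3, friends=900, tweets=0, is_fake=True),
    Account(followers=10, friends=2000, tweets=5, is_fake=True),
    Account(followers=400, friends=380, tweets=1200, is_fake=False),
    Account(followers=50, friends=60, tweets=0, is_fake=False),
    Account(followers=120, friends=300, tweets=40, is_fake=True),
    Account(followers=1500, friends=200, tweets=9000, is_fake=False),
]

if __name__ == "__main__":
    tp, fp, tn, fn = confusion(ACCOUNTS, rule_predicts_fake)
    print("accuracy:", accuracy(tp, fp, tn, fn))
    print("precision:", precision(tp, fp))
    print("recall:", recall(tp, fn))
```

On this toy set the rule misclassifies a silent human account and misses a fake account with a plausible profile, which hints at why a single hand-written rule may perform worse than a classifier trained over several features.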
