Classification of Twitter Accounts into Automated Agents and Human Users

Online social networks (OSNs) have seen a remarkable rise in the presence of surreptitious automated accounts. The massive human user base and business-supportive operating model of social networks such as Twitter facilitate the creation of automated agents. In this paper we outline a systematic methodology and train a classifier to categorise Twitter accounts into ‘automated’ and ‘human’ users. To improve classification accuracy we employ a set of novel steps. First, we divide the dataset into four popularity bands to compensate for differences in types of accounts. Second, we create a large ground truth dataset using human annotations and extract relevant features from raw tweets. To judge the accuracy of the procedure we calculate agreement among human annotators as well as agreement between the annotators and a bot detection research tool. We then apply a Random Forest classifier that achieves an accuracy close to human agreement. Finally, we perform tests to measure the efficacy of our results.
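
The abstract does not state which agreement statistic is used. As an illustration only, the sketch below assumes Cohen's kappa, a common chance-corrected measure of agreement between two annotators on categorical labels. The function name `cohens_kappa` and the example labels are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (assumed metric)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of accounts given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations for four accounts.
annotator_1 = ["bot", "bot", "human", "human"]
annotator_2 = ["bot", "human", "human", "human"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

The same function could be applied to compare a human annotator's labels with the thresholded output of a bot detection tool, provided both are expressed in the same label set.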