Sifting robotic from organic text: A natural language approach for detecting automation on Twitter

Twitter, a popular social media outlet, has evolved into a vast source of linguistic data, rich with opinion, sentiment, and discussion. As Twitter has grown in popularity, its perceived potential for exerting social influence has given rise to a diverse community of automatons, commonly referred to as bots. These inorganic and semi-organic Twitter entities range from the benevolent (e.g., weather-update bots, help-wanted-alert bots) to the malevolent (e.g., bots that spam messages, advertisements, or radical opinions). Existing detection algorithms typically leverage metadata (time between tweets, number of followers, etc.) to identify robotic accounts. Here, we present a classification scheme that uses only the natural language text of organic users to establish a criterion for identifying accounts that post automated messages. Because the classifier operates on text alone, it is flexible and may be applied to any textual data beyond the Twitter-sphere.
