Detecting Automation of Twitter Accounts: Are You a Human, Bot, or Cyborg?

Twitter is a new web application that plays the dual roles of online social networking and microblogging. Users communicate with each other by publishing text-based posts. The popularity and open structure of Twitter have attracted a large number of automated programs, known as bots, which are a double-edged sword for Twitter. Legitimate bots generate a large number of benign tweets that deliver news and update feeds, while malicious bots spread spam or malicious content. More interestingly, between human and bot there has emerged the cyborg, which refers to either a bot-assisted human or a human-assisted bot. To help human users identify who they are interacting with, this paper focuses on the classification of human, bot, and cyborg accounts on Twitter. We first conduct a set of large-scale measurements on a collection of over 500,000 accounts. We observe differences among humans, bots, and cyborgs in terms of tweeting behavior, tweet content, and account properties. Based on the measurement results, we propose a classification system that consists of four parts: 1) an entropy-based component, 2) a spam detection component, 3) an account properties component, and 4) a decision maker. The system combines features extracted from an unknown user to determine the likelihood of that user being a human, bot, or cyborg. Our experimental evaluation demonstrates the efficacy of the proposed classification system.
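The abstract does not say how the entropy-based component is computed. As a rough illustration only, the sketch below assumes it looks at the regularity of an account's posting times, and uses plain Shannon entropy over binned gaps between consecutive tweets. The function name `interval_entropy`, the 60-second bin width, and the sample timestamps are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def interval_entropy(timestamps, bin_seconds=60):
    """Shannon entropy (in bits) of the binned gaps between consecutive posts.

    Illustrative assumption, not the paper's method: very regular posting
    yields few distinct gap bins and therefore low entropy.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return 0.0
    # Group gaps into fixed-width bins so near-identical delays count together.
    bins = Counter(int(gap // bin_seconds) for gap in gaps)
    total = len(gaps)
    return sum((n / total) * math.log2(total / n) for n in bins.values())


# Hypothetical posting times, in seconds since some starting point.
bot_like = [i * 600 for i in range(9)]  # one post every 10 minutes
human_like = [0, 45, 400, 2000, 2100, 9000, 9050, 20000, 31000]

bot_score = interval_entropy(bot_like)
human_score = interval_entropy(human_like)
```

Under these assumptions, the strictly periodic account scores lower than the irregular one. Any threshold separating the two, and the way such a score would be combined with the spam detection and account properties components, would have to come from the paper itself.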
