Who is tweeting on Twitter: human, bot, or cyborg?

Twitter is a new web application that plays the dual roles of online social networking and micro-blogging. Users communicate with each other by publishing text-based posts. The popularity and open structure of Twitter have attracted a large number of automated programs, known as bots, which are a double-edged sword for Twitter. Legitimate bots generate a large number of benign tweets that deliver news and update feeds, while malicious bots spread spam or malicious content. More interestingly, between human and bot there has emerged the cyborg, which refers to either a bot-assisted human or a human-assisted bot. To help human users identify who they are interacting with, this paper focuses on the classification of human, bot, and cyborg accounts on Twitter. We first conduct a set of large-scale measurements on a collection of over 500,000 accounts. We observe differences among humans, bots, and cyborgs in terms of tweeting behavior, tweet content, and account properties. Based on the measurement results, we propose a classification system with four parts: (1) an entropy-based component, (2) a machine-learning-based component, (3) an account properties component, and (4) a decision maker. The system uses a combination of features extracted from an unknown user to determine the likelihood that the user is a human, bot, or cyborg. Our experimental evaluation demonstrates the efficacy of the proposed classification system.
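
To make the idea behind an entropy-based component concrete, the following is a minimal sketch and not the system described in the paper. It assumes that posting regularity can be approximated by the Shannon entropy of binned gaps between consecutive posts; the abstract does not specify which entropy measure the authors use. The function name `interval_entropy`, the parameter `bin_seconds`, and the sample timestamps are all hypothetical and introduced only for illustration.

```python
import math
from collections import Counter


def interval_entropy(timestamps, bin_seconds=60):
    """Shannon entropy (in bits) of the binned gaps between consecutive posts.

    Assumption for illustration only: gaps are grouped into fixed-width bins
    of `bin_seconds`, and plain Shannon entropy is computed over the bins.
    """
    if len(timestamps) < 2:
        return 0.0  # fewer than two posts gives no gap to measure
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    bins = Counter(int(gap // bin_seconds) for gap in gaps)
    total = len(gaps)
    return sum(-(count / total) * math.log2(count / total)
               for count in bins.values())


# Hypothetical posting times in seconds since an arbitrary start.
regular = [i * 3600 for i in range(10)]            # one post every hour
irregular = [0, 40, 500, 4000, 4100, 9000, 20000, 20090, 50000, 50700]

low = interval_entropy(regular)
high = interval_entropy(irregular)
```

Under this sketch, a perfectly periodic schedule puts every gap in the same bin and yields zero entropy, while irregular posting spreads the gaps across many bins and yields a higher value. A real classifier would combine such a score with content and account features rather than rely on it alone.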
