Using sentiment to detect bots on Twitter: Are humans more opinionated than bots?

In many Twitter applications, developers collect only a limited sample of tweets and a local portion of the Twitter network. Given such limited data, how can we classify Twitter users as either bots or humans? We develop a collection of network-, linguistic-, and application-oriented variables that could serve as features, and identify the specific features that distinguish well between humans and bots. In particular, by analyzing a large dataset relating to the 2014 Indian election, we show that a number of sentiment-related factors are key to identifying bots and significantly increase the Area Under the ROC Curve (AUROC). The same method may be used for other applications as well.
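To make the evaluation metric concrete, the following is a minimal sketch of how AUROC could be computed and compared for two classifiers, one without and one with sentiment-related features. It is not the authors' implementation: the function name `auroc`, the labels, and both score lists are hypothetical, made-up values chosen only to illustrate the calculation, and they do not reproduce any result from the paper.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive (bot)
    receives a higher score than a randomly chosen negative (human).
    Ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption, not from the paper): 1 = bot, 0 = human.
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical bot scores from a model that uses no sentiment features.
baseline_scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.7]

# Hypothetical bot scores after sentiment-related features are added.
with_sentiment_scores = [0.9, 0.6, 0.8, 0.5, 0.3, 0.7]

print("AUROC without sentiment features:", auroc(labels, baseline_scores))
print("AUROC with sentiment features:   ", auroc(labels, with_sentiment_scores))
```

On this toy data the second score list ranks bots above humans more often, so its AUROC is higher. The pairwise formulation is quadratic in the number of users, which is acceptable for an illustration; a rank-based computation would be preferable on a large dataset.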
