Characterizing Social Spambots by their Human Traits

Social spambots, an emerging class of spammers attempting to emulate people, are difficult for both human annotators and classic bot detection techniques to reliably distinguish from genuine accounts. We examine this human emulation by studying the human characteristics (personality, gender, age, and emotions) exhibited in social spambots' language, hypothesizing that the values of these attributes will be non-human-like (e.g., unusually high or low). We found our hypothesis mostly disconfirmed: individually, social bots exhibit very human-like attributes. However, a striking pattern emerged when considering the full distributions of these estimated human attributes: social bots were extremely similar to one another and average in their expressed personality, demographics, and emotion, in contrast with traditional bots, which we found to exhibit more variance and more extreme values than genuine accounts. We thus consider how well social bots can be identified using only the 17 variables encoding these human attributes, achieving a new state of the art in social spambot detection (F1 = 0.946). Further, simulating the situation where the bots are not known a priori, we found that even an unsupervised clustering over the same 17 attributes yields nearly as accurate social bot identification (F1 = 0.925).
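Since the abstract reports both a supervised classifier and an unsupervised clustering over the same 17 human-attribute variables, a minimal sketch of that two-pronged setup is given below. It is illustrative only: the feature values are synthetic (generated to mimic the reported finding that bots are tightly clustered around the human average), and scikit-learn's ExtraTreesClassifier and SpectralClustering are assumptions standing in for whatever models the paper actually used.

```python
# Minimal sketch (not the authors' exact pipeline): given a matrix of
# 17 human-attribute estimates per account (personality, gender, age,
# emotion scores), fit a supervised classifier and, for the no-labels
# setting, a 2-way spectral clustering over the same features.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.cluster import SpectralClustering
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: 1000 accounts x 17 attribute estimates.
# Mimicking the paper's finding, bots (label 1) sit close to the
# human mean, while genuine accounts (label 0) vary more.
n, d = 1000, 17
y = rng.integers(0, 2, size=n)
X = np.where(y[:, None] == 1,
             rng.normal(0.0, 0.1, size=(n, d)),   # bots: similar, average
             rng.normal(0.0, 1.0, size=(n, d)))   # humans: more variance

# Supervised setting: train/test split plus an ExtraTrees classifier.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("supervised F1:", f1_score(y_te, clf.predict(X_te)))

# Unsupervised setting: 2-cluster spectral clustering, same 17 features.
labels = SpectralClustering(n_clusters=2, random_state=0).fit_predict(X)
# Cluster IDs are arbitrary, so score the better of the two alignments.
print("clustering F1:", max(f1_score(y, labels), f1_score(y, 1 - labels)))
```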
