Still out there: Modeling and Identifying Russian Troll Accounts on Twitter

There is evidence that Russia’s Internet Research Agency attempted to interfere with the 2016 U.S. election by running fake accounts on Twitter, often referred to as “Russian trolls”. In this work, we: 1) develop machine learning models that predict whether a Twitter account is a Russian troll within a set of 170K control accounts; and 2) demonstrate that it is possible to use this model to find active accounts on Twitter that are still likely acting on behalf of the Russian state. Using both behavioral and linguistic features, we show that it is possible to distinguish a troll from a non-troll with a precision of 78.5% and an AUC of 98.9%, under cross-validation. Applying the model to out-of-sample accounts still active today, we find that up to 2.6% of top journalists’ mentions are occupied by Russian trolls. These findings imply that the Russian trolls are very likely still active today. Additional analysis shows that they are not merely software-controlled bots, and that they manage their online identities in various complex ways. Finally, we argue that if it is possible to discover these accounts using externally accessible data, then the platforms, with access to a variety of private internal signals, should succeed at similar or better rates.
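To make the evaluation setup concrete, below is a minimal sketch of a cross-validated troll-vs-control classifier in scikit-learn that reports precision and ROC AUC, the two metrics quoted above. The choice of a gradient-boosted classifier, the synthetic data, and the feature layout are illustrative assumptions, not the paper's actual model, features, or dataset.

# Minimal sketch (not the paper's exact pipeline): cross-validated
# troll vs. control classification, reporting precision and ROC AUC.
# The feature matrix below is a random stand-in; in the paper, columns
# would be behavioral and linguistic features of each account.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)

# Stand-in data: rows are accounts, columns are features
# (e.g. posting rate, retweet fraction, word-usage scores).
X = rng.normal(size=(1000, 8))
y = rng.integers(0, 2, size=1000)   # 1 = troll, 0 = control account

clf = GradientBoostingClassifier(random_state=0)
scores = cross_validate(clf, X, y, cv=5, scoring=("precision", "roc_auc"))

print("precision: %.3f" % scores["test_precision"].mean())
print("roc_auc:   %.3f" % scores["test_roc_auc"].mean())

On real features, the averaged fold scores are what would be compared against the 78.5% precision and 98.9% AUC reported in the abstract.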
