A longitudinal study of the top 1% toxic Twitter profiles

Toxicity is endemic to online social networks, including Twitter. It follows a Pareto-like distribution in which most toxicity is generated by a very small number of profiles, so analyzing and characterizing these profiles is critical. Prior research has largely characterized toxicity on the platform through sporadic, event-centric toxic content. We instead approach the problem from a profile-centric point of view. We study 143K Twitter profiles and focus on the behavior of the top 1% producers of toxic content, identified by the toxicity scores the Perspective API assigns to their tweets. With a total of 293M tweets spanning 16 years of activity, the longitudinal data allow us to reconstruct the timelines of all profiles involved. We use these timelines to compare the behavior of the most toxic Twitter profiles with the rest of the Twitter population, examining how frequently and prolifically they post, the hashtags and URLs they share, their profile metadata, and their Botometer scores. We find that the most toxic profiles post coherent and well-articulated content; their tweets keep to a narrow theme, with low diversity in hashtags, URLs, and domains; they are thematically similar to one another; they show a high likelihood of bot-like behavior; and their high fake-followers scores suggest progenitors with an intent to influence. Our work contributes insight into the top 1% of toxic profiles on Twitter and establishes the profile-centric approach as a beneficial way to investigate toxicity on the platform.
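
As a minimal illustration of the profile-centric selection the abstract describes, the sketch below scores tweets with the Perspective API's TOXICITY attribute and keeps the top 1% of profiles by mean score. The API key placeholder, the per-profile aggregation by mean, and the `top_one_percent` helper are assumptions made for illustration; the paper's exact aggregation and thresholding may differ.

```python
# Sketch only: rank Twitter profiles by mean Perspective TOXICITY score
# and keep the top 1%. Not the authors' code.
from statistics import mean
from googleapiclient import discovery

API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # hypothetical placeholder

# Standard Perspective API client setup via the Google API discovery service.
client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

def toxicity(text: str) -> float:
    """Return the Perspective TOXICITY summary score (0..1) for one tweet."""
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
        "doNotStore": True,
    }
    resp = client.comments().analyze(body=body).execute()
    return resp["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

def top_one_percent(profiles: dict[str, list[str]]) -> list[str]:
    """profiles maps a user id to its list of tweet texts.
    Rank profiles by mean tweet toxicity (an assumed aggregation) and
    keep the top 1% of user ids."""
    scored = {uid: mean(toxicity(t) for t in tweets)
              for uid, tweets in profiles.items() if tweets}
    ranked = sorted(scored, key=scored.get, reverse=True)
    cutoff = max(1, len(ranked) // 100)
    return ranked[:cutoff]
```

In practice, scoring 293M tweets this way would require batching and rate-limit handling around the Perspective API; the sketch shows only the scoring and selection logic.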
