Developing a socio-computational approach to examine toxicity propagation and regulation in COVID-19 discourse on YouTube

Abstract As the novel coronavirus (COVID-19) continues to ravage the world at an unprecedented rate, formal recommendations from medical experts are being drowned out by the avalanche of toxic content posted on social media platforms. This toxic content hinders the dissemination of important, time-sensitive information and jeopardizes the sense of community that online social networks (OSNs) seek to cultivate. In this article, we present techniques for analyzing toxic content on YouTube, and the actors who propagated it, during the initial months after COVID-19 information became public. Our dataset consists of 544 channels, 3,488 videos, 453,111 commenters, and 849,689 comments. We applied topic modeling based on Latent Dirichlet Allocation (LDA) to identify dominant topics and evolving trends within the comments on relevant videos, conducted social network analysis (SNA) to detect influential commenters, and performed toxicity analysis to measure the health of the network. SNA also allows us to identify the most toxic users in the network, which motivated experiments simulating how removing these users affects toxicity across the network. Through this work, we demonstrate not only how to identify COVID-19-related toxic content on YouTube and the actors who propagated it, but also how social media companies and policy makers can apply these findings. The work is novel in that our experiments show that by eliminating a small set of highly toxic users, social media platforms can reduce overall toxicity and thereby improve the health of the network.
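The removal experiments described above can be sketched in a few lines: score each commenter's toxicity, prune the top-k most toxic users from the comment network, and compare network-level toxicity before and after. The sketch below uses illustrative toy data (the user names, scores, and edges are hypothetical, not the paper's dataset), with mean toxicity as the aggregate health measure.

```python
# Hypothetical commenters with mean toxicity scores in [0, 1]
# (the paper derives such per-user scores; values here are made up).
toxicity = {"u1": 0.91, "u2": 0.15, "u3": 0.78, "u4": 0.05, "u5": 0.40}

# Reply edges (who replied to whom) forming the comment network.
edges = [("u1", "u2"), ("u1", "u3"), ("u3", "u2"), ("u4", "u5"), ("u5", "u1")]

def network_toxicity(scores):
    """Mean toxicity over all remaining commenters."""
    return sum(scores.values()) / len(scores) if scores else 0.0

def remove_top_toxic(scores, edges, k):
    """Drop the k most toxic users and any edges that touch them."""
    banned = set(sorted(scores, key=scores.get, reverse=True)[:k])
    kept = {u: t for u, t in scores.items() if u not in banned}
    kept_edges = [(a, b) for a, b in edges
                  if a not in banned and b not in banned]
    return kept, kept_edges

before = network_toxicity(toxicity)
pruned, pruned_edges = remove_top_toxic(toxicity, edges, 2)
after = network_toxicity(pruned)
print(f"toxicity before: {before:.3f}, after removing top 2 users: {after:.3f}")
# → toxicity before: 0.458, after removing top 2 users: 0.200
```

In practice the ranking step would combine toxicity scores with network influence (e.g., centrality from the SNA stage) rather than toxicity alone, so that removals target users whose toxic comments reach the largest audience.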
