Mean Birds: Detecting Aggression and Bullying on Twitter

In recent years, bullying and aggression against social media users have grown significantly, with serious consequences for victims of all demographics. Cyberbullying now affects more than half of young social media users worldwide, who suffer from prolonged and/or coordinated digital harassment. Meanwhile, tools and technologies geared to understand and mitigate it are scarce and mostly ineffective. In this paper, we present a principled and scalable approach to detect bullying and aggressive behavior on Twitter. We propose a robust methodology for extracting text-, user-, and network-based attributes, and we study the properties of bullies and aggressors and the features that distinguish them from regular users. We find that bullies post less, participate in fewer online communities, and are less popular than normal users, whereas aggressors are relatively popular and tend to include more negativity in their posts. We evaluate our methodology using a corpus of 1.6M tweets posted over 3 months, and show that machine learning classification algorithms can accurately detect users exhibiting bullying and aggressive behavior, with over 90% AUC.
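For readers unfamiliar with the AUC figure quoted above, the following is a minimal sketch of what the metric measures: the probability that a randomly chosen positive example (here, a user labeled as bullying or aggressive) receives a higher classifier score than a randomly chosen negative one. The function name `auc`, the labels, and the scores are illustrative assumptions; they are not the paper's data or implementation.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = bullying/aggressive user, 0 = normal user.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.5, 0.1, 0.8]

result = auc(labels, scores)
print(f"AUC = {result:.3f}")  # 8 of 9 pairs ranked correctly -> 0.889
```

The pairwise form is quadratic in the number of examples, so it is meant only to make the definition concrete; a rank-based computation would be preferable at the scale of the corpus described above.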
