Minority Report: Cyberbullying Prediction on Instagram

Introduction. Cyberbullying, a form of abusive online behavior, is not well defined, but it is a repetitive process, i.e., a sequence of harassing messages sent from a bully to a victim over a period of time with the intent to harm the victim. Numerous data-driven approaches have been developed for the automatic classification of cyberbullying instances, with an emphasis on classification accuracy. While the importance of highly accurate classifiers is undoubted, existing cyberbullying detection methods share two key pitfalls: (i) they disregard the repetitive nature of the harassing process, and (ii) they work retrospectively (i.e., after a cyberbullying incident has occurred), which makes it difficult to intervene before an interaction escalates. Motivated by the scarcity of methods that anticipate cyberbullying, we focus on cyberbullying prediction, with the goal of reducing the time from detection to intervention.

Methods. We formulate the prediction of the number of harassing comments a media session will receive over a period of time as a regularized multi-task regression problem. We consider two settings: (i) modeling the progression of cyberbullying behavior from a time point in the near future to subsequent time points further into the future, given limited knowledge of the recent past, and (ii) accumulating increasingly more historical data to improve prediction accuracy. To validate our approach, we conduct an extensive experimental evaluation on a real-world dataset from Instagram, the online social media platform with the highest percentage of users who report experiencing cyberbullying.

Results. Intuitively, the more comments observed in the recent past of a media session, the better the predictive power of our approach. The downside of using more historical data is that decisions must be postponed until more comments have been collected. We therefore examine the trade-off between prediction accuracy and decision speed. Overall, our approach outperforms competing approaches by up to 31.4% in recall and 46.2% in Matthews correlation coefficient (MCC).

Discussion. Our approach can be used to effectively prioritize media sessions for increased monitoring over time or for immediate intervention before a conversation escalates. In future work, we plan to incorporate additional features and to investigate how well our approach generalizes to other major social networking platforms where users frequently become victims of cyberbullying. Beyond cyberbullying prediction, our work is, to the best of our knowledge, the first to provide insights into the forecasting performance of multi-task regression as a function of the prediction horizon and the length of the available historical data. We therefore believe that it can serve as a reference point on the forecasting performance of multi-task regression for both researchers and practitioners.
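
The abstract does not state which regularizer the multi-task regression uses. As one hedged illustration only, the sketch below fits a linear model in which each task predicts the harassing-comment count at a different future time point, and an L2,1 (group-lasso) penalty on the rows of the weight matrix encourages all tasks to rely on the same features. The penalty, the proximal-gradient solver, the names `fit_multitask_l21` and `predict`, and the toy data are assumptions for illustration, not the authors' method.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python matrix product A @ B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A: Matrix) -> Matrix:
    """Return the transpose of A as a list of lists."""
    return [list(col) for col in zip(*A)]


def fit_multitask_l21(X: Matrix, Y: Matrix, lam: float = 0.1,
                      step: float = 0.1, iters: int = 2000) -> Matrix:
    """Minimize (1/(2n)) * ||XW - Y||_F^2 + lam * sum_j ||W[j, :]||_2
    with proximal gradient descent (ISTA).

    ASSUMPTION: the L2,1 (group-lasso) penalty is one illustrative choice of
    multi-task regularizer, not necessarily the one used in the paper.
    X: n x d features per media session; Y: n x T targets, one column (task)
    per future time point. Returns W: d x T, one weight column per task.
    `step` should not exceed 1 / (largest eigenvalue of X^T X / n).
    """
    n, d, T = len(X), len(X[0]), len(Y[0])
    W = [[0.0] * T for _ in range(d)]
    Xt = transpose(X)
    for _ in range(iters):
        # Residuals XW - Y, then gradient of the squared loss: (1/n) X^T (XW - Y)
        R = [[p - y for p, y in zip(prow, yrow)] for prow, yrow in zip(matmul(X, W), Y)]
        G = matmul(Xt, R)
        for j in range(d):
            # Gradient step on feature j's weights for all T tasks
            row = [W[j][t] - step * G[j][t] / n for t in range(T)]
            # Group soft-thresholding (prox of the L2,1 penalty): the whole row is
            # shrunk together, so feature j is kept or dropped jointly for all tasks
            norm = math.sqrt(sum(v * v for v in row))
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            W[j] = [scale * v for v in row]
    return W


def predict(X: Matrix, W: Matrix) -> Matrix:
    """Predicted counts for every session (rows) and future time point (columns)."""
    return matmul(X, W)


# Hypothetical toy data: column 0 = harassing comments observed so far,
# column 1 = an uninformative feature; targets = counts at two future time points.
X = [[1.0, 0.5], [2.0, -0.5], [3.0, 0.5], [4.0, -0.5]]
Y = [[2.0, 3.0], [4.0, 6.0], [6.0, 9.0], [8.0, 12.0]]
W = fit_multitask_l21(X, Y, lam=0.1, step=0.1, iters=2000)
print(W)  # row 1 (uninformative feature) is driven to zero for both tasks
```

For the toy data the largest eigenvalue of X^T X / n is about 7.5, so a step of 0.1 is within the 1/L bound. Under this sketch, accumulating more historical data (setting (ii)) would amount to adding columns to X.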
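
The abstract reports results in terms of recall and Matthews correlation coefficient. For reference, the sketch below computes both from binary labels, where 1 marks a cyberbullying session. The abstract does not say how predicted counts are turned into binary decisions, so the threshold, the toy values, and the function names here are hypothetical.

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels (1 = cyberbullying session)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true cyberbullying sessions that were flagged: TP / (TP + FN)."""
    tp, _, _, fn = confusion(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """(TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    tp, fp, tn, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical predicted harassing-comment counts and ground-truth labels
predicted_counts = [0.2, 3.1, 1.4, 5.0, 0.0, 2.6]
y_true = [0, 1, 0, 1, 0, 0]
THRESHOLD = 2.0  # hypothetical alert threshold on the predicted count
y_pred = [1 if c >= THRESHOLD else 0 for c in predicted_counts]
print(recall(y_true, y_pred), mcc(y_true, y_pred))
```

Unlike accuracy, MCC uses all four confusion-matrix cells, which makes it more informative when cyberbullying sessions are a small minority of the data.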