A multi-platform dataset for detecting cyberbullying in social media

Recent work on cyberbullying detection applies machine learning models to text and metadata from small datasets, most of them drawn from a single social media platform. Such models predict cyberbullying well when posts carry the text and metadata structure found on that platform. We instead develop a multi-platform dataset that consists purely of the text of posts gathered from seven social media platforms. We present a multi-stage, multi-technique annotation system that first uses crowdsourcing to annotate posts and hashtags and then uses machine learning methods to identify additional posts for annotation. This process selects posts for annotation that have a significantly greater-than-chance likelihood of being clear cases of cyberbullying, without limiting the range of samples to those containing predetermined features (as happens when hashtags alone are used to select posts for annotation). We show that, despite the diversity of examples in the dataset, models trained on datasets produced in this manner can achieve good performance. This is a clear advantage over traditional methods of post selection and labeling: it increases the number of positive examples that can be produced with the same resources, and it broadens the range of communication media to which the models can be applied.
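
The second annotation stage, in which a model trained on crowdsourced labels picks further posts for annotation, can be illustrated with a minimal sketch. This is not the authors' pipeline: the smoothed log-odds word scorer, the function names (`train_log_odds`, `score`, `select_for_annotation`), and the toy posts are all assumptions made purely for illustration. It only shows the general idea of ranking unlabeled text by a model score instead of filtering on fixed hashtags.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization on lowercased text; a real system would do more.
    return text.lower().split()


def train_log_odds(labeled_posts):
    """Learn per-word log-odds weights from (text, label) pairs.

    label is 1 for posts annotated as cyberbullying and 0 otherwise.
    Add-one smoothing keeps every weight finite.
    """
    pos, neg = Counter(), Counter()
    for text, label in labeled_posts:
        (pos if label == 1 else neg).update(tokenize(text))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {
        word: math.log((pos[word] + 1) / pos_total)
        - math.log((neg[word] + 1) / neg_total)
        for word in vocab
    }


def score(text, weights):
    # Sum of word weights; words unseen during training contribute nothing.
    return sum(weights.get(token, 0.0) for token in tokenize(text))


def select_for_annotation(unlabeled_posts, weights, k):
    # Send the k highest-scoring posts to annotators next.
    ranked = sorted(unlabeled_posts, key=lambda post: score(post, weights), reverse=True)
    return ranked[:k]


# Hypothetical seed labels standing in for the crowdsourced first stage.
seed = [
    ("you are a worthless loser", 1),
    ("nobody likes you loser", 1),
    ("great game last night", 0),
    ("see you at the game", 0),
]
# Hypothetical unlabeled pool; text only, no platform metadata.
unlabeled = [
    "what a loser nobody cares",
    "the game starts at noon",
    "lunch was great",
]

weights = train_log_odds(seed)
batch = select_for_annotation(unlabeled, weights, k=1)
```

Because the scorer uses only the words of a post, the same selection step applies to text from any platform, and posts need not contain a predetermined hashtag to be chosen.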
