Automatic detection of cyberbullying on social networks based on bullying features

With the growing use of social media, cyberbullying has received increasing attention. Cyberbullying can have serious negative effects on a person's life and may even lead to teen suicide. One effective way to reduce and stop cyberbullying is to detect bullying content automatically using appropriate machine learning and natural language processing techniques. However, many existing approaches in the literature are generic text classification models that do not take bullying characteristics into account. In this paper, we propose a representation learning framework specific to cyberbullying detection. Using word embeddings, we expand a list of pre-defined insulting words and assign them different weights to obtain bullying features. These features are concatenated with Bag-of-Words and latent semantic features to form the final representation, which is fed into a linear SVM classifier. We conduct an experimental study on a Twitter dataset and compare our method with several baseline text representation learning models and cyberbullying detection methods. Our method achieves superior performance in this study.
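The feature construction described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy three-dimensional embeddings, the cosine-similarity threshold, the choice of similarity as the weight for expanded words, and the helper names `expand_insults` and `represent`. The latent semantic features and the linear SVM classifier are omitted; the sketch only shows how weighted bullying features might be concatenated with Bag-of-Words counts.

```python
import math
from collections import Counter

# Toy embeddings with made-up values (assumption, for illustration only).
EMBEDDINGS = {
    "idiot":  [0.9, 0.1, 0.0],
    "moron":  [0.85, 0.15, 0.05],
    "stupid": [0.8, 0.2, 0.1],
    "hello":  [0.0, 0.9, 0.3],
    "friend": [0.1, 0.8, 0.4],
    "you":    [0.3, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_insults(seeds, embeddings, threshold=0.9):
    """Return {word: weight} for seed insults plus embedding neighbours.

    Seeds get weight 1.0; an expanded word gets its highest cosine
    similarity to any seed (a hypothetical weighting scheme).
    """
    known_seeds = [s for s in seeds if s in embeddings]
    weights = {s: 1.0 for s in known_seeds}
    for word, vec in embeddings.items():
        if word in weights:
            continue
        sim = max((cosine(vec, embeddings[s]) for s in known_seeds), default=0.0)
        if sim >= threshold:
            weights[word] = sim
    return weights


def represent(text, vocabulary, insult_weights):
    """Concatenate Bag-of-Words counts with weighted bullying features."""
    counts = Counter(text.lower().split())
    bow = [float(counts[w]) for w in vocabulary]
    bullying = [insult_weights[w] * counts[w] for w in sorted(insult_weights)]
    return bow + bullying


vocabulary = sorted(EMBEDDINGS)
insult_weights = expand_insults(["idiot"], EMBEDDINGS)
features = represent("you stupid stupid moron", vocabulary, insult_weights)
```

In a fuller pipeline, vectors like `features` would be extended with latent semantic features and passed to a linear SVM; those steps are left out here because the abstract does not specify them in enough detail to reproduce.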
