Cyberbullying detection in social media text based on character‐level convolutional neural network with shortcuts

As people spend more and more time on social networks, cyberbullying has become a social problem that calls for automated detection with machine learning methods. Our research focuses on textual cyberbullying detection because text is the most common form of content on social media. However, social media text is short, noisy, and unstructured, with misspellings and symbols, which degrades the performance of traditional machine learning methods that rely on vocabulary knowledge. For this reason, we propose Char‐CNNS (Character‐level Convolutional Neural Network with Shortcuts), a model that identifies whether social media text contains cyberbullying. We use characters as the smallest unit of learning, enabling the model to overcome spelling errors and intentional obfuscation in real‐world corpora. Shortcuts are used to stitch together features from different levels so that the model learns finer‐grained bullying signals, and a focal loss function is adopted to address the class imbalance problem. We also provide a new Chinese Weibo comment dataset built specifically for cyberbullying detection, and we run experiments on both this Chinese Weibo dataset and an English Tweet dataset. The experimental results show that our approach is competitive with state‐of‐the‐art techniques on the cyberbullying detection task.
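
The two ideas in the abstract that are easiest to illustrate in isolation are character-level input and the focal loss. The Python sketch below is an illustration only, not the authors' implementation: the alphabet, the function names `encode_chars` and `binary_focal_loss`, the padding scheme, and the default values `gamma=2.0` and `alpha=0.25` are all assumptions that do not come from this abstract. It shows that an obfuscated spelling changes only one position in a character-level encoding, and that the focal loss down-weights examples the model already classifies well, which is how it is generally used against class imbalance.

```python
import math

# Hypothetical alphabet for illustration; the model's real character set
# is not specified in the abstract.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 .,!?'@#$*"
# Index 0 is reserved for padding and for characters outside the alphabet.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode_chars(text, max_len=16):
    """Map text to a fixed-length list of character indices."""
    ids = [CHAR_TO_INDEX.get(ch, 0) for ch in text.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p is the predicted probability of the positive (bullying) class and
    y is the true label, 0 or 1. gamma and alpha are assumed defaults.
    """
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)          # keep log() finite
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An obfuscated spelling differs from the original in one position only.
    print(encode_chars("idiot"))
    print(encode_chars("id1ot"))
    # A confidently correct prediction contributes far less loss than a
    # confidently wrong one.
    print(binary_focal_loss(0.9, 1), binary_focal_loss(0.1, 1))
```

With `gamma=0` and `alpha=0.5` the function reduces to half the ordinary cross-entropy, so `gamma` can be read as the knob that controls how strongly well-classified examples are discounted.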
