A Study on the Methods to Identify and Classify Cyberbullying in Social Media

In recent years, researchers and organizations have worked to tackle cyberbullying by creating websites for reporting it and by developing algorithms that automatically classify abusive posts. This paper surveys current research on cyberbullying classification. Classifying cyberbullying involves three steps: data set collection, training, and classification. Two approaches can be used for such a system: a statistical approach, and a machine learning or deep learning approach. The survey shows that the techniques used to classify cyberbullying text have shifted from statistical approaches to machine learning methods such as SVM in 2015 and earlier, and then to deep learning methods such as CNN and LSTM from 2016 onward. Image analysis and social analysis of the victim or attacker can be added to support the classification. In the surveyed work, deep learning is the most accurate method in most cases and on most data sets. We also make our Instagram data set publicly available.
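The three steps named above (data set collection, training, classification) can be illustrated with a minimal sketch. It assumes a multinomial Naive Bayes classifier as a stand-in for the statistical approach; this is not the method of any particular surveyed paper. The four labeled posts, the label names, and the function names `tokenize`, `train`, and `classify` are all hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict

# Step 1: data set collection. These four labeled posts are invented
# placeholders, not samples from the paper's Instagram data set.
TRAIN = [
    ("you are so stupid and ugly", "bullying"),
    ("nobody likes you loser", "bullying"),
    ("great photo love it", "normal"),
    ("have a nice day friend", "normal"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(samples):
    """Step 2: training. Count words per class for multinomial Naive Bayes."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocab


def classify(text, model):
    """Step 3: classification. Return the class with the highest log posterior."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(classify("you stupid loser", model))
print(classify("love this photo", model))
```

A machine learning or deep learning system keeps the same three-step shape and replaces the word-count model in step 2 with an SVM, CNN, or LSTM trained on a much larger labeled data set.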
