Optimizing Deep Learning for Detection Cyberbullying Text in Indonesian Language

Cyberbullying in Indonesia has become a growing concern as social media usage increases. Detecting cyberbullying is an important step toward a healthier environment for social media interactions. This research falls within computational linguistics and focuses on using deep learning to detect bullying sentences on Twitter. The study involves two main processes: first, forming the word representations, and second, classifying sentences to detect bullying. The pre-training process that builds the new term/word representations is performed independently, using Word2vec. Two types of data are used for pre-training: the first consists only of the training and test data, while the second is the overall data, a total of 26,800 unique Twitter sentences that include the training and test data. The classification process is built on three algorithms that are popular for text classification: LSTM, bi-LSTM, and CNN. The dataset consists of 9,854 labeled sentences extracted from 2,584 Twitter conversations. Among these, 1,680 sentences are labeled as bully and 6,343 as neutral. A total of 504 experiments were conducted by varying the preprocessing stage used to determine the machine learning features, the configuration of the dropout layers, and the deep learning algorithm. The experiments show that accuracy reaches 90.57%, while recall for the bully class reaches 75.7%.
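Word2vec learns word vectors from (target, context) pairs taken from a sliding window over each sentence. The sketch below illustrates only that pair-extraction step for the skip-gram variant of the Word2vec pre-training described above; the vector training itself (for example with negative sampling) is omitted. The tokenization, the window size, and the Indonesian example sentences are assumptions for illustration, and the abstract does not say whether skip-gram or CBOW was used. Switching the input between the train+test sentences and the full corpus corresponds to the two pre-training data types.

```python
from typing import List, Tuple


def skipgram_pairs(sentences: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (target, context) word pairs as used by skip-gram Word2vec.

    Assumptions (not from the paper): lowercase whitespace tokenization and
    a fixed, symmetric window of `window` words on each side of the target.
    """
    pairs: List[Tuple[str, str]] = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            # Context positions: up to `window` words left and right of the target.
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((target, tokens[j]))
    return pairs


# Hypothetical corpora standing in for the two pre-training data types:
# (1) training + test sentences only, (2) the overall, larger corpus.
train_test_sentences = ["selamat pagi semua", "terima kasih banyak"]
full_corpus = train_test_sentences + ["semoga harimu menyenangkan"]

print(len(skipgram_pairs(train_test_sentences)))   # 12
print(len(skipgram_pairs(full_corpus)))            # 18
print(skipgram_pairs(["selamat pagi semua"], window=1))
```

A larger pre-training corpus simply yields more context pairs per word, which is one plausible reason the overall data could produce different representations from the train+test data alone.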
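Dropout, one of the configurations varied across the 504 experiments, randomly zeroes activations during training to reduce overfitting. The following plain-Python sketch shows the common "inverted dropout" formulation, in which surviving activations are scaled by 1 / (1 - rate) so the expected activation is unchanged and no rescaling is needed at inference time. The rate value and the seeded `random.Random` generator are illustrative assumptions; the abstract does not give the actual dropout placements or rates.

```python
import random
from typing import List, Optional


def inverted_dropout(activations: List[float], rate: float,
                     training: bool = True,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Apply inverted dropout to a list of activations.

    Each activation is zeroed with probability `rate`; survivors are scaled
    by 1 / (1 - rate). At inference time (training=False) the input is
    returned unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Keep each unit with probability `keep` and rescale it; otherwise drop it.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(42)
x = [1.0] * 10_000
y = inverted_dropout(x, rate=0.5, rng=rng)
print(sum(v == 0.0 for v in y) / len(y))   # fraction dropped, close to 0.5
print(sum(y) / len(y))                     # mean activation, close to 1.0
```

Deep learning frameworks provide dropout as a layer, but the arithmetic is the same: only the `rate` changes between configurations.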
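Because bully sentences are the minority class in the dataset, a classifier can reach high accuracy while still missing many bullying sentences, so recall for the bully class is informative alongside accuracy. A minimal sketch of both metrics, assuming string labels "bully" and "neutral" (the label encoding is an assumption) and using toy data rather than the paper's:

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of sentences whose predicted label matches the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """True positives / (true positives + false negatives) for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


# Toy example (not the paper's data): 2 bully and 8 neutral sentences.
gold = ["bully", "bully"] + ["neutral"] * 8
pred = ["bully", "neutral"] + ["neutral"] * 8
print(accuracy(gold, pred))                     # 0.9
print(recall(gold, pred, positive="bully"))     # 0.5
```

In the toy example, 90% accuracy coexists with only 50% bully recall, which illustrates how the two numbers can diverge on imbalanced data.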
