Indonesian Twitter Cyberbullying Detection using Text Classification and User Credibility

Cyberbullying is a repeated act that harasses, humiliates, threatens, or hassles other people through electronic devices and online social networking websites. Cyberbullying over the internet is more dangerous than traditional bullying because it can amplify the humiliation to an unlimited online audience. According to UNICEF and a survey by the Indonesian Ministry of Communication and Information, 58% of 435 adolescents do not understand cyberbullying. Some of them might even have been bullies themselves but, lacking that understanding, could not recognise the negative effects of their behaviour. Bullies may also fail to recognise the harm of their actions because they do not see immediate responses from their victims. Our study aimed to detect cyberbullying actors based on their texts and an analysis of user credibility, and to notify them about the harm of cyberbullying. We collected data from Twitter. Since the data were unlabelled, we built a web-based labelling tool to classify tweets as cyberbullying or non-cyberbullying. From the tool we obtained 301 cyberbullying tweets, 399 non-cyberbullying tweets, 2,053 negative words and 129 swear words. We then applied support vector machine (SVM) and k-nearest neighbours (KNN) classifiers to learn and detect cyberbullying texts. SVM achieved the highest F1-score, 67%. We also analysed user credibility and found 257 Normal Users, 45 Harmful Bullying Actors, 53 Bullying Actors and 6 Prospective Bullying Actors.
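
As a rough illustration of the classification and evaluation steps mentioned above, the following minimal sketch implements a k-nearest-neighbours text classifier over bag-of-words vectors with cosine similarity, together with the F1-score computation. It is not the pipeline used in the study: the abstract does not state the features, preprocessing, value of k, or SVM configuration, so the whitespace tokenisation, the choice of k = 3, the function names (`bag_of_words`, `cosine`, `knn_predict`, `f1_score`) and the tiny English toy data set are all assumptions made purely for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data (NOT the study's data set): label 1 = cyberbullying,
# label 0 = non-cyberbullying.
TRAIN = [
    ("you are stupid and ugly", 1),
    ("nobody likes you loser", 1),
    ("you are such a stupid loser", 1),
    ("great match last night", 0),
    ("see you at the concert tonight", 0),
    ("thanks for the great food", 0),
]


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Return the majority label among the k training texts most similar to `text`."""
    query = bag_of_words(text)
    ranked = sorted(
        train,
        key=lambda item: cosine(bag_of_words(item[0]), query),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Calling `knn_predict(TRAIN, "you stupid loser")` classifies a new text by majority vote of its nearest neighbours, and `f1_score(y_true, y_pred)` summarises predictions on held-out labels. The F1-score is a reasonable metric here because the two classes are not perfectly balanced (301 versus 399 tweets), so accuracy alone could be misleading.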