SAJA at TRAC 2020 Shared Task: Transfer Learning for Aggressive Identification with XGBoost

We developed a system based on transfer learning: Universal Sentence Encoder (USE) embeddings are used as input features to an XGBoost classifier that identifies aggressive text in English content. The approach was evaluated on the reference dataset provided by TRAC 2020. In sub-task EN-A it achieved a weighted F1 of 60.75%, ranking fourteenth out of sixteen teams; in sub-task EN-B it achieved a weighted F1 of 85.66%, ranking sixth out of fifteen teams.
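Both results above are reported as weighted F1, which averages the per-class F1 scores using each class's share of the gold labels as its weight. The following is a minimal sketch of that metric in plain Python, not the shared task's scoring script. The function name `weighted_f1` and the labels `"A"`, `"B"`, `"C"` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores (illustrative sketch)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    support = Counter(y_true)       # number of gold examples per class
    predicted = Counter(y_pred)     # number of predictions per class
    total = len(y_true)

    score = 0.0
    for label, n_true in support.items():
        # True positives: examples where gold and prediction agree on this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = predicted[label]
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the gold labels.
        score += (n_true / total) * f1
    return score


# Toy data with hypothetical class labels (not taken from the TRAC 2020 dataset).
gold = ["A", "A", "A", "B", "B", "C"]
pred = ["A", "A", "B", "B", "C", "C"]
print(round(weighted_f1(gold, pred), 4))
```

Because the weights follow the class distribution in the gold labels, frequent classes dominate the score, so a system can reach a high weighted F1 while still doing poorly on a rare class.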
