IIITT@LT-EDI-EACL2021-Hope Speech Detection: There is always hope in Transformers

In a world facing serious challenges such as climate change, religious and political conflicts, global pandemics, terrorism, and racial discrimination, an internet full of hate speech and abusive or offensive content is the last thing we desire. In this paper, we work to identify and promote positive and supportive content on social media platforms. We use several transformer-based models to classify social media comments as hope speech or not hope speech in English, Malayalam, and Tamil. This paper describes our work for the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI 2021-EACL 2021. The code for our best submission is available for viewing.
