TransWiC at SemEval-2021 Task 2: Transformer-based Multilingual and Cross-lingual Word-in-Context Disambiguation

Identifying whether a word carries the same meaning or a different meaning in two contexts is an important research area in natural language processing, and it plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction. Most previous work in this area relies on language-specific resources, making it difficult to generalise across languages. Considering this limitation, our approach to SemEval-2021 Task 2 is based only on pretrained transformer models and does not use any language-specific processing or resources. Despite that, our best model achieves 0.90 accuracy on the English-English subtask, which is very competitive with the best result for the subtask, 0.93 accuracy. Our approach also achieves satisfactory results on the other monolingual and cross-lingual language pairs.
