TransWiC at SemEval-2021 Task 2: Transformer-based Multilingual and Cross-lingual Word-in-Context Disambiguation

Identifying whether a word carries the same or a different meaning in two contexts is an important research area in natural language processing, and it plays a significant role in applications such as question answering, document summarisation, information retrieval and information extraction. Most previous work in this area relies on language-specific resources, making it difficult to generalise across languages. Considering this limitation, our approach to SemEval-2021 Task 2 is based only on pretrained transformer models and does not use any language-specific processing or resources. Despite that, our best model achieves 0.90 accuracy on the English-English subtask, which is competitive with the best result for that subtask, 0.93 accuracy. Our approach also achieves satisfactory results on the other monolingual and cross-lingual language pairs.

Tharindu Ranasinghe | Hansi Hettiarachchi
