SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC)

In this paper, we introduce the first SemEval task on Multilingual and Cross-Lingual Word-in-Context disambiguation (MCL-WiC). This task allows the largely under-investigated inherent ability of systems to discriminate between word senses within and across languages to be evaluated, dropping the requirement of a fixed sense inventory. Framed as a binary classification, our task is divided into two parts. In the multilingual sub-task, participating systems are required to determine whether two target words, each occurring in a different context within the same language, express the same meaning or not. Instead, in the cross-lingual part, systems are asked to perform the task in a cross-lingual scenario, in which the two target words and their corresponding contexts are provided in two different languages. We illustrate our task, as well as the construction of our manually-created dataset including five languages, namely Arabic, Chinese, English, French and Russian, and the results of the participating systems. Datasets and results are available at: https://github.com/SapienzaNLP/mcl-wic.

[1]  Zheng Yuan,et al.  Cambridge at SemEval-2021 Task 2: Neural WiC-Model with Data Augmentation and Exploration of Representation , 2021, SEMEVAL.

[2]  Hao Wu,et al.  Semi-Supervised Deep Learning Using Pseudo Labels for Hyperspectral Image Classification , 2018, IEEE Transactions on Image Processing.

[3]  Rada Mihalcea,et al.  SemEval-2010 Task 2: Cross-Lingual Lexical Substitution , 2009, SemEval@ACL.

[4]  Roberto Navigli,et al.  EViLBERT: Learning Task-Agnostic Multimodal Sense Embeddings , 2020, IJCAI.

[5]  Shiva Taslimipoor,et al.  SeCoDa: Sense Complexity Dataset , 2020, LREC.

[6]  Ido Dagan,et al.  context2vec: Learning Generic Context Embedding with Bidirectional LSTM , 2016, CoNLL.

[7]  Ignacio Iacobacci,et al.  SensEmbed: Learning Sense Embeddings for Word and Relational Similarity , 2015, ACL.

[8]  George A. Miller,et al.  A Semantic Concordance , 1993, HLT.

[9]  Kevin Gimpel,et al.  ALBERT: A Lite BERT for Self-supervised Learning of Language Representations , 2019, ICLR.

[10]  Roberto Navigli,et al.  SensEmBERT: Context-Enhanced Sense Embeddings for Multilingual Word Sense Disambiguation , 2020, AAAI.

[11]  Nikolay Arefyev,et al.  LIORI at SemEval-2021 Task 2: Span Prediction and Binary Classification approaches to Word-in-Context Disambiguation , 2021, SEMEVAL.

[12]  Qiang Dong,et al.  HowNet - a hybrid language and knowledge resource , 2003, International Conference on Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003.

[13]  Roberto Navigli,et al.  With More Contexts Comes Better Performance: Contextualized Sense Embeddings for All-Round Word Sense Disambiguation , 2020, EMNLP.

[14]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[15]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[16]  Alessandro Raganato,et al.  XL-WiC: A Multilingual Benchmark for Evaluating Semantic Contextualization , 2020, EMNLP.

[17]  Roberto Navigli,et al.  Word sense disambiguation: A survey , 2009, CSUR.

[18]  Andrew Y. Ng,et al.  Improving Word Representations via Global Context and Multiple Word Prototypes , 2012, ACL.

[19]  Jianping Shen,et al.  PALI at SemEval-2021 Task 2: Fine-Tune XLM-RoBERTa for Word in Context Disambiguation , 2021, SEMEVAL.

[20]  Boris Zhestiankin,et al.  Zhestyatsky at SemEval-2021 Task 2: ReLU over Cosine Similarity for BERT Fine-tuning , 2021, SEMEVAL.

[21]  Daniel Loureiro,et al.  Language Modelling Makes Sense: Propagating Representations through WordNet for Full-Coverage Word Sense Disambiguation , 2019, ACL.

[22]  Veselin Stoyanov,et al.  Unsupervised Cross-lingual Representation Learning at Scale , 2019, ACL.

[23]  Marcin Junczys-Dowmunt,et al.  The United Nations Parallel Corpus v1.0 , 2016, LREC.

[24]  Quoc V. Le,et al.  ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators , 2020, ICLR.

[25]  Christian Biemann,et al.  Making Sense of Word Embeddings , 2016, Rep4NLP@ACL.

[26]  Roberto Navigli,et al.  SemEval-2007 Task 10: English Lexical Substitution Task , 2007, Fourth International Workshop on Semantic Evaluations (SemEval-2007).

[27]  Simone Paolo Ponzetto,et al.  BabelNet: Building a Very Large Multilingual Semantic Network , 2010, ACL.

[28]  Omer Levy,et al.  RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.

[29]  Senja Pollak,et al.  SemEval-2020 Task 3: Graded Word Similarity in Context , 2020, SEMEVAL.

[30]  Andrew McCallum,et al.  Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space , 2014, EMNLP.

[31]  José Camacho-Collados,et al.  WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations , 2018, NAACL.

[32]  Omer Levy,et al.  SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems , 2019, NeurIPS.

[33]  Hao Tian,et al.  ERNIE 2.0: A Continual Pre-training Framework for Language Understanding , 2019, AAAI.

[34]  Ashutosh Modi,et al.  MCL@IITK at SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation using Augmented Data, Signals, and Transformers , 2021, SEMEVAL.