Uppsala NLP at SemEval-2021 Task 2: Multilingual Language Models for Fine-tuning and Feature Extraction in Word-in-Context Disambiguation

We describe the Uppsala NLP submission to SemEval-2021 Task 2 on multilingual and cross-lingual word-in-context disambiguation. We explore the usefulness of three pre-trained multilingual language models: XLM-RoBERTa (XLMR), Multilingual BERT (mBERT), and multilingual distilled BERT (mDistilBERT). We compare these three models in two setups: fine-tuning and feature extraction. In the latter case, we also experiment with dependency-based information. We find that fine-tuning outperforms feature extraction. XLMR performs better than mBERT in the cross-lingual setting with both fine-tuning and feature extraction, whereas the two models perform similarly in the multilingual setting. mDistilBERT performs poorly with fine-tuning but gives results comparable to the other models when used as a feature extractor. We submitted our two best systems, fine-tuned with XLMR and mBERT.
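To make the feature-extraction setup concrete, the following is a minimal sketch of how contextual embeddings for a target word can be extracted from a pre-trained multilingual model and compared across two contexts. It assumes the HuggingFace transformers library; the model choice, mean-pooling over subwords, and the cosine-similarity threshold are illustrative assumptions, not the exact configuration of the submitted systems.

```python
# Hedged sketch: extract the target word's contextual embedding in each sentence
# and decide "same sense" via cosine similarity. Pooling and threshold are
# illustrative, not the authors' exact method.
import torch
from transformers import AutoModel, AutoTokenizer

# Any of the three models discussed could be substituted here.
model_name = "xlm-roberta-base"  # or "bert-base-multilingual-cased", "distilbert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def target_embedding(sentence: str, start: int, end: int) -> torch.Tensor:
    """Mean-pool the subword vectors whose character offsets overlap the target span."""
    enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
    mask = [(s < end and e > start and e > s) for s, e in offsets.tolist()]
    return hidden[torch.tensor(mask)].mean(dim=0)

# Example: is "bank" used in the same sense in both contexts?
s1, s2 = "She sat on the bank of the river.", "He deposited cash at the bank."
v1 = target_embedding(s1, s1.index("bank"), s1.index("bank") + len("bank"))
v2 = target_embedding(s2, s2.index("bank"), s2.index("bank") + len("bank"))
same_sense = torch.cosine_similarity(v1, v2, dim=0).item() > 0.5  # illustrative threshold
print(same_sense)
```

In the fine-tuning setup, the same pre-trained encoder would instead be trained end to end on the sentence pairs with a binary classification head, which is the configuration the abstract reports as performing best.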
