Exploring the Representation of Word Meanings in Context: A Case Study on Homonymy and Synonymy

This paper presents a multilingual study of word meaning representations in context. We assess the ability of both static and contextualized models to adequately represent different lexical-semantic relations, such as homonymy and synonymy. To do so, we created a new multilingual dataset that allows us to perform a controlled evaluation of several factors, such as the impact of the surrounding context or the overlap between words conveying the same or different senses. A systematic assessment across four scenarios shows that the best monolingual models based on Transformers can adequately disambiguate homonyms in context. However, as they rely heavily on context, these models fail to represent words with different senses when those words occur in similar sentences. Experiments are performed in Galician, Portuguese, English, and Spanish, and both the dataset (with more than 3,000 evaluation items) and the new models are freely released with this study.
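To make the evaluation setting concrete, the sketch below shows one common way to probe contextualized representations of this kind: extract a vector for a target homonym from each of two sentences and compare the vectors with cosine similarity. This is a minimal illustration, not the paper's exact protocol; the checkpoint name, the last-layer mean-pooling over subtokens, and the example sentences are all assumptions for demonstration.

```python
# Minimal sketch: comparing contextualized representations of a homonym
# in two sentences with a BERT-style model via HuggingFace Transformers.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL = "bert-base-cased"  # assumption: any BERT-style checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)
model.eval()

def word_vector(sentence: str, word: str) -> torch.Tensor:
    """Mean-pool the last-layer states of the subtokens covering `word`."""
    start = sentence.index(word)  # character span of the first occurrence
    end = start + len(word)
    enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]  # (seq_len, 2) character spans
    with torch.no_grad():
        states = model(**enc).last_hidden_state[0]  # (seq_len, hidden)
    # Keep subtokens whose character span overlaps the target word;
    # special tokens have empty (0, 0) spans and are excluded.
    mask = [s < end and e > start and s != e for s, e in offsets.tolist()]
    return states[torch.tensor(mask)].mean(dim=0)

# The same homonym ("bank") in two contexts conveying different senses:
v1 = word_vector("She sat on the bank of the river.", "bank")
v2 = word_vector("He deposited the money at the bank.", "bank")
print(F.cosine_similarity(v1, v2, dim=0).item())  # lower => senses separated
```

Under this kind of probe, a model that truly disambiguates in context should assign the two occurrences of "bank" clearly different vectors, while assigning similar vectors to synonyms used in the same sense; a static embedding, by construction, gives both occurrences an identical representation.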
