Probing for idiomaticity in vector space models

Contextualised word representation models have been successfully used to capture different word usages, and they may be an attractive alternative for representing idiomaticity in language. In this paper, we propose probing measures to assess whether some of the expected linguistic properties of noun compounds, especially those related to idiomatic meanings, together with their dependence on context and sensitivity to lexical choice, are readily available in standard and widely used representations. To that end, we constructed the Noun Compound Senses Dataset, which contains noun compounds and their paraphrases in context-neutral and context-informative naturalistic sentences, in two languages: English and Portuguese. Results obtained with four types of probing measures applied to models such as ELMo, BERT and some of its variants indicate that idiomaticity is not yet accurately represented by contextualised models.
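As an illustration of the kind of probe involved, the sketch below compares the contextualised representation of a noun compound inside a sentence with that of a paraphrase substituted in its place, scored with cosine similarity. This is a minimal sketch under stated assumptions, not the paper's exact probing measures: the choice of `bert-base-uncased`, the mean-pooling over subword vectors, the `phrase_embedding` helper and the example sentence pair are all illustrative.

```python
# Illustrative probe (an assumption, not the paper's implementation): compare the
# contextualised vector of a noun compound in a sentence with the vector of a
# paraphrase placed in the same slot, using an off-the-shelf BERT model from the
# Hugging Face `transformers` library.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()


def phrase_embedding(sentence: str, phrase: str) -> torch.Tensor:
    """Mean-pool the final-layer subword vectors of `phrase` as it occurs in
    `sentence` (located by a simple string match, for illustration only)."""
    encoding = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = encoding.pop("offset_mapping")[0]          # (seq_len, 2) char spans
    start = sentence.lower().index(phrase.lower())
    end = start + len(phrase)
    with torch.no_grad():
        hidden = model(**encoding).last_hidden_state[0]  # (seq_len, hidden_dim)
    # Keep subword tokens whose character span overlaps the phrase;
    # special tokens have the empty span (0, 0) and are excluded.
    mask = [(s < end and e > start and e > s) for s, e in offsets.tolist()]
    return hidden[torch.tensor(mask)].mean(dim=0)


# Hypothetical sentence pair: the same context with the compound and a paraphrase.
nc_sent = "He decided to take the fast track to promotion."
para_sent = "He decided to take the quick route to promotion."

v_nc = phrase_embedding(nc_sent, "fast track")
v_para = phrase_embedding(para_sent, "quick route")

similarity = torch.nn.functional.cosine_similarity(v_nc, v_para, dim=0)
print(f"cosine(NC, paraphrase) = {similarity.item():.3f}")
```

In the setting described in the abstract, such a comparison would additionally be contrasted across context-neutral and context-informative sentences for the same compound, and repeated for the Portuguese portion of the dataset with a suitable Portuguese or multilingual model.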
