On the Compositionality Prediction of Noun Phrases using Poincaré Embeddings

The compositionality degree of a multiword expression indicates to what extent the meaning of the phrase can be derived from the meanings of its constituents and their grammatical relations. Prediction of (non-)compositionality is a task that has frequently been addressed with distributional semantic models. We introduce a novel technique that blends hierarchical information with distributional information for predicting compositionality. In particular, we use hypernymy information about the multiword expression and its constituents, encoded as the recently introduced Poincaré embeddings, in addition to distributional information to detect compositionality of noun phrases. Using a weighted average of the distributional similarity and a Poincaré similarity function, we obtain consistent, substantial, and statistically significant improvements on three gold-standard datasets over state-of-the-art models based on distributional information only. Unlike traditional approaches, which rely solely on an unsupervised setting, we also frame the problem as a supervised task and obtain comparable improvements. Further, we publicly release our Poincaré embeddings, which are trained on the output of handcrafted lexical-syntactic patterns applied to a large corpus.
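
The weighted average mentioned above can be illustrated with a minimal sketch. The Poincaré distance formula is the standard one for the Poincaré ball; everything else is an assumption made for illustration only: the names `poincare_similarity` and `blended_score`, the conversion of distance to similarity as 1 / (1 + d), and the weight `alpha` are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two distributional vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball:
    arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def poincare_similarity(u, v):
    """Assumed mapping from distance to a similarity in (0, 1];
    the paper's actual similarity function may differ."""
    return 1.0 / (1.0 + poincare_distance(u, v))


def blended_score(distributional_sim, hierarchical_sim, alpha=0.5):
    """Weighted average of the two similarities; alpha is a
    hypothetical weight that would be tuned on held-out data."""
    return alpha * distributional_sim + (1.0 - alpha) * hierarchical_sim
```

In use, one would compute `cosine_similarity` on the distributional vectors of a phrase and its constituents, compute `poincare_similarity` on their Poincaré embeddings, and pass both values to `blended_score`; a higher result would then be read as a higher predicted degree of compositionality.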
