Adaptive Joint Learning of Compositional and Non-Compositional Phrase Embeddings

We present a novel method for jointly learning compositional and non-compositional phrase embeddings by adaptively weighting the two types of embeddings with a compositionality scoring function. The scoring function quantifies the level of compositionality of each phrase, and its parameters are optimized jointly with the objective for learning phrase embeddings. In experiments, we apply the adaptive joint learning method to the task of learning embeddings of transitive verb phrases, and show that the compositionality scores correlate strongly with human ratings of verb-object compositionality, substantially outperforming the previous state of the art. Moreover, our embeddings improve upon the previous best model on a transitive verb disambiguation task. We also show that a simple ensemble technique further improves the results on both tasks.
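The adaptive weighting described above can be illustrated with a minimal sketch. The abstract does not give the form of the scoring function or of the weighting, so everything here is an assumption made for illustration: the score `alpha` is taken to be a logistic function of some phrase features, the final phrase embedding is taken to be a convex combination of the compositional and non-compositional embeddings, and the names `compositionality_score`, `phrase_embedding`, `features`, `weights`, and `bias` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def compositionality_score(features, weights, bias):
    """Hypothetical scoring function (an assumption, not taken from the abstract).

    Returns a value in (0, 1); higher means the phrase is treated as
    more compositional. In a joint learning setup, `weights` and `bias`
    would be trained together with the embeddings.
    """
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


def phrase_embedding(alpha, compositional, non_compositional):
    """Assumed convex combination of the two embeddings, weighted by alpha."""
    return [
        alpha * c + (1.0 - alpha) * n
        for c, n in zip(compositional, non_compositional)
    ]


# Toy usage with made-up numbers.
alpha = compositionality_score(features=[0.0, 0.0], weights=[1.0, 1.0], bias=0.0)
mixed = phrase_embedding(alpha, compositional=[1.0, 2.0], non_compositional=[3.0, 4.0])
```

Under these assumptions, a score near 1 makes the phrase embedding follow the composed word vectors, while a score near 0 makes it follow the embedding learned for the phrase as a single unit.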
