Learning Composition Models for Phrase Embeddings

Lexical embeddings can serve as useful representations for a variety of NLP tasks, but learning embeddings for phrases is challenging. While a separate embedding can be learned for each word, doing so for every possible phrase is infeasible. We instead construct phrase embeddings by learning how to compose word embeddings, using features that capture phrase structure and context. We propose efficient unsupervised and task-specific learning objectives that scale our model to large datasets. We demonstrate improvements on both language modeling and several phrase semantic similarity tasks across a range of phrase lengths. We make our implementation and datasets available for general use.
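The idea of composing word embeddings with feature-conditioned parameters can be illustrated with a minimal sketch. This is not the paper's actual model; the feature set, the gating scheme (per-dimension weights derived from simple word features), and all names below are hypothetical, shown only to make the notion of feature-driven composition concrete.

```python
import numpy as np

def compose(word_vecs, feature_gates):
    """Compose word vectors into one phrase vector.

    Each word vector is scaled elementwise by a gate vector derived
    from that word's features (e.g. POS tag, position in the phrase),
    then the gated vectors are summed. A simplified illustration of
    feature-conditioned composition, not the paper's learned model.
    """
    gated = [v * g for v, g in zip(word_vecs, feature_gates)]
    return np.sum(gated, axis=0)

dim = 4
v_modifier = np.full(dim, 0.5)   # hypothetical embedding for a modifier word
v_head = np.full(dim, 2.0)       # hypothetical embedding for the head word

# Hypothetical gates: down-weight the modifier, keep the head intact.
g_modifier = np.full(dim, 0.5)
g_head = np.ones(dim)

phrase_vec = compose([v_modifier, v_head], [g_modifier, g_head])
```

In a learned version, the gate vectors would be produced from phrase-structure and context features by trained parameters rather than fixed by hand.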
