The Lifted Matrix-Space Model for Semantic Composition

Recent advances in tree-structured sentence encoding models have shown that explicitly modeling syntax can help handle compositionality. In particular, \citetext{Socher2012}, \citetext{Socher2013}, and \citetext{Chen2013} show that composition functions with multiplicative interactions inside tree-structured models can yield significant improvements in performance. However, existing compositional approaches that use these multiplicative interactions typically must either learn task-specific matrix-shaped word embeddings or rely on third-order tensors, both of which can be very costly. This paper introduces the Lifted Matrix-Space model, which improves on its predecessors in this respect. The model learns a single global transformation from pre-trained word embeddings into matrices, which are then composed via matrix multiplication. The upshot is that we capture multiplicative interactions without learning matrix-valued word representations from scratch. In addition, our composition function transmits a larger number of activations across layers with comparatively few model parameters. We evaluate the model on the Stanford NLI corpus and the Multi-Genre NLI corpus and find that the Lifted Matrix-Space model outperforms tree-structured long short-term memory (LSTM) networks.
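
To make the composition step concrete, here is a minimal sketch in Python/PyTorch, assuming a single linear "lift" from d-dimensional pre-trained embeddings to k x k matrices and plain matrix multiplication at each parent node. The class and method names (LiftedMatrixSpaceComposer, lift_word, compose), the tanh nonlinearity, and the dimensions are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class LiftedMatrixSpaceComposer(nn.Module):
    """Illustrative sketch: lift pre-trained word vectors into k x k
    matrices with one shared ("global") transformation, then compose
    constituents by matrix multiplication."""

    def __init__(self, embed_dim: int = 300, k: int = 16):
        super().__init__()
        # A single transformation shared across the whole vocabulary,
        # so no matrix-valued embedding is learned per word.
        self.k = k
        self.lift = nn.Linear(embed_dim, k * k)

    def lift_word(self, vec: torch.Tensor) -> torch.Tensor:
        # vec: (embed_dim,) pre-trained embedding (e.g. GloVe).
        # Returns a k x k matrix-valued representation of the word.
        return torch.tanh(self.lift(vec)).view(self.k, self.k)

    def compose(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        # Parent-node representation: matrix multiplication supplies the
        # multiplicative interaction between the two children.
        return left @ right


# Usage on a toy right-branching tree, ("the" ("red" "car")):
composer = LiftedMatrixSpaceComposer(embed_dim=300, k=16)
the, red, car = (composer.lift_word(torch.randn(300)) for _ in range(3))
sentence_matrix = composer.compose(the, composer.compose(red, car))
print(sentence_matrix.shape)  # torch.Size([16, 16])

The point the sketch illustrates is that the only learned component is the shared lift (embed_dim x k^2 parameters), so the multiplicative interaction comes from matrix multiplication itself rather than from per-word matrix embeddings or a third-order tensor.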

[1] Jian Zhang, et al. Natural Language Inference over Interaction Space, 2017, ICLR.

[2] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, 2013, EMNLP.

[3] Sebastian Rudolph, et al. Compositional Matrix-Space Models of Language, 2010, ACL.

[4] Hong Yu, et al. Neural Semantic Encoders, 2016, EACL.

[5] G. Frege. Über Sinn und Bedeutung, 1892.

[6] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[7] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.

[8] Yorick Wilks, et al. Natural language inference, 1973.

[9] Mirella Lapata, et al. Composition in Distributional Models of Semantics, 2010, Cogn. Sci.

[10] Christopher Potts, et al. A large annotated corpus for learning natural language inference, 2015, EMNLP.

[11] Danqi Chen, et al. Learning New Facts From Knowledge Bases With Neural Tensor Networks and Semantic Word Vectors, 2013, ICLR.

[12] Alexandros Potamianos, et al. Structural Attention Neural Networks for improved sentiment analysis, 2017, EACL.

[13] Chris Barker, et al. Continuations and Natural Language, 2014, Oxford Studies in Theoretical Linguistics.

[14] Christopher D. Manning, et al. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks, 2015, ACL.

[15] David J. Weir, et al. Aligning Packed Dependency Trees: A Theory of Composition for Distributional Semantics, 2016, CL.

[16] Yoshua Bengio, et al. The representational geometry of word meanings acquired by neural machine translation models, 2017, Machine Translation.

[17] Andrew Y. Ng, et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks, 2011, ICML.

[18] Marco Baroni, et al. Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space, 2010, EMNLP.

[19] Ioannis Korkontzelos, et al. Estimating Linear Models for Compositional Distributional Semantics, 2010, COLING.

[20] Christopher D. Manning, et al. Natural language inference, 2009.

[21] Nicholas Asher, et al. Integrating Type Theory and Distributional Semantics: A Case Study on Adjective–Noun Compositions, 2016, CL.

[22] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[23] Paul D. Elbourne. Situations and individuals, 2005.

[24] Hongyu Guo, et al. Long Short-Term Memory Over Tree Structures, 2015, arXiv.

[25] Claire Cardie, et al. Compositional Matrix-Space Models for Sentiment Analysis, 2011, EMNLP.

[26] Andrew Y. Ng, et al. Semantic Compositionality through Recursive Matrix-Vector Spaces, 2012, EMNLP.

[27] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[28] Gennaro Chierchia, et al. Meaning and Grammar: An Introduction to Semantics, 1990.

[29] Thomas F. Icard III, et al. Recent Progress on Monotonicity, 2014, LILT.

[30] Katrin Erk, et al. A Structured Vector Space Model for Word Meaning in Context, 2008, EMNLP.

[31] Yoshua Bengio, et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies, 2001.

[32] S. Clark, et al. A Compositional Distributional Model of Meaning, 2008.

[33] Phong Le, et al. Compositional Distributional Semantics with Long Short Term Memory, 2015, *SEMEVAL.

[34] Jason Weston, et al. Natural Language Processing (Almost) from Scratch, 2011, J. Mach. Learn. Res.

[35] Irene Heim, et al. Semantics in generative grammar, 1998.

[36] David R. Dowty. Compositionality as an Empirical Problem, 2006.

[37] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.

[38] Mohit Bansal, et al. Shortcut-Stacked Sentence Encoders for Multi-Domain Inference, 2017, RepEval@EMNLP.

[39] Claire Cardie, et al. Deep Recursive Neural Networks for Compositionality in Language, 2014, NIPS.

[40] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.

[41] Christopher D. Manning, et al. Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks, 2010.

[42] Stephen Clark, et al. Concrete Sentence Spaces for Compositional Distributional Models of Meaning, 2010, IWCS.

[43] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[44] Geoffrey E. Hinton, et al. Layer Normalization, 2016, arXiv.

[45] Christopher Potts, et al. A Fast Unified Model for Parsing and Sentence Understanding, 2016, ACL.