A Matrix-Vector Recurrent Unit Model for Capturing Compositional Semantics in Phrase Embeddings

The meaning of a multi-word phrase depends not only on the meanings of its constituent words but also on the rules for composing them, the so-called compositional semantics. However, many deep learning models for learning compositional semantics target specific NLP tasks such as sentiment classification. Consequently, the word embeddings encode lexical semantics while the network weights are optimised for the classification task. Such models have no mechanism to explicitly encode compositional rules, and hence are insufficient for capturing the semantics of phrases. We present a novel recurrent computational mechanism that explicitly learns compositionality by encoding the compositional rule of each word into a matrix. The network uses a recurrent architecture to capture word order in phrases of varying length without requiring extra preprocessing such as part-of-speech tagging. The model is thoroughly evaluated on both supervised and unsupervised NLP tasks, including phrase similarity, noun-modifier questions, sentiment distribution prediction, and domain-specific term identification. We demonstrate that our model consistently outperforms LSTM and CNN deep learning models, simple algebraic compositions, and other popular baselines across different datasets.
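To make the matrix-vector idea concrete, below is a minimal sketch of such a recurrent composition step. The abstract does not give the exact update rule, so the formulation here is an assumption: each word w carries a vector embedding v_w (its lexical meaning) and a matrix M_w (its compositional rule), and the running phrase vector is updated word by word as h_t = tanh(M_{w_t} h_{t-1} + v_{w_t}). All class and parameter names are illustrative only.

```python
# Hypothetical sketch of a matrix-vector recurrent composition unit.
# Assumption: each word has a vector (lexical meaning) and a matrix
# (compositional rule); the phrase vector is updated as
#   h_t = tanh(M_{w_t} @ h_{t-1} + v_{w_t}),
# with h_0 set to the first word's vector. This is NOT the paper's
# exact formulation, only one plausible reading of the abstract.

import numpy as np

class MatrixVectorRecurrentUnit:
    def __init__(self, vocab, dim=50, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        # vector embedding per word (lexical semantics)
        self.vec = {w: rng.normal(scale=0.1, size=dim) for w in vocab}
        # matrix per word (compositional rule), initialised near identity
        self.mat = {w: np.eye(dim) + rng.normal(scale=0.01, size=(dim, dim))
                    for w in vocab}

    def compose(self, phrase):
        """Compose a sequence of words into a single phrase vector."""
        words = phrase.split() if isinstance(phrase, str) else list(phrase)
        h = self.vec[words[0]]
        for w in words[1:]:
            # the word's matrix transforms the running phrase vector,
            # and its own vector is added before the nonlinearity
            h = np.tanh(self.mat[w] @ h + self.vec[w])
        return h

if __name__ == "__main__":
    unit = MatrixVectorRecurrentUnit(["very", "good", "movie"])
    print(unit.compose("very good movie").shape)  # (50,)
```

Because composition is driven by a plain recurrence over the word sequence, phrases of any length are handled without parse trees or part-of-speech tags; in training, the per-word matrices and vectors would be learned jointly from data rather than randomly initialised as in this toy example.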
