Semantic Compositionality through Recursive Matrix-Vector Spaces

Single-word vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, which prevents a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state-of-the-art performance on three different tasks: predicting fine-grained sentiment distributions of adverb-adjective pairs; classifying sentiment labels of movie reviews; and classifying semantic relationships, such as cause-effect or topic-message, between nouns using the syntactic path between them.
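To make the vector-matrix composition concrete, below is a minimal NumPy sketch of one merge step up a parse tree: each child's matrix transforms its sibling's vector, and shared parameters then produce the parent's (vector, matrix) pair. The dimensionality, random initialization, and the tanh nonlinearity are illustrative assumptions (in the model these parameters are learned), and the names `compose`, `W`, and `W_M` are chosen here for exposition.

```python
import numpy as np

n = 4  # word-vector dimensionality (illustrative)
rng = np.random.default_rng(0)

# Shared composition parameters (learned in the model; random here).
W = rng.normal(scale=0.1, size=(n, 2 * n))    # maps stacked transformed child vectors to the parent vector
W_M = rng.normal(scale=0.1, size=(n, 2 * n))  # maps stacked child matrices to the parent matrix

def compose(a, A, b, B):
    """Combine children (a, A) and (b, B) into a parent (p, P).

    Each child's matrix first transforms its sibling's vector, so
    every constituent can act as an operator on its neighbor.
    """
    p = np.tanh(W @ np.concatenate([B @ a, A @ b]))  # parent vector, shape (n,)
    P = W_M @ np.vstack([A, B])                      # parent matrix, shape (n, n)
    return p, P

# Toy example: an operator-like word ("very") modifying "good".
very = (rng.normal(size=n), 1.5 * np.eye(n))  # matrix amplifies its neighbor's meaning
good = (rng.normal(size=n), np.eye(n))        # near-identity matrix: content word, weak operator
p, P = compose(*very, *good)
```

Applying `compose` recursively bottom-up over a binary parse tree yields a (vector, matrix) pair for every phrase; an intensifier like "very" can be captured almost entirely by its matrix, which is how operator-like words (including negation in propositional logic) get modeled without changing the composition function itself.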
