Feature Weight Tuning for Recursive Neural Networks

This paper addresses how a recursive neural network model can automatically leave out useless information and emphasize important evidence, in other words, to perform "weight tuning" for higher-level representation acquisition. We propose two models, Weighted Neural Network (WNN) and Binary-Expectation Neural Network (BENN), which automatically control how much one specific unit contributes to the higher-level representation. The proposed model can be viewed as incorporating a more powerful compositional function for embedding acquisition in recursive neural networks. Experimental results demonstrate the significant improvement over standard neural models.

[1]  Geoffrey E. Hinton,et al.  Three new graphical models for statistical language modelling , 2007, ICML '07.

[2]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[3]  Ani Nenkova,et al.  A Coherence Model Based on Syntactic Patterns , 2012, EMNLP.

[4]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[5]  Eduard H. Hovy,et al.  A Model of Coherence Based on Distributed Sentence Representation , 2014, EMNLP.

[6]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[7]  Yoshua Bengio,et al.  Neural Probabilistic Language Models , 2006 .

[8]  Tomas Mikolov,et al.  RNNLM - Recurrent Neural Network Language Modeling Toolkit , 2011 .

[9]  Andrew Y. Ng,et al.  Semantic Compositionality through Recursive Matrix-Vector Spaces , 2012, EMNLP.

[10]  Andrew Y. Ng,et al.  Parsing with Compositional Vector Grammars , 2013, ACL.

[11]  Christopher D. Manning,et al.  Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks , 2010 .

[12]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[13]  Regina Barzilay,et al.  Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization , 2004, NAACL.

[14]  Christopher Potts,et al.  Learning Word Vectors for Sentiment Analysis , 2011, ACL.

[15]  Quoc V. Le,et al.  Distributed Representations of Sentences and Documents , 2014, ICML.

[16]  Claire Cardie,et al.  Bidirectional Recursive Neural Networks for Token-Level Labeling with Structure , 2013, ArXiv.

[17]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[18]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[19]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[20]  Ronald J. Williams,et al.  A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.

[21]  Eduard H. Hovy,et al.  Recursive Deep Models for Discourse Parsing , 2014, EMNLP.

[22]  Phil Blunsom,et al.  A Convolutional Neural Network for Modelling Sentences , 2014, ACL.

[23]  Jacob Eisenstein,et al.  Representation Learning for Text-level Discourse Parsing , 2014, ACL.

[24]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[25]  Razvan C. Bunescu,et al.  Subsequence Kernels for Relation Extraction , 2005, NIPS.

[26]  Christoph Goller,et al.  Learning task-dependent distributed representations by backpropagation through structure , 1996, Proceedings of International Conference on Neural Networks (ICNN'96).

[27]  Misha Denil,et al.  Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network , 2014, ArXiv.

[28]  Jürgen Schmidhuber,et al.  Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.

[29]  Lukás Burget,et al.  Recurrent neural network based language model , 2010, INTERSPEECH.

[30]  Hermann Ney,et al.  Translation Modeling with Bidirectional Recurrent Neural Networks , 2014, EMNLP.

[31]  Mirella Lapata,et al.  Modeling Local Coherence: An Entity-Based Approach , 2005, ACL.

[32]  Jeffrey Pennington,et al.  Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions , 2011, EMNLP.

[33]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[34]  Christopher Potts,et al.  Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[35]  Geoffrey E. Hinton,et al.  Generating Text with Recurrent Neural Networks , 2011, ICML.

[36]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..