Semantic Relation Classification via Hierarchical Recurrent Neural Network with Attention

Semantic relation classification remains a challenging task in natural language processing. In this paper, we introduce a hierarchical recurrent neural network that extracts information from raw sentences for relation classification. Our model has several distinctive features: (1) each sentence is divided into three context subsequences according to the two annotated nominals, which allows the model to encode each subsequence independently and to selectively focus on the important context information; (2) the hierarchical model consists of two recurrent neural networks (RNNs): the first learns a context representation for each of the three subsequences, and the second computes the semantic composition of these three representations and produces a sentence representation for classifying the relation between the two nominals; (3) an attention mechanism is adopted in both RNNs to encourage the model to concentrate on the important information when learning the sentence representation. Experimental results on the SemEval-2010 Task 8 dataset demonstrate that our model is comparable to the state of the art without using any hand-crafted features.
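To make the two-level architecture concrete, below is a minimal PyTorch sketch of the kind of model the abstract describes. The module names, hidden sizes, and the exact splitting convention (here, the three subsequences are the tokens left of the first nominal, between the two nominals, and right of the second nominal) are our assumptions for illustration, not details taken from the paper or its released code.

```python
# Minimal sketch: hierarchical BiLSTM with attention for relation classification.
# Level 1 encodes each of the three context subsequences; level 2 composes the
# three segment vectors into a sentence representation. All hyperparameters
# (emb_dim, hid) and class count are illustrative assumptions.
import torch
import torch.nn as nn


class AttentivePool(nn.Module):
    """Attention pooling: score each time step, softmax, and take the weighted sum."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1, bias=False)

    def forward(self, h):                        # h: (batch, time, dim)
        a = torch.softmax(self.score(h), dim=1)  # attention weights over time
        return (a * h).sum(dim=1)                # (batch, dim)


class HierarchicalRelationRNN(nn.Module):
    # n_classes=19 matches SemEval-2010 Task 8: 9 directed relations x 2 + Other.
    def __init__(self, vocab_size, emb_dim=100, hid=100, n_classes=19):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # Level 1: a shared BiLSTM encodes each context subsequence independently.
        self.word_rnn = nn.LSTM(emb_dim, hid, batch_first=True, bidirectional=True)
        self.word_att = AttentivePool(2 * hid)
        # Level 2: a second BiLSTM composes the three segment representations.
        self.seg_rnn = nn.LSTM(2 * hid, hid, batch_first=True, bidirectional=True)
        self.seg_att = AttentivePool(2 * hid)
        self.out = nn.Linear(2 * hid, n_classes)

    def encode_segment(self, tokens):            # tokens: (batch, seg_len) token ids
        h, _ = self.word_rnn(self.emb(tokens))
        return self.word_att(h)                  # (batch, 2*hid)

    def forward(self, left, middle, right):      # the three context subsequences
        segs = torch.stack([self.encode_segment(s)
                            for s in (left, middle, right)], dim=1)  # (batch, 3, 2*hid)
        h, _ = self.seg_rnn(segs)
        return self.out(self.seg_att(h))         # relation logits
```

Attention pooling appears at both levels, mirroring feature (3): within each subsequence it weights individual words, and across subsequences it weights the three segment vectors before classification.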
