Enhancing Recurrent Neural Networks with Positional Attention for Question Answering

Attention-based recurrent neural networks (RNNs) have shown great success in question answering (QA) in recent years. Although significant improvements have been achieved over non-attentive models, positional information has not been well studied within the attention-based framework. Motivated by the effectiveness of using word positional context to enhance information retrieval, we assume that if a word from the question (i.e., a question word) occurs in an answer sentence, its neighboring words should receive more attention, since they intuitively carry more valuable information for question answering than words farther away. Based on this assumption, we propose a positional attention-based RNN model that incorporates the positional context of question words into the attentive representations of answers. Experiments on two benchmark datasets demonstrate the advantages of the proposed model. Specifically, we achieve a maximum improvement of 8.83% over the classical attention-based RNN model in terms of mean average precision. Furthermore, our model is comparable to, if not better than, state-of-the-art approaches for question answering.
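To make the core assumption concrete, the following is a minimal sketch of one way such positional weighting could work: each answer position is weighted by its distance to the nearest occurrence of a question word, and the weights bias the attention distribution. The Gaussian decay, the additive log-space bias, and the names `positional_weights`, `positional_attention`, and `sigma` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def positional_weights(answer_tokens, question_tokens, sigma=2.0):
    """Weight each answer position by its proximity to the nearest
    occurrence of any question word (Gaussian decay is an assumption;
    the paper's exact weighting function may differ)."""
    question_vocab = set(question_tokens)
    match_positions = [i for i, tok in enumerate(answer_tokens)
                       if tok in question_vocab]
    n = len(answer_tokens)
    if not match_positions:
        return np.ones(n)  # no lexical overlap: fall back to uniform weights
    weights = np.zeros(n)
    for i in range(n):
        d = min(abs(i - j) for j in match_positions)  # distance to nearest match
        weights[i] = np.exp(-(d ** 2) / (2 * sigma ** 2))
    return weights

def positional_attention(scores, answer_tokens, question_tokens, sigma=2.0):
    """Bias raw attention scores with positional weights in log space,
    then renormalize with a softmax."""
    pos = positional_weights(answer_tokens, question_tokens, sigma)
    biased = scores + np.log(pos + 1e-8)
    exp = np.exp(biased - biased.max())  # numerically stable softmax
    return exp / exp.sum()

# Example: question words match at positions 2-4 of the answer, so the
# surrounding answer words receive larger attention weights.
question = ["who", "invented", "the", "telephone"]
answer = ["bell", "famously", "invented", "the", "telephone", "in", "1876"]
raw_scores = np.zeros(len(answer))  # uniform raw scores for illustration
print(positional_attention(raw_scores, answer, question))
```

In this toy example the raw scores are uniform, so the printed distribution is driven entirely by proximity to the question-word matches; in the model proper, the positional bias would combine with content-based attention scores.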
