Hierarchical Memory Networks for Answer Selection on Unknown Words

Recently, end-to-end memory networks have shown promising results on question answering (QA) tasks: they encode past facts into an explicit memory and perform reasoning by making multiple computational steps over that memory. However, these models reason over sentence-level memory, producing coarse semantic vectors, and apply no attention mechanism at the word level, which can cause the model to lose detailed information, especially when the answers are rare or unknown words. In this paper, we propose a novel Hierarchical Memory Network (HMN). We first encode the past facts into a sentence-level memory and a word-level memory. Then, k-max pooling is applied after the reasoning module on the sentence-level memory to sample the k sentences most relevant to the question, and these sentences are fed into an attention mechanism on the word-level memory that focuses on the words in the selected sentences. Finally, the prediction is jointly learned from the outputs of the sentence-level reasoning module and the word-level attention mechanism. Experimental results demonstrate that our approach successfully selects answers that are unknown words and outperforms memory networks.
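
The sketch below illustrates, in PyTorch, one way the described forward pass could be wired together: sentence-level reasoning, k-max pooling over sentence relevance scores, word-level attention over the selected sentences, and a joint prediction over both summaries. The bag-of-words encoders, the single reasoning hop, the output head, and all names and dimensions are assumptions for illustration only; the abstract does not specify the exact parameterization.

```python
# Minimal sketch of an HMN-style forward pass (illustrative assumptions
# throughout; not the paper's exact architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HMNSketch(nn.Module):
    def __init__(self, vocab_size, embed_dim, k=3):
        super().__init__()
        self.k = k
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Joint prediction over sentence-level and word-level summaries.
        self.out = nn.Linear(2 * embed_dim, vocab_size)

    def forward(self, facts, question):
        # facts: (num_sentences, sentence_len) word ids
        # question: (question_len,) word ids
        word_mem = self.embed(facts)            # (S, L, D) word-level memory
        sent_mem = word_mem.mean(dim=1)         # (S, D) sentence-level memory (bag of words)
        q = self.embed(question).mean(dim=0)    # (D,) question encoding

        # Sentence-level reasoning: one attention hop over sentence memory.
        sent_scores = sent_mem @ q              # (S,) relevance of each sentence
        sent_attn = F.softmax(sent_scores, dim=0)
        sent_summary = sent_attn @ sent_mem     # (D,)

        # k-max pooling: keep the k sentences most relevant to the question.
        topk = torch.topk(sent_scores, min(self.k, sent_mem.size(0))).indices
        selected = word_mem[topk].reshape(-1, word_mem.size(-1))  # (k*L, D)

        # Word-level attention over words in the selected sentences.
        word_attn = F.softmax(selected @ q, dim=0)
        word_summary = word_attn @ selected     # (D,)

        # Prediction jointly conditioned on both levels.
        return self.out(torch.cat([sent_summary, word_summary]))

model = HMNSketch(vocab_size=100, embed_dim=32, k=2)
facts = torch.randint(0, 100, (5, 7))       # five 7-word fact sentences
question = torch.randint(0, 100, (4,))      # a 4-word question
logits = model(facts, question)             # (100,) scores over candidate answers
```

For answer selection on unknown words, the word-level attention weights themselves could serve as pointer-style scores over words appearing in the selected sentences, rather than (or in addition to) the fixed-vocabulary head used in this sketch; the abstract leaves that choice open.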
