Integrating Global Attention for Pairwise Text Comparison

Attention guides computation to focus on the important parts of the input. For pairwise inputs, existing attention approaches tend to be biased towards trivial repetitions between the two texts (e.g. punctuation and stop words), and thus fail to provide reasonable guidance for model predictions. As a remedy, we suggest taking corpus-level information into account via global-aware attention. In this paper, we propose an attention mechanism that makes use of intra-text, inter-text, and global contextual information. We conduct an ablation study on paraphrase identification and demonstrate that the proposed attention mechanism avoids the downsides of trivial repetitions and provides interpretable word weightings.
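
To make the three signals concrete, the sketch below (Python/NumPy) illustrates one plausible way to fuse intra-text salience, inter-text alignment, and a corpus-level weight into per-word attention for one sentence of a pair. The function names, the product-style fusion, and the use of IDF as the global signal are illustrative assumptions, not the paper's exact formulation.

    # Hypothetical sketch: combine intra-text, inter-text and global (corpus-level)
    # signals into word weights; low-IDF tokens (stop words, punctuation) get
    # downweighted. Not the paper's actual equations.
    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def combined_attention(E1, E2, idf1):
        # E1: (n1, d) word embeddings of sentence 1; E2: (n2, d) of sentence 2;
        # idf1: (n1,) corpus-level IDF weights for sentence 1's tokens.
        intra = softmax(E1 @ E1.T, axis=-1).mean(axis=0)   # intra-text salience, (n1,)
        inter = softmax(E1 @ E2.T, axis=-1).max(axis=-1)   # inter-text alignment, (n1,)
        glob = idf1 / (idf1.sum() + 1e-8)                  # global informativeness, (n1,)
        w = intra * inter * glob                           # fuse the three signals
        return w / (w.sum() + 1e-8)                        # normalized word weights

    # Toy usage with random embeddings and IDF scores (illustration only)
    rng = np.random.default_rng(0)
    E1, E2 = rng.normal(size=(5, 8)), rng.normal(size=(7, 8))
    idf1 = rng.uniform(0.1, 3.0, size=5)                   # low IDF ~ stop word / punctuation
    print(combined_attention(E1, E2, idf1))

A multiplicative fusion is only one design choice; a learned weighted sum of the three signals would serve equally well for illustration.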
