Co-Stack Residual Affinity Networks with Multi-level Attention Refinement for Matching Text Sequences

Learning a matching function between two text sequences is a long-standing problem in NLP research. The task underpins many applications, such as question answering and paraphrase identification. This paper proposes Co-Stack Residual Affinity Networks (CSRAN), a new and universal neural architecture for this problem. CSRAN is a deep architecture built on stacked (multi-layered) recurrent encoders. Stacked/deep architectures are traditionally difficult to train because of inherent weaknesses such as poor feature propagation and vanishing gradients. CSRAN incorporates two novel components that exploit the stacked architecture. First, it introduces a new bidirectional alignment mechanism that learns affinity weights by fusing sequence pairs across the stacked hierarchy. Second, it applies a multi-level attention refinement component between stacked recurrent layers. The key intuition is that leveraging information across all network hierarchies improves not only gradient flow but also overall performance. We conduct extensive experiments on six well-studied text sequence matching datasets, achieving state-of-the-art performance on all of them.
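To make the co-stacking idea concrete, the following is a minimal PyTorch sketch, not the paper's released code: all names, dimensions, the residual connections between encoder layers, and the use of simple dot-product affinities are illustrative assumptions rather than CSRAN's exact formulation. The sketch shows the core mechanism the abstract describes: hidden states from every stacked recurrent layer of both sequences are fused into a single affinity matrix, which then drives bidirectional soft alignment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoStackAffinitySketch(nn.Module):
    """Hypothetical simplification of co-stack residual affinity:
    stacked BiLSTM encoders whose per-layer states are all fused
    into one affinity matrix between two sequences."""

    def __init__(self, dim: int = 100, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
            for _ in range(num_layers)
        )

    def encode(self, x):
        states = []              # hidden states from every stacked layer
        for rnn in self.layers:
            out, _ = rnn(x)
            x = x + out          # residual connection eases gradient flow
            states.append(x)
        return states            # list of (batch, seq_len, dim) tensors

    def forward(self, a, b):
        sa, sb = self.encode(a), self.encode(b)
        # Co-stacking: match every layer of `a` against every layer of
        # `b`, summing the per-pair dot-product scores into one matrix.
        affinity = sum(
            torch.bmm(ha, hb.transpose(1, 2)) for ha in sa for hb in sb
        )                        # (batch, len_a, len_b)
        # Bidirectional soft alignment from the fused affinity matrix.
        a2b = torch.bmm(F.softmax(affinity, dim=2), sb[-1])
        b2a = torch.bmm(F.softmax(affinity, dim=1).transpose(1, 2), sa[-1])
        return a2b, b2a
```

Summing affinities over all layer pairs acts like a residual connection over match scores, so gradients reach the lower encoder layers directly; the actual model's richer per-pair match features and the multi-level attention refinement between recurrent layers are omitted from this sketch.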
