Multi-Cast Attention Networks

Attention is typically used to select informative sub-phrases for prediction. This paper investigates a novel use of attention as a form of feature augmentation, i.e., casted attention. We propose Multi-Cast Attention Networks (MCAN), a new attention mechanism and general model architecture for a potpourri of ranking tasks in the conversational modeling and question answering domains. Our approach performs a series of soft attention operations, each time casting a scalar feature upon the inner word embeddings. The key idea is to provide a real-valued hint (feature) to a subsequent encoder layer, with the aim of improving the representation learning process. This design has several advantages: an arbitrary number of attention mechanisms can be casted, allowing multiple attention types (e.g., co-attention, intra-attention) and attention variants (e.g., alignment-pooling, max-pooling, mean-pooling) to be executed simultaneously. This not only eliminates the costly need to tune the nature of the co-attention layer, but also provides greater explainability to practitioners. Via extensive experiments on four well-known benchmark datasets, we show that MCAN achieves state-of-the-art performance. On the Ubuntu Dialogue Corpus, MCAN outperforms existing state-of-the-art models by 9%. MCAN also achieves the best score reported to date on the well-studied TrecQA dataset.
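To make the casting idea concrete, below is a minimal sketch of a single attention cast using alignment-pooling and sum compression, written in PyTorch. All names, shapes, and the choice of compression function are illustrative assumptions, not the paper's exact implementation; the full model stacks several such casts (with different attention types and compression functions) before the encoder layer.

    # Minimal sketch of one "attention cast" (hypothetical names/shapes).
    # The paper casts several such feature sets before an LSTM encoder.
    import torch
    import torch.nn.functional as F

    def cast_attention(q, d):
        """Co-attend document d to query q, then cast scalar features onto q.

        q: (len_q, dim) query word embeddings
        d: (len_d, dim) document word embeddings
        Returns q augmented with 3 scalar features per word: (len_q, dim + 3).
        """
        # Affinity matrix between every query/document word pair.
        s = q @ d.t()                                  # (len_q, len_d)
        # Alignment-pooling: soft-align each query word to the document.
        q_bar = F.softmax(s, dim=1) @ d                # (len_q, dim)
        # Compress each comparison vector to a scalar. Sum compression is
        # shown here; neural-network or factorization-machine compression
        # are alternatives described in the paper.
        f_cat = torch.cat([q_bar, q], dim=1).sum(dim=1, keepdim=True)
        f_mul = (q_bar * q).sum(dim=1, keepdim=True)
        f_sub = (q_bar - q).sum(dim=1, keepdim=True)
        # Cast the scalars onto the word embeddings as extra feature
        # dimensions, a real-valued hint for the downstream encoder.
        return torch.cat([q, f_cat, f_mul, f_sub], dim=1)

    q = torch.randn(12, 300)   # toy query of 12 words
    d = torch.randn(40, 300)   # toy document of 40 words
    print(cast_attention(q, d).shape)  # torch.Size([12, 303])

Because each cast contributes only a few scalar dimensions, additional attention types and variants can be stacked cheaply, which is what removes the need to commit to a single co-attention design up front.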
