Hermitian Co-Attention Networks for Text Matching in Asymmetrical Domains

Co-attention is a highly effective attention mechanism for text matching applications. It enables the learning of pairwise attention, i.e., learning to attend by computing word-level affinity scores between two documents. However, text matching problems can exist in either symmetrical or asymmetrical domains. For example, paraphrase identification is a symmetrical task, while question-answer matching and entailment classification are asymmetrical tasks. In this paper, we argue that co-attention models in asymmetrical domains require different treatment than those in symmetrical domains, i.e., a notion of word-level directionality should be incorporated while learning word-level similarity scores. Hence, the standard inner product in real space commonly adopted in co-attention is not suitable. This paper leverages attractive properties of the complex vector space and proposes a co-attention mechanism based on the complex-valued inner product (the Hermitian product). Unlike the real dot product, the dot product in complex space is asymmetric because its first argument is conjugated. Aside from modeling and encoding directionality, our proposed approach also enhances the representation learning process. Extensive experiments on five text matching benchmark datasets demonstrate the effectiveness of our approach.
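
To make the directionality argument concrete, below is a minimal NumPy sketch (an illustration, not the authors' implementation) of a word-level affinity matrix computed with the Hermitian inner product. The toy complex representations, their dimensions, and the hermitian_affinity helper are assumptions made for illustration; how the complex-valued scores are mapped to real attention weights in the full model is not reproduced here.

    # Minimal sketch: Hermitian (complex) inner-product affinities are
    # order-sensitive because the first argument is conjugated.
    import numpy as np

    rng = np.random.default_rng(0)

    def hermitian_affinity(A, B):
        """Pairwise affinities s_ij = <a_i, b_j> = sum_k conj(a_ik) * b_jk.

        A: (len_a, d) complex-valued word representations of document a
        B: (len_b, d) complex-valued word representations of document b
        Returns a complex (len_a, len_b) affinity matrix.
        """
        # Conjugating the first argument encodes which document attends to which.
        return np.conj(A) @ B.T

    # Toy complex word representations for a "question" and an "answer" (assumed shapes).
    d = 4
    question = rng.normal(size=(3, d)) + 1j * rng.normal(size=(3, d))
    answer = rng.normal(size=(5, d)) + 1j * rng.normal(size=(5, d))

    S_qa = hermitian_affinity(question, answer)  # question -> answer direction
    S_aq = hermitian_affinity(answer, question)  # answer -> question direction

    # Swapping the arguments yields the conjugate transpose, not the plain
    # transpose: the imaginary part flips sign, so the two directions are
    # distinguishable, whereas a real dot-product affinity matrix would
    # satisfy S_aq == S_qa.T exactly.
    assert np.allclose(S_aq, np.conj(S_qa).T)
    assert not np.allclose(S_aq, S_qa.T)
    print(S_qa[0, 0], S_aq[0, 0])  # conjugate pair: same real part, opposite imaginary part

The design point the sketch highlights is that the imaginary component of the Hermitian product carries the word-level direction information that a purely real-valued dot product would discard.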
