Multi-level Head-wise Match and Aggregation in Transformer for Textual Sequence Matching

The Transformer has been successfully applied to many natural language processing tasks. However, for textual sequence matching, simple matching of the representations of a pair of sequences may introduce unnecessary noise. In this paper, we propose a new approach to sequence pair matching with the Transformer, learning head-wise matching representations on multiple levels. Experiments show that our proposed approach achieves new state-of-the-art performance on multiple tasks that rely only on pre-computed sequence vector representations, such as SNLI, MNLI-match, MNLI-mismatch, QQP, and SQuAD-binary.
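
To make the idea concrete, below is a minimal PyTorch sketch (not the authors' implementation) of head-wise matching between two pre-computed sequence vectors from a single level; the [u; v; |u - v|; u * v] matching feature, the shared per-head matcher, and the class name HeadWiseMatcher are illustrative assumptions. Matching on multiple levels would repeat this per encoder layer and aggregate the resulting features before classification.

```python
# Minimal sketch of head-wise matching between two pre-computed sequence
# vectors. We assume each sequence vector is the concatenation of per-head
# vectors from a Transformer encoder; the matching feature and the shared
# per-head matcher below are assumptions for illustration, not the paper's
# exact architecture.
import torch
import torch.nn as nn


class HeadWiseMatcher(nn.Module):
    def __init__(self, num_heads: int, head_dim: int, num_classes: int):
        super().__init__()
        # One small matcher shared across heads; 4 * head_dim comes from the
        # common [u; v; |u - v|; u * v] matching feature (an assumption here).
        self.match = nn.Sequential(nn.Linear(4 * head_dim, head_dim), nn.ReLU())
        self.classify = nn.Linear(num_heads * head_dim, num_classes)
        self.num_heads, self.head_dim = num_heads, head_dim

    def forward(self, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # u, v: (batch, num_heads * head_dim) sequence vectors for the pair.
        b = u.size(0)
        u = u.view(b, self.num_heads, self.head_dim)
        v = v.view(b, self.num_heads, self.head_dim)
        # Build a matching feature per head, match head-wise, then aggregate.
        feats = torch.cat([u, v, (u - v).abs(), u * v], dim=-1)
        matched = self.match(feats)                 # (batch, num_heads, head_dim)
        return self.classify(matched.flatten(1))    # aggregate over heads


# Usage: 12 heads of size 64, 3-way NLI labels.
matcher = HeadWiseMatcher(num_heads=12, head_dim=64, num_classes=3)
logits = matcher(torch.randn(2, 12 * 64), torch.randn(2, 12 * 64))
```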
