SDNET: CONTEXTUALIZED ATTENTION-BASED DEEP NETWORK FOR CONVERSATIONAL QUESTION ANSWERING

Conversational question answering (CQA) is a novel QA task that requires understanding of the dialogue context. Different from traditional single-turn machine reading comprehension (MRC), CQA is a comprehensive task comprising passage reading, coreference resolution, and contextual understanding. In this paper, we propose an innovative contextualized attention-based deep neural network, SDNet, to fuse context into traditional MRC models. Our model leverages both inter-attention and self-attention to comprehend the conversation and the passage. Furthermore, we demonstrate a novel method to integrate the BERT contextual model as a sub-module in our network. Empirical results show the effectiveness of SDNet. On the CoQA leaderboard, it outperforms the previous best model's F1 score by 1.6%. Our ensemble model further improves the F1 score by 2.7%.
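To make the abstract's architectural terms concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the class names InterAttention, SelfAttention, and LayerMix are hypothetical, all dimensions are illustrative, and the layer-mixing scheme for the BERT sub-module is an assumption rather than the paper's stated integration method.

import torch
import torch.nn as nn
import torch.nn.functional as F

class InterAttention(nn.Module):
    """Inter-attention: each passage token attends over the question tokens."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, passage, question):
        # passage: (batch, p_len, dim); question: (batch, q_len, dim)
        scores = torch.bmm(self.proj(passage), question.transpose(1, 2))
        weights = F.softmax(scores, dim=-1)   # (batch, p_len, q_len)
        return torch.bmm(weights, question)   # question-aware passage states

class SelfAttention(nn.Module):
    """Self-attention: each token attends over its own sequence."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        scores = torch.bmm(self.proj(x), x.transpose(1, 2))
        weights = F.softmax(scores, dim=-1)
        return torch.bmm(weights, x)

class LayerMix(nn.Module):
    """Assumed realization of 'BERT as a sub-module': a learned softmax-weighted
    sum over the layer outputs of a frozen contextual encoder. The abstract only
    states that BERT is integrated as a sub-module; this scheme is a guess."""
    def __init__(self, num_layers: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layer_outputs):
        # layer_outputs: list of (batch, seq_len, dim) tensors, one per layer
        w = F.softmax(self.alpha, dim=0)
        return sum(w[i] * h for i, h in enumerate(layer_outputs))

# Toy usage: fuse question context into the passage, then refine with self-attention.
batch, p_len, q_len, dim = 2, 100, 20, 300
passage = torch.randn(batch, p_len, dim)
question = torch.randn(batch, q_len, dim)
refined = SelfAttention(dim)(InterAttention(dim)(passage, question))
print(refined.shape)  # torch.Size([2, 100, 300])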

[1] Tomohide Shibata. Understand in 5 minutes!? Skimming famous papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2020.

[2] Eunsol Choi et al. FlowQA: Grasping Flow in History for Conversational Machine Comprehension, 2018, ICLR.

[3] Mark Yatskar et al. A Qualitative Comparison of CoQA, SQuAD 2.0 and QuAC, 2018, NAACL.

[4] Danqi Chen et al. CoQA: A Conversational Question Answering Challenge, 2018, TACL.

[5] Eunsol Choi et al. QuAC: Question Answering in Context, 2018, EMNLP.

[6] Xiaodong Liu et al. Stochastic Answer Networks for Machine Reading Comprehension, 2017, ACL.

[7] Yelong Shen et al. FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension, 2017, ICLR.

[8] Lukasz Kaiser et al. Attention Is All You Need, 2017, NIPS.

[9] Jason Weston et al. Reading Wikipedia to Answer Open-Domain Questions, 2017, ACL.

[10] Ali Farhadi et al. Bidirectional Attention Flow for Machine Comprehension, 2016, ICLR.

[11] Alexandra Birch et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.

[12] Diederik P. Kingma et al. Variational Dropout and the Local Reparameterization Trick, 2015, NIPS.

[13] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[14] Jeffrey Pennington et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[15] S. Hochreiter et al. Long Short-Term Memory, 1997, Neural Computation.

[16] Ahmed Elgohary et al. A dataset and baselines for sequential open-domain question answering, 2018, EMNLP.