Globally Normalized Reader

Rapid progress has been made towards question answering (QA) systems that can extract answers from text. Existing neural approaches make use of expensive bi-directional attention mechanisms or score all possible answer spans, limiting scalability. We propose instead to cast extractive QA as an iterative search problem: select the answer’s sentence, then its start word, then its end word. This representation reduces the search space at each step and allows computation to be conditionally allocated to promising search paths. We show that globally normalizing the decision process and back-propagating through beam search make this representation viable and learning efficient. We empirically demonstrate the benefits of this approach with our model, the Globally Normalized Reader (GNR), which achieves the second-highest single-model performance on the Stanford Question Answering Dataset (68.4 EM, 76.21 F1 on dev) and is 24.7x faster than Bidirectional Attention Flow. We also introduce a data-augmentation method that produces semantically valid examples by aligning named entities to a knowledge base and swapping them with new entities of the same type. This method improves the performance of all models considered in this work and is of independent interest for a variety of NLP tasks.
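The three-step decoding can be pictured with a short, hedged sketch. This is a minimal illustration rather than the authors’ implementation: `doc_sentences`, `score_sentence`, `score_start`, and `score_end` are assumed stand-ins for the model’s learned scoring components, and the beam size is arbitrary.

```python
# Minimal sketch of GNR-style decoding: pick a sentence, then a start
# word, then an end word, keeping a beam of partial answers and
# normalizing scores over whole paths rather than per step.
# The score_* callables are hypothetical stand-ins for the learned
# question/document encoders.
import math

def beam_search_spans(doc_sentences, score_sentence, score_start, score_end,
                      beam_size=32):
    """doc_sentences: list of token lists.
    score_*: callables returning unnormalized real-valued scores."""
    # Step 1: sentence selection.
    beam = sorted(
        ((score_sentence(i), (i,)) for i in range(len(doc_sentences))),
        reverse=True)[:beam_size]
    # Step 2: start-word selection within each surviving sentence.
    beam = sorted(
        ((s + score_start(i, j), (i, j))
         for s, (i,) in beam
         for j in range(len(doc_sentences[i]))),
        reverse=True)[:beam_size]
    # Step 3: end-word selection at or after the start word.
    beam = sorted(
        ((s + score_end(i, j, k), (i, j, k))
         for s, (i, j) in beam
         for k in range(j, len(doc_sentences[i]))),
        reverse=True)[:beam_size]
    # Global normalization: softmax over complete paths in the beam,
    # not over the choices available at each individual step.
    z = sum(math.exp(s) for s, _ in beam)
    return [(math.exp(s) / z, path) for s, path in beam]
```

At training time, the global normalization described in the abstract corresponds to computing this softmax over the gold path together with the paths that survive the beam, and back-propagating through the search itself instead of normalizing each decision locally.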
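The entity-swapping augmentation can be sketched in the same hedged spirit. The `type_lexicon` below is a hypothetical stand-in for the knowledge-base alignment step (mentions aligned to typed entities in a knowledge base such as Wikidata); all surface forms and type labels are illustrative.

```python
# Hedged sketch of the type-swap augmentation: replace each aligned
# named entity with a randomly sampled entity of the same type,
# applying the same substitution to question, context, and answer so
# the example stays semantically consistent.
import random

# Hypothetical alignment of surface forms to knowledge-base types.
type_lexicon = {
    "Nikola Tesla": "PERSON",
    "Marie Curie": "PERSON",
    "Belgrade": "CITY",
    "Warsaw": "CITY",
}

def type_swap(question, context, answer, rng=random):
    # Group candidate replacements by type.
    by_type = {}
    for surface, etype in type_lexicon.items():
        by_type.setdefault(etype, []).append(surface)
    # Choose one replacement per entity that occurs in the example.
    mapping = {}
    for surface, etype in type_lexicon.items():
        if surface in question or surface in context:
            mapping[surface] = rng.choice(by_type[etype])
    # Apply the substitutions consistently across all three fields.
    for old, new in mapping.items():
        question = question.replace(old, new)
        context = context.replace(old, new)
        answer = answer.replace(old, new)
    return question, context, answer
```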
