FastQA: A Simple and Efficient Neural Architecture for Question Answering

The recent development of large-scale question answering (QA) datasets has triggered a substantial amount of research into end-to-end neural architectures for QA. Increasingly complex systems have been conceived without comparison to a simpler neural baseline system that would justify their complexity. In this work, we propose a simple heuristic that guided the development of FastQA, an efficient end-to-end neural model for question answering that is very competitive with existing models. We further demonstrate that an extended version (FastQAExt) achieves state-of-the-art results on recent benchmark datasets, namely SQuAD, NewsQA and MS MARCO, outperforming most existing models. However, we show that increasing the complexity of FastQA to FastQAExt does not yield any systematic improvements. We argue that the same holds true for most existing systems that are similar to FastQAExt. A manual analysis reveals that our proposed heuristic explains most predictions of our model, which indicates that modeling a simple heuristic is enough to achieve strong performance on extractive QA datasets. The overall strong performance of FastQA puts the results of existing, more complex models into perspective.
