NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These memory budgets were designed to encourage contestants to explore the trade-off between storing large, redundant retrieval corpora and the parameters of large learned models. In this report, we describe the motivation and organization of the competition, review the best submissions, and analyze system predictions to inform a discussion of evaluation for open-domain QA.

1. http://efficientqa.github.io/
2. https://neurips.cc/Conferences/2020/CompetitionTrack

arXiv:2101.00133v1 [cs.CL] 1 Jan 2021
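To make the on-disk budget constraint concrete, here is a minimal sketch of how a submission's disk footprint could be measured against a size limit. This is an illustration only: the directory name `submission/`, the helper `disk_footprint_bytes`, and the 6 GiB budget value are assumptions for the example, not the competition's official evaluation harness.

```python
import os

def disk_footprint_bytes(path: str) -> int:
    """Total on-disk size of a submission directory, walking the full tree."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fpath = os.path.join(root, name)
            # Skip symlinks so linked files are not double-counted.
            if not os.path.islink(fpath):
                total += os.path.getsize(fpath)
    return total

# Hypothetical budget for illustration; actual track limits may differ.
BUDGET_BYTES = 6 * 1024**3  # 6 GiB

size = disk_footprint_bytes("submission/")
print(f"{size / 1024**2:.1f} MiB used "
      f"({'within' if size <= BUDGET_BYTES else 'over'} budget)")
```

Under a constraint like this, every megabyte spent on a retrieval corpus or index is a megabyte unavailable for model parameters, which is exactly the trade-off the budgets were designed to surface.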
