Is Retriever Merely an Approximator of Reader?

The state of the art in open-domain question answering (QA) relies on an efficient retriever that drastically reduces the search space for the expensive reader. A rather overlooked question in the community is the relationship between the retriever and the reader, and in particular, if the whole purpose of the retriever is just a fast approximation for the reader. Our empirical evidence indicates that the answer is no, and that the reader and the retriever are complementary to each other even in terms of accuracy only. We make a careful conjecture that the architectural constraint of the retriever, which has been originally intended for enabling approximate search, seems to also make the model more robust in large-scale search. We then propose to distill the reader into the retriever so that the retriever absorbs the strength of the reader while keeping its own benefit. Experimental results show that our method can enhance the document recall rate as well as the end-to-end QA accuracy of off-the-shelf retrievers in open-domain QA tasks.

[1]  Edouard Grave,et al.  Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering , 2020, EACL.

[2]  Jeff Johnson,et al.  Billion-Scale Similarity Search with GPUs , 2017, IEEE Transactions on Big Data.

[3]  Mark Chen,et al.  Language Models are Few-Shot Learners , 2020, NeurIPS.

[4]  Fabio Petroni,et al.  Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks , 2020, NeurIPS.

[5]  Julian Michael,et al.  AmbigQA: Answering Ambiguous Open-domain Questions , 2020, EMNLP.

[6]  Danqi Chen,et al.  Dense Passage Retrieval for Open-Domain Question Answering , 2020, EMNLP.

[7]  Colin Raffel,et al.  How Much Knowledge Can You Pack into the Parameters of a Language Model? , 2020, EMNLP.

[8]  Ming-Wei Chang,et al.  REALM: Retrieval-Augmented Language Model Pre-Training , 2020, ICML.

[9]  R. Socher,et al.  Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering , 2019, ICLR.

[10]  Danqi Chen,et al.  Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering , 2019, ArXiv.

[11]  R'emi Louf,et al.  HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.

[12]  Thomas Wolf,et al.  DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter , 2019, ArXiv.

[13]  Danqi Chen,et al.  A Discrete Hard EM Approach for Weakly Supervised Question Answering , 2019, EMNLP.

[14]  Ming-Wei Chang,et al.  Natural Questions: A Benchmark for Question Answering Research , 2019, TACL.

[15]  Ali Farhadi,et al.  Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index , 2019, ACL.

[16]  Ming-Wei Chang,et al.  Latent Retrieval for Weakly Supervised Open Domain Question Answering , 2019, ACL.

[17]  Xiaodong Liu,et al.  Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding , 2019, ArXiv.

[18]  Kyunghyun Cho,et al.  Passage Re-ranking with BERT , 2019, ArXiv.

[19]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[20]  Christopher Clark,et al.  Simple and Effective Multi-Paragraph Reading Comprehension , 2017, ACL.

[21]  Eunsol Choi,et al.  TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension , 2017, ACL.

[22]  Jason Weston,et al.  Reading Wikipedia to Answer Open-Domain Questions , 2017, ACL.

[23]  Shuohang Wang,et al.  Machine Comprehension Using Match-LSTM and Answer Pointer , 2016, ICLR.

[24]  Jade Goldstein-Stewart,et al.  The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries , 1998, SIGIR Forum.

[25]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[26]  Ulrich Paquet,et al.  Speeding up the Xbox recommender system using a euclidean transformation for inner-product spaces , 2014, RecSys '14.

[27]  Ping Li,et al.  Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS) , 2014, NIPS.

[28]  Chih-Jen Lin,et al.  Training and Testing Low-degree Polynomial Data Mappings via Linear SVM , 2010, J. Mach. Learn. Res..

[29]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[30]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[31]  Jade Goldstein-Stewart,et al.  The use of MMR, diversity-based reranking for reordering documents and producing summaries , 1998, SIGIR '98.

[32]  Dik Lun Lee,et al.  Document Ranking and the Vector-Space Model , 1997, IEEE Softw..

[33]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .