Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension

We formalize a new modular variant of current question answering tasks by enforcing complete independence of the document encoder from the question encoder. This formulation addresses a key challenge in machine comprehension by requiring a standalone representation of the document discourse. It also yields a significant scalability advantage, since the encodings of the candidate answer phrases in the document can be pre-computed and indexed offline for efficient retrieval. We experiment with baseline models for the new task, which achieve reasonable accuracy but significantly underperform unconstrained QA models. We invite the QA research community to engage in Phrase-Indexed Question Answering (PIQA, pronounced "pika") to close the gap. The leaderboard is at: nlp.cs.washington.edu/piqa
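
To make the decomposition concrete, here is a minimal sketch of the PIQA setup. The encoders below are random stand-ins, and the phrase list, question, and dot-product similarity are illustrative assumptions (the task deliberately leaves the actual architectures open); the only constraint the sketch enforces is the one the task defines: the document side never sees the question, so its phrase vectors can be computed and indexed offline, and answering reduces to nearest-neighbor search over that index.

```python
import numpy as np

D = 128  # embedding dimension (illustrative)
rng = np.random.default_rng(0)

def encode_phrases(document_phrases):
    """Stand-in document-side encoder: maps each candidate answer phrase
    to a D-dim vector WITHOUT ever seeing the question. (Random here;
    any question-independent encoder satisfies the task constraint.)"""
    return rng.normal(size=(len(document_phrases), D))

def encode_question(question):
    """Stand-in question-side encoder, likewise fully independent
    of the document."""
    return rng.normal(size=D)

# Offline: pre-compute and index every candidate phrase in the document.
phrases = ["in 1997", "a recurrent architecture", "long short-term memory"]
index = encode_phrases(phrases)            # shape (num_phrases, D)

# Online: answering is maximum-inner-product search against the index.
q = encode_question("When was LSTM introduced?")
answer = phrases[int(np.argmax(index @ q))]
print(answer)
```

In a real deployment, the brute-force `argmax` over the phrase matrix would presumably be replaced by an approximate maximum-inner-product-search index (e.g., a FAISS-style library), which is where the scalability advantage over question-dependent document encoding comes from.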
