Retrieval Augmented via Execution Guidance in Open-domain Table QA

The goal of open-domain table QA is to answer a question by retrieving and extracting information from a large corpus of structured tables. The accuracy of the currently most popular framework for open-domain QA, two-stage retrieval, is limited by the table retriever. Inspired by research on Text-to-SQL, this paper proposes using execution guidance to improve table retrieval. Our contributions are threefold:

1. We propose an execution-guided method that enhances table retrieval by fully leveraging the schema information of tables.
2. We propose a pure Text-to-SQL task for the open domain, and design a two-stage Table QA framework based on semantic parsing that generates logical forms and answers simultaneously.
3. We propose an open-domain Text-to-SQL dataset, Open-domain WikiSQL. We adapt the original WikiSQL to the open-domain setting by removing approximate tables, decontextualizing the questions, etc.

We conduct experiments on the new dataset using BM25 and DPR as retrievers and HydraNet as the SQL generator. The results show that execution guidance significantly improves table retrieval, by 19% in hit@1 with DPR. It also performs well on the end-to-end open-domain Text-to-SQL task, improving logical-form accuracy by 12.7% and execution accuracy by 13.1%.
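The abstract does not spell out how execution results feed back into retrieval. The sketch below shows one plausible reading, under the following assumptions:

- Each of the retriever's top-k tables receives its own generated SQL query.
- A table is discarded when its query raises an error or returns an empty or NULL answer.
- Surviving tables are reranked by a weighted sum of the retriever and generator scores.

The `Candidate` fields, the `alpha` weight, the fallback rule, the toy tables, and the hand-written SQL (standing in for HydraNet output) are all hypothetical. Python's built-in `sqlite3` plays the role of the executor. The sketch also computes hit@1 before and after reranking.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """One retrieved table plus the SQL generated for it (all fields hypothetical)."""
    table_id: str
    retrieval_score: float  # e.g. a BM25 or DPR similarity score
    sql: str                # stand-in for the SQL a generator such as HydraNet would emit
    generator_score: float  # generator's confidence in its SQL


def execute(conn: sqlite3.Connection, sql: str) -> Optional[list]:
    """Run the query; return its rows, or None if execution raises an error."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def execution_guided_rerank(
    conn: sqlite3.Connection, candidates: List[Candidate], alpha: float = 0.5
) -> List[Tuple[float, Candidate, Optional[list]]]:
    """Drop tables whose SQL fails or yields an empty/NULL answer, then rank the
    survivors by alpha * retrieval_score + (1 - alpha) * generator_score."""
    survivors = []
    for c in candidates:
        rows = execute(conn, c.sql)
        if not rows or all(v is None for row in rows for v in row):
            continue  # execution error or empty answer: treat this table as wrong
        score = alpha * c.retrieval_score + (1 - alpha) * c.generator_score
        survivors.append((score, c, rows))
    if not survivors:
        # Hypothetical fallback: every query failed, so keep the plain retrieval order.
        return [(c.retrieval_score, c, None)
                for c in sorted(candidates, key=lambda c: c.retrieval_score, reverse=True)]
    survivors.sort(key=lambda t: t[0], reverse=True)
    return survivors


def hit_at_k(ranked_ids: List[str], gold_id: str, k: int = 1) -> bool:
    """hit@k: is the gold table among the top-k ranked tables?"""
    return gold_id in ranked_ids[:k]


# Toy corpus of two tables in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (player TEXT, team TEXT, goals INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", [("Ann", "Red", 12), ("Bo", "Blue", 7)])
conn.execute("CREATE TABLE t2 (city TEXT, population INTEGER)")
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [("Oslo", 700000)])

# Question: "How many goals did Ann score?"  The retriever wrongly ranks t2 first.
candidates = [
    Candidate("t2", 0.9, "SELECT population FROM t2 WHERE player = 'Ann'", 0.4),
    Candidate("t1", 0.8, "SELECT goals FROM t1 WHERE player = 'Ann'", 0.7),
]

baseline = [c.table_id for c in sorted(candidates, key=lambda c: c.retrieval_score, reverse=True)]
ranked = execution_guided_rerank(conn, candidates)
reranked = [c.table_id for _, c, _ in ranked]

print("hit@1 without execution guidance:", hit_at_k(baseline, "t1"))  # False
print("hit@1 with execution guidance:", hit_at_k(reranked, "t1"))     # True
print("answer:", ranked[0][2])                                        # [(12,)]
```

In the toy example, the query generated for `t2` references a column that table lacks, so execution fails and the gold table `t1` moves to rank 1. Discarding failed executions outright mirrors how execution-guided decoding prunes SQL candidates that error out or return empty results. A softer score penalty would be an equally reasonable design when the generator is noisy.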
