Table Retrieval May Not Necessitate Table-specific Model Design

Tables are an important form of structured data for human and machine readers alike, providing answers to questions that cannot, or cannot easily, be found in text. Recent work has designed special models and training paradigms for table-related tasks such as table-based question answering and table retrieval. Though effective, these approaches add complexity in both modeling and data acquisition compared to generic text solutions, and they obscure which elements are truly beneficial. In this work, we focus on the task of table retrieval and ask: "Is table-specific model design necessary for table retrieval, or can a simpler text-based model be used to achieve a similar result?" First, we analyze the table-based portion of the Natural Questions dataset (NQ-table) and find that structure plays a negligible role in more than 70% of the cases. Based on this finding, we experiment with a general text-based Dense Passage Retriever (DPR) and a specialized Dense Table Retriever (DTR) that uses table-specific model designs. We find that DPR performs well without any table-specific design or training, and even achieves superior results compared to DTR when fine-tuned on properly linearized tables. We then experiment with three modules that explicitly encode table structure: auxiliary row/column embeddings, hard attention masks, and soft relation-based attention biases. None of these yields significant improvements, suggesting that table-specific model design may not be necessary for table retrieval.
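To make "linearized tables" concrete, the following is a minimal sketch of one way a table could be flattened into plain text before being passed to a text-only retriever such as DPR. The function name `linearize_table`, the separators, and the placement of the title are illustrative assumptions; the abstract does not specify the exact linearization template used in the experiments.

```python
from typing import List, Optional


def linearize_table(
    title: Optional[str],
    header: List[str],
    rows: List[List[str]],
    cell_sep: str = " | ",   # assumed cell delimiter, not taken from the paper
    row_sep: str = "\n",     # assumed row delimiter, not taken from the paper
) -> str:
    """Flatten a table into one string: optional title, then header, then rows."""
    lines = []
    if title:
        lines.append(title)
    # The header is emitted as the first table line so column names stay
    # adjacent to the values that follow.
    lines.append(cell_sep.join(header))
    for row in rows:
        lines.append(cell_sep.join(str(cell) for cell in row))
    return row_sep.join(lines)


if __name__ == "__main__":
    text = linearize_table(
        "Olympic host cities",
        ["Year", "City"],
        [["2008", "Beijing"], ["2012", "London"]],
    )
    print(text)
```

The resulting string can be encoded like any other passage, which is the sense in which a generic text retriever needs no table-specific input format under this assumption.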
