Multi-modal Retrieval of Tables and Texts Using Tri-encoder Models

Open-domain extractive question answering works well on textual data: candidate texts are first retrieved, and the answer is then extracted from those candidates. However, some questions cannot be answered from text alone and instead require information stored in tables. In this paper, we present an approach for retrieving both texts and tables relevant to a question by jointly encoding questions, texts, and tables into a single vector space. To this end, we create a new multi-modal dataset based on text and table datasets from related work and compare the retrieval performance of different encoding schemata. We find that dense vector embeddings from transformer models outperform sparse embeddings on four out of six evaluation datasets. Comparing different dense embedding models, we find that tri-encoders, with separate encoders for questions, texts, and tables, improve retrieval performance over bi-encoders, which use one encoder for the question and a shared encoder for texts and tables. We release the newly created multi-modal dataset to the community for training and evaluation.
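The abstract describes the tri-encoder architecture only at a high level. The following is a minimal sketch of how such a retriever could look, assuming a Hugging Face transformers setup: the base checkpoint bert-base-uncased, [CLS] pooling, dot-product scoring, and the linearize_table helper are illustrative assumptions, not the paper's exact configuration.

    # Minimal sketch of a tri-encoder retriever: separate encoders for
    # questions, texts, and tables map into one shared embedding space,
    # so text and table candidates can be ranked against a question by
    # dot-product similarity. All model/pooling choices below are
    # illustrative assumptions, not the paper's exact setup.
    import torch
    from transformers import AutoModel, AutoTokenizer

    MODEL = "bert-base-uncased"  # assumed base checkpoint

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    question_encoder = AutoModel.from_pretrained(MODEL)
    text_encoder = AutoModel.from_pretrained(MODEL)
    table_encoder = AutoModel.from_pretrained(MODEL)

    def embed(encoder, inputs):
        """Encode a batch of strings; return the [CLS] vector per input."""
        batch = tokenizer(inputs, padding=True, truncation=True,
                          return_tensors="pt")
        with torch.no_grad():
            out = encoder(**batch)
        return out.last_hidden_state[:, 0]  # [CLS] pooling, one common choice

    def linearize_table(table):
        """Flatten a table (list of rows) into a string; an assumed scheme."""
        return " ; ".join(" | ".join(str(c) for c in row) for row in table)

    question = ["Who won the 2014 FIFA World Cup?"]
    texts = ["Germany won the 2014 FIFA World Cup final against Argentina."]
    tables = [linearize_table([["Year", "Winner"], ["2014", "Germany"]])]

    q = embed(question_encoder, question)             # (1, hidden)
    candidates = torch.cat(
        [embed(text_encoder, texts), embed(table_encoder, tables)], dim=0
    )                                                 # (n_text + n_table, hidden)

    scores = q @ candidates.T                         # retrieval scores
    ranking = scores.argsort(descending=True)

In practice, such encoders would be trained contrastively on question/positive/negative triples, as in dense passage retrieval; the training loop is omitted here.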
