Ad Hoc Table Retrieval using Intrinsic and Extrinsic Similarities

Given a keyword query, the ad hoc table retrieval task aims to retrieve a ranked list of the top-k most relevant tables from a given table corpus. Previous works have primarily focused on designing table-centric lexical and semantic features that can be used for learning to rank (LTR) tables. In this work, we make novel use of intrinsic (passage-based) and extrinsic (manifold-based) table similarities for enhanced retrieval. Using the WikiTables benchmark, we study the merits of utilizing such similarities for this task. To this end, we combine both similarity types via a simple yet effective cascade re-ranking approach. Overall, our proposed approach yields significantly better table retrieval quality, surpassing even strong semantically rich baselines.
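The abstract describes the cascade only at a high level. The Python sketch below shows one way such a two-stage pipeline could be wired up. It is not the authors' formulation, and every function name is hypothetical. It assumes that each table is a list of text passages (for example caption, header row, body rows). It uses a toy max-passage term-overlap score as the intrinsic similarity. For the extrinsic step, it uses the iterative manifold ranking of Zhou et al. ("Ranking on Data Manifolds", NIPS 2003) over a bag-of-words cosine graph of the first-stage candidates.

```python
"""Illustrative two-stage cascade: passage-based scoring, then manifold re-ranking.

All names and scoring choices are hypothetical stand-ins, not the paper's method.
"""
from collections import Counter
import math


def term_overlap(query: str, text: str) -> float:
    """Toy lexical similarity: fraction of query terms that appear in text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & t_terms) / len(q_terms)


def intrinsic_score(query: str, passages: list[str]) -> float:
    """Assumed passage-based score: the table's best-matching passage (max-passage)."""
    return max((term_overlap(query, p) for p in passages), default=0.0)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def manifold_rank(affinity, y, alpha=0.5, iters=100):
    """Zhou et al.-style manifold ranking: iterate f <- alpha * S f + (1 - alpha) * y,
    where S = D^{-1/2} W D^{-1/2} is the symmetrically normalized affinity matrix."""
    n = len(y)
    deg = [sum(row) for row in affinity]
    s = [[affinity[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] and deg[j] else 0.0
          for j in range(n)] for i in range(n)]
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


def cascade_rank(query: str, tables: dict[str, list[str]], k: int = 3, alpha: float = 0.5):
    """Stage 1: rank all tables by intrinsic score and keep the top-k.
    Stage 2: re-rank those k by manifold ranking over table-to-table similarity,
    using the stage-1 scores as the query vector y."""
    stage1 = sorted(tables, key=lambda name: intrinsic_score(query, tables[name]),
                    reverse=True)[:k]
    y = [intrinsic_score(query, tables[name]) for name in stage1]
    bags = [Counter(" ".join(tables[name]).lower().split()) for name in stage1]
    w = [[0.0 if i == j else cosine(bags[i], bags[j]) for j in range(len(stage1))]
         for i in range(len(stage1))]
    f = manifold_rank(w, y, alpha=alpha)
    order = sorted(range(len(stage1)), key=lambda i: f[i], reverse=True)
    return [stage1[i] for i in order]
```

Restricting the manifold step to the top-k candidates keeps the k×k affinity matrix small. This is the usual reason for placing a more expensive scorer in a later stage of a cascade. Because the normalized matrix S has spectral radius at most 1, any alpha in [0, 1) makes the iteration converge.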

[1]  Kun Bai,et al.  TableSeer: automatic table metadata extraction and searching in digital libraries , 2007, JCDL '07.

[2]  Jayant Madhavan,et al.  Recovering Semantics of Tables on the Web , 2011, Proc. VLDB Endow..

[3]  Haggai Roitman,et al.  Utilizing Passages in Fusion-based Document Retrieval , 2019, ICTIR.

[4]  W. Bruce Croft,et al.  Linear feature-based models for information retrieval , 2007, Information Retrieval.

[5]  Maarten de Rijke,et al.  Manifold Learning for Rank Aggregation , 2018, WWW.

[6]  Chun Chen,et al.  Efficient manifold ranking for image retrieval , 2011, SIGIR.

[7]  Divesh Srivastava Schema extraction , 2010, CIKM '10.

[8]  Oren Kurland,et al.  Utilizing Passage-Based Language Models for Document Retrieval , 2008, ECIR.

[9]  Elena Simperl,et al.  Dataset search: a survey , 2019, The VLDB Journal.

[10]  Hao Ma,et al.  Table Cell Search for Question Answering , 2016, WWW.

[11]  J. Shane Culpepper,et al.  Fusion in Information Retrieval: SIGIR 2018 Half-Day Tutorial , 2018, SIGIR.

[12]  Krisztian Balog,et al.  Web Table Extraction, Retrieval and Augmentation , 2019, SIGIR.

[13]  Daisy Zhe Wang,et al.  WebTables: exploring the power of tables on the web , 2008, Proc. VLDB Endow..

[14]  Krisztian Balog,et al.  Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval , 2019, SIGIR.

[15]  Kun Bai,et al.  TableRank: A Ranking Algorithm for Table Search and Retrieval , 2007, AAAI.

[16]  Zhoujun Li,et al.  Content-Based Table Retrieval for Web Queries , 2017, ArXiv.

[17]  W. Bruce Croft,et al.  TINTIN: a system for retrieval in text tables , 1997, DL '97.

[18]  James P. Callan,et al.  Passage-level evidence in document retrieval , 1994, SIGIR '94.

[19]  Krisztian Balog,et al.  Ad Hoc Table Retrieval using Semantic Similarity , 2018, WWW.

[20]  James P. Callan,et al.  Scientific Table Search Using Keyword Queries , 2017, ArXiv.

[21]  J. Guo,et al.  Recommending Diverse and Relevant Queries with A Manifold Ranking Based Approach , 2010 .

[22]  Sunita Sarawagi,et al.  Answering Table Queries on the Web using Column Keywords , 2012, Proc. VLDB Endow..

[23]  Mathias Géry,et al.  BM25t: a BM25 extension for focused information retrieval , 2012, Knowledge and Information Systems.

[24]  Juhan Nam,et al.  Multimodal Deep Learning , 2011, ICML.

[25]  Doug Downey,et al.  Methods for exploring and mining tables on Wikipedia , 2013, IDEA@KDD.

[26]  Oren Kurland The Cluster Hypothesis in Information Retrieval , 2014, ECIR.

[27]  Bernhard Schölkopf,et al.  Ranking on Data Manifolds , 2003, NIPS.

[28]  Jimmy J. Lin,et al.  A cascade ranking model for efficient ranked retrieval , 2011, SIGIR.

[29]  Xiaojun Wan,et al.  Towards a unified approach to document similarity search using manifold-ranking of blocks , 2008, Inf. Process. Manag..

[30]  Doug Downey,et al.  TabEL: Entity Linking in Web Tables , 2015, SEMWEB.

[31]  Haggai Roitman,et al.  An Extended Query Performance Prediction Framework Utilizing Passage-Level Information , 2018, ICTIR.

[32]  W. Bruce Croft,et al.  A Language Modeling Approach to Information Retrieval , 1998, SIGIR Forum.

[33]  Jimmy J. Lin,et al.  Quantitative evaluation of passage retrieval algorithms for question answering , 2003, SIGIR.

[34]  W. Bruce Croft,et al.  Beyond Factoid QA: Effective Methods for Non-factoid Answer Sentence Retrieval , 2016, ECIR.