Ad Hoc Table Retrieval using Semantic Similarity

We introduce and address the problem of ad hoc table retrieval: answering a keyword query with a ranked list of tables. This task is interesting in its own right and also serves as a core component in many other table-based information access scenarios, such as table completion and table mining. The main novel contribution of this work is a method for performing semantic matching between queries and tables. Specifically, we (i) represent queries and tables in multiple semantic spaces (using both discrete sparse and continuous dense vector representations) and (ii) introduce various similarity measures for matching those semantic representations. We consider all possible combinations of semantic representations and similarity measures and use these as features in a supervised learning model. Using a purpose-built test collection based on Wikipedia tables, we demonstrate significant and substantial improvements over a state-of-the-art baseline.
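To make the idea of semantic matching features concrete, the following is a minimal sketch and not the method evaluated in this work. It assumes a toy dense embedding space (the `EMBEDDINGS` values are hypothetical), three illustrative similarity measures (cosine between centroids, and the maximum and average of pairwise term cosines), and an unweighted average standing in for the supervised learning model. All function names are introduced here for illustration only.

```python
import math
from itertools import product

# Hypothetical 3-dimensional term embeddings, for illustration only.
EMBEDDINGS = {
    "population": [0.9, 0.1, 0.0],
    "city": [0.8, 0.3, 0.1],
    "country": [0.7, 0.2, 0.2],
    "capital": [0.6, 0.4, 0.1],
    "goals": [0.0, 0.2, 0.9],
    "player": [0.1, 0.3, 0.8],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def embed(terms):
    """Map terms to vectors, skipping terms outside the toy vocabulary."""
    return [EMBEDDINGS[t] for t in terms if t in EMBEDDINGS]


def similarity_features(query_terms, table_terms):
    """Compute one feature per (assumed) similarity measure."""
    q, t = embed(query_terms), embed(table_terms)
    if not q or not t:
        return {"centroid": 0.0, "max_pair": 0.0, "avg_pair": 0.0}
    pairwise = [cosine(a, b) for a, b in product(q, t)]
    return {
        "centroid": cosine(centroid(q), centroid(t)),  # match aggregated vectors
        "max_pair": max(pairwise),                     # best single term match
        "avg_pair": sum(pairwise) / len(pairwise),     # average term match
    }


def rank_tables(query_terms, tables):
    """Rank tables by an unweighted feature average (stand-in for a learned model)."""
    scored = []
    for name, terms in tables.items():
        features = similarity_features(query_terms, terms)
        scored.append((name, sum(features.values()) / len(features)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


tables = {
    "cities": ["city", "country", "population"],
    "football": ["player", "goals"],
}
ranking = rank_tables(["capital", "population"], tables)
```

In a supervised setting, the feature dictionary for each query-table pair would be fed to a learning-to-rank model together with features from other semantic spaces, rather than averaged as done here.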
