Web Table Extraction, Retrieval, and Augmentation: A Survey

Tables are a powerful and popular tool for organizing and manipulating data. The vast number of tables found on the Web constitutes a valuable knowledge resource. The objective of this survey is to synthesize and present two decades of research on web tables. In particular, we organize the existing literature into six main categories of information access tasks: table extraction, table interpretation, table search, question answering, knowledge base augmentation, and table augmentation. For each of these tasks, we identify and describe seminal approaches, present relevant resources, and point out interdependencies among the different tasks.
