Ten Years of WebTables

In 2008, we wrote about WebTables, an effort to exploit the large and diverse set of structured databases casually published online in the form of HTML tables. The past decade has seen a flurry of research and commercial activity around the WebTables project itself, as well as around the broad topic of informal online structured data. In this paper, we review the WebTables project and try to place it in the broader context of the decade of work that followed. We also show how the progress of the past ten years sets up an exciting agenda for the future, one that will draw upon many corners of the data management community.

PVLDB Reference Format: Michael Cafarella, Alon Halevy, Hongrae Lee, Jayant Madhavan, Cong Yu, Daisy Zhe Wang, and Eugene Wu. Ten Years of WebTables. PVLDB, 11 (12): 2140-2149, 2018. DOI: https://doi.org/10.14778/3229863.3240492
