Paper Information - Applying WebTables in Practice

Applying WebTables in Practice

We started investigating the collection of HTML tables on the Web and developed the WebTables system a few years ago [4]. Since then, our work has been motivated by applying WebTables in a broad set of applications at Google, resulting in several product launches. In this paper, we describe the challenges faced, lessons learned, and new insights that we gained from our efforts. The main challenges we faced in our efforts were (1) identifying tables that are likely to contain high-quality data (as opposed to tables used for navigation, layout, or formatting), and (2) recovering the semantics of these tables or signals that hint at their semantics. The result is a semantically enriched table corpus that we used to develop several services. First, we created a search engine for structured data whose index includes over a hundred million HTML tables. Second, we enabled users of Google Docs (through its Research Panel) to find relevant data tables and to insert such data into their documents as needed. Most recently, we brought WebTables to a much broader audience by using the table corpus to provide richer tabular snippets for fact-seeking web search queries on Google.com.

[1] Surajit Chaudhuri, et al. Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers, 2014, Proc. VLDB Endow.

[2] Sunita Sarawagi, et al. Answering Table Queries on the Web using Column Keywords, 2012, Proc. VLDB Endow.

[3] Rahul Gupta, et al. Biperpedia: An Ontology for Search Applications, 2014, Proc. VLDB Endow.

[4] Jun Zhao, et al. Collective entity linking in web text: a graph-based method, 2011, SIGIR.

[5] Sunita Sarawagi, et al. Open-domain quantity queries on web tables: annotation, response, and consensus models, 2014, KDD.

[6] Christian S. Jensen, et al. Google fusion tables: data management, integration and collaboration in the cloud, 2010, SoCC '10.

[7] Marti A. Hearst. Automatic Acquisition of Hyponyms from Large Text Corpora, 1992, COLING.

[8] Jayant Madhavan, et al. Recovering Semantics of Tables on the Web, 2011, Proc. VLDB Endow.

[9] Daisy Zhe Wang, et al. WebTables: exploring the power of tables on the web, 2008, Proc. VLDB Endow.

[10] Surajit Chaudhuri, et al. InfoGather: entity augmentation and attribute discovery by holistic matching with web tables, 2012, SIGMOD Conference.

[11] Mehryar Mohri, et al. Algorithms for Learning Kernels Based on Centered Alignment, 2012, J. Mach. Learn. Res.

[12] Hanan Samet, et al. Schema Extraction for Tabular Data on the Web, 2013, Proc. VLDB Endow.