Automatically Generating Government Linked Data from Tables

Most open government data is encoded and published in structured tables found in reports, on the Web, and in spreadsheets or databases. Current approaches to generating Semantic Web representations from such data require human input to create schemas and often result in graphs that do not follow best practices for linked data. Evidence for a table’s meaning can be found in its column headers, cell values, implicit relations between columns, caption, and surrounding text, but interpreting that evidence also requires general and domain-specific background knowledge. We describe techniques grounded in graphical models and probabilistic reasoning to infer the meaning (semantics) associated with a table using background knowledge from the Linked Open Data cloud. We represent a table’s meaning by mapping columns to classes in an appropriate ontology; linking cell values to literal constants, implied measurements, or entities in the linked data cloud (existing or new); and discovering or identifying relations between columns.
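
As a rough illustration of what mapping a column to a class and linking its cells to entities can look like, the sketch below uses a simple majority vote over a hypothetical in-memory knowledge base. It is not the graphical-model inference described above: the voting heuristic is a stand-in, and `TOY_KB`, the `ex:` identifiers, and both function names are assumptions made for this example only.

```python
from collections import Counter

# Hypothetical toy knowledge base: cell string -> candidate (entity, class) pairs.
# A real system would query a Linked Open Data source instead.
TOY_KB = {
    "Baltimore": [("ex:Baltimore", "ex:City"),
                  ("ex:Baltimore_Ravens", "ex:SportsTeam")],
    "Boston": [("ex:Boston", "ex:City"),
               ("ex:Boston_band", "ex:Band")],
    "Seattle": [("ex:Seattle", "ex:City")],
}


def infer_column_class(cells, kb):
    """Pick the class supported by the most cells (simple vote, an assumption)."""
    votes = Counter()
    for cell in cells:
        # Each cell votes once for every distinct class among its candidates.
        votes.update({cls for _, cls in kb.get(cell, [])})
    if not votes:
        return None
    # Iterate in sorted order so ties are broken deterministically.
    return max(sorted(votes), key=lambda cls: votes[cls])


def link_cells(cells, column_class, kb):
    """Link each cell to a candidate entity whose class matches the column class."""
    links = {}
    for cell in cells:
        links[cell] = next(
            (uri for uri, cls in kb.get(cell, []) if cls == column_class),
            None,  # no match: a candidate for a new entity
        )
    return links


column = ["Baltimore", "Boston", "Seattle", "Springfield"]
column_class = infer_column_class(column, TOY_KB)
cell_links = link_cells(column, column_class, TOY_KB)
```

Here the ambiguous strings "Baltimore" and "Boston" are resolved by the evidence from the rest of the column, and "Springfield", which has no candidate in the toy knowledge base, is left unlinked as a possible new entity.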
