Profiling the Web of Data

The Web of Data contains a large number of openly available datasets covering a wide variety of topics. To benefit from this massive amount of open data, such external datasets must first be analyzed and understood at the basic level of data types, constraints, value patterns, and so on. For Linked Datasets, such meta information is currently very limited or not available at all. Data profiling techniques are needed to compute the respective statistics and meta information. However, current state-of-the-art approaches either cannot be applied to Linked Data or exhibit considerable performance problems. This paper presents my doctoral research, which tackles these problems.
