Vocab.at - Automatic Linked Data Documentation and Vocabulary Usage Analysis

A growing amount of Linked Data is being published as RDF data dumps, as RDFa embedded in HTML pages, and via SPARQL endpoints. Unfortunately, the available data is often poorly documented and the consistency of the datasets is unknown. Understanding whether a dataset is fit for an intended use can therefore be very time consuming and impede the re-use of the data. When quality is considered as fitness for use, documentation is a key component for assessing data quality. The common practice today is to document the vocabularies that Linked Data uses. However, this approach neglects documenting how the vocabularies are actually used in the datasets. In contrast, this paper presents a novel approach for assessing vocabulary usage in Linked Data. The method generates missing documentation automatically and complements it by analysing the usage of vocabularies in the datasets. The resulting documentation shows the explicit vocabulary usage, which is invaluable when assessing the consistency and usefulness of the data. The method has been evaluated by developing the web service http://vocab.at and applying the analysis to selected datasets on the web.
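To illustrate the kind of vocabulary usage analysis described above, the following minimal sketch counts how often each vocabulary namespace occurs in the properties and class instantiations of an RDF dump. It is an illustrative assumption, not the Vocab.at implementation: the input file name dataset.ttl, the rdflib-based parsing, and the namespace-splitting heuristic are placeholders chosen for the example.

```python
# Minimal sketch (assumption, not the Vocab.at implementation):
# count vocabulary usage in an RDF dump with rdflib.
from collections import Counter

from rdflib import Graph, URIRef
from rdflib.namespace import RDF


def namespace_of(uri: URIRef) -> str:
    """Heuristically split a URI into its vocabulary namespace."""
    u = str(uri)
    for sep in ("#", "/"):
        if sep in u:
            return u.rsplit(sep, 1)[0] + sep
    return u


g = Graph()
g.parse("dataset.ttl", format="turtle")  # placeholder input file

property_usage = Counter()  # namespaces of predicates
class_usage = Counter()     # namespaces of instantiated classes

for s, p, o in g:
    property_usage[namespace_of(p)] += 1
    if p == RDF.type and isinstance(o, URIRef):
        class_usage[namespace_of(o)] += 1

print("Vocabularies used in properties:")
for ns, count in property_usage.most_common():
    print(f"  {ns}\t{count}")

print("Vocabularies used in class instantiations:")
for ns, count in class_usage.most_common():
    print(f"  {ns}\t{count}")
```

A report of this kind, aggregated per dataset, is the sort of explicit vocabulary-usage documentation the abstract refers to.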
