Representing dataset quality metadata using multi-dimensional views

Data quality is commonly defined as fitness for use. Many data consumers face the problem of assessing the quality of data, and data publishers often lack the means to identify quality problems in their own data. To make the task easier for both stakeholders, we have developed the Dataset Quality Ontology (daQ), a core vocabulary for representing the results of quality benchmarking of a linked dataset. daQ represents quality metadata as multi-dimensional statistical observations using the Data Cube vocabulary. Quality metadata are organised as a self-contained graph, which can, for example, be embedded into linked open datasets. We discuss the design considerations, give examples of extending daQ with custom quality metrics, and present use cases such as analysing data versions, browsing datasets by quality, and link identification. Finally, we discuss how data cube visualisation tools enable data publishers and consumers to better analyse the quality of their data.
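
To make the idea of quality metadata as Data Cube observations more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. Each quality measurement becomes one qb:Observation attached to a quality graph, and the observations are then ranked to mimic browsing datasets by quality. The Data Cube terms qb:Observation and qb:dataSet are standard; the daQ namespace URI and the property names computedOn, metric and value are assumptions that should be checked against the published vocabulary, and all example.org URIs, the metric name and the scores are hypothetical.

```python
# Triples are modelled as plain (subject, predicate, object) tuples so the
# sketch needs no RDF library.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"   # W3C Data Cube namespace
DAQ = "http://purl.org/eis/vocab/daq#"     # assumed daQ namespace
EX = "http://example.org/"                 # hypothetical example namespace


def observation_triples(obs_id, dataset, metric, value, quality_graph):
    """Describe one quality measurement as a Data Cube observation."""
    subject = EX + obs_id
    return [
        (subject, RDF_TYPE, QB + "Observation"),
        (subject, QB + "dataSet", quality_graph),    # observation belongs to the quality graph
        (subject, DAQ + "computedOn", dataset),      # assumed property: the assessed dataset
        (subject, DAQ + "metric", metric),           # assumed property: the metric used
        (subject, DAQ + "value", value),             # assumed property: the measured value
    ]


def rank_datasets(triples, metric):
    """Return (dataset, value) pairs for one metric, best value first."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    scores = [
        (props[DAQ + "computedOn"], props[DAQ + "value"])
        for props in by_subject.values()
        if props.get(DAQ + "metric") == metric
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


quality_graph = EX + "qualityGraph"
metric = EX + "metric/dereferenceability"  # hypothetical custom metric
triples = (
    observation_triples("obs1", EX + "datasetA", metric, 0.72, quality_graph)
    + observation_triples("obs2", EX + "datasetB", metric, 0.91, quality_graph)
)
ranking = rank_datasets(triples, metric)
```

Because every observation carries its dataset and metric as dimensions, selecting by metric (as above) or by dataset amounts to slicing the cube, which is what generic data cube visualisation tools rely on.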
