Luzzu—A Methodology and Framework for Linked Data Quality Assessment

The increasing variety of Linked Data on the Web makes it challenging to determine the quality of this data and, subsequently, to make this information explicit to data consumers. Although a number of tools and frameworks exist for assessing Linked Data quality, their output is not suitable for machine consumption, so consumers can hardly compare datasets and rank them by fitness for use. This article describes a conceptual methodology for assessing Linked Datasets, and Luzzu, a framework for Linked Data Quality Assessment. Luzzu is based on four major components: (1) an extensible interface for defining new quality metrics; (2) an interoperable, ontology-driven back-end for representing quality metadata and quality problems that can be re-used within different semantic frameworks; (3) scalable dataset processors for data dumps, SPARQL endpoints, and big data infrastructures; and (4) a customisable ranking algorithm that takes user-defined weights into account. We show that Luzzu scales linearly with the number of triples in a dataset. We also demonstrate the applicability of the Luzzu framework by evaluating and analysing a number of statistical datasets against a variety of metrics. This article contributes towards the definition of a holistic data quality lifecycle, in terms of the co-evolution of linked datasets, with the final aim of improving their quality.
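To make the idea behind component (4) concrete, the following is a minimal sketch of ranking datasets by user-weighted quality metric values. It is not Luzzu's actual algorithm or API: the function name `rank_datasets`, the metric names, the assumption that metric values lie in [0, 1], and the choice of a normalised weighted sum are all illustrative assumptions.

```python
from typing import Dict, List, Tuple


def rank_datasets(
    metric_values: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank datasets by a normalised weighted sum of their metric values.

    Hypothetical helper, not part of Luzzu. `metric_values` maps a dataset
    name to its metric scores (assumed to be in [0, 1]); `weights` maps a
    metric name to a non-negative user-defined weight.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one metric must have a positive weight")

    ranked = []
    for dataset, values in metric_values.items():
        # A metric that was not assessed for a dataset contributes 0.
        weighted_sum = sum(w * values.get(metric, 0.0) for metric, w in weights.items())
        ranked.append((dataset, weighted_sum / total_weight))

    # Highest score first; ties keep their input order because sort is stable.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Illustrative metric names and values only.
example_values = {
    "dataset-a": {"dereferenceability": 1.0, "licensing": 0.0},
    "dataset-b": {"dereferenceability": 0.5, "licensing": 1.0},
}
ranking_links = rank_datasets(example_values, {"dereferenceability": 3.0, "licensing": 1.0})
ranking_licence = rank_datasets(example_values, {"dereferenceability": 1.0, "licensing": 3.0})
```

Under these assumptions, changing the weights changes the order: a consumer who cares mostly about dereferenceability ranks `dataset-a` first, while one who cares mostly about licensing ranks `dataset-b` first.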
