Luzzu -- A Framework for Linked Data Quality Assessment

The increasing variety of Linked Data on the Web makes it challenging to determine the quality of this data and, subsequently, to make this information explicit to data consumers. Although a number of tools and frameworks exist for assessing Linked Data quality, their output is not suitable for machine consumption, so consumers can hardly compare and rank datasets by their fitness for use. This paper describes Luzzu, a framework for Linked Data quality assessment. Luzzu is based on four major components: (1) an extensible interface for defining new quality metrics; (2) an interoperable, ontology-driven back-end for representing quality metadata and quality problems that can be reused within different semantic frameworks; (3) a scalable stream processor for data dumps and SPARQL endpoints; and (4) a customisable ranking algorithm that takes user-defined weights into account. We show that Luzzu scales linearly with the number of triples in a dataset. We also demonstrate the applicability of the Luzzu framework by evaluating and analysing a number of statistical datasets with regard to relevant metrics.
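To make three of the components above more concrete (pluggable metrics, single-pass stream processing, and weighted ranking), here is a minimal Python sketch built on stated assumptions. It is not Luzzu's API. The names `QualityMetric`, `compute`, `metric_value`, `assess` and `rank`, the two example metrics, the tuple-based triple model, and the weighted-average ranking are all hypothetical choices made for illustration. The ontology-driven back-end is not modelled.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical triple model: (subject, predicate, object) strings, with blank
# nodes written using the N-Triples "_:" prefix.
Triple = Tuple[str, str, str]
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class QualityMetric(ABC):
    """Hypothetical pluggable metric: sees each triple once, then reports a score in [0, 1]."""

    name: str = "abstract"

    @abstractmethod
    def compute(self, triple: Triple) -> None:
        """Update internal state with one triple from the stream."""

    @abstractmethod
    def metric_value(self) -> float:
        """Return the metric's value after the stream has been consumed."""


class NoBlankNodesMetric(QualityMetric):
    """Illustrative metric: share of triples whose subject and object are not blank nodes."""

    name = "no_blank_nodes"

    def __init__(self) -> None:
        self.total = 0
        self.clean = 0

    def compute(self, triple: Triple) -> None:
        subject, _, obj = triple
        self.total += 1
        if not subject.startswith("_:") and not obj.startswith("_:"):
            self.clean += 1

    def metric_value(self) -> float:
        return self.clean / self.total if self.total else 1.0


class LabelCoverageMetric(QualityMetric):
    """Illustrative metric: share of distinct subjects that have an rdfs:label."""

    name = "label_coverage"

    def __init__(self) -> None:
        # Unlike the counters above, this state grows with the number of distinct subjects.
        self.subjects: Set[str] = set()
        self.labelled: Set[str] = set()

    def compute(self, triple: Triple) -> None:
        subject, predicate, _ = triple
        self.subjects.add(subject)
        if predicate == RDFS_LABEL:
            self.labelled.add(subject)

    def metric_value(self) -> float:
        return len(self.labelled) / len(self.subjects) if self.subjects else 1.0


def assess(triples: Iterable[Triple], metrics: List[QualityMetric]) -> Dict[str, float]:
    """Single pass over the stream: each triple is handed to every metric exactly once."""
    for triple in triples:
        for metric in metrics:
            metric.compute(triple)
    return {metric.name: metric.metric_value() for metric in metrics}


def rank(reports: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Assumed scheme: weighted average of metric values per dataset, highest first."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    scores = {
        dataset: sum(w * values.get(metric, 0.0) for metric, w in weights.items()) / total
        for dataset, values in reports.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Usage on two tiny in-memory datasets; fresh metric objects are created per dataset.
dataset_a = [
    ("http://example.org/a", RDFS_LABEL, '"A"'),
    ("http://example.org/a", "http://example.org/knows", "http://example.org/b"),
    ("http://example.org/b", RDFS_LABEL, '"B"'),
]
dataset_b = [
    ("_:x", RDFS_LABEL, '"X"'),
    ("http://example.org/c", "http://example.org/knows", "_:x"),
]
report_a = assess(dataset_a, [NoBlankNodesMetric(), LabelCoverageMetric()])
report_b = assess(dataset_b, [NoBlankNodesMetric(), LabelCoverageMetric()])
ranking = rank({"dataset_a": report_a, "dataset_b": report_b},
               {"no_blank_nodes": 0.3, "label_coverage": 0.7})
for dataset, score in ranking:
    print(f"{dataset}: {score:.2f}")
```

Because `assess` touches each triple once per metric, its running time grows linearly with the number of triples, provided each metric's per-triple update is constant time. The weighted average in `rank` is only one simple way to combine user-defined weights and is not claimed to be the paper's ranking algorithm.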
