Data Quality Issues in Linked Open Data

The increasing diffusion of linked data as a standard way to share knowledge on the Web allows users and public and private organizations to fully exploit structured data from very large datasets that were not available in the past. Over the last few years, linked data has grown into a large number of openly accessible datasets from several domains, forming the Linking Open Data (LOD) cloud. Like other types of information, such as structured data, linked data suffers from quality problems such as inaccuracy, out-of-dateness, incompleteness, and inconsistency, which are frequent and seriously limit the full exploitation of such data. It is therefore important to assess the quality of datasets before using them in linked data applications. Quality assessment allows users or applications to understand whether the data is appropriate for the task at hand.
