Completeness Management for RDF Data Sources

The Semantic Web is commonly interpreted under the open-world assumption, meaning that the information available (e.g., in a data source) captures only a subset of reality. Consequently, there is no certainty about whether the available information provides a complete representation of reality. The broad aim of this article is to contribute a formal study of how to describe the completeness of parts of the Semantic Web stored in RDF data sources. We introduce a theoretical framework that allows RDF data sources to be augmented with statements, also expressed in RDF, about their completeness. One immediate benefit of this framework is that query answers can be complemented with information about their completeness. We study the impact of completeness statements on the complexity of query answering by considering different fragments of the SPARQL language, including the RDFS entailment regime, and the federated scenario. We implement an efficient method for reasoning about query completeness and provide an experimental evaluation in the presence of large sets of completeness statements.
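To make the idea of complementing query answers with completeness information more concrete, the following is a minimal sketch and not necessarily the article's own method. It assumes that a completeness statement is a pair (pattern, condition) of triple patterns, meaning that the source holds every instance of the pattern whose condition holds in reality, and that a basic graph pattern query can be checked by "freezing" its variables into constants and testing whether the statements cover every frozen triple. All names (`is_query_complete`, the `ex:` terms, the `frozen:` prefix) are hypothetical, and RDFS entailment and federation are ignored.

```python
def is_var(term):
    """Terms starting with '?' are variables; everything else is a constant."""
    return term.startswith("?")


def match(patterns, graph, binding=None):
    """Yield every variable binding that maps all triple patterns into the graph."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for p_term, g_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(p_term, g_term) != g_term:
                    ok = False
                    break
            elif p_term != g_term:
                ok = False
                break
        if ok:
            yield from match(rest, graph, candidate)


def freeze(bgp):
    """Replace each variable by a fresh constant (assumed 'frozen:' prefix)."""
    return {
        tuple("frozen:" + t[1:] if is_var(t) else t for t in triple)
        for triple in bgp
    }


def guaranteed(statements, graph):
    """Triples of the graph that some completeness statement vouches for."""
    covered = set()
    for pattern, condition in statements:
        for b in match(list(pattern) + list(condition), graph):
            for triple in pattern:
                covered.add(tuple(b.get(t, t) for t in triple))
    return covered


def is_query_complete(bgp, statements):
    """Assumed check: every frozen query triple is covered by the statements."""
    frozen = freeze(bgp)
    return frozen <= guaranteed(statements, frozen)


# Hypothetical statement: the source is complete for movies directed by ex:tarantino.
statements = [
    ([("?m", "rdf:type", "ex:Movie"), ("?m", "ex:director", "ex:tarantino")], []),
]
movies = [("?m", "rdf:type", "ex:Movie"), ("?m", "ex:director", "ex:tarantino")]
movies_with_actors = movies + [("?m", "ex:actor", "?a")]

print(is_query_complete(movies, statements))
print(is_query_complete(movies_with_actors, statements))
```

Under these assumptions, the first query is reported complete because the statement covers both of its triple patterns, while the second is not, since nothing vouches for the `ex:actor` triples.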
