High-level change detection in RDF(S) KBs

With the increasing use of Web 2.0 to create, disseminate, and consume large volumes of data, more and more information is published and becomes available to potential data consumers (applications and services, individual users, and communities) outside its production site. The most representative example of this trend is Linked Open Data (LOD), a set of interlinked data and knowledge bases. The main challenge in this context is data governance within loosely coordinated organizations that publish added-value interlinked data on the Web, which brings together issues of data management and data quality in order to support the full lifecycle of data production, consumption, and management. In this article, we are interested in curation issues for RDF(S) data, the default data model for LOD. In particular, we address change management for RDF(S) data maintained by large communities (scientists, librarians, etc.) that act as curators to ensure high data quality. Such curated Knowledge Bases (KBs) evolve constantly for various reasons, such as the inclusion of new experimental evidence or observations, or the correction of erroneous conceptualizations. Managing such changes poses several research problems, including the problem of detecting the changes (delta) between versions of the same KB developed and maintained by different groups of curators, a crucial task for helping them understand the changes involved. This becomes all the more important as curated KBs are interconnected (through copying or referencing), so changes need to be propagated from one KB to another, either within or across communities.
This article addresses this problem by proposing a change language that allows the formulation of concise and intuitive deltas. The language is expressive enough to describe unambiguously any possible change encountered in curated KBs expressed in RDF(S), and the changes it defines can be detected efficiently and deterministically in an automated way. Moreover, we devise a change detection algorithm that is sound and complete with respect to this language, and we study appropriate semantics for executing the deltas expressed in our language, in order to move backwards and forwards in a multiversion repository using only the corresponding deltas. Finally, we evaluate experimentally the effectiveness and efficiency of our algorithms using real ontologies from the cultural, bioinformatics, and entertainment domains.
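To make the notion of a delta concrete, the following is a minimal sketch of the simplest kind of delta: the sets of triples added and deleted between two versions, together with forward and backward application. It is not the change language or detection algorithm of the article. The names `LowLevelDelta`, `diff`, `apply_forward`, and `apply_backward`, the representation of a KB as a set of string triples, and the example data are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple

# Hypothetical representation: a triple is (subject, predicate, object).
Triple = Tuple[str, str, str]


class LowLevelDelta(NamedTuple):
    """Triples added and deleted when moving from an old to a new version."""
    added: FrozenSet[Triple]
    deleted: FrozenSet[Triple]


def diff(old: Iterable[Triple], new: Iterable[Triple]) -> LowLevelDelta:
    """Compute the set-difference delta between two versions."""
    old_set, new_set = frozenset(old), frozenset(new)
    return LowLevelDelta(added=new_set - old_set, deleted=old_set - new_set)


def apply_forward(kb: Iterable[Triple], delta: LowLevelDelta) -> FrozenSet[Triple]:
    """Move from the old version to the new one using only the delta."""
    return (frozenset(kb) - delta.deleted) | delta.added


def apply_backward(kb: Iterable[Triple], delta: LowLevelDelta) -> FrozenSet[Triple]:
    """Move from the new version back to the old one using only the delta."""
    return (frozenset(kb) - delta.added) | delta.deleted


# Illustrative data only: the superclass of Painting changes between versions.
v1 = frozenset({
    ("Painting", "rdfs:subClassOf", "Artifact"),
    ("monaLisa", "rdf:type", "Painting"),
})
v2 = frozenset({
    ("Painting", "rdfs:subClassOf", "Artwork"),
    ("monaLisa", "rdf:type", "Painting"),
})

delta = diff(v1, v2)
restored_v2 = apply_forward(v1, delta)
restored_v1 = apply_backward(v2, delta)
```

Here the delta contains one added and one deleted `rdfs:subClassOf` triple. A high-level change language of the kind described above would aim to report such a pair as a single, more intuitive change; that grouping step is not implemented in this sketch.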
