RDF Graph Alignment with Bisimulation

We investigate the problem of aligning two RDF databases, a problem central to understanding how ontologies evolve. Our approaches address three fundamental challenges: 1) the use of "blank" (null) names, 2) ontology changes in which different names are used to identify the same entity, and 3) small changes both in the data values and in the graph structure of the RDF database. We propose approaches inspired by the classical notion of graph bisimulation and extend them to capture the natural edit-distance metrics on data values and on graph structure. We evaluate our methods on three evolving curated data sets. Overall, our results show that the proposed methods perform well and scale.
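As a rough illustration of the exact-bisimulation starting point (a sketch, not the authors' algorithm), the code below aligns blank nodes across two RDF graphs by colouring nodes of their disjoint union with forward-bisimulation classes. Several assumptions are made for illustration only: triples are plain Python string tuples, blank-node labels start with `_:`, named nodes (IRIs and literals) are identified by their label across both graphs, only outgoing edges are considered, and refinement is the naive iterative scheme rather than an efficient partition-refinement algorithm such as Paige and Tarjan's. The function names `bisimulation_classes` and `align_blank_nodes` are hypothetical.

```python
from collections import defaultdict

# Assumed representation: a triple is (subject, predicate, object) as strings;
# blank-node labels start with "_:", everything else is a named node (IRI or literal).
Triple = tuple[str, str, str]


def is_blank(label: str) -> bool:
    return label.startswith("_:")


def bisimulation_classes(graphs: list[set[Triple]]) -> dict[tuple[int, str], int]:
    """Colour each node of the disjoint union of `graphs` by forward-bisimulation class.

    Nodes are keyed by (graph index, label) because blank-node labels are local to
    their graph. Named nodes keep their label as identity, so equal names in different
    graphs share a colour; blank nodes are told apart only by their outgoing edges.
    """
    out = defaultdict(set)  # node -> {(predicate, object node)}
    nodes = set()
    for g, triples in enumerate(graphs):
        for s, p, o in triples:
            out[(g, s)].add((p, (g, o)))
            nodes.update({(g, s), (g, o)})

    # Initial partition: one block per name, plus a single block for all blank nodes.
    colour = {n: ("blank",) if is_blank(n[1]) else ("named", n[1]) for n in nodes}
    while True:
        # A blank node's signature is its current colour plus the set of
        # (predicate, colour of target) pairs; keeping the old colour makes
        # each round a refinement of the previous partition.
        sig = {
            n: (colour[n], frozenset((p, colour[o]) for p, o in out[n]))
            if is_blank(n[1]) else colour[n]
            for n in nodes
        }
        ids: dict = {}
        new = {n: ids.setdefault(sig[n], len(ids)) for n in sorted(nodes)}
        # Refinement only splits blocks, so an unchanged block count means
        # the partition is stable: another round would split nothing.
        if len(ids) == len(set(colour.values())):
            return new
        colour = new


def align_blank_nodes(g1: set[Triple], g2: set[Triple]) -> dict[str, list[str]]:
    """Map each blank node of g1 to the blank nodes of g2 in the same class."""
    colour = bisimulation_classes([g1, g2])
    in_g2 = defaultdict(list)
    for (g, label), c in colour.items():
        if g == 1 and is_blank(label):
            in_g2[c].append(label)
    return {
        label: sorted(in_g2.get(c, []))
        for (g, label), c in colour.items()
        if g == 0 and is_blank(label)
    }


if __name__ == "__main__":
    g1 = {
        ("ex:alice", "ex:address", "_:a"),
        ("_:a", "ex:city", '"Edinburgh"'),
        ("_:a", "ex:zip", '"EH8"'),
    }
    g2 = {
        ("ex:alice", "ex:address", "_:x"),
        ("_:x", "ex:city", '"Edinburgh"'),
        ("_:x", "ex:zip", '"EH8"'),
        ("ex:bob", "ex:address", "_:y"),
        ("_:y", "ex:city", '"Glasgow"'),
    }
    print(align_blank_nodes(g1, g2))  # {'_:a': ['_:x']}
```

If the zip literal of `_:x` is changed to `"EH9"`, `_:a` is left with no match, because exact bisimulation has no notion of "almost equal" values or structure. That is the case the abstract's edit-distance extension of bisimulation is meant to handle; the sketch above does not attempt it.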
