Ontology-based Entity Matching in Attributed Graphs

Keys for graphs incorporate the topology and value constraints needed to uniquely identify entities in a graph. They have been studied to support object identification, knowledge fusion, and social network reconciliation. Existing key constraints identify entities as the matches of a graph pattern by subgraph isomorphism, which enforces label equality on node types. These constraints can be too restrictive to characterize structures and node labels that are syntactically different but semantically equivalent. We propose a new class of key constraints, Ontological Graph Keys (OGKs), that extend conventional graph keys by ontological subgraph matching between entity labels and an external ontology. We show that the implication and validation problems for OGKs are each NP-complete. To reduce the entity matching cost, we also provide an algorithm to compute a minimal cover for OGKs. We then study the entity matching problem with OGKs, and a practical variant with a budget on the matching cost. We develop efficient algorithms to perform entity matching based on a (budgeted) Chase procedure. Using real-world graphs, we experimentally verify the efficiency and accuracy of OGK-based entity matching.

PVLDB Reference Format: Hanchao Ma, Morteza Alipourlangouri, Yinghui Wu, Fei Chiang, Jiaxing Pi. Ontology-based Entity Matching in Attributed Graphs. PVLDB, 12(10): 1195-1207, 2019. DOI: https://doi.org/10.14778/3339490.3339501
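The contrast the abstract draws between strict label equality and ontology-aware label matching can be illustrated with a small sketch. This is not the paper's definition of OGKs or of ontological subgraph matching: the toy ontology, the hop-count notion of closeness, and the names `ONTOLOGY`, `label_distance`, `labels_match`, and `max_hops` are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy ontology: each label maps to the labels it is directly
# related to. The labels and edges are invented for this example.
ONTOLOGY = {
    "musician": {"artist"},
    "singer": {"artist"},
    "artist": {"musician", "singer", "person"},
    "person": {"artist"},
}


def label_distance(a, b, ontology):
    """Return the number of ontology edges between labels a and b (BFS),
    or None if b cannot be reached from a."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        label, dist = queue.popleft()
        for nxt in ontology.get(label, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def labels_match(a, b, ontology, max_hops=0):
    """With max_hops=0 this is plain label equality; a larger max_hops
    (an assumed closeness threshold) also accepts nearby ontology labels."""
    dist = label_distance(a, b, ontology)
    return dist is not None and dist <= max_hops
```

Under these assumptions, `labels_match("musician", "singer", ONTOLOGY)` rejects the pair as strict equality would, while `labels_match("musician", "singer", ONTOLOGY, max_hops=2)` accepts it because both labels sit one edge away from `artist`.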
