Knowledge Reconciliation with Graph Convolutional Networks: Preliminary Results

In this article, we investigate the task of identifying nodes that are identical, more general, or similar within and across knowledge graphs. We name this task knowledge reconciliation; it can be seen as an extension of instance matching or entity resolution. In particular, we explore how Graph Convolutional Networks (GCNs), as previously defined in the literature, can be applied to this task, and we evaluate their performance on a real-world use case in pharmacogenomics (PGx), the study of how genetic variations affect drug responses. PGx knowledge is represented as n-ary relationships between one or more genomic variations, drugs, and phenotypes. The knowledge graph PGxLOD contains such relationships from three distinct sources: a reference database, the biomedical literature, and Electronic Health Records. We present and discuss our preliminary attempt to generate graph embeddings with GCNs and to use a simple distance between embeddings to assess the similarity between relationships. Experiments on the 68,686 PGx relationships of PGxLOD show that this approach raises several research questions. For example, we discuss how the semantics associated with knowledge graphs could be used within GCNs, which is of particular interest in the considered use case.
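
To make the idea of comparing relationships through embedding distances concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single, untrained GCN propagation step with symmetric normalization and self-loops, and it assumes the Euclidean distance as the "simple distance"; the abstract specifies neither choice. The function names (`gcn_layer`, `embedding_distance`), the toy graph, and the two-dimensional type features are all hypothetical.

```python
import numpy as np


def gcn_layer(adj, feats, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    Assumption: symmetric normalization with self-loops; the variant
    actually used in the article may differ.
    """
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1)                   # degrees, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ feats @ weight, 0.0)


def embedding_distance(emb, i, j):
    """Euclidean distance between the embeddings of nodes i and j
    (an assumed choice of 'simple distance')."""
    return float(np.linalg.norm(emb[i] - emb[j]))


# Hypothetical toy graph (undirected):
#   nodes 0 and 1: relationships, both linked to entities 2 and 3
#   node 4: a relationship linked only to entity 5
edges = [(0, 2), (0, 3), (1, 2), (1, 3), (4, 5)]
n_nodes = 6
adjacency = np.zeros((n_nodes, n_nodes))
for src, dst in edges:
    adjacency[src, dst] = adjacency[dst, src] = 1.0

# Hypothetical input features: [is_relationship, is_entity]
features = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 0], [0, 1]], dtype=float)

# Identity weights keep the sketch deterministic; in practice they are learned.
weights = np.eye(2)

embeddings = gcn_layer(adjacency, features, weights)
print("d(0, 1) =", embedding_distance(embeddings, 0, 1))
print("d(0, 4) =", embedding_distance(embeddings, 0, 4))
```

In this toy setting, relationships 0 and 1 share the same neighborhood and therefore receive the same embedding, while relationship 4 ends up farther away. Whether such a distance reliably reflects identity, generality, or similarity on real data is the kind of question the article raises.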
