Unsupervised Discovery of Corroborative Paths for Fact Validation

Any data publisher can make RDF knowledge graphs available for consumption on the Web. This is a direct consequence of the decentralized publishing paradigm underlying the Data Web, which has led to more than 150 billion facts on more than 3 billion things being published on the Web in more than 10,000 RDF knowledge graphs over the last decade. However, the success of this publishing paradigm also means that validating the facts contained in RDF knowledge graphs has become more important than ever. Several families of fact validation algorithms have been developed in recent years to address different settings of the fact validation problem. In this paper, we consider the following setting: given an RDF knowledge graph, compute the likelihood that a given (novel) fact is true. None of the current solutions to this problem exploits RDFS semantics, in particular domain, range and class subsumption information. We address this research gap by presenting COPAAL, an unsupervised approach that extracts paths from knowledge graphs to corroborate (novel) input facts. Our approach relies on a mutual information measure that takes the RDFS semantics underlying the knowledge graph into consideration. In particular, we use the information shared by predicates and paths within the knowledge graph to compute the likelihood that the knowledge graph corroborates a fact. We evaluate our approach extensively on 17 publicly available datasets. Our results indicate that our approach significantly outperforms state-of-the-art unsupervised approaches, by up to 0.15 AUC-ROC. We even outperform supervised approaches by up to 0.07 AUC-ROC. The source code of COPAAL is open source and available at https://github.com/dice-group/COPAAL.
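The abstract does not give the exact measure. Below is a minimal sketch of how a type-aware mutual information between a predicate p and a path could be computed. It makes several assumptions. The first is that the measure is normalized pointwise mutual information (NPMI). The second is that counts range only over entity pairs (s, o) with s != o, where s is an instance of rdfs:domain(p) and o is an instance of rdfs:range(p), including instances of subclasses. The third is an aggregation of path scores as 1 - ∏(1 - max(NPMI, 0)). The toy graph and the helpers `instances`, `reachable`, `npmi` and `veracity` are hypothetical illustrations, not COPAAL's code.

```python
import math

# Hypothetical toy knowledge graph: plain triples plus RDFS information.
TRIPLES = {
    ("alice", "spouse", "bob"), ("alice", "child", "carol"),
    ("bob", "child", "carol"), ("dave", "spouse", "erin"),
    ("dave", "child", "frank"), ("erin", "child", "frank"),
    ("gina", "child", "hank"),
}
TYPES = {  # rdf:type assertions
    "alice": {"Woman"}, "bob": {"Man"}, "carol": {"Woman"}, "dave": {"Man"},
    "erin": {"Woman"}, "frank": {"Man"}, "gina": {"Woman"}, "hank": {"Man"},
}
SUBCLASS_OF = {"Woman": "Person", "Man": "Person"}  # rdfs:subClassOf
DOMAIN = {"spouse": "Person", "child": "Person"}     # rdfs:domain
RANGE = {"spouse": "Person", "child": "Person"}      # rdfs:range

# A path is a tuple of (predicate, inverse) steps; inverse=True walks an edge
# from object to subject. This one reads "s has a child whose parent is o".
CO_PARENT = (("child", False), ("child", True))


def superclasses(cls):
    """cls together with all its ancestors along rdfs:subClassOf."""
    seen = {cls}
    while cls in SUBCLASS_OF and SUBCLASS_OF[cls] not in seen:
        cls = SUBCLASS_OF[cls]
        seen.add(cls)
    return seen


def instances(cls):
    """Entities typed with cls or with any subclass of cls."""
    return {e for e, types in TYPES.items()
            if any(cls in superclasses(t) for t in types)}


def reachable(start, path):
    """Set of nodes reachable from start by following path step by step."""
    frontier = {start}
    for pred, inverse in path:
        frontier = {s if inverse else o
                    for s, p, o in TRIPLES
                    if p == pred and (o if inverse else s) in frontier}
    return frontier


def npmi(pred, path):
    """Normalized PMI of pred and path, counted only over pairs (s, o) with
    s in rdfs:domain(pred), o in rdfs:range(pred) and s != o."""
    objects = instances(RANGE[pred])
    n = c_pred = c_path = c_both = 0
    for s in instances(DOMAIN[pred]):
        ends = reachable(s, path)
        for o in objects:
            if s == o:
                continue
            has_pred = (s, pred, o) in TRIPLES
            has_path = o in ends
            n += 1
            c_pred += has_pred
            c_path += has_path
            c_both += has_pred and has_path
    if c_both == 0:
        return -1.0  # predicate and path never co-occur
    p_both = c_both / n
    if p_both == 1.0:
        return 1.0  # perfect co-occurrence; avoids 0/0 below
    pmi = math.log(p_both / ((c_pred / n) * (c_path / n)))
    return pmi / -math.log(p_both)


def veracity(s, pred, o, paths):
    """Hypothetical aggregation: 1 - prod(1 - max(NPMI, 0)) over the
    candidate paths that actually connect s to o."""
    remaining = 1.0
    for path in paths:
        if o in reachable(s, path):
            remaining *= 1.0 - max(npmi(pred, path), 0.0)
    return 1.0 - remaining


print(round(npmi("spouse", CO_PARENT), 3))                          # → 0.792
print(round(veracity("alice", "spouse", "bob", [CO_PARENT]), 3))   # → 0.792
print(veracity("gina", "spouse", "hank", [CO_PARENT]))             # → 0.0
```

In this toy graph the co-parent path links 4 ordered Person pairs, and 2 of them are also spouse pairs. The NPMI therefore comes out to ln 14 / ln 28 ≈ 0.792. No path connects gina to hank, so that candidate fact receives no support. Restricting the counts to domain/range instances is the design choice that makes the measure type-aware. Without it, every entity pair in the graph, such as (person, city), would dilute the probabilities.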
