Towards Knowledge Graphs Validation through Weighted Knowledge Sources

The performance of applications such as personal assistants and search engines relies on high-quality knowledge bases, also known as Knowledge Graphs (KGs). One important task in ensuring their quality is knowledge validation, which measures the degree to which the statements (triples) of a KG are semantically correct. KGs inevitably contain incorrect and incomplete statements, which makes them less trustworthy and may hinder their adoption in business applications. In this paper, we propose and implement a Validator that computes a confidence score for every triple and instance in a KG. The score is computed by finding the same instances across different weighted knowledge sources and comparing their features. We evaluate our approach by comparing its results against a baseline validation. Our results suggest that we can validate KGs with an F-measure of at least 75%. In terms of runtime, the Validator validated 2530 instances in approximately 15 minutes. Furthermore, we give insights and directions toward a better architecture for tackling KG validation.
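
To make the scoring idea concrete, the following is a minimal sketch of how a confidence score could be derived from weighted knowledge sources. It is not the Validator's actual implementation: the function names, the use of string similarity as the feature comparison, the weighted average for triples, and the plain mean for instances are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for any feature comparison."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triple_confidence(value: str, source_values: dict, weights: dict) -> float:
    """Weighted average of per-source similarities for one property value.

    source_values maps a source name to the value that source reports for
    the same instance and property. Sources without a value are simply
    absent from the mapping and do not contribute.
    """
    total = sum(weights[s] for s in source_values)
    if total == 0:
        return 0.0
    weighted = sum(weights[s] * similarity(value, v) for s, v in source_values.items())
    return weighted / total


def instance_confidence(instance: dict, matches: dict, weights: dict) -> float:
    """Mean triple confidence over all properties of an instance (assumed aggregation)."""
    scores = []
    for prop, value in instance.items():
        # Collect the values reported by each source that knows this property.
        found = {s: m[prop] for s, m in matches.items() if prop in m}
        scores.append(triple_confidence(value, found, weights))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical sources and weights, for illustration only.
weights = {"source_a": 0.6, "source_b": 0.4}
instance = {"name": "Hotel Alpenhof", "telephone": "+43 512 123456"}
matches = {
    "source_a": {"name": "Hotel Alpenhof", "telephone": "+43 512 123456"},
    "source_b": {"name": "Hotel Alpenhof"},
}
score = instance_confidence(instance, matches, weights)
```

In this sketch a source's weight expresses how much its agreement counts, so a match in a highly weighted source raises the confidence more than a match in a lightly weighted one. A triple for which no source reports a value receives a score of 0.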
