Measuring Accuracy of Triples in Knowledge Graphs

An increasing number of large-scale knowledge graphs have been constructed in recent years. These graphs are often created through text-based extraction, which can be very noisy. So far, cleaning knowledge graphs has mostly been carried out by human experts and is therefore very inefficient, so automatic methods for identifying and eliminating erroneous information are needed. Previous approaches to this end primarily rely on internal information, i.e., the knowledge graph itself. In this paper, we introduce an automatic approach, Triples Accuracy Assessment (TAA), for validating RDF triples (source triples) in a knowledge graph by finding consensus among matched triples (target triples) from other knowledge graphs. TAA uses knowledge graph interlinks to find identical resources and applies different matching methods between the predicates of source triples and target triples. Based on the matched triples, TAA then calculates a confidence score indicating the correctness of a source triple. In addition, we present an evaluation of our approach on the FactBench dataset for fact validation. Our findings show promising results for distinguishing between correct and incorrect triples.
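The pipeline sketched in the abstract (interlink-based resource matching, predicate matching, and consensus-based confidence scoring) can be illustrated with a minimal sketch. All function names, the lexical similarity measure, the threshold, and the scoring rule below are illustrative assumptions, not the paper's actual implementation:

```python
# Hypothetical sketch of the TAA idea: score a source triple by the
# agreement of matched triples from other knowledge graphs.
# Assumptions: target triples are already gathered via interlinks
# (e.g. owl:sameAs), and a crude lexical ratio stands in for the
# paper's predicate-matching methods.
from difflib import SequenceMatcher


def predicate_similarity(p1: str, p2: str) -> float:
    """Lexical similarity between predicate labels (a stand-in for
    the semantic matching methods used in the paper)."""
    return SequenceMatcher(None, p1.lower(), p2.lower()).ratio()


def confidence(source_triple, target_triples, threshold=0.7) -> float:
    """Fraction of matched target triples whose object agrees with
    the source triple's object; 0.0 if no predicate matches."""
    _, s_pred, s_obj = source_triple
    matched = [(p, o) for (_, p, o) in target_triples
               if predicate_similarity(s_pred, p) >= threshold]
    if not matched:
        return 0.0  # no supporting or contradicting evidence found
    agree = sum(1 for (_, o) in matched if o == s_obj)
    return agree / len(matched)


# Toy example with invented identifiers and values.
source = ("dbr:Berlin", "populationTotal", "3644826")
targets = [("wd:Q64", "population", "3644826"),
           ("yago:Berlin", "hasPopulation", "3644826"),
           ("fb:Berlin", "areaTotal", "891")]
score = confidence(source, targets)
```

Here the two population predicates match lexically and agree on the object, while `areaTotal` falls below the similarity threshold and is ignored, so the source triple receives a high confidence score.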
