Knowledge Graph Curation: A Practical Framework

Knowledge Graphs (KGs) have proven to be very important for applications such as personal assistants, question-answering systems, and search engines. It is therefore crucial to ensure their high quality. However, KGs inevitably contain errors, duplicates, and missing values, which, if left uncurated, may hinder their adoption and utility in business applications: low-quality KGs lead to low-quality applications built on top of them. In this vision paper, we propose a practical knowledge graph curation framework for improving the quality of KGs. First, we define a set of quality metrics for assessing the status of KGs. Second, we describe the verification and validation of KGs as cleaning tasks. Third, we present duplicate detection and knowledge fusion strategies for enriching KGs. Furthermore, we give insights and directions toward a better architecture for curating KGs.
