Demand-Weighted Completeness Prediction for a Knowledge Base

In this paper we introduce the notion of Demand-Weighted Completeness, which allows the completeness of a knowledge base to be estimated with respect to how it is used. Defining an entity by its classes, we employ usage data to predict the distribution over relations for that entity. For example, instances of the class person in a knowledge base may require a birth date, name and nationality to be considered complete. These predicted relation distributions enable detection of important gaps in the knowledge base and define the required facts for unseen entities. Such characterisation of the knowledge base can also quantify how usage and completeness change over time. We demonstrate a method to measure Demand-Weighted Completeness and show that a simple neural network model performs well at this prediction task.
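The sketch below illustrates the idea under two assumptions. First, it assumes that the relation distribution for a class signature is estimated as normalised usage counts. Second, it assumes that an entity's Demand-Weighted Completeness is the share of that demand mass already covered by the relations the entity has. The function names, relation labels and usage log are hypothetical. The simple counting estimator also stands in for the neural network model used for prediction. This is not the paper's implementation.

```python
from collections import Counter

def relation_distribution(usage_log):
    """Estimate a relation distribution from usage counts.

    Hypothetical input: a list of relation names requested for entities
    sharing one class signature (e.g. {person}).
    """
    counts = Counter(usage_log)
    total = sum(counts.values())
    return {rel: n / total for rel, n in counts.items()}

def demand_weighted_completeness(distribution, present_relations):
    """Assumed scoring rule: probability mass of relations the entity already has."""
    return sum(p for rel, p in distribution.items() if rel in present_relations)

def missing_relations(distribution, present_relations):
    """Gaps ranked by predicted demand, most important first."""
    gaps = [(rel, p) for rel, p in distribution.items() if rel not in present_relations]
    return [rel for rel, _ in sorted(gaps, key=lambda item: -item[1])]

# Hypothetical usage log for entities whose class signature is {person}
log = ["birth_date", "birth_date", "name", "name", "nationality", "spouse"]
dist = relation_distribution(log)

entity_relations = {"name", "birth_date"}
score = demand_weighted_completeness(dist, entity_relations)
print(round(score, 3))  # 0.667
print(missing_relations(dist, entity_relations))  # ['nationality', 'spouse']
```

Here an entity that lacks a nationality scores below 1 even though it has a name and birth date. The ranked gap list shows which missing facts would add the most demand-weighted value.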
