Medical Condition: Towards Entity-specific Rankings of Knowledge Base Properties [Extended Version]

In knowledge bases such as Wikidata, it is possible to assert a large set of properties for entities, ranging from generic ones such as name and place of birth to highly profession-specific or background-specific ones such as doctoral advisor or medical condition. Determining a preference or ranking in this large set is a challenge in tasks such as prioritisation of edits or natural-language generation. Most previous approaches to ranking knowledge base properties are purely data-driven and, as we show, mistake frequency for interestingness. In this work, we have developed a human-annotated dataset of 350 preference judgments among pairs of knowledge base properties for fixed entities. From this set, we isolate a subset of pairs for which humans show a high level of agreement (87.5% on average). We show, however, that baseline and state-of-the-art techniques achieve only 61.3% precision in predicting human preferences for this subset. We then analyze what contributes to one property being rated as more important than another, and identify at least three factors that play a role, namely (i) general frequency, (ii) applicability to similar entities and (iii) semantic similarity between property and entity. We experimentally analyze the contribution of each factor and show that a combination of techniques addressing all three factors achieves 74% precision on the task. The dataset is available at www.kaggle.com/srazniewski/wikidatapropertyranking.
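The evaluation setup described above can be illustrated with a minimal sketch. It assumes that each of the three factors is available as a score per (entity, property) pair and that the factors are combined by a simple weighted sum; the abstract does not state how the combination is actually done. All names (`precision`, `combine`, `scientist_x`) and all numbers are hypothetical toy values chosen for illustration, not data or results from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One judgment: (entity, property_a, property_b, property preferred by annotators).
Judgment = Tuple[str, str, str, str]
# A scorer maps (entity, property) to a number; higher means "more important".
Scorer = Callable[[str, str], float]


def predict(scorer: Scorer, entity: str, a: str, b: str) -> str:
    """Return whichever of the two properties the scorer ranks higher."""
    return a if scorer(entity, a) >= scorer(entity, b) else b


def precision(scorer: Scorer, judgments: List[Judgment]) -> float:
    """Fraction of pairs where the scorer agrees with the annotated preference."""
    hits = sum(predict(scorer, e, a, b) == pref for e, a, b, pref in judgments)
    return hits / len(judgments)


def combine(scorers: Sequence[Scorer], weights: Sequence[float]) -> Scorer:
    """Weighted sum of factor scores (an assumed combination scheme)."""
    def combined(entity: str, prop: str) -> float:
        return sum(w * s(entity, prop) for s, w in zip(scorers, weights))
    return combined


# Toy factor scores for one made-up entity (all values are invented).
FREQUENCY: Dict[str, float] = {
    "place of birth": 0.8, "doctoral advisor": 0.1, "medical condition": 0.05,
}
SIMILAR_ENTITIES: Dict[Tuple[str, str], float] = {
    ("scientist_x", "place of birth"): 0.7,
    ("scientist_x", "doctoral advisor"): 0.7,
    ("scientist_x", "medical condition"): 0.02,
}
SEMANTIC_SIMILARITY: Dict[Tuple[str, str], float] = {
    ("scientist_x", "place of birth"): 0.1,
    ("scientist_x", "doctoral advisor"): 0.9,
    ("scientist_x", "medical condition"): 0.1,
}


def by_frequency(entity: str, prop: str) -> float:
    return FREQUENCY[prop]  # factor (i): ignores the entity entirely


def by_similar_entities(entity: str, prop: str) -> float:
    return SIMILAR_ENTITIES[(entity, prop)]  # factor (ii)


def by_semantic_similarity(entity: str, prop: str) -> float:
    return SEMANTIC_SIMILARITY[(entity, prop)]  # factor (iii)


# Toy preference judgments (invented labels, not taken from the dataset).
JUDGMENTS: List[Judgment] = [
    ("scientist_x", "doctoral advisor", "place of birth", "doctoral advisor"),
    ("scientist_x", "place of birth", "medical condition", "place of birth"),
]

combined_scorer = combine(
    [by_frequency, by_similar_entities, by_semantic_similarity],
    [1.0, 1.0, 1.0],
)

if __name__ == "__main__":
    print("frequency only:", precision(by_frequency, JUDGMENTS))
    print("combined:", precision(combined_scorer, JUDGMENTS))
```

In this toy setting the frequency-only scorer prefers the generic property in the first pair, which is the kind of error the abstract attributes to purely data-driven rankings. The equal weights are an arbitrary choice; in practice they would need to be tuned or learned.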
