Managing and Consuming Completeness Information for Wikidata Using COOL-WD

Wikidata is a fast-growing, crowdsourced, entity-centric knowledge base (KB) that currently stores over 100 million facts about more than 21 million entities. Such a vast amount of data raises the question: how complete is the information in Wikidata? There is no easy answer, since Wikidata currently lacks a means to describe the completeness of its stored information, that is, which entities are complete for which properties. In this paper, we discuss how to manage and consume meta-information about completeness for Wikidata. Due to the crowdsourced and entity-centric nature of Wikidata, we argue that such meta-information should be simple, yet still provide tangible benefits in data consumption. We demonstrate the applicability of our approach via COOL-WD (http://cool-wd.inf.unibz.it/), a completeness tool for Wikidata, which at the moment collects around 10,000 real completeness statements.
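To illustrate the idea of entity-centric completeness statements, the following sketch shows how marking an (entity, property) pair as complete lets a consumer distinguish guaranteed-complete answers from possibly-incomplete ones. This is a minimal toy model with made-up entities, not COOL-WD's actual data model or API:

```python
from typing import Dict, List, Set, Tuple

# Toy KB fragment: (entity, property) -> stored values.
# All identifiers here are hypothetical, for illustration only.
kb: Dict[Tuple[str, str], List[str]] = {
    ("ex:alice", "child"): ["ex:bob", "ex:carol"],
    ("ex:dave", "child"): ["ex:erin"],
}

# Completeness statements: (entity, property) pairs asserted to be
# complete, i.e., the KB stores *all* values for that pair.
complete: Set[Tuple[str, str]] = {("ex:alice", "child")}

def answer(entity: str, prop: str) -> Tuple[List[str], bool]:
    """Return the stored values plus a flag telling whether the
    answer is guaranteed complete by a completeness statement."""
    return kb.get((entity, prop), []), (entity, prop) in complete

print(answer("ex:alice", "child"))  # (['ex:bob', 'ex:carol'], True)
print(answer("ex:dave", "child"))   # (['ex:erin'], False)
```

The second answer returns the same kind of value list, but the flag tells the consumer that more children of `ex:dave` may exist outside the KB, which is exactly the distinction completeness meta-information makes possible.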
