Wikidata Completeness Profiling Using ProWD

Completeness is a crucial data quality aspect that deals with the question: do we have all the data we need? Lack of awareness of the completeness state of a knowledge graph (KG) may lead to bias, or even to false conclusions, in any decisions made based on the KG. Given a KG, one may wonder how its completeness varies across different topics. In this paper, we present ProWD, a framework and tool for profiling the completeness of Wikidata, a central KG on the (Semantic) Web that is open and free to use. ProWD measures the degree of completeness based on Class-Facet-Attribute (CFA) profiles. A class denotes a collection of entities, which can be divided into multiple facets, allowing attribute completeness to be analyzed and compared across them. For example, how does the completeness of the attributes "educated at" and "date of birth" compare between male German computer scientists and female Indonesian computer scientists? ProWD generates summaries and visualizations for such analysis, giving insight into KG completeness. ProWD is available online at http://prowd.id.
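The abstract does not give the exact completeness formula. The following is a minimal sketch that assumes the completeness of an attribute for a class-facet selection is the fraction of selected entities having at least one value for that attribute. The `Entity` record, the facet keys `gender` and `citizenship`, and the toy entities `e1` to `e4` are all hypothetical stand-ins for data that would come from Wikidata. This is not ProWD's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical, simplified entity: its class, facet values, and the attributes it has values for."""
    eid: str
    cls: str
    facets: dict[str, str]
    attributes: set[str] = field(default_factory=set)


def attribute_completeness(entities, class_name, facets, attribute):
    """Assumed CFA measure: share of entities in `class_name` matching every facet
    constraint that have at least one value for `attribute` (None if the selection is empty)."""
    members = [
        e for e in entities
        if e.cls == class_name and all(e.facets.get(k) == v for k, v in facets.items())
    ]
    if not members:
        return None
    filled = sum(1 for e in members if attribute in e.attributes)
    return filled / len(members)


# Toy, made-up entities (not real Wikidata data), all in the class "computer scientist".
entities = [
    Entity("e1", "computer scientist", {"gender": "male", "citizenship": "Germany"},
           {"educated at", "date of birth"}),
    Entity("e2", "computer scientist", {"gender": "male", "citizenship": "Germany"},
           {"date of birth"}),
    Entity("e3", "computer scientist", {"gender": "female", "citizenship": "Indonesia"},
           {"date of birth"}),
    Entity("e4", "computer scientist", {"gender": "female", "citizenship": "Indonesia"}),
]

# Compare the two facets on the two attributes, as in the abstract's example question.
for label, facet in [("male, German", {"gender": "male", "citizenship": "Germany"}),
                     ("female, Indonesian", {"gender": "female", "citizenship": "Indonesia"})]:
    for attr in ["educated at", "date of birth"]:
        print(label, attr, attribute_completeness(entities, "computer scientist", facet, attr))
```

On this toy data, "educated at" is 0.5 complete for the male German facet and 0.0 for the female Indonesian facet. Placing the class filter and the facet filter in a single selection keeps the three CFA components separate: the class picks the population, the facets slice it, and the attribute is what gets measured.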
