On Evaluating Web-Scale Extracted Knowledge Bases in a Comparative Way

In this article, the authors design two metric sets, Richness and Correctness, based on a quasi-formal conceptual representation. They also design a novel metric set computed on the instances that different KBs share, so that metric results are comparable across KBs. Finally, they use random sampling to reduce the human effort needed to assess correctness. The authors comparatively evaluate three large Chinese KBs (DBpedia Chinese, Zhishi.me and SSCO) and further compare them with English KBs in terms of data set quality. They also compare different versions of DBpedia and YAGO. The findings not only give a detailed report on the current state of extracted KBs, but also show the effectiveness of the proposed methods in comparatively assessing the quality of Web-scale KBs.
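The abstract does not say which sampling scheme the authors use, so the following is only a minimal sketch of the general idea: draw a simple random sample of triples, have a human judge each one, and report the observed correctness with a confidence interval. The names `estimate_correctness`, `wilson_interval` and `judge` are hypothetical, and the Wilson score interval is an assumption chosen for illustration, not necessarily the paper's method.

```python
import math
import random


def wilson_interval(correct, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def estimate_correctness(triples, judge, sample_size, seed=0):
    """Estimate the fraction of correct triples from a random sample.

    `judge` is a hypothetical callable standing in for a human annotator:
    it takes one triple and returns True if the triple is correct.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(triples, min(sample_size, len(triples)))
    correct = sum(1 for t in sample if judge(t))
    low, high = wilson_interval(correct, len(sample))
    return correct / len(sample), low, high
```

Only the sampled triples need human judgment, so the annotation cost depends on the sample size rather than on the size of the KB. Applying the same procedure to the instances shared by two KBs would keep the resulting estimates comparable, which is in the spirit of the overlap-based metrics described above.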
