Assessing and Improving the Quality of SKOS Vocabularies

Controlled vocabularies are increasingly made available on the Web of Data using the Simple Knowledge Organization System (SKOS) ontology. Assessment of vocabulary quality is important for determining the suitability of vocabularies for reuse in applications and for improving vocabulary development processes. We define 26 quality issues, i.e., computable functions that expose potential quality problems. In an analysis of a representative set of 24 SKOS vocabularies, we found all of them to contain structural errors and/or other quality problems. We propose a set of correction heuristics which we have used to automatically correct a significant proportion of the identified problems. Our reference implementations of these methods, the quality assessment tool qSKOS and the quality improvement tool Skosify, are available for reuse as open-source software.
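As an illustration of what a quality issue expressed as a computable function might look like, the following minimal sketch implements two checks over a vocabulary given as plain (subject, predicate, object) string triples. The two checks (concepts lacking a skos:prefLabel, and concepts that lie on a skos:broader cycle) and the function names are assumptions chosen for illustration; they are not taken from qSKOS or Skosify and are not claimed to be among the 26 issues defined in the paper.

```python
from collections import defaultdict

# SKOS and RDF term IRIs (standard namespaces).
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def concepts_without_pref_label(triples):
    """Hypothetical check: concepts that have no skos:prefLabel at all."""
    concepts = {s for s, p, o in triples
                if p == RDF_TYPE and o == SKOS + "Concept"}
    labelled = {s for s, p, o in triples if p == SKOS + "prefLabel"}
    return concepts - labelled


def concepts_in_broader_cycles(triples):
    """Hypothetical check: concepts that reach themselves via skos:broader."""
    broader = defaultdict(set)
    for s, p, o in triples:
        if p == SKOS + "broader":
            broader[s].add(o)

    in_cycle = set()
    for start in list(broader):
        # Depth-first walk up the hierarchy, looking for a path back to start.
        stack, seen = list(broader[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                in_cycle.add(start)
                break
            if node not in seen:
                seen.add(node)
                stack.extend(broader.get(node, ()))
    return in_cycle


# Tiny made-up vocabulary: ex:a and ex:b are each other's broader concept,
# and only ex:a carries a preferred label.
example = [
    ("ex:a", RDF_TYPE, SKOS + "Concept"),
    ("ex:a", SKOS + "prefLabel", "A"),
    ("ex:b", RDF_TYPE, SKOS + "Concept"),
    ("ex:a", SKOS + "broader", "ex:b"),
    ("ex:b", SKOS + "broader", "ex:a"),
    ("ex:c", RDF_TYPE, SKOS + "Concept"),
    ("ex:c", SKOS + "broader", "ex:a"),
]

missing_labels = concepts_without_pref_label(example)
cyclic = concepts_in_broader_cycles(example)
```

Each function returns the set of offending concepts rather than a pass/fail flag, so the result only points at potential problems; whether a given finding is a real defect is left to the vocabulary maintainer.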
