An automatic method for reporting the quality of thesauri

Abstract: Thesauri are knowledge models commonly used for information classification and retrieval, and their structure is defined by standards such as ISO 25964. However, when creators do not follow the specification correctly, they build models with inadequate concepts or relations that limit usability. This paper describes a process that automatically analyzes the properties and relations of a thesaurus against the ISO 25964 specification and suggests corrections for potential problems. It performs a lexical and syntactic analysis of the concept labels, and a structural and semantic analysis of the relations. The process has been tested with the Urbamet and Gemet thesauri, and the results have been analyzed to determine how well it works.
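To make the kind of check described above more concrete, the following is a minimal sketch of two simple tests one might run on a thesaurus: a lexical test for concepts that share the same label after normalization, and a structural test for cycles in the broader-term hierarchy. It is an illustration under stated assumptions, not the process evaluated in the paper. The function names, the dictionary-based input format, and the normalization rule (lowercasing and collapsing whitespace) are all hypothetical choices made for this example.

```python
from collections import defaultdict


def find_duplicate_labels(labels):
    """Return normalized labels that are shared by more than one concept.

    `labels` maps a concept id to its preferred label (assumed input format).
    """
    by_label = defaultdict(list)
    for concept_id, label in labels.items():
        # Assumed normalization: lowercase and collapse runs of whitespace.
        normalized = " ".join(label.lower().split())
        by_label[normalized].append(concept_id)
    return {lab: sorted(ids) for lab, ids in by_label.items() if len(ids) > 1}


def find_hierarchy_cycle(broader):
    """Return one cycle in the broader-term graph as a list, or None.

    `broader` maps a concept id to the ids of its broader concepts
    (assumed input format). Uses a plain depth-first search with
    three node states: unvisited, in progress, finished.
    """
    UNVISITED, IN_PROGRESS, FINISHED = 0, 1, 2
    state = defaultdict(int)

    def visit(node, path):
        state[node] = IN_PROGRESS
        path.append(node)
        for parent in broader.get(node, ()):
            if state[parent] == IN_PROGRESS:
                # Reached a concept already on the current path: a cycle.
                return path[path.index(parent):] + [parent]
            if state[parent] == UNVISITED:
                cycle = visit(parent, path)
                if cycle:
                    return cycle
        path.pop()
        state[node] = FINISHED
        return None

    for node in list(broader):
        if state[node] == UNVISITED:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    # Toy data invented for this example only.
    labels = {"c1": "Urban planning", "c2": "urban  planning", "c3": "Housing"}
    broader = {"a": ["b"], "b": ["c"], "c": ["a"]}
    print(find_duplicate_labels(labels))
    print(find_hierarchy_cycle(broader))
```

A real checker would read the thesaurus from its published serialization and cover many more rules. The sketch only shows how label-level and relation-level tests can be kept separate, as the abstract suggests.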
