Paper information - Results of the OAEI 2007 Library Thesaurus Mapping Track

The National Library of the Netherlands (KB) maintains two large collections of books: the Deposit Collection, containing all Dutch printed publications (one million items), and the Scientific Collection, with about 1.4 million books mainly about the history, language and culture of the Netherlands. Each collection is described (indexed) using its own controlled vocabulary. The Scientific Collection is described using the GTT thesaurus, a huge vocabulary containing 35,194 general concepts, ranging from Wolkenkrabbers (Skyscrapers) to Verzorging (Care). The books in the Deposit Collection are mainly indexed against the Brinkman thesaurus, which contains a large set of headings (5,221) for describing the overall subjects of books. Both thesauri have similar coverage (2,895 concepts have exactly the same label) but differ in granularity.

For each concept in the two thesauri, the usual detailed lexical information is provided: preferred labels (each concept has exactly one), synonyms (961 for Brinkman, 14,607 for GTT), extra hidden labels (134 for Brinkman, a couple of thousand for GTT) and scope notes (6,236 for GTT, 192 for Brinkman). The language of both thesauri is Dutch, which makes this track ideal for testing alignment in a non-English setting.

The two thesauri also provide structural information for their concepts, in the form of broader and related links. However, GTT contains only 15,746 hierarchical broader links and 6,980 associative related links among its 35,194 concepts. Within the Brinkman thesaurus, there are 4,572 hierarchical links and 1,855 associative ones. On average, one can expect at most one parent per concept, for an average depth of 1 and 2, respectively. The structural information available in this case is therefore very poor.

For the purpose of the OAEI campaign, the two thesauri were made available in SKOS format. OWL versions were also provided, according to the (lossy) conversion rules detailed on the track page.
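The claim that the structural information is sparse can be checked with simple arithmetic on the counts quoted above. The following is a minimal sketch, not part of the track materials: the dictionary layout and the function names `links_per_concept` and `density_report` are illustrative assumptions, and only the numbers come from the text.

```python
# Counts quoted in the abstract; the data layout and all names below
# are hypothetical and chosen only for this illustration.
THESAURI = {
    "GTT": {"concepts": 35194, "broader": 15746, "related": 6980},
    "Brinkman": {"concepts": 5221, "broader": 4572, "related": 1855},
}


def links_per_concept(stats, kind):
    """Return the average number of links of the given kind per concept."""
    return stats[kind] / stats["concepts"]


def density_report():
    """Compute broader/related link densities for each thesaurus."""
    return {
        name: {
            kind: round(links_per_concept(stats, kind), 3)
            for kind in ("broader", "related")
        }
        for name, stats in THESAURI.items()
    }


if __name__ == "__main__":
    for name, row in density_report().items():
        print(name, row)
```

Both thesauri come out below one broader link per concept (roughly 0.45 for GTT and 0.88 for Brinkman), which is consistent with the statement that one can expect at most one parent per concept on average.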
